Bound remote translog purge for small nodes #22387
Batch remote translog metadata and data file deletions to cap memory and remote_purge queue growth on low-core nodes. Add purge batch cluster settings, single-flight coalescing per shard, and unit tests for batched delete paths.

Fixes opensearch-project#20138

Signed-off-by: MdTanwer <tanw9004167@gmail.com>
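For readers unfamiliar with the term, "single-flight coalescing per shard" can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the class name `SingleFlightPurge` and the method `trigger` are hypothetical, and the purge runs inline here, whereas the real change presumably submits work to the `remote_purge` thread pool.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration of per-shard single-flight coalescing.
// One instance is assumed to exist per shard.
public class SingleFlightPurge {
    private final AtomicBoolean inFlight = new AtomicBoolean(false);

    /**
     * Runs the purge unless one is already running for this shard.
     * Returns true if the purge ran, false if the request was coalesced.
     */
    public boolean trigger(Runnable purge) {
        if (!inFlight.compareAndSet(false, true)) {
            return false; // a purge is already in flight; do not queue another
        }
        try {
            purge.run();
        } finally {
            inFlight.set(false); // allow the next cycle to start
        }
        return true;
    }

    public static void main(String[] args) {
        SingleFlightPurge shard = new SingleFlightPurge();
        boolean[] nested = new boolean[1];
        // A trigger arriving while a purge is running is dropped.
        boolean outer = shard.trigger(() -> nested[0] = shard.trigger(() -> {}));
        System.out.println("outer ran: " + outer + ", nested ran: " + nested[0]);
        // prints "outer ran: true, nested ran: false"
    }
}
```

The point of the guard is that repeated triggers for the same shard cannot pile up as separate queued tasks, which is the queue growth the commit message describes.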
Force-pushed from bc9ff3f to f23e34b.
❌ Gradle check result for f23e34b: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Summary
Fix #20138
- […] `remote_purge`, addressing heap and queue buildup on small-core nodes.
- Adds cluster settings `cluster.remote_store.translog.purge_batch_size` (default 500) and `cluster.remote_store.translog.purge_max_batches_per_cycle` (default 2), with adaptive throttling when a large backlog is detected.
- […] `RemoteFsTranslog` / `RemoteFsTimestampAwareTranslog`.
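To make the effect of the two defaults concrete, here is a rough sketch of how a batch size of 500 and a cap of 2 batches per cycle would bound one purge cycle. The class `PurgeBatchSketch`, the method `planCycle`, and the file names are hypothetical and do not come from the PR; the adaptive throttling mentioned above is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of bounded, batched purge planning.
public class PurgeBatchSketch {
    // Values mirror the defaults quoted in the summary; the constant names are invented.
    static final int PURGE_BATCH_SIZE = 500;
    static final int PURGE_MAX_BATCHES_PER_CYCLE = 2;

    /**
     * Takes at most batchSize * maxBatches files from the head of the backlog
     * and splits them into batches. Anything beyond that waits for a later cycle.
     */
    static List<List<String>> planCycle(List<String> staleFiles, int batchSize, int maxBatches) {
        List<List<String>> batches = new ArrayList<>();
        int limit = Math.min(staleFiles.size(), batchSize * maxBatches);
        for (int start = 0; start < limit; start += batchSize) {
            int end = Math.min(start + batchSize, limit);
            batches.add(new ArrayList<>(staleFiles.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> stale = new ArrayList<>();
        for (int i = 0; i < 1200; i++) {
            stale.add("translog-" + i + ".tlog");
        }
        List<List<String>> batches = planCycle(stale, PURGE_BATCH_SIZE, PURGE_MAX_BATCHES_PER_CYCLE);
        int planned = batches.stream().mapToInt(List::size).sum();
        System.out.println(batches.size() + " batches, " + planned + " files this cycle, "
            + (stale.size() - planned) + " deferred");
        // prints "2 batches, 1000 files this cycle, 200 deferred"
    }
}
```

Under these assumptions a backlog of 1200 stale files results in at most 1000 deletions per cycle, so the memory held for delete lists stays bounded regardless of backlog size.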