fix(tiering): Track force merges in activeMerges to prevent tiering before force merge completion#22370
Conversation
PR Reviewer Guide 🔍(Review updated until commit 6cc8bfc)Here are some key observations to aid the review process:
|
…ace condition MergeScheduler.forceMerge() runs merges by calling runMerge() directly, bypassing submitMergeTask() which is the only place that increments activeMerges. This makes in-flight force merges invisible to onMergesDrained(), causing tiering's prepare step to proceed immediately while a force merge is still running. When AutoForceMergeManager triggers a force merge on an idle shard and tiering is triggered before it completes, the merge finishes after shard relocation. The merged segment is never uploaded to remote store (RemoteStoreRefreshListener is already closed), leaving the replica permanently diverged with a linearly growing replication lag. Changes: - Increment activeMerges in forceMerge() so onMergesDrained correctly waits for in-flight force merges before proceeding with tiering - Fire drain listeners in forceMerge() finally block when all merges complete (mirrors submitMergeTask behavior) - Add waitForReplicaSync() to IndexShard to verify replicas are in sync after waitForRemoteStoreSync() in the tiering prepare step - Add unit tests verifying force merges block drain and are visible to getActiveMergeCount() Signed-off-by: bkhishor <bkhishor@amazon.com>
PR Code Suggestions ✨Latest suggestions up to 6cc8bfc Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit 0450a30
Suggestions up to commit c6a2b50
Suggestions up to commit 1910145
Suggestions up to commit 314ef57
Suggestions up to commit b84ffcd
|
fc254ba to
a2f82ab
Compare
|
Persistent review updated to latest commit a2f82ab |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #22370 +/- ##
============================================
- Coverage 73.44% 73.40% -0.05%
- Complexity 76114 76142 +28
============================================
Files 6076 6076
Lines 345508 345552 +44
Branches 49732 49735 +3
============================================
- Hits 253773 253653 -120
- Misses 71497 71668 +171
+ Partials 20238 20231 -7 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
Persistent review updated to latest commit 8a7d7b5 |
- Add IndexShardTests for waitForReplicaSync behavior - Refine integration test assertions Signed-off-by: bkhishor <bkhishor@amazon.com>
8a7d7b5 to
76d6942
Compare
|
Persistent review updated to latest commit 76d6942 |
…rdcoded value The force merge may or may not complete before prepare runs (timing-dependent). Assert that replica mirrors primary's segment layout rather than expecting exactly 1 segment each. Signed-off-by: bkhishor <bkhishor@amazon.com>
|
Persistent review updated to latest commit e79e354 |
|
❌ Gradle check result for e79e354: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Persistent review updated to latest commit 7379268 |
7379268 to
e8b3592
Compare
|
Persistent review updated to latest commit e8b3592 |
Signed-off-by: bkhishor <bkhishor@amazon.com>
e8b3592 to
b84ffcd
Compare
|
Persistent review updated to latest commit b84ffcd |
…clusters Analogous to MergeDrainTimeoutException.MERGE_DRAIN_TIMEOUT_MARKER, this stable marker substring enables log parsing and coordinator-side detection of replica sync timeouts during tiering preparation. Signed-off-by: bkhishor <bkhishor@amazon.com>
|
Persistent review updated to latest commit 314ef57 |
|
Persistent review updated to latest commit 1910145 |
…onTime waitForReplicaSync previously only checked checkpointsBehindCount == 0, which can be true even when the replica has stale segment layout (DFA metadata map mismatch). Now checks all three conditions: - checkpointsBehindCount == 0 - bytesBehindCount == 0 - currentReplicationTimeMillis == 0 Also upgrades log statements from DEBUG to INFO for operational visibility during tiering preparation. Adds unit tests for bytesBehind and replicationTime timeout scenarios. Signed-off-by: bkhishor <bkhishor@amazon.com>
1910145 to
c6a2b50
Compare
|
Persistent review updated to latest commit c6a2b50 |
|
Persistent review updated to latest commit 0450a30 |
Add tiering.prepare.replica_sync_timeout as a dynamic cluster setting (default 30s, range 5s-5m) so operators can tune the replica sync wait time during tiering preparation for large indices. Signed-off-by: bkhishor <bkhishor@amazon.com>
0450a30 to
6cc8bfc
Compare
|
Persistent review updated to latest commit 6cc8bfc |
Bukhtawar
left a comment
There was a problem hiding this comment.
Thanks, lets see how we can avoid blocking on replica sync, add a TODO
Description
MergeScheduler.forceMerge() bypasses the activeMerges counter, making in-flight force merges invisible to onMergesDrained(). When AutoForceMergeManager triggers a force merge and tiering starts before it completes, tiering proceeds without waiting — causing the merged segment to never reach remote store and leaving the
replica permanently diverged.
Changes
Testing
Description
[Describe what this change achieves]
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.