Add cross-setting validator to prevent balance factor zero crash loop#22347
Dhanwani wants to merge 1 commit into
❌ Gradle check result for bb4f758: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Setting both cluster.routing.allocation.balance.shard and cluster.routing.allocation.balance.index to 0.0 causes an unrecoverable cluster-manager crash loop. Each setting independently allows 0.0 as minimum, but the WeightFunction requires their sum to be > 0.

This adds a Setting.Validator for both balance factor settings that validates the sum > 0 constraint before the settings enter cluster state, preventing the poison pill from being published.

Fixes opensearch-project#22305

Signed-off-by: Abhishek Dhanwani <f20170161h@alumni.bits-pilani.ac.in>
❌ Gradle check result for 78a5287: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
     *
     * @opensearch.internal
     */
    static final class IndexBalanceFactorValidator implements Setting.Validator<Float> {
We don't need individual classes for them. We can pass the name of the setting to be fetched in the constructor.
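A minimal sketch of what this suggestion could look like, written as standalone Java so it runs without OpenSearch on the classpath. The class name `BalanceFactorValidator`, its constructor parameter, and the `Map`-based `validate` signature are all hypothetical; this does not implement the real `Setting.Validator` interface and only mirrors the idea of one class parameterized by the other setting's key.

```java
import java.util.Map;

/**
 * Hypothetical stand-in for a cross-setting validator. It does NOT implement
 * OpenSearch's Setting.Validator; it only illustrates one class serving both
 * balance factor settings.
 */
final class BalanceFactorValidator {
    static final String INDEX_KEY = "cluster.routing.allocation.balance.index";
    static final String SHARD_KEY = "cluster.routing.allocation.balance.shard";

    // Key of the *other* balance factor that this instance cross-checks against.
    private final String otherKey;

    BalanceFactorValidator(String otherKey) {
        this.otherKey = otherKey;
    }

    /**
     * Rejects the update when the new value plus the other factor is not > 0.
     * The map (an assumption of this sketch) holds the other setting's value.
     */
    void validate(float value, Map<String, Float> settings) {
        Float other = settings.get(otherKey);
        if (other == null) {
            throw new IllegalArgumentException("Missing value for [" + otherKey + "]");
        }
        float sum = value + other;
        if (sum <= 0.0f) {
            throw new IllegalArgumentException(
                "Balance factors [" + INDEX_KEY + "] and [" + SHARD_KEY
                    + "] must sum to a value greater than zero but was [" + sum + "]");
        }
    }
}
```

With this shape, the index setting would use `new BalanceFactorValidator(BalanceFactorValidator.SHARD_KEY)` and the shard setting would use `new BalanceFactorValidator(BalanceFactorValidator.INDEX_KEY)`, so the sum check lives in one place.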
Spotless is failing for the new files.
Summary
Setting both cluster.routing.allocation.balance.shard and cluster.routing.allocation.balance.index to 0.0 causes an unrecoverable cluster-manager crash loop. Each setting independently allows 0.0 as its minimum, but the WeightFunction requires their sum to be > 0. When both are set to zero, the cluster-manager crashes on every election attempt and cannot be recovered via the API.
Changes
Adds IndexBalanceFactorValidator and ShardBalanceFactorValidator (same pattern as DiskThresholdSettings.LowDiskWatermarkValidator), which validate that the sum of both balance factors is > 0 before the settings enter cluster state.
Impact
Test Plan
Manual test output (post-fix):
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Balance factors [cluster.routing.allocation.balance.index] and [cluster.routing.allocation.balance.shard] must sum to a value greater than zero but was [0.0]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Balance factors [cluster.routing.allocation.balance.index] and [cluster.routing.allocation.balance.shard] must sum to a value greater than zero but was [0.0]"
  },
  "status": 400
}
Issues Resolved
Fixes #22305