Add analytics.query.max_plan_depth dynamic cluster setting by finnegancarroll · Pull Request #22331 · opensearch-project/OpenSearch

finnegancarroll · 2026-06-26T21:18:39Z

Description

Converts the hardcoded plan depth limit (15) into a dynamic cluster setting that operators can tune at runtime.

Setting

analytics.query.max_plan_depth

Default: 15
Min: 1
Scope: NodeScope, Dynamic
Update: PUT /_cluster/settings {"transient": {"analytics.query.max_plan_depth": 20}}

Behavior

Queries with Calcite RelNode trees deeper than this limit are rejected with HTTP 400 before execution begins:

Query plan exceeds maximum depth (15) for index extraction. Simplify the query by reducing nested joins or subqueries.

Testing

Unit test: RelNodeUtilsTests.testDepthGuardThrowsOnExcessiveDepth
Integration test: MaxPlanDepthSettingIT — verifies a query succeeds at default depth, then is rejected after lowering the setting to 1

github-actions · 2026-06-26T21:19:57Z

PR Reviewer Guide 🔍

(Review updated until commit `c1be765`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review Shared Mutable Test State `dataProvisioned` is a static field but is not reset between test class instances or runs. More importantly, if `resetMaxPlanDepth()` fails (e.g., because `setMaxPlanDepth(1)` itself succeeded but a later step threw before entering the try block, or if the reset request fails), the cluster is left with `max_plan_depth=1`, breaking subsequent tests. Consider moving `setMaxPlanDepth` inside the try block and using `@After` cleanup to guarantee restoration. setMaxPlanDepth(1); try { Request rejected = new Request("POST", "/_analytics/ppl"); rejected.setJsonEntity("{\"query\": \"" + query + "\"}"); try { client().performRequest(rejected); fail("Expected 400 for query exceeding max_plan_depth=1"); } catch (ResponseException e) { assertEquals("should be 400", 400, e.getResponse().getStatusLine().getStatusCode()); String body = new String(e.getResponse().getEntity().getContent().readAllBytes(), java.nio.charset.StandardCharsets.UTF_8); assertTrue("error mentions depth: " + body, body.contains("maximum depth")); } } finally { // Restore default so other tests are not affected resetMaxPlanDepth(); }

github-actions · 2026-06-26T21:20:16Z

PR Code Suggestions ✨

Latest suggestions up to c1be765

Explore these optional code suggestions:

Category	Suggestion	Impact
General	Guarantee test cleanup via try block The `setMaxPlanDepth(1)` call is outside the try/finally, so if it throws, the reset in `finally` is skipped (though also unnecessary in that path). More importantly, if the initial `performRequest(ok)` succeeds but any following assertion fails before `resetMaxPlanDepth()` completes (e.g., an unexpected exception type), the transient cluster setting leaks to subsequent tests. Move `setMaxPlanDepth` inside the try block to keep semantics tight. sandbox/qa/analytics-engine-rest/src/test/java/org/opensearch/analytics/qa/MaxPlanDepthSettingIT.java [45-52] -// Lower the max plan depth to 1 — any non-trivial query will be rejected -setMaxPlanDepth(1); try { + // Lower the max plan depth to 1 — any non-trivial query will be rejected + setMaxPlanDepth(1); Request rejected = new Request("POST", "/_analytics/ppl"); rejected.setJsonEntity("{\"query\": \"" + query + "\"}"); try { client().performRequest(rejected); fail("Expected 400 for query exceeding max_plan_depth=1"); Suggestion importance[1-10]: 5 __ Why: Moving `setMaxPlanDepth(1)` inside the try block is a reasonable robustness improvement to ensure cleanup runs even if the setter behaves unexpectedly, but the practical impact is minor in this test context.	Low
Possible issue	Verify setting is registered with plugin For `MAX_PLAN_DEPTH` to be dynamically updatable at runtime, the setting must be registered in the plugin's `getSettings()` list. Verify that `AnalyticsQuerySettings.all()` is wired into the plugin's registered settings; otherwise `addSettingsUpdateConsumer` will fail at startup with an "unregistered setting" error. sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java [156-157] +this.maxPlanDepth = AnalyticsQuerySettings.MAX_PLAN_DEPTH.get(clusterService.getSettings()); +clusterService.getClusterSettings().addSettingsUpdateConsumer(AnalyticsQuerySettings.MAX_PLAN_DEPTH, v -> maxPlanDepth = v); - Suggestion importance[1-10]: 4 __ Why: This is a verification-only suggestion with identical `existing_code` and `improved_code`. While the concern about setting registration is valid, `AnalyticsQuerySettings.all()` does include `MAX_PLAN_DEPTH`, and the suggestion only asks to verify.	Low

Previous suggestions

Suggestions up to commit c1be765

Category	Suggestion	Impact
General	Avoid static provisioning flag across tests The `static` flag `dataProvisioned` persists across test class instances within a single JVM but can cause issues if the test cluster state is reset between runs while the static remains `true`, leading to missing data on subsequent invocations. Consider making it instance-level or guarding it with an existence check against the cluster, to avoid flaky tests when the class is reloaded or data is wiped. sandbox/qa/analytics-engine-rest/src/test/java/org/opensearch/analytics/qa/MaxPlanDepthSettingIT.java [25-32] -private static boolean dataProvisioned = false; +private boolean dataProvisioned = false; private void ensureDataProvisioned() throws IOException { if (dataProvisioned == false) { DatasetProvisioner.provision(client(), DATASET); dataProvisioned = true; } } Suggestion importance[1-10]: 4 __ Why: Changing the `dataProvisioned` flag from static to instance-level is a reasonable suggestion for test reliability, but the impact is minor and depends on test framework specifics not fully shown in the diff.	Low
General	Document concurrency and bounds for depth setting `maxPlanDepth` is read on the request-handling path via `extractIndices` and updated by a settings consumer concurrently. Although the field is `volatile`, ensure no other code path captures a stale value; also validate that the setting's minimum (1) is enforced to prevent a depth that rejects all plans inadvertently — the lower bound is fine but consider documenting/guarding values that would block normal queries. sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java [103] +private volatile int maxConcurrentShardRequestsPerNode; +private volatile int maxPlanDepth; - Suggestion importance[1-10]: 2 __ Why: The suggestion's `existing_code` and `improved_code` are identical, providing no actionable change. It mostly asks for verification/documentation, which has low impact.	Low

Suggestions up to commit 00efe15

Category	Suggestion	Impact
General	Avoid static state for test provisioning flag Using a `static` flag for `dataProvisioned` means the provisioning state persists across test class instances and JVM reuse, which can cause flakiness if the cluster state is reset between runs but the flag is not. Consider making it non-static, or guard it by checking whether the index actually exists on the cluster. sandbox/plugins/analytics-engine/src/test/java/org/opensearch/analytics/planner/RelNodeUtilsTests.java [25] -private static boolean dataProvisioned = false; +private boolean dataProvisioned = false; Suggestion importance[1-10]: 3 __ Why: The suggestion targets the wrong file (it references `RelNodeUtilsTests.java` but the code is in `MaxPlanDepthSettingIT.java`). The point about static state is minor and the pattern is common in IT tests for data provisioning efficiency.	Low
General	Validate maxDepth parameter input `extractIndices` does not validate `maxDepth`. If a non-positive value is passed (or eventually set via the cluster setting after a future minimum-value relaxation), `collectIndices` immediately returns false and every query is rejected with a misleading message. Add an explicit precondition or rely on `Setting.intSetting` minimum bound of 1 and document it here. sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/planner/RelNodeUtils.java [216-218] public static String[] extractIndices(RelNode plan, int maxDepth) { + if (maxDepth < 1) { + throw new IllegalArgumentException("maxDepth must be >= 1, got " + maxDepth); + } java.util.Set<String> indices = new java.util.LinkedHashSet<>(); if (!collectIndices(plan, indices, 0, maxDepth)) { Suggestion importance[1-10]: 3 __ Why: The setting already enforces a minimum of 1 via `Setting.intSetting`, so this defensive check is of limited value. It's a minor robustness improvement.	Low

Suggestions up to commit 96a965a

Category	Suggestion	Impact
General	Avoid static mutable state in integration tests The `static boolean dataProvisioned` field persists across test class instances but the JVM/cluster may be reused across test classes with different state, and more importantly, if provisioning fails partway through it will be marked `true` only on success but won't retry cleanly. Consider using a `@BeforeClass` setup or making this instance-level since static mutable state in tests can cause flaky behavior when tests are run in parallel or reordered. sandbox/qa/analytics-engine-rest/src/test/java/org/opensearch/analytics/qa/MaxPlanDepthSettingIT.java [25-32] private static final Dataset DATASET = new Dataset("calcs", "calcs"); -private static boolean dataProvisioned = false; -private void ensureDataProvisioned() throws IOException { - if (dataProvisioned == false) { - DatasetProvisioner.provision(client(), DATASET); - dataProvisioned = true; - } +@BeforeClass +public static void provisionData() throws IOException { + DatasetProvisioner.provision(adminClient(), DATASET); } Suggestion importance[1-10]: 4 __ Why: Reasonable test-quality suggestion about avoiding static mutable state, but the impact is minor and the existing pattern is common in integration tests.	Low
Possible issue	Verify setting scope matches dynamic update path `MAX_PLAN_DEPTH` is registered with `Setting.Property.NodeScope` only, which means it is read from node configuration and is not a cluster-level dynamic setting despite also having `Setting.Property.Dynamic`. The integration test sets it via `_cluster/settings`, which will fail for NodeScope-only settings. Use `Setting.Property.NodeScope` together with the appropriate scope, or remove `NodeScope` and ensure the setting is registered cluster-wide so dynamic updates via `_cluster/settings` work. sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java [156-157] +this.maxPlanDepth = AnalyticsQuerySettings.MAX_PLAN_DEPTH.get(clusterService.getSettings()); +clusterService.getClusterSettings().addSettingsUpdateConsumer(AnalyticsQuerySettings.MAX_PLAN_DEPTH, v -> maxPlanDepth = v); - Suggestion importance[1-10]: 2 __ Why: The claim is incorrect: in OpenSearch, `NodeScope` combined with `Dynamic` is the standard pattern for cluster-level dynamic settings (similar to existing `MAX_SHARDS_PER_QUERY`), so the suggestion is based on a false premise.	Low

Suggestions up to commit d09f9f1

Category	Suggestion	Impact
Possible issue	Initialize and register dynamic setting consumer The `maxPlanDepth` field is declared as `volatile` but there's no visible initialization or `ClusterSettings.addSettingsUpdateConsumer` registration shown for it. Ensure the field is initialized from `MAX_PLAN_DEPTH.get(settings)` in the constructor and registered as a dynamic update consumer, otherwise `extractIndices` will be called with `maxDepth=0` and reject every query. sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java [103] -private volatile int maxPlanDepth; +private volatile int maxPlanDepth = AnalyticsQuerySettings.MAX_PLAN_DEPTH.get(Settings.EMPTY); Suggestion importance[1-10]: 7 __ Why: Valid concern - if `maxPlanDepth` isn't initialized from settings, it defaults to 0 and rejects all queries. However, the diff doesn't show the constructor, so it's possible this is already handled elsewhere in the file.	Medium
General	Synchronize shared static provisioning flag The `dataProvisioned` static flag is mutated without synchronization and persists across test instances. If tests run in parallel or the JVM is reused with state reset, this can lead to races or stale state. Consider using `@BeforeClass` or synchronizing access. sandbox/qa/analytics-engine-rest/src/test/java/org/opensearch/analytics/qa/MaxPlanDepthSettingIT.java [25-32] private static final Dataset DATASET = new Dataset("calcs", "calcs"); -private static boolean dataProvisioned = false; +private static volatile boolean dataProvisioned = false; -private void ensureDataProvisioned() throws IOException { +private synchronized void ensureDataProvisioned() throws IOException { if (dataProvisioned == false) { DatasetProvisioner.provision(client(), DATASET); dataProvisioned = true; } } Suggestion importance[1-10]: 3 __ Why: Minor test-only concern. JUnit tests typically don't run in parallel within a class, so the race condition risk is low, making this a marginal improvement.	Low

github-actions · 2026-06-26T21:42:18Z

❌ Gradle check result for d09f9f1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-06-29T18:39:11Z

Persistent review updated to latest commit 96a965a

github-actions · 2026-06-29T19:24:10Z

❌ Gradle check result for 96a965a:

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-06-29T22:13:20Z

Persistent review updated to latest commit 00efe15

github-actions · 2026-06-29T23:09:27Z

❌ Gradle check result for 00efe15: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-06-30T22:15:51Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 00efe15.

Path	Line	Severity	Description
sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/settings/AnalyticsQuerySettings.java	92	medium	MAX_PLAN_DEPTH is defined with a minimum of 1 but no upper bound. An admin (or attacker with cluster-admin access) can set it to Integer.MAX_VALUE, effectively disabling the depth guard that protects against stack-overflow or DoS via pathologically deep query plans. The previous hardcoded constant provided an unconditional ceiling; the new configurable setting removes that guarantee. An explicit maximum (e.g., 500 or 1000) should be enforced in the Setting definition.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 1 | Low: 0

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

Converts the hardcoded MAX_EXTRACT_INDICES_DEPTH (15) into a dynamic cluster setting that can be tuned at runtime via PUT /_cluster/settings. Setting: analytics.query.max_plan_depth Default: 15 Min: 1 Scope: NodeScope, Dynamic Queries with RelNode trees exceeding this depth are rejected with 400 before execution begins. Signed-off-by: Finn Carroll <carrofin@amazon.com>

github-actions · 2026-06-30T22:28:36Z

Persistent review updated to latest commit c1be765

github-actions · 2026-06-30T22:44:49Z

❌ Gradle check result for c1be765: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-07-01T15:10:32Z

Persistent review updated to latest commit c1be765

github-actions · 2026-07-01T16:06:09Z

❕ Gradle check result for c1be765: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov · 2026-07-01T16:06:54Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.41%. Comparing base (319d78a) to head (c1be765).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main   #22331      +/-   ##
============================================
+ Coverage     73.40%   73.41%   +0.01%     
- Complexity    76075    76093      +18     
============================================
  Files          6076     6076              
  Lines        345500   345500              
  Branches      49732    49732              
============================================
+ Hits         253608   253660      +52     
+ Misses        71707    71645      -62     
- Partials      20185    20195      +10

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

finnegancarroll requested a review from a team as a code owner June 26, 2026 21:18

finnegancarroll force-pushed the feature/max-plan-depth-setting branch from d09f9f1 to 96a965a Compare June 29, 2026 18:37

finnegancarroll force-pushed the feature/max-plan-depth-setting branch from 96a965a to 00efe15 Compare June 29, 2026 22:12

opensearch-ci-bot mentioned this pull request Jun 30, 2026

[AUTOCUT] Gradle Check Flaky Test Report for RemotePrimaryLocalRecoveryIT #14314

Open

finnegancarroll closed this Jun 30, 2026

finnegancarroll reopened this Jun 30, 2026

finnegancarroll force-pushed the feature/max-plan-depth-setting branch from 00efe15 to c1be765 Compare June 30, 2026 22:27

finnegancarroll closed this Jul 1, 2026

finnegancarroll reopened this Jul 1, 2026

Uh oh!

Conversation

finnegancarroll commented Jun 26, 2026

Description

Setting

Behavior

Testing

Uh oh!

github-actions Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit c1be765)

Uh oh!

github-actions Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions Bot commented Jun 26, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

PR Code Analyzer ❗

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jul 1, 2026

Uh oh!

github-actions Bot commented Jul 1, 2026

Uh oh!

codecov Bot commented Jul 1, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented Jun 26, 2026 •

edited

Loading

(Review updated until commit `c1be765`)

github-actions Bot commented Jun 26, 2026 •

edited

Loading