Skip to content

Add analytics.query.max_plan_depth dynamic cluster setting#22331

Open
finnegancarroll wants to merge 1 commit into
opensearch-project:mainfrom
finnegancarroll:feature/max-plan-depth-setting
Open

Add analytics.query.max_plan_depth dynamic cluster setting#22331
finnegancarroll wants to merge 1 commit into
opensearch-project:mainfrom
finnegancarroll:feature/max-plan-depth-setting

Conversation

@finnegancarroll

Copy link
Copy Markdown
Contributor

Description

Converts the hardcoded plan depth limit (15) into a dynamic cluster setting that operators can tune at runtime.

Setting

analytics.query.max_plan_depth
  • Default: 15
  • Min: 1
  • Scope: NodeScope, Dynamic
  • Update: PUT /_cluster/settings {"transient": {"analytics.query.max_plan_depth": 20}}

Behavior

Queries with Calcite RelNode trees deeper than this limit are rejected with HTTP 400 before execution begins:

Query plan exceeds maximum depth (15) for index extraction. Simplify the query by reducing nested joins or subqueries.

Testing

  • Unit test: RelNodeUtilsTests.testDepthGuardThrowsOnExcessiveDepth
  • Integration test: MaxPlanDepthSettingIT — verifies a query succeeds at default depth, then is rejected after lowering the setting to 1

@finnegancarroll finnegancarroll requested a review from a team as a code owner June 26, 2026 21:18
@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit c1be765)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Shared Mutable Test State

dataProvisioned is a static field but is not reset between test class instances or runs. More importantly, if resetMaxPlanDepth() fails (e.g., because setMaxPlanDepth(1) itself succeeded but a later step threw before entering the try block, or if the reset request fails), the cluster is left with max_plan_depth=1, breaking subsequent tests. Consider moving setMaxPlanDepth inside the try block and using @After cleanup to guarantee restoration.

setMaxPlanDepth(1);
try {
    Request rejected = new Request("POST", "/_analytics/ppl");
    rejected.setJsonEntity("{\"query\": \"" + query + "\"}");
    try {
        client().performRequest(rejected);
        fail("Expected 400 for query exceeding max_plan_depth=1");
    } catch (ResponseException e) {
        assertEquals("should be 400", 400, e.getResponse().getStatusLine().getStatusCode());
        String body = new String(e.getResponse().getEntity().getContent().readAllBytes(), java.nio.charset.StandardCharsets.UTF_8);
        assertTrue("error mentions depth: " + body, body.contains("maximum depth"));
    }
} finally {
    // Restore default so other tests are not affected
    resetMaxPlanDepth();
}

@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to c1be765

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Guarantee test cleanup via try block

The setMaxPlanDepth(1) call is outside the try/finally, so if it throws, the reset
in finally is skipped (though also unnecessary in that path). More importantly, if
the initial performRequest(ok) succeeds but any following assertion fails before
resetMaxPlanDepth() completes (e.g., an unexpected exception type), the transient
cluster setting leaks to subsequent tests. Move setMaxPlanDepth inside the try block
to keep semantics tight.

sandbox/qa/analytics-engine-rest/src/test/java/org/opensearch/analytics/qa/MaxPlanDepthSettingIT.java [45-52]

-// Lower the max plan depth to 1 — any non-trivial query will be rejected
-setMaxPlanDepth(1);
 try {
+    // Lower the max plan depth to 1 — any non-trivial query will be rejected
+    setMaxPlanDepth(1);
     Request rejected = new Request("POST", "/_analytics/ppl");
     rejected.setJsonEntity("{\"query\": \"" + query + "\"}");
     try {
         client().performRequest(rejected);
         fail("Expected 400 for query exceeding max_plan_depth=1");
Suggestion importance[1-10]: 5

__

Why: Moving setMaxPlanDepth(1) inside the try block is a reasonable robustness improvement to ensure cleanup runs even if the setter behaves unexpectedly, but the practical impact is minor in this test context.

Low
Possible issue
Verify setting is registered with plugin

For MAX_PLAN_DEPTH to be dynamically updatable at runtime, the setting must be
registered in the plugin's getSettings() list. Verify that
AnalyticsQuerySettings.all() is wired into the plugin's registered settings;
otherwise addSettingsUpdateConsumer will fail at startup with an "unregistered
setting" error.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java [156-157]

+this.maxPlanDepth = AnalyticsQuerySettings.MAX_PLAN_DEPTH.get(clusterService.getSettings());
+clusterService.getClusterSettings().addSettingsUpdateConsumer(AnalyticsQuerySettings.MAX_PLAN_DEPTH, v -> maxPlanDepth = v);
 
-
Suggestion importance[1-10]: 4

__

Why: This is a verification-only suggestion with identical existing_code and improved_code. While the concern about setting registration is valid, AnalyticsQuerySettings.all() does include MAX_PLAN_DEPTH, and the suggestion only asks to verify.

Low

Previous suggestions

Suggestions up to commit c1be765
CategorySuggestion                                                                                                                                    Impact
General
Avoid static provisioning flag across tests

The static flag dataProvisioned persists across test class instances within a single
JVM but can cause issues if the test cluster state is reset between runs while the
static remains true, leading to missing data on subsequent invocations. Consider
making it instance-level or guarding it with an existence check against the cluster,
to avoid flaky tests when the class is reloaded or data is wiped.

sandbox/qa/analytics-engine-rest/src/test/java/org/opensearch/analytics/qa/MaxPlanDepthSettingIT.java [25-32]

-private static boolean dataProvisioned = false;
+private boolean dataProvisioned = false;
 
 private void ensureDataProvisioned() throws IOException {
     if (dataProvisioned == false) {
         DatasetProvisioner.provision(client(), DATASET);
         dataProvisioned = true;
     }
 }
Suggestion importance[1-10]: 4

__

Why: Changing the dataProvisioned flag from static to instance-level is a reasonable suggestion for test reliability, but the impact is minor and depends on test framework specifics not fully shown in the diff.

Low
Document concurrency and bounds for depth setting

maxPlanDepth is read on the request-handling path via extractIndices and updated by
a settings consumer concurrently. Although the field is volatile, ensure no other
code path captures a stale value; also validate that the setting's minimum (1) is
enforced to prevent a depth that rejects all plans inadvertently — the lower bound
is fine but consider documenting/guarding values that would block normal queries.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java [103]

+private volatile int maxConcurrentShardRequestsPerNode;
+private volatile int maxPlanDepth;
 
-
Suggestion importance[1-10]: 2

__

Why: The suggestion's existing_code and improved_code are identical, providing no actionable change. It mostly asks for verification/documentation, which has low impact.

Low
Suggestions up to commit 00efe15
CategorySuggestion                                                                                                                                    Impact
General
Avoid static state for test provisioning flag

Using a static flag for dataProvisioned means the provisioning state persists across
test class instances and JVM reuse, which can cause flakiness if the cluster state
is reset between runs but the flag is not. Consider making it non-static, or guard
it by checking whether the index actually exists on the cluster.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/analytics/planner/RelNodeUtilsTests.java [25]

-private static boolean dataProvisioned = false;
+private boolean dataProvisioned = false;
Suggestion importance[1-10]: 3

__

Why: The suggestion targets the wrong file (it references RelNodeUtilsTests.java but the code is in MaxPlanDepthSettingIT.java). The point about static state is minor and the pattern is common in IT tests for data provisioning efficiency.

Low
Validate maxDepth parameter input

extractIndices does not validate maxDepth. If a non-positive value is passed (or
eventually set via the cluster setting after a future minimum-value relaxation),
collectIndices immediately returns false and every query is rejected with a
misleading message. Add an explicit precondition or rely on Setting.intSetting
minimum bound of 1 and document it here.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/planner/RelNodeUtils.java [216-218]

 public static String[] extractIndices(RelNode plan, int maxDepth) {
+    if (maxDepth < 1) {
+        throw new IllegalArgumentException("maxDepth must be >= 1, got " + maxDepth);
+    }
     java.util.Set<String> indices = new java.util.LinkedHashSet<>();
     if (!collectIndices(plan, indices, 0, maxDepth)) {
Suggestion importance[1-10]: 3

__

Why: The setting already enforces a minimum of 1 via Setting.intSetting, so this defensive check is of limited value. It's a minor robustness improvement.

Low
Suggestions up to commit 96a965a
CategorySuggestion                                                                                                                                    Impact
General
Avoid static mutable state in integration tests

The static boolean dataProvisioned field persists across test class instances but
the JVM/cluster may be reused across test classes with different state, and more
importantly, if provisioning fails partway through it will be marked true only on
success but won't retry cleanly. Consider using a @BeforeClass setup or making this
instance-level since static mutable state in tests can cause flaky behavior when
tests are run in parallel or reordered.

sandbox/qa/analytics-engine-rest/src/test/java/org/opensearch/analytics/qa/MaxPlanDepthSettingIT.java [25-32]

 private static final Dataset DATASET = new Dataset("calcs", "calcs");
-private static boolean dataProvisioned = false;
 
-private void ensureDataProvisioned() throws IOException {
-    if (dataProvisioned == false) {
-        DatasetProvisioner.provision(client(), DATASET);
-        dataProvisioned = true;
-    }
+@BeforeClass
+public static void provisionData() throws IOException {
+    DatasetProvisioner.provision(adminClient(), DATASET);
 }
Suggestion importance[1-10]: 4

__

Why: Reasonable test-quality suggestion about avoiding static mutable state, but the impact is minor and the existing pattern is common in integration tests.

Low
Possible issue
Verify setting scope matches dynamic update path

MAX_PLAN_DEPTH is registered with Setting.Property.NodeScope only, which means it is
read from node configuration and is not a cluster-level dynamic setting despite also
having Setting.Property.Dynamic. The integration test sets it via _cluster/settings,
which will fail for NodeScope-only settings. Use Setting.Property.NodeScope together
with the appropriate scope, or remove NodeScope and ensure the setting is registered
cluster-wide so dynamic updates via _cluster/settings work.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java [156-157]

+this.maxPlanDepth = AnalyticsQuerySettings.MAX_PLAN_DEPTH.get(clusterService.getSettings());
+clusterService.getClusterSettings().addSettingsUpdateConsumer(AnalyticsQuerySettings.MAX_PLAN_DEPTH, v -> maxPlanDepth = v);
 
-
Suggestion importance[1-10]: 2

__

Why: The claim is incorrect: in OpenSearch, NodeScope combined with Dynamic is the standard pattern for cluster-level dynamic settings (similar to existing MAX_SHARDS_PER_QUERY), so the suggestion is based on a false premise.

Low
Suggestions up to commit d09f9f1
CategorySuggestion                                                                                                                                    Impact
Possible issue
Initialize and register dynamic setting consumer

The maxPlanDepth field is declared as volatile but there's no visible initialization
or ClusterSettings.addSettingsUpdateConsumer registration shown for it. Ensure the
field is initialized from MAX_PLAN_DEPTH.get(settings) in the constructor and
registered as a dynamic update consumer, otherwise extractIndices will be called
with maxDepth=0 and reject every query.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java [103]

-private volatile int maxPlanDepth;
+private volatile int maxPlanDepth = AnalyticsQuerySettings.MAX_PLAN_DEPTH.get(Settings.EMPTY);
Suggestion importance[1-10]: 7

__

Why: Valid concern - if maxPlanDepth isn't initialized from settings, it defaults to 0 and rejects all queries. However, the diff doesn't show the constructor, so it's possible this is already handled elsewhere in the file.

Medium
General
Synchronize shared static provisioning flag

The dataProvisioned static flag is mutated without synchronization and persists
across test instances. If tests run in parallel or the JVM is reused with state
reset, this can lead to races or stale state. Consider using @BeforeClass or
synchronizing access.

sandbox/qa/analytics-engine-rest/src/test/java/org/opensearch/analytics/qa/MaxPlanDepthSettingIT.java [25-32]

 private static final Dataset DATASET = new Dataset("calcs", "calcs");
-private static boolean dataProvisioned = false;
+private static volatile boolean dataProvisioned = false;
 
-private void ensureDataProvisioned() throws IOException {
+private synchronized void ensureDataProvisioned() throws IOException {
     if (dataProvisioned == false) {
         DatasetProvisioner.provision(client(), DATASET);
         dataProvisioned = true;
     }
 }
Suggestion importance[1-10]: 3

__

Why: Minor test-only concern. JUnit tests typically don't run in parallel within a class, so the race condition risk is low, making this a marginal improvement.

Low

@github-actions

Copy link
Copy Markdown
Contributor

❌ Gradle check result for d09f9f1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@finnegancarroll finnegancarroll force-pushed the feature/max-plan-depth-setting branch from d09f9f1 to 96a965a Compare June 29, 2026 18:37
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 96a965a

@github-actions

Copy link
Copy Markdown
Contributor

❌ Gradle check result for 96a965a:

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@finnegancarroll finnegancarroll force-pushed the feature/max-plan-depth-setting branch from 96a965a to 00efe15 Compare June 29, 2026 22:12
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 00efe15

@github-actions

Copy link
Copy Markdown
Contributor

❌ Gradle check result for 00efe15: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 00efe15.

PathLineSeverityDescription
sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/settings/AnalyticsQuerySettings.java92mediumMAX_PLAN_DEPTH is defined with a minimum of 1 but no upper bound. An admin (or attacker with cluster-admin access) can set it to Integer.MAX_VALUE, effectively disabling the depth guard that protects against stack-overflow or DoS via pathologically deep query plans. The previous hardcoded constant provided an unconditional ceiling; the new configurable setting removes that guarantee. An explicit maximum (e.g., 500 or 1000) should be enforced in the Setting definition.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 1 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

Converts the hardcoded MAX_EXTRACT_INDICES_DEPTH (15) into a dynamic
cluster setting that can be tuned at runtime via PUT /_cluster/settings.

Setting: analytics.query.max_plan_depth
Default: 15
Min: 1
Scope: NodeScope, Dynamic

Queries with RelNode trees exceeding this depth are rejected with 400
before execution begins.

Signed-off-by: Finn Carroll <carrofin@amazon.com>
@finnegancarroll finnegancarroll force-pushed the feature/max-plan-depth-setting branch from 00efe15 to c1be765 Compare June 30, 2026 22:27
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit c1be765

@github-actions

Copy link
Copy Markdown
Contributor

❌ Gradle check result for c1be765: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit c1be765

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

❕ Gradle check result for c1be765: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov

codecov Bot commented Jul 1, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.41%. Comparing base (319d78a) to head (c1be765).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #22331      +/-   ##
============================================
+ Coverage     73.40%   73.41%   +0.01%     
- Complexity    76075    76093      +18     
============================================
  Files          6076     6076              
  Lines        345500   345500              
  Branches      49732    49732              
============================================
+ Hits         253608   253660      +52     
+ Misses        71707    71645      -62     
- Partials      20185    20195      +10     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant