Skip to content

Add relevance search function support for unified SQL query#5279

Merged
dai-chen merged 4 commits intoopensearch-project:mainfrom
dai-chen:feature/register-fulltext-search-calcite-sql
Apr 7, 2026
Merged

Add relevance search function support for unified SQL query#5279
dai-chen merged 4 commits intoopensearch-project:mainfrom
dai-chen:feature/register-fulltext-search-calcite-sql

Conversation

@dai-chen
Copy link
Copy Markdown
Collaborator

@dai-chen dai-chen commented Mar 27, 2026

Description

Register relevance search functions (match, match_phrase, multi_match, etc.) as language-level functions in the unified query engine, with backward-compatible named-argument syntax support (for now).

Function Support Matrix

Function SQL V2 (Legacy) PPL V3 (Calcite) SQL (Calcite) — this PR
match
match_phrase
match_bool_prefix
match_phrase_prefix
multi_match ⚠️ single-field only
simple_query_string ⚠️ single-field only
query_string ⚠️ single-field only
query (no-field)
wildcard_query
score / scorequery
match_query (alias)
matchphrase (alias)
matchquery (alias)
multimatch (alias)
matchphrasequery (alias)
multimatchquery (alias)

Design Choices

  • Language-level functions: Relevance functions are part of the language specification, not tied to any specific data source. They are defined declaratively in UnifiedFunctionSpec and registered globally, making them always resolvable in any query context. No local execution is provided; data-source capability is enforced at optimization time by pushdown rules.
  • Named argument rewriting: V2 SQL uses non-standard key=value syntax for optional parameters (e.g., match(name, 'John', operator='AND')) instead of SQL's standard => syntax. NamedArgRewriter runs as a pre-validation pass that normalizes both positional and key=value arguments into MAP-based form (e.g., MAP('field', name), MAP('operator', 'AND')), producing identical plans as the PPL path for shared pushdown rules.

Please find more design details in #5248 (comment).

Next Step

  1. Review all breaking changes between V2 syntax and SQL standard together and decide whether to follow SQL standard or maintain backward compatibility via the current rewriting approach for this. Ref: [FEATURE] SQL query support for Analytics engine integration #5248 (comment)
  2. Register V2 function aliases (matchquery, matchphrase, multimatch, wildcardquery, matchphrasequery, multimatchquery) that are not yet covered.
  3. Multi-field bracket syntax (multi_match(['field1', 'field2'])) is not supported — only single column references are accepted. Documented with FIXME tests. Options: (1) extend the parser to support V2's bracket syntax; (2) switch to ARRAY[...] and accept the breaking change from V2.

Related Issues

Part of #5248

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dai-chen dai-chen self-assigned this Mar 27, 2026
@dai-chen dai-chen added enhancement New feature or request SQL labels Mar 27, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@dai-chen dai-chen force-pushed the feature/register-fulltext-search-calcite-sql branch from 8e54956 to 5f80dfc Compare March 28, 2026 01:27
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@dai-chen dai-chen force-pushed the feature/register-fulltext-search-calcite-sql branch from 5f80dfc to 0da8394 Compare April 3, 2026 19:23
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

PR Reviewer Guide 🔍

(Review updated until commit ac0fe64)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Register relevance functions in Calcite operator table via UnifiedFunctionSpec

Relevant files:

  • api/src/main/java/org/opensearch/sql/api/spec/UnifiedFunctionSpec.java
  • api/src/main/java/org/opensearch/sql/api/UnifiedQueryContext.java

Sub-PR theme: Add NamedArgRewriter pre-validation pass for backward-compatible named-argument syntax

Relevant files:

  • api/src/main/java/org/opensearch/sql/api/parser/NamedArgRewriter.java
  • api/src/main/java/org/opensearch/sql/api/UnifiedQueryPlanner.java
  • api/src/test/java/org/opensearch/sql/api/parser/NamedArgRewriterTest.java

Sub-PR theme: Add integration tests for relevance search functions in SQL and PPL planning paths

Relevant files:

  • api/src/test/java/org/opensearch/sql/api/UnifiedRelevanceSearchSqlTest.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedRelevanceSearchTest.java
  • api/src/testFixtures/java/org/opensearch/sql/api/UnifiedQueryTestBase.java

⚡ Recommended focus areas for review

Silent Rewrite Risk

In rewriteToMaps, when a named argument uses an EQUALS node but the left-hand side is not a SqlIdentifier, the code falls back to key.toString(), which may produce unexpected or backtick-decorated key strings. The comment says this avoids backtick-decorated keys for reserved words, but toString() on non-identifier nodes could produce arbitrary SQL text as a map key, silently producing incorrect query plans.

if (op instanceof SqlCall eq && op.getKind() == SqlKind.EQUALS) {
  SqlNode key = eq.operand(0);
  String name =
      key instanceof SqlIdentifier ident
          ? ident.getSimple()
          : key.toString(); // avoid backtick-decorated keys for reserved words
  maps[i] = toMap(name, eq.operand(1));
Bypassed Type Checking

VariadicOperandMetadata.checkOperandTypes always returns true, completely bypassing Calcite's operand type validation. While this is intentional (with a CALCITE-5366 reference), it means invalid argument types (e.g., passing a numeric literal where a field reference is expected) will silently pass validation and only fail at pushdown time, potentially with confusing error messages.

public boolean checkOperandTypes(SqlCallBinding binding, boolean throwOnFailure) {
  return true; // Bypass: CALCITE-5366 breaks optional argument type checking
}
Case Sensitivity

ALL_SPECS lookup in of() uses name.toLowerCase(), but function names are stored using the original funcName (e.g., "match", "multi_match"). If Calcite passes an uppercase or mixed-case function name (depending on parser casing config), the lookup may fail. This should be verified against the UNCHANGED casing config used in UnifiedQueryContext.

public static Optional<UnifiedFunctionSpec> of(String name) {
  return Optional.ofNullable(ALL_SPECS.get(name.toLowerCase()));
}
Hardcoded Category

Only UnifiedFunctionSpec.RELEVANCE is chained into the operator table. If new categories are added to UnifiedFunctionSpec in the future, they must be manually added here. Consider iterating over all categories or exposing a combined operator table from UnifiedFunctionSpec directly.

.operatorTable(
    SqlOperatorTables.chain(
        SqlStdOperatorTable.instance(), UnifiedFunctionSpec.RELEVANCE.operatorTable()))

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

PR Code Suggestions ✨

Latest suggestions up to ac0fe64

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent swallowing assertion errors in error helper

The catch (Exception e) block will also catch the AssertionError thrown on the line
above if AssertionError extends Exception — but in Java, AssertionError extends
Error, not Exception, so this is safe. However, if planner.plan() throws an Error
(not Exception), it will propagate uncaught and produce a confusing test failure.
Consider catching Throwable or at minimum documenting this assumption.

api/src/main/java/org/opensearch/sql/api/UnifiedQueryTestBase.java [153-160]

 protected QueryErrorAssert givenInvalidQuery(String query) {
   try {
     planner.plan(query);
     throw new AssertionError("Expected query to fail: " + query);
+  } catch (AssertionError ae) {
+    throw ae; // re-throw assertion failures from the line above
   } catch (Exception e) {
     return new QueryErrorAssert(e);
   }
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly notes that AssertionError extends Error not Exception, so the current code is already safe. However, explicitly re-throwing AssertionError improves clarity and defensive correctness, making it a minor but valid improvement to test infrastructure.

Low
General
Explicitly allow unbounded optional arguments

SqlOperandCountRanges.from(n) sets the minimum operand count to n but allows any
number above it. However, the paramNames list only contains the required positional
parameters (e.g., ["field", "query"]), so the minimum is correctly set to 2. But
since optional named arguments are also allowed, the upper bound should be
explicitly set to unbounded (-1) to make the intent clear and avoid potential
validation issues with Calcite's operand count checker.

api/src/main/java/org/opensearch/sql/api/spec/UnifiedFunctionSpec.java [162-164]

 @Override
 public SqlOperandCountRange getOperandCountRange() {
-  return SqlOperandCountRanges.from(paramNames.size());
+  return SqlOperandCountRanges.between(paramNames.size(), -1);
 }
Suggestion importance[1-10]: 3

__

Why: SqlOperandCountRanges.from(n) already allows any number of operands above n, so the behavior is functionally equivalent. The suggestion to use between(n, -1) makes the intent more explicit but doesn't change runtime behavior, making it a minor readability improvement.

Low
Fix misleading comment and key extraction logic

The comment says "avoid backtick-decorated keys for reserved words" but the logic is
inverted: getSimple() is used for SqlIdentifier (which is the normal case), while
toString() is used for non-identifier keys (which may produce backtick-decorated
output for reserved words). The fallback key.toString() should be replaced with a
safer extraction, or the comment should clarify that toString() is intentionally
used for non-identifier key types only. Additionally, the key instanceof
SqlIdentifier ident branch already handles the reserved-word case via getSimple(),
so the comment is misleading.

api/src/main/java/org/opensearch/sql/api/parser/NamedArgRewriter.java [53-66]

 if (op instanceof SqlCall eq && op.getKind() == SqlKind.EQUALS) {
     SqlNode key = eq.operand(0);
     String name =
         key instanceof SqlIdentifier ident
-            ? ident.getSimple()
-            : key.toString(); // avoid backtick-decorated keys for reserved words
+            ? ident.getSimple() // use getSimple() to avoid backtick-decorated keys for reserved words
+            : key.toSqlString(null).getSql(); // fallback for non-identifier keys
     maps[i] = toMap(name, eq.operand(1));
 } else {
     if (i >= paramNames.size()) {
       throw new IllegalArgumentException(
           String.format("Invalid arguments for function '%s'", call.getOperator().getName()));
     }
     maps[i] = toMap(paramNames.get(i), op);
 }
Suggestion importance[1-10]: 2

__

Why: The comment placement is slightly misleading, but the actual logic is correct — getSimple() is used for SqlIdentifier which handles reserved words properly. The suggested toSqlString(null) replacement could introduce null-related issues and the improvement is marginal at best.

Low

Previous suggestions

Suggestions up to commit 6538c56
CategorySuggestion                                                                                                                                    Impact
Possible issue
Explicitly set unbounded maximum operand count for variadic functions

SqlOperandCountRanges.from(n) sets the minimum operand count to n but allows any
number above it. However, the paramNames list only contains the required positional
parameters (e.g., ["field", "query"] with size 2), so the minimum is correctly
enforced. But since optional named arguments are also accepted, the maximum should
be unbounded (-1). Using from(n) already achieves this, but it's worth verifying
that from indeed sets max to -1 (unbounded) in the Calcite version used — if it sets
max equal to min, extra named args would be rejected at the operand count check
before reaching checkOperandTypes.

api/src/main/java/org/opensearch/sql/api/spec/UnifiedFunctionSpec.java [162-164]

 @Override
 public SqlOperandCountRange getOperandCountRange() {
-  return SqlOperandCountRanges.from(paramNames.size());
+  // min = required positional params; max = unbounded to allow optional named args
+  return SqlOperandCountRanges.between(paramNames.size(), -1);
 }
Suggestion importance[1-10]: 5

__

Why: SqlOperandCountRanges.from(n) in Calcite sets min=n and max=-1 (unbounded), so the current code is likely correct. However, using between(paramNames.size(), -1) makes the intent explicit and guards against any version-specific behavior differences. This is a valid defensive improvement with moderate impact.

Low
General
Move assertion outside catch to avoid accidental swallowing

The catch (Exception e) block will also catch the AssertionError thrown on the line
above if AssertionError extends Exception — but in Java, AssertionError extends
Error, not Exception, so this is actually safe. However, if planner.plan(query)
throws an Error (not Exception), it would propagate uncaught and produce a confusing
test failure. This is a minor robustness concern, but the current logic is correct
for the intended use case.

api/src/test/java/org/opensearch/sql/api/UnifiedQueryTestBase.java [153-160]

 protected QueryErrorAssert givenInvalidQuery(String query) {
   try {
     planner.plan(query);
-    throw new AssertionError("Expected query to fail: " + query);
   } catch (Exception e) {
     return new QueryErrorAssert(e);
   }
+  throw new AssertionError("Expected query to fail: " + query);
 }
Suggestion importance[1-10]: 4

__

Why: The improved code is a cleaner and more idiomatic pattern that avoids any risk of the AssertionError being caught. While AssertionError extends Error (not Exception) so the current code is technically safe, the refactored version is clearer and follows better Java testing practices.

Low
Cache the operator table to avoid repeated allocation

The operator table is rebuilt on every call to buildFrameworkConfig() by calling
UnifiedFunctionSpec.RELEVANCE.operatorTable(), which creates a new
SqlOperatorTables.of(...) instance each time. Since RELEVANCE is a static constant,
its operator table should also be pre-computed and cached to avoid unnecessary
object allocation on each framework config build.

api/src/main/java/org/opensearch/sql/api/UnifiedQueryContext.java [249-251]

 .operatorTable(
     SqlOperatorTables.chain(
-        SqlStdOperatorTable.instance(), UnifiedFunctionSpec.RELEVANCE.operatorTable()))
+        SqlStdOperatorTable.instance(), UnifiedFunctionSpec.RELEVANCE_OPERATOR_TABLE))
Suggestion importance[1-10]: 3

__

Why: The suggestion to cache RELEVANCE.operatorTable() is a valid minor optimization, but it requires adding a new static field RELEVANCE_OPERATOR_TABLE to UnifiedFunctionSpec, which is a non-trivial API change. The performance impact is minimal since buildFrameworkConfig() is not called in a hot path.

Low
Fix misleading comment and fallback key extraction logic

The comment says "avoid backtick-decorated keys for reserved words" but the logic is
inverted: getSimple() is used for SqlIdentifier (which is the normal case), while
toString() is used for non-identifier keys (which would be the reserved-word case).
For reserved words, the key is still parsed as a SqlIdentifier (possibly quoted), so
getSimple() is correct for identifiers. However, the fallback key.toString() for
non-identifier keys may produce unexpected output. The comment is misleading and the
logic should be clarified or corrected to ensure reserved words are handled
properly.

api/src/main/java/org/opensearch/sql/api/parser/NamedArgRewriter.java [53-66]

 if (op instanceof SqlCall eq && op.getKind() == SqlKind.EQUALS) {
     SqlNode key = eq.operand(0);
+    // Use getSimple() for identifiers (handles both plain and quoted/reserved-word keys)
     String name =
         key instanceof SqlIdentifier ident
             ? ident.getSimple()
-            : key.toString(); // avoid backtick-decorated keys for reserved words
+            : key.toSqlString(null).getSql();
     maps[i] = toMap(name, eq.operand(1));
 } else {
     if (i >= paramNames.size()) {
       throw new IllegalArgumentException(
           String.format("Invalid arguments for function '%s'", call.getOperator().getName()));
     }
     maps[i] = toMap(paramNames.get(i), op);
 }
Suggestion importance[1-10]: 2

__

Why: The comment about "backtick-decorated keys for reserved words" is indeed misleading, but the actual logic is correct — getSimple() on SqlIdentifier handles quoted/reserved-word identifiers properly. The suggested toSqlString(null) fallback is not clearly better and may introduce other issues. This is a minor clarity concern with low impact.

Low
Suggestions up to commit 938d603
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix named-arg key extraction for quoted identifiers

Using toString() on a SqlNode operand (the key in a named argument like
operator='AND') may produce a quoted or fully-qualified identifier string rather
than the plain parameter name. For example, a quoted identifier "operator" would
produce "operator" instead of operator. Use SqlIdentifier's getSimple() or strip
quotes explicitly to get the raw name.

api/src/main/java/org/opensearch/sql/api/parser/NamedArgRewriter.java [52-56]

 if (op instanceof SqlCall eq && op.getKind() == SqlKind.EQUALS) {
-    maps[i] = toMap(eq.operand(0).toString(), eq.operand(1));
+    SqlNode keyNode = eq.operand(0);
+    String key = keyNode instanceof SqlIdentifier id ? id.getSimple() : keyNode.toString();
+    maps[i] = toMap(key, eq.operand(1));
 } else {
     maps[i] = toMap(paramNames.get(i), op);
 }
Suggestion importance[1-10]: 7

__

Why: Using toString() on a SqlNode for the key in named arguments like operator='AND' could produce quoted strings (e.g., "operator") instead of the plain name operator. Using SqlIdentifier.getSimple() would be more robust and correct for identifier nodes.

Medium
Guard against null exception messages in assertions

cause.getMessage() can return null for some exception types, which would cause a
NullPointerException in the contains call and obscure the actual test failure. Add a
null check or use String.valueOf(cause.getMessage()) to safely handle null messages.

api/src/testFixtures/java/org/opensearch/sql/api/UnifiedQueryTestBase.java [171-180]

 public QueryErrorAssert assertErrorMessage(String expected) {
   Throwable cause = error;
   while (cause.getCause() != null) {
     cause = cause.getCause();
   }
+  String actualMessage = String.valueOf(cause.getMessage());
   assertTrue(
-      "Expected error to contain: " + expected + "\nActual: " + cause.getMessage(),
-      cause.getMessage().contains(expected));
+      "Expected error to contain: " + expected + "\nActual: " + actualMessage,
+      actualMessage.contains(expected));
   return this;
 }
Suggestion importance[1-10]: 5

__

Why: cause.getMessage() can return null, causing a NullPointerException that would obscure the actual test failure. Using String.valueOf() is a simple and safe fix for test utility code.

Low
General
Clarify variadic operand count range intent

SqlOperandCountRanges.from(n) sets the minimum operand count to n but allows any
number of additional arguments (unbounded max). Since paramNames only lists the
required positional parameters (e.g., ["field", "query"]), this correctly enforces a
minimum of 2, but it also means calls with fewer than 2 args pass the count check
silently. The intent is to allow variadic optional args beyond the required ones, so
the minimum should be paramNames.size() and max should be unbounded (-1), which
SqlOperandCountRanges.from already does — however, verify this is the intended
behavior and add a comment to clarify.

api/src/main/java/org/opensearch/sql/api/spec/UnifiedFunctionSpec.java [162-164]

 @Override
 public SqlOperandCountRange getOperandCountRange() {
+  // Minimum: required positional params; maximum: unbounded (optional named args allowed)
   return SqlOperandCountRanges.from(paramNames.size());
 }
Suggestion importance[1-10]: 1

__

Why: The suggestion only asks to add a comment to clarify existing behavior without changing any logic. The existing_code and improved_code are functionally identical, making this a documentation-only suggestion with minimal impact.

Low
Suggestions up to commit acade43
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against out-of-bounds positional argument access

When a positional argument index i exceeds the size of paramNames, paramNames.get(i)
will throw an IndexOutOfBoundsException. Since paramNames only contains the required
parameter names (e.g., ["field", "query"]), any extra positional arguments beyond
those will cause a runtime crash. Add a bounds check or handle this case gracefully.

api/src/main/java/org/opensearch/sql/api/parser/NamedArgRewriter.java [52-56]

 if (op instanceof SqlCall eq && op.getKind() == SqlKind.EQUALS) {
     maps[i] = toMap(eq.operand(0).toString(), eq.operand(1));
 } else {
+    if (i >= paramNames.size()) {
+        throw new IllegalArgumentException(
+            "Too many positional arguments for function " + call.getOperator().getName()
+            + ": expected at most " + paramNames.size() + " positional args");
+    }
     maps[i] = toMap(paramNames.get(i), op);
 }
Suggestion importance[1-10]: 6

__

Why: The paramNames.get(i) call will throw an IndexOutOfBoundsException if more positional arguments are passed than there are entries in paramNames. This is a valid edge case that could cause a confusing runtime crash rather than a meaningful error message.

Low
Prevent NullPointerException on null error message

If the root cause's getMessage() returns null, calling
cause.getMessage().contains(expected) will throw a NullPointerException, masking the
original error. Add a null check before calling contains.

api/src/testFixtures/java/org/opensearch/sql/api/UnifiedQueryTestBase.java [172-179]

 Throwable cause = error;
 while (cause.getCause() != null) {
   cause = cause.getCause();
 }
+String actualMessage = cause.getMessage();
+assertNotNull("Expected error message but got null for cause: " + cause, actualMessage);
 assertTrue(
-    "Expected error to contain: " + expected + "\nActual: " + cause.getMessage(),
-    cause.getMessage().contains(expected));
+    "Expected error to contain: " + expected + "\nActual: " + actualMessage,
+    actualMessage.contains(expected));
Suggestion importance[1-10]: 4

__

Why: A null getMessage() on the root cause would throw a NullPointerException, masking the original error. Adding a null check improves robustness of the test helper, though this is a test utility and the impact is limited to test diagnostics.

Low
General
Clarify minimum operand count enforcement for variadic functions

SqlOperandCountRanges.from(n) sets the minimum operand count to n but allows any
number of additional arguments. However, the paramNames list only contains required
parameter names (e.g., ["field", "query"]), so the minimum should be enforced
correctly. The issue is that this allows zero arguments when paramNames is empty,
and doesn't enforce the minimum required args. Consider using
SqlOperandCountRanges.from(paramNames.size()) only if that correctly represents the
minimum — but verify that the intent is "at least N args" and not "exactly N args".

api/src/main/java/org/opensearch/sql/api/spec/UnifiedFunctionSpec.java [162-164]

 @Override
 public SqlOperandCountRange getOperandCountRange() {
+  // Requires at least the mandatory positional params; additional named args are variadic
   return SqlOperandCountRanges.from(paramNames.size());
 }
Suggestion importance[1-10]: 1

__

Why: The improved_code is functionally identical to the existing_code — it only adds a comment. This suggestion doesn't actually change behavior and only asks the user to verify the intent, which warrants a very low score.

Low
Suggestions up to commit c578e47
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against out-of-bounds positional argument index

When index is greater than or equal to paramNames.size(), calling
paramNames.get(index) will throw an IndexOutOfBoundsException. This can happen if a
caller passes more positional arguments than the number of declared param names. Add
a bounds check and throw a descriptive error or handle the extra positional
arguments gracefully.

api/src/main/java/org/opensearch/sql/api/parser/NamedArgRewriter.java [62-70]

 private static SqlNode rewriteOperand(SqlNode operand, int index, List<String> paramNames) {
   if (operand.getKind() == SqlKind.EQUALS) {
     // Optional: operator='AND' → MAP('operator', 'AND')
     SqlCall eq = (SqlCall) operand;
     return toMap(eq.operand(0).toString(), eq.operand(1));
   }
   // Required positional: name → MAP('field', name)
+  if (index >= paramNames.size()) {
+    throw new IllegalArgumentException(
+        "Too many positional arguments: index " + index + " exceeds declared params " + paramNames);
+  }
   return toMap(paramNames.get(index), operand);
 }
Suggestion importance[1-10]: 5

__

Why: The paramNames.get(index) call can throw IndexOutOfBoundsException if more positional arguments are passed than declared param names. However, in practice the variadic functions defined in UnifiedFunctionSpec use SqlOperandCountRanges.from(paramNames.size()) which enforces a minimum, and extra args would be EQUALS kind (named args), so the risk is limited but still worth guarding.

Low
Prevent NullPointerException on null function name lookup

If name is null, calling name.toLowerCase() will throw a NullPointerException. Since
this method is called from NamedArgRewriter.visit() using
visited.getOperator().getName(), which could potentially return null, a null check
should be added.

api/src/main/java/org/opensearch/sql/api/spec/UnifiedFunctionSpec.java [76-78]

 public static UnifiedFunctionSpec of(String name) {
+  if (name == null) return null;
   return ALL_SPECS.get(name.toLowerCase());
 }
Suggestion importance[1-10]: 3

__

Why: While SqlOperator.getName() is unlikely to return null in practice, adding a null guard is a minor defensive improvement. The impact is low since Calcite operators always have non-null names.

Low
General
Guard against null error message in assertion

If the root cause's getMessage() returns null, calling
cause.getMessage().contains(expected) will throw a NullPointerException, causing a
confusing test failure. Add a null check on cause.getMessage() before calling
contains.

api/src/testFixtures/java/org/opensearch/sql/api/UnifiedQueryTestBase.java [171-180]

 public QueryErrorAssert assertErrorMessage(String expected) {
   Throwable cause = error;
   while (cause.getCause() != null) {
     cause = cause.getCause();
   }
+  String message = cause.getMessage();
   assertTrue(
-      "Expected error to contain: " + expected + "\nActual: " + cause.getMessage(),
-      cause.getMessage().contains(expected));
+      "Expected error to contain: " + expected + "\nActual: " + message,
+      message != null && message.contains(expected));
   return this;
 }
Suggestion importance[1-10]: 4

__

Why: If cause.getMessage() returns null, the contains call would throw a NullPointerException instead of a clear assertion failure. The fix improves test robustness by handling null messages gracefully.

Low
Suggestions up to commit fe94ab1
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against out-of-bounds positional argument index

When index is greater than or equal to paramNames.size(), calling
paramNames.get(index) will throw an IndexOutOfBoundsException. This can happen when
a caller passes more positional arguments than there are declared param names. Add a
bounds check and throw a descriptive error or handle the extra positional args
gracefully.

api/src/main/java/org/opensearch/sql/api/parser/NamedArgRewriter.java [61-69]

 private static SqlNode rewriteOperand(SqlNode operand, int index, List<String> paramNames) {
   if (operand.getKind() == SqlKind.EQUALS) {
     // Optional: operator='AND' → MAP('operator', 'AND')
     SqlCall eq = (SqlCall) operand;
     return toMap(eq.operand(0).toString(), eq.operand(1));
   }
   // Required positional: name → MAP('field', name)
+  if (index >= paramNames.size()) {
+    throw new IllegalArgumentException(
+        "Too many positional arguments: index " + index + " exceeds declared params " + paramNames);
+  }
   return toMap(paramNames.get(index), operand);
 }
Suggestion importance[1-10]: 6

__

Why: The paramNames.get(index) call will throw an IndexOutOfBoundsException if more positional arguments are passed than declared param names. However, in practice the variadic functions defined in UnifiedFunctionSpec only have 2 required params and extra args are expected to be EQUALS nodes, so this is a minor edge case worth guarding.

Low
Null-check function name before lookup

If name is null, calling name.toLowerCase() will throw a NullPointerException. Since
of is called from NamedArgRewriter.visit with visited.getOperator().getName(), which
could potentially return null, a null check should be added.

api/src/main/java/org/opensearch/sql/api/spec/UnifiedFunctionSpec.java [80-82]

 public static UnifiedFunctionSpec of(String name) {
+  if (name == null) return null;
   return ALL_SPECS.get(name.toLowerCase());
 }
Suggestion importance[1-10]: 3

__

Why: While theoretically getName() could return null, Calcite operators always have non-null names in practice, making this a very low-probability defensive check with minimal real-world impact.

Low
General
Guard against null error message in assertion

If the root cause's getMessage() returns null, calling
cause.getMessage().contains(expected) will throw a NullPointerException. Add a null
check on cause.getMessage() before calling contains.

api/src/testFixtures/java/org/opensearch/sql/api/UnifiedQueryTestBase.java [171-180]

 public QueryErrorAssert assertErrorMessage(String expected) {
   Throwable cause = error;
   while (cause.getCause() != null) {
     cause = cause.getCause();
   }
+  String message = cause.getMessage();
   assertTrue(
-      "Expected error to contain: " + expected + "\nActual: " + cause.getMessage(),
-      cause.getMessage().contains(expected));
+      "Expected error to contain: " + expected + "\nActual: " + message,
+      message != null && message.contains(expected));
   return this;
 }
Suggestion importance[1-10]: 4

__

Why: The cause.getMessage() could return null for some exception types, causing a NullPointerException in the test assertion. The fix correctly handles this by storing the message in a variable and adding a null check before calling contains.

Low

Add UnifiedFunctionSpec with fluent builder to define relevance function
signatures (match, match_phrase, multi_match, etc.) as a composable
SqlOperatorTable chained into Calcite's FrameworkConfig. Functions are
language-level primitives, always resolvable regardless of default schema.

Add SQL and PPL test coverage for all 7 relevance functions.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen force-pushed the feature/register-fulltext-search-calcite-sql branch from 0da8394 to a1f2d7f Compare April 3, 2026 19:34
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

Persistent review updated to latest commit a1f2d7f

@dai-chen dai-chen force-pushed the feature/register-fulltext-search-calcite-sql branch from a1f2d7f to ae36dc1 Compare April 3, 2026 20:26
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

Persistent review updated to latest commit ae36dc1

@dai-chen dai-chen force-pushed the feature/register-fulltext-search-calcite-sql branch from ae36dc1 to 1908ebe Compare April 3, 2026 20:58
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 1908ebe.

PathLineSeverityDescription
build.log1mediumA build.log file containing local developer machine details (username 'daichen', local file paths '/Users/daichen/IdeaProjects/my-sql-repo/', OS version, JDK paths) has been committed to the repository. This leaks developer environment information and is unrelated to the feature being implemented. Likely accidental inclusion rather than malicious, but warrants review.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 1 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

Persistent review updated to latest commit 1908ebe

@dai-chen dai-chen force-pushed the feature/register-fulltext-search-calcite-sql branch from 1908ebe to fe94ab1 Compare April 3, 2026 21:02
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

Persistent review updated to latest commit fe94ab1

@dai-chen dai-chen force-pushed the feature/register-fulltext-search-calcite-sql branch from fe94ab1 to c578e47 Compare April 3, 2026 21:31
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

Persistent review updated to latest commit c578e47

Add NamedArgRewriter SqlShuttle that normalizes V2/PPL relevance syntax
into MAP-based form before Calcite validation. Transforms positional and
key=value arguments into MAP[paramName, value] pairs matching PPL's
internal representation for uniform pushdown rules.

Refactor UnifiedFunctionSpec to instance-based design with fluent builder
and Category record for grouping. Use SqlUserDefinedFunction for
consistency with PPL path. Add error tests and QueryErrorAssert to test
base.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen force-pushed the feature/register-fulltext-search-calcite-sql branch from c578e47 to acade43 Compare April 3, 2026 22:38
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

Persistent review updated to latest commit acade43

@dai-chen dai-chen marked this pull request as ready for review April 4, 2026 00:00
Add dedicated NamedArgRewriterTest covering:
- Positional args rewritten to MAPs with correct param names
- V2 equals syntax (key=value) flattened to MAP entries
- Multi-field functions use 'fields' param name
- Non-relevance functions pass through unchanged
- Edge cases: all-equals args, mixed order, extra positional args

Add high-level regression test in UnifiedRelevanceSearchSqlTest
verifying non-relevance functions (upper) are unaffected by rewriter.

Parser config matches production (Casing.UNCHANGED).

Signed-off-by: Chen Dai <daichen@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 6, 2026

Persistent review updated to latest commit 938d603

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 6, 2026

Persistent review updated to latest commit 6538c56

- Add bounds check for positional args with descriptive error message
- Use SqlIdentifier.getSimple() to avoid backtick-decorated keys for
  reserved words like 'escape' in named arguments
- Add null guard in QueryErrorAssert.assertErrorMessage to prevent NPE
  when root cause exception has null message
- Update tests to assert on IllegalArgumentException with error message
- Add test for reserved word as named argument key (query_string escape)

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen force-pushed the feature/register-fulltext-search-calcite-sql branch from 6538c56 to ac0fe64 Compare April 6, 2026 19:03
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 6, 2026

Persistent review updated to latest commit ac0fe64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request SQL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants