Add relevance search function support for unified SQL query #5279
8e54956 to
5f80dfc
Compare
5f80dfc to
0da8394
Compare
Add UnifiedFunctionSpec with fluent builder to define relevance function signatures (match, match_phrase, multi_match, etc.) as a composable SqlOperatorTable chained into Calcite's FrameworkConfig. Functions are language-level primitives, always resolvable regardless of default schema.

Add SQL and PPL test coverage for all 7 relevance functions.

Signed-off-by: Chen Dai <daichen@amazon.com>
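To make the fluent-builder idea in this commit concrete, here is a minimal sketch in plain Java. It is not the PR's UnifiedFunctionSpec: the class name `FunctionSpecSketch`, the methods `function`, `required`, `optional`, and the parameter name `query` are all hypothetical, and the sketch leaves out the Calcite SqlOperatorTable wiring entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a function signature; not the PR's UnifiedFunctionSpec. */
final class FunctionSpecSketch {
    final String name;
    final List<String> requiredParams;
    final List<String> optionalParams;

    private FunctionSpecSketch(String name, List<String> required, List<String> optional) {
        this.name = name;
        // Defensive copies so a built spec cannot be changed through the builder.
        this.requiredParams = Collections.unmodifiableList(new ArrayList<>(required));
        this.optionalParams = Collections.unmodifiableList(new ArrayList<>(optional));
    }

    /** Entry point of the fluent chain. */
    static Builder function(String name) {
        return new Builder(name);
    }

    static final class Builder {
        private final String name;
        private final List<String> required = new ArrayList<>();
        private final List<String> optional = new ArrayList<>();

        private Builder(String name) {
            this.name = name;
        }

        /** Appends required parameter names in declaration order. */
        Builder required(String... params) {
            required.addAll(Arrays.asList(params));
            return this;
        }

        /** Appends optional parameter names in declaration order. */
        Builder optional(String... params) {
            optional.addAll(Arrays.asList(params));
            return this;
        }

        FunctionSpecSketch build() {
            if (required.isEmpty()) {
                throw new IllegalStateException(name + " needs at least one required parameter");
            }
            return new FunctionSpecSketch(name, required, optional);
        }
    }
}
```

A call such as `FunctionSpecSketch.function("match").required("field", "query").optional("operator").build()` reads close to a signature declaration, which is the usual reason for choosing a fluent builder here.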
0da8394 to
a1f2d7f
Compare
a1f2d7f to
ae36dc1
Compare
ae36dc1 to
1908ebe
Compare
1908ebe to
fe94ab1
Compare
fe94ab1 to
c578e47
Compare
Add NamedArgRewriter SqlShuttle that normalizes V2/PPL relevance syntax into MAP-based form before Calcite validation. Transforms positional and key=value arguments into MAP[paramName, value] pairs matching PPL's internal representation for uniform pushdown rules.

Refactor UnifiedFunctionSpec to instance-based design with fluent builder and Category record for grouping. Use SqlUserDefinedFunction for consistency with PPL path.

Add error tests and QueryErrorAssert to test base.

Signed-off-by: Chen Dai <daichen@amazon.com>
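The normalization described in this commit can be modeled in a few lines of plain Java. This is a toy sketch on strings, not the PR's NamedArgRewriter: it does not touch Calcite's SqlNode or SqlShuttle, and the names `NamedArgSketch`, `Arg`, and `normalize`, as well as the positional parameter name `query`, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of argument normalization on plain strings; no Calcite types involved. */
final class NamedArgSketch {

    /** One call argument; key is null when the argument is positional. */
    static final class Arg {
        final String key;
        final String value;

        private Arg(String key, String value) {
            this.key = key;
            this.value = value;
        }

        static Arg positional(String value) {
            return new Arg(null, value);
        }

        static Arg named(String key, String value) {
            return new Arg(key, value);
        }
    }

    /**
     * Turns every argument into a MAP('name', value) string.
     * Positional arguments take their names from positionalNames in order;
     * key=value arguments keep their own key.
     */
    static List<String> normalize(List<String> positionalNames, List<Arg> args) {
        List<String> out = new ArrayList<>();
        int next = 0; // index of the next unused positional parameter name
        for (Arg arg : args) {
            String key;
            if (arg.key != null) {
                key = arg.key;
            } else {
                if (next >= positionalNames.size()) {
                    throw new IllegalArgumentException(
                        "Too many positional arguments: expected at most " + positionalNames.size());
                }
                key = positionalNames.get(next++);
            }
            out.add("MAP('" + key + "', " + arg.value + ")");
        }
        return out;
    }
}
```

Under these assumptions, the arguments of `match(name, 'John', operator='AND')` with positional names `field` and `query` normalize to `MAP('field', name)`, `MAP('query', 'John')`, `MAP('operator', 'AND')`, so positional and key=value call styles end up in the same shape.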
c578e47 to
acade43
Compare
Add dedicated NamedArgRewriterTest covering:
- Positional args rewritten to MAPs with correct param names
- V2 equals syntax (key=value) flattened to MAP entries
- Multi-field functions use 'fields' param name
- Non-relevance functions pass through unchanged
- Edge cases: all-equals args, mixed order, extra positional args

Add high-level regression test in UnifiedRelevanceSearchSqlTest verifying non-relevance functions (upper) are unaffected by rewriter. Parser config matches production (Casing.UNCHANGED).

Signed-off-by: Chen Dai <daichen@amazon.com>
- Add bounds check for positional args with descriptive error message
- Use SqlIdentifier.getSimple() to avoid backtick-decorated keys for reserved words like 'escape' in named arguments
- Add null guard in QueryErrorAssert.assertErrorMessage to prevent NPE when root cause exception has null message
- Update tests to assert on IllegalArgumentException with error message
- Add test for reserved word as named argument key (query_string escape)

Signed-off-by: Chen Dai <daichen@amazon.com>
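The null guard mentioned in this commit is a common pattern when asserting on a root-cause message. The following is a minimal sketch of that pattern in plain Java, not the PR's QueryErrorAssert; the class `RootCauseMessageSketch` and its method names are hypothetical.

```java
/** Sketch of a null-safe root-cause message check; names are illustrative only. */
final class RootCauseMessageSketch {

    /** Walks the cause chain to the innermost exception. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /**
     * Returns true when the root cause has a message containing the expected text.
     * A root cause with a null message yields false instead of a NullPointerException.
     */
    static boolean rootMessageContains(Throwable t, String expected) {
        String message = rootCause(t).getMessage();
        return message != null && message.contains(expected);
    }
}
```

Without the `message != null` check, a root cause created with a no-argument constructor would make `message.contains(...)` throw, and the test would fail with an unrelated error instead of a clear assertion failure.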
6538c56 to
ac0fe64
Compare
Description
Register relevance search functions (match, match_phrase, multi_match, etc.) as language-level functions in the unified query engine, with backward-compatible named-argument syntax support (for now).
Function Support Matrix
Design Choices
- Relevance functions are defined in UnifiedFunctionSpec and registered globally, making them always resolvable in any query context. No local execution is provided; data-source capability is enforced at optimization time by pushdown rules.
- For backward compatibility, optional parameters use key=value syntax (e.g., match(name, 'John', operator='AND')) instead of SQL's standard => syntax.
- NamedArgRewriter runs as a pre-validation pass that normalizes both positional and key=value arguments into MAP-based form (e.g., MAP('field', name), MAP('operator', 'AND')), producing the same plans as the PPL path so that pushdown rules can be shared.

Please find more design details in #5248 (comment).
Next Step
- Support the functions that are not yet covered (matchquery, matchphrase, multimatch, wildcardquery, matchphrasequery, multimatchquery).
- V2's bracket syntax for multiple fields (e.g., multi_match(['field1', 'field2'])) is not supported; only single column references are accepted. Documented with FIXME tests. Options: (1) extend the parser to support V2's bracket syntax; (2) switch to ARRAY[...] and accept the breaking change from V2.
Related Issues
Part of #5248
Check List
- Commits are signed per the DCO using --signoff or -s.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.