Skip to content

Add support for semantic search#955

Merged
jazairi merged 1 commit intomainfrom
use-411-use-412
Apr 6, 2026
Merged

Add support for semantic search#955
jazairi merged 1 commit intomainfrom
use-411-use-412

Conversation

@jazairi
Copy link
Copy Markdown
Contributor

@jazairi jazairi commented Apr 2, 2026

Why these changes are being introduced:

Now that we have the Semantic Query Builder lambda, we need to integrate it into TIMDEX.

Relevant ticket(s):

How this addresses that need:

  • Adds SemanticQueryBuilder service to invoke the lambda
  • Adds queryMode parameter (keyword/semantic) to GraphQL interface
  • Implements builder selection pattern in Opensearch model
  • Renames AWS credentials to generic variables (AWS_ACCESS_KEY_ID, etc.)

Side effects of this change:

  • Adds aws-sdk-lambda and mocha as dependencies.
  • Makes small change to OpenSearch initializer: ENV.fetch('AWS_SESSION_TOKEN', false) returns an empty string, which will evaluate to truthy. This has been updated to check for the presence of the env var instead.

Developer

  • All new ENV is documented in README
  • All new ENV has been added to Heroku Pipeline, Staging and Prod WARNING: Ensure ENV is updated prior to merge, or builds will break!
  • ANDI or Wave has been run in accordance to
    our guide and
    all issues introduced by these changes have been resolved or opened as new
    issues (link to those issues in the Pull Request details above)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer

  • The commit message is clear and follows our guidelines
    (not just this pull request message)
  • There are appropriate tests covering any new functionality
  • The documentation has been updated or is unnecessary
  • The changes have been verified
  • New dependencies are appropriate or there were no changes

Requires database migrations?

NO

Includes new or updated dependencies?

YES

@qltysh
Copy link
Copy Markdown

qltysh bot commented Apr 2, 2026

❌ 9 blocking issues (12 total)

Tool Category Rule Count
rubocop Lint Avoid parameter lists longer than 5 parameters. [18/5] 3
rubocop Style Line is too long. [143/120] 2
rubocop Style Incorrect formatting, autoformat by running qlty fmt. 1
rubocop Lint Block has too many lines. [62/25] 1
rubocop Lint Assignment Branch Condition size for construct\_query is too high. [<24, 33, 0> 40.8/17] 1
rubocop Lint Unused method argument - fulltext. 1
qlty Structure Function with many parameters (count = 18): search 3

@qltysh one-click actions:

  • Auto-fix formatting (qlty fmt && git push)

@@ -103,11 +105,11 @@ def record_id(id:, index:)
end

def search(searchterm:, citation:, contributors:, funding_information:, geodistance:, geobox:, identifiers:,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 2 issues:

1. Function with many parameters (count = 18): search [qlty:function-parameters]


2. Avoid parameter lists longer than 5 parameters. [18/5] [rubocop:Metrics/ParameterLists]


def search(searchterm:, citation:, contributors:, funding_information:, geodistance:, geobox:, identifiers:,
locations:, subjects:, title:, index:, source:, from:, boolean_type:, fulltext:, per_page: 20, **filters)
locations:, subjects:, title:, index:, source:, from:, boolean_type:, fulltext:, per_page: 20, query_mode: 'keyword', **filters)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is too long. [143/120] [rubocop:Layout/LineLength]

@@ -135,9 +137,10 @@ def inject_hits_fields_into_source(hits)
end

def construct_query(searchterm, citation, contributors, funding_information, geodistance, geobox, identifiers,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 2 issues:

1. Function with many parameters (count = 15): construct_query [qlty:function-parameters]


2. Avoid parameter lists longer than 5 parameters. [15/5] [rubocop:Metrics/ParameterLists]

@mitlib mitlib temporarily deployed to timdex-api-p-use-411-us-w6dpzk April 2, 2026 15:58 Inactive
@jazairi jazairi temporarily deployed to timdex-api-p-use-411-us-w6dpzk April 2, 2026 20:27 Inactive
locations, subjects, title, source, boolean_type, filters, per_page, query_mode)

results = Opensearch.new.search(from, query, Timdex::OSClient, highlight_requested?, index, fulltext)
results = Opensearch.new.search(from, query, Timdex::OSClient, highlight: highlight_requested?, index: index, fulltext: fulltext, query_mode: query_mode)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is too long. [159/120] [rubocop:Layout/LineLength]

MAX_SIZE = 200

def search(from, params, client, highlight = false, index = nil, fulltext = false)
def search(from, params, client, highlight: false, index: nil, fulltext: false, query_mode: 'keyword')
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 2 issues:

1. Function with many parameters (count = 7): search [qlty:function-parameters]


2. Avoid parameter lists longer than 5 parameters. [7/5] [rubocop:Metrics/ParameterLists]

@jazairi
Copy link
Copy Markdown
Contributor Author

jazairi commented Apr 2, 2026

Note that I ran copilot review in a separate draft PR yesterday, to leave us with a cleaner PR for human review. (Copilot had a lot of suggestions.)

@jazairi jazairi temporarily deployed to timdex-api-p-use-411-us-w6dpzk April 2, 2026 21:12 Inactive
@jazairi jazairi temporarily deployed to timdex-api-p-use-411-us-w6dpzk April 2, 2026 21:39 Inactive
require 'aws-sdk-lambda'

class SemanticQueryBuilder
def build(params, fulltext: false)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused method argument - fulltext. [rubocop:Lint/UnusedMethodArgument]

secret_access_key: ENV.fetch('AWS_SECRET_ACCESS_KEY', nil)
}

client = Aws::Lambda::Client.new(client_options)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider an initializer for the Lambda client like we have for the OpenSearch client? Lambda is weird and different so it may not provide the same persistent connection value we get for OpenSearch, but a shared client would allow for consistency with how we manage those external services. This absolutely might be best here as it only moves a few lines to an initializer and that may end up being more confusing to determine where that is occurring. 🤷🏻

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't, but I like that as a way to polish this up a bit!

LexicalQueryBuilder.new
end

builder.build(@params, fulltext: @fulltext)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, the pattern we put in place anticipating this is working as cleanly as we hoped.

OpenSearch::Client.new log: ENV.fetch('OPENSEARCH_LOG', false), url: ENV['OPENSEARCH_URL'] do |config|
# personal keys use expiring credentials with tokens
if ENV.fetch('AWS_OPENSEARCH_SESSION_TOKEN', false)
if ENV['AWS_SESSION_TOKEN'].present?
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure to have all of the affect ENV duplicated in all tiers before this merges please.

@jazairi jazairi temporarily deployed to timdex-api-p-use-411-us-w6dpzk April 6, 2026 21:11 Inactive
@jazairi jazairi temporarily deployed to timdex-api-p-use-411-us-w6dpzk April 6, 2026 21:24 Inactive
@jazairi jazairi requested a review from JPrevost April 6, 2026 21:48
Copy link
Copy Markdown
Member

@JPrevost JPrevost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

Why these changes are being introduced:

Now that we have the Semantic Query Builder lambda, we need to
integrate it into TIMDEX.

Relevant ticket(s):

- [USE-411](https://mitlibraries.atlassian.net/browse/USE-411)
- [USE-412](https://mitlibraries.atlassian.net/browse/USE-412)

How this addresses that need:

- Adds SemanticQueryBuilder service to invoke the lambda
- Adds queryMode parameter (keyword/semantic) to GraphQL interface
- Implements builder selection pattern in Opensearch model
- Renames AWS credentials to generic variables (AWS_ACCESS_KEY_ID, etc.)

Side effects of this change:

- Adds aws-sdk-lambda and mocha as dependencies.
- Makes small change to OpenSearch initializer: `ENV.fetch('AWS_SESSION_TOKEN', false)`
returns an empty string, which will evaluate to truthy. This has been
updated to check for the presence of the env var instead.
@jazairi jazairi temporarily deployed to timdex-api-p-use-411-us-w6dpzk April 6, 2026 22:50 Inactive
@jazairi jazairi merged commit 7c02df2 into main Apr 6, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants