feat(search): index business terms via SemanticModelsManager #541
Merged
mvkonchits-db merged 1 commit on Jun 22, 2026
Wire SemanticModelsManager into the global search index as a SearchableAsset so ontology concepts / glossary terms appear in main search results.

- Decorate manager with @searchable_asset and inherit SearchableAsset.
- Add get_search_index_items() emitting type=glossary-term entries with link, tags, and synonyms in extra_data so the existing search config (search_config.yaml: glossary-term with synonyms boost) is honored.
- Link uses direct /concepts/browser/<encoded_iri> path to land straight on ConceptDetailView, matching concept-detail.tsx convention and avoiding the deprecated ?concept= redirect hop documented in business-terms.tsx.

Incremental updates (no full rebuild needed on each edit):

- _make_glossary_search_item: single factory shared by the bulk indexer and the per-mutation upsert path so live edits stay shape-consistent with the startup snapshot.
- _upsert_concept_in_search / _remove_concept_from_search: live-update helpers wired into create_concept, update_concept, update_concept_status, and delete_concept.
- delete_collection captures concept IRIs before triple wipe, then evicts each from the search index.
- import_rdf_to_collection upserts every subject from the parsed graph.
- _reindex_concepts_in_search hooks into _build_persistent_caches_atomic to cover all whole-graph rebuild paths (taxonomy enable/disable, semantic model file CRUD) without per-route plumbing.

Rationale on memory / SPARQL alternative: a SearchIndexItem is small relative to the OntologyConcept and rdflib.ConjunctiveGraph already in memory, so per-item duplication is negligible at typical scales. A delegated per-query SPARQL search would only pay off at 50k+ concepts and would be a generic SearchableAsset redesign, not a glossary-specific shortcut.

Verified end-to-end via Playwright:

- 502 items in index after startup (was 2; +500 glossary terms).
- POST/PATCH/DELETE of a concept reflects live in /api/search without triggering a SearchManager.build_index().
- Search-result click navigates straight to ConceptDetailView for both urn: and http://...#fragment IRIs (no ?concept= intermediate).
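A minimal sketch of what a shared factory like `_make_glossary_search_item` could look like. The real `OntologyConcept` and `SearchIndexItem` models are not shown in this PR, so the two dataclasses below, the `make_glossary_search_item` name, and the `glossary-term::<iri>` id scheme are all hypothetical. It also assumes `<encoded_iri>` means percent-encoding the entire IRI as a single path segment, which is what keeps `urn:` IRIs containing `/` and `http://...#fragment` IRIs from being split by the router or cut off at `#`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List
from urllib.parse import quote


@dataclass
class OntologyConcept:
    """Hypothetical stand-in carrying only the fields the factory reads."""
    iri: str
    label: str
    definition: str
    concept_type: str
    source_context: str
    synonyms: List[str] = field(default_factory=list)


@dataclass
class SearchIndexItem:
    """Hypothetical item shape; the real model may have more or different fields."""
    id: str
    type: str
    feature_id: str
    title: str
    description: str
    link: str
    tags: List[str]
    extra_data: Dict[str, Any]


def make_glossary_search_item(concept: OntologyConcept) -> SearchIndexItem:
    # safe="" escapes '/', ':' and '#' too, so the whole IRI is one path
    # segment and no part of it ends up in location.search or location.hash.
    encoded_iri = quote(concept.iri, safe="")
    return SearchIndexItem(
        id=f"glossary-term::{concept.iri}",         # hypothetical id scheme
        type="glossary-term",                        # matches search_config.yaml
        feature_id="semantic-models",                # used by permission filtering
        title=concept.label,
        description=concept.definition,
        link=f"/concepts/browser/{encoded_iri}",     # direct ConceptDetailView path
        tags=[concept.source_context, concept.concept_type],
        extra_data={"synonyms": list(concept.synonyms)},  # boosted field
    )
```

Because the bulk `get_search_index_items()` path and the per-mutation upsert path would both call this one function, a field added here appears in startup and live-edit items alike, which is the shape-drift guarantee the commit message describes.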
mvkonchits-db (Contributor) approved these changes on Jun 22, 2026 and left a comment:
LGTM on the code. Approving with the understanding that CI will run for the first time after #540 merges and this PR auto-retargets to `main` — merge should remain gated on those checks going green.
What I liked:
- Shared factory (`_make_glossary_search_item`) used by both the bulk `get_search_index_items` path and the per-mutation upsert path. Eliminates shape drift between the startup snapshot and live edits — exactly the right place to centralise this.
- Mutation hook coverage is comprehensive: `create_concept` / `update_concept` / `update_concept_status` / `delete_concept` / `delete_collection` / `import_rdf_to_collection` / `_build_persistent_caches_atomic`. Skipping ownership mutations is the right call — they don't touch any indexed field.
- Ordering in `delete_collection` is correct: `_collect_concept_iris_in_context` snapshots before the triple wipe, eviction happens post-commit. Easy place to get wrong; this got it right.
- `import_rdf_to_collection` iterates subjects from the just-parsed `temp_graph` rather than the merged graph — avoids reindexing pre-existing concepts on a partial import.
- Safe no-ops when `_search_manager is None` (startup) and when `_cached_concepts is None` (post-invalidation). The reindex helper purges via a snapshot (`list(self._search_manager.index)`) before mutating, so no iterator invalidation.
- Link format change to `/concepts/browser/<encoded_iri>` aligning with `concept-detail.tsx` / `business-terms.tsx` precedent — cleaner than the `?concept=` redirect hop.
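The three ordering points praised above (snapshot-before-wipe in `delete_collection`, subject enumeration from `temp_graph`, and purging via a `list(...)` copy) can be illustrated with a stripped-down model. Everything here is hypothetical: `ConceptStore` and `FakeSearchIndex` stand in for `SemanticModelsManager` and `SearchManager`, the index is modelled as a dict of item id to item type, triples are plain string tuples grouped by collection instead of an rdflib graph, and there is no commit step.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical: a triple as (subject, predicate, object) strings.
Triple = Tuple[str, str, str]


class FakeSearchIndex:
    """Hypothetical stand-in for the search manager: item id -> item type."""

    def __init__(self) -> None:
        self.index: Dict[str, str] = {}


class ConceptStore:
    """Hypothetical, heavily simplified model of the manager's index hooks."""

    def __init__(self, search: Optional[FakeSearchIndex]) -> None:
        self._search_manager = search  # None models startup, before wiring
        self._contexts: Dict[str, Set[Triple]] = {}  # collection -> triples

    def _upsert(self, iri: str) -> None:
        if self._search_manager is None:  # safe no-op during startup
            return
        self._search_manager.index[f"glossary-term::{iri}"] = "glossary-term"

    def _remove(self, iri: str) -> None:
        if self._search_manager is None:
            return
        self._search_manager.index.pop(f"glossary-term::{iri}", None)

    def import_rdf_to_collection(self, collection: str, temp_graph: Set[Triple]) -> None:
        self._contexts.setdefault(collection, set()).update(temp_graph)
        # Enumerate subjects of the just-parsed graph, not the merged context,
        # so concepts already in the collection are not re-upserted.
        for subject in {s for s, _, _ in temp_graph}:
            self._upsert(subject)

    def delete_collection(self, collection: str) -> None:
        # Snapshot IRIs first: after the wipe there is nothing left to
        # enumerate, and the index entries would be orphaned.
        iris = {s for s, _, _ in self._contexts.get(collection, set())}
        self._contexts.pop(collection, None)  # the "triple wipe"
        for iri in iris:
            self._remove(iri)

    def reindex(self) -> None:
        if self._search_manager is None:
            return
        index = self._search_manager.index
        # list(index) copies the keys; deleting from a dict while iterating it
        # directly raises "dictionary changed size during iteration".
        for item_id in list(index):
            if index[item_id] == "glossary-term":
                del index[item_id]
        for triples in self._contexts.values():
            for subject, _, _ in triples:
                self._upsert(subject)
```

Swapping the first two lines of `delete_collection` would compute `iris` from an already-empty collection and silently leave stale hits in the index, which is why the review calls this ordering easy to get wrong.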
Non-blocking nits:
- No automated test coverage for the live-update paths. The manual Playwright walk in the test plan covers it well for now, but an integration-level "create → search hit → delete → search miss" against `SemanticModelsManager` would protect this against future regressions (especially the `delete_collection` ordering and the `temp_graph` subject enumeration). Happy to file as a follow-up if you'd rather not bundle it here.
- The memory/SPARQL trade-off note in the PR body is fair: agree that a `SearchableAsset` redesign is the right scope for any rework, not this PR.
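The follow-up test suggested in the first nit could take roughly this shape. `check_live_search_lifecycle` is written against plain callables so it could later be pointed at a real `SemanticModelsManager` and search endpoint; the `FakeManager`, its `update_concept_synonyms` method, and `substring_search` used to exercise it here are hypothetical stand-ins, since the real `create_concept` / `update_concept` / `delete_concept` signatures are not shown in this PR.

```python
from typing import Callable, Dict, List, Optional, Tuple


def check_live_search_lifecycle(
    create: Callable[[str, str, List[str]], None],
    update_synonyms: Callable[[str, List[str]], None],
    delete: Callable[[str], None],
    search: Callable[[str], List[str]],
    iri: str,
) -> None:
    """create -> hit -> edit synonym -> hit on new synonym -> delete -> miss."""
    create(iri, "Customer ID", ["client key"])
    assert iri in search("client key"), "new concept should be searchable without a rebuild"
    update_synonyms(iri, ["account number"])
    assert iri in search("account number"), "updated synonym should be searchable"
    assert iri not in search("client key"), "old synonym should no longer match"
    delete(iri)
    assert iri not in search("customer"), "deleted concept should be evicted"


class FakeManager:
    """Hypothetical in-memory manager mirroring the PR's upsert/remove hooks."""

    def __init__(self, index: Optional[Dict[str, str]]) -> None:
        self._index = index  # None models startup, before search is wired
        self._concepts: Dict[str, Tuple[str, List[str]]] = {}

    def _upsert_concept_in_search(self, iri: str) -> None:
        if self._index is not None:
            label, synonyms = self._concepts[iri]
            self._index[iri] = " ".join([label, *synonyms]).lower()

    def _remove_concept_from_search(self, iri: str) -> None:
        if self._index is not None:
            self._index.pop(iri, None)

    def create_concept(self, iri: str, label: str, synonyms: List[str]) -> None:
        self._concepts[iri] = (label, list(synonyms))
        self._upsert_concept_in_search(iri)

    def update_concept_synonyms(self, iri: str, synonyms: List[str]) -> None:
        label, _ = self._concepts[iri]
        self._concepts[iri] = (label, list(synonyms))
        self._upsert_concept_in_search(iri)

    def delete_concept(self, iri: str) -> None:
        self._concepts.pop(iri, None)
        self._remove_concept_from_search(iri)


def substring_search(index: Dict[str, str]) -> Callable[[str], List[str]]:
    """Naive case-insensitive substring match over the indexed text."""
    return lambda query: sorted(i for i, text in index.items() if query.lower() in text)


index: Dict[str, str] = {}
mgr = FakeManager(index)
check_live_search_lifecycle(
    mgr.create_concept, mgr.update_concept_synonyms, mgr.delete_concept,
    substring_search(index), "urn:glossary:test-3/customer-id",
)
```

Passing callables rather than a manager object keeps the check independent of how the real methods are named or parameterized; a manager whose delete path forgets to evict fails on the final assertion.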
Merge whenever CI lands clean post-retarget.
Summary
Wire `SemanticModelsManager` into the global search index as a `SearchableAsset` so ontology concepts / glossary terms appear in main search results, with live incremental updates on every concept mutation (no full rebuild per edit).

Base: `fix-concept-rdf-types` (#540). This PR sits on top of #540 because both edit `semantic_models_manager.py`. Once #540 merges, GitHub will auto-retarget this PR to `main`.

Changes
Core integration
- `@searchable_asset` decorator + `SearchableAsset` base on `SemanticModelsManager`.
- `get_search_index_items()` emits one `SearchIndexItem` per concept with:
  - `type="glossary-term"` (matches existing `search_config.yaml`)
  - `feature_id="semantic-models"` (so the existing `PermissionChecker` filters correctly)
  - `link=/concepts/browser/<encoded_iri>`: direct path to `ConceptDetailView`, matching the convention used by `concept-detail.tsx` / `business-terms.tsx`. Avoids the deprecated `?concept=` redirect hop documented at `business-terms.tsx:86`.
  - `extra_data["synonyms"]` populated from `OntologyConcept.synonyms` so the boosted synonym field already configured in `search_config.yaml` (priority 2, boost 1.5) is actually exercised.
  - `tags=[source_context, concept_type]` for `tag:` query support.

Incremental updates (no rebuild required per edit)
- `_make_glossary_search_item`: single factory shared by the bulk indexer and the per-mutation upsert path so live edits stay shape-consistent with the startup snapshot.
- `_upsert_concept_in_search` / `_remove_concept_from_search`: live-update helpers wired into:
  - `create_concept` → upsert
  - `update_concept` → upsert
  - `update_concept_status` → upsert
  - `delete_concept` → remove
  - `delete_collection` → snapshots concept IRIs (via new `_collect_concept_iris_in_context`) before triple wipe, then evicts each from the search index.
  - `import_rdf_to_collection` → iterates subjects from the just-parsed `temp_graph` and upserts each.
- `_reindex_concepts_in_search` → hooked into `_build_persistent_caches_atomic` so all whole-graph rebuild paths (taxonomy enable/disable, semantic model file CRUD via `_rebuild_and_sync_asset_types`) refresh the live index without per-route plumbing.

Intentionally not wired:
- `add_concept_owner` / `remove_concept_owner`: ownership changes don't affect any indexed field. Acceptable staleness; spares index churn.

Memory / SPARQL trade-off
A `SearchIndexItem` is small relative to the `OntologyConcept` and `rdflib.ConjunctiveGraph` already in memory, so per-item duplication is noise at typical scales. A delegated per-query SPARQL search would only pay off at 50k+ concepts and would be a generic `SearchableAsset` redesign, not a glossary-specific shortcut. Documented in the commit message.

Test plan
- Startup log: `Registering searchable asset manager: SemanticModelsManager`
- `SearchManager initialized with 7 managers` (was 6)
- `Search index build complete. Total items: 502` (was 2; +500 glossary terms)
- Search-result click lands on `/concepts/browser/<encoded_iri>` (verified `location.search === ""`, `location.hash === ""`)
- Both `urn:glossary:test-3/customer-id` and `http://ontos.app/ontology#LogicalEntity` resolve to the right ConceptDetailView page
- Edit via `/api/knowledge/concepts` → search for new synonym → hit

Follow-up
Filed as a separate issue: small UX papercuts noticed while testing (hyphen-tokenized title search; pre-existing nested-button DOM warning in `LinkedObjectsPanel`). Not in scope for this PR.