feat(search): index business terms via SemanticModelsManager#541

Merged
mvkonchits-db merged 1 commit into fix-concept-rdf-types from feat-business-terms-in-search
Jun 22, 2026

Conversation

@larsgeorge-db

Collaborator

Summary

Wire SemanticModelsManager into the global search index as a SearchableAsset so ontology concepts / glossary terms appear in main search results, with live incremental updates on every concept mutation (no full rebuild per edit).

Base: fix-concept-rdf-types (#540). This PR sits on top of #540 because both edit semantic_models_manager.py. Once #540 merges, GitHub will auto-retarget this PR to main.

Changes

Core integration

  • @searchable_asset decorator + SearchableAsset base on SemanticModelsManager.
  • get_search_index_items() emits one SearchIndexItem per concept with:
    • type="glossary-term" (matches existing search_config.yaml)
    • feature_id="semantic-models" (so existing PermissionChecker filters correctly)
    • link=/concepts/browser/<encoded_iri> — direct path to ConceptDetailView, matching the convention used by concept-detail.tsx / business-terms.tsx. Avoids the deprecated ?concept= redirect hop documented at business-terms.tsx:86.
    • extra_data["synonyms"] populated from OntologyConcept.synonyms so the boosted synonym field already configured in search_config.yaml (priority 2, boost 1.5) is actually exercised.
    • tags=[source_context, concept_type] for tag: query support.
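
As a rough illustration of the item shape listed above, here is a minimal sketch. The `SearchIndexItem` dataclass, the `id` scheme, and the use of `urllib.parse.quote(..., safe="")` for the IRI are assumptions made for this example, not the project's actual definitions; only the field values (`type`, `feature_id`, `link` prefix, `tags`, `extra_data["synonyms"]`) come from the description.

```python
from dataclasses import dataclass, field
from urllib.parse import quote, unquote


@dataclass
class SearchIndexItem:
    """Hypothetical stand-in for the project's SearchIndexItem."""
    id: str
    type: str
    feature_id: str
    title: str
    link: str
    tags: list = field(default_factory=list)
    extra_data: dict = field(default_factory=dict)


def make_glossary_search_item(iri, label, synonyms, source_context, concept_type):
    """Build one search item per concept (assumed signature)."""
    # safe="" also percent-encodes "/", ":" and "#", so the whole IRI
    # stays a single path segment with no query string or fragment.
    encoded_iri = quote(iri, safe="")
    return SearchIndexItem(
        id=f"glossary-term::{iri}",  # assumed id scheme
        type="glossary-term",
        feature_id="semantic-models",
        title=label,
        link=f"/concepts/browser/{encoded_iri}",
        tags=[source_context, concept_type],
        extra_data={"synonyms": list(synonyms)},
    )
```

Encoding the full IRI this way is one way to keep both `urn:` IRIs and `http://...#fragment` IRIs routable through the same path, since the `#` never reaches the browser as a real fragment.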

Incremental updates (no rebuild required per edit)

  • _make_glossary_search_item: single factory shared by the bulk indexer and the per-mutation upsert path so live edits stay shape-consistent with the startup snapshot.
  • _upsert_concept_in_search / _remove_concept_from_search: live-update helpers wired into:
    • create_concept → upsert
    • update_concept → upsert
    • update_concept_status → upsert
    • delete_concept → remove
  • delete_collection → snapshots concept IRIs (via new _collect_concept_iris_in_context) before triple wipe, then evicts each from the search index.
  • import_rdf_to_collection → iterates subjects from the just-parsed temp_graph and upserts each.
  • _reindex_concepts_in_search → hooked into _build_persistent_caches_atomic so all whole-graph rebuild paths (taxonomy enable/disable, semantic model file CRUD via _rebuild_and_sync_asset_types) refresh the live index without per-route plumbing.
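
The mutation-to-index wiring listed above can be sketched roughly as follows. This is a simplified model, not the code in this PR: `InMemorySearchIndex`, `ConceptStore`, the dict-based item shape, and the substring search are all hypothetical stand-ins; only the helper names `_upsert_concept_in_search` / `_remove_concept_from_search` and the create/update/delete mapping are taken from the description.

```python
class InMemorySearchIndex:
    """Hypothetical index keyed by item id; the real SearchManager may differ."""

    def __init__(self):
        self.items = {}

    def upsert(self, item_id, item):
        self.items[item_id] = item  # insert or overwrite in place

    def remove(self, item_id):
        self.items.pop(item_id, None)  # removing an unknown id is a no-op

    def search(self, text):
        needle = text.lower()
        return [
            item for item in self.items.values()
            if needle in item["title"].lower()
            or any(needle in s.lower() for s in item["synonyms"])
        ]


class ConceptStore:
    """Hypothetical, stripped-down stand-in for the concept manager."""

    def __init__(self, search_index=None):
        self._search = search_index  # may be None before search is wired up
        self._concepts = {}

    def _upsert_concept_in_search(self, iri):
        if self._search is None:
            return  # safe no-op when search is not available yet
        concept = self._concepts[iri]
        self._search.upsert(
            f"glossary-term::{iri}",
            {"title": concept["label"], "synonyms": concept["synonyms"]},
        )

    def _remove_concept_from_search(self, iri):
        if self._search is not None:
            self._search.remove(f"glossary-term::{iri}")

    def create_concept(self, iri, label, synonyms=()):
        self._concepts[iri] = {"label": label, "synonyms": list(synonyms)}
        self._upsert_concept_in_search(iri)

    def update_concept(self, iri, label):
        self._concepts[iri]["label"] = label
        self._upsert_concept_in_search(iri)

    def delete_concept(self, iri):
        self._concepts.pop(iri, None)
        self._remove_concept_from_search(iri)
```

Each mutation touches exactly one index entry, which is the point of the design: no call path here rebuilds the whole index.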

Intentionally not wired: add_concept_owner / remove_concept_owner — ownership changes don't affect any indexed field. Acceptable staleness; spares index churn.

Memory / SPARQL trade-off

A SearchIndexItem is small relative to the OntologyConcept and rdflib.ConjunctiveGraph already in memory, so per-item duplication is noise at typical scales. A delegated per-query SPARQL search would only pay off at 50k+ concepts and would be a generic SearchableAsset redesign, not a glossary-specific shortcut. Documented in commit message.

Test plan

  • Backend log confirms registry pickup: Registering searchable asset manager: SemanticModelsManager
  • SearchManager initialized with 7 managers (was 6)
  • Search index build complete. Total items: 502 (was 2; +500 glossary terms)
  • End-to-end via Playwright:
    • Search dropdown shows glossary-term entries with Book icon
    • Click navigates directly to /concepts/browser/<encoded_iri> (verified location.search === "", location.hash === "")
    • Both urn:glossary:test-3/customer-id and http://ontos.app/ontology#LogicalEntity resolve to the right ConceptDetailView page
  • Live updates (no rebuild needed):
    • POST /api/knowledge/concepts → search for new synonym → hit
    • PATCH (rename label) → search by new title → hit
    • DELETE → search by title → 0 hits
  • No new linter errors

Follow-up

Filed as a separate issue: small UX papercuts noticed while testing (hyphen-tokenized title search; pre-existing nested-button DOM warning in LinkedObjectsPanel). Not in scope for this PR.

Wire SemanticModelsManager into the global search index as a SearchableAsset
so ontology concepts / glossary terms appear in main search results.

- Decorate manager with @searchable_asset and inherit SearchableAsset.
- Add get_search_index_items() emitting type=glossary-term entries with
  link, tags, and synonyms in extra_data so the existing search config
  (search_config.yaml: glossary-term with synonyms boost) is honored.
- Link uses direct /concepts/browser/<encoded_iri> path to land straight
  on ConceptDetailView, matching concept-detail.tsx convention and avoiding
  the deprecated ?concept= redirect hop documented in business-terms.tsx.

Incremental updates (no full rebuild needed on each edit):

- _make_glossary_search_item: single factory shared by the bulk indexer
  and the per-mutation upsert path so live edits stay shape-consistent
  with the startup snapshot.
- _upsert_concept_in_search / _remove_concept_from_search: live-update
  helpers wired into create_concept, update_concept, update_concept_status,
  and delete_concept.
- delete_collection captures concept IRIs before triple wipe, then evicts
  each from the search index.
- import_rdf_to_collection upserts every subject from the parsed graph.
- _reindex_concepts_in_search hooks into _build_persistent_caches_atomic
  to cover all whole-graph rebuild paths (taxonomy enable/disable, semantic
  model file CRUD) without per-route plumbing.

Rationale on memory / SPARQL alternative: a SearchIndexItem is small
relative to the OntologyConcept and rdflib.ConjunctiveGraph already in
memory, so per-item duplication is negligible at typical scales. A
delegated per-query SPARQL search would only pay off at 50k+ concepts
and would be a generic SearchableAsset redesign, not a glossary-specific
shortcut.

Verified end-to-end via Playwright:

- 502 items in index after startup (was 2; +500 glossary terms).
- POST/PATCH/DELETE of a concept reflects live in /api/search without
  triggering a SearchManager.build_index().
- Search-result click navigates straight to ConceptDetailView for both
  urn: and http://...#fragment IRIs (no ?concept= intermediate).

@mvkonchits-db mvkonchits-db left a comment

Contributor

LGTM on the code. Approving with the understanding that CI will run for the first time after #540 merges and this PR auto-retargets to `main` — merge should remain gated on those checks going green.

What I liked:

  • Shared factory (`_make_glossary_search_item`) used by both the bulk `get_search_index_items` path and the per-mutation upsert path. Eliminates shape drift between the startup snapshot and live edits — exactly the right place to centralise this.
  • Mutation hook coverage is comprehensive: `create_concept` / `update_concept` / `update_concept_status` / `delete_concept` / `delete_collection` / `import_rdf_to_collection` / `_build_persistent_caches_atomic`. Skipping ownership mutations is the right call — they don't touch any indexed field.
  • Ordering in `delete_collection` is correct: `_collect_concept_iris_in_context` snapshots before the triple wipe, eviction happens post-commit. Easy place to get wrong; this got it right.
  • `import_rdf_to_collection` iterates subjects from the just-parsed `temp_graph` rather than the merged graph — avoids reindexing pre-existing concepts on a partial import.
  • Safe no-ops when `_search_manager is None` (startup) and when `_cached_concepts is None` (post-invalidation). The reindex helper purges via a snapshot (`list(self._search_manager.index)`) before mutating, so no iterator invalidation.
  • Link format change to `/concepts/browser/<encoded_iri>` aligning with `concept-detail.tsx` / `business-terms.tsx` precedent — cleaner than the `?concept=` redirect hop.
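
Two of the points above, the snapshot-before-wipe ordering in `delete_collection` and purging over a `list(...)` snapshot, can be illustrated with a small sketch. It assumes a plain list of `(subject, predicate, object, context)` tuples in place of the rdflib graph and a plain dict in place of the search index; both are simplifications for illustration, and `purge_glossary_items` is a hypothetical name.

```python
def delete_collection(triples, search_index, context):
    """Remove every triple in `context`, then evict its subjects from search."""
    # 1. Snapshot the subject IRIs while the triples still exist.
    #    After the wipe there would be nothing left to enumerate.
    iris = {s for (s, _p, _o, ctx) in triples if ctx == context}

    # 2. Wipe the collection's triples in place.
    triples[:] = [t for t in triples if t[3] != context]

    # 3. Evict from the search index only after the wipe has happened.
    for iri in iris:
        search_index.pop(iri, None)
    return iris


def purge_glossary_items(search_index):
    """Drop all glossary-term entries without invalidating the iterator."""
    # list(...) copies the keys first; deleting while iterating the dict
    # directly would raise RuntimeError.
    for key in list(search_index):
        if search_index[key]["type"] == "glossary-term":
            del search_index[key]
```

Reversing steps 1 and 2 would leave stale entries in the index with no way to find them, which is why the ordering matters.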

Non-blocking nits:

  • No automated test coverage for the live-update paths. The manual Playwright walk in the test plan covers it well for now, but an integration-level "create → search hit → delete → search miss" against `SemanticModelsManager` would protect this against future regressions (especially the `delete_collection` ordering and the `temp_graph` subject enumeration). Happy to file as a follow-up if you'd rather not bundle it here.
  • The memory/SPARQL trade-off note in the PR body is fair — agree the `SearchableAsset` redesign is the right scope for any rework, not this PR.

Merge whenever CI lands clean post-retarget.

@mvkonchits-db mvkonchits-db merged commit 981634f into fix-concept-rdf-types Jun 22, 2026
1 check passed
@mvkonchits-db mvkonchits-db deleted the feat-business-terms-in-search branch June 22, 2026 11:19