feat(ui): KG-scoped data source onboarding (k-extract flow)#737

Open: aredenba-rh wants to merge 84 commits into main from feature/manage-knowledge-graph

Conversation

@aredenba-rh
Collaborator

Summary

  • Adds full-page data source onboarding at /knowledge-graphs/{kgId}/data-sources/new (URLs → configure → sequential initial sync → summary), modeled after k-extract designer/new.
  • Adds ongoing operations page at /knowledge-graphs/{kgId}/data-sources (phase1 equivalent) for sync, commits, diff, and maintenance focus.
  • KG manage workspace routes Data Sources to onboarding when dataSourceCount === 0, otherwise to the operations page.
  • Post–KG-create toast navigates to the new onboarding route.

Closes #736

Test plan

  • Create a KG → Manage → Data Sources → lands on /data-sources/new
  • Add GitHub URL(s), configure branch/token, connect → run Start initial sync → see progress and summary
  • Open data sources → operations page with cards, sync history, commit refs
  • Return to manage → Data Sources again → operations page (not wizard)
  • Maintain step → ?focus=maintain filters to maintenance-ready sources
  • Global sidebar /data-sources unchanged

Made with Cursor

aredenba-rh and others added 30 commits May 26, 2026 12:58
* chore(skills): add subagent delivery execution protocol

Add a reusable subagent skill that standardizes issue-based branching,
TDD execution, PR structure, and merge/conflict handling into
feature/manage-knowledge-graph.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(management): add knowledge graph workspace mode lifecycle

Implement schema_bootstrap as the default workspace mode and persist
irreversible transition state to extraction_operations across domain,
repository, API responses, and migration coverage.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
…681)

Add a workspace-status API projection with mode, readiness flags,
transition eligibility, and session pointers, including service and
route authorization coverage for manage workspace rendering.

Co-authored-by: Cursor <cursoragent@cursor.com>
…#682)

Enforce workspace readiness checks for minimum entity/relationship type
coverage and prepopulated type instance presence, and project blocking
reasons so validate/transition workflows can render actionable feedback.

Co-authored-by: Cursor <cursoragent@cursor.com>
Expose authorized validate and transition commands for knowledge graph
workspaces, persist session pointers, and create an extraction-mode
session identifier when moving from bootstrap to extraction operations.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add durable run-level mutation metadata storage and lifecycle persistence
for session/scope identity, timestamps, token-cost totals, and
operation-count summaries linked to each sync run.

Co-authored-by: Cursor <cursoragent@cursor.com>
Emit operation-class counts and token/cost totals from mutation-log
application results into MutationsApplied payloads so downstream sync
lifecycle persistence can finalize run-level metadata.

Co-authored-by: Cursor <cursoragent@cursor.com>
#686)

Scaffold extraction application/presentation package structure and add
pytest-archon rules enforcing DDD layer boundaries plus cross-context
isolation so subsequent extraction features stay architecturally clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
Implement per-user/per-knowledge-graph/per-mode extraction session
lifecycle behaviors with clear-chat reset semantics and archived-session
retention backed by repository ports and unit coverage.

Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve mode-specific extraction skill templates from global defaults and
apply deterministic knowledge-graph override merges so session prompts are
stable, customizable, and repeatable.

Co-authored-by: Cursor <cursoragent@cursor.com>
)

Persist extraction agent sessions and expose scoped APIs for active/list/clear-chat so reset creates a fresh session while preserving archived history and runtime context audit records.

Co-authored-by: Cursor <cursoragent@cursor.com>
Persist clone-head, last-extraction baseline, and tracked-branch head
commit references for data sources and expose them in management API
responses for downstream ingestion and UI commit-status workflows.

Co-authored-by: Cursor <cursoragent@cursor.com>
Prepare Git-backed ingestion context by loading data-source commit references,
refreshing tracked branch head, and passing baseline commit plus resolved
credentials into the ingestion pipeline before packaging begins.

Co-authored-by: Cursor <cursoragent@cursor.com>
# Conflicts:
#	src/api/ingestion/application/services/ingestion_service.py
#	src/api/ingestion/infrastructure/event_handler.py
#	src/api/ingestion/ports/services.py
#	src/api/tests/unit/ingestion/infrastructure/test_ingestion_event_handler.py
Skip heavy extraction when tracked branch head equals the last extraction
baseline by emitting a completed lifecycle event and recording an explicit
no-change audit log entry on the sync run.

Co-authored-by: Cursor <cursoragent@cursor.com>
Expose a data-source diff summary API that compares the last extraction
baseline to tracked branch head and returns aggregate counts plus a
large-list-safe changed-file preview for maintenance decisions.

Co-authored-by: Cursor <cursoragent@cursor.com>
Show commit-based diff counts immediately on each data source card and
render the changed-file list as collapsed-by-default with explicit
expand/collapse controls for large-diff safe browsing.

Co-authored-by: Cursor <cursoragent@cursor.com>
…695)

Add explicit data-source actions to refresh tracked/clone commit references and adopt tracked head as the current extraction baseline. This lets the UI surface per-source changed-file counts with user-controlled commit context updates for maintenance decisioning.

Co-authored-by: Cursor <cursoragent@cursor.com>
Strengthen subagent delivery guidance with a parallel execution model, required context packs, and a blocker-question escalation flow so multiple agents can pause and ask focused questions without serializing delivery.

Co-authored-by: Cursor <cursoragent@cursor.com>
…678) (#697)

Add structured mode-specific agent configuration (system prompt, hierarchy, guardrails, and skill pack defaults) and wire session initialization to resolve and persist the configuration per knowledge graph scope.

Co-authored-by: Cursor <cursoragent@cursor.com>
) (#698)

Seed schema bootstrap sessions with a capabilities-intake prompt that offers first-pass or guided co-design paths, and persist the selected path/capability summary in session runtime context so the conversation remains continuous across requests.

Co-authored-by: Cursor <cursoragent@cursor.com>
…679) (#699)

Build a filesystem runtime context for extraction workloads by materializing ingestion package resources, reconstructing repository files, and exposing a deterministic skills directory path; wire it through extraction event handling and local/deployed container configuration.

Co-authored-by: Cursor <cursoragent@cursor.com>
#700)

Enhance schema browser rows to display prepopulated type indicators and live per-type instance counts with lazy query-backed loading, while extending shared type contracts and tests to cover the new inspector metadata behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
…671) (#701)

Add manage-authorized run-control operations (start, pause, halt, reset_running, reset_failed, reset_completed, reset_all) over data source sync runs, expose them via dedicated management routes, and verify behavior with unit tests for both service transitions and HTTP contract responses.

Co-authored-by: Cursor <cursoragent@cursor.com>
Expose sync-run token/cost metadata in management API responses and add an extraction telemetry dashboard in the data-sources workspace with active worker counts, status buckets, recent job events, and 24h cost trend indicators backed by auto-refreshing sync data.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add knowledge-graph scoped maintenance schedule APIs with timezone-aware cron evaluation and persisted run outcomes, then expose the controls and history in the data-sources operations UI.

Co-authored-by: Cursor <cursoragent@cursor.com>
…704)

Extend the mutations console with a conversation-assisted draft flow and live entity/relationship inspector that highlights edited fields during the active session and resets highlights after apply/refresh.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace legacy row actions with Manage, Query, and Delete, remove inline edit controls from the list surface, and align structural tests to the new action contract.

Co-authored-by: Cursor <cursoragent@cursor.com>
)

Implement a dedicated manage workspace route that loads workspace status projection, shows readiness and session pointers, and provides Validate and transition-to-extraction controls.

Co-authored-by: Cursor <cursoragent@cursor.com>
Extend the manage workspace page with an always-visible extraction conversation panel, clear-chat reset action, and a tabbed lower operations area for extraction jobs, manual mutations, and run/log navigation.

Co-authored-by: Cursor <cursoragent@cursor.com>
Rework the manage overview into a phased workspace hub and add unpulled-commit
tracking on data sources so ingestion status matches a git-pull mental model.

Co-authored-by: Cursor <cursoragent@cursor.com>
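
Several of the commits above describe a git-pull style comparison between a data source's tracked branch head and its last extraction baseline: skip heavy extraction when the two match, and flag unpulled commits when they differ. A minimal sketch of that comparison, assuming a hypothetical `CommitRefs` holder and helper names that are illustrative only (the real logic lives in the PR's own modules and may differ):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommitRefs:
    """Hypothetical holder for the three commit references named in the commits."""

    clone_head: Optional[str]
    last_extraction_baseline: Optional[str]
    tracked_branch_head: Optional[str]


def has_unpulled_commits(refs: CommitRefs) -> bool:
    """True when the tracked head is known and differs from the baseline."""
    if refs.tracked_branch_head is None:
        return False
    return refs.tracked_branch_head != refs.last_extraction_baseline


def should_skip_extraction(refs: CommitRefs) -> bool:
    """True when the tracked head is known and equals the baseline."""
    return (
        refs.tracked_branch_head is not None
        and refs.tracked_branch_head == refs.last_extraction_baseline
    )
```

In this sketch an unknown tracked head never triggers a skip, so a source that has not been refreshed yet still goes through the normal path.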
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/api/management/domain/aggregates/data_source.py (1)

311-346: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constrain pipeline_mode at the domain boundary.

Ingestion logic branches on exact string equality (pipeline_mode == "ingest_only"), but both DataSource.request_sync(..., pipeline_mode: str = "full") and SyncStarted.pipeline_mode: str = "full" still accept unconstrained str, so invalid values can silently fall back to the "full" path. HTTP is already constrained (TriggerSyncRequest.mode: Literal["full","ingest_only"]), but non-HTTP/internal callers (or crafted outbox payloads) bypass that.

Proposed direction
-from typing import TYPE_CHECKING
+from typing import TYPE_CHECKING, Literal
@@
-        pipeline_mode: str = "full",
+        pipeline_mode: Literal["full", "ingest_only"] = "full",

Tighten pipeline_mode consistently at the aggregate/event (and ideally DataSourceService.trigger_sync, which is also pipeline_mode: str) to the existing SyncPipelineMode = Literal["full", "ingest_only"].
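
A type annotation alone does not stop a crafted outbox payload at runtime, so the annotation is usually paired with an explicit check. The following is a minimal sketch of that pairing, assuming a hypothetical `validate_pipeline_mode` helper and a simplified stand-in event class; only `SyncPipelineMode` and its two literals come from the review above.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Mirrors the alias named in the review; the helper below is hypothetical.
SyncPipelineMode = Literal["full", "ingest_only"]
_ALLOWED_MODES = frozenset(get_args(SyncPipelineMode))


def validate_pipeline_mode(value: str) -> str:
    """Return value unchanged if it is an allowed mode, else raise ValueError."""
    if value not in _ALLOWED_MODES:
        raise ValueError(
            f"pipeline_mode must be one of {sorted(_ALLOWED_MODES)}, got {value!r}"
        )
    return value


@dataclass(frozen=True)
class SyncStartedSketch:
    """Simplified stand-in for the SyncStarted event (fields are assumptions)."""

    data_source_id: str
    pipeline_mode: SyncPipelineMode = "full"

    def __post_init__(self) -> None:
        # Reject invalid modes at construction, including deserialized payloads.
        validate_pipeline_mode(self.pipeline_mode)
```

With a check like this, an unexpected value fails loudly when the event is built instead of silently taking the "full" branch later.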

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/api/management/domain/aggregates/data_source.py` around lines 311 - 346,
The pipeline_mode parameter is currently an unconstrained str in
DataSource.request_sync and carried onto the SyncStarted event, allowing invalid
values to slip through; change the type annotations for DataSource.request_sync,
SyncStarted.pipeline_mode, and DataSourceService.trigger_sync to use a shared
SyncPipelineMode = Literal["full","ingest_only"] type and add a runtime check in
DataSource.request_sync (and/or in SyncStarted construction) to validate the
passed pipeline_mode, raising a ValueError or similar if it's not one of the
allowed literals so callers and outbox payloads cannot silently default to the
"full" branch.
src/dev-ui/app/pages/knowledge-graphs/[kgId]/data-sources/index.vue (1)

355-371: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Refresh the full KG here and inspect the settled results before showing success.

Under ?focus=maintain, this only refreshes visibleDataSources, so hidden sources never get their branch refs updated and newly stale repos can stay hidden. On top of that, Promise.allSettled() never throws, so this code can still toast “Up to date” when every refresh request failed. Refresh dataSources.value, check how many calls actually succeeded, and use that same source list for the disabled state at Line 639.

Proposed fix
 async function checkAllCommitRefs() {
-  if (visibleDataSources.value.length === 0) return
+  if (dataSources.value.length === 0) return
   checkingAllCommits.value = true
   try {
-    await Promise.allSettled(
-      visibleDataSources.value.map((ds) =>
+    const results = await Promise.allSettled(
+      dataSources.value.map((ds) =>
         apiFetch(`/management/data-sources/${ds.id}/commit-refs/refresh`, { method: 'POST' }),
       ),
     )
+    const refreshedCount = results.filter((result) => result.status === 'fulfilled').length
+    if (refreshedCount === 0) {
+      toast.error('Failed to check for new commits')
+      return
+    }
+
     await loadDataSources()
     const unpulled = visibleDataSources.value.filter((ds) => hasUnpulledCommits(ds))
     if (unpulled.length === 0) {
       toast.success('Up to date with remote branches')
     } else {
@@
-                  :disabled="checkingAllCommits || visibleDataSources.length === 0"
+                  :disabled="checkingAllCommits || dataSources.length === 0"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/dev-ui/app/pages/knowledge-graphs/`[kgId]/data-sources/index.vue around
lines 355 - 371, Replace the refresh of only visibleDataSources with a refresh
over dataSources.value, await Promise.allSettled on apiFetch calls for each ds
in dataSources.value, then inspect the settled results to build a set of
successfully refreshed source IDs (check PromiseResult.status === 'fulfilled'),
call loadDataSources() to reload the full list, compute unpulled by filtering
the reloaded dataSources.value by (1) membership in the succeeded-ID set and (2)
hasUnpulledCommits(ds), and finally show toasts based on the actual succeeded
set (if none succeeded show an error toast, otherwise show “Up to date” only
when succeeded sources have no unpulled commits). Also ensure the same
succeeded-ID set is used for any disabled-state logic that referenced the
previous visibleDataSources.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/dev-ui/app/pages/knowledge-graphs/`[kgId]/manage.vue:
- Around line 451-484: The loop in loadOverviewMetrics iterates dataSources
serially and should be parallelized: replace the for-await loop over dataSources
with a single Promise.allSettled over dataSources.map(ds =>
apiFetch(`/management/data-sources/${ds.id}/sync-runs`) returning the runs
alongside the original ds), then iterate the settled results to compute each row
using the same logic (call latestSyncRun on successful results,
resolvePrepStatusLabel, mark statusVariant 'success' for 'ingested'/'completed',
treat rejected promises or missing runs with the default 'not prepared'), update
prepared using isIngestionPreparedAtHead(ds) (still checked per ds), and push
WorkspaceHubSourceRow objects into rows; ensure errors from rejected promises do
not throw and that behavior for default statuses remains identical to existing
status/statusVariant initialization.

---

Outside diff comments:
In `@src/api/management/domain/aggregates/data_source.py`:
- Around line 311-346: The pipeline_mode parameter is currently an unconstrained
str in DataSource.request_sync and carried onto the SyncStarted event, allowing
invalid values to slip through; change the type annotations for
DataSource.request_sync, SyncStarted.pipeline_mode, and
DataSourceService.trigger_sync to use a shared SyncPipelineMode =
Literal["full","ingest_only"] type and add a runtime check in
DataSource.request_sync (and/or in SyncStarted construction) to validate the
passed pipeline_mode, raising a ValueError or similar if it's not one of the
allowed literals so callers and outbox payloads cannot silently default to the
"full" branch.

In `@src/dev-ui/app/pages/knowledge-graphs/`[kgId]/data-sources/index.vue:
- Around line 355-371: Replace the refresh of only visibleDataSources with a
refresh over dataSources.value, await Promise.allSettled on apiFetch calls for
each ds in dataSources.value, then inspect the settled results to build a set of
successfully refreshed source IDs (check PromiseResult.status === 'fulfilled'),
call loadDataSources() to reload the full list, compute unpulled by filtering
the reloaded dataSources.value by (1) membership in the succeeded-ID set and (2)
hasUnpulledCommits(ds), and finally show toasts based on the actual succeeded
set (if none succeeded show an error toast, otherwise show “Up to date” only
when succeeded sources have no unpulled commits). Also ensure the same
succeeded-ID set is used for any disabled-state logic that referenced the
previous visibleDataSources.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 1328d988-457b-4301-b5c6-98291775ffd6

📥 Commits

Reviewing files that changed from the base of the PR and between e0001b8 and 3012df5.

📒 Files selected for processing (15)
  • src/api/main.py
  • src/api/management/domain/aggregates/data_source.py
  • src/api/management/domain/commit_pull_state.py
  • src/api/management/presentation/data_sources/models.py
  • src/api/management/presentation/data_sources/routes.py
  • src/api/tests/unit/management/domain/test_commit_pull_state.py
  • src/api/tests/unit/management/infrastructure/test_sync_lifecycle_handler.py
  • src/api/tests/unit/management/test_data_source.py
  • src/dev-ui/app/pages/knowledge-graphs/[kgId]/data-sources/index.vue
  • src/dev-ui/app/pages/knowledge-graphs/[kgId]/manage.vue
  • src/dev-ui/app/tests/kg-data-sources-phase1.test.ts
  • src/dev-ui/app/tests/kg-manage-workspace-hub.test.ts
  • src/dev-ui/app/tests/knowledge-graph-manage-workspace.test.ts
  • src/dev-ui/app/utils/kgDataSourcesCommits.ts
  • src/dev-ui/app/utils/kgManageWorkspaceHub.ts

Comment thread src/dev-ui/app/pages/knowledge-graphs/[kgId]/manage.vue
aredenba-rh and others added 3 commits May 28, 2026 22:37
* feat(ui): align graph management step with k-extract phase2 layout

Rework the design chat, schema/session panels, and mode switcher with locked
extraction modes until the workspace transitions to extraction operations.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ui): rename graph management chat title to Graph Management Assistant

Co-authored-by: Cursor <cursoragent@cursor.com>

* docs(extraction): specify sticky session chat turns and runtime

Document Graph Management chat as NDJSON streaming turns inside sticky
Claude Agent SDK containers with JobPackage gating and UI mode skills.

Closes #738

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
…#744)

Introduce sticky-session-aware chat orchestration with JobPackage gating,
UI-mode skill overlays, and a tracer-bullet deterministic agent. Closes #739.
Closes #740.

Co-authored-by: Cursor <cursoragent@cursor.com>
Stream NDJSON chat turns with thinking/wait activity lines and reload session
history after each turn. Closes #741.

Co-authored-by: Cursor <cursoragent@cursor.com>
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/api/extraction/application/chat_turn_service.py`:
- Line 136: The code unconditionally sets
session.runtime_context["job_package"]["phase"] =
SessionJobPackagePhase.READY.value which can clobber intentional non-wait modes;
change this to only set the phase when a JobPackage actually exists and when its
current phase indicates it should transition to READY (e.g., phase is
missing/falsy or equals an expected prior state) — check
session.runtime_context.get("job_package") and the existing
job_package.get("phase") before assigning, and skip assignment when the current
phase represents a non-wait or NOT_REQUIRED mode so downstream
behavior/telemetry is preserved.

In `@src/api/extraction/dependencies.py`:
- Around line 62-77: The block in get_extraction_chat_turn_service duplicates
construction of ExtractionAgentSessionService (using
ExtractionAgentSessionRepository,
ExtractionSkillResolutionService/ExtractionSkillOverrideRepository, and
ExtractionSessionRunMetricsReader) already built earlier; refactor by creating
the ExtractionAgentSessionService once and reusing that instance instead of
reconstructing it (e.g., extract the shared construction into a single local
variable or helper and reference that variable for session_service), keeping the
same dependencies: ExtractionAgentSessionService,
ExtractionAgentSessionRepository(session),
ExtractionSkillResolutionService(override_repository=ExtractionSkillOverrideRepository()),
ExtractionSessionRunMetricsReader(session) and the sticky_runtime_manager
parameter.

In `@src/api/extraction/presentation/routes.py`:
- Around line 182-192: Wrap the async iteration in event_stream() in a
try/except so any exception from service.stream_chat_turn(...) is caught; on
exception yield a final NDJSON terminal event (e.g.
json.dumps({"type":"done","error": str(e)}) + "\n") before returning to ensure
the client receives a terminal "done" event, and optionally log the error with
the same context; keep existing yields for normal events and still return
StreamingResponse(event_stream(), media_type="application/x-ndjson").

In `@src/api/tests/unit/extraction/application/test_chat_turn_service.py`:
- Line 11: Remove the unused import ExtractionSkillResolutionService from the
test file; locate the import statement "from
extraction.application.skill_resolution_service import
ExtractionSkillResolutionService" in test_chat_turn_service.py and delete it (or
replace it with any actually used symbol from that module if needed) so the
unused-import Ruff (F401) error is resolved.

In `@src/dev-ui/app/components/extraction/SharedConversationPanel.vue`:
- Around line 124-145: The renderAssistantHtml function currently injects
untrusted URLs directly into href via the link-replacement regex, enabling
javascript: or quote-based attribute XSS when the output is used with v-html;
fix by validating and sanitizing the captured URL in the link replacement step
inside renderAssistantHtml: only allow safe schemes (e.g., http, https, mailto,
and relative paths), reject or replace unsafe ones with a safe placeholder
(e.g., '#'), HTML-escape or URL-encode the href value, and add safe attributes
like rel="noopener noreferrer" and target="_blank" as appropriate; update the
/\[([^\]]+)\]\(([^)]+)\)/g replacement to call a sanitizer helper (e.g.,
sanitizeUrl or isSafeScheme) before returning the anchor HTML so v-html never
receives an unsafe href.
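
The first finding in the list above (the unconditional READY assignment in chat_turn_service.py) comes down to a guarded state transition. A minimal sketch follows, assuming hypothetical enum values and a hypothetical `mark_job_package_ready` helper; only the `READY` and `NOT_REQUIRED` member names and the `runtime_context["job_package"]["phase"]` path come from the review.

```python
from enum import Enum
from typing import Any


class SessionJobPackagePhase(Enum):
    # Member values and the WAITING member are assumptions for illustration.
    WAITING = "waiting"
    READY = "ready"
    NOT_REQUIRED = "not_required"


# Phases that may be promoted to READY: missing, empty, or still waiting.
_PROMOTABLE_PHASES = (None, "", SessionJobPackagePhase.WAITING.value)


def mark_job_package_ready(runtime_context: dict[str, Any]) -> bool:
    """Set phase to READY only when a job package exists and is promotable.

    Returns True when the phase was changed, False when it was left alone.
    """
    job_package = runtime_context.get("job_package")
    if not isinstance(job_package, dict):
        return False  # No job package recorded: nothing to promote.
    if job_package.get("phase") not in _PROMOTABLE_PHASES:
        return False  # Preserve NOT_REQUIRED and any other deliberate phase.
    job_package["phase"] = SessionJobPackagePhase.READY.value
    return True
```

Returning a boolean keeps the caller able to log or emit telemetry only when a transition actually happened.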
ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 0010ff04-feed-412b-a464-a0c859169a97

📥 Commits

Reviewing files that changed from the base of the PR and between 3012df5 and 8dfb95c.

📒 Files selected for processing (30)
  • specs/extraction/agent-sessions.spec.md
  • specs/extraction/chat-turns.spec.md
  • specs/extraction/operations.spec.md
  • specs/extraction/sticky-session-runtime.spec.md
  • specs/index.spec.md
  • specs/nfr/workload-execution.spec.md
  • src/api/extraction/application/__init__.py
  • src/api/extraction/application/agent_session_service.py
  • src/api/extraction/application/chat_turn_service.py
  • src/api/extraction/application/job_package_gate.py
  • src/api/extraction/application/skill_resolution_service.py
  • src/api/extraction/dependencies.py
  • src/api/extraction/domain/value_objects.py
  • src/api/extraction/infrastructure/deterministic_chat_agent.py
  • src/api/extraction/infrastructure/ingestion_readiness_reader.py
  • src/api/extraction/ports/chat_agent.py
  • src/api/extraction/ports/ingestion_readiness.py
  • src/api/extraction/presentation/models.py
  • src/api/extraction/presentation/routes.py
  • src/api/tests/unit/extraction/application/test_chat_turn_service.py
  • src/api/tests/unit/extraction/application/test_job_package_gate.py
  • src/dev-ui/app/components/extraction/SharedConversationPanel.vue
  • src/dev-ui/app/pages/knowledge-graphs/[kgId]/manage.vue
  • src/dev-ui/app/tests/kg-extraction-chat.test.ts
  • src/dev-ui/app/tests/kg-graph-management-artifacts.test.ts
  • src/dev-ui/app/tests/kg-graph-management-modes.test.ts
  • src/dev-ui/app/tests/knowledge-graph-manage-workspace.test.ts
  • src/dev-ui/app/utils/kgExtractionChat.ts
  • src/dev-ui/app/utils/kgGraphManagement.ts
  • src/dev-ui/app/utils/kgGraphManagementArtifacts.ts

Comment thread src/api/extraction/application/chat_turn_service.py Outdated
Comment thread src/api/extraction/dependencies.py
Comment on lines +182 to +192
async def event_stream():
    async for event in service.stream_chat_turn(
        user_id=current_user.user_id.value,
        knowledge_graph_id=knowledge_graph_id,
        mode=mode,
        ui_mode=request.graph_management_ui_mode,
        message=request.message,
    ):
        yield json.dumps(event) + "\n"

return StreamingResponse(event_stream(), media_type="application/x-ndjson")
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wrap stream generation failures and emit a terminal error event.

event_stream() has no exception handling. If service.stream_chat_turn(...) raises, clients receive a truncated NDJSON stream with no terminal event, which is brittle for UI state handling. Add a guarded error done event before ending the stream.

Suggested fix
 async def event_stream():
-    async for event in service.stream_chat_turn(
-        user_id=current_user.user_id.value,
-        knowledge_graph_id=knowledge_graph_id,
-        mode=mode,
-        ui_mode=request.graph_management_ui_mode,
-        message=request.message,
-    ):
-        yield json.dumps(event) + "\n"
+    try:
+        async for event in service.stream_chat_turn(
+            user_id=current_user.user_id.value,
+            knowledge_graph_id=knowledge_graph_id,
+            mode=mode,
+            ui_mode=request.graph_management_ui_mode,
+            message=request.message,
+        ):
+            yield json.dumps(event) + "\n"
+    except Exception:
+        yield json.dumps(
+            {
+                "type": "done",
+                "ok": False,
+                "error": {"code": "STREAM_FAILED", "message": "Chat stream failed."},
+            }
+        ) + "\n"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def event_stream():
    async for event in service.stream_chat_turn(
        user_id=current_user.user_id.value,
        knowledge_graph_id=knowledge_graph_id,
        mode=mode,
        ui_mode=request.graph_management_ui_mode,
        message=request.message,
    ):
        yield json.dumps(event) + "\n"
return StreamingResponse(event_stream(), media_type="application/x-ndjson")

async def event_stream():
    try:
        async for event in service.stream_chat_turn(
            user_id=current_user.user_id.value,
            knowledge_graph_id=knowledge_graph_id,
            mode=mode,
            ui_mode=request.graph_management_ui_mode,
            message=request.message,
        ):
            yield json.dumps(event) + "\n"
    except Exception:
        yield json.dumps(
            {
                "type": "done",
                "ok": False,
                "error": {"code": "STREAM_FAILED", "message": "Chat stream failed."},
            }
        ) + "\n"
return StreamingResponse(event_stream(), media_type="application/x-ndjson")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/api/extraction/presentation/routes.py` around lines 182 - 192, Wrap the
async iteration in event_stream() in a try/except so any exception from
service.stream_chat_turn(...) is caught; on exception yield a final NDJSON
terminal event (e.g. json.dumps({"type":"done","error": str(e)}) + "\n") before
returning to ensure the client receives a terminal "done" event, and optionally
log the error with the same context; keep existing yields for normal events and
still return StreamingResponse(event_stream(),
media_type="application/x-ndjson").
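
The guarded stream discussed in this thread can be exercised without FastAPI by feeding it a generator that fails midway. The sketch below assumes hypothetical helper names (`failing_turn`, `guarded_ndjson`, `collect`) and an illustrative `thinking` event shape; the terminal `done` payload matches the suggested fix above.

```python
import asyncio
import json
from typing import Any, AsyncIterator


async def failing_turn() -> AsyncIterator[dict[str, Any]]:
    """Stand-in for service.stream_chat_turn that fails after one event."""
    yield {"type": "thinking", "text": "working"}
    raise RuntimeError("agent crashed")


async def guarded_ndjson(events: AsyncIterator[dict[str, Any]]) -> AsyncIterator[str]:
    """Serialize events as NDJSON lines and always end a failed stream with 'done'."""
    try:
        async for event in events:
            yield json.dumps(event) + "\n"
    except Exception:
        # Emit a terminal event so clients never see a silently truncated stream.
        yield json.dumps(
            {
                "type": "done",
                "ok": False,
                "error": {"code": "STREAM_FAILED", "message": "Chat stream failed."},
            }
        ) + "\n"


async def collect(stream: AsyncIterator[str]) -> list[dict[str, Any]]:
    """Parse every NDJSON line produced by the stream."""
    return [json.loads(line) async for line in stream]


if __name__ == "__main__":
    for parsed in asyncio.run(collect(guarded_ndjson(failing_turn()))):
        print(parsed)
```

The fixed error message avoids echoing internal exception text to the client, which is one difference between the suggested fix and the `str(e)` variant mentioned in the agent prompt.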

Comment thread src/api/tests/unit/extraction/application/test_chat_turn_service.py Outdated
Comment on lines +124 to +145
function renderAssistantHtml(text: string): string {
  let s = text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
  s = s.replace(/\*\*([^*]+)\*\*/g, '<strong class="font-semibold text-foreground">$1</strong>')
  s = s.replace(
    /`([^`]+)`/g,
    '<code class="rounded bg-muted px-1 py-0.5 text-xs font-mono text-foreground">$1</code>',
  )
  s = s.replace(
    /^> (.+)$/gm,
    '<p class="my-2 border-l-2 border-amber-500/60 pl-3 text-sm text-muted-foreground italic">$1</p>',
  )
  s = s.replace(
    /\[([^\]]+)\]\(([^)]+)\)/g,
    '<a class="text-primary font-medium underline underline-offset-2 hover:text-primary/90" href="$2">$1</a>',
  )
  s = s.replace(/## (.+)$/gm, '<h3 class="text-base font-semibold mt-3 mb-1 text-foreground">$1</h3>')
  s = s.replace(/### (.+)$/gm, '<h4 class="text-sm font-semibold mt-2 text-foreground">$1</h4>')
  s = s.replace(/^---$/gm, '<hr class="my-3 border-border" />')
  s = s.replace(/\n\n+/g, '<br /><br />')
  s = s.replace(/\n/g, '<br />')
  return s
}
Contributor

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Sanitize markdown links before v-html rendering (XSS risk)

Line 137 interpolates untrusted URL text directly into href, and Line 293 renders the result with v-html. This allows unsafe schemes (for example javascript:) and quote-based attribute injection.

Proposed fix
+function sanitizeHref(raw: string): string | null {
+  const value = raw.trim()
+  if (value.startsWith('/')) return value.replace(/"/g, '%22')
+  try {
+    const parsed = new URL(value)
+    if (!['http:', 'https:', 'mailto:'].includes(parsed.protocol)) return null
+    return parsed.href.replace(/"/g, '%22')
+  } catch {
+    return null
+  }
+}
+
 function renderAssistantHtml(text: string): string {
-  let s = text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
+  let s = text
+    .replace(/&/g, '&amp;')
+    .replace(/</g, '&lt;')
+    .replace(/>/g, '&gt;')
+    .replace(/"/g, '&quot;')
+    .replace(/'/g, '&#39;')
@@
-  s = s.replace(
-    /\[([^\]]+)\]\(([^)]+)\)/g,
-    '<a class="text-primary font-medium underline underline-offset-2 hover:text-primary/90" href="$2">$1</a>',
-  )
+  s = s.replace(/\[([^\]]+)\]\(([^)]+)\)/g, (_, label: string, rawHref: string) => {
+    const href = sanitizeHref(rawHref)
+    if (!href) return label
+    return `<a class="text-primary font-medium underline underline-offset-2 hover:text-primary/90" href="${href}" target="_blank" rel="noopener noreferrer">${label}</a>`
+  })

As per coding guidelines **: Focus on major issues impacting performance, readability, maintainability and security.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function sanitizeHref(raw: string): string | null {
  const value = raw.trim()
  if (value.startsWith('/')) return value.replace(/"/g, '%22')
  try {
    const parsed = new URL(value)
    if (!['http:', 'https:', 'mailto:'].includes(parsed.protocol)) return null
    return parsed.href.replace(/"/g, '%22')
  } catch {
    return null
  }
}

function renderAssistantHtml(text: string): string {
  let s = text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
  s = s.replace(/\*\*([^*]+)\*\*/g, '<strong class="font-semibold text-foreground">$1</strong>')
  s = s.replace(
    /`([^`]+)`/g,
    '<code class="rounded bg-muted px-1 py-0.5 text-xs font-mono text-foreground">$1</code>',
  )
  s = s.replace(
    /^> (.+)$/gm,
    '<p class="my-2 border-l-2 border-amber-500/60 pl-3 text-sm text-muted-foreground italic">$1</p>',
  )
  s = s.replace(/\[([^\]]+)\]\(([^)]+)\)/g, (_, label: string, rawHref: string) => {
    const href = sanitizeHref(rawHref)
    if (!href) return label
    return `<a class="text-primary font-medium underline underline-offset-2 hover:text-primary/90" href="${href}" target="_blank" rel="noopener noreferrer">${label}</a>`
  })
  s = s.replace(/## (.+)$/gm, '<h3 class="text-base font-semibold mt-3 mb-1 text-foreground">$1</h3>')
  s = s.replace(/### (.+)$/gm, '<h4 class="text-sm font-semibold mt-2 text-foreground">$1</h4>')
  s = s.replace(/^---$/gm, '<hr class="my-3 border-border" />')
  s = s.replace(/\n\n+/g, '<br /><br />')
  s = s.replace(/\n/g, '<br />')
  return s
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/dev-ui/app/components/extraction/SharedConversationPanel.vue` around
lines 124 - 145, The renderAssistantHtml function currently injects untrusted
URLs directly into href via the link-replacement regex, enabling javascript: or
quote-based attribute XSS when the output is used with v-html; fix by validating
and sanitizing the captured URL in the link replacement step inside
renderAssistantHtml: only allow safe schemes (e.g., http, https, mailto, and
relative paths), reject or replace unsafe ones with a safe placeholder (e.g.,
'#'), HTML-escape or URL-encode the href value, and add safe attributes like
rel="noopener noreferrer" and target="_blank" as appropriate; update the
/\[([^\]]+)\]\(([^)]+)\)/g replacement to call a sanitizer helper (e.g.,
sanitizeUrl or isSafeScheme) before returning the anchor HTML so v-html never
receives an unsafe href.
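
The scheme allow-list described in the prompt above could be exercised in isolation roughly as follows. This is a minimal sketch, not the code in SharedConversationPanel.vue: `SAFE_PROTOCOLS`, `escapeHtml`, and `renderLink` are hypothetical names introduced for illustration, and the sketch assumes it receives raw (not yet HTML-escaped) link text. It is also slightly stricter than the proposed fix in that it rejects protocol-relative `//host` values.

```typescript
// Illustrative sketch only; helper names are assumptions, not part of the PR.
const SAFE_PROTOCOLS = new Set(['http:', 'https:', 'mailto:'])

// Escape characters that are significant in HTML text and attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
}

// Return a usable href, or null when the scheme is not on the allow-list.
function sanitizeHref(raw: string): string | null {
  const value = raw.trim()
  // Accept same-origin paths such as "/data-sources", but not "//host" or "/\host".
  if (/^\/(?![\/\\])/.test(value)) return value
  try {
    const parsed = new URL(value) // throws for strings without a scheme
    return SAFE_PROTOCOLS.has(parsed.protocol) ? parsed.href : null
  } catch {
    return null
  }
}

// Build the anchor markup, falling back to plain escaped text for unsafe links.
function renderLink(label: string, rawHref: string): string {
  const safeLabel = escapeHtml(label)
  const href = sanitizeHref(rawHref)
  if (href === null) return safeLabel
  return `<a href="${escapeHtml(href)}" target="_blank" rel="noopener noreferrer">${safeLabel}</a>`
}

console.log(renderLink('docs', 'https://example.com/guide'))
console.log(renderLink('click me', 'javascript:alert(1)'))
```

Returning the escaped label for a rejected URL keeps the visible text while dropping the link, which matches the fallback in the proposed fix; substituting a `#` placeholder, as the prompt also suggests, would be an equally reasonable choice.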

…746)

Ship kartograph-agent-runtime container image with NDJSON turn API, mount
skills and JobPackage workspaces, inject chat-scoped workload tokens, and
delegate graph-management chat turns to the remote runtime when container
backend is enabled. Closes #742.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread src/agent-runtime/kartograph_agent_runtime/server.py Fixed
aredenba-rh and others added 19 commits May 29, 2026 01:47
Align sticky Claude Agent SDK containers with k-extract Vertex auth and warm
the graph-management assistant on UI entry with streamed readiness progress.

Co-authored-by: Cursor <cursoragent@cursor.com>
Prevent JIT provisioning conflicts when Keycloak re-imports the realm and
Postgres still holds rows keyed by the previous SSO subject.

Co-authored-by: Cursor <cursoragent@cursor.com>
Mount gcloud credentials at /gcloud/config and run sticky containers as the
host UID so Claude Agent SDK can reach Vertex AI, while keeping the API root
for Docker-out-of-Docker in dev.

Co-authored-by: Cursor <cursoragent@cursor.com>
Drop the redundant branch tip column from the KG data sources table.

Co-authored-by: Cursor <cursoragent@cursor.com>
Load prepared archives even in schema-design mode, refresh the workspace
on chat reuse, point Claude SDK at /workspace, and remove sibling sticky
and worker containers during make down.

Co-authored-by: Cursor <cursoragent@cursor.com>
Incremental prepares were overwriting last_prepared_file_count with the
number of changed files, so the data sources table showed the wrong
"Files on branch" value after subsequent prepares.

Co-authored-by: Cursor <cursoragent@cursor.com>
Background refreshes no longer toggle the page-level loading gate, so
prepare polling updates status in place with a subtle updating indicator.

Co-authored-by: Cursor <cursoragent@cursor.com>
Graph Management and other manage steps no longer stretch edge-to-edge
on wide screens, matching the data sources workspace layout.

Co-authored-by: Cursor <cursoragent@cursor.com>
Expose job_package_available on data source listings, rematerialize when
the ZIP is gone, and skip ingest-only no-changes short-circuit without it.

Co-authored-by: Cursor <cursoragent@cursor.com>
Skip workspace rematerialization when the container is healthy and JobPackage
IDs match, report 503 until the agent workspace is ready, and only save user
messages after the assistant turn completes or fails.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ct layout

Split the combined schema nav/detail card into a sticky left navigator and
right detail column to match k-extract's Design Artifacts pattern.

Co-authored-by: Cursor <cursoragent@cursor.com>
Surface tool use, reasoning, task progress, and compose previews as NDJSON
thinking events so the Graph Management Assistant panel updates while Vertex
work is in flight.

Co-authored-by: Cursor <cursoragent@cursor.com>
…kspaces

Ensure ingest-only prepares full-branch JobPackages and only materialize packages that contain repository content so Graph Management sessions can reliably read repo files. Add workspace source indexing plus prompt/thinking updates so the agent reports accurate available files and tools.

Co-authored-by: Cursor <cursoragent@cursor.com>
Process SyncStarted outbox events with bounded concurrency and fetch GitHub blobs in parallel to reduce ingestion-context preparation time for multi-source batches.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ement

Expose separate schema-entities and schema-relationships rail items with readiness-driven status and detail panels so designers can track type coverage before transitioning.

Co-authored-by: Cursor <cursoragent@cursor.com>
…imeout

Stream rolling three-line activity updates through warmup, SDK heartbeats,
and the Graph Management UI, with unbuffered NDJSON and clearer timeout
diagnostics. Increase default sticky turn timeout to 10 minutes.

Co-authored-by: Cursor <cursoragent@cursor.com>
…x turns

Improve incremental NDJSON delivery, SDK thinking dispatch, and error
handling; default max_turns to 500 so graph management turns are not capped at 8.

Co-authored-by: Cursor <cursoragent@cursor.com>
Accumulate StreamEvent text deltas, join assistant text blocks, and finalize
turn replies from result metadata so tool-only completions are not reported as empty.

Co-authored-by: Cursor <cursoragent@cursor.com>
)

return StreamingResponse(
event_stream(),
aredenba-rh and others added 2 commits June 3, 2026 01:12
Return AGENT_NO_TEXTUAL_REPLY when no reply can be extracted rather than
surfacing a placeholder string as an assistant message.

Co-authored-by: Cursor <cursoragent@cursor.com>
asyncio.wait_for on query().__anext__() cancelled pending reads after 8s,
breaking the Claude Agent SDK stream before ResultMessage and reply text arrived.

Co-authored-by: Cursor <cursoragent@cursor.com>
Development

Successfully merging this pull request may close these issues.

KG-scoped data source onboarding (k-extract-style full-page flow)

2 participants