feat(ui): KG-scoped data source onboarding (k-extract flow) #737
aredenba-rh wants to merge 84 commits into
* chore(skills): add subagent delivery execution protocol

  Add a reusable subagent skill that standardizes issue-based branching, TDD execution, PR structure, and merge/conflict handling into feature/manage-knowledge-graph.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(management): add knowledge graph workspace mode lifecycle

  Implement schema_bootstrap as the default workspace mode and persist irreversible transition state to extraction_operations across domain, repository, API responses, and migration coverage.

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
…681) Add a workspace-status API projection with mode, readiness flags, transition eligibility, and session pointers, including service and route authorization coverage for manage workspace rendering. Co-authored-by: Cursor <cursoragent@cursor.com>
…#682) Enforce workspace readiness checks for minimum entity/relationship type coverage and prepopulated type instance presence, and project blocking reasons so validate/transition workflows can render actionable feedback. Co-authored-by: Cursor <cursoragent@cursor.com>
Expose authorized validate and transition commands for knowledge graph workspaces, persist session pointers, and create an extraction-mode session identifier when moving from bootstrap to extraction operations. Co-authored-by: Cursor <cursoragent@cursor.com>
Add durable run-level mutation metadata storage and lifecycle persistence for session/scope identity, timestamps, token-cost totals, and operation-count summaries linked to each sync run. Co-authored-by: Cursor <cursoragent@cursor.com>
Emit operation-class counts and token/cost totals from mutation-log application results into MutationsApplied payloads so downstream sync lifecycle persistence can finalize run-level metadata. Co-authored-by: Cursor <cursoragent@cursor.com>
#686) Scaffold extraction application/presentation package structure and add pytest-archon rules enforcing DDD layer boundaries plus cross-context isolation so subsequent extraction features stay architecturally clean. Co-authored-by: Cursor <cursoragent@cursor.com>
Implement per-user/per-knowledge-graph/per-mode extraction session lifecycle behaviors with clear-chat reset semantics and archived-session retention backed by repository ports and unit coverage. Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve mode-specific extraction skill templates from global defaults and apply deterministic knowledge-graph override merges so session prompts are stable, customizable, and repeatable. Co-authored-by: Cursor <cursoragent@cursor.com>
Persist clone-head, last-extraction baseline, and tracked-branch head commit references for data sources and expose them in management API responses for downstream ingestion and UI commit-status workflows. Co-authored-by: Cursor <cursoragent@cursor.com>
Prepare Git-backed ingestion context by loading data-source commit references, refreshing tracked branch head, and passing baseline commit plus resolved credentials into the ingestion pipeline before packaging begins. Co-authored-by: Cursor <cursoragent@cursor.com>
# Conflicts:
#   src/api/ingestion/application/services/ingestion_service.py
#   src/api/ingestion/infrastructure/event_handler.py
#   src/api/ingestion/ports/services.py
#   src/api/tests/unit/ingestion/infrastructure/test_ingestion_event_handler.py
Skip heavy extraction when tracked branch head equals the last extraction baseline by emitting a completed lifecycle event and recording an explicit no-change audit log entry on the sync run. Co-authored-by: Cursor <cursoragent@cursor.com>
Expose a data-source diff summary API that compares the last extraction baseline to tracked branch head and returns aggregate counts plus a large-list-safe changed-file preview for maintenance decisions. Co-authored-by: Cursor <cursoragent@cursor.com>
Show commit-based diff counts immediately on each data source card and render the changed-file list as collapsed-by-default with explicit expand/collapse controls for large-diff safe browsing. Co-authored-by: Cursor <cursoragent@cursor.com>
…695) Add explicit data-source actions to refresh tracked/clone commit references and adopt tracked head as the current extraction baseline. This lets the UI surface per-source changed-file counts with user-controlled commit context updates for maintenance decisioning. Co-authored-by: Cursor <cursoragent@cursor.com>
Strengthen subagent delivery guidance with a parallel execution model, required context packs, and a blocker-question escalation flow so multiple agents can pause and ask focused questions without serializing delivery. Co-authored-by: Cursor <cursoragent@cursor.com>
) (#698) Seed schema bootstrap sessions with a capabilities-intake prompt that offers first-pass or guided co-design paths, and persist the selected path/capability summary in session runtime context so the conversation remains continuous across requests. Co-authored-by: Cursor <cursoragent@cursor.com>
…679) (#699) Build a filesystem runtime context for extraction workloads by materializing ingestion package resources, reconstructing repository files, and exposing a deterministic skills directory path; wire it through extraction event handling and local/deployed container configuration. Co-authored-by: Cursor <cursoragent@cursor.com>
#700) Enhance schema browser rows to display prepopulated type indicators and live per-type instance counts with lazy query-backed loading, while extending shared type contracts and tests to cover the new inspector metadata behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
…671) (#701) Add manage-authorized run-control operations (start, pause, halt, reset_running, reset_failed, reset_completed, reset_all) over data source sync runs, expose them via dedicated management routes, and verify behavior with unit tests for both service transitions and HTTP contract responses. Co-authored-by: Cursor <cursoragent@cursor.com>
Expose sync-run token/cost metadata in management API responses and add an extraction telemetry dashboard in the data-sources workspace with active worker counts, status buckets, recent job events, and 24h cost trend indicators backed by auto-refreshing sync data. Co-authored-by: Cursor <cursoragent@cursor.com>
Add knowledge-graph scoped maintenance schedule APIs with timezone-aware cron evaluation and persisted run outcomes, then expose the controls and history in the data-sources operations UI. Co-authored-by: Cursor <cursoragent@cursor.com>
…704) Extend the mutations console with a conversation-assisted draft flow and live entity/relationship inspector that highlights edited fields during the active session and resets highlights after apply/refresh. Co-authored-by: Cursor <cursoragent@cursor.com>
Replace legacy row actions with Manage, Query, and Delete, remove inline edit controls from the list surface, and align structural tests to the new action contract. Co-authored-by: Cursor <cursoragent@cursor.com>
Extend the manage workspace page with an always-visible extraction conversation panel, clear-chat reset action, and a tabbed lower operations area for extraction jobs, manual mutations, and run/log navigation. Co-authored-by: Cursor <cursoragent@cursor.com>
Rework the manage overview into a phased workspace hub and add unpulled-commit tracking on data sources so ingestion status matches a git-pull mental model. Co-authored-by: Cursor <cursoragent@cursor.com>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/api/management/domain/aggregates/data_source.py (1)
311-346: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Constrain pipeline_mode at the domain boundary.
Ingestion logic branches on exact string equality (pipeline_mode == "ingest_only"), but both DataSource.request_sync(..., pipeline_mode: str = "full") and SyncStarted.pipeline_mode: str = "full" still accept unconstrained str, so invalid values can silently fall back to the "full" path. HTTP is already constrained (TriggerSyncRequest.mode: Literal["full","ingest_only"]), but non-HTTP/internal callers (or crafted outbox payloads) bypass that.
Proposed direction
-from typing import TYPE_CHECKING
+from typing import TYPE_CHECKING, Literal
@@
-        pipeline_mode: str = "full",
+        pipeline_mode: Literal["full", "ingest_only"] = "full",
Tighten pipeline_mode consistently at the aggregate/event (and ideally DataSourceService.trigger_sync, which is also pipeline_mode: str) to the existing SyncPipelineMode = Literal["full", "ingest_only"].
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/api/management/domain/aggregates/data_source.py` around lines 311 - 346, The pipeline_mode parameter is currently an unconstrained str in DataSource.request_sync and carried onto the SyncStarted event, allowing invalid values to slip through; change the type annotations for DataSource.request_sync, SyncStarted.pipeline_mode, and DataSourceService.trigger_sync to use a shared SyncPipelineMode = Literal["full","ingest_only"] type and add a runtime check in DataSource.request_sync (and/or in SyncStarted construction) to validate the passed pipeline_mode, raising a ValueError or similar if it's not one of the allowed literals so callers and outbox payloads cannot silently default to the "full" branch.
src/dev-ui/app/pages/knowledge-graphs/[kgId]/data-sources/index.vue (1)
355-371: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Refresh the full KG here and inspect the settled results before showing success.
Under ?focus=maintain, this only refreshes visibleDataSources, so hidden sources never get their branch refs updated and newly stale repos can stay hidden. On top of that, Promise.allSettled() never throws, so this code can still toast “Up to date” when every refresh request failed. Refresh dataSources.value, check how many calls actually succeeded, and use that same source list for the disabled state at Line 639.
Proposed fix
 async function checkAllCommitRefs() {
-  if (visibleDataSources.value.length === 0) return
+  if (dataSources.value.length === 0) return
   checkingAllCommits.value = true
   try {
-    await Promise.allSettled(
-      visibleDataSources.value.map((ds) =>
+    const results = await Promise.allSettled(
+      dataSources.value.map((ds) =>
         apiFetch(`/management/data-sources/${ds.id}/commit-refs/refresh`, { method: 'POST' }),
       ),
     )
+    const refreshedCount = results.filter((result) => result.status === 'fulfilled').length
+    if (refreshedCount === 0) {
+      toast.error('Failed to check for new commits')
+      return
+    }
+    await loadDataSources()
     const unpulled = visibleDataSources.value.filter((ds) => hasUnpulledCommits(ds))
     if (unpulled.length === 0) {
       toast.success('Up to date with remote branches')
     } else {
@@
-          :disabled="checkingAllCommits || visibleDataSources.length === 0"
+          :disabled="checkingAllCommits || dataSources.length === 0"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/dev-ui/app/pages/knowledge-graphs/`[kgId]/data-sources/index.vue around lines 355 - 371, Replace the refresh of only visibleDataSources with a refresh over dataSources.value, await Promise.allSettled on apiFetch calls for each ds in dataSources.value, then inspect the settled results to build a set of successfully refreshed source IDs (check PromiseResult.status === 'fulfilled'), call loadDataSources() to reload the full list, compute unpulled by filtering the reloaded dataSources.value by (1) membership in the succeeded-ID set and (2) hasUnpulledCommits(ds), and finally show toasts based on the actual succeeded set (if none succeeded show an error toast, otherwise show “Up to date” only when succeeded sources have no unpulled commits). Also ensure the same succeeded-ID set is used for any disabled-state logic that referenced the previous visibleDataSources.
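The pipeline_mode finding in the first outside-diff comment above asks for a shared literal type plus a runtime check. A minimal Python sketch of that idea follows, assuming only the SyncPipelineMode alias named in the review; the helper validate_pipeline_mode and the SyncStartedSketch dataclass are hypothetical stand-ins, not the project's actual aggregate or event classes.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Alias named in the review comment; everything else here is illustrative.
SyncPipelineMode = Literal["full", "ingest_only"]
_ALLOWED_MODES = frozenset(get_args(SyncPipelineMode))


def validate_pipeline_mode(value: str) -> SyncPipelineMode:
    """Reject values outside the allowed literals instead of defaulting to "full"."""
    if value not in _ALLOWED_MODES:
        raise ValueError(
            f"invalid pipeline_mode {value!r}; expected one of {sorted(_ALLOWED_MODES)}"
        )
    return value  # type: ignore[return-value]


@dataclass(frozen=True)
class SyncStartedSketch:
    """Hypothetical stand-in for the SyncStarted event; models one field only."""

    data_source_id: str
    pipeline_mode: SyncPipelineMode = "full"

    def __post_init__(self) -> None:
        # Static typing does not protect against deserialized outbox payloads,
        # so the check is repeated at construction time.
        validate_pipeline_mode(self.pipeline_mode)
```

Validating in the constructor means a payload rebuilt from storage fails loudly at the boundary rather than reaching the string-equality branch with an unexpected value.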
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/dev-ui/app/pages/knowledge-graphs/`[kgId]/manage.vue:
- Around line 451-484: The loop in loadOverviewMetrics iterates dataSources
serially and should be parallelized: replace the for-await loop over dataSources
with a single Promise.allSettled over dataSources.map(ds =>
apiFetch(`/management/data-sources/${ds.id}/sync-runs`) returning the runs
alongside the original ds), then iterate the settled results to compute each row
using the same logic (call latestSyncRun on successful results,
resolvePrepStatusLabel, mark statusVariant 'success' for 'ingested'/'completed',
treat rejected promises or missing runs with the default 'not prepared'), update
prepared using isIngestionPreparedAtHead(ds) (still checked per ds), and push
WorkspaceHubSourceRow objects into rows; ensure errors from rejected promises do
not throw and that behavior for default statuses remains identical to existing
status/statusVariant initialization.
---
Outside diff comments:
In `@src/api/management/domain/aggregates/data_source.py`:
- Around line 311-346: The pipeline_mode parameter is currently an unconstrained
str in DataSource.request_sync and carried onto the SyncStarted event, allowing
invalid values to slip through; change the type annotations for
DataSource.request_sync, SyncStarted.pipeline_mode, and
DataSourceService.trigger_sync to use a shared SyncPipelineMode =
Literal["full","ingest_only"] type and add a runtime check in
DataSource.request_sync (and/or in SyncStarted construction) to validate the
passed pipeline_mode, raising a ValueError or similar if it's not one of the
allowed literals so callers and outbox payloads cannot silently default to the
"full" branch.
In `@src/dev-ui/app/pages/knowledge-graphs/`[kgId]/data-sources/index.vue:
- Around line 355-371: Replace the refresh of only visibleDataSources with a
refresh over dataSources.value, await Promise.allSettled on apiFetch calls for
each ds in dataSources.value, then inspect the settled results to build a set of
successfully refreshed source IDs (check PromiseResult.status === 'fulfilled'),
call loadDataSources() to reload the full list, compute unpulled by filtering
the reloaded dataSources.value by (1) membership in the succeeded-ID set and (2)
hasUnpulledCommits(ds), and finally show toasts based on the actual succeeded
set (if none succeeded show an error toast, otherwise show “Up to date” only
when succeeded sources have no unpulled commits). Also ensure the same
succeeded-ID set is used for any disabled-state logic that referenced the
previous visibleDataSources.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 1328d988-457b-4301-b5c6-98291775ffd6
📒 Files selected for processing (15)
src/api/main.py
src/api/management/domain/aggregates/data_source.py
src/api/management/domain/commit_pull_state.py
src/api/management/presentation/data_sources/models.py
src/api/management/presentation/data_sources/routes.py
src/api/tests/unit/management/domain/test_commit_pull_state.py
src/api/tests/unit/management/infrastructure/test_sync_lifecycle_handler.py
src/api/tests/unit/management/test_data_source.py
src/dev-ui/app/pages/knowledge-graphs/[kgId]/data-sources/index.vue
src/dev-ui/app/pages/knowledge-graphs/[kgId]/manage.vue
src/dev-ui/app/tests/kg-data-sources-phase1.test.ts
src/dev-ui/app/tests/kg-manage-workspace-hub.test.ts
src/dev-ui/app/tests/knowledge-graph-manage-workspace.test.ts
src/dev-ui/app/utils/kgDataSourcesCommits.ts
src/dev-ui/app/utils/kgManageWorkspaceHub.ts
* feat(ui): align graph management step with k-extract phase2 layout

  Rework the design chat, schema/session panels, and mode switcher with locked extraction modes until the workspace transitions to extraction operations.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ui): rename graph management chat title to Graph Management Assistant

  Co-authored-by: Cursor <cursoragent@cursor.com>

* docs(extraction): specify sticky session chat turns and runtime

  Document Graph Management chat as NDJSON streaming turns inside sticky Claude Agent SDK containers with JobPackage gating and UI mode skills. Closes #738

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Stream NDJSON chat turns with thinking/wait activity lines and reload session history after each turn. Closes #741. Co-authored-by: Cursor <cursoragent@cursor.com>
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/api/extraction/application/chat_turn_service.py`:
- Line 136: The code unconditionally sets
session.runtime_context["job_package"]["phase"] =
SessionJobPackagePhase.READY.value which can clobber intentional non-wait modes;
change this to only set the phase when a JobPackage actually exists and when its
current phase indicates it should transition to READY (e.g., phase is
missing/falsy or equals an expected prior state) — check
session.runtime_context.get("job_package") and the existing
job_package.get("phase") before assigning, and skip assignment when the current
phase represents a non-wait or NOT_REQUIRED mode so downstream
behavior/telemetry is preserved.
In `@src/api/extraction/dependencies.py`:
- Around line 62-77: The block in get_extraction_chat_turn_service duplicates
construction of ExtractionAgentSessionService (using
ExtractionAgentSessionRepository,
ExtractionSkillResolutionService/ExtractionSkillOverrideRepository, and
ExtractionSessionRunMetricsReader) already built earlier; refactor by creating
the ExtractionAgentSessionService once and reusing that instance instead of
reconstructing it (e.g., extract the shared construction into a single local
variable or helper and reference that variable for session_service), keeping the
same dependencies: ExtractionAgentSessionService,
ExtractionAgentSessionRepository(session),
ExtractionSkillResolutionService(override_repository=ExtractionSkillOverrideRepository()),
ExtractionSessionRunMetricsReader(session) and the sticky_runtime_manager
parameter.
In `@src/api/extraction/presentation/routes.py`:
- Around line 182-192: Wrap the async iteration in event_stream() in a
try/except so any exception from service.stream_chat_turn(...) is caught; on
exception yield a final NDJSON terminal event (e.g.
json.dumps({"type":"done","error": str(e)}) + "\n") before returning to ensure
the client receives a terminal "done" event, and optionally log the error with
the same context; keep existing yields for normal events and still return
StreamingResponse(event_stream(), media_type="application/x-ndjson").
In `@src/api/tests/unit/extraction/application/test_chat_turn_service.py`:
- Line 11: Remove the unused import ExtractionSkillResolutionService from the
test file; locate the import statement "from
extraction.application.skill_resolution_service import
ExtractionSkillResolutionService" in test_chat_turn_service.py and delete it (or
replace it with any actually used symbol from that module if needed) so the
unused-import Ruff (F401) error is resolved.
In `@src/dev-ui/app/components/extraction/SharedConversationPanel.vue`:
- Around line 124-145: The renderAssistantHtml function currently injects
untrusted URLs directly into href via the link-replacement regex, enabling
javascript: or quote-based attribute XSS when the output is used with v-html;
fix by validating and sanitizing the captured URL in the link replacement step
inside renderAssistantHtml: only allow safe schemes (e.g., http, https, mailto,
and relative paths), reject or replace unsafe ones with a safe placeholder
(e.g., '#'), HTML-escape or URL-encode the href value, and add safe attributes
like rel="noopener noreferrer" and target="_blank" as appropriate; update the
/\[([^\]]+)\]\(([^)]+)\)/g replacement to call a sanitizer helper (e.g.,
sanitizeUrl or isSafeScheme) before returning the anchor HTML so v-html never
receives an unsafe href.
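The first inline finding above (chat_turn_service.py, Line 136) asks that the phase be set to READY only when a JobPackage entry exists and its current phase is one that should transition. A rough Python sketch under stated assumptions: SessionJobPackagePhase.READY and a NOT_REQUIRED mode are named in the review, while the WAITING member, the string values, and the helper mark_job_package_ready are hypothetical.

```python
from enum import Enum
from typing import Any, Dict


class SessionJobPackagePhase(str, Enum):
    # Member values are assumptions; only READY and NOT_REQUIRED are named in the review.
    WAITING = "waiting"  # hypothetical prior state
    READY = "ready"
    NOT_REQUIRED = "not_required"


def mark_job_package_ready(runtime_context: Dict[str, Any]) -> bool:
    """Move the job package to READY only from a missing or waiting phase.

    Returns True when the phase was changed, False when it was left untouched.
    """
    job_package = runtime_context.get("job_package")
    if not isinstance(job_package, dict):
        # No JobPackage recorded for this session: nothing to transition.
        return False
    phase = job_package.get("phase")
    if phase in (None, "", SessionJobPackagePhase.WAITING.value):
        job_package["phase"] = SessionJobPackagePhase.READY.value
        return True
    # NOT_REQUIRED (or any other deliberate phase) is preserved as-is.
    return False
```

Returning a boolean lets the caller decide whether to persist the session or emit telemetry only when a transition actually happened.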
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 0010ff04-feed-412b-a464-a0c859169a97
📒 Files selected for processing (30)
specs/extraction/agent-sessions.spec.md
specs/extraction/chat-turns.spec.md
specs/extraction/operations.spec.md
specs/extraction/sticky-session-runtime.spec.md
specs/index.spec.md
specs/nfr/workload-execution.spec.md
src/api/extraction/application/__init__.py
src/api/extraction/application/agent_session_service.py
src/api/extraction/application/chat_turn_service.py
src/api/extraction/application/job_package_gate.py
src/api/extraction/application/skill_resolution_service.py
src/api/extraction/dependencies.py
src/api/extraction/domain/value_objects.py
src/api/extraction/infrastructure/deterministic_chat_agent.py
src/api/extraction/infrastructure/ingestion_readiness_reader.py
src/api/extraction/ports/chat_agent.py
src/api/extraction/ports/ingestion_readiness.py
src/api/extraction/presentation/models.py
src/api/extraction/presentation/routes.py
src/api/tests/unit/extraction/application/test_chat_turn_service.py
src/api/tests/unit/extraction/application/test_job_package_gate.py
src/dev-ui/app/components/extraction/SharedConversationPanel.vue
src/dev-ui/app/pages/knowledge-graphs/[kgId]/manage.vue
src/dev-ui/app/tests/kg-extraction-chat.test.ts
src/dev-ui/app/tests/kg-graph-management-artifacts.test.ts
src/dev-ui/app/tests/kg-graph-management-modes.test.ts
src/dev-ui/app/tests/knowledge-graph-manage-workspace.test.ts
src/dev-ui/app/utils/kgExtractionChat.ts
src/dev-ui/app/utils/kgGraphManagement.ts
src/dev-ui/app/utils/kgGraphManagementArtifacts.ts
    async def event_stream():
        async for event in service.stream_chat_turn(
            user_id=current_user.user_id.value,
            knowledge_graph_id=knowledge_graph_id,
            mode=mode,
            ui_mode=request.graph_management_ui_mode,
            message=request.message,
        ):
            yield json.dumps(event) + "\n"

    return StreamingResponse(event_stream(), media_type="application/x-ndjson")
Wrap stream generation failures and emit a terminal error event.
event_stream() has no exception handling. If service.stream_chat_turn(...) raises, clients receive a truncated NDJSON stream with no terminal event, which is brittle for UI state handling. Add a guarded error done event before ending the stream.
Suggested fix
     async def event_stream():
-        async for event in service.stream_chat_turn(
-            user_id=current_user.user_id.value,
-            knowledge_graph_id=knowledge_graph_id,
-            mode=mode,
-            ui_mode=request.graph_management_ui_mode,
-            message=request.message,
-        ):
-            yield json.dumps(event) + "\n"
+        try:
+            async for event in service.stream_chat_turn(
+                user_id=current_user.user_id.value,
+                knowledge_graph_id=knowledge_graph_id,
+                mode=mode,
+                ui_mode=request.graph_management_ui_mode,
+                message=request.message,
+            ):
+                yield json.dumps(event) + "\n"
+        except Exception:
+            yield json.dumps(
+                {
+                    "type": "done",
+                    "ok": False,
+                    "error": {"code": "STREAM_FAILED", "message": "Chat stream failed."},
+                }
+            ) + "\n"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    async def event_stream():
        try:
            async for event in service.stream_chat_turn(
                user_id=current_user.user_id.value,
                knowledge_graph_id=knowledge_graph_id,
                mode=mode,
                ui_mode=request.graph_management_ui_mode,
                message=request.message,
            ):
                yield json.dumps(event) + "\n"
        except Exception:
            yield json.dumps(
                {
                    "type": "done",
                    "ok": False,
                    "error": {"code": "STREAM_FAILED", "message": "Chat stream failed."},
                }
            ) + "\n"

    return StreamingResponse(event_stream(), media_type="application/x-ndjson")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/api/extraction/presentation/routes.py` around lines 182 - 192, Wrap the
async iteration in event_stream() in a try/except so any exception from
service.stream_chat_turn(...) is caught; on exception yield a final NDJSON
terminal event (e.g. json.dumps({"type":"done","error": str(e)}) + "\n") before
returning to ensure the client receives a terminal "done" event, and optionally
log the error with the same context; keep existing yields for normal events and
still return StreamingResponse(event_stream(),
media_type="application/x-ndjson").
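The terminal-event behaviour described in this comment can be exercised without the web framework. The sketch below is a standalone Python illustration, not the project's route: ndjson_with_terminal_event, _failing_turn, _collect, and the "delta" event type are hypothetical, while the error payload mirrors the suggested fix above.

```python
import asyncio
import json
from typing import Any, AsyncIterator, Dict, List


async def ndjson_with_terminal_event(
    events: AsyncIterator[Dict[str, Any]],
) -> AsyncIterator[str]:
    """Serialize events as NDJSON and emit a terminal error line if the source fails."""
    try:
        async for event in events:
            yield json.dumps(event) + "\n"
    except Exception:
        # A generic message is used so internal exception text is not sent to the client.
        yield json.dumps(
            {
                "type": "done",
                "ok": False,
                "error": {"code": "STREAM_FAILED", "message": "Chat stream failed."},
            }
        ) + "\n"


async def _failing_turn() -> AsyncIterator[Dict[str, Any]]:
    """Hypothetical chat turn that fails after one event."""
    yield {"type": "delta", "text": "hi"}
    raise RuntimeError("backend unavailable")


async def _collect(stream: AsyncIterator[str]) -> List[str]:
    return [line async for line in stream]


lines = asyncio.run(_collect(ndjson_with_terminal_event(_failing_turn())))
```

With this wrapper the client always sees a final line whose type is "done", so UI state handling does not depend on detecting a truncated stream.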
function renderAssistantHtml(text: string): string {
  let s = text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
  s = s.replace(/\*\*([^*]+)\*\*/g, '<strong class="font-semibold text-foreground">$1</strong>')
  s = s.replace(
    /`([^`]+)`/g,
    '<code class="rounded bg-muted px-1 py-0.5 text-xs font-mono text-foreground">$1</code>',
  )
  s = s.replace(
    /^> (.+)$/gm,
    '<p class="my-2 border-l-2 border-amber-500/60 pl-3 text-sm text-muted-foreground italic">$1</p>',
  )
  s = s.replace(
    /\[([^\]]+)\]\(([^)]+)\)/g,
    '<a class="text-primary font-medium underline underline-offset-2 hover:text-primary/90" href="$2">$1</a>',
  )
  s = s.replace(/## (.+)$/gm, '<h3 class="text-base font-semibold mt-3 mb-1 text-foreground">$1</h3>')
  s = s.replace(/### (.+)$/gm, '<h4 class="text-sm font-semibold mt-2 text-foreground">$1</h4>')
  s = s.replace(/^---$/gm, '<hr class="my-3 border-border" />')
  s = s.replace(/\n\n+/g, '<br /><br />')
  s = s.replace(/\n/g, '<br />')
  return s
}
Sanitize markdown links before v-html rendering (XSS risk)
Line 137 interpolates untrusted URL text directly into href, and Line 293 renders the result with v-html. This allows unsafe schemes (for example javascript:) and quote-based attribute injection.
Proposed fix
+function sanitizeHref(raw: string): string | null {
+  const value = raw.trim()
+  if (value.startsWith('/')) return value.replace(/"/g, '%22')
+  try {
+    const parsed = new URL(value)
+    if (!['http:', 'https:', 'mailto:'].includes(parsed.protocol)) return null
+    return parsed.href.replace(/"/g, '%22')
+  } catch {
+    return null
+  }
+}
+
 function renderAssistantHtml(text: string): string {
-  let s = text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
+  let s = text
+    .replace(/&/g, '&amp;')
+    .replace(/</g, '&lt;')
+    .replace(/>/g, '&gt;')
+    .replace(/"/g, '&quot;')
+    .replace(/'/g, '&#39;')
@@
-  s = s.replace(
-    /\[([^\]]+)\]\(([^)]+)\)/g,
-    '<a class="text-primary font-medium underline underline-offset-2 hover:text-primary/90" href="$2">$1</a>',
-  )
+  s = s.replace(/\[([^\]]+)\]\(([^)]+)\)/g, (_, label: string, rawHref: string) => {
+    const href = sanitizeHref(rawHref)
+    if (!href) return label
+    return `<a class="text-primary font-medium underline underline-offset-2 hover:text-primary/90" href="${href}" target="_blank" rel="noopener noreferrer">${label}</a>`
+  })
As per coding guidelines **: Focus on major issues impacting performance, readability, maintainability and security.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
function sanitizeHref(raw: string): string | null {
  const value = raw.trim()
  if (value.startsWith('/')) return value.replace(/"/g, '%22')
  try {
    const parsed = new URL(value)
    if (!['http:', 'https:', 'mailto:'].includes(parsed.protocol)) return null
    return parsed.href.replace(/"/g, '%22')
  } catch {
    return null
  }
}

function renderAssistantHtml(text: string): string {
  let s = text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
  s = s.replace(/\*\*([^*]+)\*\*/g, '<strong class="font-semibold text-foreground">$1</strong>')
  s = s.replace(
    /`([^`]+)`/g,
    '<code class="rounded bg-muted px-1 py-0.5 text-xs font-mono text-foreground">$1</code>',
  )
  s = s.replace(
    /^> (.+)$/gm,
    '<p class="my-2 border-l-2 border-amber-500/60 pl-3 text-sm text-muted-foreground italic">$1</p>',
  )
  s = s.replace(/\[([^\]]+)\]\(([^)]+)\)/g, (_, label: string, rawHref: string) => {
    const href = sanitizeHref(rawHref)
    if (!href) return label
    return `<a class="text-primary font-medium underline underline-offset-2 hover:text-primary/90" href="${href}" target="_blank" rel="noopener noreferrer">${label}</a>`
  })
  s = s.replace(/## (.+)$/gm, '<h3 class="text-base font-semibold mt-3 mb-1 text-foreground">$1</h3>')
  s = s.replace(/### (.+)$/gm, '<h4 class="text-sm font-semibold mt-2 text-foreground">$1</h4>')
  s = s.replace(/^---$/gm, '<hr class="my-3 border-border" />')
  s = s.replace(/\n\n+/g, '<br /><br />')
  s = s.replace(/\n/g, '<br />')
  return s
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/dev-ui/app/components/extraction/SharedConversationPanel.vue` around
lines 124 - 145, The renderAssistantHtml function currently injects untrusted
URLs directly into href via the link-replacement regex, enabling javascript: or
quote-based attribute XSS when the output is used with v-html; fix by validating
and sanitizing the captured URL in the link replacement step inside
renderAssistantHtml: only allow safe schemes (e.g., http, https, mailto, and
relative paths), reject or replace unsafe ones with a safe placeholder (e.g.,
'#'), HTML-escape or URL-encode the href value, and add safe attributes like
rel="noopener noreferrer" and target="_blank" as appropriate; update the
/\[([^\]]+)\]\(([^)]+)\)/g replacement to call a sanitizer helper (e.g.,
sanitizeUrl or isSafeScheme) before returning the anchor HTML so v-html never
receives an unsafe href.
…746) Ship kartograph-agent-runtime container image with NDJSON turn API, mount skills and JobPackage workspaces, inject chat-scoped workload tokens, and delegate graph-management chat turns to the remote runtime when container backend is enabled. Closes #742. Co-authored-by: Cursor <cursoragent@cursor.com>
Align sticky Claude Agent SDK containers with k-extract Vertex auth and warm the graph-management assistant on UI entry with streamed readiness progress. Co-authored-by: Cursor <cursoragent@cursor.com>
Prevent JIT provisioning conflicts when Keycloak re-imports the realm and Postgres still holds rows keyed by the previous SSO subject. Co-authored-by: Cursor <cursoragent@cursor.com>
Mount gcloud credentials at /gcloud/config and run sticky containers as the host UID so Claude Agent SDK can reach Vertex AI, while keeping the API root for Docker-out-of-Docker in dev. Co-authored-by: Cursor <cursoragent@cursor.com>
Drop the redundant branch tip column from the KG data sources table. Co-authored-by: Cursor <cursoragent@cursor.com>
Load prepared archives even in schema-design mode, refresh the workspace on chat reuse, point Claude SDK at /workspace, and remove sibling sticky and worker containers during make down. Co-authored-by: Cursor <cursoragent@cursor.com>
Incremental prepares were overwriting last_prepared_file_count with the number of changed files, so the data sources table showed the wrong "Files on branch" value after subsequent prepares. Co-authored-by: Cursor <cursoragent@cursor.com>
Background refreshes no longer toggle the page-level loading gate, so prepare polling updates status in place with a subtle updating indicator. Co-authored-by: Cursor <cursoragent@cursor.com>
Graph Management and other manage steps no longer stretch edge-to-edge on wide screens, matching the data sources workspace layout. Co-authored-by: Cursor <cursoragent@cursor.com>
Expose job_package_available on data source listings, rematerialize when the ZIP is gone, and skip ingest-only no-changes short-circuit without it. Co-authored-by: Cursor <cursoragent@cursor.com>
Skip workspace rematerialization when the container is healthy and JobPackage IDs match, report 503 until the agent workspace is ready, and only save user messages after the assistant turn completes or fails. Co-authored-by: Cursor <cursoragent@cursor.com>
…ct layout Split the combined schema nav/detail card into a sticky left navigator and right detail column to match k-extract's Design Artifacts pattern. Co-authored-by: Cursor <cursoragent@cursor.com>
Surface tool use, reasoning, task progress, and compose previews as NDJSON thinking events so the Graph Management Assistant panel updates while Vertex work is in flight. Co-authored-by: Cursor <cursoragent@cursor.com>
…kspaces Ensure ingest-only prepares full-branch JobPackages and only materialize packages that contain repository content so Graph Management sessions can reliably read repo files. Add workspace source indexing plus prompt/thinking updates so the agent reports accurate available files and tools. Co-authored-by: Cursor <cursoragent@cursor.com>
Process SyncStarted outbox events with bounded concurrency and fetch GitHub blobs in parallel to reduce ingestion-context preparation time for multi-source batches. Co-authored-by: Cursor <cursoragent@cursor.com>
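The bounded-concurrency idea in the commit above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `map_bounded`, `fake_fetch_blob`, and the limit of 3 are hypothetical names and values chosen for the example, and the real handler may be structured quite differently.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def map_bounded(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    limit: int,
) -> List[R]:
    """Run `worker` over `items` with at most `limit` calls in flight.

    Hypothetical helper: results come back in input order.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        # Each call waits here until one of the `limit` slots is free.
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(run_one(item) for item in items))


async def fake_fetch_blob(sha: str) -> str:
    """Stand-in for a network fetch (assumption: real code would call GitHub)."""
    await asyncio.sleep(0.01)
    return f"contents-of-{sha}"


if __name__ == "__main__":
    blobs = asyncio.run(map_bounded(["a1", "b2", "c3", "d4"], fake_fetch_blob, limit=3))
    print(blobs)
```

A semaphore keeps the fan-out simple while capping how many requests hit the upstream API at once; the cap itself would need tuning against real rate limits.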
…ement Expose separate schema-entities and schema-relationships rail items with readiness-driven status and detail panels so designers can track type coverage before transitioning. Co-authored-by: Cursor <cursoragent@cursor.com>
…imeout Stream rolling three-line activity updates through warmup, SDK heartbeats, and the Graph Management UI, with unbuffered NDJSON and clearer timeout diagnostics. Increase default sticky turn timeout to 10 minutes. Co-authored-by: Cursor <cursoragent@cursor.com>
…x turns Improve incremental NDJSON delivery, SDK thinking dispatch, and error handling; default max_turns to 500 so graph management turns are not capped at 8. Co-authored-by: Cursor <cursoragent@cursor.com>
Accumulate StreamEvent text deltas, join assistant text blocks, and finalize turn replies from result metadata so tool-only completions are not reported as empty. Co-authored-by: Cursor <cursoragent@cursor.com>
| ) | ||
|
|
||
| return StreamingResponse( | ||
| event_stream(), |
Return AGENT_NO_TEXTUAL_REPLY when no reply can be extracted rather than surfacing a placeholder string as an assistant message. Co-authored-by: Cursor <cursoragent@cursor.com>
asyncio.wait_for on query().__anext__() cancelled pending reads after 8s, breaking the Claude Agent SDK stream before ResultMessage and reply text arrived. Co-authored-by: Cursor <cursoragent@cursor.com>
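To illustrate the failure mode described in the commit above: when `asyncio.wait_for` times out, it cancels the awaited `__anext__()` call, which terminates an async generator. One common way to emit heartbeats without cancelling the read is to keep a single pending task and poll it with `asyncio.wait`, which leaves the task running on timeout. The sketch below assumes a hypothetical `slow_stream` in place of the SDK's stream; the names and intervals are illustrative, not taken from the PR.

```python
import asyncio
from typing import AsyncIterator, List

_DONE = object()  # sentinel marking stream exhaustion


async def slow_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in for an SDK stream whose messages arrive slowly."""
    await asyncio.sleep(0.05)
    yield "thinking"
    await asyncio.sleep(0.05)
    yield "result"


async def _next_or_done(iterator: AsyncIterator[str]) -> object:
    """Await the next item, mapping exhaustion to the sentinel."""
    try:
        return await iterator.__anext__()
    except StopAsyncIteration:
        return _DONE


async def with_heartbeats(stream: AsyncIterator[str], interval: float) -> List[str]:
    """Drain `stream`, recording a heartbeat whenever `interval` elapses idle."""
    events: List[str] = []
    iterator = stream.__aiter__()
    pending = asyncio.create_task(_next_or_done(iterator))
    while True:
        # asyncio.wait returns on timeout but does NOT cancel `pending`,
        # so the in-flight read survives across heartbeats.
        done, _ = await asyncio.wait({pending}, timeout=interval)
        if not done:
            events.append("heartbeat")
            continue
        item = pending.result()
        if item is _DONE:
            return events
        events.append(str(item))
        pending = asyncio.create_task(_next_or_done(iterator))


if __name__ == "__main__":
    print(asyncio.run(with_heartbeats(slow_stream(), interval=0.02)))
```

The number of heartbeats depends on timing, but every message from the stream is delivered because the pending read is never cancelled.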
Summary
* `/knowledge-graphs/{kgId}/data-sources/new` (URLs → configure → sequential initial sync → summary), modeled after k-extract `designer/new`.
* `/knowledge-graphs/{kgId}/data-sources` (phase1 equivalent) for sync, commits, diff, and maintenance focus.
* Routes to the onboarding flow when `dataSourceCount === 0`, otherwise to the operations page.

Closes #736
Test plan
* `/data-sources/new?focus=maintain` filters to maintenance-ready sources
* `/data-sources` unchanged

Made with Cursor