diff --git a/.cursor/rules/api-key-controls.mdc b/.cursor/rules/api-key-controls.mdc
index 95482d4..cfc4e8a 100644
--- a/.cursor/rules/api-key-controls.mdc
+++ b/.cursor/rules/api-key-controls.mdc
@@ -10,11 +10,14 @@ alwaysApply: true
 Enables cloud provider access with LM Studio fallback in default mode (OpenRouter API keys plus desktop OAuth providers such as OpenAI Codex and xAI Grok/SuperGrok) and OpenRouter-only operation in generic mode, plus boost controls and research metrics in the workflow panel.

 **Key Features:**
-- **Per-Role Cloud Provider Selection**: Each role independently uses LM Studio, OpenRouter, or a desktop-only OAuth provider where available (default mode); generic mode remains OpenRouter-only.
-- **Cloud Access & Keys**: Header overlay manages OpenRouter API keys and desktop OAuth provider logins. The post-disclaimer startup wizard requires OpenRouter or LM Studio first and presents OAuth only as a later add-on. The OAuth UI is provider-registry driven (`openai_codex_oauth`, `xai_grok_oauth`) so future OAuths can be added without changing saved profile shape.
+- **Per-Role Cloud Provider Selection**: Each role independently uses LM Studio, OpenRouter, or a desktop-only OAuth/subscription provider where available (OpenAI Codex OAuth, xAI Grok OAuth, Sakana Fugu API key; default mode); generic mode remains OpenRouter-only.
+- **OpenRouter/OAuth**: Header overlay manages OpenRouter API keys and desktop OAuth/provider logins (OpenAI Codex, xAI Grok/SuperGrok OAuth, Sakana Fugu subscription API key). The post-disclaimer startup wizard requires OpenRouter or LM Studio first and presents OAuth/provider add-ons later. The provider UI is registry-driven (`openai_codex_oauth`, `xai_grok_oauth`, `sakana_fugu`) so future providers can be added without changing saved profile shape. OAuth attribution identifiers must refer to MOTO Autonomous ASI, never the bare nickname `moto`; compact OAuth identifier fields use `moto-autonomous-asi` unless a provider explicitly requires the full display name.
+- **SyntheticLib4 Access**: SyntheticLib4 backend/search scaffolding remains in place, but the user-facing connectivity pill is currently a `Coming soon` explainer rather than a ready/configuration surface. The modal explains the reciprocal proof-contribution/access model until production corpus access is enabled.
+- **Session History Memory**: The connectivity toggle defaults enabled for new users and maps to local MOTO/manual/LeanOJ proof-history memory used by Assistant workflow-memory search during brainstorming, writing, proof work, and LeanOJ solving. It is not raw provider transcript or chain-of-thought storage. Disabling it persists as non-secret runtime state, removes local proof-history corpora from Assistant retrieval, and must not delete proof records or alter internal retry/rejection/prompt memory.
+- **Assistant Role**: Aggregator, Compiler, Autonomous Research, and LeanOJ expose one shared non-blocking Assistant LLM role per workflow surface for verified proof-memory support. Assistant selects up to 7 prior verified proof supports, never replaces validators or submitters, never blocks parent workflows, and is disabled when Session History Memory is disabled. Useful Assistant packs may be reused by two eligible receiver reads before the next refresh. True no-history targets are skipped because Assistant only performs proof-memory retrieval for now. Durable cooldown groups transient task IDs/roles by workflow run while preserving real source/session separation: repeated zero-useful retrieval backs off and may shut down for the run; repeated stagnant same-pack retrieval backs off without shutdown. Stale live packs/state clear only on explicit reset/clear or Session History Memory disable. User live activity should show normal Assistant retrieval result logs only, not skip/backoff/shutdown turns.
 - **Startup provider requirement**: OAuth providers are supplementary role providers, not a standalone startup path, because RAG embeddings route through LM Studio, OpenRouter, or generic-mode FastEmbed. First-run startup and workflow start preflights must require OpenRouter, generic FastEmbed, or LM Studio with an embedding model available before OAuth-only role selection is allowed.
 - **OpenRouter Auto-Fill**: OpenRouter selectors fetch provider endpoint metadata and compute host-aware context/output settings from a capable endpoint set. Auto mode ignores known weak hosts (currently Venice) and low/missing-cap outliers before computing context/max-output; manual host selection uses that exact host and its largest exposed endpoint output cap.
-- **OAuth Auto-Fill**: OAuth model selectors auto-fill only from provider model metadata or documented provider-specific limits. Do not synthesize fallback model entries or invent generic fallback context windows for unknown OAuth models; preserve current settings when metadata is unknown. GPT-5.5 Codex uses the Codex 400K product window, not the 1M regular API window. Grok/SuperGrok OAuth uses xAI model metadata when available and may expose known Grok subscription limits only for known model IDs; model listings must filter xAI catalog entries that are not accepted by the OAuth chat-completions route, such as multi-agent-only models.
+- **OAuth Auto-Fill**: OAuth model selectors auto-fill only from provider model metadata, documented provider-specific limits, or curated provider-backed public aliases. Do not invent generic fallback context windows for unknown OAuth models; preserve current settings when metadata is unknown. GPT-5.5 Codex uses the Codex 400K product window, not the 1M regular API window; Codex Spark high is exposed as a curated alias for Codex Spark with high reasoning and its documented 128K window. Grok/SuperGrok OAuth uses xAI model metadata when available and may expose known Grok subscription limits only for known model IDs; model listings must filter xAI catalog entries that are not accepted by the OAuth chat-completions route, such as multi-agent-only models.
 - **OpenRouter Reasoning Effort**: Every OpenRouter role exposes a visible reasoning-effort selector. Default `auto` sends maximum OpenRouter reasoning effort (`xhigh`) through the normalized `reasoning.effort` request object; users may lower it or set `none`.
 - **LM Studio Fallback** (default mode only): Optional fallback per role on credit exhaustion
 - **Free Model Cooldown Handling**: SERIAL BOTTLENECK pause, free model looping, and auto-selector backup (see below)
@@ -91,7 +94,7 @@ Enables cloud provider access with LM Studio fallback in default mode (OpenRoute
 - Observability surfaces must default to metadata/previews with secret redaction. Provider keys, URL query keys, Wolfram query/result text, and full prompt/response bodies must not be persisted or broadcast unless an explicit trusted debug path opts in. Legacy full-payload log fields are scrubbed from persisted API logs on logger startup.
 - Tool-call assistant/tool protocol turns are the only exception where exact assistant content/structure may need preservation; ordinary JSON retry assistant turns are not tool protocol turns and must use sanitized retry context.
 - Generic mode must normalize or reject LM Studio role configs and must never fall through to `lm_studio_client.generate_completion()`, even if a direct API caller submits legacy `provider="lm_studio"` or an LM fallback value.
-- Desktop OAuth providers are distinct from regular provider API-key billing. OpenAI Codex OAuth (`openai_codex_oauth`) uses ChatGPT/Codex account tokens against `https://chatgpt.com/backend-api/codex`; xAI Grok OAuth (`xai_grok_oauth`) uses current `auth.x.ai` PKCE login with `grok-cli:access` / `api:access` scopes and subscription tokens against the xAI OpenAI-compatible chat-completions API. xAI Console API keys are a separate pay-as-you-go/credit path, not the subscription-backed OAuth path. OAuth tokens are stored through chunked OS-keyring entries, provider status is non-secret, callback listeners are loopback-only and released after pending login completion/expiry, and OAuth providers remain unavailable in generic mode.
+- Desktop OAuth providers and Sakana Fugu (`sakana_fugu`) are distinct from regular provider API-key billing. OpenAI Codex OAuth (`openai_codex_oauth`) uses ChatGPT/Codex account tokens against `https://chatgpt.com/backend-api/codex`; xAI Grok OAuth (`xai_grok_oauth`) uses current `auth.x.ai` PKCE login with `grok-cli:access` / `api:access` scopes and subscription tokens against the xAI OpenAI-compatible chat-completions API; Sakana Fugu (`sakana_fugu`) uses the desktop keyring-stored subscription API key against Sakana's OpenAI-compatible `/responses` and `/chat/completions` endpoints. xAI Console API keys are a separate pay-as-you-go/credit path, not the subscription-backed OAuth path. OAuth tokens are stored through chunked OS-keyring entries, Sakana uses a single keyring entry, provider status is non-secret, callback listeners are loopback-only and released after pending login completion/expiry, and these desktop-only providers remain unavailable in generic mode. Hosted/generic settings must not poll desktop OAuth/Sakana status/model endpoints when `/api/features` marks those capabilities unavailable.
 - Generic mode: `get_embeddings()` early-returns to `FastEmbedProvider` before the LM Studio → OpenRouter fallback chain
 - Tracks fallback state per role: `_role_fallback_state: Dict[str, str]`
 - `reset_openrouter_fallbacks()`: Resets all roles originally configured for OpenRouter back from LM Studio fallback. Called automatically on API key set, or manually via reset endpoint.
@@ -99,9 +102,10 @@ Enables cloud provider access with LM Studio fallback in default mode (OpenRoute

 **CRITICAL REQUIREMENT - Role Configuration:**
 - **EVERY role calling `api_client_manager.generate_completion()` MUST be configured via `api_client_manager.configure_role()`**
-- This includes: aggregator submitters/validator, compiler submitters/validator/critique, autonomous agents, Tier 3 final answer agents, and LeanOJ roles/topic/brainstorm submitters
+- This includes: aggregator submitters/validator/Assistant, compiler Writing/Rigor & Proofs/validator/Assistant, autonomous agents/Assistant, Tier 3 final answer agents, and LeanOJ roles/topic/brainstorm submitters/Assistant
 - Role configs must preserve `supercharge_enabled` when copied into proof snapshots, manual proof helpers, child Aggregator/Compiler coordinators, and LeanOJ grouped roles.
-- **Proof agents (Part 3, optional)** do NOT have standalone user-facing model settings. Internal proof role IDs are still configured through `api_client_manager.configure_role()` by copying from the `ProofRuntimeConfigSnapshot` (brainstorm submitter, high-context submitter, validator) captured by `autonomous_coordinator._build_proof_runtime_config_snapshot()` and persisted via `research_metadata.set_proof_runtime_config()`, or supplied directly on manual `POST /api/proofs/check`. Manual checks require `lean4_enabled=True` and either a stored or request-provided runtime snapshot.
+- **Proof agents (Part 3/manual, optional)** use the visible Rigor & Proofs Submitter settings for proof-solving submitter work. Internal proof role IDs are still configured through `api_client_manager.configure_role()` by copying from the `ProofRuntimeConfigSnapshot` (Rigor & Proofs submitter, validator, optional Assistant) captured by `autonomous_coordinator._build_proof_runtime_config_snapshot()` and persisted via `research_metadata.set_proof_runtime_config()`, or supplied directly on manual `POST /api/proofs/check`. Manual checks require `lean4_enabled=True` and either a stored or request-provided runtime snapshot.
+- Assistant is the preferred owner of routine proof-history memory retrieval for eligible non-validator, non-critique flows. It watches the current prompt/phase/target, searches proactively against verified proof corpora, ranks a diverse up-to-7 memory-support pack with cached target/goal reuse and visit-count exploration, reuses useful packs for two eligible receiver reads before refresh, and never blocks the parent workflow. Run-scoped cooldown prevents repeated empty/stagnant retrieval from calling Assistant every turn; ordinary stop preserves that state, while explicit clear/reset removes it. In-role `search_lean_proofs` calls should be explicit legacy/debug or narrow emergency-repair paths only, not the normal retrieval path.

 **Boost Mode Priority** (`should_use_boost(task_id)`):
 1. Always Prefer Boost: `boost_always_prefer=True` → True
@@ -115,7 +119,7 @@ Enables cloud provider access with LM Studio fallback in default mode (OpenRoute

 **Categories from task ID prefix:**
 - `agg_sub{N}` / `agg_val` → Aggregator and autonomous parent-role tasks
-- `comp_hc` / `comp_hp` / `comp_val` / `comp_crit` → Compiler roles; legacy critique task prefixes (`critique_sub*`, `critique_val`, `critique_cleanup`) alias to `comp_crit`
+- `comp_writer` / `comp_hp` / `comp_val` → Compiler roles; `comp_hp` is Rigor & Proofs, legacy `comp_crit` and `critique_sub*` alias to `comp_hp`, while critique validator/cleanup task IDs route with `comp_val`
 - `proof_*` → autonomous proof task IDs (not exposed in the category list unless added explicitly)
 - Exposed LeanOJ category presets cover topic generation/validation, brainstorm submitters/validator, sufficiency, path validation, and final solver. `leanoj_path_*` path-decision calls are absorbed into the Final Solver category; other LeanOJ helper prefixes (for example prune review, master-proof edit validation, final review, and proof-novelty tasks) use Boost Next X, Always Prefer, or exact task IDs unless promoted to category presets.

@@ -129,10 +133,19 @@ Enables cloud provider access with LM Studio fallback in default mode (OpenRoute
 - Methods: `log_api_call`, `get_logs(limit)`, `clear_logs`, `get_stats`
 - Boost logs are merged into the main API call log view; persisted/default route output must avoid provider keys and raw full prompt/response bodies.

+#### SyntheticLib4 Access And Proof Search (planned)
+- SyntheticLib4 integration is an authorized proof-corpus/search surface, not a model route. MOTO may use hosted APIs, downloaded snapshots, and deltas, but must not clone the private Lean repository or scrape website pages.
+- Desktop/default credentials use `secret_store.py`; hosted/generic credentials remain env-injected or in process memory. Snapshot files, runtime settings, prompts, and default logs must never contain SyntheticLib4 tokens or API keys.
+- The initial unified proof-search backend indexes active and archived canonical MOTO proof records plus SyntheticLib4 mock/offline records through batched local SQLite/FTS rebuilds; search responses enforce the 7-proof combined result cap before workflow/tool integration. Assistant may gather a wider local candidate pool, but final model-visible proof-support packs remain capped at 7. SyntheticLib4 mock/offline mode must keep working when test fixtures are not packaged by using bundled/built-in fallback records until a real data-root snapshot is available.
+- Proof-search service calls must centrally filter disabled corpora for REST, tool, and Assistant paths; disabling Session History Memory removes local MOTO/manual/LeanOJ corpora, and disabling SyntheticLib4 removes that corpus even if a client requests it explicitly.
+- The shared `search_lean_proofs` tool adapter supports overview/search/hydrate/local usage-attestation actions over the same service. Proof formalization exposes the tool through a bounded provider tool-call loop where supported, while preserving the existing prefetch path for providers that do not emit tool calls; local per-attempt exclusions avoid exact repeat results. Proof formalization records local `model_visible_context` attestations when full SyntheticLib4 Lean code is injected into a successful formalization prompt; whole-code-use submission to SyntheticLib4 remains a later live-service step.
+- Local snapshot search remains available when lawfully downloaded and allowed by the recorded license/terms, but refresh, hosted retrieval, user-proof browsing, sharing, redistribution, and public serving require active authorization unless enterprise rights explicitly allow more.
+- Full proof payloads should enter model context only through the intentional bounded proof-search result block, and default observability should log metadata, counts, IDs, hashes, and short previews rather than full Lean code.
+
 #### Workflow Task Generation (Internal Backend Tracking)
 Coordinators track task IDs internally for boost routing. The frontend does NOT display predicted task lists.
 - Aggregator: `agg_sub{N}_{seq:03d}`, `agg_val_{seq:03d}`
-- Compiler: `comp_hc_{seq:03d}`, `comp_hp_{seq:03d}`, `comp_val_{seq:03d}`
+- Compiler: `comp_writer_{seq:03d}`, `comp_hp_{seq:03d}`, `comp_val_{seq:03d}`
 - Autonomous orchestration: parent-role prefixes (`agg_sub1_{seq:03d}` for topic/completion/reference/title/Tier 3 agents; `agg_val_{seq:03d}` for topic/redundancy validators)
 - Autonomous proof/framing: `proof_framing_gate_{seq:03d}` for the prompt-framing decision, plus Lean-gated proof work IDs `proof_id_{seq:03d}`, `proof_lemma_{seq:03d}`, `proof_form_{seq:03d}`, `proof_integrity_{seq:03d}`, `proof_novelty_{seq:03d}`

@@ -150,7 +163,7 @@ Predictions refresh: after initialization, each task completion, mode switches,

 ## WebSocket Events

-Workflow, boost, fallback/reset, provider pause/resume, hung-connection, rate-limit, privacy-policy, and unrecoverable OAuth provider failures should emit user-visible notifications when the frontend or hosted wrapper depends on them. OAuth failure notifications must tell the user to check Cloud Access & Keys, sign in again, and retry. Hung-connection alerts should also appear in the active mode's live activity feed and concisely note that the model may still be thinking, the user can keep waiting, and reasoning effort can be lowered in Settings. Keep consumed event payloads stable enough for UI recovery, but do not treat every internal notification name as a rule-level invariant.
+Workflow, boost, fallback/reset, provider pause/resume, hung-connection, rate-limit, privacy-policy, and unrecoverable OAuth provider failures should emit user-visible notifications when the frontend or hosted wrapper depends on them. OAuth failure notifications must tell the user to check OpenRouter/OAuth, sign in again, and retry; live activity should also include the redacted provider error detail capped at 250 chars. Recent non-secret OAuth/provider failure notifications are also persisted under the active data root so the UI can recover missed live popups after reconnect or reload. Hung-connection alerts should also appear in the active mode's live activity feed and concisely note that the model may still be thinking, the user can keep waiting, and reasoning effort can be lowered in Settings. Keep consumed event payloads stable enough for UI recovery, but do not treat every internal notification name as a rule-level invariant.

 ---

@@ -185,8 +198,9 @@ Workflow, boost, fallback/reset, provider pause/resume, hung-connection, rate-li
 - `POST /api/openrouter/test-connection` — Test key without storing
 - `GET /api/model-cache` — Cached model ID mapping (display_name → api_id)

-### Cloud Access (`backend/api/routes/cloud_access.py`)
-- `GET /api/cloud-access/status` — Non-secret Cloud Access & Keys provider status
+### OpenRouter/OAuth (`backend/api/routes/cloud_access.py`)
+- `GET /api/cloud-access/status` — Non-secret OAuth provider status used by the OpenRouter/OAuth modal and connectivity panel
+- `GET /api/cloud-access/provider-notifications` — Recent non-secret OAuth/provider failure notifications for missed-popup recovery
 - `POST /api/cloud-access/openai-codex/oauth/start` — Start desktop OpenAI Codex OAuth PKCE login and loopback callback listener
 - `POST /api/cloud-access/openai-codex/oauth/exchange` — Exchange pasted callback URL/code for Codex OAuth tokens
 - `GET /api/cloud-access/openai-codex/status` — Non-secret OpenAI Codex OAuth status
@@ -197,18 +211,45 @@ Workflow, boost, fallback/reset, provider pause/resume, hung-connection, rate-li
 - `GET /api/cloud-access/xai-grok/status` — Non-secret xAI Grok OAuth status
 - `GET /api/cloud-access/xai-grok/models` — Grok model list for the signed-in account, including normalized context/output metadata when known
 - `DELETE /api/cloud-access/xai-grok` — Revoke best-effort and clear stored xAI Grok OAuth tokens
+- `GET /api/cloud-access/sakana-fugu/status` — Non-secret Sakana Fugu API-key status
+- `POST /api/cloud-access/sakana-fugu/api-key` — Store Sakana Fugu subscription API key in desktop keyring
+- `GET /api/cloud-access/sakana-fugu/models` — Sakana Fugu model list for configured key
+- `DELETE /api/cloud-access/sakana-fugu` — Clear stored Sakana Fugu API key

 ### Workflow (`backend/api/routes/workflow.py`)
 - `GET /api/workflow/predictions` — Current workflow mode (also returns tasks for internal use)
 - `GET /api/token-stats` — Cumulative token usage (total_input, total_output, by_model, elapsed_seconds)

+### Connectivity (`backend/api/routes/connectivity.py`)
+- `GET /api/connectivity/status` — Non-secret grouped status for OpenRouter/OAuth, LM Studio, SyntheticLib4, local proof-history Session History Memory, Wolfram Alpha, and Boost
+- `POST /api/connectivity/toggles` — Persist non-secret optional-skill enablement (`syntheticlib4_enabled`, `agent_conversation_memory_enabled`, `wolfram_alpha_enabled`) without clearing credentials, snapshots, indexes, or history
+
+### Proof Search (`backend/api/routes/proof_search.py`)
+- `GET /api/proof-search/overview` — Compact corpus/index map for proof navigation, filtered by enabled proof-history/SyntheticLib4 corpus toggles
+- `POST /api/proof-search/search` — Search canonical MOTO proof records and SyntheticLib4 fixture/snapshot records, with public request schema and runtime results capped at 7 combined records
+- `GET /api/proof-search/proofs/{source}/{proof_id}` — Fetch one indexed proof record, hydrating SyntheticLib4 fixture code when available
+- `POST /api/proof-search/reindex` — Rebuild the local proof-search SQLite/FTS index
+
+### SyntheticLib4 (`backend/api/routes/syntheticlib4.py`)
+- `GET /api/syntheticlib4/status` — Non-secret auth/snapshot/index status for local SyntheticLib4 corpus access
+- `GET /api/syntheticlib4/releases` — Locally available SyntheticLib4 releases
+- `POST /api/syntheticlib4/refresh` — Validate local snapshot metadata and rebuild the unified proof index
+- `POST /api/syntheticlib4/import-local-snapshot` — Safely activate a local snapshot staged under the active data root, preserving the previous active snapshot if validation/import fails
+- `POST /api/syntheticlib4/reindex` — Rebuild the unified proof index from available local corpora
+- `POST /api/syntheticlib4/retrieve-batch` — Bounded mock/offline retrieve-batch surface matching the planned SyntheticLib4 corpus contract, with public request schema capped at 7 records
+- `GET /api/syntheticlib4/account/proofs`, `/account/proofs/search` — Mock/offline accepted-proof browsing/search surfaces. Current bundled frontend keeps SyntheticLib4 user access as a coming-soon explainer until production corpus access is enabled.
+- `POST /api/syntheticlib4/api-key`, `DELETE /auth` — Store/clear SyntheticLib4 API-key credentials through the active mode's secret path while live validation is pending
+- `POST /api/syntheticlib4/auth/start`, `/auth/exchange` — Production OAuth placeholders that fail clearly until SyntheticLib.com auth is live
+
 ---

 ## Error Handling

 **Credit Exhaustion:** HTTP 402 or keywords "credit"/"insufficient"/"balance"/"quota"/"key limit"/"limit exceeded" → `CreditExhaustionError` → default mode: LM Studio fallback for that role when configured; proof workflows with no fallback checkpoint progress and pause, while currently waiting tasks can be woken by `POST /api/openrouter/reset-exhaustion`; ordinary non-proof generic-mode calls still raise provider/config errors when no fallback exists. Fallback state is resettable via `POST /api/openrouter/reset-exhaustion` or by re-setting the API key.

-**OAuth provider transient failures:** gateway/timeout/stream disconnect responses are retried where the provider adapter supports it and must preserve proof checkpoints instead of becoming hard "no fallback configured" stops, `no_candidates`, or proof-attempt exhaustion. Auth/config failures remain hard; when an OAuth model call cannot recover through LM Studio fallback, broadcast a user-visible OAuth reconnect notification.
+**OAuth provider transient failures:** gateway/timeout/stream disconnect responses use bounded exponential backoff (4 retries) where the provider adapter supports it and must preserve proof checkpoints instead of becoming hard "no fallback configured" stops, `no_candidates`, or proof-attempt exhaustion. Auth/config failures remain hard; when an OAuth model call cannot recover through LM Studio fallback, broadcast a user-visible OAuth reconnect notification with a stable provider/role/reason/model `notification_key` so user-dismissed alerts do not replay after restart.
+
+**OAuth provider usage limits:** Codex `usage_limit_reached` responses carry `resets_at`/`resets_in_seconds`; `APIClientManager` tracks in-memory provider cooldown, emits one durable `oauth_provider_usage_limited` notification per cooldown window, uses LM Studio fallback when configured, otherwise raises `OAuthProviderCooldownError` so submitter/validator loops wait until reset instead of retry-spamming. Assistant OAuth refreshes reuse the latest cached/deterministic shortlist during cooldown instead of making new OAuth selection calls.

 **Boost Exhaustion:** Falls back to primary for that task; boost stays enabled; counter NOT decremented.

@@ -256,7 +297,7 @@ Workflow, boost, fallback/reset, provider pause/resume, hung-connection, rate-li

 ## Configuration Persistence

-**Secure backend storage (OS keyring — default mode):** OpenRouter global API key and Wolfram Alpha API key persist via `backend/shared/secret_store.py` using the OS keychain/keyring. The keyring service name is derived from `MOTO_SECRET_NAMESPACE`; `None` means the shared desktop service `MOTO-Autonomous-ASI`. `moto_launcher.py` MUST keep `.moto_last_instance.json` stable across normal relaunches (including `instance_id="default"`) so the backend reads the same keyring service every startup. Never let port availability, Windows `TIME_WAIT`, or Lean/LM startup timing create a new namespace for a plain relaunch.
+**Secure backend storage (OS keyring — default mode):** OpenRouter global API key and Wolfram Alpha API key persist via `backend/shared/secret_store.py` using the OS keychain/keyring. The keyring service name is derived from `MOTO_SECRET_NAMESPACE`; `None` means the shared desktop service `MOTO-Autonomous-ASI`. `moto_launcher.py` MUST keep plain/default launches on the shared default identity (`backend/data`, `backend/logs`, no keyring namespace, no frontend storage prefix, frontend port 5173) so backend keys, browser profiles/prompts, sessions, and proofs remain visible. Never let port availability, Windows `TIME_WAIT`, stale `.moto_last_instance.json`, or Lean/LM startup timing create a new namespace or data root for a plain relaunch; concurrent default launches must be blocked by active-instance state and a data-root runtime lock.

 **Startup key detection:** `backend/api/main.py` restores desktop credentials before serving `/api/openrouter/api-key-status`. Expensive optional startup work (Lean 4 warm start, Mathlib cache, etc.) must not block the FastAPI lifespan; run it in the background and clean up subprocesses on cancellation. Frontend startup state must treat unreachable key-status as `unknown`, not `has_key=false`, so it never opens the setup modal or shows a missing-key state over a persisted key.

@@ -264,6 +305,6 @@ Workflow, boost, fallback/reset, provider pause/resume, hung-connection, rate-li

 **Non-secret runtime settings:** `runtime_settings.json` under the active data root persists process-level user settings such as OpenRouter free-model looping/auto-selector and, in desktop/default mode, Lean/SMT proof runtime flags/timeouts. Generic mode keeps proof settings unavailable but may still persist non-secret free-model settings. It must never contain provider keys or prompt/response payloads.

-**localStorage:** Key families include workflow/UI preferences (`workflow_panel_collapsed`, `developerModeSettingsEnabled`, `banner_shimmer_enabled`, `startup_provider_choice`, high-score critique seen keys), settings/profile keys (`aggregatorConfig` / `aggregator_settings`, `compiler_settings`, `autonomous_research_settings`, `autonomous_research_profiles`, `leanoj_solver_settings`, `leanoj_solver_profiles`, boost modal settings), prompt helpers, and related free-model/Supercharge fields. Active app mode and tab state are not persisted; a fresh frontend mount starts on the autonomous main interface. Browser storage namespacing is driven by `VITE_MOTO_STORAGE_PREFIX`; launch/control-plane config may supply `MOTO_FRONTEND_STORAGE_PREFIX` and project it into the frontend env.
+**localStorage:** Key families include workflow/UI preferences (`workflow_panel_collapsed`, `developerModeSettingsEnabled`, `banner_shimmer_enabled`, `startup_provider_choice`, high-score critique seen keys), settings/profile keys (`aggregatorConfig` / `aggregator_settings`, `compiler_settings`, `autonomous_research_settings`, `autonomous_research_profiles`, `leanoj_solver_settings`, `leanoj_solver_profiles`, boost modal settings), bounded live-activity histories, prompt helpers, and related free-model/Supercharge fields. Persisted frontend settings must avoid secret-like fields and bulky refetchable provider metadata; quota recovery may clear refetchable caches while keeping essential settings. Active app mode and tab state are not persisted; a fresh frontend mount starts on the autonomous main interface. Browser storage namespacing is driven by `VITE_MOTO_STORAGE_PREFIX`; launch/control-plane config may supply `MOTO_FRONTEND_STORAGE_PREFIX` and project it into the frontend env.

 **Session (in-memory):** fallback state per role, boosted task IDs, boost next count, boosted categories, completed task IDs, free model manager state, and any explicit Boost override key. Boost override keys must never be persisted to `boost_state.json`; legacy plaintext keys are ignored/scrubbed on load. Boost logs and non-secret boost routing state persist under the active instance data root (`boost_api_log.txt`, `boost_state.json`) and are merged into the main API call log view. API call logs store previews/metadata by default; full prompt/response payload persistence is debug opt-in only, and provider/model error logs must report shape/status metadata instead of raw response bodies.
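The new "OAuth provider usage limits" rule in api-key-controls.mdc combines several behaviors. The provider reports a reset time, one notification is sent per cooldown window, LM Studio fallback is used when configured, and otherwise `OAuthProviderCooldownError` is raised so loops wait instead of retry-spamming. The sketch below illustrates that flow under stated assumptions; it is not the `APIClientManager` implementation. `ProviderCooldownTracker`, `record_usage_limit`, and `route_for` are hypothetical names. `resets_at` is assumed to be epoch seconds. Durable persistence of the `oauth_provider_usage_limited` notification is omitted.

```python
"""Hypothetical sketch of in-memory OAuth provider usage-limit cooldowns.

Assumptions (not taken from the MOTO codebase): class and method names are
invented, `resets_at` is epoch seconds, and notification persistence is omitted.
"""
import time
from typing import Callable, Dict, Optional


class OAuthProviderCooldownError(RuntimeError):
    """Raised while a provider is cooling down and no LM Studio fallback exists."""

    def __init__(self, provider: str, resets_at: float) -> None:
        super().__init__(f"{provider} is usage-limited until {resets_at:.0f}")
        self.provider = provider
        self.resets_at = resets_at


class ProviderCooldownTracker:
    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        # Injectable clock keeps the sketch deterministic in tests.
        self._clock = clock
        # provider id -> epoch seconds at which its usage limit resets.
        self._resets_at: Dict[str, float] = {}

    def record_usage_limit(
        self,
        provider: str,
        resets_at: Optional[float] = None,
        resets_in_seconds: Optional[float] = None,
    ) -> bool:
        """Store the cooldown window; return True only when a new window opens."""
        now = self._clock()
        if resets_at is None:
            resets_at = now + float(resets_in_seconds or 0)
        current = self._resets_at.get(provider)
        already_cooling = current is not None and now < current
        # Keep the latest reset time if the provider reports the limit again.
        self._resets_at[provider] = max(current or 0.0, float(resets_at))
        # Repeat reports inside an open window must not emit another notification.
        return not already_cooling

    def route_for(self, provider: str, has_lm_studio_fallback: bool) -> str:
        """Return which backend to call, or raise while the provider cools down."""
        resets_at = self._resets_at.get(provider)
        if resets_at is None or self._clock() >= resets_at:
            # No active window (or it expired): call the OAuth provider normally.
            self._resets_at.pop(provider, None)
            return provider
        if has_lm_studio_fallback:
            return "lm_studio"
        raise OAuthProviderCooldownError(provider, resets_at)
```

In this sketch, a caller emits the user-visible notification only when `record_usage_limit` returns True. A submitter or validator loop that catches `OAuthProviderCooldownError` can sleep until `exc.resets_at` rather than issuing new calls.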
diff --git a/.cursor/rules/hosted-web-contract.mdc b/.cursor/rules/hosted-web-contract.mdc
index c4e221c..d443880 100644
--- a/.cursor/rules/hosted-web-contract.mdc
+++ b/.cursor/rules/hosted-web-contract.mdc
@@ -43,11 +43,12 @@ When `False`: program behaves as the existing open-source desktop release. When
 2. **`rag_manager.py`** — generic mode skips global RAG lock for embedding calls (FastEmbed is in-process/thread-safe); ChromaDB write locking remains in both modes; synchronous ChromaDB calls and CPU-heavy RAG scoring must run off the FastAPI event loop
 3. **`main.py` lifespan** — generic mode skips LM Studio connection test; auto-loads `OPENROUTER_API_KEY` from env if present
 4. **`openrouter.py` LM Studio availability** — generic mode returns `{available: false, generic_mode: true}` without pinging LM Studio; workflow inference paths are OpenRouter-only even if hidden legacy diagnostics still exist. Compiler model diagnostics (`POST /api/compiler/test-models`) return unavailable in generic mode instead of touching LM Studio.
-4b. **`cloud_access.py` desktop OAuth providers** — desktop/default mode only; generic mode returns unavailable for OAuth login/model routes until hosted callback/proxy login is explicitly designed. Default-mode OAuth model listings normalize provider catalog context/output fields for UI auto-fill; Codex product limits are distinct from regular OpenAI API limits, while xAI Grok/SuperGrok uses xAI subscription model metadata, filters catalog entries not accepted by the OAuth chat-completions route, and keeps loopback-only callback binding.
+4b. **`cloud_access.py` desktop OAuth providers** — desktop/default mode only; generic mode returns unavailable for OAuth login/model routes until hosted callback/proxy login is explicitly designed. Hosted/generic settings should not poll desktop OAuth status/model endpoints when `/api/features` reports OAuth unavailable. Default-mode OAuth model listings normalize provider catalog context/output fields for UI auto-fill; Codex product limits are distinct from regular OpenAI API limits, while xAI Grok/SuperGrok uses xAI subscription model metadata, filters catalog entries not accepted by the OAuth chat-completions route, and keeps loopback-only callback binding.
 5. **`download.py` PDF** — generic mode returns `501` (hosted Chromium browser runtime is not installed)
 6. **Frontend** — calls `GET /api/features` on mount; when `generic_mode=True`, hosted clients hide LM Studio UI and default everything to OpenRouter. The bundled desktop frontend may still surface backend capability errors for desktop-only features.
 7. **`middleware.py` + `websocket.py`** — generic mode validates internal proxy auth (`X-Moto-*` signed headers) on all non-allowlisted routes
 8. **Long-running workflow isolation** — research/proof/RAG/Lean jobs may run in background tasks, but must not block the FastAPI event loop that serves GUI/status/health/API-key routes
+9. **SyntheticLib4 / unified proof search** — generic mode keeps Lean execution gated as today, but authorized corpus search may still work through local snapshots or hosted SyntheticLib4 APIs when configured. SyntheticLib4 secrets follow generic-mode in-memory/env handling and must not be written to desktop keyring, runtime settings, prompts, API logs, or snapshot files.

 ## Instance-Scoped Runtime Contract (Both Modes)

@@ -90,12 +91,15 @@ Build 0 lands the public identity subset first. Returns:
   "pdf_download_available": bool,
   "openai_codex_oauth_available": bool,
   "xai_grok_oauth_available": bool,
+  "sakana_fugu_available": bool,
 }
 ```

 The current Build 5 runtime preserves the four identity fields while exposing the stable capability flags above. `proof_downshifted` is a proof workflow event for Lean-accepted proofs preserved under a narrower actual theorem statement, not a `/api/features` field. Later hosted work may extend `/api/features` with additional capability flags such as `max_submitters` and `tier3_available`, but the existing fields above remain stable and `api_contract_version` must bump when that happens.
Later hosted work may extend `/api/features` with additional capability flags such as `max_submitters` and `tier3_available`, but the existing fields above remain stable and `api_contract_version` must bump when that happens. -Build 5 v30 updates desktop xAI Grok/SuperGrok OAuth to the current `auth.x.ai` PKCE endpoints, includes Grok API scopes, filters xAI catalog entries that are not chat-completions models, and clarifies that xAI Console API keys are separate from subscription-backed OAuth and may consume xAI API credits. Build 5 v29 adds desktop xAI Grok/SuperGrok OAuth as a second registry-backed OAuth provider (`xai_grok_oauth`) and exposes `xai_grok_oauth_available`; the frontend labels the OAuth provider area as `oAuth` and selects among configured OAuth providers without changing saved profile shape. Build 5 v28 makes compiler model diagnostics unavailable in generic mode so hosted requests cannot ping LM Studio. Build 5 v27 adds non-secret Codex OAuth token `updated_at` status metadata so relogin flows can distinguish a newly completed OAuth callback from an older saved token, and Codex UI selection is gated on model-list availability. Build 5 v26 adds `openai_codex_oauth_error` WebSocket notifications for unrecoverable desktop Codex OAuth model-call failures. Build 5 v25 adds `scope=manual` to proof-library browsing routes so archived manual proof runs can be viewed separately from the active manual proof context; manual clear/reset rejects while manual proof verification is active. Build 5 v24 adds `scope=autonomous|manual` to current proof listing, dependency, graph, and certificate routes so manual-writer proofs stay in an instance-level manual proof store instead of the autonomous session store. Build 5 v23 adds autonomous proof-round progress fields (`proof_round_index`, `proof_max_rounds`) to proof checkpoint WebSocket events. 
Build 5 v22 added run-level `allow_mathematical_proofs` / `allow_research_papers` start fields to Autonomous Research and Single Paper Writer. At least one must be true. Generic mode still keeps proof tooling unavailable: proof-only starts must fail clearly, while both-enabled/papers-only hosted runs must not invoke Lean/Z3 even if a client sends proof output enabled. +Current Build 5 contract notes: v53 includes desktop-only Sakana Fugu direct subscription API access (`sakana_fugu_available`, `/api/cloud-access/sakana-fugu/*`, Responses-first generation with chat-completions fallback, and `sakana_fugu_error` notifications), Codex OAuth `usage_limit_reached` cooldown metadata/notifications, and Assistant proof-pack OAuth-cooldown selection modes (`cached_oauth_cooldown`, `deterministic_oauth_cooldown`). v52 includes durable internal Assistant proof-memory cooldown/shutdown/no-history WebSocket events (`assistant_proof_memory_unavailable`, `assistant_proof_memory_cooldown`, `assistant_proof_memory_shutdown`), but user live activity should display only normal Assistant retrieval result logs such as `assistant_proof_pack_updated`. + +Earlier Build 5 contract additions include connectivity toggles/status, SyntheticLib4 and local proof-search routes, scoped proof/manual-history routes, provider/OAuth notification recovery, manual prompt recovery, proof-output allowed-output fields, and direct-context overflow events. Keep exact legacy details in code/tests/git history rather than expanding this always-injected rule. Must remain capability-only. Must NOT expose per-user or per-instance state (e.g. whether an OpenRouter key or OAuth login is set). @@ -154,6 +158,7 @@ Sandbox is API-only. 
The MOTO React frontend is NOT served from the hosted sandb - Hosted: `MOTO_DATA_ROOT=/app/backend/data` so Blaxel storage mounts to one unambiguous path - Non-secret `runtime_settings.json` also lives under the active data root; it may persist runtime knobs, never provider keys or prompt/response payloads - ChromaDB SQLite files stay on Blaxel sandbox storage (local file semantics required) +- SyntheticLib4 snapshots and unified proof-search indexes are data-root artifacts when implemented. They must be activated only after manifest/hash/index checks pass, and a failed refresh must leave the previous active snapshot usable. - Sandbox recall/resume returns the same filesystem state; redeploy/recreate advances to the newest image - Uploads: server-side enforcement of `.txt` only, 5 MB max, filename sanitization, path traversal rejection @@ -161,6 +166,7 @@ Sandbox is API-only. The MOTO React frontend is NOT served from the hosted sandb - **Authoritative update source**: GitHub `main` branch (not GitHub Releases) - **Desktop**: launcher compares local build metadata against GitHub `main`. Remote update identity resolves from GitHub branch HEAD via the GitHub REST API, metadata uses the REST contents API instead of raw GitHub files, ZIP overlays write the resolved manifest after apply to avoid stale committed-manifest loops, and update notices are exposed via `GET /api/update-notice`; if no launcher notice exists, the running desktop backend may refresh the same notice at most every 4 hours while excluding only the current instance from active-instance auto-apply checks. Launcher auto-apply is only for clean `origin/main` git checkouts or ZIP/extracted installs with no launcher-managed instances still running. The backend `POST /api/update/pull` route has its own lighter git/ZIP update checks and should not be described as enforcing the full launcher preflight. 
ZIP updates preserve active data/log roots, instance storage, launcher state, env files, and keyring-related namespaces. +- Desktop launchers run frontend `npm audit fix` when `npm install` reports vulnerabilities, including npm's vulnerability-count/remediation-instruction output. This remediation is a permanent launcher invariant: agents must never remove, disable, weaken, or bypass the rule or code, and if it is accidentally removed or broken, they must restore it immediately with no exceptions before continuing launcher/updater work. - **Hosted**: sandboxes do NOT self-mutate. Redeploy/recreate uses the latest approved `main`-derived image. Recall/resume keeps the existing image. Hosted `POST /api/update/pull` must return unavailable instead of attempting in-place update; `GET /api/update/pull-status` may remain the generic pull-task status surface and does not need a separate hosted-unavailable marker. - **Build metadata**: `version`, `build_commit`, `update_channel`, and `api_contract_version` exposed via `/api/features`; git checkouts resolve `build_commit` from HEAD, GitHub-generated ZIP installs prefer `.git_archival.txt` export-subst commit metadata before the stamped local manifest, and the committed `main`-branch manifest lives at `moto-update-manifest.json` @@ -189,9 +195,11 @@ Lean 4 and SMT behavior is gated by runtime flags. `lean4_enabled` gates Lean pr - **LeanOJ routes** are additive to the hosted REST contract: start (which resumes matching saved progress when available), stop, status, clear, skip-brainstorm, force-brainstorm, master-proof draft/edit summaries, current-run proofs, and cross-session proof library endpoints live under `/api/leanoj/*`. 
- **Creativity Emphasis Boost** is an optional developer-gated start-request field (`creativity_emphasis_boost_enabled`) for Aggregator, Autonomous Research, and LeanOJ; accepted/rejected brainstorm WebSocket payloads may include `creativity_emphasized`, and prompt-budget overflow falls back to the normal prompt for that slot. - **Pruned Stage 2 paper routes** are additive: pruned papers are removed from model context/RAG but remain downloadable under `/api/auto-research/paper-history/pruned*`; hard deletion is limited to explicit delete-all-pruned endpoints. -- **WebSocket progress events** for LeanOJ, compiler critique, provider/OAuth failures, and proof workflows are part of the web-surface contract only when consumed by the hosted wrapper or frontend. Keep them descriptive and stable enough for UI state, but avoid treating every internal progress notification as a permanent rule-level invariant. OAuth provider failures tell desktop users to reconnect the provider in Cloud Access & Keys after unrecoverable auth/model-call failure. Autonomous proof checkpoint progress events may include `proof_round_index` and `proof_max_rounds`; `proof_verified` must only emit after proof registration/reuse and include `proof_id`; proof novelty and duplicate-registration events include `novelty_tier` and `novelty_reasoning` for live activity display. Manual proof events must remain isolated from autonomous proof activity/notifications/graphs unless event payloads become explicitly scoped. +- **WebSocket progress events** for LeanOJ, compiler critique, provider/OAuth failures, and proof workflows are part of the web-surface contract only when consumed by the hosted wrapper or frontend. Keep them descriptive and stable enough for UI state, but avoid treating every internal progress notification as a permanent rule-level invariant. OAuth provider failures tell desktop users to reconnect the provider in OpenRouter/OAuth after unrecoverable auth/model-call failure. 
Autonomous proof checkpoint progress events may include `proof_round_index` and `proof_max_rounds`; `proof_verified` must only emit after proof registration/reuse and include `proof_id`; proof novelty and duplicate-registration events include `novelty_tier` and `novelty_reasoning` for live activity display. Manual proof events must remain isolated from autonomous proof activity/notifications/graphs unless event payloads become explicitly scoped. - **Proof certificate exports stay text-based** (`.lean` source + JSON metadata). No binary-only proof artifacts. - **Proof runtime config snapshot** (`ProofRuntimeConfigSnapshot`) is persisted via `research_metadata` and may also be supplied directly on manual `POST /api/proofs/check`; required state is `lean4_enabled=True` AND either a stored or request-provided snapshot. +- **SyntheticLib4 / MOTO proof search** is additive proof-history/corpus navigation. The shared `search_lean_proofs` tool adapter returns bounded retrieved proof context and provenance; it must not bypass MOTO's proof registration, Lean/integrity gates for MOTO-generated artifacts, or existing proof-runtime disablement in hosted generic mode. +- **Assistant memory-support role** is an additive workflow/settings surface. It is one shared non-blocking LLM role per workflow surface, not a per-lane submitter clone; validators never receive Assistant context and parent workflows never wait for it. Assistant may provide up to 7 verified proof supports, reuse useful packs for two eligible receiver reads before refresh, and skip true no-external-history targets with `assistant_proof_memory_unavailable` because it only performs proof-memory retrieval for now. 
Durable cooldown is keyed to stable run scope, grouping transient task IDs/roles while keeping real source/session IDs separate; it emits internal `assistant_proof_memory_cooldown` during zero-useful or stagnant backoff and internal `assistant_proof_memory_shutdown` only when repeated zero-useful retrieval disables retrieval for the run scope. Stagnant same-pack retrieval must not shut down. User live activity should not show skip/backoff/shutdown turns; show only normal Assistant retrieval result summaries. Session History Memory disabled disables Assistant and prevents stale pack injection. REST/WebSocket/profile schema additions for Assistant require this contract, `/openapi.json`, and `api_contract_version` to update in the same merge. - **`api_contract_version` bumps** apply the same way to proof additions as to the base contract: any new proof route or event added after Build 5 must bump the contract version in the same merge. ## Hosting Ownership diff --git a/.cursor/rules/json-prompt-design.mdc b/.cursor/rules/json-prompt-design.mdc index a45d51c..c03be3c 100644 --- a/.cursor/rules/json-prompt-design.mdc +++ b/.cursor/rules/json-prompt-design.mdc @@ -165,8 +165,9 @@ CORRECT RESPONSE: - Improve validator rigor (currently lacks evaluation depth) - Maintain existing prompt assembly order: System → JSON Schema → User Prompt → Context → RAG → Final Instruction - **MATH VARIANT**: Mathematical theorem/exposition validation focuses on rigor, logical correctness, and established mathematical principles rather than mandatory citation format. Empirical, artifact, and literature claims still require explicit support/citations or conservative wording where compiler validators enforce claim provenance. Models with web search capabilities are encouraged to use them for verification. -- **Proof Candidate JSON Contract**: Automated proof identification is novelty-first, not a known-knowledge-base builder. 
Candidate JSON must use `{"has_provable_theorems": bool, "theorems": [{"theorem_id": str, "statement": str, "formal_sketch": str, "expected_novelty_tier": "major_mathematical_discovery|mathematical_discovery|novel_variant|novel_formulation", "prompt_relevance_rationale": str, "novelty_rationale": str, "why_not_standard_known_result": str}]}`. Every automated proof JSON prompt must treat the USER RESEARCH PROMPT as the primary filter; bounded source-title/brainstorm-topic metadata is context only. Order candidates by direct prompt-solving value first, then prompt-solving discoveries/variants/formalizations that are absent from standard references/Mathlib and independently publishable/citable, then only necessary supporting lemmas for those targets. The three rationale fields must be non-empty or the candidate is skipped before Lean cost. Reject routine helpers, standard/textbook/Mathlib restatements, program-local firsts, single-tactic/routine proof goals, and general verified background-library entries. Never impose an artificial theorem-count cap unless explicitly requested; user-configurable proof concurrency batching limits simultaneous attempts only and must not truncate identified candidates. -- **Optional `lean_proof` Submission Contract**: Aggregator and LeanOJ brainstorm submitters that choose `submission_type="lean_proof"` should include the same novelty fields (`expected_novelty_tier`, `prompt_relevance_rationale`, `novelty_rationale`, `why_not_standard_known_result`) as ranking context. Submitter-side novelty means public/citable prompt-relevant novelty absent from standard references or Mathlib; program-local firsts do not qualify. The shared Lean proof gate may reject malformed submissions, failed Lean attempts, placeholders, or fake proof devices, but once Lean accepts real proof code it must preserve/register the artifact and let novelty/triviality ranking decide context retention, even for not-novel or downshifted supporting lemmas. 
In LeanOJ brainstorm flow, a proof-gated `lean_proof` submission is preserved/accepted once Lean and integrity checks pass; validator feedback can classify usefulness/context role but does not veto the verified artifact. LeanOJ final master-proof editing is template-solution-first and may use standard facts inline when they directly solve current obligations, but it must not accumulate a general known-knowledge library in `master_proof.lean`. +- **Proof Candidate JSON Contract**: Automated proof identification is impact-first, not a known-knowledge-base builder. Candidate JSON must use `{"has_provable_theorems": bool, "theorems": [{"theorem_id": str, "statement": str, "formal_sketch": str, "expected_novelty_tier": "major_mathematical_discovery|mathematical_discovery|novel_variant|novel_formulation", "prompt_relevance_rationale": str, "novelty_rationale": str, "why_not_standard_known_result": str}]}`. Every automated proof JSON prompt must treat the USER RESEARCH PROMPT as the primary filter; bounded source-title/brainstorm-topic metadata is context only. Order candidates by direct impact on the user's prompt: direct solutions or impossibility results first, then decisive reductions, obstructions, and structural theorems that themselves make major progress. The three rationale fields must be non-empty or the candidate is skipped before Lean cost. Reject supporting lemmas, routine helpers, standard/textbook/Mathlib restatements, program-local firsts, minor reformulations/local formalizations, trivial/easy proofs, single-tactic/routine proof goals, and general verified background-library entries. Never impose an artificial theorem-count cap unless explicitly requested; user-configurable proof concurrency batching limits simultaneous attempts only and must not truncate identified candidates. 
+- **Assistant Proof-Support Pack Contract**: Assistant is a separate shared non-blocking proof-support role for the active workflow, not a per-submitter role, proof candidate identifier, or validator. Assistant prompt/output schemas should keep the final model-visible pack capped at 7 fully Lean-verified proof supports by default, each with source/provenance, theorem statement/name, relevance/transfer reason, dependencies/imports, code/hash metadata, and target/freshness metadata. Useful packs are reused for two eligible receiver reads before refresh. Assistant skips true no-external-history targets because it only performs proof-memory retrieval for now. Durable run-scoped cooldown backs off repeated zero-useful retrieval and may shut off Assistant for that run; repeated stagnant same-pack retrieval backs off without shutdown. User live activity should show normal Assistant retrieval result summaries only, not skip/backoff/shutdown turns. Assistant may use current solver feedback/rejections/Lean errors as query material, but outputs must state when support is stale/cached and must not broaden proof-candidate eligibility beyond the user's prompt. Partial/failed artifacts are excluded from the final support pack unless a future explicit obstruction-debug mode says otherwise. +- **Optional `lean_proof` Submission Contract**: Aggregator and LeanOJ brainstorm submitters that choose `submission_type="lean_proof"` should include the same novelty fields (`expected_novelty_tier`, `prompt_relevance_rationale`, `novelty_rationale`, `why_not_standard_known_result`) as ranking context. Submitter-side novelty means public/citable prompt-relevant novelty absent from standard references or Mathlib; program-local firsts do not qualify. 
The shared Lean proof gate rejects missing/invalid novelty tiers and missing prompt-relevance/novelty/anti-standard-result rationales before Lean cost, in addition to malformed submissions, failed Lean attempts, placeholders, and fake proof devices. Once Lean accepts real proof code, preserve/register the artifact and let novelty/triviality ranking decide context retention, even for not-novel or downshifted actual-theorem artifacts. This preservation rule is not permission to target supporting lemmas. Normal Aggregator validation may reject Lean-verified proof artifacts from the accepted-submission database when the actual theorem is low-impact, trivial, routine, or redundant. LeanOJ brainstorm flow is template-solution-specific: proof-gated `lean_proof` artifacts may be accepted only when useful for exact template obligations. LeanOJ final master-proof editing is template-solution-first and may use standard facts inline when they directly solve current obligations, but it must not accumulate a general known-knowledge library in `master_proof.lean`. - **Compiler Outline Injection**: The compiler outline is always fully injected (never truncated, never RAGed) for all modes because it provides the structural framework for document construction and validation. - **TEMPERATURE POLICY**: Default all prompts to `temperature=0.0`. Only two exceptions are allowed: Supercharge candidate attempts and parallel brainstorm submitter lanes. Validators, compiler roles, proof/final roles, and JSON retries must stay deterministic. - **Supercharge Schema Preservation**: Per-role Supercharge calls generate 4 full answer attempts plus a 5th synthesis answer. Candidate attempts must be sanitized to reusable visible answer text before the 5th call; private thought/channel/control transcript text must never be fed into synthesis, retries, feedback memory, accepted memory, or RAG. 
The synthesis prompt must place the final instruction after the candidate block, treat candidates as optional working material, and preserve the original task's exact output contract; if the original role expects JSON, the 5th answer must output only valid JSON in that same schema and must not mention Supercharge or candidate attempts. @@ -189,7 +190,7 @@ CORRECT RESPONSE: - **LeanOJ Proof Validation Boundary**: Lean 4 is authoritative formal checking for LeanOJ success, but LLM validators still gate planning decisions, Lean-accepted subproof relevance, and final semantic review. A compiled subproof must not be stored as verified run context unless it matches the requested subproof/role; a compiled final solution must not stop the run unless it preserves the template and the Final Proof Solver confirms it solves the actual prompt rather than a formal loophole. - **Aggregator Submitter JSON Retry**: Aggregator submitter retries malformed/non-JSON responses through its standard conversational JSON/LaTeX escaping repair path. The retry preserves sanitized visible failed-output context when useful, but parser exception text inserted into prompts must not replay raw provider output. - **Standard LaTeX-Focused Retry**: Retry prompts explain HOW to escape LaTeX properly. **LaTeX IS allowed** - just escape backslashes once (`\mathbb` → `\\mathbb`). DO NOT double-escape. For `old_string`: copy EXACTLY from document, just escape backslashes. -- **Retry Context Overflow Prevention (CRITICAL)**: Sanitize failed output, then truncate to ~2000 chars before retry. Parser exception messages that are inserted into retry prompts must report failure type/structure only and must not include raw output excerpts. Calculate if retry fits context window. Fall back to simple re-prompt if too large. Set `max_tokens` explicitly (never `None`). NEVER auto-increase beyond user limits. 
Applies to: `submitter.py`, `validator.py`, `high_context_submitter.py`, `high_param_submitter.py`, `compiler_validator.py`. +- **Retry Context Overflow Prevention (CRITICAL)**: Sanitize failed output, then truncate to ~2000 chars before retry. Parser exception messages that are inserted into retry prompts must report failure type/structure only and must not include raw output excerpts. Calculate if retry fits context window. Fall back to simple re-prompt if too large. Set `max_tokens` explicitly (never `None`). NEVER auto-increase beyond user limits. Applies to: `submitter.py`, `validator.py`, `writer_submitter.py`, `high_param_submitter.py`, `compiler_validator.py`. ## Internal Content Warning (Required in Most Research/Writing Prompts) @@ -288,8 +289,7 @@ VALIDATION DECISION RULES: A submission should be ACCEPTED if it: 1. Aggressively attacks the user's WHOLE question as stated, no partial solutions, OR 2. Addresses the next best necessary piece when a whole-question attack is absolutely not possible in one superintelligence brainstorm, OR -3. Offers rigorous enabling insights not present in existing accepted submissions when a stronger direct step is not yet available, OR -4. Presents rigorous mathematical arguments based on established principles +3. Offers rigorous enabling insights only when they materially strengthen a direct route to the full answer and no stronger direct step is available A submission should be REJECTED if it: 1. Is redundant with the existing accepted submissions @@ -344,10 +344,10 @@ REASONS FOR REMOVAL - A submission should be removed if it: 5. Contains claims that CONFLICT with established mathematical principles evident in other submissions REASONS TO KEEP - A submission should be kept if it: -1. Provides ANY unique information not covered elsewhere -2. Offers a different perspective or approach even if related to other content -3. Contains specific mathematical details, proofs, or techniques -4. 
Contributes to solution diversity in any meaningful way +1. Provides unique information that materially strengthens a direct route to the user's full prompt +2. Offers a different perspective or approach that materially improves the best direct solution path +3. Contains specific mathematical details, proofs, or techniques necessary for direct prompt progress +4. Contributes to solution diversity only when that diversity improves credible direct-answer progress CONSERVATIVE APPROACH: - When in doubt, DO NOT recommend removal @@ -497,8 +497,7 @@ VALIDATION DECISION RULES (for each submission): A submission should be ACCEPTED if it: 1. Aggressively attacks the user's WHOLE question as stated, no partial solutions, OR 2. Addresses the next best necessary piece when a whole-question attack is absolutely not possible in one superintelligence brainstorm, OR -3. Offers rigorous enabling insights not present in existing accepted submissions when a stronger direct step is not yet available, OR -4. Presents rigorous mathematical arguments based on established principles +3. Offers rigorous enabling insights only when they materially strengthen a direct route to the full answer and no stronger direct step is available A submission should be REJECTED if it: 1. Is redundant with the existing accepted submissions @@ -651,8 +650,7 @@ VALIDATION DECISION RULES (for each submission): A submission should be ACCEPTED if it: 1. Aggressively attacks the user's WHOLE question as stated, no partial solutions, OR 2. Addresses the next best necessary piece when a whole-question attack is absolutely not possible in one superintelligence brainstorm, OR -3. Offers rigorous enabling insights not present in existing accepted submissions when a stronger direct step is not yet available, OR -4. Presents rigorous mathematical arguments based on established principles +3. 
Offers rigorous enabling insights only when they materially strengthen a direct route to the full answer and no stronger direct step is available A submission should be REJECTED if it: 1. Is redundant with the existing accepted submissions @@ -1066,7 +1064,7 @@ REJECT if: Any criterion fails (especially duplicate section/subsection headers --- -## 4. COMPILER-SUBMITTER CONSTRUCTION PROMPTS (PHASE-BASED) +## 4. WRITING SUBMITTER CONSTRUCTION PROMPTS (PHASE-BASED) **File:** `backend/compiler/prompts/construction_prompts.py` @@ -1406,7 +1404,7 @@ During autonomous paper compilation, the construction JSON includes an optional **Independent Validity Principle**: Each operation must be justified on its own merits. Paper content must not depend on a brainstorm correction. Brainstorm corrections must not depend on paper content. -**Models**: `BrainstormRetroactiveOperation` in `models.py`. Parsed in `high_context_submitter.py`. Handled in `compiler_coordinator._handle_brainstorm_retroactive_operation()`. Validated in `compiler_validator.validate_brainstorm_operation()`. +**Models**: `BrainstormRetroactiveOperation` in `models.py`. Parsed in `writer_submitter.py`. Handled in `compiler_coordinator._handle_brainstorm_retroactive_operation()`. Validated in `compiler_validator.validate_brainstorm_operation()`. --- @@ -1690,7 +1688,7 @@ Output your response ONLY as JSON in this exact format: ## 8. CRITIQUE & SELF-REVIEW PHASE (POST-BODY CONSTRUCTION) -**File:** `backend/compiler/prompts/critique_prompts.py` +**Owner:** Rigor & Proofs Submitter generation (`backend/compiler/agents/high_param_submitter.py`) with Validator-owned acceptance/cleanup. ### Overview @@ -1699,12 +1697,11 @@ After the body section is complete (before conclusion), the system enters a **Cr ### Workflow 1. 
**Critique Aggregation** (3 total attempts required): - - Single critique submitter generates peer review feedback on body section + - Rigor & Proofs Submitter generates peer review feedback on body section; legacy critique role fields are compatibility aliases - **Decline Mechanism**: Submitter can assess "no critique needed" when body is academically acceptable (counts toward 3 total attempts) - Validator validates critiques/declines (accept/reject with feedback loop) - - Pruning occurs on the child Aggregator's run-local 7-acceptance cleanup cadence - Target: 3 total attempts (accepted + rejected + declined attempts) - - Uses aggregator workflow with critique-specific prompts + - Runs inside the compiler critique phase, not as a separate Aggregator workflow 2. **Self-Review Append**: - If at least 1 critique is accepted: append accepted critiques as `AI Self-Review and Limitations` @@ -1738,110 +1735,14 @@ After the body section is complete (before conclusion), the system enters a **Cr - If accepted critiques exist: append `AI Self-Review and Limitations`, then transition to conclusion - Rationale: With only 3 attempts, no early termination mechanism is needed -### Complete Prompt Structure - Critique Generation - -**Function:** `get_critique_submitter_system_prompt()` - -```python -def get_critique_submitter_system_prompt() -> str: - return """You are a peer reviewer generating constructive criticism of a mathematical document's body section. - -[... INTERNAL CONTENT WARNING ...] - -YOUR TASK: -Identify specific issues, errors, gaps, or improvements needed in the body section. 
- -WHAT TO CRITIQUE: -- Mathematical errors or unsound reasoning -- Missing proofs or incomplete arguments -- Logical gaps or unclear transitions -- Redundancy or verbosity -- Structural issues (sections out of order) -- Missing content per outline -- Content misaligned with paper title - -WHAT NOT TO CRITIQUE: -- Conclusion/intro/abstract (not written yet) -- Stylistic preferences -- Minor formatting issues - -Output as JSON: -{ - "critique_needed": true or false, - "submission": "Detailed critique (empty string if critique_needed=false)", - "reasoning": "Why critique is/isn't needed" -} - -Examples: -- Critique needed: {"critique_needed": true, "submission": "Section III has flawed proof...", "reasoning": "Critical error"} -- Decline: {"critique_needed": false, "submission": "", "reasoning": "Body is academically acceptable, no substantive issues"} - (Note: Counts as 1 attempt toward 10 total) -""" -``` - -### Complete Prompt Structure - Critique Validation - -**Function:** `get_critique_validator_system_prompt()` - -```python -def get_critique_validator_system_prompt() -> str: - return """You are validating peer review critiques. - -[... INTERNAL CONTENT WARNING ...] - -YOUR TASK: -Decide if critique identifies legitimate issue that would improve the paper. - -ACCEPT if: -- Identifies real mathematical error -- Points out missing content per outline -- Identifies structural issues -- Is specific and actionable - -REJECT if: -- Vague or unhelpful -- Redundant with existing critiques -- Stylistic preference -- Incorrect (body is fine) - -Output as JSON: -{ - "decision": "accept or reject", - "reasoning": "Detailed explanation", - "summary": "Brief summary if rejected (max 750 chars)" -} -""" -``` - -### Assembly in `build_critique_prompt()` - -```python -# Parts assembled in order: -1. get_critique_submitter_system_prompt() -2. "\n---\n" -3. get_critique_json_schema() -4. "\n---\n" -5. f"USER COMPILER-DIRECTING PROMPT:\n{user_prompt}" # ALWAYS direct -6. 
"\n---\n" -7. f"CURRENT OUTLINE:\n{current_outline}" # ALWAYS fully injected -8. "\n---\n" -9. f"CURRENT BODY SECTION (to critique):\n{current_body}" # Direct if fits, RAG if large -10. "\n---\n" -11. f"AGGREGATOR DATABASE:\n{aggregator_db}" # RAG retrieved -12. if reference_papers: "\n---\nREFERENCE PAPERS:\n{reference_papers}" -13. if existing_critiques: "\n---\nEXISTING ACCEPTED CRITIQUES:\n{existing_critiques}" -14. if rejection_feedback: "\n---\nYOUR LAST 5 REJECTIONS (Learn from these):\n{rejection_feedback}" -15. "\n---\n" -16. "Now generate your critique as JSON:" -``` - ### JSON Schemas **Critique Submission:** ```json { - "submission": "Detailed critique of specific issue", - "reasoning": "Why this is important" + "critique_needed": true, + "submission": "Specific critique, or empty string when critique_needed=false", + "reasoning": "Why critique is or is not needed" } ``` @@ -1869,9 +1770,9 @@ Output as JSON: The rigor loop no longer edits paper text directly during discovery/formalization. Each rigor cycle runs four stages, with the coordinator owning inline validator attempts and appendix routing: **Stage 1: Theorem discovery (unvalidated)** — `build_rigor_theorem_discovery_prompt` -- High-param submitter reads the full writing context (outline direct-injected, paper direct-injected when it fits, RAG for the rest per the offload priority excluding `compiler_outline.txt` + `compiler_paper.txt`). +- Rigor & Proofs Submitter reads the full writing context (outline direct-injected, paper direct-injected when it fits, RAG for the rest per the offload priority excluding `compiler_outline.txt` + `compiler_paper.txt`). - Sees `EXISTING VERIFIED PROOFS` block (from `proof_database.get_all_proofs()`) so it does not re-propose already-verified theorems. -- Sees `OPEN LEMMA TARGETS` block (from `proof_database.get_recent_failure_hints()`) as optional retry candidates. 
+- Sees `OPEN PROOF TARGETS` block (from `proof_database.get_recent_failure_hints()`) as optional retry candidates for the same high-impact target. - Decides whether a user-prompt-relevant theorem is worth attempting. Decline ends the rigor cycle. - Discovery is explicitly allowed to construct extension theorems from partial paper work, the current outline, supporting context, or the user prompt when helpful to paper construction and/or the user's goal. It is not limited to exact claims already present in the current paper. - Discovery must classify `theorem_origin` as `existing_paper_claim`, `extension_from_partial_work`, or `extension_from_user_prompt`, and must set `placement_preference` to `inline` or `appendix_only`. Extension-derived theorems must use `appendix_only`. @@ -1879,7 +1780,7 @@ The rigor loop no longer edits paper text directly during discovery/formalizatio **Stage 2: Lean 4 formalization** — compiler rigor uses the serial `ProofFormalizationAgent.prove_candidate(max_attempts=5)` path; autonomous proof verification has its own parallel Phase-A pipeline with full-script plus tactic-script attempts - Up to 5 Lean 4 attempts with error-feedback chaining (failing tactic + goal states + raw Lean diagnostics fed back into each retry). - Emits proof progress events with `source_type="compiler_rigor"` so the existing autonomous-mode proof UI can display the flow. Keep frontend-consumed event names stable; `proof_verified` is reserved for the registered/stored proof event. -- All-5-fail: candidate is recorded via `proof_database.record_failed_candidate` (becomes a future open lemma target) and the cycle ends as a decline. +- All-5-fail: candidate is recorded via `proof_database.record_failed_candidate` (becomes a future open high-impact proof target) and the cycle ends as a decline. 
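The Stage 2 retry contract above (up to 5 Lean attempts, failure details chained into each retry, all-fail recorded as an open target) can be sketched in Python. This is a minimal illustration, not the backend implementation. `generate_lean`, `check_lean`, `record_failed_candidate`, `LeanResult`, and `prove_candidate_sketch` are hypothetical stand-ins for the real `ProofFormalizationAgent.prove_candidate(max_attempts=5)` internals and `proof_database.record_failed_candidate`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LeanResult:
    """Hypothetical result of one Lean 4 check (not the real backend type)."""
    ok: bool
    failing_tactic: str = ""
    goal_states: str = ""
    diagnostics: str = ""


@dataclass
class FormalizationOutcome:
    verified_code: Optional[str]
    attempts_used: int
    feedback_history: List[str] = field(default_factory=list)


def prove_candidate_sketch(
    statement: str,
    generate_lean: Callable[[str, List[str]], str],
    check_lean: Callable[[str], LeanResult],
    record_failed_candidate: Callable[[str, List[str]], None],
    max_attempts: int = 5,
) -> FormalizationOutcome:
    """Serial retry loop with error-feedback chaining (illustrative sketch)."""
    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):
        code = generate_lean(statement, feedback)
        result = check_lean(code)
        if result.ok:
            return FormalizationOutcome(code, attempt, feedback)
        # Chain failing tactic, goal states, and raw diagnostics into the next attempt.
        feedback.append(
            f"attempt {attempt}: tactic={result.failing_tactic!r}; "
            f"goals={result.goal_states!r}; lean={result.diagnostics!r}"
        )
    # All attempts failed: keep the candidate as a future open proof target; the cycle declines.
    record_failed_candidate(statement, feedback)
    return FormalizationOutcome(None, max_attempts, feedback)
```

Injecting the model call, Lean check, and failure recorder as callables keeps the loop testable without a Lean toolchain.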
**Stage 3: Post-Lean integrity + novelty classification + persistence** — shared `validate_full_lean_proof_integrity` helper from `backend/shared/lean_proof_integrity.py`, then shared `assess_proof_novelty` helper from `backend/autonomous/core/proof_novelty.py` - Rejects Lean-accepted proofs that introduce new fake proof devices (`axiom`, `constant`, `opaque`) not present in the source context. @@ -1906,7 +1807,7 @@ The rigor loop no longer edits paper text directly during discovery/formalizatio "source_excerpt": "2-6 sentences of motivating paper/outline/context/user-prompt basis", "theorem_origin": "existing_paper_claim | extension_from_partial_work | extension_from_user_prompt", "placement_preference": "inline | appendix_only", - "retry_existing_failure_id": "theorem_id from OPEN LEMMA TARGETS if retrying, empty otherwise", + "retry_existing_failure_id": "theorem_id from OPEN PROOF TARGETS if retrying, empty otherwise", "reasoning": "why this theorem is the best target right now OR why no theorem" } ``` @@ -1971,7 +1872,7 @@ Lean 4 proof: - Paper: direct-injected when it fits; otherwise RAG'd under `mode="rigor"` excluding `compiler_outline.txt` + `compiler_paper.txt`. - RAG evidence: follows the offload priority (Shared Training DB → Local Submitter DB → Rejection Log → User Upload Files). - EXISTING VERIFIED PROOFS block: compact `(proof_id, novel, statement)` tuples from `proof_database.get_all_proofs()`. -- OPEN LEMMA TARGETS block: recent failure hints from `proof_database.get_recent_failure_hints(limit=5)` (`theorem_id`, statement, Lean error summary, suggested lemma names). +- OPEN PROOF TARGETS block: recent failure hints from `proof_database.get_recent_failure_hints(limit=5)` (`theorem_id`, statement, Lean error summary, and formalization blocker clues). These are retry context for the same high-impact target, not permission to pursue supporting lemmas. 
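The discovery-output rules (closed `theorem_origin` and `placement_preference` values, extension-derived theorems forced to `appendix_only`, and retry IDs drawn only from `OPEN PROOF TARGETS`) can be checked mechanically. The Python below is a hypothetical sketch. `check_discovery_fields` and its error strings are illustrative names, not an existing backend helper.

```python
from typing import Dict, Iterable, List

THEOREM_ORIGINS = {
    "existing_paper_claim",
    "extension_from_partial_work",
    "extension_from_user_prompt",
}
PLACEMENTS = {"inline", "appendix_only"}


def check_discovery_fields(payload: Dict[str, str], open_target_ids: Iterable[str]) -> List[str]:
    """Return rule violations for a discovery JSON payload (empty list means OK)."""
    errors: List[str] = []
    origin = payload.get("theorem_origin", "")
    placement = payload.get("placement_preference", "")
    if origin not in THEOREM_ORIGINS:
        errors.append(f"unknown theorem_origin: {origin!r}")
    if placement not in PLACEMENTS:
        errors.append(f"unknown placement_preference: {placement!r}")
    # Extension-derived theorems may never be placed inline.
    if origin.startswith("extension_") and placement != "appendix_only":
        errors.append("extension-derived theorems must use appendix_only")
    # A retry must reference a theorem_id shown in the OPEN PROOF TARGETS block.
    retry_id = payload.get("retry_existing_failure_id", "")
    if retry_id and retry_id not in set(open_target_ids):
        errors.append(f"retry id {retry_id!r} is not an OPEN PROOF TARGETS theorem_id")
    return errors
```

Returning every violation at once, rather than stopping at the first one, gives richer rejection feedback.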
### Websocket events surfaced by the rigor flow @@ -1981,9 +1882,9 @@ Compiler rigor progress should be visible through the standard proof/compiler We ## 10. WOLFRAM ALPHA TOOL (CONSTRUCTION MODE) -**File:** `backend/compiler/agents/high_context_submitter.py` +**File:** `backend/compiler/agents/writer_submitter.py` -Wolfram Alpha is exposed to the main writer as a real OpenAI-compatible tool only during `HighContextSubmitter.submit_construction` (body / conclusion / introduction / abstract). It is NOT available in `outline_create`, `outline_update`, `review`, or the rigor loop. When `system_config.wolfram_alpha_enabled=false` (or the Wolfram client failed to initialize), the tool is not registered on the LLM call at all and construction collapses to the pre-Build-4 single-shot call. +Wolfram Alpha is exposed to the main writer as a real OpenAI-compatible tool only during `WritingSubmitter.submit_construction` (body / conclusion / introduction / abstract). It is NOT available in `outline_create`, `outline_update`, `review`, or the rigor loop. When `system_config.wolfram_alpha_enabled=false` (or the Wolfram client failed to initialize), the tool is not registered on the LLM call at all and construction collapses to the pre-Build-4 single-shot call. ### Tool Schema @@ -2007,7 +1908,7 @@ WOLFRAM_TOOL_SCHEMA = { ### Budget + Loop Semantics -- **20 Wolfram calls per construction submission**, defined by `WOLFRAM_MAX_CALLS_PER_SUBMISSION` in `high_context_submitter.py`. +- **20 Wolfram calls per construction submission**, defined by `WOLFRAM_MAX_CALLS_PER_SUBMISSION` in `writer_submitter.py`. - The submitter loop: call LLM with tools attached → execute each `tool_calls[]` entry via `wolfram_client.query(...)` → append a `role=tool` turn per call → re-call LLM. Repeat until (a) the LLM returns a non-tool message (final JSON construction submission) or (b) the 20-call budget is exhausted. 
- On budget exhaustion, the coordinator appends a one-time user-role reminder ("You have used all 20 Wolfram Alpha calls for this submission. Finalize your JSON response now.") and re-calls the LLM with `tools=None` so the model must produce a final JSON response. - Fallback: if the tool-loop raises, construction falls back to a plain single-shot `generate_completion` call so forward progress is never blocked by tool-loop failures. @@ -2031,7 +1932,7 @@ Per Wolfram call, the submitter broadcasts: { "type": "compiler_wolfram_call", "data": { - "task_id": "comp_hc_007", + "task_id": "comp_writer_007", "query_redacted": true, "purpose_redacted": true, "result_redacted": true, @@ -2052,6 +1953,74 @@ The frontend's `CompilerLogs.jsx` renders redacted Wolfram metadata, not raw que --- +## 11. LEAN PROOF SEARCH TOOL + +**File:** `backend/shared/proof_search/tool_adapter.py` + +`search_lean_proofs` is the shared OpenAI-compatible tool adapter for searching indexed MOTO proof history and SyntheticLib4 proof records. It is retrieval/navigation infrastructure only; it does not validate proofs, alter Lean gates, or replace MOTO proof registration. + +### Tool Schema + +```python +SEARCH_LEAN_PROOFS_TOOL_SCHEMA = { + "type": "function", + "function": { + "name": "search_lean_proofs", + "description": "Search MOTO local proof history and SyntheticLib4 proof records for prompt-relevant Lean proof patterns. 
...", + "parameters": { + "type": "object", + "properties": { + "action": {"type": "string", "enum": ["overview", "search", "hydrate", "attest_usage"]}, + "query": {"type": "string"}, + "goal_statement": {"type": "string"}, + "lean_template": {"type": "string"}, + "imports": {"type": "array", "items": {"type": "string"}}, + "dependency_names": {"type": "array", "items": {"type": "string"}}, + "corpora": {"type": "array", "items": {"type": "string", "enum": ["moto", "manual", "autonomous", "leanoj", "syntheticlib4"]}}, + "verified_only": {"type": "boolean"}, + "include_partial": {"type": "boolean"}, + "include_failed": {"type": "boolean"}, + "novelty_filters": {"type": "array", "items": {"type": "string"}}, + "module_filters": {"type": "array", "items": {"type": "string"}}, + "source_filters": {"type": "array", "items": {"type": "string"}}, + "exclude_ids": {"type": "array", "items": {"type": "string"}}, + "limit": {"type": "integer", "minimum": 1, "maximum": 7}, + "cursor": {"type": "string"}, + "hydrate_lean_code": {"type": "boolean"}, + "search_mode": {"type": "string", "enum": ["auto", "exact", "lexical", "text", "semantic", "hybrid"]}, + "source": {"type": "string", "enum": ["moto", "manual", "autonomous", "leanoj", "syntheticlib4"]}, + "proof_id": {"type": "string"}, + "fingerprint": {"type": "string"}, + "session_id": {"type": "string"}, + "usage_attestation": { + "type": "object", + "properties": { + "retrieval_batch_id": {"type": "string"}, + "used_fingerprints": {"type": "array", "items": {"type": "string"}}, + "unused_fingerprints": {"type": "array", "items": {"type": "string"}}, + "used_proofs": {"type": "array", "items": {"type": "object", "properties": {"fingerprint": {"type": "string"}, "theorem_statement_hash": {"type": "string"}, "lean_code_hash": {"type": "string"}}}}, + "entire_code_used": {"type": "boolean"}, + "moto_artifact_hash": {"type": "string"}, + "usage_type": {"type": "string"} + } + } + }, + "required": ["action"] + } + } +} +``` + +### 
Semantics + +- `overview` returns the compact corpus map from the unified proof-search service. +- `search` returns at most 7 combined proof records and treats `autonomous` as the `moto` corpus alias. +- `hydrate` fetches one indexed proof record by source/proof ID or SyntheticLib4 fingerprint and may return full Lean code when available. +- `attest_usage` persists a local usage-attestation JSONL record for SyntheticLib4 whole-proof usage; when `entire_code_used=true`, each used proof must include `fingerprint`, `theorem_statement_hash`, and `lean_code_hash`. Online submission is a later integration step. +- Tool calls bypass Supercharge through the existing `tools`/`tool_choice` path. + +--- + ## EXACT STRING MATCHING SYSTEM **All compiler modes use exact string matching with automated pre-validation for document edits:** @@ -2669,7 +2638,7 @@ All proof prompts pass `temperature=0.0`. **Function:** `build_proof_identification_prompt(user_prompt, source_type, source_id, source_content, source_title="")` -**Purpose:** Novelty-first user-prompt relevance gate that extracts only proof candidates expected to produce public/citable prompt-directed knowledge absent from standard references or Mathlib. Bounded source-title/brainstorm-topic metadata may steer relevance but must not be treated as instructions. This is not a known-knowledge-base builder. It rejects routine helpers, standard/textbook/Mathlib restatements, program-local firsts, off-prompt curiosities, and single-tactic/routine proof goals. Candidates are ordered by novelty-first prompt-solving value: major discoveries, mathematical discoveries, novel variants, prompt-critical novel formalizations absent from standard references/Mathlib, then only necessary supporting lemmas for those novel targets. No artificial theorem-count cap. 
+**Purpose:** Impact-first user-prompt relevance gate that extracts only proof candidates expected to produce new/novel prompt-directed knowledge absent from standard references or Mathlib. Bounded source-title/brainstorm-topic metadata may steer relevance but must not be treated as instructions. This is not a known-knowledge-base builder. It rejects routine helpers, supporting lemmas, trivial/easy proofs, standard/textbook/Mathlib restatements, program-local firsts, minor reformulations/local formalizations, off-prompt curiosities, and single-tactic/routine proof goals. Candidates are ordered by direct impact on the user's prompt: direct solutions or impossibility results first, then decisive reductions, obstructions, and structural theorems that themselves make major progress on the requested problem. No artificial theorem-count cap. ```json { @@ -2689,8 +2658,8 @@ All proof prompts pass `temperature=0.0`. ``` **Field requirements:** -- `has_provable_theorems`: Boolean. `true` only when at least one prompt-relevant candidate is expected to be novel under the priority order. -- `theorems`: Array of every prompt-relevant novel candidate, ordered by novelty-first prompt-solving value, with user-prompt solution attempts and user prompt + brainstorm topic solution attempts co-equal top priority within each novelty tier when bounded brainstorm-topic metadata is present. Empty array when `has_provable_theorems` is `false`. +- `has_provable_theorems`: Boolean. `true` only when at least one prompt-relevant candidate is expected to be new or novel enough to justify Lean cost. +- `theorems`: Array of every prompt-relevant impactful candidate, ordered by direct impact on the user's prompt, with user-prompt solution attempts and user prompt + brainstorm topic solution attempts co-equal top priority when bounded brainstorm-topic metadata is present. Empty array when `has_provable_theorems` is `false`. - `theorem_id`: Stable string identifier such as `"thm_1"`, `"thm_2"`, etc. 
- `statement`: Natural-language theorem statement. Required. - `formal_sketch`: Optional Lean formalization hints, assumptions, or notation notes. @@ -2699,9 +2668,9 @@ All proof prompts pass `temperature=0.0`. - `novelty_rationale`: Required. Explains why this is absent from standard references or Mathlib and public/citable rather than a background fact or program-local first. - `why_not_standard_known_result`: Required. Explains why this is not merely a textbook, Mathlib, routine helper, or known-knowledge-base entry. -**What to extract:** Major discoveries, new mathematical discoveries, novel variants/reformulations, and prompt-critical novel formalizations absent from standard references or Mathlib that materially help answer, support, or advance the USER RESEARCH PROMPT. Supporting lemmas are extracted only when necessary stepping stones toward one of those higher-priority novel targets. +**What to extract:** Direct solutions, impossibility results, decisive reductions, new obstructions, structural theorems, and other high-impact new/novel proof targets absent from standard references or Mathlib that materially help answer, support, or advance the USER RESEARCH PROMPT. Supporting lemmas are not extracted, even as fallback targets. -**What to reject:** Off-prompt mathematical curiosities, routine helper lemmas, local bookkeeping facts, algebra cleanup, coercion/monotonicity facts, standard Mathlib/textbook restatements, general verified background-library entries, results closable by routine proof search or a single tactic (`simp`, `omega`, `norm_num`, `decide`, `aesop`, `rfl`), tautologies, and definitional equalities. 
+**What to reject:** Off-prompt mathematical curiosities, routine helper lemmas, minor reformulations/local formalizations, local bookkeeping facts, algebra cleanup, coercion/monotonicity facts, standard Mathlib/textbook restatements, general verified background-library entries, results closable by routine proof search or a single tactic (`simp`, `omega`, `norm_num`, `decide`, `aesop`, `rfl`), tautologies, and definitional equalities. --- @@ -2771,7 +2740,7 @@ All proof prompts pass `temperature=0.0`. - `sorry` / `admit` anywhere → proof rejected, counts as a failed attempt. - Axiomatizing the theorem's own concepts to make the goal trivial → rejected. - Complete source brainstorm/paper is mandatory direct context; do not silently truncate it or replace it with the focused excerpt. -- If the full claim cannot be proved, return a narrower concrete lemma rather than a `sorry`-closed stub. +- If the full claim cannot be proved, do not replace it with a narrower/supporting/trivial lemma; submit the strongest faithful attempt at the selected high-impact target and let Lean feedback expose the blocker. - PRESERVE the theorem's non-trivial content — do not simplify into a trivial identity to make it compile. --- @@ -2820,7 +2789,7 @@ All proof prompts pass `temperature=0.0`. ``` **Field requirements:** -- `novelty_tier`: One of `not_novel`, `novel_formulation`, `novel_variant`, `mathematical_discovery`, or `major_mathematical_discovery`. Any tier except `not_novel` enters the highest-priority direct-injection block for all subsequent brainstorm/paper submitters via `proof_database.get_novel_proofs_for_injection()`. `not_novel` proofs are stored in the database but not injected. +- `novelty_tier`: One of `not_novel`, `novel_formulation`, `novel_variant`, `mathematical_discovery`, or `major_mathematical_discovery`. 
Any tier except `not_novel` enters the highest-priority direct-injection block for all subsequent brainstorm/paper submitters via `proof_database.get_novel_proofs_for_injection()`. `not_novel` proofs are stored in the database but not injected; manual "Try to Prove This" still preserves/appends non-duplicate Lean-verified `not_novel` proofs intentionally for user-visible RALPH-loop state and exact duplicate avoidance. - `reasoning`: Always required. **Novelty tiers:** diff --git a/.cursor/rules/main-rule-3-code-interaction-and-rule-interaction-rules.mdc b/.cursor/rules/main-rule-3-code-interaction-and-rule-interaction-rules.mdc index b963632..366747d 100644 --- a/.cursor/rules/main-rule-3-code-interaction-and-rule-interaction-rules.mdc +++ b/.cursor/rules/main-rule-3-code-interaction-and-rule-interaction-rules.mdc @@ -19,11 +19,15 @@ alwaysApply: true 7.) Any REST shape, auth contract, `/api/features` capability, or web-consumed WebSocket contract change must update **code, the relevant rule(s), and `api_contract_version` in `/api/features`** in the same approved merge. The live backend's `GET /openapi.json` is the machine-readable REST schema contract. +7b.) SyntheticLib4 / unified proof-history search rule updates should stay high-level until the specific build phase lands. Exact JSON prompt schemas and prompt text are updated word-for-word only when the corresponding prompt/tool implementation is undertaken; planning docs alone do not require an API contract bump. + +7c.) The desktop launcher `npm audit fix` remediation is permanent. Never remove, disable, weaken, or bypass the launcher code or rule that runs `npm audit fix` when `npm install` reports vulnerabilities; if it is accidentally removed or broken, restore it immediately with no exceptions before continuing launcher/updater work. + 8.) Only ONE workflow mode may be active at a time (Aggregator, Compiler, Autonomous Research, or LeanOJ Proof Solver). 
This constraint applies identically in both default mode and generic mode. Start conflict checks must be serialized and include pending/background-task activity flags such as `autonomous_coordinator.is_active`, not only persisted `state.is_running` fields. 8b.) Autonomous Research and Single Paper Writer expose run-level Allowed Outputs (`allow_mathematical_proofs`, `allow_research_papers`); at least one must be true. Both true preserves existing workflow behavior. The Mathematical Proofs checkbox is the user-facing Lean proof-output enable path and must either sync/enable the runtime proof setting or the backend must reject proof-only/proof-requested starts when Lean is unavailable. Disabling papers must not disable brainstorming itself; proof-only autonomous runs must not silently become brainstorm-only loops and must reset durable workflow state to the next topic/exploration boundary after proof work instead of leaving `pre_paper_compilation`. Disabling proofs must skip proof-output work without affecting developer-only creativity boost behavior. -9.) Lean 4 and SMT features are gated by runtime flags: `lean4_enabled` gates Lean proof execution/model proof work, `lean4_lsp_enabled` only gates the optional persistent LSP optimization (subprocess Lean must still work when it is false), and `smt_enabled` gates Z3/SMT hint generation. All three default false; when disabled they must not invoke their corresponding toolchains, spend proof-model calls, or block workflows, and must never ship Lean or Z3 toolchains or Python wheels into `requirements-generic.txt`, `Dockerfile`, or `docker/entrypoint.sh` (hosted image stays Lean-free and Z3-free). Lean 4 is authoritative formal checking for every stored proof and is necessary for LeanOJ final solutions; SMT contributes hints only, and only valid `unsat` SMT checks become suggested Lean tactics. 
Z3 executable paths are trusted startup/operator configuration only, must be rejected as runtime API input, and must resolve to a `z3`/`z3.exe` executable. Automated proof candidates should directly serve the user prompt and be novelty-first before Lean cost, but once Lean accepts real proof code it must be preserved and novelty-ranked under the actual proved statement; statement mismatch downshifts storage instead of discarding. Do not spend proof attempts building a general known-knowledge base of routine helpers, standard Mathlib/textbook facts, or merely non-trivial background lemmas. LeanOJ final master-proof edits may use standard facts inline to solve the template, but must not accumulate a separate known-knowledge library. +9.) Lean 4 and SMT features are gated by runtime flags: `lean4_enabled` gates Lean proof execution/model proof work, `lean4_lsp_enabled` only gates the optional persistent LSP optimization (subprocess Lean must still work when it is false), and `smt_enabled` gates Z3/SMT hint generation. All three default false; when disabled they must not invoke their corresponding toolchains, spend proof-model calls, or block workflows, and must never ship Lean or Z3 toolchains or Python wheels into `requirements-generic.txt`, `Dockerfile`, or `docker/entrypoint.sh` (hosted image stays Lean-free and Z3-free). Lean 4 is authoritative formal checking for every stored proof and is necessary for LeanOJ final solutions; SMT contributes hints only, and only valid `unsat` SMT checks become suggested Lean tactics. Z3 executable paths are trusted startup/operator configuration only, must be rejected as runtime API input, and must resolve to a `z3`/`z3.exe` executable. 
Automated proof candidates should directly serve the user prompt and prioritize high-impact prompt-solving targets before Lean cost, but once Lean accepts real proof code it must be preserved and novelty-ranked under the actual proved statement; statement mismatch downshifts storage instead of discarding. Do not spend proof attempts building a general known-knowledge base of supporting lemmas, routine helpers, standard Mathlib/textbook facts, minor reformulations/local formalizations, trivial/easy proofs, or merely non-trivial background lemmas. LeanOJ final master-proof edits may use standard facts inline to solve the template, but must not accumulate a separate known-knowledge library. 10.) LeanOJ initial topic generation and brainstorm submitters always run in parallel and feed one validator that batch-validates up to 3 topics/submissions. Initial topic candidates/selection must be broad locked foundation questions covering the whole LeanOJ solution route, not narrow sublemma/tactic/local-repair topics. Recursive brainstorming has no separate recursive-topic prepass and must not re-inject the initial selected topic as active steering context; it uses the shared accepted proof-memory database plus the current proof/failure context. Accepted brainstorm memory must preserve occurrence-specific chronological metadata even for duplicate idea text. Never implement active LeanOJ topic or brainstorm phases as round-robin/serial submitter calls; one hung submitter must not halt the phase. 10b.) Developer-enabled LeanOJ Creativity Emphasis Boost applies to every fifth valid queued initial-topic and brainstorm submission per submitter. It only adds optional near-solution/adjacent-solution creativity pressure when apparent and potentially very helpful; validation remains unchanged, accepted/rejected WebSocket payloads mark `creativity_emphasized`, and the block is skipped for that slot if it would overflow the configured prompt budget. 
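Rule 10b's cadence (every fifth valid queued submission per submitter, with the block skipped for that slot when it would overflow the prompt budget) can be sketched as a small per-submitter counter. This is a hypothetical illustration. `CreativityEmphasisScheduler` is not an existing class. It assumes a skipped slot still consumes its count (skipped, not deferred) and that callers choose the key, either the submitter alone or phase plus submitter.

```python
from collections import defaultdict
from typing import DefaultDict


class CreativityEmphasisScheduler:
    """Hypothetical per-submitter cadence for the developer-only creativity boost."""

    def __init__(self, every_n: int = 5) -> None:
        self.every_n = every_n
        self._slot_counts: DefaultDict[str, int] = defaultdict(int)

    def should_emphasize(
        self, submitter_key: str, prompt_tokens: int, block_tokens: int, budget_tokens: int
    ) -> bool:
        """Call once per valid queued submission slot; True means add the emphasis block."""
        self._slot_counts[submitter_key] += 1
        if self._slot_counts[submitter_key] % self.every_n != 0:
            return False
        # Assumption: an overflowing boost slot is skipped outright, not deferred to the next slot.
        return prompt_tokens + block_tokens <= budget_tokens
```

The returned flag is also a natural source for the `creativity_emphasized` marker on accepted/rejected WebSocket payloads, since validation itself is unchanged.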
@@ -32,9 +36,9 @@ alwaysApply: true 12.) LeanOJ and autonomous proof-check recoverable provider-credit exhaustion should preserve workflow checkpoints and pause rather than become proof-attempt failures. Hard configuration/privacy/missing-key errors should fail visibly with a user-repair path instead of inflating proof attempt loops. `Retry OpenRouter` / `/api/openrouter/reset-exhaustion` wakes currently waiting in-process credit pauses after credits are restored; stopped/restarted runs resume through their persisted LeanOJ/proof checkpoint state. -13.) LeanOJ/RALPH final-proof loop checkpoints may be user-configurable feedback checkpoints or conservative no-progress/stale-edit watchdog handoffs; they must not mark success or discard the durable draft. LeanOJ start requests expose configurable phase caps (`max_initial_brainstorm_accepts`, `max_recursive_brainstorm_accepts`, `final_attempts_per_cycle`); `final_attempts_per_cycle` bounds failed final verification/edit attempts before the next path decision/handoff, so accepted `needs_more_time=true` edits can extend a cycle while they keep passing the intermediate Lean gate. The durable `master_proof.lean` is the authoritative working draft, and every accepted master-proof edit must pass an in-memory Lean gate before persistence: `needs_more_time=true` runs Lean with `sorry`/`admit` placeholders allowed but still requires parse/typecheck, template preservation, and no fake proof devices; `needs_more_time=false` runs Lean placeholder-free and then final semantic review against the user prompt/template before the run stops as verified. Final-proof mode is edit-only: it must not be offered, shown, or taught `stuck_needs_brainstorm`, raw `need_more_brainstorming`, failed-attempt counts, or any path transition. It may see the most recent 5 final attempts as compact execution feedback (Lean errors, stale edit rejections, JSON truncation, watchdog/no-progress notices) so it can avoid repeating failed edits. 
Lean/template rejection, semantic-review rejection, conservative no-progress/stale-edit watchdog feedback, and validator rejection of non-progressive shortening edits must preserve the master proof and persist structured continuation feedback; non-user-forced no-progress handoffs should gather recursive brainstorm context before re-entering final mode. +13.) LeanOJ/RALPH final-proof loop checkpoints may be user-configurable feedback checkpoints or conservative no-progress/stale-edit watchdog handoffs; they must not mark success or discard the durable draft. LeanOJ start requests expose configurable phase caps (`max_initial_brainstorm_accepts`, `max_recursive_brainstorm_accepts`, `final_attempts_per_cycle`); `final_attempts_per_cycle` bounds failed final verification/edit attempts before the next path decision/handoff, so accepted `needs_more_time=true` edits can extend a cycle while they keep passing the intermediate Lean gate. The durable `master_proof.lean` is the authoritative working draft, and every accepted master-proof edit must pass an in-memory Lean gate before persistence: `needs_more_time=true` runs Lean with `sorry`/`admit` placeholders allowed but still requires parse/typecheck, template preservation, and no fake proof devices; `needs_more_time=false` runs Lean placeholder-free and then final semantic review against the user prompt/template before the run stops as verified. Final-proof mode is edit-only: it must not be offered, shown, or taught `stuck_needs_brainstorm`, raw `need_more_brainstorming`, failed-attempt counts, or any path transition. It may see the most recent 5 final attempts as compact execution feedback (Lean errors, stale edit rejections, JSON truncation, watchdog/no-progress notices) and optional metadata-only `search_lean_proofs` context so it can avoid repeating failed edits without turning the master proof into a known-knowledge library. 
Lean/template rejection, semantic-review rejection, conservative no-progress/stale-edit watchdog feedback, and validator rejection of non-progressive shortening edits must preserve the master proof and persist structured continuation feedback; non-user-forced no-progress handoffs should gather recursive brainstorm context before re-entering final mode. -14.) LeanOJ/RALPH final verification must remain placeholder-free, but Lean-accepted scaffolds containing `sorry`/`admit` and Lean-accepted non-final-ready code should be saved as partial/supporting proofs for future context. Partial/scaffold checks may use subprocess fallback even when LSP mode is enabled. Partial proofs are citeable incomplete references only; never count them as final verified solutions and never accept fake `axiom`/`constant`/`opaque` proof devices. LeanOJ proof-gated brainstorm `lean_proof` submissions are preserved once Lean and integrity checks pass; brainstorm validation can classify usefulness/context role but should not veto the verified artifact. +14.) LeanOJ/RALPH final verification must remain placeholder-free, but Lean-accepted scaffolds containing `sorry`/`admit` and Lean-accepted non-final-ready code should be saved as partial/supporting proofs for future context. Partial/scaffold checks may use subprocess fallback even when LSP mode is enabled. Partial proofs are citeable incomplete references only; never count them as final verified solutions and never accept fake `axiom`/`constant`/`opaque` proof devices. LeanOJ proof-gated brainstorm `lean_proof` submissions are preserved once Lean and integrity checks pass, but brainstorm validation must still reject artifacts that do not directly discharge, split, or repair exact template obligations. 15.) Parent/user-selected phases have hierarchy precedence over child branches. 
When a parent phase starts (LeanOJ forced final loop, autonomous paper writing, Tier 3 final answer/final selection), lower-tier brainstorm/topic/path child tasks must stop or be ignored. LeanOJ `Skip Brainstorm` locks the run into the final loop until the configured final-attempt cycle is exhausted; model/path requests for more brainstorming cannot override that user action early. `Force Brainstorm` is a separate explicit user override that returns to recursive brainstorming while preserving proof progress. diff --git a/.cursor/rules/part-1-aggregator-tool-design-specifications.mdc b/.cursor/rules/part-1-aggregator-tool-design-specifications.mdc index a4612d1..8c2ac53 100644 --- a/.cursor/rules/part-1-aggregator-tool-design-specifications.mdc +++ b/.cursor/rules/part-1-aggregator-tool-design-specifications.mdc @@ -45,6 +45,8 @@ Validator processes 1, 2, or 3 submissions simultaneously using batch-specific p **Local Submitter Databases** — Per-submitter rejection log: last 5 rejections (validator summary ≤750 chars + submission preview ≤750 chars). Manual Aggregator files use `Summary_Of_Last_5_Validator_Rejections_For_Submitter_{N}.txt` and clear-all must erase them even after restart/no live submitters. Internal child aggregators must use scoped rejection-log files so topic/title/autonomous runs never feed stale feedback into manual runs or each other. Reset when a submitter reaches the configured consecutive-rejection threshold (default 15). +**User-facing rejection activity**: Submitter rejection logs/live activity should make clear that rejections include validator feedback, that rejections are normal, and that extended rejection streaks can be expected on difficult problems. + **Submission context injection**: Submitter context direct-injects first and offloads existing context to RAG when needed. 
Validator submissions under review are mandatory direct context; if a single or batch validation prompt is still too large after normal allocation, reject that validation batch with diagnostic feedback rather than indexing the pending submission as RAG. **Upload/path enforcement**: Server-side validation of `.txt` only, 5 MB max, filename sanitization, path traversal rejection. Upload responses return logical filenames, not absolute host paths. Public Aggregator starts resolve `uploaded_files` only under `user_uploads`; internal autonomous reference-paper context may opt into trusted data-root file references via an explicit coordinator flag. @@ -63,7 +65,7 @@ All Aggregator offload order and source-exclusion rules are centralized in `rag- ## Role Selection -User selects model per role. Multiple roles can share a model. Models load with user-set context sizes. +User selects model per role. Multiple roles can share a model. Models load with user-set context sizes. Aggregator settings also expose one shared `aggregator_assistant` memory-search LLM role for optional verified proof-memory support; built-in/default profiles set Assistant equal to the primary Validator, and Session History Memory disabled greys Assistant out. Assistant may provide up to 7 prior verified proofs, reuses useful packs for two eligible receiver reads before refresh, skips true no-external-history targets because it only performs proof-memory retrieval for now, uses run-scoped cooldown for repeated zero-useful or stagnant retrieval, shuts down only for repeated zero-useful retrieval in the current run, hides skip/backoff/shutdown turns from live activity, and never injects into validator prompts. Per-role Supercharge is optional. When enabled for a submitter or validator, `api_client_manager.generate_completion()` runs 4 parallel full answer attempts for that role call, then a 5th same-model synthesis call and returns only the synthesis result. 
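The Supercharge fan-out can be illustrated with a short `asyncio` sketch. It assumes a hypothetical async `call_model(prompt, temperature)` callable. The sketch runs four candidate attempts in parallel at the temperatures the rule lists, then passes the sanitized candidates to a fifth synthesis call at temperature 0.0 and returns only that synthesis result. Tool-call requests skip the fan-out entirely. `sanitize` is a placeholder. Boost routing and the real `api_client_manager.generate_completion()` signature are deliberately not modeled.

```python
import asyncio
from typing import Awaitable, Callable, Optional

CANDIDATE_TEMPERATURES = [0.0, 0.2, 0.4, 0.8]  # candidate temperatures from the rule
SYNTHESIS_TEMPERATURE = 0.0

# Hypothetical model call: (prompt, temperature) -> visible answer text.
ModelCall = Callable[[str, float], Awaitable[str]]


def sanitize(text: str) -> str:
    """Placeholder for stripping private thought/channel/control text."""
    return text.strip()


async def supercharged_completion(call_model: ModelCall, prompt: str,
                                  tools: Optional[list] = None,
                                  role_temperature: float = 0.0) -> str:
    if tools:
        # Tool-call requests bypass Supercharge: a single ordinary call.
        return await call_model(prompt, role_temperature)
    # Four full answer attempts run in parallel at diversified temperatures.
    candidates = await asyncio.gather(
        *(call_model(prompt, t) for t in CANDIDATE_TEMPERATURES))
    material = "\n\n".join(
        f"Candidate {i + 1}:\n{sanitize(c)}" for i, c in enumerate(candidates))
    synthesis_prompt = (
        f"{prompt}\n\nOptional working material. You may use one candidate, "
        f"combine several, ignore all, or write a stronger new answer:\n\n{material}")
    # The fifth, same-model synthesis call is the only result returned.
    return await call_model(synthesis_prompt, SYNTHESIS_TEMPERATURE)
```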
Supercharge candidate attempts intentionally use temperatures `[0.0, 0.2, 0.4, 0.8]` to diversify parallel outputs; synthesis remains `0.0`. Candidate attempts are sanitized to reusable visible answer text before synthesis; private thought/channel/control transcript text must never be fed back as feedback, brainstorm memory, or synthesis context. The synthesis prompt frames candidates as optional working material: the model may use one, combine several, ignore all, or write a stronger new answer, while preserving the original role output contract. If Boost applies to that role/task, all internal Supercharge calls use the Boost config first. Tool-call requests bypass Supercharge. @@ -81,7 +83,7 @@ When ALL submitters AND validator use the same model → single-model mode: ## Multi-Submitter Configuration -Per-submitter: provider (LM Studio / OpenRouter / desktop-only OAuth providers such as OpenAI Codex or xAI Grok in default mode; OpenRouter only in generic mode), model, OpenRouter host provider when applicable, LM Studio fallback for cloud providers (default mode only), context window, max output tokens, and Supercharge checkbox. UI: "Number of Submitters" selector (1-10), "Copy Main to All" button. +Per-submitter: provider (LM Studio / OpenRouter / desktop-only cloud providers such as OpenAI Codex OAuth, xAI Grok OAuth, or Sakana Fugu API key in default mode; OpenRouter only in generic mode), model, OpenRouter host provider when applicable, LM Studio fallback for cloud providers (default mode only), context window, max output tokens, and Supercharge checkbox. Assistant has the same provider/context/output setting shape but is not a submitter and is not counted in submitter parallelism. UI: "Number of Submitters" selector (1-10), "Copy Main to All" button. OpenRouter auto-fill rule: selecting an OpenRouter model auto-fills context from the model-level `context_length`. 
Max output tokens use `min(20% of model context_length, endpoint max_completion_tokens)`: auto provider mode filters weak/low-cap endpoints and uses the smallest remaining capable endpoint cap, while an explicit host selection uses that host's largest exposed endpoint cap. Endpoint `context_length` / `max_prompt_tokens` rows are diagnostics, not context shrink limits. If endpoint output caps are incomplete, preserve current values (no guessing). @@ -119,6 +121,6 @@ JSON validation failure: reject submission, send reason + content to submitter's ## Optional Lean 4 Proof Submissions -When `lean4_enabled`, submitters may use `submission_type="lean_proof"` for prompt-directed proof candidates whose claimed novelty is public/citable and absent from standard references or Mathlib, not merely new to the program. Once Lean accepts real proof code, preserve and register it for novelty/triviality ranking even if it proves a narrower supporting lemma than intended; downshift the stored statement instead of discarding it. Hard rejection remains only for non-Lean-verified attempts, malformed submissions, or fake proof devices such as new `axiom`/`constant`/`opaque` declarations. +When `lean4_enabled`, submitters may use `submission_type="lean_proof"` only for high-impact prompt-directed proof candidates whose claimed novelty is public/citable and absent from standard references or Mathlib, not merely new to the program. The shared Lean proof gate rejects missing/invalid novelty tiers and missing prompt-relevance/novelty/anti-standard-result rationales before Lean cost. Supporting lemmas, routine helpers, local facts, trivial/easy proofs, and weakened substitutes are not valid targets. Once Lean accepts real proof code, preserve/register the actual artifact for novelty/triviality ranking, but normal brainstorm validation may still reject it from the accepted-submission database when the actual theorem is low-impact, trivial, routine, or redundant. 
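The Lean proof gate described here has two parts: a cheap pre-Lean check and a post-Lean integrity scan. The sketch below uses hypothetical tier names (`VALID_TIERS`) because the rule does not list them. It also approximates "new `axiom`/`constant`/`opaque` declarations" with a line-anchored regular expression. A production check would more likely inspect the elaborated Lean environment than scan source text.

```python
import re
from typing import List

# Hypothetical tier names; the allowed tiers are not listed in this rule.
VALID_TIERS = {"major", "moderate", "minor"}
REQUIRED_RATIONALES = ("prompt_relevance_rationale", "novelty_rationale",
                       "why_not_standard_known_result")

# A declaration keyword at line start, optionally preceded by common modifiers.
FAKE_DEVICE = re.compile(
    r"^\s*(?:(?:private|protected|noncomputable)\s+)*(axiom|constant|opaque)\b",
    re.MULTILINE)


def pre_lean_gate(candidate: dict) -> List[str]:
    """Return rejection reasons found before any Lean cost is spent."""
    reasons = []
    if candidate.get("expected_novelty_tier") not in VALID_TIERS:
        reasons.append("missing/invalid novelty tier")
    for key in REQUIRED_RATIONALES:
        if not str(candidate.get(key, "")).strip():
            reasons.append(f"missing {key}")
    return reasons


def fake_proof_devices(lean_code: str) -> List[str]:
    """Return the fake-proof declaration keywords found in the code."""
    return FAKE_DEVICE.findall(lean_code)
```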
Hard rejection always applies to non-Lean-verified attempts, malformed submissions, or fake proof devices such as new `axiom`/`constant`/`opaque` declarations. -Manual Aggregator live results expose "Try to Prove This" over the current accepted-submissions database. It uses standard proof discovery, asking for candidates that directly solve the user prompt first and then candidates that substantially build toward solving it. Any non-duplicate Lean-verified proof from that user-triggered check is stored in the active manual proof database and appended to the active manual Aggregator database proof appendix for user-visible display/download, regardless of novelty rating; progress is also persisted to the manual Aggregator event log so live activity can recover missed WebSocket events. Later proof-check prompts strip that generated-proof appendix and receive verified proofs through the manual proof database injection instead. Manual Aggregator clear archives the active manual proof database to history and resets active proof context to empty; clear is rejected while manual proof verification is active. +Manual Aggregator live results expose "Try to Prove This" over the current accepted-submissions database. It uses standard proof discovery, asking for candidates that directly solve the user prompt first and then candidates that substantially build toward solving it. When Session History Memory is enabled, the separate Assistant LLM role runs in parallel during brainstorming and proof checks, maintaining a freshness-tagged up-to-7 pack of fully Lean-verified local/SyntheticLib4 memory supports for eligible submitter/proof roles; proof discovery/formalization may use the latest pack but must not wait for it, and useful packs refresh only after two eligible receiver reads. 
Assistant cooldown state is run-scoped across transient task IDs/roles, survives ordinary stop/restart for that same run scope, is cleared only by explicit workflow/session reset, and hides skip/backoff/shutdown turns from live activity; zero-useful retrieval can shut down for the run, while stagnant same-pack retrieval only backs off. Validators never receive Assistant context. Any non-duplicate Lean-verified proof from that user-triggered check is stored in the active manual proof database and appended to the active manual Aggregator database proof appendix for user-visible display/download, regardless of novelty rating. This known-proof preservation is intentional for RALPH looping: the user can see validated work, and later proof checks can avoid redoing exact verified proofs instead of losing them because they ranked `not_novel`. Progress is also persisted to the manual Aggregator event log so live activity can recover missed WebSocket events. Later proof-check prompts strip that generated-proof appendix and receive verified proofs through the manual proof database injection instead. Manual Aggregator clear archives the active manual proof database to history and resets active proof context to empty; clear is rejected while manual proof verification is active. diff --git a/.cursor/rules/part-1-and-part-2-cointeraction-architecture.mdc b/.cursor/rules/part-1-and-part-2-cointeraction-architecture.mdc index c45af24..17341a8 100644 --- a/.cursor/rules/part-1-and-part-2-cointeraction-architecture.mdc +++ b/.cursor/rules/part-1-and-part-2-cointeraction-architecture.mdc @@ -22,6 +22,8 @@ The live-constructing aggregation results should be viewable in one tab and also The live-constructing compiler-written paper should be viewable in one tab and also a save function that allows the user to save the whole current aggregation database to a .txt file. This paper should be viewable in real-time as the compiler constructs it. +Manual mode prompts are durable run state. 
Aggregator and Compiler prompts must persist across stop, crash, restart, and repeated continue/start cycles, and must only be cleared by the explicit user clear/reset action for that mode. Empty/default startup state must never overwrite a previously saved prompt. + **Generic mode frontend note**: In generic mode the MOTO sandbox is API-only — the React frontend is served by the Web Team's separate website, not from the sandbox. The frontend calls `GET /api/features` on mount and hides LM Studio UI options when `generic_mode=True`. All other frontend functionality is identical. @@ -69,15 +71,18 @@ Parent workflow actions override child agents immediately. Manual paper writing, ### Compiler Single-Submitter (Part 2) - Fixed sequential architecture (NOT multi-submitter configurable): - - **High-Context Submitter**: Handles outline_create, outline_update, construction, review modes. During construction, may invoke the Wolfram Alpha tool up to 20 times per submission when `system_config.wolfram_alpha_enabled=true`. - - **High-Parameter Submitter**: Handles rigor mode. Rigor is the **Lean-4-verified-theorem flow**: novelty-first user-prompt-relevant discovery (with expected novelty/prompt-relevance/anti-known-result rationale, and explicit extension theorems from partial paper work / outline / source brainstorm or aggregator context / user prompt when helpful) → up to 5 Lean 4 formalization attempts (with error feedback) → novelty classification → placement routing. Discovery and formalization see the current paper plus available source brainstorm/aggregator context and verified-proof summaries; they must not build a general known-knowledge base. Existing-paper-claim theorems may go through inline placement (2 attempts, validator uses `rigor_lean_placement` mode forcing `rigor_check=True`); extension-derived theorems are forced to `placement_preference="appendix_only"` and appended directly to the Theorems Appendix (`placement_outcome="appendix_requested"`). 
Inline failures still use Theorems Appendix fallback. The compiler writes verified proofs directly into the shared `proof_database` (same database used by autonomous mode); novel proofs automatically enter the highest-priority direct-injection block on the next submitter instantiation. - - **Critique Submitter**: Handles the post-body critique/self-review phase with its own model/context/token settings. + - **Writing Submitter**: Handles outline_create, outline_update, construction, review modes. During construction, may invoke the Wolfram Alpha tool up to 20 times per submission when `system_config.wolfram_alpha_enabled=true`. + - **Rigor & Proofs Submitter** (legacy `high_param_*` fields): Handles all proof-solving submitter work, compiler rigor mode, Lean formalization, theorem placement, and post-body critique/self-review generation. Rigor is the **Lean-4-verified-theorem flow**: impact-first user-prompt-relevant discovery (with expected novelty/prompt-relevance/anti-known-result rationale, and explicit extension theorems from partial paper work / outline / source brainstorm or aggregator context / user prompt when helpful) → up to 5 Lean 4 formalization attempts (with error feedback) → novelty classification → placement routing. Discovery and formalization see the current paper plus available source brainstorm/aggregator context and verified-proof summaries; they must not build a general known-knowledge base. Existing-paper-claim theorems may go through inline placement (2 attempts, validator uses `rigor_lean_placement` mode forcing `rigor_check=True`); extension-derived theorems are forced to `placement_preference="appendix_only"` and appended directly to the Theorems Appendix (`placement_outcome="appendix_requested"`). Inline failures still use Theorems Appendix fallback. 
The compiler writes verified proofs directly into the shared `proof_database` (same database used by autonomous mode); novel proofs automatically enter the highest-priority direct-injection block on the next submitter instantiation. - Sequential Markov chain workflow (only one submission at a time) - Each compiler role has its own model, context, and max token settings (separate from aggregator) -- UI shows separate High-Context, High-Parameter, Critique Submitter, and Validator settings +- UI shows Validator, Writing Submitter, Rigor & Proofs Submitter, and Assistant settings; standalone Critique Submitter settings are deprecated aliases of Rigor & Proofs. **Why Single Validator?**: Multiple validators would cause divergent evolution of the database, breaking the coherent Markov chain required for solution alignment. The single validator ensures all submissions are evaluated against the same evolving database state. +## Assistant Proof-Retrieval Role + +Aggregator, Compiler, Autonomous Research, and LeanOJ expose one shared `Assistant` LLM model role per workflow surface for non-blocking verified proof-memory retrieval. Assistant is not a submitter, validator, proof checker, workflow phase, or per-lane clone. Eligible main roles may consume the latest up-to-7 verified proof pack opportunistically, and useful packs are refreshed only after two eligible receiver reads; validators and dedicated critique phases never receive Assistant context. The configured Assistant LLM owns live pack selection, parent workflows never wait on it, and no-history targets are skipped because Assistant only performs proof-memory retrieval for now. Durable cooldown is run-scoped across transient task IDs/roles while preserving real source/session separation; zero-useful retrieval can shut down for the current run, while stagnant same-pack retrieval only backs off. Live activity should show normal Assistant retrieval result summaries only, not skip/backoff/shutdown turns. 
Built-in/default profiles copy Assistant settings from the primary Validator; if Session History Memory is disabled, Assistant is greyed out and does not run. + ## Additional Traits Shared Between Aggregator-Submitters and Compiler-Submitters - The JSON of aggregator-submitters and compiler-submitters should include a "reasoning:" request below its "submission:" line. (This forces the submitter to explain the thinking behind its reasoning and can also reveal deception, giving the validator additional context.) diff --git a/.cursor/rules/part-2-compiler-tool-design-specification.mdc b/.cursor/rules/part-2-compiler-tool-design-specification.mdc index 5c2c58a..228e011 100644 --- a/.cursor/rules/part-2-compiler-tool-design-specification.mdc +++ b/.cursor/rules/part-2-compiler-tool-design-specification.mdc @@ -10,7 +10,7 @@ Compiler runs independently from aggregator (manual start via API only). Strict ## Compile/Distillation Tool Outline -Reads aggregator database + user prompt, distills into a single coherent paper. Runtime roles are high-context submitter, high-param submitter, critique submitter, and validator. Main construction/rigor remains sequential (no parallel compiler submitters; critique is its own post-body phase). +Reads aggregator database + user prompt, distills into a single coherent paper. Runtime roles are Writing Submitter, Rigor & Proofs Submitter (legacy `high_param_*` fields), Validator, and optional Assistant. Main construction/rigor remains sequential; post-body critique generation is performed by Rigor & Proofs while critique validation/cleanup remain Validator-owned. Aggregator/brainstorm database material is high-priority optional source context, not a mandatory checklist. Compiler submitters may selectively use, synthesize beyond, or depart from database material when that better serves the user's prompt and remains rigorous. Validator must not reject solely for selective non-use of database material.
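The shared Assistant pack rules can be modeled as a small run-scoped state object. Those rules are: at most 7 verified supports, refresh only after two eligible receiver reads, and no Assistant context for validators or critique phases. Zero-useful retrieval can shut the Assistant down for the run, while stagnant same-pack retrieval only backs off. In this sketch all names are hypothetical and backoff timing is reduced to streak counters. `ZERO_USEFUL_SHUTDOWN = 3` is an invented threshold, since the rules give no number. An empty retrieval keeps the existing pack, consistent with the rule that stale packs clear only on explicit reset/clear or Session History Memory disable.

```python
from dataclasses import dataclass

MAX_SUPPORTS = 7
READS_BEFORE_REFRESH = 2
ZERO_USEFUL_SHUTDOWN = 3  # hypothetical threshold; the rules give no number
NON_RECEIVERS = {"validator", "critique"}  # never receive Assistant context


@dataclass
class AssistantPackState:
    """Hypothetical run-scoped Assistant pack state (illustrative only)."""
    pack: tuple = ()
    reads: int = 0
    zero_useful_streak: int = 0
    stagnant_streak: int = 0
    shut_down: bool = False

    def read(self, role: str):
        # Receivers take the latest pack opportunistically; nobody waits on it.
        if role in NON_RECEIVERS or not self.pack:
            return None
        self.reads += 1
        return self.pack

    def needs_refresh(self) -> bool:
        return not self.shut_down and (not self.pack or self.reads >= READS_BEFORE_REFRESH)

    def apply_retrieval(self, supports: list) -> None:
        new_pack = tuple(supports[:MAX_SUPPORTS])
        if not new_pack:
            # Zero useful supports: grow the streak that drives backoff/shutdown.
            self.zero_useful_streak += 1
            if self.zero_useful_streak >= ZERO_USEFUL_SHUTDOWN:
                self.shut_down = True
            return
        self.zero_useful_streak = 0
        # Same pack again counts as stagnant: tracked for backoff, never shutdown.
        self.stagnant_streak = self.stagnant_streak + 1 if new_pack == self.pack else 0
        self.pack, self.reads = new_pack, 0
```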
@@ -30,9 +30,11 @@ Before every `_pre_validate_exact_string_match()`, system calls `paper_memory.en **Outline is ALWAYS fully injected (never RAGed)** into all compiler mode prompts. -**Provider Selection**: Each compiler role (validator, high-context, high-param, critique submitter) can independently use LM Studio, OpenRouter, or desktop-only OAuth providers such as OpenAI Codex or xAI Grok with optional LM Studio fallback for cloud providers (default mode). OpenRouter keeps optional host-provider selection. In generic mode, all roles use OpenRouter only; LM Studio/OAuth options are hidden or unavailable. +**Provider Selection**: Each visible compiler role (Validator, Writing, Rigor & Proofs, Assistant) can independently use LM Studio, OpenRouter, or desktop-only cloud providers such as OpenAI Codex OAuth, xAI Grok OAuth, or Sakana Fugu API key with optional LM Studio fallback for cloud providers (default mode). Deprecated `critique_submitter_*` request fields are compatibility aliases for Rigor & Proofs. OpenRouter keeps optional host-provider selection. In generic mode, all roles use OpenRouter only; LM Studio/OAuth options are hidden or unavailable. Built-in/default profiles set Assistant equal to the primary Validator unless the user edits it. -**Allowed Outputs**: Single Paper Writer start requests include `allow_mathematical_proofs` and `allow_research_papers`; at least one must be true. Both true preserves today's paper-writing plus optional proof behavior. The Mathematical Proofs checkbox is the user-facing proof-output enable path and must not imply proof work when Lean is unavailable; proofs-only starts should reject clearly if Lean is disabled/unavailable. Papers-only suppresses rigor/save-time proof work for that run. 
Proofs-only runs proof verification over the current Aggregator database instead of compiling a paper, exposes running/stoppable status while the background proof check is active, and remains separate from developer-mode Creativity Emphasis Boost. Manual writer proof output (rigor proofs, save-time checks, proof-only checks, and "Try to Prove This") stores in the active manual proof database and appears in the Manual Mathematical Proofs tab, not in the autonomous session proof tab/activity/graph. Manual Aggregator/Compiler clear actions archive the active manual proof database into manual proof-run history, reset active proof context to empty, and remove the manual Aggregator proof appendix when needed, so old proofs never seed a new manual run; clears are rejected while manual proof verification is active. Manual proof checks always use standard proof discovery: candidates that directly solve the user prompt first, then candidates that substantially build toward solving it. Manual Aggregator proof checks (button or proofs-only Single Paper Writer) append any non-duplicate Lean-verified proof to the active Aggregator proof appendix for live display/download and future proof checks in the same active run, regardless of novelty rating, but proof-check source prompts strip existing generated-proof appendices because verified proofs enter through the active manual proof database injection. Manual Aggregator proof checks must recover the persisted manual Aggregator prompt when the live coordinator prompt is unavailable after stop/restart. Manual `Try to Prove This` role resolution must use the active/manual Aggregator or Compiler role settings, including validator settings, and must not fall back to autonomous proof runtime snapshots. 
The manual live paper exposes a read-only "Try to Prove This" check over the current paper plus current Aggregator context; it appends any non-duplicate Lean-verified proof to the live paper proof appendix regardless of novelty rating, while save-time proof checks append novel proofs to the saved paper text file. +**Assistant memory support**: When Session History Memory is enabled, compiler/manual writer uses one shared `compiler_assistant` memory role, not one Assistant per writer/rigor/proof lane. Assistant runs beside eligible non-validator, non-critique compiler writing/review/rigor/proof roles as optional verified proof-memory support. It may provide up to 7 prior verified proofs, reuses useful packs for two eligible receiver reads before refresh, skips true no-external-history targets because it only performs proof-memory retrieval for now, and never blocks compiler progress. Durable cooldown is run-scoped across transient compiler task IDs/roles: zero-useful retrieval can eventually shut down for that run, while stagnant same-pack retrieval only backs off. Live activity should show normal Assistant retrieval result summaries only, not skip/backoff/shutdown turns. Compiler validators and dedicated critique/self-review phases never receive Assistant context. + +**Allowed Outputs**: Single Paper Writer start requests include `allow_mathematical_proofs` and `allow_research_papers`; at least one must be true. Both true preserves today's paper-writing plus optional proof behavior. The Mathematical Proofs checkbox is the user-facing proof-output enable path and must not imply proof work when Lean is unavailable; proofs-only starts should reject clearly if Lean is disabled/unavailable. Papers-only suppresses rigor/save-time proof work for that run. 
Proofs-only runs proof verification over the current Aggregator database instead of compiling a paper, exposes running/stoppable status while the background proof check is active, and remains separate from developer-mode Creativity Emphasis Boost. Manual writer proof output (rigor proofs, save-time checks, proof-only checks, and "Try to Prove This") stores in the active manual proof database and appears in the Manual Mathematical Proofs tab, not in the autonomous session proof tab/activity/graph. Manual Aggregator/Compiler clear actions archive the active manual proof database into manual proof-run history, reset active proof context to empty, and remove the manual Aggregator proof appendix when needed, so old proofs never seed a new manual run; clears are rejected while manual proof verification is active. Manual proof checks always use standard proof discovery: candidates that directly solve the user prompt first, then candidates that substantially build toward solving it. When Session History Memory is enabled, the Assistant role may run in parallel for manual proof checks and live-paper proof checks, but checks continue without waiting for Assistant. Manual Aggregator proof checks (button or proofs-only Single Paper Writer) append any non-duplicate Lean-verified proof to the active Aggregator proof appendix for live display/download and future proof checks in the same active run, regardless of novelty rating; preserving known verified proofs here is intentional RALPH-loop state and exact duplicate avoidance, not a bug. Proof-check source prompts strip existing generated-proof appendices because verified proofs enter through the active manual proof database injection. Manual Aggregator proof checks must recover the persisted manual Aggregator prompt when the live coordinator prompt is unavailable after stop/restart. 
Manual `Try to Prove This` role resolution must use the active/manual Aggregator or Compiler role settings, including validator and Assistant settings, and must not fall back to autonomous proof runtime snapshots. The manual live paper exposes a read-only "Try to Prove This" check over the current paper plus current Aggregator context; it appends any non-duplicate Lean-verified proof to the live paper proof appendix regardless of novelty rating for the same manual RALPH-loop reason, while save-time proof checks append novel proofs to the saved paper text file. **Supercharge**: Each compiler role has a developer-mode-only Supercharge checkbox. Checked roles run 4 full answer attempts plus a 5th same-model synthesis answer through `api_client_manager.generate_completion()`. If Boost applies, every internal Supercharge call uses the Boost route/model/provider settings first. Tool-call requests bypass Supercharge; this is especially important for the Wolfram-enabled construction loop. @@ -94,21 +96,22 @@ Body content is ALWAYS inserted BEFORE CONCLUSION_PLACEHOLDER. `_apply_edit()` a ## Submitter-Validator Cycle **Outline Creation (Phase 1 — Iterative):** -1. HC submitter generates outline → validator reviews (accept/reject + feedback) +1. Writing submitter generates outline → validator reviews (accept/reject + feedback) 2. If accepted: submitter decides outline_complete=true (lock) or false (refine further) 3. Hard limit: 15 iterations; if no accepted `outline_complete=true` lock happened, the latest generated non-empty outline is force-locked as fallback → fully injected into all future prompts **Construction Loop (repeating):** -- 4× HC construction → validator -- 1× HC outline update → validator *(skipped if body complete)* -- 2× HC review → validator -- Then, if body is still active, run the HP Lean-4 theorem-search rigor loop until the first decline or 5 consecutive rigor cycles, whichever comes first. 
Each successful rigor cycle lands one verified theorem inline or in the Theorems Appendix; after the cap the compiler returns to construction/review before any later rigor loop. +- 4x Writing Submitter construction → validator +- 1x Writing Submitter outline update → validator *(skipped if body complete)* +- 2x Writing Submitter review → validator +- Then, if body is still active, run the Rigor & Proofs Submitter Lean-4 theorem-search rigor loop until the first decline or 5 consecutive rigor cycles, whichever comes first. Each successful rigor cycle lands one verified theorem inline or in the Theorems Appendix; after the cap the compiler returns to construction/review before any later rigor loop. **Rigor Mode (Lean 4 verified theorems, 4-stage)**: The rigor loop no longer rewrites prose. Each rigor cycle: -- Stage 1 (HP, unvalidated): novelty-first theorem discovery - using the full writing context, decide if a user-prompt-relevant, public/citable novelty-bearing theorem worth formalizing exists that is absent from standard references or Mathlib; return `needs_theorem_work=false` to decline and end the rigor loop. This stage is not a known-knowledge-base builder: routine helpers, standard Mathlib/textbook restatements, program-local firsts, and proof-engineering glue should decline before Lean cost. Discovery is explicitly allowed to construct extension theorems from partial paper work, the outline, supporting context, or the user prompt only when the theorem directly solves or builds toward solving the user's goal, not merely because it improves the paper locally. +- Stage 1 (Rigor & Proofs Submitter, unvalidated): impact-first theorem discovery - using the full writing context, decide if a user-prompt-relevant new/novel theorem worth formalizing would directly solve the user's prompt or materially advance a solution path; return `needs_theorem_work=false` to decline and end the rigor loop. 
This stage is not a known-knowledge-base builder: supporting lemmas, routine helpers, standard Mathlib/textbook restatements, program-local firsts, minor reformulations/local formalizations, trivial/easy proofs, and proof-engineering glue should decline before Lean cost. Discovery is explicitly allowed to construct extension theorems from partial paper work, the outline, supporting context, or the user prompt only when the theorem directly solves or builds toward solving the user's goal, not merely because it improves the paper locally. - Stage 1 output includes `theorem_origin` (`existing_paper_claim`, `extension_from_partial_work`, `extension_from_user_prompt`), `placement_preference` (`inline`, `appendix_only`), `expected_novelty_tier`, `prompt_relevance_rationale`, `novelty_rationale`, and `why_not_standard_known_result`. Invalid/missing novelty tiers or rationales decline before Lean cost. Extension-derived theorems MUST be forced to `appendix_only`; existing-paper-claim theorems may be inline or appendix-only. -- Stage 2: `ProofFormalizationAgent.prove_candidate(max_attempts=5)` - up to 5 Lean 4 attempts with error-feedback chaining and complete current-paper source plus available source brainstorm/aggregator context as mandatory paper-writing proof context; focused excerpts are supplemental only. On 5 failures: record the candidate via `proof_database.record_failed_candidate` so future cycles see it as an open lemma target; end the rigor cycle as a decline. +- Stage 2: `ProofFormalizationAgent.prove_candidate(max_attempts=5)` - up to 5 Lean 4 attempts with error-feedback chaining and complete current-paper source plus available source brainstorm/aggregator context as mandatory paper-writing proof context; focused excerpts are supplemental only. 
Routine workflow-memory retrieval is owned by the parallel Assistant LLM role when Session History Memory is enabled: Assistant observes the whole user prompt plus current phase/candidate/draft, rejection feedback, and Lean errors when relevant, then maintains a freshness-tagged up-to-7 pack of fully Lean-verified memory supports from local MOTO/manual/LeanOJ history and authorized SyntheticLib4 sources. Formalization receives the latest pack opportunistically and never waits for Assistant; useful packs refresh only after two eligible receiver reads. Direct `search_lean_proofs` prefetch/tool calls are explicit legacy/debug or narrow emergency-repair paths only. If full SyntheticLib4 Lean code is shown to a successful formalization prompt, MOTO records a local `model_visible_context` usage attestation without claiming whole-code dependency use. On 5 failures: record the candidate via `proof_database.record_failed_candidate` so future cycles see it as an open high-impact proof target; end the rigor cycle as a decline. +- Proof-search retrieval is supporting context only: it must not turn rigor into a known-knowledge-base builder, alter the serial rigor loop, or bypass MOTO proof registration and integrity rules. - Stage 3: hard post-Lean integrity checks reject only fake proof devices such as new `axiom`/`constant`/`opaque`; statement mismatch is non-blocking and downshifts storage to the actual Lean-verified theorem. Novelty classification and persistence go through the shared `register_verified_lean_proof()` path, which ranks preserved proofs and stores novel/non-novel records with duplicate detection under the active paper source id. Rigor discovery sees compact verified-proof summaries from the active proof database (autonomous session or active manual run), while source reads strip appended generated-proof sections so proof code does not duplicate through the paper/brainstorm source context. Canonical proof records and user-visible appendices remain preserved. 
Non-novel proofs remain available through `/api/proofs` for future user-driven reference selection. - Stage 4: placement - if `placement_preference="inline"`, HP model proposes an inline edit that introduces the theorem with an explicit "verified in Lean 4" marker and an appendix cross-reference. Validator uses `rigor_lean_placement` mode which forces `rigor_check=True` (Lean 4 is the source of mathematical truth) and judges placement/narrative only. Up to 2 placement attempts (attempt 2 gets validator rejection feedback). - Appendix routing: if `placement_preference="appendix_only"`, skip inline placement and append directly to the **Theorems Appendix** with `placement_outcome="appendix_requested"`. If inline placement is attempted but both placement attempts fail, append with `placement_outcome="appendix_fallback"`. Both outcomes count as `rigor_acceptance` because the math is preserved. @@ -208,12 +211,12 @@ Prevents models' fake placeholder text (e.g., "XI. Conclusion\n*placeholder*") f ## Context Allocation Per-role context windows are explicit user/provider settings for each role: -- Validator, High-Context Submitter, High-Parameter Submitter, and Critique Submitter must receive configured context and max-output values; runtime code must not substitute hidden 131K/25K defaults. +- Validator, Writing Submitter, Rigor & Proofs Submitter, and Assistant must receive configured context and max-output values; runtime code must not substitute hidden 131K/25K defaults. Deprecated critique settings mirror Rigor & Proofs. - **Settings flow**: All compiler modules read from `system_config.compiler_*` at runtime. The caller that creates `CompilerCoordinator` MUST write settings to `system_config` before init (manual mode: `/api/compiler/start`; autonomous mode: `autonomous_coordinator.py` before `CompilerCoordinator()` creation). Per-role Supercharge flags must be passed through `ModelConfig`, not `system_config`. 
- **OpenRouter auto-fill**: Selecting an OpenRouter model auto-fills context from the model-level `context_length`. Max output tokens use `min(20% of model context_length, endpoint max_completion_tokens)`: auto provider mode filters weak/low-cap endpoints and uses the smallest remaining capable endpoint cap, while an explicit host selection uses that host's largest exposed endpoint cap. Endpoint `context_length` / `max_prompt_tokens` rows are diagnostics, not context shrink limits. If endpoint output caps are incomplete, preserve current values (no guessing). - Rigor mode dynamically adjusts RAG budget if outline + system prompts exceed available context - Construction mode dynamically adjusts RAG budget when brainstorm content is present: `rag_budget = max(5000, max_allowed - outline_tokens - paper_tokens - brainstorm_tokens - 5000_overhead)`. Brainstorm always direct-injected at full fidelity; RAG evidence scales to fit remaining budget. -- **Wolfram Alpha as a construction tool**: During `HighContextSubmitter.submit_construction` (body / conclusion / introduction / abstract), when `system_config.wolfram_alpha_enabled=true`, the writer may invoke the `wolfram_alpha_query` OpenAI-compatible tool up to **20 times per submission** to verify factual / computational claims before writing them. On budget exhaustion, the loop forces finalization with tools disabled. Tool replies remain model-visible, but logs/WebSocket events expose only redacted metadata and lengths; paper credits store counts only. Wolfram tool is NOT available in `outline_create`, `outline_update`, `review`, or the rigor loop. +- **Wolfram Alpha as a construction tool**: During `WritingSubmitter.submit_construction` (body / conclusion / introduction / abstract), when `system_config.wolfram_alpha_enabled=true`, the writer may invoke the `wolfram_alpha_query` OpenAI-compatible tool up to **20 times per submission** to verify factual / computational claims before writing them. 
On budget exhaustion, the loop forces finalization with tools disabled. Tool replies remain model-visible, but logs/WebSocket events expose only redacted metadata and lengths; paper credits store counts only. Wolfram tool is NOT available in `outline_create`, `outline_update`, `review`, or the rigor loop. **Context rules:** User prompt ALWAYS direct injected. The canonical direct-injection, RAG reserve, offload-order, and source-exclusion policy lives in `rag-design-for-overall-program.mdc`. @@ -221,7 +224,7 @@ Per-role context windows are explicit user/provider settings for each role: - `outline_create`, `outline_update`, `rigor`, `construction`, `review`: raises ValueError if exceeds - `validator`: rejects submission if exceeds -**Rigor Mode context**: outline, current paper, verified-proof summaries, and available source brainstorm/aggregator database context are direct-injected into theorem discovery. Manual/single-paper compiler mode uses the Part 1 aggregator database when available; autonomous/multi-paper mode uses the active source brainstorm, while prior brainstorm papers and references remain high-priority RAG evidence. Formalization attempts receive current paper plus the same bounded source context. Supplemental RAG evidence excludes both `compiler_outline.txt` and `compiler_paper.txt`; if the mandatory direct prompt is too large even without RAG evidence, rigor shrinks source context before raising a prompt-size error. Rigor prompts live in `backend/compiler/prompts/rigor_prompts.py` - the pre-Build-4 `standard_enhancement` / `rewrite_focus` / `wolfram_verification` prompts were replaced by `build_rigor_theorem_discovery_prompt` (Stage 1) and `build_rigor_placement_prompt` (Stage 2). +**Rigor Mode context**: outline, current paper, verified-proof summaries, and available source brainstorm/aggregator database context are direct-injected into theorem discovery. 
Manual/single-paper compiler mode uses the Part 1 aggregator database when available; autonomous/multi-paper mode uses the active source brainstorm, while prior brainstorm papers and references remain high-priority RAG evidence. Formalization attempts receive the current paper plus the same bounded source context; the Assistant memory pack is optional and freshness-tagged, is placed after the mandatory source context, and is dropped before any mandatory source context is touched if the prompt would overflow. Supplemental RAG evidence excludes both `compiler_outline.txt` and `compiler_paper.txt`; if the mandatory direct prompt is too large even without RAG evidence, rigor shrinks source context before raising a prompt-size error. Rigor prompts live in `backend/compiler/prompts/rigor_prompts.py` - the pre-Build-4 `standard_enhancement` / `rewrite_focus` / `wolfram_verification` prompts were replaced by `build_rigor_theorem_discovery_prompt` (Stage 1) and `build_rigor_placement_prompt` (Stage 2). **RAG source exclusion (anti-duplication)**: All compiler RAG offload/source-exclusion rules are centralized in `rag-design-for-overall-program.mdc`.
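The two token-budget rules in the compiler context-allocation changes reduce to small arithmetic. The Python sketch below illustrates them under stated assumptions: `Endpoint`, `auto_fill_max_output`, `construction_rag_budget`, and the `min_capable_cap` threshold are hypothetical names rather than the codebase's API, and `min_capable_cap` only stands in for the unspecified test that filters weak/low-cap endpoints in auto provider mode.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical constants mirroring the construction-mode rule:
# rag_budget = max(5000, max_allowed - outline - paper - brainstorm - 5000 overhead)
CONSTRUCTION_OVERHEAD_TOKENS = 5000
MIN_RAG_BUDGET = 5000


def construction_rag_budget(max_allowed: int, outline_tokens: int,
                            paper_tokens: int, brainstorm_tokens: int) -> int:
    """Tokens left for RAG evidence after direct-injected content (sketch)."""
    remaining = (max_allowed - outline_tokens - paper_tokens
                 - brainstorm_tokens - CONSTRUCTION_OVERHEAD_TOKENS)
    # Brainstorm stays at full fidelity; only RAG shrinks, never below the floor.
    return max(MIN_RAG_BUDGET, remaining)


@dataclass
class Endpoint:
    """Hypothetical view of one OpenRouter endpoint row."""
    host: str
    max_completion_tokens: Optional[int]  # None when the listing omits the cap


def auto_fill_max_output(model_context_length: int,
                         endpoints: Sequence[Endpoint],
                         current_max_output: int,
                         selected_host: Optional[str] = None,
                         min_capable_cap: int = 0) -> int:
    """min(20% of model context, endpoint cap); keeps the current value if caps are incomplete."""
    if selected_host is not None:
        pool = [e for e in endpoints if e.host == selected_host]
    else:
        pool = list(endpoints)
    caps = [e.max_completion_tokens for e in pool]
    if not caps or any(cap is None for cap in caps):
        return current_max_output  # incomplete caps: preserve, no guessing
    if selected_host is not None:
        endpoint_cap = max(caps)  # explicit host: its largest exposed cap
    else:
        # Auto mode: drop low-cap endpoints, then take the smallest remaining cap.
        capable = [cap for cap in caps if cap >= min_capable_cap]
        if not capable:
            return current_max_output
        endpoint_cap = min(capable)
    # 20% of the model-level context_length, rounded down (assumption).
    return min(model_context_length // 5, endpoint_cap)
```

For example, a 200,000-token model whose capable auto-mode endpoints expose caps of 16,000 and 64,000 would auto-fill 16,000, because the smallest capable cap is below the 40,000-token 20% share; explicitly selecting the 64,000-cap host would auto-fill 40,000 instead. Rounding the 20% share down with integer division is a choice of this sketch, not something the rules specify.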
diff --git a/.cursor/rules/part-3-autonomous-research-mode.mdc b/.cursor/rules/part-3-autonomous-research-mode.mdc index ac36eaa..8faffaf 100644 --- a/.cursor/rules/part-3-autonomous-research-mode.mdc +++ b/.cursor/rules/part-3-autonomous-research-mode.mdc @@ -57,7 +57,7 @@ The autonomous coordinator USES actual Part 1 aggregator infrastructure for brai - Sets WebSocket broadcaster to propagate aggregator events through autonomous coordinator - Monitors aggregator stats in real-time to track acceptances/rejections - Stops aggregator when completion review decides to write paper -- **Phase enforcement**: Construction submitter must check current phase before declaring completion +- **Phase enforcement**: Writing Submitter must check current phase before declaring completion - **Premature decline rejection**: Coordinator rejects declines if required sections are missing based on current phase - **Parent precedence**: Forced paper writing and forced Tier 3 must stop active child aggregators before the parent tier continues; local exploration/title aggregators must be tracked so they can be stopped. @@ -72,9 +72,9 @@ The autonomous coordinator USES actual Part 2 compiler infrastructure for paper Compiler submitters may selectively use, synthesize beyond, or depart from brainstorm material when that better serves the user's prompt and remains rigorous. Validator must not reject solely for selective non-use of brainstorm/database material. **Critical Implementation Details**: -- **system_config propagation (REQUIRED)**: Before creating `CompilerCoordinator`, autonomous mode MUST write all compiler context/token settings to `system_config` (e.g., `system_config.compiler_high_context_context_window = self._high_context_context`). Compiler modules read from `system_config` at init — the manual `/api/compiler/start` route does this, but autonomous mode bypasses that route and must do it explicitly. Applies to both `_compile_paper()` and `_compile_tier3_paper()`. 
-- **Supercharge propagation (REQUIRED)**: Autonomous mode must preserve per-role `supercharge_enabled` for brainstorm submitters, validator, high-context, high-param, critique submitter, proof runtime snapshots, and child Compiler/Aggregator coordinators. This setting lives in role configs / `ModelConfig`, not `system_config`. -- **Rigor proof source propagation (REQUIRED)**: Autonomous and Tier 3 child compiler runs must pass the active paper id/title into compiler rigor so High Parameter Lean-verified proofs are ranked and indexed under the real paper source. +- **system_config propagation (REQUIRED)**: Before creating `CompilerCoordinator`, autonomous mode MUST write all compiler context/token settings to `system_config` (e.g., `system_config.compiler_writer_context_window = self._writer_context`). Compiler modules read from `system_config` at init — the manual `/api/compiler/start` route does this, but autonomous mode bypasses that route and must do it explicitly. Applies to both `_compile_paper()` and `_compile_tier3_paper()`. +- **Supercharge propagation (REQUIRED)**: Autonomous mode must preserve per-role `supercharge_enabled` for brainstorm submitters, validator, Writing Submitter, Rigor & Proofs Submitter (legacy `high_param_*` fields), Assistant, proof runtime snapshots, and child Compiler/Aggregator coordinators. Deprecated critique fields mirror Rigor & Proofs during the compatibility period. This setting lives in role configs / `ModelConfig`, not `system_config`. +- **Rigor proof source propagation (REQUIRED)**: Autonomous and Tier 3 child compiler runs must pass the active paper id/title into compiler rigor so Rigor & Proofs Lean-verified proofs are ranked and indexed under the real paper source. 
- Constrains section order: Body → post-body critique/self-review → Conclusion → Introduction → pre-abstract empirical red-team review → Abstract - Child compiler phase completion is driven by explicit `section_complete: true`; the autonomous wrapper still uses Abstract detection as its parent-level completion monitor - Regex patterns may extract abstract text for metadata and parent monitoring, but child phase advancement remains explicit-signal based @@ -118,16 +118,15 @@ Without exploration, the topic selector samples from the model's highest-probabi - **Prompts** (`backend/autonomous/prompts/topic_exploration_prompts.py`): `build_exploration_user_prompt()` frames the aggregation task for candidate question generation - **Temp DB**: `exploration_candidates.txt` in brainstorms directory (cleaned up after phase) - **Scoped rejection feedback**: Topic/title mini-aggregators and brainstorm aggregators use rejection-log files under the active brainstorm/session directory, never the manual Aggregator `Summary_Of_Last_5...` files. -- **Target**: 5 accepted candidates per exploration cycle -- **Safety valve**: 15 consecutive rejections → proceed with whatever candidates collected +- **Target**: exactly 5 accepted candidates per exploration cycle before topic selection; rejection count, partial candidates, and failed child runs do not advance selection ### Workflow 1. Aggregator starts through the standard Part 1 coordinator; submitters run in parallel except when inherited Aggregator single-model mode serializes them 2. Submitters generate candidate brainstorm questions as standard submissions 3. Validator batch-validates (up to 3 at a time) checking quality, relevance, and DIVERSITY 4. Accepted candidates accumulate in temp exploration database -5. Coordinator monitors aggregator stats, stops at 5 acceptances -6. Reads exploration DB, formats as candidate list for topic selector +5. 
Coordinator monitors aggregator stats and completes only at 5 acceptances; explicit stop or child-aggregator failure exits/retries without feeding partial candidates to topic selection +6. Reads the completed 5-candidate exploration DB and formats it for the topic selector ### WebSocket Events Topic exploration should emit user-visible progress through the standard workflow/WebSocket stream, but exact internal event names are not rule-level invariants unless consumed by the hosted wrapper or frontend contract. @@ -223,9 +222,9 @@ When rejected, the validator's reasoning is added to the topic submitter's rolli The autonomous research system's power comes from its ability to **compound knowledge across research cycles**. Each completed paper represents distilled mathematical insights that can inform and enhance future brainstorm explorations. By selecting reference papers BEFORE brainstorming begins, submitters can: -- Build upon proven mathematical frameworks from prior papers +- Build upon promising mathematical frameworks from prior AI-generated papers while independently re-checking their claims - Avoid re-exploring territory already covered in depth -- Identify novel connections between new topics and established results +- Identify novel connections between new topics and previously explored results - Accelerate convergence on valuable insights by standing on prior work **This is the most crucial mechanism that allows database creativity to compound.** @@ -381,7 +380,7 @@ Runs after each configured interval of accepted submissions (default 10).
Cleanu - Coordinator validates state (must be in tier1_aggregation) - Brainstorm aggregator is stopped immediately - Brainstorm is marked complete - - System transitions to paper compilation workflow (Tier 2) + - System transitions to the normal post-brainstorm handoff: proof checkpoint first when proof outputs are enabled, then paper compilation only when research-paper output is enabled **API Endpoint**: `POST /api/auto-research/force-paper-writing` @@ -405,8 +404,8 @@ Runs after each configured interval of accepted submissions (default 10). Cleanu - Does NOT run completion review (bypassed entirely) - Does NOT require self-validation (user decision is final) - Brainstorm is marked complete regardless of acceptance count -- Subsequent paper compilation proceeds normally with all selected reference papers -- **Race condition guard**: `_brainstorm_aggregation_loop()` checks `_manual_paper_writing_triggered` before calling `start()` on the aggregator (catches override during async init). The monitoring loop also stops the aggregator before returning on manual override. +- Subsequent paper compilation proceeds normally with all selected reference papers when research-paper output is enabled; proofs-only runs skip paper/title work after brainstorm proof work and return to topic exploration. +- **Race condition guard**: `_brainstorm_aggregation_loop()` checks `_manual_paper_writing_triggered` before calling `start()` on the aggregator (catches override during async init). The monitoring loop must also treat a user-stopped child aggregator as manual override ownership, not as an unexpected child stop. ### Purpose Assess whether the current brainstorm has been sufficiently explored relative to THIS MODEL'S internal knowledge (weights) and decide whether to continue brainstorming or begin writing a paper. @@ -495,8 +494,8 @@ Same two-step browsing workflow as pre-brainstorm selection (expand request → 2. 
Submitters generate candidate paper titles as standard submissions 3. Validator checks quality, relevance, and DIVERSITY (rejects near-duplicates) 4. Accepted candidates accumulate in temp title DB -5. Coordinator stops at 5 acceptances (or 15 consecutive rejections safety valve) -6. Reads title DB, formats as candidate list for final title selection +5. Coordinator completes only at 5 acceptances; explicit stop or child-aggregator failure exits/retries without feeding partial candidates to final title selection +6. Reads the completed 5-title DB and formats it for final title selection **Temp DB**: `title_candidates_{topic_id}.txt` in brainstorms dir (cleaned up after phase) @@ -562,8 +561,8 @@ During paper compilation, the compiler submitter sees both the paper AND the sou **Uses existing Part 2 (Compiler) infrastructure**: - Sequential Markov chain workflow -- High-context submitter (outline, construction, review modes) -- High-parameter submitter (rigor mode) +- Writing Submitter (outline, construction, review modes) +- Rigor & Proofs Submitter (rigor mode) - Compiler validator (coherence, rigor, placement validation) - **Iterative outline creation** (submitter refines outline until satisfied or 15 iteration limit) @@ -1204,15 +1203,19 @@ Runs automatically after every completed brainstorm (Tier 1) and completed Tier **Proof Framing Gate (one-shot, at autonomous start)**: On fresh autonomous starts, the coordinator runs `_run_proof_framing_gate()` before research begins. A single LLM call on the user prompt decides `is_proof_amenable` (`build_proof_framing_gate_prompt` → `autonomous_proof_framing_gate` role). The gate errs on the side of `true` when formal proof can help the user's prompt — it returns `false` when the prompt is purely empirical, engineering-focused, or has no meaningful prompt-relevant mathematical content. 
If `true`, `PROOF_FRAMING_CONTEXT` (which directs submissions to pursue theorems/lemmas/formalizations that directly answer, support, or advance the user prompt, with novelty/non-triviality valuable only inside that boundary) is appended to every subsequent submitter prompt via `_append_proof_framing()` and persisted to workflow state for crash recovery. Decision is broadcast via `proof_framing_decided`. Lean/proof execution remains separately gated by `lean4_enabled`. -**Automatic proof rounds**: Autonomous brainstorm and completed-paper proof checkpoints run the pipeline below for up to 4 rounds. Round 1 uses normal candidate identification. After each completed round, the coordinator asks again whether there are any remaining proofs in the same source that directly solve the user's prompt or substantially advance a solution path, with newly verified proofs visible through the proof-library context. The checkpoint stops early when a round finds no candidates. Manual proof checks, deferred paper retry checks, compiler rigor, and LeanOJ are single-round unless their own mode rules say otherwise. The per-source reservation spans the whole automatic multi-round checkpoint, and completed earlier rounds must never overwrite an active later-round resume cursor. +**Automatic proof rounds**: Proofs-only autonomous runs let brainstorm proof checkpoints run the pipeline below for up to 4 rounds. Round 1 uses normal candidate identification. After each completed round, the coordinator asks again whether there are any remaining proofs in the same source that directly solve the user's prompt or substantially advance a solution path, with newly verified proofs visible through the proof-library context. The checkpoint stops early when a round finds no candidates. When research papers and mathematical proofs are both enabled, automatic proof checkpoints after brainstorming and after completed paper writing are single-round so paper generation remains the main output path. 
Manual proof checks, deferred paper retry checks, compiler rigor, and LeanOJ are single-round unless their own mode rules say otherwise. The per-source reservation spans the whole automatic multi-round checkpoint, and completed earlier rounds must never overwrite an active later-round resume cursor. + +**Assistant memory-support role**: When Session History Memory is enabled, each active workflow uses one shared Assistant memory role (`autonomous_assistant` for autonomous, `leanoj_assistant` for LeanOJ), not one Assistant per submitter/proof candidate/solver lane. Assistant runs beside eligible non-validator, non-critique brainstorming, selection, writing, proof, and Tier 3 roles as optional verified proof-memory support; short-lived topic/title exploration candidate mini-aggregators are excluded. It may provide up to 7 prior verified proof supports from enabled proof-history corpora and reuse useful packs for two eligible receiver reads before refresh, but validators never receive Assistant context and parent workflows never wait on Assistant. Assistant skips true no-external-history targets because it only performs proof-memory retrieval for now. Durable cooldown is keyed to stable run scope, grouping transient task IDs/roles while preserving real source/session separation; zero-useful retrieval can eventually shut down for the run, while stagnant same-pack retrieval backs off without shutdown. User live activity should show normal Assistant retrieval result summaries only, not skip/backoff/shutdown turns. Stale live packs and state are cleared only on explicit reset/clear or when Session History Memory is disabled. **Pipeline** (`backend/autonomous/core/proof_verification_stage.py`): -1. **Candidate identification** — `ProofIdentificationAgent` (`build_proof_identification_prompt`) extracts prompt-relevant, novelty-first theorem candidates from brainstorm or paper content.
For brainstorms, the brainstorm topic is bounded source-local metadata only and must not broaden eligibility beyond proofs that directly solve, or visibly build toward solving, the user prompt. This stage is not a general known-knowledge-base builder: candidates are ordered by direct prompt solutions first, then major discoveries, mathematical discoveries, novel variants, prompt-critical novel formalizations that are absent from standard references/Mathlib and independently publishable/citable, and only necessary supporting lemmas for those novel targets. Candidate JSON includes non-empty `expected_novelty_tier`, `prompt_relevance_rationale`, `novelty_rationale`, and `why_not_standard_known_result`; invalid, missing-rationale, or `not_novel` candidates are skipped before Lean 4 cost is incurred. Routine helpers, standard/textbook/Mathlib restatements, off-prompt curiosities, program-local firsts, and single-tactic/routine proof goals are rejected. +1. **Candidate identification** — `ProofIdentificationAgent` (`build_proof_identification_prompt`) extracts prompt-relevant, high-impact new/novel theorem candidates from brainstorm or paper content. For brainstorms, the brainstorm topic is bounded source-local metadata only and must not broaden eligibility beyond proofs that directly solve, or visibly build toward solving, the user prompt. This stage is not a general known-knowledge-base builder: candidates are ordered by impact on the user prompt (direct solutions/impossibility results first, then decisive reductions, obstructions, and structural theorems that themselves make major progress). Candidate JSON includes non-empty `expected_novelty_tier`, `prompt_relevance_rationale`, `novelty_rationale`, and `why_not_standard_known_result`; invalid, missing-rationale, or `not_novel` candidates are skipped before Lean 4 cost is incurred. 
Routine helpers, supporting lemmas, trivial/easy proofs, standard/textbook/Mathlib restatements, off-prompt curiosities, program-local firsts, minor reformulations, and single-tactic/routine proof goals are rejected even as fallback targets. 2. **Optional Mathlib lemma search** — `MathlibLemmaSearchAgent` surfaces relevant existing lemmas into the formalization prompt, tied to the target theorem, user prompt, and brainstorm topic when present 3. **Optional SMT early-exit** — when `smt_enabled`, `SmtClient` classifies candidates conservatively; only valid `unsat` checks become Lean tactic hints (for example `nativeDecide`, `omega`, `decide`, `norm_num`, `linarith`, or `polyrith`-style hints). `sat`, `unknown`, failed translation, and non-amenable candidates produce no hint. SMT results are never stored as standalone proofs; prompts receive the same user prompt + brainstorm topic relevance context -4. **Lean 4 formalization attempts** — two-phase retry: up to 3 full-proof attempts via `ProofFormalizationAgent.prove_candidate`, then up to 2 multi-tactic script attempts via `prove_candidate_tactic_script` (5 total per candidate). Formalization prompts receive the same source title/brainstorm topic context plus the complete source brainstorm/paper as mandatory direct context; focused excerpts are supplemental only. Prompt source reads strip appended generated-proof sections so verified proofs enter through the explicit proof-library context, while canonical proof files and user-visible appendices remain preserved. If the complete source cannot fit, fail visibly instead of silently truncating or proving from excerpt-only context. Prior `FailedProofCandidate` failure hints from `proof_database.inject_failure_hints_into_prompt()` thread into each retry. +4. 
**Lean 4 formalization attempts** — two-phase retry: up to 3 full-proof attempts via `ProofFormalizationAgent.prove_candidate`, then up to 2 multi-tactic script attempts via `prove_candidate_tactic_script` (5 total per candidate). Formalization prompts receive the same source title/brainstorm topic context plus the complete source brainstorm/paper as mandatory direct context; focused excerpts are supplemental only. The latest Assistant memory pack may be injected as optional proof-pattern/dependency guidance, dropping it before mandatory source context if the prompt would overflow. Direct `search_lean_proofs` prefetch/tool loops are explicit legacy/debug or narrow emergency-repair paths only, not the normal retrieval path. When full SyntheticLib4 Lean code is injected into a prompt and the formalization succeeds, MOTO records a local non-whole-code `model_visible_context` usage attestation; live whole-code usage submission remains tied to the future SyntheticLib4 service contract. Prompt source reads strip appended generated-proof sections so verified proofs enter through the explicit proof-library context, while canonical proof files and user-visible appendices remain preserved. If the complete source cannot fit, fail visibly instead of silently truncating or proving from excerpt-only context. Prior `FailedProofCandidate` failure hints from `proof_database.inject_failure_hints_into_prompt()` thread into each retry. 5. **Post-Lean preservation + novelty check** — hard integrity rejects only fake proof devices such as new `axiom`/`constant`/`opaque`; statement mismatch is classified as a downshift, never a discard. `autonomous_proof_novelty` ranks the actual Lean-verified theorem against standard references/Mathlib, prompt relevance, and the proof library; program-local firsts are not novel. -6. 
**Storage** — `proof_registration.register_verified_lean_proof()` uses `proof_database.add_proof_if_absent()` to persist novel and known proofs as session-aware records (`proofs_index.json`, `proof_.json`, `proof__lean.lean`). Dependency extraction runs after initial registration and patches the stored record afterward; `proof_verified` may therefore emit before dependency metadata is attached. Same-source duplicate detection is scoped to source type/id + normalized theorem statement + normalized Lean code and must return `duplicate=True` to callers so source files are not appended twice. Cross-source exact theorem/code matches are stored as non-novel reused proof occurrences without spending a novelty API call. If the accepted code proves a narrower supporting lemma than the intended candidate, store the actual theorem statement and retain the original target in notes. If the active/current `proofs_index.json` is corrupt, rebuild from existing `proof_*.json` record files instead of replacing the library with an empty index; cross-session library scans may skip unreadable historical indexes unless explicitly rebuilt. Automatic proof checkpoints append source-file proof sections only for non-duplicate novel records. User-triggered "Try to Prove This" / manual proof checks append any non-duplicate Lean-verified proof to the target brainstorm or paper source appendix regardless of novelty rating. Cross-session read access is provided by `proof_database.list_proof_library()` (all sessions, novelty-filtered) and `proof_database.get_library_proof(session_id, proof_id)`, consumed by the `ProofLibrary` UI component and `/api/proofs/library` endpoints. +6. **Storage** — `proof_registration.register_verified_lean_proof()` uses `proof_database.add_proof_if_absent()` to persist novel and known proofs as session-aware records (`proofs_index.json`, `proof_.json`, `proof__lean.lean`). 
Dependency extraction runs after initial registration and patches the stored record afterward; `proof_verified` may therefore emit before dependency metadata is attached. Same-source duplicate detection is scoped to source type/id + normalized theorem statement + normalized Lean code and must return `duplicate=True` to callers so source files are not appended twice. Cross-source exact theorem/code matches are stored as non-novel reused proof occurrences without spending a novelty API call. If a Lean-accepted proof nevertheless proves a different or narrower theorem than the intended candidate, store the actual theorem statement and retain the original target in notes; this is preservation of a real artifact, not permission to target supporting lemmas. If the active/current `proofs_index.json` is corrupt, rebuild from existing `proof_*.json` record files instead of replacing the library with an empty index; cross-session library scans may skip unreadable historical indexes unless explicitly rebuilt. Automatic proof checkpoints append source-file proof sections only for non-duplicate novel records. User-triggered "Try to Prove This" / manual proof checks append any non-duplicate Lean-verified proof to the target brainstorm or paper source appendix regardless of novelty rating; preserving known verified proofs here is intentional RALPH-loop state so users can inspect validated work and later checks avoid redoing exact proofs that ranked `not_novel`. Cross-session read access is provided by `proof_database.list_proof_library()` (all sessions, novelty-filtered) and `proof_database.get_library_proof(session_id, proof_id)`, consumed by the `ProofLibrary` UI component and `/api/proofs/library` endpoints. + +**Unified proof search**: Assistant is the preferred owner of routine workflow-memory retrieval. 
Unified proof search retrieves bounded, provenance-carrying, fully Lean-verified examples from prior MOTO proof history, LeanOJ verified artifacts, manual proof history, and authorized SyntheticLib4 snapshots while excluding active/current-run proof records; the configured Assistant LLM then selects/refines at most 7 model-visible supports. Do not add a second model-rater role separate from Assistant, and do not publish live support packs without configured Assistant LLM selection. Search is not candidate discovery, not a workflow stop condition, and not a replacement for MOTO Lean/integrity/registration gates; SyntheticLib4/search unavailability should return structured warnings while local workflows continue where possible. **Parallelism (two-phase execution per stage run)**: Steps 2–4 above (the per-candidate "Phase A" pipeline: lemma search → optional SMT hint → `prove_candidate` → `prove_candidate_tactic_script` → `proof_attempts_exhausted` broadcast on failure) run concurrently across identified candidates inside a single `ProofVerificationStage.run()` invocation. `system_config.proof_max_parallel_candidates` controls Phase A batching: the default is `6`; `0` (env: `MOTO_PROOF_MAX_PARALLEL_CANDIDATES` / `PROOF_MAX_PARALLEL_CANDIDATES`) means unlimited; positive values run strict batches of that size before the next batch starts. Phase A parallelizes agent/model work, but actual Lean 4 subprocess verification is serialized by `Lean4Client` behind a shared execution lock so all candidates queue one-at-a-time against the shared Mathlib workspace; LSP mode remains independently serialized by its operation lock and subprocess fallback uses the same shared queue. The identification stage (step 1) filters off-prompt, trivial, and well-known results before Phase A begins, so Phase A only processes prompt-relevant theorem candidates. 
Completed candidates are consumed as tasks finish, and steps 5–6 (the "Phase B" post-processing: novelty assessment, `add_proof`, dependency extraction via `ProofDependencyExtractor`, `append_proofs_section`, `novel_proof_discovered` / `known_proof_verified` broadcast, `record_failed_candidate` for brainstorm failures) are performed strictly **one-at-a-time** in Phase-A completion order inside that driver loop so later candidates can observe earlier stored proofs as MOTO dependencies. If any Phase-A task raises account-credit exhaustion, the driver cancels siblings and re-raises so the autonomous coordinator persists the proof checkpoint pause and waits for OpenRouter reset before retrying. Other Phase-A exceptions cancel siblings, save an error checkpoint, broadcast completion with error state, and return `had_error=True` so the coordinator recovery path can continue without orphaned background API calls. `should_stop` is plumbed into each Phase-A pipeline and checked before each Phase-B pass, so a stop-request short-circuits cleanly without leaking tasks. @@ -1226,7 +1229,7 @@ Runs automatically after every completed brainstorm (Tier 1) and completed Tier **Manual proof checks** (Build 5): `POST /api/proofs/check` reuses `ProofVerificationStage.run_manual()` with a `ProofRuntimeConfigSnapshot` (brainstorm / paper / validator role configs) loaded from stored autonomous metadata or supplied directly in the request. Manual checks accept `source_type="brainstorm"` or `"paper"`; history papers are addressed through the paper source path with `source_id="{session_id}:{paper_id}"` and must resolve to a completed, non-pruned history paper. Prompt-local source reads strip appended generated-proof sections. Active and history paper checks direct-inject available source brainstorm context from the matching session when `source_brainstorm_ids` are available; no hidden character cap replaces mandatory proof source context. 
Manual checks always use standard proof discovery: first ask for candidates that directly solve the user prompt, then candidates that substantially build toward solving it. Readiness is surfaced via `/api/proofs/status.manual_check_ready` + `manual_check_message`. Required state: `lean4_enabled=True` AND a stored or request-provided runtime snapshot. -**Proof runtime config snapshot** (`research_metadata.set_proof_runtime_config`): Captures a `ProofRuntimeConfigSnapshot` with three `ProofRoleConfigSnapshot` entries — `brainstorm` (from first aggregator submitter config), `paper` (from high-context submitter config), `validator` (from validator config). Each holds provider, model_id, openrouter_provider, openrouter_reasoning_effort, lm_studio_fallback_id, context_window, max_output_tokens, and supercharge_enabled. Lets manual checks run without an active autonomous session when a request snapshot is not supplied. +**Proof runtime config snapshot** (`research_metadata.set_proof_runtime_config`): Captures a `ProofRuntimeConfigSnapshot` with three `ProofRoleConfigSnapshot` entries — `brainstorm` (from first aggregator submitter config), `paper` (from Rigor & Proofs Submitter config for proof work), `validator` (from validator config). Each holds provider, model_id, openrouter_provider, openrouter_reasoning_effort, lm_studio_fallback_id, context_window, max_output_tokens, and supercharge_enabled. Lets manual checks run without an active autonomous session when a request snapshot is not supplied. **Proof WebSocket events** are broadcast through the standard `/api/ws` stream for user-visible progress. Do not make every internal progress notification a rule-level invariant, but keep frontend-consumed events stable and update the hosted contract/API version when changing them. Autonomous proof-round progress events include `proof_round_index` and `proof_max_rounds`. 
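The manual-check readiness rule and the snapshot shape above can be modeled with two frozen dataclasses. This is a sketch under assumptions. The field names come from the rule text, but their types and ordering are illustrative. `resolve_snapshot` and `manual_check_status` are hypothetical helpers, not the `research_metadata` API. The request-over-stored precedence and the readiness messages are also assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProofRoleConfigSnapshot:
    # Field names follow the rule text; types and ordering are assumptions.
    provider: str
    model_id: str
    context_window: int
    max_output_tokens: int
    openrouter_provider: Optional[str] = None
    openrouter_reasoning_effort: Optional[str] = None
    lm_studio_fallback_id: Optional[str] = None
    supercharge_enabled: bool = False


@dataclass(frozen=True)
class ProofRuntimeConfigSnapshot:
    brainstorm: ProofRoleConfigSnapshot  # first aggregator submitter config
    paper: ProofRoleConfigSnapshot       # Rigor & Proofs Submitter config
    validator: ProofRoleConfigSnapshot   # validator config


def resolve_snapshot(
    request_snapshot: Optional[ProofRuntimeConfigSnapshot],
    stored_snapshot: Optional[ProofRuntimeConfigSnapshot],
) -> Optional[ProofRuntimeConfigSnapshot]:
    # Assumed precedence: a request-supplied snapshot wins over stored metadata.
    return request_snapshot if request_snapshot is not None else stored_snapshot


def manual_check_status(
    lean4_enabled: bool, snapshot: Optional[ProofRuntimeConfigSnapshot]
) -> tuple[bool, str]:
    """Illustrative (manual_check_ready, manual_check_message) pair."""
    if not lean4_enabled:
        return False, "Lean 4 is disabled."
    if snapshot is None:
        return False, "No stored or request-provided proof runtime snapshot."
    return True, "Manual proof checks are ready."
```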
`proof_verified` is emitted only after the proof has passed integrity checks and has been registered/reused in the proof database; payloads include `proof_id`. Novel/known/duplicate proof registration events include the validator's `novelty_tier` and `novelty_reasoning` so live activity can show whether the Lean 4 proof was rated novel or not novel. @@ -1240,12 +1243,15 @@ Runs automatically after every completed brainstorm (Tier 1) and completed Tier 7. Proof certificates stay text-based (`.lean` source + JSON metadata) — no binary artifacts 8. Hosted/generic mode keeps `lean4_enabled` and `smt_enabled` default false and the hosted image stays Lean-free and Z3-free (no proof binaries in the `python:3.12-slim` runtime) 9. Proof framing gate runs once on fresh autonomous starts; the resulting `proof_framing_active` flag and `PROOF_FRAMING_CONTEXT` are persisted in workflow state for crash recovery. Lean/proof model execution remains controlled by `lean4_enabled`. -10. Candidate identification (`build_proof_identification_prompt`) is a novelty-first user-prompt relevance gate, not a known-knowledge-base builder. It rejects off-prompt curiosities, routine helper lemmas, standard/textbook/Mathlib restatements, program-local firsts, and single-tactic/routine proof goals, then returns candidates ordered by direct user-prompt solutions first, followed by prompt-solving discoveries, variants, prompt-critical formalizations absent from standard references/Mathlib and independently publishable/citable, and necessary supporting lemmas. Candidate prompts require expected novelty plus non-empty prompt-relevance, novelty, and anti-standard-result rationale fields; invalid, missing-rationale, or `not_novel` candidates are skipped before Lean cost. 
Every candidate that passes this gate is attempted — `proof_max_parallel_candidates` defaults to 6, `0` runs all Phase A work without a batch cap, and positive values run strict batches without truncating the post-identification candidate list; actual Lean 4 subprocess verification queues one-at-a-time through `Lean4Client`, and Phase B (novelty / `add_proof` / dependency extraction / brainstorm+paper `append_proofs_section` / novel/known broadcasts / `record_failed_candidate`) remains strictly serialized in Phase-A completion order so intra-batch MOTO dependencies and per-source proof appending stay coherent +10. Candidate identification (`build_proof_identification_prompt`) is an impact-first user-prompt relevance gate, not a known-knowledge-base builder. It rejects off-prompt curiosities, routine helper lemmas, supporting lemmas, trivial/easy proofs, standard/textbook/Mathlib restatements, program-local firsts, minor reformulations/local formalizations, and single-tactic/routine proof goals, then returns candidates ordered by direct impact on the user's prompt: direct solutions or impossibility results first, then decisive reductions, obstructions, and structural theorems that themselves make major progress on the requested problem. Candidate prompts require expected novelty plus non-empty prompt-relevance, novelty, and anti-standard-result rationale fields; invalid, missing-rationale, or `not_novel` candidates are skipped before Lean cost. 
Every candidate that passes this gate is attempted — `proof_max_parallel_candidates` defaults to 6, `0` runs all Phase A work without a batch cap, and positive values run strict batches without truncating the post-identification candidate list; actual Lean 4 subprocess verification queues one-at-a-time through `Lean4Client`, and Phase B (novelty / `add_proof` / dependency extraction / brainstorm+paper `append_proofs_section` / novel/known broadcasts / `record_failed_candidate`) remains strictly serialized in Phase-A completion order so intra-batch MOTO dependencies and per-source proof appending stay coherent +10b. Proof-identification transport, empty-output, malformed/schema-invalid JSON, or no-JSON reasoning-only failures must preserve the proof checkpoint as an error; only a valid JSON no-candidates response may advance past proof checking as no candidates. 11. Each Phase-A task owns its own `ProofIdentificationAgent` / `MathlibLemmaSearchAgent` / `ProofFormalizationAgent` instance to keep per-agent `task_sequence` counters collision-free; account-credit exhaustion cancels siblings and preserves the checkpoint for provider pause/retry, while other Phase-A failures cancel siblings and return a structured `had_error=True` stage result for coordinator recovery 12. `should_stop` propagates into Phase A and is re-checked before each Phase-B pass so stop-requests short-circuit without leaking tasks or partially-applied Phase-B writes. Autonomous proof checkpoints persist the resolved candidate cursor, processed candidate IDs, proof labels/indexes, Lean attempt feedback, and post-Lean metadata needed for Phase B (including accepted theorem names/code) in workflow state so provider-credit pause, Stop/Start, restart, and model changes resume remaining candidates instead of re-identifying from Proof A; provider pauses during Phase A or Phase B must preserve the same checkpoint. 
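The Phase B and failure rules in items 10 to 12 can be sketched as a single driver loop. This is a minimal sketch, assuming the hypothetical names `drive_stage`, `StageResult`, and `CreditExhaustedError`. Phase-A results are consumed in completion order, `should_stop` is re-checked before each Phase-B pass, and Phase B is awaited inline so it stays strictly serialized. Credit exhaustion is re-raised for the caller's pause and retry, while any other failure cancels siblings and yields `had_error=True`. Checkpoint saving and WebSocket broadcasts are omitted.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Iterable


class CreditExhaustedError(Exception):
    """Hypothetical stand-in for the provider's account-credit exhaustion error."""


@dataclass
class StageResult:
    processed: list[Any] = field(default_factory=list)
    had_error: bool = False
    stopped: bool = False


async def drive_stage(
    phase_a: Iterable[Awaitable[Any]],
    phase_b: Callable[[Any], Awaitable[Any]],
    should_stop: Callable[[], bool],
) -> StageResult:
    tasks = [asyncio.ensure_future(aw) for aw in phase_a]
    result = StageResult()
    try:
        # Phase-A results arrive in completion order, not submission order.
        for next_done in asyncio.as_completed(tasks):
            outcome = await next_done
            if should_stop():  # re-checked before every Phase-B pass
                result.stopped = True
                break
            # Phase B is awaited inline, one candidate at a time, so later
            # candidates can observe proofs stored by earlier ones.
            result.processed.append(await phase_b(outcome))
    except CreditExhaustedError:
        raise  # caller persists the checkpoint and pauses for provider reset
    except Exception:
        result.had_error = True  # structured error for coordinator recovery
    finally:
        # Cancel siblings so no Phase-A work keeps running in the background.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
    return result
```

Calling `cancel()` on a task that has already finished is a no-op, so the `finally` block can cancel every task unconditionally before gathering them.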
Proof checkpoint completion markers are source- and round-trigger-scoped and must not transfer between brainstorms/papers or overwrite active later-round cursors. 13. Compiler rigor mode (`submit_rigor_lean_theorem`, `_rigor_loop`) is NOT parallelized and is capped at 5 consecutive cycles per rigor loop — rigor cycles discover, verify, and route one theorem per cycle (inline for eligible existing-paper claims, appendix-only for extension-derived theorems or placement fallback) so each verified theorem lands in the paper before the next discovery; the parallel candidate pipeline lives only in `ProofVerificationStage` 14. Post-Lean integrity scanning rejects newly introduced `axiom`, `constant`, and `opaque` declarations even when the declaration name appears on following lines. Generated source text is not an authorization baseline unless explicitly passed as allowed baseline. 15. Lean-accepted real proof code is preservation-worthy even when it misses the intended candidate. Alignment classifiers may downshift the stored theorem statement and ranking may classify it as `not_novel`, but the proof artifact must not be discarded except for hard integrity failures. +16. Unified proof/history search is bounded retrieval support only. It must never broaden proof candidate identification beyond the user's prompt, replace mandatory source context, or count SyntheticLib4 retrieved records as new MOTO proofs without the normal MOTO artifact path. +17. Assistant memory-support retrieval is a parallel, non-blocking configured LLM role across eligible non-validator, non-critique brainstorming, writing, proof, selection/path, and final-answer roles. 
It provides optional up-to-7 verified proof supports, reuses useful packs for two eligible receiver reads before refresh, skips no-external-history targets, uses durable run-scoped cooldown for repeated zero-useful or stagnant retrieval, shuts down only for repeated zero-useful retrieval in the current run, hides skip/backoff/shutdown turns from live activity, clears stale live packs/state on explicit reset/clear or Session History Memory disable, and never alters phase transitions or candidate eligibility. --- @@ -1300,7 +1306,7 @@ On **clean stop** (user-initiated via stop button), this file is preserved for p On **restart/crash recovery**, if this file exists with a resumable tier/topic/paper (regardless of `is_running`), the next Start detects the interrupted workflow and: 1. Restores internal state (topic ID, acceptance counts, model config, etc.) 2. Recovers stale acceptance counts from brainstorm metadata/database files when workflow state says `0` -3. Resumes from the last known valid phase; completed brainstorms with no generated paper resume at proof/paper handoff, while completed brainstorms that already generated papers are treated as finished and must not replay proof/paper handoff +3. Resumes from the last known valid phase; completed brainstorms with no generated paper resume at proof/paper handoff, while completed brainstorms that already generated papers are treated as finished and must not replay proof/paper handoff. In proofs-only runs, every resume path after brainstorm completion must run/finish brainstorm proof work and save a clean topic-exploration boundary without entering paper-title or paper-compilation phases. 4. Detects completed papers paused before proof verification and resumes `paper_proof_verification` before moving on 5. 
Broadcasts `auto_research_resumed` WebSocket event @@ -1308,7 +1314,7 @@ If `workflow_state.json` is stale, idle, or missing, session recovery conservati **Important Notes:** -- The user research prompt is saved in `auto_research_metadata.json`, not the workflow state +- The user research prompt is saved in session/research metadata, not the workflow state; `GET /api/auto-research/prompt` exposes only the active/resumable prompt for UI restart hydration - Model configuration is saved to allow resuming with the same model settings - If the workflow state file is corrupted or missing, first try durable session-file recovery; start fresh only if no current topic, in-progress paper, completed unpapered brainstorm, or completed papers can be recovered, and only when the session is not marked non-resumable/history-only - The `clear_all_data` API endpoint preserves session files for history, marks sessions `resume_disabled=true` / `status="cleared"`, and must fail if any session cannot be marked non-resumable @@ -1424,7 +1430,7 @@ Main interface component: - Start/Stop/Clear buttons - Current tier display (Tier 1 Brainstorm / Tier 2 Paper Writing) - Current brainstorm/paper progress -- Live activity feed (topic selections, submissions, completions) +- Live activity feed (topic selections, submissions, completions). Rejected brainstorm/topic/title activity should indicate that validator feedback was provided and reassure users that rejection streaks can be normal on difficult problems. ### BrainstormList.jsx Brainstorm management component: @@ -1481,9 +1487,9 @@ Persistent popup notification component for high-scoring paper critiques: ### AutonomousResearchSettings.jsx Settings integrated into main Settings panel: -- The user research prompt lives in `AutonomousResearchInterface.jsx`, persists as `autonomous_research_prompt`, and is disabled while a run is active. 
-- Covers model/provider/runtime settings for brainstorm submitters, validator, high-context, high-param, and critique submitter. -- Includes Cloud Access provider controls (OpenRouter provider/reasoning and desktop OAuth model choices), profiles/raw settings, free-model looping/auto-selector, developer-mode Supercharge controls, Tier 3 toggle, Wolfram controls, and proof-strength/sidebar UI. Advanced Lean/SMT proof runtime controls are shown only when desktop capabilities/runtime paths make them available; the user-facing proof-output toggle lives in the main interface Allowed Outputs row. +- The user research prompt lives in `AutonomousResearchInterface.jsx`, persists through the shared prompt-draft storage (`autonomous_research_prompt`, with large drafts offloaded from localStorage), and is disabled while a run is active. +- Covers model/provider/runtime settings for brainstorm submitters, validator, Writing Submitter, Rigor & Proofs Submitter (legacy `high_param_*` fields), and Assistant. Deprecated critique settings mirror Rigor & Proofs. +- Includes OpenRouter/OAuth provider controls (OpenRouter provider/reasoning and desktop OAuth model choices), profiles/raw settings, free-model looping/auto-selector, developer-mode Supercharge controls, Tier 3 toggle, and proof-strength/sidebar UI. Wolfram Alpha access is managed from the grouped connectivity panel. Advanced Lean/SMT proof runtime controls are shown only when desktop capabilities/runtime paths make them available; the user-facing proof-output toggle lives in the main interface Allowed Outputs row. ### Allowed Outputs Autonomous start requests include `allow_mathematical_proofs` and `allow_research_papers`; at least one must be true. Both true preserves today's paper-reference → brainstorm → proof checkpoint → paper-writing behavior. 
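A minimal validation sketch for the Allowed Outputs start-request flags above. `AllowedOutputs`, `validate_allowed_outputs`, and the returned mode labels are hypothetical names, not the actual request model. The Lean-availability check for proof-only starts is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowedOutputs:
    allow_mathematical_proofs: bool
    allow_research_papers: bool


def validate_allowed_outputs(outputs: AllowedOutputs) -> str:
    """Return a run-mode label, rejecting starts that allow no output."""
    if not (outputs.allow_mathematical_proofs or outputs.allow_research_papers):
        raise ValueError(
            "At least one of allow_mathematical_proofs / allow_research_papers must be true"
        )
    if outputs.allow_mathematical_proofs and outputs.allow_research_papers:
        # Full pipeline: paper reference -> brainstorm -> proof checkpoint -> paper writing.
        return "proofs_and_papers"
    if outputs.allow_mathematical_proofs:
        return "proofs_only"
    return "papers_only"
```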
The Mathematical Proofs checkbox is the main user-facing Lean proof-output toggle; if it cannot enable/use Lean in the current runtime, proof-only starts must fail visibly rather than run brainstorms with no allowed output. Proofs-only runs skip paper writing after completed brainstorm proof work, clear any paper-handoff workflow phase before looping, and select up to 3 completed prior brainstorms as proof-stripped references for future brainstorms. Papers-only runs skip proof verification/proof output work. Brainstorm reference selection must use structured JSON handling with retry/validation or clear failure feedback, and selected reference content must enter context only through `strip_proofs=True`/sanitized text paths because novel proofs are injected separately from the proof database. @@ -1551,7 +1557,7 @@ Tier 3 Final Answer display component (separate tab for completed/overall final ### Relationship to Part 2 (Compiler) - Part 3 USES Part 2's compiler infrastructure for Tier 2 paper compilation - Same sequential Markov chain workflow -- Same high-context + high-parameter submitter architecture +- Same Writing + Rigor & Proofs submitter architecture - Same outline/construction/review/rigor modes - Same validator with coherence/rigor/placement checking - **DIFFERENCES**: @@ -1603,8 +1609,8 @@ Tier 3 Final Answer display component (separate tab for completed/overall final - Log error and continue brainstorm aggregation ### Paper Compilation Failure -- Paper compilation is retried indefinitely until success or user stops - no skipping allowed -- A completed brainstorm ALWAYS produces a paper; the system never abandons a brainstorm without writing its paper +- When research-paper output is enabled, paper compilation is retried indefinitely until success or user stops - no skipping allowed +- A completed brainstorm produces a paper only when research-paper output is enabled; proofs-only runs must finish proof work and return to topic exploration without entering 
paper/title phases - If the referenced brainstorm database was deleted, clear stale Tier 2 paper-writing state and restart the normal topic exploration → reference selection → brainstorm cycle - Title selection retries indefinitely with rejection feedback threaded into each attempt @@ -1613,7 +1619,7 @@ Tier 3 Final Answer display component (separate tab for completed/overall final ## Configuration Defaults ### Autonomous Research Mode -Normal GUI/API startup must pass explicit context-window and max-output-token settings for every role from the user's selected provider/model settings. Runtime code must not substitute hidden 131K/25K fallbacks. +Normal GUI/API startup must pass explicit context-window and max-output-token settings for every role from the user's selected provider/model settings, including the Assistant role when present. Runtime code must not substitute hidden 131K/25K fallbacks. Built-in/default profiles set Assistant equal to the primary Validator unless the user edits it; Session History Memory disabled greys out Assistant and prevents Assistant workflow-memory retrieval. - Completion review interval: 10 accepted submissions (cleanup removals do not advance the trigger) - Max brainstorms in parallel: 1 (sequential brainstorm → paper cycle) - Max topic-cycle reference papers: 3; Tier 3 short-form reference cap: 6 @@ -1664,9 +1670,9 @@ Normal GUI/API startup must pass explicit context-window and max-output-token se 32. **Topic validator validates continuation decisions** - not self-validation (strategic decision, not weight assessment) 33. **Tier 3 checks after brainstorm cycle completes** (move_on or hard limit), not between papers 34. **No brainstorm re-opening during continuation** - strictly write_another_paper or move_on -35. **Topic exploration runs before EVERY topic selection** — Uses the full Part 1 aggregator with batch validation to collect 5 candidate questions. 
It inherits normal Aggregator execution semantics, including single-model sequential mode when applicable. +35. **Topic exploration runs before EVERY topic selection** — Uses the full Part 1 aggregator with batch validation to collect 5 candidate questions; topic selection must not run on zero or partial exploration candidates. It inherits normal Aggregator execution semantics, including single-model sequential mode when applicable. 36. **Topic exploration uses standard aggregator (cleanup disabled)** — Same submitter scheduling, batch validation (up to 3), queue management, and single-model handling as normal brainstorms. Cleanup/pruning is disabled because the phase is capped at 5 candidates and the temp DB is deleted afterwards. -37. **Paper title exploration runs before EVERY title selection** — Uses full Part 1 aggregator to collect 5 candidate titles before every paper creation (Tier 2 papers 1/2/3, Tier 3 short-form, Tier 3 gap/intro/conclusion chapters). No exceptions. +37. **Paper title exploration runs before EVERY title selection** — Uses full Part 1 aggregator to collect 5 candidate titles before every paper creation (Tier 2 papers 1/2/3, Tier 3 short-form, Tier 3 gap/intro/conclusion chapters); title selection must not run on zero or partial title candidates. No exceptions. 38. **Title exploration uses standard aggregator (cleanup disabled)** — Same submitter scheduling, batch validation, queue management, and single-model handling as normal brainstorms. Cleanup/pruning is disabled because the phase is capped at 5 candidates and the temp DB is deleted afterwards. 39. **Final title selection sees candidate titles** — The 6th selection can choose a candidate, synthesize, or propose new. Must justify divergence from all candidates. 40. 
**Proof verification is an optional post-brainstorm and post-paper checkpoint** — Gated on `lean4_enabled`; when disabled it may emit lightweight skip/status events, but must not invoke Lean/proof model work or block workflows. Lean 4 is authoritative; SMT (when `smt_enabled`) contributes hints only. See "Proof Verification Stage" section for the full invariant list. @@ -1698,9 +1704,9 @@ Normal GUI/API startup must pass explicit context-window and max-output-token se Each role in autonomous research mode supports cloud provider selection where configured: -- **Provider Toggle** (default mode): Role selectors can choose LM Studio, OpenRouter, or configured desktop-only OAuth providers. In generic mode, all roles use OpenRouter only and non-OpenRouter provider toggles are hidden/unavailable. +- **Provider Toggle** (default mode): Role selectors can choose LM Studio, OpenRouter, or configured desktop-only cloud providers (OAuth or Sakana Fugu API key). In generic mode, all roles use OpenRouter only and non-OpenRouter provider toggles are hidden/unavailable. - **OpenRouter Model Selector**: When OpenRouter enabled, dropdown shows available OpenRouter models -- **OAuth Model Selector**: When a desktop OAuth provider is configured, dropdowns show that account's provider-backed models from the Cloud Access flow; this is distinct from regular provider API-key billing. +- **OAuth Model Selector**: When a desktop OAuth provider is configured, dropdowns show that account's provider-backed models from the OpenRouter/OAuth flow; this is distinct from regular provider API-key billing. - **Provider/Host Selector**: Specific provider selection (e.g., "Anthropic", "Google AI", "AWS Bedrock") or "Default (OpenRouter chooses)" - **OpenRouter Auto-Fill**: Selecting an OpenRouter model auto-fills context from the model-level `context_length`. 
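The OpenRouter auto-fill in this bullet can be sketched as a small function. Context comes straight from the model-level `context_length`. Max output tokens are `min(20% of context_length, endpoint max_completion_tokens)`: auto mode takes the smallest capable endpoint cap, and an explicit host takes its largest exposed cap. Several parts are assumptions: the names `Endpoint` and `autofill_output_tokens`, the `min_capable_cap` cutoff for "low-cap" endpoints, and the integer floor of the 20% budget. The filter for "weak" endpoints is not modeled. Returning `None` means "preserve the current value".

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    host: str
    max_completion_tokens: Optional[int]  # None when the endpoint exposes no cap


def autofill_output_tokens(
    context_length: int,
    endpoints: Sequence[Endpoint],
    selected_host: Optional[str] = None,
    min_capable_cap: int = 4096,  # hypothetical "low-cap" cutoff
) -> Optional[int]:
    """Return the max-output-token value, or None to preserve the current one."""
    budget = context_length // 5  # 20% of the model-level context_length
    if selected_host is not None:
        exposed = [
            e.max_completion_tokens
            for e in endpoints
            if e.host == selected_host and e.max_completion_tokens is not None
        ]
        if not exposed:
            return None  # no cap data for this host: do not guess
        endpoint_cap = max(exposed)  # explicit host: its largest exposed cap
    else:
        caps = [e.max_completion_tokens for e in endpoints]
        if not caps or any(c is None for c in caps):
            return None  # incomplete endpoint caps: preserve current values
        capable = [c for c in caps if c >= min_capable_cap]
        if not capable:
            return None
        endpoint_cap = min(capable)  # auto mode: smallest remaining capable cap
    return min(budget, endpoint_cap)
```

For example, with a 200,000-token context and endpoint caps of 8,192, 32,768, and 1,024, auto mode drops the 1,024 endpoint and fills 8,192, while explicitly selecting the 32,768 host fills 32,768 (the 20% budget is 40,000).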
Max output tokens use `min(20% of model context_length, endpoint max_completion_tokens)`: auto provider mode filters weak/low-cap endpoints and uses the smallest remaining capable endpoint cap, while an explicit host selection uses that host's largest exposed endpoint cap. Endpoint `context_length` / `max_prompt_tokens` rows are diagnostics, not context shrink limits. If endpoint output caps are incomplete, preserve current values (no guessing). - **LM Studio Fallback** (default mode only): Optional fallback model if cloud provider access fails (credit exhaustion, auth errors, or transient provider errors) diff --git a/.cursor/rules/program-directory-and-file-definitions.mdc b/.cursor/rules/program-directory-and-file-definitions.mdc index 254b038..03d2624 100644 --- a/.cursor/rules/program-directory-and-file-definitions.mdc +++ b/.cursor/rules/program-directory-and-file-definitions.mdc @@ -39,6 +39,7 @@ project-root/ │ │ ├── openrouter_client.py # OpenRouter HTTP API client (credit exhaustion detection + model/endpoint metadata) │ │ ├── openai_codex_client.py # Desktop-only OpenAI Codex/ChatGPT OAuth client, Codex backend adapter, and Codex model metadata normalizer │ │ ├── xai_grok_client.py # Desktop-only xAI Grok/SuperGrok `auth.x.ai` PKCE OAuth client and Grok model metadata normalizer +│ │ ├── sakana_fugu_client.py # Desktop-only Sakana Fugu subscription API client (Responses-first, chat-completions fallback, model metadata) │ │ ├── api_client_manager.py # Unified API router (Supercharge wrapper + OpenRouter/OAuth/LM Studio fallback + boost) │ │ ├── boost_manager.py # Singleton boost manager (tracks boost modes: next-count, always-prefer, category; aliases absorbed LeanOJ path-decision tasks into Final Solver boost category) │ │ ├── boost_logger.py # Boost API call logger (persists redacted/default-safe entries to boost_api_log.txt) @@ -56,13 +57,16 @@ project-root/ │ │ ├── critique_memory.py # Paper critique persistence (saves up to 10 validator critiques per 
paper) │ │ ├── critique_prompts.py # Default critique prompt and builder function for validator critiques │ │ ├── secret_store.py # Secure credential persistence via OS keyring (OpenRouter, desktop OAuth tokens, Wolfram Alpha); bypassed in generic mode -│ │ ├── runtime_settings.py # Non-secret runtime setting persistence under the active data root (free-model knobs plus desktop/default proof knobs) +│ │ ├── runtime_settings.py # Non-secret runtime setting persistence under the active data root (free-model knobs, desktop/default proof knobs, and connectivity toggles) +│ │ ├── provider_notification_store.py # Non-secret recent provider/OAuth failure notifications for missed-popup recovery │ │ ├── build_info.py # Build identity resolver (manifest + git HEAD/ZIP stamp + env overrides) │ │ ├── path_safety.py # Safe path resolution helpers (realpath/normpath containment checks) +│ │ ├── syntheticlib4_client.py # Contract-first SyntheticLib4 mock/offline client for release/status/retrieve/account-proof work, API-key secret scaffolding, snapshot metadata validation, and built-in fallback records when test fixtures are absent +│ │ ├── proof_search/ # Unified proof-search models, source normalizers, batched SQLite/FTS indexer, freshness/toggle-aware service, `search_lean_proofs` tool adapter, and Assistant memory-pack coordinator/ranker/cache │ │ ├── fastembed_provider.py # FastEmbed embedding wrapper (generic mode only, lazy-imported) │ │ ├── lean4_client.py # Lean 4 proof checker client (subprocess gated on `lean4_enabled`, optional LSP persistent mode gated on `lean4_lsp_enabled`; offloads temp/workspace filesystem operations from the FastAPI event loop) │ │ ├── lean_proof_integrity.py # Shared post-Lean integrity gate (rejects fake axiom/constant/opaque devices and validates theorem-statement alignment) -│ │ ├── brainstorm_proof_gate.py # Shared Lean 4 gate for optional brainstorm proof candidates before normal validation; Lean-accepted real proofs are preserved even 
when ranked non-novel +│ │ ├── brainstorm_proof_gate.py # Shared Lean 4 gate for optional brainstorm proof candidates before normal validation; rejects missing novelty/rationale metadata before Lean cost and preserves Lean-accepted real artifacts for novelty/validator review │ │ └── smt_client.py # Z3/SMT launcher-managed subprocess wrapper (gated on `smt_enabled`; never authoritative on its own) │ ├── aggregator/ # AGGREGATOR │ │ ├── __init__.py @@ -104,9 +108,8 @@ project-root/ │ │ │ └── compiler_rag_manager.py # Compiler-specific RAG wrapper (user-configurable context per role) │ │ ├── agents/ │ │ │ ├── __init__.py # Package initialization -│ │ │ ├── high_context_submitter.py # 3 modes: construction, outline, review -│ │ │ ├── high_param_submitter.py # Rigor enhancement mode -│ │ │ └── critique_submitter.py # Critique phase submitter (peer review) +│ │ │ ├── writer_submitter.py # 3 modes: construction, outline, review +│ │ │ └── high_param_submitter.py # Rigor & Proofs role (rigor/proof/critique generation) │ │ ├── validation/ │ │ │ ├── __init__.py # Package initialization │ │ │ └── compiler_validator.py # Validates coherence, rigor, placement (plus rigor_lean_placement mode for Lean-4 theorem placement) @@ -130,7 +133,7 @@ project-root/ │ │ │ ├── __init__.py # Package initialization │ │ │ ├── autonomous_coordinator.py # Orchestrates the Tier 1 → Tier 2 → Tier 3 autonomous workflow (invokes proof verification checkpoints after brainstorm/Tier 2 paper completion when `lean4_enabled`) │ │ │ ├── autonomous_rag_manager.py # Autonomous-specific RAG wrapper -│ │ │ ├── proof_verification_stage.py # Orchestrates novelty-first proof identification → Lean 4 attempts (3 full + 2 tactic) → shared integrity/downshift gate → novelty check → proof storage; optional SMT hints + Mathlib lemma search; per-source reservation lock +│ │ │ ├── proof_verification_stage.py # Orchestrates impact-first proof identification → Lean 4 attempts (3 full + 2 tactic) → shared 
integrity/downshift gate → novelty check → proof storage; optional SMT hints + Mathlib lemma search; per-source reservation lock │ │ │ ├── proof_novelty.py # Shared proof novelty assessment helper used by autonomous proof verification and compiler rigor │ │ │ ├── proof_registration.py # Shared registration helper for verified Lean proofs from autonomous, compiler, aggregator, and LeanOJ flows │ │ │ └── proof_dependency_extractor.py # Parses verified Lean 4 code to extract `ProofDependency` records (imports, Mathlib lemmas, MOTO-origin refs) @@ -141,7 +144,7 @@ project-root/ │ │ │ ├── completion_reviewer.py # Brainstorm completion review (SPECIAL SELF-VALIDATION) │ │ │ ├── reference_selector.py # Reference paper selection workflow │ │ │ ├── paper_title_selector.py # Paper title selection -│ │ │ ├── proof_identification_agent.py # Extracts novelty-first theorem candidates with expected novelty/prompt-relevance/anti-known-result rationale; skips not_novel or missing-tier candidates before Lean cost +│ │ │ ├── proof_identification_agent.py # Extracts impact-first theorem candidates with expected novelty/prompt-relevance/anti-known-result rationale; skips not_novel or missing-tier candidates before Lean cost │ │ │ ├── proof_formalization_agent.py # Generates Lean 4 proof scripts for candidates with mandatory full source context plus novelty metadata, Mathlib hints, and SMT hints when enabled │ │ │ ├── lemma_search_agent.py # Mathlib lemma search agent (Build 2) — surfaces relevant existing lemmas for formalization prompts │ │ │ └── final_answer/ # TIER 3 - Final Answer Generation Agents @@ -205,10 +208,13 @@ project-root/ │ │ ├── update.py # Update/check endpoints for launcher/updater state (`POST /api/update/pull`, `GET /api/update/pull-status`) │ │ ├── download.py # PDF generation endpoint via Playwright (desktop only; sanitize/block external requests; returns 501 in generic mode) │ │ ├── openrouter.py # OpenRouter API endpoints (global key, models/providers via 
header/body keys only, LM Studio availability, model cache, reset exhaustion) -│ │ ├── cloud_access.py # Cloud Access & Keys endpoints including desktop OAuth login/model listing +│ │ ├── cloud_access.py # OpenRouter/OAuth support endpoints including desktop OAuth login/model listing │ │ ├── websocket.py # WebSocket for real-time updates (generic proxy auth or desktop one-time tickets before accept) │ │ ├── features.py # GET /api/features — shared build identity plus stable capability flags; GET /api/update-notice — launcher/runtime-refreshed update notice │ │ ├── proofs.py # Proof database + Lean 4/SMT runtime + manual proof-check + certificate export + dependency graph routes; current proof listing/detail/certificate/graph routes accept `scope=autonomous|manual` so active manual writer proofs use their own instance-level store; proof-library routes accept `scope=autonomous|manual`, with manual scope reading archived manual proof runs only; listing proofs (`GET /`, `/novel`, `/known`, `/library*`) and certificate/lean downloads (`/{id}/certificate`, `/{id}/certificate.lean`) are always available regardless of `lean4_enabled`; dependency/graph routes and `/check` are gated on `lean4_enabled`; `/status` uses short timeouts so it never blocks the UI +│ │ ├── proof_search.py # Unified proof-search routes (`/api/proof-search/*`) over canonical MOTO proofs and SyntheticLib4 fixture/snapshot records; public search is capped at 7 combined results, overview is filtered by enabled corpus toggles, and detail hydration is bounded +│ │ ├── syntheticlib4.py # SyntheticLib4 corpus status/release/refresh/safe local import/reindex/retrieve-batch/account-proof routes plus production OAuth placeholders; public retrieve-batch is capped at 7 records +│ │ ├── connectivity.py # Non-secret grouped connectivity status/toggle routes for inference providers and optional skills │ │ └── health.py # GET /api/health — readiness/liveness probe with slim instance/build metadata │ │ │ ├── data/ # 
Persistent data storage @@ -258,6 +264,8 @@ project-root/ │ │ ├── proofs/ # Legacy (non-session) Lean 4 proof storage (mirrors per-session proofs/ layout) │ │ ├── manual_proofs/ # Active manual Aggregator/Compiler proof storage; archived/cleared on manual run reset so old proofs do not enter new manual prompts │ │ ├── manual_proof_runs/ # Archived manual proof runs for history/library viewing only, never active prompt context +│ │ ├── proof_search/ # Generated unified proof-search SQLite/FTS index under the active data root +│ │ ├── syntheticlib4/ # Planned authorized SyntheticLib4 local snapshots/status/cache under the active data root │ │ ├── leanoj_sessions/ # LeanOJ run state (state.json, master_proof.lean, master_proof_edits.jsonl, master_proof_snapshots.jsonl, phase counters, proof fragments, attempts, verified final Lean code; stop/crash resumes unless cleared) │ │ ├── leanoj_partial_proofs/ # LeanOJ partial/supporting proof scaffold JSONL store, keyed by session │ │ ├── leanoj_artifacts/ # LeanOJ full-memory artifact logs (accepted ideas with context_role metadata, verified/partial/failed proof fragments, final attempts, final-cycle packets) used for direct-first RAG allocation @@ -280,7 +288,7 @@ project-root/ │ │ │ │ │ │ │ ├── compiler/ # COMPILER │ │ │ │ ├── CompilerInterface.jsx # Replace placeholder: prompt input, start/stop, status -│ │ │ │ ├── CompilerSettings.jsx # Compiler role selections (validator, high-context, high-param, critique), capability-gated LM/OpenRouter UI +│ │ │ │ ├── CompilerSettings.jsx # Compiler role selections (Validator, Writing, Rigor & Proofs, Assistant), capability-gated LM/OpenRouter UI; deprecated critique fields mirror Rigor & Proofs │ │ │ │ ├── CompilerLogs.jsx # Metrics: construction vs rigor, miniscule edits │ │ │ │ └── LivePaper.jsx # Real-time paper viewing, save draft, word count │ │ │ │ @@ -320,7 +328,12 @@ project-root/ │ │ │ └── index.js # LeanOJ component exports │ │ │ │ │ ├── StartupProviderSetupModal.jsx # 
Post-disclaimer startup chooser for OpenRouter or LM Studio setup; desktop OAuth is shown only as an after-startup add-on because embeddings require OpenRouter/LM Studio (OpenRouter-only in generic mode) -│ │ ├── OpenRouterApiKeyModal.jsx # Cloud Access & Keys modal for OpenRouter API key and desktop OAuth logins +│ │ ├── ConnectivityPanel.jsx # Grouped top-right connectivity launcher for inference providers and optional skills +│ │ ├── OpenRouterApiKeyModal.jsx # OpenRouter/OAuth modal for OpenRouter API key and desktop OAuth logins +│ │ ├── SyntheticLib4AccessModal.jsx # SyntheticLib4 coming-soon proof-corpus explainer +│ │ ├── WolframAlphaAccessModal.jsx # Wolfram Alpha App ID and enable/disable modal +│ │ ├── LMStudioConnectivityModal.jsx # LM Studio status/setup modal +│ │ ├── AgentConversationMemoryModal.jsx # User-facing stored-proof memory toggle/status modal │ │ ├── PaperCritiqueModal.jsx # Modal for displaying validator paper critiques (ratings, feedback, history) │ │ ├── CritiqueNotificationStack.jsx # Persistent popup notifications for high-scoring critiques (≥6.25 avg) │ │ ├── CreditExhaustionNotificationStack.jsx # Persistent red notifications for OpenRouter credit exhaustion with "Retry OpenRouter" reset button @@ -349,6 +362,8 @@ project-root/ │ │ │ ├── openRouterSelection.js # Shared OpenRouter selector auto-fill helpers (context/output from model + host metadata) │ │ │ ├── autonomousProfiles.js # Shared autonomous recommended-profile definitions and persistence helpers │ │ │ ├── leanojProfiles.js # LeanOJ-specific recommended/user profile definitions, persistence helpers, and request builder (topic generation uses all submitters; legacy topic_generator/selector is sourced from Brainstorm Submitter 1; legacy path_decider request field is derived from Final Proof Solver) +│ │ │ ├── safeStorage.js # Defensive browser-storage readers for small UI preferences +│ │ │ ├── activityStyles.js # Shared live-activity formatting/styling helpers │ │ │ ├── 
runtimeConfig.js # Frontend runtime helpers (instance storage prefix, active data-root display, instance ID) │ │ │ ├── researchRunHistory.js # Groups Tier 2 papers + final answers into per-run history entries for Stage2PaperHistory/FinalAnswerLibrary │ │ │ └── disclaimerHelper.js # Frontend-only disclaimer injection for brainstorm/paper views @@ -378,6 +393,13 @@ project-root/ ├── moto_updater.py # Build 1 updater helper (manifest fetch, install classification, ZIP/git apply flow, launcher state tracking) └── .moto_launcher_state.json # Gitignored local launcher state (tracks active service-window PIDs and runtime roots to block unsafe update-apply) +## SyntheticLib4 / Unified Proof Search + +- MOTO-side SyntheticLib4 work provides a shared client for authorized corpus status, local snapshot handling, bounded retrieve-batch calls, user proof browsing/search, and usage attestations. Secrets follow the existing default-mode keyring and generic-mode in-memory/env split; snapshot files and runtime settings remain non-secret. +- Unified proof/history search is shared infrastructure over canonical MOTO `ProofRecord` stores, selected LeanOJ proof artifacts, manual proof history, and authorized SyntheticLib4 snapshots. It provides compact corpus overviews, dedupe/provenance, bounded retrieval, a shared AI-facing tool adapter, and usage-attestation support without changing proof validation semantics. +- SyntheticLib4 access and proof-search routes are additive to existing proof routes. Auth, membership, quota, schema, or index errors are reported as corpus-search status, not as Lean proof failures. Public proof-search/retrieve-batch route schemas expose their 7-result caps, while internal Assistant candidate-pool retrieval may gather a wider bounded pool before producing a final up-to-7 support pack. +- Local SyntheticLib4 cache data lives under the active data root and activates only after manifest/hash/index checks pass. A failed refresh leaves the previous active snapshot usable.
+ ## File Purpose Descriptions ### Launcher and Updater @@ -401,6 +423,7 @@ project-root/ - `lm_studio_client.py`: LM Studio HTTP client (completions, embeddings, model listing, same-base numeric `:#` instance sharing for independent calls); generic-mode inference and embeddings bypass it, though shared legacy diagnostics may still exist - `openrouter_client.py`: OpenRouter HTTP client (credit exhaustion detection, fallback, model/provider endpoint metadata) - `openai_codex_client.py`: Desktop-only OpenAI Codex/ChatGPT OAuth token lifecycle, Codex backend Responses adapter (`stream=true`, strips unsupported output-limit/temperature knobs, local event aggregation), and Codex model context/output metadata normalizer; not the regular OpenAI API-key billing path +- `sakana_fugu_client.py`: Desktop-only Sakana Fugu subscription API client; stores API key in keyring, prefers `/responses`, falls back to `/chat/completions` for tool turns, normalizes to Chat-Completions-compatible responses, and exposes model metadata for role auto-fill - `api_client_manager.py`: Unified API router (optional per-role Supercharge wrapper, OpenRouter/desktop OAuth/LM Studio fallback, boost, and model tracking); generic mode early-returns FastEmbed for embeddings - `boost_manager.py`: Singleton boost manager (next-count, always-prefer, category, and per-task boost routing; broadcasts events) - `boost_logger.py`: Boost API call logger (persists boost-routed calls for the combined API log view) @@ -408,8 +431,8 @@ project-root/ - `free_model_manager.py`: Free model rotation/cooldown singleton (looping, auto-selector `openrouter/free`, account exhaustion detection) - `model_error_utils.py`: Shared non-retryable provider/config error detection; callers pause recoverable credit exhaustion, while hard config/privacy/missing-key errors fail visibly with a user-repair path instead of becoming proof or validation failures. - `provider_pause.py`: Process-local provider-credit pause/resume event. 
LeanOJ and autonomous proof checkpoints preserve durable workflow state separately; `/api/openrouter/reset-exhaustion` wakes currently waiting in-process proof workflows. -- `brainstorm_proof_gate.py`: Shared Lean 4 gate for optional proof-candidate brainstorm submissions before normal brainstorm validation; preserves Lean-accepted real proof artifacts even when novelty is low. -- `wolfram_alpha_client.py`: Wolfram Alpha API client. Exposed to the HighContextSubmitter.submit_construction loop as the `wolfram_alpha_query` tool (up to 20 calls per construction submission); logs/broadcasts must redact raw query/result text. +- `brainstorm_proof_gate.py`: Shared Lean 4 gate for optional proof-candidate brainstorm submissions before normal validation; rejects missing novelty/rationale metadata before Lean cost and preserves Lean-accepted real artifacts for novelty/validator review. +- `wolfram_alpha_client.py`: Wolfram Alpha API client. Exposed to the WritingSubmitter.submit_construction loop as the `wolfram_alpha_query` tool (up to 20 calls per construction submission); logs/broadcasts must redact raw query/result text. - `rag_lock.py`: Global RAG operation lock (prevents collision, retry logic for reads); embedding lock skip in generic mode (FastEmbed is in-process/thread-safe) - `token_tracker.py`: Cumulative input/output token tracker singleton with per-model breakdown and research timer. Reset on session start, timer start/stop tied to coordinator lifecycle. Stats broadcast via `token_usage_updated` WebSocket event after each successful LLM call. 
- `utils.py`: Token counting, text compression, file I/O @@ -418,8 +441,10 @@ project-root/ - `critique_memory.py`: Paper critique persistence (ratings, feedback, history, session-aware) - `critique_prompts.py`: Default critique prompt and builder function - `secret_store.py`: Secure credential persistence via OS keyring; bypassed in generic mode (keys are env-injected/in-memory only) -- `runtime_settings.py`: Persists non-secret process settings such as free-model looping and desktop/default-mode proof runtime flags/timeouts under the active data root +- `runtime_settings.py`: Persists non-secret process settings such as free-model looping, desktop/default-mode proof runtime flags/timeouts, and connectivity feature toggles under the active data root +- `provider_notification_store.py`: Persists recent non-secret provider/OAuth failure notifications so missed live popups can hydrate after reconnect/reload - `build_info.py`: Build identity helper that reads the committed manifest contract, resolves git HEAD or ZIP-stamped build commits, and applies optional env overrides for runtime version/build stamping +- `proof_search/assistant_coordinator.py` / `assistant_ranker.py` / `assistant_cache.py` / `assistant_models.py`: Shared Assistant memory-support infrastructure. Assistant is a configured non-blocking LLM role that selects up to 7 verified proof supports from enabled proof-history corpora, reuses useful packs for two eligible receiver reads before refresh, persists cooldown/shutdown state for repeated unhelpful retrieval, clears stale live packs on reset/clear paths, and disables when Session History Memory is off. - `fastembed_provider.py`: FastEmbed embedding wrapper (generic mode only); lazy-imported so default installs are unaffected - `lean4_client.py`: Lean 4 proof checker client. Subprocess mode by default; optional persistent LSP mode when `lean4_lsp_enabled`. 
When `lean4_enabled=False`, proof checks return explicit disabled/error results rather than invoking Lean. Never bundled into the hosted image. - `smt_client.py`: Optional Z3/SMT launcher-managed subprocess wrapper. When `smt_enabled=False`, SMT checks return explicit disabled/error results rather than invoking Z3. SMT results are hint-only; Lean 4 remains authoritative. Never bundled into the hosted image. @@ -430,8 +455,8 @@ project-root/ - `compiler_rag_manager.py`: Per-role context windows, direct→RAG priority - `outline_memory.py`, `paper_memory.py`, `compiler_rejection_log.py`: File I/O and logging - `critique_memory.py`: Accepted critiques database for peer review aggregation phase -- `critique_rejection_memory.py`: Last 5 critique rejection feedback logs (helps critique submitter learn) -- `high_context_submitter.py`, `high_param_submitter.py`, `critique_submitter.py`: Submitter agents +- `critique_rejection_memory.py`: Last 5 critique rejection feedback logs (helps Rigor & Proofs critique generation learn during the compatibility period) +- `writer_submitter.py`, `high_param_submitter.py`: Submitter agents; Rigor & Proofs owns rigor, proof, and critique generation - `compiler_validator.py`: Validates coherence, rigor, placement - Prompts: `outline_prompts.py`, `construction_prompts.py`, `review_prompts.py`, `rigor_prompts.py`, `critique_prompts.py` @@ -451,7 +476,7 @@ project-root/ ### LeanOJ Components -- `leanoj_coordinator.py`: Runs the proof-only LeanOJ state machine. 
It uses parallel submitters plus batch validators for broad initial foundation topics and brainstorms; classifies accepted brainstorm context as `active_plan`, `verified_hint`, `refuted_construction`, or `scratch`; keeps ordinary partial `sorry` scaffolds and failed final attempts out of master-proof seeding unless explicitly elevated; persists accepted-idea `context_role` and chronological occurrence metadata; stores full proof memory independently from trimmed UI/status lists; rejects fake proof devices; persists final-cycle failure packets; emits LeanOJ progress events; routes prompt memory through allocated context blocks; passes the most recent 5 final attempts as compact final-solver execution feedback; and requires Final Proof Solver semantic review before a Lean-passing final proof stops as verified. +- `leanoj_coordinator.py`: Runs the proof-only LeanOJ state machine. It uses parallel submitters plus batch validators for broad initial foundation topics and brainstorms; classifies accepted brainstorm context as `active_plan`, `verified_hint`, `refuted_construction`, or `scratch`; keeps ordinary partial `sorry` scaffolds and failed final attempts out of master-proof seeding unless explicitly elevated; persists accepted-idea `context_role` and chronological occurrence metadata; stores full proof memory independently from trimmed UI/status lists; rejects fake proof devices; persists final-cycle failure packets; emits LeanOJ progress events; routes prompt memory through allocated context blocks; passes the most recent 5 final attempts plus optional metadata-only `search_lean_proofs` results as compact final-solver execution/context guidance; and requires Final Proof Solver semantic review before a Lean-passing final proof stops as verified. 
- `leanoj_context.py`: Owns LeanOJ artifact JSONL persistence under the active data root, direct-first allocation, final-solver context routing (verified proof fragments + `active_plan` notes direct, refuted constructions only as compact warnings, ordinary partial scaffolds excluded from final direct proof context), source-name generation, RAG indexing, session-scoped retrieval with `include_source_prefixes`, direct-source exclusion, resume reload support, and Clear Progress cleanup for LeanOJ RAG sources. - `prompts.py`: LeanOJ prompt builders for topic, brainstorm, prune review, path/final-solver editing, and final semantic review roles. These consume prepared context blocks (`direct_proof_context`, `rag_evidence_context`, `refuted_construction_warnings`, `capped_rejection_feedback`, `current_final_cycle_packet`) instead of owning persistence or truncation policy; prune prompts may conservatively remove, update, or add one compact corrective memory item without forcing deletion; final-solver prompts must keep `master_proof.lean` to the current chosen proof route, include only compact recent-attempt execution feedback, and avoid accumulating explored/refuted routes. @@ -460,16 +485,18 @@ project-root/ - `compiler.py`: Compiler control (start/stop/status), paper/outline access, critique management - `autonomous.py`: Autonomous research control (start/stop/clear/status), brainstorm/paper access, pruned/history paper routes, Tier 3/final-answer library routes, critique/API-log helpers, and current-paper recovery actions - `proofs.py`: Proof database listing (`GET /`, `/novel`, `/known`) and `/status` runtime readiness — always available, never gated. Current proof listing/detail/certificate/dependency/graph routes accept optional `scope=autonomous|manual`; manual scope reads the active manual writer proof store under the active data root rather than the autonomous session store. 
`/{id}/certificate` and `/{id}/certificate.lean` — always available (data is stored on disk; Lean version info populated only when Lean is enabled). `/status` uses `asyncio.wait_for` timeouts (5s Lean, 3s Z3) so the endpoint never hangs. `POST /settings` runtime flag updates. `POST /check` manual proof check, `/{id}/dependencies`, `/graph`, `/mathlib/{lemma}/dependents` graph/lineage queries — gated on `lean4_enabled`. `GET /library` + `GET /library/{session_id}/{proof_id}` cross-session proof library endpoints accept `scope=autonomous|manual`; manual library scope reads archived manual proof runs only and never active prompt context. +- `syntheticlib4.py`: SyntheticLib4 corpus access/status routes for local snapshot status, release listing, safe local snapshot import, refresh/reindex, retrieve-batch, account-proof browsing/search, and production OAuth placeholders. API keys use the active mode's secret path while live validation remains pending; local mock/offline search uses data-root snapshots first, then test fixtures when present, then built-in fallback records when packaged fixtures are unavailable. +- `connectivity.py`: Non-secret grouped status and toggle routes (`/api/connectivity/status`, `/api/connectivity/toggles`) for OpenRouter/OAuth, LM Studio, SyntheticLib4, local proof-history Session History Memory, Wolfram Alpha, and Boost state. Toggle changes never clear credentials, snapshots, indexes, or history. - `leanoj.py`: LeanOJ proof-solver routes for start (including matching saved-progress resume), stop, status, clear, skip-brainstorm, force-brainstorm, current proof listing via `/api/leanoj/proofs`, cross-session library via `/api/leanoj/library*`, plus read-only `GET /api/leanoj/master-proof` and `/api/leanoj/master-proof/edits` for the durable master proof draft and compact edit-history summaries. ### Frontend Components -- `App.jsx`: Top-level GUI shell. Default mode is `Autonomous S.T.E.M. ASI` for Part 3 screens; `Advanced Manual S.T.E.M. 
ASI` contains the manual Part 1 Aggregator + Part 2 Compiler workspace; `LeanOJ Proof Solver` is a developer-mode-only proof mode. Shared utility controls (Boost, Cloud Access & Keys, WorkflowPanel) remain global, and Build 3C bootstraps `/api/features` here so hosted mode can hide LM Studio/desktop-OAuth UI and copy. Shift + Z + X toggles persisted developer-mode settings, LeanOJ mode, raw JSON editors, and Supercharge controls. Supercharge request payloads must be forced off unless developer mode is active. Active app mode and tab state are in-memory only; a fresh frontend mount starts on the autonomous main interface. **Autonomous tab groups**: main tabs (interface, brainstorms, papers, proofs, optional final-answer) + settings group (Your Completed Works Library, API Call Logs, Settings). The "Your Completed Works Library" tab hosts three sub-tabs rendered inside its content area: Stage 2 Papers History, Stage 3 Final Answers History, and Proof Library. -- Live activity feeds keep long bounded histories (thousands of entries) so active workflow context is not lost quickly while still preventing unbounded UI growth. +- `App.jsx`: Top-level GUI shell. Default mode is `Autonomous S.T.E.M. ASI` for Part 3 screens; `Advanced Manual S.T.E.M. ASI` contains the manual Part 1 Aggregator + Part 2 Compiler workspace; `LeanOJ Proof Solver` is a developer-mode-only proof mode. Shared utility controls (ConnectivityPanel, WorkflowPanel, and BoostControlModal from the panel) remain global, and Build 3C bootstraps `/api/features` here so hosted mode can hide LM Studio/desktop-OAuth UI and copy. Shift + Z + X toggles persisted developer-mode settings, LeanOJ mode, raw JSON editors, and Supercharge controls. Supercharge request payloads must be forced off unless developer mode is active. Active app mode and tab state are in-memory only; a fresh frontend mount starts on the autonomous main interface. 
**Autonomous tab groups**: main tabs (interface, brainstorms, papers, proofs, optional final-answer) + settings group (Your Completed Works Library, API Call Logs, Settings). The "Your Completed Works Library" tab hosts three sub-tabs rendered inside its content area: Stage 2 Papers History, Stage 3 Final Answers History, and Proof Library. +- Live activity feeds keep long bounded histories (thousands of entries) and persist in browser storage across tab reloads, backend restarts, stop/start cycles, and crashes; explicit mode clear/reset actions are the user-facing reset point. - **Aggregator**: `AggregatorInterface.jsx`, `AggregatorSettings.jsx`, `AggregatorLogs.jsx`, `LiveResults.jsx` - **Compiler**: `CompilerInterface.jsx`, `CompilerSettings.jsx`, `CompilerLogs.jsx`, `LivePaper.jsx` - **Autonomous**: `AutonomousResearchInterface.jsx`, `BrainstormList.jsx`, `PaperLibrary.jsx`, `AutonomousResearchSettings.jsx`, `AutonomousResearchLogs.jsx`, `LivePaperProgress.jsx`, `LiveTier3Progress.jsx`, `FinalAnswerView.jsx`, `FinalAnswerLibrary.jsx` (Stage 3 history sub-tab), `ArchiveViewerModal.jsx`, `MathematicalProofs.jsx` (scoped live proof/status/manual-check/certificate tab used by Autonomous and Manual modes), `ProofGraph.jsx` (dependency graph), `ProofNotificationStack.jsx` (novel-proof popups), `ProofLibrary.jsx` (cross-session proof library sub-tab), `Stage2PaperHistory.jsx` (Stage 2 history sub-tab) - **LeanOJ**: `LeanOJInterface.jsx`, `LeanOJBrainstorms.jsx`, `LeanOJLogs.jsx`, `LeanOJMasterProof.jsx`, `LeanOJMathematicalProofs.jsx`, `LeanOJProofLibrary.jsx`, `LeanOJSettings.jsx` -- **Shared**: `StartupProviderSetupModal.jsx`, `OpenRouterApiKeyModal.jsx`, `PaperCritiqueModal.jsx`, `CritiqueNotificationStack.jsx`, `CreditExhaustionNotificationStack.jsx`, `BoostControlModal.jsx`, `WorkflowPanel.jsx`, `TextFileUploader.jsx`, `OpenRouterPrivacyWarningModal.jsx`, `UpdateNotificationBanner.jsx`, `LatexRenderer.jsx` (dual view, KaTeX, theorem parsing), `LatexRenderer.css` 
+- **Shared**: `StartupProviderSetupModal.jsx`, `ConnectivityPanel.jsx`, `OpenRouterApiKeyModal.jsx`, `SyntheticLib4AccessModal.jsx`, `WolframAlphaAccessModal.jsx`, `LMStudioConnectivityModal.jsx`, `AgentConversationMemoryModal.jsx`, `PaperCritiqueModal.jsx`, `CritiqueNotificationStack.jsx`, `CreditExhaustionNotificationStack.jsx`, `BoostControlModal.jsx`, `WorkflowPanel.jsx`, `TextFileUploader.jsx`, `OpenRouterPrivacyWarningModal.jsx`, `UpdateNotificationBanner.jsx`, `LatexRenderer.jsx` (dual view, KaTeX, theorem parsing), `LatexRenderer.css` - **Hooks**: `useProofCheckRuntime.js` (reads `/api/proofs/status` + runtime config so UI can enable/disable manual proof-check controls) -- **Utils**: `downloadHelpers.js` (PDF/raw download), `modelCache.js` (display_name → api_id lookup), `openRouterSelection.js` (shared OpenRouter selector auto-fill helpers using model context and provider endpoint caps), `autonomousProfiles.js` (shared recommended-profile definitions + persistence helpers; when editing a preset, anchor to the exact profile block and exact nested role such as `validator` or `highContext`, never to a shared literal alone, then verify the diff only touched that intended profile/role), `disclaimerHelper.js` (frontend-only disclaimer injection), `api.js`, `websocket.js` +- **Utils**: `downloadHelpers.js` (PDF/raw download), `modelCache.js` (display_name → api_id lookup), `openRouterSelection.js` (shared OpenRouter selector auto-fill helpers using model context and provider endpoint caps), `autonomousProfiles.js` (shared recommended-profile definitions + persistence helpers; when editing a preset, anchor to the exact profile block and exact nested role such as `validator` or `writer`, never to a shared literal alone, then verify the diff only touched that intended profile/role), `leanojProfiles.js` (LeanOJ profile helpers), `safeStorage.js` (defensive localStorage reads), `activityStyles.js` (live-activity messages/classes), `runtimeConfig.js`, 
`researchRunHistory.js`, `disclaimerHelper.js`, `api.js`, `websocket.js` diff --git a/.cursor/rules/rag-design-for-overall-program.mdc b/.cursor/rules/rag-design-for-overall-program.mdc index 7eaee13..8158900 100644 --- a/.cursor/rules/rag-design-for-overall-program.mdc +++ b/.cursor/rules/rag-design-for-overall-program.mdc @@ -32,6 +32,12 @@ These priorities apply to Aggregator, Compiler, and Autonomous paper-writing wor Autonomous reference papers selected for paper compilation are currently loaded into the compiler RAG context rather than always being direct-injected. Standalone browsing/selection helpers may direct-inject expanded papers when they fit and use RAG when they do not. +### Unified Proof-Search Result Blocks + +SyntheticLib4 / MOTO proof-search tool results are capped at 7 combined proofs per call. The retrieved proof block is optional direct-first context, not mandatory context. If prompt budget is tight, every other optional source that is not guaranteed direct-inject must be offloaded before the 7-proof block is offloaded. The 7-proof block is therefore the last rag-able/offloadable item in proof-producing prompt assembly. + +Mandatory proof context remains non-negotiable: user prompt, current source content required by the proof workflow, current proof candidate/formalization attempt, Lean errors, LeanOJ `master_proof.lean`, role/schema instructions, and existing highest-priority verified-proof summaries/failure hints that are explicitly direct-injected by MOTO rules must not be displaced by retrieved external proof examples. If the full returned Lean code for all 7 results cannot fit after higher-priority optional sources are offloaded, the proof-search adapter should return fewer hydrated proofs, metadata-only entries, or retrieval guidance rather than crowding out mandatory context. + ### LeanOJ Proof-Only Offload Order These priorities apply only to the LeanOJ proof solver. 
LeanOJ stores proof artifacts under session-scoped sources such as `leanoj_{session_id}_accepted_ideas` and retrieves with `include_source_prefixes=[f"leanoj_{session_id}_"]`. Do not apply these orders to paper-writing prompts. @@ -149,6 +155,8 @@ User-uploaded files: pre-generate ALL 4 configurations. Dynamic files (training 3. For each content item in priority order: direct inject if `tokens <= remaining_tokens - 5000`, else RAG 4. `rag_max_tokens = max(0, available_tokens - mandatory_tokens - direct_content_tokens - 200)` +When the content item is the SyntheticLib4 / MOTO unified proof-search result block, treat it as the final optional rag-able block: offload all other rag-able optional sources first, then downgrade proof-search results only if the role still cannot fit the prompt with the configured output reserve. + **Key Invariant**: Context allocator returns content parts only. Prompt builder adds template parts (system prompt, JSON, user prompt). Both must be counted to avoid overflow. **Overflow handling**: User prompt always direct injected; if exceeds `context_window - minimum_RAG_allocation`: HALT with error. Mandatory direct-inject content that does not fit: HALT with explicit context-overflow error. Non-mandatory content too large: offload to RAG. If RAG returns no usable evidence, metadata/browsing helpers may fall back to bounded summaries or abstracts; proof/formalization and other mandatory-source paths must fail visibly instead. @@ -201,7 +209,7 @@ User-uploaded files: pre-generate ALL 4 configurations. Dynamic files (training **Autonomous (Part 3)**: Per-topic brainstorm databases; paper-compilation references and prior brainstorm papers are loaded as high-priority RAG evidence, while the current brainstorm DB is direct source context for construction/retroactive correction and paper-writing rigor/proof mode when available. 
Metadata and browsing agents use bounded summaries/abstracts when appropriate; all agents validate prompt size before LLM calls. -**Proof Verification Stage (optional, gated on `lean4_enabled`)**: Proof identification, formalization, and lemma search agents operate outside the RAG pipeline and use mandatory direct source context rather than excerpt-only RAG. Candidate discovery is novelty-first and skips not-novel/missing-tier candidates before Lean cost. Verified `ProofRecord` summaries and `FailedProofCandidate` hints (from `proof_prompts.format_failure_hints_for_injection`) are **highest-priority direct injections** into subsequent brainstorm/paper submitter prompts when present — never RAG'd. Compiler rigor/paper-writing proof mode direct-injects available source brainstorm/aggregator context alongside the current paper; supplemental references/prior papers remain RAG evidence. Lean source files under the session `proofs/` directory are not indexed into Chroma. +**Proof Verification Stage (optional, gated on `lean4_enabled`)**: Proof identification, formalization, and lemma search agents operate outside the RAG pipeline and use mandatory direct source context rather than excerpt-only RAG. Candidate discovery is impact-first and skips not-novel/missing-tier candidates before Lean cost. Verified `ProofRecord` summaries and `FailedProofCandidate` hints (from `proof_prompts.format_failure_hints_for_injection`) are **highest-priority direct injections** into subsequent brainstorm/paper submitter prompts when present — never RAG'd. Compiler rigor/paper-writing proof mode direct-injects available source brainstorm/aggregator context alongside the current paper; supplemental references/prior papers remain RAG evidence. SyntheticLib4 / MOTO proof-search result blocks are optional proof examples capped at 7 combined results; they are rag-able, but only after every other rag-able optional source has been offloaded. 
Lean source files under the session `proofs/` directory are not indexed into Chroma. **LeanOJ Proof Solver**: LeanOJ useful proof memory uses the existing RAG pipeline through `backend/leanoj/core/leanoj_context.py`, not a separate/simple retriever. Mandatory prompt inputs (user problem, Lean template, role task, JSON schema) stay direct. Useful artifacts are persisted in full and indexed under session-scoped `leanoj_{session_id}_*` sources when they are eligible for that phase. Final proof editing receives verified subproofs plus accepted `active_plan` notes as proof evidence; ordinary partial scaffolds and failed attempts are excluded except as immediate compact execution feedback. Current final-cycle failure packets are direct context for the next brainstorm/proof-fragment phase; older final-cycle packets remain available through scoped RAG only. Recent rejection/error summaries remain capped direct feedback. During final proof-editing, allocation is narrower: no historical final-cycle packets, no failed-attempt counts, and no phase-transition/path vocabulary; the prompt may still include the most recent 5 final attempts as capped execution feedback so the solver does not repeat stale edits or ignored Lean errors. Validator feedback from rejected non-progressive master-proof shortening edits may be direct feedback because it tells the next final solver what proof progress to restore. The canonical LeanOJ master proof draft (`master_proof.lean`) is file-backed working state, not a RAG artifact: during the final proof-editing loop it is mandatory direct-inject context and must be shown fully or the program must halt with a mandatory direct-context overflow error. Edits always apply to the full persisted proof. 
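Below is a minimal Python sketch of the direct-first allocation rule described in the RAG design changes above. It is an illustration under stated assumptions, not MOTO's actual allocator. The names `ContentItem`, `Allocation`, `allocate`, and the `is_proof_search` flag are hypothetical.

The sketch does three things:
- It applies the `tokens <= remaining_tokens - 5000` direct-inject test and the `rag_max_tokens = max(0, available_tokens - mandatory_tokens - direct_content_tokens - 200)` formula.
- It offers the SyntheticLib4 / MOTO proof-search block the budget before any other optional source. That makes the block the last optional item to be offloaded.
- It leaves out the fewer-hydrated-proofs / metadata-only downgrade path and the user-prompt overflow check.

```python
from dataclasses import dataclass, field

# Margins taken from the allocation rule in the RAG design doc.
DIRECT_MARGIN_TOKENS = 5000
RAG_OVERHEAD_TOKENS = 200


@dataclass
class ContentItem:
    name: str
    tokens: int
    is_proof_search: bool = False  # hypothetical flag for the 7-proof block


@dataclass
class Allocation:
    direct: list = field(default_factory=list)
    rag: list = field(default_factory=list)
    rag_max_tokens: int = 0


def allocate(available_tokens, mandatory_tokens, items):
    """Direct-first allocation; `items` is in priority order (highest first)."""
    if mandatory_tokens > available_tokens:
        # Mandatory direct-inject content never falls back to RAG: halt visibly.
        raise ValueError("mandatory direct-context overflow")

    # The proof-search block is the last optional item to be offloaded, so it
    # is offered the budget first; the stable sort keeps the others in order.
    ordered = sorted(items, key=lambda item: not item.is_proof_search)

    result = Allocation()
    remaining = available_tokens - mandatory_tokens
    direct_tokens = 0
    for item in ordered:
        if item.tokens <= remaining - DIRECT_MARGIN_TOKENS:
            result.direct.append(item.name)
            remaining -= item.tokens
            direct_tokens += item.tokens
        else:
            result.rag.append(item.name)

    result.rag_max_tokens = max(
        0, available_tokens - mandatory_tokens - direct_tokens - RAG_OVERHEAD_TOKENS
    )
    return result
```

Consider a run with 20,000 available tokens and 4,000 mandatory tokens. A 9,000-token paper listed first loses its direct slot to an 8,000-token proof-search block, while a 3,000-token reference still fits directly. The result is `direct == ["proof_search", "refs"]`, `rag == ["paper"]`, and `rag_max_tokens == 4800`.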
diff --git a/.dockerignore b/.dockerignore index 8e3b315..b681bb4 100644 --- a/.dockerignore +++ b/.dockerignore @@ -23,12 +23,28 @@ backend/data/ backend/logs/ .moto_instances/ .moto_launcher_state.json +.moto_last_instance.json .moto_update_notice.json +backend/data/.moto_runtime.lock tests/ web conversion plans/ commits_pending.txt proof-integration-build*-plan.md +PENDING_TESTS +BUILD_PLAN_*.md +(old) SyntheticLib Upgrade Plans/ +HARDOJ_AWS_COMPUTE_DONATION_OUTLINE.md +LEANOJ_MASTER_PROOF_WRITER_REMAINDER.md +LEANOJ_PROBLEM_11_PROMPT.md +randomlog*.txt +RANDOM LOG.txt +current_prompt.md +recovered_manual_aggregator_prompt.txt +leanoj_master_proof_*.lean.txt +PicksTheorem_FinalSkeleton.lean +CLASSICALPICKS.LEAN +moto-zap-quickscan-report.html Click To Launch MOTO.bat linux-ubuntu-launcher.sh diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 20289bd..c4636e7 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -7,6 +7,9 @@ on: - main - "dev/**" +permissions: + contents: read + jobs: python-tests: name: Python tests diff --git a/.gitignore b/.gitignore index f4c94eb..01c6f69 100644 --- a/.gitignore +++ b/.gitignore @@ -74,6 +74,7 @@ backend/data/auto_final_answer/* backend/data/auto_sessions/ backend/data/leanoj_sessions/ backend/data/leanoj_artifacts/ +backend/data/leanoj_partial_proofs/ # Proof verification artifacts (Lean 4 / Z3 hybrid mode) backend/data/proofs/* @@ -94,7 +95,9 @@ backend/data/auto_research_topic_rejections.txt backend/data/auto_api_log.txt backend/data/aggregator_results.txt backend/data/manual_aggregator_prompt.txt +backend/data/manual_compiler_prompt.txt backend/data/runtime_settings.json +backend/data/provider_notifications.json backend/data/chroma_db/* !backend/data/chroma_db/.gitkeep @@ -102,6 +105,8 @@ backend/data/chroma_db/* backend/data/boost_api_log.txt backend/data/boost_state.json backend/data/model_cache.json +backend/data/proof_search/ +backend/data/syntheticlib4/ 
backend/data/paper_version_*.txt backend/data/critique_feedback_*.txt backend/data/critique_rejection_feedback.txt @@ -136,11 +141,17 @@ randomlog.txt randomlog*.txt leanoj_master_proof_*.lean.txt commits_pending.txt +current_prompt.md recovered_manual_aggregator_prompt.txt PicksTheorem_FinalSkeleton.lean CLASSICALPICKS.LEAN +backend/data/.moto_runtime.lock +moto-zap-quickscan-report.html # Private/local planning notes that should not be published +PENDING_TESTS +BUILD_PLAN_*.md +(old) SyntheticLib Upgrade Plans/ HARDOJ_AWS_COMPUTE_DONATION_OUTLINE.md LEANOJ_MASTER_PROOF_WRITER_REMAINDER.md LEANOJ_PROBLEM_11_PROMPT.md diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 90be93c..d8aef46 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -277,7 +277,7 @@ Test with various model combinations: - Medium models (30B-40B) - Large models (70B+) - OpenRouter models (GPT-4, Claude, etc.) -- OpenAI Codex models through `Cloud Access & Keys` when touching cloud credential/provider routing +- OpenAI Codex models through `OpenRouter/OAuth` when touching cloud credential/provider routing ### Load Testing diff --git a/README.md b/README.md index 2f561d0..986b1bf 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # MOTO Autonomous ASI ## Autonomous Prototype Superintelligence - Automated Theorem Generation with Lean 4 Math Proof Verification -**Version: 1.1.0** +**Version: 1.1.01** [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/) @@ -8,7 +8,7 @@ **A breakthrough in AI automated theorem generation. 
MOTO is an autonomous research system powered by Intrafere Research Group's new prototype-superintelligence discovery of [Top-P Exploration Through Structured Brainstorming & Validated Feedback](https://intrafere.com/structured-brainstorming-validated-feedback/): a combination of reiterative brainstorming, validation, feedback, and pruning that creates prototype-level superintelligence using creative/combinatory multi-model data from nearly any combination of AI models. When enabled, MOTO pairs this exploration with Lean 4 machine-checked proofs for the exact formal theorem statements it successfully proves.** -**MOTO generates novel and publication-worthy research papers, and it can formalize candidate theorems and lemmas in Lean 4 while only storing proofs that Lean 4 accepts as mathematically verified. Lean 4 automation gives the user machine-checked verification for the exact formal statements produced, while informal papers and interpretations should still be reviewed with scrutiny. Unlike programs that may look similar, MOTO finds novel solutions with 0 creative user input required after launch: press start and let it run for hours or days while it autonomously searches for new solution paths. MOTO's Lean 4 architecture was developed before public reports of Google's AlphaProof Nexus and appears to share the broad pattern of iterative proof attempts aided by proof-sketch brainstorming, though AlphaProof Nexus itself is unreleased and not directly comparable from public information. This exact version of MOTO is customized to be useful for any discipline with an interest in creative and novel solution generation in S.T.E.M.: physicists, engineers, mathematicians, chemists, researchers, etc. This harness can also easily be modified for topics such as general academic research, chatbots, niche research, robotics, or anything requiring creative output and/or general autonomy. 
MOTO's novel brainstorming and rejection/validation stage allows autonomous long-term runtime without user intervention — if desired, research can be conducted for days or weeks without user input.** +MOTO generates novel and publication-worthy research papers, and it can formalize candidate theorems and lemmas in Lean 4 while only storing proofs that Lean 4 accepts as mathematically verified. Lean 4 automation gives the user machine-checked verification for the exact formal statements produced, while informal papers and interpretations should still be reviewed with scrutiny. Unlike programs that may look similar, MOTO finds novel solutions with 0 creative user input required after launch: press start and let it run for hours or days while it autonomously searches for new solution paths. This exact version of MOTO is customized to be useful for any discipline with an interest in creative and novel solution generation in S.T.E.M.: physicists, engineers, mathematicians, chemists, researchers, etc. This harness can also easily be modified for topics such as general academic research, chatbots, niche research, robotics, or anything requiring creative output and/or general autonomy. MOTO's novel brainstorming and rejection/validation stage allows autonomous long-term runtime without user intervention — if desired, research can be conducted for days or weeks without user input. ### The Core Discovery: Top-P Exploration @@ -28,7 +28,9 @@ Paired with Top-P Exploration — and secondary to it — MOTO has an **optional **Lean 4 is authoritative.** SMT results are hints only — they never substitute for Lean verification, and any proof that would compile only because of a `sorry` or `admit` is rejected. The pipeline is entirely silent and skipped when `lean4_enabled=False`, so it never blocks brainstorm or paper completion; the default hosted image stays Lean-free and Z3-free. 
A manual-check endpoint (`POST /api/proofs/check`) also lets you re-run the pipeline on any stored brainstorm or paper after the fact, and the compiler's "rigor mode" reuses the same Lean 4 checker to upgrade lemmas inside a paper as it's being written. -Give the program a try — MOTO is as cool as it sounds. Windows has a one-click launcher and Ubuntu 24.04 now has a repo-root launcher too. Use the two links below to download Python and Node.js, they should automatically install in seconds. Once those are downloaded, click the green "< > Code" drop-down menu on the top right of this GitHub page and download the zip file. On Windows, extract it to your desktop and double-click `Click To Launch MOTO.bat`. On Ubuntu 24.04, extract it and run `bash linux-ubuntu-launcher.sh`. Configure cloud access through **Cloud Access & Keys** with an OpenRouter API key, or connect LM Studio for local/faster performance. Desktop **oAuth** logins such as OpenAI Codex or xAI Grok/SuperGrok are supplementary model providers after startup, not a standalone startup path, because RAG embeddings still require OpenRouter, LM Studio, or hosted FastEmbed. xAI Console API keys are separate from Grok subscription OAuth and may consume API credits. Then select your agents in the settings profile - if desired and you are unsure you may use the preselected "fastest" profile. +MOTO also includes a non-blocking Assistant memory layer. As submitters, writers, proof solvers, and LeanOJ solvers work, MOTO observes the current prompt/phase/target, searches local verified proof history and SyntheticLib4 when enabled, and prepares an up-to-7 memory-support pack. The main workflow never waits for this search; when a relevant pack is ready, it is injected as optional supporting context into later eligible producer calls. Validators and critique phases never receive Assistant memory. + +Give the program a try — MOTO is as cool as it sounds. 
Windows has a one-click launcher and Ubuntu 24.04 now has a repo-root launcher too. Use the two links below to download Python and Node.js, they should automatically install in seconds. Once those are downloaded, click the green "< > Code" drop-down menu on the top right of this GitHub page and download the zip file. On Windows, extract it to your desktop and double-click `Click To Launch MOTO.bat`. On Ubuntu 24.04, extract it and run `bash linux-ubuntu-launcher.sh`. Configure provider access through **OpenRouter/OAuth** with an OpenRouter API key, or connect LM Studio for local/faster performance. Desktop **oAuth** logins such as OpenAI Codex or xAI Grok/SuperGrok are supplementary model providers after startup, not a standalone startup path, because RAG embeddings still require OpenRouter, LM Studio, or hosted FastEmbed. xAI Console API keys are separate from Grok subscription OAuth and may consume API credits. Then select your agents in the settings profile - if desired and you are unsure you may use the preselected "fastest" profile. ***Now you are set up and every time you press launch your home lab is ready for your prompt!*** **Give MOTO the toughest question you can think of and press start to begin YOUR creations!** @@ -43,7 +45,7 @@ MOTO (Multi-Output Token Orchestrator) is a high-risk high-reward (novelty seeki ### Key Features - 🤖 **Autonomous Topic Selection, Brainstorming, and Paper Generation**: AI chooses research avenues based on high-level goals and produces you a final answer with ZERO extra user input. Let MOTO run for days using the best models without touching it, or for a few hours using a faster draft model. How deep you research and how long it takes is left up to you, the user. -- **Cloud Access & Keys**: Supports local LM Studio models, OpenRouter API-key models, and desktop-only OAuth subscription logins as separate provider paths. 
Run local LM Studio models offline, use OpenRouter to access many third-party providers, or add OpenAI Codex/ChatGPT or xAI Grok/SuperGrok OAuth for subscription-backed chat/model roles. OAuth is supplementary because RAG embeddings are routed separately through LM Studio, OpenRouter, or hosted FastEmbed. +- **OpenRouter/OAuth**: Supports local LM Studio models, OpenRouter API-key models, and desktop-only OAuth subscription logins as separate provider paths. Run local LM Studio models offline, use OpenRouter to access many third-party providers, or add OpenAI Codex/ChatGPT or xAI Grok/SuperGrok OAuth for subscription-backed chat/model roles. OAuth is supplementary because RAG embeddings are routed separately through LM Studio, OpenRouter, or hosted FastEmbed. - **Optional Automated Theorem Generation (Lean 4)**: When enabled, every brainstorm and paper is run through a parallel proof pipeline that identifies theorem/lemma candidates, searches Mathlib for relevant lemmas, optionally runs Z3/SMT for conservative early-exit hints, then attempts Lean 4 formalization (up to 5 retries per candidate with failure-hint direct injection). Only Lean 4-verified proofs are stored, and novel proofs are fed back into subsequent brainstorming as highest-priority context. Secondary to Top-P Exploration and silent when disabled. --- @@ -65,9 +67,9 @@ Before installation, you need: - If using OpenRouter, then download and load at least one model (e.g., DeepSeek, Llama, Qwen - older models and some models below 12 billion parameters may struggle; however, it is always worth a try!) - **Load the LM Studio RAG agent [optional but HIGHLY recommended for much faster outputs/answers]**: Load the embedding model `nomic-ai/nomic-embed-text-v1.5` in your LM Studio "Developer" tab (server tab) (search for "nomic-ai/nomic-embed-text-v1.5" to download it in the LM Studio downloads center). 
Please note: you may need to enable "Power User" or "Developer" to see this developer tab - this server will let you load the amount and capacity of simultaneous models that your PC will support. In this developer tab is where you load both your nomic-ai embedding agent and any optional local hosted agents you want to use in the program (e.g., GPT OSS 20b, DeepSeek 32B, etc.). **If you do not download LM Studio and enable the Nomic agent the system will run much slower and cost slightly more due to having to use the paid service OpenRouter for RAG calls.** - Start the local server (port 1234) -4. **If using cloud AI - configure Cloud Access & Keys**: +4. **If using cloud AI - configure OpenRouter/OAuth**: - **OpenRouter API key**: Sign up at OpenRouter.ai and get a paid or free API key to use cloud models from many providers. You can see which models are free by checking the "show only free models" checkbox(es) in MOTO settings. - - **oAuth login (desktop only)**: In the `Cloud Access & Keys` overlay, choose the `OAuth` section to sign in through a supported subscription OAuth provider such as OpenAI Codex/ChatGPT or xAI Grok/SuperGrok. This is separate from regular API-key billing, unavailable in hosted/generic mode, and supplementary to an OpenRouter key or LM Studio because OAuth providers do not supply MOTO's RAG embeddings. + - **oAuth login (desktop only)**: In the `OpenRouter/OAuth` overlay, choose the `OAuth` section to sign in through a supported subscription OAuth provider such as OpenAI Codex/ChatGPT or xAI Grok/SuperGrok. This is separate from regular API-key billing, unavailable in hosted/generic mode, and supplementary to an OpenRouter key or LM Studio because OAuth providers do not supply MOTO's RAG embeddings. 5. 
**On first startup, pick your provider path**: After you acknowledge the disclaimer, MOTO will prompt you to configure OpenRouter or confirm that LM Studio is running with both `nomic-ai/nomic-embed-text-v1.5` and a usable chat model loaded. If you save an OpenRouter key there, the recommended default autonomous profile is applied immediately so you can open Settings and see it already selected. Desktop OAuth logins can also be configured from the header after startup, once the embedding-capable provider path is ready. #### Optional Lean 4 / SMT Proof Verification Requirements @@ -88,7 +90,7 @@ Lean 4 proof verification is optional. The launcher prepares it when available, 2. Start LM Studio and load your models and "nomic-embed-text-v1.5" agent **and/or** have your OpenRouter API key ready. Desktop OAuth logins are optional add-ons after startup. 3. **Double-click `Click To Launch MOTO.bat`** 4. After acknowledging the disclaimer, choose one of the startup setup paths: - - Open `Cloud Access & Keys` to enter your OpenRouter API key + - Open `OpenRouter/OAuth` to enter your OpenRouter API key - Optionally configure an OAuth provider login from the same header overlay after startup (desktop only) - Confirm that LM Studio is already running with a loaded model - Then open Settings to keep the recommended profile or switch to your saved team profile / another default profile @@ -138,7 +140,7 @@ bash linux-ubuntu-launcher.sh - Dirty or locally mutated repos remain runnable, but they are update-detection-only and are not eligible for automatic update-apply behavior. - If launcher-managed backend/frontend services from this install are still running, the updater warns and skips update-apply until those services are closed. - If GitHub `main` is reachable but `moto-update-manifest.json` is not published there yet, the launcher falls back to branch-head comparison and keeps update-apply disabled until the manifest is present. 
-- Clean git updateability is preserved by avoiding silent tracked-file mutations during normal startup; for example, the launcher no longer auto-runs `npm audit fix`. +- The launcher runs `npm audit fix` when `npm install` reports frontend vulnerabilities, including on clean git checkouts, so dependency CVE remediation remains automatic. - Preservation is defined against the active runtime roots, not only the default folders. The launcher may use `backend/data`, `backend/logs`, or instance-scoped `.moto_instances//...` roots, and browser storage prefixes plus OS-keyring namespaces are part of that same preserved state boundary. --- @@ -176,7 +178,7 @@ bash linux-ubuntu-launcher.sh 1. Go to **Compiler Interface** tab 2. Enter compiler-directing prompt (e.g., "Build a paper titled 'Modular Forms in the Langlands Program'") 3. Configure settings: - - Select validator, high-context, and high-parameter models + - Select Validator, Writing Submitter, and Rigor & Proofs Submitter models - Set context windows and output token limits 4. Click **Start Compiler** 5. Watch real-time paper construction in **Live Paper** tab @@ -243,22 +245,22 @@ moto-math-variant/ **Compiler**: - Validator model (coherence/rigor checking) -- High-context model (outline, construction, review) -- High-parameter model (rigor enhancement) +- Writing Submitter model (outline, construction, review) +- Rigor & Proofs Submitter model (Lean/proof work and rigor enhancement) **Autonomous Research**: - All aggregator and compiler roles configurable - Separate models for topic selection, completion review, etc. 
-### Cloud Access & Keys +### OpenRouter/OAuth Each role supports: - **Provider**: LM Studio (local), OpenRouter (cloud API key), or desktop OAuth providers such as OpenAI Codex and xAI Grok/SuperGrok - **Model Selection**: Choose from available models -- **Host/Provider**: Select specific OpenRouter provider (e.g., Anthropic, Google) +- **Host/Provider**: Select a specific OpenRouter provider when needed - **Fallback**: Optional LM Studio fallback if a cloud provider fails or runs out of credits -`Cloud Access & Keys` in the header is where you manage cloud credentials. OpenRouter keys are stored through the backend keyring in desktop/default mode and in memory in hosted/generic mode. Desktop OAuth logins store tokens securely on the desktop backend and are selected through the `oAuth` provider dropdown; supported providers include OpenAI Codex/ChatGPT and xAI Grok/SuperGrok. These OAuth paths are separate from regular API-key billing, are not available in hosted/generic mode, and cannot be the only startup provider because MOTO's RAG embeddings are not routed through OAuth providers. +`OpenRouter/OAuth` in the header is where you manage OpenRouter credentials and desktop OAuth logins. OpenRouter keys are stored through the backend keyring in desktop/default mode and in memory in hosted/generic mode. Desktop OAuth logins store tokens securely on the desktop backend and are selected through the `oAuth` provider dropdown; supported providers include OpenAI Codex/ChatGPT and xAI Grok/SuperGrok. These OAuth paths are separate from regular API-key billing, are not available in hosted/generic mode, and cannot be the only startup provider because MOTO's RAG embeddings are not routed through OAuth providers. 
### Context and Output Settings diff --git a/backend/aggregator/agents/submitter.py b/backend/aggregator/agents/submitter.py index c22d702..4826bdd 100644 --- a/backend/aggregator/agents/submitter.py +++ b/backend/aggregator/agents/submitter.py @@ -12,12 +12,12 @@ from backend.shared.config import rag_config, system_config from backend.shared.models import Submission, SubmitterState from backend.shared.lm_studio_client import lm_studio_client -from backend.shared.api_client_manager import api_client_manager +from backend.shared.api_client_manager import OAuthProviderCooldownError, api_client_manager from backend.shared.brainstorm_proof_gate import is_lean_proof_submission, verify_brainstorm_proof_candidate from backend.shared.openrouter_client import FreeModelExhaustedError from backend.shared.json_parser import parse_json, sanitize_model_output_for_retry_context from backend.shared.response_extraction import extract_message_text -from backend.aggregator.core.context_allocator import context_allocator +from backend.aggregator.core.context_allocator import ContextAllocationError, context_allocator from backend.aggregator.core.queue_manager import queue_manager from backend.aggregator.memory.shared_training import shared_training_memory from backend.aggregator.memory.local_training import LocalTrainingMemory @@ -50,6 +50,7 @@ def __init__( local_rejection_log_dir: Optional[Any] = None, local_rejection_log_template: Optional[str] = None, reset_local_rejection_log_on_initialize: bool = False, + assistant_workflow_mode_override: Optional[str] = None, ): self.submitter_id = submitter_id self.model_name = model_name @@ -62,6 +63,7 @@ def __init__( self.websocket_broadcaster = websocket_broadcaster self.coordinator = coordinator self.creativity_emphasis_boost_enabled = creativity_emphasis_boost_enabled + self.assistant_workflow_mode_override = assistant_workflow_mode_override # Per-submitter context settings (fall back to global config if not provided) self.context_window 
= context_window if context_window is not None else rag_config.submitter_context_window @@ -188,6 +190,24 @@ async def _run_loop(self) -> None: # All free models exhausted after retries - wait briefly and retry logger.warning(f"Submitter {self.submitter_id}: all free models exhausted: {e}") await asyncio.sleep(120) # Wait before retrying (all models exhausted) + except OAuthProviderCooldownError as e: + logger.warning( + "Submitter %s paused for OAuth provider cooldown: %s", + self.submitter_id, + e, + ) + await api_client_manager.wait_for_oauth_provider_cooldown( + e, + role_id=self.role_id, + ) + except ContextAllocationError as e: + logger.error("Submitter %s context overflow: %s", self.submitter_id, e) + if self.coordinator and hasattr(self.coordinator, "_handle_context_overflow"): + await self.coordinator._handle_context_overflow(e, role_id=self.role_id) + else: + self.is_running = False + self.state.is_active = False + break except Exception as e: logger.error(f"Submitter {self.submitter_id} error on iteration {iteration}: {e}", exc_info=True) await asyncio.sleep(5) @@ -240,6 +260,14 @@ async def _generate_submission(self) -> Optional[Submission]: lean4_enabled=system_config.lean4_enabled, ) + task_id = self.get_current_task_id() + await api_client_manager.prewarm_assistant_memory_context( + task_id=task_id, + role_id=self.role_id, + prompt=prompt, + workflow_mode_override=self.assistant_workflow_mode_override, + ) + # CRITICAL: Verify actual prompt size fits in context window from backend.shared.utils import count_tokens actual_prompt_tokens = count_tokens(prompt) @@ -279,11 +307,17 @@ async def _generate_submission(self) -> Optional[Submission]: actual_prompt_tokens = count_tokens(prompt) if actual_prompt_tokens > max_allowed_tokens: - logger.error( - f"Submitter {self.submitter_id}: Assembled prompt ({actual_prompt_tokens} tokens) exceeds context window " - f"({max_allowed_tokens} tokens after safety margin). This indicates a context allocation bug." 
+ raise ContextAllocationError( + f"Submitter {self.submitter_id} context overflow: mandatory direct context requires " + f"{actual_prompt_tokens:,} tokens, but the submitter can only accept {max_allowed_tokens:,} " + f"input tokens (context window: {self.context_window:,}, output reserve: {self.max_output_tokens:,}). " + "This prompt must be direct-injected. Please condense into a new prompt and restart, " + "or select a submitter model with a larger context window.", + required_tokens=actual_prompt_tokens, + available_tokens=max_allowed_tokens, + context_window=self.context_window, + output_reserve=self.max_output_tokens, ) - return None # Skip this submission logger.debug(f"Submitter {self.submitter_id} prompt: {actual_prompt_tokens} tokens (max: {max_allowed_tokens})") @@ -297,8 +331,6 @@ async def _generate_submission(self) -> Optional[Submission]: else: logger.debug(f"Submitter {self.submitter_id}: All content direct-injected, no RAG context used") - # Generate task ID for tracking - task_id = self.get_current_task_id() self.task_sequence += 1 # Notify task started (for workflow panel) @@ -319,7 +351,8 @@ async def _generate_submission(self) -> Optional[Submission]: model=self.model_name, messages=[{"role": "user", "content": prompt}], temperature=self._generation_temperature(), - max_tokens=self.max_output_tokens # Per-submitter max output tokens + max_tokens=self.max_output_tokens, # Per-submitter max output tokens + _moto_assistant_workflow_mode=self.assistant_workflow_mode_override, ) call_metadata = api_client_manager.extract_call_metadata(response) break # Success @@ -341,6 +374,20 @@ async def _generate_submission(self) -> Optional[Submission]: logger.error( f"Submitter {self.submitter_id}: Failed to generate completion after {attempt + 1} attempts: {e}" ) + if "context" in error_msg.lower(): + raise ContextAllocationError( + f"Submitter {self.submitter_id} context overflow or provider context mismatch: " + f"the assembled prompt requires 
{actual_prompt_tokens:,} tokens and the configured " + f"input budget is {max_allowed_tokens:,} tokens (context window: {self.context_window:,}, " + f"output reserve: {self.max_output_tokens:,}). The provider still rejected the request " + "as too large, so the loaded/provider context is smaller than configured. Please condense " + "into a new prompt and restart, select a larger-context model, or reload the local model " + "with the configured context window.", + required_tokens=actual_prompt_tokens, + available_tokens=max_allowed_tokens, + context_window=self.context_window, + output_reserve=self.max_output_tokens, + ) from e # Notify task completed (failed but still completed) if self.task_tracking_callback: self.task_tracking_callback("completed", task_id) @@ -348,6 +395,8 @@ async def _generate_submission(self) -> Optional[Submission]: except FreeModelExhaustedError: raise + except OAuthProviderCooldownError: + raise except RuntimeError as e: if "credits exhausted" in str(e).lower(): raise FreeModelExhaustedError(str(e)) @@ -720,6 +769,8 @@ async def _generate_submission(self) -> Optional[Submission]: return submission + except ContextAllocationError: + raise except Exception as e: logger.error(f"Submitter {self.submitter_id} failed to generate submission: {e}") return None diff --git a/backend/aggregator/agents/validator.py b/backend/aggregator/agents/validator.py index 626e7b0..11afd3c 100644 --- a/backend/aggregator/agents/validator.py +++ b/backend/aggregator/agents/validator.py @@ -14,7 +14,7 @@ from backend.shared.openrouter_client import FreeModelExhaustedError from backend.shared.json_parser import parse_json, sanitize_model_output_for_retry_context from backend.shared.response_extraction import extract_message_text -from backend.aggregator.core.context_allocator import context_allocator +from backend.aggregator.core.context_allocator import ContextAllocationError, context_allocator from backend.aggregator.memory.shared_training import 
shared_training_memory from backend.aggregator.prompts.validator_prompts import ( build_validator_prompt, @@ -111,7 +111,7 @@ async def validate_submission(self, submission: Submission) -> ValidationResult: return quality_result - except FreeModelExhaustedError: + except (FreeModelExhaustedError, ContextAllocationError): raise except Exception as e: logger.error(f"Validation failed: {e}") @@ -159,16 +159,16 @@ async def _assess_quality(self, submission: Submission) -> ValidationResult: configured_context = context_allocator.validator_context_window if actual_prompt_tokens > max_allowed_tokens: - logger.error( - f"Validator: Assembled prompt ({actual_prompt_tokens} tokens) exceeds context window " - f"({max_allowed_tokens} tokens after safety margin). This indicates a context allocation bug." - ) - return ValidationResult( - submission_id=submission.submission_id, - decision="reject", - reasoning=f"Internal error: Prompt too large ({actual_prompt_tokens} tokens > {max_allowed_tokens} max)", - summary="Internal context overflow error", - json_valid=False + raise ContextAllocationError( + f"Validator context overflow: assembled prompt requires {actual_prompt_tokens:,} tokens, " + f"but the validator can only accept {max_allowed_tokens:,} input tokens " + f"(context window: {configured_context:,}, output reserve: {rag_config.validator_max_output_tokens:,}). " + "A complete and honest validation requires direct context injection. 
Please condense into a new prompt " + "and restart, or select a validator model with a larger context window.", + required_tokens=actual_prompt_tokens, + available_tokens=max_allowed_tokens, + context_window=configured_context, + output_reserve=rag_config.validator_max_output_tokens, ) logger.debug( @@ -241,14 +241,23 @@ async def _assess_quality(self, submission: Submission) -> ValidationResult: await asyncio.sleep(backoff_time) continue else: - # Final retry or non-recoverable error logger.error(f"Validator: Failed to generate validation after {attempt + 1} attempts: {e}") - # Provide context-specific error message if "context" in error_msg.lower(): - summary = "LM Studio context window mismatch - check logs" - else: - summary = "Internal error" + raise ContextAllocationError( + f"Validator context overflow or provider context mismatch: the assembled prompt " + f"requires {actual_prompt_tokens:,} tokens and the configured validator input budget " + f"is {max_allowed_tokens:,} tokens (context window: {configured_context:,}, " + f"output reserve: {rag_config.validator_max_output_tokens:,}). The provider still rejected " + "the request as too large, so the loaded/provider context is smaller than configured. " + "A complete and honest validation requires direct context injection. 
Please condense into " + "a new prompt and restart, select a validator model with a larger context window, or reload " + "the local model with the configured context window.", + required_tokens=actual_prompt_tokens, + available_tokens=max_allowed_tokens, + context_window=configured_context, + output_reserve=rag_config.validator_max_output_tokens, + ) from e # Notify task completed (failed but still completed) if self.task_tracking_callback: @@ -258,7 +267,7 @@ async def _assess_quality(self, submission: Submission) -> ValidationResult: submission_id=submission.submission_id, decision="reject", reasoning=f"Quality assessment error: {e}", - summary=summary, + summary="Internal error", json_valid=False ) @@ -428,6 +437,8 @@ async def _assess_quality(self, submission: Submission) -> ValidationResult: return result + except ContextAllocationError: + raise except Exception as e: logger.error(f"Quality assessment failed: {e}") return ValidationResult( @@ -600,20 +611,18 @@ async def _assess_batch_quality(self, submissions: List[Submission]) -> List[Val max_allowed_tokens = rag_config.get_available_input_tokens(context_allocator.validator_context_window, rag_config.validator_max_output_tokens) if actual_prompt_tokens > max_allowed_tokens: - logger.error( - f"Batch validator: Prompt ({actual_prompt_tokens} tokens) exceeds context window " - f"({max_allowed_tokens} tokens). Rejecting entire batch." + raise ContextAllocationError( + f"Validator context overflow: batch validation prompt requires {actual_prompt_tokens:,} tokens, " + f"but the validator can only accept {max_allowed_tokens:,} input tokens " + f"(context window: {context_allocator.validator_context_window:,}, " + f"output reserve: {rag_config.validator_max_output_tokens:,}). A complete and honest batch validation " + "requires direct context injection. 
Please condense into a new prompt and restart, or select a " + "validator model with a larger context window.", + required_tokens=actual_prompt_tokens, + available_tokens=max_allowed_tokens, + context_window=context_allocator.validator_context_window, + output_reserve=rag_config.validator_max_output_tokens, ) - return [ - ValidationResult( - submission_id=s.submission_id, - decision="reject", - reasoning="Internal error: Batch prompt too large", - summary="Internal context overflow error", - json_valid=False - ) - for s in submissions - ] logger.debug(f"Batch validator prompt: {actual_prompt_tokens} tokens (max: {max_allowed_tokens})") @@ -754,9 +763,22 @@ async def _assess_batch_quality(self, submissions: List[Submission]) -> List[Val return results - except FreeModelExhaustedError: + except (FreeModelExhaustedError, ContextAllocationError): raise except Exception as e: + error_text = str(e) + if "context" in error_text.lower(): + raise ContextAllocationError( + f"Validator context overflow or provider context mismatch during batch validation: " + f"the validator prompt could not be submitted with the configured context window " + f"({context_allocator.validator_context_window:,}) and output reserve " + f"({rag_config.validator_max_output_tokens:,}). A complete and honest batch validation " + "requires direct context injection. 
Please condense into a new prompt and restart, " + "select a validator model with a larger context window, or reload the local model with " + "the configured context window.", + context_window=context_allocator.validator_context_window, + output_reserve=rag_config.validator_max_output_tokens, + ) from e logger.error(f"Batch quality assessment failed: {e}", exc_info=True) return [ ValidationResult( diff --git a/backend/aggregator/core/context_allocator.py b/backend/aggregator/core/context_allocator.py index 2b2cec5..fbcc8eb 100644 --- a/backend/aggregator/core/context_allocator.py +++ b/backend/aggregator/core/context_allocator.py @@ -15,8 +15,22 @@ class ContextAllocationError(Exception): - """Raised when context allocation fails.""" - pass + """Raised when required direct-injected context cannot fit.""" + + def __init__( + self, + message: str, + *, + required_tokens: int | None = None, + available_tokens: int | None = None, + context_window: int | None = None, + output_reserve: int | None = None, + ) -> None: + super().__init__(message) + self.required_tokens = required_tokens + self.available_tokens = available_tokens + self.context_window = context_window + self.output_reserve = output_reserve class ContextAllocator: @@ -35,6 +49,34 @@ def __init__(self): # Default max output tokens self.submitter_max_output_tokens = rag_config.submitter_max_output_tokens self.validator_max_output_tokens = rag_config.validator_max_output_tokens + + @staticmethod + def _raise_no_rag_budget( + *, + role: str, + available_tokens: int, + already_allocated: int, + rag_formatting_overhead: int, + context_window: int, + output_reserve: int, + ) -> None: + raise ContextAllocationError( + f"{role} context overflow: some lower-priority context must be offloaded to RAG, " + "but no input budget remains for retrieved evidence after mandatory/direct context.", + required_tokens=already_allocated + rag_formatting_overhead + 1, + available_tokens=available_tokens, + 
context_window=context_window, + output_reserve=output_reserve, + ) + + @staticmethod + def _remove_direct_part(direct_parts: List[str], content: str) -> bool: + """Remove one exact direct-injected block so it can be served through RAG.""" + try: + direct_parts.remove(content) + return True + except ValueError: + return False def set_context_windows(self, submitter_context: int, validator_context: int, submitter_max_output: int = None, validator_max_output: int = None): @@ -131,10 +173,17 @@ async def allocate_submitter_context( # Check if user prompt alone exceeds limits if user_prompt_tokens > (available_tokens - minimum_rag_allocation): + direct_limit = available_tokens - minimum_rag_allocation raise ContextAllocationError( - f"User prompt ({user_prompt_tokens} tokens) exceeds maximum allowed " - f"({available_tokens - minimum_rag_allocation} tokens). " - f"Please shorten your prompt." + f"User prompt ({user_prompt_tokens:,} tokens) exceeds the configured submitter context limit. " + f"The submitter can only accept {direct_limit:,} direct input tokens " + f"(context window: {ctx_window:,}, output reserve: {self.submitter_max_output_tokens:,}). " + "This prompt must be direct-injected. Please condense the prompt and restart, " + "or select a model with a larger context window.", + required_tokens=user_prompt_tokens, + available_tokens=direct_limit, + context_window=ctx_window, + output_reserve=self.submitter_max_output_tokens, ) remaining_tokens = available_tokens - mandatory_tokens @@ -155,23 +204,18 @@ async def allocate_submitter_context( needs_rejection_log_rag = False needs_user_files_rag = False - # Priority 1: Shared training - try direct injection first - # BUT: Reserve minimum space for RAG (at least 5000 tokens) if content needs to be offloaded - minimum_rag_reserve = 5000 # Ensure meaningful RAG retrieval space + # Priority 1: Shared training - direct-first. Do not reserve space for + # RAG unless some content actually fails to fit. 
if shared_training_content: formatted = f"[SHARED TRAINING]\n{shared_training_content}" tokens = count_tokens(formatted) - # Direct inject only if it fits AND leaves enough space for other content + RAG - if tokens <= remaining_tokens and (tokens < remaining_tokens - minimum_rag_reserve): + if tokens <= remaining_tokens: direct_parts.append(formatted) remaining_tokens -= tokens logger.debug(f"Submitter: Shared training direct injected ({tokens} tokens)") else: needs_shared_training_rag = True - if tokens > remaining_tokens: - logger.info(f"Submitter: Shared training offloaded to RAG ({tokens} tokens > {remaining_tokens} available)") - else: - logger.info(f"Submitter: Shared training offloaded to RAG ({tokens} tokens would leave insufficient RAG space)") + logger.info(f"Submitter: Shared training offloaded to RAG ({tokens} tokens > {remaining_tokens} available)") # Priority 2: Local training - try direct injection first if local_training_content: @@ -215,7 +259,48 @@ async def allocate_submitter_context( # Perform RAG retrieval ONLY if content was offloaded rag_context = None if any([needs_shared_training_rag, needs_local_training_rag, needs_rejection_log_rag, needs_user_files_rag]): - # Build exclusion list: sources that were direct-injected should not appear in RAG + # Available space for RAG (with buffer for RAG formatting overhead) + # RAG content will be wrapped with "\n---\nRETRIEVED EVIDENCE:\n{rag_text}" which adds tokens + rag_formatting_overhead = count_tokens("\n---\nRETRIEVED EVIDENCE:\n") + safety_buffer = 500 # Increased buffer for final prompt assembly + RAG wrapping + + def calculate_rag_budget() -> tuple[int, int, int]: + direct_content_tokens = count_tokens("\n\n".join(direct_parts)) + already_allocated = mandatory_tokens + direct_content_tokens + max_rag_space = available_tokens - already_allocated - safety_buffer - rag_formatting_overhead + return max(0, max_rag_space), direct_content_tokens, already_allocated + + rag_max_tokens, 
direct_content_tokens, already_allocated = calculate_rag_budget() + if rag_max_tokens <= 0: + if ( + user_files_str + and not needs_user_files_rag + and self._remove_direct_part(direct_parts, user_files_str) + ): + needs_user_files_rag = True + logger.info("Submitter: User files moved from direct context to RAG to preserve retrieval budget") + rag_max_tokens, direct_content_tokens, already_allocated = calculate_rag_budget() + if ( + rag_max_tokens <= 0 + and shared_training_content + and not needs_shared_training_rag + ): + shared_direct = f"[SHARED TRAINING]\n{shared_training_content}" + if self._remove_direct_part(direct_parts, shared_direct): + needs_shared_training_rag = True + logger.info("Submitter: Shared training moved from direct context to RAG to preserve retrieval budget") + rag_max_tokens, direct_content_tokens, already_allocated = calculate_rag_budget() + if rag_max_tokens <= 0: + self._raise_no_rag_budget( + role="Submitter", + available_tokens=available_tokens, + already_allocated=already_allocated, + rag_formatting_overhead=rag_formatting_overhead + safety_buffer, + context_window=ctx_window, + output_reserve=max_output, + ) + + # Build exclusion list after any budget-preserving direct-to-RAG moves. 
exclude_sources = [] if not needs_shared_training_rag and shared_training_content: exclude_sources.extend(self._get_shared_training_rag_sources()) @@ -224,23 +309,6 @@ async def allocate_submitter_context( if exclude_sources: exclude_sources = list(dict.fromkeys(exclude_sources)) - # FIXED: Calculate RAG budget from REMAINING space after direct injection - # This ensures we maximize context usage without exceeding limits - direct_content_temp = "\n\n".join(direct_parts) - direct_content_tokens = count_tokens(direct_content_temp) - - # Total tokens already allocated - already_allocated = mandatory_tokens + direct_content_tokens - - # Available space for RAG (with buffer for RAG formatting overhead) - # RAG content will be wrapped with "\n---\nRETRIEVED EVIDENCE:\n{rag_text}" which adds tokens - rag_formatting_overhead = count_tokens("\n---\nRETRIEVED EVIDENCE:\n") - safety_buffer = 500 # Increased buffer for final prompt assembly + RAG wrapping - max_rag_space = available_tokens - already_allocated - safety_buffer - rag_formatting_overhead - - # Use as much as possible for RAG while respecting the limit - rag_max_tokens = max(0, max_rag_space) # Ensure non-negative - logger.info( f"Submitter: Performing RAG retrieval (max {rag_max_tokens} tokens) for offloaded content. " f"Breakdown: available={available_tokens}, mandatory={mandatory_tokens}, " @@ -318,11 +386,37 @@ async def allocate_validator_context( mandatory_tokens = user_prompt_tokens + json_schema_tokens + system_prompt_tokens + submission_tokens + assembly_overhead - # Check if user prompt alone exceeds limits + # Check if the required prompt/proof block alone exceeds limits. In + # autonomous mode this may include verified-proof context intentionally + # prepended to the prompt; that context is mandatory for validation and + # must fail visibly instead of being silently offloaded or counted as a + # normal rejection. 
if user_prompt_tokens > (available_tokens - minimum_rag_allocation): + direct_limit = available_tokens - minimum_rag_allocation + raise ContextAllocationError( + f"Validator context overflow: mandatory direct context requires {user_prompt_tokens:,} tokens, " + f"but the validator can only accept {direct_limit:,} direct input tokens " + f"(context window: {self.validator_context_window:,}, output reserve: {self.validator_max_output_tokens:,}). " + "The validator must see the full user prompt and proof context without compromise. " + "Please condense into a new prompt and restart, or select a validator model with a larger context window.", + required_tokens=user_prompt_tokens, + available_tokens=direct_limit, + context_window=self.validator_context_window, + output_reserve=self.validator_max_output_tokens, + ) + + if mandatory_tokens > available_tokens: raise ContextAllocationError( - f"User prompt ({user_prompt_tokens} tokens) exceeds maximum allowed. " - f"Please shorten your prompt." + f"Validator context overflow: mandatory direct context requires {mandatory_tokens:,} tokens, " + f"but the validator can only accept {available_tokens:,} input tokens " + f"(context window: {self.validator_context_window:,}, output reserve: {self.validator_max_output_tokens:,}). " + "A complete and honest validation requires direct injection of the user prompt, proof context, " + "JSON contract, system instructions, and submission under review. 
Please condense into a new prompt " + "and restart, or select a validator model with a larger context window.", + required_tokens=mandatory_tokens, + available_tokens=available_tokens, + context_window=self.validator_context_window, + output_reserve=self.validator_max_output_tokens, ) remaining_tokens = available_tokens - mandatory_tokens @@ -344,14 +438,12 @@ async def allocate_validator_context( needs_shared_training_rag = False needs_user_files_rag = False - # Priority 1: Shared training - try direct injection first - # BUT: Reserve minimum space for RAG (at least 5000 tokens) if content needs to be offloaded - minimum_rag_reserve = 5000 # Ensure meaningful RAG retrieval space + # Priority 1: Shared training - direct-first. RAG is a fallback for + # content that cannot fit, not a reason to offload content that fits. if shared_training_content: formatted_training = f"[SHARED TRAINING]\n{shared_training_content}" training_tokens = count_tokens(formatted_training) - # Direct inject only if it fits AND leaves enough space for other content + RAG - if training_tokens <= remaining_tokens and (training_tokens < remaining_tokens - minimum_rag_reserve): + if training_tokens <= remaining_tokens: # Fits - use direct injection direct_parts.append(formatted_training) remaining_tokens -= training_tokens @@ -359,10 +451,7 @@ async def allocate_validator_context( else: # Doesn't fit - offload to RAG needs_shared_training_rag = True - if training_tokens > remaining_tokens: - logger.info(f"Validator: Shared training offloaded to RAG ({training_tokens} tokens > {remaining_tokens} available)") - else: - logger.info(f"Validator: Shared training offloaded to RAG ({training_tokens} tokens would leave insufficient RAG space)") + logger.info(f"Validator: Shared training offloaded to RAG ({training_tokens} tokens > {remaining_tokens} available)") # Priority 2: User files - try direct injection first user_files_str = "" @@ -384,7 +473,48 @@ async def allocate_validator_context( # Perform 
RAG retrieval ONLY if content was offloaded rag_context = None if needs_shared_training_rag or needs_user_files_rag: - # Build exclusion list: sources that were direct-injected should not appear in RAG + # Available space for RAG (with buffer for RAG formatting overhead) + # RAG content will be wrapped with "\n---\nEXISTING KNOWLEDGE BASE (Retrieved):\n{rag_text}" which adds tokens + rag_formatting_overhead = count_tokens("\n---\nEXISTING KNOWLEDGE BASE (Retrieved):\n") + safety_buffer = 500 # Increased buffer for final prompt assembly + RAG wrapping + + def calculate_rag_budget() -> tuple[int, int, int]: + direct_content_tokens = count_tokens("\n\n".join(direct_parts)) + already_allocated = mandatory_tokens + direct_content_tokens + max_rag_space = available_tokens - already_allocated - safety_buffer - rag_formatting_overhead + return max(0, max_rag_space), direct_content_tokens, already_allocated + + rag_max_tokens, direct_content_tokens, already_allocated = calculate_rag_budget() + if rag_max_tokens <= 0: + if ( + user_files_str + and not needs_user_files_rag + and self._remove_direct_part(direct_parts, user_files_str) + ): + needs_user_files_rag = True + logger.info("Validator: User files moved from direct context to RAG to preserve retrieval budget") + rag_max_tokens, direct_content_tokens, already_allocated = calculate_rag_budget() + if ( + rag_max_tokens <= 0 + and shared_training_content + and not needs_shared_training_rag + ): + shared_direct = f"[SHARED TRAINING]\n{shared_training_content}" + if self._remove_direct_part(direct_parts, shared_direct): + needs_shared_training_rag = True + logger.info("Validator: Shared training moved from direct context to RAG to preserve retrieval budget") + rag_max_tokens, direct_content_tokens, already_allocated = calculate_rag_budget() + if rag_max_tokens <= 0: + self._raise_no_rag_budget( + role="Validator", + available_tokens=available_tokens, + already_allocated=already_allocated, + 
rag_formatting_overhead=rag_formatting_overhead + safety_buffer, + context_window=self.validator_context_window, + output_reserve=self.validator_max_output_tokens, + ) + + # Build exclusion list after any budget-preserving direct-to-RAG moves. exclude_sources = [] if not needs_shared_training_rag and shared_training_content: exclude_sources.extend(self._get_shared_training_rag_sources()) @@ -393,23 +523,6 @@ async def allocate_validator_context( if exclude_sources: exclude_sources = list(dict.fromkeys(exclude_sources)) - # FIXED: Calculate RAG budget from REMAINING space after direct injection - # This ensures we maximize context usage without exceeding limits - direct_content_temp = "\n\n".join(direct_parts) - direct_content_tokens = count_tokens(direct_content_temp) - - # Total tokens already allocated - already_allocated = mandatory_tokens + direct_content_tokens - - # Available space for RAG (with buffer for RAG formatting overhead) - # RAG content will be wrapped with "\n---\nEXISTING KNOWLEDGE BASE (Retrieved):\n{rag_text}" which adds tokens - rag_formatting_overhead = count_tokens("\n---\nEXISTING KNOWLEDGE BASE (Retrieved):\n") - safety_buffer = 500 # Increased buffer for final prompt assembly + RAG wrapping - max_rag_space = available_tokens - already_allocated - safety_buffer - rag_formatting_overhead - - # Use as much as possible for RAG while respecting the limit - rag_max_tokens = max(0, max_rag_space) # Ensure non-negative - logger.info( f"Validator: Performing RAG retrieval (max {rag_max_tokens} tokens) for offloaded content. 
" f"Breakdown: available={available_tokens}, mandatory={mandatory_tokens}, " @@ -515,15 +628,13 @@ async def allocate_cleanup_review_context( needs_submissions_rag = False needs_user_files_rag = False - # Reserve space for RAG if needed (at least 5000 tokens) - minimum_rag_reserve = 5000 - # Priority 1: All submissions - try direct injection first if all_submissions_formatted: submissions_tokens = count_tokens(all_submissions_formatted) - # Direct inject if it fits AND leaves space for other content - if submissions_tokens <= remaining_tokens and (submissions_tokens < remaining_tokens - minimum_rag_reserve): + # Direct inject whenever it fits; use RAG only for content that + # cannot fit directly. + if submissions_tokens <= remaining_tokens: direct_parts.append(f"[ALL SUBMISSIONS]\n{all_submissions_formatted}") remaining_tokens -= submissions_tokens logger.info(f"Cleanup: All submissions direct injected ({submissions_tokens} tokens)") @@ -554,7 +665,48 @@ async def allocate_cleanup_review_context( # Perform RAG retrieval if content was offloaded rag_context = None if needs_submissions_rag or needs_user_files_rag: - # Build exclusion list: sources that were direct-injected should not appear in RAG + # Available space for RAG + rag_formatting_overhead = count_tokens("\n---\nADDITIONAL CONTEXT (Retrieved):\n") + safety_buffer = 500 + + def calculate_rag_budget() -> tuple[int, int, int]: + direct_content_tokens = count_tokens("\n\n".join(direct_parts)) + already_allocated = mandatory_tokens + direct_content_tokens + max_rag_space = available_tokens - already_allocated - safety_buffer - rag_formatting_overhead + return max(0, max_rag_space), direct_content_tokens, already_allocated + + rag_max_tokens, direct_content_tokens, already_allocated = calculate_rag_budget() + if rag_max_tokens <= 0: + if ( + user_files_str + and not needs_user_files_rag + and self._remove_direct_part(direct_parts, user_files_str) + ): + needs_user_files_rag = True + logger.info("Cleanup: User 
files moved from direct context to RAG to preserve retrieval budget") + rag_max_tokens, direct_content_tokens, already_allocated = calculate_rag_budget() + if ( + rag_max_tokens <= 0 + and all_submissions_formatted + and not needs_submissions_rag + ): + submissions_direct = f"[ALL SUBMISSIONS]\n{all_submissions_formatted}" + if self._remove_direct_part(direct_parts, submissions_direct): + needs_submissions_rag = True + submissions_ragged = True + logger.info("Cleanup: Accepted submissions moved from direct context to RAG to preserve retrieval budget") + rag_max_tokens, direct_content_tokens, already_allocated = calculate_rag_budget() + if rag_max_tokens <= 0: + self._raise_no_rag_budget( + role="Cleanup review", + available_tokens=available_tokens, + already_allocated=already_allocated, + rag_formatting_overhead=rag_formatting_overhead + safety_buffer, + context_window=self.validator_context_window, + output_reserve=self.validator_max_output_tokens, + ) + + # Build exclusion list after any budget-preserving direct-to-RAG moves. exclude_sources = [] if not needs_submissions_rag and all_submissions_formatted: exclude_sources.extend(self._get_shared_training_rag_sources()) @@ -563,17 +715,6 @@ async def allocate_cleanup_review_context( if exclude_sources: exclude_sources = list(dict.fromkeys(exclude_sources)) - # Calculate RAG budget from remaining space - direct_content_temp = "\n\n".join(direct_parts) - direct_content_tokens = count_tokens(direct_content_temp) - already_allocated = mandatory_tokens + direct_content_tokens - - # Available space for RAG - rag_formatting_overhead = count_tokens("\n---\nADDITIONAL CONTEXT (Retrieved):\n") - safety_buffer = 500 - max_rag_space = available_tokens - already_allocated - safety_buffer - rag_formatting_overhead - rag_max_tokens = max(0, max_rag_space) - logger.info( f"Cleanup: Performing RAG retrieval (max {rag_max_tokens} tokens) for offloaded content. 
" f"Breakdown: available={available_tokens}, mandatory={mandatory_tokens}, " diff --git a/backend/aggregator/core/coordinator.py b/backend/aggregator/core/coordinator.py index bdab77a..00f3802 100644 --- a/backend/aggregator/core/coordinator.py +++ b/backend/aggregator/core/coordinator.py @@ -13,13 +13,19 @@ from backend.shared.models import SystemStatus, Submission, ValidationResult, SubmitterConfig, WorkflowTask, ModelConfig, ProofAttemptFeedback from backend.shared.lm_studio_client import lm_studio_client from backend.shared.rag_lock import rag_operation_lock -from backend.shared.api_client_manager import api_client_manager +from backend.shared.api_client_manager import OAuthProviderCooldownError, api_client_manager from backend.shared.openrouter_client import FreeModelExhaustedError from backend.shared.free_model_manager import free_model_manager from backend.shared.path_safety import resolve_path_within_root, validate_single_path_component from backend.shared.log_redaction import redact_log_text +from backend.shared.context_overflow import ( + CONTEXT_OVERFLOW_RESOLUTION, + CONTEXT_OVERFLOW_STOP_MESSAGE, + CONTEXT_OVERFLOW_STOP_REASON, +) from backend.aggregator.agents.submitter import SubmitterAgent from backend.aggregator.agents.validator import ValidatorAgent +from backend.aggregator.core.context_allocator import ContextAllocationError from backend.aggregator.core.queue_manager import queue_manager from backend.aggregator.core.rag_manager import rag_manager from backend.aggregator.memory.shared_training import shared_training_memory @@ -133,6 +139,8 @@ def __init__(self): self.enable_cleanup_review = True self.creativity_emphasis_boost_enabled = False self.persist_event_log = True + self.fatal_error_message: Optional[str] = None + self.fatal_error_type: Optional[str] = None # Optional source-level hard cap used by autonomous brainstorm mode. 
self.max_total_acceptances: Optional[int] = None @@ -238,6 +246,7 @@ async def initialize( local_rejection_log_dir: Optional[str] = None, local_rejection_log_template: Optional[str] = None, reset_local_rejection_logs_on_start: bool = False, + assistant_workflow_mode_override: Optional[str] = None, ) -> None: """ Initialize the coordinator with configuration. @@ -390,6 +399,8 @@ async def initialize( logger.info("Clearing RAG for fresh Part 1 aggregator session...") await asyncio.to_thread(rag_manager.clear_all_documents) logger.info("RAG cleared successfully for Part 1 aggregator") + await self._rebuild_shared_training_rag_after_cleanup() + logger.info("Persisted Part 1 accepted submissions re-indexed after RAG clear") else: logger.info("Skipping stats load (autonomous mode - starting fresh)") # Reset stats to 0 for autonomous brainstorm @@ -459,6 +470,7 @@ async def initialize( local_rejection_log_dir=local_rejection_log_dir, local_rejection_log_template=local_rejection_log_template, reset_local_rejection_log_on_initialize=reset_local_rejection_logs_on_start, + assistant_workflow_mode_override=assistant_workflow_mode_override, ) await submitter.initialize() # Set callback to add submissions to queue @@ -709,6 +721,8 @@ async def start(self) -> None: return self.is_running = True + self.fatal_error_message = None + self.fatal_error_type = None logger.info("Starting coordinator...") # Reset free model manager state for fresh start @@ -822,6 +836,9 @@ async def _validator_loop(self) -> None: # Batch validate results = await self.validator.validate_batch(submissions) + if not self.is_running or self.fatal_error_type: + logger.info("Skipping validator result processing after workflow stop/context overflow") + break # Process results for submission, result in zip(submissions, results): @@ -847,6 +864,15 @@ async def _validator_loop(self) -> None: "message": "All free models exhausted, waiting to retry", }) await asyncio.sleep(120) # Wait before retrying (all models 
exhausted) + except OAuthProviderCooldownError as e: + logger.warning("Validator paused for OAuth provider cooldown: %s", e) + await api_client_manager.wait_for_oauth_provider_cooldown( + e, + role_id="aggregator_validator", + ) + except ContextAllocationError as e: + await self._handle_context_overflow(e, role_id="aggregator_validator") + break except Exception as e: logger.error(f"Validator loop error on iteration {iteration}: {e}", exc_info=True) await asyncio.sleep(2) @@ -919,6 +945,9 @@ async def _single_model_workflow(self) -> None: results = await self.validator.validate_batch(submissions) validations_done += len(submissions) + if not self.is_running or self.fatal_error_type: + logger.info("Skipping single-model validator result processing after workflow stop/context overflow") + break for submission, result in zip(submissions, results): if result.decision == "accept" or self._is_verified_brainstorm_proof_submission(submission): @@ -951,6 +980,15 @@ async def _single_model_workflow(self) -> None: "message": "All free models exhausted, waiting to retry", }) await asyncio.sleep(120) # Wait before retrying (all models exhausted) + except OAuthProviderCooldownError as e: + logger.warning("Single-model workflow paused for OAuth provider cooldown: %s", e) + await api_client_manager.wait_for_oauth_provider_cooldown( + e, + role_id="aggregator_single_model", + ) + except ContextAllocationError as e: + await self._handle_context_overflow(e, role_id="aggregator_single_model") + break except Exception as e: logger.error(f"Single-model workflow error at round {round_number}: {e}", exc_info=True) await asyncio.sleep(5) @@ -1068,6 +1106,12 @@ async def _handle_acceptance_cap_reached(self, total_acceptances: int) -> None: logger.error("Acceptance cap callback failed: %s", e, exc_info=True) current_task = asyncio.current_task() + if ( + self._validator_task + and self._validator_task is not current_task + and not self._validator_task.done() + ): + await 
_cancel_and_drain_task(self._validator_task) for submitter in self.submitters: try: await submitter.stop() @@ -1207,7 +1251,7 @@ async def _handle_rejection(self, submission: Submission, result: ValidationResu creativity_prefix = "(Creativity Emphasized) " if creativity_emphasized else "" await self._add_persisted_event( "submission_rejected", - f"{creativity_prefix}Submission from Submitter {submission.submitter_id} REJECTED: {rejection_reason}", + f"{creativity_prefix}Submission from Submitter {submission.submitter_id} REJECTED WITH FEEDBACK: {rejection_reason}", { "submission_id": submission.submission_id, "submitter_id": submission.submitter_id, @@ -1218,6 +1262,45 @@ async def _handle_rejection(self, submission: Submission, result: ValidationResu # Save stats await self._save_stats() + + async def _handle_context_overflow(self, error: ContextAllocationError, *, role_id: str) -> None: + """Stop the workflow when mandatory direct context cannot fit.""" + self.fatal_error_type = "context_overflow" + self.fatal_error_message = str(error) + self.is_running = False + + logger.error("Fatal context overflow in %s: %s", role_id, error) + + current_task = asyncio.current_task() + for submitter in self.submitters: + try: + if submitter._task is current_task: + submitter.is_running = False + submitter.state.is_active = False + continue + await submitter.stop() + except Exception as stop_exc: + logger.warning( + "Failed to stop submitter %s after context overflow: %s", + submitter.submitter_id, + stop_exc, + ) + + await queue_manager.clear() + + payload = { + "role_id": role_id, + "reason": CONTEXT_OVERFLOW_STOP_REASON, + "message": CONTEXT_OVERFLOW_STOP_MESSAGE, + "error_detail": str(error), + "required_tokens": getattr(error, "required_tokens", None), + "available_tokens": getattr(error, "available_tokens", None), + "context_window": getattr(error, "context_window", None), + "output_reserve": getattr(error, "output_reserve", None), + "resolution": 
CONTEXT_OVERFLOW_RESOLUTION, + } + await self._broadcast("context_overflow_error", payload) + await self._add_persisted_event("context_overflow_error", payload["message"], payload) async def _perform_cleanup_review(self) -> None: """ @@ -1527,7 +1610,9 @@ async def get_status(self) -> SystemStatus: shared_training_size=shared_training_size, cleanup_reviews_performed=self.cleanup_reviews_performed, removals_proposed=self.removals_proposed, - removals_executed=self.removals_executed + removals_executed=self.removals_executed, + fatal_error_type=self.fatal_error_type, + fatal_error_message=self.fatal_error_message, ) async def get_results(self) -> str: diff --git a/backend/aggregator/core/rag_manager.py b/backend/aggregator/core/rag_manager.py index 48fb32c..81dc012 100644 --- a/backend/aggregator/core/rag_manager.py +++ b/backend/aggregator/core/rag_manager.py @@ -224,6 +224,7 @@ async def retrieve( candidates = await self._hybrid_recall( queries, chunk_size, + exclude_sources=exclude_sources, include_sources=include_sources, include_source_prefixes=include_source_prefixes, ) @@ -328,6 +329,7 @@ async def _hybrid_recall( self, queries: List[str], chunk_size: int, + exclude_sources: Optional[List[str]] = None, include_sources: Optional[List[str]] = None, include_source_prefixes: Optional[List[str]] = None ) -> List[Tuple[DocumentChunk, float]]: @@ -336,6 +338,7 @@ async def _hybrid_recall( # concurrent RAG add/remove operations mutating the live chunk lists. 
chunks = list(self._filter_chunks_by_source_scope( self.chunks_by_size[chunk_size], + exclude_sources=exclude_sources, include_sources=include_sources, include_source_prefixes=include_source_prefixes, )) @@ -499,18 +502,24 @@ def _bm25_search( def _filter_chunks_by_source_scope( chunks: List[DocumentChunk], *, + exclude_sources: Optional[List[str]] = None, include_sources: Optional[List[str]] = None, include_source_prefixes: Optional[List[str]] = None ) -> List[DocumentChunk]: - """Limit chunks to an explicit source allowlist and/or source prefixes.""" + """Limit chunks to an explicit source allowlist/prefixes and exclusions.""" + exclude_set = {source for source in (exclude_sources or []) if source} include_set = {source for source in (include_sources or []) if source} prefixes = tuple(prefix for prefix in (include_source_prefixes or []) if prefix) - if not include_set and not prefixes: + if not exclude_set and not include_set and not prefixes: return chunks scoped = [] for chunk in chunks: - if chunk.source_file in include_set or (prefixes and chunk.source_file.startswith(prefixes)): + if chunk.source_file in exclude_set: + continue + if not include_set and not prefixes: + scoped.append(chunk) + elif chunk.source_file in include_set or (prefixes and chunk.source_file.startswith(prefixes)): scoped.append(chunk) return scoped diff --git a/backend/aggregator/memory/shared_training.py b/backend/aggregator/memory/shared_training.py index fd78a39..f01d0e9 100644 --- a/backend/aggregator/memory/shared_training.py +++ b/backend/aggregator/memory/shared_training.py @@ -25,10 +25,15 @@ def get_manual_aggregator_prompt_path() -> Path: async def save_manual_aggregator_prompt(prompt: str) -> None: """Persist the latest manual Aggregator prompt for stopped/restarted proof checks.""" + if not (prompt or "").strip(): + logger.warning("Refusing to overwrite manual Aggregator prompt with an empty value") + return path = get_manual_aggregator_prompt_path() 
path.parent.mkdir(parents=True, exist_ok=True) - async with aiofiles.open(path, "w", encoding="utf-8") as handle: + temp_path = path.with_name(f"{path.name}.tmp") + async with aiofiles.open(temp_path, "w", encoding="utf-8") as handle: await handle.write(prompt or "") + await asyncio.to_thread(temp_path.replace, path) async def load_manual_aggregator_prompt() -> str: diff --git a/backend/aggregator/prompts/submitter_prompts.py b/backend/aggregator/prompts/submitter_prompts.py index 6e5be0b..0af19f6 100644 --- a/backend/aggregator/prompts/submitter_prompts.py +++ b/backend/aggregator/prompts/submitter_prompts.py @@ -26,9 +26,9 @@ def get_submitter_system_prompt(lean4_enabled: bool = False) -> str: """Get system prompt for submitter agents.""" lean_proof_route = ( """OPTIONAL LEAN 4 PROOF ROUTE: -If Lean 4 proof verification is enabled and you can produce a complete Lean 4 proof that would be useful public/citable novelty-bearing brainstorm progress, you may choose the `lean_proof` submission type. Novelty means the proved theorem, formulation, or Lean mechanization is absent from standard references or Mathlib and materially helps the user prompt; do not submit program-local firsts. A Lean proof candidate is NOT added directly to the knowledge base: the system first checks that it declares a valid novelty tier and anti-known-result rationale, then runs Lean 4, gives you up to 5 repair attempts with Lean/integrity feedback, and only then sends the Lean-verified proof to the normal brainstorm validator for usefulness and redundancy review. +If Lean 4 proof verification is enabled and you can produce a complete Lean 4 proof for a high-impact theorem that directly solves, rules out, reduces, obstructs, or otherwise makes major progress on the user prompt, you may choose the `lean_proof` submission type. Novelty means the proved theorem is absent from standard references or Mathlib and materially helps the user prompt; do not submit program-local firsts. 
A Lean proof candidate is NOT added directly to the knowledge base: the system first checks that it declares a valid novelty tier and anti-known-result rationale, then runs Lean 4, gives you up to 5 repair attempts with Lean/integrity feedback, and only then sends the Lean-verified proof to the normal brainstorm validator for usefulness and redundancy review. -Use `lean_proof` only for complete proof code you genuinely expect Lean 4 to accept. Do not use this route for routine helper lemmas, standard Mathlib/textbook facts, general known-knowledge-base entries, or proofs that are only new to this program. Do not use `sorry`, `admit`, or fake `axiom`/`constant`/`opaque` devices. +Use `lean_proof` only for complete proof code you genuinely expect Lean 4 to accept for that high-impact target. Do not use this route for supporting lemmas, routine helper lemmas, local facts, trivial/easy proofs, standard Mathlib/textbook facts, general known-knowledge-base entries, weakened/downshifted substitutes, or proofs that are only new to this program. Do not use `sorry`, `admit`, or fake `axiom`/`constant`/`opaque` devices. 
""" if lean4_enabled else "" @@ -37,15 +37,15 @@ def get_submitter_system_prompt(lean4_enabled: bool = False) -> str: """Lean proof candidate: { "submission_type": "lean_proof", - "theorem_statement": "Natural-language statement of the theorem or lemma proved by the Lean code.", - "formal_sketch": "Brief note about assumptions, formalization choices, and why this proof helps the brainstorm.", + "theorem_statement": "Natural-language statement of the high-impact theorem proved by the Lean code.", + "formal_sketch": "Brief note about assumptions, formalization choices, and why this proof directly advances the user's prompt.", "expected_novelty_tier": "major_mathematical_discovery | mathematical_discovery | novel_variant | novel_formulation", "prompt_relevance_rationale": "Why this proof directly solves, solves toward, or materially helps solve the user prompt.", "novelty_rationale": "Why this proof is absent from standard references or Mathlib and would be public/citable novelty rather than background knowledge or a program-local first.", "why_not_standard_known_result": "Why this is not merely a textbook/Mathlib/routine helper result.", "theorem_name": "Optional Lean declaration name", "lean_code": "Complete Lean 4 code expected to verify.", - "reasoning": "Why this verified proof would be a useful brainstorm addition" + "reasoning": "Why this verified proof is high-impact brainstorm progress" } """ if lean4_enabled @@ -145,10 +145,10 @@ def get_submitter_json_schema(lean4_enabled: bool = False) -> str: lean_proof_schema = ( """ -Lean proof candidate, only when Lean 4 is enabled and you can provide complete code: +Lean proof candidate, only when Lean 4 is enabled and you can provide complete code for a high-impact prompt-solving theorem: { "submission_type": "lean_proof", - "theorem_statement": "string - natural-language statement proved", + "theorem_statement": "string - natural-language statement of the high-impact theorem proved", "formal_sketch": "string - 
formalization notes", "expected_novelty_tier": "string - one of major_mathematical_discovery, mathematical_discovery, novel_variant, novel_formulation", "prompt_relevance_rationale": "string - how this directly serves the prompt", @@ -156,13 +156,13 @@ def get_submitter_json_schema(lean4_enabled: bool = False) -> str: "why_not_standard_known_result": "string - why this is not merely textbook/Mathlib/routine helper knowledge", "theorem_name": "string - optional Lean declaration name", "lean_code": "string - complete Lean 4 source code", - "reasoning": "string - why the verified proof would help the brainstorm" + "reasoning": "string - why the verified proof is high-impact prompt-solving progress" }""" if lean4_enabled else "" ) lean_proof_note = ( - "Lean proof candidates must follow the schema above, but should not be copied from a generic example: only use that route when you can provide complete Lean 4 code for a prompt-specific public/citable novelty-bearing theorem." + "Lean proof candidates must follow the schema above, but should not be copied from a generic example: only use that route when you can provide complete Lean 4 code for a high-impact prompt-solving theorem. Never use it for supporting lemmas, routine helpers, local facts, trivial/easy proofs, or weakened substitutes." if lean4_enabled else "" ) diff --git a/backend/aggregator/prompts/validator_prompts.py b/backend/aggregator/prompts/validator_prompts.py index 109a19f..b6d478d 100644 --- a/backend/aggregator/prompts/validator_prompts.py +++ b/backend/aggregator/prompts/validator_prompts.py @@ -15,9 +15,10 @@ LEAN_VERIFIED_SUBMISSION_RULES = """LEAN 4 VERIFIED SUBMISSION RULES: - A submission containing [LEAN 4 VERIFIED BRAINSTORM PROOF] has already passed Lean 4 and MOTO hard integrity checks before this validator call. -- MOTO may have downshifted the stored theorem statement to the actual Lean-verified supporting lemma when the original candidate was too broad. 
-- Do NOT reject such a submission by re-litigating Lean syntax, proof-checker correctness, statement alignment, triviality, routine status, or novelty. -- Return accept for Lean-verified proof artifacts. Novelty/triviality ranking and duplicate detection decide how long the proof remains in context.""" +- Lean verification only establishes formal validity. It does NOT make a proof useful, novel, or acceptable brainstorm progress. +- Reject Lean-verified proof artifacts that are trivial, routine, local helper lemmas, standard Mathlib/textbook facts, merely supporting lemmas, weakened/downshifted leftovers, or otherwise not high-impact progress toward the user's prompt. +- Accept a Lean-verified proof artifact only when the actual theorem statement shown in the submission is itself a high-impact prompt-solving theorem and is non-redundant with the existing database. +- Do NOT re-litigate Lean syntax or proof-checker correctness; validate usefulness, prompt impact, non-triviality, and redundancy.""" def get_validator_system_prompt() -> str: @@ -79,8 +80,7 @@ def get_validator_system_prompt() -> str: 1. Directly answers the user's whole problem, OR 2. Addresses a clearly necessary piece of the full problem when a whole-answer route is not possible in one shot, OR 3. Provides valuable progress that materially advances the full answer, OR -4. Offers rigorous enabling insights not present in existing accepted submissions when a stronger direct or necessary-piece step is not yet available, OR -5. Presents rigorous mathematical arguments based on established principles +4. Offers rigorous enabling insights not present in existing accepted submissions ONLY when they materially strengthen a direct route to the full answer and no stronger direct or necessary-piece step is available A submission should be REJECTED if it: 1. Is redundant with the existing accepted submissions @@ -265,8 +265,7 @@ def get_validator_dual_system_prompt() -> str: 1. 
Directly answers the user's whole problem, OR 2. Addresses a clearly necessary piece of the full problem when a whole-answer route is not possible in one shot, OR 3. Provides valuable progress that materially advances the full answer, OR -4. Offers rigorous enabling insights not present in existing accepted submissions when a stronger direct or necessary-piece step is not yet available, OR -5. Presents rigorous mathematical arguments based on established principles +4. Offers rigorous enabling insights not present in existing accepted submissions ONLY when they materially strengthen a direct route to the full answer and no stronger direct or necessary-piece step is available A submission should be REJECTED if it: 1. Is redundant with the existing accepted submissions @@ -504,8 +503,7 @@ def get_validator_triple_system_prompt() -> str: 1. Directly answers the user's whole problem, OR 2. Addresses a clearly necessary piece of the full problem when a whole-answer route is not possible in one shot, OR 3. Provides valuable progress that materially advances the full answer, OR -4. Offers rigorous enabling insights not present in existing accepted submissions when a stronger direct or necessary-piece step is not yet available, OR -5. Presents rigorous mathematical arguments based on established principles +4. Offers rigorous enabling insights not present in existing accepted submissions ONLY when they materially strengthen a direct route to the full answer and no stronger direct or necessary-piece step is available A submission should be REJECTED if it: 1. Is redundant with the existing accepted submissions @@ -763,10 +761,10 @@ def get_cleanup_review_system_prompt() -> str: REASONS TO KEEP - A submission should be kept if it: 1. Directly answers the user's whole problem or a necessary piece of it better than alternatives -2. Provides ANY unique information not covered elsewhere -3. Offers a different perspective or approach even if related to other content -4. 
Contains specific mathematical details, proofs, or techniques -5. Contributes to solution diversity in any meaningful way +2. Provides unique information that materially strengthens a direct route to the user's full prompt +3. Offers a different perspective or approach that materially improves the best direct solution path +4. Contains specific mathematical details, proofs, or techniques that are necessary for direct prompt progress +5. Contributes to solution diversity only when that diversity improves credible direct-answer progress CONSERVATIVE APPROACH: - When in doubt, DO NOT recommend removal diff --git a/backend/api/main.py b/backend/api/main.py index 65626c6..89a2c4a 100644 --- a/backend/api/main.py +++ b/backend/api/main.py @@ -23,6 +23,9 @@ features, health, proofs, + proof_search, + syntheticlib4, + connectivity, update, leanoj, cloud_access, @@ -114,6 +117,7 @@ def _apply_generic_mode_openrouter_env(api_client_manager) -> None: def _restore_desktop_provider_credentials(api_client_manager) -> None: """Restore persisted desktop credentials from the OS-backed keyring.""" + from backend.shared.runtime_settings import get_persisted_connectivity_toggles from backend.shared.secret_store import ( SecretStoreError, load_openrouter_api_key, @@ -150,7 +154,8 @@ def _restore_desktop_provider_credentials(api_client_manager) -> None: if wolfram_api_key: initialize_wolfram_client(wolfram_api_key) system_config.wolfram_alpha_api_key = wolfram_api_key - system_config.wolfram_alpha_enabled = True + persisted_toggles = get_persisted_connectivity_toggles() + system_config.wolfram_alpha_enabled = persisted_toggles.get("wolfram_alpha_enabled", True) logger.info("Restored Wolfram Alpha API key from secure backend storage") else: logger.info( @@ -311,7 +316,9 @@ async def _warm_start_lean4() -> None: clear_lean4_client() await lm_studio_client.close() from backend.shared.openai_codex_client import openai_codex_client + from backend.shared.sakana_fugu_client import 
sakana_fugu_client await openai_codex_client.close() + await sakana_fugu_client.close() logger.info("Shutdown complete") @@ -335,6 +342,9 @@ async def _warm_start_lean4() -> None: app.include_router(features.router) app.include_router(health.router) app.include_router(proofs.router) +app.include_router(proof_search.router) +app.include_router(syntheticlib4.router) +app.include_router(connectivity.router) app.include_router(openrouter.router) app.include_router(cloud_access.router) app.include_router(download.router) diff --git a/backend/api/middleware.py b/backend/api/middleware.py index ab6a3f2..c270458 100644 --- a/backend/api/middleware.py +++ b/backend/api/middleware.py @@ -4,6 +4,7 @@ import hmac import os import re +import time from urllib.parse import urlparse from fastapi import FastAPI, Request from fastapi.middleware.cors import CORSMiddleware @@ -33,6 +34,8 @@ DESKTOP_API_TOKEN_HEADER = "X-Moto-Desktop-Token" UNSAFE_HTTP_METHODS = {"POST", "PUT", "PATCH", "DELETE"} DESKTOP_PUBLIC_PROOF_EXPORT_RE = re.compile(r"^/api/proofs/[^/]+/certificate(?:\.lean)?$") +STALE_DESKTOP_TAB_LOG_INTERVAL_SECONDS = 300 +_stale_desktop_tab_log_state: dict[tuple[str, str], tuple[float, int]] = {} def _is_desktop_public_export(method: str, path: str) -> bool: @@ -52,6 +55,31 @@ def _origin_from_url(value: str) -> str: return f"{parsed.scheme}://{parsed.netloc}" +def _log_desktop_auth_rejection(request: Request, exc: ProxyAuthError) -> None: + """Log stale-tab token misses without flooding desktop logs.""" + if exc.status_code != status.HTTP_401_UNAUTHORIZED: + logger.warning("Rejected desktop request %s %s: %s", request.method, request.url.path, exc.detail) + return + + key = (request.method, request.url.path) + now = time.monotonic() + last_log_at, suppressed = _stale_desktop_tab_log_state.get(key, (0.0, 0)) + + if now - last_log_at < STALE_DESKTOP_TAB_LOG_INTERVAL_SECONDS: + _stale_desktop_tab_log_state[key] = (last_log_at, suppressed + 1) + return + + suffix = f" Suppressed 
{suppressed} repeat stale-tab probe(s) for this route." if suppressed else "" + logger.info( + "Ignored desktop request without the current MOTO tab token: %s %s. " + "This is usually a harmless stale browser tab from an older launch; returning 401.%s", + request.method, + request.url.path, + suffix, + ) + _stale_desktop_tab_log_state[key] = (now, 0) + + def _validate_desktop_token(request: Request, allowed_origins: list[str]) -> None: """Require the launcher-provided desktop API token outside public routes.""" if is_proxy_auth_allowlisted(request.method, request.url.path): @@ -185,7 +213,7 @@ async def moto_request_auth(request: Request, call_next): try: _validate_desktop_token(request, origins) except ProxyAuthError as exc: - logger.warning("Rejected desktop request %s %s: %s", request.method, request.url.path, exc.detail) + _log_desktop_auth_rejection(request, exc) return JSONResponse(status_code=exc.status_code, content={"detail": exc.detail}) return await call_next(request) diff --git a/backend/api/routes/__init__.py b/backend/api/routes/__init__.py index b1d5cb8..de989f1 100644 --- a/backend/api/routes/__init__.py +++ b/backend/api/routes/__init__.py @@ -1,4 +1,4 @@ """API routes""" -from . import aggregator, compiler, autonomous, websocket, boost, workflow, features, health, proofs, update, leanoj, cloud_access +from . 
import aggregator, compiler, autonomous, websocket, boost, workflow, features, health, proofs, proof_search, syntheticlib4, connectivity, update, leanoj, cloud_access -__all__ = ['aggregator', 'compiler', 'autonomous', 'websocket', 'boost', 'workflow', 'features', 'health', 'proofs', 'update', 'leanoj', 'cloud_access'] +__all__ = ['aggregator', 'compiler', 'autonomous', 'websocket', 'boost', 'workflow', 'features', 'health', 'proofs', 'proof_search', 'syntheticlib4', 'connectivity', 'update', 'leanoj', 'cloud_access'] diff --git a/backend/api/routes/aggregator.py b/backend/api/routes/aggregator.py index bcae39f..4946d2f 100644 --- a/backend/api/routes/aggregator.py +++ b/backend/api/routes/aggregator.py @@ -7,8 +7,9 @@ from pathlib import Path import aiofiles -from backend.shared.models import AggregatorStartRequest, SystemStatus, ModelInfo +from backend.shared.models import AggregatorStartRequest, SystemStatus, ModelInfo, ModelConfig from backend.shared.lm_studio_client import lm_studio_client +from backend.shared.openrouter_client import OpenRouterClient from backend.shared.config import system_config, rag_config from backend.shared.embedding_readiness import require_embedding_provider_ready from backend.shared.token_tracker import token_tracker @@ -16,6 +17,8 @@ from backend.shared.log_redaction import redact_log_text from backend.shared.manual_proof_context import get_manual_proof_context_lock from backend.shared.workflow_start_guard import workflow_start_guard +from backend.shared.api_client_manager import api_client_manager +from backend.shared.proof_search.assistant_coordinator import assistant_proof_search_coordinator from backend.aggregator.core.coordinator import coordinator from backend.aggregator.core.context_allocator import context_allocator from backend.aggregator.memory.event_log import event_log @@ -104,6 +107,42 @@ def _require_valid_role_limits(context_window: int, max_output_tokens: int, labe ) +async def 
_require_openrouter_host_provider_available( + *, + label: str, + provider: str, + model_id: str, + host_provider: Optional[str], +) -> None: + """Reject stale pinned OpenRouter host providers before starting work.""" + if provider != "openrouter" or not model_id or not host_provider: + return + if not rag_config.openrouter_api_key: + return + + client = OpenRouterClient(rag_config.openrouter_api_key) + try: + endpoints = await client.get_model_endpoints(model_id) + finally: + await client.close() + + available_hosts = { + endpoint.get("provider_name") + for endpoint in endpoints + if isinstance(endpoint, dict) and endpoint.get("provider_name") + } + if host_provider not in available_hosts: + hosts_text = ", ".join(sorted(available_hosts)) if available_hosts else "none" + raise HTTPException( + status_code=400, + detail=( + f"{label} OpenRouter host provider '{host_provider}' is not currently " + f"available for model '{model_id}'. Set Host Provider to Auto or choose " + f"one of the currently available hosts: {hosts_text}." 
+ ), + ) + + def _get_start_conflict() -> Optional[str]: """Return a user-facing conflict message if another workflow is active.""" if coordinator.is_running: @@ -175,6 +214,9 @@ async def start_aggregator(request: AggregatorStartRequest): if conflict: raise HTTPException(status_code=400, detail=conflict) + if not request.user_prompt.strip(): + raise HTTPException(status_code=400, detail="Aggregator user prompt is required.") + # Validate submitter configs num_submitters = len(request.submitter_configs) if not (system_config.min_submitters <= num_submitters <= system_config.max_submitters): @@ -187,9 +229,25 @@ async def start_aggregator(request: AggregatorStartRequest): request.validator_max_output_tokens, "Validator", ) + effective_assistant_context_size = ( + request.assistant_context_size + if request.assistant_model + else request.validator_context_size + ) + effective_assistant_max_output_tokens = ( + request.assistant_max_output_tokens + if request.assistant_model + else request.validator_max_output_tokens + ) + _require_valid_role_limits( + effective_assistant_context_size, + effective_assistant_max_output_tokens, + "Assistant", + ) for config in request.submitter_configs: label = "Main submitter" if config.submitter_id == 1 else f"Submitter {config.submitter_id}" _require_valid_role_limits(config.context_window, config.max_output_tokens, label) + await save_manual_aggregator_prompt(request.user_prompt) await require_embedding_provider_ready() # Update validator context window configuration @@ -225,6 +283,71 @@ async def start_aggregator(request: AggregatorStartRequest): redact_log_text(request.validator_context_size, 40), redact_log_text(request.validator_max_output_tokens, 40), ) + assistant_model = request.assistant_model or request.validator_model + assistant_provider = ( + request.assistant_provider + if request.assistant_model + else request.validator_provider + ) + assistant_openrouter_provider = ( + request.assistant_openrouter_provider + if 
request.assistant_model + else request.validator_openrouter_provider + ) + assistant_reasoning_effort = ( + request.assistant_openrouter_reasoning_effort + if request.assistant_model + else request.validator_openrouter_reasoning_effort + ) + assistant_fallback = ( + request.assistant_lm_studio_fallback + if request.assistant_model + else request.validator_lm_studio_fallback + ) + assistant_context_size = ( + effective_assistant_context_size + ) + assistant_max_output_tokens = ( + effective_assistant_max_output_tokens + ) + assistant_supercharge_enabled = ( + request.assistant_supercharge_enabled + if request.assistant_model + else request.validator_supercharge_enabled + ) + for config in request.submitter_configs: + label = "Main submitter" if config.submitter_id == 1 else f"Submitter {config.submitter_id}" + await _require_openrouter_host_provider_available( + label=label, + provider=config.provider, + model_id=config.model_id, + host_provider=config.openrouter_provider, + ) + await _require_openrouter_host_provider_available( + label="Validator", + provider=request.validator_provider, + model_id=request.validator_model, + host_provider=request.validator_openrouter_provider, + ) + await _require_openrouter_host_provider_available( + label="Assistant", + provider=assistant_provider, + model_id=assistant_model, + host_provider=assistant_openrouter_provider, + ) + api_client_manager.configure_role( + "aggregator_assistant", + ModelConfig( + provider=assistant_provider, + model_id=assistant_model, + openrouter_provider=assistant_openrouter_provider, + openrouter_reasoning_effort=assistant_reasoning_effort, + lm_studio_fallback_id=assistant_fallback, + context_window=assistant_context_size, + max_output_tokens=assistant_max_output_tokens, + supercharge_enabled=assistant_supercharge_enabled, + ), + ) # Initialize coordinator with per-submitter configs (includes OpenRouter provider fields) await coordinator.initialize( @@ -242,8 +365,6 @@ async def 
start_aggregator(request: AggregatorStartRequest): validator_supercharge_enabled=request.validator_supercharge_enabled, creativity_emphasis_boost_enabled=request.creativity_emphasis_boost_enabled, ) - await save_manual_aggregator_prompt(request.user_prompt) - # Start coordinator token_tracker.reset() token_tracker.start_timer() @@ -272,6 +393,10 @@ async def stop_aggregator(): """Stop the aggregator system.""" try: await coordinator.stop() + await assistant_proof_search_coordinator.stop_all( + broadcast=True, + reason="aggregator_stopped", + ) token_tracker.stop_timer() return {"status": "stopped", "message": "Aggregator system stopped"} except Exception as e: @@ -290,6 +415,16 @@ async def get_status(): raise HTTPException(status_code=500, detail="Internal server error") +@router.get("/prompt") +async def get_prompt(): + """Get the durable manual Aggregator prompt.""" + try: + return {"prompt": await load_manual_aggregator_prompt()} + except Exception as e: + logger.error(f"Failed to get manual Aggregator prompt: {e}") + raise HTTPException(status_code=500, detail="Internal server error") + + @router.get("/results") async def get_results(): """Get all accepted submissions with formatting for display.""" @@ -349,6 +484,11 @@ async def clear_all_submissions(): reason="manual_aggregator_clear_all", ) await coordinator.clear_all_submissions() + await assistant_proof_search_coordinator.stop_all( + broadcast=True, + reason="aggregator_cleared", + ) + await assistant_proof_search_coordinator.clear_cooldown_state() await clear_manual_aggregator_prompt() return { diff --git a/backend/api/routes/autonomous.py b/backend/api/routes/autonomous.py index af7925b..29c0721 100644 --- a/backend/api/routes/autonomous.py +++ b/backend/api/routes/autonomous.py @@ -26,10 +26,12 @@ from backend.compiler.core.compiler_coordinator import compiler_coordinator from backend.leanoj.core.leanoj_coordinator import leanoj_coordinator from backend.shared.boost_logger import boost_logger +from 
backend.shared.config import system_config from backend.shared.embedding_readiness import require_embedding_provider_ready from backend.shared.log_redaction import redact_log_text from backend.shared.workflow_start_guard import workflow_start_guard from backend.shared.response_extraction import extract_message_text +from backend.shared.proof_search.assistant_coordinator import assistant_proof_search_coordinator logger = logging.getLogger(__name__) @@ -772,40 +774,76 @@ async def start_autonomous_research(request: AutonomousResearchStartRequest): validator_model=request.validator_model, validator_context_window=request.validator_context_window, validator_max_tokens=request.validator_max_tokens, - high_context_model=request.high_context_model, - high_context_context_window=request.high_context_context_window, - high_context_max_tokens=request.high_context_max_tokens, + writer_model=request.writer_model, + writer_context_window=request.writer_context_window, + writer_max_tokens=request.writer_max_tokens, high_param_model=request.high_param_model, high_param_context_window=request.high_param_context_window, high_param_max_tokens=request.high_param_max_tokens, - critique_submitter_model=request.critique_submitter_model, - critique_submitter_context_window=request.critique_submitter_context_window, - critique_submitter_max_tokens=request.critique_submitter_max_tokens, + critique_submitter_model=request.high_param_model, + critique_submitter_context_window=request.high_param_context_window, + critique_submitter_max_tokens=request.high_param_max_tokens, # OpenRouter provider configs for each role validator_provider=request.validator_provider, validator_openrouter_provider=request.validator_openrouter_provider, validator_openrouter_reasoning_effort=request.validator_openrouter_reasoning_effort, validator_lm_studio_fallback=request.validator_lm_studio_fallback, - high_context_provider=request.high_context_provider, - 
high_context_openrouter_provider=request.high_context_openrouter_provider, - high_context_openrouter_reasoning_effort=request.high_context_openrouter_reasoning_effort, - high_context_lm_studio_fallback=request.high_context_lm_studio_fallback, + writer_provider=request.writer_provider, + writer_openrouter_provider=request.writer_openrouter_provider, + writer_openrouter_reasoning_effort=request.writer_openrouter_reasoning_effort, + writer_lm_studio_fallback=request.writer_lm_studio_fallback, high_param_provider=request.high_param_provider, high_param_openrouter_provider=request.high_param_openrouter_provider, high_param_openrouter_reasoning_effort=request.high_param_openrouter_reasoning_effort, high_param_lm_studio_fallback=request.high_param_lm_studio_fallback, - critique_submitter_provider=request.critique_submitter_provider, - critique_submitter_openrouter_provider=request.critique_submitter_openrouter_provider, - critique_submitter_openrouter_reasoning_effort=request.critique_submitter_openrouter_reasoning_effort, - critique_submitter_lm_studio_fallback=request.critique_submitter_lm_studio_fallback, + critique_submitter_provider=request.high_param_provider, + critique_submitter_openrouter_provider=request.high_param_openrouter_provider, + critique_submitter_openrouter_reasoning_effort=request.high_param_openrouter_reasoning_effort, + critique_submitter_lm_studio_fallback=request.high_param_lm_studio_fallback, + assistant_provider=( + request.assistant_provider + if request.assistant_model + else request.validator_provider + ), + assistant_model=request.assistant_model or request.validator_model, + assistant_openrouter_provider=( + request.assistant_openrouter_provider + if request.assistant_model + else request.validator_openrouter_provider + ), + assistant_openrouter_reasoning_effort=( + request.assistant_openrouter_reasoning_effort + if request.assistant_model + else request.validator_openrouter_reasoning_effort + ), + assistant_lm_studio_fallback=( + 
request.assistant_lm_studio_fallback + if request.assistant_model + else request.validator_lm_studio_fallback + ), + assistant_context_window=( + request.assistant_context_window + if request.assistant_model + else request.validator_context_window + ), + assistant_max_tokens=( + request.assistant_max_tokens + if request.assistant_model + else request.validator_max_tokens + ), tier3_enabled=request.tier3_enabled, creativity_emphasis_boost_enabled=request.creativity_emphasis_boost_enabled, allow_mathematical_proofs=effective_allow_mathematical_proofs, allow_research_papers=request.allow_research_papers, validator_supercharge_enabled=request.validator_supercharge_enabled, - high_context_supercharge_enabled=request.high_context_supercharge_enabled, + writer_supercharge_enabled=request.writer_supercharge_enabled, high_param_supercharge_enabled=request.high_param_supercharge_enabled, - critique_submitter_supercharge_enabled=request.critique_submitter_supercharge_enabled + critique_submitter_supercharge_enabled=request.high_param_supercharge_enabled, + assistant_supercharge_enabled=( + request.assistant_supercharge_enabled + if request.assistant_model + else request.validator_supercharge_enabled + ), ) # Start in background with a retained task handle so Stop can cancel it. 
@@ -842,6 +880,10 @@ async def stop_autonomous_research(): } await autonomous_coordinator.stop() + await assistant_proof_search_coordinator.stop_all( + broadcast=True, + reason="autonomous_stopped", + ) # Get final stats stats = await research_metadata.get_stats() @@ -882,6 +924,11 @@ async def clear_autonomous_research(confirm: bool = False): try: await autonomous_coordinator.clear_all_data() + await assistant_proof_search_coordinator.stop_all( + broadcast=True, + reason="autonomous_cleared", + ) + await assistant_proof_search_coordinator.clear_cooldown_state() logger.info("Autonomous research data clear completed successfully") return { @@ -916,6 +963,34 @@ async def clear_autonomous_research(confirm: bool = False): ) +@router.get("/prompt") +async def get_prompt(): + """Get the durable Autonomous Research prompt for restart recovery.""" + try: + if session_manager.is_session_active: + prompt = await research_metadata.get_base_user_prompt() + if prompt.strip(): + return {"prompt": prompt} + + interrupted_session = await session_manager.find_interrupted_session( + system_config.auto_sessions_base_dir + ) + if interrupted_session: + workflow_state = interrupted_session.get("workflow_state") or {} + return { + "prompt": ( + interrupted_session.get("user_prompt") + or workflow_state.get("base_user_research_prompt") + or "" + ) + } + + return {"prompt": await research_metadata.get_base_user_prompt()} + except Exception as e: + logger.error(f"Failed to get autonomous research prompt: {e}") + raise HTTPException(status_code=500, detail="Internal server error") + + @router.get("/status") async def get_autonomous_status(): """Get current status and metrics.""" @@ -2171,6 +2246,11 @@ async def clear_tier3_data(confirm: bool = False): ) await final_answer_memory.clear() + await assistant_proof_search_coordinator.stop_all( + broadcast=True, + reason="tier3_cleared", + ) + await assistant_proof_search_coordinator.clear_cooldown_state() # Also clear any final answer critiques 
         from backend.shared.critique_memory import clear_critiques
diff --git a/backend/api/routes/boost.py b/backend/api/routes/boost.py
index 3766b80..ed8a71f 100644
--- a/backend/api/routes/boost.py
+++ b/backend/api/routes/boost.py
@@ -430,7 +430,7 @@ async def toggle_category_boost(category: str) -> Dict[str, Any]:
 
     Categories:
     - Aggregator: agg_sub1, agg_sub2, ..., agg_sub10, agg_val
-    - Compiler: comp_hc, comp_hp, comp_val
+    - Compiler: comp_writer, comp_hp, comp_val
     - Autonomous: auto_ts, auto_tv, auto_cr, auto_rs, auto_pt, auto_prc
 
     Args:
diff --git a/backend/api/routes/cloud_access.py b/backend/api/routes/cloud_access.py
index af3050e..d11d2db 100644
--- a/backend/api/routes/cloud_access.py
+++ b/backend/api/routes/cloud_access.py
@@ -16,6 +16,8 @@
 
 from backend.shared.config import rag_config, system_config
 from backend.shared.openai_codex_client import OpenAICodexAuthError, openai_codex_client
+from backend.shared.provider_notification_store import list_provider_notifications
+from backend.shared.sakana_fugu_client import SakanaFuguAuthError, SakanaFuguError, sakana_fugu_client
 from backend.shared.xai_grok_client import XAIGrokAuthError, xai_grok_client
 
 router = APIRouter(prefix="/api/cloud-access", tags=["cloud-access"])
@@ -78,6 +80,10 @@ class XAIGrokOAuthExchangeRequest(BaseModel):
     redirect_uri: Optional[str] = None
 
 
+class SakanaFuguApiKeyRequest(BaseModel):
+    api_key: str
+
+
 def _ensure_desktop_codex_allowed() -> None:
     if system_config.generic_mode:
         raise HTTPException(
@@ -100,6 +106,17 @@ def _ensure_desktop_xai_grok_allowed() -> None:
         )
 
 
+def _ensure_desktop_sakana_fugu_allowed() -> None:
+    if system_config.generic_mode:
+        raise HTTPException(
+            status_code=501,
+            detail=(
+                "Sakana Fugu direct subscription API access is currently desktop-only. "
+                "Hosted mode should use OpenRouter keys until direct provider support is designed."
+            ),
+        )
+
+
 def _resolve_codex_redirect_uri(requested_redirect_uri: Optional[str]) -> str:
     """Keep the Codex OAuth redirect pinned to the local loopback callback."""
     default_redirect_uri = openai_codex_client.DEFAULT_REDIRECT_URI
@@ -348,6 +365,10 @@ async def get_cloud_access_status() -> Dict[str, Any]:
         "xAI Grok",
         xai_grok_client.status(),
     )
+    sakana_status = {"configured": False} if system_config.generic_mode else await _safe_oauth_status(
+        "Sakana Fugu",
+        sakana_fugu_client.status(),
+    )
     return {
         "success": True,
         "generic_mode": system_config.generic_mode,
@@ -366,10 +387,24 @@ async def get_cloud_access_status() -> Dict[str, Any]:
                 "available": not system_config.generic_mode,
                 "desktop_only": True,
             },
+            "sakana_fugu": {
+                **sakana_status,
+                "available": not system_config.generic_mode,
+                "desktop_only": True,
+            },
         },
     }
 
 
+@router.get("/provider-notifications")
+async def get_provider_notifications() -> Dict[str, Any]:
+    """Return recent non-secret provider/OAuth notifications missed by live WebSocket clients."""
+    return {
+        "success": True,
+        "notifications": await asyncio.to_thread(list_provider_notifications),
+    }
+
+
 @router.post("/openai-codex/oauth/start")
 async def start_openai_codex_oauth(request: CodexOAuthStartRequest) -> Dict[str, Any]:
     """Start the OpenAI Codex OAuth PKCE login flow."""
@@ -545,3 +580,48 @@ async def clear_xai_grok_oauth() -> Dict[str, Any]:
     _ensure_desktop_xai_grok_allowed()
     await xai_grok_client.clear_tokens()
     return {"success": True, "message": "xAI Grok login cleared"}
+
+
+@router.get("/sakana-fugu/status")
+async def get_sakana_fugu_status() -> Dict[str, Any]:
+    """Return Sakana Fugu API-key status."""
+    if system_config.generic_mode:
+        return {"success": True, "status": {"configured": False, "available": False, "desktop_only": True}}
+    return {"success": True, "status": await sakana_fugu_client.status()}
+
+
+@router.post("/sakana-fugu/api-key")
+async def set_sakana_fugu_api_key(request: SakanaFuguApiKeyRequest) -> Dict[str, Any]:
+    """Store a Sakana Fugu API key in the desktop keyring."""
+    _ensure_desktop_sakana_fugu_allowed()
+    try:
+        key = request.api_key.strip()
+        models = await sakana_fugu_client.list_models(api_key=key)
+        sakana_fugu_client.set_api_key(key)
+    except (ValueError, SakanaFuguAuthError, SakanaFuguError) as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
+    return {
+        "success": True,
+        "provider": "sakana_fugu",
+        "status": await sakana_fugu_client.status(),
+        "models": models,
+    }
+
+
+@router.get("/sakana-fugu/models")
+async def get_sakana_fugu_models() -> Dict[str, Any]:
+    """Return available Sakana Fugu models for the configured API key."""
+    _ensure_desktop_sakana_fugu_allowed()
+    try:
+        models = await sakana_fugu_client.list_models()
+    except SakanaFuguAuthError as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
+    return {"success": True, "models": models}
+
+
+@router.delete("/sakana-fugu")
+async def clear_sakana_fugu_api_key() -> Dict[str, Any]:
+    """Clear the stored Sakana Fugu API key."""
+    _ensure_desktop_sakana_fugu_allowed()
+    await sakana_fugu_client.clear_api_key()
+    return {"success": True, "message": "Sakana Fugu API key cleared"}
diff --git a/backend/api/routes/compiler.py b/backend/api/routes/compiler.py
index 5809965..4842b98 100644
--- a/backend/api/routes/compiler.py
+++ b/backend/api/routes/compiler.py
@@ -18,7 +18,13 @@
 from backend.shared.log_redaction import redact_log_text
 from backend.shared.manual_proof_context import get_manual_proof_context_lock
 from backend.shared.workflow_start_guard import workflow_start_guard
+from backend.shared.proof_search.assistant_coordinator import assistant_proof_search_coordinator
 from backend.compiler.core.compiler_coordinator import CRITIQUE_ATTEMPT_TARGET, compiler_coordinator
+from backend.compiler.memory.manual_prompt import (
+    clear_manual_compiler_prompt,
+    load_manual_compiler_prompt,
+    save_manual_compiler_prompt,
+)
 from backend.compiler.memory.outline_memory import outline_memory
 from backend.compiler.memory.paper_memory import paper_memory
 from backend.aggregator.core.coordinator import coordinator
@@ -137,7 +143,7 @@ async def _run_saved_compiler_paper_proof_check(
     submitter_model = str(proof_config.get("submitter_model") or "")
     validator_model = str(proof_config.get("validator_model") or "")
     if not submitter_model:
-        logger.warning("Skipping saved compiler paper proof check: high-context model is unavailable")
+        logger.warning("Skipping saved compiler paper proof check: Rigor & Proofs model is unavailable")
         return
     if not validator_model:
         logger.warning("Skipping saved compiler paper proof check: validator model is unavailable")
@@ -204,6 +210,10 @@ async def _run_saved_compiler_paper_proof_check(
         )
     finally:
         await _release_pre_reserved_source("paper", source_id, source_reserved)
+        await assistant_proof_search_coordinator.stop_all(
+            broadcast=True,
+            reason="saved_compiler_paper_proof_check_complete",
+        )
 
 
 def _get_start_conflict() -> str | None:
@@ -248,14 +258,14 @@ async def _run_compiler_aggregator_proof_check(
         role_suffix = "compiler_aggregator"
 
         submitter_config = ModelConfig(
-            provider=request.high_context_provider,
-            model_id=request.high_context_model,
-            openrouter_provider=request.high_context_openrouter_provider,
-            openrouter_reasoning_effort=request.high_context_openrouter_reasoning_effort,
-            lm_studio_fallback_id=request.high_context_lm_studio_fallback,
-            context_window=request.high_context_context_size,
-            max_output_tokens=request.high_context_max_output_tokens,
-            supercharge_enabled=request.high_context_supercharge_enabled,
+            provider=request.high_param_provider,
+            model_id=request.high_param_model,
+            openrouter_provider=request.high_param_openrouter_provider,
+            openrouter_reasoning_effort=request.high_param_openrouter_reasoning_effort,
+            lm_studio_fallback_id=request.high_param_lm_studio_fallback,
+            context_window=request.high_param_context_size,
+            max_output_tokens=request.high_param_max_output_tokens,
+            supercharge_enabled=request.high_param_supercharge_enabled,
         )
         validator_config = ModelConfig(
             provider=request.validator_provider,
@@ -274,6 +284,39 @@ async def _run_compiler_aggregator_proof_check(
         ):
             api_client_manager.configure_role(role_id, submitter_config)
         api_client_manager.configure_role("autonomous_proof_novelty", validator_config)
+        api_client_manager.configure_role(
+            "compiler_assistant",
+            ModelConfig(
+                provider=(
+                    request.assistant_provider
+                    if request.assistant_model
+                    else request.validator_provider
+                ),
+                model_id=request.assistant_model or request.validator_model,
+                openrouter_provider=(
+                    request.assistant_openrouter_provider
+                    if request.assistant_model
+                    else request.validator_openrouter_provider
+                ),
+                openrouter_reasoning_effort=(
+                    request.assistant_openrouter_reasoning_effort
+                    if request.assistant_model
+                    else request.validator_openrouter_reasoning_effort
+                ),
+                lm_studio_fallback_id=(
+                    request.assistant_lm_studio_fallback
+                    if request.assistant_model
+                    else request.validator_lm_studio_fallback
+                ),
+                context_window=request.assistant_context_size if request.assistant_model else request.validator_context_size,
+                max_output_tokens=request.assistant_max_output_tokens if request.assistant_model else request.validator_max_output_tokens,
+                supercharge_enabled=(
+                    request.assistant_supercharge_enabled
+                    if request.assistant_model
+                    else request.validator_supercharge_enabled
+                ),
+            ),
+        )
 
         await websocket.broadcast_event(
             "compiler_proof_check_started",
@@ -285,9 +328,9 @@ async def _run_compiler_aggregator_proof_check(
             source_type="brainstorm",
             source_id=source_id,
             user_prompt=manual_proof_database.inject_into_prompt(request.compiler_prompt),
-            submitter_model=request.high_context_model,
-            submitter_context=request.high_context_context_size,
-            submitter_max_tokens=request.high_context_max_output_tokens,
+            submitter_model=request.high_param_model,
+            submitter_context=request.high_param_context_size,
+            submitter_max_tokens=request.high_param_max_output_tokens,
             validator_model=request.validator_model,
             validator_context=request.validator_context_size,
             validator_max_tokens=request.validator_max_output_tokens,
@@ -306,6 +349,10 @@ async def _run_compiler_aggregator_proof_check(
         )
     finally:
         await _release_pre_reserved_source("brainstorm", MANUAL_AGGREGATOR_SOURCE_ID, source_reserved)
+        await assistant_proof_search_coordinator.stop_all(
+            broadcast=True,
+            reason="compiler_aggregator_proof_check_complete",
+        )
         token_tracker.stop_timer()
 
 
@@ -346,18 +393,32 @@ async def start_compiler(request: CompilerStartRequest):
         if conflict:
             raise HTTPException(status_code=400, detail=conflict)
 
+        if not request.compiler_prompt.strip():
+            raise HTTPException(status_code=400, detail="Compiler prompt is required.")
+
         if not request.allow_mathematical_proofs and not request.allow_research_papers:
             raise HTTPException(
                 status_code=400,
                 detail="At least one allowed output must be enabled.",
             )
 
+        effective_assistant_context_size = (
+            request.assistant_context_size
+            if request.assistant_model
+            else request.validator_context_size
+        )
+        effective_assistant_max_output_tokens = (
+            request.assistant_max_output_tokens
+            if request.assistant_model
+            else request.validator_max_output_tokens
+        )
         _validate_positive_role_limits({
             "validator": (request.validator_context_size, request.validator_max_output_tokens),
-            "high-context submitter": (request.high_context_context_size, request.high_context_max_output_tokens),
-            "high-param submitter": (request.high_param_context_size, request.high_param_max_output_tokens),
-            "critique submitter": (request.critique_submitter_context_window, request.critique_submitter_max_tokens),
+            "Writing Submitter": (request.writer_context_size, request.writer_max_output_tokens),
+            "Rigor & Proofs submitter": (request.high_param_context_size, request.high_param_max_output_tokens),
+            "assistant": (effective_assistant_context_size, effective_assistant_max_output_tokens),
         })
+        await save_manual_compiler_prompt(request.compiler_prompt)
 
         effective_allow_mathematical_proofs = bool(
             request.allow_mathematical_proofs and not system_config.generic_mode
@@ -398,26 +459,64 @@ async def start_compiler(request: CompilerStartRequest):
             }
 
         await require_embedding_provider_ready()
+        assistant_model = request.assistant_model or request.validator_model
+        assistant_provider = (
+            request.assistant_provider
+            if request.assistant_model
+            else request.validator_provider
+        )
+        assistant_openrouter_provider = (
+            request.assistant_openrouter_provider
+            if request.assistant_model
+            else request.validator_openrouter_provider
+        )
+        assistant_reasoning_effort = (
+            request.assistant_openrouter_reasoning_effort
+            if request.assistant_model
+            else request.validator_openrouter_reasoning_effort
+        )
+        assistant_fallback = (
+            request.assistant_lm_studio_fallback
+            if request.assistant_model
+            else request.validator_lm_studio_fallback
+        )
+        api_client_manager.configure_role(
+            "compiler_assistant",
+            ModelConfig(
+                provider=assistant_provider,
+                model_id=assistant_model,
+                openrouter_provider=assistant_openrouter_provider,
+                openrouter_reasoning_effort=assistant_reasoning_effort,
+                lm_studio_fallback_id=assistant_fallback,
+                context_window=effective_assistant_context_size,
+                max_output_tokens=effective_assistant_max_output_tokens,
+                supercharge_enabled=(
+                    request.assistant_supercharge_enabled
+                    if request.assistant_model
+                    else request.validator_supercharge_enabled
+                ),
+            ),
+        )
 
         # Update system config with user-provided context sizes
         system_config.compiler_validator_context_window = request.validator_context_size
-        system_config.compiler_high_context_context_window = request.high_context_context_size
+        system_config.compiler_writer_context_window = request.writer_context_size
         system_config.compiler_high_param_context_window = request.high_param_context_size
-        system_config.compiler_critique_submitter_context_window = request.critique_submitter_context_window
+        system_config.compiler_critique_submitter_context_window = request.high_param_context_size
 
         # Update max output token configurations
         system_config.compiler_validator_max_output_tokens = request.validator_max_output_tokens
-        system_config.compiler_high_context_max_output_tokens = request.high_context_max_output_tokens
+        system_config.compiler_writer_max_output_tokens = request.writer_max_output_tokens
         system_config.compiler_high_param_max_output_tokens = request.high_param_max_output_tokens
-        system_config.compiler_critique_submitter_max_tokens = request.critique_submitter_max_tokens
+        system_config.compiler_critique_submitter_max_tokens = request.high_param_max_output_tokens
 
-        # Store critique submitter model
-        system_config.compiler_critique_submitter_model = request.critique_submitter_model
+        # Deprecated critique fields are compatibility aliases for Rigor & Proofs.
+        system_config.compiler_critique_submitter_model = request.high_param_model
 
         logger.info(
-            "Compiler max output tokens - Validator: %s, High-context: %s, High-param: %s",
+            "Compiler max output tokens - Validator: %s, Writing Submitter: %s, Rigor & Proofs: %s",
             redact_log_text(request.validator_max_output_tokens, 40),
-            redact_log_text(request.high_context_max_output_tokens, 40),
+            redact_log_text(request.writer_max_output_tokens, 40),
             redact_log_text(request.high_param_max_output_tokens, 40),
         )
 
@@ -425,30 +524,30 @@ async def start_compiler(request: CompilerStartRequest):
         await compiler_coordinator.initialize(
             compiler_prompt=request.compiler_prompt,
             validator_model=request.validator_model,
-            high_context_model=request.high_context_model,
+            writer_model=request.writer_model,
             high_param_model=request.high_param_model,
-            critique_submitter_model=request.critique_submitter_model,
+            critique_submitter_model=request.high_param_model,
             # OpenRouter provider configs for each role
             validator_provider=request.validator_provider,
             validator_openrouter_provider=request.validator_openrouter_provider,
             validator_openrouter_reasoning_effort=request.validator_openrouter_reasoning_effort,
             validator_lm_studio_fallback=request.validator_lm_studio_fallback,
-            high_context_provider=request.high_context_provider,
-            high_context_openrouter_provider=request.high_context_openrouter_provider,
-            high_context_openrouter_reasoning_effort=request.high_context_openrouter_reasoning_effort,
-            high_context_lm_studio_fallback=request.high_context_lm_studio_fallback,
+            writer_provider=request.writer_provider,
+            writer_openrouter_provider=request.writer_openrouter_provider,
+            writer_openrouter_reasoning_effort=request.writer_openrouter_reasoning_effort,
+            writer_lm_studio_fallback=request.writer_lm_studio_fallback,
             high_param_provider=request.high_param_provider,
             high_param_openrouter_provider=request.high_param_openrouter_provider,
             high_param_openrouter_reasoning_effort=request.high_param_openrouter_reasoning_effort,
             high_param_lm_studio_fallback=request.high_param_lm_studio_fallback,
-            critique_submitter_provider=request.critique_submitter_provider,
-            critique_submitter_openrouter_provider=request.critique_submitter_openrouter_provider,
-            critique_submitter_openrouter_reasoning_effort=request.critique_submitter_openrouter_reasoning_effort,
-            critique_submitter_lm_studio_fallback=request.critique_submitter_lm_studio_fallback,
+            critique_submitter_provider=request.high_param_provider,
+            critique_submitter_openrouter_provider=request.high_param_openrouter_provider,
+            critique_submitter_openrouter_reasoning_effort=request.high_param_openrouter_reasoning_effort,
+            critique_submitter_lm_studio_fallback=request.high_param_lm_studio_fallback,
             validator_supercharge_enabled=request.validator_supercharge_enabled,
-            high_context_supercharge_enabled=request.high_context_supercharge_enabled,
+            writer_supercharge_enabled=request.writer_supercharge_enabled,
             high_param_supercharge_enabled=request.high_param_supercharge_enabled,
-            critique_submitter_supercharge_enabled=request.critique_submitter_supercharge_enabled,
+            critique_submitter_supercharge_enabled=request.high_param_supercharge_enabled,
             allow_mathematical_proofs=effective_allow_mathematical_proofs
         )
 
@@ -477,9 +576,9 @@ async def start_compiler(request: CompilerStartRequest):
         if request.validator_model in error_msg:
             failed_model_type = "validator"
             failed_model_name = request.validator_model
-        elif request.high_context_model in error_msg:
-            failed_model_type = "high_context"
-            failed_model_name = request.high_context_model
+        elif request.writer_model in error_msg:
+            failed_model_type = "writer"
+            failed_model_name = request.writer_model
         elif request.high_param_model in error_msg:
             failed_model_type = "high_param"
             failed_model_name = request.high_param_model
@@ -520,6 +619,10 @@ async def stop_compiler():
             await asyncio.gather(_compiler_proof_only_task, return_exceptions=True)
             _compiler_proof_only_task = None
         await compiler_coordinator.stop()
+        await assistant_proof_search_coordinator.stop_all(
+            broadcast=True,
+            reason="compiler_stopped",
+        )
         token_tracker.stop_timer()
         return {"status": "stopped", "message": "Compiler stopped"}
     except Exception as e:
@@ -543,7 +646,7 @@ async def test_models(request: CompilerStartRequest):
 
     results = {
         "validator": {"model": request.validator_model, "passed": False, "error": "", "details": {}},
-        "high_context": {"model": request.high_context_model, "passed": False, "error": "", "details": {}},
+        "writer": {"model": request.writer_model, "passed": False, "error": "", "details": {}},
         "high_param": {"model": request.high_param_model, "passed": False, "error": "", "details": {}}
     }
 
@@ -556,16 +659,16 @@ async def test_models(request: CompilerStartRequest):
     results["validator"]["error"] = error
     results["validator"]["details"] = details
 
-    # Test high-context model
+    # Test writer model
     is_compat, error, details = await lm_studio_client.test_model_compatibility(
-        request.high_context_model,
-        request.high_context_max_output_tokens,
+        request.writer_model,
+        request.writer_max_output_tokens,
     )
-    results["high_context"]["passed"] = is_compat
-    results["high_context"]["error"] = error
-    results["high_context"]["details"] = details
+    results["writer"]["passed"] = is_compat
+    results["writer"]["error"] = error
+    results["writer"]["details"] = details
 
-    # Test high-param model
+    # Test Rigor & Proofs model
     is_compat, error, details = await lm_studio_client.test_model_compatibility(
         request.high_param_model,
         request.high_param_max_output_tokens,
@@ -596,6 +699,16 @@ async def get_status():
         raise HTTPException(status_code=500, detail="Internal server error")
 
 
+@router.get("/prompt")
+async def get_prompt():
+    """Get the durable manual Compiler prompt."""
+    try:
+        return {"prompt": await load_manual_compiler_prompt()}
+    except Exception as e:
+        logger.error(f"Failed to get manual Compiler prompt: {e}")
+        raise HTTPException(status_code=500, detail="Internal server error")
+
+
 @router.get("/paper")
 async def get_paper():
     """Get current paper content (includes outline prepended)."""
@@ -649,6 +762,7 @@ async def _save_paper_unlocked():
     outline = await outline_memory.get_outline()
     paper = await paper_memory.get_paper()
     word_count = await paper_memory.get_word_count()
+    persisted_prompt = compiler_coordinator.user_prompt or await load_manual_compiler_prompt()
 
     # Get model tracking data for author attribution
     model_data = compiler_coordinator.get_model_tracking_data()
@@ -672,8 +786,8 @@ async def _save_paper_unlocked():
 
     # Generate attribution header (no reference papers for manual mode)
     attribution_section = generate_attribution_for_existing_paper(
-        user_prompt=compiler_coordinator.user_prompt,
-        paper_title=compiler_coordinator.paper_title or compiler_coordinator.user_prompt,
+        user_prompt=persisted_prompt,
+        paper_title=compiler_coordinator.paper_title or persisted_prompt,
         model_usage=model_data["model_usage"],
         generation_date=gen_date,
         reference_paper_models=None  # No reference papers in manual mode
@@ -712,31 +826,31 @@ async def _save_paper_unlocked():
     async with aiofiles.open(output_path, 'w', encoding='utf-8') as f:
         await f.write(full_content)
 
-    high_context = compiler_coordinator.high_context_submitter
+    rigor_submitter = compiler_coordinator.high_param_submitter
     proof_check_scheduled = bool(
         system_config.lean4_enabled
         and getattr(compiler_coordinator, "allow_mathematical_proofs", True)
         and full_content.strip()
-        and high_context is not None
-        and getattr(high_context, "model_name", "")
+        and rigor_submitter is not None
+        and getattr(rigor_submitter, "model_name", "")
         and compiler_coordinator.validator_model
     )
     if proof_check_scheduled:
-        source_title = compiler_coordinator.paper_title or compiler_coordinator.user_prompt or "Compiler Paper"
+        source_title = compiler_coordinator.paper_title or persisted_prompt or "Compiler Paper"
         proof_source_content = paper_library.strip_verified_proofs_from_content(full_content)
         proof_source_hash = hashlib.sha256(proof_source_content.encode("utf-8")).hexdigest()[:16]
         proof_source_id = f"compiler_manual_{proof_source_hash}"
         proof_config = {
             "lean4_enabled": system_config.lean4_enabled,
-            "user_prompt": compiler_coordinator.user_prompt,
-            "submitter_model": high_context.model_name,
-            "submitter_provider": compiler_coordinator.high_context_provider,
-            "submitter_openrouter_provider": compiler_coordinator.high_context_openrouter_provider,
-            "submitter_openrouter_reasoning_effort": compiler_coordinator.high_context_openrouter_reasoning_effort,
-            "submitter_lm_studio_fallback": compiler_coordinator.high_context_lm_studio_fallback,
-            "submitter_context": system_config.compiler_high_context_context_window,
-            "submitter_max_tokens": system_config.compiler_high_context_max_output_tokens,
-            "submitter_supercharge_enabled": getattr(compiler_coordinator, "high_context_supercharge_enabled", False),
+            "user_prompt": persisted_prompt,
+            "submitter_model": rigor_submitter.model_name,
+            "submitter_provider": compiler_coordinator.high_param_provider,
+            "submitter_openrouter_provider": compiler_coordinator.high_param_openrouter_provider,
+            "submitter_openrouter_reasoning_effort": compiler_coordinator.high_param_openrouter_reasoning_effort,
+            "submitter_lm_studio_fallback": compiler_coordinator.high_param_lm_studio_fallback,
+            "submitter_context": system_config.compiler_high_param_context_window,
+            "submitter_max_tokens": system_config.compiler_high_param_max_output_tokens,
+            "submitter_supercharge_enabled": getattr(compiler_coordinator, "high_param_supercharge_enabled", False),
             "validator_model": compiler_coordinator.validator_model,
             "validator_provider": compiler_coordinator.validator_provider,
             "validator_openrouter_provider": compiler_coordinator.validator_openrouter_provider,
@@ -856,13 +970,20 @@ async def clear_paper(confirm: bool = False):
             raise HTTPException(status_code=409, detail=blocker)
         if compiler_coordinator.is_running:
             await compiler_coordinator.stop()
+        persisted_prompt = compiler_coordinator.user_prompt or await load_manual_compiler_prompt()
         archived_proofs = await manual_proof_database.archive_current_run(
             Path(system_config.data_dir) / "manual_proof_runs",
-            user_prompt=compiler_coordinator.user_prompt or "",
+            user_prompt=persisted_prompt,
             reason="manual_compiler_clear_paper",
         )
         await clear_manual_shared_training_proof_appendix()
         await compiler_coordinator.clear_paper()
+        await assistant_proof_search_coordinator.stop_all(
+            broadcast=True,
+            reason="compiler_cleared",
+        )
+        await assistant_proof_search_coordinator.clear_cooldown_state()
+        await clear_manual_compiler_prompt()
 
         # Also clear any paper critiques
         from backend.shared.critique_memory import clear_critiques
diff --git a/backend/api/routes/connectivity.py b/backend/api/routes/connectivity.py
new file mode 100644
index 0000000..b750744
--- /dev/null
+++ b/backend/api/routes/connectivity.py
@@ -0,0 +1,321 @@
+"""Non-secret connectivity status and feature-toggle routes."""
+from __future__ import annotations
+
+import asyncio
+import logging
+from datetime import datetime, timezone
+from typing import Any
+
+from fastapi import APIRouter, HTTPException
+from pydantic import BaseModel
+
+from backend.api.routes.cloud_access import get_cloud_access_status
+from backend.shared.boost_manager import boost_manager
+from backend.shared.config import rag_config, system_config
+from backend.shared.embedding_readiness import check_lm_studio_embedding_ready
+from backend.shared.lm_studio_client import lm_studio_client
+from backend.shared.proof_search.search_service import proof_search_service
+from backend.shared.proof_search.assistant_coordinator import assistant_proof_search_coordinator
+from backend.shared.runtime_settings import (
+    RuntimeSettingsError,
+    save_connectivity_runtime_settings,
+)
+from backend.shared.syntheticlib4_client import syntheticlib4_client
+
+logger = logging.getLogger(__name__)
+router = APIRouter(prefix="/api/connectivity", tags=["connectivity"])
+
+
+class ConnectivityToggleRequest(BaseModel):
+    syntheticlib4_enabled: bool | None = None
+    agent_conversation_memory_enabled: bool | None = None
+    wolfram_alpha_enabled: bool | None = None
+
+
+async def _lm_studio_status() -> dict[str, Any]:
+    if system_config.generic_mode:
+        return {
+            "status": "inactive",
+            "active": False,
+            "available": False,
+            "model_count": 0,
+            "models": [],
+            "has_embedding_model": False,
+            "message": "LM Studio is disabled in hosted/generic mode.",
+        }
+    try:
+        availability = await lm_studio_client.check_availability()
+        embedding_status = await check_lm_studio_embedding_ready(timeout_seconds=3.0)
+    except Exception as exc:
+        return {
+            "status": "inactive",
+            "active": False,
+            "available": False,
+            "model_count": 0,
+            "models": [],
+            "has_embedding_model": False,
+            "message": f"LM Studio status unavailable: {exc}",
+        }
+    active = bool(availability.get("available") and availability.get("has_models"))
+    return {
+        "status": "ACTIVE" if active else "inactive",
+        "active": active,
+        **availability,
+        "has_embedding_model": bool(embedding_status.get("ready")),
+        "embedding_ready": bool(embedding_status.get("ready")),
+        "embedding_message": embedding_status.get("message"),
+    }
+
+
+async def _openrouter_oauth_status() -> dict[str, Any]:
+    try:
+        cloud_status = await get_cloud_access_status()
+    except Exception as exc:
+        logger.warning("Connectivity cloud access status failed: %s", exc)
+        cloud_status = {"providers": {}}
+    providers = cloud_status.get("providers") or {}
+    openrouter_configured = bool(rag_config.openrouter_api_key)
+    oauth_configured = any(
+        bool((providers.get(provider_id) or {}).get("configured"))
+        for provider_id in ("openai_codex_oauth", "xai_grok_oauth", "sakana_fugu")
+    )
+    active = openrouter_configured or oauth_configured
+    return {
+        "status": "ACTIVE" if active else "inactive",
+        "active": active,
+        "openrouter_configured": openrouter_configured,
+        "oauth_configured": oauth_configured,
+        "providers": providers,
+    }
+
+
+async def _syntheticlib4_status() -> dict[str, Any]:
+    enabled = bool(system_config.syntheticlib4_enabled)
+    if not enabled:
+        return {
+            "status": "disabled",
+            "enabled": False,
+            "ready": False,
+            "message": "SyntheticLib4 proof-corpus retrieval is disabled for new runs.",
+        }
+    try:
+        account_status, manifest, proof_count, validation, overview = await asyncio.gather(
+            asyncio.to_thread(syntheticlib4_client.get_status),
+            asyncio.to_thread(syntheticlib4_client.get_release_manifest),
+            asyncio.to_thread(lambda: len(syntheticlib4_client.load_proof_metadata())),
+            asyncio.to_thread(syntheticlib4_client.validate_local_snapshot),
+            proof_search_service.overview(),
+        )
+    except Exception as exc:
+        return {
+            "status": "error",
+            "enabled": True,
+            "ready": False,
+            "message": f"SyntheticLib4 status failed: {exc}",
+        }
+    synthetic_corpus = next(
+        (corpus for corpus in overview.corpora if corpus.get("id") == "syntheticlib4"),
+        {},
+    )
+    indexed_records = int(synthetic_corpus.get("count") or 0)
+    validation_ok = bool(validation.get("valid", False))
+    ready = proof_count > 0 and validation_ok and indexed_records > 0
+    outdated = _syntheticlib4_is_outdated(account_status, ready=ready)
+    status_label = "outdated" if outdated else ("ready" if ready else "not ready")
+    return {
+        "status": status_label,
+        "enabled": True,
+        "ready": ready,
+        "outdated": outdated,
+        "credential_configured": bool(account_status.get("credential_configured")),
+        "auth_mode": account_status.get("auth_mode"),
+        "release_id": manifest.get("release_id", ""),
+        "proof_count": proof_count,
+        "indexed_records": indexed_records,
+        "validation": validation,
+        "message": (
+            "SyntheticLib4 is using a valid cached snapshot, but subscription/update access is unavailable."
+            if outdated
+            else (
+                "SyntheticLib4 local proof corpus is ready."
+                if ready
+                else "SyntheticLib4 is enabled, but the snapshot/index is not ready."
+            )
+        ),
+    }
+
+
+def _syntheticlib4_is_outdated(account_status: dict[str, Any], *, ready: bool) -> bool:
+    """Return True when a usable cached snapshot cannot refresh from live access."""
+    if not ready:
+        return False
+    for field in ("authenticated", "membership_active", "subscription_active", "access_active"):
+        if account_status.get(field) is False:
+            return True
+    if _syntheticlib4_access_expired(account_status.get("access_expires_at")):
+        return True
+    for field in ("last_refresh_error", "refresh_error", "update_error"):
+        if str(account_status.get(field) or "").strip():
+            return True
+    status_text = str(account_status.get("status") or account_status.get("subscription_status") or "").strip().lower()
+    if status_text in {
+        "expired",
+        "inactive",
+        "unauthorized",
+        "forbidden",
+        "quota_exhausted",
+        "refresh_failed",
+        "update_failed",
+    }:
+        return True
+
+    credential_configured = bool(account_status.get("credential_configured"))
+    auth_mode = str(account_status.get("auth_mode") or "").strip().lower()
+    has_live_access_intent = credential_configured or auth_mode in {
+        "api_key",
+        "oauth",
+        "hosted_oauth",
+        "subscription",
+    }
+    if not has_live_access_intent:
+        return False
+    return False
+
+
+def _syntheticlib4_access_expired(value: Any) -> bool:
+    raw_value = str(value or "").strip()
+    if not raw_value:
+        return False
+    normalized = raw_value.replace("Z", "+00:00")
+    try:
+        expires_at = datetime.fromisoformat(normalized)
+    except ValueError:
+        return False
+    if expires_at.tzinfo is None:
+        expires_at = expires_at.replace(tzinfo=timezone.utc)
+    return expires_at <= datetime.now(timezone.utc)
+
+
+async def _agent_conversation_memory_status() -> dict[str, Any]:
+    enabled = bool(system_config.agent_conversation_memory_enabled)
+    if not enabled:
+        return {
+            "status": "disabled",
+            "enabled": False,
+            "ready": False,
+            "message": "Local agent proof/history memory is disabled for new runs.",
+        }
+    try:
+        overview = await proof_search_service.overview()
except Exception as exc: + return { + "status": "error", + "enabled": True, + "ready": False, + "message": f"Local agent proof/history memory status failed: {exc}", + } + local_corpora = {"moto", "manual", "leanoj"} + local_counts = { + str(corpus.get("id")): int(corpus.get("count") or 0) + for corpus in overview.corpora + if corpus.get("id") in local_corpora + } + return { + "status": "ready", + "enabled": True, + "ready": True, + "local_records": sum(local_counts.values()), + "local_corpora": local_counts, + "message": "All stored proofs in memory are ready for AI proof-search retrieval.", + } + + +def _wolfram_status() -> dict[str, Any]: + has_key = bool(system_config.wolfram_alpha_api_key) + active = bool(system_config.wolfram_alpha_enabled and has_key) + return { + "status": "ready" if active else "inactive", + "enabled": bool(system_config.wolfram_alpha_enabled), + "active": active, + "has_key": has_key, + "message": ( + "Wolfram Alpha tool calls are enabled." + if active + else "Wolfram Alpha is disabled or no App ID is configured." 
+        ),
+    }
+
+
+def _workflow_is_active() -> bool:
+    """Return True when a top-level workflow is active."""
+    try:
+        from backend.aggregator.core.coordinator import coordinator
+        from backend.autonomous.core.autonomous_coordinator import autonomous_coordinator
+        from backend.compiler.core.compiler_coordinator import compiler_coordinator
+        from backend.leanoj.core.leanoj_coordinator import leanoj_coordinator
+
+        return bool(
+            coordinator.is_running
+            or compiler_coordinator.is_running
+            or autonomous_coordinator.is_active
+            or leanoj_coordinator.is_active
+        )
+    except Exception as exc:
+        logger.warning("Connectivity workflow activity check failed: %s", exc)
+        return True
+
+
+@router.get("/status")
+async def get_connectivity_status() -> dict[str, Any]:
+    """Return non-secret provider and optional-skill connectivity state."""
+    openrouter_oauth, lm_studio, syntheticlib4, agent_memory = await asyncio.gather(
+        _openrouter_oauth_status(),
+        _lm_studio_status(),
+        _syntheticlib4_status(),
+        _agent_conversation_memory_status(),
+    )
+    return {
+        "success": True,
+        "generic_mode": system_config.generic_mode,
+        "inference": {
+            "openrouter_oauth": openrouter_oauth,
+            "lm_studio": lm_studio,
+        },
+        "skills": {
+            "syntheticlib4": syntheticlib4,
+            "agent_conversation_memory": agent_memory,
+            "wolfram_alpha": _wolfram_status(),
+        },
+        "boost": boost_manager.get_boost_status(),
+    }
+
+
+@router.post("/toggles")
+async def update_connectivity_toggles(request: ConnectivityToggleRequest) -> dict[str, Any]:
+    """Update non-secret optional-skill toggles without clearing credentials."""
+    if _workflow_is_active():
+        raise HTTPException(
+            status_code=409,
+            detail="Stop the active workflow before changing run-level connectivity toggles.",
+        )
+
+    if request.syntheticlib4_enabled is not None:
+        system_config.syntheticlib4_enabled = bool(request.syntheticlib4_enabled)
+    if request.agent_conversation_memory_enabled is not None:
+        system_config.agent_conversation_memory_enabled = bool(request.agent_conversation_memory_enabled)
+        if not system_config.agent_conversation_memory_enabled:
+            await assistant_proof_search_coordinator.stop_all(
+                clear_packs=True,
+                broadcast=True,
+                reason="agent_conversation_memory_disabled",
+            )
+    if request.wolfram_alpha_enabled is not None:
+        system_config.wolfram_alpha_enabled = bool(request.wolfram_alpha_enabled)
+
+    try:
+        save_connectivity_runtime_settings()
+    except RuntimeSettingsError as exc:
+        raise HTTPException(status_code=500, detail=str(exc)) from exc
+
+    return await get_connectivity_status()
+
diff --git a/backend/api/routes/features.py b/backend/api/routes/features.py
index d40a175..1d28e47 100644
--- a/backend/api/routes/features.py
+++ b/backend/api/routes/features.py
@@ -46,6 +46,7 @@ async def get_features() -> Dict[str, Any]:
             "pdf_download_available": not is_generic,
             "openai_codex_oauth_available": not is_generic,
             "xai_grok_oauth_available": not is_generic,
+            "sakana_fugu_available": not is_generic,
         }
     )
 
diff --git a/backend/api/routes/leanoj.py b/backend/api/routes/leanoj.py
index 12b52e9..7372d95 100644
--- a/backend/api/routes/leanoj.py
+++ b/backend/api/routes/leanoj.py
@@ -15,6 +15,7 @@
 from backend.shared.config import system_config
 from backend.shared.embedding_readiness import require_embedding_provider_ready
 from backend.shared.models import LeanOJStartRequest
+from backend.shared.proof_search.assistant_coordinator import assistant_proof_search_coordinator
 from backend.shared.workflow_start_guard import workflow_start_guard
 
 logger = logging.getLogger(__name__)
@@ -246,6 +247,10 @@ def _validate_start_role_limits(request: LeanOJStartRequest) -> None:
     _validate_role_limits("Topic validator", request.topic_validator)
     _validate_role_limits("Brainstorm validator", request.brainstorm_validator)
     _validate_role_limits("Final proof solver", request.final_solver)
+    _validate_role_limits(
+        "Assistant",
+        request.assistant if (request.assistant.model_id or "").strip() else request.topic_validator,
+    )
     for index, submitter in enumerate(request.brainstorm_submitters, start=1):
         _validate_role_limits(f"Brainstorm submitter {index}", submitter)
 
@@ -285,6 +290,10 @@ async def stop_leanoj():
     """Stop the active Proof Solver run."""
     try:
         await leanoj_coordinator.stop()
+        await assistant_proof_search_coordinator.stop_all(
+            broadcast=True,
+            reason="leanoj_stopped",
+        )
         return {
             "success": True,
             "message": "Proof Solver stopped",
@@ -302,6 +311,11 @@ async def clear_leanoj(confirm: bool = False):
         raise HTTPException(status_code=400, detail="Confirmation required. Use ?confirm=true to clear Proof Solver progress.")
     try:
         await leanoj_coordinator.clear()
+        await assistant_proof_search_coordinator.stop_all(
+            broadcast=True,
+            reason="leanoj_cleared",
+        )
+        await assistant_proof_search_coordinator.clear_cooldown_state()
         return {
             "success": True,
             "message": "Proof Solver progress cleared",
diff --git a/backend/api/routes/proof_search.py b/backend/api/routes/proof_search.py
new file mode 100644
index 0000000..ac9782f
--- /dev/null
+++ b/backend/api/routes/proof_search.py
@@ -0,0 +1,75 @@
+"""Unified proof-search routes for MOTO and SyntheticLib4 corpora."""
+from __future__ import annotations
+
+import logging
+
+from fastapi import APIRouter, HTTPException
+
+from backend.shared.proof_search.models import (
+    ProofSearchCorpus,
+    PublicProofSearchRequest,
+    UnifiedProofSearchRecord,
+)
+from backend.shared.proof_search.search_service import proof_search_service
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(prefix="/api/proof-search", tags=["proof-search"])
+
+
+@router.get("/overview")
+async def get_proof_search_overview():
+    """Return a compact proof-corpus map for UI and AI navigation."""
+    try:
+        overview = await proof_search_service.overview()
+    except Exception as exc:
+        logger.exception("Failed to build proof-search overview")
+        raise HTTPException(status_code=500, detail=f"Proof-search overview failed: {exc}") from exc
+    return overview.model_dump(mode="json")
+
+
+@router.post("/search")
+async def search_proofs(request: PublicProofSearchRequest):
+    """Search up to seven combined proof records across indexed corpora."""
+    try:
+        response = await proof_search_service.search(request)
+    except Exception as exc:
+        logger.exception("Proof search failed")
+        raise HTTPException(status_code=500, detail=f"Proof search failed: {exc}") from exc
+    return response.model_dump(mode="json")
+
+
+@router.get("/proofs/{source}/{proof_id}", response_model=UnifiedProofSearchRecord)
+async def get_proof_search_record(
+    source: ProofSearchCorpus,
+    proof_id: str,
+    session_id: str | None = None,
+):
+    """Return one indexed proof record, hydrating SyntheticLib4 code when available."""
+    try:
+        record = await proof_search_service.get_record(
+            corpus=source,
+            proof_id=proof_id,
+            session_id=session_id,
+        )
+    except ValueError as exc:
+        logger.warning("Proof-search record hydration rejected: %s", exc)
+        raise HTTPException(status_code=409, detail=f"Proof-search hydration rejected: {exc}") from exc
+    except Exception as exc:
+        logger.exception("Proof-search record hydration failed")
+        raise HTTPException(status_code=500, detail=f"Proof-search hydration failed: {exc}") from exc
+    if record is None:
+        raise HTTPException(status_code=404, detail="Proof record not found")
+    return record
+
+
+@router.post("/reindex")
+async def reindex_proofs():
+    """Rebuild the local proof-search index from available sources."""
+    try:
+        overview = await proof_search_service.rebuild_index()
+    except Exception as exc:
+        logger.exception("Proof-search reindex failed")
+        raise HTTPException(status_code=500, detail=f"Proof-search reindex failed: {exc}") from exc
+    return {"success": True, "overview": overview.model_dump(mode="json")}
+
diff --git a/backend/api/routes/proofs.py b/backend/api/routes/proofs.py
index acdecfe..bb7d9ca 100644
--- a/backend/api/routes/proofs.py
+++ b/backend/api/routes/proofs.py
@@ -28,6 +28,7 @@
 from backend.autonomous.memory.proof_database import ProofDatabase, manual_proof_database, proof_database
 from backend.autonomous.memory.research_metadata import research_metadata
 from backend.compiler.core.compiler_coordinator import compiler_coordinator
+from backend.compiler.memory.manual_prompt import load_manual_compiler_prompt
 from backend.compiler.memory.outline_memory import outline_memory
 from backend.compiler.memory.paper_memory import paper_memory
 from backend.shared.api_client_manager import api_client_manager
@@ -47,6 +48,8 @@
 )
 from backend.shared.manual_proof_context import get_manual_proof_context_lock
 from backend.shared.path_safety import resolve_path_within_root
+from backend.shared.proof_search.assistant_coordinator import assistant_proof_search_coordinator
+from backend.shared.proof_search.assistant_models import AssistantTargetSnapshot
 from backend.shared.runtime_settings import RuntimeSettingsError, save_proof_runtime_settings
 from backend.shared.smt_client import clear_smt_client, get_smt_client
 
@@ -61,6 +64,7 @@
 _manual_proof_run_lock = asyncio.Lock()
 _LEAN_STATUS_STARTING_LOG_INTERVAL_SECONDS = 60.0
 _last_lean_status_starting_log_at = 0.0
+_ASSISTANT_MANUAL_SOURCE_SUMMARY_CHARS = 8000
 
 
 def _log_lean_status_starting_up(detail: str) -> None:
@@ -128,6 +132,30 @@ def _manual_aggregator_proof_event_message(event_type: str, data: dict) -> str:
         or data.get("proof_id")
         or "candidate"
     )
+
+    def _compact(value: object, limit: int = 1200) -> str:
+        cleaned = " ".join(str(value or "").split())
+        if not cleaned:
+            return ""
+        return cleaned[:limit] + ("..." if len(cleaned) > limit else "")
+
+    def _lean_response() -> str:
+        if data.get("lean_response"):
+            return _compact(data.get("lean_response"))
+        if data.get("proof_verified") is True:
+            return "Lean 4 response: proof verified."
+        error = _compact(
+            data.get("error_summary") or data.get("error_output") or data.get("reason"),
+            limit=1800,
+        )
+        return f"Lean 4 response: {error} - proof not verified." if error else ""
+
+    def _attempt_message(prefix: str) -> str:
+        attempt = f", attempt {data.get('attempt')}" if data.get("attempt") else ""
+        response = _lean_response()
+        base = f"{prefix}: {target}{attempt}"
+        return f"{base} - {response}" if response else base
+
     if event_type == "proof_check_started":
         return "Proof check started for the manual Aggregator database"
     if event_type == "proof_check_no_candidates":
@@ -139,9 +167,9 @@ def _manual_aggregator_proof_event_message(event_type: str, data: dict) -> str:
     if event_type == "proof_lean_accepted":
         return f"Lean accepted proof: {target}"
     if event_type == "proof_attempt_failed":
-        return f"Proof attempt failed: {target}"
+        return _attempt_message("Proof attempt failed")
     if event_type == "proof_attempts_exhausted":
-        return f"Proof attempts exhausted: {target}"
+        return _attempt_message("Proof attempts exhausted")
     if event_type == "proof_integrity_rejected":
         return f"Proof integrity rejected: {data.get('reason') or data.get('message') or target}"
     if event_type == "proof_verified":
@@ -306,6 +334,21 @@ def _get_request_runtime_snapshot(request: Optional[ProofCheckRequest]) -> Optio
     return snapshot
 
 
+def _role_config_from_model_config(config: Optional[ModelConfig]) -> ProofRoleConfigSnapshot:
+    if config is None:
+        return ProofRoleConfigSnapshot()
+    return ProofRoleConfigSnapshot(
+        provider=config.provider,
+        model_id=config.model_id,
+        openrouter_provider=config.openrouter_provider,
+        openrouter_reasoning_effort=config.openrouter_reasoning_effort,
+        lm_studio_fallback_id=config.lm_studio_fallback_id,
+        context_window=config.context_window,
+        max_output_tokens=config.max_output_tokens,
+        supercharge_enabled=config.supercharge_enabled,
+    )
+
+
 def _get_active_manual_runtime_snapshot(request: ProofCheckRequest) -> Optional[ProofRuntimeConfigSnapshot]:
     """Build proof runtime settings from the active manual mode, never from autonomous presets."""
     if request.source_type == "brainstorm" and request.source_id == MANUAL_AGGREGATOR_SOURCE_ID:
@@ -337,22 +380,25 @@ def _get_active_manual_runtime_snapshot(request: ProofCheckRequest) -> Optional[
             brainstorm=submitter_role,
             paper=submitter_role,
             validator=validator_role,
+            assistant=_role_config_from_model_config(
+                api_client_manager.get_role_config("aggregator_assistant")
+            ),
         )
 
     if request.source_type == "paper" and request.source_id == MANUAL_COMPILER_CURRENT_SOURCE_ID:
-        high_context = compiler_coordinator.high_context_submitter
-        if high_context is None or not getattr(high_context, "model_name", "") or not compiler_coordinator.validator_model:
+        rigor_submitter = compiler_coordinator.high_param_submitter
+        if rigor_submitter is None or not getattr(rigor_submitter, "model_name", "") or not compiler_coordinator.validator_model:
             return None
 
         paper_role = ProofRoleConfigSnapshot(
-            provider=compiler_coordinator.high_context_provider,
-            model_id=high_context.model_name,
-            openrouter_provider=compiler_coordinator.high_context_openrouter_provider,
-            openrouter_reasoning_effort=compiler_coordinator.high_context_openrouter_reasoning_effort,
-            lm_studio_fallback_id=compiler_coordinator.high_context_lm_studio_fallback,
-            context_window=system_config.compiler_high_context_context_window,
-            max_output_tokens=system_config.compiler_high_context_max_output_tokens,
-            supercharge_enabled=compiler_coordinator.high_context_supercharge_enabled,
+            provider=compiler_coordinator.high_param_provider,
+            model_id=rigor_submitter.model_name,
+            openrouter_provider=compiler_coordinator.high_param_openrouter_provider,
+            openrouter_reasoning_effort=compiler_coordinator.high_param_openrouter_reasoning_effort,
+            lm_studio_fallback_id=compiler_coordinator.high_param_lm_studio_fallback,
+            context_window=system_config.compiler_high_param_context_window,
+            max_output_tokens=system_config.compiler_high_param_max_output_tokens,
+            supercharge_enabled=compiler_coordinator.high_param_supercharge_enabled,
         )
         validator_role = ProofRoleConfigSnapshot(
             provider=compiler_coordinator.validator_provider,
@@ -368,6 +414,9 @@ def _get_active_manual_runtime_snapshot(request: ProofCheckRequest) -> Optional[
             brainstorm=paper_role,
             paper=paper_role,
             validator=validator_role,
+            assistant=_role_config_from_model_config(
+                api_client_manager.get_role_config("compiler_assistant")
+            ),
         )
     return None
 
@@ -375,13 +424,14 @@ def _get_active_manual_runtime_snapshot(request: ProofCheckRequest) -> Optional[
 
 async def _get_runtime_snapshot(request: Optional[ProofCheckRequest] = None) -> Optional[ProofRuntimeConfigSnapshot]:
     if request and _is_non_appending_manual_source(request):
+        request_snapshot = _get_request_runtime_snapshot(request)
         # Active manual sources must not borrow autonomous proof settings.
         # Prefer the backend's live manual runtime so stale browser/localStorage
         # snapshots cannot override the roles that actually produced the source.
         active_manual_snapshot = _get_active_manual_runtime_snapshot(request)
         if active_manual_snapshot is not None:
             return active_manual_snapshot
-        return _get_request_runtime_snapshot(request)
+        return request_snapshot
 
     request_snapshot = _get_request_runtime_snapshot(request)
     if request_snapshot is not None:
@@ -436,9 +486,78 @@ def _configure_manual_roles(source_type: str, snapshot: ProofRuntimeConfigSnapsh
         "autonomous_proof_novelty",
         _build_model_config(snapshot.validator),
     )
+    assistant_config = snapshot.assistant if snapshot.assistant.model_id else snapshot.validator
+    api_client_manager.configure_role(
+        "manual_proof_assistant",
+        _build_model_config(assistant_config),
+    )
     return role_config
 
 
+def _compact_manual_assistant_source(content: str) -> str:
+    text = " ".join((content or "").split())
+    if len(text) <= _ASSISTANT_MANUAL_SOURCE_SUMMARY_CHARS:
+        return text
+    return text[:_ASSISTANT_MANUAL_SOURCE_SUMMARY_CHARS].rstrip() + "..."
+
+
+async def _refresh_manual_assistant_memory(
+    *,
+    source_type: str,
+    source_id: str,
+    source_title: str,
+    source_content: str,
+    user_prompt: str,
+) -> None:
+    """Run Try-to-Prove Assistant memory even before proof prompt preflight.
+
+    Manual proof discovery may fail during mandatory-source context validation
+    before it reaches ``api_client_manager.generate_completion()``, so the
+    normal central Assistant injection hook never fires. This preflight refresh
+    keeps the user-triggered proof-check button covered by Assistant memory and
+    leaves a visible log/event trail.
+    """
+    if not system_config.agent_conversation_memory_enabled:
+        logger.info(
+            "Assistant memory preflight skipped for manual proof check %s:%s because Agent Conversation Memory is disabled",
+            source_type,
+            source_id,
+        )
+        return
+
+    snapshot = AssistantTargetSnapshot(
+        workflow_mode="manual_proof_check",
+        target_kind="proof_candidate",
+        workflow_phase="manual_try_to_prove",
+        active_mode="manual_proof_check",
+        user_prompt=user_prompt,
+        current_prompt_or_topic=source_title,
+        current_submission_or_draft=_compact_manual_assistant_source(source_content),
+        writing_goal="User-triggered Try to Prove This proof discovery over the selected source.",
+        paper_or_proof_draft_summary=_compact_manual_assistant_source(source_content),
+        target_statement=user_prompt or source_title or f"{source_type}:{source_id}",
+        formal_sketch=_compact_manual_assistant_source(source_content),
+        source_title=source_title,
+        source_type=f"manual_{source_type}",
+        source_id=source_id,
+        source_titles=[source_title] if source_title else [],
+        imports=["Mathlib"],
+    )
+    logger.info(
+        "Assistant memory preflight starting for manual proof check %s:%s (%s)",
+        source_type,
+        source_id,
+        source_title or "untitled source",
+    )
+    pack = await assistant_proof_search_coordinator.refresh_now(snapshot)
+    logger.info(
+        "Assistant memory preflight complete for manual proof check %s:%s (results=%s)",
+        source_type,
+        source_id,
+        len(pack.results) if pack else 0,
+    )
+
+
 async def _prompt_with_verified_proof_context(
     prompt: str,
     scoped_proof_database: ProofDatabase = proof_database,
@@ -583,11 +702,12 @@ async def _resolve_manual_compiler_current_source(
     if source_context.strip():
         parts.append(f"PART 1 AGGREGATOR DATABASE CONTEXT:\n{source_context.strip()}")
 
+    persisted_prompt = compiler_coordinator.user_prompt or await load_manual_compiler_prompt()
     user_prompt = await _prompt_with_verified_proof_context(
-        compiler_coordinator.user_prompt or "",
+        persisted_prompt,
         scoped_proof_database,
     )
-    source_title = compiler_coordinator.paper_title or compiler_coordinator.user_prompt or "Manual Compiler Paper"
+    source_title = compiler_coordinator.paper_title or persisted_prompt or "Manual Compiler Paper"
     return "\n\n---\n\n".join(parts), source_title, user_prompt
 
 
@@ -711,6 +831,13 @@ async def _run_manual_proof_check(request: ProofCheckRequest) -> None:
             if _is_manual_aggregator_request(request)
             else websocket.broadcast_event
         )
+        await _refresh_manual_assistant_memory(
+            source_type=request.source_type,
+            source_id=request.source_id,
+            source_title=source_title,
+            source_content=source_content,
+            user_prompt=user_prompt,
+        )
         await stage.run_manual(
             content=source_content,
             source_type=request.source_type,
@@ -748,11 +875,16 @@ async def _run_manual_proof_check(request: ProofCheckRequest) -> None:
                 "total_candidates": 0,
                 "message": (
                     "Proof verification encountered an error: "
-                    f"{ProofVerificationStage._summarize_error(str(exc), limit=960)}"
+                    f"{ProofVerificationStage._summarize_error(str(exc), limit=1800)}"
                 ),
             },
         )
         await ProofVerificationStage.release_source(request.source_type, request.source_id)
+    finally:
+        await assistant_proof_search_coordinator.stop_all(
+            broadcast=True,
+            reason="manual_proof_check_complete",
+        )
 
 
 @router.get("")
diff --git a/backend/api/routes/syntheticlib4.py b/backend/api/routes/syntheticlib4.py
new file mode 100644
index 0000000..6ddd89d
--- /dev/null
+++ b/backend/api/routes/syntheticlib4.py
@@ -0,0 +1,304 @@
+"""SyntheticLib4 corpus access and local proof-index control routes."""
+from __future__ import annotations
+
+import asyncio
+import logging
+from pathlib import Path
+from typing import Any
+
+from fastapi import APIRouter, HTTPException
+from pydantic import BaseModel, Field
+
+from backend.shared.path_safety import validate_single_path_component
+from backend.shared.proof_search.search_service import proof_search_service
+from backend.shared.config import system_config
+from backend.shared.syntheticlib4_client import (
+    SYNTHETICLIB4_CONTRACT_VERSION,
+    syntheticlib4_client,
+)
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(prefix="/api/syntheticlib4", tags=["syntheticlib4"])
+
+
+class SyntheticLib4AuthStartRequest(BaseModel):
+    redirect_uri: str | None = None
+
+
+class SyntheticLib4AuthExchangeRequest(BaseModel):
+    code: str = ""
+    state: str = ""
+    redirect_url: str = ""
+    redirect_uri: str | None = None
+
+
+class SyntheticLib4ApiKeyRequest(BaseModel):
+    api_key: str
+
+
+class SyntheticLib4RetrieveBatchRequest(BaseModel):
+    contract_version: str | None = None
+    query: str = ""
+    goal_statement: str = ""
+    imports: list[str] = []
+    dependency_names: list[str] = []
+    module_filters: list[str] = []
+    novelty_filters: list[str] = []
+    release_id: str | None = None
+    channel: str = "stable"
+    excluded_fingerprints: list[str] = []
+    cursor: str | None = None
+    limit: int = Field(default=7, ge=1, le=7)
+    include_full_code: bool = True
+
+
+class SyntheticLib4ImportLocalSnapshotRequest(BaseModel):
+    source_name: str
+    channel: str = "stable"
+
+
+async def _snapshot_status() -> dict[str, Any]:
+    """Read non-secret SyntheticLib4 fixture/snapshot status off the event loop."""
+
+    def _load() -> dict[str, Any]:
+        account_status = syntheticlib4_client.get_status()
+        manifest = syntheticlib4_client.get_release_manifest()
+        proof_count = len(syntheticlib4_client.load_proof_metadata())
+        validation = syntheticlib4_client.validate_local_snapshot()
+        return {
+            "account_status": account_status,
+            "manifest": manifest,
+            "proof_count": proof_count,
+            "validation": validation,
+        }
+
+    return await asyncio.to_thread(_load)
+
+
+@router.get("/status")
+async def get_syntheticlib4_status():
+    """Return SyntheticLib4 auth/snapshot/index status without exposing secrets."""
+    try:
+        snapshot = await _snapshot_status()
+        overview = await proof_search_service.overview(include_disabled=True)
+    except Exception as exc:
+        logger.exception("SyntheticLib4 status failed")
+        raise HTTPException(status_code=500, detail=f"SyntheticLib4 status failed: {exc}") from exc
+
+    manifest = snapshot["manifest"]
+    syntheticlib4_corpus = next(
+        (corpus for corpus in overview.corpora if corpus.get("id") == "syntheticlib4"),
+        None,
+    )
+    return {
+        "success": True,
+        "contract_version": SYNTHETICLIB4_CONTRACT_VERSION,
+        "status": snapshot["account_status"],
+        "current_release": {
+            "release_id": manifest.get("release_id", ""),
+            "channel": manifest.get("channel", "stable"),
+            "generated_at": manifest.get("generated_at", ""),
+            "lean_toolchain": manifest.get("lean_toolchain", ""),
+            "mathlib_revision": manifest.get("mathlib_revision", ""),
+            "syntheticlib4_revision": manifest.get("syntheticlib4_revision", ""),
+            "proof_count": snapshot["proof_count"],
+            "schema_version": manifest.get("schema_version", ""),
+        },
+        "local_snapshot": {
+            "available": snapshot["proof_count"] > 0,
+            "proof_count": snapshot["proof_count"],
+            "freshness": syntheticlib4_corpus.get("freshness") if syntheticlib4_corpus else "not indexed",
+            "validation": snapshot["validation"],
+        },
+        "proof_index": {
+            "total_records": overview.total_records,
+            "syntheticlib4_records": syntheticlib4_corpus.get("count", 0) if syntheticlib4_corpus else 0,
+            "result_cap": overview.result_cap,
+        },
+    }
+
+
+@router.post("/auth/start")
+async def start_syntheticlib4_auth(_: SyntheticLib4AuthStartRequest):
+    """Return a clear placeholder until the production SyntheticLib4 OAuth contract is live."""
+    raise HTTPException(
+        status_code=501,
+        detail=(
+            "SyntheticLib4 hosted OAuth is not connected in this mock/offline build. "
+            "Use the local snapshot and proof-search routes until SyntheticLib.com auth goes live."
+        ),
+    )
+
+
+@router.post("/auth/exchange")
+async def exchange_syntheticlib4_auth(_: SyntheticLib4AuthExchangeRequest):
+    """Return a clear placeholder until the production SyntheticLib4 OAuth contract is live."""
+    raise HTTPException(
+        status_code=501,
+        detail=(
+            "SyntheticLib4 hosted OAuth exchange is not connected in this mock/offline build."
+        ),
+    )
+
+
+@router.post("/api-key")
+async def set_syntheticlib4_api_key(request: SyntheticLib4ApiKeyRequest):
+    """Store a SyntheticLib4 API key through the mode-appropriate secret path."""
+    try:
+        status = await asyncio.to_thread(syntheticlib4_client.set_api_key, request.api_key)
+    except Exception as exc:
+        logger.exception("SyntheticLib4 API-key setup failed")
+        raise HTTPException(status_code=500, detail=f"SyntheticLib4 API-key setup failed: {exc}") from exc
+    return {
+        "success": True,
+        "message": (
+            "SyntheticLib4 API key stored for this MOTO instance. Live SyntheticLib.com "
+            "validation will activate when the production service contract is available."
+        ),
+        "status": status,
+    }
+
+
+@router.delete("/auth")
+async def clear_syntheticlib4_auth():
+    """Clear SyntheticLib4 auth state without deleting local snapshots."""
+    try:
+        status = await asyncio.to_thread(syntheticlib4_client.clear_credentials)
+    except Exception as exc:
+        logger.exception("SyntheticLib4 auth clear status failed")
+        raise HTTPException(status_code=500, detail=f"SyntheticLib4 auth clear failed: {exc}") from exc
+    return {
+        "success": True,
+        "message": "SyntheticLib4 credentials cleared. Local snapshots and proof indexes were preserved.",
+        "status": status,
+    }
+
+
+@router.get("/releases")
+async def list_syntheticlib4_releases(channel: str | None = None):
+    """List locally available SyntheticLib4 releases."""
+    try:
+        releases = await asyncio.to_thread(syntheticlib4_client.list_releases, channel)
+    except Exception as exc:
+        logger.exception("SyntheticLib4 release listing failed")
+        raise HTTPException(status_code=500, detail=f"SyntheticLib4 release listing failed: {exc}") from exc
+    return releases
+
+
+@router.post("/refresh")
+async def refresh_syntheticlib4_snapshot():
+    """
+    Refresh local SyntheticLib4 search state.
+
+    The current client is fixture/snapshot-backed, so refresh validates available
+    metadata and rebuilds the unified local proof index.
+    """
+    try:
+        validation = await asyncio.to_thread(syntheticlib4_client.validate_local_snapshot)
+        overview = await proof_search_service.rebuild_index(include_disabled=True)
+    except Exception as exc:
+        logger.exception("SyntheticLib4 snapshot refresh failed")
+        raise HTTPException(status_code=500, detail=f"SyntheticLib4 refresh failed: {exc}") from exc
+    return {
+        "success": True,
+        "message": "SyntheticLib4 local snapshot metadata validated and proof index rebuilt.",
+        "snapshot_validation": validation,
+        "overview": overview.model_dump(mode="json"),
+    }
+
+
+@router.post("/import-local-snapshot")
+async def import_syntheticlib4_local_snapshot(request: SyntheticLib4ImportLocalSnapshotRequest):
+    """
+    Activate a local snapshot staged under `data/syntheticlib4/imports/{source_name}`.
+
+    This route is intentionally path-component based rather than accepting an
+    arbitrary host path. It gives the future downloader/control-plane a safe
+    activation surface while preserving the existing active snapshot on failure.
+    """
+    try:
+        source_name = validate_single_path_component(request.source_name, "SyntheticLib4 snapshot import name")
+        source_dir = Path(system_config.data_dir) / "syntheticlib4" / "imports" / source_name
+        result = await asyncio.to_thread(
+            syntheticlib4_client.import_snapshot_directory,
+            source_dir,
+            channel=request.channel,
+        )
+        overview = await proof_search_service.rebuild_index(include_disabled=True)
+    except Exception as exc:
+        logger.exception("SyntheticLib4 local snapshot import failed")
+        raise HTTPException(status_code=500, detail=f"SyntheticLib4 local snapshot import failed: {exc}") from exc
+    return {
+        **result,
+        "overview": overview.model_dump(mode="json"),
+    }
+
+
+@router.post("/reindex")
+async def reindex_syntheticlib4_proofs():
+    """Rebuild the unified proof-search index from available local proof corpora."""
+    try:
+        overview = await proof_search_service.rebuild_index(include_disabled=True)
+    except Exception as exc:
+        logger.exception("SyntheticLib4 proof index rebuild failed")
+        raise HTTPException(status_code=500, detail=f"SyntheticLib4 reindex failed: {exc}") from exc
+    return {"success": True, "overview": overview.model_dump(mode="json")}
+
+
+@router.post("/retrieve-batch")
+async def retrieve_syntheticlib4_batch(request: SyntheticLib4RetrieveBatchRequest):
+    """Return a bounded mock/offline SyntheticLib4 retrieve-batch response."""
+    try:
+        response = await asyncio.to_thread(
+            syntheticlib4_client.retrieve_batch,
+            request.model_dump(mode="json"),
+        )
+    except Exception as exc:
+        logger.exception("SyntheticLib4 retrieve-batch failed")
+        raise HTTPException(status_code=500, detail=f"SyntheticLib4 retrieve-batch failed: {exc}") from exc
+    return response
+
+
+@router.get("/account/proofs")
+async def list_syntheticlib4_account_proofs(
+    cursor: str | None = None,
+    limit: int = 50,
+    release_id: str | None = None,
+    channel: str | None = None,
+):
+    """Browse accepted SyntheticLib4 account proofs through the mock/offline contract."""
+    try:
+        return await asyncio.to_thread(
+            syntheticlib4_client.list_account_proofs,
+            cursor=cursor,
+            limit=limit,
+            release_id=release_id,
+            channel=channel,
+        )
+    except Exception as exc:
+        logger.exception("SyntheticLib4 account proof listing failed")
+        raise HTTPException(status_code=500, detail=f"SyntheticLib4 account proof listing failed: {exc}") from exc
+
+
+@router.get("/account/proofs/search")
+async def search_syntheticlib4_account_proofs(
+    q: str = "",
+    module: str | None = None,
+    novelty_rank: str | None = None,
+    cursor: str | None = None,
+    limit: int = 50,
+):
+    """Search accepted SyntheticLib4 account proofs through the mock/offline contract."""
+    try:
+        return await asyncio.to_thread(
+            syntheticlib4_client.search_user_proofs,
+            query=q,
+            module=module,
+            novelty_rank=novelty_rank,
+            cursor=cursor,
+            limit=limit,
+        )
+    except Exception as exc:
+        logger.exception("SyntheticLib4 account proof search failed")
+        raise HTTPException(status_code=500, detail=f"SyntheticLib4 account proof search failed: {exc}") from exc
diff --git a/backend/autonomous/agents/completion_reviewer.py b/backend/autonomous/agents/completion_reviewer.py
index 37ad000..7708f9a 100644
--- a/backend/autonomous/agents/completion_reviewer.py
+++ b/backend/autonomous/agents/completion_reviewer.py
@@ -212,12 +212,17 @@ async def _generate_assessment(
         prompt_tokens = count_tokens(prompt)
         max_input_tokens = rag_config.get_available_input_tokens(self.context_window, self.max_output_tokens)
 
+        task_id = self.get_current_task_id()
+        await api_client_manager.prewarm_assistant_memory_context(
+            task_id=task_id,
+            role_id=self.role_id,
+            prompt=prompt,
+        )
+
         if prompt_tokens > max_input_tokens:
             logger.error(f"CompletionReviewer: Prompt ({prompt_tokens} tokens) exceeds input limit ({max_input_tokens})")
             return None
 
-        # Generate task ID for tracking
-        task_id = self.get_current_task_id()
         self.task_sequence += 1
 
         # Notify task started (for workflow panel)
diff --git a/backend/autonomous/agents/final_answer/answer_format_selector.py b/backend/autonomous/agents/final_answer/answer_format_selector.py
index 9bb3106..21ef922 100644
--- a/backend/autonomous/agents/final_answer/answer_format_selector.py
+++ b/backend/autonomous/agents/final_answer/answer_format_selector.py
@@ -17,6 +17,10 @@
 
 from backend.shared.api_client_manager import api_client_manager
 from backend.shared.openrouter_client import FreeModelExhaustedError
+from backend.shared.model_error_utils import (
+    is_non_retryable_model_error,
+    is_transient_model_call_error,
+)
 from backend.shared.json_parser import parse_json
 from backend.shared.response_extraction import extract_message_text
 from backend.shared.utils import count_tokens
@@ -31,6 +35,17 @@
 logger = logging.getLogger(__name__)
 
 
+def _is_tier3_model_call_failure(exc: Exception) -> bool:
+    message = str(exc or "").lower()
+    return (
+        is_non_retryable_model_error(exc)
+        or is_transient_model_call_error(exc)
+        or "upstream provider timeout" in message
+        or "response missing 'choices'" in message
+        or "no api key" in message
+    )
+
+
 class AnswerFormatSelector:
     """
     Agent that selects the format for the final answer (short vs long form).
@@ -50,12 +65,16 @@ def __init__( submitter_model: str, validator_model: str, context_window: int = 0, - max_output_tokens: int = 0 + max_output_tokens: int = 0, + validator_context_window: Optional[int] = None, + validator_max_output_tokens: Optional[int] = None, ): self.submitter_model = submitter_model self.validator_model = validator_model self.context_window = context_window self.max_output_tokens = max_output_tokens + self.validator_context_window = validator_context_window or context_window + self.validator_max_output_tokens = validator_max_output_tokens or max_output_tokens # Task tracking for workflow panel and boost integration self.task_sequence: int = 0 @@ -73,6 +92,13 @@ def get_current_task_id(self) -> str: def _calculate_max_input_tokens(self) -> int: """Calculate available tokens for input prompt.""" return rag_config.get_available_input_tokens(self.context_window, self.max_output_tokens) + + def _calculate_validator_max_input_tokens(self) -> int: + """Calculate available tokens for validator prompts.""" + return rag_config.get_available_input_tokens( + self.validator_context_window, + self.validator_max_output_tokens, + ) async def select_format( self, @@ -165,6 +191,13 @@ async def _generate_selection( rejection_context=rejection_context ) + task_id = self.get_current_task_id() + await api_client_manager.prewarm_assistant_memory_context( + task_id=task_id, + role_id=self.role_id, + prompt=prompt, + ) + # Validate prompt size prompt_tokens = count_tokens(prompt) max_input = self._calculate_max_input_tokens() @@ -173,8 +206,6 @@ async def _generate_selection( logger.error(f"AnswerFormatSelector: Prompt too large ({prompt_tokens} > {max_input})") return None - # Generate task ID - task_id = self.get_current_task_id() self.task_sequence += 1 if self.task_tracking_callback: @@ -220,6 +251,8 @@ async def _generate_selection( except FreeModelExhaustedError: raise except Exception as e: + if _is_tier3_model_call_failure(e): + raise 
logger.error(f"AnswerFormatSelector: Error generating selection: {e}") return None @@ -247,7 +280,7 @@ async def _validate_selection( # Validate prompt size prompt_tokens = count_tokens(prompt) - max_input = self._calculate_max_input_tokens() + max_input = self._calculate_validator_max_input_tokens() if prompt_tokens > max_input: logger.error(f"AnswerFormatSelector: Validation prompt too large ({prompt_tokens} > {max_input})") @@ -267,7 +300,7 @@ async def _validate_selection( role_id=f"{self.role_id}_validator", model=self.validator_model, messages=[{"role": "user", "content": prompt}], - max_tokens=self.max_output_tokens, + max_tokens=self.validator_max_output_tokens, temperature=0.0 ) @@ -294,6 +327,8 @@ async def _validate_selection( except FreeModelExhaustedError: raise except Exception as e: + if _is_tier3_model_call_failure(e): + raise logger.error(f"AnswerFormatSelector: Error validating selection: {e}") return False, str(e) diff --git a/backend/autonomous/agents/final_answer/certainty_assessor.py b/backend/autonomous/agents/final_answer/certainty_assessor.py index 8aca609..cf526d2 100644 --- a/backend/autonomous/agents/final_answer/certainty_assessor.py +++ b/backend/autonomous/agents/final_answer/certainty_assessor.py @@ -16,6 +16,10 @@ from backend.shared.api_client_manager import api_client_manager from backend.shared.openrouter_client import FreeModelExhaustedError +from backend.shared.model_error_utils import ( + is_non_retryable_model_error, + is_transient_model_call_error, +) from backend.shared.json_parser import parse_json from backend.shared.response_extraction import extract_message_text from backend.shared.utils import count_tokens @@ -32,6 +36,17 @@ logger = logging.getLogger(__name__) +def _is_tier3_model_call_failure(exc: Exception) -> bool: + message = str(exc or "").lower() + return ( + is_non_retryable_model_error(exc) + or is_transient_model_call_error(exc) + or "upstream provider timeout" in message + or "response missing 'choices'" in 
message + or "no api key" in message + ) + + class CertaintyAssessor: """ Agent that assesses what can be answered with certainty from existing papers. @@ -51,12 +66,16 @@ def __init__( submitter_model: str, validator_model: str, context_window: int = 0, - max_output_tokens: int = 0 + max_output_tokens: int = 0, + validator_context_window: Optional[int] = None, + validator_max_output_tokens: Optional[int] = None, ): self.submitter_model = submitter_model self.validator_model = validator_model self.context_window = context_window self.max_output_tokens = max_output_tokens + self.validator_context_window = validator_context_window or context_window + self.validator_max_output_tokens = validator_max_output_tokens or max_output_tokens # Task tracking for workflow panel and boost integration self.task_sequence: int = 0 @@ -74,6 +93,13 @@ def get_current_task_id(self) -> str: def _calculate_max_input_tokens(self) -> int: """Calculate available tokens for input prompt.""" return rag_config.get_available_input_tokens(self.context_window, self.max_output_tokens) + + def _calculate_validator_max_input_tokens(self) -> int: + """Calculate available tokens for validator prompts.""" + return rag_config.get_available_input_tokens( + self.validator_context_window, + self.validator_max_output_tokens, + ) async def assess_certainty( self, @@ -171,6 +197,13 @@ async def _request_paper_expansion( all_papers ) + task_id = self.get_current_task_id() + await api_client_manager.prewarm_assistant_memory_context( + task_id=task_id, + role_id=self.role_id, + prompt=prompt, + ) + # Validate prompt size prompt_tokens = count_tokens(prompt) max_input = self._calculate_max_input_tokens() @@ -180,8 +213,6 @@ async def _request_paper_expansion( "Proceeding with abstract-only assessment.") return [] - # Generate task ID - task_id = self.get_current_task_id() self.task_sequence += 1 if self.task_tracking_callback: @@ -224,6 +255,8 @@ async def _request_paper_expansion( except 
FreeModelExhaustedError: raise except Exception as e: + if _is_tier3_model_call_failure(e): + raise logger.error(f"CertaintyAssessor: Error requesting expansion: {e}") return [] @@ -386,8 +419,13 @@ async def _generate_assessment( logger.error("CertaintyAssessor: Cannot fit even summary-only prompt") return None - # Generate task ID task_id = self.get_current_task_id() + await api_client_manager.prewarm_assistant_memory_context( + task_id=task_id, + role_id=self.role_id, + prompt=prompt, + ) + self.task_sequence += 1 if self.task_tracking_callback: @@ -428,6 +466,8 @@ async def _generate_assessment( except FreeModelExhaustedError: raise except Exception as e: + if _is_tier3_model_call_failure(e): + raise logger.error(f"CertaintyAssessor: Error generating assessment: {e}") return None @@ -453,7 +493,7 @@ async def _validate_assessment( # Validate prompt size prompt_tokens = count_tokens(prompt) - max_input = self._calculate_max_input_tokens() + max_input = self._calculate_validator_max_input_tokens() if prompt_tokens > max_input: logger.error(f"CertaintyAssessor: Validation prompt too large ({prompt_tokens} > {max_input})") @@ -473,7 +513,7 @@ async def _validate_assessment( role_id=f"{self.role_id}_validator", model=self.validator_model, messages=[{"role": "user", "content": prompt}], - max_tokens=self.max_output_tokens, + max_tokens=self.validator_max_output_tokens, temperature=0.0 ) @@ -500,6 +540,8 @@ async def _validate_assessment( except FreeModelExhaustedError: raise except Exception as e: + if _is_tier3_model_call_failure(e): + raise logger.error(f"CertaintyAssessor: Error validating assessment: {e}") return False, str(e) diff --git a/backend/autonomous/agents/final_answer/volume_organizer.py b/backend/autonomous/agents/final_answer/volume_organizer.py index 1815324..b3a26d4 100644 --- a/backend/autonomous/agents/final_answer/volume_organizer.py +++ b/backend/autonomous/agents/final_answer/volume_organizer.py @@ -19,6 +19,10 @@ from 
backend.shared.api_client_manager import api_client_manager from backend.shared.openrouter_client import FreeModelExhaustedError +from backend.shared.model_error_utils import ( + is_non_retryable_model_error, + is_transient_model_call_error, +) from backend.shared.json_parser import parse_json from backend.shared.response_extraction import extract_message_text from backend.shared.utils import count_tokens @@ -37,6 +41,17 @@ logger = logging.getLogger(__name__) +def _is_tier3_model_call_failure(exc: Exception) -> bool: + message = str(exc or "").lower() + return ( + is_non_retryable_model_error(exc) + or is_transient_model_call_error(exc) + or "upstream provider timeout" in message + or "response missing 'choices'" in message + or "no api key" in message + ) + + class VolumeOrganizer: """ Agent that organizes volume structure for long form answers. @@ -57,12 +72,16 @@ def __init__( submitter_model: str, validator_model: str, context_window: int = 0, - max_output_tokens: int = 0 + max_output_tokens: int = 0, + validator_context_window: Optional[int] = None, + validator_max_output_tokens: Optional[int] = None, ): self.submitter_model = submitter_model self.validator_model = validator_model self.context_window = context_window self.max_output_tokens = max_output_tokens + self.validator_context_window = validator_context_window or context_window + self.validator_max_output_tokens = validator_max_output_tokens or max_output_tokens # Task tracking for workflow panel and boost integration self.task_sequence: int = 0 @@ -80,6 +99,13 @@ def get_current_task_id(self) -> str: def _calculate_max_input_tokens(self) -> int: """Calculate available tokens for input prompt.""" return rag_config.get_available_input_tokens(self.context_window, self.max_output_tokens) + + def _calculate_validator_max_input_tokens(self) -> int: + """Calculate available tokens for validator prompts.""" + return rag_config.get_available_input_tokens( + self.validator_context_window, + 
self.validator_max_output_tokens, + ) async def organize_volume( self, @@ -193,6 +219,13 @@ async def _generate_organization( validator_feedback=validator_feedback ) + task_id = self.get_current_task_id() + await api_client_manager.prewarm_assistant_memory_context( + task_id=task_id, + role_id=self.role_id, + prompt=prompt, + ) + # Validate prompt size prompt_tokens = count_tokens(prompt) max_input = self._calculate_max_input_tokens() @@ -201,8 +234,6 @@ async def _generate_organization( logger.error(f"VolumeOrganizer: Prompt too large ({prompt_tokens} > {max_input})") return None - # Generate task ID - task_id = self.get_current_task_id() self.task_sequence += 1 if self.task_tracking_callback: @@ -280,6 +311,8 @@ async def _generate_organization( except FreeModelExhaustedError: raise except Exception as e: + if _is_tier3_model_call_failure(e): + raise logger.error(f"VolumeOrganizer: Error generating organization: {e}") return None @@ -340,7 +373,7 @@ async def _validate_organization( # Validate prompt size prompt_tokens = count_tokens(prompt) - max_input = self._calculate_max_input_tokens() + max_input = self._calculate_validator_max_input_tokens() if prompt_tokens > max_input: logger.error(f"VolumeOrganizer: Validation prompt too large ({prompt_tokens} > {max_input})") @@ -360,7 +393,7 @@ async def _validate_organization( role_id=f"{self.role_id}_validator", model=self.validator_model, messages=[{"role": "user", "content": prompt}], - max_tokens=self.max_output_tokens, + max_tokens=self.validator_max_output_tokens, temperature=0.0 ) @@ -387,6 +420,8 @@ async def _validate_organization( except FreeModelExhaustedError: raise except Exception as e: + if _is_tier3_model_call_failure(e): + raise logger.error(f"VolumeOrganizer: Error validating organization: {e}") return False, str(e) diff --git a/backend/autonomous/agents/lemma_search_agent.py b/backend/autonomous/agents/lemma_search_agent.py index 3aa7cf8..f0f8934 100644 --- 
a/backend/autonomous/agents/lemma_search_agent.py +++ b/backend/autonomous/agents/lemma_search_agent.py @@ -252,10 +252,15 @@ async def suggest_relevant_lemmas( ) prompt_tokens = count_tokens(prompt) + task_id = self.get_current_task_id() + await api_client_manager.prewarm_assistant_memory_context( + task_id=task_id, + role_id=self.role_id, + prompt=prompt, + ) if prompt_tokens > max_input_tokens: return [] - task_id = self.get_current_task_id() self.task_sequence += 1 try: diff --git a/backend/autonomous/agents/paper_title_selector.py b/backend/autonomous/agents/paper_title_selector.py index 198fd94..125527b 100644 --- a/backend/autonomous/agents/paper_title_selector.py +++ b/backend/autonomous/agents/paper_title_selector.py @@ -12,6 +12,10 @@ from backend.shared.api_client_manager import api_client_manager from backend.shared.openrouter_client import FreeModelExhaustedError +from backend.shared.model_error_utils import ( + is_non_retryable_model_error, + is_transient_model_call_error, +) from backend.shared.json_parser import parse_json from backend.shared.response_extraction import extract_message_text from backend.shared.models import PaperTitleSelection @@ -25,6 +29,19 @@ logger = logging.getLogger(__name__) +def _is_title_model_call_failure(exc: Exception) -> bool: + message = str(exc or "").lower() + return ( + is_non_retryable_model_error(exc) + or is_transient_model_call_error(exc) + or "upstream provider timeout" in message + or "response missing 'choices'" in message + or "no api key" in message + or "exceeds context" in message + or "exceeds the configured" in message + ) + + class PaperTitleSelectorAgent: """ Agent that selects titles for papers. 
@@ -36,12 +53,16 @@ def __init__( model_id: str, validator_model_id: str, context_window: int = 0, - max_output_tokens: int = 0 + max_output_tokens: int = 0, + validator_context_window: Optional[int] = None, + validator_max_output_tokens: Optional[int] = None, ): self.model_id = model_id self.validator_model_id = validator_model_id self.context_window = context_window self.max_output_tokens = max_output_tokens + self.validator_context_window = validator_context_window or context_window + self.validator_max_output_tokens = validator_max_output_tokens or max_output_tokens # Task tracking for workflow panel and boost integration self.task_sequence: int = 0 @@ -231,12 +252,19 @@ async def _generate_title( candidate_titles=candidate_titles ) + task_id = self.get_current_task_id() + await api_client_manager.prewarm_assistant_memory_context( + task_id=task_id, + role_id=self.role_id, + prompt=prompt, + ) + if count_tokens(prompt) > max_input_tokens: logger.error("PaperTitleSelector: Cannot fit prompt even after all truncation") - return None + raise ValueError( + "Title generation prompt exceeds context limit even after shedding optional title context." 
+ ) - # Generate task ID for tracking - task_id = self.get_current_validation_task_id() self.task_sequence += 1 # Notify task started (for workflow panel) @@ -284,6 +312,8 @@ async def _generate_title( except FreeModelExhaustedError: raise except Exception as e: + if _is_title_model_call_failure(e): + raise logger.error(f"PaperTitleSelector: Error generating title: {e}") if self.task_tracking_callback and 'task_id' in dir(): self.task_tracking_callback("completed", task_id) @@ -316,9 +346,20 @@ async def _validate_title( proposed_title=proposed_title, title_reasoning=title_reasoning ) + + max_input_tokens = rag_config.get_available_input_tokens( + self.validator_context_window, + self.validator_max_output_tokens, + ) + prompt_tokens = count_tokens(prompt) + if prompt_tokens > max_input_tokens: + raise ValueError( + "Title validation prompt exceeds the configured validator context window " + f"({prompt_tokens} > {max_input_tokens})." + ) # Generate task ID for validation tracking - task_id = self.get_current_task_id() + task_id = self.get_current_validation_task_id() self.task_sequence += 1 # Notify task started (for workflow panel) @@ -333,6 +374,7 @@ async def _validate_title( role_id="autonomous_paper_title_validator", model=self.validator_model_id, messages=[{"role": "user", "content": prompt}], + max_tokens=self.validator_max_output_tokens, temperature=0.0 # Deterministic validation - evolving context provides diversity ) @@ -363,6 +405,8 @@ async def _validate_title( except FreeModelExhaustedError: raise except Exception as e: + if _is_title_model_call_failure(e): + raise logger.error(f"PaperTitleSelector: Error validating title: {e}") if self.task_tracking_callback and 'task_id' in dir(): self.task_tracking_callback("completed", task_id) diff --git a/backend/autonomous/agents/proof_formalization_agent.py b/backend/autonomous/agents/proof_formalization_agent.py index fe4645f..c8330db 100644 --- a/backend/autonomous/agents/proof_formalization_agent.py +++ 
b/backend/autonomous/agents/proof_formalization_agent.py @@ -4,20 +4,25 @@ from __future__ import annotations import json +import hashlib import logging -from typing import Awaitable, Callable, List, Optional, Tuple +from typing import Any, Awaitable, Callable, List, Optional, Tuple from backend.shared.api_client_manager import api_client_manager from backend.shared.json_parser import parse_json from backend.shared.response_extraction import extract_message_text from backend.shared.lean4_client import get_lean4_client from backend.shared.model_error_utils import ( + format_transient_provider_error, is_non_retryable_model_error, is_retryable_model_output_error, is_transient_model_call_error, ) from backend.shared.models import ProofAttemptFeedback, ProofCandidate, SmtHint from backend.shared.openrouter_client import FreeModelExhaustedError +from backend.shared.proof_search.tool_adapter import execute_search_lean_proofs +from backend.shared.proof_search.assistant_coordinator import assistant_proof_search_coordinator +from backend.shared.proof_search.assistant_models import AssistantTargetSnapshot from backend.shared.utils import count_tokens from backend.shared.config import rag_config, system_config from backend.autonomous.prompts.proof_prompts import ( @@ -27,6 +32,17 @@ logger = logging.getLogger(__name__) + +def _assistant_workflow_mode_for_role(role_id: str) -> str: + normalized = (role_id or "").lower() + if "manual" in normalized or "compiler_aggregator" in normalized: + return "manual_proof_check" + if normalized.startswith("compiler") or normalized.startswith("comp_"): + return "compiler" + if normalized.startswith("leanoj"): + return "leanoj" + return "autonomous" + AttemptCallback = Callable[[ProofAttemptFeedback], Awaitable[None]] AttemptStartCallback = Callable[[int, str], Awaitable[None]] ShouldStopFn = Optional[Callable[[], bool]] @@ -62,8 +78,9 @@ ) _LEAN_WORKSPACE_ERROR_PREFIX = "LEAN 4 WORKSPACE ERROR" _MANDATORY_FULL_SOURCE_CONTEXT_OVERFLOW_PREFIX 
= "MANDATORY FULL SOURCE CONTEXT OVERFLOW" - - +_PROOF_SEARCH_CONTEXT_OMITTED = ( + "[Proof-search context omitted because it was unavailable or did not fit the configured context budget.]" +) def _is_stop_requested(should_stop: ShouldStopFn) -> bool: if should_stop is None: return False @@ -73,6 +90,24 @@ def _is_stop_requested(should_stop: ShouldStopFn) -> bool: return False +def _format_attempt_feedback_for_assistant(attempts: list[ProofAttemptFeedback], limit: int = 5) -> str: + if not attempts: + return "" + lines: list[str] = [] + for attempt in attempts[-limit:]: + parts = [ + f"Attempt {attempt.attempt}", + f"strategy={attempt.strategy}", + f"success={attempt.success}", + f"reasoning={attempt.reasoning}", + f"lean_error={attempt.error_output}", + f"goal_states={attempt.goal_states}", + ] + lines.append("\n".join(part for part in parts if part and part.split("=", 1)[-1].strip())) + text = "\n\n---\n\n".join(lines) + return text[:5000] + ("..." if len(text) > 5000 else "") + + def _is_json_parse_error(exc: Exception) -> bool: if isinstance(exc, json.JSONDecodeError): return True @@ -208,8 +243,75 @@ def _fit_prompt_to_context( source_excerpt = source_excerpt[: max(len(source_excerpt) // 2, min_excerpt_length)] prompt = prompt_builder(source_excerpt=source_excerpt, **prompt_kwargs) prompt_tokens = count_tokens(prompt) + if ( + prompt_tokens > max_input_tokens + and prompt_kwargs.get("retrieved_proofs_context") + and prompt_kwargs.get("retrieved_proofs_context") != _PROOF_SEARCH_CONTEXT_OMITTED + ): + prompt_kwargs["retrieved_proofs_context"] = _PROOF_SEARCH_CONTEXT_OMITTED + prompt = prompt_builder(source_excerpt=source_excerpt, **prompt_kwargs) + prompt_tokens = count_tokens(prompt) return prompt, source_excerpt, max_input_tokens, prompt_tokens + async def _record_syntheticlib4_context_exposure( + self, + records: list[dict[str, Any]], + *, + theorem_candidate: ProofCandidate, + lean_code: str, + ) -> None: + """ + Persist local usage metadata when full 
SyntheticLib4 code was model-visible. + + This is intentionally conservative: it records `entire_code_used=false` + because MOTO cannot prove the generated proof consumed an external proof + as a whole dependency unless a later artifact/dependency extractor says so. + """ + used_proofs: list[dict[str, str]] = [] + for record in records: + if record.get("corpus") != "syntheticlib4": + continue + if not str(record.get("lean_code") or "").strip(): + continue + fingerprint = str(record.get("fingerprint") or record.get("proof_id") or "").strip() + statement_hash = str(record.get("theorem_statement_hash") or "").strip() + code_hash = str(record.get("lean_code_hash") or "").strip() + if not fingerprint: + continue + used_proofs.append( + { + "fingerprint": fingerprint, + "theorem_statement_hash": statement_hash, + "lean_code_hash": code_hash, + } + ) + if not used_proofs: + return + + artifact_hash = hashlib.sha256( + "\n\n".join( + [ + theorem_candidate.theorem_id, + theorem_candidate.statement, + lean_code or "", + ] + ).encode("utf-8") + ).hexdigest() + result = await execute_search_lean_proofs( + { + "action": "attest_usage", + "usage_attestation": { + "retrieval_batch_id": "local_proof_search_prefetch", + "used_proofs": used_proofs, + "entire_code_used": False, + "usage_type": "model_visible_context", + "moto_artifact_hash": artifact_hash, + }, + } + ) + if not result.get("success"): + logger.debug("SyntheticLib4 local context-exposure attestation failed: %s", result.get("error")) + async def _run_full_script_attempt( self, *, @@ -222,6 +324,8 @@ async def _run_full_script_attempt( attempt_number: int, smt_hint: Optional[SmtHint] = None, source_title: str = "", + retrieved_proofs_context: str = "", + assistant_memory_target_hash: str = "", ) -> tuple[str, str, ProofAttemptFeedback]: prompt, source_excerpt, max_input_tokens, prompt_tokens = self._fit_prompt_to_context( build_proof_formalization_prompt, @@ -240,6 +344,7 @@ async def _run_full_script_attempt( 
prompt_relevance_rationale=theorem_candidate.prompt_relevance_rationale, novelty_rationale=theorem_candidate.novelty_rationale, why_not_standard_known_result=theorem_candidate.why_not_standard_known_result, + retrieved_proofs_context=retrieved_proofs_context, ) if prompt_tokens > max_input_tokens: @@ -249,7 +354,9 @@ async def _run_full_script_attempt( reasoning="Mandatory full-source proof context is too large for the configured context window.", error_output=( f"{_MANDATORY_FULL_SOURCE_CONTEXT_OVERFLOW_PREFIX}: Prompt too large after shrinking only the focused excerpt " - f"({prompt_tokens} > {max_input_tokens}). Full source content is mandatory " + f"({prompt_tokens} > {max_input_tokens}). Configured total context={self.context_window}, " + f"max output reserve={self.max_output_tokens}, safety buffer={rag_config.context_buffer_tokens}. " + "Full source content is mandatory " "and was not truncated or dropped." ), strategy="full_script", @@ -269,6 +376,12 @@ async def _run_full_script_attempt( max_tokens=self.max_output_tokens, temperature=0.0, ) + if assistant_memory_target_hash: + assistant_proof_search_coordinator.mark_pack_consumed_by_solver( + assistant_memory_target_hash, + role_id=self.role_id, + task_id=task_id, + ) if not response or not response.get("choices"): raise ValueError("Empty response from formalization model.") @@ -312,7 +425,7 @@ async def _run_full_script_attempt( if is_retryable_model_output_error(exc): raise RuntimeError(_INCOMPLETE_MODEL_OUTPUT_ERROR) from exc if is_transient_model_call_error(exc): - raise RuntimeError(_TRANSIENT_PROVIDER_ERROR) from exc + raise RuntimeError(format_transient_provider_error(exc)) from exc is_parse_error = _is_json_parse_error(exc) feedback = ProofAttemptFeedback( attempt=attempt_number, @@ -358,6 +471,30 @@ async def prove_candidate( theorem_candidate.statement, source_content, ) + assistant_snapshot = AssistantTargetSnapshot( + workflow_mode=_assistant_workflow_mode_for_role(self.role_id), + 
target_kind="proof_candidate", + user_prompt=user_research_prompt, + target_statement=theorem_candidate.statement, + formal_sketch=theorem_candidate.formal_sketch, + proof_attempt_feedback=_format_attempt_feedback_for_assistant(attempts), + source_title=source_title, + source_type=source_type, + source_id=theorem_candidate.origin_source_id, + dependency_names=[ + str(getattr(lemma, "full_name", "") or getattr(lemma, "requested_name", "") or "").strip() + for lemma in (theorem_candidate.relevant_lemmas or []) + if str(getattr(lemma, "full_name", "") or getattr(lemma, "requested_name", "") or "").strip() + ], + ) + assistant_target_hash = assistant_proof_search_coordinator.submit_target(assistant_snapshot) + assistant_pack = assistant_proof_search_coordinator.get_latest_pack(assistant_target_hash) + retrieved_proofs_context = assistant_pack.to_prompt_context() if assistant_pack else "" + retrieved_proof_records: list[dict[str, Any]] = ( + [support.model_dump(mode="json") for support in assistant_pack.results] + if assistant_pack + else [] + ) theorem_name = "" next_attempt_number = ( @@ -378,6 +515,17 @@ async def prove_candidate( theorem_candidate.theorem_id, ) break + if attempts: + assistant_snapshot = assistant_snapshot.model_copy( + update={"proof_attempt_feedback": _format_attempt_feedback_for_assistant(attempts)} + ) + assistant_target_hash = assistant_proof_search_coordinator.submit_target(assistant_snapshot) + assistant_pack = assistant_proof_search_coordinator.get_latest_pack(assistant_target_hash) + if assistant_pack: + retrieved_proofs_context = assistant_pack.to_prompt_context() + retrieved_proof_records = [ + support.model_dump(mode="json") for support in assistant_pack.results + ] attempt_number = next_attempt_number + attempt_offset if attempt_start_callback and malformed_output_retries == 0: await attempt_start_callback(attempt_number, "full_script") @@ -392,6 +540,8 @@ async def prove_candidate( attempt_number=attempt_number, smt_hint=smt_hint, 
source_title=source_title, + retrieved_proofs_context=retrieved_proofs_context, + assistant_memory_target_hash=assistant_target_hash if assistant_pack and assistant_pack.results else "", ) terminal_malformed_output = False @@ -417,6 +567,11 @@ async def prove_candidate( await attempt_callback(feedback) if feedback.success: + await self._record_syntheticlib4_context_exposure( + retrieved_proof_records, + theorem_candidate=theorem_candidate, + lean_code=feedback.lean_code, + ) return True, theorem_name, feedback.lean_code, attempts if _is_lean_workspace_error_feedback(feedback): break @@ -451,6 +606,30 @@ async def prove_candidate_tactic_script( theorem_candidate.statement, source_content, ) + assistant_snapshot = AssistantTargetSnapshot( + workflow_mode=_assistant_workflow_mode_for_role(self.role_id), + target_kind="proof_candidate", + user_prompt=user_research_prompt, + target_statement=theorem_candidate.statement, + formal_sketch=theorem_candidate.formal_sketch, + proof_attempt_feedback=_format_attempt_feedback_for_assistant(attempts), + source_title=source_title, + source_type=source_type, + source_id=theorem_candidate.origin_source_id, + dependency_names=[ + str(getattr(lemma, "full_name", "") or getattr(lemma, "requested_name", "") or "").strip() + for lemma in (theorem_candidate.relevant_lemmas or []) + if str(getattr(lemma, "full_name", "") or getattr(lemma, "requested_name", "") or "").strip() + ], + ) + assistant_target_hash = assistant_proof_search_coordinator.submit_target(assistant_snapshot) + assistant_pack = assistant_proof_search_coordinator.get_latest_pack(assistant_target_hash) + retrieved_proofs_context = assistant_pack.to_prompt_context() if assistant_pack else "" + retrieved_proof_records: list[dict[str, Any]] = ( + [support.model_dump(mode="json") for support in assistant_pack.results] + if assistant_pack + else [] + ) theorem_name = "" next_attempt_number = ( @@ -474,6 +653,17 @@ async def prove_candidate_tactic_script( attempt_number = 
next_attempt_number + attempt_offset if attempt_start_callback and malformed_output_retries == 0: await attempt_start_callback(attempt_number, "tactic_script") + if attempts: + assistant_snapshot = assistant_snapshot.model_copy( + update={"proof_attempt_feedback": _format_attempt_feedback_for_assistant(attempts)} + ) + assistant_target_hash = assistant_proof_search_coordinator.submit_target(assistant_snapshot) + assistant_pack = assistant_proof_search_coordinator.get_latest_pack(assistant_target_hash) + if assistant_pack: + retrieved_proofs_context = assistant_pack.to_prompt_context() + retrieved_proof_records = [ + support.model_dump(mode="json") for support in assistant_pack.results + ] prompt, source_excerpt, max_input_tokens, prompt_tokens = self._fit_prompt_to_context( build_proof_tactic_script_prompt, @@ -492,6 +682,7 @@ async def prove_candidate_tactic_script( prompt_relevance_rationale=theorem_candidate.prompt_relevance_rationale, novelty_rationale=theorem_candidate.novelty_rationale, why_not_standard_known_result=theorem_candidate.why_not_standard_known_result, + retrieved_proofs_context=retrieved_proofs_context, ) if prompt_tokens > max_input_tokens: @@ -501,7 +692,9 @@ async def prove_candidate_tactic_script( reasoning="Mandatory full-source proof context is too large for the configured context window.", error_output=( f"{_MANDATORY_FULL_SOURCE_CONTEXT_OVERFLOW_PREFIX}: Prompt too large after shrinking only the focused excerpt " - f"({prompt_tokens} > {max_input_tokens}). Full source content is mandatory " + f"({prompt_tokens} > {max_input_tokens}). Configured total context={self.context_window}, " + f"max output reserve={self.max_output_tokens}, safety buffer={rag_config.context_buffer_tokens}. " + "Full source content is mandatory " "and was not truncated or dropped." 
  ),
  strategy="tactic_script",
@@ -524,6 +717,12 @@ async def prove_candidate_tactic_script(
  max_tokens=self.max_output_tokens,
  temperature=0.0,
  )
+ if assistant_target_hash and assistant_pack and assistant_pack.results:
+ assistant_proof_search_coordinator.mark_pack_consumed_by_solver(
+ assistant_target_hash,
+ role_id=self.role_id,
+ task_id=task_id,
+ )

  if not response or not response.get("choices"):
  raise ValueError("Empty response from tactic formalization model.")
@@ -561,6 +760,8 @@ async def prove_candidate_tactic_script(
  attempt_number=attempt_number,
  smt_hint=smt_hint,
  source_title=source_title,
+ retrieved_proofs_context=retrieved_proofs_context,
+ assistant_memory_target_hash=assistant_target_hash if assistant_pack and assistant_pack.results else "",
  )
  if current_theorem_name:
  theorem_name = current_theorem_name
@@ -583,6 +784,11 @@ async def prove_candidate_tactic_script(
  if attempt_callback:
  await attempt_callback(feedback)
  if feedback.success:
+ await self._record_syntheticlib4_context_exposure(
+ retrieved_proof_records,
+ theorem_candidate=theorem_candidate,
+ lean_code=feedback.lean_code,
+ )
  return True, theorem_name, feedback.lean_code, attempts
  if _is_lean_workspace_error_feedback(feedback):
  break
@@ -616,6 +822,11 @@ async def prove_candidate_tactic_script(
  await attempt_callback(feedback)

  if lean_result.success:
+ await self._record_syntheticlib4_context_exposure(
+ retrieved_proof_records,
+ theorem_candidate=theorem_candidate,
+ lean_code=lean_code,
+ )
  return True, theorem_name, lean_code, attempts
  if _is_lean_workspace_error_feedback(feedback):
  break
@@ -630,7 +841,7 @@ async def prove_candidate_tactic_script(
  if is_retryable_model_output_error(exc):
  raise RuntimeError(_INCOMPLETE_MODEL_OUTPUT_ERROR) from exc
  if is_transient_model_call_error(exc):
- raise RuntimeError(_TRANSIENT_PROVIDER_ERROR) from exc
+ raise RuntimeError(format_transient_provider_error(exc)) from exc
  is_parse_error = _is_json_parse_error(exc)
  feedback = ProofAttemptFeedback(
  attempt=attempt_number,
diff --git a/backend/autonomous/agents/proof_identification_agent.py b/backend/autonomous/agents/proof_identification_agent.py
index 5a90a25..170f34b 100644
--- a/backend/autonomous/agents/proof_identification_agent.py
+++ b/backend/autonomous/agents/proof_identification_agent.py
@@ -82,6 +82,12 @@ async def translate_candidate_to_smt(
  )

  prompt_tokens = count_tokens(prompt)
+ task_id = self.get_current_task_id()
+ await api_client_manager.prewarm_assistant_memory_context(
+ task_id=task_id,
+ role_id=self.role_id,
+ prompt=prompt,
+ )
  if prompt_tokens > max_input_tokens:
  logger.debug(
  "SMT translation prompt exceeds context window (%s > %s) for theorem %s",
@@ -91,7 +97,6 @@ async def translate_candidate_to_smt(
  )
  return ""

- task_id = self.get_current_task_id()
  self.task_sequence += 1

  try:
@@ -153,10 +158,18 @@ async def identify_candidates(
  )
  prompt_tokens = count_tokens(prompt)
  max_input_tokens = rag_config.get_available_input_tokens(self.context_window, self.max_output_tokens)
+ task_id = self.get_current_task_id()
+ await api_client_manager.prewarm_assistant_memory_context(
+ task_id=task_id,
+ role_id=self.role_id,
+ prompt=prompt,
+ )
  if prompt_tokens > max_input_tokens:
  message = (
  "Proof identification prompt exceeds the configured context window "
  f"({prompt_tokens} > {max_input_tokens}) for {source_type} {source_id}. "
+ f"Configured total context={self.context_window}, max output reserve={self.max_output_tokens}, "
+ f"safety buffer={rag_config.context_buffer_tokens}. "
  "Full source content is mandatory for proof discovery and was not "
  "truncated or replaced with an excerpt. Increase the proof role "
  "context window or reduce the source size before retrying."
@@ -164,7 +177,6 @@ async def identify_candidates(
  logger.warning(message)
  raise ValueError(message)

- task_id = self.get_current_task_id()
  self.task_sequence += 1

  try:
@@ -177,35 +189,58 @@ async def identify_candidates(
  temperature=0.0,
  )
  if not response or not response.get("choices"):
- return False, []
+ raise ValueError("Proof identification returned no model choices.")
  message = response["choices"][0].get("message", {})
  content = extract_message_text(message)
  if not content:
- return False, []
+ raise ValueError("Proof identification returned empty model output.")

  data = parse_json(content)
  if isinstance(data, list):
- data = data[0] if data else {}
+ if not data:
+ raise ValueError("Proof identification returned an empty JSON array.")
+ data = data[0]
+ if not isinstance(data, dict):
+ raise ValueError("Proof identification returned JSON that was not an object.")
+ if "has_provable_theorems" not in data:
+ raise ValueError("Proof identification JSON omitted has_provable_theorems.")
+ if not isinstance(data.get("has_provable_theorems"), bool):
+ raise ValueError("Proof identification has_provable_theorems must be a boolean.")

- has_candidates = bool(data.get("has_provable_theorems", False))
- raw_theorems = data.get("theorems", []) or []
+ has_candidates = data["has_provable_theorems"]
+ raw_theorems_value = data.get("theorems", [])
+ if raw_theorems_value is None:
+ raw_theorems_value = []
+ if not isinstance(raw_theorems_value, list):
+ raise ValueError("Proof identification theorems must be an array.")
+ if has_candidates and not raw_theorems_value:
+ raise ValueError(
+ "Proof identification claimed provable theorems but returned no theorem entries."
+ )
+ raw_theorems = raw_theorems_value

  theorem_candidates: List[ProofCandidate] = []
+ malformed_candidate_count = 0
+ non_novel_candidate_count = 0
  for index, theorem in enumerate(raw_theorems, start=1):
  if not isinstance(theorem, dict):
+ malformed_candidate_count += 1
  continue
  statement = str(theorem.get("statement", "")).strip()
  if not statement:
+ malformed_candidate_count += 1
  continue
  theorem_id = theorem.get("theorem_id") or theorem.get("id") or f"thm_{index}"
  expected_novelty_tier = str(theorem.get("expected_novelty_tier", "")).strip().lower()
  if expected_novelty_tier == "not_novel":
+ non_novel_candidate_count += 1
  logger.info(
  "ProofIdentificationAgent skipped theorem %s because it was marked not_novel.",
  theorem_id,
  )
  continue
  if expected_novelty_tier not in _NOVEL_PROOF_TIERS:
+ malformed_candidate_count += 1
  logger.info(
  "ProofIdentificationAgent skipped theorem %s because it did not include a valid expected_novelty_tier.",
  theorem_id,
@@ -223,6 +258,7 @@ async def identify_candidates(
  and novelty_rationale
  and why_not_standard_known_result
  ):
+ malformed_candidate_count += 1
  logger.info(
  "ProofIdentificationAgent skipped theorem %s because it lacked required prompt-relevance, novelty, or anti-standard-result rationale.",
  theorem_id,
@@ -240,6 +276,12 @@ async def identify_candidates(
  )
  )

+ if has_candidates and not theorem_candidates and malformed_candidate_count:
+ raise ValueError(
+ "Proof identification claimed provable theorems but returned no valid theorem candidates "
+ f"({malformed_candidate_count} malformed, {non_novel_candidate_count} not_novel)."
+ )
+
  return has_candidates and bool(theorem_candidates), theorem_candidates
  except FreeModelExhaustedError:
  raise
@@ -252,4 +294,4 @@ async def identify_candidates(
  source_id,
  exc,
  )
- return False, []
+ raise
diff --git a/backend/autonomous/agents/reference_selector.py b/backend/autonomous/agents/reference_selector.py
index 1f50394..52c82cb 100644
--- a/backend/autonomous/agents/reference_selector.py
+++ b/backend/autonomous/agents/reference_selector.py
@@ -8,9 +8,9 @@

 This is the crucial mechanism that enables COMPOUNDING KNOWLEDGE across research
 cycles. By selecting reference papers before brainstorming, submitters can:
-- Build upon proven mathematical frameworks from prior papers
+- Build upon promising mathematical frameworks from prior AI-generated papers while independently re-checking their claims
 - Avoid re-exploring territory already covered in depth
-- Identify novel connections between new topics and established results
+- Identify novel connections between new topics and previously explored results
 - Accelerate convergence on valuable insights by standing on prior work

 CONTEXT HANDLING:
@@ -24,6 +24,10 @@

 from backend.shared.api_client_manager import api_client_manager
 from backend.shared.openrouter_client import FreeModelExhaustedError
+from backend.shared.model_error_utils import (
+ is_non_retryable_model_error,
+ is_transient_model_call_error,
+)
 from backend.shared.json_parser import parse_json
 from backend.shared.response_extraction import extract_message_text
 from backend.shared.utils import count_tokens
@@ -40,6 +44,18 @@
 logger = logging.getLogger(__name__)


+def _is_reference_model_call_failure(exc: Exception) -> bool:
+ message = str(exc or "").lower()
+ return (
+ is_non_retryable_model_error(exc)
+ or is_transient_model_call_error(exc)
+ or "upstream provider timeout" in message
+ or "response missing 'choices'" in message
+ or "no api key" in message
+ or "exceeds context limit" in message
+ )
+
+
 class ReferenceSelectorAgent:
  """
  Agent that selects reference papers for paper compilation.
@@ -148,7 +164,7 @@ async def select_references(

  if expansion_request is None:
  logger.error(f"ReferenceSelector [{mode}]: Failed to get expansion request")
- return []
+ raise RuntimeError(f"Reference selection failed during expansion request for mode={mode}")

  # Check if proceeding without references
  if expansion_request.proceed_without_references:
@@ -228,6 +244,13 @@ async def _request_expansion(
  max_total_papers=max_total_papers,
  )

+ task_id = self.get_current_task_id()
+ await api_client_manager.prewarm_assistant_memory_context(
+ task_id=task_id,
+ role_id=self.role_id,
+ prompt=prompt,
+ )
+
  # Validate prompt size
  prompt_tokens = count_tokens(prompt)
  max_input = self._calculate_max_input_tokens()
@@ -236,8 +259,6 @@ async def _request_expansion(
  logger.error(f"ReferenceSelector: Expansion prompt ({prompt_tokens} tokens) exceeds limit ({max_input})")
  return None

- # Generate task ID for tracking
- task_id = self.get_current_task_id()
  self.task_sequence += 1

  # Notify task started (for workflow panel)
@@ -283,6 +304,8 @@ async def _request_expansion(
  except FreeModelExhaustedError:
  raise
  except Exception as e:
+ if _is_reference_model_call_failure(e):
+ raise
  logger.error(f"ReferenceSelector: Error requesting expansion: {e}")
  if self.task_tracking_callback and 'task_id' in dir():
  self.task_tracking_callback("completed", task_id)
@@ -360,6 +383,7 @@ async def _make_final_selection(

  # Reserve ~40% of context for papers, rest for prompts/brainstorm
  paper_budget = int(max_input * 0.4)
+ retrieved_context = ""
  if total_paper_tokens <= paper_budget:
  # All papers fit - use direct injection
  logger.info(f"ReferenceSelector [{mode}]: Direct injection for {len(expanded_papers)} papers "
@@ -378,12 +402,17 @@ async def _make_final_selection(
  query=f"{user_research_prompt} {topic_prompt}"
  )

- # Create modified papers list with RAG content
- papers_for_prompt = [{
- "paper_id": "combined_rag",
- "title": f"RAG-retrieved content from {len(expanded_papers)} papers",
- "content": rag_content
- }]
+ retrieved_context = rag_content or ""
+ papers_for_prompt = []
+ for paper in expanded_papers:
+ papers_for_prompt.append({
+ **paper,
+ "content": (
+ "[Full paper content omitted from this per-paper slot because the expanded set "
+ "exceeded the context budget. Use this paper's metadata together with the "
+ "RAG-retrieved full-paper evidence block below.]"
+ ),
+ })

  # Build prompt with prepared papers
  prompt = build_reference_selection_prompt(
@@ -392,18 +421,53 @@ async def _make_final_selection(
  brainstorm_summary=brainstorm_summary,
  expanded_papers=papers_for_prompt,
  mode=mode,
- max_papers=max_papers
+ max_papers=max_papers,
+ retrieved_context=retrieved_context,
  )

+ task_id = self.get_current_task_id()
+ await api_client_manager.prewarm_assistant_memory_context(
+ task_id=task_id,
+ role_id=self.role_id,
+ prompt=prompt,
+ )
+
  # Validate prompt size
  prompt_tokens = count_tokens(prompt)
  if prompt_tokens > max_input:
  logger.error(f"ReferenceSelector [{mode}]: Prompt ({prompt_tokens} tokens) still exceeds limit ({max_input})")
- # Fall back to selecting from abstracts only
- return [p.get("paper_id") for p in expanded_papers[:max_papers]]
+ metadata_only_papers = [
+ {
+ **paper,
+ "content": (
+ "[Full content omitted because even retrieved evidence exceeded the context budget. "
" + "Select only if this paper's title, abstract, and outline are clearly very useful.]" + ), + } + for paper in expanded_papers + ] + prompt = build_reference_selection_prompt( + user_research_prompt=user_research_prompt, + topic_prompt=topic_prompt, + brainstorm_summary=brainstorm_summary, + expanded_papers=metadata_only_papers, + mode=mode, + max_papers=max_papers, + ) + prompt_tokens = count_tokens(prompt) + if prompt_tokens > max_input: + logger.error( + "ReferenceSelector [%s]: Metadata-only selection prompt still exceeds limit (%s > %s); " + "failing visibly rather than selecting by list order.", + mode, + prompt_tokens, + max_input, + ) + raise ValueError( + f"Reference metadata-only final-selection prompt exceeds context limit " + f"({prompt_tokens} > {max_input})." + ) - # Generate task ID for tracking - task_id = self.get_current_task_id() self.task_sequence += 1 # Notify task started (for workflow panel) @@ -453,6 +517,8 @@ async def _make_final_selection( except FreeModelExhaustedError: raise except Exception as e: + if _is_reference_model_call_failure(e): + raise logger.error(f"ReferenceSelector [{mode}]: Error making final selection: {e}") if self.task_tracking_callback and 'task_id' in dir(): self.task_tracking_callback("completed", task_id) diff --git a/backend/autonomous/agents/topic_selector.py b/backend/autonomous/agents/topic_selector.py index 239b429..63c0135 100644 --- a/backend/autonomous/agents/topic_selector.py +++ b/backend/autonomous/agents/topic_selector.py @@ -131,8 +131,13 @@ async def select_topic( logger.error(f"TopicSelector: Even after truncation, prompt ({prompt_tokens}) exceeds limit ({max_input_tokens})") return None - # Generate task ID for tracking task_id = self.get_current_task_id() + await api_client_manager.prewarm_assistant_memory_context( + task_id=task_id, + role_id=self.role_id, + prompt=prompt, + ) + self.task_sequence += 1 # Notify task started (for workflow panel) diff --git 
a/backend/autonomous/core/autonomous_coordinator.py b/backend/autonomous/core/autonomous_coordinator.py index 2e4256a..1fad7c2 100644 --- a/backend/autonomous/core/autonomous_coordinator.py +++ b/backend/autonomous/core/autonomous_coordinator.py @@ -31,6 +31,10 @@ from backend.shared.json_parser import parse_json from backend.shared.response_extraction import extract_message_text from backend.shared.log_redaction import redact_log_text +from backend.shared.context_overflow import ( + CONTEXT_OVERFLOW_STOP_MESSAGE, + CONTEXT_OVERFLOW_STOP_REASON, +) from backend.shared.provider_pause import ( is_provider_credit_pause_error, mark_provider_paused, @@ -116,6 +120,8 @@ def __init__(self): self._stop_event = asyncio.Event() self._main_task: Optional[asyncio.Task] = None self._stop_broadcast_sent = False + self._fatal_stop_reason: Optional[str] = None + self._fatal_stop_message: str = "" # Configuration (set during initialize) self._user_research_prompt: str = "" @@ -130,17 +136,17 @@ def __init__(self): self._validator_supercharge_enabled: bool = False # Compiler models (separate from aggregator submitters) - self._high_context_model: str = "" + self._writer_model: str = "" self._high_param_model: str = "" - self._high_context_context: int = 0 + self._writer_context: int = 0 self._high_param_context: int = 0 - self._high_context_max_tokens: int = 0 + self._writer_max_tokens: int = 0 self._high_param_max_tokens: int = 0 - self._high_context_provider: str = "lm_studio" - self._high_context_openrouter_provider: Optional[str] = None - self._high_context_openrouter_reasoning_effort: str = "auto" - self._high_context_lm_studio_fallback: Optional[str] = None - self._high_context_supercharge_enabled: bool = False + self._writer_provider: str = "lm_studio" + self._writer_openrouter_provider: Optional[str] = None + self._writer_openrouter_reasoning_effort: str = "auto" + self._writer_lm_studio_fallback: Optional[str] = None + self._writer_supercharge_enabled: bool = False 
  self._high_param_provider: str = "lm_studio"
  self._high_param_openrouter_provider: Optional[str] = None
  self._high_param_openrouter_reasoning_effort: str = "auto"
@@ -154,6 +160,14 @@ def __init__(self):
  self._critique_submitter_openrouter_reasoning_effort: str = "auto"
  self._critique_submitter_lm_studio_fallback: Optional[str] = None
  self._critique_submitter_supercharge_enabled: bool = False
+ self._assistant_provider: str = "lm_studio"
+ self._assistant_model: str = ""
+ self._assistant_openrouter_provider: Optional[str] = None
+ self._assistant_openrouter_reasoning_effort: str = "auto"
+ self._assistant_lm_studio_fallback: Optional[str] = None
+ self._assistant_context: int = 0
+ self._assistant_max_tokens: int = 0
+ self._assistant_supercharge_enabled: bool = False

  # Agents (initialized during setup)
  self._topic_selector: Optional[TopicSelectorAgent] = None
@@ -234,6 +248,11 @@ async def _broadcast(self, event: str, data: Dict[str, Any] = None) -> None:
  # broadcast_event expects (event_type, data) as separate arguments
  await self._broadcast_callback(event, data or {})

+ def _mark_context_overflow_stop(self) -> None:
+ """Remember that the next stopped event should explain the fatal overflow."""
+ self._fatal_stop_reason = CONTEXT_OVERFLOW_STOP_REASON
+ self._fatal_stop_message = CONTEXT_OVERFLOW_STOP_MESSAGE
+
  def _track_child_aggregator(self, aggregator: AggregatorCoordinator) -> None:
  """Track local child aggregators so parent phase changes can stop them."""
  if aggregator not in self._active_child_aggregators:
@@ -324,6 +343,10 @@ def _proof_outputs_enabled(self) -> bool:
  """Return whether this run may produce Lean/proof outputs."""
  return bool(self._allow_mathematical_proofs and system_config.lean4_enabled)

+ def _automatic_proof_max_rounds(self) -> int:
+ """Return automatic proof-round budget for the current output mode."""
+ return 4 if not self._allow_research_papers else 1
+
  async def _save_proofs_only_next_topic_state(self) -> None:
  """Persist a clean topic-selection boundary after a proofs-only cycle."""
  self._state.current_tier = "tier1_aggregation"
@@ -334,28 +357,31 @@ async def _save_proofs_only_next_topic_state(self) -> None:
  self._resume_paper_phase = None
  await self._save_workflow_state(tier="tier1_aggregation", phase="topic_exploration")

+ async def _handle_papers_disabled_after_brainstorm(self) -> None:
+ """Finish a proofs-only brainstorm handoff without entering paper compilation."""
+ await self._broadcast("research_papers_disabled_brainstorm_complete", {
+ "topic_id": self._current_topic_id,
+ "message": "Research paper output is disabled; returning to topic selection after brainstorm proof work."
+ })
+ self._brainstorm_paper_count = 0
+ self._current_brainstorm_paper_ids = []
+ self._last_completed_paper_id = None
+ self._current_reference_papers = []
+ self._current_reference_brainstorms = []
+ logger.info("Research paper output disabled; skipping Tier 2 paper compilation")
+ await self._save_proofs_only_next_topic_state()
+
  def _build_proof_runtime_config_snapshot(self) -> Dict[str, Any]:
  """Build the persisted runtime snapshot used by proof routes/manual checks."""
- first_submitter = self._submitter_configs[0] if self._submitter_configs else None
- brainstorm_config = ProofRoleConfigSnapshot(
- provider=first_submitter.provider if first_submitter else "lm_studio",
- model_id=first_submitter.model_id if first_submitter else self._high_context_model,
- openrouter_provider=first_submitter.openrouter_provider if first_submitter else self._high_context_openrouter_provider,
- openrouter_reasoning_effort=first_submitter.openrouter_reasoning_effort if first_submitter else self._high_context_openrouter_reasoning_effort,
- lm_studio_fallback_id=first_submitter.lm_studio_fallback_id if first_submitter else self._high_context_lm_studio_fallback,
- context_window=first_submitter.context_window if first_submitter else self._high_context_context,
- max_output_tokens=first_submitter.max_output_tokens if first_submitter else self._high_context_max_tokens,
- supercharge_enabled=first_submitter.supercharge_enabled if first_submitter else self._high_context_supercharge_enabled,
- )
- paper_config = ProofRoleConfigSnapshot(
- provider=self._high_context_provider,
- model_id=self._high_context_model,
- openrouter_provider=self._high_context_openrouter_provider,
- openrouter_reasoning_effort=self._high_context_openrouter_reasoning_effort,
- lm_studio_fallback_id=self._high_context_lm_studio_fallback,
- context_window=self._high_context_context,
- max_output_tokens=self._high_context_max_tokens,
- supercharge_enabled=self._high_context_supercharge_enabled,
+ rigor_config = ProofRoleConfigSnapshot(
+ provider=self._high_param_provider,
+ model_id=self._high_param_model,
+ openrouter_provider=self._high_param_openrouter_provider,
+ openrouter_reasoning_effort=self._high_param_openrouter_reasoning_effort,
+ lm_studio_fallback_id=self._high_param_lm_studio_fallback,
+ context_window=self._high_param_context,
+ max_output_tokens=self._high_param_max_tokens,
+ supercharge_enabled=self._high_param_supercharge_enabled,
  )
  validator_config = ProofRoleConfigSnapshot(
  provider=self._validator_provider,
@@ -367,10 +393,21 @@ def _build_proof_runtime_config_snapshot(self) -> Dict[str, Any]:
  max_output_tokens=self._validator_max_tokens,
  supercharge_enabled=self._validator_supercharge_enabled,
  )
+ assistant_config = ProofRoleConfigSnapshot(
+ provider=self._assistant_provider,
+ model_id=self._assistant_model or self._validator_model,
+ openrouter_provider=self._assistant_openrouter_provider,
+ openrouter_reasoning_effort=self._assistant_openrouter_reasoning_effort,
+ lm_studio_fallback_id=self._assistant_lm_studio_fallback,
+ context_window=self._assistant_context,
+ max_output_tokens=self._assistant_max_tokens,
+ supercharge_enabled=self._assistant_supercharge_enabled,
+ )
  return ProofRuntimeConfigSnapshot(
- brainstorm=brainstorm_config,
- paper=paper_config,
+ brainstorm=rigor_config,
+ paper=rigor_config,
  validator=validator_config,
+ assistant=assistant_config,
  ).model_dump(mode="json")

  async def _run_proof_framing_gate(self) -> None:
@@ -518,24 +555,20 @@ async def _run_proof_verification(
  if not content or not source_id:
  return "complete"

- if source_type == "brainstorm":
- submitter_model = self._submitter_configs[0].model_id if self._submitter_configs else self._high_context_model
- submitter_context = self._submitter_configs[0].context_window if self._submitter_configs else self._high_context_context
- submitter_max_tokens = self._submitter_configs[0].max_output_tokens if self._submitter_configs else self._high_context_max_tokens
- else:
- submitter_model = self._high_context_model
- submitter_context = self._high_context_context
- submitter_max_tokens = self._high_context_max_tokens
+ submitter_model = self._high_param_model
+ submitter_context = self._high_param_context
+ submitter_max_tokens = self._high_param_max_tokens

  async def save_proof_checkpoint(checkpoint: Dict[str, Any]) -> None:
  await research_metadata.save_proof_checkpoint(checkpoint)

- automatic_followup_rounds = (
+ automatic_checkpoint = (
  trigger == "automatic"
  and theorem_candidates is None
  and source_type in {"brainstorm", "paper"}
  )
- proof_max_rounds = 4 if automatic_followup_rounds else 1
+ proof_max_rounds = self._automatic_proof_max_rounds() if automatic_checkpoint else 1
+ automatic_followup_rounds = automatic_checkpoint and proof_max_rounds > 1
  prior_round_summaries: List[str] = []

  def round_trigger_name(round_index: int) -> str:
@@ -925,9 +958,9 @@ async def initialize(
  validator_model: str,
  validator_context_window: int = 0,
  validator_max_tokens: int = 0,
- high_context_model: str = "",
- high_context_context_window: int = 0,
- high_context_max_tokens: int = 0,
+ writer_model: str = "",
+ writer_context_window: int = 0,
+ writer_max_tokens: int = 0,
  high_param_model: str = "",
  high_param_context_window: int = 0,
  high_param_max_tokens: int = 0,
@@ -939,30 +972,39 @@ async def initialize(
  validator_openrouter_provider: Optional[str] = None,
  validator_openrouter_reasoning_effort: str = "auto",
  validator_lm_studio_fallback: Optional[str] = None,
- # OpenRouter provider configs for high-context submitter
- high_context_provider: str = "lm_studio",
- high_context_openrouter_provider: Optional[str] = None,
- high_context_openrouter_reasoning_effort: str = "auto",
- high_context_lm_studio_fallback: Optional[str] = None,
- # OpenRouter provider configs for high-param submitter
+ # OpenRouter provider configs for writing submitter
+ writer_provider: str = "lm_studio",
+ writer_openrouter_provider: Optional[str] = None,
+ writer_openrouter_reasoning_effort: str = "auto",
+ writer_lm_studio_fallback: Optional[str] = None,
+ # OpenRouter provider configs for Rigor & Proofs submitter
  high_param_provider: str = "lm_studio",
  high_param_openrouter_provider: Optional[str] = None,
  high_param_openrouter_reasoning_effort: str = "auto",
  high_param_lm_studio_fallback: Optional[str] = None,
- # OpenRouter provider configs for critique submitter
+ # Deprecated critique compatibility fields mirror Rigor & Proofs
  critique_submitter_provider: str = "lm_studio",
  critique_submitter_openrouter_provider: Optional[str] = None,
  critique_submitter_openrouter_reasoning_effort: str = "auto",
  critique_submitter_lm_studio_fallback: Optional[str] = None,
+ # OpenRouter provider configs for Assistant proof retrieval/ranking
+ assistant_provider: str = "lm_studio",
+ assistant_model: str = "",
+ assistant_openrouter_provider: Optional[str] = None,
+ assistant_openrouter_reasoning_effort: str = "auto",
+ assistant_lm_studio_fallback: Optional[str] = None,
+ assistant_context_window: int = 0,
+ assistant_max_tokens: int = 0,
  # Tier 3 Final Answer setting
  tier3_enabled: bool = False,
  creativity_emphasis_boost_enabled: bool = False,
  allow_mathematical_proofs: bool = True,
  allow_research_papers: bool = True,
  validator_supercharge_enabled: bool = False,
- high_context_supercharge_enabled: bool = False,
+ writer_supercharge_enabled: bool = False,
  high_param_supercharge_enabled: bool = False,
- critique_submitter_supercharge_enabled: bool = False
+ critique_submitter_supercharge_enabled: bool = False,
+ assistant_supercharge_enabled: bool = False,
  ) -> None:
  """Initialize the coordinator with configuration."""
  # Use first submitter config for autonomous agents (topic selector, etc.)
@@ -974,9 +1016,9 @@ async def initialize(
  role_limits = {
  "brainstorm submitter": (first_submitter_context, first_submitter_max_tokens),
  "validator": (validator_context_window, validator_max_tokens),
- "high-context submitter": (high_context_context_window, high_context_max_tokens),
- "high-param submitter": (high_param_context_window, high_param_max_tokens),
- "critique submitter": (critique_submitter_context_window, critique_submitter_max_tokens),
+ "Writing Submitter": (writer_context_window, writer_max_tokens),
+ "Rigor & Proofs submitter": (high_param_context_window, high_param_max_tokens),
+ "assistant": (assistant_context_window, assistant_max_tokens),
  }
  missing_limits = []
  invalid_limits = []
@@ -1008,42 +1050,59 @@ async def initialize(

  # Compiler settings (separate from aggregator submitters)
  # Fallback to first submitter model if compiler models not specified
- self._high_context_model = high_context_model if high_context_model else first_submitter_model
+ self._writer_model = writer_model if writer_model else first_submitter_model
  self._high_param_model = high_param_model if high_param_model else first_submitter_model
- self._high_context_context = high_context_context_window
+ self._writer_context = writer_context_window
  self._high_param_context = high_param_context_window
- self._high_context_max_tokens = high_context_max_tokens
+ self._writer_max_tokens = writer_max_tokens
  self._high_param_max_tokens = high_param_max_tokens
- # Critique submitter fallback: use high_context_model if not specified
- self._critique_submitter_model = critique_submitter_model if critique_submitter_model else self._high_context_model
- self._critique_submitter_context = critique_submitter_context_window
- self._critique_submitter_max_tokens = critique_submitter_max_tokens
+ # Deprecated critique role fields are compatibility aliases. Critique
+ # generation now runs on the Rigor & Proofs submitter settings.
+ self._critique_submitter_model = self._high_param_model
+ self._critique_submitter_context = self._high_param_context
+ self._critique_submitter_max_tokens = self._high_param_max_tokens

  # Store OpenRouter provider configs for all roles
  self._validator_provider = validator_provider
  self._validator_openrouter_provider = validator_openrouter_provider
  self._validator_openrouter_reasoning_effort = validator_openrouter_reasoning_effort
  self._validator_lm_studio_fallback = validator_lm_studio_fallback
- self._high_context_provider = high_context_provider
- self._high_context_openrouter_provider = high_context_openrouter_provider
- self._high_context_openrouter_reasoning_effort = high_context_openrouter_reasoning_effort
- self._high_context_lm_studio_fallback = high_context_lm_studio_fallback
+ self._writer_provider = writer_provider
+ self._writer_openrouter_provider = writer_openrouter_provider
+ self._writer_openrouter_reasoning_effort = writer_openrouter_reasoning_effort
+ self._writer_lm_studio_fallback = writer_lm_studio_fallback
  self._high_param_provider = high_param_provider
  self._high_param_openrouter_provider = high_param_openrouter_provider
  self._high_param_openrouter_reasoning_effort = high_param_openrouter_reasoning_effort
  self._high_param_lm_studio_fallback = high_param_lm_studio_fallback
- self._critique_submitter_provider = critique_submitter_provider
- self._critique_submitter_openrouter_provider = critique_submitter_openrouter_provider
- self._critique_submitter_openrouter_reasoning_effort = critique_submitter_openrouter_reasoning_effort
- self._critique_submitter_lm_studio_fallback = critique_submitter_lm_studio_fallback
+ self._critique_submitter_provider = self._high_param_provider
+ self._critique_submitter_openrouter_provider = self._high_param_openrouter_provider
+ self._critique_submitter_openrouter_reasoning_effort = self._high_param_openrouter_reasoning_effort
+ self._critique_submitter_lm_studio_fallback = self._high_param_lm_studio_fallback
+ self._assistant_provider = assistant_provider or validator_provider
+ self._assistant_model = assistant_model or validator_model
+ self._assistant_openrouter_provider = (
+ assistant_openrouter_provider if assistant_model else validator_openrouter_provider
+ )
+ self._assistant_openrouter_reasoning_effort = (
+ assistant_openrouter_reasoning_effort if assistant_model else validator_openrouter_reasoning_effort
+ )
+ self._assistant_lm_studio_fallback = (
+ assistant_lm_studio_fallback if assistant_model else validator_lm_studio_fallback
+ )
+ self._assistant_context = assistant_context_window if assistant_model else validator_context_window
+ self._assistant_max_tokens = assistant_max_tokens if assistant_model else validator_max_tokens
+ self._assistant_supercharge_enabled = (
+ assistant_supercharge_enabled if assistant_model else validator_supercharge_enabled
+ )
  self._allow_mathematical_proofs = bool(allow_mathematical_proofs)
  self._allow_research_papers = bool(allow_research_papers)
  self._tier3_enabled = bool(tier3_enabled and self._allow_research_papers)
  self._creativity_emphasis_boost_enabled = creativity_emphasis_boost_enabled
  self._validator_supercharge_enabled = validator_supercharge_enabled
- self._high_context_supercharge_enabled = high_context_supercharge_enabled
+ self._writer_supercharge_enabled = writer_supercharge_enabled
  self._high_param_supercharge_enabled = high_param_supercharge_enabled
- self._critique_submitter_supercharge_enabled = critique_submitter_supercharge_enabled
+ self._critique_submitter_supercharge_enabled = self._high_param_supercharge_enabled

  logger.info(f"Autonomous coordinator initializing with {len(submitter_configs)} submitters")
  for config in submitter_configs:
@@ -1054,6 +1113,7 @@ async def initialize(
  # This takes precedence over legacy paths and new session creation
  interrupted_session = await session_manager.find_interrupted_session(system_config.auto_sessions_base_dir)

+ metadata_prompt = user_research_prompt
  if interrupted_session:
  session_id = interrupted_session["session_id"]
  logger.info(f"Found interrupted session: {session_id}")
@@ -1077,6 +1137,10 @@ async def initialize(
  # Override the user_research_prompt with the one from the interrupted session
  # This ensures we continue with the same research goal
  self._user_research_prompt = interrupted_session["user_prompt"]
+ # The session metadata is already the source of truth on resume.
+ # Loading without a prompt avoids overwriting proof-framed prompt state
+ # with a fresh Start request or base-only browser draft.
+ metadata_prompt = ""
  else:
  # PRIORITY 2: Check for existing legacy data
  # If legacy data exists, use it instead of creating empty new session
@@ -1111,7 +1175,7 @@ async def initialize(
  # Initialize memory systems
  await brainstorm_memory.initialize()
  await paper_library.initialize()
- await research_metadata.initialize(user_research_prompt)
+ await research_metadata.initialize(metadata_prompt)
  await proof_database.initialize()
  await autonomous_rejection_logs.initialize()

@@ -1162,7 +1226,9 @@ async def initialize(
  model_id=first_submitter_model,
  validator_model_id=validator_model,
  context_window=first_submitter_context,
- max_output_tokens=first_submitter_max_tokens
+ max_output_tokens=first_submitter_max_tokens,
+ validator_context_window=validator_context_window,
+ validator_max_output_tokens=validator_max_tokens,
  )

  self._redundancy_checker = PaperRedundancyChecker(
@@ -1176,21 +1242,27 @@ async def initialize(
  submitter_model=first_submitter_model,
  validator_model=validator_model,
context_window=first_submitter_context, - max_output_tokens=first_submitter_max_tokens + max_output_tokens=first_submitter_max_tokens, + validator_context_window=validator_context_window, + validator_max_output_tokens=validator_max_tokens, ) self._format_selector = AnswerFormatSelector( submitter_model=first_submitter_model, validator_model=validator_model, context_window=first_submitter_context, - max_output_tokens=first_submitter_max_tokens + max_output_tokens=first_submitter_max_tokens, + validator_context_window=validator_context_window, + validator_max_output_tokens=validator_max_tokens, ) self._volume_organizer = VolumeOrganizer( submitter_model=first_submitter_model, validator_model=validator_model, context_window=first_submitter_context, - max_output_tokens=first_submitter_max_tokens + max_output_tokens=first_submitter_max_tokens, + validator_context_window=validator_context_window, + validator_max_output_tokens=validator_max_tokens, ) # Initialize Tier 3 memory @@ -1306,33 +1378,51 @@ async def initialize( ) ) + autonomous_assistant_config = ModelConfig( + provider=self._assistant_provider, + model_id=self._assistant_model, + openrouter_model_id=self._assistant_model if self._assistant_provider == "openrouter" else None, + openrouter_provider=self._assistant_openrouter_provider, + openrouter_reasoning_effort=self._assistant_openrouter_reasoning_effort, + lm_studio_fallback_id=self._assistant_lm_studio_fallback, + context_window=self._assistant_context, + max_output_tokens=self._assistant_max_tokens, + supercharge_enabled=self._assistant_supercharge_enabled, + ) + api_client_manager.configure_role("autonomous_assistant", autonomous_assistant_config) + # Autonomous topic/title/Tier 1 brainstorm phases use child Aggregator + # submitters, and Tier 2 paper writing uses a child Compiler. Alias + # both Assistant roles to the run-level Assistant model. 
+ api_client_manager.configure_role("aggregator_assistant", autonomous_assistant_config) + api_client_manager.configure_role("compiler_assistant", autonomous_assistant_config) + api_client_manager.configure_role( "autonomous_proof_identification_brainstorm", ModelConfig( - provider=first_config.provider if hasattr(first_config, 'provider') else "lm_studio", - model_id=first_submitter_model, - openrouter_model_id=first_config.openrouter_model_id if hasattr(first_config, 'openrouter_model_id') else None, - openrouter_provider=first_config.openrouter_provider if hasattr(first_config, 'openrouter_provider') else None, - openrouter_reasoning_effort=first_reasoning_effort, - lm_studio_fallback_id=first_config.lm_studio_fallback_id if hasattr(first_config, 'lm_studio_fallback_id') else None, - context_window=first_submitter_context, - max_output_tokens=first_submitter_max_tokens, - supercharge_enabled=first_supercharge_enabled + provider=high_param_provider, + model_id=self._high_param_model, + openrouter_model_id=self._high_param_model if high_param_provider == "openrouter" else None, + openrouter_provider=high_param_openrouter_provider, + openrouter_reasoning_effort=high_param_openrouter_reasoning_effort, + lm_studio_fallback_id=high_param_lm_studio_fallback, + context_window=self._high_param_context, + max_output_tokens=self._high_param_max_tokens, + supercharge_enabled=high_param_supercharge_enabled ) ) api_client_manager.configure_role( "autonomous_proof_lemma_search_brainstorm", ModelConfig( - provider=first_config.provider if hasattr(first_config, 'provider') else "lm_studio", - model_id=first_submitter_model, - openrouter_model_id=first_config.openrouter_model_id if hasattr(first_config, 'openrouter_model_id') else None, - openrouter_provider=first_config.openrouter_provider if hasattr(first_config, 'openrouter_provider') else None, - openrouter_reasoning_effort=first_reasoning_effort, - lm_studio_fallback_id=first_config.lm_studio_fallback_id if 
hasattr(first_config, 'lm_studio_fallback_id') else None, - context_window=first_submitter_context, - max_output_tokens=first_submitter_max_tokens, - supercharge_enabled=first_supercharge_enabled + provider=high_param_provider, + model_id=self._high_param_model, + openrouter_model_id=self._high_param_model if high_param_provider == "openrouter" else None, + openrouter_provider=high_param_openrouter_provider, + openrouter_reasoning_effort=high_param_openrouter_reasoning_effort, + lm_studio_fallback_id=high_param_lm_studio_fallback, + context_window=self._high_param_context, + max_output_tokens=self._high_param_max_tokens, + supercharge_enabled=high_param_supercharge_enabled ) ) @@ -1354,60 +1444,60 @@ async def initialize( api_client_manager.configure_role( "autonomous_proof_formalization_brainstorm", ModelConfig( - provider=first_config.provider if hasattr(first_config, 'provider') else "lm_studio", - model_id=first_submitter_model, - openrouter_model_id=first_config.openrouter_model_id if hasattr(first_config, 'openrouter_model_id') else None, - openrouter_provider=first_config.openrouter_provider if hasattr(first_config, 'openrouter_provider') else None, - openrouter_reasoning_effort=first_reasoning_effort, - lm_studio_fallback_id=first_config.lm_studio_fallback_id if hasattr(first_config, 'lm_studio_fallback_id') else None, - context_window=first_submitter_context, - max_output_tokens=first_submitter_max_tokens, - supercharge_enabled=first_supercharge_enabled + provider=high_param_provider, + model_id=self._high_param_model, + openrouter_model_id=self._high_param_model if high_param_provider == "openrouter" else None, + openrouter_provider=high_param_openrouter_provider, + openrouter_reasoning_effort=high_param_openrouter_reasoning_effort, + lm_studio_fallback_id=high_param_lm_studio_fallback, + context_window=self._high_param_context, + max_output_tokens=self._high_param_max_tokens, + supercharge_enabled=high_param_supercharge_enabled ) ) 
api_client_manager.configure_role( "autonomous_proof_identification_paper", ModelConfig( - provider=high_context_provider, - model_id=self._high_context_model, - openrouter_model_id=self._high_context_model if high_context_provider == "openrouter" else None, - openrouter_provider=high_context_openrouter_provider, - openrouter_reasoning_effort=high_context_openrouter_reasoning_effort, - lm_studio_fallback_id=high_context_lm_studio_fallback, - context_window=self._high_context_context, - max_output_tokens=self._high_context_max_tokens, - supercharge_enabled=high_context_supercharge_enabled + provider=high_param_provider, + model_id=self._high_param_model, + openrouter_model_id=self._high_param_model if high_param_provider == "openrouter" else None, + openrouter_provider=high_param_openrouter_provider, + openrouter_reasoning_effort=high_param_openrouter_reasoning_effort, + lm_studio_fallback_id=high_param_lm_studio_fallback, + context_window=self._high_param_context, + max_output_tokens=self._high_param_max_tokens, + supercharge_enabled=high_param_supercharge_enabled ) ) api_client_manager.configure_role( "autonomous_proof_lemma_search_paper", ModelConfig( - provider=high_context_provider, - model_id=self._high_context_model, - openrouter_model_id=self._high_context_model if high_context_provider == "openrouter" else None, - openrouter_provider=high_context_openrouter_provider, - openrouter_reasoning_effort=high_context_openrouter_reasoning_effort, - lm_studio_fallback_id=high_context_lm_studio_fallback, - context_window=self._high_context_context, - max_output_tokens=self._high_context_max_tokens, - supercharge_enabled=high_context_supercharge_enabled + provider=high_param_provider, + model_id=self._high_param_model, + openrouter_model_id=self._high_param_model if high_param_provider == "openrouter" else None, + openrouter_provider=high_param_openrouter_provider, + openrouter_reasoning_effort=high_param_openrouter_reasoning_effort, + 
lm_studio_fallback_id=high_param_lm_studio_fallback, + context_window=self._high_param_context, + max_output_tokens=self._high_param_max_tokens, + supercharge_enabled=high_param_supercharge_enabled ) ) api_client_manager.configure_role( "autonomous_proof_formalization_paper", ModelConfig( - provider=high_context_provider, - model_id=self._high_context_model, - openrouter_model_id=self._high_context_model if high_context_provider == "openrouter" else None, - openrouter_provider=high_context_openrouter_provider, - openrouter_reasoning_effort=high_context_openrouter_reasoning_effort, - lm_studio_fallback_id=high_context_lm_studio_fallback, - context_window=self._high_context_context, - max_output_tokens=self._high_context_max_tokens, - supercharge_enabled=high_context_supercharge_enabled + provider=high_param_provider, + model_id=self._high_param_model, + openrouter_model_id=self._high_param_model if high_param_provider == "openrouter" else None, + openrouter_provider=high_param_openrouter_provider, + openrouter_reasoning_effort=high_param_openrouter_reasoning_effort, + lm_studio_fallback_id=high_param_lm_studio_fallback, + context_window=self._high_param_context, + max_output_tokens=self._high_param_max_tokens, + supercharge_enabled=high_param_supercharge_enabled ) ) @@ -1429,90 +1519,90 @@ async def initialize( api_client_manager.configure_role( "autonomous_proof_identification_manual_brainstorm", ModelConfig( - provider=first_config.provider if hasattr(first_config, 'provider') else "lm_studio", - model_id=first_submitter_model, - openrouter_model_id=first_config.openrouter_model_id if hasattr(first_config, 'openrouter_model_id') else None, - openrouter_provider=first_config.openrouter_provider if hasattr(first_config, 'openrouter_provider') else None, - openrouter_reasoning_effort=first_reasoning_effort, - lm_studio_fallback_id=first_config.lm_studio_fallback_id if hasattr(first_config, 'lm_studio_fallback_id') else None, - context_window=first_submitter_context, 
- max_output_tokens=first_submitter_max_tokens, - supercharge_enabled=first_supercharge_enabled + provider=high_param_provider, + model_id=self._high_param_model, + openrouter_model_id=self._high_param_model if high_param_provider == "openrouter" else None, + openrouter_provider=high_param_openrouter_provider, + openrouter_reasoning_effort=high_param_openrouter_reasoning_effort, + lm_studio_fallback_id=high_param_lm_studio_fallback, + context_window=self._high_param_context, + max_output_tokens=self._high_param_max_tokens, + supercharge_enabled=high_param_supercharge_enabled ) ) api_client_manager.configure_role( "autonomous_proof_lemma_search_manual_brainstorm", ModelConfig( - provider=first_config.provider if hasattr(first_config, 'provider') else "lm_studio", - model_id=first_submitter_model, - openrouter_model_id=first_config.openrouter_model_id if hasattr(first_config, 'openrouter_model_id') else None, - openrouter_provider=first_config.openrouter_provider if hasattr(first_config, 'openrouter_provider') else None, - openrouter_reasoning_effort=first_reasoning_effort, - lm_studio_fallback_id=first_config.lm_studio_fallback_id if hasattr(first_config, 'lm_studio_fallback_id') else None, - context_window=first_submitter_context, - max_output_tokens=first_submitter_max_tokens, - supercharge_enabled=first_supercharge_enabled + provider=high_param_provider, + model_id=self._high_param_model, + openrouter_model_id=self._high_param_model if high_param_provider == "openrouter" else None, + openrouter_provider=high_param_openrouter_provider, + openrouter_reasoning_effort=high_param_openrouter_reasoning_effort, + lm_studio_fallback_id=high_param_lm_studio_fallback, + context_window=self._high_param_context, + max_output_tokens=self._high_param_max_tokens, + supercharge_enabled=high_param_supercharge_enabled ) ) api_client_manager.configure_role( "autonomous_proof_formalization_manual_brainstorm", ModelConfig( - provider=first_config.provider if hasattr(first_config, 
'provider') else "lm_studio", - model_id=first_submitter_model, - openrouter_model_id=first_config.openrouter_model_id if hasattr(first_config, 'openrouter_model_id') else None, - openrouter_provider=first_config.openrouter_provider if hasattr(first_config, 'openrouter_provider') else None, - openrouter_reasoning_effort=first_reasoning_effort, - lm_studio_fallback_id=first_config.lm_studio_fallback_id if hasattr(first_config, 'lm_studio_fallback_id') else None, - context_window=first_submitter_context, - max_output_tokens=first_submitter_max_tokens, - supercharge_enabled=first_supercharge_enabled + provider=high_param_provider, + model_id=self._high_param_model, + openrouter_model_id=self._high_param_model if high_param_provider == "openrouter" else None, + openrouter_provider=high_param_openrouter_provider, + openrouter_reasoning_effort=high_param_openrouter_reasoning_effort, + lm_studio_fallback_id=high_param_lm_studio_fallback, + context_window=self._high_param_context, + max_output_tokens=self._high_param_max_tokens, + supercharge_enabled=high_param_supercharge_enabled ) ) api_client_manager.configure_role( "autonomous_proof_identification_manual_paper", ModelConfig( - provider=high_context_provider, - model_id=self._high_context_model, - openrouter_model_id=self._high_context_model if high_context_provider == "openrouter" else None, - openrouter_provider=high_context_openrouter_provider, - openrouter_reasoning_effort=high_context_openrouter_reasoning_effort, - lm_studio_fallback_id=high_context_lm_studio_fallback, - context_window=self._high_context_context, - max_output_tokens=self._high_context_max_tokens, - supercharge_enabled=high_context_supercharge_enabled + provider=high_param_provider, + model_id=self._high_param_model, + openrouter_model_id=self._high_param_model if high_param_provider == "openrouter" else None, + openrouter_provider=high_param_openrouter_provider, + openrouter_reasoning_effort=high_param_openrouter_reasoning_effort, + 
lm_studio_fallback_id=high_param_lm_studio_fallback, + context_window=self._high_param_context, + max_output_tokens=self._high_param_max_tokens, + supercharge_enabled=high_param_supercharge_enabled ) ) api_client_manager.configure_role( "autonomous_proof_lemma_search_manual_paper", ModelConfig( - provider=high_context_provider, - model_id=self._high_context_model, - openrouter_model_id=self._high_context_model if high_context_provider == "openrouter" else None, - openrouter_provider=high_context_openrouter_provider, - openrouter_reasoning_effort=high_context_openrouter_reasoning_effort, - lm_studio_fallback_id=high_context_lm_studio_fallback, - context_window=self._high_context_context, - max_output_tokens=self._high_context_max_tokens, - supercharge_enabled=high_context_supercharge_enabled + provider=high_param_provider, + model_id=self._high_param_model, + openrouter_model_id=self._high_param_model if high_param_provider == "openrouter" else None, + openrouter_provider=high_param_openrouter_provider, + openrouter_reasoning_effort=high_param_openrouter_reasoning_effort, + lm_studio_fallback_id=high_param_lm_studio_fallback, + context_window=self._high_param_context, + max_output_tokens=self._high_param_max_tokens, + supercharge_enabled=high_param_supercharge_enabled ) ) api_client_manager.configure_role( "autonomous_proof_formalization_manual_paper", ModelConfig( - provider=high_context_provider, - model_id=self._high_context_model, - openrouter_model_id=self._high_context_model if high_context_provider == "openrouter" else None, - openrouter_provider=high_context_openrouter_provider, - openrouter_reasoning_effort=high_context_openrouter_reasoning_effort, - lm_studio_fallback_id=high_context_lm_studio_fallback, - context_window=self._high_context_context, - max_output_tokens=self._high_context_max_tokens, - supercharge_enabled=high_context_supercharge_enabled + provider=high_param_provider, + model_id=self._high_param_model, + 
openrouter_model_id=self._high_param_model if high_param_provider == "openrouter" else None, + openrouter_provider=high_param_openrouter_provider, + openrouter_reasoning_effort=high_param_openrouter_reasoning_effort, + lm_studio_fallback_id=high_param_lm_studio_fallback, + context_window=self._high_param_context, + max_output_tokens=self._high_param_max_tokens, + supercharge_enabled=high_param_supercharge_enabled ) ) @@ -1658,7 +1748,10 @@ async def _check_resume_state(self) -> None: self._proof_framing_reasoning = workflow_state.get("proof_framing_reasoning", "") self._base_user_research_prompt = await research_metadata.get_base_user_prompt() if not self._base_user_research_prompt: - self._base_user_research_prompt = self._user_research_prompt + self._base_user_research_prompt = ( + workflow_state.get("base_user_research_prompt") + or self._user_research_prompt + ) self._user_research_prompt = self._append_proof_framing(self._base_user_research_prompt) # Restore Tier 3 flags for proper resume @@ -2213,6 +2306,7 @@ async def _save_workflow_state(self, tier: str = None, phase: Any = _WORKFLOW_PH "current_paper_id": self._current_paper_id, "current_paper_title": self._current_paper_title, "paper_phase": phase_to_store, + "base_user_research_prompt": self._base_user_research_prompt or self._user_research_prompt, "reference_paper_ids": self._current_reference_papers, # Persist reference papers across restarts "reference_brainstorm_ids": self._current_reference_brainstorms, "acceptance_count": self._acceptance_count, @@ -2240,11 +2334,11 @@ async def _save_workflow_state(self, tier: str = None, phase: Any = _WORKFLOW_PH "validator_model": self._validator_model, "validator_context_window": self._validator_context, "validator_max_tokens": self._validator_max_tokens, - "high_context_model": self._high_context_model, + "writer_model": self._writer_model, "high_param_model": self._high_param_model, - "high_context_context_window": self._high_context_context, + 
"writer_context_window": self._writer_context, "high_param_context_window": self._high_param_context, - "high_context_max_tokens": self._high_context_max_tokens, + "writer_max_tokens": self._writer_max_tokens, "high_param_max_tokens": self._high_param_max_tokens } } @@ -2293,9 +2387,13 @@ async def _broadcast_stopped_once(self) -> None: self._stop_broadcast_sent = True stats = await research_metadata.get_stats() - await self._broadcast("auto_research_stopped", { + payload = { "final_stats": stats - }) + } + if self._fatal_stop_reason: + payload["reason"] = self._fatal_stop_reason + payload["message"] = self._fatal_stop_message or CONTEXT_OVERFLOW_STOP_MESSAGE + await self._broadcast("auto_research_stopped", payload) async def start(self) -> None: """Start the autonomous research loop.""" @@ -2307,6 +2405,8 @@ async def start(self) -> None: self._stop_event.clear() self._state.is_running = True self._stop_broadcast_sent = False + self._fatal_stop_reason = None + self._fatal_stop_message = "" # Reset free model manager state for fresh start free_model_manager.reset() @@ -2396,12 +2496,25 @@ async def log_callback(task_id, role_id, model, provider, prompt, response, if self._stop_event.is_set(): break + if not candidate_questions.strip(): + logger.warning("Topic exploration produced no validated candidates; restarting exploration") + continue + topic_result = await self._topic_selection_loop(candidate_questions) if self._stop_event.is_set(): break - self._current_reference_papers = await self._pre_brainstorm_reference_selection() + if not topic_result: + logger.warning("Topic selection ended without selecting a topic; restarting topic exploration") + continue + + self._current_reference_papers = [] + self._current_reference_brainstorms = [] + if self._allow_research_papers: + self._current_reference_papers = await self._pre_brainstorm_reference_selection() + else: + self._current_reference_brainstorms = await self._pre_brainstorm_reference_brainstorm_selection() if 
self._stop_event.is_set(): break @@ -2414,6 +2527,10 @@ async def log_callback(task_id, role_id, model, provider, prompt, response, break if write_paper: + if not self._allow_research_papers: + await self._handle_papers_disabled_after_brainstorm() + continue + while not self._stop_event.is_set(): if await self._paper_compilation_workflow(): break @@ -2568,7 +2685,7 @@ async def log_callback(task_id, role_id, model, provider, prompt, response, if not self._allow_research_papers: logger.info("Research paper output disabled; skipping resumed Tier 2 paper compilation") - await self._save_proofs_only_next_topic_state() + await self._handle_papers_disabled_after_brainstorm() continue # A resumed brainstorm MUST produce a paper - retry until success or stop @@ -2696,6 +2813,10 @@ async def log_callback(task_id, role_id, model, provider, prompt, response, break if write_paper: + if not self._allow_research_papers: + await self._handle_papers_disabled_after_brainstorm() + continue + # A completed brainstorm MUST produce a paper - retry until success or stop _resume_paper_attempt = 0 while not self._stop_event.is_set(): @@ -2845,12 +2966,20 @@ async def log_callback(task_id, role_id, model, provider, prompt, response, if self._stop_event.is_set(): break + + if not candidate_questions.strip(): + logger.warning("Topic exploration produced no validated candidates; restarting exploration") + continue # Phase 1: Topic selection (informed by exploration candidates) topic_result = await self._topic_selection_loop(candidate_questions) if self._stop_event.is_set(): break + + if not topic_result: + logger.warning("Topic selection ended without selecting a topic; restarting topic exploration") + continue # Phase 1.5: Pre-brainstorm reference selection. 
# Paper-enabled runs keep today's paper-reference behavior; proof-only @@ -2881,17 +3010,7 @@ async def log_callback(task_id, role_id, model, provider, prompt, response, continue if not self._allow_research_papers: - await self._broadcast("research_papers_disabled_brainstorm_complete", { - "topic_id": self._current_topic_id, - "message": "Research paper output is disabled; returning to topic selection after brainstorm proof work." - }) - self._brainstorm_paper_count = 0 - self._current_brainstorm_paper_ids = [] - self._last_completed_paper_id = None - self._current_reference_papers = [] - self._current_reference_brainstorms = [] - logger.info("Research paper output disabled; skipping Tier 2 paper compilation") - await self._save_proofs_only_next_topic_state() + await self._handle_papers_disabled_after_brainstorm() continue # Phase 3: Paper compilation @@ -3364,12 +3483,20 @@ async def _resume_research_loop_after_tier3(self) -> None: if self._stop_event.is_set(): break + + if not candidate_questions.strip(): + logger.warning("Topic exploration produced no validated candidates; restarting exploration") + continue # Phase 1: Topic selection (informed by exploration candidates) - await self._topic_selection_loop(candidate_questions) + topic_result = await self._topic_selection_loop(candidate_questions) if self._stop_event.is_set(): break + + if not topic_result: + logger.warning("Topic selection ended without selecting a topic; restarting topic exploration") + continue # Phase 1.5: Pre-brainstorm reference selection. self._current_reference_papers = [] @@ -3398,17 +3525,7 @@ async def _resume_research_loop_after_tier3(self) -> None: continue if not self._allow_research_papers: - await self._broadcast("research_papers_disabled_brainstorm_complete", { - "topic_id": self._current_topic_id, - "message": "Research paper output is disabled; returning to topic selection after brainstorm proof work." 
- }) - self._brainstorm_paper_count = 0 - self._current_brainstorm_paper_ids = [] - self._last_completed_paper_id = None - self._current_reference_papers = [] - self._current_reference_brainstorms = [] - logger.info("Research paper output disabled; skipping Tier 2 paper compilation") - await self._save_proofs_only_next_topic_state() + await self._handle_papers_disabled_after_brainstorm() continue # Phase 3: Paper compilation @@ -3538,10 +3655,7 @@ async def _resume_research_loop_after_tier3(self) -> None: shared_training_memory.last_ragged_submission_count = 0 logger.info("Cleared shared_training_memory in-memory data (will reload from file when needed)") - stats = await research_metadata.get_stats() - await self._broadcast("auto_research_stopped", { - "final_stats": stats - }) + await self._broadcast_stopped_once() logger.info("Resumed research loop completed") def get_state(self) -> AutonomousResearchState: @@ -3592,7 +3706,6 @@ async def _topic_exploration_phase(self) -> str: await self._enter_topic_exploration_boundary() TARGET_CANDIDATES = 5 - MAX_CONSECUTIVE_REJECTIONS = 15 await self._broadcast("topic_exploration_started", { "target": TARGET_CANDIDATES, @@ -3629,6 +3742,7 @@ async def _topic_exploration_phase(self) -> str: await shared_training_memory.reload_insights_from_current_path() exploration_aggregator = None + exploration_completed = False try: exploration_aggregator = AggregatorCoordinator() @@ -3653,6 +3767,7 @@ async def _topic_exploration_phase(self) -> str: local_rejection_log_dir=str(brainstorm_memory._base_dir), local_rejection_log_template="topic_exploration_submitter_{submitter_id}_rejections.txt", reset_local_rejection_logs_on_start=True, + assistant_workflow_mode_override="autonomous", ) # Set WebSocket broadcaster so aggregator events flow through @@ -3665,16 +3780,25 @@ async def _topic_exploration_phase(self) -> str: last_acceptances = 0 last_rejections = 0 - consecutive_rejections = 0 while self._running and not 
self._stop_event.is_set(): status = await exploration_aggregator.get_status() + if getattr(exploration_aggregator, "fatal_error_type", None) == "context_overflow": + logger.error( + "Topic exploration stopped for context overflow: %s", + getattr(exploration_aggregator, "fatal_error_message", ""), + ) + self._mark_context_overflow_stop() + self._stop_event.set() + return "" + if not status.is_running: + logger.warning("Topic exploration aggregator stopped unexpectedly") + return "" current_acceptances = status.total_acceptances current_rejections = status.total_rejections # Track new acceptances if current_acceptances > last_acceptances: - consecutive_rejections = 0 last_acceptances = current_acceptances await self._broadcast("topic_exploration_progress", { @@ -3694,20 +3818,23 @@ async def _topic_exploration_phase(self) -> str: logger.info(f"TopicExploration: Target of {TARGET_CANDIDATES} candidates reached") break - # Track consecutive rejections for safety valve + # Track rejections for progress totals. Rejections do not end + # exploration; this phase runs until enough candidates are accepted. 
if current_rejections > last_rejections: - new_rejections = current_rejections - last_rejections - consecutive_rejections += new_rejections last_rejections = current_rejections - - if consecutive_rejections >= MAX_CONSECUTIVE_REJECTIONS: - logger.warning(f"TopicExploration: {consecutive_rejections} consecutive rejections - proceeding with {current_acceptances} candidates") - break await asyncio.sleep(2) # Stop the exploration aggregator await exploration_aggregator.stop() + + if last_acceptances < TARGET_CANDIDATES: + logger.warning( + "Topic exploration ended before target: %s/%s candidates accepted", + last_acceptances, + TARGET_CANDIDATES, + ) + return "" # Read accepted candidates from the exploration database candidates_text = "" @@ -3728,12 +3855,19 @@ async def _topic_exploration_phase(self) -> str: lines.append("-" * 40) candidates_text = "\n".join(lines) + if not candidates_text.strip(): + logger.warning( + "Topic exploration reached target count but no candidate text was available; restarting exploration" + ) + return "" + await self._broadcast("topic_exploration_complete", { "accepted_count": last_acceptances, "total_attempts": last_acceptances + last_rejections }) logger.info(f"Topic exploration complete: {last_acceptances} candidates accepted") + exploration_completed = True return candidates_text @@ -3767,8 +3901,9 @@ async def _topic_exploration_phase(self) -> str: shared_training_memory.last_ragged_submission_count = 0 logger.info("Exploration: Restored shared_training_memory state") - # Clean up exploration database file - if exploration_db_path.exists(): + # Clean up only after the target is reached; interrupted/failed runs + # must not masquerade as completed candidate exploration. 
+ if exploration_completed and exploration_db_path.exists(): try: exploration_db_path.unlink() except OSError as cleanup_exc: @@ -3834,9 +3969,8 @@ async def _topic_selection_loop(self, candidate_questions: str = "") -> Optional else: await self._topic_selector.handle_rejection(submission, validation.reasoning) await research_metadata.increment_stat("topic_selection_rejections") - await self._broadcast("topic_selection_rejected", { - "reasoning": validation.reasoning + "reasoning": validation.reasoning, }) logger.info(f"Topic selection rejected: {validation.reasoning[:100]}...") @@ -3890,7 +4024,7 @@ async def _execute_topic_selection( ) await self._broadcast("topic_selection_rejected", { "reasoning": f"Cannot continue brainstorm {topic_id} — it is already marked complete. " - f"Select a new topic or continue an incomplete brainstorm." + f"Select a new topic or continue an incomplete brainstorm.", }) return None @@ -4074,9 +4208,9 @@ async def _pre_brainstorm_reference_selection(self) -> List[str]: This is the crucial mechanism that enables compounding knowledge across research cycles. 
By selecting reference papers before brainstorming, submitters can: - - Build upon proven mathematical frameworks from prior papers + - Build upon promising mathematical frameworks from prior AI-generated papers while independently re-checking their claims - Avoid re-exploring territory already covered in depth - - Identify novel connections between new topics and established results + - Identify novel connections between new topics and previously explored results - Accelerate convergence on valuable insights by standing on prior work Returns: @@ -4148,8 +4282,8 @@ async def _pre_brainstorm_reference_brainstorm_selection(self) -> List[str]: topic_prompt = metadata.topic_prompt if metadata else "" max_input_tokens = max( 1000, - int((self._submitter_configs[0].context_window if self._submitter_configs else self._high_context_context) or 0) - - int((self._submitter_configs[0].max_output_tokens if self._submitter_configs else self._high_context_max_tokens) or 0) + int((self._submitter_configs[0].context_window if self._submitter_configs else self._writer_context) or 0) + - int((self._submitter_configs[0].max_output_tokens if self._submitter_configs else self._writer_max_tokens) or 0) - 1000, ) @@ -4178,6 +4312,12 @@ def build_prompt(candidates: List[Dict[str, Any]], retry_feedback: str = "") -> while count_tokens(prompt) > max_input_tokens and len(prompt_candidates) > 1: prompt_candidates = prompt_candidates[: max(1, len(prompt_candidates) // 2)] prompt = build_prompt(prompt_candidates) + reference_task_id = self._reference_selector.get_current_task_id() if self._reference_selector else "agg_sub1_000" + await api_client_manager.prewarm_assistant_memory_context( + task_id=reference_task_id, + role_id="autonomous_reference_selector", + prompt=prompt, + ) if count_tokens(prompt) > max_input_tokens: await self._broadcast("brainstorm_reference_selection_failed", { "topic_id": self._current_topic_id, @@ -4199,9 +4339,9 @@ def build_prompt(candidates: List[Dict[str, Any]], 
retry_feedback: str = "") -> response = await api_client_manager.generate_completion( task_id=self._reference_selector.get_current_task_id() if self._reference_selector else "agg_sub1_000", role_id="autonomous_reference_selector", - model=self._submitter_configs[0].model_id if self._submitter_configs else self._high_context_model, + model=self._submitter_configs[0].model_id if self._submitter_configs else self._writer_model, messages=[{"role": "user", "content": prompt}], - max_tokens=self._submitter_configs[0].max_output_tokens if self._submitter_configs else self._high_context_max_tokens, + max_tokens=self._submitter_configs[0].max_output_tokens if self._submitter_configs else self._writer_max_tokens, temperature=0.0, ) if self._reference_selector: @@ -4439,6 +4579,7 @@ async def hard_limit_callback(total_acceptances: int) -> None: f"brainstorm_{brainstorm_memory._safe_topic_id(self._current_topic_id)}" "_submitter_{submitter_id}_rejections.txt" ), + assistant_workflow_mode_override="autonomous", ) # CRITICAL FIX: Re-ingest existing submissions into RAG after resume @@ -4541,11 +4682,49 @@ async def hard_limit_callback(total_acceptances: int) -> None: await self._brainstorm_aggregator.stop() return False + async def handle_manual_override() -> bool: + logger.info("Manual override detected - transitioning to paper writing") + self._manual_paper_writing_triggered = False + await self._brainstorm_aggregator.stop() + proof_status = await self._run_brainstorm_completion_proofs() + return proof_status == "complete" + # Get current aggregator stats status = await self._brainstorm_aggregator.get_status() + if getattr(self._brainstorm_aggregator, "fatal_error_type", None) == "context_overflow": + logger.error( + "Brainstorm aggregation stopped for context overflow: %s", + getattr(self._brainstorm_aggregator, "fatal_error_message", ""), + ) + self._mark_context_overflow_stop() + self._stop_event.set() + return False current_acceptances = status.total_acceptances 
current_rejections = status.total_rejections current_cleanup_removals = status.removals_executed # Track actual cleanup/pruning removals + + if not status.is_running: + if self._manual_paper_writing_triggered: + return await handle_manual_override() + + total_acceptances = resume_acceptance_base + current_acceptances + cap_reached = bool( + getattr(self._brainstorm_aggregator, "_acceptance_cap_reached", False) + or self._brainstorm_hard_limit_triggered + or ( + getattr(self._brainstorm_aggregator, "max_total_acceptances", None) is not None + and total_acceptances >= getattr(self._brainstorm_aggregator, "max_total_acceptances") + ) + ) + if cap_reached: + self._acceptance_count = max(self._acceptance_count, total_acceptances) + if not self._brainstorm_hard_limit_triggered: + await self._trigger_brainstorm_hard_limit(self._acceptance_count) + proof_status = await self._run_brainstorm_completion_proofs() + return proof_status == "complete" + + logger.warning("Brainstorm aggregator stopped unexpectedly") + return False # Track cleanup removals for status display if current_cleanup_removals != self._cleanup_removals: @@ -4612,11 +4791,7 @@ async def hard_limit_callback(total_acceptances: int) -> None: # Check for manual override trigger (before checking stop event) if self._manual_paper_writing_triggered: - logger.info("Manual override detected - transitioning to paper writing") - self._manual_paper_writing_triggered = False - await self._brainstorm_aggregator.stop() - proof_status = await self._run_brainstorm_completion_proofs() - return proof_status == "complete" + return await handle_manual_override() # Track consecutive rejections and increment total rejections stat if current_rejections > last_rejections: @@ -5144,6 +5319,10 @@ async def _paper_compilation_workflow( if self._stop_event.is_set(): return False + + if not candidate_titles.strip(): + logger.warning("Paper title exploration produced no validated candidates; retrying title exploration") + return False # 
Step 3: Final title selection (informed by candidate titles) paper_title = await self._paper_title_selection( @@ -5377,7 +5556,6 @@ async def _paper_title_exploration_phase( ) TARGET_CANDIDATES = 5 - MAX_CONSECUTIVE_REJECTIONS = 15 # Build the exploration user prompt for the aggregator from backend.autonomous.prompts.paper_title_exploration_prompts import build_title_exploration_user_prompt @@ -5424,6 +5602,7 @@ async def _paper_title_exploration_phase( ) exploration_aggregator = None + exploration_completed = False try: # Short-circuit: if we already have enough candidates from a prior run, @@ -5461,6 +5640,7 @@ async def _paper_title_exploration_phase( "_submitter_{submitter_id}_rejections.txt" ), reset_local_rejection_logs_on_start=True, + assistant_workflow_mode_override="autonomous", ) if self._broadcast_callback: @@ -5475,16 +5655,25 @@ async def _paper_title_exploration_phase( last_aggregator_acceptances = 0 last_acceptances = resumed_count last_rejections = 0 - consecutive_rejections = 0 while self._running and not self._stop_event.is_set(): status = await exploration_aggregator.get_status() + if getattr(exploration_aggregator, "fatal_error_type", None) == "context_overflow": + logger.error( + "Paper title exploration stopped for context overflow: %s", + getattr(exploration_aggregator, "fatal_error_message", ""), + ) + self._mark_context_overflow_stop() + self._stop_event.set() + return "" + if not status.is_running: + logger.warning("Paper title exploration aggregator stopped unexpectedly") + return "" current_aggregator_acceptances = status.total_acceptances current_acceptances = resumed_count + current_aggregator_acceptances current_rejections = status.total_rejections if current_aggregator_acceptances > last_aggregator_acceptances: - consecutive_rejections = 0 last_aggregator_acceptances = current_aggregator_acceptances last_acceptances = current_acceptances @@ -5506,17 +5695,19 @@ async def _paper_title_exploration_phase( break if current_rejections > 
last_rejections: - new_rejections = current_rejections - last_rejections - consecutive_rejections += new_rejections last_rejections = current_rejections - - if consecutive_rejections >= MAX_CONSECUTIVE_REJECTIONS: - logger.warning(f"TitleExploration: {consecutive_rejections} consecutive rejections - proceeding with {current_acceptances} candidates") - break await asyncio.sleep(2) await exploration_aggregator.stop() + + if last_acceptances < TARGET_CANDIDATES: + logger.warning( + "Paper title exploration ended before target: %s/%s candidates accepted", + last_acceptances, + TARGET_CANDIDATES, + ) + return "" # Read accepted candidates from the title candidates database candidates_text = "" @@ -5536,12 +5727,19 @@ async def _paper_title_exploration_phase( lines.append("-" * 40) candidates_text = "\n".join(lines) + if not candidates_text.strip(): + logger.warning( + "Paper title exploration reached target count but no candidate text was available; retrying title exploration" + ) + return "" + await self._broadcast("paper_title_exploration_complete", { "accepted_count": last_acceptances, "total_attempts": last_acceptances + last_rejections }) logger.info(f"Paper title exploration complete: {last_acceptances} candidates accepted") + exploration_completed = True return candidates_text @@ -5572,7 +5770,7 @@ async def _paper_title_exploration_phase( shared_training_memory.last_ragged_submission_count = 0 logger.info("TitleExploration: Restored shared_training_memory state") - if title_db_path.exists(): + if exploration_completed and title_db_path.exists(): try: title_db_path.unlink() except OSError as cleanup_exc: @@ -5609,8 +5807,8 @@ async def _compile_paper( # route sets these, so autonomous mode must do it explicitly. 
system_config.compiler_validator_context_window = self._validator_context system_config.compiler_validator_max_output_tokens = self._validator_max_tokens - system_config.compiler_high_context_context_window = self._high_context_context - system_config.compiler_high_context_max_output_tokens = self._high_context_max_tokens + system_config.compiler_writer_context_window = self._writer_context + system_config.compiler_writer_max_output_tokens = self._writer_max_tokens system_config.compiler_high_param_context_window = self._high_param_context system_config.compiler_high_param_max_output_tokens = self._high_param_max_tokens system_config.compiler_critique_submitter_context_window = self._critique_submitter_context @@ -5633,7 +5831,7 @@ async def _compile_paper( await self._paper_compiler.initialize( compiler_prompt=self._get_effective_compiler_prompt(paper_title), validator_model=self._validator_model, - high_context_model=self._high_context_model, + writer_model=self._writer_model, high_param_model=self._high_param_model, critique_submitter_model=self._critique_submitter_model, skip_aggregator_db=True, # Don't load Part 1 aggregator - use brainstorm DB only @@ -5642,10 +5840,10 @@ async def _compile_paper( validator_openrouter_provider=self._validator_openrouter_provider, validator_openrouter_reasoning_effort=self._validator_openrouter_reasoning_effort, validator_lm_studio_fallback=self._validator_lm_studio_fallback, - high_context_provider=self._high_context_provider, - high_context_openrouter_provider=self._high_context_openrouter_provider, - high_context_openrouter_reasoning_effort=self._high_context_openrouter_reasoning_effort, - high_context_lm_studio_fallback=self._high_context_lm_studio_fallback, + writer_provider=self._writer_provider, + writer_openrouter_provider=self._writer_openrouter_provider, + writer_openrouter_reasoning_effort=self._writer_openrouter_reasoning_effort, + writer_lm_studio_fallback=self._writer_lm_studio_fallback, 
high_param_provider=self._high_param_provider, high_param_openrouter_provider=self._high_param_openrouter_provider, high_param_openrouter_reasoning_effort=self._high_param_openrouter_reasoning_effort, @@ -5655,7 +5853,7 @@ async def _compile_paper( critique_submitter_openrouter_reasoning_effort=self._critique_submitter_openrouter_reasoning_effort, critique_submitter_lm_studio_fallback=self._critique_submitter_lm_studio_fallback, validator_supercharge_enabled=self._validator_supercharge_enabled, - high_context_supercharge_enabled=self._high_context_supercharge_enabled, + writer_supercharge_enabled=self._writer_supercharge_enabled, high_param_supercharge_enabled=self._high_param_supercharge_enabled, critique_submitter_supercharge_enabled=self._critique_submitter_supercharge_enabled ) @@ -5825,6 +6023,13 @@ async def _compile_paper( # Check if compiler has stopped (error or other reason) if not self._paper_compiler.is_running: + if getattr(self._paper_compiler, "fatal_error_type", None) == CONTEXT_OVERFLOW_STOP_REASON: + logger.error( + "Paper compiler stopped for context overflow: %s", + getattr(self._paper_compiler, "fatal_error_message", ""), + ) + self._mark_context_overflow_stop() + self._stop_event.set() logger.warning("Compiler stopped unexpectedly") break @@ -5872,28 +6077,53 @@ def _has_abstract(self, paper_content: str) -> bool: def _extract_abstract(self, paper_content: str) -> str: """Extract abstract text from paper.""" - # Try to find abstract section - abstract_patterns = [ - r"##\s*Abstract\s*\n(.*?)(?=\n##|\n#|\Z)", - r"#\s*Abstract\s*\n(.*?)(?=\n##|\n#|\Z)", - r"\*\*Abstract\*\*\s*\n(.*?)(?=\n##|\n#|\n\*\*|\Z)", - r"\\(?:section|chapter)\*?\{Abstract\}\s*\n(.*?)(?=\n\\(?:section|chapter)\*?\{|\Z)", + def limit_metadata_abstract(abstract_text: str) -> Optional[str]: + abstract_text = abstract_text.strip() + if not abstract_text or abstract_text.lower() == "abstract": + return None + return abstract_text[:500] if len(abstract_text) > 500 else 
abstract_text + + begin_match = re.search( r"\\begin\{abstract\}\s*(.*?)\s*\\end\{abstract\}", - ] - - for pattern in abstract_patterns: - match = re.search(pattern, paper_content, re.IGNORECASE | re.DOTALL) - if match: - abstract = match.group(1).strip() - # Limit to first 500 chars for metadata - return abstract[:500] if len(abstract) > 500 else abstract - - # Fallback: first paragraph after title + paper_content, + re.IGNORECASE | re.DOTALL, + ) + if begin_match: + abstract = limit_metadata_abstract(begin_match.group(1)) + if abstract: + return abstract + + heading_pattern = re.compile( + r"(?im)^\s*(?:#{1,6}\s*)?(?:\*\*)?Abstract(?:\*\*)?\s*$|" + r"^\s*\\(?:section|chapter)\*?\{Abstract\}\s*$" + ) + next_section_pattern = re.compile( + r"(?im)^\s*(?:#{1,6}\s*)?(?:\*\*)?" + r"(?:[IVXLCDM]+\.|\d+\.)?\s*" + r"(?:Introduction|Background|Preliminaries|Body|Conclusion|References|Bibliography|Appendix)\b" + r"|^\s*\\(?:section|chapter)\*?\{(?!Abstract\})[^}]+\}\s*$" + ) + + for match in heading_pattern.finditer(paper_content): + section_start = match.end() + next_match = next_section_pattern.search(paper_content, section_start) + section_end = next_match.start() if next_match else len(paper_content) + abstract = limit_metadata_abstract(paper_content[section_start:section_end]) + if abstract: + return abstract + + # Fallback: first content line after title/header metadata, skipping section headings. lines = paper_content.split('\n') - for i, line in enumerate(lines): - if line.strip() and not line.startswith('#'): - # Found first non-heading line - return lines[i].strip()[:500] + heading_line_pattern = re.compile( + r"^\s*(?:#{1,6}\s*)?(?:\*\*)?" 
+ r"(?:Abstract|Introduction|Conclusion|References|Bibliography|Appendix)" + r"(?:\*\*)?\s*$", + re.IGNORECASE, + ) + for line in lines: + stripped = line.strip() + if stripped and not stripped.startswith('#') and not heading_line_pattern.match(stripped): + return stripped[:500] return "[Abstract not found]" @@ -7215,6 +7445,10 @@ async def _tier3_title_selection( if self._stop_event.is_set(): return None + + if not candidate_titles.strip(): + logger.warning("Tier 3 title exploration produced no validated candidates; retrying title selection later") + return None # Use the existing title selector with special context + candidate titles title = await self._title_selector.select_title( @@ -7246,8 +7480,8 @@ async def _compile_tier3_paper( # Same as in _compile_paper_from_brainstorm — compiler modules read from system_config at init. system_config.compiler_validator_context_window = self._validator_context system_config.compiler_validator_max_output_tokens = self._validator_max_tokens - system_config.compiler_high_context_context_window = self._high_context_context - system_config.compiler_high_context_max_output_tokens = self._high_context_max_tokens + system_config.compiler_writer_context_window = self._writer_context + system_config.compiler_writer_max_output_tokens = self._writer_max_tokens system_config.compiler_high_param_context_window = self._high_param_context system_config.compiler_high_param_max_output_tokens = self._high_param_max_tokens system_config.compiler_critique_submitter_context_window = self._critique_submitter_context @@ -7269,7 +7503,7 @@ async def _compile_tier3_paper( f"Known Certainties: {assessment.known_certainties_summary}" ), validator_model=self._validator_model, - high_context_model=self._high_context_model, + writer_model=self._writer_model, high_param_model=self._high_param_model, critique_submitter_model=self._critique_submitter_model, skip_aggregator_db=True, # CRITICAL: Don't load any aggregator database @@ -7278,10 +7512,10 @@ 
async def _compile_tier3_paper( validator_openrouter_provider=self._validator_openrouter_provider, validator_openrouter_reasoning_effort=self._validator_openrouter_reasoning_effort, validator_lm_studio_fallback=self._validator_lm_studio_fallback, - high_context_provider=self._high_context_provider, - high_context_openrouter_provider=self._high_context_openrouter_provider, - high_context_openrouter_reasoning_effort=self._high_context_openrouter_reasoning_effort, - high_context_lm_studio_fallback=self._high_context_lm_studio_fallback, + writer_provider=self._writer_provider, + writer_openrouter_provider=self._writer_openrouter_provider, + writer_openrouter_reasoning_effort=self._writer_openrouter_reasoning_effort, + writer_lm_studio_fallback=self._writer_lm_studio_fallback, high_param_provider=self._high_param_provider, high_param_openrouter_provider=self._high_param_openrouter_provider, high_param_openrouter_reasoning_effort=self._high_param_openrouter_reasoning_effort, @@ -7291,7 +7525,7 @@ async def _compile_tier3_paper( critique_submitter_openrouter_reasoning_effort=self._critique_submitter_openrouter_reasoning_effort, critique_submitter_lm_studio_fallback=self._critique_submitter_lm_studio_fallback, validator_supercharge_enabled=self._validator_supercharge_enabled, - high_context_supercharge_enabled=self._high_context_supercharge_enabled, + writer_supercharge_enabled=self._writer_supercharge_enabled, high_param_supercharge_enabled=self._high_param_supercharge_enabled, critique_submitter_supercharge_enabled=self._critique_submitter_supercharge_enabled ) @@ -7315,12 +7549,17 @@ async def _compile_tier3_paper( # IMPORTANT: Use paper_library.get_paper_path() for session-aware path resolution paper_path = paper_library.get_paper_path(ref_paper_id) if os.path.exists(paper_path): - await rag_manager.add_document( - paper_path, - chunk_sizes=[512], - is_user_file=True # High priority - ) - logger.info(f"Tier 3 reference loaded: {ref_paper_id}") + ref_content = await 
paper_library.get_paper_content(ref_paper_id, strip_proofs=True) + if ref_content: + await rag_manager.add_text( + ref_content, + f"tier3_reference_paper_{ref_paper_id}.txt", + chunk_sizes=[512], + is_permanent=True, + ) + logger.info(f"Tier 3 reference loaded with proof sections stripped: {ref_paper_id}") + else: + logger.warning(f"Tier 3 reference paper was empty after proof stripping: {ref_paper_id}") # Start compiler await self._paper_compiler.start() @@ -7336,6 +7575,13 @@ async def _compile_tier3_paper( break if not self._paper_compiler.is_running: + if getattr(self._paper_compiler, "fatal_error_type", None) == CONTEXT_OVERFLOW_STOP_REASON: + logger.error( + "Tier 3 compiler stopped for context overflow: %s", + getattr(self._paper_compiler, "fatal_error_message", ""), + ) + self._mark_context_overflow_stop() + self._stop_event.set() break await asyncio.sleep(3) @@ -7399,6 +7645,10 @@ async def _write_volume_chapter( if self._stop_event.is_set(): return False + + if not candidate_titles.strip(): + logger.warning("Volume chapter title exploration produced no validated candidates; retrying chapter later") + return False # Select chapter title from candidates chapter_title = await self._title_selector.select_title( diff --git a/backend/autonomous/core/proof_verification_stage.py b/backend/autonomous/core/proof_verification_stage.py index c61416a..9816d2e 100644 --- a/backend/autonomous/core/proof_verification_stage.py +++ b/backend/autonomous/core/proof_verification_stage.py @@ -18,7 +18,11 @@ from backend.autonomous.core.proof_registration import register_verified_lean_proof from backend.shared.config import system_config from backend.shared.lean_proof_integrity import validate_full_lean_proof_integrity -from backend.shared.model_error_utils import is_non_retryable_model_error +from backend.shared.model_error_utils import ( + format_transient_provider_error, + is_non_retryable_model_error, + is_transient_model_call_error, +) from backend.shared.models import 
ProofAttemptFeedback, ProofAttemptResult, ProofCandidate, ProofStageResult, SmtHint from backend.shared.openrouter_client import FreeModelExhaustedError from backend.shared.provider_pause import is_provider_credit_pause_error @@ -1110,6 +1114,37 @@ async def cancel_and_drain(extra_tasks=()) -> None: if is_non_retryable_model_error(exc): await save_checkpoint("provider_paused") raise + if is_transient_model_call_error(exc): + await save_checkpoint("error") + result.had_error = True + result.error_message = format_transient_provider_error(exc) + logger.warning( + "Proof verification transient provider failure for %s %s; preserving checkpoint: %s", + source_type, + source_id, + exc, + ) + await self._broadcast( + broadcast_fn, + "proof_check_complete", + { + "source_type": source_type, + "source_id": source_id, + "source_title": source_title, + "trigger": trigger, + "proof_round_index": proof_round_index, + "proof_max_rounds": proof_max_rounds, + "novel_count": result.novel_count, + "verified_count": result.verified_count, + "total_candidates": result.total_candidates, + "message": ( + "Proof verification hit a transient provider error after retries; " + "the proof checkpoint was preserved for retry: " + f"{self._summarize_error(str(exc), limit=1800)}" + ), + }, + ) + return result await save_checkpoint("error") result.had_error = True result.error_message = str(exc) @@ -1134,7 +1169,7 @@ async def cancel_and_drain(extra_tasks=()) -> None: "total_candidates": result.total_candidates, "message": ( "Proof verification encountered an error: " - f"{self._summarize_error(str(exc), limit=960)}" + f"{self._summarize_error(str(exc), limit=1800)}" ), }, ) diff --git a/backend/autonomous/memory/brainstorm_memory.py b/backend/autonomous/memory/brainstorm_memory.py index 24e5569..0ba7f38 100644 --- a/backend/autonomous/memory/brainstorm_memory.py +++ b/backend/autonomous/memory/brainstorm_memory.py @@ -305,10 +305,12 @@ async def get_database_content(self, topic_id: str, *, 
strip_proofs: bool = Fals async with aiofiles.open(db_path, 'r', encoding='utf-8') as f: content = await f.read() if strip_proofs and content: - marker = "=== PROOFS GENERATED FROM THIS BRAINSTORM" - idx = content.find(marker) - if idx > 0: - content = content[:idx].rstrip() + match = re.search( + r"(?m)^=== PROOFS GENERATED FROM THIS BRAINSTORM(?: \(Lean 4 Verified\))? ===\s*$", + content, + ) + if match and match.start() > 0: + content = content[:match.start()].rstrip() return content except Exception as e: logger.error( diff --git a/backend/autonomous/memory/paper_library.py b/backend/autonomous/memory/paper_library.py index 843ea9d..12d069e 100644 --- a/backend/autonomous/memory/paper_library.py +++ b/backend/autonomous/memory/paper_library.py @@ -401,12 +401,13 @@ def strip_verified_proofs_from_content(content: str) -> str: empty_appendix = f"{appendix_start}\n{empty_placeholder}\n{appendix_end}" stripped = stripped[:start_idx] + empty_appendix + stripped[end_idx:] - terminal_headers = ( - "=== PROOFS GENERATED FROM THIS PAPER", - "=== PROOFS ATTACHED TO THIS PAPER", - ) header_positions = [ - idx for header in terminal_headers if (idx := stripped.find(header)) > 0 + match.start() + for match in re.finditer( + r"(?m)^=== PROOFS (?:GENERATED FROM|ATTACHED TO) THIS PAPER(?: \(Lean 4 Verified\))? 
===\s*$", + stripped, + ) + if match.start() > 0 ] if header_positions: proof_start = min(header_positions) diff --git a/backend/autonomous/memory/proof_database.py b/backend/autonomous/memory/proof_database.py index 6736a4b..bdf303e 100644 --- a/backend/autonomous/memory/proof_database.py +++ b/backend/autonomous/memory/proof_database.py @@ -745,7 +745,7 @@ async def inject_failure_hints_into_prompt( hints_block = format_failure_hints_for_injection(hints) if not hints_block: return prompt - if "=== OPEN LEMMA TARGETS LEAN 4 COULD NOT YET CLOSE ===" in prompt: + if "=== OPEN PROOF TARGETS LEAN 4 COULD NOT YET CLOSE ===" in prompt: return prompt if not prompt: return hints_block diff --git a/backend/autonomous/memory/research_metadata.py b/backend/autonomous/memory/research_metadata.py index 4696d77..010ca8d 100644 --- a/backend/autonomous/memory/research_metadata.py +++ b/backend/autonomous/memory/research_metadata.py @@ -129,6 +129,18 @@ async def initialize(self, user_research_prompt: str = "") -> None: if self._metadata_path.exists(): await self._load_metadata() + needs_save = False + saved_prompt = ( + self._data.get("user_research_prompt") + or self._data.get("user_prompt") + or "" + ) + if saved_prompt and not self._data.get("user_research_prompt"): + self._data["user_research_prompt"] = saved_prompt + needs_save = True + if saved_prompt and not self._data.get("base_user_research_prompt"): + self._data["base_user_research_prompt"] = saved_prompt + needs_save = True # If prompt provided and differs from saved, optionally update if user_research_prompt and self._data.get("user_research_prompt") != user_research_prompt: logger.info("User research prompt updated") @@ -136,6 +148,8 @@ async def initialize(self, user_research_prompt: str = "") -> None: if not self._data.get("base_user_research_prompt"): self._data["base_user_research_prompt"] = user_research_prompt await self._save_metadata() + elif needs_save: + await self._save_metadata() else: self._data = { 
"user_research_prompt": user_research_prompt, @@ -281,15 +295,15 @@ def _get_default_workflow_state(self) -> Dict[str, Any]: "model_config": { "submitter_model": None, "validator_model": None, - "high_context_model": None, + "writer_model": None, "high_param_model": None, "submitter_context_window": 0, "validator_context_window": 0, - "high_context_context_window": 0, + "writer_context_window": 0, "high_param_context_window": 0, "submitter_max_tokens": 0, "validator_max_tokens": 0, - "high_context_max_tokens": 0, + "writer_max_tokens": 0, "high_param_max_tokens": 0 }, "last_updated": datetime.now().isoformat() @@ -889,6 +903,11 @@ async def clear_all(self) -> None: async with self._lock: self._data = { "user_research_prompt": "", + "base_user_research_prompt": "", + "proof_framing_active": False, + "proof_framing_context": "", + "proof_framing_reasoning": "", + "proof_runtime_config": {}, "brainstorms": [], "papers": [], "next_topic_id": 1, diff --git a/backend/autonomous/prompts/paper_redundancy_prompts.py b/backend/autonomous/prompts/paper_redundancy_prompts.py index 8e6ea00..d0ef056 100644 --- a/backend/autonomous/prompts/paper_redundancy_prompts.py +++ b/backend/autonomous/prompts/paper_redundancy_prompts.py @@ -50,11 +50,11 @@ def get_paper_redundancy_system_prompt() -> str: REASONS TO KEEP - A paper should be kept if it: 1. Provides a stronger direct answer to the user's prompt than overlapping papers -2. Provides ANY unique mathematical content not covered elsewhere -3. Offers a different perspective or approach even if related to other papers -4. Contains specific proofs, theorems, or techniques not present elsewhere -5. Contributes to research diversity in any meaningful way -6. Covers distinct mathematical subtopics within a broader area +2. Provides unique mathematical content that materially strengthens a direct route to the user's prompt +3. Offers a different perspective or approach that materially improves the strongest direct answer path +4. 
Contains specific proofs, theorems, or techniques necessary for direct prompt progress +5. Contributes to research diversity only when that diversity improves credible direct-answer progress +6. Covers distinct mathematical subtopics only when those subtopics are necessary to the user's prompt CONSERVATIVE APPROACH: - When in doubt, DO NOT recommend removal diff --git a/backend/autonomous/prompts/paper_reference_prompts.py b/backend/autonomous/prompts/paper_reference_prompts.py index 6a138ad..51ba212 100644 --- a/backend/autonomous/prompts/paper_reference_prompts.py +++ b/backend/autonomous/prompts/paper_reference_prompts.py @@ -8,12 +8,12 @@ This is the CRUCIAL MECHANISM that enables COMPOUNDING KNOWLEDGE across research cycles. By selecting reference papers before brainstorming, submitters can: -- Build upon proven mathematical frameworks from prior papers +- Build upon promising mathematical frameworks from prior AI-generated papers while independently re-checking their claims - Avoid re-exploring territory already covered in depth -- Identify novel connections between new topics and established results +- Identify novel connections between new topics and previously explored results - Accelerate convergence on valuable insights by standing on prior work """ -from typing import List, Dict, Any +from typing import List, Dict, Any, Optional def get_reference_title_text(paper: Dict[str, Any]) -> str: @@ -59,9 +59,9 @@ def get_pre_brainstorm_expansion_system_prompt(max_papers: int) -> str: WHY THIS MATTERS - COMPOUNDING KNOWLEDGE: This is the crucial mechanism that allows the system to compound knowledge across research cycles. 
By selecting reference papers BEFORE brainstorming, you can: -- Build upon proven mathematical frameworks from prior papers +- Build upon promising mathematical frameworks from prior AI-generated papers, while independently re-checking their claims - Avoid re-exploring territory already covered in depth -- Identify novel connections between your new topic and established results +- Identify novel connections between your new topic and previously explored results - Accelerate convergence on valuable insights by standing on prior work THRESHOLD: "VERY USEFUL FOR BRAINSTORMING" @@ -518,7 +518,8 @@ def build_reference_selection_prompt( brainstorm_summary: str, expanded_papers: List[Dict[str, Any]], mode: str = "initial", - max_papers: int = 6 + max_papers: int = 6, + retrieved_context: Optional[str] = None, ) -> str: """ Build the final reference selection prompt (Step 2: full papers). @@ -549,12 +550,17 @@ def build_reference_selection_prompt( "\n---\n" ] - # Add expanded papers with full content and outlines - parts.append("EXPANDED PAPERS (Full Content):\n") + # Add expanded papers with full content and outlines. When expanded papers + # overflow the prompt, callers may pass metadata plus a separate retrieved + # evidence block; keep real paper IDs visible so selection never targets a + # synthetic combined paper. 
+ parts.append("EXPANDED PAPERS (Full Content or Metadata + Retrieved Evidence):\n") for p in expanded_papers: parts.append(f"\n{'=' * 60}") parts.append(f"\nPaper ID: {p.get('paper_id', 'Unknown')}") parts.append(f"\nTitle: {get_reference_title_text(p)}") + if p.get("abstract"): + parts.append(f"\nAbstract: {p.get('abstract')}") parts.append(f"\nWord Count: {p.get('word_count', 0)}") parts.append(f"\n{'=' * 60}") @@ -565,6 +571,17 @@ def build_reference_selection_prompt( parts.append(f"\n{'-' * 60}\n") parts.append(f"\n\nFULL PAPER CONTENT:\n{p.get('content', '[Content not available]')}\n") + + if retrieved_context: + parts.append("\n---\n") + parts.append( + "RAG-RETRIEVED FULL-PAPER EVIDENCE FROM THE EXPANDED REAL PAPER IDS ABOVE:\n" + ) + parts.append(retrieved_context) + parts.append( + "\n\nUse this evidence only to choose among the real Paper IDs listed above. " + "Do not select synthetic IDs." + ) parts.append("\n---\n") parts.append(f"REMINDER: You can select up to {max_papers} papers maximum for this selection.") diff --git a/backend/autonomous/prompts/proof_prompts.py b/backend/autonomous/prompts/proof_prompts.py index ba9726e..dc8d390 100644 --- a/backend/autonomous/prompts/proof_prompts.py +++ b/backend/autonomous/prompts/proof_prompts.py @@ -11,9 +11,9 @@ PROOF_FRAMING_CONTEXT = """[PROOF FRAMING CONTEXT -- This research prompt targets formal mathematical proof. All proof work must serve the user's research prompt. Submissions should pursue theorems, lemmas, and formalizations that directly help answer, support, or advance -that prompt. Seek public/citable novel knowledge first: major discoveries, -mathematical discoveries, novel variants, and only then prompt-critical novel -formalizations absent from standard references or Mathlib. +that prompt. Seek the most impactful new or novel proof targets possible: direct +solutions to the user's prompt first, then proof targets that materially advance +a solution path. 
The Lean 4 proof assistant is available for formal verification. Do not build a general known-knowledge base. Standard identities, routine helper lemmas, irrelevant curiosities, and well-known Mathlib/textbook results are NOT valuable @@ -87,8 +87,9 @@ def _format_attempt_history(prior_attempts: Iterable[ProofAttemptFeedback]) -> s rejection_banner = ( "!! PLACEHOLDER REJECTION !! This prior attempt was rejected " "because it used `sorry` / `admit` (or an equivalent placeholder). " - "Do NOT submit another placeholder proof. Either prove the goal " - "fully, or return a narrower lemma you can actually close." + "Do NOT submit another placeholder proof, and do NOT replace " + "the target with a narrower, easier, routine, or merely " + "supporting lemma. Attempt the same high-impact target faithfully." ) block = [ f"ATTEMPT {attempt.attempt}:", @@ -150,6 +151,11 @@ def _format_smt_hint(smt_hint: SmtHint | None) -> str: return "\n".join(sections) +def _format_retrieved_proof_context(retrieved_proofs_context: str = "") -> str: + text = (retrieved_proofs_context or "").strip() + return text if text else "[No retrieved proof-search context provided.]" + + def _format_candidate_novelty_context( expected_novelty_tier: str = "", prompt_relevance_rationale: str = "", @@ -177,7 +183,9 @@ def _format_candidate_novelty_context( - NEVER use `sorry` or `admit` in the proof body. MOTO rejects any proof that contains `sorry` or `admit` anywhere, even though Lean would only emit a warning. A proof with `sorry` is not a proof. If you cannot close - every goal, return a narrower lemma that you CAN fully prove. + every goal, do NOT replace the target with a narrower, easier, routine, or + merely supporting lemma. Attempt the same high-impact target faithfully and + let Lean feedback expose the real blocker. - NEVER introduce new `axiom` declarations that exist only to make the target theorem trivial. Axiomatizing the concepts in the statement (e.g. 
`axiom Protocol : Type`, `axiom IC ... : ℝ`) and then proving the @@ -221,8 +229,8 @@ def format_failure_hints_for_injection(failure_hints: Iterable[Any]) -> str: return "" lines = [ - "=== OPEN LEMMA TARGETS LEAN 4 COULD NOT YET CLOSE ===", - "[These are recent proof attempts that failed. Prefer brainstorms that generate missing lemmas, stronger assumptions, or cleaner formal theorem statements only when they directly support the user's research prompt.]", + "=== OPEN PROOF TARGETS LEAN 4 COULD NOT YET CLOSE ===", + "[These are recent high-impact proof attempts that failed. Use them only to repair or retry the same prompt-solving target with stronger assumptions, clearer theorem statements, or corrected formalization strategy. Do NOT downshift to supporting lemmas, routine helpers, or easy local facts.]", "", ] for index, hint in enumerate(hints, start=1): @@ -256,9 +264,9 @@ def format_failure_hints_for_injection(failure_hints: Iterable[Any]) -> str: placeholder_note = ( "Note: the previous formalization attempt was rejected because " "it used `sorry`/`admit` or axiomatized the theorem's concepts " - "to make the goal trivial. Prefer brainstorms that state a " - "narrower, concretely provable lemma that still supports the " - "user's research prompt instead of the full claim." + "to make the goal trivial. Prefer brainstorms that repair the " + "real blocker for the same high-impact target. Do NOT downshift " + "to a narrower, easier, routine, or merely supporting lemma." 
) lines.extend( [ @@ -266,13 +274,13 @@ def format_failure_hints_for_injection(failure_hints: Iterable[Any]) -> str: f"Expected novelty tier: {expected_novelty_tier or '[unknown]'}", f"Novelty rationale: {_truncate_text(novelty_rationale or '[not recorded]', 200)}", f"Lean 4 failure summary: {_truncate_text(error_summary or '[no summary available]', 200)}", - f"Suggested lemma targets: {', '.join(suggested_targets[:6]) if suggested_targets else '[none identified]'}", + f"Lean blocker clues: {', '.join(suggested_targets[:6]) if suggested_targets else '[none identified]'}", ] ) if placeholder_note: lines.append(placeholder_note) lines.append("---") - lines.append("=== END OPEN LEMMA TARGETS ===") + lines.append("=== END OPEN PROOF TARGETS ===") return "\n".join(lines) @@ -378,17 +386,17 @@ def build_proof_identification_prompt( This is NOT a known-knowledge-base construction task. Do not collect standard facts just because they are true, useful, formalizable, or prompt-adjacent. Lean 4 verification cost is reserved for candidates that could become public, citable prompt-relevant knowledge rather than run-local firsts. -Above all, list first any claims that aggressively attempt to solve the USER RESEARCH PROMPT itself. A BRAINSTORM TOPIC, when present, is source metadata that helps interpret context; it must never broaden eligibility to proofs that are merely brainstorm-related. After direct solution attempts, include only genuinely novel supporting subgoals that visibly build toward solving the USER RESEARCH PROMPT. +Above all, list first any claims that aggressively attempt to solve the USER RESEARCH PROMPT itself. A BRAINSTORM TOPIC, when present, is source metadata that helps interpret context; it must never broaden eligibility to proofs that are merely brainstorm-related. Do not extract supporting subgoals as proof targets; a candidate must itself be a high-impact prompt-solving theorem. 
{proof_round_context} MOTO's goal is to push the frontier of mathematical knowledge in service of the user's stated problem. You are the gatekeeper that decides which theorems are worth the cost of formal verification. Be ambitious, but do not chase unrelated mathematical curiosities: a proof candidate must be useful for the user's prompt, not merely non-trivial in isolation. -NOVELTY PRIORITY ORDER (extract in this order): -1. major_mathematical_discovery: exceptional breakthroughs that appear to resolve an important prompt-relevant problem or create unusually powerful new theory. -2. mathematical_discovery: new theorems, bounds, reductions, impossibility results, structural facts, or connections not present in standard references or Mathlib. -3. novel_variant: meaningful reformulations of known mathematics that change hypotheses, strengthen conclusions, expose a new bridge, or use a genuinely new proof strategy toward the prompt. -4. novel_formulation: prompt-critical formulations or Lean 4 formalizations whose exact theorem/formulation is not present in standard references or Mathlib and would be independently publishable/citable; this is lower priority than mathematical novelty. -5. Supporting lemmas only when they are necessary stepping stones toward one of the higher-priority novel targets above. Do not extract routine helper lemmas as standalone proof goals. +TARGET SELECTION: +- Seek the most impactful new or novel proof targets possible for the USER RESEARCH PROMPT. +- Prefer proof targets that directly solve the prompt, rule out an impossible prompt, establish a decisive reduction, prove a new obstruction, or otherwise make major progress on the requested problem. +- Supporting lemmas, routine helper lemmas, local facts, and trivial/easy proofs are NEVER valid proof targets, even as a fallback or last resort. +- Do not settle for a minor reformulation, local formalization, or easy-to-prove fact. 
+- If a target is selected, the downstream formalization agent will receive multiple Lean 4 attempts with compiler feedback. Choose ambitious high-impact targets instead of tiny safe targets selected only because they are easy. WHAT TO REJECT (never extract these): - Mathematically interesting claims that do not materially help the USER RESEARCH PROMPT @@ -401,10 +409,10 @@ def build_proof_identification_prompt( - Routine algebraic manipulations with no conceptual content Rules: -- Return TRUE only when at least one prompt-relevant theorem candidate is expected to be novel under the priority order above. +- Return TRUE only when at least one prompt-relevant theorem candidate is expected to be new or novel enough to be worth Lean 4 verification. - Return FALSE if the source contains no theorem that would materially help answer, support, or advance the USER RESEARCH PROMPT. -- Order candidates by novelty-first prompt-solving value: direct USER RESEARCH PROMPT solutions first, then major discoveries, mathematical discoveries, novel variants, citable novel formulations/formalizations absent from standard references and Mathlib, then necessary supporting lemmas that build toward those prompt-solving targets. This ordering is not a cap. -- Return every prompt-relevant theorem that is novel enough to be worth attempting. +- Order candidates by impact on the USER RESEARCH PROMPT: direct solutions or decisive impossibility results first, then the strongest reductions, obstructions, or structural theorems that themselves make major progress on the requested problem. This ordering is not a cap. +- Return every prompt-relevant theorem that is impactful enough to be worth attempting. - For each candidate, set expected_novelty_tier to one of: major_mathematical_discovery, mathematical_discovery, novel_variant, novel_formulation. - For each candidate, include prompt_relevance_rationale, novelty_rationale, and why_not_standard_known_result. 
The prompt_relevance_rationale must explicitly say whether the candidate directly solves the USER RESEARCH PROMPT or how it builds toward solving it. If you cannot explain that, or cannot explain why it is not merely standard known mathematics, reject it. - Welcome bold or speculative claims only when they are prompt-relevant -- if the source proposes something ambitious that might be provable with the right formalization, extract it. The downstream formalization agent will handle narrowing if needed. @@ -546,11 +554,13 @@ def build_proof_formalization_prompt( prompt_relevance_rationale: str = "", novelty_rationale: str = "", why_not_standard_known_result: str = "", + retrieved_proofs_context: str = "", ) -> str: """Build the Lean 4 formalization prompt for one theorem.""" attempt_history = _format_attempt_history(prior_attempts) relevant_lemmas_block = _format_relevant_lemmas(relevant_lemmas) smt_hint_block = _format_smt_hint(smt_hint) + retrieved_proofs_block = _format_retrieved_proof_context(retrieved_proofs_context) candidate_novelty_block = _format_candidate_novelty_context( expected_novelty_tier=expected_novelty_tier, prompt_relevance_rationale=prompt_relevance_rationale, @@ -587,7 +597,7 @@ def build_proof_formalization_prompt( proofs (e.g. axiomatizing the theorem's own concepts and then closing with `sorry`) will be rejected even if Lean compiles them with only a warning. -- If the theorem seems invalid or underspecified, still make the strongest faithful formalization attempt you can from the provided source. If the full theorem cannot be proved, prove a narrower concrete lemma that is faithful to the source -- do NOT return a `sorry`-closed stub. +- If the theorem seems invalid or underspecified, still make the strongest faithful formalization attempt you can from the provided source. If the full theorem cannot be proved, do NOT replace it with a narrower, easier, routine, trivial, local, or merely supporting lemma. 
Submit only a faithful attempt at the selected high-impact target and let Lean feedback expose the real blocker. - The full source content is mandatory authoritative context. Use the focused excerpt only as a navigation aid for the selected theorem, not as a replacement for the full brainstorm or paper. @@ -627,6 +637,11 @@ def build_proof_formalization_prompt( If SMT guidance is present, treat it as a hint only. Lean 4 must still prove the theorem directly. If one of the suggested tactics is genuinely appropriate, you may use it. Do not force it when it does not fit the goal. +SYNTHETIC / LOCAL VERIFIED PROOF SEARCH RESULTS: +{retrieved_proofs_block} + +Use retrieved proofs only as optional proof-pattern/dependency guidance for the TARGET THEOREM. Do not replace the selected theorem with a routine helper, a standard known result, or an unrelated retrieved theorem. + {LEAN4_COMMON_PITFALLS} PRIOR ATTEMPT HISTORY: @@ -651,11 +666,13 @@ def build_proof_tactic_script_prompt( prompt_relevance_rationale: str = "", novelty_rationale: str = "", why_not_standard_known_result: str = "", + retrieved_proofs_context: str = "", ) -> str: """Build a tactic-oriented Lean 4 prompt for one theorem.""" attempt_history = _format_attempt_history(prior_attempts) relevant_lemmas_block = _format_relevant_lemmas(relevant_lemmas) smt_hint_block = _format_smt_hint(smt_hint) + retrieved_proofs_block = _format_retrieved_proof_context(retrieved_proofs_context) candidate_novelty_block = _format_candidate_novelty_context( expected_novelty_tier=expected_novelty_tier, prompt_relevance_rationale=prompt_relevance_rationale, @@ -697,7 +714,7 @@ def build_proof_tactic_script_prompt( `sorry`/`admit` will be rejected even if Lean compiles it. - Include needed assumptions in the theorem header. Do NOT axiomatize the concepts inside the theorem statement just to make the goal trivial. -- If the theorem is underspecified, make the strongest faithful formalization attempt you can from the source. 
If you cannot close every goal, return a narrower concrete lemma instead of a `sorry`-closed stub. +- If the theorem is underspecified, make the strongest faithful formalization attempt you can from the source. If you cannot close every goal, do NOT replace it with a narrower, easier, routine, trivial, local, or merely supporting lemma. Submit only a faithful attempt at the selected high-impact target and let Lean feedback expose the real blocker. - The full source content is mandatory authoritative context. Use the focused excerpt only as a navigation aid for the selected theorem, not as a replacement for the full brainstorm or paper. @@ -737,6 +754,11 @@ def build_proof_tactic_script_prompt( If SMT guidance is present, treat it as a hint only. Lean 4 must still verify the theorem directly. Suggested tactics are optional and should only be used when they genuinely match the goal. +SYNTHETIC / LOCAL VERIFIED PROOF SEARCH RESULTS: +{retrieved_proofs_block} + +Use retrieved proofs only as optional proof-pattern/dependency guidance for the TARGET THEOREM. Do not replace the selected theorem with a routine helper, a standard known result, or an unrelated retrieved theorem. + {LEAN4_COMMON_PITFALLS} PRIOR ATTEMPT HISTORY: @@ -830,10 +852,10 @@ def build_proof_statement_alignment_prompt( Lean 4 already verified that the code is logically valid. Your task is NOT to reject the proof. Your task is to identify whether the Lean-accepted theorem -matches the intended candidate, or whether MOTO should preserve it as a narrower -supporting lemma under the actual statement proved by the code. +matches the intended candidate, or whether MOTO should preserve it under the +actual statement proved by the code. 
-If the code proves only a weakened, narrower, or supporting result, set +If the code proves only a weakened, narrower, routine, trivial, or unrelated result, set `matches_intended` to false and write `actual_theorem_statement` as the strongest accurate natural-language description of what Lean verified. If the code is a routine identity, `True`, or unrelated lemma, still describe the actual theorem @@ -859,8 +881,8 @@ def build_proof_statement_alignment_prompt( Classification examples: - Same/equivalent claim: `matches_intended=true`, actual statement can match the intended candidate. -- Narrower useful lemma: `matches_intended=false`, actual statement should name the narrower lemma and explain how it relates. +- Different/weakened theorem: `matches_intended=false`, actual statement should name what Lean actually proved and explain how it relates. - Trivial/unrelated theorem: `matches_intended=false`, actual statement should honestly describe the trivial/unrelated theorem so novelty ranking can classify it as not novel. 
-{_json_only_footer('{"matches_intended": false, "actual_theorem_name": "lean_declaration_name_if_identifiable", "actual_theorem_statement": "the actual theorem Lean verified", "relationship_to_candidate": "narrower_supporting_lemma|equivalent|unrelated|trivial|uncertain", "downshift_reason": "why this should be stored under the actual statement instead of the intended candidate", "reasoning": "brief explanation"}')} +{_json_only_footer('{"matches_intended": false, "actual_theorem_name": "lean_declaration_name_if_identifiable", "actual_theorem_statement": "the actual theorem Lean verified", "relationship_to_candidate": "weakened|equivalent|unrelated|trivial|uncertain", "downshift_reason": "why this should be stored under the actual statement instead of the intended candidate", "reasoning": "brief explanation"}')} """ diff --git a/backend/compiler/README.md b/backend/compiler/README.md index dc003b6..ed13586 100644 --- a/backend/compiler/README.md +++ b/backend/compiler/README.md @@ -14,10 +14,10 @@ The compiler tool reads the aggregator's shared training database and systematic - **Sequential Markov Chain Workflow**: One submitter runs at a time, each submission must be validated before proceeding - **Multiple Submitter Modes**: - - **Outline Creation/Update**: High-context model creates and maintains paper structure - - **Paper Construction**: High-context model writes paper sections following the outline - - **Review/Cleanup**: High-context model reviews and fixes errors (without aggregator DB context) - - **Rigor Mode (Lean 4)**: High-parameter model proposes one theorem per cycle, runs up to 5 Lean 4 formalization attempts with error-feedback chaining, persists the verified proof into the shared `proof_database`, and places it inline (2 placement attempts) or appends it to the Theorems Appendix on double rejection. 
+ - **Outline Creation/Update**: Writing Submitter creates and maintains paper structure + - **Paper Construction**: Writing Submitter writes paper sections following the outline + - **Review/Cleanup**: Writing Submitter reviews and fixes errors (without aggregator DB context) + - **Rigor Mode (Lean 4)**: Rigor & Proofs Submitter proposes one theorem per cycle, runs up to 5 Lean 4 formalization attempts with error-feedback chaining, persists the verified proof into the shared `proof_database`, and places it inline (2 placement attempts) or appends it to the Theorems Appendix on double rejection. - **Real-time Paper Viewing**: Live updates in the GUI as the paper is constructed - **Intelligent Placement Logic**: Automatically inserts content at the correct location based on placement context - **Separate GUI Tabs**: Compiler Interface, Settings, Logs, and Live Paper view @@ -31,8 +31,8 @@ The compiler tool reads the aggregator's shared training database and systematic ### Agents -- `high_context_submitter.py` - Low-parameter, high-context model (outline, construction, review) -- `high_param_submitter.py` - High-parameter model. Rigor mode: discovery + 5x Lean 4 attempts + novelty classification + 2-attempt placement + Theorems Appendix fallback. +- `writer_submitter.py` - Writing Submitter role for outline, construction, and review. +- `high_param_submitter.py` - Rigor & Proofs Submitter role. Rigor mode: discovery + 5x Lean 4 attempts + novelty classification + 2-attempt placement + Theorems Appendix fallback. ### Validation @@ -106,6 +106,6 @@ The compiler continuously reads from the aggregator's shared training database ( ## Tools Available to Submitters -- **Wolfram Alpha (construction mode only)**: When `system_config.wolfram_alpha_enabled=true`, the high-context submitter may invoke the `wolfram_alpha_query` OpenAI-compatible tool up to 20 times per construction submission. See `WOLFRAM_TOOL_SCHEMA` in `high_context_submitter.py`. 
Audit trail attached to `CompilerSubmission.metadata["wolfram_calls"]`. Not available in `outline_create`, `outline_update`, `review`, or rigor mode. +- **Wolfram Alpha (construction mode only)**: When `system_config.wolfram_alpha_enabled=true`, the writing submitter may invoke the `wolfram_alpha_query` OpenAI-compatible tool up to 20 times per construction submission. See `WOLFRAM_TOOL_SCHEMA` in `writer_submitter.py`. Audit trail attached to `CompilerSubmission.metadata["wolfram_calls"]`. Not available in `outline_create`, `outline_update`, `review`, or rigor mode. - **Lean 4 (rigor mode only)**: The rigor loop uses `ProofFormalizationAgent.prove_candidate(max_attempts=5)` from `backend/autonomous/agents/proof_formalization_agent.py` backed by the Lean 4 toolchain + Mathlib workspace. Verified proofs are persisted in the shared `proof_database` (same store used by autonomous mode). Novel proofs are automatically injected into the highest-priority direct-injection block on subsequent submitter instantiations. diff --git a/backend/compiler/agents/high_param_submitter.py b/backend/compiler/agents/high_param_submitter.py index 747abbd..2e38362 100644 --- a/backend/compiler/agents/high_param_submitter.py +++ b/backend/compiler/agents/high_param_submitter.py @@ -1,5 +1,5 @@ """ -High-parameter submitter agent for the compiler's rigor loop. +Rigor & Proofs submitter agent for the compiler's rigor loop. The rigor loop no longer rewrites paper text. Instead it runs a two-stage Lean-4-verified-theorem flow (see RIGOR_LEAN_BUILD_PLAN.md): @@ -17,7 +17,7 @@ and appendix insertion. The Wolfram sub-mode that used to live here has been removed in Phase 2. -Wolfram Alpha is now a tool available to HighContextSubmitter.submit_construction +Wolfram Alpha is now a tool available to WritingSubmitter.submit_construction (see Phase 3 of the build plan). 
""" @@ -26,15 +26,21 @@ import logging import uuid from dataclasses import dataclass, field +from datetime import datetime from typing import Any, Awaitable, Callable, Dict, List, Optional from backend.autonomous.memory.paper_library import PaperLibrary +from backend.autonomous.agents.proof_formalization_agent import ( + _MANDATORY_FULL_SOURCE_CONTEXT_OVERFLOW_PREFIX as MANDATORY_FULL_SOURCE_CONTEXT_OVERFLOW_PREFIX, +) from backend.autonomous.memory.proof_database import proof_database as autonomous_proof_database from backend.compiler.core.compiler_rag_manager import compiler_rag_manager +from backend.compiler.memory.critique_rejection_memory import CritiqueRejectionMemory from backend.compiler.memory.outline_memory import outline_memory from backend.compiler.memory.paper_memory import ( paper_memory, ) +from backend.compiler.prompts.critique_prompts import build_critique_prompt from backend.compiler.prompts.rigor_prompts import ( build_rigor_placement_prompt, build_rigor_theorem_discovery_prompt, @@ -42,13 +48,19 @@ from backend.shared.api_client_manager import api_client_manager from backend.shared.config import rag_config, system_config from backend.shared.json_parser import parse_json, sanitize_model_output_for_retry_context +from backend.shared.model_error_utils import ( + is_non_retryable_model_error, + is_transient_model_call_error, +) from backend.shared.response_extraction import extract_message_text from backend.shared.lean_proof_integrity import validate_full_lean_proof_integrity from backend.shared.lm_studio_client import lm_studio_client +from backend.shared.openrouter_client import FreeModelExhaustedError, OpenRouterInvalidResponseError from backend.shared.models import ( CompilerSubmission, ProofAttemptFeedback, ProofCandidate, + Submission, ) from backend.shared.utils import count_tokens @@ -62,6 +74,20 @@ } +def _is_rigor_model_call_failure(exc: Exception) -> bool: + """Return true for provider/config failures that must not become declines.""" + 
message = str(exc or "").lower() + return ( + isinstance(exc, OpenRouterInvalidResponseError) + or is_non_retryable_model_error(exc) + or is_transient_model_call_error(exc) + or "model output incomplete" in message + or "transient provider error" in message + or "upstream provider timeout" in message + or "response missing 'choices'" in message + ) + + def _normalize_string_field(value) -> str: """Normalize string field from LLM response (tolerates list-of-strings mistakes).""" if isinstance(value, list): @@ -162,7 +188,7 @@ class RigorTheoremResult: class HighParamSubmitter: - """High-parameter submitter for the compiler's rigor loop. + """Rigor & Proofs submitter for the compiler's rigor loop. Drives the Lean-4-verified-theorem flow end-to-end: discovery -> 5 Lean attempts -> novelty classification -> persist -> initial placement @@ -211,6 +237,10 @@ def __init__( self.task_sequence: int = 0 self.role_id = "compiler_high_param" self.task_tracking_callback: Optional[Callable[[str, str], None]] = None + self.critique_task_sequence: int = 0 + self.critique_submission_count: int = 0 + self.critique_submitter_id: int = 1 + self.critique_rejection_memory = CritiqueRejectionMemory() # Populated by initialize() self.context_window: int = system_config.compiler_high_param_context_window @@ -237,11 +267,13 @@ def set_rigor_proof_source(self, source_id: str = "", source_title: str = "") -> self._rigor_proof_source_id = (source_id or "").strip() self._rigor_proof_source_title = (source_title or "").strip() - def _get_direct_source_material_context(self, max_chars: int = 50000) -> str: - """Return bounded direct source context; full content remains available via RAG.""" + def _get_direct_source_material_context(self, max_chars: Optional[int] = None) -> str: + """Return direct source context, optionally bounded for diagnostic callers.""" context = self._source_material_context.strip() if not context: return "" + if max_chars is None: + return context if len(context) <= 
max_chars: return context head = max_chars // 2 @@ -258,7 +290,7 @@ def _get_paper_proof_source_content(self, current_paper: str) -> str: parts = [ "CURRENT PAPER UNDER CONSTRUCTION:\n" + (proof_paper or "").strip(), ] - source_context = self._get_direct_source_material_context(max_chars=30000) + source_context = self._get_direct_source_material_context() if source_context: label = self._source_material_label or "Source brainstorm / paper-writing database" parts.append(f"{label.upper()}:\n{source_context}") @@ -271,18 +303,157 @@ async def initialize(self) -> None: self.context_window = system_config.compiler_high_param_context_window self.max_output_tokens = system_config.compiler_high_param_max_output_tokens if int(self.validator_context_window or 0) <= 0 or int(self.validator_max_tokens or 0) <= 0: - raise ValueError("High-param validator context and max output settings must be configured.") + raise ValueError("Rigor & Proofs validator context and max output settings must be configured.") self.available_input_tokens = rag_config.get_available_input_tokens( self.context_window, self.max_output_tokens ) + await self.critique_rejection_memory.initialize() self._initialized = True - logger.info(f"High-param submitter initialized with model: {self.model_name}") + logger.info(f"Rigor & Proofs submitter initialized with model: {self.model_name}") logger.info( f"Context budget: {self.available_input_tokens} tokens " f"(window: {self.context_window})" ) + # ------------------------------------------------------------ critique mode + + async def reset_critique_rejection_memory(self) -> None: + """Clear critique feedback before a fresh post-body critique phase.""" + await self.critique_rejection_memory.reset() + + def next_critique_task_id(self, prefix: str = "critique_sub1") -> str: + """Return a critique-phase task ID while keeping rigor task IDs stable.""" + task_id = f"{prefix}_{self.critique_task_sequence:03d}" + self.critique_task_sequence += 1 + return task_id + + 
async def submit_critique( + self, + user_prompt: str, + current_body: str, + current_outline: str, + aggregator_db: str, + reference_papers: Optional[str] = None, + existing_critiques: Optional[str] = None, + accumulated_history: Optional[str] = None, + ) -> Optional[Submission]: + """Generate post-body critique or a validated decline assessment.""" + try: + rejection_feedback = await self.critique_rejection_memory.get_all_content() + prompt = build_critique_prompt( + user_prompt=user_prompt, + current_body=current_body, + current_outline=current_outline, + aggregator_db=aggregator_db, + reference_papers=reference_papers, + critique_feedback=existing_critiques, + rejection_feedback=rejection_feedback, + accumulated_history=accumulated_history, + ) + + prompt_tokens = count_tokens(prompt) + max_allowed = rag_config.get_available_input_tokens( + self.context_window, + self.max_output_tokens, + ) + if prompt_tokens > max_allowed: + logger.error( + "Critique prompt (%s tokens) exceeds Rigor & Proofs context window " + "(%s tokens available)", + prompt_tokens, + max_allowed, + ) + return None + + task_id = self.next_critique_task_id(f"critique_sub{self.critique_submitter_id}") + if self.task_tracking_callback: + self.task_tracking_callback("started", task_id) + + response = await api_client_manager.generate_completion( + task_id=task_id, + role_id=self.role_id, + model=self.model_name, + messages=[{"role": "user", "content": prompt}], + temperature=0.0, + max_tokens=self.max_output_tokens, + ) + + if self.task_tracking_callback: + self.task_tracking_callback("completed", task_id) + + if not response.get("choices") or not response["choices"][0].get("message"): + logger.error("Critique: Rigor & Proofs LLM returned empty response structure") + return None + + message = response["choices"][0]["message"] + llm_output = extract_message_text(message) + data = parse_json(llm_output) + if data is None: + logger.error("Failed to parse critique JSON response") + return None + + 
if isinstance(data, list): + logger.warning("Rigor & Proofs critique returned array instead of object - using first element") + if not data: + logger.error("Empty array response from Rigor & Proofs critique generation") + return None + data = data[0] + + if "critique_needed" not in data: + logger.error("Critique response missing 'critique_needed' field") + return None + if "reasoning" not in data: + logger.error("Critique response missing 'reasoning' field") + return None + + critique_needed = data.get("critique_needed", True) + is_decline = not critique_needed + if critique_needed and "submission" not in data: + logger.error("Critique response missing 'submission' field when critique_needed=true") + return None + + submission = Submission( + submission_id=str(uuid.uuid4()), + submitter_id=self.critique_submitter_id, + content=data.get("submission", ""), + reasoning=data.get("reasoning", ""), + chunk_size_used=512, + timestamp=datetime.now(), + is_decline=is_decline, + ) + + self.critique_submission_count += 1 + if is_decline: + logger.info( + "Rigor & Proofs declined critique (assessment #%s)", + self.critique_submission_count, + ) + else: + logger.info( + "Rigor & Proofs generated critique #%s", + self.critique_submission_count, + ) + return submission + + except FreeModelExhaustedError: + raise + except RuntimeError as exc: + if "credits exhausted" in str(exc).lower() or _is_rigor_model_call_failure(exc): + raise + logger.error("Error generating critique through Rigor & Proofs: %s", exc, exc_info=True) + return None + except Exception as exc: + if _is_rigor_model_call_failure(exc): + raise + logger.error("Error generating critique through Rigor & Proofs: %s", exc, exc_info=True) + return None + + async def handle_critique_rejection(self, summary: str, content: str) -> None: + """Store critique rejection feedback for later critique attempts.""" + await self.critique_rejection_memory.add_rejection(summary, content) + logger.info("Critique rejected - Rigor & Proofs 
feedback stored: %s...", summary[:100]) + # -------------------------------------------------------- broadcast helpers async def _broadcast(self, event: str, data: Dict[str, Any]) -> None: @@ -331,7 +502,7 @@ async def _build_rigor_rag_context( ) -> str: """Retrieve RAG evidence for the rigor prompts. - Mirrors the HighContextSubmitter.submit_construction budget + Mirrors the WritingSubmitter.submit_construction budget pattern: outline + paper are direct-injected by the caller, so we exclude them from RAG. The remaining budget goes to the RAG offload priority (Shared Training DB -> Local Submitter DB @@ -635,23 +806,22 @@ async def _step_discovery(self) -> Optional[dict]: self.context_window, self.max_output_tokens ) - # Build with empty RAG first to measure the mandatory footprint, - # then allocate the rest to RAG. If the direct source context itself - # is too large, shrink it before falling back to RAG. - while True: - base_prompt = await build_rigor_theorem_discovery_prompt( - user_prompt=self.user_prompt, - current_outline=current_outline, - current_paper=current_paper, - rag_evidence="", - existing_verified_proofs=existing_proofs, - recent_failure_hints=failure_hints, - source_material_context=source_material_context, - source_material_label=self._source_material_label, + base_prompt = await build_rigor_theorem_discovery_prompt( + user_prompt=self.user_prompt, + current_outline=current_outline, + current_paper=current_paper, + rag_evidence="", + existing_verified_proofs=existing_proofs, + recent_failure_hints=failure_hints, + source_material_context=source_material_context, + source_material_label=self._source_material_label, + ) + if count_tokens(base_prompt) > max_allowed: + raise ValueError( + "Rigor discovery prompt exceeds available input budget with mandatory full source context " + f"({count_tokens(base_prompt)} tokens > {max_allowed} tokens). 
Source material was not " + "silently truncated; choose a larger Rigor & Proofs context window or reduce source size." ) - if count_tokens(base_prompt) <= max_allowed or len(source_material_context) <= 4000: - break - source_material_context = source_material_context[: max(len(source_material_context) // 2, 4000)] mandatory_tokens = count_tokens(base_prompt) query_seed = (self.raw_user_prompt + " " + current_paper[-1500:]).strip() @@ -706,7 +876,7 @@ async def _step_formalize( Returns (theorem_name, lean_code, attempts, integrity) on success, None on all-5-fail. On failure, records the candidate in proof_database so - future rigor cycles can see it as an open lemma target. + future rigor cycles can see it as an open high-impact proof target. """ current_paper_raw = await paper_memory.get_paper() current_paper = _strip_paper_markers_for_llm(current_paper_raw) @@ -786,8 +956,11 @@ async def _on_attempt_feedback(feedback: ProofAttemptFeedback) -> None: max_attempts=5, attempt_callback=_on_attempt_feedback, attempt_start_callback=_on_attempt_started, + source_title=self._compiler_source_title(), ) except Exception as exc: + if _is_rigor_model_call_failure(exc): + raise logger.error("Rigor formalization raised (%s); declining cycle", exc, exc_info=True) await self._broadcast( "proof_check_complete", @@ -801,7 +974,10 @@ async def _on_attempt_feedback(feedback: ProofAttemptFeedback) -> None: return None if not success: - # Record as an open lemma target so the next rigor cycle's + last_error = attempts[-1].error_output if attempts else "" + if MANDATORY_FULL_SOURCE_CONTEXT_OVERFLOW_PREFIX.lower() in last_error.lower(): + raise ValueError(last_error) + # Record as an open proof target so the next rigor cycle's # discovery step can optionally retry it. 
try: error_summary = attempts[-1].error_output if attempts else "" @@ -924,6 +1100,8 @@ async def _step_assess_novelty_and_store( stored = registration.record return stored.novel, stored.novelty_reasoning, stored, registration.duplicate except Exception as exc: + if _is_rigor_model_call_failure(exc): + raise logger.warning("Novelty assessment failed; rigor proof will not be stored: %s", exc) await self._broadcast( "proof_check_complete", @@ -1090,7 +1268,7 @@ async def _call_llm_and_parse( prompt: str, task_label: str, ) -> Optional[Any]: - """Send `prompt` to the high-param model and return parsed JSON. + """Send `prompt` to the Rigor & Proofs model and return parsed JSON. On a JSON parse failure, issues a single conversational retry that feeds the failed output back with a JSON-escape-rules reminder. @@ -1105,7 +1283,7 @@ async def _call_llm_and_parse( {"context_length": self.context_window, "model_path": self.model_name}, ) except Exception as exc: - logger.debug("LM Studio cache warmup skipped for high-param submitter: %s", exc) + logger.debug("LM Studio cache warmup skipped for Rigor & Proofs Submitter: %s", exc) if self.task_tracking_callback: self.task_tracking_callback("started", task_id) @@ -1120,13 +1298,15 @@ async def _call_llm_and_parse( max_tokens=self.max_output_tokens, ) except Exception as exc: - logger.error("High-param LLM call failed (%s): %s", task_label, exc) + if _is_rigor_model_call_failure(exc): + raise + logger.error("Rigor & Proofs LLM call failed (%s): %s", task_label, exc) if self.task_tracking_callback: self.task_tracking_callback("completed", task_id) return None if not response or not response.get("choices") or not response["choices"][0].get("message"): - logger.error("High-param LLM returned empty response (%s)", task_label) + logger.error("Rigor & Proofs LLM returned empty response (%s)", task_label) if self.task_tracking_callback: self.task_tracking_callback("completed", task_id) return None @@ -1134,7 +1314,7 @@ async def 
_call_llm_and_parse( message = response["choices"][0]["message"] llm_output = extract_message_text(message) if not llm_output.strip(): - logger.error("High-param LLM returned empty content (%s)", task_label) + logger.error("Rigor & Proofs LLM returned empty content (%s)", task_label) if self.task_tracking_callback: self.task_tracking_callback("completed", task_id) return None @@ -1146,7 +1326,7 @@ async def _call_llm_and_parse( return parsed except Exception as parse_error: logger.info( - "High-param submitter (%s): initial JSON parse failed, attempting one retry: %s", + "Rigor & Proofs Submitter (%s): initial JSON parse failed, attempting one retry: %s", task_label, parse_error, ) @@ -1166,15 +1346,27 @@ async def _call_llm_and_parse( try: truncated_preview = sanitize_model_output_for_retry_context(llm_output, max_chars=2000) + retry_messages = [ + {"role": "user", "content": prompt}, + {"role": "assistant", "content": truncated_preview}, + {"role": "user", "content": retry_prompt}, + ] + max_input_tokens = rag_config.get_available_input_tokens( + self.context_window, + self.max_output_tokens, + ) + if sum(count_tokens(str(message.get("content") or "")) for message in retry_messages) > max_input_tokens: + retry_messages[1]["content"] = "[failed output omitted because retry context would exceed the model input budget]" + if sum(count_tokens(str(message.get("content") or "")) for message in retry_messages) > max_input_tokens: + raise ValueError( + f"Rigor & Proofs retry prompt exceeds context limit for {task_label}; " + "the original prompt is already at the configured model budget." 
+ ) retry_response = await api_client_manager.generate_completion( task_id=f"{task_id}_retry", role_id=self.role_id, model=self.model_name, - messages=[ - {"role": "user", "content": prompt}, - {"role": "assistant", "content": truncated_preview}, - {"role": "user", "content": retry_prompt}, - ], + messages=retry_messages, temperature=0.0, max_tokens=self.max_output_tokens, ) @@ -1182,13 +1374,17 @@ async def _call_llm_and_parse( retry_msg = retry_response["choices"][0]["message"] retry_output = extract_message_text(retry_msg) parsed = parse_json(retry_output) - logger.info("High-param submitter (%s): retry succeeded", task_label) + logger.info("Rigor & Proofs Submitter (%s): retry succeeded", task_label) if self.task_tracking_callback: self.task_tracking_callback("completed", task_id) return parsed except Exception as retry_error: + if _is_rigor_model_call_failure(retry_error): + raise + if "retry prompt exceeds context limit" in str(retry_error).lower(): + raise logger.warning( - "High-param submitter (%s): retry failed: %s", task_label, retry_error + "Rigor & Proofs Submitter (%s): retry failed: %s", task_label, retry_error ) if self.task_tracking_callback: diff --git a/backend/compiler/agents/high_context_submitter.py b/backend/compiler/agents/writer_submitter.py similarity index 93% rename from backend/compiler/agents/high_context_submitter.py rename to backend/compiler/agents/writer_submitter.py index 125e98e..d7c3e4e 100644 --- a/backend/compiler/agents/high_context_submitter.py +++ b/backend/compiler/agents/writer_submitter.py @@ -1,5 +1,5 @@ """ -High-context submitter agent for compiler. +Writing submitter agent for compiler. Handles 3 modes: construction, outline update, and review. """ import hashlib @@ -175,9 +175,9 @@ def _strip_paper_markers_for_llm(paper_content: str) -> str: return paper_content.strip() -class HighContextSubmitter: +class WritingSubmitter: """ - High-context, low-parameter submitter for compiler. 
+ Writing submitter for compiler outline, construction, and review work. Modes: - outline_create: Generate initial outline @@ -203,13 +203,13 @@ def __init__( self._initialized = False # Calculate context budget from the user-configured role settings. - self.context_window = system_config.compiler_high_context_context_window - self.max_output_tokens = system_config.compiler_high_context_max_output_tokens + self.context_window = system_config.compiler_writer_context_window + self.max_output_tokens = system_config.compiler_writer_max_output_tokens self.available_input_tokens = rag_config.get_available_input_tokens(self.context_window, self.max_output_tokens) # Task tracking for workflow panel and boost integration self.task_sequence: int = 0 - self.role_id = "compiler_high_context" + self.role_id = "compiler_writer" self.task_tracking_callback: Optional[Callable] = None def set_task_tracking_callback(self, callback: Callable) -> None: @@ -218,7 +218,7 @@ def set_task_tracking_callback(self, callback: Callable) -> None: def get_current_task_id(self) -> str: """Get the task ID for the current/next API call.""" - return f"comp_hc_{self.task_sequence:03d}" + return f"comp_writer_{self.task_sequence:03d}" async def initialize(self) -> None: """Initialize submitter.""" @@ -226,12 +226,12 @@ async def initialize(self) -> None: return # Re-read context window from config (in case it was updated) - self.context_window = system_config.compiler_high_context_context_window - self.max_output_tokens = system_config.compiler_high_context_max_output_tokens + self.context_window = system_config.compiler_writer_context_window + self.max_output_tokens = system_config.compiler_writer_max_output_tokens self.available_input_tokens = rag_config.get_available_input_tokens(self.context_window, self.max_output_tokens) self._initialized = True - logger.info(f"High-context submitter initialized with model: {self.model_name}") + logger.info(f"Writing submitter initialized with model: 
{self.model_name}") logger.info(f"Context budget: {self.available_input_tokens} tokens (window: {self.context_window})") async def submit_outline_create(self) -> CompilerSubmission: @@ -261,10 +261,17 @@ async def submit_outline_create(self) -> CompilerSubmission: rag_evidence=context_pack.text ) logger.info(f"Prompt built: {len(prompt)} chars") + + task_id = self.get_current_task_id() + await api_client_manager.prewarm_assistant_memory_context( + task_id=task_id, + role_id=self.role_id, + prompt=prompt, + ) # Validate prompt size actual_prompt_tokens = count_tokens(prompt) - max_allowed_tokens = rag_config.get_available_input_tokens(system_config.compiler_high_context_context_window, system_config.compiler_high_context_max_output_tokens) + max_allowed_tokens = rag_config.get_available_input_tokens(system_config.compiler_writer_context_window, system_config.compiler_writer_max_output_tokens) if actual_prompt_tokens > max_allowed_tokens: logger.error( @@ -275,8 +282,6 @@ async def submit_outline_create(self) -> CompilerSubmission: logger.debug(f"outline_create prompt: {actual_prompt_tokens} tokens (max: {max_allowed_tokens})") - # Generate task ID for tracking - task_id = self.get_current_task_id() self.task_sequence += 1 # Notify task started (for workflow panel) @@ -291,7 +296,7 @@ async def submit_outline_create(self) -> CompilerSubmission: model=self.model_name, messages=[{"role": "user", "content": prompt}], temperature=0.0, # Deterministic generation - evolving context provides diversity - max_tokens=system_config.compiler_high_context_max_output_tokens # User-configurable (outline creation, update, construction, review) + max_tokens=system_config.compiler_writer_max_output_tokens # User-configurable (outline creation, update, construction, review) ) # Check for empty response @@ -409,10 +414,17 @@ async def submit_outline_update(self) -> Optional[CompilerSubmission]: rag_evidence=context_pack.text ) logger.info(f"Prompt built: {len(prompt)} chars") + + 
task_id = self.get_current_task_id() + await api_client_manager.prewarm_assistant_memory_context( + task_id=task_id, + role_id=self.role_id, + prompt=prompt, + ) # Validate prompt size actual_prompt_tokens = count_tokens(prompt) - max_allowed_tokens = rag_config.get_available_input_tokens(system_config.compiler_high_context_context_window, system_config.compiler_high_context_max_output_tokens) + max_allowed_tokens = rag_config.get_available_input_tokens(system_config.compiler_writer_context_window, system_config.compiler_writer_max_output_tokens) if actual_prompt_tokens > max_allowed_tokens: logger.error( @@ -423,8 +435,6 @@ async def submit_outline_update(self) -> Optional[CompilerSubmission]: logger.debug(f"outline_update prompt: {actual_prompt_tokens} tokens (max: {max_allowed_tokens})") - # Generate task ID for tracking - task_id = self.get_current_task_id() self.task_sequence += 1 # Notify task started (for workflow panel) @@ -439,7 +449,7 @@ async def submit_outline_update(self) -> Optional[CompilerSubmission]: model=self.model_name, messages=[{"role": "user", "content": prompt}], temperature=0.0, # Deterministic generation - evolving context provides diversity - max_tokens=system_config.compiler_high_context_max_output_tokens # User-configurable (outline creation, update, construction, review) + max_tokens=system_config.compiler_writer_max_output_tokens # User-configurable (outline creation, update, construction, review) ) # Check for empty response @@ -555,8 +565,8 @@ async def submit_construction( # Calculate RAG budget accounting for brainstorm content (prevents context overflow) max_allowed_tokens = rag_config.get_available_input_tokens( - system_config.compiler_high_context_context_window, - system_config.compiler_high_context_max_output_tokens + system_config.compiler_writer_context_window, + system_config.compiler_writer_max_output_tokens ) outline_tokens = count_tokens(current_outline) paper_tokens = count_tokens(paper_for_llm) if paper_for_llm else 
0 @@ -652,6 +662,13 @@ async def submit_construction( rejection_feedback=rejection_feedback ) logger.info(f"Prompt built: {len(prompt)} chars") + + task_id = self.get_current_task_id() + await api_client_manager.prewarm_assistant_memory_context( + task_id=task_id, + role_id=self.role_id, + prompt=prompt, + ) # Validate prompt size (max_allowed_tokens already calculated above for RAG budget) actual_prompt_tokens = count_tokens(prompt) @@ -668,8 +685,6 @@ async def submit_construction( logger.debug(f"construction prompt: {actual_prompt_tokens} tokens (max: {max_allowed_tokens})") - # Generate task ID for tracking - task_id = self.get_current_task_id() self.task_sequence += 1 # Notify task started (for workflow panel) @@ -699,7 +714,7 @@ async def submit_construction( model=self.model_name, messages=[{"role": "user", "content": prompt}], temperature=0.0, - max_tokens=system_config.compiler_high_context_max_output_tokens, + max_tokens=system_config.compiler_writer_max_output_tokens, ) if not fallback.get("choices") or not fallback["choices"][0].get("message"): logger.error("construction: LLM returned empty response structure") @@ -875,10 +890,17 @@ async def submit_review(self, review_focus: str = "general") -> Optional[Compile review_focus=review_focus ) logger.info(f"Prompt built: {len(prompt)} chars") + + task_id = self.get_current_task_id() + await api_client_manager.prewarm_assistant_memory_context( + task_id=task_id, + role_id=self.role_id, + prompt=prompt, + ) # Validate prompt size actual_prompt_tokens = count_tokens(prompt) - max_allowed_tokens = rag_config.get_available_input_tokens(system_config.compiler_high_context_context_window, system_config.compiler_high_context_max_output_tokens) + max_allowed_tokens = rag_config.get_available_input_tokens(system_config.compiler_writer_context_window, system_config.compiler_writer_max_output_tokens) if actual_prompt_tokens > max_allowed_tokens: logger.error( @@ -889,8 +911,6 @@ async def submit_review(self, 
review_focus: str = "general") -> Optional[Compile logger.debug(f"review prompt: {actual_prompt_tokens} tokens (max: {max_allowed_tokens})") - # Generate task ID for tracking - task_id = self.get_current_task_id() self.task_sequence += 1 # Notify task started (for workflow panel) @@ -905,7 +925,7 @@ async def submit_review(self, review_focus: str = "general") -> Optional[Compile model=self.model_name, messages=[{"role": "user", "content": prompt}], temperature=0.0, # Deterministic generation - evolving context provides diversity - max_tokens=system_config.compiler_high_context_max_output_tokens # User-configurable (outline creation, update, construction, review) + max_tokens=system_config.compiler_writer_max_output_tokens # User-configurable (outline creation, update, construction, review) ) # Check for empty response @@ -1061,7 +1081,7 @@ async def _generate_completion_with_wolfram_tool( model=self.model_name, messages=messages, temperature=0.0, - max_tokens=system_config.compiler_high_context_max_output_tokens, + max_tokens=system_config.compiler_writer_max_output_tokens, tools=tools_param, ) @@ -1252,7 +1272,7 @@ async def _parse_json_response_with_retry( return parsed except Exception as parse_error: error = str(parse_error) - logger.info(f"Compiler high-context submitter ({mode}): Initial JSON parse failed, attempting single retry") + logger.info(f"Compiler writing submitter ({mode}): Initial JSON parse failed, attempting single retry") logger.debug(f"Parse error: {error}") # Build mode-specific retry prompt @@ -1263,16 +1283,29 @@ async def _parse_json_response_with_retry( # Generate a retry task ID (append _retry to distinguish from original) retry_task_id = f"{self.get_current_task_id()}_retry" retry_context = sanitize_model_output_for_retry_context(response) + retry_messages = [ + {"role": "user", "content": original_prompt}, + {"role": "assistant", "content": retry_context}, + {"role": "user", "content": retry_prompt} + ] + max_input_tokens = 
rag_config.get_available_input_tokens( + self.context_window, + self.max_output_tokens, + ) + if sum(count_tokens(str(message.get("content") or "")) for message in retry_messages) > max_input_tokens: + retry_context = "[failed output omitted because retry context would exceed the model input budget]" + retry_messages[1]["content"] = retry_context + if sum(count_tokens(str(message.get("content") or "")) for message in retry_messages) > max_input_tokens: + raise ValueError( + f"Compiler writing retry prompt exceeds context limit for {mode}; " + "the original prompt is already at the configured model budget." + ) retry_response = await api_client_manager.generate_completion( task_id=retry_task_id, role_id=self.role_id, model=self.model_name, - messages=[ - {"role": "user", "content": original_prompt}, - {"role": "assistant", "content": retry_context}, - {"role": "user", "content": retry_prompt} - ], + messages=retry_messages, temperature=0.0, # Deterministic JSON formatting max_tokens=self.max_output_tokens ) @@ -1283,18 +1316,20 @@ async def _parse_json_response_with_retry( try: parsed = parse_json(retry_output) - logger.info(f"Compiler high-context submitter ({mode}): Retry succeeded!") + logger.info(f"Compiler writing submitter ({mode}): Retry succeeded!") return parsed except Exception as retry_parse_error: - logger.warning(f"Compiler high-context submitter ({mode}): Retry parse failed - {retry_parse_error}") + logger.warning(f"Compiler writing submitter ({mode}): Retry parse failed - {retry_parse_error}") else: - logger.warning(f"Compiler high-context submitter ({mode}): Retry returned empty response") + logger.warning(f"Compiler writing submitter ({mode}): Retry returned empty response") except Exception as e: - logger.error(f"Compiler high-context submitter ({mode}): Retry request failed - {e}") + if "retry prompt exceeds context limit" in str(e).lower(): + raise + logger.error(f"Compiler writing submitter ({mode}): Retry request failed - {e}") # Retry failed - 
return None and let coordinator handle it - logger.error(f"Compiler high-context submitter ({mode}): JSON validation failed after retry: {error}") + logger.error(f"Compiler writing submitter ({mode}): JSON validation failed after retry: {error}") return None def _build_retry_prompt(self, mode: str, error: str) -> str: diff --git a/backend/compiler/core/compiler_coordinator.py b/backend/compiler/core/compiler_coordinator.py index 2ec57fd..890295a 100644 --- a/backend/compiler/core/compiler_coordinator.py +++ b/backend/compiler/core/compiler_coordinator.py @@ -17,11 +17,19 @@ from backend.shared.brainstorm_proof_gate import BRAINSTORM_LEAN_PROOF_MARKER from backend.shared.free_model_manager import free_model_manager from backend.shared.json_parser import parse_json +from backend.shared.model_error_utils import ( + is_non_retryable_model_error, + is_transient_model_call_error, +) from backend.shared.response_extraction import extract_message_text from backend.shared.utils import count_tokens -from backend.compiler.agents.high_context_submitter import HighContextSubmitter +from backend.shared.context_overflow import ( + CONTEXT_OVERFLOW_RESOLUTION, + CONTEXT_OVERFLOW_STOP_MESSAGE, + CONTEXT_OVERFLOW_STOP_REASON, +) +from backend.compiler.agents.writer_submitter import WritingSubmitter from backend.compiler.agents.high_param_submitter import HighParamSubmitter -from backend.compiler.agents.critique_submitter import CritiqueSubmitterAgent from backend.compiler.validation.compiler_validator import CompilerValidator, normalize_unicode_hyphens, find_with_normalized_hyphens from backend.compiler.memory.outline_memory import outline_memory, OUTLINE_ANCHOR from backend.compiler.memory.paper_memory import ( @@ -43,6 +51,20 @@ MAX_RIGOR_CYCLES_PER_LOOP = 5 +def _is_rigor_model_call_failure(exc: Exception) -> bool: + """Return true for provider/config failures that must not become proof declines.""" + message = str(exc or "").lower() + return ( + isinstance(exc, 
OpenRouterInvalidResponseError) + or is_non_retryable_model_error(exc) + or is_transient_model_call_error(exc) + or "model output incomplete" in message + or "transient provider error" in message + or "upstream provider timeout" in message + or "response missing 'choices'" in message + ) + + async def _cancel_and_drain_task(task: asyncio.Task) -> None: """Cancel a task, suppressing only cancellation while preserving real failures.""" task.cancel() @@ -69,7 +91,7 @@ async def _cancel_and_drain_task(task: asyncio.Task) -> None: def _classify_submitter_error(err: BaseException) -> tuple[str, str]: """ - Classify an exception raised by a HighContextSubmitter.submit_* call. + Classify an exception raised by a WritingSubmitter.submit_* call. Distinguishes true context / prompt-size overflows (which are meaningful "decline to submit" signals) from upstream transport / API failures @@ -89,7 +111,13 @@ def _classify_submitter_error(err: BaseException) -> tuple[str, str]: if isinstance(err, OpenRouterInvalidResponseError): return ("API transport error", "API transport error") - if "prompt too large" in msg_lower or "tokens > " in msg_lower: + if ( + "prompt too large" in msg_lower + or "tokens > " in msg_lower + or "exceeds context limit" in msg_lower + or "exceeds available input budget" in msg_lower + or "exceeds the configured input budget" in msg_lower + ): return ("Context overflow", "Context overflow") if msg_lower.startswith("openrouter api error") or msg_lower.startswith("openrouter connection failed") or msg_lower.startswith("openrouter rate limit"): @@ -108,13 +136,15 @@ class CompilerCoordinator: """ def __init__(self): - self.high_context_submitter: Optional[HighContextSubmitter] = None + self.writer_submitter: Optional[WritingSubmitter] = None self.high_param_submitter: Optional[HighParamSubmitter] = None self.validator: Optional[CompilerValidator] = None self.is_running = False self.current_mode = "idle" self.outline_accepted = False + self.fatal_error_type: 
Optional[str] = None + self.fatal_error_message: str = "" # Stats self.total_submissions = 0 @@ -146,7 +176,7 @@ def __init__(self): self.allow_mathematical_proofs: bool = True # Critique phase state (post-body peer review) - self.critique_submitter = None # CritiqueSubmitterAgent instance + self.critique_submitter = None # Rigor & Proofs submitter reused for critique generation self.critique_aggregator = None # Coordinator instance for critique workflow self.in_critique_phase = False self.critique_acceptances = 0 @@ -177,7 +207,7 @@ async def initialize( self, compiler_prompt: str, validator_model: str, - high_context_model: str, + writer_model: str, high_param_model: str, critique_submitter_model: str, skip_aggregator_db: bool = False, @@ -186,23 +216,23 @@ async def initialize( validator_openrouter_provider: Optional[str] = None, validator_openrouter_reasoning_effort: str = "auto", validator_lm_studio_fallback: Optional[str] = None, - # OpenRouter provider config for high-context submitter - high_context_provider: str = "lm_studio", - high_context_openrouter_provider: Optional[str] = None, - high_context_openrouter_reasoning_effort: str = "auto", - high_context_lm_studio_fallback: Optional[str] = None, - # OpenRouter provider config for high-param submitter + # OpenRouter provider config for writing submitter + writer_provider: str = "lm_studio", + writer_openrouter_provider: Optional[str] = None, + writer_openrouter_reasoning_effort: str = "auto", + writer_lm_studio_fallback: Optional[str] = None, + # OpenRouter provider config for Rigor & Proofs submitter high_param_provider: str = "lm_studio", high_param_openrouter_provider: Optional[str] = None, high_param_openrouter_reasoning_effort: str = "auto", high_param_lm_studio_fallback: Optional[str] = None, - # OpenRouter provider config for critique submitter + # Deprecated critique compatibility fields, mirrored from Rigor & Proofs critique_submitter_provider: str = "lm_studio", 
critique_submitter_openrouter_provider: Optional[str] = None, critique_submitter_openrouter_reasoning_effort: str = "auto", critique_submitter_lm_studio_fallback: Optional[str] = None, validator_supercharge_enabled: bool = False, - high_context_supercharge_enabled: bool = False, + writer_supercharge_enabled: bool = False, high_param_supercharge_enabled: bool = False, critique_submitter_supercharge_enabled: bool = False, allow_mathematical_proofs: bool = True @@ -213,26 +243,23 @@ async def initialize( Args: compiler_prompt: User's compiler-directing prompt validator_model: Model for validator - high_context_model: Model for high-context submitter - high_param_model: Model for high-param submitter + writer_model: Model for writing submitter + high_param_model: Model for Rigor & Proofs submitter critique_submitter_model: Model for critique generation skip_aggregator_db: If True, don't load Part 1 aggregator database (for autonomous mode) validator_provider: Provider for validator ("lm_studio" or "openrouter") validator_openrouter_provider: OpenRouter host provider for validator validator_openrouter_reasoning_effort: OpenRouter reasoning effort for validator validator_lm_studio_fallback: LM Studio fallback model for validator - high_context_provider: Provider for high-context submitter - high_context_openrouter_provider: OpenRouter host provider for high-context submitter - high_context_openrouter_reasoning_effort: OpenRouter reasoning effort for high-context submitter - high_context_lm_studio_fallback: LM Studio fallback model for high-context submitter - high_param_provider: Provider for high-param submitter - high_param_openrouter_provider: OpenRouter host provider for high-param submitter - high_param_openrouter_reasoning_effort: OpenRouter reasoning effort for high-param submitter - high_param_lm_studio_fallback: LM Studio fallback model for high-param submitter - critique_submitter_provider: Provider for critique submitter - 
critique_submitter_openrouter_provider: OpenRouter host provider for critique submitter - critique_submitter_openrouter_reasoning_effort: OpenRouter reasoning effort for critique submitter - critique_submitter_lm_studio_fallback: LM Studio fallback model for critique submitter + writer_provider: Provider for writing submitter + writer_openrouter_provider: OpenRouter host provider for writing submitter + writer_openrouter_reasoning_effort: OpenRouter reasoning effort for writing submitter + writer_lm_studio_fallback: LM Studio fallback model for writing submitter + high_param_provider: Provider for Rigor & Proofs submitter + high_param_openrouter_provider: OpenRouter host provider for Rigor & Proofs submitter + high_param_openrouter_reasoning_effort: OpenRouter reasoning effort for Rigor & Proofs submitter + high_param_lm_studio_fallback: LM Studio fallback model for Rigor & Proofs submitter + critique_submitter_*: Deprecated compatibility aliases mirrored from Rigor & Proofs """ logger.info("Initializing compiler coordinator...") @@ -243,29 +270,31 @@ async def initialize( self.validator_model = validator_model self.validator_context_window = system_config.compiler_validator_context_window self.validator_max_tokens = system_config.compiler_validator_max_output_tokens - self.critique_submitter_model = critique_submitter_model + # Deprecated critique role fields are compatibility aliases. Critique + # generation now runs on the Rigor & Proofs submitter settings. 
+ self.critique_submitter_model = high_param_model # Store OpenRouter provider configs for all roles self.validator_provider = validator_provider self.validator_openrouter_provider = validator_openrouter_provider self.validator_openrouter_reasoning_effort = validator_openrouter_reasoning_effort self.validator_lm_studio_fallback = validator_lm_studio_fallback - self.high_context_provider = high_context_provider - self.high_context_openrouter_provider = high_context_openrouter_provider - self.high_context_openrouter_reasoning_effort = high_context_openrouter_reasoning_effort - self.high_context_lm_studio_fallback = high_context_lm_studio_fallback + self.writer_provider = writer_provider + self.writer_openrouter_provider = writer_openrouter_provider + self.writer_openrouter_reasoning_effort = writer_openrouter_reasoning_effort + self.writer_lm_studio_fallback = writer_lm_studio_fallback self.high_param_provider = high_param_provider self.high_param_openrouter_provider = high_param_openrouter_provider self.high_param_openrouter_reasoning_effort = high_param_openrouter_reasoning_effort self.high_param_lm_studio_fallback = high_param_lm_studio_fallback - self.critique_submitter_provider = critique_submitter_provider - self.critique_submitter_openrouter_provider = critique_submitter_openrouter_provider - self.critique_submitter_openrouter_reasoning_effort = critique_submitter_openrouter_reasoning_effort - self.critique_submitter_lm_studio_fallback = critique_submitter_lm_studio_fallback + self.critique_submitter_provider = high_param_provider + self.critique_submitter_openrouter_provider = high_param_openrouter_provider + self.critique_submitter_openrouter_reasoning_effort = high_param_openrouter_reasoning_effort + self.critique_submitter_lm_studio_fallback = high_param_lm_studio_fallback self.validator_supercharge_enabled = validator_supercharge_enabled - self.high_context_supercharge_enabled = high_context_supercharge_enabled + self.writer_supercharge_enabled = 
writer_supercharge_enabled self.high_param_supercharge_enabled = high_param_supercharge_enabled - self.critique_submitter_supercharge_enabled = critique_submitter_supercharge_enabled + self.critique_submitter_supercharge_enabled = high_param_supercharge_enabled self.allow_mathematical_proofs = bool(allow_mathematical_proofs) # Reset workflow state for fresh start @@ -351,27 +380,27 @@ async def initialize( logger.info("Skipping Part 1 aggregator database load (autonomous mode)") # Create agents - self.high_context_submitter = HighContextSubmitter( - high_context_model, + self.writer_submitter = WritingSubmitter( + writer_model, compiler_prompt, websocket_broadcaster=self.websocket_broadcaster, proof_database_store=proof_database if self.autonomous_mode else None, ) - await self.high_context_submitter.initialize() + await self.writer_submitter.initialize() # Set up task tracking callback for workflow panel integration - self.high_context_submitter.set_task_tracking_callback(self._handle_task_event) - # Configure API client manager for high-context submitter (OpenRouter/LM Studio routing) + self.writer_submitter.set_task_tracking_callback(self._handle_task_event) + # Configure API client manager for writing submitter (OpenRouter/LM Studio routing) api_client_manager.configure_role( - role_id="compiler_high_context", + role_id="compiler_writer", config=ModelConfig( - provider=self.high_context_provider, - model_id=high_context_model, - openrouter_provider=self.high_context_openrouter_provider, - openrouter_reasoning_effort=self.high_context_openrouter_reasoning_effort, - lm_studio_fallback_id=self.high_context_lm_studio_fallback, - context_window=system_config.compiler_high_context_context_window, - max_output_tokens=system_config.compiler_high_context_max_output_tokens, - supercharge_enabled=self.high_context_supercharge_enabled + provider=self.writer_provider, + model_id=writer_model, + openrouter_provider=self.writer_openrouter_provider, + 
openrouter_reasoning_effort=self.writer_openrouter_reasoning_effort, + lm_studio_fallback_id=self.writer_lm_studio_fallback, + context_window=system_config.compiler_writer_context_window, + max_output_tokens=system_config.compiler_writer_max_output_tokens, + supercharge_enabled=self.writer_supercharge_enabled ) ) @@ -395,7 +424,7 @@ async def initialize( await self.high_param_submitter.initialize() # Set up task tracking callback for workflow panel integration self.high_param_submitter.set_task_tracking_callback(self._handle_task_event) - # Configure API client manager for high-param submitter (OpenRouter/LM Studio routing) + # Configure API client manager for Rigor & Proofs submitter (OpenRouter/LM Studio routing) api_client_manager.configure_role( role_id="compiler_high_param", config=ModelConfig( @@ -489,7 +518,7 @@ async def refresh_workflow_predictions(self) -> None: from backend.shared.boost_manager import boost_manager # Get actual sequence counters from agents - hc_seq = self.high_context_submitter.task_sequence if self.high_context_submitter else 0 + writer_seq = self.writer_submitter.task_sequence if self.writer_submitter else 0 hp_seq = self.high_param_submitter.task_sequence if self.high_param_submitter else 0 val_seq = self.validator.task_sequence if self.validator else 0 @@ -497,13 +526,13 @@ async def refresh_workflow_predictions(self) -> None: tasks = [] if not self.outline_accepted: - # Outline creation phase: HC -> V -> HC -> V ... + # Outline creation phase: writer -> validator -> writer -> validator ... 
for i in range(20): if i % 2 == 0: - task_id = f"comp_hc_{hc_seq:03d}" - role = "High-Context" + task_id = f"comp_writer_{writer_seq:03d}" + role = "Writing Submitter" mode = "Outline Creation" - hc_seq += 1 + writer_seq += 1 else: task_id = f"comp_val_{val_seq:03d}" role = "Validator" @@ -521,21 +550,21 @@ async def refresh_workflow_predictions(self) -> None: else: # Construction cycle pattern cycle_pattern = [ - ("hc", "High-Context", "Construction"), + ("writer", "Writing Submitter", "Construction"), ("val", "Validator", "Construction Review"), - ("hc", "High-Context", "Construction"), + ("writer", "Writing Submitter", "Construction"), ("val", "Validator", "Construction Review"), - ("hc", "High-Context", "Construction"), + ("writer", "Writing Submitter", "Construction"), ("val", "Validator", "Construction Review"), - ("hc", "High-Context", "Construction"), + ("writer", "Writing Submitter", "Construction"), ("val", "Validator", "Construction Review"), - ("hc", "High-Context", "Outline Update"), + ("writer", "Writing Submitter", "Outline Update"), ("val", "Validator", "Outline Review"), - ("hc", "High-Context", "Paper Review"), + ("writer", "Writing Submitter", "Paper Review"), ("val", "Validator", "Review Validation"), - ("hc", "High-Context", "Paper Review"), + ("writer", "Writing Submitter", "Paper Review"), ("val", "Validator", "Review Validation"), - ("hp", "High-Param", "Rigor Enhancement"), + ("hp", "Rigor & Proofs", "Rigor Enhancement"), ("val", "Validator", "Rigor Review"), ] @@ -543,9 +572,9 @@ async def refresh_workflow_predictions(self) -> None: pattern_idx = i % len(cycle_pattern) agent_type, role, mode = cycle_pattern[pattern_idx] - if agent_type == "hc": - task_id = f"comp_hc_{hc_seq:03d}" - hc_seq += 1 + if agent_type == "writer": + task_id = f"comp_writer_{writer_seq:03d}" + writer_seq += 1 elif agent_type == "hp": task_id = f"comp_hp_{hp_seq:03d}" hp_seq += 1 @@ -611,7 +640,7 @@ def _handle_task_event(self, event_type: str, task_id: str) -> None: 
Args: event_type: "started" or "completed" - task_id: The task ID (e.g., "comp_hc_001", "comp_hp_002", "comp_val_003") + task_id: The task ID (e.g., "comp_writer_001", "comp_hp_002", "comp_val_003") """ if event_type == "started": try: @@ -700,6 +729,8 @@ async def start(self) -> None: return self.is_running = True + self.fatal_error_type = None + self.fatal_error_message = "" logger.info("Starting compiler...") # Reset free model manager state for fresh start @@ -771,6 +802,22 @@ async def stop(self) -> None: await self._broadcast("compiler_stopped", {"message": "Compiler stopped"}) logger.info("Compiler stopped") + + async def _handle_context_overflow(self, error: BaseException, *, role_id: str, mode: Optional[str] = None) -> None: + """Stop the compiler when mandatory direct context cannot fit.""" + self.fatal_error_type = CONTEXT_OVERFLOW_STOP_REASON + self.fatal_error_message = str(error) + self.is_running = False + payload = { + "role_id": role_id, + "mode": mode or self.current_mode, + "reason": CONTEXT_OVERFLOW_STOP_REASON, + "message": CONTEXT_OVERFLOW_STOP_MESSAGE, + "error_detail": str(error), + "resolution": CONTEXT_OVERFLOW_RESOLUTION, + } + logger.error("Compiler context overflow in %s: %s", role_id, error) + await self._broadcast("context_overflow_error", payload) async def _main_workflow(self) -> None: """Main compiler workflow loop.""" @@ -832,6 +879,21 @@ async def _main_workflow(self) -> None: await asyncio.sleep(120) # Wait before retrying (all models exhausted) if self.is_running: self._main_task = asyncio.create_task(self._main_workflow()) + except ValueError as e: + is_context_overflow = ( + "mandatory full source context" in str(e).lower() + or _classify_submitter_error(e)[0] == "Context overflow" + ) + if not is_context_overflow: + logger.error(f"Compiler workflow error: {e}", exc_info=True) + self.is_running = False + await self._broadcast("compiler_error", { + "error": "Compiler workflow encountered an internal error", + "mode": 
self.current_mode, + "total_submissions": self.total_submissions + }) + return + await self._handle_context_overflow(e, role_id="compiler_rigor") except Exception as e: logger.error(f"Compiler workflow error: {e}", exc_info=True) self.is_running = False @@ -915,9 +977,18 @@ async def _outline_creation_loop(self) -> None: # Generate outline (creation or refinement) try: - submission = await self.high_context_submitter.submit_outline_create() + submission = await self.writer_submitter.submit_outline_create() except FreeModelExhaustedError: raise + except ValueError as e: + label, _ = _classify_submitter_error(e) + if label == "Context overflow": + await self._handle_context_overflow(e, role_id="compiler_writer", mode="outline_create") + return + logger.error(f"Iteration {iteration}: Outline submission failed with error: {e} - retrying") + await asyncio.sleep(5) + iteration -= 1 + continue except Exception as e: logger.error(f"Iteration {iteration}: Outline submission failed with error: {e} - retrying") await asyncio.sleep(5) @@ -1091,7 +1162,7 @@ async def _initial_paper_loop(self) -> None: rejection_feedback = None # Store rejection feedback for retry while self.is_running and not initial_portion_accepted: - # High-context submitter writes first portion + # Writing submitter writes the first portion. 
submission = None attempt += 1 backoff_time = min(2 ** (attempt - 1), 16) # 1s, 2s, 4s, 8s, 16s max @@ -1110,7 +1181,7 @@ async def _initial_paper_loop(self) -> None: except Exception as exc: logger.debug("Unable to load initial brainstorm context for construction: %s", exc) - submission = await self.high_context_submitter.submit_construction( + submission = await self.writer_submitter.submit_construction( is_first_portion=True, section_phase=section_phase, rejection_feedback=rejection_feedback, @@ -1138,6 +1209,9 @@ async def _initial_paper_loop(self) -> None: raise except (ValueError, OpenRouterInvalidResponseError) as e: label, reason_prefix = _classify_submitter_error(e) + if label == "Context overflow": + await self._handle_context_overflow(e, role_id="compiler_writer", mode="construction") + return logger.error(f"Construction {label.lower()} in initial loop (attempt {attempt}): {e}") await self._broadcast("compiler_rejection", { "mode": "construction", @@ -1275,7 +1349,7 @@ async def _construction_loop(self) -> None: def _track_submission_wolfram_calls(self, submission: CompilerSubmission) -> None: """Record accepted construction-mode Wolfram tool calls in paper credits. - HighContextSubmitter stores the full Wolfram audit trail on + WritingSubmitter stores the full Wolfram audit trail on `submission.metadata["wolfram_calls"]`. PaperModelTracker only tracks a count (and accepts the query for logging), so we bridge the two here after the paper operation has been accepted. 
@@ -1398,7 +1472,7 @@ async def _submit_and_validate_construction(self, rejection_feedback: Optional[s submission = None try: - submission = await self.high_context_submitter.submit_construction( + submission = await self.writer_submitter.submit_construction( is_first_portion=False, section_phase=section_phase, rejection_feedback=rejection_feedback, @@ -1407,6 +1481,9 @@ async def _submit_and_validate_construction(self, rejection_feedback: Optional[s ) except (ValueError, OpenRouterInvalidResponseError) as e: label, reason_prefix = _classify_submitter_error(e) + if label == "Context overflow": + await self._handle_context_overflow(e, role_id="compiler_writer", mode="construction") + return False, CONTEXT_OVERFLOW_STOP_MESSAGE logger.error(f"Construction {label.lower()}: {e}") self.construction_rejections += 1 overflow_reason = f"{reason_prefix}: {e}" @@ -1982,7 +2059,14 @@ async def _submit_and_validate_outline_update(self) -> bool: self.current_mode = "outline_update" try: - submission = await self.high_context_submitter.submit_outline_update() + submission = await self.writer_submitter.submit_outline_update() + except ValueError as e: + label, _ = _classify_submitter_error(e) + if label == "Context overflow": + await self._handle_context_overflow(e, role_id="compiler_writer", mode="outline_update") + return False + logger.error(f"Outline update submission failed with error: {e} - skipping this cycle") + return False except Exception as e: logger.error(f"Outline update submission failed with error: {e} - skipping this cycle") return False @@ -2089,9 +2173,12 @@ async def _submit_and_validate_review(self, review_focus: str = "general") -> bo submission = None try: - submission = await self.high_context_submitter.submit_review(review_focus=review_focus) + submission = await self.writer_submitter.submit_review(review_focus=review_focus) except (ValueError, OpenRouterInvalidResponseError) as e: label, reason_prefix = _classify_submitter_error(e) + if label == "Context 
overflow": + await self._handle_context_overflow(e, role_id="compiler_writer", mode="review") + return False logger.error(f"{review_label.capitalize()} {label.lower()}: {e}") self.review_declines += 1 await compiler_rejection_log.add_decline("review", f"{reason_prefix}: {e}") @@ -2293,6 +2380,12 @@ async def _submit_and_validate_rigor(self) -> bool: self.high_param_submitter.set_source_material_context(source_context, source_label) lean_result = await self.high_param_submitter.submit_rigor_lean_theorem() except ValueError as exc: + if ( + "mandatory full source context" in str(exc).lower() + or _classify_submitter_error(exc)[0] == "Context overflow" + or _is_rigor_model_call_failure(exc) + ): + raise logger.error(f"Rigor lean flow error: {exc}") self.rigor_declines += 1 await compiler_rejection_log.add_decline("rigor", f"LLM error: {exc}") @@ -2301,6 +2394,8 @@ async def _submit_and_validate_rigor(self) -> bool: ) return False except Exception as exc: + if _is_rigor_model_call_failure(exc): + raise logger.error(f"Rigor lean flow raised: {exc}", exc_info=True) self.rigor_declines += 1 await compiler_rejection_log.add_decline( @@ -2860,39 +2955,20 @@ async def _start_critique_phase(self) -> None: logger.info(f"Critique memory initialized for {paper_id}") - # Create critique submitter agent - self.critique_submitter = CritiqueSubmitterAgent( - model=self.critique_submitter_model, - context_window=system_config.compiler_critique_submitter_context_window, - max_tokens=system_config.compiler_critique_submitter_max_tokens, - submitter_id=1 - ) - - # Initialize rejection memory - await self.critique_submitter.initialize() + if not self.high_param_submitter: + logger.error("Cannot start critique phase: Rigor & Proofs submitter is not initialized") + await self._end_critique_phase(self_review_appended=False) + return + + self.critique_submitter = self.high_param_submitter # Clear rejection feedback from previous critique phases (fresh start) - await 
self.critique_submitter.rejection_memory.reset() + await self.critique_submitter.reset_critique_rejection_memory() logger.info("Cleared critique rejection feedback for fresh start") - logger.info(f"Critique submitter created with model: {self.critique_submitter.model}") - - # Set up task tracking callback for workflow panel integration - self.critique_submitter.set_task_tracking_callback(self._handle_task_event) - - # Configure API client manager for critique submitter (OpenRouter/LM Studio routing) - api_client_manager.configure_role( - role_id="compiler_critique_submitter", - config=ModelConfig( - provider=self.critique_submitter_provider, - model_id=self.critique_submitter_model, - openrouter_provider=self.critique_submitter_openrouter_provider, - openrouter_reasoning_effort=self.critique_submitter_openrouter_reasoning_effort, - lm_studio_fallback_id=self.critique_submitter_lm_studio_fallback, - context_window=system_config.compiler_critique_submitter_context_window, - max_output_tokens=system_config.compiler_critique_submitter_max_tokens, - supercharge_enabled=self.critique_submitter_supercharge_enabled - ) + logger.info( + "Critique generation using Rigor & Proofs submitter model: %s", + self.critique_submitter.model_name, ) # Configure API client manager for critique validator (uses same settings as compiler_validator) @@ -2955,8 +3031,8 @@ async def _get_reference_papers_context_for_critique( from backend.autonomous.memory.brainstorm_memory import brainstorm_memory max_input_tokens = rag_config.get_available_input_tokens( - system_config.compiler_critique_submitter_context_window, - system_config.compiler_critique_submitter_max_tokens + system_config.compiler_high_param_context_window, + system_config.compiler_high_param_max_output_tokens ) direct_injected_context = "\n\n".join( @@ -3157,7 +3233,7 @@ async def _run_critique_aggregation(self) -> None: # Store rejection feedback for learning if validation_result and validation_result.summary: - await 
self.critique_submitter.handle_rejection( + await self.critique_submitter.handle_critique_rejection( summary=validation_result.summary, content=submission.content ) @@ -3225,8 +3301,7 @@ async def _validate_critique(self, submission) -> Optional[ValidationResult]: prompt = ''.join(parts) # Generate task ID - task_id = f"critique_val_{self.critique_submitter.task_sequence:03d}" - self.critique_submitter.task_sequence += 1 + task_id = self.critique_submitter.next_critique_task_id("critique_val") # Call validator response = await api_client_manager.generate_completion( @@ -3268,6 +3343,8 @@ async def _validate_critique(self, submission) -> Optional[ValidationResult]: return result except Exception as e: + if _is_rigor_model_call_failure(e): + raise logger.error(f"Error validating critique: {e}", exc_info=True) return None @@ -3303,8 +3380,7 @@ async def _perform_critique_cleanup(self) -> None: prompt = ''.join(parts) # Call validator - task_id = f"critique_cleanup_{self.critique_submitter.task_sequence:03d}" - self.critique_submitter.task_sequence += 1 + task_id = self.critique_submitter.next_critique_task_id("critique_cleanup") response = await api_client_manager.generate_completion( task_id=task_id, diff --git a/backend/compiler/core/compiler_rag_manager.py b/backend/compiler/core/compiler_rag_manager.py index 44f84b2..bad1aae 100644 --- a/backend/compiler/core/compiler_rag_manager.py +++ b/backend/compiler/core/compiler_rag_manager.py @@ -32,12 +32,12 @@ def __init__(self): # role settings that only exist after a start request. 
self.context_window = max( system_config.compiler_validator_context_window, - system_config.compiler_high_context_context_window, + system_config.compiler_writer_context_window, system_config.compiler_high_param_context_window ) self.max_output_tokens = max( system_config.compiler_validator_max_output_tokens, - system_config.compiler_high_context_max_output_tokens, + system_config.compiler_writer_max_output_tokens, system_config.compiler_high_param_max_output_tokens ) self.available_tokens = 0 @@ -88,12 +88,12 @@ async def initialize(self) -> None: # conservative shared RAG budget. max_context_window = max( system_config.compiler_validator_context_window, - system_config.compiler_high_context_context_window, + system_config.compiler_writer_context_window, system_config.compiler_high_param_context_window ) max_output_tokens = max( system_config.compiler_validator_max_output_tokens, - system_config.compiler_high_context_max_output_tokens, + system_config.compiler_writer_max_output_tokens, system_config.compiler_high_param_max_output_tokens ) self.update_context_window(max_context_window, max_output_tokens) diff --git a/backend/compiler/memory/manual_prompt.py b/backend/compiler/memory/manual_prompt.py new file mode 100644 index 0000000..cc5a47e --- /dev/null +++ b/backend/compiler/memory/manual_prompt.py @@ -0,0 +1,55 @@ +"""Durable prompt storage for manual Compiler mode.""" + +import asyncio +import logging +from pathlib import Path + +import aiofiles + +from backend.shared.config import system_config + +logger = logging.getLogger(__name__) + +MANUAL_COMPILER_PROMPT_FILE = "manual_compiler_prompt.txt" + + +def get_manual_compiler_prompt_path() -> Path: + return Path(system_config.data_dir) / MANUAL_COMPILER_PROMPT_FILE + + +async def save_manual_compiler_prompt(prompt: str) -> None: + """Persist the latest manual Compiler prompt until explicit clear.""" + if not (prompt or "").strip(): + logger.warning("Refusing to overwrite manual Compiler prompt with an empty 
value") + return + path = get_manual_compiler_prompt_path() + path.parent.mkdir(parents=True, exist_ok=True) + temp_path = path.with_name(f"{path.name}.tmp") + async with aiofiles.open(temp_path, "w", encoding="utf-8") as handle: + await handle.write(prompt or "") + await asyncio.to_thread(temp_path.replace, path) + + +async def load_manual_compiler_prompt() -> str: + """Load the latest manual Compiler prompt, if one has been persisted.""" + path = get_manual_compiler_prompt_path() + if not path.exists(): + return "" + try: + async with aiofiles.open(path, "r", encoding="utf-8") as handle: + return await handle.read() + except Exception as exc: + logger.debug("Unable to load manual Compiler prompt: %s", exc) + return "" + + +async def clear_manual_compiler_prompt() -> None: + """Clear manual Compiler prompt state after an explicit reset.""" + path = get_manual_compiler_prompt_path() + if path.exists(): + try: + await asyncio.to_thread(path.unlink) + except FileNotFoundError: + return + except Exception as exc: + logger.debug("Unable to clear manual Compiler prompt: %s", exc) diff --git a/backend/compiler/prompts/construction_prompts.py b/backend/compiler/prompts/construction_prompts.py index b5907db..26e31b4 100644 --- a/backend/compiler/prompts/construction_prompts.py +++ b/backend/compiler/prompts/construction_prompts.py @@ -29,11 +29,21 @@ def get_wolfram_tool_guidance() -> str: """Return prompt guidance for the construction-only Wolfram tool. The actual OpenAI-compatible tool schema is registered by - HighContextSubmitter.submit_construction. This prompt section is only shown + WritingSubmitter.submit_construction. This prompt section is only shown when Wolfram is enabled so the model knows the tool exists and when to use it. 
""" - if not system_config.wolfram_alpha_enabled: + if not system_config.wolfram_alpha_enabled or not system_config.wolfram_alpha_api_key: + return "" + try: + from backend.shared.wolfram_alpha_client import get_wolfram_client + except ImportError: + return "" + try: + client_available = get_wolfram_client() is not None + except Exception: + client_available = False + if not client_available: return "" return """WOLFRAM ALPHA TOOL AVAILABLE (CONSTRUCTION MODE ONLY): diff --git a/backend/compiler/prompts/rigor_prompts.py b/backend/compiler/prompts/rigor_prompts.py index d62baee..85d4f53 100644 --- a/backend/compiler/prompts/rigor_prompts.py +++ b/backend/compiler/prompts/rigor_prompts.py @@ -6,12 +6,11 @@ Stage 1 - Theorem discovery (build_rigor_theorem_discovery_prompt): Using the full writing context, the submitter asks itself whether the - paper, outline, support context, or user prompt expose a novelty-first - theorem worth formalizing and proving in Lean 4. Candidate theorems may - verify prompt-critical existing paper claims or extend partial work when - that creates public/citable novelty for the paper construction / user - prompt, not merely a program-local first. Output is a candidate theorem - JSON (or a decline). + paper, outline, support context, or user prompt expose an impactful new + or novel theorem worth formalizing and proving in Lean 4. Candidate + theorems may verify prompt-critical existing paper claims or extend + partial work when that materially advances the user's prompt, not merely + a program-local first. Output is a candidate theorem JSON (or a decline). 
Stage 2 - Placement (build_rigor_placement_prompt): Given a Lean-4-verified theorem, the submitter proposes an inline @@ -25,8 +24,8 @@ Submitter: Shared Training DB -> Local Submitter DB -> Rejection Log -> User Upload Files -The high-param submitter direct-injects the outline and paper when they fit -inside the budget (mirroring HighContextSubmitter.submit_construction), then +The Rigor & Proofs Submitter direct-injects the outline and paper when they fit +inside the budget (mirroring WritingSubmitter.submit_construction), then fills the remaining budget with RAG results that exclude `compiler_outline.txt` and `compiler_paper.txt`. """ @@ -60,7 +59,7 @@ # STAGE 1: THEOREM DISCOVERY # ============================================================================= -_DISCOVERY_SYSTEM_PROMPT = f"""You are the rigor agent for a mathematical-paper compiler. Your job during the rigor loop is to look at the paper-in-progress together with the full research context and decide whether there is a novelty-first theorem worth formalizing and proving in Lean 4 because it directly helps answer, support, or advance the USER RESEARCH PROMPT. Paper-improvement value only counts when it visibly builds toward that prompt. +_DISCOVERY_SYSTEM_PROMPT = f"""You are the rigor agent for a mathematical-paper compiler. Your job during the rigor loop is to look at the paper-in-progress together with the full research context and decide whether there is an impactful new or novel theorem worth formalizing and proving in Lean 4 because it directly helps answer, support, or advance the USER RESEARCH PROMPT. Paper-improvement value only counts when it visibly builds toward that prompt. {INTERNAL_CONTENT_WARNING} @@ -69,33 +68,35 @@ 1. Read the current outline and the current paper text. 2. Read the USER RESEARCH PROMPT and treat it as the relevance boundary for all theorem work. 3. Read the list of theorems that have ALREADY been verified by Lean 4 (EXISTING VERIFIED PROOFS block). -4. 
Read the list of theorems that PREVIOUSLY FAILED Lean 4 verification (OPEN LEMMA TARGETS block, if present). +4. Read the list of theorems that PREVIOUSLY FAILED Lean 4 verification (OPEN PROOF TARGETS block, if present). 5. Decide exactly one of: - (A) `needs_theorem_work=false` - no prompt-relevant novel theorem worth trying right now. Good reasons: all useful novel claims for the user's prompt are already covered by existing verified proofs; the paper is in too early a state; there is no claim a Lean 4 proof could close usefully; or the only available claims are routine, known, or off-topic. - (B) `needs_theorem_work=true` - propose a single prompt-relevant novel candidate theorem to formalize. + (A) `needs_theorem_work=false` - no prompt-relevant new or novel theorem worth trying right now. Good reasons: all useful claims for the user's prompt are already covered by existing verified proofs; the paper is in too early a state; there is no claim a Lean 4 proof could close usefully; or the only available claims are routine, known, or off-topic. + (B) `needs_theorem_work=true` - propose a single prompt-relevant new or novel candidate theorem to formalize. RULES FOR PROPOSING A THEOREM: - This is NOT a known-knowledge-base construction task. Do not propose standard facts just because they are true, useful, formalizable, or prompt-adjacent. - The theorem must directly help answer, support, or advance the USER RESEARCH PROMPT. Do not propose a theorem merely because it is non-trivial or mathematically interesting. - The theorem must be provable in Lean 4 with Mathlib. - You MUST NOT re-propose a theorem that is already in EXISTING VERIFIED PROOFS. Look for theorems that are DIFFERENT - new results, missed lemmas, or sharper versions that are not yet on the list. -- You MAY retry a theorem from OPEN LEMMA TARGETS when it is still prompt-relevant, novelty-bearing, and the paper now gives you a better angle on it. 
When you do, set `retry_existing_failure_id` to the failed `theorem_id`. +- You MAY retry a theorem from OPEN PROOF TARGETS when it is still prompt-relevant, novelty-bearing, high-impact, and the paper now gives you a better angle on the same target. When you do, set `retry_existing_failure_id` to the failed `theorem_id`. - EXTENSION IS EXPLICITLY ALLOWED AND ENCOURAGED WHERE HELPFUL: you are NOT limited to exact claims already present in the current paper. You may construct a Lean-verifiable theorem by extending partial paper work, the current outline, supporting context, or the USER RESEARCH PROMPT only when that theorem would directly solve or build toward solving the user's requested goal. -- NOVELTY PRIORITY ORDER: prefer `major_mathematical_discovery`, then `mathematical_discovery`, then `novel_variant`, then prompt-critical `novel_formulation` whose exact formulation/formalization is absent from standard references or Mathlib and independently publishable/citable. Supporting lemmas are allowed only when they are necessary stepping stones toward one of those novel targets. +- Seek the most impactful proof target possible for the USER RESEARCH PROMPT: direct solutions, impossibility results, decisive reductions, new obstructions, or structural theorems that materially advance the requested problem. +- Supporting lemmas, routine helper lemmas, local facts, and trivial/easy proofs are NEVER valid proof targets, even as a fallback or last resort. +- Do not settle for a minor reformulation, local formalization, or easy-to-prove fact when a more consequential prompt-solving theorem is available. - Reject routine helper lemmas, proof-engineering glue, local bookkeeping facts, coercion facts, algebra cleanup, definitional rewrites, standard Mathlib/textbook restatements, or single-tactic/routine proof goals. - Set `theorem_origin="existing_paper_claim"` only when the theorem directly formalizes a claim already present in the current paper text. 
- Set `theorem_origin="extension_from_partial_work"` when the theorem is constructed by extending the current paper, outline, or supporting context beyond the exact written claim. - Set `theorem_origin="extension_from_user_prompt"` when the theorem is prompted primarily by the USER RESEARCH PROMPT and helps the paper even if the current paper has not yet written the claim. - Extension-derived theorems (`extension_from_partial_work` or `extension_from_user_prompt`) MUST set `placement_preference="appendix_only"`. These proofs belong at the end of the paper in the Theorems Appendix, not inline in the main body. - Existing-paper-claim theorems may set `placement_preference="inline"` when a local body insertion would strengthen the existing argument, or `placement_preference="appendix_only"` when the proof is useful but would distract from the prose. -- Prefer theorems whose statements are tight enough that Lean 4 can actually close them (arithmetic facts, concrete inequalities, specific algebraic identities, small group/ring/field lemmas, concrete combinatorial identities) over large open conjectures. +- State ambitious targets tightly enough that Lean 4 can attack them, but do not downshift to supporting lemmas, local facts, or unrelated easy facts. If the target cannot be attacked without becoming trivial, decline instead. - The `theorem_statement` is for a human reader. It should be precise, self-contained, and include the hypotheses. - The `formal_sketch` tells the formalization agent what tactics or lemmas look promising in Lean 4 / Mathlib and why this theorem helps the user's prompt. Keep it concrete. - The `source_excerpt` is 2-6 sentences of motivating context. For `existing_paper_claim`, it must be a direct paraphrase or quote from the current paper. For extension-derived theorems, it may explain the partial paper work, outline item, supporting evidence, and/or user-prompt need that the theorem extends. 
- Set `expected_novelty_tier` to one of: "major_mathematical_discovery", "mathematical_discovery", "novel_variant", "novel_formulation". If the best honest tier is "not_novel", decline. - Include `prompt_relevance_rationale`, `novelty_rationale`, and `why_not_standard_known_result`. If you cannot explain why the target is not merely standard known mathematics, decline. -If Stage 1 guesses wrong, Stage 2 cannot recover - 5 Lean 4 attempts will be spent on the wrong target. Prefer declining over a weak or off-prompt proposal. +If Stage 1 selects a target, Stage 2 receives up to 5 Lean 4 attempts with error feedback. Use that retry budget for ambitious but plausibly Lean-addressable prompt-solving targets, not meager routine progress. Prefer declining over a weak, easy, or off-prompt proposal. Output your response ONLY as JSON in this exact format: {{{{ @@ -109,7 +110,7 @@ "prompt_relevance_rationale": "why proving this directly solves, solves toward, or materially helps solve the user prompt (empty if needs_theorem_work=false)", "novelty_rationale": "why this is absent from standard references or Mathlib and public/citable rather than known background or program-local novelty (empty if needs_theorem_work=false)", "why_not_standard_known_result": "why this is not merely textbook/Mathlib/routine helper knowledge (empty if needs_theorem_work=false)", - "retry_existing_failure_id": "theorem_id from OPEN LEMMA TARGETS if retrying a prior failure, empty string otherwise", + "retry_existing_failure_id": "theorem_id from OPEN PROOF TARGETS if retrying a prior failure, empty string otherwise", "reasoning": "why this theorem is the best prompt-relevant target right now OR why no theorem should be attempted" }}}}""" @@ -138,9 +139,9 @@ "source_excerpt": "In Section 2 we introduce a refinement operator on admissible witness families and claim that it preserves the compression invariant used by the main construction...", "theorem_origin": "existing_paper_claim", 
"placement_preference": "inline", - "expected_novelty_tier": "novel_formulation", + "expected_novelty_tier": "mathematical_discovery", "prompt_relevance_rationale": "The paper uses this invariant-preservation fact as a required local step for the user's requested argument.", - "novelty_rationale": "The theorem packages a paper-specific invariant/refinement interaction that is not a standard textbook or Mathlib theorem and would be citable as part of the paper's formal contribution.", + "novelty_rationale": "The theorem establishes a new invariant-preservation result for the paper's prompt-specific construction rather than merely formalizing a known fact.", "why_not_standard_known_result": "The statement depends on paper-defined objects and an original invariant, rather than restating a standard library lemma or routine arithmetic fact.", "retry_existing_failure_id": "", "reasoning": "Section 2 relies on this invariant-preservation claim to support the user's requested argument but currently presents it without a verified proof. The statement is narrow enough for Lean 4 while still capturing a non-standard formal contribution." @@ -154,9 +155,9 @@ "source_excerpt": "The outline proposes a pruning operator for saturated obstruction graphs but has not yet isolated the monotone descent claim needed to justify termination. 
This theorem extends the partial plan into a Lean-checkable appendix result.", "theorem_origin": "extension_from_partial_work", "placement_preference": "appendix_only", - "expected_novelty_tier": "novel_variant", + "expected_novelty_tier": "mathematical_discovery", "prompt_relevance_rationale": "This theorem would justify the paper's prompt-specific descent route and make the proposed obstruction-elimination argument formally checkable.", - "novelty_rationale": "The statement combines paper-defined saturation, pruning, and score objects into a new descent theorem rather than restating a standard library fact.", + "novelty_rationale": "The statement proves a new descent theorem for the paper's obstruction-elimination route rather than restating a standard library fact.", "why_not_standard_known_result": "The target depends on paper-specific definitions and an original descent construction; decline if the available target reduces to a textbook graph lemma or routine Mathlib fact.", "retry_existing_failure_id": "", "reasoning": "This is not an exact written claim in the current paper; it extends the partial outline into a useful verified theorem. Because it is extension-derived, it should be stored in the Theorems Appendix rather than inserted inline." @@ -308,12 +309,12 @@ def _format_recent_failure_hints(hints: Iterable) -> str: if error_summary: line += f"\n last Lean 4 failure: {error_summary[:240]}" if targets: - line += f"\n suggested targets: {', '.join(targets[:6])}" + line += f"\n Lean blocker clues: {', '.join(targets[:6])}" entries.append(line) if not entries: return "" return ( - "OPEN LEMMA TARGETS LEAN 4 COULD NOT YET CLOSE (optional retry candidates):\n" + "OPEN PROOF TARGETS LEAN 4 COULD NOT YET CLOSE (optional retry candidates for the same high-impact target):\n" + "\n".join(entries) ) @@ -433,7 +434,7 @@ async def build_rigor_placement_prompt( user_prompt: User's compiler-directing prompt. current_outline: Full outline (direct-injected). 
current_paper: Current paper content (direct-injected or RAG'd by the - caller per the high-context submitter budget rules). + caller per the writing submitter budget rules). rag_evidence: Optional RAG-retrieved supporting context. theorem_statement: Human-readable statement of the verified theorem. lean_code: Full Lean 4 source that compiled. Included so the model diff --git a/backend/leanoj/core/leanoj_coordinator.py b/backend/leanoj/core/leanoj_coordinator.py index 299189d..f1157cf 100644 --- a/backend/leanoj/core/leanoj_coordinator.py +++ b/backend/leanoj/core/leanoj_coordinator.py @@ -49,6 +49,11 @@ from backend.shared.api_client_manager import api_client_manager from backend.shared.brainstorm_proof_gate import is_lean_proof_submission, verify_brainstorm_proof_candidate from backend.shared.config import rag_config, system_config +from backend.shared.context_overflow import ( + CONTEXT_OVERFLOW_RESOLUTION, + CONTEXT_OVERFLOW_STOP_MESSAGE, + CONTEXT_OVERFLOW_STOP_REASON, +) from backend.shared.json_parser import parse_json from backend.shared.response_extraction import extract_message_text from backend.shared.lean4_client import Lean4Result, get_lean4_client @@ -70,6 +75,8 @@ mark_provider_paused, wait_for_provider_resume, ) +from backend.shared.proof_search.assistant_coordinator import assistant_proof_search_coordinator +from backend.shared.proof_search.assistant_models import AssistantTargetSnapshot from backend.shared.token_tracker import token_tracker from backend.shared.utils import count_tokens @@ -150,6 +157,7 @@ "master proof route", "next obligation", ) +_LEANOJ_PROOF_SEARCH_MAX_TOKENS = 3500 class LeanOJConfigurationError(RuntimeError): @@ -291,6 +299,9 @@ def __init__(self) -> None: self._restored_from_disk = False self._master_proof_no_progress_count = 0 self._last_master_proof_edit_signature = "" + self._pending_final_solver_assistant_target_hash = "" + self._fatal_stop_reason: Optional[str] = None + self._fatal_stop_message: str = "" @property def 
is_running(self) -> bool: @@ -307,6 +318,34 @@ async def _broadcast(self, event: str, data: Optional[dict[str, Any]] = None) -> if self._broadcast_callback: await self._broadcast_callback(event, data or {}) + @staticmethod + def _is_context_overflow_exception(exc: BaseException) -> bool: + message = str(exc or "").lower() + return ( + "context overflow" in message + or "prompt context overflow" in message + or "exceeds the configured input budget" in message + or "mandatory direct-inject" in message + ) + + async def _handle_context_overflow_stop(self, exc: BaseException, *, role_id: str = "") -> None: + self._fatal_stop_reason = CONTEXT_OVERFLOW_STOP_REASON + self._fatal_stop_message = CONTEXT_OVERFLOW_STOP_MESSAGE + self._state.phase = "stopped" + self._state.last_error = str(exc) + await self._persist_and_broadcast( + "context_overflow_error", + { + "workflow_mode": "leanoj", + "role_id": role_id, + "phase": self._state.phase, + "reason": CONTEXT_OVERFLOW_STOP_REASON, + "message": CONTEXT_OVERFLOW_STOP_MESSAGE, + "error_detail": str(exc), + "resolution": CONTEXT_OVERFLOW_RESOLUTION, + }, + ) + def get_state(self) -> LeanOJState: return self._state @@ -520,6 +559,8 @@ async def start(self) -> None: self._running = True self._state.is_running = True + self._fatal_stop_reason = None + self._fatal_stop_message = "" if self._state.phase == "idle": self._state.phase = "initial_topic_candidates" elif self._state.phase in {"stopped", "error"}: @@ -553,11 +594,24 @@ async def start(self) -> None: await self._run_workflow(self._request) except asyncio.CancelledError: raise + except LeanOJConfigurationError as exc: + if self._is_context_overflow_exception(exc): + logger.error("LeanOJ stopped for context overflow: %s", exc) + await self._handle_context_overflow_stop(exc) + else: + logger.exception("LeanOJ workflow failed") + self._state.phase = "error" + self._state.last_error = str(exc) + await self._persist_and_broadcast("leanoj_error", {"message": str(exc)}) except 
Exception as exc: - logger.exception("LeanOJ workflow failed") - self._state.phase = "error" - self._state.last_error = str(exc) - await self._persist_and_broadcast("leanoj_error", {"message": str(exc)}) + if self._is_context_overflow_exception(exc): + logger.error("LeanOJ stopped for context overflow: %s", exc) + await self._handle_context_overflow_stop(exc) + else: + logger.exception("LeanOJ workflow failed") + self._state.phase = "error" + self._state.last_error = str(exc) + await self._persist_and_broadcast("leanoj_error", {"message": str(exc)}) finally: self._running = False self._state.is_running = False @@ -566,7 +620,14 @@ async def start(self) -> None: self._state.updated_at = datetime.now() token_tracker.stop_timer() api_client_manager.set_autonomous_logger_callback(None) - await self._persist_and_broadcast("leanoj_stopped") + stopped_payload = None + if self._fatal_stop_reason: + stopped_payload = { + **self.get_status(), + "reason": self._fatal_stop_reason, + "message": self._fatal_stop_message or CONTEXT_OVERFLOW_STOP_MESSAGE, + } + await self._persist_and_broadcast("leanoj_stopped", stopped_payload) async def stop(self) -> None: if not self.is_active and not self._state.session_id: @@ -619,6 +680,8 @@ async def clear(self) -> None: self._restored_from_disk = False self._master_proof_no_progress_count = 0 self._last_master_proof_edit_signature = "" + self._fatal_stop_reason = None + self._fatal_stop_message = "" await self._broadcast("leanoj_cleared", self.get_status()) async def skip_brainstorm(self) -> None: @@ -1017,7 +1080,10 @@ async def _topic_submitter_loop( ) except asyncio.CancelledError: raise - except LeanOJConfigurationError: + except LeanOJConfigurationError as exc: + if self._is_context_overflow_exception(exc): + await self._handle_context_overflow_stop(exc, role_id=role_id) + self._stop_event.set() raise except Exception as exc: logger.warning("LeanOJ topic submitter %s failed: %s", submitter_index, exc) @@ -1652,7 +1718,10 @@ async def 
_brainstorm_submitter_loop( ) except asyncio.CancelledError: raise - except LeanOJConfigurationError: + except LeanOJConfigurationError as exc: + if self._is_context_overflow_exception(exc): + await self._handle_context_overflow_stop(exc, role_id=role_id) + self._stop_event.set() raise except Exception as exc: logger.warning("LeanOJ brainstorm submitter %s failed: %s", submitter_index, exc) @@ -2701,7 +2770,86 @@ async def _build_context_blocks( current_working_proof_attempt=working_proof_attempt, capped_rejection_feedback=capped_rejection_feedback, ) - return allocation.as_prompt_blocks() + context_blocks = allocation.as_prompt_blocks() + if resolved_scope == "final_solver": + proof_search_context = await self._build_final_solver_proof_search_context( + request=request, + task_request=task_request, + ) + if proof_search_context: + context_blocks["proof_search_context"] = proof_search_context + return context_blocks + + async def _build_final_solver_proof_search_context( + self, + *, + request: LeanOJStartRequest, + task_request: str, + ) -> str: + """Return compact optional proof-search context for the LeanOJ final solver.""" + self._pending_final_solver_assistant_target_hash = "" + master_proof = await self._read_master_proof() + snapshot = AssistantTargetSnapshot( + workflow_mode="leanoj", + target_kind="final_solver", + user_prompt=request.user_prompt, + target_statement=request.lean_template, + lean_template=request.lean_template, + formal_sketch=master_proof[-4000:], + rejection_feedback="\n".join( + str(item.get("feedback") or item.get("reasoning") or item) + for item in self._final_solver_failure_window()[-5:] + ), + accepted_solver_summary="\n".join(self._final_solver_active_plan_items()[-8:]), + source_title="LeanOJ final proof solver", + source_type="leanoj", + source_id=self._state.session_id, + imports=["Mathlib"], + ) + target_hash = assistant_proof_search_coordinator.submit_target(snapshot) + pack = 
assistant_proof_search_coordinator.get_latest_pack(target_hash) + if not pack: + return "" + formatted = pack.to_prompt_context(max_code_chars_per_result=0) + if count_tokens(formatted) > _LEANOJ_PROOF_SEARCH_MAX_TOKENS: + return "[Assistant proof-support results omitted because they exceeded the final-solver optional context budget.]" + if pack.results: + self._pending_final_solver_assistant_target_hash = target_hash + return formatted + + @staticmethod + def _format_final_solver_proof_search_results(records: list[dict[str, Any]]) -> str: + lines = [ + "[Optional retrieved proof context. Use only as verified-helper/proof-pattern guidance for the current LeanOJ template. Do not paste unrelated proofs or build a known-knowledge library.]", + ] + for index, record in enumerate(records[:7], start=1): + source = " ".join( + str(part).strip() + for part in [ + record.get("corpus"), + record.get("corpus_scope") or record.get("release_id"), + ] + if str(part or "").strip() + ) + lines.extend( + [ + "", + f"Result {index}", + f"Source: {source or '[unknown]'}", + f"Source kind: {record.get('source_kind') or '[unknown]'}", + f"Proof ID: {record.get('proof_id') or '[none]'}", + f"Fingerprint: {record.get('fingerprint') or '[none]'}", + f"Theorem: {record.get('theorem_name') or record.get('display_title') or '[unnamed]'}", + f"Statement: {record.get('theorem_statement') or '[missing]'}", + f"Description: {record.get('proof_description') or record.get('formal_sketch') or '[none]'}", + f"Imports: {', '.join(record.get('imports') or []) or '[none]'}", + f"Dependencies: {', '.join(record.get('dependency_names') or []) or '[none]'}", + f"Theorem statement hash: {record.get('theorem_statement_hash') or '[none]'}", + f"Lean code hash: {record.get('lean_code_hash') or '[none]'}", + f"Canonical URI: {record.get('canonical_uri') or '[none]'}", + ] + ) + return "\n".join(lines) def _infer_context_scope(self, mode: str) -> str: if mode == "final_solver": @@ -4689,9 +4837,15 @@ async def 
_call_json( config.context_window, config.max_output_tokens, ) + await api_client_manager.prewarm_assistant_memory_context( + task_id=task_id, + role_id=role_id, + prompt=current_prompt, + ) call_payload["prompt_tokens"] = prompt_tokens call_payload["max_input_tokens"] = max_input_tokens if prompt_tokens > max_input_tokens: + self._pending_final_solver_assistant_target_hash = "" raise LeanOJConfigurationError( "PROOF SOLVER PROMPT CONTEXT OVERFLOW: assembled prompt exceeds the configured " f"input budget for role {role_id}. Prompt tokens: {prompt_tokens}. " @@ -4706,6 +4860,13 @@ async def _call_json( max_tokens=config.max_output_tokens, temperature=temperature, ) + if self._pending_final_solver_assistant_target_hash: + assistant_proof_search_coordinator.mark_pack_consumed_by_solver( + self._pending_final_solver_assistant_target_hash, + role_id=role_id, + task_id=task_id, + ) + self._pending_final_solver_assistant_target_hash = "" self.completed_task_ids.add(task_id) choices = response.get("choices") or [] @@ -4868,6 +5029,8 @@ def _missing_model_roles(request: LeanOJStartRequest) -> list[str]: ("brainstorm_validator", request.brainstorm_validator), ("final_solver", request.final_solver), ] + if (request.assistant.model_id or "").strip(): + role_configs.append(("assistant", request.assistant)) role_configs.extend( (f"brainstorm_submitter_{index}", submitter) for index, submitter in enumerate(request.brainstorm_submitters, start=1) @@ -4918,6 +5081,7 @@ def _refresh_workflow_tasks(self, active_prefix: str = "leanoj_topic", active_ro self.workflow_tasks = tasks def _configure_roles(self, request: LeanOJStartRequest) -> None: + assistant_config = request.assistant if (request.assistant.model_id or "").strip() else request.topic_validator self._configure_role("leanoj_topic_generator", request.topic_generator) self._configure_role("leanoj_topic_selector", request.topic_generator) self._configure_role("leanoj_topic_validator", request.topic_validator) @@ -4926,6 +5090,7 
@@ def _configure_roles(self, request: LeanOJStartRequest) -> None: self._configure_role("leanoj_brainstorm_validator", request.brainstorm_validator) self._configure_role("leanoj_master_proof_edit_validator", request.brainstorm_validator) self._configure_role("leanoj_final_solver", request.final_solver) + self._configure_role("leanoj_assistant", assistant_config) for index, submitter in enumerate(request.brainstorm_submitters, start=1): self._configure_role(f"leanoj_topic_submitter_{index}", submitter) self._configure_role(f"leanoj_brainstorm_submitter_{index}", submitter) diff --git a/backend/leanoj/prompts.py b/backend/leanoj/prompts.py index 765a9a2..47afaf2 100644 --- a/backend/leanoj/prompts.py +++ b/backend/leanoj/prompts.py @@ -268,6 +268,7 @@ def _format_context_blocks(context_blocks: dict[str, str] | None, fallback: str) current_packet = (context_blocks.get("current_final_cycle_packet") or "").strip() direct_context = (context_blocks.get("direct_proof_context") or "").strip() rag_context = (context_blocks.get("rag_evidence_context") or "").strip() + proof_search_context = (context_blocks.get("proof_search_context") or "").strip() refuted_warnings = (context_blocks.get("refuted_construction_warnings") or "").strip() capped_feedback = (context_blocks.get("capped_rejection_feedback") or "").strip() if working_proof: @@ -278,6 +279,11 @@ def _format_context_blocks(context_blocks: dict[str, str] | None, fallback: str) sections.append(f"DIRECT PROOF CONTEXT:\n{direct_context}") if rag_context: sections.append(f"RETRIEVED LEANOJ RAG EVIDENCE:\n{rag_context}") + if proof_search_context: + sections.append( + "SYNTHETIC / LOCAL VERIFIED PROOF SEARCH RESULTS:\n" + f"{proof_search_context}" + ) if refuted_warnings: sections.append( "REFUTED CONSTRUCTIONS - DO NOT USE AS PROOF EVIDENCE:\n" diff --git a/backend/shared/api_client_manager.py b/backend/shared/api_client_manager.py index ff52d17..5251c0d 100644 --- a/backend/shared/api_client_manager.py +++ 
b/backend/shared/api_client_manager.py @@ -11,6 +11,7 @@ import asyncio import json import logging +import re import time from typing import Dict, Any, List, Optional, Callable @@ -22,7 +23,8 @@ RateLimitError, FreeModelExhaustedError ) -from backend.shared.openai_codex_client import OpenAICodexError, openai_codex_client +from backend.shared.openai_codex_client import OpenAICodexError, OAuthUsageLimitError, openai_codex_client +from backend.shared.sakana_fugu_client import SakanaFuguError, sakana_fugu_client from backend.shared.xai_grok_client import XAIGrokError, xai_grok_client from backend.shared.boost_manager import boost_manager from backend.shared.boost_logger import boost_logger @@ -32,12 +34,107 @@ from backend.shared.json_parser import sanitize_model_output_for_retry_context from backend.shared.log_redaction import redact_log_text from backend.shared.models import ModelConfig +from backend.shared.provider_notification_store import record_provider_notification +from backend.shared.proof_search.assistant_coordinator import assistant_proof_search_coordinator +from backend.shared.proof_search.assistant_models import AssistantTargetSnapshot from backend.shared.response_extraction import extract_response_text from backend.shared.token_tracker import token_tracker +from backend.shared.utils import count_tokens logger = logging.getLogger(__name__) +OAUTH_LIVE_ERROR_MAX_CHARS = 1800 +_TRUNCATION_SUFFIX = "..." 
+ + +def _cap_oauth_live_error_text(value: Any, max_chars: int = OAUTH_LIVE_ERROR_MAX_CHARS) -> str: + """Return a redacted one-line provider error that is at most max_chars long.""" + text = redact_log_text(value).strip() + if len(text) <= max_chars: + return text + if max_chars <= len(_TRUNCATION_SUFFIX): + return text[:max_chars] + return text[: max_chars - len(_TRUNCATION_SUFFIX)] + _TRUNCATION_SUFFIX + + +def _extract_error_message_from_json(value: Any) -> tuple[Optional[str], Optional[str]]: + """Extract (code, message) from common provider error JSON shapes.""" + if not isinstance(value, dict): + return None, None + + code = value.get("code") + message = value.get("message") + if isinstance(message, str) and message.strip(): + return (str(code).strip() if code is not None else None), message.strip() + + for key in ("error", "response"): + nested = value.get(key) + if isinstance(nested, dict): + nested_code, nested_message = _extract_error_message_from_json(nested) + if nested_message: + return nested_code or (str(code).strip() if code is not None else None), nested_message + + return (str(code).strip() if code is not None else None), None + + +def oauth_live_activity_error_message(error: Exception) -> str: + """Best-effort visible OAuth provider error summary for live activity.""" + raw = str(error) + start = raw.find("{") + end = raw.rfind("}") + if 0 <= start < end: + try: + parsed = json.loads(raw[start : end + 1]) + except json.JSONDecodeError: + parsed = None + code, message = _extract_error_message_from_json(parsed) + if message: + detail = f"{code}: {message}" if code else message + return _cap_oauth_live_error_text(detail) + + for prefix in ( + "OpenAI Codex completion failed:", + "OpenAI Codex failed:", + "xAI Grok completion failed:", + "xAI Grok failed:", + "Sakana Fugu completion failed:", + "Sakana Fugu failed:", + ): + if raw.startswith(prefix): + raw = raw[len(prefix):].strip() + break + return _cap_oauth_live_error_text(raw) + + +class 
OAuthProviderCooldownError(RuntimeError): + """Raised when an OAuth provider is cooling down until a provider reset time.""" + + def __init__( + self, + *, + provider: str, + provider_label: str, + role_id: str, + model: str, + resets_at: Optional[int], + resets_in_seconds: Optional[int], + plan_type: str = "", + message: str = "", + ) -> None: + self.provider = provider + self.provider_label = provider_label + self.role_id = role_id + self.model = model + self.resets_at = resets_at + self.resets_in_seconds = resets_in_seconds + self.plan_type = plan_type + base = message or f"{provider_label} usage limit reached" + if resets_in_seconds is not None: + base = f"{base}; resets in {resets_in_seconds} seconds" + super().__init__(base) + + def _response_shape_for_logging(response: Any) -> str: """Summarize an upstream response shape without logging provider/model text.""" if isinstance(response, dict): @@ -68,6 +165,54 @@ class APIClientManager: 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, ) + ASSISTANT_MEMORY_MAX_CODE_CHARS = 1200 + ASSISTANT_MEMORY_MAX_TARGET_CHARS = 8000 + ASSISTANT_MEMORY_SUMMARY_CHARS = 1800 + ASSISTANT_MEMORY_SECTION_BOUNDARIES = ( + "USER PROMPT", + "USER'S RESEARCH PROMPT", + "USER RESEARCH PROMPT", + "ORIGINAL USER PROMPT", + "RESEARCH PROMPT", + "RESEARCH GOAL", + "USER GOAL", + "HIGH-LEVEL RESEARCH PROMPT", + "CURRENT BRAINSTORM TOPIC", + "BRAINSTORM TOPIC", + "TOPIC PROMPT", + "CURRENT TOPIC", + "LEANOJ PROBLEM", + "PROBLEM", + "YOUR TASK", + "WRITING GOAL", + "CURRENT PHASE", + "PAPER TITLE", + "THEOREM CANDIDATE", + "TARGET THEOREM", + "CURRENT OUTLINE", + "OUTLINE", + "VOLUME ORGANIZATION", + "CURRENT DOCUMENT PROGRESS", + "CURRENT PAPER", + "MASTER PROOF", + "LEAN TEMPLATE", + "CURRENT PROOF DRAFT", + "SOURCE CONTENT", + "CURRENT ACCEPTED SUBMISSIONS DATABASE", + "ACCEPTED SUBMISSIONS", + "BRAINSTORM SUMMARY", + "VERIFIED PROOF SUMMARIES", + "DIRECT PROOF CONTEXT", + "SHARED TRAINING", + "LOCAL TRAINING", + "REJECTION LOG", + 
"RETRIEVED EVIDENCE", + "REJECTION FEEDBACK", + "RECENT REJECTIONS", + "FAILED ATTEMPTS", + "LEAN ERRORS", + "EXECUTION FEEDBACK", + ) def __init__(self): self._openrouter_client: Optional[OpenRouterClient] = None @@ -101,6 +246,12 @@ def __init__(self): # Track roles that have already broadcast fallback_failed (prevent GUI log spam) self._fallback_failed_notified: set = set() + + # Track OAuth provider usage-limit cooldowns reported by the provider. + # Keys are provider identifiers such as "openai_codex_oauth". + self._oauth_provider_cooldowns: Dict[str, Dict[str, Any]] = {} + self._oauth_cooldown_notified: set[str] = set() + self._oauth_cooldown_fallback_roles: set[str] = set() # Lock for thread-safe state updates self._state_lock = asyncio.Lock() @@ -133,19 +284,56 @@ async def _broadcast_unrecoverable_codex_error( error: Exception, ) -> None: """Notify the UI when a Codex role cannot recover through fallback.""" - await self._broadcast("openai_codex_oauth_error", { + payload = { "role_id": role_id, "model": model, "provider": "openai_codex_oauth", + "provider_label": "OpenAI Codex", "reason": "unrecoverable_codex_error", "recoverable": False, "message": ( "OpenAI Codex failed and no LM Studio fallback is configured. " - "Please check your OpenAI Codex OAuth connection in Cloud Access & Keys, " + "Please check your OpenAI Codex OAuth connection in OpenRouter/OAuth, " "sign in again, and retry." 
), "error_summary": redact_log_text(str(error), 700), - }) + "oauth_error_message": oauth_live_activity_error_message(error), + } + stored_payload = await asyncio.to_thread( + record_provider_notification, + "openai_codex_oauth_error", + payload, + ) + await self._broadcast("openai_codex_oauth_error", stored_payload) + + async def _broadcast_unrecoverable_sakana_fugu_error( + self, + *, + role_id: str, + model: str, + error: Exception, + ) -> None: + """Notify the UI when a Sakana Fugu role cannot recover through fallback.""" + payload = { + "role_id": role_id, + "model": model, + "provider": "sakana_fugu", + "provider_label": "Sakana Fugu", + "reason": "unrecoverable_sakana_fugu_error", + "recoverable": False, + "message": ( + "Sakana Fugu failed and no LM Studio fallback is configured. " + "Please check your Sakana Fugu API key in OpenRouter/OAuth and retry." + ), + "error_summary": redact_log_text(str(error), 700), + "oauth_error_message": oauth_live_activity_error_message(error), + } + stored_payload = await asyncio.to_thread( + record_provider_notification, + "sakana_fugu_error", + payload, + ) + await self._broadcast("sakana_fugu_error", stored_payload) async def _broadcast_unrecoverable_xai_grok_error( self, @@ -155,7 +343,7 @@ async def _broadcast_unrecoverable_xai_grok_error( error: Exception, ) -> None: """Notify the UI when a Grok OAuth role cannot recover through fallback.""" - await self._broadcast("oauth_provider_error", { + payload = { "role_id": role_id, "model": model, "provider": "xai_grok_oauth", @@ -164,12 +352,146 @@ async def _broadcast_unrecoverable_xai_grok_error( "recoverable": False, "message": ( "xAI Grok failed and no LM Studio fallback is configured. " - "Please check your xAI Grok OAuth connection in Cloud Access & Keys, " + "Please check your xAI Grok OAuth connection in OpenRouter/OAuth, " "sign in again, and retry. If xAI reports subscription or credit limits, " "check your SuperGrok/X Premium entitlement." 
), "error_summary": redact_log_text(str(error), 700), - }) + "oauth_error_message": oauth_live_activity_error_message(error), + } + stored_payload = await asyncio.to_thread( + record_provider_notification, + "oauth_provider_error", + payload, + ) + await self._broadcast("oauth_provider_error", stored_payload) + + @staticmethod + def _cooldown_until_from_error(error: OAuthUsageLimitError) -> int: + if error.resets_at: + return int(error.resets_at) + if error.resets_in_seconds: + return int(time.time()) + int(error.resets_in_seconds) + return int(time.time()) + 3600 + + def get_provider_cooldown(self, provider: str) -> Optional[Dict[str, Any]]: + """Return active cooldown metadata for a provider, clearing expired entries.""" + provider_key = str(provider or "").strip() + if not provider_key: + return None + cooldown = self._oauth_provider_cooldowns.get(provider_key) + if not cooldown: + return None + cooldown_until = int(cooldown.get("cooldown_until") or cooldown.get("resets_at") or 0) + if cooldown_until and cooldown_until <= int(time.time()): + self._oauth_provider_cooldowns.pop(provider_key, None) + self._oauth_cooldown_notified = { + key for key in self._oauth_cooldown_notified if not key.startswith(f"{provider_key}:") + } + return None + resets_in_seconds = max(1, cooldown_until - int(time.time())) if cooldown_until else None + return {**cooldown, "resets_in_seconds": resets_in_seconds} + + def is_provider_cooling_down(self, provider: str) -> bool: + return self.get_provider_cooldown(provider) is not None + + async def wait_for_oauth_provider_cooldown( + self, + error: OAuthProviderCooldownError, + *, + role_id: str = "", + ) -> None: + """Sleep until an OAuth provider usage-limit cooldown expires.""" + provider = str(error.provider or "").strip() or "openai_codex_oauth" + active_role = role_id or error.role_id or "unknown" + while self.is_provider_cooling_down(provider): + cooldown = self.get_provider_cooldown(provider) or {} + wait_seconds = 
cooldown.get("resets_in_seconds") + if wait_seconds is None and error.resets_in_seconds is not None: + wait_seconds = error.resets_in_seconds + if wait_seconds is None and error.resets_at: + wait_seconds = max(1, int(error.resets_at) - int(time.time())) + wait_seconds = max(1, min(int(wait_seconds or 60), 300)) + logger.warning( + "%s usage-limit cooldown active for role '%s'; waiting %s seconds before retry", + error.provider_label or provider, + active_role, + wait_seconds, + ) + await asyncio.sleep(wait_seconds) + + def _mark_oauth_provider_cooldown( + self, + error: OAuthUsageLimitError, + *, + role_id: str, + model: str, + ) -> Dict[str, Any]: + cooldown_until = self._cooldown_until_from_error(error) + resets_in_seconds = max(1, cooldown_until - int(time.time())) + payload = { + "provider": error.provider, + "provider_label": error.provider_label, + "role_id": role_id, + "model": model, + "reason": "usage_limit_reached", + "recoverable": True, + "plan_type": error.plan_type, + "resets_at": cooldown_until, + "cooldown_until": cooldown_until, + "resets_in_seconds": resets_in_seconds, + "message": ( + f"{error.provider_label} usage limit reached for {role_id}. " + f"Provider reports reset in {resets_in_seconds} seconds." + ), + "error_summary": redact_log_text(str(error), 700), + "oauth_error_message": oauth_live_activity_error_message(error), + } + self._oauth_provider_cooldowns[error.provider] = payload + return payload + + async def _broadcast_oauth_usage_limit( + self, + payload: Dict[str, Any], + *, + fallback_model: str = "", + ) -> None: + notify_payload = dict(payload) + provider_label = notify_payload.get("provider_label", "OAuth provider") + role_id = notify_payload.get("role_id", "a role") + resets_in_seconds = notify_payload.get("resets_in_seconds") + if fallback_model: + notify_payload["fallback_model"] = fallback_model + notify_payload["message"] = ( + f"{provider_label} usage limit reached for " + f"{role_id}. 
Using LM Studio fallback model {fallback_model} " + f"until the provider reset." + ) + elif notify_payload.get("reason") == "usage_limit_reached": + reset_text = ( + f" Provider reports reset in {resets_in_seconds} seconds." + if resets_in_seconds is not None + else "" + ) + notify_payload["message"] = ( + f"{provider_label} usage limit reached for {role_id}." + " Roles without fallback will wait until the provider reset." + f"{reset_text}" + ) + cooldown_key = ( + f"{notify_payload.get('provider')}:{notify_payload.get('role_id')}:" + f"{notify_payload.get('model')}:{notify_payload.get('cooldown_until')}:" + f"{notify_payload.get('fallback_model', '')}" + ) + if cooldown_key in self._oauth_cooldown_notified: + return + self._oauth_cooldown_notified.add(cooldown_key) + stored_payload = await asyncio.to_thread( + record_provider_notification, + "oauth_provider_usage_limited", + notify_payload, + ) + await self._broadcast("oauth_provider_usage_limited", stored_payload) async def _with_hung_connection_watchdog( self, @@ -463,9 +785,10 @@ def configure_role(self, role_id: str, config: ModelConfig) -> None: config = config.model_copy(update={"lm_studio_fallback_id": None}) self._role_model_configs[role_id] = config + self._oauth_cooldown_fallback_roles.discard(role_id) # Set initial fallback state based on provider - if config.provider in {"openrouter", "openai_codex_oauth", "xai_grok_oauth"}: + if config.provider in {"openrouter", "openai_codex_oauth", "xai_grok_oauth", "sakana_fugu"}: self._role_fallback_state[role_id] = config.provider else: self._role_fallback_state[role_id] = "lm_studio" @@ -482,8 +805,470 @@ def configure_role(self, role_id: str, config: ModelConfig) -> None: elif config.provider == "xai_grok_oauth": fallback_str = f", fallback={config.lm_studio_fallback_id}" if config.lm_studio_fallback_id else "" logger.info(f"Configured role '{role_id}': provider=xai_grok_oauth, model={config.model_id}{fallback_str}") + elif config.provider == "sakana_fugu": + 
fallback_str = f", fallback={config.lm_studio_fallback_id}" if config.lm_studio_fallback_id else "" + logger.info(f"Configured role '{role_id}': provider=sakana_fugu, model={config.model_id}{fallback_str}") else: logger.info(f"Configured role '{role_id}': provider=lm_studio, model={config.model_id}") + + def get_role_config(self, role_id: str) -> Optional[ModelConfig]: + """Return a configured role snapshot without exposing mutable internals.""" + config = self._role_model_configs.get(role_id) + return config.model_copy() if config is not None else None + + @classmethod + def _assistant_memory_role_is_excluded(cls, role_id: str, task_id: str, prompt: str) -> bool: + """Return True for roles that must never receive Assistant memory context.""" + role_key = f"{role_id} {task_id}".lower() + prompt_key = (prompt or "").lower() + excluded_markers = ( + "assistant", + "validator", + "_val", + "validation", + "critique", + "paper_critic", + "redundancy", + "checker", + "integrity", + "gate", + "novelty", + ) + if any(marker in role_key for marker in excluded_markers): + return True + if "self-validation" in prompt_key or "self validation" in prompt_key: + return True + user_prompt_key = cls._extract_assistant_goal_hint(prompt).lower() + if ( + "topic exploration phase" in user_prompt_key + or "paper title exploration phase" in user_prompt_key + ): + return True + if '"critique_needed"' in prompt_key or "critique_needed" in prompt_key: + return True + if "validate the" in prompt_key and "respond as json" in prompt_key: + return True + return False + + @staticmethod + def _assistant_workflow_mode_for_role(role_id: str) -> str: + normalized = (role_id or "").lower() + if "manual" in normalized or "compiler_aggregator" in normalized: + return "manual_proof_check" + if normalized.startswith("leanoj"): + return "leanoj" + if normalized.startswith("compiler") or normalized.startswith("comp_"): + return "compiler" + if normalized.startswith("agg") or 
normalized.startswith("aggregator"): + return "aggregator" + return "autonomous" + + @staticmethod + def _assistant_target_kind_for_role(role_id: str, task_id: str, prompt: str) -> str: + role_key = f"{role_id} {task_id}".lower() + prompt_key = (prompt or "").lower() + if role_id.lower().startswith("aggregator_submitter_"): + return "brainstorm_context" + if "reference" in role_key: + return "reference_selection_context" + if "title" in role_key: + return "title_context" + if "topic" in role_key: + return "topic_context" + if "completion" in role_key: + return "completion_review_context" + if "certainty" in role_key or "format_selector" in role_key or "volume_organizer" in role_key: + return "final_answer_context" + if "path" in role_key: + return "path_context" + if "final_review" in role_key or "semantic" in role_key: + return "semantic_review_context" + if "final" in role_key: + return "final_solver" + if "proof" in role_key or "rigor" in role_key or "high_param" in role_key: + return "theorem_discovery" + if "outline" in prompt_key or "outline_complete" in prompt_key: + return "outline_context" + if "current document progress" in prompt_key or "construction" in role_key or "writer" in role_key: + return "writing_context" + return "brainstorm_context" + + @staticmethod + def _assistant_workflow_phase_for_role(role_id: str, task_id: str, prompt: str) -> str: + role_key = f"{role_id} {task_id}".lower() + prompt_key = (prompt or "").lower() + if role_id.lower().startswith("aggregator_submitter_"): + return "brainstorm" + if "outline" in prompt_key or "outline" in role_key: + return "outline" + if "construction" in role_key or "current document progress" in prompt_key: + return "construction" + if "review" in role_key or "red-team" in prompt_key or "red team" in prompt_key: + return "review" + if "rigor" in role_key or "proof" in role_key or "lemma" in role_key: + return "proof" + if "reference" in role_key: + return "reference_selection" + if "title" in role_key: + 
return "title_selection" + if "topic" in role_key: + return "topic" + if "completion" in role_key: + return "completion_review" + if "final" in role_key or "certainty" in role_key or "format_selector" in role_key or "volume" in role_key: + return "final_answer" + if "leanoj" in role_key: + return "leanoj" + return "brainstorm" + + @classmethod + def _build_assistant_target_snapshot(cls, role_id: str, task_id: str, prompt: str) -> AssistantTargetSnapshot: + workflow_mode = cls._assistant_workflow_mode_for_role(role_id) + return cls._build_assistant_target_snapshot_with_overrides( + role_id, + task_id, + prompt, + workflow_mode_override=workflow_mode, + ) + + @classmethod + def _build_assistant_target_snapshot_with_overrides( + cls, + role_id: str, + task_id: str, + prompt: str, + *, + workflow_mode_override: Optional[str] = None, + ) -> AssistantTargetSnapshot: + workflow_mode = workflow_mode_override or cls._assistant_workflow_mode_for_role(role_id) + target_kind = cls._assistant_target_kind_for_role(role_id, task_id, prompt) + workflow_phase = cls._assistant_workflow_phase_for_role(role_id, task_id, prompt) + compact_prompt = cls._compact_assistant_text(prompt, cls.ASSISTANT_MEMORY_MAX_TARGET_CHARS) + goal_hint = cls._extract_assistant_goal_hint(prompt) + topic_hint = cls._extract_assistant_section( + prompt, + ( + "CURRENT BRAINSTORM TOPIC", + "BRAINSTORM TOPIC", + "TOPIC PROMPT", + "CURRENT TOPIC", + "LEANOJ PROBLEM", + "PROBLEM", + ), + ) + writing_goal = cls._extract_assistant_section( + prompt, + ( + "YOUR TASK", + "WRITING GOAL", + "CURRENT PHASE", + "PAPER TITLE", + "THEOREM CANDIDATE", + "TARGET THEOREM", + ), + ) + outline_summary = cls._extract_assistant_section( + prompt, + ("CURRENT OUTLINE", "OUTLINE", "VOLUME ORGANIZATION"), + ) + draft_summary = cls._extract_assistant_section( + prompt, + ( + "CURRENT DOCUMENT PROGRESS", + "CURRENT PAPER", + "MASTER PROOF", + "LEAN TEMPLATE", + "CURRENT PROOF DRAFT", + "SOURCE CONTENT", + ), + ) + accepted_summary = 
cls._extract_assistant_section( + prompt, + ( + "CURRENT ACCEPTED SUBMISSIONS DATABASE", + "ACCEPTED SUBMISSIONS", + "BRAINSTORM SUMMARY", + "VERIFIED PROOF SUMMARIES", + "DIRECT PROOF CONTEXT", + "SHARED TRAINING", + ), + ) + rejection_feedback = cls._extract_assistant_section( + prompt, + ( + "REJECTION FEEDBACK", + "RECENT REJECTIONS", + "FAILED ATTEMPTS", + "LEAN ERRORS", + "EXECUTION FEEDBACK", + ), + ) + source_titles = cls._extract_assistant_source_titles(prompt) + + target_statement = goal_hint or topic_hint or writing_goal or f"{workflow_mode}:{target_kind}" + is_aggregator_submitter = role_id.lower().startswith("aggregator_submitter_") + if is_aggregator_submitter: + # All parallel submitters in one brainstorm phase share one Assistant + # memory target. Per-lane rejection logs and task IDs are intentionally + # excluded so the pack refreshes for the brainstorm state, not each lane. + compact_prompt = "" + rejection_feedback = "" + source_title = f"{workflow_mode}:brainstorm_submitter_pack" + source_type = f"{workflow_mode}_brainstorm_submitters" + source_id = "shared_brainstorm_pack" + else: + source_title = f"{role_id} {task_id}".strip() + source_type = role_id + source_id = task_id + return AssistantTargetSnapshot( + workflow_mode=workflow_mode, + target_kind=target_kind, + workflow_phase=workflow_phase, + active_mode=workflow_mode, + user_prompt=goal_hint or compact_prompt, + current_prompt_or_topic=topic_hint, + current_submission_or_draft=compact_prompt, + accepted_memory_summary=accepted_summary, + writing_goal=writing_goal, + outline_summary=outline_summary, + paper_or_proof_draft_summary=draft_summary, + recent_activity_summary=rejection_feedback, + rejection_feedback=rejection_feedback, + target_statement=target_statement, + formal_sketch=compact_prompt, + source_title=source_title, + source_type=source_type, + source_id=source_id, + source_titles=source_titles, + imports=["Mathlib"], + ) + + @classmethod + def _compact_assistant_text(cls, 
value: str, max_chars: int) -> str: + text = " ".join((value or "").split()) + if len(text) <= max_chars: + return text + return text[:max_chars].rstrip() + "..." + + @classmethod + def _extract_assistant_goal_hint(cls, prompt: str) -> str: + return cls._extract_assistant_section( + prompt, + ( + "USER PROMPT", + "USER COMPILER-DIRECTING PROMPT", + "USER'S RESEARCH PROMPT", + "USER RESEARCH PROMPT", + "ORIGINAL USER PROMPT", + "RESEARCH PROMPT", + "RESEARCH GOAL", + "USER GOAL", + "HIGH-LEVEL RESEARCH PROMPT", + ), + ) + + @classmethod + def _extract_assistant_section(cls, prompt: str, headings: tuple[str, ...]) -> str: + if not prompt: + return "" + lines = prompt.splitlines() + capture: list[str] = [] + found = False + for line in lines: + stripped = line.strip() + if not found: + matched, remainder = cls._assistant_heading_match(stripped, headings) + if not matched: + continue + found = True + if remainder: + capture.append(remainder) + continue + if cls._assistant_line_is_boundary(stripped): + break + capture.append(line) + if not found: + return "" + text = " ".join("\n".join(capture).split()) + return cls._compact_assistant_text(text, cls.ASSISTANT_MEMORY_SUMMARY_CHARS) + + @classmethod + def _assistant_heading_match(cls, line: str, headings: tuple[str, ...]) -> tuple[bool, str]: + normalized_line = cls._normalize_assistant_heading(line) + for heading in headings: + normalized_heading = cls._normalize_assistant_heading(heading) + if normalized_line == normalized_heading: + return True, "" + if normalized_line.startswith(f"{normalized_heading}:"): + return True, line.split(":", 1)[1].strip() + return False, "" + + @classmethod + def _assistant_line_is_boundary(cls, line: str) -> bool: + if not line: + return False + if set(line) == {"-"}: + return True + matched, _ = cls._assistant_heading_match(line, cls.ASSISTANT_MEMORY_SECTION_BOUNDARIES) + return matched + + @staticmethod + def _normalize_assistant_heading(value: str) -> str: + text = re.sub(r"^\s*#+\s*", 
"", value or "").strip() + if text.startswith("[") and text.endswith("]"): + text = text[1:-1].strip() + text = text.rstrip(":").strip() + return " ".join(text.upper().split()) + + @classmethod + def _extract_assistant_source_titles(cls, prompt: str) -> list[str]: + if not prompt: + return [] + titles: list[str] = [] + patterns = ( + r"(?im)^\s*(?:paper|source|reference)\s+title\s*:\s*(.+)$", + r"(?im)^\s*title\s*:\s*(.+)$", + ) + for pattern in patterns: + for match in re.finditer(pattern, prompt): + title = " ".join(match.group(1).split())[:200] + if title and title not in titles: + titles.append(title) + if len(titles) >= 8: + return titles + return titles + + async def _maybe_add_assistant_memory_context( + self, + *, + task_id: str, + role_id: str, + role_config: Optional[ModelConfig], + messages: List[Dict[str, Any]], + max_tokens: Optional[int], + tools: Optional[List[Dict[str, Any]]], + tool_choice: Optional[Any], + workflow_mode_override: Optional[str] = None, + ) -> tuple[List[Dict[str, Any]], str]: + """Append non-blocking Assistant memory to eligible non-validator calls. + + Assistant memory is optional and last-drop. Validators, critique roles, + multi-turn tool-call protocol conversations, and retry conversations are + intentionally left untouched. Initial single-user messages may still + receive memory before tools are offered to the model. 
+ """ + if not system_config.agent_conversation_memory_enabled: + return messages, "" + if role_config is None: + return messages, "" + if len(messages) != 1 or messages[0].get("role") != "user": + return messages, "" + + prompt = str(messages[0].get("content") or "") + if not prompt or "ASSISTANT RETRIEVED " in prompt: + return messages, "" + if self._assistant_memory_role_is_excluded(role_id, task_id, prompt): + return messages, "" + + snapshot = self._build_assistant_target_snapshot_with_overrides( + role_id, + task_id, + prompt, + workflow_mode_override=workflow_mode_override, + ) + target_hash = assistant_proof_search_coordinator.submit_target(snapshot) + pack = assistant_proof_search_coordinator.get_latest_pack(target_hash) + if not pack or not pack.results: + return messages, "" + + assistant_context = pack.to_memory_prompt_context( + max_code_chars_per_result=self.ASSISTANT_MEMORY_MAX_CODE_CHARS, + ) + augmented_prompt = self._append_assistant_memory_block(prompt, assistant_context) + if not self._prompt_fits_role_budget( + augmented_prompt, + role_config=role_config, + explicit_max_tokens=max_tokens, + role_id=role_id, + ): + metadata_only_context = pack.to_memory_prompt_context(max_code_chars_per_result=0) + augmented_prompt = self._append_assistant_memory_block(prompt, metadata_only_context) + if not self._prompt_fits_role_budget( + augmented_prompt, + role_config=role_config, + explicit_max_tokens=max_tokens, + role_id=role_id, + ): + return messages, "" + + return [{**messages[0], "content": augmented_prompt}], target_hash + + async def prewarm_assistant_memory_context( + self, + *, + task_id: str, + role_id: str, + prompt: str, + workflow_mode_override: Optional[str] = None, + ) -> str: + """Schedule Assistant memory for an eligible prompt before model-call preflight. + + Many workflows validate mandatory prompt size before calling + `generate_completion()`. 
This helper gives those producer paths the same + non-blocking Assistant lifecycle as normal completions, even if the + prompt later overflows and no model call is made. + """ + if not system_config.agent_conversation_memory_enabled: + return "" + async with self._state_lock: + role_config = self._role_model_configs.get(role_id) + if role_config is None: + return "" + prompt = str(prompt or "") + if not prompt or "ASSISTANT RETRIEVED " in prompt: + return "" + if self._assistant_memory_role_is_excluded(role_id, task_id, prompt): + return "" + snapshot = self._build_assistant_target_snapshot_with_overrides( + role_id, + task_id, + prompt, + workflow_mode_override=workflow_mode_override, + ) + target_hash = assistant_proof_search_coordinator.submit_target(snapshot) + return target_hash + + @staticmethod + def _append_assistant_memory_block(prompt: str, assistant_context: str) -> str: + return ( + f"{prompt}\n\n---\n\n" + "OPTIONAL ASSISTANT MEMORY CONTEXT:\n" + f"{assistant_context}\n\n" + "Use the Assistant memory only when it is relevant. It is supporting context, " + "not validator feedback, not a requirement to cite, and not a replacement for the user prompt." 
+ ) + + def _prompt_fits_role_budget( + self, + prompt: str, + *, + role_config: ModelConfig, + explicit_max_tokens: Optional[int], + role_id: str, + ) -> bool: + try: + effective_max_tokens = self._effective_max_tokens( + explicit_max_tokens, + role_config.max_output_tokens, + role_id, + ) + max_input_tokens = rag_config.get_available_input_tokens( + role_config.context_window, + effective_max_tokens, + ) + except Exception: + return False + return count_tokens(prompt) <= max_input_tokens def _determine_boost_mode(self, task_id: str) -> Optional[str]: """ @@ -528,13 +1313,26 @@ async def generate_completion( **kwargs ) -> Dict[str, Any]: """Generate a completion, optionally wrapping the role with Supercharge.""" + disable_supercharge = bool(kwargs.pop("_moto_disable_supercharge", False)) + assistant_workflow_mode_override = kwargs.pop("_moto_assistant_workflow_mode", None) async with self._state_lock: role_config = self._role_model_configs.get(role_id) - supercharge_enabled = bool(getattr(role_config, "supercharge_enabled", False)) + messages, assistant_memory_target_hash = await self._maybe_add_assistant_memory_context( + task_id=task_id, + role_id=role_id, + role_config=role_config, + messages=messages, + max_tokens=max_tokens, + tools=tools, + tool_choice=tool_choice, + workflow_mode_override=assistant_workflow_mode_override, + ) + + supercharge_enabled = bool(getattr(role_config, "supercharge_enabled", False)) and not disable_supercharge # Tool-call conversations need exact assistant/tool turn pairing, so keep them single-shot. 
if not supercharge_enabled or tools or tool_choice is not None: - return await self._generate_completion_once( + response = await self._generate_completion_once( task_id=task_id, role_id=role_id, model=model, @@ -546,17 +1344,24 @@ async def generate_completion( tool_choice=tool_choice, **kwargs ) - - return await self._generate_supercharged_completion( - task_id=task_id, - role_id=role_id, - model=model, - messages=messages, - temperature=temperature, - max_tokens=max_tokens, - response_format=response_format, - **kwargs - ) + else: + response = await self._generate_supercharged_completion( + task_id=task_id, + role_id=role_id, + model=model, + messages=messages, + temperature=temperature, + max_tokens=max_tokens, + response_format=response_format, + **kwargs + ) + if assistant_memory_target_hash: + assistant_proof_search_coordinator.mark_pack_consumed_by_solver( + assistant_memory_target_hash, + role_id=role_id, + task_id=task_id, + ) + return response @staticmethod def _response_text(response: Dict[str, Any]) -> str: @@ -728,10 +1533,16 @@ async def _generate_completion_once( forced_boost_mode = kwargs.pop("_moto_force_boost_mode", None) consume_boost_count = kwargs.pop("_moto_consume_boost_count", True) strict_boost = kwargs.pop("_moto_strict_boost", False) + reasoning_effort_override = kwargs.pop("_moto_reasoning_effort_override", None) requested_model = model async with self._state_lock: initial_role_config = self._role_model_configs.get(role_id) configured_provider = initial_role_config.provider if initial_role_config else None + role_reasoning_effort = ( + reasoning_effort_override + if reasoning_effort_override is not None + else (initial_role_config.openrouter_reasoning_effort if initial_role_config else None) + ) # Check if task should use boost (unified check for all boost modes) if forced_boost_mode == "__none__": @@ -779,7 +1590,11 @@ async def _generate_completion_once( ), response_format=response_format, provider=boost_provider, - 
reasoning_effort=boost_manager.boost_config.boost_reasoning_effort, + reasoning_effort=( + reasoning_effort_override + if reasoning_effort_override is not None + else boost_manager.boost_config.boost_reasoning_effort + ), tools=tools, tool_choice=tool_choice, ), @@ -841,7 +1656,11 @@ async def _generate_completion_once( boosted=True, boost_mode=boost_mode, openrouter_provider=boost_provider, - openrouter_reasoning_effort=boost_manager.boost_config.boost_reasoning_effort, + openrouter_reasoning_effort=( + reasoning_effort_override + if reasoning_effort_override is not None + else boost_manager.boost_config.boost_reasoning_effort + ), ) # Log the boost call @@ -1083,6 +1902,20 @@ async def _generate_completion_once( ) fallback_state = "openrouter" self._role_fallback_state[role_id] = "openrouter" + elif ( + role_config + and role_config.provider == "openai_codex_oauth" + and fallback_state == "lm_studio" + and role_id in self._oauth_cooldown_fallback_roles + and not self.is_provider_cooling_down("openai_codex_oauth") + ): + logger.info( + "OpenAI Codex cooldown expired for role '%s'; returning role to Codex OAuth.", + role_id, + ) + fallback_state = "openai_codex_oauth" + self._role_fallback_state[role_id] = "openai_codex_oauth" + self._oauth_cooldown_fallback_roles.discard(role_id) # If OpenRouter configured and not fallen back, try OpenRouter if fallback_state == "openrouter" and role_config: @@ -1143,9 +1976,10 @@ async def _generate_completion_once( max_tokens=self._effective_max_tokens(max_tokens, role_config.max_output_tokens, role_id), response_format=response_format, provider=openrouter_provider, - reasoning_effort=role_config.openrouter_reasoning_effort, + reasoning_effort=role_reasoning_effort, tools=tools, tool_choice=tool_choice, + allow_provider_auto_fallback=role_id.endswith("_assistant"), ), role_id=role_id, model=openrouter_model, @@ -1154,6 +1988,17 @@ async def _generate_completion_once( # Calculate duration and extract response duration_ms = 
(time.time() - start_time) * 1000 + provider_auto_fallback = None + if isinstance(result, dict): + provider_auto_fallback = result.pop("_moto_openrouter_provider_auto_fallback", None) + if provider_auto_fallback and openrouter_provider: + logger.warning( + "Clearing unavailable OpenRouter host provider '%s' for Assistant role '%s'; future calls will use Auto routing.", + redact_log_text(openrouter_provider, 120), + role_id, + ) + role_config.openrouter_provider = None + openrouter_provider = None # Check for missing choices (upstream provider timeout/error) if not result.get("choices"): @@ -1187,7 +2032,7 @@ async def _generate_completion_once( boosted=False, boost_mode=None, openrouter_provider=openrouter_provider, - openrouter_reasoning_effort=role_config.openrouter_reasoning_effort, + openrouter_reasoning_effort=role_reasoning_effort, ) # Log to autonomous API logger if callback set @@ -1254,7 +2099,7 @@ async def _generate_completion_once( temperature=temperature, max_tokens=self._effective_max_tokens(max_tokens, role_config.max_output_tokens, role_id), response_format=response_format, - reasoning_effort=role_config.openrouter_reasoning_effort, + reasoning_effort=role_reasoning_effort, tools=tools, tool_choice=tool_choice, ) @@ -1440,45 +2285,43 @@ async def _generate_completion_once( ) raise - if fallback_state == "openai_codex_oauth" and role_config: - codex_model = role_config.model_id + if fallback_state == "sakana_fugu" and role_config: + sakana_model = role_config.model_id start_time = time.time() try: - logger.debug("Role %s using OpenAI Codex OAuth: %s", role_id, codex_model) + logger.debug("Role %s using Sakana Fugu: %s", role_id, sakana_model) result = await self._with_hung_connection_watchdog( - openai_codex_client.generate_completion( - model=codex_model, + sakana_fugu_client.generate_completion( + model=sakana_model, messages=messages, temperature=temperature, max_tokens=self._effective_max_tokens(max_tokens, role_config.max_output_tokens, 
role_id), response_format=response_format, - reasoning_effort=role_config.openrouter_reasoning_effort, + reasoning_effort=role_reasoning_effort, tools=tools, tool_choice=tool_choice, ), role_id=role_id, - model=codex_model, - provider="OpenAI Codex", + model=sakana_model, + provider="Sakana Fugu", ) duration_ms = (time.time() - start_time) * 1000 if not result.get("choices"): logger.error( - "OpenAI Codex response missing 'choices' after %.0fms - %s", + "Sakana Fugu response missing 'choices' after %.0fms - %s", duration_ms, _response_shape_for_logging(result), ) - raise ValueError(f"OpenAI Codex response missing 'choices' after {duration_ms:.0f}ms") + raise ValueError(f"Sakana Fugu response missing 'choices' after {duration_ms:.0f}ms") - response_content = "" + response_content = extract_response_text(result, context=task_id) tokens_used = None - if result.get("choices"): - response_content = extract_response_text(result, context=task_id) if result.get("usage"): tokens_used = result["usage"].get("total_tokens") _pt = result["usage"].get("prompt_tokens") _ct = result["usage"].get("completion_tokens") if _pt is not None and _ct is not None: - token_tracker.track(codex_model, _pt, _ct) + token_tracker.track(sakana_model, _pt, _ct) await self._broadcast("token_usage_updated", token_tracker.get_stats()) result = self._annotate_response_with_call_metadata( @@ -1486,12 +2329,12 @@ async def _generate_completion_once( task_id=task_id, role_id=role_id, configured_model=requested_model, - actual_model=codex_model, + actual_model=sakana_model, configured_provider=role_config.provider, - actual_provider="openai_codex_oauth", + actual_provider="sakana_fugu", boosted=False, boost_mode=None, - openrouter_reasoning_effort=role_config.openrouter_reasoning_effort, + openrouter_reasoning_effort=role_reasoning_effort, ) if self._autonomous_logger_callback: @@ -1499,8 +2342,8 @@ async def _generate_completion_once( await self._autonomous_logger_callback( task_id=task_id, 
role_id=role_id, - model=codex_model, - provider="openai_codex_oauth", + model=sakana_model, + provider="sakana_fugu", prompt=full_prompt, response=response_content, tokens_used=tokens_used, @@ -1510,18 +2353,17 @@ async def _generate_completion_once( phase=self._current_autonomous_phase, ) - await self._track_model_usage(codex_model) + await self._track_model_usage(sakana_model) return result - - except OpenAICodexError as e: + except SakanaFuguError as e: duration_ms = (time.time() - start_time) * 1000 if self._autonomous_logger_callback: full_prompt = self._prompt_for_logging(messages) await self._autonomous_logger_callback( task_id=task_id, role_id=role_id, - model=codex_model, - provider="openai_codex_oauth", + model=sakana_model, + provider="sakana_fugu", prompt=full_prompt, response="", tokens_used=None, @@ -1534,19 +2376,19 @@ async def _generate_completion_once( async with self._state_lock: self._role_fallback_state[role_id] = "lm_studio" logger.warning( - "OpenAI Codex failed for role '%s'; falling back to LM Studio model %s", + "Sakana Fugu failed for role '%s'; falling back to LM Studio model %s", role_id, role_config.lm_studio_fallback_id, ) model = role_config.lm_studio_fallback_id else: - await self._broadcast_unrecoverable_codex_error( + await self._broadcast_unrecoverable_sakana_fugu_error( role_id=role_id, - model=codex_model, + model=sakana_model, error=e, ) raise RuntimeError( - f"OpenAI Codex failed for role '{role_id}' and no LM Studio fallback is configured: {e}" + f"Sakana Fugu failed for role '{role_id}' and no LM Studio fallback is configured: {e}" ) from e except Exception as e: duration_ms = (time.time() - start_time) * 1000 @@ -1555,8 +2397,8 @@ async def _generate_completion_once( await self._autonomous_logger_callback( task_id=task_id, role_id=role_id, - model=codex_model, - provider="openai_codex_oauth", + model=sakana_model, + provider="sakana_fugu", prompt=full_prompt, response="", tokens_used=None, @@ -1569,20 +2411,240 @@ async 
def _generate_completion_once( async with self._state_lock: self._role_fallback_state[role_id] = "lm_studio" logger.warning( - "OpenAI Codex error for role '%s': %s; falling back to LM Studio model %s", + "Sakana Fugu error for role '%s': %s; falling back to LM Studio model %s", role_id, e, role_config.lm_studio_fallback_id, ) model = role_config.lm_studio_fallback_id else: - await self._broadcast_unrecoverable_codex_error( + await self._broadcast_unrecoverable_sakana_fugu_error( role_id=role_id, - model=codex_model, + model=sakana_model, error=e, ) raise + if fallback_state == "openai_codex_oauth" and role_config: + codex_model = role_config.model_id + active_cooldown = self.get_provider_cooldown("openai_codex_oauth") + if active_cooldown: + if role_config.lm_studio_fallback_id: + async with self._state_lock: + self._role_fallback_state[role_id] = "lm_studio" + self._oauth_cooldown_fallback_roles.add(role_id) + await self._broadcast_oauth_usage_limit( + {**active_cooldown, "role_id": role_id, "model": codex_model}, + fallback_model=role_config.lm_studio_fallback_id, + ) + logger.warning( + "OpenAI Codex cooldown active for role '%s'; using LM Studio fallback model %s until reset", + role_id, + role_config.lm_studio_fallback_id, + ) + model = role_config.lm_studio_fallback_id + else: + await self._broadcast_oauth_usage_limit({**active_cooldown, "role_id": role_id, "model": codex_model}) + raise OAuthProviderCooldownError( + provider="openai_codex_oauth", + provider_label="OpenAI Codex", + role_id=role_id, + model=codex_model, + resets_at=active_cooldown.get("cooldown_until") or active_cooldown.get("resets_at"), + resets_in_seconds=active_cooldown.get("resets_in_seconds"), + plan_type=str(active_cooldown.get("plan_type") or ""), + message=str(active_cooldown.get("message") or ""), + ) + start_time = time.time() + use_codex = not ( + model == role_config.lm_studio_fallback_id + and self._role_fallback_state.get(role_id) == "lm_studio" + ) + if use_codex: + try: + 
logger.debug("Role %s using OpenAI Codex OAuth: %s", role_id, codex_model) + result = await self._with_hung_connection_watchdog( + openai_codex_client.generate_completion( + model=codex_model, + messages=messages, + temperature=temperature, + max_tokens=self._effective_max_tokens(max_tokens, role_config.max_output_tokens, role_id), + response_format=response_format, + reasoning_effort=role_reasoning_effort, + tools=tools, + tool_choice=tool_choice, + ), + role_id=role_id, + model=codex_model, + provider="OpenAI Codex", + ) + duration_ms = (time.time() - start_time) * 1000 + if not result.get("choices"): + logger.error( + "OpenAI Codex response missing 'choices' after %.0fms - %s", + duration_ms, + _response_shape_for_logging(result), + ) + raise ValueError(f"OpenAI Codex response missing 'choices' after {duration_ms:.0f}ms") + + response_content = "" + tokens_used = None + if result.get("choices"): + response_content = extract_response_text(result, context=task_id) + if result.get("usage"): + tokens_used = result["usage"].get("total_tokens") + _pt = result["usage"].get("prompt_tokens") + _ct = result["usage"].get("completion_tokens") + if _pt is not None and _ct is not None: + token_tracker.track(codex_model, _pt, _ct) + await self._broadcast("token_usage_updated", token_tracker.get_stats()) + + result = self._annotate_response_with_call_metadata( + result, + task_id=task_id, + role_id=role_id, + configured_model=requested_model, + actual_model=codex_model, + configured_provider=role_config.provider, + actual_provider="openai_codex_oauth", + boosted=False, + boost_mode=None, + openrouter_reasoning_effort=role_reasoning_effort, + ) + + if self._autonomous_logger_callback: + full_prompt = self._prompt_for_logging(messages) + await self._autonomous_logger_callback( + task_id=task_id, + role_id=role_id, + model=codex_model, + provider="openai_codex_oauth", + prompt=full_prompt, + response=response_content, + tokens_used=tokens_used, + duration_ms=duration_ms, + 
success=True, + error=None, + phase=self._current_autonomous_phase, + ) + + await self._track_model_usage(codex_model) + return result + + except OAuthUsageLimitError as e: + duration_ms = (time.time() - start_time) * 1000 + if self._autonomous_logger_callback: + full_prompt = self._prompt_for_logging(messages) + await self._autonomous_logger_callback( + task_id=task_id, + role_id=role_id, + model=codex_model, + provider="openai_codex_oauth", + prompt=full_prompt, + response="", + tokens_used=None, + duration_ms=duration_ms, + success=False, + error=str(e), + phase=self._current_autonomous_phase, + ) + cooldown_payload = self._mark_oauth_provider_cooldown(e, role_id=role_id, model=codex_model) + if role_config.lm_studio_fallback_id: + async with self._state_lock: + self._role_fallback_state[role_id] = "lm_studio" + self._oauth_cooldown_fallback_roles.add(role_id) + await self._broadcast_oauth_usage_limit( + cooldown_payload, + fallback_model=role_config.lm_studio_fallback_id, + ) + logger.warning( + "OpenAI Codex usage limit reached for role '%s'; falling back to LM Studio model %s until provider reset", + role_id, + role_config.lm_studio_fallback_id, + ) + model = role_config.lm_studio_fallback_id + else: + await self._broadcast_oauth_usage_limit(cooldown_payload) + raise OAuthProviderCooldownError( + provider=e.provider, + provider_label=e.provider_label, + role_id=role_id, + model=codex_model, + resets_at=cooldown_payload.get("cooldown_until") or e.resets_at, + resets_in_seconds=cooldown_payload.get("resets_in_seconds") or e.resets_in_seconds, + plan_type=e.plan_type, + message=str(cooldown_payload.get("message") or str(e)), + ) from e + except OpenAICodexError as e: + duration_ms = (time.time() - start_time) * 1000 + if self._autonomous_logger_callback: + full_prompt = self._prompt_for_logging(messages) + await self._autonomous_logger_callback( + task_id=task_id, + role_id=role_id, + model=codex_model, + provider="openai_codex_oauth", + prompt=full_prompt, + 
response="", + tokens_used=None, + duration_ms=duration_ms, + success=False, + error=str(e), + phase=self._current_autonomous_phase, + ) + if role_config.lm_studio_fallback_id: + async with self._state_lock: + self._role_fallback_state[role_id] = "lm_studio" + logger.warning( + "OpenAI Codex failed for role '%s'; falling back to LM Studio model %s", + role_id, + role_config.lm_studio_fallback_id, + ) + model = role_config.lm_studio_fallback_id + else: + await self._broadcast_unrecoverable_codex_error( + role_id=role_id, + model=codex_model, + error=e, + ) + raise RuntimeError( + f"OpenAI Codex failed for role '{role_id}' and no LM Studio fallback is configured: {e}" + ) from e + except Exception as e: + duration_ms = (time.time() - start_time) * 1000 + if self._autonomous_logger_callback: + full_prompt = self._prompt_for_logging(messages) + await self._autonomous_logger_callback( + task_id=task_id, + role_id=role_id, + model=codex_model, + provider="openai_codex_oauth", + prompt=full_prompt, + response="", + tokens_used=None, + duration_ms=duration_ms, + success=False, + error=str(e), + phase=self._current_autonomous_phase, + ) + if role_config.lm_studio_fallback_id: + async with self._state_lock: + self._role_fallback_state[role_id] = "lm_studio" + logger.warning( + "OpenAI Codex error for role '%s': %s; falling back to LM Studio model %s", + role_id, + e, + role_config.lm_studio_fallback_id, + ) + model = role_config.lm_studio_fallback_id + else: + await self._broadcast_unrecoverable_codex_error( + role_id=role_id, + model=codex_model, + error=e, + ) + raise + if fallback_state == "xai_grok_oauth" and role_config: xai_model = role_config.model_id start_time = time.time() @@ -1595,7 +2657,7 @@ async def _generate_completion_once( temperature=temperature, max_tokens=self._effective_max_tokens(max_tokens, role_config.max_output_tokens, role_id), response_format=response_format, - reasoning_effort=role_config.openrouter_reasoning_effort, + 
reasoning_effort=role_reasoning_effort, tools=tools, tool_choice=tool_choice, ), @@ -1634,7 +2696,7 @@ async def _generate_completion_once( actual_provider="xai_grok_oauth", boosted=False, boost_mode=None, - openrouter_reasoning_effort=role_config.openrouter_reasoning_effort, + openrouter_reasoning_effort=role_reasoning_effort, ) if self._autonomous_logger_callback: diff --git a/backend/shared/boost_manager.py b/backend/shared/boost_manager.py index f38a94a..1fb39ce 100644 --- a/backend/shared/boost_manager.py +++ b/backend/shared/boost_manager.py @@ -12,7 +12,7 @@ Certainty Assessor, Format Selector, Volume Organizer → agg_sub1 (Submitter 1) - Topic Validator, Redundancy Checker → agg_val (Agg Validator) - Brainstorm aggregation submitters/validator → agg_sub1..10, agg_val (via Coordinator) -- Paper compilation → comp_hc, comp_hp, comp_val, comp_crit (critique_* task IDs alias to comp_crit) +- Paper compilation → comp_writer, comp_hp, comp_val (critique generation aliases to comp_hp) - LeanOJ path-decision calls use `leanoj_path_*` task IDs for workflow display, but belong to the Final Solver boost category (`leanoj_final`) because that role owns final-readiness decisions. @@ -47,10 +47,9 @@ "agg_sub10": "Submitter 10", "agg_val": "Agg Validator", # Compiler - "comp_hc": "High-Context Model", - "comp_hp": "High-Param Model", + "comp_writer": "Writing Submitter", + "comp_hp": "Rigor & Proofs Submitter", "comp_val": "Compiler Validator", - "comp_crit": "Critique Submitter", # LeanOJ "leanoj_topic": "Proof Solver Topic Generator", "leanoj_topic_val": "Proof Solver Topic Validator", @@ -81,12 +80,14 @@ } CATEGORY_ALIASES = { + "_".join(("comp", "hc")): "comp_writer", # Path decisions are absorbed into the dominant Final Solver role. "leanoj_path": "leanoj_final", - # Critique phase has legacy task IDs but one user-facing category. 
- "critique_val": "comp_crit", - "critique_cleanup": "comp_crit", - **{f"critique_sub{i}": "comp_crit" for i in range(1, 11)}, + # Critique generation is owned by Rigor & Proofs; validation/cleanup stay on Validator. + "comp_crit": "comp_hp", + "critique_val": "comp_val", + "critique_cleanup": "comp_val", + **{f"critique_sub{i}": "comp_hp" for i in range(1, 11)}, } @@ -123,7 +124,7 @@ def __init__(self): # Counter-based boost mode self.boost_next_count: int = 0 - # Category-based boost mode (role prefixes like "agg_sub1", "comp_hc") + # Category-based boost mode (role prefixes like "agg_sub1", "comp_writer") self.boosted_categories: Set[str] = set() # Always-prefer boost mode: try boost for every call, fall back on failure @@ -140,6 +141,20 @@ def __init__(self): def _get_state_file() -> str: """Return the instance-scoped boost state file.""" return str(os.path.join(system_config.data_dir, "boost_state.json")) + + @staticmethod + def _canonical_category(category: str) -> str: + """Map absorbed/legacy category prefixes to their owning role category.""" + return CATEGORY_ALIASES.get(category, category) + + @classmethod + def _canonical_task_id(cls, task_id: str) -> str: + """Map exact task IDs with legacy role prefixes to current task IDs.""" + parts = task_id.rsplit("_", 1) + if len(parts) != 2: + return cls._canonical_category(task_id) + prefix, sequence = parts + return f"{cls._canonical_category(prefix)}_{sequence}" def _load_state(self) -> None: """Load persisted boost state from disk.""" @@ -171,7 +186,10 @@ def _load_state(self) -> None: for category in state.get('boosted_categories', []) } self.boost_always_prefer = state.get('boost_always_prefer', False) - self.boosted_task_ids = set(state.get('boosted_task_ids', [])) + self.boosted_task_ids = { + self._canonical_task_id(task_id) + for task_id in state.get('boosted_task_ids', []) + } logger.info( "Loaded boost state: enabled=%s, model=%s, next_count=%s, categories=%s, always_prefer=%s", @@ -283,6 +301,7 @@ 
async def toggle_task_boost(self, task_id: str) -> bool: Returns: True if task is now boosted, False if unboosted """ + task_id = self._canonical_task_id(task_id) async with self._lock: if task_id in self.boosted_task_ids: self.boosted_task_ids.remove(task_id) @@ -316,7 +335,7 @@ def is_task_boosted(self, task_id: str) -> bool: return ( self.boost_config is not None and self.boost_config.enabled and - task_id in self.boosted_task_ids + self._canonical_task_id(task_id) in self.boosted_task_ids ) async def set_boost_next_count(self, count: int) -> None: @@ -364,7 +383,7 @@ async def toggle_category_boost(self, category: str) -> bool: Toggle boost for an entire category (role prefix). Args: - category: Category prefix (e.g., "agg_sub1", "comp_hc", "agg_val") + category: Category prefix (e.g., "agg_sub1", "comp_writer", "agg_val") Returns: True if category is now boosted, False if unboosted @@ -391,18 +410,13 @@ async def toggle_category_boost(self, category: str) -> bool: return boosted - @staticmethod - def _canonical_category(category: str) -> str: - """Map absorbed/legacy category prefixes to their owning role category.""" - return CATEGORY_ALIASES.get(category, category) - def _extract_role_prefix(self, task_id: str) -> str: """ Extract role prefix from task ID. 
Examples: "agg_sub1_001" -> "agg_sub1" - "comp_hc_005" -> "comp_hc" + "comp_writer_005" -> "comp_writer" "auto_ts_002" -> "auto_ts" """ # Split on last underscore and take everything before it @@ -445,7 +459,7 @@ def should_use_boost(self, task_id: str) -> bool: return True # Check exact task ID (legacy per-task mode) - if task_id in self.boosted_task_ids: + if self._canonical_task_id(task_id) in self.boosted_task_ids: return True return False @@ -540,12 +554,11 @@ def get_available_categories(self, mode: str = "all") -> List[Dict[str, str]]: "group": "Aggregator" }) - # Compiler (matches CompilerSettings order: Validator, High-Context, High-Param, Critique) + # Compiler (matches CompilerSettings order: Validator, Writing Submitter, Rigor & Proofs) categories.extend([ {"id": "comp_val", "label": "Compiler Validator", "group": "Compiler"}, - {"id": "comp_hc", "label": "High-Context Model", "group": "Compiler"}, - {"id": "comp_hp", "label": "High-Param Model", "group": "Compiler"}, - {"id": "comp_crit", "label": "Critique Submitter", "group": "Compiler"}, + {"id": "comp_writer", "label": "Writing Submitter", "group": "Compiler"}, + {"id": "comp_hp", "label": "Rigor & Proofs Submitter", "group": "Compiler"}, ]) categories.extend([ @@ -578,7 +591,7 @@ def is_role_boosted(self, role_prefix: str) -> bool: For example, role_prefix="agg_sub1" matches "agg_sub1_001". 
Args: - role_prefix: Role prefix (e.g., "agg_sub1", "comp_hc", "auto_ts") + role_prefix: Role prefix (e.g., "agg_sub1", "comp_writer", "auto_ts") Returns: True if any task for this role is boosted @@ -586,8 +599,9 @@ def is_role_boosted(self, role_prefix: str) -> bool: if not self.boost_config or not self.boost_config.enabled: return False + role_prefix = self._canonical_category(role_prefix) for task_id in self.boosted_task_ids: - if task_id.startswith(role_prefix): + if self._canonical_task_id(task_id).startswith(role_prefix): return True return False @@ -604,7 +618,7 @@ def get_boosted_roles(self) -> set: # e.g., "agg_sub1_001" -> "agg_sub1" parts = task_id.rsplit('_', 1) if len(parts) == 2: - roles.add(parts[0]) + roles.add(self._canonical_category(parts[0])) return roles def get_next_boosted_task_for_role(self, role_prefix: str) -> Optional[str]: @@ -612,7 +626,7 @@ def get_next_boosted_task_for_role(self, role_prefix: str) -> Optional[str]: Get the next boosted task ID for a role prefix. 
Args: - role_prefix: Role prefix (e.g., "agg_sub1", "comp_hc") + role_prefix: Role prefix (e.g., "agg_sub1", "comp_writer") Returns: Task ID if found, None otherwise @@ -621,9 +635,10 @@ def get_next_boosted_task_for_role(self, role_prefix: str) -> Optional[str]: return None # Find all matching tasks and return the one with lowest sequence number + role_prefix = self._canonical_category(role_prefix) matching_tasks = [ - task_id for task_id in self.boosted_task_ids - if task_id.startswith(role_prefix) + self._canonical_task_id(task_id) for task_id in self.boosted_task_ids + if self._canonical_task_id(task_id).startswith(role_prefix) ] if not matching_tasks: diff --git a/backend/shared/brainstorm_proof_gate.py b/backend/shared/brainstorm_proof_gate.py index dc2fea2..7c52ce8 100644 --- a/backend/shared/brainstorm_proof_gate.py +++ b/backend/shared/brainstorm_proof_gate.py @@ -24,8 +24,6 @@ "novel_variant", "novel_formulation", } - - @dataclass class BrainstormProofGateResult: """Result of checking a proof candidate before normal brainstorm validation.""" @@ -98,6 +96,44 @@ def _format_lean_feedback(lean_result: Any) -> str: return "\n\n".join(parts).strip() or "Lean 4 accepted with no diagnostics." +def _candidate_metadata_rejection( + *, + expected_novelty_tier: str, + prompt_relevance_rationale: str, + novelty_rationale: str, + why_not_standard_known_result: str, +) -> str: + """Return a rejection reason when a proof target is not eligible for Lean cost.""" + normalized_tier = (expected_novelty_tier or "").strip().lower() + if normalized_tier == "not_novel": + return ( + "Lean proof candidate rejected before Lean cost: `expected_novelty_tier` was " + "`not_novel`. Supporting, routine, trivial, local, or known results must not " + "be submitted as proof targets." + ) + if normalized_tier not in NOVEL_PROOF_TIERS: + return ( + "Lean proof candidate rejected before Lean cost: missing or invalid " + "`expected_novelty_tier`. 
The proof route is only for high-impact " + "prompt-solving theorem targets." + ) + missing = [ + field_name + for field_name, value in ( + ("prompt_relevance_rationale", prompt_relevance_rationale), + ("novelty_rationale", novelty_rationale), + ("why_not_standard_known_result", why_not_standard_known_result), + ) + if not (value or "").strip() + ] + if missing: + return ( + "Lean proof candidate rejected before Lean cost: missing required " + f"novelty/prompt-impact rationale field(s): {', '.join(missing)}." + ) + return "" + + def _build_retry_prompt( *, user_prompt: str, @@ -149,15 +185,14 @@ def _build_retry_prompt( Respond with ONLY valid JSON: {{ "theorem_name": "Lean declaration name, if named", - "theorem_statement": "natural-language theorem statement being proved", - "formal_sketch": "updated formalization notes", - "expected_novelty_tier": "{expected_novelty_tier}", - "prompt_relevance_rationale": "{prompt_relevance_rationale}", - "novelty_rationale": "{novelty_rationale}", - "why_not_standard_known_result": "{why_not_standard_known_result}", "lean_code": "complete Lean 4 code", - "reasoning": "brief explanation of the repair" + "reasoning": "brief explanation of how this repairs the SAME intended high-impact theorem target without narrowing, weakening, or substituting a supporting lemma" }} + +Do not change the intended theorem statement, novelty tier, prompt relevance rationale, +novelty rationale, or anti-standard-result rationale. The repair budget is for +fixing the Lean proof of the same high-impact target, not for replacing it with a +narrower, easier, routine, trivial, local, or merely supporting lemma. 
""" @@ -251,8 +286,27 @@ async def verify_brainstorm_proof_candidate( ), attempts=[], ) - if expected_novelty_tier not in NOVEL_PROOF_TIERS: - expected_novelty_tier = expected_novelty_tier or "not_novel" + metadata_rejection = _candidate_metadata_rejection( + expected_novelty_tier=expected_novelty_tier, + prompt_relevance_rationale=prompt_relevance_rationale, + novelty_rationale=novelty_rationale, + why_not_standard_known_result=why_not_standard_known_result, + ) + if metadata_rejection: + return BrainstormProofGateResult( + accepted=False, + theorem_statement=theorem_statement, + theorem_name=theorem_name, + formal_sketch=formal_sketch, + expected_novelty_tier=expected_novelty_tier or "not_novel", + prompt_relevance_rationale=prompt_relevance_rationale, + novelty_rationale=novelty_rationale, + why_not_standard_known_result=why_not_standard_known_result, + lean_code=lean_code, + reasoning=reasoning, + failure_feedback=metadata_rejection, + attempts=[], + ) attempts: list[ProofAttemptFeedback] = [] current = { @@ -280,6 +334,28 @@ async def verify_brainstorm_proof_candidate( lean_code = str(current.get("lean_code") or "").strip() reasoning = str(current.get("reasoning") or reasoning).strip() + metadata_rejection = _candidate_metadata_rejection( + expected_novelty_tier=expected_novelty_tier, + prompt_relevance_rationale=prompt_relevance_rationale, + novelty_rationale=novelty_rationale, + why_not_standard_known_result=why_not_standard_known_result, + ) + if metadata_rejection: + return BrainstormProofGateResult( + accepted=False, + theorem_statement=theorem_statement, + theorem_name=theorem_name, + formal_sketch=formal_sketch, + expected_novelty_tier=expected_novelty_tier or "not_novel", + prompt_relevance_rationale=prompt_relevance_rationale, + novelty_rationale=novelty_rationale, + why_not_standard_known_result=why_not_standard_known_result, + lean_code=lean_code, + reasoning=reasoning, + attempts=list(attempts), + failure_feedback=metadata_rejection, + ) + 
lean_result = await get_lean4_client().check_proof( lean_code, timeout=system_config.lean4_proof_timeout, @@ -403,21 +479,13 @@ async def verify_brainstorm_proof_candidate( if not isinstance(repaired, dict): raise ValueError("Proof repair response was not a JSON object.") current = { - "theorem_statement": str(repaired.get("theorem_statement") or theorem_statement).strip(), - "formal_sketch": str(repaired.get("formal_sketch") or formal_sketch).strip(), + "theorem_statement": theorem_statement, + "formal_sketch": formal_sketch, "theorem_name": str(repaired.get("theorem_name") or theorem_name).strip(), - "expected_novelty_tier": str( - repaired.get("expected_novelty_tier") or expected_novelty_tier - ).strip(), - "prompt_relevance_rationale": str( - repaired.get("prompt_relevance_rationale") or prompt_relevance_rationale - ).strip(), - "novelty_rationale": str( - repaired.get("novelty_rationale") or novelty_rationale - ).strip(), - "why_not_standard_known_result": str( - repaired.get("why_not_standard_known_result") or why_not_standard_known_result - ).strip(), + "expected_novelty_tier": expected_novelty_tier, + "prompt_relevance_rationale": prompt_relevance_rationale, + "novelty_rationale": novelty_rationale, + "why_not_standard_known_result": why_not_standard_known_result, "lean_code": str(repaired.get("lean_code") or "").strip(), "reasoning": str(repaired.get("reasoning") or "").strip(), } diff --git a/backend/shared/build_info.py b/backend/shared/build_info.py index 0532005..1fa7ab7 100644 --- a/backend/shared/build_info.py +++ b/backend/shared/build_info.py @@ -25,7 +25,7 @@ "version": "0.0.0-dev", "build_commit": "dev", "update_channel": "main", - "api_contract_version": "build5-v30", + "api_contract_version": "build5-v53", } _ENV_OVERRIDES = { diff --git a/backend/shared/config.py b/backend/shared/config.py index 07ff150..ddc806b 100644 --- a/backend/shared/config.py +++ b/backend/shared/config.py @@ -211,30 +211,30 @@ class SystemConfig(BaseSettings): # 
Compiler settings (Phase 2). Set from explicit user/provider settings at runtime. compiler_validator_context_window: int = 0 - compiler_high_context_context_window: int = 0 - compiler_high_param_context_window: int = 0 - compiler_critique_submitter_context_window: int = 0 + compiler_writer_context_window: int = 0 + compiler_high_param_context_window: int = 0 # Rigor & Proofs submitter + compiler_critique_submitter_context_window: int = 0 # Deprecated alias mirrored from Rigor & Proofs # Compiler output token limits (user-configurable) compiler_validator_max_output_tokens: int = 0 - compiler_high_context_max_output_tokens: int = 0 - compiler_high_param_max_output_tokens: int = 0 - compiler_critique_submitter_max_tokens: int = 0 + compiler_writer_max_output_tokens: int = 0 + compiler_high_param_max_output_tokens: int = 0 # Rigor & Proofs submitter + compiler_critique_submitter_max_tokens: int = 0 # Deprecated alias mirrored from Rigor & Proofs # Compiler model selections (set at runtime by API) - compiler_critique_submitter_model: str = "" # Set by user in GUI + compiler_critique_submitter_model: str = "" # Deprecated alias mirrored from Rigor & Proofs # Autonomous Research settings (Part 3) # Context windows (separate for each role, set from user settings) autonomous_submitter_context_window: int = 0 autonomous_validator_context_window: int = 0 - autonomous_high_context_context_window: int = 0 + autonomous_writer_context_window: int = 0 autonomous_high_param_context_window: int = 0 # Autonomous output token limits (user-configurable) autonomous_submitter_max_tokens: int = 0 autonomous_validator_max_tokens: int = 0 - autonomous_high_context_max_tokens: int = 0 + autonomous_writer_max_tokens: int = 0 autonomous_high_param_max_tokens: int = 0 # Autonomous workflow settings @@ -247,6 +247,11 @@ class SystemConfig(BaseSettings): wolfram_alpha_enabled: bool = False wolfram_alpha_api_key: Optional[str] = None + # Optional proof-search and memory connectivity toggles. 
These are + # user-facing runtime switches, not credential/snapshot deletion flags. + syntheticlib4_enabled: bool = True + agent_conversation_memory_enabled: bool = True + # Lean 4 proof verification integration (optional) lean4_enabled: bool = Field( default=False, diff --git a/backend/shared/context_overflow.py b/backend/shared/context_overflow.py new file mode 100644 index 0000000..6c8aee9 --- /dev/null +++ b/backend/shared/context_overflow.py @@ -0,0 +1,12 @@ +"""Shared user-facing context overflow messages.""" + +CONTEXT_OVERFLOW_STOP_REASON = "context_overflow" +CONTEXT_OVERFLOW_STOP_MESSAGE = ( + "Research stopped. Some required source content must be injected directly to preserve " + "answer quality, and it reached the maximum context size for the selected model. " + "Start a new session with a condensed prompt, or choose a model with a higher " + "context limit." +) +CONTEXT_OVERFLOW_RESOLUTION = ( + "Start a new session with a condensed prompt, or choose a model with a higher context limit." +) diff --git a/backend/shared/embedding_readiness.py b/backend/shared/embedding_readiness.py index 0567624..ce1ee5f 100644 --- a/backend/shared/embedding_readiness.py +++ b/backend/shared/embedding_readiness.py @@ -3,6 +3,7 @@ import asyncio import logging +import time from typing import Any from backend.shared.config import rag_config, system_config @@ -10,6 +11,11 @@ logger = logging.getLogger(__name__) +_LM_STUDIO_EMBEDDING_STATUS_CACHE_TTL_SECONDS = 60.0 +_lm_studio_embedding_status_cache: dict[str, Any] | None = None +_lm_studio_embedding_status_cache_at = 0.0 +_lm_studio_embedding_status_lock = asyncio.Lock() + EMBEDDING_PROVIDER_UNAVAILABLE_MESSAGE = ( "RAG embeddings are unavailable. Configure an OpenRouter API key or run LM Studio " "with the nomic-ai/nomic-embed-text-v1.5 embedding model loaded. 
OAuth providers " @@ -17,33 +23,67 @@ ) -async def check_lm_studio_embedding_ready(timeout_seconds: float = 10.0) -> dict[str, Any]: +async def check_lm_studio_embedding_ready( + timeout_seconds: float = 10.0, + *, + force_refresh: bool = False, +) -> dict[str, Any]: """Return whether LM Studio can serve the configured embedding model.""" - try: - embeddings = await asyncio.wait_for( - lm_studio_client.get_embeddings( - ["MOTO embedding readiness check"], - rag_config.embedding_model, - ), - timeout=timeout_seconds, - ) - if embeddings and embeddings[0]: - return { - "ready": True, - "provider": "lm_studio", - "message": "LM Studio embeddings are available.", - } - except Exception as exc: - logger.info("LM Studio embedding readiness check failed: %s", exc) + global _lm_studio_embedding_status_cache, _lm_studio_embedding_status_cache_at - return { - "ready": False, - "provider": "lm_studio", - "message": EMBEDDING_PROVIDER_UNAVAILABLE_MESSAGE, - } + now = time.monotonic() + if ( + not force_refresh + and _lm_studio_embedding_status_cache is not None + and now - _lm_studio_embedding_status_cache_at < _LM_STUDIO_EMBEDDING_STATUS_CACHE_TTL_SECONDS + ): + return dict(_lm_studio_embedding_status_cache) + + async with _lm_studio_embedding_status_lock: + now = time.monotonic() + if ( + not force_refresh + and _lm_studio_embedding_status_cache is not None + and now - _lm_studio_embedding_status_cache_at < _LM_STUDIO_EMBEDDING_STATUS_CACHE_TTL_SECONDS + ): + return dict(_lm_studio_embedding_status_cache) + + try: + embeddings = await asyncio.wait_for( + lm_studio_client.get_embeddings( + ["MOTO embedding readiness check"], + rag_config.embedding_model, + quiet=True, + ), + timeout=timeout_seconds, + ) + if embeddings and embeddings[0]: + status = { + "ready": True, + "provider": "lm_studio", + "message": "LM Studio embeddings are available.", + } + _lm_studio_embedding_status_cache = status + _lm_studio_embedding_status_cache_at = time.monotonic() + return dict(status) + 
except Exception as exc: + logger.info("LM Studio embedding readiness check failed: %s", exc) + + status = { + "ready": False, + "provider": "lm_studio", + "message": EMBEDDING_PROVIDER_UNAVAILABLE_MESSAGE, + } + _lm_studio_embedding_status_cache = status + _lm_studio_embedding_status_cache_at = time.monotonic() + return dict(status) -async def check_embedding_provider_ready(timeout_seconds: float = 10.0) -> dict[str, Any]: +async def check_embedding_provider_ready( + timeout_seconds: float = 10.0, + *, + force_refresh: bool = False, +) -> dict[str, Any]: """Return whether at least one embedding provider is available for RAG.""" if system_config.generic_mode: return { @@ -59,7 +99,10 @@ async def check_embedding_provider_ready(timeout_seconds: float = 10.0) -> dict[ "message": "OpenRouter API key is configured for embedding fallback.", } - lm_status = await check_lm_studio_embedding_ready(timeout_seconds=timeout_seconds) + lm_status = await check_lm_studio_embedding_ready( + timeout_seconds=timeout_seconds, + force_refresh=force_refresh, + ) if lm_status.get("ready"): return lm_status @@ -72,6 +115,6 @@ async def check_embedding_provider_ready(timeout_seconds: float = 10.0) -> dict[ async def require_embedding_provider_ready() -> None: """Raise ValueError with user-facing guidance if RAG embeddings are unavailable.""" - status = await check_embedding_provider_ready() + status = await check_embedding_provider_ready(force_refresh=True) if not status.get("ready"): raise ValueError(str(status.get("message") or EMBEDDING_PROVIDER_UNAVAILABLE_MESSAGE)) diff --git a/backend/shared/json_parser.py b/backend/shared/json_parser.py index 9073373..4884a77 100644 --- a/backend/shared/json_parser.py +++ b/backend/shared/json_parser.py @@ -380,25 +380,15 @@ def sanitize_json_response(raw_content: str) -> str: # Some models emit these BEFORE the JSON, some WITHIN the content # Strategy: Remove ALL control token patterns using regex - # Pattern for control tokens: <|word|> or 
<|word|>word (with optional trailing word) - control_token_pattern = r'<\|[a-zA-Z_]+\|>(?:[a-zA-Z_]+\s*)?' - - if re.search(control_token_pattern, content): - original_content = content - content = re.sub(control_token_pattern, '', content).strip() + original_content = content + content = _strip_control_tokens_outside_json_strings(content).strip() + if content != original_content: logger.debug( "Stripped control tokens: before=(%s), after=(%s)", _content_diagnostics(original_content), _content_diagnostics(content), ) - # Additional cleanup: Remove any remaining angle bracket artifacts - # that might be partial control tokens - if '<|' in content: - # Remove any remaining <|...> patterns - content = re.sub(r'<\|[^>]*\|>', '', content).strip() - logger.debug("Removed remaining control token artifacts") - # STEP 4: Extract only the first complete JSON object if multiple exist # Some models (especially reasoning models) may output multiple JSON objects # We only want the first valid one diff --git a/backend/shared/lean4_client.py b/backend/shared/lean4_client.py index 3369e6a..b363cf4 100644 --- a/backend/shared/lean4_client.py +++ b/backend/shared/lean4_client.py @@ -217,9 +217,10 @@ def _format_placeholder_rejection(token_name: str, *, from_lean_diagnostic: bool f"{reason}\n" "Required fix: produce a Lean 4 proof that closes every goal without " "using `sorry`, `admit`, unresolved `axiom` stubs introduced solely to " - "trivialize the target theorem, or any other placeholder. If the result " - "cannot be proved yet, return a narrower lemma that you can fully " - "prove instead." + "trivialize the target theorem, or any other placeholder. If the target " + "cannot be proved yet, do not replace it with a narrower, easier, " + "routine, trivial, local, or merely supporting lemma; keep attacking " + "the same high-impact target so Lean feedback exposes the real blocker." 
) diff --git a/backend/shared/lm_studio_client.py b/backend/shared/lm_studio_client.py index b4ea954..672c1d6 100644 --- a/backend/shared/lm_studio_client.py +++ b/backend/shared/lm_studio_client.py @@ -167,7 +167,7 @@ async def list_models(self) -> List[Dict[str, Any]]: data = response.json() return data.get("data", []) except Exception as e: - logger.error("Failed to list models: %s", redact_log_text(e, 240)) + logger.debug("Failed to list LM Studio models: %s", redact_log_text(e, 240)) return [] async def get_loaded_models(self) -> List[str]: @@ -585,7 +585,7 @@ async def _execute_completion_request( raise RuntimeError("Completion generation failed after all retries") - async def get_embeddings(self, texts: List[str], model: str = None) -> List[List[float]]: + async def get_embeddings(self, texts: List[str], model: str = None, quiet: bool = False) -> List[List[float]]: """ Get embeddings using LM Studio API with rate limiting. Optimized with batching, retry logic, and performance metrics. 
@@ -614,8 +614,9 @@ async def get_embeddings(self, texts: List[str], model: str = None) -> List[List # Retry logic for transient failures batch_embeddings = await self._get_embeddings_with_retry( - batch_texts, - embedding_model + batch_texts, + embedding_model, + quiet=quiet, ) all_embeddings.extend(batch_embeddings) @@ -631,7 +632,8 @@ async def get_embeddings(self, texts: List[str], model: str = None) -> List[List except Exception as e: elapsed = time.time() - start_time - logger.error( + log_method = logger.debug if quiet else logger.error + log_method( f"Failed to get embeddings after {elapsed:.2f}s " f"({len(texts)} texts): {e}" ) @@ -640,7 +642,8 @@ async def get_embeddings(self, texts: List[str], model: str = None) -> List[List async def _get_embeddings_with_retry( self, texts: List[str], - model: str + model: str, + quiet: bool = False, ) -> List[List[float]]: """Get embeddings with retry logic for transient failures.""" for attempt in range(1, self.MAX_RETRIES + 1): @@ -666,18 +669,19 @@ async def _get_embeddings_with_retry( except (httpx.TimeoutException, httpx.NetworkError, httpx.HTTPStatusError) as e: if attempt < self.MAX_RETRIES: - logger.warning( + log_method = logger.debug if quiet else logger.warning + log_method( f"Embedding attempt {attempt}/{self.MAX_RETRIES} failed: {e}. " f"Retrying in {self.RETRY_DELAY}s..." 
) await asyncio.sleep(self.RETRY_DELAY) else: - logger.error( - f"Embedding failed after {self.MAX_RETRIES} attempts: {e}" - ) + log_method = logger.debug if quiet else logger.error + log_method(f"Embedding failed after {self.MAX_RETRIES} attempts: {e}") raise except Exception as e: - logger.error(f"Embedding failed with unexpected error: {e}") + log_method = logger.debug if quiet else logger.error + log_method(f"Embedding failed with unexpected error: {e}") raise raise RuntimeError("Embedding retry loop exhausted without returning or raising") @@ -686,9 +690,15 @@ async def test_connection(self) -> bool: try: # Hard cap the startup probe so a LM Studio process that bound the # port but never responds cannot stall the FastAPI lifespan. - models = await asyncio.wait_for(self.list_models(), timeout=5.0) - logger.info(f"Successfully connected to LM Studio. Found {len(models)} models.") - return True + result = await asyncio.wait_for(self.check_availability(), timeout=5.0) + if result.get("available"): + logger.info( + "Successfully connected to LM Studio. Found %s models.", + result.get("model_count", 0), + ) + return True + logger.info("LM Studio is unavailable: %s", result.get("error") or "unknown error") + return False except asyncio.TimeoutError: logger.warning("LM Studio startup probe timed out after 5s; treating as unavailable.") return False @@ -758,11 +768,11 @@ async def check_availability(self, include_cli_models: bool = False) -> Dict[str except httpx.ConnectError: result["error"] = "Cannot connect to LM Studio server. Please ensure LM Studio is running." - logger.warning(f"LM Studio availability check failed: {result['error']}") + logger.debug(f"LM Studio availability check failed: {result['error']}") return result except httpx.TimeoutException: result["error"] = "Connection to LM Studio timed out." 
- logger.warning(f"LM Studio availability check failed: {result['error']}") + logger.info(f"LM Studio availability check failed: {result['error']}") return result except Exception as e: result["error"] = f"Error checking LM Studio availability: {str(e)}" diff --git a/backend/shared/model_error_utils.py b/backend/shared/model_error_utils.py index c9d31e5..093ab34 100644 --- a/backend/shared/model_error_utils.py +++ b/backend/shared/model_error_utils.py @@ -29,6 +29,7 @@ ) _TRANSIENT_MODEL_CALL_MARKERS = ( + "an error occurred while processing your request", "bad gateway", "codex connection failed", "connecterror", @@ -41,12 +42,14 @@ "peer closed connection", "readerror", "remoteprotocolerror", + "server_error", "service unavailable", "temporarily unavailable", "upstream connect error", "upstream provider timeout", "xai grok connection failed", "xai grok transient", + "you can retry your request", ) _TRANSIENT_MODEL_PROVIDER_MARKERS = ( @@ -55,6 +58,20 @@ "xai", ) +_TRANSIENT_PROVIDER_ERROR_PREFIX = "TRANSIENT PROVIDER ERROR" + + +def format_transient_provider_error(exc: Exception) -> str: + """Return a checkpoint-preserving transient provider error message.""" + message = str(exc or "").strip() + if _TRANSIENT_PROVIDER_ERROR_PREFIX in message: + return message + return ( + "TRANSIENT PROVIDER ERROR: provider connection failed before usable proof output. " + "Preserve the proof checkpoint and retry later." 
+ + (f" Original error: {message}" if message else "") + ) + def is_retryable_model_output_error(exc: Exception) -> bool: """Return true when the provider returned a usable request with unusable output.""" diff --git a/backend/shared/models.py b/backend/shared/models.py index 387c825..d86e74d 100644 --- a/backend/shared/models.py +++ b/backend/shared/models.py @@ -5,13 +5,18 @@ from datetime import datetime from typing import List, Dict, Optional, Any, Literal -from pydantic import BaseModel, ConfigDict, Field +from pydantic import AliasChoices, BaseModel, ConfigDict, Field DEFAULT_CONTEXT_WINDOW = 0 DEFAULT_MAX_OUTPUT_TOKENS = 0 DEFAULT_OPENROUTER_REASONING_EFFORT = "auto" OpenRouterReasoningEffort = Literal["auto", "xhigh", "high", "medium", "low", "minimal", "none"] -ModelProvider = Literal["lm_studio", "openrouter", "openai_codex_oauth", "xai_grok_oauth"] +ModelProvider = Literal["lm_studio", "openrouter", "openai_codex_oauth", "xai_grok_oauth", "sakana_fugu"] +_LEGACY_WRITER_PREFIX = "high" + "_context" + + +def _legacy_writer_field(name: str) -> str: + return f"{_LEGACY_WRITER_PREFIX}_{name}" class DocumentChunk(BaseModel): @@ -103,6 +108,8 @@ class SystemStatus(BaseModel): cleanup_reviews_performed: int = 0 removals_proposed: int = 0 removals_executed: int = 0 + fatal_error_type: Optional[str] = None + fatal_error_message: Optional[str] = None class ModelConfig(BaseModel): @@ -139,7 +146,7 @@ class WorkflowTask(BaseModel): """Represents a predicted API call in the workflow.""" task_id: str # Unique ID like "agg_sub1_001" sequence_number: int # 1-20 - role: str # "Submitter 1", "Validator", "High-Context", etc. + role: str # "Submitter 1", "Validator", "Writing Submitter", etc. mode: Optional[str] = None # "Construction", "Rigor", "Review", etc. 
provider: str = "lm_studio" # "openrouter" | "lm_studio" using_boost: bool = False @@ -174,6 +181,15 @@ class AggregatorStartRequest(BaseModel): validator_context_size: int = DEFAULT_CONTEXT_WINDOW validator_max_output_tokens: int = DEFAULT_MAX_OUTPUT_TOKENS validator_supercharge_enabled: bool = False + # Parallel Assistant proof-retrieval role (defaults hydrate from Validator in routes/UI) + assistant_provider: ModelProvider = "lm_studio" + assistant_model: str = "" + assistant_openrouter_provider: Optional[str] = None + assistant_openrouter_reasoning_effort: OpenRouterReasoningEffort = DEFAULT_OPENROUTER_REASONING_EFFORT + assistant_lm_studio_fallback: Optional[str] = None + assistant_context_size: int = DEFAULT_CONTEXT_WINDOW + assistant_max_output_tokens: int = DEFAULT_MAX_OUTPUT_TOKENS + assistant_supercharge_enabled: bool = False uploaded_files: List[str] = Field(default_factory=list) @@ -300,16 +316,40 @@ class CompilerStartRequest(BaseModel): validator_context_size: int = DEFAULT_CONTEXT_WINDOW validator_max_output_tokens: int = DEFAULT_MAX_OUTPUT_TOKENS validator_supercharge_enabled: bool = False - # High-context submitter config - high_context_provider: ModelProvider = "lm_studio" - high_context_model: str - high_context_openrouter_provider: Optional[str] = None - high_context_openrouter_reasoning_effort: OpenRouterReasoningEffort = DEFAULT_OPENROUTER_REASONING_EFFORT - high_context_lm_studio_fallback: Optional[str] = None - high_context_context_size: int = DEFAULT_CONTEXT_WINDOW - high_context_max_output_tokens: int = DEFAULT_MAX_OUTPUT_TOKENS - high_context_supercharge_enabled: bool = False - # High-param submitter config + # Writing submitter config + writer_provider: ModelProvider = Field( + default="lm_studio", + validation_alias=AliasChoices("writer_provider", _legacy_writer_field("provider")), + ) + writer_model: str = Field(validation_alias=AliasChoices("writer_model", _legacy_writer_field("model"))) + writer_openrouter_provider: Optional[str] = 
Field( + default=None, + validation_alias=AliasChoices("writer_openrouter_provider", _legacy_writer_field("openrouter_provider")), + ) + writer_openrouter_reasoning_effort: OpenRouterReasoningEffort = Field( + default=DEFAULT_OPENROUTER_REASONING_EFFORT, + validation_alias=AliasChoices( + "writer_openrouter_reasoning_effort", + _legacy_writer_field("openrouter_reasoning_effort"), + ), + ) + writer_lm_studio_fallback: Optional[str] = Field( + default=None, + validation_alias=AliasChoices("writer_lm_studio_fallback", _legacy_writer_field("lm_studio_fallback")), + ) + writer_context_size: int = Field( + default=DEFAULT_CONTEXT_WINDOW, + validation_alias=AliasChoices("writer_context_size", _legacy_writer_field("context_size")), + ) + writer_max_output_tokens: int = Field( + default=DEFAULT_MAX_OUTPUT_TOKENS, + validation_alias=AliasChoices("writer_max_output_tokens", _legacy_writer_field("max_output_tokens")), + ) + writer_supercharge_enabled: bool = Field( + default=False, + validation_alias=AliasChoices("writer_supercharge_enabled", _legacy_writer_field("supercharge_enabled")), + ) + # Rigor & Proofs submitter config (legacy field prefix: high_param_*) high_param_provider: ModelProvider = "lm_studio" high_param_model: str high_param_openrouter_provider: Optional[str] = None @@ -318,15 +358,26 @@ class CompilerStartRequest(BaseModel): high_param_context_size: int = DEFAULT_CONTEXT_WINDOW high_param_max_output_tokens: int = DEFAULT_MAX_OUTPUT_TOKENS high_param_supercharge_enabled: bool = False - # Critique submitter config + # Deprecated compatibility aliases. Critique generation now uses the + # Rigor & Proofs submitter config; routes may mirror high_param_* here for + # older clients that still send/read critique_submitter_* fields. 
critique_submitter_provider: ModelProvider = "lm_studio" - critique_submitter_model: str + critique_submitter_model: str = "" critique_submitter_openrouter_provider: Optional[str] = None critique_submitter_openrouter_reasoning_effort: OpenRouterReasoningEffort = DEFAULT_OPENROUTER_REASONING_EFFORT critique_submitter_lm_studio_fallback: Optional[str] = None critique_submitter_context_window: int = DEFAULT_CONTEXT_WINDOW critique_submitter_max_tokens: int = DEFAULT_MAX_OUTPUT_TOKENS critique_submitter_supercharge_enabled: bool = False + # Parallel Assistant proof-retrieval role (defaults hydrate from Validator in routes/UI) + assistant_provider: ModelProvider = "lm_studio" + assistant_model: str = "" + assistant_openrouter_provider: Optional[str] = None + assistant_openrouter_reasoning_effort: OpenRouterReasoningEffort = DEFAULT_OPENROUTER_REASONING_EFFORT + assistant_lm_studio_fallback: Optional[str] = None + assistant_context_size: int = DEFAULT_CONTEXT_WINDOW + assistant_max_output_tokens: int = DEFAULT_MAX_OUTPUT_TOKENS + assistant_supercharge_enabled: bool = False # ============================================================================ @@ -474,16 +525,43 @@ class AutonomousResearchStartRequest(BaseModel): validator_context_window: int = DEFAULT_CONTEXT_WINDOW validator_max_tokens: int = DEFAULT_MAX_OUTPUT_TOKENS validator_supercharge_enabled: bool = False - # Compiler high-context settings (separate from aggregator submitters) - high_context_provider: ModelProvider = "lm_studio" - high_context_model: str = "" # Empty string allowed, will use submitter model as fallback - high_context_openrouter_provider: Optional[str] = None - high_context_openrouter_reasoning_effort: OpenRouterReasoningEffort = DEFAULT_OPENROUTER_REASONING_EFFORT - high_context_lm_studio_fallback: Optional[str] = None - high_context_context_window: int = DEFAULT_CONTEXT_WINDOW - high_context_max_tokens: int = DEFAULT_MAX_OUTPUT_TOKENS - high_context_supercharge_enabled: bool = False - # 
Compiler high-param settings + # Compiler writer settings (separate from aggregator submitters) + writer_provider: ModelProvider = Field( + default="lm_studio", + validation_alias=AliasChoices("writer_provider", _legacy_writer_field("provider")), + ) + writer_model: str = Field( + default="", + validation_alias=AliasChoices("writer_model", _legacy_writer_field("model")), + ) # Empty string allowed, will use submitter model as fallback + writer_openrouter_provider: Optional[str] = Field( + default=None, + validation_alias=AliasChoices("writer_openrouter_provider", _legacy_writer_field("openrouter_provider")), + ) + writer_openrouter_reasoning_effort: OpenRouterReasoningEffort = Field( + default=DEFAULT_OPENROUTER_REASONING_EFFORT, + validation_alias=AliasChoices( + "writer_openrouter_reasoning_effort", + _legacy_writer_field("openrouter_reasoning_effort"), + ), + ) + writer_lm_studio_fallback: Optional[str] = Field( + default=None, + validation_alias=AliasChoices("writer_lm_studio_fallback", _legacy_writer_field("lm_studio_fallback")), + ) + writer_context_window: int = Field( + default=DEFAULT_CONTEXT_WINDOW, + validation_alias=AliasChoices("writer_context_window", _legacy_writer_field("context_window")), + ) + writer_max_tokens: int = Field( + default=DEFAULT_MAX_OUTPUT_TOKENS, + validation_alias=AliasChoices("writer_max_tokens", _legacy_writer_field("max_tokens")), + ) + writer_supercharge_enabled: bool = Field( + default=False, + validation_alias=AliasChoices("writer_supercharge_enabled", _legacy_writer_field("supercharge_enabled")), + ) + # Compiler Rigor & Proofs settings (legacy field prefix: high_param_*) high_param_provider: ModelProvider = "lm_studio" high_param_model: str = "" # Empty string allowed, will use submitter model as fallback high_param_openrouter_provider: Optional[str] = None @@ -492,15 +570,26 @@ class AutonomousResearchStartRequest(BaseModel): high_param_context_window: int = DEFAULT_CONTEXT_WINDOW high_param_max_tokens: int = 
DEFAULT_MAX_OUTPUT_TOKENS high_param_supercharge_enabled: bool = False - # Critique submitter settings + # Deprecated compatibility aliases. Critique generation now uses the + # Rigor & Proofs submitter config; routes may mirror high_param_* here for + # older clients that still send/read critique_submitter_* fields. critique_submitter_provider: ModelProvider = "lm_studio" - critique_submitter_model: str = "" # For critique generation and rewrite decisions (uses high_context if empty) + critique_submitter_model: str = "" critique_submitter_openrouter_provider: Optional[str] = None critique_submitter_openrouter_reasoning_effort: OpenRouterReasoningEffort = DEFAULT_OPENROUTER_REASONING_EFFORT critique_submitter_lm_studio_fallback: Optional[str] = None critique_submitter_context_window: int = DEFAULT_CONTEXT_WINDOW critique_submitter_max_tokens: int = DEFAULT_MAX_OUTPUT_TOKENS critique_submitter_supercharge_enabled: bool = False + # Parallel Assistant proof-retrieval role (defaults hydrate from Validator in routes/UI) + assistant_provider: ModelProvider = "lm_studio" + assistant_model: str = "" + assistant_openrouter_provider: Optional[str] = None + assistant_openrouter_reasoning_effort: OpenRouterReasoningEffort = DEFAULT_OPENROUTER_REASONING_EFFORT + assistant_lm_studio_fallback: Optional[str] = None + assistant_context_window: int = DEFAULT_CONTEXT_WINDOW + assistant_max_tokens: int = DEFAULT_MAX_OUTPUT_TOKENS + assistant_supercharge_enabled: bool = False # Tier 3 Final Answer settings tier3_enabled: bool = False # Default OFF — system stops at Tier 2 paper library @@ -575,10 +664,16 @@ class ProofRoleConfigSnapshot(BaseModel): class ProofRuntimeConfigSnapshot(BaseModel): - """Persisted proof runtime config used for manual proof checks.""" + """Persisted proof runtime config used for manual proof checks. + + The source role slots are proof-submitter settings. 
After the Rigor & + Proofs consolidation, both brainstorm and paper slots should carry the + configured Rigor & Proofs submitter for proof-solving callers. + """ brainstorm: ProofRoleConfigSnapshot paper: ProofRoleConfigSnapshot validator: ProofRoleConfigSnapshot + assistant: ProofRoleConfigSnapshot = Field(default_factory=ProofRoleConfigSnapshot) class ProofDependency(BaseModel): @@ -707,6 +802,7 @@ class LeanOJStartRequest(BaseModel): brainstorm_validator: LeanOJRoleConfig path_decider: LeanOJRoleConfig = Field(default_factory=LeanOJRoleConfig) final_solver: LeanOJRoleConfig + assistant: LeanOJRoleConfig = Field(default_factory=LeanOJRoleConfig) max_initial_brainstorm_accepts: int = Field(default=30, ge=1, le=200) max_recursive_brainstorm_accepts: int = Field(default=10, ge=1, le=100) final_attempts_per_cycle: int = Field(default=30, ge=30, le=200) diff --git a/backend/shared/openai_codex_client.py b/backend/shared/openai_codex_client.py index 092bcd2..95736b4 100644 --- a/backend/shared/openai_codex_client.py +++ b/backend/shared/openai_codex_client.py @@ -42,6 +42,30 @@ class OpenAICodexRequestError(OpenAICodexError): """Raised when Codex rejects a completion request after authentication.""" +class OAuthUsageLimitError(OpenAICodexError): + """Raised when an OAuth-backed provider reports a timed usage limit.""" + + def __init__( + self, + *, + provider: str, + provider_label: str, + message: str, + plan_type: str = "", + resets_at: Optional[int] = None, + resets_in_seconds: Optional[int] = None, + ) -> None: + self.provider = provider + self.provider_label = provider_label + self.plan_type = plan_type + self.resets_at = resets_at + self.resets_in_seconds = resets_in_seconds + detail = message or "The usage limit has been reached." + if resets_in_seconds is not None: + detail = f"{detail} Resets in {resets_in_seconds} seconds." 
+ super().__init__(detail) + + class OpenAICodexClient: """Client for OpenAI Codex OAuth and the ChatGPT Codex Responses backend.""" @@ -51,11 +75,13 @@ class OpenAICodexClient: REVOKE_URL = "https://auth.openai.com/oauth/revoke" CODEX_BASE_URL = "https://chatgpt.com/backend-api/codex" DEFAULT_REDIRECT_URI = "http://localhost:1455/auth/callback" + DEFAULT_ORIGINATOR = "moto-autonomous-asi" DEFAULT_INSTRUCTIONS = "Follow the user's instructions and produce the requested response." REFRESH_SKEW_SECONDS = 60 REASONING_EFFORT_LEVELS = {"xhigh", "high", "medium", "low", "none"} - MAX_RETRIES = 3 + MAX_RETRIES = 4 RETRY_DELAY = 2.0 + RETRY_MAX_DELAY = 30.0 TRANSIENT_COMPLETION_STATUS_CODES = {408, 409, 425, 500, 502, 503, 504, 520, 521, 522, 523, 524} TRANSIENT_COMPLETION_MARKERS = ( "bad gateway", @@ -65,9 +91,24 @@ class OpenAICodexClient: "incomplete chunked read", "peer closed connection", "service unavailable", + "server_error", "temporarily unavailable", "upstream connect error", "upstream provider timeout", + "you can retry", + ) + CODEX_SPARK_MODEL_ID = "gpt-5.3-codex-spark" + CODEX_SPARK_HIGH_MODEL_ID = "gpt-5.3-codex-spark-high" + PUBLIC_MODEL_CATALOG = ( + { + "slug": CODEX_SPARK_HIGH_MODEL_ID, + "title": "GPT-5.3 Codex Spark (high)", + "canonical_model": CODEX_SPARK_MODEL_ID, + "reasoning_effort": "high", + "context_length": 128000, + "input_context_window": 128000, + "max_output_tokens": 32768, + }, ) KNOWN_MODEL_LIMITS = { # OpenAI documents GPT-5.5 in Codex as a 400K-window product. 
Public @@ -80,6 +121,13 @@ class OpenAICodexClient: "max_output_tokens": 128000, "effective_context_window_percent": 95, }, + CODEX_SPARK_HIGH_MODEL_ID: { + "context_length": 128000, + "input_context_window": 128000, + "max_output_tokens": 32768, + "canonical_model": CODEX_SPARK_MODEL_ID, + "reasoning_effort": "high", + }, } def __init__(self) -> None: @@ -119,7 +167,7 @@ def build_authorization_url( "code_challenge_method": "S256", "id_token_add_organizations": "true", "codex_cli_simplified_flow": "true", - "originator": "moto", + "originator": cls.DEFAULT_ORIGINATOR, "state": state, } return f"{cls.AUTH_URL}?{urlencode(params)}" @@ -389,6 +437,14 @@ def _normalize_model_metadata(cls, model: Dict[str, Any]) -> Optional[Dict[str, "effective_context_window_percent": effective_percent, }, } + canonical_model = model.get("canonical_model") or known.get("canonical_model") + if canonical_model: + normalized["canonical_model"] = str(canonical_model) + normalized["provider_metadata"]["canonical_model"] = str(canonical_model) + model_reasoning_effort = model.get("reasoning_effort") or known.get("reasoning_effort") + if model_reasoning_effort: + normalized["reasoning_effort"] = str(model_reasoning_effort) + normalized["provider_metadata"]["reasoning_effort"] = str(model_reasoning_effort) if context_length: normalized["context_length"] = context_length if max_output_tokens: @@ -445,6 +501,7 @@ async def list_models(self) -> List[Dict[str, Any]]: ) data = response.json() models = [] + seen_model_ids = set() for model in data.get("models", []): if not isinstance(model, dict): continue @@ -452,9 +509,22 @@ async def list_models(self) -> List[Dict[str, Any]]: continue normalized = self._normalize_model_metadata(model) if normalized: + seen_model_ids.add(normalized["id"]) + models.append(normalized) + for model in self.PUBLIC_MODEL_CATALOG: + normalized = self._normalize_model_metadata(model) + if normalized and normalized["id"] not in seen_model_ids: models.append(normalized) + 
seen_model_ids.add(normalized["id"]) return models + @classmethod + def _resolve_model_request(cls, model: str, reasoning_effort: Optional[str]) -> tuple[str, Optional[str]]: + """Map user-facing Codex aliases onto the backend model id/request knobs.""" + if model == cls.CODEX_SPARK_HIGH_MODEL_ID: + return cls.CODEX_SPARK_MODEL_ID, "high" + return model, reasoning_effort + @staticmethod def _split_instructions(messages: List[Dict[str, Any]]) -> tuple[str, List[Dict[str, Any]]]: instruction_parts: List[str] = [] @@ -533,6 +603,85 @@ def _is_transient_completion_response(cls, response: httpx.Response) -> bool: or cls._is_transient_completion_text(body) ) + @staticmethod + def _coerce_positive_int(value: Any) -> Optional[int]: + try: + parsed = int(value) + except (TypeError, ValueError): + return None + return parsed if parsed > 0 else None + + @classmethod + def _extract_usage_limit_payload(cls, value: Any) -> Optional[Dict[str, Any]]: + if not isinstance(value, dict): + return None + error_type = str(value.get("type") or value.get("code") or "").strip().lower() + if error_type == "usage_limit_reached": + return value + for key in ("error", "response"): + nested = value.get(key) + if isinstance(nested, dict): + found = cls._extract_usage_limit_payload(nested) + if found is not None: + return found + response = value.get("response") + if isinstance(response, dict): + nested_error = response.get("error") + if isinstance(nested_error, dict): + return cls._extract_usage_limit_payload(nested_error) + return None + + @classmethod + def _usage_limit_error_from_payload(cls, value: Any) -> Optional[OAuthUsageLimitError]: + payload = cls._extract_usage_limit_payload(value) + if payload is None: + return None + resets_at = cls._coerce_positive_int(payload.get("resets_at")) + resets_in_seconds = cls._coerce_positive_int(payload.get("resets_in_seconds")) + if resets_at is None and resets_in_seconds is not None: + resets_at = int(time.time()) + resets_in_seconds + elif 
resets_in_seconds is None and resets_at is not None: + resets_in_seconds = max(1, resets_at - int(time.time())) + message = str(payload.get("message") or "The usage limit has been reached.").strip() + return OAuthUsageLimitError( + provider="openai_codex_oauth", + provider_label="OpenAI Codex", + message=message, + plan_type=str(payload.get("plan_type") or "").strip(), + resets_at=resets_at, + resets_in_seconds=resets_in_seconds, + ) + + @classmethod + def _usage_limit_error_from_text(cls, text: str) -> Optional[OAuthUsageLimitError]: + raw = str(text or "") + start = raw.find("{") + end = raw.rfind("}") + if 0 <= start < end: + try: + parsed = json.loads(raw[start : end + 1]) + except json.JSONDecodeError: + parsed = None + usage_error = cls._usage_limit_error_from_payload(parsed) + if usage_error is not None: + return usage_error + lowered = raw.lower() + if "usage_limit_reached" in lowered or "usage limit has been reached" in lowered: + return OAuthUsageLimitError( + provider="openai_codex_oauth", + provider_label="OpenAI Codex", + message="The usage limit has been reached.", + ) + return None + + @classmethod + def _max_attempts(cls) -> int: + return cls.MAX_RETRIES + 1 + + @classmethod + def _retry_delay(cls, retry_index: int) -> float: + return min(cls.RETRY_MAX_DELAY, cls.RETRY_DELAY * (2 ** max(0, retry_index))) + @classmethod def _reasoning_config(cls, reasoning_effort: Optional[str]) -> Optional[Dict[str, str]]: if not reasoning_effort: @@ -577,6 +726,9 @@ def _decode_response_body(cls, raw_body: str) -> Dict[str, Any]: try: data = json.loads(body) if isinstance(data, dict): + usage_error = cls._usage_limit_error_from_payload(data) + if usage_error is not None: + raise usage_error return data except json.JSONDecodeError: logger.debug("OpenAI Codex response body is not plain JSON; parsing stream events") @@ -599,6 +751,9 @@ def _decode_response_body(cls, raw_body: str) -> Dict[str, Any]: response = event.get("response") response_error = 
response.get("error") if isinstance(response, dict) else None error = event.get("error") or response_error + usage_error = cls._usage_limit_error_from_payload(error or event) + if usage_error is not None: + raise usage_error raise OpenAICodexRequestError( f"OpenAI Codex completion failed: {sanitize_provider_error_text(json.dumps(error or event))}" ) @@ -649,42 +804,47 @@ def _extract_output(response: Dict[str, Any]) -> tuple[str, List[Dict[str, Any]] async def _post_with_retry(self, url: str, **kwargs) -> httpx.Response: """POST with retry on transport errors (peer close, read error, connect error).""" - for attempt in range(self.MAX_RETRIES): + max_attempts = self._max_attempts() + for attempt in range(max_attempts): try: response = await self.client.post(url, **kwargs) if response.status_code >= 400 and self._is_transient_completion_response(response): error_detail = sanitize_provider_error_text(response.text) + delay = self._retry_delay(attempt) logger.warning( "OpenAI Codex transient completion response (attempt %s/%s): " - "status=%s error=%s", + "status=%s error=%s%s", attempt + 1, - self.MAX_RETRIES, + max_attempts, response.status_code, error_detail, + f"; retrying in {delay:.1f}s" if attempt < max_attempts - 1 else "", ) - if attempt < self.MAX_RETRIES - 1: - await asyncio.sleep(self.RETRY_DELAY * (attempt + 1)) + if attempt < max_attempts - 1: + await asyncio.sleep(delay) continue - raise ValueError( - f"OpenAI Codex connection failed after {self.MAX_RETRIES} attempts: " + raise OpenAICodexRequestError( + f"OpenAI Codex connection failed after {self.MAX_RETRIES} retries: " f"HTTP {response.status_code}: {error_detail}" ) return response - except (httpx.ConnectError, httpx.RemoteProtocolError, httpx.ReadError) as e: + except httpx.TransportError as e: error_type = type(e).__name__ - error_detail = repr(e) if not str(e) else str(e) + error_detail = sanitize_provider_error_text(str(e) or repr(e)) + delay = self._retry_delay(attempt) logger.warning( - "OpenAI 
Codex connection error (attempt %s/%s): [%s] %s", + "OpenAI Codex connection error (attempt %s/%s): [%s] %s%s", attempt + 1, - self.MAX_RETRIES, + max_attempts, error_type, error_detail, + f"; retrying in {delay:.1f}s" if attempt < max_attempts - 1 else "", ) - if attempt < self.MAX_RETRIES - 1: - await asyncio.sleep(self.RETRY_DELAY * (attempt + 1)) + if attempt < max_attempts - 1: + await asyncio.sleep(delay) continue - raise ValueError( - f"OpenAI Codex connection failed after {self.MAX_RETRIES} attempts: " + raise OpenAICodexRequestError( + f"OpenAI Codex connection failed after {self.MAX_RETRIES} retries: " f"[{error_type}] {error_detail}" ) @@ -701,6 +861,8 @@ async def generate_completion( tool_choice: Optional[Any] = None, ) -> Dict[str, Any]: """Generate a completion and return a Chat Completions-compatible shape.""" + requested_model = model + model, reasoning_effort = self._resolve_model_request(model, reasoning_effort) tokens = await self.get_valid_tokens() instructions, input_items = self._split_instructions(messages) payload: Dict[str, Any] = { @@ -725,6 +887,7 @@ async def generate_completion( payload["tool_choice"] = "auto" if tool_choice == "auto" else tool_choice auth_retry_used = False + stream_retries_used = 0 while True: response = await self._post_with_retry( f"{self.CODEX_BASE_URL}/responses", @@ -732,6 +895,9 @@ async def generate_completion( headers=self._headers(tokens, accept_stream=True), ) if response.status_code >= 400: + usage_error = self._usage_limit_error_from_text(response.text) + if usage_error is not None: + raise usage_error message = f"OpenAI Codex completion failed: {sanitize_provider_error_text(response.text)}" if response.status_code in {401, 403}: if not auth_retry_used: @@ -751,7 +917,25 @@ async def generate_completion( if self._is_auth_failure_text(str(exc)): raise OpenAICodexAuthError(str(exc)) from exc if self._is_transient_completion_text(str(exc)): - raise ValueError(f"OpenAI Codex connection failed while reading 
streamed response: {exc}") from exc + if stream_retries_used < self.MAX_RETRIES: + delay = self._retry_delay(stream_retries_used) + stream_retries_used += 1 + logger.warning( + "OpenAI Codex transient streamed response (retry %s/%s): %s; " + "retrying in %.1fs", + stream_retries_used, + self.MAX_RETRIES, + sanitize_provider_error_text(str(exc)), + delay, + ) + await asyncio.sleep(delay) + continue + raise OpenAICodexRequestError( + f"OpenAI Codex connection failed after {self.MAX_RETRIES} retries " + f"while reading streamed response: {exc}" + ) from exc + raise + except OAuthUsageLimitError: raise content, tool_calls = self._extract_output(data) message: Dict[str, Any] = {"role": "assistant", "content": content} @@ -768,7 +952,7 @@ async def generate_completion( return { "id": data.get("id") or "", "object": "chat.completion", - "model": model, + "model": requested_model, "choices": [{"index": 0, "message": message, "finish_reason": data.get("status") or "stop"}], "usage": { "prompt_tokens": prompt_tokens, diff --git a/backend/shared/openrouter_client.py b/backend/shared/openrouter_client.py index 05dd1f7..ad5edd7 100644 --- a/backend/shared/openrouter_client.py +++ b/backend/shared/openrouter_client.py @@ -381,6 +381,7 @@ async def generate_completion( reasoning_effort: Optional[str] = None, tools: Optional[List[Dict[str, Any]]] = None, tool_choice: Optional[Any] = None, + allow_provider_auto_fallback: bool = False, ) -> Dict[str, Any]: """ Generate a completion using OpenRouter API with validation and retry. 
@@ -418,6 +419,7 @@ async def generate_completion( reasoning_effort, tools=tools, tool_choice=tool_choice, + allow_provider_auto_fallback=allow_provider_auto_fallback, ) def _is_reasoning_model_without_temperature(self, model: str) -> bool: @@ -481,6 +483,7 @@ async def _execute_completion_request( reasoning_effort: Optional[str] = None, tools: Optional[List[Dict[str, Any]]] = None, tool_choice: Optional[Any] = None, + allow_provider_auto_fallback: bool = False, ) -> Dict[str, Any]: """Execute the actual completion request.""" # Check if this model is currently rate-limited (for free models) @@ -533,6 +536,7 @@ async def _execute_completion_request( payload["tool_choice"] = tool_choice # Add provider routing if specified + provider_auto_fallback_from: Optional[str] = None if provider: payload["provider"] = { "order": [provider], @@ -574,7 +578,12 @@ async def _execute_completion_request( response.raise_for_status() try: - return response.json() + data = response.json() + if provider_auto_fallback_from and isinstance(data, dict): + data["_moto_openrouter_provider_auto_fallback"] = { + "from_provider": provider_auto_fallback_from, + } + return data except (json.JSONDecodeError, ValueError) as json_err: # OpenRouter returned 2xx but body is not valid JSON. 
# This is typically a gateway/CDN error page, truncated stream, @@ -618,6 +627,7 @@ async def _execute_completion_request( except httpx.HTTPStatusError as e: error_detail = sanitize_provider_error_text(e.response.text if hasattr(e.response, 'text') else str(e)) + error_detail_lower = error_detail.lower() # Check for rate limit (429 Too Many Requests) if e.response.status_code == 429: @@ -658,7 +668,7 @@ async def _execute_completion_request( # Check for privacy policy error (404 with specific message) # This occurs when user's OpenRouter privacy settings block free models - if e.response.status_code == 404 and "data policy" in error_detail.lower(): + if e.response.status_code == 404 and "data policy" in error_detail_lower: logger.error( f"OpenRouter privacy policy error detected (404): {error_detail}" ) @@ -668,9 +678,31 @@ async def _execute_completion_request( f"the option to allow your data to be used for model training. " f"Free models on OpenRouter require this setting to be enabled." 
) + + if e.response.status_code == 404 and "no endpoints found" in error_detail_lower: + if provider and allow_provider_auto_fallback and provider_auto_fallback_from is None: + provider_auto_fallback_from = provider + provider = None + if self.AUTO_IGNORED_PROVIDERS: + payload["provider"] = { + "ignore": list(self.AUTO_IGNORED_PROVIDERS), + } + else: + payload.pop("provider", None) + logger.warning( + "OpenRouter provider %s has no endpoint for model %s; retrying once with auto-routing.", + redact_log_text(provider_auto_fallback_from, 120), + redact_log_text(model, 160), + ) + continue + raise OpenRouterNoEndpointsError( + model=model, + provider=provider, + detail=error_detail, + ) # Check for credit/key-limit-related errors in message - if any(keyword in error_detail.lower() for keyword in ["credit", "insufficient", "balance", "quota", "key limit", "limit exceeded"]): + if any(keyword in error_detail_lower for keyword in ["credit", "insufficient", "balance", "quota", "key limit", "limit exceeded"]): logger.error(f"OpenRouter credit/key exhaustion detected in error message: {error_detail}") raise CreditExhaustionError( f"OpenRouter credits exhausted for model '{model}'. " @@ -914,6 +946,20 @@ class OpenRouterPrivacyPolicyError(Exception): pass +class OpenRouterNoEndpointsError(ValueError): + """Raised when OpenRouter has no callable endpoint for a model/provider pair.""" + + def __init__(self, *, model: str, provider: Optional[str], detail: str): + provider_text = f" via provider '{provider}'" if provider else "" + super().__init__( + f"OpenRouter has no available endpoint for model '{model}'{provider_text}. " + "Set Host Provider to Auto or choose a currently available host provider." + ) + self.model = model + self.provider = provider + self.detail = detail + + class RateLimitError(Exception): """ Raised when OpenRouter free model hits rate limit (429 or rate limit message). 
diff --git a/backend/shared/proof_search/__init__.py b/backend/shared/proof_search/__init__.py new file mode 100644 index 0000000..5a87b61 --- /dev/null +++ b/backend/shared/proof_search/__init__.py @@ -0,0 +1,2 @@ +"""Unified proof-search helpers for MOTO and SyntheticLib4 corpora.""" + diff --git a/backend/shared/proof_search/assistant_cache.py b/backend/shared/proof_search/assistant_cache.py new file mode 100644 index 0000000..486d56d --- /dev/null +++ b/backend/shared/proof_search/assistant_cache.py @@ -0,0 +1,642 @@ +"""Persistent cache for Assistant proof-support ranking state.""" +from __future__ import annotations + +import hashlib +import json +import re +import sqlite3 +from contextlib import closing +from dataclasses import dataclass +from datetime import datetime, timezone +from pathlib import Path +from typing import Any, Iterable + +from backend.shared.config import system_config +from backend.shared.proof_search.assistant_models import ( + AssistantProofPack, + AssistantTargetSnapshot, +) + +DEFAULT_MAX_ASSISTANT_CACHE_TARGETS = 128 + + +@dataclass(frozen=True) +class AssistantCandidateStats: + """Persisted ranking state for one proof candidate under one target.""" + + visits: int = 0 + failure_penalty: float = 0.0 + + +@dataclass(frozen=True) +class AssistantCooldownState: + """Durable Assistant proof-memory backoff state for one run scope.""" + + run_key: str + zero_attempts_in_batch: int = 0 + zero_cooldown_stage: int = 0 + zero_cooldown_skips_remaining: int = 0 + zero_steady_81_batches: int = 0 + zero_shutdown_active: bool = False + stagnant_same_count: int = 0 + stagnant_attempts_in_batch: int = 0 + stagnant_cooldown_stage: int = 0 + stagnant_cooldown_skips_remaining: int = 0 + last_signature: str = "" + last_reason: str = "" + updated_at: str = "" + + @classmethod + def empty(cls, run_key: str) -> "AssistantCooldownState": + return cls(run_key=run_key, updated_at=_now_iso()) + + def to_payload(self) -> dict[str, Any]: + return { + "run_key": 
self.run_key, + "zero_attempts_in_batch": self.zero_attempts_in_batch, + "zero_cooldown_stage": self.zero_cooldown_stage, + "zero_cooldown_skips_remaining": self.zero_cooldown_skips_remaining, + "zero_steady_81_batches": self.zero_steady_81_batches, + "zero_shutdown_active": self.zero_shutdown_active, + "stagnant_same_count": self.stagnant_same_count, + "stagnant_attempts_in_batch": self.stagnant_attempts_in_batch, + "stagnant_cooldown_stage": self.stagnant_cooldown_stage, + "stagnant_cooldown_skips_remaining": self.stagnant_cooldown_skips_remaining, + "last_signature": self.last_signature, + "last_reason": self.last_reason, + "updated_at": self.updated_at, + } + + +def default_assistant_cache_path() -> Path: + return Path(system_config.data_dir) / "proof_search" / "assistant_ranker.sqlite" + + +class AssistantRankCache: + """Small SQLite cache over canonical proof records. + + This is intentionally rebuildable. Canonical proof storage remains the source + of truth; this cache only keeps target-scoped ranking, pack, and goal reuse + metadata for the non-blocking Assistant role. + """ + + def __init__(self, db_path: Path | None = None) -> None: + self.db_path = Path(db_path) if db_path else default_assistant_cache_path() + + def load_cached_pack( + self, + *, + target_hash: str, + goal_hash: str = "", + ) -> AssistantProofPack | None: + if not self.db_path.exists(): + return None + with closing(self._connect()) as conn: + self._create_schema(conn) + row = conn.execute( + """ + SELECT pack_json + FROM assistant_proof_packs + WHERE target_hash = ? + ORDER BY created_at DESC + LIMIT 1 + """, + (target_hash,), + ).fetchone() + if row is None and goal_hash: + row = conn.execute( + """ + SELECT packs.pack_json + FROM assistant_goal_cache AS goals + JOIN assistant_proof_packs AS packs + ON packs.target_hash = goals.pack_target_hash + WHERE goals.goal_hash = ? 
+ ORDER BY packs.created_at DESC + LIMIT 1 + """, + (goal_hash,), + ).fetchone() + if row is None: + return None + try: + pack = AssistantProofPack.model_validate_json(row["pack_json"]) + except Exception: + return None + freshness = "cached" if pack.target_hash == target_hash else "stale-but-best-known" + return pack.model_copy(update={"target_hash": target_hash, "freshness": freshness}) + + def load_candidate_stats(self, target_hash: str) -> dict[str, AssistantCandidateStats]: + if not self.db_path.exists(): + return {} + with closing(self._connect()) as conn: + self._create_schema(conn) + rows = conn.execute( + """ + SELECT search_id, visits, failure_penalty + FROM assistant_proof_candidates + WHERE target_hash = ? + """, + (target_hash,), + ).fetchall() + return { + row["search_id"]: AssistantCandidateStats( + visits=int(row["visits"] or 0), + failure_penalty=float(row["failure_penalty"] or 0.0), + ) + for row in rows + } + + def upsert_candidates( + self, + *, + target_hash: str, + candidates: Iterable[dict[str, Any]], + ) -> None: + now = _now_iso() + self.db_path.parent.mkdir(parents=True, exist_ok=True) + with closing(self._connect()) as conn: + self._create_schema(conn) + conn.executemany( + """ + INSERT INTO assistant_proof_candidates ( + target_hash, + search_id, + proof_source, + proof_id, + theorem_statement_hash, + lean_code_hash, + query_variant, + retrieval_score, + exact_match_score, + semantic_score, + dependency_overlap_score, + corpus_trust_score, + recency_score, + duplicate_group, + created_at, + updated_at + ) + VALUES ( + :target_hash, + :search_id, + :proof_source, + :proof_id, + :theorem_statement_hash, + :lean_code_hash, + :query_variant, + :retrieval_score, + :exact_match_score, + :semantic_score, + :dependency_overlap_score, + :corpus_trust_score, + :recency_score, + :duplicate_group, + :created_at, + :updated_at + ) + ON CONFLICT(target_hash, search_id) DO UPDATE SET + proof_source = excluded.proof_source, + proof_id = 
excluded.proof_id, + theorem_statement_hash = excluded.theorem_statement_hash, + lean_code_hash = excluded.lean_code_hash, + query_variant = excluded.query_variant, + retrieval_score = excluded.retrieval_score, + exact_match_score = excluded.exact_match_score, + semantic_score = excluded.semantic_score, + dependency_overlap_score = excluded.dependency_overlap_score, + corpus_trust_score = excluded.corpus_trust_score, + recency_score = excluded.recency_score, + duplicate_group = excluded.duplicate_group, + updated_at = excluded.updated_at + """, + ( + { + **candidate, + "target_hash": target_hash, + "created_at": now, + "updated_at": now, + } + for candidate in candidates + ), + ) + conn.commit() + + def record_pack( + self, + *, + snapshot: AssistantTargetSnapshot, + pack: AssistantProofPack, + selected_search_ids: list[str], + ) -> None: + now = _now_iso() + pack_id = hashlib.sha256( + f"{pack.target_hash}\n{pack.created_at}\n{','.join(selected_search_ids)}".encode( + "utf-8" + ) + ).hexdigest() + metadata_pack_json = json.dumps(pack.metadata_only_dump(), ensure_ascii=True) + self.db_path.parent.mkdir(parents=True, exist_ok=True) + with closing(self._connect()) as conn: + self._create_schema(conn) + conn.execute( + """ + INSERT OR REPLACE INTO assistant_proof_packs ( + pack_id, + target_hash, + created_at, + workflow_mode, + target_kind, + query_summary, + selected_candidate_ids_json, + warnings_json, + pack_json + ) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?) + """, + ( + pack_id, + pack.target_hash, + pack.created_at, + pack.workflow_mode, + pack.target_kind, + pack.query_summary, + json.dumps(selected_search_ids), + json.dumps(pack.warnings), + metadata_pack_json, + ), + ) + conn.executemany( + """ + UPDATE assistant_proof_candidates + SET visits = visits + 1, + last_selected_at = ?, + updated_at = ? + WHERE target_hash = ? AND search_id = ? 
+ """, + ((now, now, pack.target_hash, search_id) for search_id in selected_search_ids), + ) + goal_hash = goal_hash_for_snapshot(snapshot) + if goal_hash: + conn.execute( + """ + INSERT OR REPLACE INTO assistant_goal_cache ( + goal_hash, + normalized_goal_text, + imports_hash, + source_context_hash, + result_kind, + proof_id, + tactic_sequence_hash, + feedback_summary, + pack_target_hash, + created_at + ) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) + """, + ( + goal_hash, + _normalize_goal_text(snapshot), + _hash_json(snapshot.imports), + _hash_text("\n".join([snapshot.source_type, snapshot.source_id])), + "retrieved_support", + selected_search_ids[0] if selected_search_ids else "", + "", + _trim_feedback(snapshot), + pack.target_hash, + now, + ), + ) + self._prune_cache(conn, max_targets=DEFAULT_MAX_ASSISTANT_CACHE_TARGETS) + conn.commit() + + def load_cooldown_state(self, run_key: str) -> AssistantCooldownState: + if not self.db_path.exists(): + return AssistantCooldownState.empty(run_key) + with closing(self._connect()) as conn: + self._create_schema(conn) + row = conn.execute( + """ + SELECT * + FROM assistant_cooldown_state + WHERE run_key = ? 
+ LIMIT 1 + """, + (run_key,), + ).fetchone() + if row is None: + return AssistantCooldownState.empty(run_key) + return _cooldown_state_from_row(row) + + def save_cooldown_state(self, state: AssistantCooldownState) -> None: + now_state = state if state.updated_at else _replace_cooldown_state(state, updated_at=_now_iso()) + self.db_path.parent.mkdir(parents=True, exist_ok=True) + with closing(self._connect()) as conn: + self._create_schema(conn) + conn.execute( + """ + INSERT INTO assistant_cooldown_state ( + run_key, + zero_attempts_in_batch, + zero_cooldown_stage, + zero_cooldown_skips_remaining, + zero_steady_81_batches, + zero_shutdown_active, + stagnant_same_count, + stagnant_attempts_in_batch, + stagnant_cooldown_stage, + stagnant_cooldown_skips_remaining, + last_signature, + last_reason, + updated_at + ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) + ON CONFLICT(run_key) DO UPDATE SET + zero_attempts_in_batch = excluded.zero_attempts_in_batch, + zero_cooldown_stage = excluded.zero_cooldown_stage, + zero_cooldown_skips_remaining = excluded.zero_cooldown_skips_remaining, + zero_steady_81_batches = excluded.zero_steady_81_batches, + zero_shutdown_active = excluded.zero_shutdown_active, + stagnant_same_count = excluded.stagnant_same_count, + stagnant_attempts_in_batch = excluded.stagnant_attempts_in_batch, + stagnant_cooldown_stage = excluded.stagnant_cooldown_stage, + stagnant_cooldown_skips_remaining = excluded.stagnant_cooldown_skips_remaining, + last_signature = excluded.last_signature, + last_reason = excluded.last_reason, + updated_at = excluded.updated_at + """, + ( + now_state.run_key, + now_state.zero_attempts_in_batch, + now_state.zero_cooldown_stage, + now_state.zero_cooldown_skips_remaining, + now_state.zero_steady_81_batches, + int(now_state.zero_shutdown_active), + now_state.stagnant_same_count, + now_state.stagnant_attempts_in_batch, + now_state.stagnant_cooldown_stage, + now_state.stagnant_cooldown_skips_remaining, + now_state.last_signature, + 
now_state.last_reason, + now_state.updated_at, + ), + ) + conn.commit() + + def clear_cooldown_state(self, run_key: str | None = None) -> None: + if not self.db_path.exists(): + return + with closing(self._connect()) as conn: + self._create_schema(conn) + if run_key: + conn.execute("DELETE FROM assistant_cooldown_state WHERE run_key = ?", (run_key,)) + else: + conn.execute("DELETE FROM assistant_cooldown_state") + conn.commit() + + def _connect(self) -> sqlite3.Connection: + conn = sqlite3.connect(str(self.db_path)) + conn.row_factory = sqlite3.Row + return conn + + def _create_schema(self, conn: sqlite3.Connection) -> None: + conn.execute( + """ + CREATE TABLE IF NOT EXISTS assistant_proof_candidates ( + target_hash TEXT NOT NULL, + search_id TEXT NOT NULL, + proof_source TEXT NOT NULL, + proof_id TEXT NOT NULL, + theorem_statement_hash TEXT NOT NULL, + lean_code_hash TEXT NOT NULL, + query_variant TEXT NOT NULL DEFAULT '', + retrieval_score REAL NOT NULL DEFAULT 0, + exact_match_score REAL NOT NULL DEFAULT 0, + semantic_score REAL NOT NULL DEFAULT 0, + dependency_overlap_score REAL NOT NULL DEFAULT 0, + corpus_trust_score REAL NOT NULL DEFAULT 0, + recency_score REAL NOT NULL DEFAULT 0, + visits INTEGER NOT NULL DEFAULT 0, + last_selected_at TEXT, + failure_penalty REAL NOT NULL DEFAULT 0, + duplicate_group TEXT NOT NULL DEFAULT '', + created_at TEXT NOT NULL, + updated_at TEXT NOT NULL, + PRIMARY KEY (target_hash, search_id) + ) + """ + ) + conn.execute( + """ + CREATE TABLE IF NOT EXISTS assistant_proof_packs ( + pack_id TEXT PRIMARY KEY, + target_hash TEXT NOT NULL, + created_at TEXT NOT NULL, + workflow_mode TEXT NOT NULL, + target_kind TEXT NOT NULL, + query_summary TEXT NOT NULL, + selected_candidate_ids_json TEXT NOT NULL, + warnings_json TEXT NOT NULL, + pack_json TEXT NOT NULL + ) + """ + ) + conn.execute( + """ + CREATE TABLE IF NOT EXISTS assistant_goal_cache ( + goal_hash TEXT PRIMARY KEY, + normalized_goal_text TEXT NOT NULL, + imports_hash TEXT NOT 
NULL, + source_context_hash TEXT NOT NULL, + result_kind TEXT NOT NULL, + proof_id TEXT NOT NULL, + tactic_sequence_hash TEXT NOT NULL, + feedback_summary TEXT NOT NULL, + pack_target_hash TEXT NOT NULL, + created_at TEXT NOT NULL + ) + """ + ) + conn.execute( + "CREATE INDEX IF NOT EXISTS idx_assistant_candidates_target ON assistant_proof_candidates(target_hash)" + ) + conn.execute( + "CREATE INDEX IF NOT EXISTS idx_assistant_packs_target ON assistant_proof_packs(target_hash)" + ) + conn.execute( + "CREATE INDEX IF NOT EXISTS idx_assistant_goal_cache_pack ON assistant_goal_cache(pack_target_hash)" + ) + conn.execute( + """ + CREATE TABLE IF NOT EXISTS assistant_cooldown_state ( + run_key TEXT PRIMARY KEY, + zero_attempts_in_batch INTEGER NOT NULL DEFAULT 0, + zero_cooldown_stage INTEGER NOT NULL DEFAULT 0, + zero_cooldown_skips_remaining INTEGER NOT NULL DEFAULT 0, + zero_steady_81_batches INTEGER NOT NULL DEFAULT 0, + zero_shutdown_active INTEGER NOT NULL DEFAULT 0, + stagnant_same_count INTEGER NOT NULL DEFAULT 0, + stagnant_attempts_in_batch INTEGER NOT NULL DEFAULT 0, + stagnant_cooldown_stage INTEGER NOT NULL DEFAULT 0, + stagnant_cooldown_skips_remaining INTEGER NOT NULL DEFAULT 0, + last_signature TEXT NOT NULL DEFAULT '', + last_reason TEXT NOT NULL DEFAULT '', + updated_at TEXT NOT NULL + ) + """ + ) + + def _prune_cache(self, conn: sqlite3.Connection, *, max_targets: int) -> None: + rows = conn.execute( + """ + SELECT target_hash, MAX(created_at) AS newest_pack + FROM assistant_proof_packs + GROUP BY target_hash + ORDER BY newest_pack DESC + """ + ).fetchall() + stale_targets = [row["target_hash"] for row in rows[max(1, max_targets):]] + if not stale_targets: + return + placeholders = ",".join("?" 
for _ in stale_targets) + conn.execute( + f"DELETE FROM assistant_proof_packs WHERE target_hash IN ({placeholders})", + stale_targets, + ) + conn.execute( + f"DELETE FROM assistant_proof_candidates WHERE target_hash IN ({placeholders})", + stale_targets, + ) + conn.execute( + f"DELETE FROM assistant_goal_cache WHERE pack_target_hash IN ({placeholders})", + stale_targets, + ) + + +def goal_hash_for_snapshot(snapshot: AssistantTargetSnapshot) -> str: + goal_text = _normalize_goal_text(snapshot) + if not goal_text: + return "" + return _hash_text( + "\n\n".join( + [ + snapshot.workflow_mode, + snapshot.target_kind, + goal_text, + _hash_json(snapshot.imports), + _source_scope_hash(snapshot), + ] + ) + ) + + +def _normalize_goal_text(snapshot: AssistantTargetSnapshot) -> str: + if _is_broad_workflow_target(snapshot): + return " ".join( + part + for part in [ + _normalize_broad_fragment(snapshot.user_prompt), + _normalize_broad_fragment(snapshot.current_prompt_or_topic), + _normalize_broad_fragment(snapshot.writing_goal), + _normalize_broad_fragment(snapshot.outline_summary), + _normalize_broad_fragment(snapshot.paper_or_proof_draft_summary), + _normalize_broad_fragment(snapshot.accepted_memory_summary), + ] + if part + ) + return " ".join( + part + for part in [ + _normalize_lean_fragment(snapshot.target_statement), + _normalize_lean_fragment(snapshot.lean_template), + _normalize_lean_fragment(snapshot.lean_error), + ] + if part + ) + + +def _is_broad_workflow_target(snapshot: AssistantTargetSnapshot) -> bool: + return snapshot.target_kind in { + "brainstorm_context", + "writing_context", + "outline_context", + "reference_selection_context", + "topic_context", + "title_context", + "completion_review_context", + "path_context", + "semantic_review_context", + "final_answer_context", + } + + +def _normalize_broad_fragment(value: str) -> str: + text = " ".join((value or "").split()) + if not text: + return "" + return text[:2000] + + +def _source_scope_hash(snapshot: 
AssistantTargetSnapshot) -> str: + if _is_broad_workflow_target(snapshot): + return _hash_text(snapshot.source_type or snapshot.active_mode or snapshot.workflow_mode) + return _hash_text("\n".join([snapshot.source_type, snapshot.source_id])) + + +def _normalize_lean_fragment(value: str) -> str: + text = " ".join((value or "").split()) + if not text: + return "" + # Lean diagnostics often include volatile temp-file positions; those should + # not prevent reuse for the same mathematical goal/error shape. + text = re.sub(r"\bline\s+\d+\s*,\s*column\s+\d+\b", "line , column ", text, flags=re.I) + text = re.sub(r":\d+:\d+:", ":::", text) + return text + + +def _trim_feedback(snapshot: AssistantTargetSnapshot) -> str: + feedback = " ".join( + part + for part in [ + snapshot.rejection_feedback, + snapshot.proof_attempt_feedback, + snapshot.accepted_solver_summary, + ] + if part + ) + feedback = " ".join(feedback.split()) + return feedback[:600] + + +def _hash_json(values: list[str]) -> str: + return _hash_text(json.dumps(sorted(values or []), ensure_ascii=True)) + + +def _hash_text(value: str) -> str: + return hashlib.sha256((value or "").encode("utf-8")).hexdigest() + + +def _now_iso() -> str: + return datetime.now(timezone.utc).isoformat() + + +def _cooldown_state_from_row(row: sqlite3.Row) -> AssistantCooldownState: + return AssistantCooldownState( + run_key=str(row["run_key"] or ""), + zero_attempts_in_batch=int(row["zero_attempts_in_batch"] or 0), + zero_cooldown_stage=int(row["zero_cooldown_stage"] or 0), + zero_cooldown_skips_remaining=int(row["zero_cooldown_skips_remaining"] or 0), + zero_steady_81_batches=int(row["zero_steady_81_batches"] or 0), + zero_shutdown_active=bool(row["zero_shutdown_active"]), + stagnant_same_count=int(row["stagnant_same_count"] or 0), + stagnant_attempts_in_batch=int(row["stagnant_attempts_in_batch"] or 0), + stagnant_cooldown_stage=int(row["stagnant_cooldown_stage"] or 0), + 
stagnant_cooldown_skips_remaining=int(row["stagnant_cooldown_skips_remaining"] or 0), + last_signature=str(row["last_signature"] or ""), + last_reason=str(row["last_reason"] or ""), + updated_at=str(row["updated_at"] or ""), + ) + + +def _replace_cooldown_state(state: AssistantCooldownState, **updates: Any) -> AssistantCooldownState: + payload = state.to_payload() + payload.update(updates) + return AssistantCooldownState(**payload) diff --git a/backend/shared/proof_search/assistant_coordinator.py b/backend/shared/proof_search/assistant_coordinator.py new file mode 100644 index 0000000..60b0d22 --- /dev/null +++ b/backend/shared/proof_search/assistant_coordinator.py @@ -0,0 +1,1100 @@ +"""Non-blocking Assistant proof-support coordinator.""" +from __future__ import annotations + +import asyncio +import hashlib +import json +import logging +from datetime import datetime, timezone +from pathlib import Path +from typing import Any, Awaitable, Callable + +from backend.shared.config import system_config +from backend.shared.json_parser import parse_json +from backend.shared.proof_search.assistant_cache import AssistantCooldownState, AssistantRankCache, goal_hash_for_snapshot +from backend.shared.proof_search.assistant_models import AssistantProofPack, AssistantProofSupport, AssistantTargetSnapshot +from backend.shared.proof_search.assistant_ranker import ranked_candidates_to_cache_rows, score_assistant_proof_candidates, select_assistant_proof_supports +from backend.shared.proof_search.models import ProofSearchRequest, UnifiedProofSearchRecord, default_proof_search_corpora +from backend.shared.proof_search.search_service import ProofSearchService, proof_search_service +from backend.shared.response_extraction import extract_response_text + +logger = logging.getLogger(__name__) + +_ASSISTANT_CANDIDATE_POOL_TARGET = 64 +_ASSISTANT_SHORTLIST_TARGET = 21 +_ASSISTANT_FINAL_PACK_LIMIT = 7 +_ASSISTANT_SELECTION_MAX_OUTPUT_TOKENS = 4096 +_ASSISTANT_SELECTION_CODE_PREVIEW_CHARS = 
1200 +_CURRENT_RUN_CORPUS_SCOPES = {"active", "current"} +_ZERO_BATCH_SIZE = 4 +_STAGNANT_REPEAT_THRESHOLD = 3 +_ASSISTANT_PACK_REFRESH_RECEIVER_READS = 2 +_COOLDOWN_DELAYS = [3, 9, 81] +_ZERO_STEADY_81_BATCHES_BEFORE_SHUTDOWN = 3 +_NO_EXTERNAL_HISTORY_REASON = ( + "Assistant memory skipped because no external proof-memory records were available; " + "the Assistant only performs proof-memory retrieval for now." +) + +AssistantSelector = Callable[ + [AssistantTargetSnapshot, list[AssistantProofSupport], str, str, str], + Awaitable[tuple[list[str], str]], +] + + +class _AssistantSelectionOutputError(ValueError): + """Raised when the Assistant selection call returns malformed output.""" + + +class AssistantProofSearchCoordinator: + """Maintains latest Assistant proof-support packs without blocking solvers.""" + + def __init__( + self, + service: ProofSearchService | None = None, + cache: AssistantRankCache | None = None, + assistant_selector: AssistantSelector | None = None, + ) -> None: + self._service = service or proof_search_service + self._cache = cache or AssistantRankCache() + self._assistant_selector = assistant_selector + self._packs: dict[str, AssistantProofPack] = {} + self._goal_target_hashes: dict[str, str] = {} + self._tasks: dict[str, asyncio.Task] = {} + self._lock = asyncio.Lock() + self._task_sequence = 0 + self._latest_pack_target_hash = "" + self._latest_pack_consumption_count = _ASSISTANT_PACK_REFRESH_RECEIVER_READS + + @property + def enabled(self) -> bool: + return bool(system_config.agent_conversation_memory_enabled) + + def get_latest_pack(self, target_hash: str | None = None) -> AssistantProofPack | None: + if target_hash: + return _drop_current_run_supports_from_pack(self._packs.get(target_hash)) + if not self._packs: + return None + return _drop_current_run_supports_from_pack(next(reversed(self._packs.values()))) + + def get_status(self) -> dict[str, Any]: + latest_pack = self.get_latest_pack() + enabled_corpora = 
default_proof_search_corpora() if self.enabled else [] + disabled_reason = "" + if not self.enabled: + disabled_reason = "Session History Memory is disabled." + elif not enabled_corpora: + disabled_reason = "No proof-search corpora are enabled." + return { + "enabled": self.enabled, + "running_tasks": sum(1 for task in self._tasks.values() if not task.done()), + "cached_pack_count": len(self._packs), + "latest_target_hash": latest_pack.target_hash if latest_pack else "", + "latest_workflow_mode": latest_pack.workflow_mode if latest_pack else "", + "latest_target_kind": latest_pack.target_kind if latest_pack else "", + "latest_result_count": len(latest_pack.results) if latest_pack else 0, + "latest_freshness": latest_pack.freshness if latest_pack else "", + "latest_warnings": latest_pack.warnings[:3] if latest_pack else [], + "enabled_corpora": enabled_corpora, + "disabled_reason": disabled_reason, + } + + def submit_target(self, snapshot: AssistantTargetSnapshot) -> str: + target_hash = snapshot.stable_hash() + snapshot = snapshot.model_copy(update={"target_hash": target_hash}) + if not self.enabled: + self._packs.pop(target_hash, None) + logger.info( + "Assistant memory search skipped for %s/%s: Session History Memory is disabled", + snapshot.workflow_mode, + snapshot.target_kind, + ) + return target_hash + cached_pack = self._load_cached_pack(snapshot) + if cached_pack is not None: + self._packs[target_hash] = cached_pack + logger.info( + "Assistant memory loaded cached pack for %s/%s (target=%s, results=%s, freshness=%s)", + snapshot.workflow_mode, + snapshot.target_kind, + target_hash[:12], + len(cached_pack.results), + cached_pack.freshness, + ) + if _cached_pack_is_reusable(cached_pack): + if target_hash != self._latest_pack_target_hash: + self._latest_pack_consumption_count = 0 + self._latest_pack_target_hash = target_hash + logger.info( + "Assistant memory refresh skipped for %s/%s (target=%s, cached pack is current, mode=%s, results=%s)", +
snapshot.workflow_mode, + snapshot.target_kind, + target_hash[:12], + cached_pack.selection_mode, + len(cached_pack.results), + ) + return target_hash + running_target_hash = self._running_target_hash() + if running_target_hash: + logger.info( + "Assistant memory refresh already running for %s/%s (running_target=%s, requested_target=%s)", + snapshot.workflow_mode, + snapshot.target_kind, + running_target_hash[:12], + target_hash[:12], + ) + return target_hash + if self._latest_pack_target_hash and not self._latest_pack_has_enough_receiver_reads(): + if cached_pack is None: + latest_pack = self.get_latest_pack(self._latest_pack_target_hash) or self.get_latest_pack() + if latest_pack is not None: + self._packs[target_hash] = latest_pack.model_copy( + update={ + "target_hash": target_hash, + "freshness": "stale-but-best-known", + "selection_mode": "stale-but-best-known", + } + ) + logger.info( + "Assistant memory refresh deferred for %s/%s (latest_target=%s receiver_reads=%s/%s)", + snapshot.workflow_mode, + snapshot.target_kind, + self._latest_pack_target_hash[:12], + self._latest_pack_consumption_count, + _ASSISTANT_PACK_REFRESH_RECEIVER_READS, + ) + return target_hash + existing = self._tasks.get(target_hash) + if existing and not existing.done(): + logger.info( + "Assistant memory refresh already running for %s/%s (target=%s)", + snapshot.workflow_mode, + snapshot.target_kind, + target_hash[:12], + ) + return target_hash + logger.info( + "Assistant memory refresh scheduled for %s/%s (target=%s, phase=%s, source=%s:%s)", + snapshot.workflow_mode, + snapshot.target_kind, + target_hash[:12], + snapshot.workflow_phase or "unknown", + snapshot.source_type or "unknown", + snapshot.source_id or "unknown", + ) + task = asyncio.create_task(self._refresh_pack(snapshot)) + task.add_done_callback(lambda completed: self._on_task_done(target_hash, completed)) + self._tasks[target_hash] = task + return target_hash + + async def refresh_now(self, snapshot: 
AssistantTargetSnapshot) -> AssistantProofPack | None: + target_hash = snapshot.stable_hash() + snapshot = snapshot.model_copy(update={"target_hash": target_hash}) + if not self.enabled: + return None + cached_pack = self._load_cached_pack(snapshot) + if cached_pack is not None: + self._packs[target_hash] = cached_pack + await self._refresh_pack(snapshot) + return self._packs.get(target_hash) + + async def stop_all(self, *, clear_packs: bool = True, broadcast: bool = False, reason: str = "parent_stopped") -> None: + tasks = list(self._tasks.values()) + for task in tasks: + if not task.done(): + task.cancel() + self._tasks.clear() + if tasks: + await asyncio.gather(*tasks, return_exceptions=True) + if clear_packs: + self._packs.clear() + self._goal_target_hashes.clear() + self._latest_pack_target_hash = "" + self._latest_pack_consumption_count = _ASSISTANT_PACK_REFRESH_RECEIVER_READS + await asyncio.to_thread(_delete_if_exists, _assistant_pack_path()) + if broadcast: + await self._broadcast_event("assistant_proof_pack_stopped", {"reason": reason, "cleared": clear_packs}) + + async def clear_cooldown_state(self, run_key: str | None = None) -> None: + await asyncio.to_thread(self._cache.clear_cooldown_state, run_key) + + def mark_pack_consumed_by_solver(self, target_hash: str, *, role_id: str = "", task_id: str = "") -> None: + """Track receiver reads so useful packs refresh only after two solver uses.""" + if not target_hash or target_hash not in self._packs: + return + if target_hash != self._latest_pack_target_hash and not self._pack_matches_latest_pack(target_hash): + logger.debug( + "Ignoring Assistant memory consumption for stale target role=%s task=%s target=%s latest=%s", + role_id or "unknown", + task_id or "unknown", + target_hash[:12], + self._latest_pack_target_hash[:12], + ) + return + self._latest_pack_consumption_count = min( + self._latest_pack_consumption_count + 1, + _ASSISTANT_PACK_REFRESH_RECEIVER_READS, + ) + logger.info( + "Assistant memory pack 
consumed by solver role=%s task=%s target=%s receiver_reads=%s/%s", + role_id or "unknown", + task_id or "unknown", + target_hash[:12], + self._latest_pack_consumption_count, + _ASSISTANT_PACK_REFRESH_RECEIVER_READS, + ) + + def _latest_pack_has_enough_receiver_reads(self) -> bool: + if not self._latest_pack_target_hash: + return True + latest_pack = self._packs.get(self._latest_pack_target_hash) + if latest_pack is None or not latest_pack.results: + return True + return self._latest_pack_consumption_count >= _ASSISTANT_PACK_REFRESH_RECEIVER_READS + + def _pack_matches_latest_pack(self, target_hash: str) -> bool: + latest = self._packs.get(self._latest_pack_target_hash) + consumed = self._packs.get(target_hash) + if latest is None or consumed is None: + return False + latest_ids = [support.search_id for support in latest.results] + consumed_ids = [support.search_id for support in consumed.results] + return bool(latest_ids) and latest_ids == consumed_ids + + def _run_key_for_snapshot(self, snapshot: AssistantTargetSnapshot) -> str: + scope_type, source_scope = _cooldown_run_scope(snapshot) + return ":".join( + part.replace(":", "_") + for part in [snapshot.workflow_mode or "unknown", scope_type, source_scope] + if part + ) + + async def _maybe_skip_for_cooldown(self, snapshot: AssistantTargetSnapshot) -> bool: + run_key = self._run_key_for_snapshot(snapshot) + state = await asyncio.to_thread(self._cache.load_cooldown_state, run_key) + if state.zero_shutdown_active: + await self._broadcast_event( + "assistant_proof_memory_shutdown", + _cooldown_payload( + snapshot, + state, + kind="zero_useful", + reason="Assistant proof-memory retrieval is shut off for this run after repeated zero-useful retrieval batches.", + ), + ) + return True + state, payload = _consume_cooldown_turn(snapshot, state) + if payload is None: + return False + await asyncio.to_thread(self._cache.save_cooldown_state, state) + await self._broadcast_event("assistant_proof_memory_cooldown", payload) + 
return True + + async def _record_cooldown_outcome(self, snapshot: AssistantTargetSnapshot, pack: AssistantProofPack) -> None: + run_key = self._run_key_for_snapshot(snapshot) + state = await asyncio.to_thread(self._cache.load_cooldown_state, run_key) + event_name = "" + payload: dict[str, Any] | None = None + if not pack.results and pack.selection_mode in {"assistant_llm", "no_candidates"}: + state, event_name, payload = _advance_zero_useful_state(snapshot, state) + elif pack.results: + state, event_name, payload = _advance_success_state(snapshot, state, pack) + else: + return + await asyncio.to_thread(self._cache.save_cooldown_state, state) + if event_name and payload: + await self._broadcast_event(event_name, payload) + + async def _refresh_pack(self, snapshot: AssistantTargetSnapshot) -> None: + async with self._lock: + if await self._maybe_skip_for_cooldown(snapshot): + return + warnings: list[str] = [] + corpora = default_proof_search_corpora() + if not corpora: + corpora = ["moto", "manual", "leanoj"] + + records: list[UnifiedProofSearchRecord] = [] + seen_ids: set[str] = set() + excluded_session_ids = [session_id for session_id in [_active_autonomous_session_id()] if session_id] + for query in _build_query_variants(snapshot): + if len(records) >= _ASSISTANT_CANDIDATE_POOL_TARGET: + break + try: + query_records = await self._service.search_candidate_pool( + ProofSearchRequest( + query=query, + goal_statement=snapshot.target_statement or snapshot.lean_template, + imports=snapshot.imports or ["Mathlib"], + dependency_names=snapshot.dependency_names, + corpora=corpora, + verified_only=True, + include_partial=False, + include_failed=False, + limit=_ASSISTANT_CANDIDATE_POOL_TARGET, + hydrate_lean_code=True, + ), + pool_limit=_ASSISTANT_CANDIDATE_POOL_TARGET - len(records), + exclude_corpus_scopes=sorted(_CURRENT_RUN_CORPUS_SCOPES), + exclude_session_ids=excluded_session_ids, + ) + query_records = _filter_current_run_records(query_records) + except Exception as 
exc: + logger.debug("Assistant proof search query failed: %s", exc) + warnings.append(f"Search query failed: {exc}") + continue + for record in query_records: + if record.search_id in seen_ids: + continue + seen_ids.add(record.search_id) + records.append(record) + + if not records: + if warnings: + await self._broadcast_event( + "assistant_proof_pack_warning", + { + "target_hash": snapshot.target_hash, + "workflow_mode": snapshot.workflow_mode, + "target_kind": snapshot.target_kind, + "workflow_phase": snapshot.workflow_phase, + "source_type": snapshot.source_type, + "source_id": snapshot.source_id, + "warnings": warnings[-3:], + "reason": "Assistant proof-memory search failed before any external proof-history records could be collected.", + }, + ) + logger.warning( + "Assistant memory search failed for %s/%s (target=%s) before collecting external records: %s", + snapshot.workflow_mode, + snapshot.target_kind, + snapshot.target_hash[:12], + "; ".join(warnings[-3:]), + ) + return + await self._broadcast_event( + "assistant_proof_memory_unavailable", + { + "target_hash": snapshot.target_hash, + "workflow_mode": snapshot.workflow_mode, + "target_kind": snapshot.target_kind, + "workflow_phase": snapshot.workflow_phase, + "source_type": snapshot.source_type, + "source_id": snapshot.source_id, + "reason": _NO_EXTERNAL_HISTORY_REASON, + }, + ) + logger.info( + "Assistant memory skipped for %s/%s (target=%s): no external proof-memory records after current-run filtering", + snapshot.workflow_mode, + snapshot.target_kind, + snapshot.target_hash[:12], + ) + return + + ranked_candidates = score_assistant_proof_candidates(records, snapshot) + await asyncio.to_thread(self._cache.upsert_candidates, target_hash=snapshot.target_hash, candidates=ranked_candidates_to_cache_rows(ranked_candidates)) + candidate_stats = await asyncio.to_thread(self._cache.load_candidate_stats, snapshot.target_hash) + shortlist = select_assistant_proof_supports(ranked_candidates, 
limit=_ASSISTANT_SHORTLIST_TARGET, candidate_stats=candidate_stats) + if not shortlist: + await self._publish_pack(snapshot, [], warnings=warnings, selection_mode="no_candidates", candidate_count=len(records), shortlist_count=0, selection_reasoning="No verified candidate supports were found.") + return + await self._select_and_publish_assistant_pack(snapshot=snapshot, shortlist=shortlist, warnings=warnings, candidate_count=len(records)) + + async def _select_and_publish_assistant_pack(self, *, snapshot: AssistantTargetSnapshot, shortlist: list[AssistantProofSupport], warnings: list[str], candidate_count: int) -> None: + assistant_role_id = _assistant_role_id_for_snapshot(snapshot) + assistant_model_id = "injected-assistant" if self._assistant_selector is not None else _assistant_model_id(assistant_role_id) + if not assistant_model_id: + warnings.append(f"Assistant role '{assistant_role_id}' is not configured.") + await self._publish_pack(snapshot, [], warnings=warnings, selection_mode="unavailable", assistant_role_id=assistant_role_id, assistant_model_id="", candidate_count=candidate_count, shortlist_count=len(shortlist), selection_reasoning="Configured Assistant role was unavailable.") + return + + if _assistant_oauth_provider_is_cooling_down(assistant_role_id): + latest_pack = self.get_latest_pack() + if latest_pack and latest_pack.results: + supports = latest_pack.results[:_ASSISTANT_FINAL_PACK_LIMIT] + await self._publish_pack( + snapshot, + supports, + warnings=[ + *warnings, + "Assistant OAuth provider is in usage-limit cooldown; reusing latest cached proof pack.", + ], + selection_mode="cached_oauth_cooldown", + assistant_role_id=assistant_role_id, + assistant_model_id=assistant_model_id, + candidate_count=candidate_count, + shortlist_count=len(shortlist), + selection_reasoning="Reused latest cached Assistant pack while OAuth provider cooldown is active.", + ) + return + if shortlist: + selected_supports = shortlist[:_ASSISTANT_FINAL_PACK_LIMIT] + await 
self._publish_pack( + snapshot, + selected_supports, + warnings=[ + *warnings, + "Assistant OAuth provider is in usage-limit cooldown; using deterministic shortlist without Assistant LLM selection.", + ], + selection_mode="deterministic_oauth_cooldown", + assistant_role_id=assistant_role_id, + assistant_model_id=assistant_model_id, + candidate_count=candidate_count, + shortlist_count=len(shortlist), + selection_reasoning="Used deterministic proof-support shortlist while OAuth provider cooldown is active.", + ) + return + + task_id = self._next_assistant_task_id(snapshot.workflow_mode) + await self._broadcast_event("assistant_proof_pack_refresh_started", {"target_hash": snapshot.target_hash, "workflow_mode": snapshot.workflow_mode, "target_kind": snapshot.target_kind, "workflow_phase": snapshot.workflow_phase, "source_type": snapshot.source_type, "source_id": snapshot.source_id, "assistant_role_id": assistant_role_id, "assistant_model_id": assistant_model_id, "candidate_count": candidate_count, "shortlist_count": len(shortlist), "max_result_count": _ASSISTANT_FINAL_PACK_LIMIT}) + try: + selected_search_ids, selection_reasoning = await self._select_with_assistant(snapshot, shortlist, assistant_role_id=assistant_role_id, assistant_model_id=assistant_model_id, task_id=task_id) + except Exception as exc: + warnings.append(f"Assistant LLM selection failed: {exc}") + await self._broadcast_event("assistant_proof_pack_warning", {"target_hash": snapshot.target_hash, "workflow_mode": snapshot.workflow_mode, "target_kind": snapshot.target_kind, "workflow_phase": snapshot.workflow_phase, "source_type": snapshot.source_type, "source_id": snapshot.source_id, "warnings": warnings[-3:], "assistant_role_id": assistant_role_id, "assistant_model_id": assistant_model_id, "candidate_count": candidate_count, "shortlist_count": len(shortlist)}) + await self._publish_pack(snapshot, [], warnings=warnings, selection_mode="unavailable", assistant_role_id=assistant_role_id, 
assistant_model_id=assistant_model_id, candidate_count=candidate_count, shortlist_count=len(shortlist), selection_reasoning="Assistant LLM selection failed.") + return + + selected_supports = _supports_for_selected_ids(shortlist, selected_search_ids) + await self._publish_pack(snapshot, selected_supports, warnings=warnings, selection_mode="assistant_llm", assistant_role_id=assistant_role_id, assistant_model_id=assistant_model_id, candidate_count=candidate_count, shortlist_count=len(shortlist), selection_reasoning=selection_reasoning) + + def _next_assistant_task_id(self, workflow_mode: str) -> str: + self._task_sequence += 1 + mode = "".join(char if char.isalnum() else "_" for char in workflow_mode or "assistant") + return f"assistant_pack_{mode}_{self._task_sequence:03d}" + + async def _select_with_assistant(self, snapshot: AssistantTargetSnapshot, shortlist: list[AssistantProofSupport], *, assistant_role_id: str, assistant_model_id: str, task_id: str) -> tuple[list[str], str]: + if self._assistant_selector is not None: + return await self._assistant_selector(snapshot, shortlist, assistant_role_id, assistant_model_id, task_id) + from backend.shared.api_client_manager import api_client_manager + role_config = api_client_manager.get_role_config(assistant_role_id) + if role_config is None: + raise RuntimeError(f"Assistant role '{assistant_role_id}' is not configured") + max_tokens = _assistant_selection_max_tokens(role_config.max_output_tokens) + prompt = _build_assistant_selection_prompt(snapshot, shortlist) + try: + payload = await _generate_assistant_selection_payload( + prompt=prompt, + task_id=task_id, + assistant_role_id=assistant_role_id, + assistant_model_id=assistant_model_id, + max_tokens=max_tokens, + ) + selected_ids = _extract_valid_selected_search_ids(payload, shortlist) + except _AssistantSelectionOutputError as first_error: + repair_prompt = _build_assistant_selection_repair_prompt( + snapshot, + shortlist, + error=str(first_error), + ) + try: + 
payload = await _generate_assistant_selection_payload( + prompt=repair_prompt, + task_id=f"{task_id}_retry", + assistant_role_id=assistant_role_id, + assistant_model_id=assistant_model_id, + max_tokens=max_tokens, + ) + selected_ids = _extract_valid_selected_search_ids(payload, shortlist) + except _AssistantSelectionOutputError as retry_error: + raise _AssistantSelectionOutputError( + f"{first_error}; retry failed: {retry_error}" + ) from retry_error + clean_ids = [str(item).strip() for item in selected_ids if str(item).strip()] + reasoning = str(payload.get("reasoning") or payload.get("selection_reasoning") or "").strip() or "Assistant selected proof supports for the current target." + reasoning = _compact_for_assistant_selection(reasoning, 300) + return clean_ids[:_ASSISTANT_FINAL_PACK_LIMIT], reasoning + + async def _publish_pack(self, snapshot: AssistantTargetSnapshot, supports: list[AssistantProofSupport], *, warnings: list[str], selection_mode: str, assistant_role_id: str = "", assistant_model_id: str = "", candidate_count: int, shortlist_count: int, selection_reasoning: str = "") -> None: + pack = AssistantProofPack(workflow_mode=snapshot.workflow_mode, target_kind=snapshot.target_kind, target_hash=snapshot.target_hash, query_summary=_compact_query_summary(snapshot), results=supports[:_ASSISTANT_FINAL_PACK_LIMIT], warnings=warnings, selection_mode=selection_mode, assistant_role_id=assistant_role_id, assistant_model_id=assistant_model_id, candidate_count=candidate_count, shortlist_count=shortlist_count, selection_reasoning=selection_reasoning) + source_counts: dict[str, int] = {} + for support in pack.results: + source_counts[support.corpus] = source_counts.get(support.corpus, 0) + 1 + logger.info("Assistant memory pack refreshed for %s/%s (target=%s, mode=%s, results=%s, local=%s, syntheticlib4=%s)", snapshot.workflow_mode, snapshot.target_kind, snapshot.target_hash[:12], selection_mode, len(pack.results), sum(count for corpus, count in source_counts.items() 
if corpus != "syntheticlib4"), source_counts.get("syntheticlib4", 0)) + if not pack.results and selection_mode in {"assistant_llm", "no_candidates"}: + logger.info( + "Assistant memory found no useful proof supports for %s/%s (target=%s, candidates=%s, shortlist=%s, mode=%s)", + snapshot.workflow_mode, + snapshot.target_kind, + snapshot.target_hash[:12], + candidate_count, + shortlist_count, + selection_mode, + ) + self._packs[snapshot.target_hash] = pack + self._latest_pack_target_hash = snapshot.target_hash + self._latest_pack_consumption_count = ( + 0 if pack.results else _ASSISTANT_PACK_REFRESH_RECEIVER_READS + ) + goal_hash = goal_hash_for_snapshot(snapshot) + if goal_hash: + self._goal_target_hashes[goal_hash] = snapshot.target_hash + await asyncio.to_thread(self._cache.record_pack, snapshot=snapshot, pack=pack, selected_search_ids=[support.search_id for support in pack.results]) + await self._persist_pack(pack) + await self._broadcast_event("assistant_proof_pack_updated", {"target_hash": pack.target_hash, "workflow_mode": pack.workflow_mode, "target_kind": pack.target_kind, "result_count": len(pack.results), "local_result_count": sum(count for corpus, count in source_counts.items() if corpus != "syntheticlib4"), "syntheticlib4_result_count": source_counts.get("syntheticlib4", 0), "source_counts": source_counts, "max_result_count": _ASSISTANT_FINAL_PACK_LIMIT, "workflow_phase": snapshot.workflow_phase, "source_type": snapshot.source_type, "source_id": snapshot.source_id, "warnings": pack.warnings[:3], "selection_mode": pack.selection_mode, "assistant_role_id": pack.assistant_role_id, "assistant_model_id": pack.assistant_model_id, "candidate_count": pack.candidate_count, "shortlist_count": pack.shortlist_count}) + await self._record_cooldown_outcome(snapshot, pack) + + def _on_task_done(self, target_hash: str, task: asyncio.Task) -> None: + if self._tasks.get(target_hash) is task: + self._tasks.pop(target_hash, None) + try: + task.result() + except 
asyncio.CancelledError: + return + except Exception: + logger.exception("Assistant proof-search refresh failed") + + def _running_target_hash(self) -> str: + for target_hash, task in self._tasks.items(): + if not task.done(): + return target_hash + return "" + + async def _persist_pack(self, pack: AssistantProofPack) -> None: + await asyncio.to_thread(_write_json, _assistant_pack_path(), pack.metadata_only_dump()) + + async def _broadcast_event(self, event_type: str, payload: dict[str, Any]) -> None: + try: + from backend.api.routes import websocket + await websocket.broadcast_event(event_type, payload) + except Exception: + logger.debug("Assistant proof-search event broadcast failed", exc_info=True) + + def _load_cached_pack(self, snapshot: AssistantTargetSnapshot) -> AssistantProofPack | None: + try: + goal_hash = goal_hash_for_snapshot(snapshot) + previous_target_hash = self._goal_target_hashes.get(goal_hash) if goal_hash else "" + if previous_target_hash: + in_memory_pack = self._packs.get(previous_target_hash) + if in_memory_pack is not None: + freshness = "cached" if previous_target_hash == snapshot.target_hash else "stale-but-best-known" + return _drop_current_run_supports_from_pack(in_memory_pack.model_copy(update={"target_hash": snapshot.target_hash, "freshness": freshness, "selection_mode": freshness})) + cached = self._cache.load_cached_pack(target_hash=snapshot.target_hash, goal_hash=goal_hash) + if cached is not None: + cached = cached.model_copy(update={"selection_mode": cached.freshness}) + return _drop_current_run_supports_from_pack(cached) + except Exception: + logger.debug("Assistant proof-search cache lookup failed", exc_info=True) + return None + + +def _cached_pack_is_reusable(pack: AssistantProofPack) -> bool: + return pack.freshness == "cached" and bool(pack.results) + + +def _build_query_variants(snapshot: AssistantTargetSnapshot) -> list[str]: + variants = [snapshot.search_text(), "\n\n".join(part for part in [snapshot.user_prompt, 
snapshot.current_prompt_or_topic, snapshot.writing_goal, snapshot.outline_summary] if part), "\n\n".join(part for part in [snapshot.user_prompt, snapshot.target_statement] if part), "\n\n".join(part for part in [snapshot.lean_template, snapshot.lean_error] if part), "\n\n".join(part for part in [snapshot.rejection_feedback, snapshot.proof_attempt_feedback] if part), "\n\n".join(part for part in [snapshot.accepted_memory_summary, snapshot.paper_or_proof_draft_summary, snapshot.recent_activity_summary] if part), " ".join([*snapshot.dependency_names, *snapshot.imports]), " ".join(snapshot.source_titles), snapshot.source_title] + cleaned: list[str] = [] + seen: set[str] = set() + for value in variants: + text = " ".join((value or "").split()) + if not text or text in seen: + continue + seen.add(text) + cleaned.append(text) + return cleaned or [snapshot.target_statement or snapshot.user_prompt or snapshot.lean_template] + + +def _assistant_role_id_for_snapshot(snapshot: AssistantTargetSnapshot) -> str: + workflow_mode = (snapshot.workflow_mode or "").lower() + if workflow_mode == "manual_proof_check": + return "manual_proof_assistant" + if workflow_mode == "aggregator": + return "aggregator_assistant" + if workflow_mode == "compiler": + return "compiler_assistant" + if workflow_mode == "leanoj": + return "leanoj_assistant" + return "autonomous_assistant" + + +def _assistant_model_id(role_id: str) -> str: + try: + from backend.shared.api_client_manager import api_client_manager + config = api_client_manager.get_role_config(role_id) + except Exception: + return "" + if config is None: + return "" + return config.openrouter_model_id or config.model_id + + +def _assistant_oauth_provider_key(role_id: str) -> str: + try: + from backend.shared.api_client_manager import api_client_manager + config = api_client_manager.get_role_config(role_id) + except Exception: + return "" + if config is None: + return "" + provider = str(config.provider or "").strip() + if provider in 
{"openai_codex_oauth", "xai_grok_oauth"}: + return provider + return "" + + +def _assistant_oauth_provider_is_cooling_down(role_id: str) -> bool: + provider = _assistant_oauth_provider_key(role_id) + if not provider: + return False + try: + from backend.shared.api_client_manager import api_client_manager + return api_client_manager.is_provider_cooling_down(provider) + except Exception: + return False + + +async def _generate_assistant_selection_payload( + *, + prompt: str, + task_id: str, + assistant_role_id: str, + assistant_model_id: str, + max_tokens: int, +) -> dict[str, Any]: + from backend.shared.api_client_manager import api_client_manager + + response_format = {"type": "json_object"} + role_config = api_client_manager.get_role_config(assistant_role_id) + if role_config is not None and role_config.provider == "lm_studio": + # LM Studio's OpenAI-compatible server rejects json_object on recent builds. + # The prompt already requires a compact JSON object, and parse_json enforces it. 
+ response_format = {"type": "text"} + + response = await api_client_manager.generate_completion( + task_id=task_id, + role_id=assistant_role_id, + model=assistant_model_id, + messages=[{"role": "user", "content": prompt}], + temperature=0.0, + max_tokens=max_tokens, + response_format=response_format, + _moto_disable_supercharge=True, + _moto_reasoning_effort_override="none", + ) + try: + payload = parse_json(extract_response_text(response, context="assistant_proof_search")) + except Exception as exc: + raise _AssistantSelectionOutputError(str(exc)) from exc + if not isinstance(payload, dict): + raise _AssistantSelectionOutputError("Assistant response was not a JSON object") + return payload + + +def _extract_selected_search_ids(payload: dict[str, Any]) -> list[Any]: + selected_ids = payload.get("selected_search_ids") + if selected_ids is None: + selected_ids = payload.get("selected_ids") + if not isinstance(selected_ids, list): + raise _AssistantSelectionOutputError("Assistant response missing selected_search_ids array") + return selected_ids + + +def _extract_valid_selected_search_ids(payload: dict[str, Any], shortlist: list[AssistantProofSupport]) -> list[Any]: + selected_ids = _extract_selected_search_ids(payload) + clean_ids = [str(item).strip() for item in selected_ids if str(item).strip()] + if not clean_ids: + return [] + valid_ids = {support.search_id for support in shortlist} + invalid_ids = [search_id for search_id in clean_ids if search_id not in valid_ids] + if invalid_ids: + preview = ", ".join(invalid_ids[:3]) + raise _AssistantSelectionOutputError( + f"Assistant selected IDs outside the candidate shortlist: {preview}" + ) + return selected_ids + + +def _assistant_selection_max_tokens(configured_max_tokens: int | None) -> int: + configured = int(configured_max_tokens or 0) + if configured <= 0: + return _ASSISTANT_SELECTION_MAX_OUTPUT_TOKENS + return min(configured, _ASSISTANT_SELECTION_MAX_OUTPUT_TOKENS) + + +def 
_build_assistant_selection_prompt(snapshot: AssistantTargetSnapshot, shortlist: list[AssistantProofSupport]) -> str: + return _assistant_selection_prompt( + snapshot, + shortlist, + prefix=( + "You are the configured MOTO Assistant memory role. " + "Return one compact JSON object only." + ), + ) + + +def _build_assistant_selection_repair_prompt( + snapshot: AssistantTargetSnapshot, + shortlist: list[AssistantProofSupport], + *, + error: str, +) -> str: + safe_error = " ".join(error.split())[:240] + return _assistant_selection_prompt( + snapshot, + shortlist, + prefix=( + "Your previous Assistant proof-support selection was invalid: " + f"{safe_error}. Return corrected JSON only." + ), + ) + + +def _assistant_selection_prompt( + snapshot: AssistantTargetSnapshot, + shortlist: list[AssistantProofSupport], + *, + prefix: str, +) -> str: + ids = "\n".join(_format_assistant_candidate(support) for support in shortlist) + target = _compact_for_assistant_selection(snapshot.search_text(), 2400) + return ( + f"{prefix}\n" + 'Required schema: {"selected_search_ids":[""],"reasoning":"<=160 chars"}\n' + f"Rules: select at most {_ASSISTANT_FINAL_PACK_LIMIT}; use only exact listed IDs; use [] if no listed proof support is genuinely useful for the target; no markdown.\n\n" + f"TARGET:\n{target}\n\n" + f"CANDIDATES:\n{ids}\n" + ) + + +def _format_assistant_candidate(support: AssistantProofSupport) -> str: + label = support.theorem_name or support.theorem_statement or support.proof_id + statement = "" if label == support.theorem_statement else support.theorem_statement + parts = [f"- id: {support.search_id}", f" label: {_compact_for_assistant_selection(label, 180)}"] + if statement: + parts.append(f" statement: {_compact_for_assistant_selection(statement, 220)}") + return "\n".join(parts) + + +def _compact_for_assistant_selection(text: str, limit: int) -> str: + compact = " ".join((text or "").split()) + if len(compact) <= limit: + return compact + return compact[: limit - 3] + 
"..." + + +def _supports_for_selected_ids(shortlist: list[AssistantProofSupport], selected_search_ids: list[str]) -> list[AssistantProofSupport]: + by_id = {support.search_id: support for support in shortlist} + selected: list[AssistantProofSupport] = [] + seen: set[str] = set() + for search_id in selected_search_ids: + if search_id in seen: + continue + support = by_id.get(search_id) + if support is None: + continue + selected.append(support) + seen.add(search_id) + if len(selected) >= _ASSISTANT_FINAL_PACK_LIMIT: + break + return selected + + +def _compact_query_summary(snapshot: AssistantTargetSnapshot) -> str: + summary = " ".join(part for part in [snapshot.workflow_phase, snapshot.current_prompt_or_topic, snapshot.writing_goal, snapshot.outline_summary, snapshot.paper_or_proof_draft_summary, snapshot.target_statement, snapshot.lean_template, snapshot.lean_error, snapshot.rejection_feedback, snapshot.source_title] if part) + summary = " ".join(summary.split()) + return summary[:600] + ("..." 
if len(summary) > 600 else "") + + +def _filter_current_run_records(records: list[UnifiedProofSearchRecord]) -> list[UnifiedProofSearchRecord]: + return [record for record in records if not _is_current_run_record(record)] + + +def _drop_current_run_supports_from_pack(pack: AssistantProofPack | None) -> AssistantProofPack | None: + if pack is None or not pack.results: + return pack + filtered_results = [support for support in pack.results if not _is_current_run_support(support)] + if len(filtered_results) == len(pack.results): + return pack + return pack.model_copy(update={"results": filtered_results}) + + +def _is_current_run_record(record: UnifiedProofSearchRecord) -> bool: + if record.corpus == "syntheticlib4": + return False + if (record.corpus_scope or "").strip().lower() in _CURRENT_RUN_CORPUS_SCOPES: + return True + active_session_id = _active_autonomous_session_id() + return bool(active_session_id and record.session_id == active_session_id) + + +def _is_current_run_support(support: AssistantProofSupport) -> bool: + if support.corpus == "syntheticlib4": + return False + if (support.corpus_scope or "").strip().lower() in _CURRENT_RUN_CORPUS_SCOPES: + return True + active_session_id = _active_autonomous_session_id() + if not active_session_id: + return False + if support.session_id: + return support.session_id == active_session_id + parts = support.search_id.split(":") + return len(parts) >= 3 and parts[1] == active_session_id + + +def _cooldown_delay_for_stage(stage: int) -> int: + if stage <= 0: + return 0 + index = min(stage - 1, len(_COOLDOWN_DELAYS) - 1) + return _COOLDOWN_DELAYS[index] + + +def _state_with(state: AssistantCooldownState, **updates: Any) -> AssistantCooldownState: + payload = state.to_payload() + updates.setdefault("updated_at", _now_iso()) + payload.update(updates) + return AssistantCooldownState(**payload) + + +def _cooldown_payload( + snapshot: AssistantTargetSnapshot, + state: AssistantCooldownState, + *, + kind: str, + reason: str, +) 
-> dict[str, Any]: + stage = state.zero_cooldown_stage if kind == "zero_useful" else state.stagnant_cooldown_stage + remaining = state.zero_cooldown_skips_remaining if kind == "zero_useful" else state.stagnant_cooldown_skips_remaining + attempts = state.zero_attempts_in_batch if kind == "zero_useful" else state.stagnant_attempts_in_batch + required = _cooldown_delay_for_stage(stage) + return { + "target_hash": snapshot.target_hash, + "workflow_mode": snapshot.workflow_mode, + "target_kind": snapshot.target_kind, + "workflow_phase": snapshot.workflow_phase, + "source_type": snapshot.source_type, + "source_id": snapshot.source_id, + "reason": reason, + "cooldown_kind": kind, + "cooldown_stage": stage, + "eligible_turns_skipped": max(0, required - remaining), + "eligible_turns_required": required, + "eligible_turns_remaining": max(0, remaining), + "batch_attempts": attempts, + "batch_size": _ZERO_BATCH_SIZE if kind == "zero_useful" else _STAGNANT_REPEAT_THRESHOLD, + "shutdown_active": bool(state.zero_shutdown_active), + } + + +def _now_iso() -> str: + return datetime.now(timezone.utc).isoformat() + + +def _support_signature(pack: AssistantProofPack) -> str: + search_ids = [support.search_id for support in pack.results] + if not search_ids: + return "" + return hashlib.sha256("|".join(search_ids).encode("utf-8")).hexdigest() + + +def _cooldown_run_scope(snapshot: AssistantTargetSnapshot) -> tuple[str, str]: + """Group transient role/task IDs so backoff accumulates across a run phase.""" + source_id = (snapshot.source_id or "").strip() + source_type = (snapshot.source_type or "").strip() + transient_prefixes = ( + "agg_sub", + "agg_val", + "comp_writer", + "comp_hp", + "comp_val", + "assistant_pack", + "proof_id", + "proof_lemma", + "proof_form", + "proof_integrity", + "proof_novelty", + "proof_framing_gate", + "leanoj_topic", + "leanoj_brainstorm", + "leanoj_val", + "leanoj_path", + "leanoj_final", + ) + if source_id and not source_id.startswith(transient_prefixes): + 
return source_type or "source", source_id + workflow_scope = snapshot.active_mode or snapshot.workflow_mode or source_type or "workflow" + return "workflow", workflow_scope + + +def _consume_cooldown_turn( + snapshot: AssistantTargetSnapshot, + state: AssistantCooldownState, +) -> tuple[AssistantCooldownState, dict[str, Any] | None]: + if state.zero_shutdown_active: + return state, None + if state.zero_cooldown_skips_remaining > 0: + new_state = _state_with( + state, + zero_cooldown_skips_remaining=state.zero_cooldown_skips_remaining - 1, + last_reason="zero_useful_cooldown_skip", + ) + return new_state, _cooldown_payload( + snapshot, + new_state, + kind="zero_useful", + reason="Assistant proof-memory retrieval is backing off after repeated zero-useful retrieval batches.", + ) + if state.stagnant_cooldown_skips_remaining > 0: + new_state = _state_with( + state, + stagnant_cooldown_skips_remaining=state.stagnant_cooldown_skips_remaining - 1, + last_reason="stagnant_cooldown_skip", + ) + return new_state, _cooldown_payload( + snapshot, + new_state, + kind="stagnant", + reason="Assistant proof-memory retrieval is backing off because the same proof pack kept repeating.", + ) + return state, None + + +def _advance_zero_useful_state( + snapshot: AssistantTargetSnapshot, + state: AssistantCooldownState, +) -> tuple[AssistantCooldownState, str, dict[str, Any] | None]: + attempts = state.zero_attempts_in_batch + 1 + if attempts < _ZERO_BATCH_SIZE: + return ( + _state_with( + state, + zero_attempts_in_batch=attempts, + stagnant_same_count=0, + stagnant_attempts_in_batch=0, + stagnant_cooldown_stage=0, + stagnant_cooldown_skips_remaining=0, + last_signature="", + last_reason="zero_useful_attempt", + ), + "", + None, + ) + + already_at_steady_stage = state.zero_cooldown_stage >= len(_COOLDOWN_DELAYS) + next_stage = min(state.zero_cooldown_stage + 1, len(_COOLDOWN_DELAYS)) + steady_batches = state.zero_steady_81_batches + 1 if already_at_steady_stage else 0 + shutdown = 
already_at_steady_stage and steady_batches >= _ZERO_STEADY_81_BATCHES_BEFORE_SHUTDOWN + new_state = _state_with( + state, + zero_attempts_in_batch=0, + zero_cooldown_stage=next_stage, + zero_cooldown_skips_remaining=0 if shutdown else _cooldown_delay_for_stage(next_stage), + zero_steady_81_batches=steady_batches, + zero_shutdown_active=shutdown, + stagnant_same_count=0, + stagnant_attempts_in_batch=0, + stagnant_cooldown_stage=0, + stagnant_cooldown_skips_remaining=0, + last_signature="", + last_reason="zero_useful_shutdown" if shutdown else "zero_useful_cooldown_entered", + ) + if shutdown: + return new_state, "assistant_proof_memory_shutdown", _cooldown_payload( + snapshot, + new_state, + kind="zero_useful", + reason="Assistant proof-memory retrieval disabled for this run after repeated empty retrieval batches.", + ) + return new_state, "assistant_proof_memory_cooldown", _cooldown_payload( + snapshot, + new_state, + kind="zero_useful", + reason="Assistant proof-memory retrieval is backing off after repeated zero-useful retrieval batches.", + ) + + +def _advance_success_state( + snapshot: AssistantTargetSnapshot, + state: AssistantCooldownState, + pack: AssistantProofPack, +) -> tuple[AssistantCooldownState, str, dict[str, Any] | None]: + signature = _support_signature(pack) + zero_reset = { + "zero_attempts_in_batch": 0, + "zero_cooldown_stage": 0, + "zero_cooldown_skips_remaining": 0, + "zero_steady_81_batches": 0, + "zero_shutdown_active": False, + } + if not signature: + return _state_with(state, **zero_reset, last_reason="useful_pack_without_signature"), "", None + + if signature != state.last_signature: + return ( + _state_with( + state, + **zero_reset, + stagnant_same_count=1, + stagnant_attempts_in_batch=1, + stagnant_cooldown_stage=0, + stagnant_cooldown_skips_remaining=0, + last_signature=signature, + last_reason="useful_pack_changed", + ), + "", + None, + ) + + same_count = state.stagnant_same_count + 1 + attempts = state.stagnant_attempts_in_batch + 1 
+ if attempts < _STAGNANT_REPEAT_THRESHOLD: + return ( + _state_with( + state, + **zero_reset, + stagnant_same_count=same_count, + stagnant_attempts_in_batch=attempts, + last_reason="stagnant_repeat_observed", + ), + "", + None, + ) + + next_stage = min(state.stagnant_cooldown_stage + 1, len(_COOLDOWN_DELAYS)) + new_state = _state_with( + state, + **zero_reset, + stagnant_same_count=same_count, + stagnant_attempts_in_batch=0, + stagnant_cooldown_stage=next_stage, + stagnant_cooldown_skips_remaining=_cooldown_delay_for_stage(next_stage), + last_signature=signature, + last_reason="stagnant_cooldown_entered", + ) + return new_state, "assistant_proof_memory_cooldown", _cooldown_payload( + snapshot, + new_state, + kind="stagnant", + reason="Assistant proof-memory retrieval is backing off because the same proof pack kept repeating.", + ) + + +def _active_autonomous_session_id() -> str: + try: + from backend.autonomous.memory.session_manager import session_manager + if session_manager.is_session_active: + return str(session_manager.session_id or "").strip() + except Exception: + logger.debug("Assistant could not inspect active autonomous session", exc_info=True) + return "" + + +def _assistant_pack_path() -> Path: + return Path(system_config.data_dir) / "proof_search" / "assistant_latest_pack.json" + + +def _write_json(path: Path, payload: dict[str, Any]) -> None: + path.parent.mkdir(parents=True, exist_ok=True) + path.write_text(json.dumps(payload, ensure_ascii=True, indent=2), encoding="utf-8") + + +def _delete_if_exists(path: Path) -> None: + try: + path.unlink(missing_ok=True) + except TypeError: + if path.exists(): + path.unlink() + + +assistant_proof_search_coordinator = AssistantProofSearchCoordinator() diff --git a/backend/shared/proof_search/assistant_models.py b/backend/shared/proof_search/assistant_models.py new file mode 100644 index 0000000..99d5dd2 --- /dev/null +++ b/backend/shared/proof_search/assistant_models.py @@ -0,0 +1,291 @@ +"""Models for 
non-blocking Assistant proof-support retrieval.""" +from __future__ import annotations + +import hashlib +from datetime import datetime, timezone +from typing import Literal + +from pydantic import BaseModel, Field + +from backend.shared.proof_search.models import ProofSearchCorpus, UnifiedProofSearchRecord + + +AssistantWorkflowMode = Literal[ + "aggregator", + "compiler", + "autonomous", + "leanoj", + "manual_proof_check", +] +AssistantTargetKind = Literal[ + "brainstorm_context", + "writing_context", + "outline_context", + "reference_selection_context", + "topic_context", + "title_context", + "completion_review_context", + "path_context", + "semantic_review_context", + "final_answer_context", + "proof_candidate", + "lean_error", + "theorem_discovery", + "master_proof", + "final_solver", + "paper_claim", +] +AssistantFreshness = Literal["fresh", "cached", "stale-but-best-known"] +AssistantSelectionMode = Literal[ + "assistant_llm", + "cached", + "stale-but-best-known", + "no_candidates", + "unavailable", + "cached_oauth_cooldown", + "deterministic_oauth_cooldown", +] + + +def _now_iso() -> str: + return datetime.now(timezone.utc).isoformat() + + +class AssistantTargetSnapshot(BaseModel): + """A fast-moving memory-support target observed from the parent workflow.""" + + workflow_mode: AssistantWorkflowMode + target_kind: AssistantTargetKind + workflow_phase: str = "" + active_mode: str = "" + user_prompt: str = "" + current_prompt_or_topic: str = "" + current_submission_or_draft: str = "" + accepted_memory_summary: str = "" + writing_goal: str = "" + outline_summary: str = "" + paper_or_proof_draft_summary: str = "" + recent_activity_summary: str = "" + source_titles: list[str] = Field(default_factory=list) + target_statement: str = "" + lean_template: str = "" + formal_sketch: str = "" + lean_error: str = "" + rejection_feedback: str = "" + proof_attempt_feedback: str = "" + accepted_solver_summary: str = "" + source_title: str = "" + source_type: str = "" + 
source_id: str = "" + dependency_names: list[str] = Field(default_factory=list) + imports: list[str] = Field(default_factory=lambda: ["Mathlib"]) + target_hash: str = "" + created_at: str = Field(default_factory=_now_iso) + + def stable_hash(self) -> str: + if self.target_hash: + return self.target_hash + parts = [ + self.workflow_mode, + self.target_kind, + self.workflow_phase, + self.active_mode, + self.user_prompt, + self.current_prompt_or_topic, + self.current_submission_or_draft, + self.accepted_memory_summary, + self.writing_goal, + self.outline_summary, + self.paper_or_proof_draft_summary, + self.recent_activity_summary, + self.target_statement, + self.lean_template, + self.formal_sketch, + self.lean_error, + self.rejection_feedback, + self.proof_attempt_feedback, + self.source_title, + self.source_type, + self.source_id, + " ".join(self.source_titles), + " ".join(self.dependency_names), + " ".join(self.imports), + ] + return hashlib.sha256("\n\n".join(parts).encode("utf-8")).hexdigest() + + def search_text(self) -> str: + return "\n\n".join( + part + for part in [ + self.user_prompt, + self.current_prompt_or_topic, + self.current_submission_or_draft, + self.accepted_memory_summary, + self.writing_goal, + self.outline_summary, + self.paper_or_proof_draft_summary, + self.recent_activity_summary, + self.target_statement, + self.lean_template, + self.formal_sketch, + self.lean_error, + self.rejection_feedback, + self.proof_attempt_feedback, + self.accepted_solver_summary, + self.source_title, + " ".join(self.source_titles), + ] + if part and part.strip() + ) + + +class AssistantProofSupport(BaseModel): + """One proof support selected for the Assistant pack.""" + + search_id: str + corpus: ProofSearchCorpus + corpus_scope: str = "" + source_kind: str = "verified_proof" + proof_id: str + session_id: str = "" + fingerprint: str = "" + theorem_name: str = "" + theorem_statement: str + proof_description: str = "" + imports: list[str] = Field(default_factory=list) + 
dependency_names: list[str] = Field(default_factory=list) + theorem_statement_hash: str = "" + lean_code_hash: str = "" + canonical_uri: str = "" + relevance_reason: str = "" + transfer_hint: str = "" + has_hydrated_code: bool = False + lean_code: str = "" + + @classmethod + def from_record( + cls, + record: UnifiedProofSearchRecord, + *, + relevance_reason: str = "", + transfer_hint: str = "", + ) -> "AssistantProofSupport": + return cls( + search_id=record.search_id, + corpus=record.corpus, + corpus_scope=record.corpus_scope or record.release_id, + source_kind=record.source_kind, + proof_id=record.proof_id, + session_id=record.session_id, + fingerprint=record.external_fingerprint, + theorem_name=record.theorem_name or record.display_title, + theorem_statement=record.theorem_statement, + proof_description=record.proof_description or record.formal_sketch, + imports=list(record.imports or []), + dependency_names=list(record.dependency_names or []), + theorem_statement_hash=record.theorem_statement_hash, + lean_code_hash=record.lean_code_hash, + canonical_uri=record.canonical_uri, + relevance_reason=relevance_reason, + transfer_hint=transfer_hint, + has_hydrated_code=bool((record.lean_code or "").strip()), + lean_code=record.lean_code or "", + ) + + def metadata_only_dump(self) -> dict: + payload = self.model_dump(mode="json") + payload["lean_code"] = "" + payload["has_hydrated_code"] = bool(self.has_hydrated_code) + return payload + + +class AssistantProofPack(BaseModel): + """Latest non-blocking proof-support pack for one target snapshot.""" + + schema_version: str = "moto.assistant_proof_pack.v1" + created_at: str = Field(default_factory=_now_iso) + workflow_mode: AssistantWorkflowMode + target_kind: AssistantTargetKind + target_hash: str + query_summary: str = "" + freshness: AssistantFreshness = "fresh" + results: list[AssistantProofSupport] = Field(default_factory=list) + warnings: list[str] = Field(default_factory=list) + selection_mode: AssistantSelectionMode 
= "assistant_llm" + assistant_role_id: str = "" + assistant_model_id: str = "" + candidate_count: int = 0 + shortlist_count: int = 0 + selection_reasoning: str = "" + + def to_prompt_context(self, *, max_code_chars_per_result: int = 4000) -> str: + return self._to_prompt_context( + heading="ASSISTANT RETRIEVED PROOF SUPPORT", + max_code_chars_per_result=max_code_chars_per_result, + ) + + def to_memory_prompt_context(self, *, max_code_chars_per_result: int = 2500) -> str: + return self._to_prompt_context( + heading="ASSISTANT RETRIEVED MEMORY SUPPORT", + max_code_chars_per_result=max_code_chars_per_result, + ) + + def _to_prompt_context( + self, + *, + heading: str, + max_code_chars_per_result: int, + ) -> str: + if not self.results: + warning = " ".join(self.warnings).strip() + support_label = "memory" if "MEMORY" in heading else "proof" + return f"[Assistant {support_label} support unavailable. {warning}]".strip() + + lines = [ + heading, + f"Target hash: {self.target_hash}", + f"Freshness: {self.freshness}", + f"Selection mode: {self.selection_mode}", + f"Query summary: {self.query_summary or '[not provided]'}", + ( + "Use these verified memory records only as relevant mathematical context, " + "proof-pattern, dependency, or tactic guidance for the user's prompt/current target." + ), + ] + for index, support in enumerate(self.results[:7], start=1): + lean_code = support.lean_code or "" + if len(lean_code) > max_code_chars_per_result: + lean_code = ( + lean_code[:max_code_chars_per_result] + + "\n-- [assistant proof code truncated for prompt budget]" + ) + lines.extend( + [ + "", + f"{index}. 
{support.theorem_name or '[unnamed theorem]'}", + f"Source: {support.corpus} {support.corpus_scope}".strip(), + f"Source kind: {support.source_kind}", + f"Proof ID: {support.proof_id}", + f"Fingerprint: {support.fingerprint or '[none]'}", + f"Why relevant: {support.relevance_reason or '[ranked as relevant by Assistant retrieval]'}", + f"Transfer hint: {support.transfer_hint or '[inspect statement/dependencies for reusable proof shape]'}", + f"Statement: {support.theorem_statement}", + f"Description: {support.proof_description or '[none]'}", + f"Imports: {', '.join(support.imports) or '[none]'}", + f"Dependencies: {', '.join(support.dependency_names) or '[none]'}", + f"Theorem statement hash: {support.theorem_statement_hash or '[none]'}", + f"Lean code hash: {support.lean_code_hash or '[none]'}", + f"Canonical URI: {support.canonical_uri or '[none]'}", + "Lean code:", + lean_code or "[metadata-only support; use theorem/dependency shape only]", + ] + ) + if self.warnings: + lines.extend(["", "Assistant warnings:", *[f"- {warning}" for warning in self.warnings]]) + return "\n".join(lines) + + def metadata_only_dump(self) -> dict: + payload = self.model_dump(mode="json") + payload["results"] = [result.metadata_only_dump() for result in self.results] + return payload + diff --git a/backend/shared/proof_search/assistant_ranker.py b/backend/shared/proof_search/assistant_ranker.py new file mode 100644 index 0000000..0017d54 --- /dev/null +++ b/backend/shared/proof_search/assistant_ranker.py @@ -0,0 +1,331 @@ +"""Lightweight Assistant proof-support ranking and diversification.""" +from __future__ import annotations + +import math +import re +from collections import Counter +from dataclasses import dataclass + +from backend.shared.proof_search.assistant_cache import AssistantCandidateStats +from backend.shared.proof_search.assistant_models import ( + AssistantProofSupport, + AssistantTargetSnapshot, +) +from backend.shared.proof_search.models import UnifiedProofSearchRecord 
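The selection step in `select_assistant_proof_supports` (later in this file) picks supports greedily. Each candidate gets a P-UCB score: its clamped base score, plus an exploration bonus that decays with persisted visit counts, minus a failure penalty. The loop then subtracts `diversity_lambda` times the candidate's maximum similarity to supports already chosen, which is an MMR-style diversity step. The sketch below reproduces that loop on its own, under stated assumptions. `Candidate` and `select` are hypothetical names. Token-set Jaccard stands in for `_record_similarity`, which is not shown here. The failure penalty is fixed at zero, and duplicate-key filtering via `_dedupe_keys` is omitted.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for RankedProofCandidate."""
    search_id: str
    base_score: float
    tokens: frozenset


def jaccard(left: frozenset, right: frozenset) -> float:
    """Assumed similarity measure between two candidates (not the module's own)."""
    if not left or not right:
        return 0.0
    return len(left & right) / len(left | right)


def pucb(base_score, visits, total_visits, exploration_c=0.2, failure_penalty=0.0):
    # Clamped quality plus an exploration bonus that shrinks as a candidate's visits grow.
    quality = min(1.0, max(0.0, base_score))
    exploration = exploration_c * math.sqrt(max(1, total_visits)) / (max(0, visits) + 1)
    return quality + exploration - max(0.0, failure_penalty)


def select(candidates, visits, limit=7, diversity_lambda=0.25):
    """Greedy P-UCB + MMR selection; visits maps search_id -> persisted visit count."""
    remaining = list(candidates)
    total_visits = sum(max(0, v) for v in visits.values()) + 1
    chosen: list[Candidate] = []
    while remaining and len(chosen) < max(1, limit):
        def mmr_score(cand: Candidate) -> float:
            # Penalize similarity to the closest already-chosen candidate.
            similarity = max((jaccard(cand.tokens, c.tokens) for c in chosen), default=0.0)
            exploit_explore = pucb(cand.base_score, visits.get(cand.search_id, 0), total_visits)
            return exploit_explore - diversity_lambda * similarity

        best = max(remaining, key=mmr_score)
        remaining.remove(best)
        chosen.append(best)
    return [cand.search_id for cand in chosen]


a = Candidate("a", 0.80, frozenset({"nat", "add_comm"}))
b = Candidate("b", 0.78, frozenset({"nat", "add_comm"}))  # near-duplicate of a
c = Candidate("c", 0.60, frozenset({"list", "reverse"}))
print(select([a, b, c], {}, limit=2))  # ['a', 'c']
```

With no visit history every candidate gets the same 0.2 exploration bonus. The near-duplicate `b` therefore loses its 0.18 lead over `c` once the 0.25 similarity penalty applies. Giving `a` three prior visits shrinks its bonus enough that `b` is picked first.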
+ +_TOKEN_RE = re.compile(r"[A-Za-z0-9_'.]+") +_CORPUS_TRUST = { + "moto": 1.0, + "manual": 0.95, + "leanoj": 0.95, + "syntheticlib4": 0.9, +} + + +@dataclass(frozen=True) +class RankedProofCandidate: + record: UnifiedProofSearchRecord + score: float + lexical_score: float + dependency_score: float + import_score: float + exact_score: float + trust_score: float + recency_score: float + relevance_reason: str + transfer_hint: str + + +def rank_assistant_proof_candidates( + records: list[UnifiedProofSearchRecord], + target: AssistantTargetSnapshot, + *, + limit: int = 7, + diversity_lambda: float = 0.25, + candidate_stats: dict[str, AssistantCandidateStats] | None = None, +) -> list[AssistantProofSupport]: + """Rank candidates with cheap scoring, persisted P-UCB state, and MMR diversity.""" + ranked = score_assistant_proof_candidates(records, target) + return select_assistant_proof_supports( + ranked, + limit=limit, + diversity_lambda=diversity_lambda, + candidate_stats=candidate_stats, + ) + + +def score_assistant_proof_candidates( + records: list[UnifiedProofSearchRecord], + target: AssistantTargetSnapshot, +) -> list[RankedProofCandidate]: + """Score and sort verified proof candidates before persistence/selection.""" + return _rank_candidates(records, target) + + +def select_assistant_proof_supports( + ranked_candidates: list[RankedProofCandidate], + *, + limit: int = 7, + diversity_lambda: float = 0.25, + candidate_stats: dict[str, AssistantCandidateStats] | None = None, + exploration_c: float = 0.2, +) -> list[AssistantProofSupport]: + """Select final supports using P-UCB and MMR-style diversity.""" + ranked = list(ranked_candidates) + stats = candidate_stats or {} + total_visits = sum(max(0, item.visits) for item in stats.values()) + 1 + selected: list[RankedProofCandidate] = [] + used_keys: set[str] = set() + + while ranked and len(selected) < max(1, limit): + best_index = 0 + best_score = -float("inf") + for index, candidate in enumerate(ranked): + keys = 
_dedupe_keys(candidate.record) + if keys and used_keys.intersection(keys): + continue + candidate_stats_entry = stats.get(candidate.record.search_id, AssistantCandidateStats()) + pucb_score = _pucb_score( + base_score=candidate.score, + visits=candidate_stats_entry.visits, + total_visits=total_visits, + exploration_c=exploration_c, + failure_penalty=candidate_stats_entry.failure_penalty, + ) + similarity = max( + (_record_similarity(candidate.record, chosen.record) for chosen in selected), + default=0.0, + ) + final_score = pucb_score - diversity_lambda * similarity + if final_score > best_score: + best_index = index + best_score = final_score + + candidate = ranked.pop(best_index) + keys = _dedupe_keys(candidate.record) + if keys and used_keys.intersection(keys): + continue + used_keys.update(keys) + selected.append(candidate) + + return [ + AssistantProofSupport.from_record( + candidate.record, + relevance_reason=candidate.relevance_reason, + transfer_hint=candidate.transfer_hint, + ) + for candidate in selected + ] + + +def ranked_candidates_to_cache_rows( + ranked_candidates: list[RankedProofCandidate], +) -> list[dict[str, object]]: + """Convert ranked candidates into SQLite payload rows without full Lean code.""" + rows: list[dict[str, object]] = [] + for candidate in ranked_candidates: + record = candidate.record + rows.append( + { + "search_id": record.search_id, + "proof_source": record.corpus, + "proof_id": record.proof_id, + "theorem_statement_hash": record.theorem_statement_hash, + "lean_code_hash": record.lean_code_hash, + "query_variant": "", + "retrieval_score": candidate.score, + "exact_match_score": candidate.exact_score, + "semantic_score": candidate.lexical_score, + "dependency_overlap_score": max( + candidate.dependency_score, + candidate.import_score, + ), + "corpus_trust_score": candidate.trust_score, + "recency_score": candidate.recency_score, + "duplicate_group": "|".join(sorted(_dedupe_keys(record))), + } + ) + return rows + + +def 
_rank_candidates( + records: list[UnifiedProofSearchRecord], + target: AssistantTargetSnapshot, +) -> list[RankedProofCandidate]: + target_tokens = _tokens(target.search_text()) + dependency_targets = {value.lower() for value in target.dependency_names if value} + import_targets = {value.lower() for value in target.imports if value} + ranked: list[RankedProofCandidate] = [] + + for record in records: + if not record.verified or record.source_kind != "verified_proof": + continue + record_text = "\n".join( + [ + record.theorem_name, + record.theorem_statement, + record.informal_statement, + record.proof_description, + record.formal_sketch, + record.source_title, + " ".join(record.dependency_names), + " ".join(record.imports), + " ".join(record.topic_tags), + " ".join(record.domain_tags), + ] + ) + record_tokens = _tokens(record_text) + lexical_score = _jaccard(target_tokens, record_tokens) + dependency_score = _overlap_score(dependency_targets, {value.lower() for value in record.dependency_names}) + import_score = _overlap_score(import_targets, {value.lower() for value in record.imports}) + exact_score = _exact_score(target, record) + trust_score = _CORPUS_TRUST.get(record.corpus, 0.75) + recency_score = _recency_score(record.created_at) + score = ( + 0.30 * lexical_score + + 0.20 * max(dependency_score, import_score) + + 0.20 * exact_score + + 0.15 * trust_score + + 0.10 * lexical_score + + 0.05 * recency_score + ) + ranked.append( + RankedProofCandidate( + record=record, + score=score, + lexical_score=lexical_score, + dependency_score=dependency_score, + import_score=import_score, + exact_score=exact_score, + trust_score=trust_score, + recency_score=recency_score, + relevance_reason=_relevance_reason( + lexical_score=lexical_score, + dependency_score=dependency_score, + import_score=import_score, + exact_score=exact_score, + ), + transfer_hint=_transfer_hint(record), + ) + ) + + ranked.sort(key=lambda candidate: candidate.score, reverse=True) + return ranked + + 
+def _tokens(value: str) -> set[str]: + tokens = { + match.group(0).lower() + for match in _TOKEN_RE.finditer(value or "") + if len(match.group(0)) > 2 + } + return tokens + + +def _jaccard(left: set[str], right: set[str]) -> float: + if not left or not right: + return 0.0 + return len(left.intersection(right)) / max(1, len(left.union(right))) + + +def _overlap_score(left: set[str], right: set[str]) -> float: + if not left or not right: + return 0.0 + return len(left.intersection(right)) / max(1, len(left)) + + +def _exact_score(target: AssistantTargetSnapshot, record: UnifiedProofSearchRecord) -> float: + haystack = " ".join(" ".join( + [ + record.theorem_name, + record.theorem_statement, + record.display_title, + record.module, + record.source_path, + ] + ).lower().split()) + exact_parts = [ + target.target_statement, + target.lean_template, + target.source_title, + ] + for part in exact_parts: + normalized = " ".join((part or "").lower().split()) + if normalized and len(normalized) > 24 and normalized in haystack: + return 1.0 + target_names = Counter(_tokens(" ".join(exact_parts))) + if not target_names: + return 0.0 + matched = sum(count for token, count in target_names.items() if token in haystack) + return min(1.0, matched / max(1, sum(target_names.values()))) + + +def _pucb_score( + *, + base_score: float, + visits: int, + total_visits: int, + exploration_c: float, + failure_penalty: float, +) -> float: + quality = _quality_score(base_score=base_score) + exploration = exploration_c * math.sqrt(max(1, total_visits)) / (max(0, visits) + 1) + return quality + exploration - max(0.0, failure_penalty) + + +def _quality_score(*, base_score: float) -> float: + return min(1.0, max(0.0, base_score)) + + +def _recency_score(created_at: str) -> float: + if not created_at: + return 0.0 + # Timestamps are not compared yet: every dated record gets the same flat bonus.
+ return 0.5 + + +def _dedupe_keys(record: UnifiedProofSearchRecord) -> set[str]: + return set(record.dedupe_keys() or {f"{record.corpus}:{record.search_id}"}) + + +def _record_similarity(left: UnifiedProofSearchRecord, right: UnifiedProofSearchRecord) -> float: + if left.theorem_statement_hash and left.theorem_statement_hash == right.theorem_statement_hash: + return 1.0 + left_tokens = _tokens( + " ".join([left.theorem_statement, left.theorem_name, " ".join(left.dependency_names)]) + ) + right_tokens = _tokens( + " ".join([right.theorem_statement, right.theorem_name, " ".join(right.dependency_names)]) + ) + return _jaccard(left_tokens, right_tokens) + + +def _relevance_reason( + *, + lexical_score: float, + dependency_score: float, + import_score: float, + exact_score: float, +) -> str: + reasons: list[str] = [] + if exact_score >= 0.5: + reasons.append("strong theorem/statement term overlap") + if dependency_score > 0: + reasons.append("shares requested dependencies") + if import_score > 0: + reasons.append("shares imports") + if lexical_score > 0: + reasons.append("lexically similar to the active target") + return "; ".join(reasons) or "selected as a diversified verified proof support" + + +def _transfer_hint(record: UnifiedProofSearchRecord) -> str: + deps = ", ".join(record.dependency_names[:5]) + imports = ", ".join(record.imports[:5]) + if deps: + return f"Check dependency/tactic transfer around: {deps}." + if imports: + return f"Check reusable Mathlib/import context around: {imports}." + if record.proof_description: + return "Use the proof description to identify reusable decomposition or tactic structure." + return "Compare theorem statement shape and Lean code for reusable proof patterns." 
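The selection loop in `select_assistant_proof_supports` combines a clamped P-UCB value with an MMR-style diversity penalty, and `_rank_candidates` produces the base score from fixed weights. The standalone sketch below mirrors that arithmetic under stated simplifications: `Candidate`, `base_score`, `mmr_value`, and `select` are hypothetical names, each record is reduced to a key, a base score, and a token set, and dedupe-key suppression and failure penalties are omitted. It is an illustration of the scoring behavior, not the module's API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for RankedProofCandidate."""

    key: str            # plays the role of record.search_id
    score: float        # weighted base score, expected in [0, 1]
    tokens: frozenset   # tokens used for pairwise similarity


def base_score(lexical, dependency, imports, exact, trust, recency):
    # Same weights as _rank_candidates; lexical appears twice (0.30 + 0.10).
    return (
        0.30 * lexical
        + 0.20 * max(dependency, imports)
        + 0.20 * exact
        + 0.15 * trust
        + 0.10 * lexical
        + 0.05 * recency
    )


def jaccard(left, right):
    if not left or not right:
        return 0.0
    return len(left & right) / len(left | right)


def pucb(base, visits, total_visits, c=0.2, penalty=0.0):
    # Clamped quality plus an exploration bonus that shrinks as visits grow.
    quality = min(1.0, max(0.0, base))
    exploration = c * math.sqrt(max(1, total_visits)) / (max(0, visits) + 1)
    return quality + exploration - max(0.0, penalty)


def mmr_value(cand, chosen, visits, total_visits, diversity_lambda, c):
    # Penalize similarity to the most similar already-selected candidate.
    similarity = max((jaccard(cand.tokens, prev.tokens) for prev in chosen), default=0.0)
    return pucb(cand.score, visits.get(cand.key, 0), total_visits, c) - diversity_lambda * similarity


def select(candidates, visits=None, limit=7, diversity_lambda=0.25, c=0.2):
    visits = visits or {}
    total_visits = sum(max(0, v) for v in visits.values()) + 1
    remaining = list(candidates)
    chosen = []
    while remaining and len(chosen) < max(1, limit):
        # max() keeps the first candidate on ties, like the strict ">" in the original loop.
        best = max(
            remaining,
            key=lambda cand: mmr_value(cand, chosen, visits, total_visits, diversity_lambda, c),
        )
        remaining.remove(best)
        chosen.append(best)
    return [cand.key for cand in chosen]


pool = [
    Candidate("a", 0.90, frozenset({"finset", "sum", "nat"})),
    Candidate("b", 0.88, frozenset({"finset", "sum", "nat"})),
    Candidate("c", 0.70, frozenset({"list", "map"})),
]
print(select(pool, limit=2))                    # ['a', 'c']
print(select(pool, visits={"a": 10}, limit=2))  # ['b', 'c']
```

In the first call the near-duplicate `b` loses to the dissimilar `c` once `a` is selected, because its similarity penalty (0.25) outweighs its higher base score. In the second call, ten recorded visits shrink `a`'s exploration bonus enough that the unvisited `b` is picked first.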
diff --git a/backend/shared/proof_search/indexer.py b/backend/shared/proof_search/indexer.py new file mode 100644 index 0000000..71d54a2 --- /dev/null +++ b/backend/shared/proof_search/indexer.py @@ -0,0 +1,600 @@ +"""SQLite/FTS index for unified proof search.""" +from __future__ import annotations + +import json +import re +import sqlite3 +from collections import Counter +from contextlib import closing +from pathlib import Path +from typing import Any, Iterable + +from backend.shared.proof_search.models import ( + CorpusOverview, + ProofSearchRequest, + ProofSearchResponse, + UnifiedProofSearchRecord, +) + +RESULT_CAP = 7 + + +class ProofSearchIndexer: + """Small local SQLite FTS index for proof metadata and bounded retrieval.""" + + def __init__(self, db_path: Path) -> None: + self.db_path = Path(db_path) + + def rebuild(self, records: Iterable[UnifiedProofSearchRecord]) -> None: + self.db_path.parent.mkdir(parents=True, exist_ok=True) + unique_records = {record.search_id: record for record in records} + with closing(self._connect()) as conn: + self._create_schema(conn) + conn.execute("DELETE FROM proof_fts") + conn.execute("DELETE FROM proof_records") + conn.executemany( + _PROOF_RECORD_INSERT_SQL, + (self._record_payload(record) for record in unique_records.values()), + ) + conn.executemany( + _PROOF_FTS_INSERT_SQL, + (self._fts_payload(record) for record in unique_records.values()), + ) + conn.commit() + + def search(self, request: ProofSearchRequest) -> ProofSearchResponse: + if not self.db_path.exists(): + return ProofSearchResponse( + results=[], + result_count=0, + weak_result_warning="Proof-search index is not built yet.", + ) + + with closing(self._connect()) as conn: + self._create_schema(conn) + candidates = self._query_candidates(conn, request) + records = self._dedupe_and_limit(candidates, request, result_cap=RESULT_CAP) + corpus_counts = self._corpus_counts(conn, request.corpora) + + if not request.hydrate_lean_code: + records = 
[record.model_copy(update={"lean_code": ""}) for record in records] + + warning = None + if not records: + warning = ( + "No proof records matched this query. Try a theorem name, dependency, " + "module, import, or a broader goal statement." + ) + + return ProofSearchResponse( + results=records, + result_count=len(records), + next_cursor=None, + searched_corpora=sorted(set(request.corpora)), + corpus_counts=corpus_counts, + ranking_notes=( + "Ranked with SQLite FTS lexical matching, exact import/dependency boosts, " + "and duplicate suppression by fingerprint/local ID/hash." + ), + weak_result_warning=warning, + ) + + def search_candidate_pool( + self, + request: ProofSearchRequest, + *, + pool_limit: int, + exclude_corpus_scopes: Iterable[str] | None = None, + exclude_session_ids: Iterable[str] | None = None, + ) -> list[UnifiedProofSearchRecord]: + """Return a wider internal candidate pool without changing public route caps.""" + if not self.db_path.exists(): + return [] + + with closing(self._connect()) as conn: + self._create_schema(conn) + candidates = self._query_candidates( + conn, + request, + candidate_limit=pool_limit * 4, + exclude_corpus_scopes=exclude_corpus_scopes, + exclude_session_ids=exclude_session_ids, + ) + records = self._dedupe_and_limit(candidates, request, result_cap=pool_limit) + + if not request.hydrate_lean_code: + records = [record.model_copy(update={"lean_code": ""}) for record in records] + return records + + def get_record( + self, + *, + corpus: str, + proof_id: str, + session_id: str | None = None, + ) -> UnifiedProofSearchRecord | None: + """Fetch one indexed record for detail/hydration flows.""" + if not self.db_path.exists(): + return None + + with closing(self._connect()) as conn: + self._create_schema(conn) + clauses = ["corpus = ?", "(proof_id = ? 
OR search_id = ?)"] + params: list[Any] = [corpus, proof_id, proof_id] + if session_id: + clauses.append("session_id = ?") + params.append(session_id) + row = conn.execute( + f""" + SELECT * + FROM proof_records + WHERE {' AND '.join(clauses)} + ORDER BY + CASE corpus_scope + WHEN 'active' THEN 0 + WHEN 'current' THEN 1 + ELSE 2 + END, + created_at DESC + LIMIT 1 + """, + params, + ).fetchone() + + return self._row_to_record(row) if row else None + + def overview(self, corpora: Iterable[str] | None = None) -> CorpusOverview: + if not self.db_path.exists(): + return CorpusOverview( + total_records=0, + verified_records=0, + partial_records=0, + failed_attempt_records=0, + corpora=[], + search_fields=_SEARCH_FIELDS, + recommended_queries=_RECOMMENDED_QUERIES, + ) + + with closing(self._connect()) as conn: + self._create_schema(conn) + if corpora is None: + rows = conn.execute("SELECT * FROM proof_records").fetchall() + else: + corpus_list = list(corpora) + if corpus_list: + placeholders = ",".join("?" 
for _ in corpus_list) + rows = conn.execute( + f"SELECT * FROM proof_records WHERE corpus IN ({placeholders})", + corpus_list, + ).fetchall() + else: + rows = [] + + records = [self._row_to_record(row) for row in rows] + corpus_counts: Counter[str] = Counter(record.corpus for record in records) + source_counts: Counter[str] = Counter(record.source_kind for record in records) + novelty_counts: Counter[str] = Counter( + record.novelty_tier or "unknown" for record in records + ) + module_counts: Counter[str] = Counter( + value for record in records for value in [record.module or record.source_path] if value + ) + import_counts: Counter[str] = Counter( + value for record in records for value in record.imports if value + ) + dependency_counts: Counter[str] = Counter( + value for record in records for value in record.dependency_names if value + ) + tag_counts: Counter[str] = Counter( + value for record in records for value in [*record.topic_tags, *record.domain_tags] if value + ) + + corpora = [ + { + "id": corpus, + "count": count, + "freshness": self._freshness_for_corpus(corpus, records), + "description": _CORPUS_DESCRIPTIONS.get(corpus, "Proof records"), + } + for corpus, count in sorted(corpus_counts.items()) + ] + + return CorpusOverview( + total_records=len(records), + verified_records=source_counts.get("verified_proof", 0), + partial_records=source_counts.get("partial_proof", 0), + failed_attempt_records=source_counts.get("failed_attempt", 0), + corpora=corpora, + novelty_distribution=dict(novelty_counts), + top_modules=_top_counts(module_counts), + top_imports=_top_counts(import_counts), + top_dependencies=_top_counts(dependency_counts), + top_tags=_top_counts(tag_counts), + search_fields=_SEARCH_FIELDS, + recommended_queries=_RECOMMENDED_QUERIES, + ) + + def _connect(self) -> sqlite3.Connection: + conn = sqlite3.connect(str(self.db_path)) + conn.row_factory = sqlite3.Row + return conn + + def _create_schema(self, conn: sqlite3.Connection) -> None: + 
conn.execute( + """ + CREATE TABLE IF NOT EXISTS proof_records ( + search_id TEXT PRIMARY KEY, + corpus TEXT NOT NULL, + corpus_scope TEXT NOT NULL, + source_kind TEXT NOT NULL, + proof_id TEXT NOT NULL, + external_fingerprint TEXT NOT NULL, + release_id TEXT NOT NULL, + session_id TEXT NOT NULL, + source_type TEXT NOT NULL, + source_id TEXT NOT NULL, + source_title TEXT NOT NULL, + display_title TEXT NOT NULL, + theorem_name TEXT NOT NULL, + theorem_statement TEXT NOT NULL, + informal_statement TEXT NOT NULL, + proof_description TEXT NOT NULL, + formal_sketch TEXT NOT NULL, + lean_code TEXT NOT NULL, + lean_code_hash TEXT NOT NULL, + theorem_statement_hash TEXT NOT NULL, + imports_json TEXT NOT NULL, + dependency_names_json TEXT NOT NULL, + topic_tags_json TEXT NOT NULL, + domain_tags_json TEXT NOT NULL, + module TEXT NOT NULL, + source_path TEXT NOT NULL, + novelty_tier TEXT NOT NULL, + novelty_reasoning TEXT NOT NULL, + verified INTEGER NOT NULL, + created_at TEXT NOT NULL, + canonical_uri TEXT NOT NULL, + metadata_json TEXT NOT NULL + ) + """ + ) + conn.execute( + """ + CREATE VIRTUAL TABLE IF NOT EXISTS proof_fts USING fts5( + search_id UNINDEXED, + theorem_name, + theorem_statement, + informal_statement, + proof_description, + formal_sketch, + source_title, + module, + source_path, + novelty_reasoning, + dependencies, + imports, + tags, + display_title + ) + """ + ) + + def _record_payload(self, record: UnifiedProofSearchRecord) -> dict[str, Any]: + payload = record.model_dump(mode="json") + return { + **payload, + "imports_json": json.dumps(record.imports), + "dependency_names_json": json.dumps(record.dependency_names), + "topic_tags_json": json.dumps(record.topic_tags), + "domain_tags_json": json.dumps(record.domain_tags), + "metadata_json": json.dumps(record.metadata), + "verified": 1 if record.verified else 0, + } + + def _fts_payload(self, record: UnifiedProofSearchRecord) -> tuple[str, ...]: + return ( + record.search_id, + record.theorem_name, + 
record.theorem_statement, + record.informal_statement, + record.proof_description, + record.formal_sketch, + record.source_title, + record.module, + record.source_path, + record.novelty_reasoning, + " ".join(record.dependency_names), + " ".join(record.imports), + " ".join([*record.topic_tags, *record.domain_tags]), + record.display_title, + ) + + def _upsert_record(self, conn: sqlite3.Connection, record: UnifiedProofSearchRecord) -> None: + conn.execute( + _PROOF_RECORD_INSERT_SQL, + self._record_payload(record), + ) + conn.execute("DELETE FROM proof_fts WHERE search_id = ?", (record.search_id,)) + conn.execute( + _PROOF_FTS_INSERT_SQL, + self._fts_payload(record), + ) + + def _query_candidates( + self, + conn: sqlite3.Connection, + request: ProofSearchRequest, + *, + candidate_limit: int | None = None, + exclude_corpus_scopes: Iterable[str] | None = None, + exclude_session_ids: Iterable[str] | None = None, + ) -> list[tuple[UnifiedProofSearchRecord, float]]: + clauses = [] + params: list[Any] = [] + + if request.corpora: + placeholders = ",".join("?" for _ in request.corpora) + clauses.append(f"r.corpus IN ({placeholders})") + params.extend(request.corpora) + if request.verified_only: + clauses.append("r.verified = 1") + if not request.include_partial: + clauses.append("r.source_kind != 'partial_proof'") + if not request.include_failed: + clauses.append("r.source_kind != 'failed_attempt'") + if request.novelty_filters: + placeholders = ",".join("?" for _ in request.novelty_filters) + clauses.append(f"r.novelty_tier IN ({placeholders})") + params.extend(request.novelty_filters) + if request.module_filters: + module_clauses = [] + for module in request.module_filters: + module_clauses.append("(r.module LIKE ? OR r.source_path LIKE ?)") + params.extend([f"%{module}%", f"%{module}%"]) + clauses.append(f"({' OR '.join(module_clauses)})") + if request.source_filters: + placeholders = ",".join("?" 
for _ in request.source_filters) + clauses.append(f"r.source_type IN ({placeholders})") + params.extend(request.source_filters) + if request.exclude_ids: + placeholders = ",".join("?" for _ in request.exclude_ids) + clauses.append(f"r.search_id NOT IN ({placeholders})") + params.extend(request.exclude_ids) + excluded_scopes = [scope for scope in (exclude_corpus_scopes or []) if scope] + if excluded_scopes: + placeholders = ",".join("?" for _ in excluded_scopes) + clauses.append(f"(r.corpus = 'syntheticlib4' OR r.corpus_scope NOT IN ({placeholders}))") + params.extend(excluded_scopes) + excluded_sessions = [session_id for session_id in (exclude_session_ids or []) if session_id] + if excluded_sessions: + placeholders = ",".join("?" for _ in excluded_sessions) + clauses.append(f"r.session_id NOT IN ({placeholders})") + params.extend(excluded_sessions) + + where_sql = f"WHERE {' AND '.join(clauses)}" if clauses else "" + fts_query = _build_fts_query( + " ".join( + [ + request.query, + request.goal_statement, + " ".join(request.imports), + " ".join(request.dependency_names), + ] + ) + ) + limit = candidate_limit or max(RESULT_CAP * 4, min(request.limit, RESULT_CAP) * 4) + + if fts_query: + sql = f""" + SELECT r.*, bm25(proof_fts) AS rank + FROM proof_fts + JOIN proof_records r ON r.search_id = proof_fts.search_id + {where_sql} {'AND' if where_sql else 'WHERE'} proof_fts MATCH ? + ORDER BY rank ASC + LIMIT ? + """ + rows = conn.execute(sql, [*params, fts_query, limit]).fetchall() + else: + sql = f"SELECT r.*, 0.0 AS rank FROM proof_records r {where_sql} LIMIT ?" 
+ rows = conn.execute(sql, [*params, limit]).fetchall() + + candidates: list[tuple[UnifiedProofSearchRecord, float]] = [] + for row in rows: + record = self._row_to_record(row) + score = -float(row["rank"] or 0.0) + score += self._exact_boost(record, request) + candidates.append((record, score)) + + candidates.sort(key=lambda item: item[1], reverse=True) + return candidates + + def _dedupe_and_limit( + self, + candidates: list[tuple[UnifiedProofSearchRecord, float]], + request: ProofSearchRequest, + *, + result_cap: int, + ) -> list[UnifiedProofSearchRecord]: + result_limit = min(request.limit, result_cap) + used_keys: set[str] = set() + records: list[UnifiedProofSearchRecord] = [] + + for record, _score in candidates: + keys = record.dedupe_keys() + if keys and any(key in used_keys for key in keys): + continue + used_keys.update(keys) + records.append(record) + if len(records) >= result_limit: + break + + return records + + def _exact_boost(self, record: UnifiedProofSearchRecord, request: ProofSearchRequest) -> float: + boost = 0.0 + imports = {value.lower() for value in request.imports} + dependencies = {value.lower() for value in request.dependency_names} + if imports.intersection(value.lower() for value in record.imports): + boost += 3.0 + if dependencies.intersection(value.lower() for value in record.dependency_names): + boost += 3.0 + query = (request.query or "").strip().lower() + if query and query in record.theorem_name.lower(): + boost += 4.0 + if record.verified: + boost += 1.0 + return boost + + def _row_to_record(self, row: sqlite3.Row) -> UnifiedProofSearchRecord: + def _json_list(field: str) -> list[str]: + try: + value = json.loads(row[field] or "[]") + return [str(item) for item in value if str(item).strip()] + except json.JSONDecodeError: + return [] + + try: + metadata = json.loads(row["metadata_json"] or "{}") + except json.JSONDecodeError: + metadata = {} + + return UnifiedProofSearchRecord( + search_id=row["search_id"], + corpus=row["corpus"], 
+ corpus_scope=row["corpus_scope"], + source_kind=row["source_kind"], + proof_id=row["proof_id"], + external_fingerprint=row["external_fingerprint"], + release_id=row["release_id"], + session_id=row["session_id"], + source_type=row["source_type"], + source_id=row["source_id"], + source_title=row["source_title"], + display_title=row["display_title"], + theorem_name=row["theorem_name"], + theorem_statement=row["theorem_statement"], + informal_statement=row["informal_statement"], + proof_description=row["proof_description"], + formal_sketch=row["formal_sketch"], + lean_code=row["lean_code"], + lean_code_hash=row["lean_code_hash"], + theorem_statement_hash=row["theorem_statement_hash"], + imports=_json_list("imports_json"), + dependency_names=_json_list("dependency_names_json"), + topic_tags=_json_list("topic_tags_json"), + domain_tags=_json_list("domain_tags_json"), + module=row["module"], + source_path=row["source_path"], + novelty_tier=row["novelty_tier"], + novelty_reasoning=row["novelty_reasoning"], + verified=bool(row["verified"]), + created_at=row["created_at"], + canonical_uri=row["canonical_uri"], + metadata=metadata, + ) + + def _corpus_counts(self, conn: sqlite3.Connection, corpora: Iterable[str] | None = None) -> dict[str, int]: + corpus_list = list(corpora or []) + if corpus_list: + placeholders = ",".join("?" 
for _ in corpus_list) + rows = conn.execute( + f""" + SELECT corpus, COUNT(*) AS count + FROM proof_records + WHERE corpus IN ({placeholders}) + GROUP BY corpus + """, + corpus_list, + ).fetchall() + elif corpora is not None: + rows = [] + else: + rows = conn.execute( + "SELECT corpus, COUNT(*) AS count FROM proof_records GROUP BY corpus" + ).fetchall() + return {str(row["corpus"]): int(row["count"]) for row in rows} + + def _freshness_for_corpus( + self, + corpus: str, + records: list[UnifiedProofSearchRecord], + ) -> str: + if corpus == "syntheticlib4": + release_ids = sorted({record.release_id for record in records if record.release_id}) + return f"release {release_ids[-1]}" if release_ids else "local fixture" + return "current" + + +def _build_fts_query(raw_query: str) -> str: + terms = [term for term in re.findall(r"[A-Za-z0-9_'.]+", raw_query or "") if len(term) > 1] + unique_terms = [] + for term in terms: + cleaned = term.replace("'", "''") + if cleaned not in unique_terms: + unique_terms.append(cleaned) + return " OR ".join(f'"{term}"' for term in unique_terms[:16]) + + +def _top_counts(counter: Counter[str], limit: int = 10) -> list[dict[str, Any]]: + return [{"value": value, "count": count} for value, count in counter.most_common(limit)] + + +_PROOF_RECORD_INSERT_SQL = """ + INSERT OR REPLACE INTO proof_records ( + search_id, corpus, corpus_scope, source_kind, proof_id, + external_fingerprint, release_id, session_id, source_type, source_id, + source_title, display_title, theorem_name, theorem_statement, + informal_statement, proof_description, formal_sketch, lean_code, + lean_code_hash, theorem_statement_hash, imports_json, + dependency_names_json, topic_tags_json, domain_tags_json, module, + source_path, novelty_tier, novelty_reasoning, verified, created_at, + canonical_uri, metadata_json + ) VALUES ( + :search_id, :corpus, :corpus_scope, :source_kind, :proof_id, + :external_fingerprint, :release_id, :session_id, :source_type, :source_id, + 
:source_title, :display_title, :theorem_name, :theorem_statement, + :informal_statement, :proof_description, :formal_sketch, :lean_code, + :lean_code_hash, :theorem_statement_hash, :imports_json, + :dependency_names_json, :topic_tags_json, :domain_tags_json, :module, + :source_path, :novelty_tier, :novelty_reasoning, :verified, :created_at, + :canonical_uri, :metadata_json + ) +""" + +_PROOF_FTS_INSERT_SQL = """ + INSERT INTO proof_fts ( + search_id, theorem_name, theorem_statement, informal_statement, + proof_description, formal_sketch, source_title, module, source_path, + novelty_reasoning, dependencies, imports, tags, display_title + ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) +""" + + +_CORPUS_DESCRIPTIONS = { + "moto": "Autonomous MOTO proof records from canonical ProofDatabase stores", + "manual": "Active and archived manual Aggregator/Compiler proof records", + "leanoj": "LeanOJ verified proof records registered into MOTO proof storage", + "syntheticlib4": "Authorized local SyntheticLib4 snapshot or offline fixtures", +} + +_SEARCH_FIELDS = [ + "theorem_name", + "theorem_statement", + "informal_statement", + "proof_description", + "formal_sketch", + "source_title", + "module", + "source_path", + "imports", + "dependency_names", + "topic_tags", + "domain_tags", +] + +_RECOMMENDED_QUERIES = [ + "Search by theorem goal: finite sum cancellation over Nat", + "Search by dependency: Finset.sum_congr Nat.add_comm", + "Search by module or source path: SyntheticLib4.Finset", +] + diff --git a/backend/shared/proof_search/models.py b/backend/shared/proof_search/models.py new file mode 100644 index 0000000..3e24c94 --- /dev/null +++ b/backend/shared/proof_search/models.py @@ -0,0 +1,128 @@ +"""Normalized models for unified proof search.""" +from __future__ import annotations + +from typing import Any, Literal + +from pydantic import BaseModel, Field + +from backend.shared.config import system_config + + +ProofSearchCorpus = Literal["moto", "manual", "leanoj", 
"syntheticlib4"] +ProofSearchSourceKind = Literal["verified_proof", "partial_proof", "failed_attempt"] + + +def default_proof_search_corpora() -> list[ProofSearchCorpus]: + """Return the enabled default proof-search corpora for AI retrieval.""" + corpora: list[ProofSearchCorpus] = [] + if system_config.agent_conversation_memory_enabled: + corpora.extend(["moto", "manual", "leanoj"]) + if system_config.syntheticlib4_enabled: + corpora.append("syntheticlib4") + return corpora + + +class UnifiedProofSearchRecord(BaseModel): + """One proof-like record normalized for local search and AI retrieval.""" + + search_id: str + corpus: ProofSearchCorpus + corpus_scope: str = "" + source_kind: ProofSearchSourceKind = "verified_proof" + proof_id: str + external_fingerprint: str = "" + release_id: str = "" + session_id: str = "" + source_type: str = "" + source_id: str = "" + source_title: str = "" + display_title: str = "" + theorem_name: str = "" + theorem_statement: str + informal_statement: str = "" + proof_description: str = "" + formal_sketch: str = "" + lean_code: str = "" + lean_code_hash: str = "" + theorem_statement_hash: str = "" + imports: list[str] = Field(default_factory=list) + dependency_names: list[str] = Field(default_factory=list) + topic_tags: list[str] = Field(default_factory=list) + domain_tags: list[str] = Field(default_factory=list) + module: str = "" + source_path: str = "" + novelty_tier: str = "" + novelty_reasoning: str = "" + verified: bool = True + created_at: str = "" + canonical_uri: str + metadata: dict[str, Any] = Field(default_factory=dict) + + def dedupe_keys(self) -> list[str]: + """Return strong identity keys for exact duplicate suppression.""" + keys: list[str] = [] + if self.external_fingerprint: + keys.append(f"fingerprint:{self.external_fingerprint}") + if self.corpus != "syntheticlib4" and self.proof_id: + keys.append(f"local:{self.corpus}:{self.proof_id}") + if self.theorem_statement_hash and self.lean_code_hash: + 
keys.append(f"hash:{self.theorem_statement_hash}:{self.lean_code_hash}") + return keys + + +class ProofSearchRequest(BaseModel): + """Backend search request shared by routes and the future AI-facing tool.""" + + query: str = "" + goal_statement: str = "" + imports: list[str] = Field(default_factory=list) + dependency_names: list[str] = Field(default_factory=list) + corpora: list[ProofSearchCorpus] = Field(default_factory=default_proof_search_corpora) + verified_only: bool = True + include_partial: bool = False + include_failed: bool = False + novelty_filters: list[str] = Field(default_factory=list) + module_filters: list[str] = Field(default_factory=list) + source_filters: list[str] = Field(default_factory=list) + limit: int = Field(default=7, ge=1) + cursor: str | None = None + exclude_ids: list[str] = Field(default_factory=list) + hydrate_lean_code: bool = True + search_mode: Literal["text", "exact", "hybrid"] = "hybrid" + + +class PublicProofSearchRequest(ProofSearchRequest): + """Public REST proof-search request capped by the web contract.""" + + limit: int = Field(default=7, ge=1, le=7) + + +class ProofSearchResponse(BaseModel): + results: list[UnifiedProofSearchRecord] + result_count: int + next_cursor: str | None = None + searched_corpora: list[str] = Field(default_factory=list) + corpus_counts: dict[str, int] = Field(default_factory=dict) + ranking_notes: str = "" + weak_result_warning: str | None = None + + +class CorpusOverview(BaseModel): + total_records: int + verified_records: int + partial_records: int + failed_attempt_records: int + corpora: list[dict[str, Any]] + novelty_distribution: dict[str, int] = Field(default_factory=dict) + top_modules: list[dict[str, Any]] = Field(default_factory=list) + top_imports: list[dict[str, Any]] = Field(default_factory=list) + top_dependencies: list[dict[str, Any]] = Field(default_factory=list) + top_tags: list[dict[str, Any]] = Field(default_factory=list) + search_fields: list[str] = Field(default_factory=list) + 
recommended_queries: list[str] = Field(default_factory=list) + result_cap: int = 7 + hydration_behavior: str = ( + "Search returns at most 7 combined records. Full Lean code is included " + "only when available and requested; metadata-only records can be hydrated later." + ) + diff --git a/backend/shared/proof_search/moto_sources.py b/backend/shared/proof_search/moto_sources.py new file mode 100644 index 0000000..28580f8 --- /dev/null +++ b/backend/shared/proof_search/moto_sources.py @@ -0,0 +1,185 @@ +"""Canonical MOTO proof database normalization for unified proof search.""" +from __future__ import annotations + +import hashlib +from pathlib import Path +from typing import Any + +from backend.autonomous.memory.proof_database import manual_proof_database, proof_database +from backend.shared.models import ProofDependency, ProofRecord +from backend.shared.proof_search.models import UnifiedProofSearchRecord + + +async def load_moto_proof_records() -> list[UnifiedProofSearchRecord]: + """Collect current canonical MOTO proof records without reading display appendices.""" + records: list[UnifiedProofSearchRecord] = [] + records.extend(await _records_from_database(proof_database, default_corpus="moto")) + records.extend(await _records_from_autonomous_history()) + records.extend(await _records_from_database(manual_proof_database, default_corpus="manual")) + records.extend(await _records_from_manual_history()) + return records + + +async def _records_from_database(database, *, default_corpus: str) -> list[UnifiedProofSearchRecord]: + try: + proofs = await database.get_all_proofs(novel_only=None) + except Exception: + return [] + return [ + normalize_proof_record(proof, default_corpus=default_corpus, session_id="") + for proof in proofs + ] + + +async def _records_from_autonomous_history() -> list[UnifiedProofSearchRecord]: + try: + entries = await proof_database.list_proof_library(novel_only=False) + except Exception: + return [] + records: list[UnifiedProofSearchRecord] = 
[] + for entry in entries: + full_entry = entry + session_id = str(entry.get("session_id", "") or "") + proof_id = str(entry.get("proof_id", "") or "") + if session_id and proof_id: + try: + hydrated = await proof_database.get_library_proof(session_id, proof_id) + except Exception: + hydrated = None + if hydrated: + full_entry = {**entry, **hydrated} + records.append(_record_from_library_entry(full_entry, default_corpus="moto")) + return records + + +async def _records_from_manual_history() -> list[UnifiedProofSearchRecord]: + try: + from backend.shared.config import system_config + + entries = await manual_proof_database.list_proof_library_from_history( + Path(system_config.data_dir) / "manual_proof_runs", + novel_only=False, + ) + except Exception: + return [] + + return [_record_from_library_entry(entry, default_corpus="manual") for entry in entries] + + +def _record_from_library_entry( + entry: dict[str, Any], + *, + default_corpus: str, +) -> UnifiedProofSearchRecord: + theorem_statement = str(entry.get("theorem_statement", "") or "") + proof_id = str(entry.get("proof_id", "") or "") + session_id = str(entry.get("session_id", "") or "") + source_type = str(entry.get("source_type", "") or "") + corpus = "leanoj" if source_type.startswith("leanoj_") else default_corpus + lean_code = str(entry.get("lean_code", "") or "") + return UnifiedProofSearchRecord( + search_id=f"{corpus}:{session_id}:{proof_id}", + corpus=corpus, + corpus_scope="archived" if default_corpus == "manual" else "history", + source_kind="verified_proof", + proof_id=proof_id, + session_id=session_id, + source_type=source_type, + source_id=str(entry.get("source_id", "") or ""), + source_title=str(entry.get("source_title", "") or ""), + display_title=str(entry.get("theorem_name", "") or proof_id), + theorem_name=str(entry.get("theorem_name", "") or ""), + theorem_statement=theorem_statement, + formal_sketch=str(entry.get("formal_sketch", "") or ""), + lean_code=lean_code, + 
lean_code_hash=_sha256_text(lean_code) if lean_code else "", + theorem_statement_hash=_sha256_text(theorem_statement), + dependency_names=_dependency_names(entry.get("dependencies", [])), + novelty_tier=str(entry.get("novelty_tier", "") or ""), + novelty_reasoning=str(entry.get("novelty_reasoning", "") or ""), + verified=True, + created_at=str(entry.get("created_at", "") or ""), + canonical_uri=f"moto-proof://{corpus}/{session_id}/{proof_id}", + metadata={ + "novel": bool(entry.get("novel", False)), + "solver": str(entry.get("solver", "") or "Lean 4"), + "attempt_count": entry.get("attempt_count", 0), + "verification_notes": str(entry.get("verification_notes", "") or ""), + "user_prompt": str(entry.get("user_prompt", "") or ""), + }, + ) + + +def normalize_proof_record( + proof: ProofRecord, + *, + default_corpus: str = "moto", + session_id: str = "", +) -> UnifiedProofSearchRecord: + """Convert a stored ProofRecord into the shared search model.""" + corpus = "leanoj" if proof.source_type.startswith("leanoj_") else default_corpus + lean_code_hash = _sha256_text(proof.lean_code) if proof.lean_code else "" + statement_hash = _sha256_text(proof.theorem_statement) + scope = "active" if default_corpus == "manual" else "current" + + return UnifiedProofSearchRecord( + search_id=f"{corpus}:{session_id}:{proof.proof_id}", + corpus=corpus, + corpus_scope=scope, + source_kind="verified_proof", + proof_id=proof.proof_id, + session_id=session_id, + source_type=proof.source_type, + source_id=proof.source_id, + source_title=proof.source_title, + display_title=proof.theorem_name or proof.theorem_id or proof.proof_id, + theorem_name=proof.theorem_name, + theorem_statement=proof.theorem_statement, + formal_sketch=proof.formal_sketch, + lean_code=proof.lean_code, + lean_code_hash=lean_code_hash, + theorem_statement_hash=statement_hash, + imports=_import_names(proof.dependencies), + dependency_names=_dependency_names(proof.dependencies), + novelty_tier=proof.novelty_tier, + 
novelty_reasoning=proof.novelty_reasoning, + verified=True, + created_at=proof.created_at.isoformat() if proof.created_at else "", + canonical_uri=f"moto-proof://{corpus}/{proof.proof_id}", + metadata={ + "theorem_id": proof.theorem_id, + "solver": proof.solver, + "novel": proof.novel, + "verification_notes": proof.verification_notes, + "attempt_count": proof.attempt_count, + }, + ) + + +def _dependency_names(dependencies: list[Any]) -> list[str]: + names: list[str] = [] + for dependency in dependencies or []: + if isinstance(dependency, ProofDependency): + name = dependency.name + elif isinstance(dependency, dict): + name = str(dependency.get("name", "") or "") + else: + name = "" + if name: + names.append(name) + return names + + +def _import_names(dependencies: list[Any]) -> list[str]: + imports = [] + for dependency in dependencies or []: + kind = dependency.kind if isinstance(dependency, ProofDependency) else dependency.get("kind", "") + name = dependency.name if isinstance(dependency, ProofDependency) else dependency.get("name", "") + if kind == "mathlib" and name: + imports.append(str(name).split(".")[0]) + return sorted(set(imports)) + + +def _sha256_text(value: str) -> str: + return hashlib.sha256((value or "").encode("utf-8")).hexdigest() + diff --git a/backend/shared/proof_search/search_service.py b/backend/shared/proof_search/search_service.py new file mode 100644 index 0000000..d3455dc --- /dev/null +++ b/backend/shared/proof_search/search_service.py @@ -0,0 +1,190 @@ +"""Shared proof-search service used by routes and future AI tool adapters.""" +from __future__ import annotations + +import asyncio +from pathlib import Path + +from backend.shared.config import system_config +from backend.shared.proof_search.indexer import ProofSearchIndexer +from backend.shared.proof_search.models import ( + CorpusOverview, + ProofSearchRequest, + ProofSearchResponse, + UnifiedProofSearchRecord, + default_proof_search_corpora, +) +from 
backend.shared.proof_search.moto_sources import load_moto_proof_records +from backend.shared.proof_search.syntheticlib4_sources import ( + load_syntheticlib4_fixture_records, + normalize_syntheticlib4_record, +) +from backend.shared.syntheticlib4_client import syntheticlib4_client + + +class ProofSearchService: + """Coordinates source normalization and the local SQLite proof-search index.""" + + def __init__(self, index_path: Path | None = None) -> None: + self._explicit_index_path = Path(index_path) if index_path else None + self._lock = asyncio.Lock() + + @property + def index_path(self) -> Path: + if self._explicit_index_path is not None: + return self._explicit_index_path + return Path(system_config.data_dir) / "proof_search" / "proof_search.sqlite" + + async def rebuild_index(self, *, include_disabled: bool = False) -> CorpusOverview: + """Rebuild the unified index from currently available local sources.""" + async with self._lock: + records = await self._load_records() + indexer = ProofSearchIndexer(self.index_path) + await asyncio.to_thread(indexer.rebuild, records) + return await asyncio.to_thread( + indexer.overview, + None if include_disabled else default_proof_search_corpora(), + ) + + async def overview(self, *, include_disabled: bool = False) -> CorpusOverview: + await self._ensure_index() + return await asyncio.to_thread( + ProofSearchIndexer(self.index_path).overview, + None if include_disabled else default_proof_search_corpora(), + ) + + async def search(self, request: ProofSearchRequest) -> ProofSearchResponse: + request = self._filter_request_corpora(request) + if not request.corpora: + return ProofSearchResponse( + results=[], + result_count=0, + searched_corpora=[], + corpus_counts={}, + ranking_notes="All proof-search memory corpora are disabled.", + weak_result_warning="No proof-search corpora are enabled for this request.", + ) + await self._ensure_index() + return await asyncio.to_thread(ProofSearchIndexer(self.index_path).search, request) + 
+ async def search_candidate_pool( + self, + request: ProofSearchRequest, + *, + pool_limit: int, + exclude_corpus_scopes: list[str] | None = None, + exclude_session_ids: list[str] | None = None, + ) -> list[UnifiedProofSearchRecord]: + """Return a wider internal candidate pool for Assistant ranking. + + Public route/tool search remains capped by the indexer's normal result + cap; this internal path lets Assistant gather enough verified candidates + for visit-count P-UCB/MMR selection without changing the REST contract. + """ + request = self._filter_request_corpora(request) + if not request.corpora: + return [] + await self._ensure_index() + return await asyncio.to_thread( + ProofSearchIndexer(self.index_path).search_candidate_pool, + request, + pool_limit=pool_limit, + exclude_corpus_scopes=exclude_corpus_scopes, + exclude_session_ids=exclude_session_ids, + ) + + async def get_record( + self, + *, + corpus: str, + proof_id: str, + session_id: str | None = None, + ) -> UnifiedProofSearchRecord | None: + """Fetch one indexed proof and hydrate SyntheticLib4 fixture code when available.""" + if corpus not in set(default_proof_search_corpora()): + return None + await self._ensure_index() + record = await asyncio.to_thread( + ProofSearchIndexer(self.index_path).get_record, + corpus=corpus, + proof_id=proof_id, + session_id=session_id, + ) + if record is None or record.corpus != "syntheticlib4" or record.lean_code: + return record + + hydrated = await asyncio.to_thread( + syntheticlib4_client.hydrate_proof, + record.external_fingerprint or record.proof_id, + ) + if not hydrated or not str(hydrated.get("lean_code") or "").strip(): + return record + + hydrated_record = normalize_syntheticlib4_record( + hydrated, + release_id=record.release_id, + channel=record.corpus_scope or "stable", + ) + if hydrated_record.theorem_statement_hash != record.theorem_statement_hash: + raise ValueError("SyntheticLib4 hydration theorem-statement hash mismatch") + if 
hydrated_record.lean_code_hash != record.lean_code_hash: + raise ValueError("SyntheticLib4 hydration Lean-code hash mismatch") + return hydrated_record + + async def _ensure_index(self) -> None: + if self.index_path.exists() and not await asyncio.to_thread( + self._sources_are_newer_than_index + ): + return + await self.rebuild_index() + + def _filter_request_corpora(self, request: ProofSearchRequest) -> ProofSearchRequest: + enabled = set(default_proof_search_corpora()) + requested = [corpus for corpus in request.corpora if corpus in enabled] + if requested == request.corpora: + return request + return request.model_copy(update={"corpora": requested}) + + def _sources_are_newer_than_index(self) -> bool: + try: + index_mtime = self.index_path.stat().st_mtime + except OSError: + return True + data_root = Path(system_config.data_dir) + source_roots = [ + data_root / "proofs", + data_root / "manual_proofs", + data_root / "manual_proof_runs", + data_root / "auto_sessions", + data_root / "leanoj_sessions", + data_root / "leanoj_partial_proofs", + data_root / "syntheticlib4", + ] + for root in source_roots: + if not root.exists(): + continue + try: + files = root.rglob("*") + for path in files: + if not path.is_file() or path.suffix.lower() not in {".json", ".jsonl", ".lean"}: + continue + if path.stat().st_mtime > index_mtime: + return True + except OSError: + # If source freshness cannot be determined, rebuild rather than + # serving stale proof-history records. + return True + return False + + async def _load_records(self) -> list[UnifiedProofSearchRecord]: + records: list[UnifiedProofSearchRecord] = [] + try: + records.extend(load_syntheticlib4_fixture_records()) + except Exception: + # SyntheticLib4 is optional; local MOTO proof search should still work. 
+ records.extend([]) + records.extend(await load_moto_proof_records()) + return records + + +proof_search_service = ProofSearchService() + diff --git a/backend/shared/proof_search/syntheticlib4_sources.py b/backend/shared/proof_search/syntheticlib4_sources.py new file mode 100644 index 0000000..aa3a167 --- /dev/null +++ b/backend/shared/proof_search/syntheticlib4_sources.py @@ -0,0 +1,95 @@ +"""SyntheticLib4 fixture/snapshot normalization for unified proof search.""" +from __future__ import annotations + +import hashlib +from typing import Any + +from backend.shared.proof_search.models import UnifiedProofSearchRecord +from backend.shared.syntheticlib4_client import SyntheticLib4Client, syntheticlib4_client + + +def normalize_syntheticlib4_record( + record: dict[str, Any], + *, + release_id: str, + channel: str = "stable", +) -> UnifiedProofSearchRecord: + """Convert one SyntheticLib4 proof record into the shared search model.""" + fingerprint = str(record.get("fingerprint", "")).strip() + theorem_statement = str(record.get("theorem_statement", "")).strip() + lean_code = str(record.get("lean_code", "") or "") + statement_hash = str(record.get("theorem_statement_hash", "")).strip() or _sha256_text( + theorem_statement + ) + lean_hash = str(record.get("lean_code_hash", "")).strip() or ( + _sha256_text(lean_code) if lean_code else "" + ) + module = str(record.get("module", "") or "") + source_path = str(record.get("source_path", "") or "") + + return UnifiedProofSearchRecord( + search_id=f"syntheticlib4:{fingerprint}", + corpus="syntheticlib4", + corpus_scope=channel, + source_kind="verified_proof", + proof_id=fingerprint, + external_fingerprint=fingerprint, + release_id=release_id, + source_type="syntheticlib4_snapshot", + source_id=release_id, + source_title=source_path or module or "SyntheticLib4", + display_title=str(record.get("display_title", "") or record.get("theorem_name", "")), + theorem_name=str(record.get("theorem_name", "") or ""), + 
theorem_statement=theorem_statement, + informal_statement=str(record.get("informal_statement", "") or ""), + proof_description=str(record.get("proof_description", "") or ""), + formal_sketch=str(record.get("proof_description", "") or ""), + lean_code=lean_code, + lean_code_hash=lean_hash, + theorem_statement_hash=statement_hash, + imports=_string_list(record.get("imports")), + dependency_names=_string_list(record.get("dependency_names")), + topic_tags=_string_list(record.get("topic_tags")), + domain_tags=_string_list(record.get("domain_tags")), + module=module, + source_path=source_path, + novelty_tier=str(record.get("novelty_rank", "") or ""), + novelty_reasoning=( + f"SyntheticLib4 novelty confidence: {record.get('novelty_confidence', 'unknown')}" + ), + verified=True, + created_at=str(record.get("created_at", "") or ""), + canonical_uri=f"syntheticlib4://{release_id}/{fingerprint}", + metadata={ + "validation_record_id": record.get("validation_record_id", ""), + "line_range": record.get("line_range", {}), + "license_terms_id": record.get("license_terms_id", ""), + "release_membership": record.get("release_membership", ""), + "hydration_url": record.get("hydration_url"), + }, + ) + + +def load_syntheticlib4_fixture_records( + client: SyntheticLib4Client | None = None, +) -> list[UnifiedProofSearchRecord]: + """Load SyntheticLib4 fixture records for offline development and tests.""" + active_client = client or syntheticlib4_client + manifest = active_client.get_release_manifest() + release_id = str(manifest.get("release_id", "") or "fixture") + channel = str(manifest.get("channel", "") or "stable") + return [ + normalize_syntheticlib4_record(record, release_id=release_id, channel=channel) + for record in active_client.load_proof_metadata() + ] + + +def _string_list(value: Any) -> list[str]: + if not isinstance(value, list): + return [] + return [str(item) for item in value if str(item).strip()] + + +def _sha256_text(value: str) -> str: + return 
hashlib.sha256(value.encode("utf-8")).hexdigest() + diff --git a/backend/shared/proof_search/tool_adapter.py b/backend/shared/proof_search/tool_adapter.py new file mode 100644 index 0000000..3285eac --- /dev/null +++ b/backend/shared/proof_search/tool_adapter.py @@ -0,0 +1,396 @@ +"""OpenAI-compatible proof-search tool adapter.""" +from __future__ import annotations + +import asyncio +import json +from datetime import datetime, timezone +from pathlib import Path +from typing import Any + +from backend.shared.config import system_config +from backend.shared.proof_search.models import ( + ProofSearchCorpus, + ProofSearchRequest, + UnifiedProofSearchRecord, + default_proof_search_corpora, +) +from backend.shared.proof_search.search_service import ( + ProofSearchService, + proof_search_service, +) + +MAX_PROOF_SEARCH_RESULTS = 7 + +SEARCH_LEAN_PROOFS_TOOL_SCHEMA: dict[str, Any] = { + "type": "function", + "function": { + "name": "search_lean_proofs", + "description": ( + "Search MOTO local proof history and SyntheticLib4 proof records for " + "prompt-relevant Lean proof patterns. Use this only for active proof " + "formalization or proof repair work. Results are capped at 7 combined " + "proofs and include provenance plus theorem/Lean-code hashes." 
+ ), + "parameters": { + "type": "object", + "properties": { + "action": { + "type": "string", + "enum": ["overview", "search", "hydrate", "attest_usage"], + }, + "query": {"type": "string"}, + "goal_statement": {"type": "string"}, + "lean_template": {"type": "string"}, + "imports": {"type": "array", "items": {"type": "string"}}, + "dependency_names": {"type": "array", "items": {"type": "string"}}, + "corpora": { + "type": "array", + "items": { + "type": "string", + "enum": ["moto", "manual", "autonomous", "leanoj", "syntheticlib4"], + }, + }, + "verified_only": {"type": "boolean"}, + "include_partial": {"type": "boolean"}, + "include_failed": {"type": "boolean"}, + "novelty_filters": {"type": "array", "items": {"type": "string"}}, + "module_filters": {"type": "array", "items": {"type": "string"}}, + "source_filters": {"type": "array", "items": {"type": "string"}}, + "exclude_ids": {"type": "array", "items": {"type": "string"}}, + "limit": {"type": "integer", "minimum": 1, "maximum": MAX_PROOF_SEARCH_RESULTS}, + "cursor": {"type": "string"}, + "hydrate_lean_code": {"type": "boolean"}, + "search_mode": { + "type": "string", + "enum": ["auto", "exact", "lexical", "text", "semantic", "hybrid"], + }, + "source": { + "type": "string", + "enum": ["moto", "manual", "autonomous", "leanoj", "syntheticlib4"], + }, + "proof_id": {"type": "string"}, + "fingerprint": {"type": "string"}, + "session_id": {"type": "string"}, + "usage_attestation": { + "type": "object", + "properties": { + "retrieval_batch_id": {"type": "string"}, + "used_fingerprints": {"type": "array", "items": {"type": "string"}}, + "unused_fingerprints": {"type": "array", "items": {"type": "string"}}, + "used_proofs": { + "type": "array", + "items": { + "type": "object", + "properties": { + "fingerprint": {"type": "string"}, + "theorem_statement_hash": {"type": "string"}, + "lean_code_hash": {"type": "string"}, + }, + }, + }, + "entire_code_used": {"type": "boolean"}, + "moto_artifact_hash": {"type": "string"}, 
+ "usage_type": {"type": "string"}, + }, + }, + }, + "required": ["action"], + }, + }, +} + +_VALID_CORPORA: set[str] = {"moto", "manual", "leanoj", "syntheticlib4"} +_CORPUS_ALIASES = {"autonomous": "moto"} + + +async def execute_search_lean_proofs( + arguments: dict[str, Any] | str, + *, + service: ProofSearchService | None = None, + usage_root: Path | None = None, +) -> dict[str, Any]: + """Execute one `search_lean_proofs` tool call and return JSON-safe output.""" + active_service = service or proof_search_service + try: + args = _coerce_arguments(arguments) + action = str(args.get("action") or "").strip() + if action == "overview": + overview = await active_service.overview() + return _tool_success(action, overview=overview.model_dump(mode="json")) + if action == "search": + request = _build_search_request(args) + response = await active_service.search(request) + return _tool_success( + action, + results=[_record_to_tool_result(record) for record in response.results], + next_cursor=response.next_cursor, + searched_corpora=response.searched_corpora, + corpus_counts=response.corpus_counts, + ranking_notes=response.ranking_notes, + weak_result_warning=response.weak_result_warning, + ) + if action == "hydrate": + record = await _hydrate_record(active_service, args) + if record is None: + return _tool_error(action, "Proof record not found.") + return _tool_success(action, results=[_record_to_tool_result(record)]) + if action == "attest_usage": + attestation = await _persist_usage_attestation(args, usage_root=usage_root) + return _tool_success(action, usage_attestation=attestation) + return _tool_error(action or "unknown", "Unsupported search_lean_proofs action.") + except Exception as exc: + return _tool_error("error", str(exc)) + + +def _coerce_arguments(arguments: dict[str, Any] | str) -> dict[str, Any]: + if isinstance(arguments, str): + parsed = json.loads(arguments or "{}") + else: + parsed = dict(arguments or {}) + if not isinstance(parsed, dict): + raise 
ValueError("Tool arguments must be a JSON object.") + return parsed + + +def _build_search_request(args: dict[str, Any]) -> ProofSearchRequest: + goal_parts = [_string(args.get("goal_statement"))] + lean_template = _string(args.get("lean_template")) + if lean_template: + goal_parts.append(lean_template) + return ProofSearchRequest( + query=_string(args.get("query")), + goal_statement="\n\n".join(part for part in goal_parts if part), + imports=_string_list(args.get("imports")), + dependency_names=_string_list(args.get("dependency_names")), + corpora=_normalize_corpora(args.get("corpora")), + verified_only=bool(args.get("verified_only", True)), + include_partial=bool(args.get("include_partial", False)), + include_failed=bool(args.get("include_failed", False)), + novelty_filters=_string_list(args.get("novelty_filters")), + module_filters=_string_list(args.get("module_filters")), + source_filters=_string_list(args.get("source_filters")), + limit=_normalize_limit(args.get("limit")), + cursor=_optional_string(args.get("cursor")), + exclude_ids=_string_list(args.get("exclude_ids")), + hydrate_lean_code=bool(args.get("hydrate_lean_code", True)), + search_mode=_normalize_search_mode(args.get("search_mode")), + ) + + +async def _hydrate_record( + service: ProofSearchService, + args: dict[str, Any], +) -> UnifiedProofSearchRecord | None: + source = _normalize_corpus(args.get("source") or _first(args.get("corpora")) or "") + proof_id = _string(args.get("proof_id") or args.get("fingerprint")) + if not source: + raise ValueError("Hydrate action requires 'source' or a single corpus.") + if not proof_id: + raise ValueError("Hydrate action requires 'proof_id' or 'fingerprint'.") + return await service.get_record( + corpus=source, + proof_id=proof_id, + session_id=_optional_string(args.get("session_id")), + ) + + +async def _persist_usage_attestation( + args: dict[str, Any], + *, + usage_root: Path | None = None, +) -> dict[str, Any]: + attestation = 
dict(args.get("usage_attestation") or {}) + now = datetime.now(timezone.utc).isoformat() + used_proofs = _normalize_used_proofs(attestation) + payload = { + "schema_version": "moto.proof_search_usage_attestation.v1", + "created_at": now, + "retrieval_batch_id": _string(attestation.get("retrieval_batch_id")), + "used_fingerprints": [proof["fingerprint"] for proof in used_proofs], + "used_proofs": used_proofs, + "unused_fingerprints": _string_list(attestation.get("unused_fingerprints")), + "usage_type": _string(attestation.get("usage_type") or "whole_proof_dependency"), + "entire_code_used": bool(attestation.get("entire_code_used", False)), + "moto_artifact_hash": _string(attestation.get("moto_artifact_hash")), + "submitted": False, + } + if not used_proofs: + raise ValueError("Usage attestation requires at least one used fingerprint.") + if payload["entire_code_used"] and any( + not proof["theorem_statement_hash"] or not proof["lean_code_hash"] + for proof in used_proofs + ): + raise ValueError( + "Whole-code usage attestations require theorem_statement_hash and lean_code_hash for every used proof." 
+ ) + root = usage_root or Path(system_config.data_dir) / "proof_search" + path = root / "usage_attestations.jsonl" + + def _append() -> None: + root.mkdir(parents=True, exist_ok=True) + with path.open("a", encoding="utf-8") as handle: + handle.write(json.dumps(payload, ensure_ascii=True) + "\n") + + await asyncio.to_thread(_append) + return {**payload, "persisted": True} + + +def _normalize_used_proofs(attestation: dict[str, Any]) -> list[dict[str, str]]: + normalized: list[dict[str, str]] = [] + seen: set[str] = set() + raw_used_proofs = attestation.get("used_proofs") or [] + if isinstance(raw_used_proofs, list): + for item in raw_used_proofs: + if not isinstance(item, dict): + continue + fingerprint = _string(item.get("fingerprint")) + if not fingerprint or fingerprint in seen: + continue + normalized.append( + { + "fingerprint": fingerprint, + "theorem_statement_hash": _string(item.get("theorem_statement_hash")), + "lean_code_hash": _string(item.get("lean_code_hash")), + } + ) + seen.add(fingerprint) + for fingerprint in _string_list(attestation.get("used_fingerprints")): + if fingerprint in seen: + continue + normalized.append( + { + "fingerprint": fingerprint, + "theorem_statement_hash": "", + "lean_code_hash": "", + } + ) + seen.add(fingerprint) + return normalized + + +def _record_to_tool_result(record: UnifiedProofSearchRecord) -> dict[str, Any]: + return { + "search_id": record.search_id, + "corpus": record.corpus, + "corpus_scope": record.corpus_scope, + "source_kind": record.source_kind, + "proof_id": record.proof_id, + "fingerprint": record.external_fingerprint, + "release_id": record.release_id, + "session_id": record.session_id, + "source_type": record.source_type, + "source_id": record.source_id, + "source_title": record.source_title, + "display_title": record.display_title, + "theorem_name": record.theorem_name, + "theorem_statement": record.theorem_statement, + "informal_statement": record.informal_statement, + "proof_description": 
record.proof_description, + "formal_sketch": record.formal_sketch, + "imports": record.imports, + "dependency_names": record.dependency_names, + "topic_tags": record.topic_tags, + "domain_tags": record.domain_tags, + "module": record.module, + "source_path": record.source_path, + "novelty_tier": record.novelty_tier, + "novelty_reasoning": record.novelty_reasoning, + "lean_code": record.lean_code, + "theorem_statement_hash": record.theorem_statement_hash, + "lean_code_hash": record.lean_code_hash, + "canonical_uri": record.canonical_uri, + "metadata": record.metadata, + } + + +def _tool_success(action: str, **payload: Any) -> dict[str, Any]: + return { + "success": True, + "action": action, + "overview": payload.pop("overview", None), + "results": payload.pop("results", []), + "next_cursor": payload.pop("next_cursor", None), + "searched_corpora": payload.pop("searched_corpora", []), + "corpus_counts": payload.pop("corpus_counts", {}), + "ranking_notes": payload.pop("ranking_notes", ""), + "weak_result_warning": payload.pop("weak_result_warning", None), + "usage_attestation": payload.pop("usage_attestation", None), + "error": None, + **payload, + } + + +def _tool_error(action: str, message: str) -> dict[str, Any]: + return { + "success": False, + "action": action, + "overview": None, + "results": [], + "next_cursor": None, + "searched_corpora": [], + "corpus_counts": {}, + "ranking_notes": "", + "weak_result_warning": None, + "usage_attestation": None, + "error": message, + } + + +def _normalize_corpora(value: Any) -> list[ProofSearchCorpus]: + corpora = [_normalize_corpus(item) for item in _string_list(value)] + valid = [corpus for corpus in corpora if corpus] + if valid: + from backend.shared.config import system_config + + return [ + corpus + for corpus in valid + if ( + (corpus == "syntheticlib4" and system_config.syntheticlib4_enabled) + or (corpus != "syntheticlib4" and system_config.agent_conversation_memory_enabled) + ) + ] + return 
default_proof_search_corpora() + + +def _normalize_corpus(value: Any) -> ProofSearchCorpus | None: + corpus = _CORPUS_ALIASES.get(_string(value), _string(value)) + if corpus not in _VALID_CORPORA: + return None + return corpus # type: ignore[return-value] + + +def _normalize_search_mode(value: Any) -> str: + mode = _string(value) or "hybrid" + if mode in {"lexical", "text"}: + return "text" + if mode == "exact": + return "exact" + return "hybrid" + + +def _normalize_limit(value: Any) -> int: + try: + limit = int(value) + except (TypeError, ValueError): + limit = MAX_PROOF_SEARCH_RESULTS + return min(max(limit, 1), MAX_PROOF_SEARCH_RESULTS) + + +def _string_list(value: Any) -> list[str]: + if not isinstance(value, list): + return [] + return [str(item).strip() for item in value if str(item).strip()] + + +def _optional_string(value: Any) -> str | None: + text = _string(value) + return text or None + + +def _string(value: Any) -> str: + return str(value or "").strip() + + +def _first(value: Any) -> Any: + return value[0] if isinstance(value, list) and value else None diff --git a/backend/shared/provider_notification_store.py b/backend/shared/provider_notification_store.py new file mode 100644 index 0000000..fe65d8a --- /dev/null +++ b/backend/shared/provider_notification_store.py @@ -0,0 +1,165 @@ +"""Durable non-secret provider/OAuth notifications for frontend recovery.""" +from __future__ import annotations + +import json +import logging +import threading +import time +from datetime import datetime, timezone +from pathlib import Path +from typing import Any, Dict, List + +from backend.shared.config import system_config +from backend.shared.log_redaction import redact_log_text + +logger = logging.getLogger(__name__) + +PROVIDER_NOTIFICATIONS_FILENAME = "provider_notifications.json" +MAX_PROVIDER_NOTIFICATIONS = 20 +PROVIDER_NOTIFICATION_TTL_SECONDS = 7 * 24 * 60 * 60 + +_store_lock = threading.Lock() + + +def _notifications_path() -> Path: + return 
Path(system_config.data_dir) / PROVIDER_NOTIFICATIONS_FILENAME + + +def _now_iso() -> str: + return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") + + +def _coerce_created_at(value: Any) -> str: + raw = str(value or "").strip() + return raw or _now_iso() + + +def _notification_age_seconds(notification: dict[str, Any], now: float) -> float: + raw = str(notification.get("created_at") or notification.get("timestamp") or "").strip() + if not raw: + return 0.0 + try: + parsed = datetime.fromisoformat(raw.replace("Z", "+00:00")) + except ValueError: + return 0.0 + if parsed.tzinfo is None: + parsed = parsed.replace(tzinfo=timezone.utc) + return max(0.0, now - parsed.timestamp()) + + +def _read_payload() -> dict[str, Any]: + path = _notifications_path() + try: + if not path.exists(): + return {"notifications": []} + payload = json.loads(path.read_text(encoding="utf-8")) + except json.JSONDecodeError as exc: + logger.warning("Ignoring corrupt provider notifications file %s: %s", redact_log_text(path, 240), exc) + return {"notifications": []} + except OSError as exc: + logger.warning("Failed to read provider notifications file %s: %s", redact_log_text(path, 240), exc) + return {"notifications": []} + return payload if isinstance(payload, dict) else {"notifications": []} + + +def _write_payload(payload: dict[str, Any]) -> None: + path = _notifications_path() + try: + path.parent.mkdir(parents=True, exist_ok=True) + temp_path = path.with_name(f".{path.name}.tmp") + temp_path.write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n", encoding="utf-8") + temp_path.replace(path) + except OSError as exc: + logger.warning("Failed to persist provider notifications file %s: %s", redact_log_text(path, 240), exc) + + +def _clean_notifications(notifications: list[Any]) -> list[dict[str, Any]]: + now = time.time() + cleaned: list[dict[str, Any]] = [] + for item in notifications: + if not isinstance(item, dict): + continue + if _notification_age_seconds(item, now) > 
PROVIDER_NOTIFICATION_TTL_SECONDS: + continue + cleaned.append(item) + return cleaned[-MAX_PROVIDER_NOTIFICATIONS:] + + +def _safe_string(value: Any, max_chars: int = 700) -> str: + text = redact_log_text(str(value or "")).strip() + if max_chars is not None and max_chars >= 0 and len(text) > max_chars: + if max_chars <= 3: + return text[:max_chars] + return text[: max_chars - 3] + "..." + return text + + +def _stable_notification_key(provider: str, role_id: str, reason: str, model: str) -> str: + parts = [provider, role_id, reason, model or "*"] + return ":".join(part.replace(":", "_") for part in parts) + + +def record_provider_notification(event_type: str, payload: Dict[str, Any]) -> dict[str, Any]: + """Persist a recoverable provider/OAuth notification and return the stored payload.""" + provider = _safe_string(payload.get("provider"), 120) or "oauth" + role_id = _safe_string(payload.get("role_id"), 160) or provider + reason = _safe_string(payload.get("reason"), 160) or "provider_error" + model = _safe_string(payload.get("model"), 240) + created_at = _coerce_created_at(payload.get("created_at") or payload.get("_serverTimestamp")) + key_model = model + if reason == "usage_limit_reached" and payload.get("cooldown_until") is not None: + key_model = f"{model or '*'}@{payload.get('cooldown_until')}" + notification_key = _stable_notification_key(provider, role_id, reason, key_model) + notification_id = _safe_string(payload.get("id"), 240) or notification_key + + notification = { + "id": notification_id, + "notification_key": notification_key, + "event_type": _safe_string(event_type, 120), + "created_at": created_at, + "provider": provider, + "provider_label": _safe_string(payload.get("provider_label"), 120), + "role_id": role_id, + "model": model, + "reason": reason, + "recoverable": bool(payload.get("recoverable", False)), + "message": _safe_string(payload.get("message"), 700), + "error_summary": _safe_string(payload.get("error_summary"), 700), + 
"oauth_error_message": _safe_string(payload.get("oauth_error_message"), 1800), + } + for numeric_key in ("resets_at", "resets_in_seconds", "cooldown_until"): + raw_value = payload.get(numeric_key) + if raw_value is None: + continue + try: + notification[numeric_key] = int(raw_value) + except (TypeError, ValueError): + continue + if "fallback_model" in payload: + notification["fallback_model"] = _safe_string(payload.get("fallback_model"), 240) + if "plan_type" in payload: + notification["plan_type"] = _safe_string(payload.get("plan_type"), 120) + + with _store_lock: + stored_payload = _read_payload() + notifications = _clean_notifications(stored_payload.get("notifications") or []) + notifications = [ + item for item in notifications + if item.get("notification_key") != notification_key and item.get("id") != notification_id + ] + notifications.append(notification) + stored_payload["notifications"] = notifications[-MAX_PROVIDER_NOTIFICATIONS:] + _write_payload(stored_payload) + + return notification + + +def list_provider_notifications() -> List[dict[str, Any]]: + """Return recent provider/OAuth notifications, newest last.""" + with _store_lock: + payload = _read_payload() + notifications = _clean_notifications(payload.get("notifications") or []) + if len(notifications) != len(payload.get("notifications") or []): + payload["notifications"] = notifications + _write_payload(payload) + return list(notifications) diff --git a/backend/shared/runtime_settings.py b/backend/shared/runtime_settings.py index 4b3b41e..432f5c0 100644 --- a/backend/shared/runtime_settings.py +++ b/backend/shared/runtime_settings.py @@ -37,6 +37,12 @@ class RuntimeSettingsError(RuntimeError): "smt_timeout": (1, 600), } +_CONNECTIVITY_BOOL_FIELDS = { + "syntheticlib4_enabled", + "agent_conversation_memory_enabled", + "wolfram_alpha_enabled", +} + def _settings_path() -> Path: return Path(system_config.data_dir) / RUNTIME_SETTINGS_FILENAME @@ -125,6 +131,14 @@ def _free_model_settings_from_manager() 
-> Dict[str, Any]: } +def _connectivity_toggles_from_config() -> Dict[str, Any]: + return { + "syntheticlib4_enabled": bool(system_config.syntheticlib4_enabled), + "agent_conversation_memory_enabled": bool(system_config.agent_conversation_memory_enabled), + "wolfram_alpha_enabled": bool(system_config.wolfram_alpha_enabled), + } + + def save_proof_runtime_settings() -> None: """Persist current non-secret Lean/SMT proof runtime settings.""" payload = _read_settings() @@ -139,6 +153,25 @@ def save_free_model_runtime_settings() -> None: _write_settings(payload) +def save_connectivity_runtime_settings() -> None: + """Persist current non-secret connectivity feature toggles.""" + payload = _read_settings() + payload["connectivity_toggles"] = _connectivity_toggles_from_config() + _write_settings(payload) + + +def get_persisted_connectivity_toggles() -> Dict[str, bool]: + """Return persisted connectivity toggles, omitting fields not yet saved.""" + toggles = _read_settings().get("connectivity_toggles") + if not isinstance(toggles, dict): + return {} + result: Dict[str, bool] = {} + for field in _CONNECTIVITY_BOOL_FIELDS: + if field in toggles: + result[field] = _coerce_bool(toggles[field], bool(getattr(system_config, field))) + return result + + def apply_persisted_runtime_settings() -> None: """Apply persisted non-secret runtime settings to process globals.""" payload = _read_settings() @@ -182,3 +215,13 @@ def apply_persisted_runtime_settings() -> None: looping=looping_enabled, auto_selector=auto_selector_enabled, ) + + connectivity_toggles = payload.get("connectivity_toggles") + if isinstance(connectivity_toggles, dict): + for field in _CONNECTIVITY_BOOL_FIELDS: + if field in connectivity_toggles: + setattr( + system_config, + field, + _coerce_bool(connectivity_toggles[field], bool(getattr(system_config, field))), + ) diff --git a/backend/shared/sakana_fugu_client.py b/backend/shared/sakana_fugu_client.py new file mode 100644 index 0000000..f1f4b52 --- /dev/null +++ 
b/backend/shared/sakana_fugu_client.py @@ -0,0 +1,409 @@ +""" +Sakana Fugu subscription API client. + +Fugu is exposed as an OpenAI-compatible provider. This adapter stores the +desktop API key in the backend keyring and returns Chat-Completions-compatible +responses so the rest of MOTO can keep using the shared extraction/logging path. +""" +from __future__ import annotations + +import asyncio +import os +import time +from typing import Any, Dict, List, Optional + +import httpx + +from backend.shared.log_redaction import redact_log_text +from backend.shared.openrouter_client import sanitize_provider_error_text +from backend.shared.secret_store import ( + clear_sakana_fugu_api_key, + load_sakana_fugu_api_key, + store_sakana_fugu_api_key, +) + + +class SakanaFuguError(RuntimeError): + """Base error for Sakana Fugu requests.""" + + +class SakanaFuguAuthError(SakanaFuguError): + """Raised when the Sakana Fugu API key is missing or rejected.""" + + +class SakanaFuguRequestError(SakanaFuguError): + """Raised when Sakana rejects a non-auth request.""" + + +class SakanaFuguClient: + """Client for Sakana Fugu's OpenAI-compatible API.""" + + API_BASE_URL = os.getenv("MOTO_SAKANA_FUGU_BASE_URL", "https://api.sakana.ai/v1").rstrip("/") + DEFAULT_MODEL = os.getenv("MOTO_SAKANA_FUGU_DEFAULT_MODEL", "fugu") + KNOWN_MODELS = [ + { + "id": "fugu", + "name": "Fugu", + "context_length": 1_000_000, + "max_output_tokens": 100_000, + "pricing": {"prompt": "subscription", "completion": "subscription"}, + "provider_metadata": {"source": "sakana_fugu", "supports_reasoning_effort": True}, + }, + { + "id": "fugu-ultra", + "name": "Fugu Ultra", + "context_length": 1_000_000, + "max_output_tokens": 100_000, + "pricing": {"prompt": "subscription", "completion": "subscription"}, + "provider_metadata": {"source": "sakana_fugu", "supports_reasoning_effort": True}, + }, + { + "id": "fugu-ultra-20260615", + "name": "Fugu Ultra 20260615", + "context_length": 1_000_000, + "max_output_tokens": 
100_000, + "pricing": {"prompt": "subscription", "completion": "subscription"}, + "provider_metadata": {"source": "sakana_fugu", "supports_reasoning_effort": True}, + }, + ] + TRANSIENT_STATUS_CODES = {408, 409, 425, 429, 500, 502, 503, 504, 520, 521, 522, 523, 524} + MAX_RETRIES = 4 + RETRY_DELAY = 2.0 + RETRY_MAX_DELAY = 30.0 + + def __init__(self) -> None: + self._api_key: Optional[str] = None + self.client = httpx.AsyncClient( + timeout=None, + limits=httpx.Limits(max_keepalive_connections=20, max_connections=50, keepalive_expiry=30.0), + ) + env_key = os.getenv("SAKANA_API_KEY") or os.getenv("FUGU_API_KEY") or os.getenv("MOTO_SAKANA_FUGU_API_KEY") + if env_key: + self._api_key = env_key.strip() or None + + def _load_api_key(self) -> Optional[str]: + if self._api_key: + return self._api_key + key = load_sakana_fugu_api_key() + if key: + self._api_key = key + return self._api_key + + def set_api_key(self, api_key: str, *, persist: bool = True) -> None: + key = (api_key or "").strip() + if not key: + raise ValueError("Sakana Fugu API key is required.") + self._api_key = key + if persist: + store_sakana_fugu_api_key(key) + + async def clear_api_key(self) -> None: + self._api_key = None + clear_sakana_fugu_api_key() + + async def status(self) -> Dict[str, Any]: + return { + "configured": bool(self._load_api_key()), + "provider": "sakana_fugu", + "updated_at": int(time.time()) if self._load_api_key() else None, + } + + @staticmethod + def _headers_for_key(key: Optional[str]) -> Dict[str, str]: + if not key: + raise SakanaFuguAuthError("Sakana Fugu API key is not configured.") + return { + "Authorization": f"Bearer {key}", + "Content-Type": "application/json", + } + + def _headers(self) -> Dict[str, str]: + return self._headers_for_key(self._load_api_key()) + + @classmethod + def _retry_delay(cls, attempt: int) -> float: + return min(cls.RETRY_DELAY * (2 ** attempt), cls.RETRY_MAX_DELAY) + + async def _post_with_retry(self, url: str, **kwargs) -> httpx.Response: + 
for attempt in range(self.MAX_RETRIES): + try: + response = await self.client.post(url, **kwargs) + if response.status_code >= 400 and response.status_code in self.TRANSIENT_STATUS_CODES: + detail = sanitize_provider_error_text(response.text) + if attempt < self.MAX_RETRIES - 1: + await asyncio.sleep(self._retry_delay(attempt)) + continue + raise SakanaFuguRequestError( + f"Sakana Fugu connection failed after retries: HTTP {response.status_code}: {detail}" + ) + return response + except httpx.TransportError as exc: + detail = sanitize_provider_error_text(str(exc) or repr(exc)) + if attempt < self.MAX_RETRIES - 1: + await asyncio.sleep(self._retry_delay(attempt)) + continue + raise SakanaFuguRequestError(f"Sakana Fugu connection failed after retries: {detail}") from exc + raise SakanaFuguRequestError("Sakana Fugu request failed after retries.") + + @staticmethod + def _normalize_reasoning_effort(reasoning_effort: Optional[str]) -> Optional[str]: + effort = (reasoning_effort or "").strip().lower() + if not effort or effort == "none": + return None + if effort in {"auto", "xhigh", "max", "maximum", "highest"}: + return "xhigh" + if effort == "high": + return "high" + return None + + @staticmethod + def _chat_usage_from_responses_usage(usage: Dict[str, Any]) -> Dict[str, Any]: + prompt_tokens = usage.get("prompt_tokens", usage.get("input_tokens")) + completion_tokens = usage.get("completion_tokens", usage.get("output_tokens")) + total_tokens = usage.get("total_tokens") + if total_tokens is None and prompt_tokens is not None and completion_tokens is not None: + total_tokens = int(prompt_tokens) + int(completion_tokens) + normalized = { + "prompt_tokens": prompt_tokens, + "completion_tokens": completion_tokens, + "total_tokens": total_tokens, + } + if usage.get("input_tokens_details"): + normalized["prompt_tokens_details"] = usage["input_tokens_details"] + if usage.get("output_tokens_details"): + normalized["completion_tokens_details"] = usage["output_tokens_details"] + 
return {key: value for key, value in normalized.items() if value is not None} + + @staticmethod + def _extract_responses_text(data: Dict[str, Any]) -> str: + if isinstance(data.get("output_text"), str): + return data["output_text"] + parts: List[str] = [] + for item in data.get("output") or []: + if not isinstance(item, dict): + continue + for content in item.get("content") or []: + if not isinstance(content, dict): + continue + text = content.get("text") + if isinstance(text, str): + parts.append(text) + return "".join(parts) + + @staticmethod + def _messages_to_responses_payload(messages: List[Dict[str, Any]]) -> tuple[str, List[Dict[str, Any]]]: + instructions: List[str] = [] + inputs: List[Dict[str, Any]] = [] + for message in messages: + role = str(message.get("role") or "user") + content = message.get("content", "") + if role in {"system", "developer"}: + if isinstance(content, str): + instructions.append(content) + else: + instructions.append(str(content)) + continue + responses_role = "assistant" if role == "assistant" else "user" + if role == "tool": + responses_role = "user" + inputs.append({"role": responses_role, "content": content}) + return "\n\n".join(part for part in instructions if part), inputs + + @classmethod + def _normalize_responses_to_chat(cls, data: Dict[str, Any], model: str) -> Dict[str, Any]: + text = cls._extract_responses_text(data) + usage = data.get("usage") if isinstance(data.get("usage"), dict) else {} + return { + "id": data.get("id") or f"sakana-fugu-{int(time.time() * 1000)}", + "object": "chat.completion", + "created": data.get("created_at") or data.get("created") or int(time.time()), + "model": data.get("model") or model, + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": text}, + "finish_reason": data.get("status") if data.get("status") not in {None, "completed"} else "stop", + } + ], + "usage": cls._chat_usage_from_responses_usage(usage), + "_moto_sakana_wire_api": "responses", + } + + 
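The Responses-to-Chat translation in the Sakana Fugu client is easiest to follow on a concrete payload. The snippet below is a standalone sketch that re-implements a trimmed version of `_messages_to_responses_payload` and `_normalize_responses_to_chat` without importing the backend. It omits the `created` and `_moto_sakana_wire_api` fields. The function names without leading underscores and the sample response are hypothetical illustrations, not Sakana's documented wire format.

```python
import time
from typing import Any, Dict, List, Tuple


def messages_to_responses_payload(
    messages: List[Dict[str, Any]],
) -> Tuple[str, List[Dict[str, Any]]]:
    """Split chat messages into Responses-style instructions and input items."""
    instructions: List[str] = []
    inputs: List[Dict[str, Any]] = []
    for message in messages:
        role = str(message.get("role") or "user")
        content = message.get("content", "")
        if role in {"system", "developer"}:
            # System/developer turns are folded into one instructions string.
            instructions.append(content if isinstance(content, str) else str(content))
            continue
        # Assistant turns keep their role; user and tool turns are sent as "user".
        inputs.append({"role": "assistant" if role == "assistant" else "user", "content": content})
    return "\n\n".join(part for part in instructions if part), inputs


def normalize_responses_to_chat(data: Dict[str, Any], model: str) -> Dict[str, Any]:
    """Reshape a Responses-style result into a chat.completion-style dict (trimmed)."""
    text = data.get("output_text")
    if not isinstance(text, str):
        # Fall back to concatenating every text part of every output item.
        text = "".join(
            part["text"]
            for item in data.get("output") or []
            if isinstance(item, dict)
            for part in item.get("content") or []
            if isinstance(part, dict) and isinstance(part.get("text"), str)
        )
    usage = data.get("usage") if isinstance(data.get("usage"), dict) else {}
    # Accept either Chat-style or Responses-style token counter names.
    prompt = usage.get("prompt_tokens", usage.get("input_tokens"))
    completion = usage.get("completion_tokens", usage.get("output_tokens"))
    total = usage.get("total_tokens")
    if total is None and prompt is not None and completion is not None:
        total = int(prompt) + int(completion)
    status = data.get("status")
    return {
        "id": data.get("id") or f"sakana-fugu-{int(time.time() * 1000)}",
        "object": "chat.completion",
        "model": data.get("model") or model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            # A "completed" (or missing) status maps to the usual "stop".
            "finish_reason": "stop" if status in {None, "completed"} else status,
        }],
        # Drop counters the provider did not report.
        "usage": {
            key: value
            for key, value in {
                "prompt_tokens": prompt,
                "completion_tokens": completion,
                "total_tokens": total,
            }.items()
            if value is not None
        },
    }


messages = [
    {"role": "system", "content": "Answer tersely."},
    {"role": "user", "content": "Close the goal 1 + 1 = 2."},
]
instructions, inputs = messages_to_responses_payload(messages)

# Hypothetical Responses-style result with no top-level output_text.
sample_response = {
    "id": "resp_example",
    "status": "completed",
    "output": [{"type": "message", "content": [{"type": "output_text", "text": "rfl"}]}],
    "usage": {"input_tokens": 12, "output_tokens": 3},
}
chat = normalize_responses_to_chat(sample_response, "fugu")
print(instructions, inputs, chat["choices"][0]["message"]["content"], chat["usage"])
```

The sketch shows why downstream code can treat both wire APIs alike: whichever endpoint answered, callers read `choices[0].message.content` and Chat-style `usage` keys, and `total_tokens` is derived when the provider reports only input and output counts.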
@staticmethod + def _messages_need_chat_completions(messages: List[Dict[str, Any]], tools: Optional[List[Dict[str, Any]]]) -> bool: + if tools: + return True + for message in messages: + role = str(message.get("role") or "") + if role == "tool" or message.get("tool_calls"): + return True + return False + + @classmethod + def _normalize_chat_completion(cls, data: Dict[str, Any], model: str) -> Dict[str, Any]: + choices = data.get("choices") if isinstance(data.get("choices"), list) else [] + if not choices: + raise SakanaFuguRequestError("Sakana Fugu chat completion returned no choices.") + usage = data.get("usage") if isinstance(data.get("usage"), dict) else {} + return { + "id": data.get("id") or f"sakana-fugu-{int(time.time() * 1000)}", + "object": "chat.completion", + "created": data.get("created") or int(time.time()), + "model": data.get("model") or model, + "choices": choices, + "usage": cls._chat_usage_from_responses_usage(usage), + "_moto_sakana_wire_api": "chat.completions", + } + + async def _generate_via_chat_completions( + self, + *, + model: str, + messages: List[Dict[str, Any]], + temperature: float, + max_tokens: Optional[int], + response_format: Optional[Dict[str, Any]], + reasoning_effort: Optional[str], + tools: Optional[List[Dict[str, Any]]], + tool_choice: Optional[Any], + ) -> Dict[str, Any]: + payload: Dict[str, Any] = { + "model": model, + "messages": messages, + "temperature": temperature, + } + if max_tokens: + payload["max_tokens"] = int(max_tokens) + effort = self._normalize_reasoning_effort(reasoning_effort) + if effort: + payload["reasoning"] = {"effort": effort} + if response_format: + payload["response_format"] = response_format + if tools: + payload["tools"] = tools + if tool_choice is not None: + payload["tool_choice"] = tool_choice + + response = await self._post_with_retry( + f"{self.API_BASE_URL}/chat/completions", + json=payload, + headers=self._headers(), + ) + if response.status_code >= 400: + message = 
sanitize_provider_error_text(response.text) + if response.status_code in {401, 403}: + raise SakanaFuguAuthError(f"Sakana Fugu completion failed: {message}") + raise SakanaFuguRequestError(f"Sakana Fugu completion failed: {message}") + data = response.json() + if not isinstance(data, dict): + raise SakanaFuguRequestError("Sakana Fugu chat completion returned an invalid response shape.") + return self._normalize_chat_completion(data, model) + + async def list_models(self, api_key: Optional[str] = None) -> List[Dict[str, Any]]: + try: + headers = self._headers_for_key(api_key.strip() if api_key is not None else self._load_api_key()) + response = await self.client.get(f"{self.API_BASE_URL}/models", headers=headers) + if response.status_code >= 400: + if response.status_code in {401, 403}: + raise SakanaFuguAuthError(f"Sakana Fugu model list failed: {sanitize_provider_error_text(response.text)}") + raise SakanaFuguRequestError(f"Sakana Fugu model list failed: {sanitize_provider_error_text(response.text)}") + payload = response.json() + records = payload.get("data") if isinstance(payload, dict) else None + if not isinstance(records, list): + return self.KNOWN_MODELS + known_by_id = {model["id"]: model for model in self.KNOWN_MODELS} + models = [] + for record in records: + model_id = str(record.get("id") or "").strip() + if not model_id: + continue + base = known_by_id.get(model_id, {}) + models.append({ + **base, + "id": model_id, + "name": base.get("name") or model_id, + "provider_metadata": {"source": "sakana_fugu", **base.get("provider_metadata", {})}, + }) + return models or self.KNOWN_MODELS + except SakanaFuguError: + raise + except Exception as exc: + raise SakanaFuguRequestError("Sakana Fugu model list failed before a valid model list was returned.") from exc + + async def generate_completion( + self, + *, + model: str, + messages: List[Dict[str, Any]], + temperature: float = 0.0, + max_tokens: Optional[int] = None, + response_format: Optional[Dict[str, Any]] = 
None, + reasoning_effort: Optional[str] = None, + tools: Optional[List[Dict[str, Any]]] = None, + tool_choice: Optional[Any] = None, + ) -> Dict[str, Any]: + selected_model = model or self.DEFAULT_MODEL + if self._messages_need_chat_completions(messages, tools): + return await self._generate_via_chat_completions( + model=selected_model, + messages=messages, + temperature=temperature, + max_tokens=max_tokens, + response_format=response_format, + reasoning_effort=reasoning_effort, + tools=tools, + tool_choice=tool_choice, + ) + + instructions, input_items = self._messages_to_responses_payload(messages) + payload: Dict[str, Any] = { + "model": selected_model, + "input": input_items or "", + "temperature": temperature, + } + if instructions: + payload["instructions"] = instructions + if max_tokens: + payload["max_output_tokens"] = int(max_tokens) + effort = self._normalize_reasoning_effort(reasoning_effort) + if effort: + payload["reasoning"] = {"effort": effort} + if tools: + payload["tools"] = tools + if tool_choice is not None: + payload["tool_choice"] = tool_choice + if response_format and response_format.get("type") == "json_object": + payload["text"] = {"format": {"type": "json_object"}} + + response = await self._post_with_retry(f"{self.API_BASE_URL}/responses", json=payload, headers=self._headers()) + if response.status_code >= 400: + message = sanitize_provider_error_text(response.text) + if response.status_code in {401, 403}: + raise SakanaFuguAuthError(f"Sakana Fugu completion failed: {message}") + if response.status_code in {400, 404, 422}: + return await self._generate_via_chat_completions( + model=selected_model, + messages=messages, + temperature=temperature, + max_tokens=max_tokens, + response_format=response_format, + reasoning_effort=reasoning_effort, + tools=tools, + tool_choice=tool_choice, + ) + raise SakanaFuguRequestError(f"Sakana Fugu completion failed: {message}") + data = response.json() + if not isinstance(data, dict): + raise 
SakanaFuguRequestError("Sakana Fugu completion returned an invalid response shape.") + result = self._normalize_responses_to_chat(data, selected_model) + if not result["choices"][0]["message"]["content"]: + raise SakanaFuguRequestError( + f"Sakana Fugu response did not include output text: {redact_log_text(str(data)[:500])}" + ) + return result + + async def close(self) -> None: + await self.client.aclose() + + +sakana_fugu_client = SakanaFuguClient() diff --git a/backend/shared/secret_store.py b/backend/shared/secret_store.py index a0ba79c..bd8216b 100644 --- a/backend/shared/secret_store.py +++ b/backend/shared/secret_store.py @@ -23,6 +23,8 @@ _XAI_GROK_OAUTH = "xai_grok_oauth" _XAI_GROK_OAUTH_CHUNK_PREFIX = "xai_grok_oauth_chunk" _XAI_GROK_OAUTH_CHUNK_COUNT = "xai_grok_oauth_chunk_count" +_SAKANA_FUGU_API_KEY = "sakana_fugu_api_key" +_SYNTHETICLIB4_API_KEY = "syntheticlib4_api_key" # Windows Credential Manager limits blobs to 2560 bytes, which is about # 1280 UTF-16 characters through keyring/win32cred. Keep chunks below that. 
_SECRET_CHUNK_SIZE = 1000 @@ -224,6 +226,36 @@ def clear_xai_grok_oauth_tokens() -> None: _delete_chunked_secret(_XAI_GROK_OAUTH_CHUNK_PREFIX, _XAI_GROK_OAUTH_CHUNK_COUNT) +def load_sakana_fugu_api_key() -> Optional[str]: + """Load the persisted Sakana Fugu API key.""" + return _get_secret(_SAKANA_FUGU_API_KEY) + + +def store_sakana_fugu_api_key(api_key: str) -> None: + """Persist the Sakana Fugu API key securely.""" + _set_secret(_SAKANA_FUGU_API_KEY, api_key) + + +def clear_sakana_fugu_api_key() -> None: + """Delete the persisted Sakana Fugu API key.""" + _delete_secret(_SAKANA_FUGU_API_KEY) + + +def load_syntheticlib4_api_key() -> Optional[str]: + """Load the persisted SyntheticLib4 corpus API key.""" + return _get_secret(_SYNTHETICLIB4_API_KEY) + + +def store_syntheticlib4_api_key(api_key: str) -> None: + """Persist the SyntheticLib4 corpus API key securely.""" + _set_secret(_SYNTHETICLIB4_API_KEY, api_key) + + +def clear_syntheticlib4_api_key() -> None: + """Delete the persisted SyntheticLib4 corpus API key.""" + _delete_secret(_SYNTHETICLIB4_API_KEY) + + def load_wolfram_api_key() -> Optional[str]: """Load the persisted Wolfram Alpha API key.""" return _get_secret(_WOLFRAM_KEY) diff --git a/backend/shared/syntheticlib4_client.py b/backend/shared/syntheticlib4_client.py new file mode 100644 index 0000000..4a457e7 --- /dev/null +++ b/backend/shared/syntheticlib4_client.py @@ -0,0 +1,641 @@ +"""SyntheticLib4 corpus client used by the proof-search build slice. + +The production SyntheticLib.com service is still under construction, so this +client implements the MOTO-side contract against offline/mock data while keeping +the same public methods that the live adapter will use later. 
+""" +from __future__ import annotations + +import hashlib +import json +import logging +import shutil +from pathlib import Path +from typing import Any + +from backend.shared.config import system_config +from backend.shared.path_safety import validate_single_path_component + +logger = logging.getLogger(__name__) + +SYNTHETICLIB4_CONTRACT_VERSION = "moto-syntheticlib4-v1" +SYNTHETICLIB4_SCHEMA_VERSION = "syntheticlib4.mock_client.v1" +_REPO_ROOT = Path(__file__).resolve().parents[2] +_DEFAULT_FIXTURE_DIR = _REPO_ROOT / "tests" / "fixtures" / "syntheticlib4" +_BUILTIN_RELEASE_ID = "stable-2026-06-11" +_MOCK_SCOPES = [ + "proofs:read", + "releases:read", + "deltas:read", + "usage:write", + "account:status", + "user_proofs:read", +] + + +class SyntheticLib4ClientError(RuntimeError): + """Raised when mock SyntheticLib4 fixture data is unavailable or invalid.""" + + +class SyntheticLib4Client: + """ + Minimal contract-first SyntheticLib4 adapter. + + This first build slice intentionally supports offline fixtures only. It gives + MOTO a stable client surface before production auth/download endpoints exist. 
+ """ + + def __init__(self, fixture_dir: Path | None = None) -> None: + self.fixture_dir = Path(fixture_dir) if fixture_dir else _DEFAULT_FIXTURE_DIR + self._fixture_dir_explicit = fixture_dir is not None + self._memory_api_key: str | None = None + + @property + def snapshot_root(self) -> Path: + """Return the active data-root SyntheticLib4 cache directory.""" + return Path(system_config.data_dir) / "syntheticlib4" + + def get_status(self) -> dict[str, Any]: + """Return non-secret mock account status.""" + source_dir, source_kind = self._current_source_dir() + status_path = source_dir / "account_status_response.json" if source_dir else None + if status_path and status_path.exists(): + status = self._load_json(status_path) + auth_mode = "local_snapshot" if source_kind == "data_root_snapshot" else "offline_fixture" + else: + status = self._builtin_account_status() + auth_mode = "built_in_offline_fixture" + + credential_configured = self.has_configured_credentials() + return { + **status, + "credential_configured": credential_configured, + "auth_mode": "api_key" if credential_configured else auth_mode, + "hosted_auth_connected": False, + "production_contract_pending": True, + } + + def set_api_key(self, api_key: str) -> dict[str, Any]: + """Store a SyntheticLib4 API key through the mode-appropriate secret path.""" + normalized = (api_key or "").strip() + if not normalized: + raise SyntheticLib4ClientError("SyntheticLib4 API key is required") + if system_config.generic_mode: + self._memory_api_key = normalized + else: + from backend.shared.secret_store import store_syntheticlib4_api_key + + store_syntheticlib4_api_key(normalized) + return self.get_status() + + def clear_credentials(self) -> dict[str, Any]: + """Clear the configured SyntheticLib4 credential without touching snapshots.""" + self._memory_api_key = None + if not system_config.generic_mode: + from backend.shared.secret_store import clear_syntheticlib4_api_key + + clear_syntheticlib4_api_key() + return 
self.get_status() + + def has_configured_credentials(self) -> bool: + """Return whether a SyntheticLib4 credential is configured without exposing it.""" + if system_config.generic_mode: + return bool(self._memory_api_key) + try: + from backend.shared.secret_store import load_syntheticlib4_api_key + + return bool(load_syntheticlib4_api_key()) + except Exception as exc: + logger.debug("SyntheticLib4 credential status unavailable: %s", exc) + return False + + def list_releases(self, channel: str | None = None) -> dict[str, Any]: + """Return a mock release list derived from the fixture manifest.""" + manifest = self.get_release_manifest() + requested_channel = (channel or manifest.get("channel") or "stable").strip() + release_channel = str(manifest.get("channel") or "stable") + releases = [] + if requested_channel == release_channel: + releases.append( + { + "release_id": manifest.get("release_id", ""), + "channel": release_channel, + "created_at": manifest.get("generated_at", ""), + "lean_toolchain": manifest.get("lean_toolchain", ""), + "mathlib_revision": manifest.get("mathlib_revision", ""), + "syntheticlib4_revision": manifest.get("syntheticlib4_revision", ""), + "proof_count": manifest.get("proof_count", 0), + "schema_version": manifest.get("schema_version", ""), + "compatible_moto_contract_versions": manifest.get( + "compatible_moto_contract_versions", [] + ), + "manifest_url": self._manifest_uri(), + } + ) + return { + "contract_version": SYNTHETICLIB4_CONTRACT_VERSION, + "schema_version": "syntheticlib4.releases.v1", + "releases": releases, + } + + def get_release_manifest(self) -> dict[str, Any]: + """Load the local mock release manifest.""" + source_dir, _source_kind = self._current_source_dir() + manifest_path = source_dir / "release_manifest.json" if source_dir else None + if manifest_path and manifest_path.exists(): + return self._load_json(manifest_path) + return self._builtin_release_manifest() + + def load_proof_metadata(self) -> list[dict[str, Any]]: 
+ """Load SyntheticLib4 proof metadata JSONL fixture records.""" + source_dir, _source_kind = self._current_source_dir() + metadata_path = source_dir / "proof_metadata.jsonl" if source_dir else None + if not metadata_path or not metadata_path.exists(): + return self._builtin_proof_metadata() + + records: list[dict[str, Any]] = [] + for line_number, raw_line in enumerate(metadata_path.read_text(encoding="utf-8").splitlines(), 1): + line = raw_line.strip() + if not line: + continue + try: + record = json.loads(line) + except json.JSONDecodeError as exc: + raise SyntheticLib4ClientError( + f"Invalid SyntheticLib4 metadata JSONL at line {line_number}: {exc}" + ) from exc + self._validate_proof_record(record) + records.append(record) + return records + + def validate_local_snapshot(self) -> dict[str, Any]: + """ + Validate the available local fixture/snapshot before activating search. + + The first MOTO-side build supports JSONL fixture metadata, not full + archive download/extraction. Real hashes are checked when present; mock + hash placeholders are reported but not treated as failures. 
+ """ + source_dir, source_kind = self._current_source_dir() + manifest = self.get_release_manifest() + records = self.load_proof_metadata() + required_manifest_fields = [ + "contract_version", + "schema_version", + "release_id", + "channel", + "proof_count", + "compatible_moto_contract_versions", + ] + missing = [field for field in required_manifest_fields if manifest.get(field) in (None, "")] + if missing: + raise SyntheticLib4ClientError( + f"SyntheticLib4 manifest is missing required fields: {', '.join(missing)}" + ) + compatible = manifest.get("compatible_moto_contract_versions") or [] + if SYNTHETICLIB4_CONTRACT_VERSION not in compatible: + raise SyntheticLib4ClientError( + f"SyntheticLib4 release is not compatible with {SYNTHETICLIB4_CONTRACT_VERSION}" + ) + expected_count = int(manifest.get("proof_count") or 0) + if expected_count and expected_count != len(records): + raise SyntheticLib4ClientError( + f"SyntheticLib4 proof count mismatch: manifest={expected_count}, metadata={len(records)}" + ) + + file_checks: list[dict[str, Any]] = [] + for entry in manifest.get("files") or []: + if not isinstance(entry, dict): + continue + name = validate_single_path_component(str(entry.get("name") or ""), "SyntheticLib4 manifest file") + path = source_dir / name if source_dir else Path() + expected_hash = str(entry.get("sha256") or "").strip() + # is_file(): the no-source fallback Path() is "." and always exists. + exists = path.is_file() + actual_hash = "" + hash_verified = False + if exists and expected_hash and not expected_hash.startswith("mocksha256"): + actual_hash = hashlib.sha256(path.read_bytes()).hexdigest() + if actual_hash != expected_hash: + raise SyntheticLib4ClientError( + f"SyntheticLib4 snapshot hash mismatch for {name}" + ) + hash_verified = True + file_checks.append( + { + "name": name, + "exists": exists, + "expected_sha256": expected_hash, + "actual_sha256": actual_hash, + "hash_verified": hash_verified, + "mock_hash_placeholder": expected_hash.startswith("mocksha256"), + } + ) + + return { + "valid": True,
"contract_version": SYNTHETICLIB4_CONTRACT_VERSION, + "release_id": manifest.get("release_id", ""), + "channel": manifest.get("channel", "stable"), + "proof_count": len(records), + "fixture_source": source_kind, + "snapshot_dir": str(source_dir) if source_dir else "", + "file_checks": file_checks, + } + + def import_snapshot_directory(self, source_dir: Path | str, *, channel: str = "stable") -> dict[str, Any]: + """ + Validate and activate a local SyntheticLib4 snapshot directory. + + Expected input files are `release_manifest.json` and + `proof_metadata.jsonl`, with optional account-proof fixtures and + `proofs/*.json` hydration records. The existing active snapshot is + preserved unless the staged snapshot validates successfully. + """ + safe_channel = validate_single_path_component(channel or "stable", "SyntheticLib4 channel") + source_path = Path(source_dir).resolve() + if not source_path.exists() or not source_path.is_dir(): + raise SyntheticLib4ClientError(f"SyntheticLib4 snapshot source is not a directory: {source_path}") + self._validate_snapshot_source_tree(source_path) + + releases_root = self.snapshot_root / "releases" + target_dir = releases_root / safe_channel + staging_dir = releases_root / f".{safe_channel}.staging" + previous_dir = releases_root / f".{safe_channel}.previous" + + self._remove_tree(staging_dir) + releases_root.mkdir(parents=True, exist_ok=True) + shutil.copytree(source_path, staging_dir) + + staged_client = SyntheticLib4Client(staging_dir) + staged_validation = staged_client.validate_local_snapshot() + + target_moved = False + activating = False + try: + self._remove_tree(previous_dir) + if target_dir.exists(): + shutil.move(str(target_dir), str(previous_dir)) + target_moved = True + activating = True + shutil.move(str(staging_dir), str(target_dir)) + except Exception: + # Only discard target_dir if it may hold a partially moved staged copy; + # a failure before activation must not delete the still-active snapshot. 
+ if activating: + self._remove_tree(target_dir) + if target_moved and previous_dir.exists(): + shutil.move(str(previous_dir), str(target_dir)) + raise + + return { + "success": True, + "activated_channel": safe_channel,
"snapshot_dir": str(target_dir), + "previous_snapshot_preserved": previous_dir.exists(), + "validation": staged_validation, + } + + def retrieve_batch(self, request: dict[str, Any]) -> dict[str, Any]: + """Return up to 7 fixture proofs, honoring cursors and excluded fingerprints.""" + limit = min(max(int(request.get("limit") or 7), 1), 7) + excluded = {str(value) for value in request.get("excluded_fingerprints", [])} + cursor = str(request.get("cursor") or "").strip() + include_full_code = bool(request.get("include_full_code", True)) + offset = 0 + if cursor.startswith("cursor_mock_"): + try: + offset = int(cursor.removeprefix("cursor_mock_")) + except ValueError: + offset = 0 + + records = [ + record + for record in self.load_proof_metadata() + if str(record.get("fingerprint", "")) not in excluded + ] + selected = records[offset : offset + limit] + next_offset = offset + len(selected) + next_cursor = f"cursor_mock_{next_offset}" if next_offset < len(records) else None + + proofs = [] + for record in selected: + payload = dict(record) + if not include_full_code: + payload["lean_code"] = "" + proofs.append(payload) + + manifest = self.get_release_manifest() + return { + "contract_version": SYNTHETICLIB4_CONTRACT_VERSION, + "schema_version": "syntheticlib4.retrieve_batch.v1", + "retrieval_batch_id": f"rb_mock_{offset + 1:03d}", + "release_id": manifest.get("release_id", ""), + "channel": manifest.get("channel", "stable"), + "lean_toolchain": manifest.get("lean_toolchain", ""), + "mathlib_revision": manifest.get("mathlib_revision", ""), + "syntheticlib4_revision": manifest.get("syntheticlib4_revision", ""), + "proofs": proofs, + "next_cursor": next_cursor, + "exhausted": next_cursor is None, + "exhaustion_reason": None if next_cursor else "fixture_exhausted", + "quota_remaining": { + "api_requests_remaining_day": 1999, + "text_searches_remaining_month": 1999, + "semantic_searches_remaining_month": 199, + }, + } + + def hydrate_proof(self, fingerprint: str) -> 
dict[str, Any] | None: + """Return one fixture proof by fingerprint, including any available Lean code.""" + safe_fingerprint = validate_single_path_component(fingerprint, "SyntheticLib4 fingerprint") + source_dir, _source_kind = self._current_source_dir() + for record in self.load_proof_metadata(): + if record.get("fingerprint") == safe_fingerprint: + hydration_url = str(record.get("hydration_url") or "") + if hydration_url.startswith("fixture://syntheticlib4/proofs/"): + hydrated_path = source_dir / "proofs" / f"{safe_fingerprint}.json" if source_dir else Path() + # is_file(): the no-source fallback Path() is "." and always exists. + if hydrated_path.is_file(): + hydrated = self._load_json(hydrated_path) + self._validate_proof_record(hydrated) + return {**record, **hydrated} + return record + return None + + def list_account_proofs( + self, + *, + cursor: str | None = None, + limit: int = 50, + release_id: str | None = None, + channel: str | None = None, + ) -> dict[str, Any]: + """Return mock accepted user proofs using the planned account-proof shape.""" + source_dir, _source_kind = self._current_source_dir() + fixture_path = source_dir / "account_proofs_response.json" if source_dir else None + if fixture_path and fixture_path.exists(): + return self._load_json(fixture_path) + return self._account_proofs_from_metadata( + query="", + cursor=cursor, + limit=limit, + release_id=release_id, + channel=channel, + schema_version="syntheticlib4.account_proofs.v1", + ) + + def search_user_proofs( + self, + *, + query: str = "", + module: str | None = None, + novelty_rank: str | None = None, + cursor: str | None = None, + limit: int = 50, + ) -> dict[str, Any]: + """Search mock accepted user proofs using the planned account-proof shape.""" + source_dir, _source_kind = self._current_source_dir() + fixture_path = source_dir / "account_proofs_search_response.json" if source_dir else None + if fixture_path and fixture_path.exists(): + return self._load_json(fixture_path) + 
search_query = " ".join(part for part in [query, module, novelty_rank] if
part) + return self._account_proofs_from_metadata( + query=search_query, + cursor=cursor, + limit=limit, + release_id=None, + channel=None, + schema_version="syntheticlib4.account_proofs.v1", + ) + + def _load_json(self, path: Path) -> dict[str, Any]: + try: + payload = json.loads(path.read_text(encoding="utf-8")) + except FileNotFoundError as exc: + raise SyntheticLib4ClientError(f"SyntheticLib4 fixture missing: {path}") from exc + except json.JSONDecodeError as exc: + raise SyntheticLib4ClientError(f"Invalid SyntheticLib4 fixture JSON: {path}") from exc + if not isinstance(payload, dict): + raise SyntheticLib4ClientError(f"SyntheticLib4 fixture is not an object: {path}") + return payload + + def _current_source_dir(self) -> tuple[Path | None, str]: + if not self._fixture_dir_explicit: + data_snapshot = self._active_data_snapshot_dir() + if data_snapshot is not None: + return data_snapshot, "data_root_snapshot" + if (self.fixture_dir / "release_manifest.json").exists() or (self.fixture_dir / "proof_metadata.jsonl").exists(): + return self.fixture_dir, "filesystem" + return None, "built_in" + + def _active_data_snapshot_dir(self, channel: str = "stable") -> Path | None: + candidate = self.snapshot_root / "releases" / validate_single_path_component(channel, "SyntheticLib4 channel") + if (candidate / "release_manifest.json").exists() and (candidate / "proof_metadata.jsonl").exists(): + return candidate + return None + + def _manifest_uri(self) -> str: + source_dir, source_kind = self._current_source_dir() + if source_kind == "data_root_snapshot" and source_dir is not None: + return f"file://{source_dir / 'release_manifest.json'}" + if source_kind == "filesystem": + return "fixture://syntheticlib4/release_manifest.json" + return "builtin://syntheticlib4/release_manifest.json" + + def _validate_snapshot_source_tree(self, source_path: Path) -> None: + required = {"release_manifest.json", "proof_metadata.jsonl"} + found = {path.name for path in source_path.iterdir() if 
path.is_file()} + missing = sorted(required - found) + if missing: + raise SyntheticLib4ClientError( + f"SyntheticLib4 snapshot directory is missing required files: {', '.join(missing)}" + ) + + allowed_root_files = { + "release_manifest.json", + "proof_metadata.jsonl", + "account_status_response.json", + "account_proofs_response.json", + "account_proofs_search_response.json", + } + max_file_bytes = 64 * 1024 * 1024 + for path in source_path.rglob("*"): + relative = path.relative_to(source_path) + if path.is_symlink(): + raise SyntheticLib4ClientError(f"SyntheticLib4 snapshot contains a symlink: {relative}") + if path.is_dir(): + if relative.parts and relative.parts[0] != "proofs": + raise SyntheticLib4ClientError(f"SyntheticLib4 snapshot contains an unsupported directory: {relative}") + continue + if path.stat().st_size > max_file_bytes: + raise SyntheticLib4ClientError(f"SyntheticLib4 snapshot file is too large: {relative}") + if len(relative.parts) == 1: + if relative.name not in allowed_root_files: + raise SyntheticLib4ClientError(f"SyntheticLib4 snapshot contains an unsupported file: {relative}") + continue + if len(relative.parts) == 2 and relative.parts[0] == "proofs" and relative.name.endswith(".json"): + validate_single_path_component(relative.name, "SyntheticLib4 hydration proof file") + continue + raise SyntheticLib4ClientError(f"SyntheticLib4 snapshot contains an unsupported path: {relative}") + + # Validate staged content against the normal contract before copying. 
+ staging_client = SyntheticLib4Client(source_path) + staging_client.validate_local_snapshot() + + @staticmethod + def _remove_tree(path: Path) -> None: + if path.exists(): + shutil.rmtree(path) + + def _validate_proof_record(self, record: dict[str, Any]) -> None: + required = [ + "fingerprint", + "theorem_statement", + "theorem_statement_hash", + "lean_code_hash", + "release_membership", + "license_terms_id", + ] + missing = [field for field in required if not str(record.get(field, "")).strip()] + if missing: + raise SyntheticLib4ClientError( + f"SyntheticLib4 proof record {record.get('fingerprint') or ''} " + f"is missing required fields: {', '.join(missing)}" + ) + + def _account_proofs_from_metadata( + self, + *, + query: str, + cursor: str | None, + limit: int, + release_id: str | None, + channel: str | None, + schema_version: str, + ) -> dict[str, Any]: + records = self.load_proof_metadata() + manifest = self.get_release_manifest() + if release_id and release_id != manifest.get("release_id"): + records = [] + if channel and channel != manifest.get("channel"): + records = [] + terms = [term.lower() for term in (query or "").split() if term.strip()] + if terms: + def _matches(record: dict[str, Any]) -> bool: + haystack = " ".join( + str(record.get(field) or "") + for field in ( + "fingerprint", + "display_title", + "theorem_name", + "theorem_statement", + "informal_statement", + "proof_description", + "module", + "source_path", + "novelty_rank", + ) + ).lower() + return all(term in haystack for term in terms) + + records = [record for record in records if _matches(record)] + + offset = 0 + raw_cursor = (cursor or "").strip() + if raw_cursor.startswith("account_cursor_"): + try: + offset = int(raw_cursor.removeprefix("account_cursor_")) + except ValueError: + offset = 0 + capped_limit = min(max(int(limit or 50), 1), 100) + selected = records[offset : offset + capped_limit] + next_offset = offset + len(selected) + next_cursor = f"account_cursor_{next_offset}" if 
next_offset < len(records) else None + return { + "contract_version": SYNTHETICLIB4_CONTRACT_VERSION, + "schema_version": schema_version, + "proofs": selected, + "next_cursor": next_cursor, + "quota_remaining": { + "api_requests_remaining_day": 1999, + "text_searches_remaining_month": 1999, + "semantic_searches_remaining_month": 199, + }, + } + + def _builtin_account_status(self) -> dict[str, Any]: + return { + "contract_version": SYNTHETICLIB4_CONTRACT_VERSION, + "schema_version": "syntheticlib4.account_status.v1", + "authenticated": True, + "membership_active": True, + "membership_tier": "offline_mock", + "access_expires_at": "", + "scopes": list(_MOCK_SCOPES), + "quota": { + "api_requests_remaining_day": 2000, + "text_searches_remaining_month": 2000, + "semantic_searches_remaining_month": 200, + }, + } + + def _builtin_release_manifest(self) -> dict[str, Any]: + return { + "contract_version": SYNTHETICLIB4_CONTRACT_VERSION, + "schema_version": "syntheticlib4.release_manifest.v1", + "release_id": _BUILTIN_RELEASE_ID, + "channel": "stable", + "generated_at": "2026-06-11T00:00:00Z", + "lean_toolchain": "leanprover/lean4:v4.18.0", + "mathlib_revision": "mock-mathlib-rev", + "syntheticlib4_revision": "built-in-mock", + "license_terms_id": "syntheticlib4-member-license-v1", + "proof_count": 30, + "novelty_distribution": { + "novel_formalization": 20, + "novel_reformulation": 7, + "minor_mathematical_discovery": 3, + }, + "compatible_moto_contract_versions": [SYNTHETICLIB4_CONTRACT_VERSION], + "files": [], + } + + def _builtin_proof_metadata(self) -> list[dict[str, Any]]: + records: list[dict[str, Any]] = [] + for index in range(1, 31): + theorem_name = f"SyntheticLib4.Mock.builtin_helper_{index:03d}" + theorem_statement = f"theorem builtin_helper_{index:03d} : True" + lean_code = ( + "import Mathlib\n\n" + f"theorem builtin_helper_{index:03d} : True := by\n" + " trivial\n" + ) + fingerprint = f"sl4_builtin_fp_{index:03d}" + statement_hash = 
hashlib.sha256(theorem_statement.encode("utf-8")).hexdigest() + code_hash = hashlib.sha256(lean_code.encode("utf-8")).hexdigest() + metadata_only = index > 20 + records.append( + { + "fingerprint": fingerprint, + "display_title": f"Built-in SyntheticLib4 fixture proof {index}", + "theorem_name": theorem_name, + "theorem_statement": theorem_statement, + "informal_statement": "A built-in offline fixture proof for MOTO proof-search smoke tests.", + "proof_description": "Uses `trivial` to close a True goal.", + "theorem_statement_hash": statement_hash, + "lean_code": "" if metadata_only else lean_code, + "lean_code_hash": code_hash, + "imports": ["Mathlib"], + "dependency_names": ["True.intro"], + "topic_tags": ["fixture"], + "domain_tags": ["logic"], + "module": "SyntheticLib4.Mock", + "source_path": "SyntheticLib4/Mock.lean", + "line_range": {"start": index, "end": index + 2}, + "novelty_rank": "novel_formalization", + "novelty_confidence": 0.5, + "validation_record_id": f"builtin_val_{index:03d}", + "release_membership": "stable", + "license_terms_id": "syntheticlib4-member-license-v1", + "hydration_url": None, + } + ) + return records + + +syntheticlib4_client = SyntheticLib4Client() + diff --git a/backend/shared/workflow_predictor.py b/backend/shared/workflow_predictor.py index 5fcc9be..3688054 100644 --- a/backend/shared/workflow_predictor.py +++ b/backend/shared/workflow_predictor.py @@ -133,13 +133,13 @@ def predict_compiler_workflow( seq = current_sequence if not outline_accepted: - # Outline creation phase (iterative): HC → V → HC → V (max 15 iterations) + # Outline creation phase (iterative): writer -> validator -> writer -> validator for i in range(min(20, 30)): # 15 iterations max = 30 tasks if i % 2 == 0: tasks.append(WorkflowTask( - task_id=f"comp_hc_outline_{seq:03d}", + task_id=f"comp_writer_outline_{seq:03d}", sequence_number=seq + 1, - role="High-Context", + role="Writing Submitter", mode="Outline Creation", provider="lm_studio" )) @@ -157,25 +157,25 
@@ def predict_compiler_workflow( break else: # Paper construction phase - Construction cycle pattern - # HC(const) → V → HC(const) → V → HC(const) → V → HC(const) → V → - # HC(outline) → V → HC(review) → V → HC(review) → V → HP(rigor) → V + # writer construction/review turns alternate with validator turns, + # then Rigor & Proofs runs the theorem/proof cycle. cycle_pattern = [ - ("High-Context", "Construction"), + ("Writing Submitter", "Construction"), ("Validator", "Construction Review"), - ("High-Context", "Construction"), + ("Writing Submitter", "Construction"), ("Validator", "Construction Review"), - ("High-Context", "Construction"), + ("Writing Submitter", "Construction"), ("Validator", "Construction Review"), - ("High-Context", "Construction"), + ("Writing Submitter", "Construction"), ("Validator", "Construction Review"), - ("High-Context", "Outline Update"), + ("Writing Submitter", "Outline Update"), ("Validator", "Outline Review"), - ("High-Context", "Paper Review"), + ("Writing Submitter", "Paper Review"), ("Validator", "Review Validation"), - ("High-Context", "Paper Review"), + ("Writing Submitter", "Paper Review"), ("Validator", "Review Validation"), - ("High-Param", "Rigor Enhancement"), + ("Rigor & Proofs", "Rigor Enhancement"), ("Validator", "Rigor Review"), ] @@ -183,9 +183,9 @@ def predict_compiler_workflow( pattern_idx = i % len(cycle_pattern) role, mode = cycle_pattern[pattern_idx] - if role == "High-Context": - task_id = f"comp_hc_{seq:03d}" - elif role == "High-Param": + if role == "Writing Submitter": + task_id = f"comp_writer_{seq:03d}" + elif role == "Rigor & Proofs": task_id = f"comp_hp_{seq:03d}" else: task_id = f"comp_val_{seq:03d}" diff --git a/backend/shared/xai_grok_client.py b/backend/shared/xai_grok_client.py index 18f39a2..90e8a17 100644 --- a/backend/shared/xai_grok_client.py +++ b/backend/shared/xai_grok_client.py @@ -59,19 +59,25 @@ class XAIGrokClient: "openid profile email offline_access grok-cli:access api:access", ) 
DEFAULT_PLAN = os.getenv("MOTO_XAI_GROK_OAUTH_PLAN", "generic") - DEFAULT_REFERRER = os.getenv("MOTO_XAI_GROK_OAUTH_REFERRER", "moto") + DEFAULT_REFERRER = os.getenv("MOTO_XAI_GROK_OAUTH_REFERRER", "moto-autonomous-asi") REFRESH_SKEW_SECONDS = 120 - MAX_RETRIES = 3 + MAX_RETRIES = 4 RETRY_DELAY = 2.0 + RETRY_MAX_DELAY = 30.0 TRANSIENT_STATUS_CODES = {408, 409, 425, 429, 500, 502, 503, 504, 520, 521, 522, 523, 524} TRANSIENT_MARKERS = ( "bad gateway", "connection timeout", + "disconnect/reset before headers", "gateway timeout", + "incomplete chunked read", "peer closed connection", "service unavailable", + "server_error", "temporarily unavailable", + "upstream connect error", "upstream provider timeout", + "you can retry", ) CHAT_UNSUPPORTED_MODEL_MARKERS = ( "multi-agent", @@ -223,6 +229,14 @@ def _is_transient_text(cls, text: str) -> bool: lowered = (text or "").lower() return any(marker in lowered for marker in cls.TRANSIENT_MARKERS) + @classmethod + def _max_attempts(cls) -> int: + return cls.MAX_RETRIES + 1 + + @classmethod + def _retry_delay(cls, retry_index: int) -> float: + return min(cls.RETRY_MAX_DELAY, cls.RETRY_DELAY * (2 ** max(0, retry_index))) + async def exchange_code( self, *, @@ -486,7 +500,8 @@ async def clear_tokens(self) -> None: async def _post_with_retry(self, url: str, **kwargs) -> httpx.Response: """POST with retry on transient transport/provider errors.""" - for attempt in range(self.MAX_RETRIES): + max_attempts = self._max_attempts() + for attempt in range(max_attempts): try: response = await self.client.post(url, **kwargs) if response.status_code >= 400 and ( @@ -494,36 +509,40 @@ async def _post_with_retry(self, url: str, **kwargs) -> httpx.Response: or self._is_transient_text(response.text) ): error_detail = sanitize_provider_error_text(response.text) + delay = self._retry_delay(attempt) logger.warning( - "xAI Grok transient completion response (attempt %s/%s): status=%s error=%s", + "xAI Grok transient completion response (attempt 
%s/%s): status=%s error=%s%s", attempt + 1, - self.MAX_RETRIES, + max_attempts, response.status_code, error_detail, + f"; retrying in {delay:.1f}s" if attempt < max_attempts - 1 else "", ) - if attempt < self.MAX_RETRIES - 1: - await asyncio.sleep(self.RETRY_DELAY * (attempt + 1)) + if attempt < max_attempts - 1: + await asyncio.sleep(delay) continue - raise ValueError( - f"xAI Grok connection failed after {self.MAX_RETRIES} attempts: " + raise XAIGrokRequestError( + f"xAI Grok connection failed after {self.MAX_RETRIES} retries: " f"HTTP {response.status_code}: {error_detail}" ) return response - except (httpx.ConnectError, httpx.RemoteProtocolError, httpx.ReadError) as exc: + except httpx.TransportError as exc: error_type = type(exc).__name__ error_detail = sanitize_provider_error_text(str(exc) or repr(exc)) + delay = self._retry_delay(attempt) logger.warning( - "xAI Grok connection error (attempt %s/%s): [%s] %s", + "xAI Grok connection error (attempt %s/%s): [%s] %s%s", attempt + 1, - self.MAX_RETRIES, + max_attempts, error_type, error_detail, + f"; retrying in {delay:.1f}s" if attempt < max_attempts - 1 else "", ) - if attempt < self.MAX_RETRIES - 1: - await asyncio.sleep(self.RETRY_DELAY * (attempt + 1)) + if attempt < max_attempts - 1: + await asyncio.sleep(delay) continue - raise ValueError( - f"xAI Grok connection failed after {self.MAX_RETRIES} attempts: " + raise XAIGrokRequestError( + f"xAI Grok connection failed after {self.MAX_RETRIES} retries: " f"[{error_type}] {error_detail}" ) diff --git a/frontend/package-lock.json b/frontend/package-lock.json index 649115f..0a91f77 100644 --- a/frontend/package-lock.json +++ b/frontend/package-lock.json @@ -1,12 +1,12 @@ { "name": "asi-aggregator-frontend", - "version": "1.1.0", + "version": "1.1.01", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "asi-aggregator-frontend", - "version": "1.1.0", + "version": "1.1.01", "license": "MIT", "dependencies": { "dompurify": "^3.2.4", @@ -15,160 
+15,87 @@ "react-dom": "^18.2.0" }, "devDependencies": { + "@testing-library/jest-dom": "^6.9.1", + "@testing-library/react": "^16.3.2", + "@testing-library/user-event": "^14.6.1", "@types/react": "^18.2.43", "@types/react-dom": "^18.2.17", - "@vitejs/plugin-react": "^4.2.1", - "vite": "^7.1.12" - } - }, - "node_modules/@babel/code-frame": { - "version": "7.27.1", - "resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.27.1.tgz", - "integrity": "sha512-cjQ7ZlQ0Mv3b47hABuTevyTuYN4i+loJKGeV9flcCgIK37cCXRh+L1bd3iBHlynerhQ7BhCkn2BPbQUL+rGqFg==", - "dev": true, - "license": "MIT", - "dependencies": { - "@babel/helper-validator-identifier": "^7.27.1", - "js-tokens": "^4.0.0", - "picocolors": "^1.1.1" - }, - "engines": { - "node": ">=6.9.0" + "@vitejs/plugin-react": "^6.0.2", + "jsdom": "^29.1.1", + "vite": "^8.0.16", + "vitest": "^4.1.8" } }, - "node_modules/@babel/compat-data": { - "version": "7.28.5", - "resolved": "https://registry.npmjs.org/@babel/compat-data/-/compat-data-7.28.5.tgz", - "integrity": "sha512-6uFXyCayocRbqhZOB+6XcuZbkMNimwfVGFji8CTZnCzOHVGvDqzvitu1re2AU5LROliz7eQPhB8CpAMvnx9EjA==", + "node_modules/@adobe/css-tools": { + "version": "4.5.0", + "resolved": "https://registry.npmjs.org/@adobe/css-tools/-/css-tools-4.5.0.tgz", + "integrity": "sha512-6OzddxPio9UiWTCemp4N8cYLV2ZN1ncRnV1cVGtve7dhPOtRkleRyx32GQCYSwDYgaHU3USMm84tNsvKzRCa1Q==", "dev": true, - "license": "MIT", - "engines": { - "node": ">=6.9.0" - } + "license": "MIT" }, - "node_modules/@babel/core": { - "version": "7.28.5", - "resolved": "https://registry.npmjs.org/@babel/core/-/core-7.28.5.tgz", - "integrity": "sha512-e7jT4DxYvIDLk1ZHmU/m/mB19rex9sv0c2ftBtjSBv+kVM/902eh0fINUzD7UwLLNR+jU585GxUJ8/EBfAM5fw==", + "node_modules/@asamuzakjp/css-color": { + "version": "5.1.11", + "resolved": "https://registry.npmjs.org/@asamuzakjp/css-color/-/css-color-5.1.11.tgz", + "integrity": "sha512-KVw6qIiCTUQhByfTd78h2yD1/00waTmm9uy/R7Ck/ctUyAPj+AEDLkQIdJW0T8+qGgj3j5bpNKK7Q3G+LedJWg==", "dev": 
true, "license": "MIT", "dependencies": { - "@babel/code-frame": "^7.27.1", - "@babel/generator": "^7.28.5", - "@babel/helper-compilation-targets": "^7.27.2", - "@babel/helper-module-transforms": "^7.28.3", - "@babel/helpers": "^7.28.4", - "@babel/parser": "^7.28.5", - "@babel/template": "^7.27.2", - "@babel/traverse": "^7.28.5", - "@babel/types": "^7.28.5", - "@jridgewell/remapping": "^2.3.5", - "convert-source-map": "^2.0.0", - "debug": "^4.1.0", - "gensync": "^1.0.0-beta.2", - "json5": "^2.2.3", - "semver": "^6.3.1" + "@asamuzakjp/generational-cache": "^1.0.1", + "@csstools/css-calc": "^3.2.0", + "@csstools/css-color-parser": "^4.1.0", + "@csstools/css-parser-algorithms": "^4.0.0", + "@csstools/css-tokenizer": "^4.0.0" }, "engines": { - "node": ">=6.9.0" - }, - "funding": { - "type": "opencollective", - "url": "https://opencollective.com/babel" + "node": "^20.19.0 || ^22.12.0 || >=24.0.0" } }, - "node_modules/@babel/generator": { - "version": "7.28.5", - "resolved": "https://registry.npmjs.org/@babel/generator/-/generator-7.28.5.tgz", - "integrity": "sha512-3EwLFhZ38J4VyIP6WNtt2kUdW9dokXA9Cr4IVIFHuCpZ3H8/YFOl5JjZHisrn1fATPBmKKqXzDFvh9fUwHz6CQ==", + "node_modules/@asamuzakjp/dom-selector": { + "version": "7.1.1", + "resolved": "https://registry.npmjs.org/@asamuzakjp/dom-selector/-/dom-selector-7.1.1.tgz", + "integrity": "sha512-67RZDnYRc8H/8MLDgQCDE//zoqVFwajkepHZgmXrbwybzXOEwOWGPYGmALYl9J2DOLfFPPs6kKCqmbzV895hTQ==", "dev": true, "license": "MIT", "dependencies": { - "@babel/parser": "^7.28.5", - "@babel/types": "^7.28.5", - "@jridgewell/gen-mapping": "^0.3.12", - "@jridgewell/trace-mapping": "^0.3.28", - "jsesc": "^3.0.2" + "@asamuzakjp/generational-cache": "^1.0.1", + "@asamuzakjp/nwsapi": "^2.3.9", + "bidi-js": "^1.0.3", + "css-tree": "^3.2.1", + "is-potential-custom-element-name": "^1.0.1" }, "engines": { - "node": ">=6.9.0" + "node": "^20.19.0 || ^22.12.0 || >=24.0.0" } }, - "node_modules/@babel/helper-compilation-targets": { - "version": "7.27.2", - 
"resolved": "https://registry.npmjs.org/@babel/helper-compilation-targets/-/helper-compilation-targets-7.27.2.tgz", - "integrity": "sha512-2+1thGUUWWjLTYTHZWK1n8Yga0ijBz1XAhUXcKy81rd5g6yh7hGqMp45v7cadSbEHc9G3OTv45SyneRN3ps4DQ==", + "node_modules/@asamuzakjp/generational-cache": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/@asamuzakjp/generational-cache/-/generational-cache-1.0.1.tgz", + "integrity": "sha512-wajfB8KqzMCN2KGNFdLkReeHncd0AslUSrvHVvvYWuU8ghncRJoA50kT3zP9MVL0+9g4/67H+cdvBskj9THPzg==", "dev": true, "license": "MIT", - "dependencies": { - "@babel/compat-data": "^7.27.2", - "@babel/helper-validator-option": "^7.27.1", - "browserslist": "^4.24.0", - "lru-cache": "^5.1.1", - "semver": "^6.3.1" - }, "engines": { - "node": ">=6.9.0" + "node": "^20.19.0 || ^22.12.0 || >=24.0.0" } }, - "node_modules/@babel/helper-globals": { - "version": "7.28.0", - "resolved": "https://registry.npmjs.org/@babel/helper-globals/-/helper-globals-7.28.0.tgz", - "integrity": "sha512-+W6cISkXFa1jXsDEdYA8HeevQT/FULhxzR99pxphltZcVaugps53THCeiWA8SguxxpSp3gKPiuYfSWopkLQ4hw==", + "node_modules/@asamuzakjp/nwsapi": { + "version": "2.3.9", + "resolved": "https://registry.npmjs.org/@asamuzakjp/nwsapi/-/nwsapi-2.3.9.tgz", + "integrity": "sha512-n8GuYSrI9bF7FFZ/SjhwevlHc8xaVlb/7HmHelnc/PZXBD2ZR49NnN9sMMuDdEGPeeRQ5d0hqlSlEpgCX3Wl0Q==", "dev": true, - "license": "MIT", - "engines": { - "node": ">=6.9.0" - } + "license": "MIT" }, - "node_modules/@babel/helper-module-imports": { + "node_modules/@babel/code-frame": { "version": "7.27.1", - "resolved": "https://registry.npmjs.org/@babel/helper-module-imports/-/helper-module-imports-7.27.1.tgz", - "integrity": "sha512-0gSFWUPNXNopqtIPQvlD5WgXYI5GY2kP2cCvoT8kczjbfcfuIljTbcWrulD1CIPIX2gt1wghbDy08yE1p+/r3w==", - "dev": true, - "license": "MIT", - "dependencies": { - "@babel/traverse": "^7.27.1", - "@babel/types": "^7.27.1" - }, - "engines": { - "node": ">=6.9.0" - } - }, - "node_modules/@babel/helper-module-transforms": { - 
"version": "7.28.3", - "resolved": "https://registry.npmjs.org/@babel/helper-module-transforms/-/helper-module-transforms-7.28.3.tgz", - "integrity": "sha512-gytXUbs8k2sXS9PnQptz5o0QnpLL51SwASIORY6XaBKF88nsOT0Zw9szLqlSGQDP/4TljBAD5y98p2U1fqkdsw==", + "resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.27.1.tgz", + "integrity": "sha512-cjQ7ZlQ0Mv3b47hABuTevyTuYN4i+loJKGeV9flcCgIK37cCXRh+L1bd3iBHlynerhQ7BhCkn2BPbQUL+rGqFg==", "dev": true, "license": "MIT", + "peer": true, "dependencies": { - "@babel/helper-module-imports": "^7.27.1", "@babel/helper-validator-identifier": "^7.27.1", - "@babel/traverse": "^7.28.3" - }, - "engines": { - "node": ">=6.9.0" + "js-tokens": "^4.0.0", + "picocolors": "^1.1.1" }, - "peerDependencies": { - "@babel/core": "^7.0.0" - } - }, - "node_modules/@babel/helper-plugin-utils": { - "version": "7.27.1", - "resolved": "https://registry.npmjs.org/@babel/helper-plugin-utils/-/helper-plugin-utils-7.27.1.tgz", - "integrity": "sha512-1gn1Up5YXka3YYAHGKpbideQ5Yjf1tDa9qYcgysz+cNCXukyLl6DjPXhD3VRwSb8c0J9tA4b2+rHEZtc6R0tlw==", - "dev": true, - "license": "MIT", - "engines": { - "node": ">=6.9.0" - } - }, - "node_modules/@babel/helper-string-parser": { - "version": "7.27.1", - "resolved": "https://registry.npmjs.org/@babel/helper-string-parser/-/helper-string-parser-7.27.1.tgz", - "integrity": "sha512-qMlSxKbpRlAridDExk92nSobyDdpPijUq2DW6oDnUqd0iOGxmQjyqhMIihI9+zv4LPyZdRje2cavWPbCbWm3eA==", - "dev": true, - "license": "MIT", "engines": { "node": ">=6.9.0" } @@ -179,236 +106,284 @@ "integrity": "sha512-qSs4ifwzKJSV39ucNjsvc6WVHs6b7S03sOh2OcHF9UHfVPqWWALUsNUVzhSBiItjRZoLHx7nIarVjqKVusUZ1Q==", "dev": true, "license": "MIT", + "peer": true, "engines": { "node": ">=6.9.0" } }, - "node_modules/@babel/helper-validator-option": { - "version": "7.27.1", - "resolved": "https://registry.npmjs.org/@babel/helper-validator-option/-/helper-validator-option-7.27.1.tgz", - "integrity": 
"sha512-YvjJow9FxbhFFKDSuFnVCe2WxXk1zWc22fFePVNEaWJEu8IrZVlda6N0uHwzZrUM1il7NC9Mlp4MaJYbYd9JSg==", + "node_modules/@babel/runtime": { + "version": "7.29.7", + "resolved": "https://registry.npmjs.org/@babel/runtime/-/runtime-7.29.7.tgz", + "integrity": "sha512-Nq8OhGWiZIZGV6hLHoyAKLLcJihP/xFeBMGJoUrxTX2psI8dCifzLhZISFb+VWS3wFMRDmCGw5R+dOySCqPLhw==", "dev": true, "license": "MIT", "engines": { "node": ">=6.9.0" } }, - "node_modules/@babel/helpers": { - "version": "7.28.4", - "resolved": "https://registry.npmjs.org/@babel/helpers/-/helpers-7.28.4.tgz", - "integrity": "sha512-HFN59MmQXGHVyYadKLVumYsA9dBFun/ldYxipEjzA4196jpLZd8UjEEBLkbEkvfYreDqJhZxYAWFPtrfhNpj4w==", + "node_modules/@bramus/specificity": { + "version": "2.4.2", + "resolved": "https://registry.npmjs.org/@bramus/specificity/-/specificity-2.4.2.tgz", + "integrity": "sha512-ctxtJ/eA+t+6q2++vj5j7FYX3nRu311q1wfYH3xjlLOsczhlhxAg2FWNUXhpGvAw3BWo1xBcvOV6/YLc2r5FJw==", "dev": true, "license": "MIT", "dependencies": { - "@babel/template": "^7.27.2", - "@babel/types": "^7.28.4" + "css-tree": "^3.0.0" }, - "engines": { - "node": ">=6.9.0" + "bin": { + "specificity": "bin/cli.js" } }, - "node_modules/@babel/parser": { - "version": "7.28.5", - "resolved": "https://registry.npmjs.org/@babel/parser/-/parser-7.28.5.tgz", - "integrity": "sha512-KKBU1VGYR7ORr3At5HAtUQ+TV3SzRCXmA/8OdDZiLDBIZxVyzXuztPjfLd3BV1PRAQGCMWWSHYhL0F8d5uHBDQ==", + "node_modules/@csstools/color-helpers": { + "version": "6.0.2", + "resolved": "https://registry.npmjs.org/@csstools/color-helpers/-/color-helpers-6.0.2.tgz", + "integrity": "sha512-LMGQLS9EuADloEFkcTBR3BwV/CGHV7zyDxVRtVDTwdI2Ca4it0CCVTT9wCkxSgokjE5Ho41hEPgb8OEUwoXr6Q==", "dev": true, - "license": "MIT", - "dependencies": { - "@babel/types": "^7.28.5" - }, - "bin": { - "parser": "bin/babel-parser.js" - }, + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/csstools" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/csstools" + } + ], + 
"license": "MIT-0", "engines": { - "node": ">=6.0.0" + "node": ">=20.19.0" } }, - "node_modules/@babel/plugin-transform-react-jsx-self": { - "version": "7.27.1", - "resolved": "https://registry.npmjs.org/@babel/plugin-transform-react-jsx-self/-/plugin-transform-react-jsx-self-7.27.1.tgz", - "integrity": "sha512-6UzkCs+ejGdZ5mFFC/OCUrv028ab2fp1znZmCZjAOBKiBK2jXD1O+BPSfX8X2qjJ75fZBMSnQn3Rq2mrBJK2mw==", + "node_modules/@csstools/css-calc": { + "version": "3.2.1", + "resolved": "https://registry.npmjs.org/@csstools/css-calc/-/css-calc-3.2.1.tgz", + "integrity": "sha512-DtdHlgXh5ZkA43cwBcAm+huzgJiwx3ZTWVjBs94kwz2xKqSimDA3lBgCjphYgwgVUMWatSM0pDd8TILB1yrVVg==", "dev": true, + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/csstools" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/csstools" + } + ], "license": "MIT", - "dependencies": { - "@babel/helper-plugin-utils": "^7.27.1" - }, "engines": { - "node": ">=6.9.0" + "node": ">=20.19.0" }, "peerDependencies": { - "@babel/core": "^7.0.0-0" + "@csstools/css-parser-algorithms": "^4.0.0", + "@csstools/css-tokenizer": "^4.0.0" } }, - "node_modules/@babel/plugin-transform-react-jsx-source": { - "version": "7.27.1", - "resolved": "https://registry.npmjs.org/@babel/plugin-transform-react-jsx-source/-/plugin-transform-react-jsx-source-7.27.1.tgz", - "integrity": "sha512-zbwoTsBruTeKB9hSq73ha66iFeJHuaFkUbwvqElnygoNbj/jHRsSeokowZFN3CZ64IvEqcmmkVe89OPXc7ldAw==", + "node_modules/@csstools/css-color-parser": { + "version": "4.1.3", + "resolved": "https://registry.npmjs.org/@csstools/css-color-parser/-/css-color-parser-4.1.3.tgz", + "integrity": "sha512-DOgvIPkikIOixQRlD4YF31VN6fLLUTdrzhfRbis8vm0kMTgIbEPX0Ip/YX9fOeV9iywAS4sUUbTclpan7yYP8Q==", "dev": true, + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/csstools" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/csstools" + } + ], "license": "MIT", "dependencies": { - 
"@babel/helper-plugin-utils": "^7.27.1" + "@csstools/color-helpers": "^6.0.2", + "@csstools/css-calc": "^3.2.1" }, "engines": { - "node": ">=6.9.0" + "node": ">=20.19.0" }, "peerDependencies": { - "@babel/core": "^7.0.0-0" + "@csstools/css-parser-algorithms": "^4.0.0", + "@csstools/css-tokenizer": "^4.0.0" } }, - "node_modules/@babel/template": { - "version": "7.27.2", - "resolved": "https://registry.npmjs.org/@babel/template/-/template-7.27.2.tgz", - "integrity": "sha512-LPDZ85aEJyYSd18/DkjNh4/y1ntkE5KwUHWTiqgRxruuZL2F1yuHligVHLvcHY2vMHXttKFpJn6LwfI7cw7ODw==", + "node_modules/@csstools/css-parser-algorithms": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/@csstools/css-parser-algorithms/-/css-parser-algorithms-4.0.0.tgz", + "integrity": "sha512-+B87qS7fIG3L5h3qwJ/IFbjoVoOe/bpOdh9hAjXbvx0o8ImEmUsGXN0inFOnk2ChCFgqkkGFQ+TpM5rbhkKe4w==", "dev": true, + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/csstools" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/csstools" + } + ], "license": "MIT", - "dependencies": { - "@babel/code-frame": "^7.27.1", - "@babel/parser": "^7.27.2", - "@babel/types": "^7.27.1" - }, "engines": { - "node": ">=6.9.0" + "node": ">=20.19.0" + }, + "peerDependencies": { + "@csstools/css-tokenizer": "^4.0.0" } }, - "node_modules/@babel/traverse": { - "version": "7.28.5", - "resolved": "https://registry.npmjs.org/@babel/traverse/-/traverse-7.28.5.tgz", - "integrity": "sha512-TCCj4t55U90khlYkVV/0TfkJkAkUg3jZFA3Neb7unZT8CPok7iiRfaX0F+WnqWqt7OxhOn0uBKXCw4lbL8W0aQ==", + "node_modules/@csstools/css-syntax-patches-for-csstree": { + "version": "1.1.5", + "resolved": "https://registry.npmjs.org/@csstools/css-syntax-patches-for-csstree/-/css-syntax-patches-for-csstree-1.1.5.tgz", + "integrity": "sha512-oNjBvzLq2GPZtJphCjLqXow/cHySHSgtxvKZb7OqSZ/xHgw6NWNhfad+6AB9cLeVm6eA9d/qMll3JdEHjy6M+A==", "dev": true, - "license": "MIT", - "dependencies": { - "@babel/code-frame": "^7.27.1", - 
"@babel/generator": "^7.28.5", - "@babel/helper-globals": "^7.28.0", - "@babel/parser": "^7.28.5", - "@babel/template": "^7.27.2", - "@babel/types": "^7.28.5", - "debug": "^4.3.1" + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/csstools" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/csstools" + } + ], + "license": "MIT-0", + "peerDependencies": { + "css-tree": "^3.2.1" }, - "engines": { - "node": ">=6.9.0" + "peerDependenciesMeta": { + "css-tree": { + "optional": true + } } }, - "node_modules/@babel/types": { - "version": "7.28.5", - "resolved": "https://registry.npmjs.org/@babel/types/-/types-7.28.5.tgz", - "integrity": "sha512-qQ5m48eI/MFLQ5PxQj4PFaprjyCTLI37ElWMmNs0K8Lk3dVeOdNpB3ks8jc7yM5CDmVC73eMVk/trk3fgmrUpA==", + "node_modules/@csstools/css-tokenizer": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/@csstools/css-tokenizer/-/css-tokenizer-4.0.0.tgz", + "integrity": "sha512-QxULHAm7cNu72w97JUNCBFODFaXpbDg+dP8b/oWFAZ2MTRppA3U00Y2L1HqaS4J6yBqxwa/Y3nMBaxVKbB/NsA==", "dev": true, + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/csstools" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/csstools" + } + ], "license": "MIT", - "dependencies": { - "@babel/helper-string-parser": "^7.27.1", - "@babel/helper-validator-identifier": "^7.28.5" - }, "engines": { - "node": ">=6.9.0" + "node": ">=20.19.0" } }, - "node_modules/@esbuild/win32-x64": { - "version": "0.27.2", - "resolved": "https://registry.npmjs.org/@esbuild/win32-x64/-/win32-x64-0.27.2.tgz", - "integrity": "sha512-sRdU18mcKf7F+YgheI/zGf5alZatMUTKj/jNS6l744f9u3WFu4v7twcUI9vu4mknF4Y9aDlblIie0IM+5xxaqQ==", - "cpu": [ - "x64" - ], + "node_modules/@emnapi/core": { + "version": "1.10.0", + "resolved": "https://registry.npmjs.org/@emnapi/core/-/core-1.10.0.tgz", + "integrity": "sha512-yq6OkJ4p82CAfPl0u9mQebQHKPJkY7WrIuk205cTYnYe+k2Z8YBh11FrbRG/H6ihirqcacOgl2BIO8oyMQLeXw==", "dev": true, 
"license": "MIT", "optional": true, - "os": [ - "win32" - ], - "engines": { - "node": ">=18" + "dependencies": { + "@emnapi/wasi-threads": "1.2.1", + "tslib": "^2.4.0" } }, - "node_modules/@jridgewell/gen-mapping": { - "version": "0.3.13", - "resolved": "https://registry.npmjs.org/@jridgewell/gen-mapping/-/gen-mapping-0.3.13.tgz", - "integrity": "sha512-2kkt/7niJ6MgEPxF0bYdQ6etZaA+fQvDcLKckhy1yIQOzaoKjBBjSj63/aLVjYE3qhRt5dvM+uUyfCg6UKCBbA==", + "node_modules/@emnapi/runtime": { + "version": "1.10.0", + "resolved": "https://registry.npmjs.org/@emnapi/runtime/-/runtime-1.10.0.tgz", + "integrity": "sha512-ewvYlk86xUoGI0zQRNq/mC+16R1QeDlKQy21Ki3oSYXNgLb45GV1P6A0M+/s6nyCuNDqe5VpaY84BzXGwVbwFA==", "dev": true, "license": "MIT", + "optional": true, "dependencies": { - "@jridgewell/sourcemap-codec": "^1.5.0", - "@jridgewell/trace-mapping": "^0.3.24" + "tslib": "^2.4.0" } }, - "node_modules/@jridgewell/remapping": { - "version": "2.3.5", - "resolved": "https://registry.npmjs.org/@jridgewell/remapping/-/remapping-2.3.5.tgz", - "integrity": "sha512-LI9u/+laYG4Ds1TDKSJW2YPrIlcVYOwi2fUC6xB43lueCjgxV4lffOCZCtYFiH6TNOX+tQKXx97T4IKHbhyHEQ==", + "node_modules/@emnapi/wasi-threads": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/@emnapi/wasi-threads/-/wasi-threads-1.2.1.tgz", + "integrity": "sha512-uTII7OYF+/Mes/MrcIOYp5yOtSMLBWSIoLPpcgwipoiKbli6k322tcoFsxoIIxPDqW01SQGAgko4EzZi2BNv2w==", "dev": true, "license": "MIT", + "optional": true, "dependencies": { - "@jridgewell/gen-mapping": "^0.3.5", - "@jridgewell/trace-mapping": "^0.3.24" + "tslib": "^2.4.0" } }, - "node_modules/@jridgewell/resolve-uri": { - "version": "3.1.2", - "resolved": "https://registry.npmjs.org/@jridgewell/resolve-uri/-/resolve-uri-3.1.2.tgz", - "integrity": "sha512-bRISgCIjP20/tbWSPWMEi54QVPRZExkuD9lJL+UIxUKtwVJA8wW1Trb1jMs1RFXo1CBTNZ/5hpC9QvmKWdopKw==", + "node_modules/@esbuild/aix-ppc64": { + "version": "0.28.1", + "resolved": 
"https://registry.npmjs.org/@esbuild/aix-ppc64/-/aix-ppc64-0.28.1.tgz", + "integrity": "sha512-Svl7tq8k/08+p6CXPpRjQ1fKX+1odH/BQbb48fV6fj3CWHhsoIOoY87w1oHXm0qEpkIK3ZfVgp0hed3XBXzXMQ==", + "cpu": [ + "ppc64" + ], "dev": true, "license": "MIT", + "optional": true, + "os": [ + "aix" + ], + "peer": true, "engines": { - "node": ">=6.0.0" + "node": ">=18" } }, - "node_modules/@jridgewell/sourcemap-codec": { - "version": "1.5.5", - "resolved": "https://registry.npmjs.org/@jridgewell/sourcemap-codec/-/sourcemap-codec-1.5.5.tgz", - "integrity": "sha512-cYQ9310grqxueWbl+WuIUIaiUaDcj7WOq5fVhEljNVgRfOUhY9fy2zTvfoqWsnebh8Sl70VScFbICvJnLKB0Og==", - "dev": true, - "license": "MIT" - }, - "node_modules/@jridgewell/trace-mapping": { - "version": "0.3.31", - "resolved": "https://registry.npmjs.org/@jridgewell/trace-mapping/-/trace-mapping-0.3.31.tgz", - "integrity": "sha512-zzNR+SdQSDJzc8joaeP8QQoCQr8NuYx2dIIytl1QeBEZHJ9uW6hebsrYgbz8hJwUQao3TWCMtmfV8Nu1twOLAw==", + "node_modules/@esbuild/android-arm": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/android-arm/-/android-arm-0.28.1.tgz", + "integrity": "sha512-0k2F129Xdio1TdJfzJ8sy1Q47vUD2NnwdhiAf7drUN1EBTfPf4hsFCtmMgu/6m8JSzsBrlmVjudMBQqOfG8usQ==", + "cpu": [ + "arm" + ], "dev": true, "license": "MIT", - "dependencies": { - "@jridgewell/resolve-uri": "^3.1.0", - "@jridgewell/sourcemap-codec": "^1.4.14" + "optional": true, + "os": [ + "android" + ], + "peer": true, + "engines": { + "node": ">=18" } }, - "node_modules/@rolldown/pluginutils": { - "version": "1.0.0-beta.27", - "resolved": "https://registry.npmjs.org/@rolldown/pluginutils/-/pluginutils-1.0.0-beta.27.tgz", - "integrity": "sha512-+d0F4MKMCbeVUJwG96uQ4SgAznZNSq93I3V+9NHA4OpvqG8mRCpGdKmK8l/dl02h2CCDHwW2FqilnTyDcAnqjA==", - "dev": true, - "license": "MIT" - }, - "node_modules/@rollup/rollup-android-arm-eabi": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-android-arm-eabi/-/rollup-android-arm-eabi-4.59.0.tgz", - 
"integrity": "sha512-upnNBkA6ZH2VKGcBj9Fyl9IGNPULcjXRlg0LLeaioQWueH30p6IXtJEbKAgvyv+mJaMxSm1l6xwDXYjpEMiLMg==", + "node_modules/@esbuild/android-arm64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/android-arm64/-/android-arm64-0.28.1.tgz", + "integrity": "sha512-34EGEbCIAgosYz6goLcopX6Mo7NyGv9tfwEM2/7Ce2VcVRk568iSvniGWcUXIy7wEDR1wzolcxcriFVrWYcwBg==", "cpu": [ - "arm" + "arm64" ], "dev": true, "license": "MIT", "optional": true, "os": [ "android" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-android-arm64": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-android-arm64/-/rollup-android-arm64-4.59.0.tgz", - "integrity": "sha512-hZ+Zxj3SySm4A/DylsDKZAeVg0mvi++0PYVceVyX7hemkw7OreKdCvW2oQ3T1FMZvCaQXqOTHb8qmBShoqk69Q==", + "node_modules/@esbuild/android-x64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/android-x64/-/android-x64-0.28.1.tgz", + "integrity": "sha512-dbwY7ltSMDWsRatcRpCnES4F+im88OCUgGZjy52shC7GqHRE/cYlxNbB4Z4UpJswpcc4Qxd2oE/ufM0p61IKng==", "cpu": [ - "arm64" + "x64" ], "dev": true, "license": "MIT", "optional": true, "os": [ "android" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-darwin-arm64": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-darwin-arm64/-/rollup-darwin-arm64-4.59.0.tgz", - "integrity": "sha512-W2Psnbh1J8ZJw0xKAd8zdNgF9HRLkdWwwdWqubSVk0pUuQkoHnv7rx4GiF9rT4t5DIZGAsConRE3AxCdJ4m8rg==", + "node_modules/@esbuild/darwin-arm64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/darwin-arm64/-/darwin-arm64-0.28.1.tgz", + "integrity": "sha512-TZbWkQY7kvTAXbXUT7uVACR5cMHsDiSz9z7ZKAX/RTq/WJEk3QyRr0wZpNhBDX+/0CtdqUIJlOiodQcta6tY3Q==", "cpu": [ "arm64" ], @@ -417,12 +392,16 @@ "optional": true, "os": [ "darwin" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - 
"node_modules/@rollup/rollup-darwin-x64": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-darwin-x64/-/rollup-darwin-x64-4.59.0.tgz", - "integrity": "sha512-ZW2KkwlS4lwTv7ZVsYDiARfFCnSGhzYPdiOU4IM2fDbL+QGlyAbjgSFuqNRbSthybLbIJ915UtZBtmuLrQAT/w==", + "node_modules/@esbuild/darwin-x64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/darwin-x64/-/darwin-x64-0.28.1.tgz", + "integrity": "sha512-zfdzgK9ACBNZLI/CyHTOx81SyNbM6YXn7rxSgX97VjyiPl9W1i4Ka4fgKECEoFCKGpvBj5qArWIGgQjOwkgskQ==", "cpu": [ "x64" ], @@ -431,12 +410,16 @@ "optional": true, "os": [ "darwin" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-freebsd-arm64": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-freebsd-arm64/-/rollup-freebsd-arm64-4.59.0.tgz", - "integrity": "sha512-EsKaJ5ytAu9jI3lonzn3BgG8iRBjV4LxZexygcQbpiU0wU0ATxhNVEpXKfUa0pS05gTcSDMKpn3Sx+QB9RlTTA==", + "node_modules/@esbuild/freebsd-arm64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/freebsd-arm64/-/freebsd-arm64-0.28.1.tgz", + "integrity": "sha512-wG2EA8ENdEI0qhkSZMjfqrdY+ziCYCPMmtZjjIwOmXFjmyzEHn+UUxk5of+SYsjtfs3VpnlC7QLzSI5hY/rOAw==", "cpu": [ "arm64" ], @@ -445,12 +428,16 @@ "optional": true, "os": [ "freebsd" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-freebsd-x64": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-freebsd-x64/-/rollup-freebsd-x64-4.59.0.tgz", - "integrity": "sha512-d3DuZi2KzTMjImrxoHIAODUZYoUUMsuUiY4SRRcJy6NJoZ6iIqWnJu9IScV9jXysyGMVuW+KNzZvBLOcpdl3Vg==", + "node_modules/@esbuild/freebsd-x64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/freebsd-x64/-/freebsd-x64-0.28.1.tgz", + "integrity": "sha512-i7dZ9vQgnvSCzi/rYCXNgtF/U+eKZNJBzu3eTQbRgHnM7tNSizLOkRFAl3qzVc/Op/u5YkHHa4pf/3DOYHthLQ==", "cpu": [ "x64" ], @@ -459,12 +446,16 @@ 
"optional": true, "os": [ "freebsd" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-linux-arm-gnueabihf": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm-gnueabihf/-/rollup-linux-arm-gnueabihf-4.59.0.tgz", - "integrity": "sha512-t4ONHboXi/3E0rT6OZl1pKbl2Vgxf9vJfWgmUoCEVQVxhW6Cw/c8I6hbbu7DAvgp82RKiH7TpLwxnJeKv2pbsw==", + "node_modules/@esbuild/linux-arm": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/linux-arm/-/linux-arm-0.28.1.tgz", + "integrity": "sha512-qVXBOHQS+d5Y722GwJzJUtOLlX7km3CraOaGormF1pDtPd2C/l1SHRPgjLunLGe51Sh5YYWKMFDyV4SxgMQYTQ==", "cpu": [ "arm" ], @@ -473,152 +464,178 @@ "optional": true, "os": [ "linux" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-linux-arm-musleabihf": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm-musleabihf/-/rollup-linux-arm-musleabihf-4.59.0.tgz", - "integrity": "sha512-CikFT7aYPA2ufMD086cVORBYGHffBo4K8MQ4uPS/ZnY54GKj36i196u8U+aDVT2LX4eSMbyHtyOh7D7Zvk2VvA==", + "node_modules/@esbuild/linux-arm64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/linux-arm64/-/linux-arm64-0.28.1.tgz", + "integrity": "sha512-yHs+0uc8+nvEAfAfxrWQKK5peSNzBc4PegcMO0EJ2hT71uA7vB8Ihg2e77R2P7SG5uYjPbHlLLmve4LLLRCf0g==", "cpu": [ - "arm" + "arm64" ], "dev": true, "license": "MIT", "optional": true, "os": [ "linux" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-linux-arm64-gnu": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm64-gnu/-/rollup-linux-arm64-gnu-4.59.0.tgz", - "integrity": "sha512-jYgUGk5aLd1nUb1CtQ8E+t5JhLc9x5WdBKew9ZgAXg7DBk0ZHErLHdXM24rfX+bKrFe+Xp5YuJo54I5HFjGDAA==", + "node_modules/@esbuild/linux-ia32": { + "version": "0.28.1", + "resolved": 
"https://registry.npmjs.org/@esbuild/linux-ia32/-/linux-ia32-0.28.1.tgz", + "integrity": "sha512-d1z4ZuP0ajrfz/FhGT4vv278rX8KnPPJx8i5+AtK7TYbx9Le9F1hyzurZpkEyjkGa9dUGhQow4C1NmeGvqxN2w==", "cpu": [ - "arm64" + "ia32" ], "dev": true, "license": "MIT", "optional": true, "os": [ "linux" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-linux-arm64-musl": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm64-musl/-/rollup-linux-arm64-musl-4.59.0.tgz", - "integrity": "sha512-peZRVEdnFWZ5Bh2KeumKG9ty7aCXzzEsHShOZEFiCQlDEepP1dpUl/SrUNXNg13UmZl+gzVDPsiCwnV1uI0RUA==", + "node_modules/@esbuild/linux-loong64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/linux-loong64/-/linux-loong64-0.28.1.tgz", + "integrity": "sha512-M5sRjUVZrkm1OAPR3dlOYzNmN+loZKGVi1VUQGrwuqLcbR6qeAz+famMhjASeH3YVKvZz+zT1jlh/keC3Rj/lg==", "cpu": [ - "arm64" + "loong64" ], "dev": true, "license": "MIT", "optional": true, "os": [ "linux" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-linux-loong64-gnu": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-loong64-gnu/-/rollup-linux-loong64-gnu-4.59.0.tgz", - "integrity": "sha512-gbUSW/97f7+r4gHy3Jlup8zDG190AuodsWnNiXErp9mT90iCy9NKKU0Xwx5k8VlRAIV2uU9CsMnEFg/xXaOfXg==", + "node_modules/@esbuild/linux-mips64el": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/linux-mips64el/-/linux-mips64el-0.28.1.tgz", + "integrity": "sha512-mRObBZeHh2OxcBFPWE/FjylkRgZdYuiTR3vaTozquCGOH14iP9oN4x4Ge81CoIDYQrXmIxpFumJBu5MtZpnQJQ==", "cpu": [ - "loong64" + "mips64el" ], "dev": true, "license": "MIT", "optional": true, "os": [ "linux" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-linux-loong64-musl": { - "version": "4.59.0", - "resolved": 
"https://registry.npmjs.org/@rollup/rollup-linux-loong64-musl/-/rollup-linux-loong64-musl-4.59.0.tgz", - "integrity": "sha512-yTRONe79E+o0FWFijasoTjtzG9EBedFXJMl888NBEDCDV9I2wGbFFfJQQe63OijbFCUZqxpHz1GzpbtSFikJ4Q==", + "node_modules/@esbuild/linux-ppc64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/linux-ppc64/-/linux-ppc64-0.28.1.tgz", + "integrity": "sha512-slScBsMAb3GFDcdrCgLwZtPYRoH2H/youv10QiZyRjmsP48fznoveWytSgCI/R0ZcUgpc0ZhIUEx6LHts8yrfQ==", "cpu": [ - "loong64" + "ppc64" ], "dev": true, "license": "MIT", "optional": true, "os": [ "linux" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-linux-ppc64-gnu": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-ppc64-gnu/-/rollup-linux-ppc64-gnu-4.59.0.tgz", - "integrity": "sha512-sw1o3tfyk12k3OEpRddF68a1unZ5VCN7zoTNtSn2KndUE+ea3m3ROOKRCZxEpmT9nsGnogpFP9x6mnLTCaoLkA==", + "node_modules/@esbuild/linux-riscv64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/linux-riscv64/-/linux-riscv64-0.28.1.tgz", + "integrity": "sha512-kw0owk1o0GFETUJyW0jc0G4Yzs0BHZn0JDZ8JRT088vjJYX777BAs1fDGxAC+q831qOs2DTC96mNsG2opdfyyQ==", "cpu": [ - "ppc64" + "riscv64" ], "dev": true, "license": "MIT", "optional": true, "os": [ "linux" - ] - }, - "node_modules/@rollup/rollup-linux-ppc64-musl": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-ppc64-musl/-/rollup-linux-ppc64-musl-4.59.0.tgz", - "integrity": "sha512-+2kLtQ4xT3AiIxkzFVFXfsmlZiG5FXYW7ZyIIvGA7Bdeuh9Z0aN4hVyXS/G1E9bTP/vqszNIN/pUKCk/BTHsKA==", - "cpu": [ - "ppc64" ], - "dev": true, - "license": "MIT", - "optional": true, - "os": [ - "linux" - ] + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-linux-riscv64-gnu": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-riscv64-gnu/-/rollup-linux-riscv64-gnu-4.59.0.tgz", - "integrity": 
"sha512-NDYMpsXYJJaj+I7UdwIuHHNxXZ/b/N2hR15NyH3m2qAtb/hHPA4g4SuuvrdxetTdndfj9b1WOmy73kcPRoERUg==", + "node_modules/@esbuild/linux-s390x": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/linux-s390x/-/linux-s390x-0.28.1.tgz", + "integrity": "sha512-/lAIjX8aYFRByhh6L5rYtPEDRqa9de/4V/juOXcta5frjvzXO4/sqEtyytse0g3zZFuWu5cDN0MkLz2qRDD2Ag==", "cpu": [ - "riscv64" + "s390x" ], "dev": true, "license": "MIT", "optional": true, "os": [ "linux" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-linux-riscv64-musl": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-riscv64-musl/-/rollup-linux-riscv64-musl-4.59.0.tgz", - "integrity": "sha512-nLckB8WOqHIf1bhymk+oHxvM9D3tyPndZH8i8+35p/1YiVoVswPid2yLzgX7ZJP0KQvnkhM4H6QZ5m0LzbyIAg==", + "node_modules/@esbuild/linux-x64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/linux-x64/-/linux-x64-0.28.1.tgz", + "integrity": "sha512-u/anNYF2mmVOEDwLtnQ1wOr3EZ9sTNGLWrsYGYwHWzGA3Si84IOkHXlbWTD1NB+9/1lcnweYKO54uhxZydNzfA==", "cpu": [ - "riscv64" + "x64" ], "dev": true, "license": "MIT", "optional": true, "os": [ "linux" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-linux-s390x-gnu": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-s390x-gnu/-/rollup-linux-s390x-gnu-4.59.0.tgz", - "integrity": "sha512-oF87Ie3uAIvORFBpwnCvUzdeYUqi2wY6jRFWJAy1qus/udHFYIkplYRW+wo+GRUP4sKzYdmE1Y3+rY5Gc4ZO+w==", + "node_modules/@esbuild/netbsd-arm64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/netbsd-arm64/-/netbsd-arm64-0.28.1.tgz", + "integrity": "sha512-oks0DYbLwWMmaakTsCb+zL4E+aHRVLom9IJZOAthMQEPiQmydXHkziYEsGYRx0uNV/IjEKGAV941JzH02pflqw==", "cpu": [ - "s390x" + "arm64" ], "dev": true, "license": "MIT", "optional": true, "os": [ - "linux" - ] + "netbsd" + ], + "peer": true, + "engines": { + "node": ">=18" + 
} }, - "node_modules/@rollup/rollup-linux-x64-gnu": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-x64-gnu/-/rollup-linux-x64-gnu-4.59.0.tgz", - "integrity": "sha512-3AHmtQq/ppNuUspKAlvA8HtLybkDflkMuLK4DPo77DfthRb71V84/c4MlWJXixZz4uruIH4uaa07IqoAkG64fg==", + "node_modules/@esbuild/netbsd-x64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/netbsd-x64/-/netbsd-x64-0.28.1.tgz", + "integrity": "sha512-aeL6lAnN89Hz43Mlh1G8ARasbuoYvSITDEx0tHh5b7jJnHcssqgjy9Yx430GDpmCa6OyrKoS0aNRjKundRizGg==", "cpu": [ "x64" ], @@ -626,27 +643,35 @@ "license": "MIT", "optional": true, "os": [ - "linux" - ] + "netbsd" + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-linux-x64-musl": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-x64-musl/-/rollup-linux-x64-musl-4.59.0.tgz", - "integrity": "sha512-2UdiwS/9cTAx7qIUZB/fWtToJwvt0Vbo0zmnYt7ED35KPg13Q0ym1g442THLC7VyI6JfYTP4PiSOWyoMdV2/xg==", + "node_modules/@esbuild/openbsd-arm64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/openbsd-arm64/-/openbsd-arm64-0.28.1.tgz", + "integrity": "sha512-MEFJe5C3R8pwXdZ5Y21oo6m7ePiS0d9pWucn99O/wvyJZChoIQKrQDxKrGeW8F5+T0okTHesAmDeiHDTIq0V/Q==", "cpu": [ - "x64" + "arm64" ], "dev": true, "license": "MIT", "optional": true, "os": [ - "linux" - ] + "openbsd" + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-openbsd-x64": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-openbsd-x64/-/rollup-openbsd-x64-4.59.0.tgz", - "integrity": "sha512-M3bLRAVk6GOwFlPTIxVBSYKUaqfLrn8l0psKinkCFxl4lQvOSz8ZrKDz2gxcBwHFpci0B6rttydI4IpS4IS/jQ==", + "node_modules/@esbuild/openbsd-x64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/openbsd-x64/-/openbsd-x64-0.28.1.tgz", + "integrity": 
"sha512-i/ZLIOafE0Z8cI/XANJAixoJL/uRAoS2xOA3rb0xN+KK0K177cMAsQYkzHtBrtMXAKuAc7HGgcWiZ/sRC1Nxgw==", "cpu": [ "x64" ], @@ -655,12 +680,16 @@ "optional": true, "os": [ "openbsd" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-openharmony-arm64": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-openharmony-arm64/-/rollup-openharmony-arm64-4.59.0.tgz", - "integrity": "sha512-tt9KBJqaqp5i5HUZzoafHZX8b5Q2Fe7UjYERADll83O4fGqJ49O1FsL6LpdzVFQcpwvnyd0i+K/VSwu/o/nWlA==", + "node_modules/@esbuild/openharmony-arm64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/openharmony-arm64/-/openharmony-arm64-0.28.1.tgz", + "integrity": "sha512-ge+Z7EXFNt2BO1oAMsVpiQ8EwndV9i1xXerAeTIK7AtPs3bKFXQM7nlRxDSIUIMeueR1CNXxqztLzdNeReKBJg==", "cpu": [ "arm64" ], @@ -669,54 +698,70 @@ "optional": true, "os": [ "openharmony" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-win32-arm64-msvc": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-arm64-msvc/-/rollup-win32-arm64-msvc-4.59.0.tgz", - "integrity": "sha512-V5B6mG7OrGTwnxaNUzZTDTjDS7F75PO1ae6MJYdiMu60sq0CqN5CVeVsbhPxalupvTX8gXVSU9gq+Rx1/hvu6A==", + "node_modules/@esbuild/sunos-x64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/sunos-x64/-/sunos-x64-0.28.1.tgz", + "integrity": "sha512-BEjgtECkL3vY+SaSQ6nzVfiALUeFxpawyp8Jmf5PtYhf1Ug40N1h/hxlhts+f1FvSvarEigdxS3BlSMI2PJLcQ==", "cpu": [ - "arm64" + "x64" ], "dev": true, "license": "MIT", "optional": true, "os": [ - "win32" - ] + "sunos" + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-win32-ia32-msvc": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-ia32-msvc/-/rollup-win32-ia32-msvc-4.59.0.tgz", - "integrity": 
"sha512-UKFMHPuM9R0iBegwzKF4y0C4J9u8C6MEJgFuXTBerMk7EJ92GFVFYBfOZaSGLu6COf7FxpQNqhNS4c4icUPqxA==", + "node_modules/@esbuild/win32-arm64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/win32-arm64/-/win32-arm64-0.28.1.tgz", + "integrity": "sha512-lCv9eK/H6ZJWbE7bh2nw54CZ9M2nupBxJcTsdk/QQnWkdSjKGuxmmH8/GWrlT1eMmZfn4dGcCjRte397WqfQXA==", "cpu": [ - "ia32" + "arm64" ], "dev": true, "license": "MIT", "optional": true, "os": [ "win32" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-win32-x64-gnu": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-x64-gnu/-/rollup-win32-x64-gnu-4.59.0.tgz", - "integrity": "sha512-laBkYlSS1n2L8fSo1thDNGrCTQMmxjYY5G0WFWjFFYZkKPjsMBsgJfGf4TLxXrF6RyhI60L8TMOjBMvXiTcxeA==", + "node_modules/@esbuild/win32-ia32": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/win32-ia32/-/win32-ia32-0.28.1.tgz", + "integrity": "sha512-zvb/mB2bSCoJOpoCBgYKKpX6YM6mJBlBUVUtVj41DlZJVEB6/0CKlRYxP5wWl1C1ILiCoAU5wZZ4q1P3qeS6Eg==", "cpu": [ - "x64" + "ia32" ], "dev": true, "license": "MIT", "optional": true, "os": [ "win32" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@rollup/rollup-win32-x64-msvc": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-x64-msvc/-/rollup-win32-x64-msvc-4.59.0.tgz", - "integrity": "sha512-2HRCml6OztYXyJXAvdDXPKcawukWY2GpR5/nxKp4iBgiO3wcoEGkAaqctIbZcNB6KlUQBIqt8VYkNSj2397EfA==", + "node_modules/@esbuild/win32-x64": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@esbuild/win32-x64/-/win32-x64-0.28.1.tgz", + "integrity": "sha512-bm4Mowrv+GXMlpWX++EcXw/iLyd1o3+bJkC2DkWXYVvgZCqD/bSj9ctZeAMC3cIxgjRVR2Dufaiu4YPxr5gW1A==", "cpu": [ "x64" ], @@ -725,197 +770,747 @@ "optional": true, "os": [ "win32" - ] + ], + "peer": true, + "engines": { + "node": ">=18" + } }, - "node_modules/@types/babel__core": { - "version": 
"7.20.5", - "resolved": "https://registry.npmjs.org/@types/babel__core/-/babel__core-7.20.5.tgz", - "integrity": "sha512-qoQprZvz5wQFJwMDqeseRXWv3rqMvhgpbXFfVyWhbx9X47POIA6i/+dXefEmZKoAgOaTdaIgNSMqMIU61yRyzA==", + "node_modules/@exodus/bytes": { + "version": "1.15.1", + "resolved": "https://registry.npmjs.org/@exodus/bytes/-/bytes-1.15.1.tgz", + "integrity": "sha512-S6mL0yNB/Abt9Ei4tq8gDhcczc4S3+vQ4ra7vxnAf+YHC02srtqxKKZghx2Dq6p0e66THKwR6r8N6P95wEty7Q==", "dev": true, "license": "MIT", - "dependencies": { - "@babel/parser": "^7.20.7", - "@babel/types": "^7.20.7", - "@types/babel__generator": "*", - "@types/babel__template": "*", - "@types/babel__traverse": "*" + "engines": { + "node": "^20.19.0 || ^22.12.0 || >=24.0.0" + }, + "peerDependencies": { + "@noble/hashes": "^1.8.0 || ^2.0.0" + }, + "peerDependenciesMeta": { + "@noble/hashes": { + "optional": true + } } }, - "node_modules/@types/babel__generator": { - "version": "7.27.0", - "resolved": "https://registry.npmjs.org/@types/babel__generator/-/babel__generator-7.27.0.tgz", - "integrity": "sha512-ufFd2Xi92OAVPYsy+P4n7/U7e68fex0+Ee8gSG9KX7eo084CWiQ4sdxktvdl0bOPupXtVJPY19zk6EwWqUQ8lg==", + "node_modules/@jridgewell/sourcemap-codec": { + "version": "1.5.5", + "resolved": "https://registry.npmjs.org/@jridgewell/sourcemap-codec/-/sourcemap-codec-1.5.5.tgz", + "integrity": "sha512-cYQ9310grqxueWbl+WuIUIaiUaDcj7WOq5fVhEljNVgRfOUhY9fy2zTvfoqWsnebh8Sl70VScFbICvJnLKB0Og==", "dev": true, - "license": "MIT", - "dependencies": { - "@babel/types": "^7.0.0" - } + "license": "MIT" }, - "node_modules/@types/babel__template": { - "version": "7.4.4", - "resolved": "https://registry.npmjs.org/@types/babel__template/-/babel__template-7.4.4.tgz", - "integrity": "sha512-h/NUaSyG5EyxBIp8YRxo4RMe2/qQgvyowRwVMzhYhBCONbW8PUsg4lkFMrhgZhUe5z3L3MiLDuvyJ/CaPa2A8A==", + "node_modules/@napi-rs/wasm-runtime": { + "version": "1.1.5", + "resolved": "https://registry.npmjs.org/@napi-rs/wasm-runtime/-/wasm-runtime-1.1.5.tgz", + "integrity": 
"sha512-AWPoBRJ9tsnVhor4sjO7rkni+7p+2IAEFj6cx06UgP10jkQHqay/36uRV/bFkgrh18D9vb4cr8Q0Pthskgzy+Q==", "dev": true, "license": "MIT", + "optional": true, "dependencies": { - "@babel/parser": "^7.1.0", - "@babel/types": "^7.0.0" + "@tybys/wasm-util": "^0.10.2" + }, + "funding": { + "type": "github", + "url": "https://github.com/sponsors/Brooooooklyn" + }, + "peerDependencies": { + "@emnapi/core": "^1.7.1", + "@emnapi/runtime": "^1.7.1" } }, - "node_modules/@types/babel__traverse": { - "version": "7.28.0", - "resolved": "https://registry.npmjs.org/@types/babel__traverse/-/babel__traverse-7.28.0.tgz", - "integrity": "sha512-8PvcXf70gTDZBgt9ptxJ8elBeBjcLOAcOtoO/mPJjtji1+CdGbHgm77om1GrsPxsiE+uXIpNSK64UYaIwQXd4Q==", + "node_modules/@oxc-project/types": { + "version": "0.133.0", + "resolved": "https://registry.npmjs.org/@oxc-project/types/-/types-0.133.0.tgz", + "integrity": "sha512-KzkdCd6Uxqnf6l3HOw1xfatAlUURA0g14cvBYFyJ5SaNOQbOUvBr9PKArcPcrNIeRsBdgcUzOGrhKveVpvOIGA==", "dev": true, "license": "MIT", - "dependencies": { - "@babel/types": "^7.28.2" + "funding": { + "url": "https://github.com/sponsors/Boshen" } }, - "node_modules/@types/estree": { - "version": "1.0.8", - "resolved": "https://registry.npmjs.org/@types/estree/-/estree-1.0.8.tgz", - "integrity": "sha512-dWHzHa2WqEXI/O1E9OjrocMTKJl2mSrEolh1Iomrv6U+JuNwaHXsXx9bLu5gG7BUWFIN0skIQJQ/L1rIex4X6w==", - "dev": true, - "license": "MIT" - }, - "node_modules/@types/prop-types": { - "version": "15.7.15", - "resolved": "https://registry.npmjs.org/@types/prop-types/-/prop-types-15.7.15.tgz", - "integrity": "sha512-F6bEyamV9jKGAFBEmlQnesRPGOQqS2+Uwi0Em15xenOxHaf2hv6L8YCVn3rPdPJOiJfPiCnLIRyvwVaqMY3MIw==", + "node_modules/@rolldown/binding-android-arm64": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/@rolldown/binding-android-arm64/-/binding-android-arm64-1.0.3.tgz", + "integrity": "sha512-454rs7jHngixp/NMxd5srYD57OnzSlZ/eFTETjORQHLwJG1lRtmNOJcBerZlfu4GjKqeq8aCCIQrMdHyhI51Hw==", + "cpu": [ + "arm64" + ], 
"dev": true, - "license": "MIT" + "license": "MIT", + "optional": true, + "os": [ + "android" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } }, - "node_modules/@types/react": { - "version": "18.3.27", - "resolved": "https://registry.npmjs.org/@types/react/-/react-18.3.27.tgz", - "integrity": "sha512-cisd7gxkzjBKU2GgdYrTdtQx1SORymWyaAFhaxQPK9bYO9ot3Y5OikQRvY0VYQtvwjeQnizCINJAenh/V7MK2w==", + "node_modules/@rolldown/binding-darwin-arm64": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/@rolldown/binding-darwin-arm64/-/binding-darwin-arm64-1.0.3.tgz", + "integrity": "sha512-PcAhP+ynjURNyy8SKGl5DQP94aGuB/7JrXJb/t7P+hanXvQVMWzUvRRhBAcg/lNRadBhoUPqSoP4xw5tR/KBEA==", + "cpu": [ + "arm64" + ], "dev": true, "license": "MIT", - "dependencies": { - "@types/prop-types": "*", - "csstype": "^3.2.2" + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" } }, - "node_modules/@types/react-dom": { - "version": "18.3.7", - "resolved": "https://registry.npmjs.org/@types/react-dom/-/react-dom-18.3.7.tgz", - "integrity": "sha512-MEe3UeoENYVFXzoXEWsvcpg6ZvlrFNlOQ7EOsvhI3CfAXwzPfO8Qwuxd40nepsYKqyyVQnTdEfv68q91yLcKrQ==", + "node_modules/@rolldown/binding-darwin-x64": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/@rolldown/binding-darwin-x64/-/binding-darwin-x64-1.0.3.tgz", + "integrity": "sha512-9YpfeUvSE2RS7wysJ81uOZkXJz7f7Q55H2Gvp3VEw/EsahqDtrphrZ0EwDLK5vvKOzaCrBsjF8JmnMLcUt78Gg==", + "cpu": [ + "x64" + ], "dev": true, "license": "MIT", - "peerDependencies": { - "@types/react": "^18.0.0" + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" } }, - "node_modules/@types/trusted-types": { - "version": "2.0.7", - "resolved": "https://registry.npmjs.org/@types/trusted-types/-/trusted-types-2.0.7.tgz", - "integrity": "sha512-ScaPdn1dQczgbl0QFTeTOmVHFULt394XJgOQNoyVhZ6r2vLnMLJfBPd53SB52T/3G36VI1/g2MZaX0cwDuXsfw==", + "node_modules/@rolldown/binding-freebsd-x64": { + 
"version": "1.0.3", + "resolved": "https://registry.npmjs.org/@rolldown/binding-freebsd-x64/-/binding-freebsd-x64-1.0.3.tgz", + "integrity": "sha512-yB1IlAsSNHncV6SCTL27/MVGR5htvQsoGxIv5KMGXALp+Ll1wYsn+x98M9MW7qa+NdSbvrrY7ANI4wLJ0n1e6g==", + "cpu": [ + "x64" + ], + "dev": true, "license": "MIT", - "optional": true + "optional": true, + "os": [ + "freebsd" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } }, - "node_modules/@vitejs/plugin-react": { - "version": "4.7.0", - "resolved": "https://registry.npmjs.org/@vitejs/plugin-react/-/plugin-react-4.7.0.tgz", - "integrity": "sha512-gUu9hwfWvvEDBBmgtAowQCojwZmJ5mcLn3aufeCsitijs3+f2NsrPtlAWIR6OPiqljl96GVCUbLe0HyqIpVaoA==", + "node_modules/@rolldown/binding-linux-arm-gnueabihf": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/@rolldown/binding-linux-arm-gnueabihf/-/binding-linux-arm-gnueabihf-1.0.3.tgz", + "integrity": "sha512-Yi30IVAAfLUCy2MseFjbB1jAMDl1VMCAas5StnYp8da9+CKvMd2H2cbEjWcw5NPaPqzvYkVIaF1nNUG+b7u/sw==", + "cpu": [ + "arm" + ], "dev": true, "license": "MIT", - "dependencies": { - "@babel/core": "^7.28.0", - "@babel/plugin-transform-react-jsx-self": "^7.27.1", - "@babel/plugin-transform-react-jsx-source": "^7.27.1", - "@rolldown/pluginutils": "1.0.0-beta.27", - "@types/babel__core": "^7.20.5", - "react-refresh": "^0.17.0" - }, + "optional": true, + "os": [ + "linux" + ], "engines": { - "node": "^14.18.0 || >=16.0.0" - }, - "peerDependencies": { - "vite": "^4.2.0 || ^5.0.0 || ^6.0.0 || ^7.0.0" + "node": "^20.19.0 || >=22.12.0" } }, - "node_modules/baseline-browser-mapping": { - "version": "2.9.11", - "resolved": "https://registry.npmjs.org/baseline-browser-mapping/-/baseline-browser-mapping-2.9.11.tgz", - "integrity": "sha512-Sg0xJUNDU1sJNGdfGWhVHX0kkZ+HWcvmVymJbj6NSgZZmW/8S9Y2HQ5euytnIgakgxN6papOAWiwDo1ctFDcoQ==", + "node_modules/@rolldown/binding-linux-arm64-gnu": { + "version": "1.0.3", + "resolved": 
"https://registry.npmjs.org/@rolldown/binding-linux-arm64-gnu/-/binding-linux-arm64-gnu-1.0.3.tgz", + "integrity": "sha512-jsO7R8To+AdlYgUmN5sHSCZbfhtMBkO0WUx8iORQnPcMMdgr7qM2DQmMwgabs3GhNztdmoKkMKQFHD6DTMCIQw==", + "cpu": [ + "arm64" + ], "dev": true, - "license": "Apache-2.0", - "bin": { - "baseline-browser-mapping": "dist/cli.js" + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" } }, - "node_modules/browserslist": { - "version": "4.28.1", - "resolved": "https://registry.npmjs.org/browserslist/-/browserslist-4.28.1.tgz", - "integrity": "sha512-ZC5Bd0LgJXgwGqUknZY/vkUQ04r8NXnJZ3yYi4vDmSiZmC/pdSN0NbNRPxZpbtO4uAfDUAFffO8IZoM3Gj8IkA==", - "dev": true, - "funding": [ - { - "type": "opencollective", - "url": "https://opencollective.com/browserslist" - }, - { - "type": "tidelift", - "url": "https://tidelift.com/funding/github/npm/browserslist" - }, - { - "type": "github", - "url": "https://github.com/sponsors/ai" - } + "node_modules/@rolldown/binding-linux-arm64-musl": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/@rolldown/binding-linux-arm64-musl/-/binding-linux-arm64-musl-1.0.3.tgz", + "integrity": "sha512-VWkUHwWriDciit80wleYwKILoR/KMvxh/IdwS/paX+ZgpuRpCrKLUdadJbc0NpBEiyhpYawsJ73j9aCvOH+f7Q==", + "cpu": [ + "arm64" ], + "dev": true, "license": "MIT", - "dependencies": { - "baseline-browser-mapping": "^2.9.0", - "caniuse-lite": "^1.0.30001759", - "electron-to-chromium": "^1.5.263", - "node-releases": "^2.0.27", - "update-browserslist-db": "^1.2.0" - }, - "bin": { - "browserslist": "cli.js" - }, + "optional": true, + "os": [ + "linux" + ], "engines": { - "node": "^6 || ^7 || ^8 || ^9 || ^10 || ^11 || ^12 || >=13.7" + "node": "^20.19.0 || >=22.12.0" } }, - "node_modules/caniuse-lite": { - "version": "1.0.30001762", - "resolved": "https://registry.npmjs.org/caniuse-lite/-/caniuse-lite-1.0.30001762.tgz", - "integrity": 
"sha512-PxZwGNvH7Ak8WX5iXzoK1KPZttBXNPuaOvI2ZYU7NrlM+d9Ov+TUvlLOBNGzVXAntMSMMlJPd+jY6ovrVjSmUw==", + "node_modules/@rolldown/binding-linux-ppc64-gnu": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/@rolldown/binding-linux-ppc64-gnu/-/binding-linux-ppc64-gnu-1.0.3.tgz", + "integrity": "sha512-5f1laC0SlIR0yDbFCd8acUhvJIag6N3zC5P7oUPN6wX0aOma+uKJ0wBDH5aq7I1PVI2ttTlhJwzwRIBnLiSGEg==", + "cpu": [ + "ppc64" + ], "dev": true, - "funding": [ - { - "type": "opencollective", - "url": "https://opencollective.com/browserslist" - }, - { - "type": "tidelift", - "url": "https://tidelift.com/funding/github/npm/caniuse-lite" - }, - { - "type": "github", - "url": "https://github.com/sponsors/ai" - } + "license": "MIT", + "optional": true, + "os": [ + "linux" ], - "license": "CC-BY-4.0" + "engines": { + "node": "^20.19.0 || >=22.12.0" + } }, - "node_modules/commander": { - "version": "8.3.0", - "resolved": "https://registry.npmjs.org/commander/-/commander-8.3.0.tgz", - "integrity": "sha512-OkTL9umf+He2DZkUq8f8J9of7yL6RJKI24dVITBmNfZBmri9zYZQrKkuXiKhyfPSu8tUhnVBB1iKXevvnlR4Ww==", + "node_modules/@rolldown/binding-linux-s390x-gnu": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/@rolldown/binding-linux-s390x-gnu/-/binding-linux-s390x-gnu-1.0.3.tgz", + "integrity": "sha512-Iq4ko0r4XsgbrF/LunNgHtAGLRRVE2kXonAXQ/MV0mC6jQpMOhW1SvtZja2EhC/kd05++bP78dsqBeIQyYJ6Yg==", + "cpu": [ + "s390x" + ], + "dev": true, "license": "MIT", + "optional": true, + "os": [ + "linux" + ], "engines": { - "node": ">= 12" + "node": "^20.19.0 || >=22.12.0" } }, - "node_modules/convert-source-map": { - "version": "2.0.0", - "resolved": "https://registry.npmjs.org/convert-source-map/-/convert-source-map-2.0.0.tgz", + "node_modules/@rolldown/binding-linux-x64-gnu": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/@rolldown/binding-linux-x64-gnu/-/binding-linux-x64-gnu-1.0.3.tgz", + "integrity": 
"sha512-B8m6tD5+/N5FeNQFbKlLA/2yVq9ycQP1SeedyEYYKWBNR3ZQbkvIUcNnDNM03lO1l5F2roiiFJGgvoLLyZXtSg==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-linux-x64-musl": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/@rolldown/binding-linux-x64-musl/-/binding-linux-x64-musl-1.0.3.tgz", + "integrity": "sha512-pSdpdUJHkuCxun9LE7jvgUB9qsRgaiyNNCX7m/AvHTcq67AiT/Yhoxvw5zPfhrM8k/BfP8ce/hMOpthKDpEUow==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-openharmony-arm64": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/@rolldown/binding-openharmony-arm64/-/binding-openharmony-arm64-1.0.3.tgz", + "integrity": "sha512-OXXS3RKJgX2uLwM+gYyuH5omcH8fL1LJs96pZGgtetVCahON57+d4SJHzTgZiOjxgGkSnpXpOsWuPDGAKAigEg==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "openharmony" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-wasm32-wasi": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/@rolldown/binding-wasm32-wasi/-/binding-wasm32-wasi-1.0.3.tgz", + "integrity": "sha512-JTtb8BWFynicNSoPrehsCzBtOKjZ6jhMiPFEmOiuXg1Fl8dn2KHQob+GuPSGR0dryQa1PQJbzjF3dqO/whhjLg==", + "cpu": [ + "wasm32" + ], + "dev": true, + "license": "MIT", + "optional": true, + "dependencies": { + "@emnapi/core": "1.10.0", + "@emnapi/runtime": "1.10.0", + "@napi-rs/wasm-runtime": "^1.1.4" + }, + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-win32-arm64-msvc": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/@rolldown/binding-win32-arm64-msvc/-/binding-win32-arm64-msvc-1.0.3.tgz", + "integrity": 
"sha512-gEdFFEN70A/jxb2svrWsN3aDL7OUtmvlOy+6fa2jxG8K0wQ1ZbdeLGnidov6Yu5/733dI5ySfzFlQ/cb0bSz1g==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-win32-x64-msvc": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/@rolldown/binding-win32-x64-msvc/-/binding-win32-x64-msvc-1.0.3.tgz", + "integrity": "sha512-eXB7CHuaQdqmJcc3koCNtNPmT/bj2gc999kUFgBxG8Ac0NdgXc4rkCHhqrgrhN3zddvvvrgzj1e90SuSfmyIXA==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/pluginutils": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/@rolldown/pluginutils/-/pluginutils-1.0.1.tgz", + "integrity": "sha512-2j9bGt5Jh8hj+vPtgzPtl72j0yRxHAyumoo6TNfAjsLB04UtpSvPbPcDcBMxz7n+9CYB0c1GxQFxYRg2jimqGw==", + "dev": true, + "license": "MIT" + }, + "node_modules/@standard-schema/spec": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/@standard-schema/spec/-/spec-1.1.0.tgz", + "integrity": "sha512-l2aFy5jALhniG5HgqrD6jXLi/rUWrKvqN/qJx6yoJsgKhblVd+iqqU4RCXavm/jPityDo5TCvKMnpjKnOriy0w==", + "dev": true, + "license": "MIT" + }, + "node_modules/@testing-library/dom": { + "version": "10.4.1", + "resolved": "https://registry.npmjs.org/@testing-library/dom/-/dom-10.4.1.tgz", + "integrity": "sha512-o4PXJQidqJl82ckFaXUeoAW+XysPLauYI43Abki5hABd853iMhitooc6znOnczgbTYmEP6U6/y1ZyKAIsvMKGg==", + "dev": true, + "license": "MIT", + "peer": true, + "dependencies": { + "@babel/code-frame": "^7.10.4", + "@babel/runtime": "^7.12.5", + "@types/aria-query": "^5.0.1", + "aria-query": "5.3.0", + "dom-accessibility-api": "^0.5.9", + "lz-string": "^1.5.0", + "picocolors": "1.1.1", + "pretty-format": "^27.0.2" + }, + "engines": { + "node": ">=18" + } + }, + "node_modules/@testing-library/jest-dom": { + 
"version": "6.9.1", + "resolved": "https://registry.npmjs.org/@testing-library/jest-dom/-/jest-dom-6.9.1.tgz", + "integrity": "sha512-zIcONa+hVtVSSep9UT3jZ5rizo2BsxgyDYU7WFD5eICBE7no3881HGeb/QkGfsJs6JTkY1aQhT7rIPC7e+0nnA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@adobe/css-tools": "^4.4.0", + "aria-query": "^5.0.0", + "css.escape": "^1.5.1", + "dom-accessibility-api": "^0.6.3", + "picocolors": "^1.1.1", + "redent": "^3.0.0" + }, + "engines": { + "node": ">=14", + "npm": ">=6", + "yarn": ">=1" + } + }, + "node_modules/@testing-library/jest-dom/node_modules/dom-accessibility-api": { + "version": "0.6.3", + "resolved": "https://registry.npmjs.org/dom-accessibility-api/-/dom-accessibility-api-0.6.3.tgz", + "integrity": "sha512-7ZgogeTnjuHbo+ct10G9Ffp0mif17idi0IyWNVA/wcwcm7NPOD/WEHVP3n7n3MhXqxoIYm8d6MuZohYWIZ4T3w==", + "dev": true, + "license": "MIT" + }, + "node_modules/@testing-library/react": { + "version": "16.3.2", + "resolved": "https://registry.npmjs.org/@testing-library/react/-/react-16.3.2.tgz", + "integrity": "sha512-XU5/SytQM+ykqMnAnvB2umaJNIOsLF3PVv//1Ew4CTcpz0/BRyy/af40qqrt7SjKpDdT1saBMc42CUok5gaw+g==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/runtime": "^7.12.5" + }, + "engines": { + "node": ">=18" + }, + "peerDependencies": { + "@testing-library/dom": "^10.0.0", + "@types/react": "^18.0.0 || ^19.0.0", + "@types/react-dom": "^18.0.0 || ^19.0.0", + "react": "^18.0.0 || ^19.0.0", + "react-dom": "^18.0.0 || ^19.0.0" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@testing-library/user-event": { + "version": "14.6.1", + "resolved": "https://registry.npmjs.org/@testing-library/user-event/-/user-event-14.6.1.tgz", + "integrity": "sha512-vq7fv0rnt+QTXgPxr5Hjc210p6YKq2kmdziLgnsZGgLJ9e6VAShx1pACLuRjd/AS/sr7phAR58OIIpf0LlmQNw==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=12", + "npm": ">=6" + }, + 
"peerDependencies": { + "@testing-library/dom": ">=7.21.4" + } + }, + "node_modules/@tybys/wasm-util": { + "version": "0.10.2", + "resolved": "https://registry.npmjs.org/@tybys/wasm-util/-/wasm-util-0.10.2.tgz", + "integrity": "sha512-RoBvJ2X0wuKlWFIjrwffGw1IqZHKQqzIchKaadZZfnNpsAYp2mM0h36JtPCjNDAHGgYez/15uMBpfGwchhiMgg==", + "dev": true, + "license": "MIT", + "optional": true, + "dependencies": { + "tslib": "^2.4.0" + } + }, + "node_modules/@types/aria-query": { + "version": "5.0.4", + "resolved": "https://registry.npmjs.org/@types/aria-query/-/aria-query-5.0.4.tgz", + "integrity": "sha512-rfT93uj5s0PRL7EzccGMs3brplhcrghnDoV26NqKhCAS1hVo+WdNsPvE/yb6ilfr5hi2MEk6d5EWJTKdxg8jVw==", + "dev": true, + "license": "MIT", + "peer": true + }, + "node_modules/@types/chai": { + "version": "5.2.3", + "resolved": "https://registry.npmjs.org/@types/chai/-/chai-5.2.3.tgz", + "integrity": "sha512-Mw558oeA9fFbv65/y4mHtXDs9bPnFMZAL/jxdPFUpOHHIXX91mcgEHbS5Lahr+pwZFR8A7GQleRWeI6cGFC2UA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@types/deep-eql": "*", + "assertion-error": "^2.0.1" + } + }, + "node_modules/@types/deep-eql": { + "version": "4.0.2", + "resolved": "https://registry.npmjs.org/@types/deep-eql/-/deep-eql-4.0.2.tgz", + "integrity": "sha512-c9h9dVVMigMPc4bwTvC5dxqtqJZwQPePsWjPlpSOnojbor6pGqdk541lfA7AqFQr5pB1BRdq0juY9db81BwyFw==", + "dev": true, + "license": "MIT" + }, + "node_modules/@types/estree": { + "version": "1.0.8", + "resolved": "https://registry.npmjs.org/@types/estree/-/estree-1.0.8.tgz", + "integrity": "sha512-dWHzHa2WqEXI/O1E9OjrocMTKJl2mSrEolh1Iomrv6U+JuNwaHXsXx9bLu5gG7BUWFIN0skIQJQ/L1rIex4X6w==", + "dev": true, + "license": "MIT" + }, + "node_modules/@types/prop-types": { + "version": "15.7.15", + "resolved": "https://registry.npmjs.org/@types/prop-types/-/prop-types-15.7.15.tgz", + "integrity": "sha512-F6bEyamV9jKGAFBEmlQnesRPGOQqS2+Uwi0Em15xenOxHaf2hv6L8YCVn3rPdPJOiJfPiCnLIRyvwVaqMY3MIw==", + "dev": true, + "license": "MIT" + }, + 
"node_modules/@types/react": { + "version": "18.3.27", + "resolved": "https://registry.npmjs.org/@types/react/-/react-18.3.27.tgz", + "integrity": "sha512-cisd7gxkzjBKU2GgdYrTdtQx1SORymWyaAFhaxQPK9bYO9ot3Y5OikQRvY0VYQtvwjeQnizCINJAenh/V7MK2w==", + "dev": true, + "license": "MIT", + "dependencies": { + "@types/prop-types": "*", + "csstype": "^3.2.2" + } + }, + "node_modules/@types/react-dom": { + "version": "18.3.7", + "resolved": "https://registry.npmjs.org/@types/react-dom/-/react-dom-18.3.7.tgz", + "integrity": "sha512-MEe3UeoENYVFXzoXEWsvcpg6ZvlrFNlOQ7EOsvhI3CfAXwzPfO8Qwuxd40nepsYKqyyVQnTdEfv68q91yLcKrQ==", + "dev": true, + "license": "MIT", + "peerDependencies": { + "@types/react": "^18.0.0" + } + }, + "node_modules/@types/trusted-types": { + "version": "2.0.7", + "resolved": "https://registry.npmjs.org/@types/trusted-types/-/trusted-types-2.0.7.tgz", + "integrity": "sha512-ScaPdn1dQczgbl0QFTeTOmVHFULt394XJgOQNoyVhZ6r2vLnMLJfBPd53SB52T/3G36VI1/g2MZaX0cwDuXsfw==", + "license": "MIT", + "optional": true + }, + "node_modules/@vitejs/plugin-react": { + "version": "6.0.2", + "resolved": "https://registry.npmjs.org/@vitejs/plugin-react/-/plugin-react-6.0.2.tgz", + "integrity": "sha512-DlSMqo4WhThw4vB8Mpn0Woe9J+Jfq1geJ61AKW0QEgLzGMNwtIMdxbDUzLxcun8W7NbJO0e2Jg/Nxm3cCSVzzg==", + "dev": true, + "license": "MIT", + "dependencies": { + "@rolldown/pluginutils": "^1.0.0" + }, + "engines": { + "node": "^20.19.0 || >=22.12.0" + }, + "peerDependencies": { + "@rolldown/plugin-babel": "^0.1.7 || ^0.2.0", + "babel-plugin-react-compiler": "^1.0.0", + "vite": "^8.0.0" + }, + "peerDependenciesMeta": { + "@rolldown/plugin-babel": { + "optional": true + }, + "babel-plugin-react-compiler": { + "optional": true + } + } + }, + "node_modules/@vitest/expect": { + "version": "4.1.8", + "resolved": "https://registry.npmjs.org/@vitest/expect/-/expect-4.1.8.tgz", + "integrity": "sha512-h3nDO677RDLEGlBxyQ5CW8RlMThSKSRLUePLOx09gNIWRL40edgA1GCZSZgf1W55MFAG6/Sw14KeaAnqv0NKdQ==", + "dev": true, + 
"license": "MIT", + "dependencies": { + "@standard-schema/spec": "^1.1.0", + "@types/chai": "^5.2.2", + "@vitest/spy": "4.1.8", + "@vitest/utils": "4.1.8", + "chai": "^6.2.2", + "tinyrainbow": "^3.1.0" + }, + "funding": { + "url": "https://opencollective.com/vitest" + } + }, + "node_modules/@vitest/mocker": { + "version": "4.1.8", + "resolved": "https://registry.npmjs.org/@vitest/mocker/-/mocker-4.1.8.tgz", + "integrity": "sha512-LEiN/xe4OSIbKe9HQIp5OC24agGD9J5CnmMgsLohVVoOPWL9a2sBoR6VBx43jQZb7Kr1l4RCuyCJzcAa0+dojw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@vitest/spy": "4.1.8", + "estree-walker": "^3.0.3", + "magic-string": "^0.30.21" + }, + "funding": { + "url": "https://opencollective.com/vitest" + }, + "peerDependencies": { + "msw": "^2.4.9", + "vite": "^6.0.0 || ^7.0.0 || ^8.0.0" + }, + "peerDependenciesMeta": { + "msw": { + "optional": true + }, + "vite": { + "optional": true + } + } + }, + "node_modules/@vitest/pretty-format": { + "version": "4.1.8", + "resolved": "https://registry.npmjs.org/@vitest/pretty-format/-/pretty-format-4.1.8.tgz", + "integrity": "sha512-9GasEBxpZ1VYIpqHf/0+YGg121uSNwCKOJqIrTwWP/TB7DmFCiaBpNl3aPZzoLWfWkuqhbH8vJIVobZkvdo2cA==", + "dev": true, + "license": "MIT", + "dependencies": { + "tinyrainbow": "^3.1.0" + }, + "funding": { + "url": "https://opencollective.com/vitest" + } + }, + "node_modules/@vitest/runner": { + "version": "4.1.8", + "resolved": "https://registry.npmjs.org/@vitest/runner/-/runner-4.1.8.tgz", + "integrity": "sha512-EmVxeBAfMJvycdjd6Hm+RbFBbA9fKvo0Kx37hNpBYoYeavH3RNsBXWDooR1mgD52dCrxIIuP7UotpfiwOikvcg==", + "dev": true, + "license": "MIT", + "dependencies": { + "@vitest/utils": "4.1.8", + "pathe": "^2.0.3" + }, + "funding": { + "url": "https://opencollective.com/vitest" + } + }, + "node_modules/@vitest/snapshot": { + "version": "4.1.8", + "resolved": "https://registry.npmjs.org/@vitest/snapshot/-/snapshot-4.1.8.tgz", + "integrity": 
"sha512-acfZboRmAIf05DEKcBQy33VXojFJjtUdLyo7oOmV9kebb2xdU01UknNiPuPZoJZQyO7DF0gZdTGTpeAzET9QPQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "@vitest/pretty-format": "4.1.8", + "@vitest/utils": "4.1.8", + "magic-string": "^0.30.21", + "pathe": "^2.0.3" + }, + "funding": { + "url": "https://opencollective.com/vitest" + } + }, + "node_modules/@vitest/spy": { + "version": "4.1.8", + "resolved": "https://registry.npmjs.org/@vitest/spy/-/spy-4.1.8.tgz", + "integrity": "sha512-6EevtBp6OZOPF7bmz36HrGMeP3txgVSrgebWxHOafDXGkhIzfXK14f8KF6MuFfgXXUeHxmpD3BQxkV00/3s5mA==", + "dev": true, + "license": "MIT", + "funding": { + "url": "https://opencollective.com/vitest" + } + }, + "node_modules/@vitest/utils": { + "version": "4.1.8", + "resolved": "https://registry.npmjs.org/@vitest/utils/-/utils-4.1.8.tgz", + "integrity": "sha512-uOJamYALNhfJ6iolExyQM40yIQwDqYnkKtQ5VCiSe17E33H0aQ/u+1GlRuz4LZBk6Mm3sg90G9hEbmEt37C1Zg==", + "dev": true, + "license": "MIT", + "dependencies": { + "@vitest/pretty-format": "4.1.8", + "convert-source-map": "^2.0.0", + "tinyrainbow": "^3.1.0" + }, + "funding": { + "url": "https://opencollective.com/vitest" + } + }, + "node_modules/ansi-regex": { + "version": "5.0.1", + "resolved": "https://registry.npmjs.org/ansi-regex/-/ansi-regex-5.0.1.tgz", + "integrity": "sha512-quJQXlTSUGL2LH9SUXo8VwsY4soanhgo6LNSm84E1LBcE8s3O0wpdiRzyR9z/ZZJMlMWv37qOOb9pdJlMUEKFQ==", + "dev": true, + "license": "MIT", + "peer": true, + "engines": { + "node": ">=8" + } + }, + "node_modules/ansi-styles": { + "version": "5.2.0", + "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-5.2.0.tgz", + "integrity": "sha512-Cxwpt2SfTzTtXcfOlzGEee8O+c+MmUgGrNiBcXnuWxuFJHe6a5Hz7qwhwe5OgaSYI0IJvkLqWX1ASG+cJOkEiA==", + "dev": true, + "license": "MIT", + "peer": true, + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/chalk/ansi-styles?sponsor=1" + } + }, + "node_modules/aria-query": { + "version": "5.3.0", + "resolved": 
"https://registry.npmjs.org/aria-query/-/aria-query-5.3.0.tgz", + "integrity": "sha512-b0P0sZPKtyu8HkeRAfCq0IfURZK+SuwMjY1UXGBU27wpAiTwQAIlq56IbIO+ytk/JjS1fMR14ee5WBBfKi5J6A==", + "dev": true, + "license": "Apache-2.0", + "dependencies": { + "dequal": "^2.0.3" + } + }, + "node_modules/assertion-error": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/assertion-error/-/assertion-error-2.0.1.tgz", + "integrity": "sha512-Izi8RQcffqCeNVgFigKli1ssklIbpHnCYc6AknXGYoB6grJqyeby7jv12JUQgmTAnIDnbck1uxksT4dzN3PWBA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=12" + } + }, + "node_modules/bidi-js": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/bidi-js/-/bidi-js-1.0.3.tgz", + "integrity": "sha512-RKshQI1R3YQ+n9YJz2QQ147P66ELpa1FQEg20Dk8oW9t2KgLbpDLLp9aGZ7y8WHSshDknG0bknqGw5/tyCs5tw==", + "dev": true, + "license": "MIT", + "dependencies": { + "require-from-string": "^2.0.2" + } + }, + "node_modules/chai": { + "version": "6.2.2", + "resolved": "https://registry.npmjs.org/chai/-/chai-6.2.2.tgz", + "integrity": "sha512-NUPRluOfOiTKBKvWPtSD4PhFvWCqOi0BGStNWs57X9js7XGTprSmFoz5F0tWhR4WPjNeR9jXqdC7/UpSJTnlRg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=18" + } + }, + "node_modules/commander": { + "version": "8.3.0", + "resolved": "https://registry.npmjs.org/commander/-/commander-8.3.0.tgz", + "integrity": "sha512-OkTL9umf+He2DZkUq8f8J9of7yL6RJKI24dVITBmNfZBmri9zYZQrKkuXiKhyfPSu8tUhnVBB1iKXevvnlR4Ww==", + "license": "MIT", + "engines": { + "node": ">= 12" + } + }, + "node_modules/convert-source-map": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/convert-source-map/-/convert-source-map-2.0.0.tgz", "integrity": "sha512-Kvp459HrV2FEJ1CAsi1Ku+MY3kasH19TFykTz2xWmMeq6bk2NU3XXvfJ+Q61m0xktWwt+1HSYf3JZsTms3aRJg==", "dev": true, "license": "MIT" }, + "node_modules/css-tree": { + "version": "3.2.1", + "resolved": "https://registry.npmjs.org/css-tree/-/css-tree-3.2.1.tgz", + "integrity": 
"sha512-X7sjQzceUhu1u7Y/ylrRZFU2FS6LRiFVp6rKLPg23y3x3c3DOKAwuXGDp+PAGjh6CSnCjYeAul8pcT8bAl+lSA==", + "dev": true, + "license": "MIT", + "dependencies": { + "mdn-data": "2.27.1", + "source-map-js": "^1.2.1" + }, + "engines": { + "node": "^10 || ^12.20.0 || ^14.13.0 || >=15.0.0" + } + }, + "node_modules/css.escape": { + "version": "1.5.1", + "resolved": "https://registry.npmjs.org/css.escape/-/css.escape-1.5.1.tgz", + "integrity": "sha512-YUifsXXuknHlUsmlgyY0PKzgPOr7/FjCePfHNt0jxm83wHZi44VDMQ7/fGNkjY3/jV1MC+1CmZbaHzugyeRtpg==", + "dev": true, + "license": "MIT" + }, "node_modules/csstype": { "version": "3.2.3", "resolved": "https://registry.npmjs.org/csstype/-/csstype-3.2.3.tgz", @@ -923,47 +1518,93 @@ "dev": true, "license": "MIT" }, - "node_modules/debug": { - "version": "4.4.3", - "resolved": "https://registry.npmjs.org/debug/-/debug-4.4.3.tgz", - "integrity": "sha512-RGwwWnwQvkVfavKVt22FGLw+xYSdzARwm0ru6DhTVA3umU5hZc28V3kO4stgYryrTlLpuvgI9GiijltAjNbcqA==", + "node_modules/data-urls": { + "version": "7.0.0", + "resolved": "https://registry.npmjs.org/data-urls/-/data-urls-7.0.0.tgz", + "integrity": "sha512-23XHcCF+coGYevirZceTVD7NdJOqVn+49IHyxgszm+JIiHLoB2TkmPtsYkNWT1pvRSGkc35L6NHs0yHkN2SumA==", "dev": true, "license": "MIT", "dependencies": { - "ms": "^2.1.3" + "whatwg-mimetype": "^5.0.0", + "whatwg-url": "^16.0.0" }, "engines": { - "node": ">=6.0" - }, - "peerDependenciesMeta": { - "supports-color": { - "optional": true - } + "node": "^20.19.0 || ^22.12.0 || >=24.0.0" + } + }, + "node_modules/decimal.js": { + "version": "10.6.0", + "resolved": "https://registry.npmjs.org/decimal.js/-/decimal.js-10.6.0.tgz", + "integrity": "sha512-YpgQiITW3JXGntzdUmyUR1V812Hn8T1YVXhCu+wO3OpS4eU9l4YdD3qjyiKdV6mvV29zapkMeD390UVEf2lkUg==", + "dev": true, + "license": "MIT" + }, + "node_modules/dequal": { + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/dequal/-/dequal-2.0.3.tgz", + "integrity": 
"sha512-0je+qPKHEMohvfRTCEo3CrPG6cAzAYgmzKyxRiYSSDkS6eGJdyVJm7WaYA5ECaAD9wLB2T4EEeymA5aFVcYXCA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6" + } + }, + "node_modules/detect-libc": { + "version": "2.1.2", + "resolved": "https://registry.npmjs.org/detect-libc/-/detect-libc-2.1.2.tgz", + "integrity": "sha512-Btj2BOOO83o3WyH59e8MgXsxEQVcarkUOpEYrubB0urwnN10yQ364rsiByU11nZlqWYZm05i/of7io4mzihBtQ==", + "dev": true, + "license": "Apache-2.0", + "engines": { + "node": ">=8" } }, + "node_modules/dom-accessibility-api": { + "version": "0.5.16", + "resolved": "https://registry.npmjs.org/dom-accessibility-api/-/dom-accessibility-api-0.5.16.tgz", + "integrity": "sha512-X7BJ2yElsnOJ30pZF4uIIDfBEVgF4XEBxL9Bxhy6dnrm5hkzqmsWHGTiHqRiITNhMyFLyAiWndIJP7Z1NTteDg==", + "dev": true, + "license": "MIT", + "peer": true + }, "node_modules/dompurify": { - "version": "3.4.0", - "resolved": "https://registry.npmjs.org/dompurify/-/dompurify-3.4.0.tgz", - "integrity": "sha512-nolgK9JcaUXMSmW+j1yaSvaEaoXYHwWyGJlkoCTghc97KgGDDSnpoU/PlEnw63Ah+TGKFOyY+X5LnxaWbCSfXg==", + "version": "3.4.11", + "resolved": "https://registry.npmjs.org/dompurify/-/dompurify-3.4.11.tgz", + "integrity": "sha512-zhlUV12GsaRzMsf9q5M254YhA4+VuF0fG+QFqu6aYpoGlKtz+w8//jBcGVYBgQkR5GHjUomejY84AV+/uPbWdw==", "license": "(MPL-2.0 OR Apache-2.0)", "optionalDependencies": { "@types/trusted-types": "^2.0.7" } }, - "node_modules/electron-to-chromium": { - "version": "1.5.267", - "resolved": "https://registry.npmjs.org/electron-to-chromium/-/electron-to-chromium-1.5.267.tgz", - "integrity": "sha512-0Drusm6MVRXSOJpGbaSVgcQsuB4hEkMpHXaVstcPmhu5LIedxs1xNK/nIxmQIU/RPC0+1/o0AVZfBTkTNJOdUw==", + "node_modules/entities": { + "version": "8.0.0", + "resolved": "https://registry.npmjs.org/entities/-/entities-8.0.0.tgz", + "integrity": "sha512-zwfzJecQ/Uej6tusMqwAqU/6KL2XaB2VZ2Jg54Je6ahNBGNH6Ek6g3jjNCF0fG9EWQKGZNddNjU5F1ZQn/sBnA==", "dev": true, - "license": "ISC" + "license": "BSD-2-Clause", + "engines": { + "node": 
">=20.19.0" + }, + "funding": { + "url": "https://github.com/fb55/entities?sponsor=1" + } + }, + "node_modules/es-module-lexer": { + "version": "2.1.0", + "resolved": "https://registry.npmjs.org/es-module-lexer/-/es-module-lexer-2.1.0.tgz", + "integrity": "sha512-n27zTYMjYu1aj4MjCWzSP7G9r75utsaoc8m61weK+W8JMBGGQybd43GstCXZ3WNmSFtGT9wi59qQTW6mhTR5LQ==", + "dev": true, + "license": "MIT" }, "node_modules/esbuild": { - "version": "0.27.2", - "resolved": "https://registry.npmjs.org/esbuild/-/esbuild-0.27.2.tgz", - "integrity": "sha512-HyNQImnsOC7X9PMNaCIeAm4ISCQXs5a5YasTXVliKv4uuBo1dKrG0A+uQS8M5eXjVMnLg3WgXaKvprHlFJQffw==", + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/esbuild/-/esbuild-0.28.1.tgz", + "integrity": "sha512-HrJrvZv5ayxBzPfwphOoNzkzOIIlifzk0KJrGK2c8R4+LKpMtpYLQeUdjnwjWv/LZlkH2laZk+4w78pi99D4Vw==", "dev": true, "hasInstallScript": true, "license": "MIT", + "optional": true, + "peer": true, "bin": { "esbuild": "bin/esbuild" }, @@ -971,42 +1612,52 @@ "node": ">=18" }, "optionalDependencies": { - "@esbuild/aix-ppc64": "0.27.2", - "@esbuild/android-arm": "0.27.2", - "@esbuild/android-arm64": "0.27.2", - "@esbuild/android-x64": "0.27.2", - "@esbuild/darwin-arm64": "0.27.2", - "@esbuild/darwin-x64": "0.27.2", - "@esbuild/freebsd-arm64": "0.27.2", - "@esbuild/freebsd-x64": "0.27.2", - "@esbuild/linux-arm": "0.27.2", - "@esbuild/linux-arm64": "0.27.2", - "@esbuild/linux-ia32": "0.27.2", - "@esbuild/linux-loong64": "0.27.2", - "@esbuild/linux-mips64el": "0.27.2", - "@esbuild/linux-ppc64": "0.27.2", - "@esbuild/linux-riscv64": "0.27.2", - "@esbuild/linux-s390x": "0.27.2", - "@esbuild/linux-x64": "0.27.2", - "@esbuild/netbsd-arm64": "0.27.2", - "@esbuild/netbsd-x64": "0.27.2", - "@esbuild/openbsd-arm64": "0.27.2", - "@esbuild/openbsd-x64": "0.27.2", - "@esbuild/openharmony-arm64": "0.27.2", - "@esbuild/sunos-x64": "0.27.2", - "@esbuild/win32-arm64": "0.27.2", - "@esbuild/win32-ia32": "0.27.2", - "@esbuild/win32-x64": "0.27.2" - } - }, - 
"node_modules/escalade": { - "version": "3.2.0", - "resolved": "https://registry.npmjs.org/escalade/-/escalade-3.2.0.tgz", - "integrity": "sha512-WUj2qlxaQtO4g6Pq5c29GTcWGDyd8itL8zTlipgECz3JesAiiOKotd8JU6otB3PACgG6xkJUyVhboMS+bje/jA==", + "@esbuild/aix-ppc64": "0.28.1", + "@esbuild/android-arm": "0.28.1", + "@esbuild/android-arm64": "0.28.1", + "@esbuild/android-x64": "0.28.1", + "@esbuild/darwin-arm64": "0.28.1", + "@esbuild/darwin-x64": "0.28.1", + "@esbuild/freebsd-arm64": "0.28.1", + "@esbuild/freebsd-x64": "0.28.1", + "@esbuild/linux-arm": "0.28.1", + "@esbuild/linux-arm64": "0.28.1", + "@esbuild/linux-ia32": "0.28.1", + "@esbuild/linux-loong64": "0.28.1", + "@esbuild/linux-mips64el": "0.28.1", + "@esbuild/linux-ppc64": "0.28.1", + "@esbuild/linux-riscv64": "0.28.1", + "@esbuild/linux-s390x": "0.28.1", + "@esbuild/linux-x64": "0.28.1", + "@esbuild/netbsd-arm64": "0.28.1", + "@esbuild/netbsd-x64": "0.28.1", + "@esbuild/openbsd-arm64": "0.28.1", + "@esbuild/openbsd-x64": "0.28.1", + "@esbuild/openharmony-arm64": "0.28.1", + "@esbuild/sunos-x64": "0.28.1", + "@esbuild/win32-arm64": "0.28.1", + "@esbuild/win32-ia32": "0.28.1", + "@esbuild/win32-x64": "0.28.1" + } + }, + "node_modules/estree-walker": { + "version": "3.0.3", + "resolved": "https://registry.npmjs.org/estree-walker/-/estree-walker-3.0.3.tgz", + "integrity": "sha512-7RUKfXgSMMkzt6ZuXmqapOurLGPPfgj6l9uRZ7lRGolvk0y2yocc35LdcxKC5PQZdn2DMqioAQ2NoWcrTKmm6g==", "dev": true, "license": "MIT", + "dependencies": { + "@types/estree": "^1.0.0" + } + }, + "node_modules/expect-type": { + "version": "1.3.0", + "resolved": "https://registry.npmjs.org/expect-type/-/expect-type-1.3.0.tgz", + "integrity": "sha512-knvyeauYhqjOYvQ66MznSMs83wmHrCycNEN6Ao+2AeYEfxUIkuiVxdEa1qlGEPK+We3n0THiDciYSsCcgW/DoA==", + "dev": true, + "license": "Apache-2.0", "engines": { - "node": ">=6" + "node": ">=12.0.0" } }, "node_modules/fdir": { @@ -1042,62 +1693,368 @@ "node": "^8.16.0 || ^10.6.0 || >=11.0.0" } }, - "node_modules/gensync": { - 
"version": "1.0.0-beta.2", - "resolved": "https://registry.npmjs.org/gensync/-/gensync-1.0.0-beta.2.tgz", - "integrity": "sha512-3hN7NaskYvMDLQY55gnW3NQ+mesEAepTqlg+VEbj7zzqEMBVNhzcGYYeqFo/TlYz6eQiFcp1HcsCZO+nGgS8zg==", + "node_modules/html-encoding-sniffer": { + "version": "6.0.0", + "resolved": "https://registry.npmjs.org/html-encoding-sniffer/-/html-encoding-sniffer-6.0.0.tgz", + "integrity": "sha512-CV9TW3Y3f8/wT0BRFc1/KAVQ3TUHiXmaAb6VW9vtiMFf7SLoMd1PdAc4W3KFOFETBJUb90KatHqlsZMWV+R9Gg==", "dev": true, "license": "MIT", + "dependencies": { + "@exodus/bytes": "^1.6.0" + }, "engines": { - "node": ">=6.9.0" + "node": "^20.19.0 || ^22.12.0 || >=24.0.0" } }, + "node_modules/indent-string": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/indent-string/-/indent-string-4.0.0.tgz", + "integrity": "sha512-EdDDZu4A2OyIK7Lr/2zG+w5jmbuk1DVBnEwREQvBzspBJkCEbRa8GxU1lghYcaGJCnRWibjDXlq779X1/y5xwg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=8" + } + }, + "node_modules/is-potential-custom-element-name": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/is-potential-custom-element-name/-/is-potential-custom-element-name-1.0.1.tgz", + "integrity": "sha512-bCYeRA2rVibKZd+s2625gGnGF/t7DSqDs4dP7CrLA1m7jKWz6pps0LpYLJN8Q64HtmPKJ1hrN3nzPNKFEKOUiQ==", + "dev": true, + "license": "MIT" + }, "node_modules/js-tokens": { "version": "4.0.0", "resolved": "https://registry.npmjs.org/js-tokens/-/js-tokens-4.0.0.tgz", "integrity": "sha512-RdJUflcE3cUzKiMqQgsCu06FPu9UdIJO0beYbPhHN4k6apgJtifcoCtT9bcxOpYBtpD2kCM6Sbzg4CausW/PKQ==", "license": "MIT" }, - "node_modules/jsesc": { - "version": "3.1.0", - "resolved": "https://registry.npmjs.org/jsesc/-/jsesc-3.1.0.tgz", - "integrity": "sha512-/sM3dO2FOzXjKQhJuo0Q173wf2KOo8t4I8vHy6lF9poUp7bKT0/NHE8fPX23PwfhnykfqnC2xRxOnVw5XuGIaA==", + "node_modules/jsdom": { + "version": "29.1.1", + "resolved": "https://registry.npmjs.org/jsdom/-/jsdom-29.1.1.tgz", + "integrity": 
"sha512-ECi4Fi2f7BdJtUKTflYRTiaMxIB0O6zfR1fX0GXpUrf6flp8QIYn1UT20YQqdSOfk2dfkCwS8LAFoJDEppNK5Q==", + "dev": true, + "license": "MIT", + "dependencies": { + "@asamuzakjp/css-color": "^5.1.11", + "@asamuzakjp/dom-selector": "^7.1.1", + "@bramus/specificity": "^2.4.2", + "@csstools/css-syntax-patches-for-csstree": "^1.1.3", + "@exodus/bytes": "^1.15.0", + "css-tree": "^3.2.1", + "data-urls": "^7.0.0", + "decimal.js": "^10.6.0", + "html-encoding-sniffer": "^6.0.0", + "is-potential-custom-element-name": "^1.0.1", + "lru-cache": "^11.3.5", + "parse5": "^8.0.1", + "saxes": "^6.0.0", + "symbol-tree": "^3.2.4", + "tough-cookie": "^6.0.1", + "undici": "^7.25.0", + "w3c-xmlserializer": "^5.0.0", + "webidl-conversions": "^8.0.1", + "whatwg-mimetype": "^5.0.0", + "whatwg-url": "^16.0.1", + "xml-name-validator": "^5.0.0" + }, + "engines": { + "node": "^20.19.0 || ^22.13.0 || >=24.0.0" + }, + "peerDependencies": { + "canvas": "^3.0.0" + }, + "peerDependenciesMeta": { + "canvas": { + "optional": true + } + } + }, + "node_modules/jsdom/node_modules/lru-cache": { + "version": "11.5.1", + "resolved": "https://registry.npmjs.org/lru-cache/-/lru-cache-11.5.1.tgz", + "integrity": "sha512-RPimw/7aMdv2oqRrxKwvZXcPfwBrn/JZ2xYcY9Hus/6LaS3VOAKVWKWgNLCFSiOm1ESXinjsDlidVU7JlnCN2A==", "dev": true, + "license": "BlueOak-1.0.0", + "engines": { + "node": "20 || >=22" + } + }, + "node_modules/katex": { + "version": "0.16.27", + "resolved": "https://registry.npmjs.org/katex/-/katex-0.16.27.tgz", + "integrity": "sha512-aeQoDkuRWSqQN6nSvVCEFvfXdqo1OQiCmmW1kc9xSdjutPv7BGO7pqY9sQRJpMOGrEdfDgF2TfRXe5eUAD2Waw==", + "funding": [ + "https://opencollective.com/katex", + "https://github.com/sponsors/katex" + ], "license": "MIT", + "dependencies": { + "commander": "^8.3.0" + }, "bin": { - "jsesc": "bin/jsesc" + "katex": "cli.js" + } + }, + "node_modules/lightningcss": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss/-/lightningcss-1.32.0.tgz", + "integrity": 
"sha512-NXYBzinNrblfraPGyrbPoD19C1h9lfI/1mzgWYvXUTe414Gz/X1FD2XBZSZM7rRTrMA8JL3OtAaGifrIKhQ5yQ==", + "dev": true, + "license": "MPL-2.0", + "dependencies": { + "detect-libc": "^2.0.3" + }, + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + }, + "optionalDependencies": { + "lightningcss-android-arm64": "1.32.0", + "lightningcss-darwin-arm64": "1.32.0", + "lightningcss-darwin-x64": "1.32.0", + "lightningcss-freebsd-x64": "1.32.0", + "lightningcss-linux-arm-gnueabihf": "1.32.0", + "lightningcss-linux-arm64-gnu": "1.32.0", + "lightningcss-linux-arm64-musl": "1.32.0", + "lightningcss-linux-x64-gnu": "1.32.0", + "lightningcss-linux-x64-musl": "1.32.0", + "lightningcss-win32-arm64-msvc": "1.32.0", + "lightningcss-win32-x64-msvc": "1.32.0" + } + }, + "node_modules/lightningcss-android-arm64": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-android-arm64/-/lightningcss-android-arm64-1.32.0.tgz", + "integrity": "sha512-YK7/ClTt4kAK0vo6w3X+Pnm0D2cf2vPHbhOXdoNti1Ga0al1P4TBZhwjATvjNwLEBCnKvjJc2jQgHXH0NEwlAg==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "android" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-darwin-arm64": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-darwin-arm64/-/lightningcss-darwin-arm64-1.32.0.tgz", + "integrity": "sha512-RzeG9Ju5bag2Bv1/lwlVJvBE3q6TtXskdZLLCyfg5pt+HLz9BqlICO7LZM7VHNTTn/5PRhHFBSjk5lc4cmscPQ==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-darwin-x64": { + "version": "1.32.0", + 
"resolved": "https://registry.npmjs.org/lightningcss-darwin-x64/-/lightningcss-darwin-x64-1.32.0.tgz", + "integrity": "sha512-U+QsBp2m/s2wqpUYT/6wnlagdZbtZdndSmut/NJqlCcMLTWp5muCrID+K5UJ6jqD2BFshejCYXniPDbNh73V8w==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-freebsd-x64": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-freebsd-x64/-/lightningcss-freebsd-x64-1.32.0.tgz", + "integrity": "sha512-JCTigedEksZk3tHTTthnMdVfGf61Fky8Ji2E4YjUTEQX14xiy/lTzXnu1vwiZe3bYe0q+SpsSH/CTeDXK6WHig==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "freebsd" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-linux-arm-gnueabihf": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-linux-arm-gnueabihf/-/lightningcss-linux-arm-gnueabihf-1.32.0.tgz", + "integrity": "sha512-x6rnnpRa2GL0zQOkt6rts3YDPzduLpWvwAF6EMhXFVZXD4tPrBkEFqzGowzCsIWsPjqSK+tyNEODUBXeeVHSkw==", + "cpu": [ + "arm" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-linux-arm64-gnu": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-linux-arm64-gnu/-/lightningcss-linux-arm64-gnu-1.32.0.tgz", + "integrity": "sha512-0nnMyoyOLRJXfbMOilaSRcLH3Jw5z9HDNGfT/gwCPgaDjnx0i8w7vBzFLFR1f6CMLKF8gVbebmkUN3fa/kQJpQ==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": 
">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-linux-arm64-musl": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-linux-arm64-musl/-/lightningcss-linux-arm64-musl-1.32.0.tgz", + "integrity": "sha512-UpQkoenr4UJEzgVIYpI80lDFvRmPVg6oqboNHfoH4CQIfNA+HOrZ7Mo7KZP02dC6LjghPQJeBsvXhJod/wnIBg==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-linux-x64-gnu": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-linux-x64-gnu/-/lightningcss-linux-x64-gnu-1.32.0.tgz", + "integrity": "sha512-V7Qr52IhZmdKPVr+Vtw8o+WLsQJYCTd8loIfpDaMRWGUZfBOYEJeyJIkqGIDMZPwPx24pUMfwSxxI8phr/MbOA==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-linux-x64-musl": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-linux-x64-musl/-/lightningcss-linux-x64-musl-1.32.0.tgz", + "integrity": "sha512-bYcLp+Vb0awsiXg/80uCRezCYHNg1/l3mt0gzHnWV9XP1W5sKa5/TCdGWaR/zBM2PeF/HbsQv/j2URNOiVuxWg==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 12.0.0" }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-win32-arm64-msvc": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-win32-arm64-msvc/-/lightningcss-win32-arm64-msvc-1.32.0.tgz", + "integrity": 
"sha512-8SbC8BR40pS6baCM8sbtYDSwEVQd4JlFTOlaD3gWGHfThTcABnNDBda6eTZeqbofalIJhFx0qKzgHJmcPTnGdw==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "win32" + ], "engines": { - "node": ">=6" - } - }, - "node_modules/json5": { - "version": "2.2.3", - "resolved": "https://registry.npmjs.org/json5/-/json5-2.2.3.tgz", - "integrity": "sha512-XmOWe7eyHYH14cLdVPoyg+GOH3rYX++KpzrylJwSW98t3Nk+U8XOl8FWKOgwtzdb8lXGf6zYwDUzeHMWfxasyg==", - "dev": true, - "license": "MIT", - "bin": { - "json5": "lib/cli.js" + "node": ">= 12.0.0" }, - "engines": { - "node": ">=6" + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" } }, - "node_modules/katex": { - "version": "0.16.27", - "resolved": "https://registry.npmjs.org/katex/-/katex-0.16.27.tgz", - "integrity": "sha512-aeQoDkuRWSqQN6nSvVCEFvfXdqo1OQiCmmW1kc9xSdjutPv7BGO7pqY9sQRJpMOGrEdfDgF2TfRXe5eUAD2Waw==", - "funding": [ - "https://opencollective.com/katex", - "https://github.com/sponsors/katex" + "node_modules/lightningcss-win32-x64-msvc": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-win32-x64-msvc/-/lightningcss-win32-x64-msvc-1.32.0.tgz", + "integrity": "sha512-Amq9B/SoZYdDi1kFrojnoqPLxYhQ4Wo5XiL8EVJrVsB8ARoC1PWW6VGtT0WKCemjy8aC+louJnjS7U18x3b06Q==", + "cpu": [ + "x64" ], - "license": "MIT", - "dependencies": { - "commander": "^8.3.0" + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">= 12.0.0" }, - "bin": { - "katex": "cli.js" + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" } }, "node_modules/loose-envify": { @@ -1112,27 +2069,48 @@ "loose-envify": "cli.js" } }, - "node_modules/lru-cache": { - "version": "5.1.1", - "resolved": "https://registry.npmjs.org/lru-cache/-/lru-cache-5.1.1.tgz", - "integrity": "sha512-KpNARQA3Iwv+jTA0utUVVbrh+Jlrr1Fv0e56GGzAFOXN7dk/FviaDW8LHmK52DlcH4WP2n6gI8vN1aesBFgo9w==", + 
"node_modules/lz-string": { + "version": "1.5.0", + "resolved": "https://registry.npmjs.org/lz-string/-/lz-string-1.5.0.tgz", + "integrity": "sha512-h5bgJWpxJNswbU7qCrV0tIKQCaS3blPDrqKWx+QxzuzL1zGUzij9XCWLrSLsJPu5t+eWA/ycetzYAO5IOMcWAQ==", "dev": true, - "license": "ISC", + "license": "MIT", + "peer": true, + "bin": { + "lz-string": "bin/bin.js" + } + }, + "node_modules/magic-string": { + "version": "0.30.21", + "resolved": "https://registry.npmjs.org/magic-string/-/magic-string-0.30.21.tgz", + "integrity": "sha512-vd2F4YUyEXKGcLHoq+TEyCjxueSeHnFxyyjNp80yg0XV4vUhnDer/lvvlqM/arB5bXQN5K2/3oinyCRyx8T2CQ==", + "dev": true, + "license": "MIT", "dependencies": { - "yallist": "^3.0.2" + "@jridgewell/sourcemap-codec": "^1.5.5" } }, - "node_modules/ms": { - "version": "2.1.3", - "resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz", - "integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA==", + "node_modules/mdn-data": { + "version": "2.27.1", + "resolved": "https://registry.npmjs.org/mdn-data/-/mdn-data-2.27.1.tgz", + "integrity": "sha512-9Yubnt3e8A0OKwxYSXyhLymGW4sCufcLG6VdiDdUGVkPhpqLxlvP5vl1983gQjJl3tqbrM731mjaZaP68AgosQ==", "dev": true, - "license": "MIT" + "license": "CC0-1.0" + }, + "node_modules/min-indent": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/min-indent/-/min-indent-1.0.1.tgz", + "integrity": "sha512-I9jwMn07Sy/IwOj3zVkVik2JTvgpaykDZEigL6Rx6N9LbMywwUSMtxET+7lVoDLLd3O3IXwJwvuuns8UB/HeAg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=4" + } }, "node_modules/nanoid": { - "version": "3.3.11", - "resolved": "https://registry.npmjs.org/nanoid/-/nanoid-3.3.11.tgz", - "integrity": "sha512-N8SpfPUnUp1bK+PMYW8qSWdl9U+wwNWI4QKxOYDy9JAro3WMX7p2OeVRF9v+347pnakNevPmiHhNmZ2HbFA76w==", + "version": "3.3.12", + "resolved": "https://registry.npmjs.org/nanoid/-/nanoid-3.3.12.tgz", + "integrity": 
"sha512-ZB9RH/39qpq5Vu6Y+NmUaFhQR6pp+M2Xt76XBnEwDaGcVAqhlvxrl3B2bKS5D3NH3QR76v3aSrKaF/Kiy7lEtQ==", "dev": true, "funding": [ { @@ -1148,10 +2126,37 @@ "node": "^10 || ^12 || ^13.7 || ^14 || >=15.0.1" } }, - "node_modules/node-releases": { - "version": "2.0.27", - "resolved": "https://registry.npmjs.org/node-releases/-/node-releases-2.0.27.tgz", - "integrity": "sha512-nmh3lCkYZ3grZvqcCH+fjmQ7X+H0OeZgP40OierEaAptX4XofMh5kwNbWh7lBduUzCcV/8kZ+NDLCwm2iorIlA==", + "node_modules/obug": { + "version": "2.1.3", + "resolved": "https://registry.npmjs.org/obug/-/obug-2.1.3.tgz", + "integrity": "sha512-9miFgM2OFba7hB+pRgvtV84pYTBaoTHohvmIgiRt6dRIzbwEOIaNaP+dIlGs2fNFoB0SeISs0Jz5WFVRid6Xyg==", + "dev": true, + "funding": [ + "https://github.com/sponsors/sxzz", + "https://opencollective.com/debug" + ], + "license": "MIT", + "engines": { + "node": ">=12.20.0" + } + }, + "node_modules/parse5": { + "version": "8.0.1", + "resolved": "https://registry.npmjs.org/parse5/-/parse5-8.0.1.tgz", + "integrity": "sha512-z1e/HMG90obSGeidlli3hj7cbocou0/wa5HacvI3ASx34PecNjNQeaHNo5WIZpWofN9kgkqV1q5YvXe3F0FoPw==", + "dev": true, + "license": "MIT", + "dependencies": { + "entities": "^8.0.0" + }, + "funding": { + "url": "https://github.com/inikulin/parse5?sponsor=1" + } + }, + "node_modules/pathe": { + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/pathe/-/pathe-2.0.3.tgz", + "integrity": "sha512-WUjGcAqP1gQacoQe+OBJsFA7Ld4DyXuUIjZ5cc75cLHvJ7dtNsTugphxIADwspS+AraAUePCKrSVtPLFj/F88w==", "dev": true, "license": "MIT" }, @@ -1176,9 +2181,9 @@ } }, "node_modules/postcss": { - "version": "8.5.14", - "resolved": "https://registry.npmjs.org/postcss/-/postcss-8.5.14.tgz", - "integrity": "sha512-SoSL4+OSEtR99LHFZQiJLkT59C5B1amGO1NzTwj7TT1qCUgUO6hxOvzkOYxD+vMrXBM3XJIKzokoERdqQq/Zmg==", + "version": "8.5.15", + "resolved": "https://registry.npmjs.org/postcss/-/postcss-8.5.15.tgz", + "integrity": "sha512-FfR8sjd4em2T6fb3I2MwAJU7HWVMr9zba+enmQeeWFfCbm+UOC/0X4DS8XtpUTMwWMGbjKYP7xjfNekzyGmB3A==", 
"dev": true, "funding": [ { @@ -1196,7 +2201,7 @@ ], "license": "MIT", "dependencies": { - "nanoid": "^3.3.11", + "nanoid": "^3.3.12", "picocolors": "^1.1.1", "source-map-js": "^1.2.1" }, @@ -1204,6 +2209,32 @@ "node": "^10 || ^12 || >=14" } }, + "node_modules/pretty-format": { + "version": "27.5.1", + "resolved": "https://registry.npmjs.org/pretty-format/-/pretty-format-27.5.1.tgz", + "integrity": "sha512-Qb1gy5OrP5+zDf2Bvnzdl3jsTf1qXVMazbvCoKhtKqVs4/YK4ozX4gKQJJVyNe+cajNPn0KoC0MC3FUmaHWEmQ==", + "dev": true, + "license": "MIT", + "peer": true, + "dependencies": { + "ansi-regex": "^5.0.1", + "ansi-styles": "^5.0.0", + "react-is": "^17.0.1" + }, + "engines": { + "node": "^10.13.0 || ^12.13.0 || ^14.15.0 || >=15.0.0" + } + }, + "node_modules/punycode": { + "version": "2.3.1", + "resolved": "https://registry.npmjs.org/punycode/-/punycode-2.3.1.tgz", + "integrity": "sha512-vYt7UD1U9Wg6138shLtLOvdAu+8DsC/ilFtEVHcH+wydcSpNE20AfSOduf6MkRFahL5FY7X1oU7nKVZFtfq8Fg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6" + } + }, "node_modules/react": { "version": "18.3.1", "resolved": "https://registry.npmjs.org/react/-/react-18.3.1.tgz", @@ -1229,59 +2260,83 @@ "react": "^18.3.1" } }, - "node_modules/react-refresh": { - "version": "0.17.0", - "resolved": "https://registry.npmjs.org/react-refresh/-/react-refresh-0.17.0.tgz", - "integrity": "sha512-z6F7K9bV85EfseRCp2bzrpyQ0Gkw1uLoCel9XBVWPg/TjRj94SkJzUTGfOa4bs7iJvBWtQG0Wq7wnI0syw3EBQ==", + "node_modules/react-is": { + "version": "17.0.2", + "resolved": "https://registry.npmjs.org/react-is/-/react-is-17.0.2.tgz", + "integrity": "sha512-w2GsyukL62IJnlaff/nRegPQR94C/XXamvMWmSHRJ4y7Ts/4ocGRmTHvOs8PSE6pB3dWOrD/nueuU5sduBsQ4w==", + "dev": true, + "license": "MIT", + "peer": true + }, + "node_modules/redent": { + "version": "3.0.0", + "resolved": "https://registry.npmjs.org/redent/-/redent-3.0.0.tgz", + "integrity": "sha512-6tDA8g98We0zd0GvVeMT9arEOnTw9qM03L9cJXaCjrip1OO764RDBLBfrB4cwzNGDj5OA5ioymC9GkizgWJDUg==", + 
"dev": true, + "license": "MIT", + "dependencies": { + "indent-string": "^4.0.0", + "strip-indent": "^3.0.0" + }, + "engines": { + "node": ">=8" + } + }, + "node_modules/require-from-string": { + "version": "2.0.2", + "resolved": "https://registry.npmjs.org/require-from-string/-/require-from-string-2.0.2.tgz", + "integrity": "sha512-Xf0nWe6RseziFMu+Ap9biiUbmplq6S9/p+7w7YXP/JBHhrUDDUhwa+vANyubuqfZWTveU//DYVGsDG7RKL/vEw==", "dev": true, "license": "MIT", "engines": { "node": ">=0.10.0" } }, - "node_modules/rollup": { - "version": "4.59.0", - "resolved": "https://registry.npmjs.org/rollup/-/rollup-4.59.0.tgz", - "integrity": "sha512-2oMpl67a3zCH9H79LeMcbDhXW/UmWG/y2zuqnF2jQq5uq9TbM9TVyXvA4+t+ne2IIkBdrLpAaRQAvo7YI/Yyeg==", + "node_modules/rolldown": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/rolldown/-/rolldown-1.0.3.tgz", + "integrity": "sha512-i00lAJ2ks1BYr7rjNjKC7BcqAS7nVfiT3QX1SI5aY+AFHblCmaUf9OE9dbdzDvW6dJxbi2ZCZiy9v3CcwOiX3g==", "dev": true, "license": "MIT", "dependencies": { - "@types/estree": "1.0.8" + "@oxc-project/types": "=0.133.0", + "@rolldown/pluginutils": "^1.0.0" }, "bin": { - "rollup": "dist/bin/rollup" + "rolldown": "bin/cli.mjs" }, "engines": { - "node": ">=18.0.0", - "npm": ">=8.0.0" + "node": "^20.19.0 || >=22.12.0" }, "optionalDependencies": { - "@rollup/rollup-android-arm-eabi": "4.59.0", - "@rollup/rollup-android-arm64": "4.59.0", - "@rollup/rollup-darwin-arm64": "4.59.0", - "@rollup/rollup-darwin-x64": "4.59.0", - "@rollup/rollup-freebsd-arm64": "4.59.0", - "@rollup/rollup-freebsd-x64": "4.59.0", - "@rollup/rollup-linux-arm-gnueabihf": "4.59.0", - "@rollup/rollup-linux-arm-musleabihf": "4.59.0", - "@rollup/rollup-linux-arm64-gnu": "4.59.0", - "@rollup/rollup-linux-arm64-musl": "4.59.0", - "@rollup/rollup-linux-loong64-gnu": "4.59.0", - "@rollup/rollup-linux-loong64-musl": "4.59.0", - "@rollup/rollup-linux-ppc64-gnu": "4.59.0", - "@rollup/rollup-linux-ppc64-musl": "4.59.0", - "@rollup/rollup-linux-riscv64-gnu": "4.59.0", 
- "@rollup/rollup-linux-riscv64-musl": "4.59.0", - "@rollup/rollup-linux-s390x-gnu": "4.59.0", - "@rollup/rollup-linux-x64-gnu": "4.59.0", - "@rollup/rollup-linux-x64-musl": "4.59.0", - "@rollup/rollup-openbsd-x64": "4.59.0", - "@rollup/rollup-openharmony-arm64": "4.59.0", - "@rollup/rollup-win32-arm64-msvc": "4.59.0", - "@rollup/rollup-win32-ia32-msvc": "4.59.0", - "@rollup/rollup-win32-x64-gnu": "4.59.0", - "@rollup/rollup-win32-x64-msvc": "4.59.0", - "fsevents": "~2.3.2" + "@rolldown/binding-android-arm64": "1.0.3", + "@rolldown/binding-darwin-arm64": "1.0.3", + "@rolldown/binding-darwin-x64": "1.0.3", + "@rolldown/binding-freebsd-x64": "1.0.3", + "@rolldown/binding-linux-arm-gnueabihf": "1.0.3", + "@rolldown/binding-linux-arm64-gnu": "1.0.3", + "@rolldown/binding-linux-arm64-musl": "1.0.3", + "@rolldown/binding-linux-ppc64-gnu": "1.0.3", + "@rolldown/binding-linux-s390x-gnu": "1.0.3", + "@rolldown/binding-linux-x64-gnu": "1.0.3", + "@rolldown/binding-linux-x64-musl": "1.0.3", + "@rolldown/binding-openharmony-arm64": "1.0.3", + "@rolldown/binding-wasm32-wasi": "1.0.3", + "@rolldown/binding-win32-arm64-msvc": "1.0.3", + "@rolldown/binding-win32-x64-msvc": "1.0.3" + } + }, + "node_modules/saxes": { + "version": "6.0.0", + "resolved": "https://registry.npmjs.org/saxes/-/saxes-6.0.0.tgz", + "integrity": "sha512-xAg7SOnEhrm5zI3puOOKyy1OMcMlIJZYNJY7xLBwSze0UjhPLnWfj2GF2EpT0jmzaJKIWKHLsaSSajf35bcYnA==", + "dev": true, + "license": "ISC", + "dependencies": { + "xmlchars": "^2.2.0" + }, + "engines": { + "node": ">=v12.22.7" } }, "node_modules/scheduler": { @@ -1293,15 +2348,12 @@ "loose-envify": "^1.1.0" } }, - "node_modules/semver": { - "version": "6.3.1", - "resolved": "https://registry.npmjs.org/semver/-/semver-6.3.1.tgz", - "integrity": "sha512-BR7VvDCVHO+q2xBEWskxS6DJE1qRnb7DxzUrogb71CWoSficBxYsiAGd+Kl0mmq/MprG9yArRkyrQxTO6XjMzA==", + "node_modules/siginfo": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/siginfo/-/siginfo-2.0.0.tgz", + 
"integrity": "sha512-ybx0WO1/8bSBLEWXZvEd7gMW3Sn3JFlW3TvX1nREbDLRNQNaeNN8WK0meBwPdAaOI7TtRRRJn/Es1zhrrCHu7g==", "dev": true, - "license": "ISC", - "bin": { - "semver": "bin/semver.js" - } + "license": "ISC" }, "node_modules/source-map-js": { "version": "1.2.1", @@ -1313,15 +2365,66 @@ "node": ">=0.10.0" } }, + "node_modules/stackback": { + "version": "0.0.2", + "resolved": "https://registry.npmjs.org/stackback/-/stackback-0.0.2.tgz", + "integrity": "sha512-1XMJE5fQo1jGH6Y/7ebnwPOBEkIEnT4QF32d5R1+VXdXveM0IBMJt8zfaxX1P3QhVwrYe+576+jkANtSS2mBbw==", + "dev": true, + "license": "MIT" + }, + "node_modules/std-env": { + "version": "4.1.0", + "resolved": "https://registry.npmjs.org/std-env/-/std-env-4.1.0.tgz", + "integrity": "sha512-Rq7ybcX2RuC55r9oaPVEW7/xu3tj8u4GeBYHBWCychFtzMIr86A7e3PPEBPT37sHStKX3+TiX/Fr/ACmJLVlLQ==", + "dev": true, + "license": "MIT" + }, + "node_modules/strip-indent": { + "version": "3.0.0", + "resolved": "https://registry.npmjs.org/strip-indent/-/strip-indent-3.0.0.tgz", + "integrity": "sha512-laJTa3Jb+VQpaC6DseHhF7dXVqHTfJPCRDaEbid/drOhgitgYku/letMUqOXFoWV0zIIUbjpdH2t+tYj4bQMRQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "min-indent": "^1.0.0" + }, + "engines": { + "node": ">=8" + } + }, + "node_modules/symbol-tree": { + "version": "3.2.4", + "resolved": "https://registry.npmjs.org/symbol-tree/-/symbol-tree-3.2.4.tgz", + "integrity": "sha512-9QNk5KwDF+Bvz+PyObkmSYjI5ksVUYtjW7AU22r2NKcfLJcXp96hkDWU3+XndOsUb+AQ9QhfzfCT2O+CNWT5Tw==", + "dev": true, + "license": "MIT" + }, + "node_modules/tinybench": { + "version": "2.9.0", + "resolved": "https://registry.npmjs.org/tinybench/-/tinybench-2.9.0.tgz", + "integrity": "sha512-0+DUvqWMValLmha6lr4kD8iAMK1HzV0/aKnCtWb9v9641TnP/MFb7Pc2bxoxQjTXAErryXVgUOfv2YqNllqGeg==", + "dev": true, + "license": "MIT" + }, + "node_modules/tinyexec": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/tinyexec/-/tinyexec-1.2.4.tgz", + "integrity": 
"sha512-SHf/r48b7vOrjve9PxJo3MN5v5yuyjHvdUcrQffT3WXMUfnGmHDVbC4k3sHJaJTgZCwpUplIaAo5ANtMyp3YHg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=18" + } + }, "node_modules/tinyglobby": { - "version": "0.2.15", - "resolved": "https://registry.npmjs.org/tinyglobby/-/tinyglobby-0.2.15.tgz", - "integrity": "sha512-j2Zq4NyQYG5XMST4cbs02Ak8iJUdxRM0XI5QyxXuZOzKOINmWurp3smXu3y5wDcJrptwpSjgXHzIQxR0omXljQ==", + "version": "0.2.17", + "resolved": "https://registry.npmjs.org/tinyglobby/-/tinyglobby-0.2.17.tgz", + "integrity": "sha512-wXR/dYpcqKmfWpEdZjiKJOwCNFndD0DMnrW/cYjVGttEkBfVgcLFHoNrlj47mjOVic9yyNu65alsgF4NQyTa2g==", "dev": true, "license": "MIT", "dependencies": { "fdir": "^6.5.0", - "picomatch": "^4.0.3" + "picomatch": "^4.0.4" }, "engines": { "node": ">=12.0.0" @@ -1330,50 +2433,92 @@ "url": "https://github.com/sponsors/SuperchupuDev" } }, - "node_modules/update-browserslist-db": { - "version": "1.2.3", - "resolved": "https://registry.npmjs.org/update-browserslist-db/-/update-browserslist-db-1.2.3.tgz", - "integrity": "sha512-Js0m9cx+qOgDxo0eMiFGEueWztz+d4+M3rGlmKPT+T4IS/jP4ylw3Nwpu6cpTTP8R1MAC1kF4VbdLt3ARf209w==", + "node_modules/tinyrainbow": { + "version": "3.1.0", + "resolved": "https://registry.npmjs.org/tinyrainbow/-/tinyrainbow-3.1.0.tgz", + "integrity": "sha512-Bf+ILmBgretUrdJxzXM0SgXLZ3XfiaUuOj/IKQHuTXip+05Xn+uyEYdVg0kYDipTBcLrCVyUzAPz7QmArb0mmw==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=14.0.0" + } + }, + "node_modules/tldts": { + "version": "7.4.2", + "resolved": "https://registry.npmjs.org/tldts/-/tldts-7.4.2.tgz", + "integrity": "sha512-kCwffuaH8ntKtygnWe1b4BJKWiCUH30n5KfoTr6IchcXOwR7chAOFJxFrH3vjANafUYrIA4a7SDL+nn7SiR4Sw==", "dev": true, - "funding": [ - { - "type": "opencollective", - "url": "https://opencollective.com/browserslist" - }, - { - "type": "tidelift", - "url": "https://tidelift.com/funding/github/npm/browserslist" - }, - { - "type": "github", - "url": "https://github.com/sponsors/ai" - } - ], 
"license": "MIT", "dependencies": { - "escalade": "^3.2.0", - "picocolors": "^1.1.1" + "tldts-core": "^7.4.2" }, "bin": { - "update-browserslist-db": "cli.js" + "tldts": "bin/cli.js" + } + }, + "node_modules/tldts-core": { + "version": "7.4.2", + "resolved": "https://registry.npmjs.org/tldts-core/-/tldts-core-7.4.2.tgz", + "integrity": "sha512-nwEyF4vl4RSJjwSjBUmOSxc3BFPoIFdlRthJ6e+5v9P3bHNsoD06UjuqMUspqp7vsEZ1beaHi1km+optiE17yA==", + "dev": true, + "license": "MIT" + }, + "node_modules/tough-cookie": { + "version": "6.0.1", + "resolved": "https://registry.npmjs.org/tough-cookie/-/tough-cookie-6.0.1.tgz", + "integrity": "sha512-LktZQb3IeoUWB9lqR5EWTHgW/VTITCXg4D21M+lvybRVdylLrRMnqaIONLVb5mav8vM19m44HIcGq4qASeu2Qw==", + "dev": true, + "license": "BSD-3-Clause", + "dependencies": { + "tldts": "^7.0.5" }, - "peerDependencies": { - "browserslist": ">= 4.21.0" + "engines": { + "node": ">=16" + } + }, + "node_modules/tr46": { + "version": "6.0.0", + "resolved": "https://registry.npmjs.org/tr46/-/tr46-6.0.0.tgz", + "integrity": "sha512-bLVMLPtstlZ4iMQHpFHTR7GAGj2jxi8Dg0s2h2MafAE4uSWF98FC/3MomU51iQAMf8/qDUbKWf5GxuvvVcXEhw==", + "dev": true, + "license": "MIT", + "dependencies": { + "punycode": "^2.3.1" + }, + "engines": { + "node": ">=20" + } + }, + "node_modules/tslib": { + "version": "2.8.1", + "resolved": "https://registry.npmjs.org/tslib/-/tslib-2.8.1.tgz", + "integrity": "sha512-oJFu94HQb+KVduSUQL7wnpmqnfmLsOA/nAh6b6EH0wCEoK0/mPeXU6c3wKDV83MkOuHPRHtSXKKU99IBazS/2w==", + "dev": true, + "license": "0BSD", + "optional": true + }, + "node_modules/undici": { + "version": "7.28.0", + "resolved": "https://registry.npmjs.org/undici/-/undici-7.28.0.tgz", + "integrity": "sha512-cRZYrTDwWznlnRiPjggAGxZXanty6M8RV1ff8Wm4LWXBp7/IG8v5DnOm74DtUBp9OONpK75YlPnIjQqX0dBDtA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=20.18.1" } }, "node_modules/vite": { - "version": "7.3.2", - "resolved": "https://registry.npmjs.org/vite/-/vite-7.3.2.tgz", - "integrity": 
"sha512-Bby3NOsna2jsjfLVOHKes8sGwgl4TT0E6vvpYgnAYDIF/tie7MRaFthmKuHx1NSXjiTueXH3do80FMQgvEktRg==", + "version": "8.0.16", + "resolved": "https://registry.npmjs.org/vite/-/vite-8.0.16.tgz", + "integrity": "sha512-h9bXPmJichP5fLmVQo3PyaGSDE2n3aPuomeAlVRm0JLmt4rY6zmPKd59HYI4LNW8oTK7tlTsuC7l/m7awx9Jcw==", "dev": true, "license": "MIT", "dependencies": { - "esbuild": "^0.27.0", - "fdir": "^6.5.0", - "picomatch": "^4.0.3", - "postcss": "^8.5.6", - "rollup": "^4.43.0", - "tinyglobby": "^0.2.15" + "lightningcss": "^1.32.0", + "picomatch": "^4.0.4", + "postcss": "^8.5.15", + "rolldown": "1.0.3", + "tinyglobby": "^0.2.17" }, "bin": { "vite": "bin/vite.js" @@ -1389,9 +2534,10 @@ }, "peerDependencies": { "@types/node": "^20.19.0 || >=22.12.0", + "@vitejs/devtools": "^0.1.18", + "esbuild": "^0.27.0 || ^0.28.0", "jiti": ">=1.21.0", "less": "^4.0.0", - "lightningcss": "^1.21.0", "sass": "^1.70.0", "sass-embedded": "^1.70.0", "stylus": ">=0.54.8", @@ -1404,13 +2550,16 @@ "@types/node": { "optional": true }, - "jiti": { + "@vitejs/devtools": { "optional": true }, - "less": { + "esbuild": { + "optional": true + }, + "jiti": { "optional": true }, - "lightningcss": { + "less": { "optional": true }, "sass": { @@ -1436,12 +2585,177 @@ } } }, - "node_modules/yallist": { - "version": "3.1.1", - "resolved": "https://registry.npmjs.org/yallist/-/yallist-3.1.1.tgz", - "integrity": "sha512-a4UGQaWPH59mOXUYnAG2ewncQS4i4F43Tv3JoAM+s2VDAmS9NsK8GpDMLrCHPksFT7h3K6TOoUNn2pb7RoXx4g==", + "node_modules/vitest": { + "version": "4.1.8", + "resolved": "https://registry.npmjs.org/vitest/-/vitest-4.1.8.tgz", + "integrity": "sha512-flY6ScbCIt9HThs+C5HS7jvGOB560DJtk/Z15IQROTA6zEy49Nh8T/dofWTQL+n3vswqn87sbJNiuqw1SDp5Ig==", "dev": true, - "license": "ISC" + "license": "MIT", + "dependencies": { + "@vitest/expect": "4.1.8", + "@vitest/mocker": "4.1.8", + "@vitest/pretty-format": "4.1.8", + "@vitest/runner": "4.1.8", + "@vitest/snapshot": "4.1.8", + "@vitest/spy": "4.1.8", + "@vitest/utils": "4.1.8", + 
"es-module-lexer": "^2.0.0", + "expect-type": "^1.3.0", + "magic-string": "^0.30.21", + "obug": "^2.1.1", + "pathe": "^2.0.3", + "picomatch": "^4.0.3", + "std-env": "^4.0.0-rc.1", + "tinybench": "^2.9.0", + "tinyexec": "^1.0.2", + "tinyglobby": "^0.2.15", + "tinyrainbow": "^3.1.0", + "vite": "^6.0.0 || ^7.0.0 || ^8.0.0", + "why-is-node-running": "^2.3.0" + }, + "bin": { + "vitest": "vitest.mjs" + }, + "engines": { + "node": "^20.0.0 || ^22.0.0 || >=24.0.0" + }, + "funding": { + "url": "https://opencollective.com/vitest" + }, + "peerDependencies": { + "@edge-runtime/vm": "*", + "@opentelemetry/api": "^1.9.0", + "@types/node": "^20.0.0 || ^22.0.0 || >=24.0.0", + "@vitest/browser-playwright": "4.1.8", + "@vitest/browser-preview": "4.1.8", + "@vitest/browser-webdriverio": "4.1.8", + "@vitest/coverage-istanbul": "4.1.8", + "@vitest/coverage-v8": "4.1.8", + "@vitest/ui": "4.1.8", + "happy-dom": "*", + "jsdom": "*", + "vite": "^6.0.0 || ^7.0.0 || ^8.0.0" + }, + "peerDependenciesMeta": { + "@edge-runtime/vm": { + "optional": true + }, + "@opentelemetry/api": { + "optional": true + }, + "@types/node": { + "optional": true + }, + "@vitest/browser-playwright": { + "optional": true + }, + "@vitest/browser-preview": { + "optional": true + }, + "@vitest/browser-webdriverio": { + "optional": true + }, + "@vitest/coverage-istanbul": { + "optional": true + }, + "@vitest/coverage-v8": { + "optional": true + }, + "@vitest/ui": { + "optional": true + }, + "happy-dom": { + "optional": true + }, + "jsdom": { + "optional": true + }, + "vite": { + "optional": false + } + } + }, + "node_modules/w3c-xmlserializer": { + "version": "5.0.0", + "resolved": "https://registry.npmjs.org/w3c-xmlserializer/-/w3c-xmlserializer-5.0.0.tgz", + "integrity": "sha512-o8qghlI8NZHU1lLPrpi2+Uq7abh4GGPpYANlalzWxyWteJOCsr/P+oPBA49TOLu5FTZO4d3F9MnWJfiMo4BkmA==", + "dev": true, + "license": "MIT", + "dependencies": { + "xml-name-validator": "^5.0.0" + }, + "engines": { + "node": ">=18" + } + }, + 
"node_modules/webidl-conversions": { + "version": "8.0.1", + "resolved": "https://registry.npmjs.org/webidl-conversions/-/webidl-conversions-8.0.1.tgz", + "integrity": "sha512-BMhLD/Sw+GbJC21C/UgyaZX41nPt8bUTg+jWyDeg7e7YN4xOM05YPSIXceACnXVtqyEw/LMClUQMtMZ+PGGpqQ==", + "dev": true, + "license": "BSD-2-Clause", + "engines": { + "node": ">=20" + } + }, + "node_modules/whatwg-mimetype": { + "version": "5.0.0", + "resolved": "https://registry.npmjs.org/whatwg-mimetype/-/whatwg-mimetype-5.0.0.tgz", + "integrity": "sha512-sXcNcHOC51uPGF0P/D4NVtrkjSU2fNsm9iog4ZvZJsL3rjoDAzXZhkm2MWt1y+PUdggKAYVoMAIYcs78wJ51Cw==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=20" + } + }, + "node_modules/whatwg-url": { + "version": "16.0.1", + "resolved": "https://registry.npmjs.org/whatwg-url/-/whatwg-url-16.0.1.tgz", + "integrity": "sha512-1to4zXBxmXHV3IiSSEInrreIlu02vUOvrhxJJH5vcxYTBDAx51cqZiKdyTxlecdKNSjj8EcxGBxNf6Vg+945gw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@exodus/bytes": "^1.11.0", + "tr46": "^6.0.0", + "webidl-conversions": "^8.0.1" + }, + "engines": { + "node": "^20.19.0 || ^22.12.0 || >=24.0.0" + } + }, + "node_modules/why-is-node-running": { + "version": "2.3.0", + "resolved": "https://registry.npmjs.org/why-is-node-running/-/why-is-node-running-2.3.0.tgz", + "integrity": "sha512-hUrmaWBdVDcxvYqnyh09zunKzROWjbZTiNy8dBEjkS7ehEDQibXJ7XvlmtbwuTclUiIyN+CyXQD4Vmko8fNm8w==", + "dev": true, + "license": "MIT", + "dependencies": { + "siginfo": "^2.0.0", + "stackback": "0.0.2" + }, + "bin": { + "why-is-node-running": "cli.js" + }, + "engines": { + "node": ">=8" + } + }, + "node_modules/xml-name-validator": { + "version": "5.0.0", + "resolved": "https://registry.npmjs.org/xml-name-validator/-/xml-name-validator-5.0.0.tgz", + "integrity": "sha512-EvGK8EJ3DhaHfbRlETOWAS5pO9MZITeauHKJyb8wyajUfQUenkIg2MvLDTZ4T/TgIcm3HU0TFBgWWboAZ30UHg==", + "dev": true, + "license": "Apache-2.0", + "engines": { + "node": ">=18" + } + }, + "node_modules/xmlchars": { + 
"version": "2.2.0", + "resolved": "https://registry.npmjs.org/xmlchars/-/xmlchars-2.2.0.tgz", + "integrity": "sha512-JZnDKK8B0RCDw84FNdDAIpZK+JuJw+s7Lz8nksI7SIuU3UXJJslUthsi+uWBUYOwPFwW7W7PRLRfUKpxjtjFCw==", + "dev": true, + "license": "MIT" } } } diff --git a/frontend/package.json b/frontend/package.json index 7f962d7..d3a8a8f 100644 --- a/frontend/package.json +++ b/frontend/package.json @@ -1,6 +1,6 @@ { "name": "asi-aggregator-frontend", - "version": "1.1.0", + "version": "1.1.01", "description": "Frontend UI for MOTO S.T.E.M. Mathematics Variant - Autonomous ASI Research System for Novel S.T.E.M. Mathematical Paper Generation", "author": "Intrafere LLC", "license": "MIT", @@ -8,7 +8,9 @@ "scripts": { "dev": "vite", "build": "vite build", - "preview": "vite preview" + "preview": "vite preview", + "test": "vitest run", + "test:watch": "vitest" }, "dependencies": { "dompurify": "^3.2.4", @@ -17,9 +19,14 @@ "react-dom": "^18.2.0" }, "devDependencies": { + "@testing-library/jest-dom": "^6.9.1", + "@testing-library/react": "^16.3.2", + "@testing-library/user-event": "^14.6.1", "@types/react": "^18.2.43", "@types/react-dom": "^18.2.17", - "@vitejs/plugin-react": "^4.2.1", - "vite": "^7.1.12" + "@vitejs/plugin-react": "^6.0.2", + "jsdom": "^29.1.1", + "vite": "^8.0.16", + "vitest": "^4.1.8" } } diff --git a/frontend/src/App.jsx b/frontend/src/App.jsx index 0e0c774..7c23ab3 100644 --- a/frontend/src/App.jsx +++ b/frontend/src/App.jsx @@ -32,6 +32,11 @@ import WorkflowPanel from './components/WorkflowPanel'; import BoostControlModal from './components/BoostControlModal'; import StartupProviderSetupModal from './components/StartupProviderSetupModal'; import OpenRouterApiKeyModal from './components/OpenRouterApiKeyModal'; +import ConnectivityPanel from './components/ConnectivityPanel'; +import SyntheticLib4AccessModal from './components/SyntheticLib4AccessModal'; +import WolframAlphaAccessModal from './components/WolframAlphaAccessModal'; +import LMStudioConnectivityModal 
from './components/LMStudioConnectivityModal'; +import AgentConversationMemoryModal from './components/AgentConversationMemoryModal'; import OpenRouterPrivacyWarningModal from './components/OpenRouterPrivacyWarningModal'; import CritiqueNotificationStack from './components/CritiqueNotificationStack'; import ProofNotificationStack from './components/autonomous/ProofNotificationStack'; @@ -40,7 +45,7 @@ import CodexOAuthNotificationStack from './components/CodexOAuthNotificationStac import UpdateNotificationBanner from './components/UpdateNotificationBanner'; import PaperCritiqueModal from './components/PaperCritiqueModal'; import { websocket } from './services/websocket'; -import { api, autonomousAPI, cloudAccessAPI, compilerAPI, leanojAPI, openRouterAPI } from './services/api'; +import { api, autonomousAPI, cloudAccessAPI, compilerAPI, connectivityAPI, leanojAPI, openRouterAPI } from './services/api'; import { LM_STUDIO_STARTUP_CHOICE, RECOMMENDED_PROFILE_KEY, @@ -60,8 +65,23 @@ import { DEFAULT_MAX_OUTPUT_TOKENS, } from './utils/openRouterSelection'; import { CLOUD_ACCESS_PROVIDERS, isCloudAccessProvider } from './utils/oauthProviders'; +import { + formatContextOverflowActivityMessage, + formatAssistantProofPackEventMessage, + buildRejectionFeedbackNoticeActivity, + hasRecentAssistantProofPackDuplicate, + shouldAddRejectionFeedbackNotice, +} from './utils/activityStyles'; +import { + canStorePromptDraftInLocalStorage, + readPromptDraftSync, + removePromptDraft, + savePromptDraft, +} from './utils/promptDraftStorage'; +import { readBooleanStorage } from './utils/safeStorage'; const DEVELOPER_MODE_STORAGE_KEY = 'developerModeSettingsEnabled'; +const AGGREGATOR_PROMPT_STORAGE_KEY = 'aggregator_user_prompt'; const DEPRECATED_SCREEN_STATE_STORAGE_KEYS = [ 'appMode', 'singlePaperWriterExpanded', @@ -71,11 +91,18 @@ const DEPRECATED_SCREEN_STATE_STORAGE_KEYS = [ 'leanojActiveTab', ]; const EMBEDDING_MODEL_HINTS = ['embed', 'embedding', 'nomic', 'bge', 'e5', 'gte']; 
-const AUTONOMOUS_ROLE_PREFIXES = ['validator', 'high_context', 'high_param', 'critique_submitter']; +const AUTONOMOUS_ROLE_PREFIXES = ['validator', 'assistant', 'writer', 'high_param']; const HIGH_SCORE_CRITIQUE_THRESHOLD = 6.25; const SEEN_HIGH_SCORE_CRITIQUES_STORAGE_KEY = 'seenHighScoreCritiqueNotifications'; +const DISMISSED_OAUTH_PROVIDER_NOTIFICATIONS_STORAGE_KEY = 'dismissedOAuthProviderNotifications'; const MAX_SEEN_HIGH_SCORE_CRITIQUES = 500; +const MAX_DISMISSED_OAUTH_PROVIDER_NOTIFICATIONS = 500; const MAX_LIVE_ACTIVITY_EVENTS = 5000; +const AUTONOMOUS_LIVE_ACTIVITY_STORAGE_KEY = 'autonomous_live_activity_events'; +const LEANOJ_LIVE_ACTIVITY_STORAGE_KEY = 'leanoj_live_activity_events'; +const MAX_PERSISTED_ACTIVITY_STRING_LENGTH = 2000; +const MAX_PERSISTED_ACTIVITY_ARRAY_ITEMS = 20; +const MAX_PERSISTED_ACTIVITY_OBJECT_KEYS = 60; const MAX_PROOF_NOTIFICATIONS = 20; const UPDATE_NOTICE_POLL_INTERVAL_MS = 4 * 60 * 60 * 1000; const DEFAULT_CAPABILITIES = Object.freeze({ @@ -84,6 +111,7 @@ const DEFAULT_CAPABILITIES = Object.freeze({ pdfDownloadAvailable: true, openAICodexOauthAvailable: true, xaiGrokOauthAvailable: true, + sakanaFuguAvailable: true, version: '', buildCommit: '', updateChannel: 'main', @@ -158,6 +186,129 @@ function persistSeenHighScoreCritiques(seenSet) { } } +function readDismissedOAuthProviderNotifications() { + if (typeof window === 'undefined') { + return new Set(); + } + + try { + const raw = window.localStorage.getItem(DISMISSED_OAUTH_PROVIDER_NOTIFICATIONS_STORAGE_KEY); + const values = raw ? JSON.parse(raw) : []; + return new Set(Array.isArray(values) ? 
values.filter(value => typeof value === 'string') : []); + } catch (error) { + console.warn('Could not read dismissed OAuth provider notifications:', error); + return new Set(); + } +} + +function persistDismissedOAuthProviderNotifications(seenSet) { + if (typeof window === 'undefined') { + return; + } + + try { + const values = Array.from(seenSet).slice(-MAX_DISMISSED_OAUTH_PROVIDER_NOTIFICATIONS); + window.localStorage.setItem(DISMISSED_OAUTH_PROVIDER_NOTIFICATIONS_STORAGE_KEY, JSON.stringify(values)); + } catch (error) { + console.warn('Could not save dismissed OAuth provider notifications:', error); + } +} + +function getProviderNotificationKey(data = {}) { + if (data.notification_key) { + return String(data.notification_key); + } + const provider = data.provider || 'oauth'; + const roleId = data.role_id || provider; + const reason = data.reason || 'provider_error'; + const model = data.model || '*'; + return [provider, roleId, reason, model] + .map(value => String(value || '').replace(/:/g, '_')) + .join(':'); +} + +function buildOAuthProviderNotification(data = {}, fallbackProvider = 'oauth') { + const provider = data.provider || fallbackProvider; + const roleId = data.role_id || provider; + const reason = data.reason || 'unrecoverable_oauth_error'; + const notificationKey = getProviderNotificationKey({ ...data, provider, role_id: roleId, reason }); + const isCodex = provider === 'openai_codex_oauth'; + return { + id: `oauth_provider_${notificationKey}`, + notification_key: notificationKey, + provider, + provider_label: data.provider_label || (isCodex ? 'OpenAI Codex' : 'OAuth provider'), + role_id: roleId, + model: data.model, + reason, + message: data.message || `Check your ${isCodex ? 
'OpenAI Codex' : 'OAuth provider'} connection, sign in again, and retry.`, + timestamp: data.created_at || data._serverTimestamp || data.timestamp || new Date().toISOString(), + }; +} + +function addOAuthProviderNotification(setNotifications, data = {}, fallbackProvider = 'oauth') { + const notification = buildOAuthProviderNotification(data, fallbackProvider); + const dismissedNotifications = readDismissedOAuthProviderNotifications(); + if ( + dismissedNotifications.has(notification.notification_key) + || (data.id && dismissedNotifications.has(String(data.id))) + ) { + return; + } + setNotifications(prev => { + if (prev.some(item => item.notification_key === notification.notification_key)) { + return prev; + } + return [...prev, notification].slice(-3); + }); +} + +function truncateOAuthActivityDetail(value, maxChars = 1800) { + const text = String(value || '').replace(/\s+/g, ' ').trim(); + if (!text) return ''; + if (text.length <= maxChars) return text; + if (maxChars <= 3) return text.slice(0, maxChars); + return `${text.slice(0, maxChars - 3)}...`; +} + +function buildOAuthActivityMessage(data = {}, fallbackProviderLabel = 'OAuth provider') { + const providerLabel = data.provider_label || fallbackProviderLabel; + const roleId = data.role_id || 'a role'; + const isOAuthProvider = providerLabel.toLowerCase().includes('oauth') + || data.provider === 'openai_codex_oauth' + || data.provider === 'xai_grok_oauth'; + const providerKind = isOAuthProvider ? 'OAuth' : 'provider'; + const repairHint = isOAuthProvider + ? 
'check your OAuth connection and sign in again' + : 'check your provider connection or key'; + if (data.reason === 'usage_limit_reached') { + return buildOAuthUsageLimitActivityMessage(data, providerLabel); + } + const detail = truncateOAuthActivityDetail( + data.oauth_error_message || data.error_message || data.error_summary, + ); + if (detail) { + return `${providerLabel} ${providerKind} failed for ${roleId}: ${detail}`; + } + return `${providerLabel} ${providerKind} failed for ${roleId}; ${repairHint}.`; +} + +function buildOAuthUsageLimitActivityMessage(data = {}, fallbackProviderLabel = 'OAuth provider') { + const providerLabel = data.provider_label || fallbackProviderLabel; + const roleId = data.role_id || 'a role'; + const resetsIn = Number(data.resets_in_seconds); + const resetText = Number.isFinite(resetsIn) && resetsIn > 0 + ? ` Provider reset in about ${Math.max(1, Math.ceil(resetsIn / 60))} minute(s).` + : ''; + const fallbackText = data.fallback_model + ? ` Using LM Studio fallback (${data.fallback_model}) until reset.` + : ' Roles without fallback will wait until the provider reset.'; + if (data.message) { + return String(data.message); + } + return `${providerLabel} usage limit reached for ${roleId}.${fallbackText}${resetText}`; +} + const createDefaultAggregatorSubmitterConfigs = () => ( [1, 2, 3].map((submitterId) => ({ submitterId, @@ -172,6 +323,48 @@ const createDefaultAggregatorSubmitterConfigs = () => ( })) ); +function readAggregatorStorage(key) { + try { + const raw = localStorage.getItem(key); + return raw ? 
JSON.parse(raw) : null; + } catch (error) { + console.error(`Failed to parse ${key}:`, error); + return null; + } +} + +function buildAggregatorConfigFromStorage() { + const settings = readAggregatorStorage('aggregator_settings') || {}; + const legacy = readAggregatorStorage('aggregatorConfig') || {}; + const promptDraft = readPromptDraftSync(AGGREGATOR_PROMPT_STORAGE_KEY); + const promptSource = typeof settings.userPrompt === 'string' && settings.userPrompt.trim().length > 0 + ? settings + : legacy; + + return { + userPrompt: promptDraft || promptSource.userPrompt || '', + submitterConfigs: settings.submitterConfigs || legacy.submitterConfigs || createDefaultAggregatorSubmitterConfigs(), + validatorModel: settings.validatorModel || legacy.validatorModel || '', + validatorProvider: settings.validatorProvider || legacy.validatorProvider || 'lm_studio', + validatorOpenrouterProvider: settings.validatorOpenrouterProvider || legacy.validatorOpenrouterProvider || null, + validatorOpenrouterReasoningEffort: settings.validatorOpenrouterReasoningEffort || legacy.validatorOpenrouterReasoningEffort || 'auto', + validatorLmStudioFallback: settings.validatorLmStudioFallback || legacy.validatorLmStudioFallback || null, + validatorContextSize: settings.validatorContextSize ?? legacy.validatorContextSize ?? DEFAULT_CONTEXT_WINDOW, + validatorMaxOutput: settings.validatorMaxOutput ?? legacy.validatorMaxOutput ?? DEFAULT_MAX_OUTPUT_TOKENS, + validatorSuperchargeEnabled: Boolean(settings.validatorSuperchargeEnabled ?? legacy.validatorSuperchargeEnabled), + assistantModel: settings.assistantModel ?? legacy.assistantModel ?? '', + assistantProvider: settings.assistantProvider ?? legacy.assistantProvider ?? settings.validatorProvider ?? legacy.validatorProvider ?? 'lm_studio', + assistantOpenrouterProvider: settings.assistantOpenrouterProvider ?? legacy.assistantOpenrouterProvider ?? null, + assistantOpenrouterReasoningEffort: settings.assistantOpenrouterReasoningEffort ?? 
legacy.assistantOpenrouterReasoningEffort ?? settings.validatorOpenrouterReasoningEffort ?? legacy.validatorOpenrouterReasoningEffort ?? 'auto', + assistantLmStudioFallback: settings.assistantLmStudioFallback ?? legacy.assistantLmStudioFallback ?? null, + assistantContextSize: settings.assistantContextSize ?? legacy.assistantContextSize ?? settings.validatorContextSize ?? legacy.validatorContextSize ?? DEFAULT_CONTEXT_WINDOW, + assistantMaxOutput: settings.assistantMaxOutput ?? legacy.assistantMaxOutput ?? settings.validatorMaxOutput ?? legacy.validatorMaxOutput ?? DEFAULT_MAX_OUTPUT_TOKENS, + assistantSuperchargeEnabled: Boolean(settings.assistantSuperchargeEnabled ?? legacy.assistantSuperchargeEnabled), + creativityEmphasisBoostEnabled: Boolean(settings.creativityEmphasisBoostEnabled ?? legacy.creativityEmphasisBoostEnabled), + uploadedFiles: [], + }; +} + function normalizeLoadedLmStudioModelId(modelId = '') { return String(modelId).replace(/:\d+$/, ''); } @@ -220,6 +413,9 @@ function normalizeFeaturesPayload(payload = {}) { xaiGrokOauthAvailable: payload.xai_grok_oauth_available === undefined ? !genericMode : payload.xai_grok_oauth_available !== false, + sakanaFuguAvailable: payload.sakana_fugu_available === undefined + ? 
!genericMode + : payload.sakana_fugu_available !== false, version: payload.version || '', buildCommit: payload.build_commit || '', updateChannel: payload.update_channel || 'main', @@ -248,6 +444,9 @@ function normalizeRuntimeModelConfig(config = {}, lmStudioEnabled) { function normalizeAggregatorConfigForCapabilities(config, lmStudioEnabled) { const originalValidatorProvider = config.validatorProvider || 'lm_studio'; const shouldResetValidator = !lmStudioEnabled && originalValidatorProvider !== 'openrouter'; + const originalAssistantProvider = config.assistantProvider || config.validatorProvider || 'lm_studio'; + const shouldResetAssistant = !lmStudioEnabled && originalAssistantProvider !== 'openrouter'; + const hasAssistantModelField = Object.prototype.hasOwnProperty.call(config, 'assistantModel'); return { ...config, @@ -261,6 +460,15 @@ function normalizeAggregatorConfigForCapabilities(config, lmStudioEnabled) { : (config.validatorOpenrouterProvider || null), validatorOpenrouterReasoningEffort: config.validatorOpenrouterReasoningEffort || 'auto', validatorLmStudioFallback: lmStudioEnabled ? (config.validatorLmStudioFallback || null) : null, + assistantProvider: normalizeRuntimeProvider(config.assistantProvider || config.validatorProvider, lmStudioEnabled), + assistantModel: shouldResetAssistant + ? '' + : (hasAssistantModelField ? (config.assistantModel || '') : (config.validatorModel || '')), + assistantOpenrouterProvider: shouldResetAssistant + ? null + : (config.assistantOpenrouterProvider || null), + assistantOpenrouterReasoningEffort: config.assistantOpenrouterReasoningEffort || config.validatorOpenrouterReasoningEffort || 'auto', + assistantLmStudioFallback: lmStudioEnabled ? (config.assistantLmStudioFallback || null) : null, }; } @@ -292,6 +500,110 @@ function normalizeAutonomousConfigForCapabilities(config, lmStudioEnabled) { return nextConfig; } +function coercePositiveIntegerSetting(value, fallback) { + const text = String(value ?? 
'').trim(); + if (!/^\d+$/.test(text)) { + return fallback; + } + const parsed = Number(text); + if (Number.isSafeInteger(parsed) && parsed > 0) { + return parsed; + } + return fallback; +} + +function readPersistedLiveActivity(storageKey) { + try { + const savedEvents = localStorage.getItem(storageKey); + if (!savedEvents) { + return []; + } + const parsed = JSON.parse(savedEvents); + return Array.isArray(parsed) + ? parsed.filter((event) => event && typeof event === 'object').slice(-MAX_LIVE_ACTIVITY_EVENTS) + : []; + } catch (error) { + console.error(`Failed to load ${storageKey}:`, error); + return []; + } +} + +function compactPersistedActivityValue(value, depth = 0) { + if (value == null || typeof value === 'number' || typeof value === 'boolean') { + return value; + } + if (typeof value === 'string') { + return value.length > MAX_PERSISTED_ACTIVITY_STRING_LENGTH + ? `${value.slice(0, MAX_PERSISTED_ACTIVITY_STRING_LENGTH)}...` + : value; + } + if (depth >= 3) { + return '[omitted]'; + } + if (Array.isArray(value)) { + return value + .slice(0, MAX_PERSISTED_ACTIVITY_ARRAY_ITEMS) + .map((item) => compactPersistedActivityValue(item, depth + 1)); + } + if (typeof value === 'object') { + return Object.fromEntries( + Object.entries(value) + .slice(0, MAX_PERSISTED_ACTIVITY_OBJECT_KEYS) + .map(([key, nestedValue]) => [key, compactPersistedActivityValue(nestedValue, depth + 1)]) + ); + } + return String(value); +} + +function compactLiveActivityEvent(event) { + if (!event || typeof event !== 'object') { + return null; + } + return { + event: event.event || event.type || '', + type: event.type, + timestamp: event.timestamp || event.fullTimestamp || '', + fullTimestamp: event.fullTimestamp, + message: typeof event.message === 'string' + ? 
compactPersistedActivityValue(event.message) + : '', + data: compactPersistedActivityValue(event.data || {}), + }; +} + +function persistLiveActivity(storageKey, events) { + let lastError = null; + try { + const boundedEvents = Array.isArray(events) + ? events + .slice(-MAX_LIVE_ACTIVITY_EVENTS) + .map(compactLiveActivityEvent) + .filter(Boolean) + : []; + if (boundedEvents.length === 0) { + localStorage.removeItem(storageKey); + return; + } + + for (let limit = boundedEvents.length; limit > 0; limit = Math.floor(limit / 2)) { + try { + localStorage.setItem(storageKey, JSON.stringify(boundedEvents.slice(-limit))); + return; + } catch (error) { + lastError = error; + } + } + + localStorage.removeItem(storageKey); + } catch (error) { + lastError = error; + } + + if (lastError) { + console.error(`Failed to save ${storageKey}:`, lastError); + } +} + function App() { const [appMode, setAppMode] = useState('autonomous'); const [autonomousActiveTab, setAutonomousActiveTab] = useState('auto-interface'); @@ -303,21 +615,22 @@ function App() { : appMode === 'leanoj' ? leanojActiveTab : autonomousActiveTab; - const shimmerAccentsEnabled = (() => { - const saved = localStorage.getItem('banner_shimmer_enabled'); - return saved !== null ? 
JSON.parse(saved) : true; - })(); + const shimmerAccentsEnabled = readBooleanStorage('banner_shimmer_enabled', true); // Models list (fetched from API) const [models, setModels] = useState([]); // Boost modal state const [showBoostModal, setShowBoostModal] = useState(false); - const [showApiBoostTooltip, setShowApiBoostTooltip] = useState(false); // OpenRouter API Key modal state const [showOpenRouterKeyModal, setShowOpenRouterKeyModal] = useState(false); const [openRouterKeyReason, setOpenRouterKeyReason] = useState('setup'); + const [showSyntheticLib4Modal, setShowSyntheticLib4Modal] = useState(false); + const [showWolframModal, setShowWolframModal] = useState(false); + const [showLmStudioModal, setShowLmStudioModal] = useState(false); + const [showAgentMemoryModal, setShowAgentMemoryModal] = useState(false); + const [connectivityStatus, setConnectivityStatus] = useState(null); // LM Studio availability state (for determining default provider) const [lmStudioAvailable, setLmStudioAvailable] = useState(true); @@ -443,74 +756,44 @@ function App() { // Initialize config from localStorage or use defaults // CRITICAL: Read from 'aggregator_settings' (used by AggregatorSettings component) const [config, setConfig] = useState(() => { - // Try to load from the settings component key first - const settingsConfig = localStorage.getItem('aggregator_settings'); - if (settingsConfig) { - try { - const settings = JSON.parse(settingsConfig); - return { - userPrompt: settings.userPrompt || '', - submitterConfigs: settings.submitterConfigs || createDefaultAggregatorSubmitterConfigs(), - validatorModel: settings.validatorModel || '', - validatorProvider: settings.validatorProvider || 'lm_studio', - validatorOpenrouterProvider: settings.validatorOpenrouterProvider || null, - validatorOpenrouterReasoningEffort: settings.validatorOpenrouterReasoningEffort || 'auto', - validatorLmStudioFallback: settings.validatorLmStudioFallback || null, - validatorContextSize: 
settings.validatorContextSize ?? DEFAULT_CONTEXT_WINDOW, - validatorMaxOutput: settings.validatorMaxOutput ?? DEFAULT_MAX_OUTPUT_TOKENS, - validatorSuperchargeEnabled: Boolean(settings.validatorSuperchargeEnabled), - creativityEmphasisBoostEnabled: Boolean(settings.creativityEmphasisBoostEnabled), - uploadedFiles: [], - }; - } catch (e) { - console.error('Failed to parse aggregator_settings:', e); - } - } - - // Fallback to old key for backward compatibility - const savedConfig = localStorage.getItem('aggregatorConfig'); - if (savedConfig) { + return buildAggregatorConfigFromStorage(); + }); + const aggregatorPromptClearedRef = useRef(false); + + useEffect(() => { + let cancelled = false; + const hydrateAggregatorPrompt = async () => { try { - const parsed = JSON.parse(savedConfig); - return { - userPrompt: parsed.userPrompt || '', - submitterConfigs: parsed.submitterConfigs || createDefaultAggregatorSubmitterConfigs(), - validatorModel: parsed.validatorModel || '', - validatorProvider: parsed.validatorProvider || 'lm_studio', - validatorOpenrouterProvider: parsed.validatorOpenrouterProvider || null, - validatorOpenrouterReasoningEffort: parsed.validatorOpenrouterReasoningEffort || 'auto', - validatorLmStudioFallback: parsed.validatorLmStudioFallback || null, - validatorContextSize: parsed.validatorContextSize ?? DEFAULT_CONTEXT_WINDOW, - validatorMaxOutput: parsed.validatorMaxOutput ?? DEFAULT_MAX_OUTPUT_TOKENS, - validatorSuperchargeEnabled: Boolean(parsed.validatorSuperchargeEnabled), - creativityEmphasisBoostEnabled: Boolean(parsed.creativityEmphasisBoostEnabled), - uploadedFiles: [], - }; - } catch (e) { - console.error('Failed to parse saved config:', e); + const data = await api.getAggregatorPrompt(); + const persistedPrompt = data?.prompt || ''; + if (!persistedPrompt.trim() || cancelled || aggregatorPromptClearedRef.current) { + return; + } + setConfig((prev) => ( + prev.userPrompt?.trim() + ? 
prev + : { ...prev, userPrompt: persistedPrompt } + )); + } catch (error) { + console.debug('Could not hydrate manual Aggregator prompt:', error); } - } - return { - userPrompt: '', - submitterConfigs: createDefaultAggregatorSubmitterConfigs(), - validatorModel: '', - validatorProvider: 'lm_studio', - validatorOpenrouterProvider: null, - validatorOpenrouterReasoningEffort: 'auto', - validatorLmStudioFallback: null, - validatorContextSize: DEFAULT_CONTEXT_WINDOW, - validatorMaxOutput: DEFAULT_MAX_OUTPUT_TOKENS, - validatorSuperchargeEnabled: false, - creativityEmphasisBoostEnabled: false, - uploadedFiles: [], }; - }); + + hydrateAggregatorPrompt(); + return () => { + cancelled = true; + }; + }, []); // Save config to localStorage whenever it changes (excluding transient data) // CRITICAL: Save to BOTH keys to maintain backward compatibility useEffect(() => { + savePromptDraft(AGGREGATOR_PROMPT_STORAGE_KEY, config.userPrompt); + const localStoragePrompt = canStorePromptDraftInLocalStorage(config.userPrompt) + ? 
config.userPrompt + : ''; const configToSave = { - userPrompt: config.userPrompt, + userPrompt: localStoragePrompt, submitterConfigs: config.submitterConfigs, validatorModel: config.validatorModel, validatorProvider: config.validatorProvider, @@ -520,18 +803,32 @@ function App() { validatorContextSize: config.validatorContextSize, validatorMaxOutput: config.validatorMaxOutput, validatorSuperchargeEnabled: config.validatorSuperchargeEnabled, + assistantModel: config.assistantModel, + assistantProvider: config.assistantProvider, + assistantOpenrouterProvider: config.assistantOpenrouterProvider, + assistantOpenrouterReasoningEffort: config.assistantOpenrouterReasoningEffort, + assistantLmStudioFallback: config.assistantLmStudioFallback, + assistantContextSize: config.assistantContextSize, + assistantMaxOutput: config.assistantMaxOutput, + assistantSuperchargeEnabled: config.assistantSuperchargeEnabled, creativityEmphasisBoostEnabled: config.creativityEmphasisBoostEnabled, }; - // Save to both old and new keys - localStorage.setItem('aggregatorConfig', JSON.stringify(configToSave)); - localStorage.setItem('aggregator_settings', JSON.stringify(configToSave)); - }, [config.userPrompt, config.submitterConfigs, config.validatorModel, config.validatorProvider, config.validatorOpenrouterProvider, config.validatorOpenrouterReasoningEffort, config.validatorLmStudioFallback, config.validatorContextSize, config.validatorMaxOutput, config.validatorSuperchargeEnabled, config.creativityEmphasisBoostEnabled]); + try { + // Save to both old and new keys. 
+ localStorage.setItem('aggregatorConfig', JSON.stringify(configToSave)); + localStorage.setItem('aggregator_settings', JSON.stringify(configToSave)); + } catch (error) { + console.warn('Could not persist Aggregator settings to localStorage:', error); + } + }, [config.userPrompt, config.submitterConfigs, config.validatorModel, config.validatorProvider, config.validatorOpenrouterProvider, config.validatorOpenrouterReasoningEffort, config.validatorLmStudioFallback, config.validatorContextSize, config.validatorMaxOutput, config.validatorSuperchargeEnabled, config.assistantModel, config.assistantProvider, config.assistantOpenrouterProvider, config.assistantOpenrouterReasoningEffort, config.assistantLmStudioFallback, config.assistantContextSize, config.assistantMaxOutput, config.assistantSuperchargeEnabled, config.creativityEmphasisBoostEnabled]); // Autonomous mode state const [autonomousRunning, setAutonomousRunning] = useState(false); const [autonomousStopping, setAutonomousStopping] = useState(false); const [autonomousStatus, setAutonomousStatus] = useState(null); - const [autonomousActivity, setAutonomousActivity] = useState([]); + const [autonomousActivity, setAutonomousActivity] = useState(() => ( + readPersistedLiveActivity(AUTONOMOUS_LIVE_ACTIVITY_STORAGE_KEY) + )); const [brainstorms, setBrainstorms] = useState([]); const [papers, setPapers] = useState([]); const [autonomousStats, setAutonomousStats] = useState(null); @@ -539,7 +836,9 @@ function App() { // LeanOJ mode state const [leanojRunning, setLeanojRunning] = useState(false); const [leanojStatus, setLeanojStatus] = useState(null); - const [leanojActivity, setLeanojActivity] = useState([]); + const [leanojActivity, setLeanojActivity] = useState(() => ( + readPersistedLiveActivity(LEANOJ_LIVE_ACTIVITY_STORAGE_KEY) + )); const [leanojSettings, setLeanojSettings] = useState(() => getStoredLeanOJSettings()); const [leanojProofRefreshToken, setLeanojProofRefreshToken] = useState(0); @@ -570,6 +869,14 @@ 
function App() { const [creditExhaustionNotifications, setCreditExhaustionNotifications] = useState([]); const [codexOAuthNotifications, setCodexOAuthNotifications] = useState([]); + useEffect(() => { + persistLiveActivity(AUTONOMOUS_LIVE_ACTIVITY_STORAGE_KEY, autonomousActivity); + }, [autonomousActivity]); + + useEffect(() => { + persistLiveActivity(LEANOJ_LIVE_ACTIVITY_STORAGE_KEY, leanojActivity); + }, [leanojActivity]); + // Live refs used by websocket listeners (which are registered once) const autonomousRunningRef = useRef(autonomousRunning); const autonomousTierRef = useRef(autonomousStatus?.current_tier || null); @@ -622,31 +929,43 @@ function App() { validator_provider: autonomousConfig.validator_provider, validator_model: autonomousConfig.validator_model, validator_openrouter_provider: autonomousConfig.validator_openrouter_provider, + validator_openrouter_reasoning_effort: autonomousConfig.validator_openrouter_reasoning_effort, validator_lm_studio_fallback: autonomousConfig.validator_lm_studio_fallback, validator_context_window: autonomousConfig.validator_context_window, validator_max_tokens: autonomousConfig.validator_max_tokens, validator_supercharge_enabled: autonomousConfig.validator_supercharge_enabled, - high_context_provider: autonomousConfig.high_context_provider, - high_context_model: autonomousConfig.high_context_model, - high_context_openrouter_provider: autonomousConfig.high_context_openrouter_provider, - high_context_lm_studio_fallback: autonomousConfig.high_context_lm_studio_fallback, - high_context_context_window: autonomousConfig.high_context_context_window, - high_context_max_tokens: autonomousConfig.high_context_max_tokens, - high_context_supercharge_enabled: autonomousConfig.high_context_supercharge_enabled, + assistant_provider: autonomousConfig.assistant_provider, + assistant_model: autonomousConfig.assistant_model, + assistant_openrouter_provider: autonomousConfig.assistant_openrouter_provider, + 
assistant_openrouter_reasoning_effort: autonomousConfig.assistant_openrouter_reasoning_effort, + assistant_lm_studio_fallback: autonomousConfig.assistant_lm_studio_fallback, + assistant_context_window: autonomousConfig.assistant_context_window, + assistant_max_tokens: autonomousConfig.assistant_max_tokens, + assistant_supercharge_enabled: autonomousConfig.assistant_supercharge_enabled, + writer_provider: autonomousConfig.writer_provider, + writer_model: autonomousConfig.writer_model, + writer_openrouter_provider: autonomousConfig.writer_openrouter_provider, + writer_openrouter_reasoning_effort: autonomousConfig.writer_openrouter_reasoning_effort, + writer_lm_studio_fallback: autonomousConfig.writer_lm_studio_fallback, + writer_context_window: autonomousConfig.writer_context_window, + writer_max_tokens: autonomousConfig.writer_max_tokens, + writer_supercharge_enabled: autonomousConfig.writer_supercharge_enabled, high_param_provider: autonomousConfig.high_param_provider, high_param_model: autonomousConfig.high_param_model, high_param_openrouter_provider: autonomousConfig.high_param_openrouter_provider, + high_param_openrouter_reasoning_effort: autonomousConfig.high_param_openrouter_reasoning_effort, high_param_lm_studio_fallback: autonomousConfig.high_param_lm_studio_fallback, high_param_context_window: autonomousConfig.high_param_context_window, high_param_max_tokens: autonomousConfig.high_param_max_tokens, high_param_supercharge_enabled: autonomousConfig.high_param_supercharge_enabled, - critique_submitter_provider: autonomousConfig.critique_submitter_provider, - critique_submitter_model: autonomousConfig.critique_submitter_model, - critique_submitter_openrouter_provider: autonomousConfig.critique_submitter_openrouter_provider, - critique_submitter_lm_studio_fallback: autonomousConfig.critique_submitter_lm_studio_fallback, - critique_submitter_context_window: autonomousConfig.critique_submitter_context_window, - critique_submitter_max_tokens: 
autonomousConfig.critique_submitter_max_tokens, - critique_submitter_supercharge_enabled: autonomousConfig.critique_submitter_supercharge_enabled, + critique_submitter_provider: autonomousConfig.high_param_provider, + critique_submitter_model: autonomousConfig.high_param_model, + critique_submitter_openrouter_provider: autonomousConfig.high_param_openrouter_provider, + critique_submitter_openrouter_reasoning_effort: autonomousConfig.high_param_openrouter_reasoning_effort, + critique_submitter_lm_studio_fallback: autonomousConfig.high_param_lm_studio_fallback, + critique_submitter_context_window: autonomousConfig.high_param_context_window, + critique_submitter_max_tokens: autonomousConfig.high_param_max_tokens, + critique_submitter_supercharge_enabled: autonomousConfig.high_param_supercharge_enabled, }, allowMathematicalProofs: autonomousConfig.allow_mathematical_proofs ?? existingSettings.allowMathematicalProofs ?? true, allowResearchPapers: autonomousConfig.allow_research_papers ?? existingSettings.allowResearchPapers ?? true, @@ -784,9 +1103,31 @@ function App() { }; }, []); + const refreshConnectivityStatus = useCallback(async () => { + try { + const status = await connectivityAPI.getStatus(); + setConnectivityStatus(status); + return status; + } catch (error) { + console.debug('Failed to fetch connectivity status:', error); + return null; + } + }, []); + useEffect(() => { syncProviderAvailability(); - }, [syncProviderAvailability]); + refreshConnectivityStatus(); + }, [syncProviderAvailability, refreshConnectivityStatus]); + + useEffect(() => { + const interval = setInterval(() => { + if (typeof document !== 'undefined' && document.visibilityState === 'hidden') { + return; + } + refreshConnectivityStatus(); + }, 15000); + return () => clearInterval(interval); + }, [refreshConnectivityStatus]); // Fetch update notices on mount, then every 4 hours until one is shown or dismissed. 
useEffect(() => { @@ -1014,6 +1355,35 @@ function App() { }; }, []); + useEffect(() => { + let cancelled = false; + + const hydrateProviderNotifications = async () => { + try { + const payload = await cloudAccessAPI.getProviderNotifications(); + if (cancelled) { + return; + } + (payload.notifications || []).forEach((notification) => { + addOAuthProviderNotification( + setCodexOAuthNotifications, + notification, + notification.provider || 'oauth', + ); + }); + } catch (error) { + console.warn('Failed to hydrate provider notifications:', error); + } + }; + + hydrateProviderNotifications(); + const unsubscribe = websocket.on('connected', hydrateProviderNotifications); + return () => { + cancelled = true; + unsubscribe(); + }; + }, []); + // Autonomous WebSocket event listeners useEffect(() => { const unsubscribers = []; @@ -1021,8 +1391,70 @@ function App() { // Helper to add activity with limit (prevents unbounded array growth causing UI freeze) // Helper to get timestamp from server or fallback to client time const getTimestamp = (data) => data?._serverTimestamp || new Date().toISOString(); + const isSameAssistantRecallTarget = (left = {}, right = {}) => ( + (left.workflow_mode || '') === (right.workflow_mode || '') && + (left.target_kind || '') === (right.target_kind || '') && + (left.workflow_phase || '') === (right.workflow_phase || '') && + (left.source_type || '') === (right.source_type || '') && + (left.source_id || '') === (right.source_id || '') + ); const addActivity = (event) => { - setAutonomousActivity(prev => [...prev, event].slice(-MAX_LIVE_ACTIVITY_EVENTS)); + setAutonomousActivity(prev => { + if (hasRecentAssistantProofPackDuplicate(prev, event.event, event.data || {}, event.timestamp)) { + return prev; + } + if (event.event === 'assistant_proof_pack_updated') { + const withoutPriorRecall = prev.filter((existing) => ( + existing.event !== 'assistant_proof_pack_updated' || + !isSameAssistantRecallTarget(existing.data || {}, event.data || {}) + )); + 
return [...withoutPriorRecall, event].slice(-MAX_LIVE_ACTIVITY_EVENTS); + } + return [...prev, event].slice(-MAX_LIVE_ACTIVITY_EVENTS); + }); + }; + const countTrailingRejections = (events) => { + let count = 0; + for (let index = events.length - 1; index >= 0; index -= 1) { + const eventName = events[index]?.event || events[index]?.type || ''; + if (eventName === 'rejection_feedback_notice') { + continue; + } + if (eventName.includes('rejected') || eventName === 'submission_rejected') { + count += 1; + continue; + } + break; + } + return count; + }; + const addActivityWithRejectionFeedbackNotice = (event) => { + setAutonomousActivity(prev => { + const observedConsecutiveRejections = countTrailingRejections(prev) + 1; + const nextEvents = [event]; + const shown = { first: false, tenth: false }; + for (let index = prev.length - 1; index >= 0; index -= 1) { + const eventName = prev[index]?.event || prev[index]?.type || ''; + if (eventName === 'auto_research_started' || eventName === 'auto_research_resumed') { + break; + } + if (eventName !== 'rejection_feedback_notice') { + continue; + } + if (Number(prev[index]?.data?.consecutive_rejections) >= 10) { + shown.tenth = true; + } else { + shown.first = true; + } + } + if (shouldAddRejectionFeedbackNotice(event.data || {}, observedConsecutiveRejections, shown)) { + nextEvents.push(buildRejectionFeedbackNoticeActivity(event.timestamp, { + ...(event.data || {}), + consecutive_rejections: observedConsecutiveRejections, + })); + } + return [...prev, ...nextEvents].slice(-MAX_LIVE_ACTIVITY_EVENTS); + }); }; const formatHungConnectionMessage = (data = {}) => { const model = data.model || 'model'; @@ -1156,7 +1588,7 @@ function App() { ? `Proof check complete: ${verified}/${data.total_candidates} candidates verified, ${novel} novel` : `Proof check complete: ${verified} verified`; const base = roundLabel ? 
`${roundLabel} complete: ${baseMessage}` : baseMessage; - const detail = formatReason(data.message, 220); + const detail = formatReason(data.message, 1800); return detail ? `${base} - ${detail}` : base; }; @@ -1227,10 +1659,10 @@ function App() { })); unsubscribers.push(websocket.on('topic_selection_rejected', (data) => { - addActivity({ + addActivityWithRejectionFeedbackNotice({ event: 'topic_selection_rejected', timestamp: getTimestamp(data), - message: `Topic selection rejected`, + message: `Topic selection rejected with feedback`, data }); })); @@ -1252,10 +1684,10 @@ function App() { if (!autonomousRunningRef.current) return; const modelName = data.submitter_model ? (data.submitter_model.split('/')[1] || data.submitter_model.substring(0, 15)) : 'N/A'; const creativityPrefix = data.creativity_emphasized ? '(Creativity Emphasized) ' : ''; - addActivity({ + addActivityWithRejectionFeedbackNotice({ event: 'submission_rejected', timestamp: getTimestamp(data), - message: `${creativityPrefix}Submitter ${data.submitter_id} [${modelName}]: ✗ REJECTED (total: ${data.total_rejections})`, + message: `${creativityPrefix}Submitter ${data.submitter_id} [${modelName}]: ✗ REJECTED WITH FEEDBACK (total: ${data.total_rejections})`, data }); })); @@ -1431,6 +1863,25 @@ function App() { }); })); + const handleAutonomousAssistantEvent = (eventName, data) => { + const workflowMode = String(data.workflow_mode || ''); + if (!['autonomous', 'aggregator', 'compiler'].includes(workflowMode)) { + return; + } + if (!autonomousRunningRef.current) { + return; + } + addActivity({ + event: eventName, + timestamp: getTimestamp(data), + message: formatAssistantProofPackEventMessage(eventName, data), + data + }); + }; + unsubscribers.push(websocket.on('assistant_proof_pack_updated', (data) => { + handleAutonomousAssistantEvent('assistant_proof_pack_updated', data); + })); + unsubscribers.push(websocket.on('proof_check_started', (data) => { setProofRefreshToken((prev) => prev + 1); })); @@ 
-1600,11 +2051,16 @@ function App() { }); })); - unsubscribers.push(websocket.on('auto_research_started', () => { - setAutonomousActivity([]); + unsubscribers.push(websocket.on('auto_research_started', (data = {}) => { setAutonomousRunning(true); setAnyWorkflowRunning(true); setAutonomousStopping(false); + addActivity({ + event: 'auto_research_started', + timestamp: getTimestamp(data), + message: 'Autonomous research started', + data + }); })); unsubscribers.push(websocket.on('auto_research_resumed', (data) => { @@ -1628,11 +2084,17 @@ function App() { }).catch(console.error); })); - unsubscribers.push(websocket.on('auto_research_stopped', () => { + unsubscribers.push(websocket.on('auto_research_stopped', (data = {}) => { setAutonomousRunning(false); setAutonomousStopping(false); setAnyWorkflowRunning(false); autonomousTierRef.current = null; + addActivity({ + event: 'auto_research_stopped', + timestamp: getTimestamp(data), + message: data.message || `Research stopped. Total: ${data.final_stats?.total_papers_completed || 0} papers`, + data + }); })); // Tier 3 events @@ -1862,6 +2324,24 @@ function App() { }); })); + unsubscribers.push(websocket.on('context_overflow_error', (data) => { + console.error('Context overflow:', data); + const message = formatContextOverflowActivityMessage(data); + const roleId = String(data?.role_id || '').toLowerCase(); + const workflowMode = String(data?.workflow_mode || '').toLowerCase(); + const event = { + event: 'context_overflow_error', + timestamp: getTimestamp(data), + message, + data + }; + if (workflowMode === 'leanoj' || roleId.startsWith('leanoj_')) { + addLeanOJActivityFromGlobalAlert(event); + } else if (autonomousRunningRef.current || shouldAddHungAlertToAutonomousFeed(data)) { + addActivity(event); + } + })); + unsubscribers.push(websocket.on('account_credits_exhausted', (data) => { console.error('Account credits exhausted:', data); addActivity({ @@ -1932,28 +2412,10 @@ function App() { addActivity({ event: 
'openai_codex_oauth_error', timestamp: getTimestamp(data), - message: `OpenAI Codex OAuth failed for ${data.role_id || 'a role'}; check your OAuth connection and sign in again.`, - ...data - }); - setCodexOAuthNotifications(prev => { - const roleId = data.role_id || 'openai_codex_oauth'; - const reason = data.reason || 'unrecoverable_codex_error'; - const provider = data.provider || 'openai_codex_oauth'; - if (prev.some(n => n.provider === provider && n.role_id === roleId && n.reason === reason)) return prev; - return [ - ...prev, - { - id: `codex_oauth_${roleId}_${Date.now()}`, - provider, - provider_label: data.provider_label || 'OpenAI Codex', - role_id: roleId, - model: data.model, - reason, - message: data.message || 'Check your OpenAI Codex OAuth connection, sign in again, and retry.', - timestamp: getTimestamp(data) - } - ].slice(-3); + ...data, + message: buildOAuthActivityMessage(data, 'OpenAI Codex'), }); + addOAuthProviderNotification(setCodexOAuthNotifications, data, 'openai_codex_oauth'); })); unsubscribers.push(websocket.on('oauth_provider_error', (data) => { @@ -1962,28 +2424,33 @@ function App() { addActivity({ event: 'oauth_provider_error', timestamp: getTimestamp(data), - message: `${providerLabel} OAuth failed for ${data.role_id || 'a role'}; check your OAuth connection and sign in again.`, - ...data + ...data, + message: buildOAuthActivityMessage(data, providerLabel), }); - setCodexOAuthNotifications(prev => { - const provider = data.provider || 'oauth'; - const roleId = data.role_id || provider; - const reason = data.reason || 'unrecoverable_oauth_error'; - if (prev.some(n => n.provider === provider && n.role_id === roleId && n.reason === reason)) return prev; - return [ - ...prev, - { - id: `oauth_${provider}_${roleId}_${Date.now()}`, - provider, - provider_label: providerLabel, - role_id: roleId, - model: data.model, - reason, - message: data.message || `Check your ${providerLabel} OAuth connection, sign in again, and retry.`, - timestamp: 
getTimestamp(data) - } - ].slice(-3); + addOAuthProviderNotification(setCodexOAuthNotifications, data, data.provider || 'oauth'); + })); + + unsubscribers.push(websocket.on('sakana_fugu_error', (data) => { + console.error('Sakana Fugu error:', data); + addActivity({ + event: 'sakana_fugu_error', + timestamp: getTimestamp(data), + ...data, + message: buildOAuthActivityMessage(data, 'Sakana Fugu'), }); + addOAuthProviderNotification(setCodexOAuthNotifications, data, 'sakana_fugu'); + })); + + unsubscribers.push(websocket.on('oauth_provider_usage_limited', (data) => { + console.warn('OAuth provider usage limit:', data); + const providerLabel = data.provider_label || 'OpenAI Codex'; + addActivity({ + event: 'oauth_provider_usage_limited', + timestamp: getTimestamp(data), + ...data, + message: buildOAuthUsageLimitActivityMessage(data, providerLabel), + }); + addOAuthProviderNotification(setCodexOAuthNotifications, data, data.provider || 'openai_codex_oauth'); })); unsubscribers.push(websocket.on('leanoj_provider_paused', (data) => { @@ -2182,15 +2649,18 @@ function App() { ); }; const addLeanOJActivity = (event, data = {}, message = '') => { - setLeanojActivity(prev => [ - ...prev, - { - event, - timestamp: getTimestamp(data), - message: message || data.message || data.reasoning || data.decision || data.phase || 'Proof Solver update', - data, - }, - ].slice(-MAX_LIVE_ACTIVITY_EVENTS)); + const nextEvent = { + event, + timestamp: getTimestamp(data), + message: message || data.message || data.reasoning || data.decision || data.phase || 'Proof Solver update', + data, + }; + setLeanojActivity(prev => { + if (hasRecentAssistantProofPackDuplicate(prev, event, data, nextEvent.timestamp)) { + return prev; + } + return [...prev, nextEvent].slice(-MAX_LIVE_ACTIVITY_EVENTS); + }); }; const summarizeLeanOJText = (text = '', limit = 220) => { const cleaned = String(text || '').replace(/\s+/g, ' ').trim(); @@ -2330,10 +2800,14 @@ function App() { setLeanojRunning(true); 
addLeanOJActivity('leanoj_started', data, 'Proof Solver started'); }], + ['assistant_proof_pack_updated', (data) => { + if (data.workflow_mode !== 'leanoj') return; + addLeanOJActivity('assistant_proof_pack_updated', data, formatAssistantProofPackEventMessage('assistant_proof_pack_updated', data)); + }], ['leanoj_stopped', (data) => { setLeanojRunning(false); setAnyWorkflowRunning(false); - addLeanOJActivity('leanoj_stopped', data, 'Proof Solver stopped'); + addLeanOJActivity('leanoj_stopped', data, data?.message || 'Proof Solver stopped'); leanojAPI.getStatus().then(setLeanojStatus).catch(console.error); }], ['leanoj_status_updated', (data) => setLeanojStatus(data)], @@ -2503,11 +2977,12 @@ function App() { openrouter_provider: cfg.openrouterProvider || null, openrouter_reasoning_effort: cfg.openrouterReasoningEffort || 'auto', lm_studio_fallback_id: lmStudioEnabled ? (cfg.lmStudioFallbackId || null) : null, - context_window: cfg.contextWindow, - max_output_tokens: cfg.maxOutputTokens, + context_window: coercePositiveIntegerSetting(cfg.contextWindow, DEFAULT_CONTEXT_WINDOW), + max_output_tokens: coercePositiveIntegerSetting(cfg.maxOutputTokens, DEFAULT_MAX_OUTPUT_TOKENS), supercharge_enabled: superchargeAllowed && Boolean(cfg.superchargeEnabled || cfg.supercharge_enabled) })) || []; + const assistantMemoryEnabled = connectivityStatus?.skills?.agent_conversation_memory?.enabled !== false; await autonomousAPI.start({ user_research_prompt: researchPrompt, submitter_configs: submitterConfigs, @@ -2523,24 +2998,24 @@ function App() { validator_lm_studio_fallback: lmStudioEnabled ? 
autonomousConfig.validator_lm_studio_fallback : null, - validator_context_window: autonomousConfig.validator_context_window, - validator_max_tokens: autonomousConfig.validator_max_tokens, + validator_context_window: coercePositiveIntegerSetting(autonomousConfig.validator_context_window, DEFAULT_CONTEXT_WINDOW), + validator_max_tokens: coercePositiveIntegerSetting(autonomousConfig.validator_max_tokens, DEFAULT_MAX_OUTPUT_TOKENS), validator_supercharge_enabled: superchargeAllowed && Boolean(autonomousConfig.validator_supercharge_enabled), - // High-context submitter config with OpenRouter support - high_context_provider: normalizeRuntimeProvider( - autonomousConfig.high_context_provider, + // Writing submitter config with OpenRouter support + writer_provider: normalizeRuntimeProvider( + autonomousConfig.writer_provider, lmStudioEnabled ), - high_context_model: autonomousConfig.high_context_model, - high_context_openrouter_provider: autonomousConfig.high_context_openrouter_provider, - high_context_openrouter_reasoning_effort: autonomousConfig.high_context_openrouter_reasoning_effort || 'auto', - high_context_lm_studio_fallback: lmStudioEnabled - ? autonomousConfig.high_context_lm_studio_fallback + writer_model: autonomousConfig.writer_model, + writer_openrouter_provider: autonomousConfig.writer_openrouter_provider, + writer_openrouter_reasoning_effort: autonomousConfig.writer_openrouter_reasoning_effort || 'auto', + writer_lm_studio_fallback: lmStudioEnabled + ? 
autonomousConfig.writer_lm_studio_fallback : null, - high_context_context_window: autonomousConfig.high_context_context_window, - high_context_max_tokens: autonomousConfig.high_context_max_tokens, - high_context_supercharge_enabled: superchargeAllowed && Boolean(autonomousConfig.high_context_supercharge_enabled), - // High-param submitter config with OpenRouter support + writer_context_window: coercePositiveIntegerSetting(autonomousConfig.writer_context_window, DEFAULT_CONTEXT_WINDOW), + writer_max_tokens: coercePositiveIntegerSetting(autonomousConfig.writer_max_tokens, DEFAULT_MAX_OUTPUT_TOKENS), + writer_supercharge_enabled: superchargeAllowed && Boolean(autonomousConfig.writer_supercharge_enabled), + // Rigor & Proofs Submitter config with OpenRouter support high_param_provider: normalizeRuntimeProvider( autonomousConfig.high_param_provider, lmStudioEnabled @@ -2551,33 +3026,63 @@ function App() { high_param_lm_studio_fallback: lmStudioEnabled ? autonomousConfig.high_param_lm_studio_fallback : null, - high_param_context_window: autonomousConfig.high_param_context_window, - high_param_max_tokens: autonomousConfig.high_param_max_tokens, + high_param_context_window: coercePositiveIntegerSetting(autonomousConfig.high_param_context_window, DEFAULT_CONTEXT_WINDOW), + high_param_max_tokens: coercePositiveIntegerSetting(autonomousConfig.high_param_max_tokens, DEFAULT_MAX_OUTPUT_TOKENS), high_param_supercharge_enabled: superchargeAllowed && Boolean(autonomousConfig.high_param_supercharge_enabled), - // Critique submitter config with OpenRouter support + // Deprecated critique fields mirror Rigor & Proofs for compatibility. 
critique_submitter_provider: normalizeRuntimeProvider( - autonomousConfig.critique_submitter_provider, + autonomousConfig.high_param_provider, lmStudioEnabled ), - critique_submitter_model: autonomousConfig.critique_submitter_model, - critique_submitter_openrouter_provider: autonomousConfig.critique_submitter_openrouter_provider, - critique_submitter_openrouter_reasoning_effort: autonomousConfig.critique_submitter_openrouter_reasoning_effort || 'auto', + critique_submitter_model: autonomousConfig.high_param_model, + critique_submitter_openrouter_provider: autonomousConfig.high_param_openrouter_provider, + critique_submitter_openrouter_reasoning_effort: autonomousConfig.high_param_openrouter_reasoning_effort || 'auto', critique_submitter_lm_studio_fallback: lmStudioEnabled - ? autonomousConfig.critique_submitter_lm_studio_fallback + ? autonomousConfig.high_param_lm_studio_fallback : null, - critique_submitter_context_window: autonomousConfig.critique_submitter_context_window, - critique_submitter_max_tokens: autonomousConfig.critique_submitter_max_tokens, - critique_submitter_supercharge_enabled: superchargeAllowed && Boolean(autonomousConfig.critique_submitter_supercharge_enabled), + critique_submitter_context_window: coercePositiveIntegerSetting(autonomousConfig.high_param_context_window, DEFAULT_CONTEXT_WINDOW), + critique_submitter_max_tokens: coercePositiveIntegerSetting(autonomousConfig.high_param_max_tokens, DEFAULT_MAX_OUTPUT_TOKENS), + critique_submitter_supercharge_enabled: superchargeAllowed && Boolean(autonomousConfig.high_param_supercharge_enabled), + assistant_provider: assistantMemoryEnabled + ? normalizeRuntimeProvider( + autonomousConfig.assistant_provider || autonomousConfig.validator_provider, + lmStudioEnabled + ) + : normalizeRuntimeProvider(autonomousConfig.validator_provider, lmStudioEnabled), + assistant_model: assistantMemoryEnabled ? 
(autonomousConfig.assistant_model || autonomousConfig.validator_model) : '', + assistant_openrouter_provider: assistantMemoryEnabled + ? (autonomousConfig.assistant_openrouter_provider || autonomousConfig.validator_openrouter_provider) + : null, + assistant_openrouter_reasoning_effort: assistantMemoryEnabled + ? (autonomousConfig.assistant_openrouter_reasoning_effort || autonomousConfig.validator_openrouter_reasoning_effort || 'auto') + : 'auto', + assistant_lm_studio_fallback: assistantMemoryEnabled && lmStudioEnabled + ? (autonomousConfig.assistant_lm_studio_fallback || autonomousConfig.validator_lm_studio_fallback) + : null, + assistant_context_window: assistantMemoryEnabled + ? coercePositiveIntegerSetting( + autonomousConfig.assistant_context_window || autonomousConfig.validator_context_window, + DEFAULT_CONTEXT_WINDOW + ) + : 0, + assistant_max_tokens: assistantMemoryEnabled + ? coercePositiveIntegerSetting( + autonomousConfig.assistant_max_tokens || autonomousConfig.validator_max_tokens, + DEFAULT_MAX_OUTPUT_TOKENS + ) + : 0, + assistant_supercharge_enabled: assistantMemoryEnabled && superchargeAllowed && Boolean(autonomousConfig.assistant_supercharge_enabled), allow_mathematical_proofs: !capabilities.genericMode && (autonomousConfig.allow_mathematical_proofs ?? true), allow_research_papers: autonomousConfig.allow_research_papers ?? true, tier3_enabled: autonomousConfig.tier3_enabled ?? false }); setAutonomousRunning(true); setAutonomousStopping(false); - setAutonomousActivity([]); setAnyWorkflowRunning(true); + return true; } catch (error) { alert(`Failed to start autonomous research: ${error.details || error.message}`); + return false; } }; @@ -2602,7 +3107,7 @@ function App() { const handleAutonomousClear = async () => { if (!window.confirm('Clear all autonomous research data? 
This cannot be undone.')) { - return; + return false; } try { const result = await autonomousAPI.clear(); @@ -2619,10 +3124,12 @@ function App() { } else { alert('All autonomous research data cleared successfully.'); } + return true; } catch (error) { // Show detailed error message const errorMsg = error.details || error.message || 'Unknown error'; alert(`Failed to clear data:\n\n${errorMsg}\n\nThis may be due to Windows file locking. Try closing file explorer and any programs that may have files open, then try again.`); + return false; } }; @@ -2649,13 +3156,20 @@ function App() { brainstorm_validator: normalizeLeanOJRoleForCapabilities(request.brainstorm_validator), path_decider: normalizeLeanOJRoleForCapabilities(request.path_decider || request.final_solver), final_solver: normalizeLeanOJRoleForCapabilities(request.final_solver), + assistant: connectivityStatus?.skills?.agent_conversation_memory?.enabled === false + ? { + ...normalizeLeanOJRoleForCapabilities(request.topic_validator), + model_id: '', + openrouter_provider: null, + lm_studio_fallback_id: null, + } + : normalizeLeanOJRoleForCapabilities(request.assistant || request.topic_validator), }); const handleLeanOJStart = async (request) => { try { await leanojAPI.start(normalizeLeanOJRequestForCapabilities(request)); setLeanojRunning(true); - setLeanojActivity([]); const status = await leanojAPI.getStatus(); setLeanojStatus(status); setLeanojProofRefreshToken((prev) => prev + 1); @@ -2683,10 +3197,16 @@ function App() { } try { const result = await leanojAPI.clear(); + const nextSettings = persistLeanOJSettings({ + ...leanojSettings, + prompt: '', + leanTemplate: '', + }); setLeanojRunning(false); setAnyWorkflowRunning(false); setLeanojActivity([]); setLeanojStatus(result.status || null); + setLeanojSettings(nextSettings); setLeanojProofRefreshToken((prev) => prev + 1); } catch (error) { alert(`Failed to clear Proof Solver progress: ${error.message}`); @@ -2824,7 +3344,15 @@ function App() { }; const 
handleDismissCodexOAuthNotification = (notificationId) => { - setCodexOAuthNotifications(prev => prev.filter(n => n.id !== notificationId)); + setCodexOAuthNotifications(prev => { + const notification = prev.find(n => n.id === notificationId); + if (notification?.notification_key) { + const dismissed = readDismissedOAuthProviderNotifications(); + dismissed.add(notification.notification_key); + persistDismissedOAuthProviderNotifications(dismissed); + } + return prev.filter(n => n.id !== notificationId); + }); }; const handleOpenCloudAccessFromCodexNotification = () => { @@ -2920,7 +3448,7 @@ function App() { if (cloudAccessPresent) { setStartupSetupMessage( - 'OAuth login is saved, but OAuth providers are supplementary chat/model providers. Configure OpenRouter or confirm LM Studio for RAG embeddings before starting.' + 'Cloud provider access is saved, but direct role providers are supplementary chat/model providers. Configure OpenRouter or confirm LM Studio for RAG embeddings before starting.' ); setShowStartupSetupModal(true); return; @@ -3028,7 +3556,7 @@ function App() { cloudAccessJustConfiguredRef.current = false; setShowStartupSetupModal(true); setStartupSetupMessage( - 'OAuth login was saved. To start MOTO, also configure OpenRouter or confirm LM Studio so RAG embeddings are available.' + 'Cloud provider access was saved. To start MOTO, also configure OpenRouter or confirm LM Studio so RAG embeddings are available.' 
); } }; @@ -3046,6 +3574,36 @@ function App() { setHasOpenRouterKey(true); setHasCloudAccess(true); console.log('OpenRouter API key set successfully'); + refreshConnectivityStatus(); + }; + + const handleConnectivityToggle = async (toggles) => { + try { + const nextStatus = await connectivityAPI.updateToggles(toggles); + setConnectivityStatus(nextStatus); + if (toggles.agent_conversation_memory_enabled === false) { + setConfig((prev) => ({ + ...prev, + assistantProvider: prev.validatorProvider || 'lm_studio', + assistantModel: '', + assistantOpenrouterProvider: null, + assistantOpenrouterReasoningEffort: prev.validatorOpenrouterReasoningEffort || 'auto', + assistantLmStudioFallback: null, + })); + setAutonomousConfig((prev) => ({ + ...prev, + assistant_provider: prev.validator_provider || 'lm_studio', + assistant_model: '', + assistant_openrouter_provider: null, + assistant_openrouter_reasoning_effort: prev.validator_openrouter_reasoning_effort || 'auto', + assistant_lm_studio_fallback: null, + })); + } + return nextStatus; + } catch (error) { + console.error('Failed to update connectivity toggles:', error); + return null; + } }; const mainTabs = [ @@ -3095,17 +3653,6 @@ function App() { } }, [autonomousConfig.tier3_enabled, autonomousActiveTab]); - // Sync with WorkflowPanel collapse state (stored in localStorage) - useEffect(() => { - const handleStorageChange = () => { - const savedState = localStorage.getItem('workflow_panel_collapsed'); - setWorkflowPanelCollapsed(savedState !== 'false'); - }; - - const interval = setInterval(handleStorageChange, 500); - return () => clearInterval(interval); - }, []); - // Check if any workflow is running useEffect(() => { const checkWorkflowStatus = async () => { @@ -3129,26 +3676,6 @@ function App() { return () => clearInterval(interval); }, []); - const cloudAccessChipClass = hasOpenRouterKey === true - ? 'header-status-chip--ready' - : hasCloudAccess === false - ? 
'header-status-chip--inactive' - : 'header-status-chip--pending'; - const cloudAccessChipTitle = hasOpenRouterKey === true - ? 'OpenRouter is configured and can provide cloud models plus RAG embedding fallback.' - : hasCloudAccess === true - ? 'OAuth login is configured for model roles. Add OpenRouter or use LM Studio embeddings before starting workflows.' - : hasCloudAccess === false - ? 'Configure Cloud Access & Keys' - : 'Checking cloud access status...'; - const cloudAccessChipLabel = hasOpenRouterKey === true - ? 'Cloud Access & Keys ✓' - : hasCloudAccess === true - ? 'Cloud Access & Keys (OAuth add-on)' - : hasCloudAccess === false - ? 'Cloud Access & Keys' - : 'Cloud Access…'; - return (
{/* Banner Section */} @@ -3177,89 +3704,26 @@ function App() { /> )} - {/* CRITICAL: Boost buttons are ETERNAL - they NEVER disappear */} - {/* These buttons are fixed-position, high z-index, and unconditionally rendered */} - {/* They are visible at program launch and stay visible forever */} - {/* Slide with WorkflowPanel collapse/expand animation */}
-
- - -
-
-
- - {showApiBoostTooltip && ( -
- Use this mode to change your model selections mid-run. It is a good way to use your free daily OpenRouter credits without interrupting your research run. For the easiest setup, select your free model and enable "Use boost as next API call when available." Some free models may be more rate-limited on OpenRouter than others. -
- )} -
- -
- - {capabilities.lmStudioEnabled ? ( - - {lmStudioAvailable ? 'LM Studio ✓' : 'LM Studio Offline'} - - ) : capabilities.genericMode && ( - - Hosted Web Mode - - )} - {developerModeEnabled && ( - - Developer Mode - - )} + onOpenLmStudio={() => setShowLmStudioModal(true)} + onOpenSyntheticLib4={() => setShowSyntheticLib4Modal(true)} + onOpenAgentMemory={() => setShowAgentMemoryModal(true)} + onOpenWolfram={() => setShowWolframModal(true)} + onToggleSyntheticLib4={(enabled) => handleConnectivityToggle({ syntheticlib4_enabled: enabled })} + onToggleAgentMemory={(enabled) => handleConnectivityToggle({ agent_conversation_memory_enabled: enabled })} + onToggleWolfram={(enabled) => handleConnectivityToggle({ wolfram_alpha_enabled: enabled })} + />
@@ -3424,6 +3888,9 @@ function App() { capabilities={capabilities} api={{ getAutonomousPaper: autonomousAPI.getAutonomousPaper, + getCurrentSession: autonomousAPI.getCurrentSession, + getPrunedPaperHistory: autonomousAPI.getPrunedPaperHistory, + getPrunedHistoryPaper: autonomousAPI.getPrunedHistoryPaper, deletePaper: autonomousAPI.deletePaper, deleteAllPrunedPapers: autonomousAPI.deleteAllPrunedPapers }} @@ -3512,6 +3979,7 @@ function App() { onSkipBrainstorm={handleLeanOJSkipBrainstorm} onForceBrainstorm={handleLeanOJForceBrainstorm} developerModeEnabled={developerModeEnabled} + assistantMemoryEnabled={connectivityStatus?.skills?.agent_conversation_memory?.enabled !== false} /> )} {activeTab === 'leanoj-brainstorms' && ( @@ -3547,6 +4015,7 @@ function App() { config={config} setConfig={setConfig} capabilities={capabilities} + connectivityStatus={connectivityStatus} anyWorkflowRunning={anyWorkflowRunning} onWorkflowRunningChange={setAnyWorkflowRunning} developerModeEnabled={developerModeEnabled} @@ -3554,12 +4023,29 @@ function App() { )} {/* Full-width settings screens with model sidebars are rendered outside the padded tab container. 
*/} {activeTab === 'aggregator-logs' && } - {activeTab === 'aggregator-results' && } + {activeTab === 'aggregator-results' && ( + { + aggregatorPromptClearedRef.current = true; + removePromptDraft(AGGREGATOR_PROMPT_STORAGE_KEY); + try { + const emptyPromptConfig = { ...buildAggregatorConfigFromStorage(), userPrompt: '' }; + localStorage.setItem('aggregatorConfig', JSON.stringify(emptyPromptConfig)); + localStorage.setItem('aggregator_settings', JSON.stringify(emptyPromptConfig)); + } catch { + localStorage.removeItem('aggregatorConfig'); + localStorage.removeItem('aggregator_settings'); + } + setConfig((prev) => ({ ...prev, userPrompt: '' })); + }} + /> + )} {activeTab === 'compiler-interface' && ( @@ -3606,6 +4093,7 @@ function App() { settings={leanojSettings} onSettingsChange={setLeanojSettings} capabilities={capabilities} + connectivityStatus={connectivityStatus} isRunning={leanojRunning} developerModeEnabled={developerModeEnabled} /> @@ -3616,6 +4104,7 @@ function App() { config={config} setConfig={setConfig} capabilities={capabilities} + connectivityStatus={connectivityStatus} developerModeEnabled={developerModeEnabled} /> )} @@ -3623,6 +4112,7 @@ function App() { {activeTab === 'compiler-settings' && ( )} @@ -3630,7 +4120,12 @@ function App() { {/* WorkflowPanel is ETERNAL - always visible for boost controls */} {/* The panel shows workflow tasks when running, but boost controls are ALWAYS accessible */} {/* Users can configure boost (set next count, toggle categories) at any time */} - + setShowBoostModal(true)} + collapsed={workflowPanelCollapsed} + onCollapseChange={setWorkflowPanelCollapsed} + /> {/* Disclaimer Modal - Shows on every app load */} {showDisclaimer && ( @@ -3722,6 +4217,39 @@ function App() { reason={openRouterKeyReason} capabilities={capabilities} /> + + setShowLmStudioModal(false)} + status={lmStudioStatus} + capabilities={capabilities} + onRefresh={syncProviderAvailability} + /> + + setShowSyntheticLib4Modal(false)} + 
connectivityStatus={connectivityStatus} + onConnectivityChanged={setConnectivityStatus} + anyWorkflowRunning={anyWorkflowRunning} + /> + + setShowAgentMemoryModal(false)} + connectivityStatus={connectivityStatus} + onConnectivityChanged={setConnectivityStatus} + anyWorkflowRunning={anyWorkflowRunning} + /> + + setShowWolframModal(false)} + connectivityStatus={connectivityStatus} + onConnectivityChanged={setConnectivityStatus} + capabilities={capabilities} + anyWorkflowRunning={anyWorkflowRunning} + /> {/* OpenRouter Privacy Warning Modal */} { + setLoading(true); + setMessage(''); + try { + const next = await connectivityAPI.updateToggles({ + agent_conversation_memory_enabled: nextEnabled, + }); + onConnectivityChanged?.(next); + setMessage( + nextEnabled + ? 'Session History Memory enabled for Assistant workflow-memory search.' + : 'Session History Memory disabled for future Assistant workflow-memory search.' + ); + } catch (error) { + setMessage(error.message || 'Failed to update Session History Memory toggle'); + } finally { + setLoading(false); + } + }; + + return ( +
event.target === event.currentTarget && onClose()}> +
+
+

Session History Memory

+ +
+ +

+ Assistant runs in parallel during brainstorming, writing, and proof work. It retrieves up to 7 relevant records from local proof-history memory and SyntheticLib4 when enabled. It does not block workflows and is disabled during critique phases.

+ + + {anyWorkflowRunning && ( +
+ Stop the active workflow before changing run-level Session History Memory availability. +
+ )} + +
+ Status: {memory.status || 'disabled'} +
+ {memory.message || 'Local proof-history memory status is unavailable.'} + {memory.local_records !== undefined && ( + <> +
+ {memory.local_records} local proof/history record{memory.local_records === 1 ? '' : 's'} indexed. + + )} +
+ + {message && ( +
+ {message} +
+ )} + +
+

Behavior

+

+ This is not raw provider transcript storage and does not expose private chain-of-thought or retry scaffolding. Disabling it removes local MOTO/manual/LeanOJ proof-memory corpora from Assistant retrieval without deleting proof records, session history, rejection logs, or saved prompts. +

+
+
+
+ ); +} + diff --git a/frontend/src/components/AgentConversationMemoryModal.test.jsx b/frontend/src/components/AgentConversationMemoryModal.test.jsx new file mode 100644 index 0000000..8f25a82 --- /dev/null +++ b/frontend/src/components/AgentConversationMemoryModal.test.jsx @@ -0,0 +1,32 @@ +import { render, screen } from '@testing-library/react'; +import { describe, expect, test, vi } from 'vitest'; +import AgentConversationMemoryModal from './AgentConversationMemoryModal'; + +const baseConnectivityStatus = { + skills: { + agent_conversation_memory: { + enabled: true, + status: 'active', + message: 'Ready', + local_records: 7, + }, + }, +}; + +describe('AgentConversationMemoryModal', () => { + test('explains Assistant workflow-memory scope and critique exclusion', () => { + render( + + ); + + expect(screen.getByText(/Assistant runs in parallel during brainstorming, writing, and proof work/i)).toBeInTheDocument(); + expect(screen.getByText(/retrieves up to 7 relevant records/i)).toBeInTheDocument(); + expect(screen.getByText(/disabled during critique phases/i)).toBeInTheDocument(); + expect(screen.getByText(/not raw provider transcript storage/i)).toBeInTheDocument(); + }); +}); + diff --git a/frontend/src/components/BoostControlModal.jsx b/frontend/src/components/BoostControlModal.jsx index 7364d6d..95145ff 100644 --- a/frontend/src/components/BoostControlModal.jsx +++ b/frontend/src/components/BoostControlModal.jsx @@ -6,6 +6,8 @@ import { DEFAULT_MAX_OUTPUT_TOKENS, DEFAULT_OPENROUTER_REASONING_EFFORT, findOpenRouterModel, + formatOpenRouterProviderLabel, + getOpenRouterProviderTitle, getProviderNames, getReasoningSupportInfo, normalizeOpenRouterReasoningEffort, @@ -488,7 +490,9 @@ export default function BoostControlModal({
diff --git a/frontend/src/components/CodexOAuthNotificationStack.jsx b/frontend/src/components/CodexOAuthNotificationStack.jsx index 55ccd4a..4c2127d 100644 --- a/frontend/src/components/CodexOAuthNotificationStack.jsx +++ b/frontend/src/components/CodexOAuthNotificationStack.jsx @@ -150,7 +150,7 @@ function CodexOAuthNotification({ notification, onDismiss, onOpenCloudAccess }) cursor: 'pointer', }} > - Open Cloud Access & Keys + Open OpenRouter/OAuth