SollanSystems · Jul 3, 2026 · Jul 2, 2026
diff --git a/evals/cases/structural.json b/evals/cases/structural.json
@@ -18,7 +18,8 @@
     "prompt-templates.md",
     "eval-suite.md",
     "safety-and-approvals.md",
-    "platform-map.md"
+    "platform-map.md",
+    "model-routing.md"
   ],
   "terminal_states": [
     "Succeeded",
@@ -57,6 +58,9 @@
     "terminal_state.json.tmpl",
     "verify-fast.sh",
     "verify-full.sh",
+    "verify-safety.sh",
+    "judge-rubric.sh",
+    "extract-trace-metrics.sh",
     "EVALS-rubric.md.tmpl"
   ],
   "eval_layer_names": [

diff --git a/reference/model-routing.md b/reference/model-routing.md
@@ -0,0 +1,54 @@
+# Model-routing doctrine — the canonical table + rationale
+
+The one place the `read → haiku / reason → sonnet / write → opus` rule is defined. Every
+spoke states the one-line rule inline (so it can act) and points here for the table, the
+rationale, and the optional enforcement. This file is the single source of truth; if a skill
+and this file ever disagree, this file wins.
+
+> **Base directory.** This file ships at the **plugin root** `reference/` (a sibling of
+> `skills/`), i.e. `${CLAUDE_PLUGIN_ROOT}/reference/model-routing.md`. Skills reach it as
+> `reference/model-routing.md` resolved against the plugin root, not their own folder.
+
+## The one rule
+
+**Every agent dispatch names an explicit `model:`.** This holds for the `Agent` tool and for
+every Workflow `agent()` call. Omission is never an acceptable default: an omitted `model:` inherits
+the costly main-loop model, and that silent inheritance is the single biggest cost leak in an agent loop.
+
+## The tier table
+
+| Tier | `model:` | Use it for |
+|---|---|---|
+| **read** | `haiku` | Read-only lookups feeding the loop — status/coverage scans, trace/RUNLOG fact extraction, a monitor poll, "where is X", list-and-report. The default: anything that only *reports* what it found. |
+| **reason** | `sonnet` | Judgment without production writes — plan critique / pre-execution reflection, rubric judging, failure triage, multi-source synthesis, an ADR review. |
+| **write** | `opus` | Production writes and load-bearing decisions — the per-task worker, a bounded repair (repairs edit code), committing regression cases. |
+| **orchestrate** | main loop | The operator itself — advancing the state machine, choosing the next transition, adjudicating verification. Not a dispatched sub-agent tier. |
+
+Rule of thumb: **read → haiku, reason → sonnet, write → opus, orchestrate → main loop.** If you
+cannot justify sonnet or opus for a dispatch, it is a haiku dispatch.
+
+## Why it is load-bearing
+
+- **Cost is bounded.** Routing read-only work to haiku instead of the main-loop model is the
+  difference between a loop that is cheap to run overnight and one that is not.
+- **Dispatches are auditable.** An explicit tier per dispatch is a receipt line — append one
+  receipt per dispatch to `.loop/receipts/*.jsonl` (schema: `schemas/receipt.schema.json`), so
+  cost and routing are reconstructable after the run.
+- **Omission is a broken call.** Treat a missing `model:` like a missing `prompt:` — fix it
+  before dispatching. On the Workflow tool the leak is the same: a model-less `agent()` inherits
+  the main-loop model just as an `Agent` call does.
+
+## Optional enforcement (the author's stack — not required)
+
+The rule holds as policy text in `WORKFLOW.md` on any platform, even one that cannot enforce it
+at runtime. Where you already run them, these harden it — none are required to run a loop:
+
+- **PreToolUse hooks** (`model_routing.py` for the `Agent` tool, `workflow_routing.py` for
+  Workflow `agent()` calls) block a model-less or over-tier dispatch before it fires.
+- **`/routing` modes** (`normal` / `conserve` / `burn`) modulate the tier ceilings; the explicit
+  `model:` requirement is never waived in any mode.
+- **The `[escalation]` valve**: after a *verified* dispatch failure, re-dispatch the same prompt
+  once, one tier higher, with the literal `[escalation]` marker added to it. Never on a first attempt.
+
+Without any of this, keep the rule as a line in the loop's `WORKFLOW.md` and name `model:` by
+hand on every dispatch. That is enough — the enforcement tooling only automates the same rule.
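
The tier table, the explicit-`model:` rule, and the `[escalation]` valve described in `reference/model-routing.md` can be sketched as a small check. This is a minimal illustration, not the author's `model_routing.py` hook: the names `check_dispatch`, `escalate`, `RoutingError`, and `LADDER` are hypothetical, and it assumes that "over-tier" means a model costlier than the one the table assigns to the tier, and that a prompt carrying the marker may run one tier higher.

```python
from typing import Optional, Tuple

# Tier -> model mapping, copied from the tier table. The orchestrate tier is
# the main loop itself and is never dispatched, so it has no entry here.
TIER_MODEL = {"read": "haiku", "reason": "sonnet", "write": "opus"}

# Models ordered cheapest to costliest (assumed ordering, used for escalation).
LADDER = ["haiku", "sonnet", "opus"]
ESCALATION_MARKER = "[escalation]"


class RoutingError(ValueError):
    """Raised when a dispatch breaks the routing rule (hypothetical name)."""


def check_dispatch(tier: str, model: Optional[str], prompt: str = "") -> str:
    """Return the model if the dispatch is allowed, otherwise raise RoutingError."""
    if not model:
        # A missing model is treated like a missing prompt: a broken call.
        raise RoutingError("missing model: name one explicitly before dispatching")
    if tier not in TIER_MODEL:
        raise RoutingError(f"unknown tier: {tier!r}")
    if model not in LADDER:
        raise RoutingError(f"unknown model: {model!r}")
    ceiling = LADDER.index(TIER_MODEL[tier])
    if ESCALATION_MARKER in prompt:
        # Assumption: an escalated prompt may use the next tier up, capped at the top.
        ceiling = min(ceiling + 1, len(LADDER) - 1)
    if LADDER.index(model) > ceiling:
        raise RoutingError(f"{model} is over-tier for {tier} work")
    return model


def escalate(model: str, prompt: str, verified_failure: bool) -> Tuple[str, str]:
    """Build the single +1-tier re-dispatch allowed after a verified failure."""
    if not verified_failure:
        raise RoutingError("escalation needs a verified dispatch failure")
    if ESCALATION_MARKER in prompt:
        raise RoutingError("this prompt was already escalated once")
    if model not in LADDER:
        raise RoutingError(f"unknown model: {model!r}")
    next_index = LADDER.index(model) + 1
    if next_index >= len(LADDER):
        raise RoutingError(f"no tier above {model}")
    # Same prompt, marked with the literal escalation marker.
    return LADDER[next_index], f"{ESCALATION_MARKER} {prompt}"
```

Under these assumptions, `check_dispatch("read", None)` and `check_dispatch("read", "sonnet")` both raise, while `escalate("haiku", "scan coverage", verified_failure=True)` yields a `sonnet` dispatch whose prompt starts with `[escalation]` and which then passes `check_dispatch`. A dispatch on a cheaper model than its tier names is allowed here; the source does not say whether the real hooks permit that.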
diff --git a/skills/loop-architect/SKILL.md b/skills/loop-architect/SKILL.md
@@ -5,6 +5,8 @@ description: "Classify an agent-loop task and choose its architecture + realizat
 
 # loop-architect
 
+> **Base directory.** `reference/…` paths below are **plugin-root-relative** — resolve them against the plugin root (`${CLAUDE_PLUGIN_ROOT}/reference/…`, i.e. `../../reference/…` from this `skills/loop-architect/` folder), where the shared docs ship, not inside this skill's own folder. (The `scripts/verify-*` gate named below is the operated loop's own workspace script.)
+
 The **brain** of the [[loop-engineer]] suite. It does **not** do the end task — it
 turns an underspecified objective into a decision: *what shape should this loop be,
 and what Claude-Code primitive physically runs it?* The output is an **architecture
@@ -14,7 +16,8 @@ it decides, it does not scaffold or run.
 Two separable choices (see `reference/architecture-matrix.md`):
 - **(A) Architecture** — how many agents, how much orchestration.
 - **(B) Realization** — which Claude-Code primitive (Workflow tool / markdown
-  supervisor / portable Python FSM spine / delegate to an acceptance gate such as `/verify-slice`).
+  supervisor / portable Python FSM spine / delegate to an acceptance gate — the
+  contract's `verify-fast`→`verify-full` by default, optionally `/verify-slice`).
 
 ## Prime directive
 
@@ -108,9 +111,9 @@ to reach an explicit terminal state; never a silent "completed."
 
 Any agent dispatch you suggest in the ADR names an explicit `model:` (read → `haiku`,
 reason → `sonnet`, write → `opus`) per the model-routing rule — a Workflow
-`agent({ model: "sonnet", … })` fan-out, a write agent on `opus`. Omitting
-`model:` inherits the costly main-loop model; the author blocks that with a PreToolUse hook
-(`workflow_routing.py`), but the rule holds on any surface.
+`agent({ model: "sonnet", … })` fan-out, a write agent on `opus`. Never omit it: an omitted
+`model:` inherits the costly main-loop model. The full tier table + rationale (and the
+optional `workflow_routing.py` enforcement) are in `reference/model-routing.md`.
 
 ## Hand-off
 

diff --git a/skills/loop-contract/SKILL.md b/skills/loop-contract/SKILL.md
@@ -5,6 +5,8 @@ description: "Scaffold the repo-OS operating contract for an agent loop — SPEC
 
 # loop-contract — scaffold the repo-OS operating contract
 
+> **Base directory.** `reference/…` and `templates/…` paths below are **plugin-root-relative**: resolve them against the plugin root (`${CLAUDE_PLUGIN_ROOT}/…`, i.e. `../../` from this `skills/loop-contract/` folder), where the shared docs and scaffold templates ship. The `scripts/verify-*` scripts you scaffold land in the *new loop's* own workspace, not in this plugin.
+
 Turn an architecture decision into the **on-disk operating contract** an agent loop reads its
 truth from every turn. State lives in files, not chat context, so the loop survives compaction,
 a crashed session, and even a different engine. This is the externalized-state ("code as agent
@@ -46,6 +48,11 @@ outputs / permissions / approval_gates / terminal_states) and an **iteration-0 R
 recording the pre-execution reflection. Each artifact has exactly one owner concern — no file
 carries two jobs (rationale: `reference/repo-os-contract.md` §9).
 
+The two deterministic gates `verify-fast` and `verify-full` scaffold as runnable stubs; the
+three deeper proof-surface scripts (`verify-safety`, `judge-rubric`, `extract-trace-metrics`)
+ship in `templates/` as stubs you copy into `scripts/` and wire as the SPEC criteria earn them —
+`[[loop-evals]]` owns that proof logic, not this spoke.
+
 ## How to fill each template
 
 Map the ADR + goal onto the templates in `templates/` (the `{{PLACEHOLDER}}` tokens are the fill

diff --git a/skills/loop-engineer/SKILL.md b/skills/loop-engineer/SKILL.md
@@ -5,6 +5,8 @@ description: "Router for designing, launching, verifying, repairing, and improvi
 
 # loop-engineer
 
+> **Base directory.** `reference/…` paths below are **plugin-root-relative** — resolve them against the plugin root (`${CLAUDE_PLUGIN_ROOT}/reference/…`, i.e. `../../reference/…` from this `skills/loop-engineer/` folder), where the shared docs ship, not inside this skill's own folder.
+
 **The loop is the design object — not the prompt.** A loop-engineer designs, launches, verifies, repairs, and improves *other* agent loops; it does not primarily solve the end task. Its first job is to turn an underspecified objective into an **executable operating contract** — success criteria, task queue, tool boundaries, evaluation methods, stopping rules, approval gates, and persistent artifacts that survive across turns and sessions.
 
 **Prime directive.** If you cannot define success, verification, or a terminal state, the task is **underspecified** (`FailedSpecGap`) — say so, do not call the next completion "done." This is the central defense against the #1 long-horizon failure mode: false completion / weak self-verification / verifier gaming.
@@ -58,7 +60,7 @@ The bundled portable core runs every loop with no external setup: `python3 -m lo
 - **`/verify-slice` and `/verify-milestone`** (claude-code-orchestration, *optional*) — auto-repair + cross-review layered on the contract's `verify-*` gate. `loop-evals` *designs* the criteria; `loop-run` *calls* the gate. No new verification engine is shipped here.
 - **A portable Python FSM spine** (*optional*) — the init/next/complete + `state.json`-resume pattern for max-determinism / cross-engine resume; ~100 lines, or reuse the author's `harmony-agent` `engine/cli.py`. v1 ships no spine code.
 - **The grader-split pattern** (as in the `launch-local-agent` skill) — an objective blocking gate in front of a judged advisory rubric; the model for keeping deterministic checks ahead of any model verdict.
-- **The model-routing rule** — every dispatched agent names an explicit `model:` (read→haiku, reason→sonnet, write→opus) so cost is bounded and dispatches are auditable; receipts append to `.loop/receipts/*.jsonl`. *Optional:* the author enforces this with PreToolUse hooks (`model_routing.py` / `workflow_routing.py`) and `/routing` modes.
+- **The model-routing rule** — every dispatched agent names an explicit `model:` (read→haiku, reason→sonnet, write→opus) so cost is bounded and dispatches are auditable; receipts append to `.loop/receipts/*.jsonl`. Canonical tier table + rationale: `reference/model-routing.md`. *Optional:* the author enforces this with PreToolUse hooks (`model_routing.py` / `workflow_routing.py`) and `/routing` modes.
 - **superpowers** (*optional*) — `writing-plans`, `executing-plans`, `subagent-driven-development`, `verification-before-completion`, `test-driven-development` compose the markdown-supervisor realization. Any on-disk planning dir works as the planning surface (the author uses GSD `.gsd/`).
 - **ui/orchestration surfaces** — when a loop's actual work is UI/UX or general orchestration (not loop engineering), defer to the appropriate `ui-ux`/`orchestration` surface; this suite builds and runs the loop, it does not do that domain work.
 
@@ -75,5 +77,6 @@ This router stays deliberately thin; every detail lives one hop away in `referen
 - `reference/eval-suite.md` — [[loop-evals]] and [[loop-flywheel]]; the 7 layers, the two first-class metrics, the flywheel schedule.
 - `reference/safety-and-approvals.md` — [[loop-run]] and [[loop-repair]]; escalation ladder, approval lifecycle, terminal states, anti-cheat.
 - `reference/platform-map.md` — [[loop-architect]]; the engine-neutral contract mapped onto Claude / Codex / Hermes / Google.
+- `reference/model-routing.md` — every spoke; the canonical read→haiku / reason→sonnet / write→opus tier table + rationale + optional enforcement.
 
 If a question is about *how* a step works rather than *which* step is next, you have left the router — open the reference above and read it there.
diff --git a/skills/loop-evals/SKILL.md b/skills/loop-evals/SKILL.md
@@ -5,6 +5,8 @@ description: "Design the evaluation harness for an agent loop — the proof laye
 
 # loop-evals — design the harness that proves the loop
 
+> **Base directory.** `reference/…` paths and this plugin's bundled tools (`scripts/holdout_gate.py`, `scripts/anticheat_scan.py`) are **plugin-root-relative**: resolve them against the plugin root (`${CLAUDE_PLUGIN_ROOT}/…`, i.e. `../../` from this `skills/loop-evals/` folder). The `scripts/verify-*` gate and the `EVALS/…` tree live inside the *graded loop's* own repo, not in this plugin.
+
 A loop without measurement is a loop that *claims* success. This skill designs the evaluation harness for a loop — what to check, in what order, with which metric — so a "Succeeded" terminal state is backed by evidence, not narration. It is the **designer** of the suite; `[[loop-run]]` is the caller that runs the gate each iteration, and `[[loop-flywheel]]` feeds real failures back into it.
 
 **In → out.** In: the loop's `SPEC.md` (success criteria, constraints, evidence rules) + its artifacts. Out: `scripts/verify-*` skeletons, an `EVALS/{dataset,rubrics,regressions,traces}/` tree, and the metric definitions — all committed inside the loop's own repo. This skill is read-only/advisory toward the loop it grades; it authors the harness, it does not run the task.
@@ -89,7 +91,7 @@ for (const c of regressionCases) {
   }
 }
 ```
-(`read → haiku`, `reason → sonnet`, `write → opus`; receipts append to `.loop/receipts/`.)
+(`read → haiku`, `reason → sonnet`, `write → opus` per the model-routing rule — tier table + rationale in `reference/model-routing.md`; receipts append to `.loop/receipts/`.)
 
 ## Standing the suite up
 

diff --git a/skills/loop-flywheel/SKILL.md b/skills/loop-flywheel/SKILL.md
@@ -5,6 +5,8 @@ description: "Turn a loop's own run history into compounding improvement — min
 
 # loop-flywheel — the loop that improves the loop
 
+> **Base directory.** `reference/…` paths below are **plugin-root-relative** — resolve them against the plugin root (`${CLAUDE_PLUGIN_ROOT}/reference/…`, i.e. `../../reference/…` from this `skills/loop-flywheel/` folder), where the shared docs ship. `EVALS/…` and `.loop/…` are inside the *improved loop's* own repo, not this plugin.
+
 A loop that only runs gets no better. `loop-flywheel` is the **improvement engine**: it reads what a loop has already done (its `RUNLOG.md`, `EVALS/traces/`, and `.loop/receipts/*.jsonl`) and turns that history into three durable outputs — **new eval cases**, **harness-change proposals**, and **compacted memory**. It is the reflect→see step of the self-learning flywheel applied to an agent loop itself.
 
 It owns no gate. The deterministic and rubric layers live in [[loop-evals]] and `reference/eval-suite.md`; this skill *feeds* that suite (mines failures into it) and *watches* its two first-class metrics over time. Read [[loop-evals]] first if you are standing the suite up; come here once a loop has run ≥2 iterations and you want it to compound.
@@ -44,7 +46,7 @@ await agent({ model: "opus",     // write: turn confirmed failures into committe
   prompt: `From the confirmed failures in this proposal, write one EVALS/regressions/<case>.json per distinct real failure (input + expected deterministic verdict). Commit only failures that actually occurred; leave harness-change proposals for human review.` });
 ```
 
-(`read → haiku`, `reason → sonnet`, `write → opus`; receipts append to `.loop/receipts/`. The haiku+sonnet pass mines and *proposes*; only the opus pass writes the committed regression cases — and even then never reimplements the verify engine: the contract's `scripts/verify-*` gate, optionally `/verify-slice`, is the source of truth.)
+(`read → haiku`, `reason → sonnet`, `write → opus` per the model-routing rule — canonical table in `reference/model-routing.md`; receipts append to `.loop/receipts/`. The haiku+sonnet pass mines and *proposes*; only the opus pass writes the committed regression cases — and even then never reimplements the verify engine: the contract's `scripts/verify-*` gate, optionally `/verify-slice`, is the source of truth.)
 
 ## Memory compaction: two stores, never mixed
 

diff --git a/skills/loop-inspector/SKILL.md b/skills/loop-inspector/SKILL.md
@@ -5,6 +5,8 @@ description: "Inspect an existing agent loop and emit a scored gap report — th
 
 # loop-inspector — the quality layer above the ecosystem
 
+> **Base directory.** This skill's own `reference/patterns.md` sits in this folder. `reference/repo-os-contract.md` and the bundled `scripts/inspect_loop.py` are **plugin-root-relative** (`${CLAUDE_PLUGIN_ROOT}/…`, i.e. `../../` from this `skills/loop-inspector/` folder). The `scripts/verify-*` / `holdout_gate.py` / `anticheat_scan.py` names are *signals it looks for in the inspected loop*, not files shipped by this plugin.
+
 Most of the [[loop-engineer]] suite *builds* a loop. `loop-inspector` **judges one
 that already exists** — yours or someone else's. Point it at a loop directory (a
 `.loop/` repo-OS contract, a superpowers or ruflo harness, any agent-loop dir) and it

diff --git a/skills/loop-inspector/reference/patterns.md b/skills/loop-inspector/reference/patterns.md
@@ -1,5 +1,10 @@
 # loop-inspector — inspection checklist, scoring rubric, and foreign-harness reading
 
+> **Base directory.** The bundled `scripts/inspect_loop.py` and `reference/repo-os-contract.md`
+> named below are **plugin-root-relative** (`${CLAUDE_PLUGIN_ROOT}/…`, i.e. `../../../` from this
+> `skills/loop-inspector/reference/` folder). The `scripts/verify-*` / `holdout_gate.py` /
+> `anticheat_scan.py` names are *signals read from the inspected loop*, not files in this plugin.
+
 This is the depth behind [[loop-inspector]]: the exact checklist, how the score is
 computed, and how to read a loop that does **not** use this suite's filenames. The
 runnable core is `scripts/inspect_loop.py`; this file is the rubric it encodes.

diff --git a/skills/loop-repair/SKILL.md b/skills/loop-repair/SKILL.md
@@ -5,6 +5,8 @@ description: "Patch-and-repair loop for a failing agent run — use when a loop
 
 # loop-repair
 
+> **Base directory.** `reference/…` paths below are **plugin-root-relative** — resolve them against the plugin root (`${CLAUDE_PLUGIN_ROOT}/reference/…`, i.e. `../../reference/…` from this `skills/loop-repair/` folder), where the shared docs ship. The `scripts/verify-*` gate and `.loop/…` are inside the *operated loop's* workspace, not this plugin.
+
 The repair lane. When verification disagrees with the work, **this skill is what reacts — bounded, recorded, and capped.** It does not own running the loop ([[loop-run]] does) or defining the checks ([[loop-evals]] does); it owns the disciplined response to a *failing* check so the loop converges instead of thrashing. Every rule here is downstream of the escalation ladder, the repair cap, and the verifier-gaming guard in `reference/safety-and-approvals.md` — read that for the full safety model; this is the operating procedure.
 
 **When this fires:** a deterministic check (test / lint / typecheck / schema) failed, the rubric judge fell below threshold, or [[loop-run]] reached the `repair` state. Inputs: the failing `verification_bundle`, the best prior `.loop/state.json`, and the diff since that best state. Output: a structured **repair record** (below) + an updated state, then control back to [[loop-run]] to re-verify.
@@ -83,7 +85,7 @@ The cap is `repair.max_attempts`, **default N=2**, configurable in `WORKFLOW.md`
 
 ## Dispatching a repair to a subagent
 
-For a bounded, isolated fix, [[loop-run]] may dispatch the repair to a write-tier agent. Per the model-routing rule, the dispatch **must** name an explicit `model:` — repairs write production code, so they route to `opus`:
+For a bounded, isolated fix, [[loop-run]] may dispatch the repair to a write-tier agent. Per the model-routing rule (tier table: `reference/model-routing.md`), the dispatch **must** name an explicit `model:` — repairs write production code, so they route to `opus`:
 
 ```
 Agent(