fix(cli): help/version/usage, distinct errors, explicit exit codes, ledger tolerance (M4) #10
…1-6)
- `--help`/`-h`: full help with per-command descriptions to STDOUT, exit 0 (was: bare usage to STDERR, exit 2).
- `--version`: prints the package version; the single source of truth is pyproject (installed metadata, falling back to reading pyproject.toml).
- usage/help say `python3 -m loop` (was `python -m loop`).
- missing target argument: usage to STDERR + nonzero exit, never defaults to cwd (was: silently operated on the current directory).
- nonexistent target path (read commands): distinct, actionable STDERR error naming the path, exit 2; no longer the same JSON `missing_file` report that a malformed/empty contract produces.
- `verify`/`validate` reconciled with docs: help describes them as doctor aliases (they were already wired).

New: scripts/test_loop_cli.py (11 cases, each fails against the prior CLI).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
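The `--version` lookup order described above (installed metadata first, then pyproject.toml) could look roughly like the following. This is a minimal sketch, not the code in `loop/__main__.py`: the function name `resolve_version`, its parameters, and the regex-based fallback are assumptions made for illustration.

```python
import re
from importlib import metadata
from pathlib import Path


def resolve_version(dist_name: str, pyproject: Path) -> str:
    """Return the installed version, else the one declared in pyproject.toml.

    Hypothetical helper: the name and signature are illustrative only.
    """
    try:
        # Preferred source: metadata of the installed distribution.
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        pass
    # Fallback for a source checkout that was never installed. A regex keeps
    # the sketch free of a TOML parser; a real implementation may parse TOML.
    text = pyproject.read_text(encoding="utf-8")
    match = re.search(r'^version\s*=\s*"([^"]+)"', text, flags=re.MULTILINE)
    return match.group(1) if match else "unknown"
```

Either branch reads the same declared version, which is what keeps pyproject the single source of truth.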
Split the scan outcome from an operational failure so a broken invocation is never read as a clean scan or as findings:
- 0: clean
- 1: findings (FailedUnverifiable/review)
- 2: gate tampering (FailedSafety)
- 3: operational error (unreadable --diff/--trajectory file, malformed --trajectory JSON, failed --self-check git, or a bad argument)

Previously an unreadable input raised an uncaught traceback (exit 1, colliding with "findings") and an argparse error exited 2 (colliding with tampering). Input gathering is now guarded; a custom ArgumentParser maps usage errors to 3. Documented the code split in the module docstring.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
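One way to keep argparse usage errors from colliding with exit code 2 is to override `ArgumentParser.error()`. The sketch below illustrates that technique under assumptions: the class name, constant names, and `build_parser()` are hypothetical, and only the `--diff` and `--trajectory` flags are taken from the commit message.

```python
import argparse
import sys

# Exit codes as described in the commit message; the constant names are assumed.
EXIT_CLEAN, EXIT_FINDINGS, EXIT_TAMPERING, EXIT_OPERATIONAL = 0, 1, 2, 3


class OperationalErrorParser(argparse.ArgumentParser):
    """ArgumentParser whose usage errors exit 3 instead of argparse's default 2."""

    def error(self, message: str):
        # Same output shape as the stock implementation, different exit status.
        self.print_usage(sys.stderr)
        self.exit(EXIT_OPERATIONAL, f"{self.prog}: error: {message}\n")


def build_parser() -> argparse.ArgumentParser:
    parser = OperationalErrorParser(prog="anticheat_scan")
    parser.add_argument("--diff")
    parser.add_argument("--trajectory")
    return parser
```

With this in place, an unknown flag ends the process with status 3, so status 2 stays reserved for gate tampering.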
main() always returned 0, so a stalled/churned/over-budget loop exited the same as a healthy one, which is useless in a script or CI gate. Map the health report to an exit code via _exit_code():
- 0: healthy or terminal (continue/done)
- 1: intervention recommended (replan/revert/approval) or a degraded/inert RUNLOG
- 2: precondition error (missing/invalid loop state)

Two existing CLI tests asserted rc 0 on a stalled loop (the old always-0 behavior); updated to the new intervention semantics.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
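The mapping described above can be sketched as a small pure function. The report shape here is an assumption: a dict with hypothetical `action` and `runlog` keys, and `None` standing in for missing or invalid loop state. The real `_exit_code()` in `scripts/runtime_monitor.py` may be structured differently.

```python
from typing import Optional

# Actions that mean the loop is healthy or finished (from the commit message).
HEALTHY_ACTIONS = {"continue", "done"}


def exit_code_for(report: Optional[dict]) -> int:
    """Map a health report to a process exit code (illustrative shape only)."""
    if report is None:
        # Hypothetical convention: None means missing/invalid loop state.
        return 2
    if report.get("runlog") in {"degraded", "inert"}:
        return 1
    # replan/revert/approval, or anything unrecognized, asks for intervention.
    return 0 if report.get("action") in HEALTHY_ACTIONS else 1
```

Treating an unrecognized action as 1 rather than 0 is a deliberate choice in this sketch: a CI gate should fail closed when it cannot tell that the loop is healthy.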
An append-only JSONL ledger can be truncated mid-write or hand-edited; a single bad line made read()/summarize() raise JSONDecodeError and lose the whole lineage. Non-object lines (unparseable, or valid JSON that is not a dict) are now skipped with a stderr warning and counted via a shared _read_with_stats(). summarize() gains a `malformed` count; read() keeps its list signature.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
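A tolerant reader along these lines might look like the sketch below. It is not the `_read_with_stats()` from `scripts/rollout_ledger.py`: the function name, the iterable-of-lines input, the warning text, and the choice to ignore blank lines are all assumptions.

```python
import json
import sys
from typing import Iterable, List, Tuple


def read_with_stats(lines: Iterable[str]) -> Tuple[List[dict], int]:
    """Return (records, malformed_count), skipping lines that are not JSON objects."""
    records: List[dict] = []
    malformed = 0
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are neither records nor errors
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            value = None  # unparseable, e.g. a line truncated mid-write
        if not isinstance(value, dict):
            # Covers both unparseable lines and valid JSON that is not an object.
            malformed += 1
            print(f"warning: skipping malformed ledger line {lineno}", file=sys.stderr)
            continue
        records.append(value)
    return records, malformed
```

A list-returning `read()` can discard the count while `summarize()` reports it, which matches the signatures described in the commit message.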
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4321ab55ef

    trajectory = None
    if args.trajectory:
        with open(args.trajectory, encoding="utf-8") as fh:
            trajectory = json.load(fh)

Validate trajectory JSON type before scanning
When --trajectory points at syntactically valid JSON that is not the documented array, such as {}, json.load succeeds and the value is passed to scan(); an empty object is normalized to [] there, so the CLI exits 0 with a clean scan instead of the new operational-error code 3. This breaks the exit-code split for malformed trajectory inputs, so the loaded value should be validated as a list before scanning.
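The validation suggested here could be sketched as follows. `OperationalError` and `load_trajectory()` are hypothetical names introduced for illustration; the assumption is that the CLI's guarded input gathering would translate such an exception into exit code 3.

```python
import json


class OperationalError(Exception):
    """Hypothetical marker for failures the CLI should report as exit code 3."""


def load_trajectory(path: str) -> list:
    """Load a trajectory file and insist that it holds a JSON array."""
    try:
        with open(path, encoding="utf-8") as fh:
            trajectory = json.load(fh)
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError.
        raise OperationalError(f"cannot read trajectory {path}: {exc}") from exc
    if not isinstance(trajectory, list):
        # Valid JSON of the wrong type (such as {}) must not pass as a clean scan.
        kind = type(trajectory).__name__
        raise OperationalError(f"trajectory {path} must be a JSON array, got {kind}")
    return trajectory
```

An empty array still loads successfully, so only inputs of the wrong JSON type are rejected.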
Pull request overview
This PR tightens the CLI/UX contract for python3 -m loop and related operational scripts by standardizing help/version/usage output, introducing explicit exit-code semantics, and making ledger parsing tolerant to corruption with warnings and accounting.
Changes:
- Implemented a `python3 -m loop` CLI with explicit `--help`/`--version` behavior, usage text updates, and improved error messaging for missing/nonexistent targets.
- Added explicit, script-friendly exit codes for `runtime_monitor` and `anticheat_scan`, including a distinct operational-error code path.
- Made `rollout_ledger` resilient to malformed JSONL lines (warn + skip + count), with new tests covering the tolerance behavior.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `loop/__main__.py` | Implements CLI help/version/usage behavior, target validation, and error handling. |
| `scripts/runtime_monitor.py` | Adds `_exit_code()` mapping and returns outcome-specific exit codes. |
| `scripts/anticheat_scan.py` | Introduces explicit exit-code constants and operational-error handling (including argparse collision avoidance). |
| `scripts/rollout_ledger.py` | Adds tolerant JSONL reading with stderr warnings and malformed-line counting in `summarize()`. |
| `scripts/test_loop_cli.py` | New subprocess-based CLI contract tests for help/version/usage and error distinctions. |
| `scripts/test_runtime_monitor.py` | Updates/extends tests to pin new runtime-monitor exit code semantics. |
| `scripts/test_anticheat_scan.py` | Adds tests asserting operational-error exit code 3 and preserves 0/1/2 outcome semantics. |
| `scripts/test_rollout_ledger.py` | Adds tests ensuring malformed ledger lines are skipped, warned, and counted. |

    if not argv:
        print(f"{command}: missing target argument", file=sys.stderr)
        print(_USAGE, file=sys.stderr)
        return 2
    target = Path(argv[0])


    import io  # noqa: E402
    import json  # noqa: E402

M4 CLI/UX polish batch (launch criterion S6), TDD with fail-before evidence per item:
- `--help`/`-h` → exit 0, usage + per-command descriptions on stdout
- `--version` → package version (single-sourced with pyproject)
- `python3 -m loop` `verify` alias: was already wired as a `doctor` alias and documented; help text now says so (no rename needed)
- `anticheat_scan`: distinct operational-error exit code 3 (0/1/2 clean/findings/tampering preserved by guard tests)
- `runtime_monitor`: explicit exit codes per outcome (0 healthy, 1 intervention, 2 operational error); two existing tests that pinned the old always-0 behavior updated to the new semantics with commented rationale
- `rollout_ledger`: malformed ledger lines skipped with stderr warning + counted, not a crash

+29 tests (each FAILED before its fix). Suite: 147 passed + 2 skipped (stdlib) / 149 passed (jsonschema). self_eval 13/13; frontmatter 9/9.