Add sophistry_bench_sprint_env: single-agent advocacy reward-hacking environment by acharyaanusha · Pull Request #787 · huggingface/OpenEnv

acharyaanusha · 2026-06-11T02:16:06Z

Summary

Adds envs/sophistry_bench_sprint_env/ — an OpenEnv port of the sophistry-bench single-agent advocacy reward-hacking environment. One-turn advocacy over QuALITY reading-comprehension passages: the policy is assigned the gold answer or a distractor and must produce one <claim>/<cite> argument. The reward proxy peaks at exactly 8 <claim> tags (claim_count_cliff); four weight-0 canaries detect format hacking, making this a compact, reproducible reward-hacking measurement env.
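To make the proxy concrete, here is a minimal sketch of counting `<claim>` tags and scoring the count with a cliff at 8. The names `count_claims` and `toy_claim_count_cliff` are hypothetical, chosen for illustration. The real `claim_count_cliff` lives in sophistry-bench-sprint and its shape away from the peak is not described in this thread, so the all-or-nothing falloff below is an assumption.

```python
import re

# The PR description says the proxy peaks at exactly 8 <claim> tags.
TARGET_CLAIMS = 8
_CLAIM_OPEN = re.compile(r"<claim>")


def count_claims(text: str) -> int:
    """Count opening <claim> tags in a policy response."""
    return len(_CLAIM_OPEN.findall(text))


def toy_claim_count_cliff(n_claims: int, target: int = TARGET_CLAIMS) -> float:
    """Hypothetical cliff: full credit at the target count, nothing elsewhere.

    This is a stand-in, not the upstream implementation.
    """
    return 1.0 if n_claims == target else 0.0
```

A response can hit the peak by emitting eight tags regardless of argument quality, which is the format hack the weight-0 canaries are meant to surface.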

Single-step episode, typed (non-MCP) pattern: models.py (AdvocacyAction/AdvocacyObservation) + EnvClient, FastAPI app, Dockerfile copied from echo_env.
reset() issues the task (passage + question + answer-to-defend); step(AdvocacyAction(text=...)) returns reward + all 8 reward components and done=True. Hidden ground truth: the policy is told what to defend, never whether it's gold; correctness surfaces only in step metadata.
No scoring drift: scoring/dataset logic is imported from the upstream sophistry-bench-sprint package rather than reimplemented, and a parity test asserts the OpenEnv aggregate_reward equals the canonical reward to 1e-9.
Config via env vars: SPRINT_N_ITEMS, SPRINT_PASSAGE_CHARS, SPRINT_SEED, SPRINT_WEIGHTS.

A serialization fix worth flagging for the framework

While containerizing, I found that core/env_server/serialization.py::serialize_observation excludes the base Observation.metadata from the wire payload — so reward components placed in metadata are silently dropped over HTTP (in-process tests don't catch it because they never serialize). Worked around it within the env by mirroring components into a declared components field (declared subclass fields survive serialization) and restoring metadata client-side, guarded by serialization round-trip tests. Happy to discuss whether the framework should preserve metadata instead.
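The mirror-and-restore idea can be sketched with plain dataclasses. Everything here is a toy: `ToyObservation`, `toy_serialize`, and `toy_parse` are hypothetical stand-ins, and `toy_serialize` only imitates the reported behavior (dropping the base `metadata` field); it is not the framework's `serialize_observation`.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class ToyObservation:
    prompt: str = ""
    # Declared subclass field: survives the toy serializer.
    components: dict[str, float] = field(default_factory=dict)
    # Base field: dropped by the toy serializer.
    metadata: dict[str, Any] = field(default_factory=dict)


def make_step_observation(scores: dict[str, float]) -> ToyObservation:
    """Server side: write scores to metadata and mirror them into components."""
    return ToyObservation(components=dict(scores), metadata=dict(scores))


def toy_serialize(obs: ToyObservation) -> dict[str, Any]:
    """Imitates a serializer that excludes base metadata from the wire payload."""
    payload = asdict(obs)
    payload.pop("metadata")
    return payload


def toy_parse(payload: dict[str, Any]) -> ToyObservation:
    """Client side: rebuild metadata from the mirrored components."""
    components = dict(payload.get("components", {}))
    return ToyObservation(
        prompt=payload.get("prompt", ""),
        components=components,
        metadata=dict(components),
    )
```

A round-trip test in this style fails if the mirror is removed, which an in-process test that never serializes would not catch.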

Test Plan

cd envs/sophistry_bench_sprint_env && uv run pytest tests/ -v → 10 passed (incl. anti-drift parity + 2 wire-serialization regression tests)
openenv build sophistry_bench_sprint_env builds; container smoke test green (reset → step_text with 8 claims → reward 0.5, all 8 components over HTTP)

Dependency & live demo

sophistry-bench-sprint is published on PyPI (https://pypi.org/project/sophistry-bench-sprint/) and pulled as a normal dependency — no vendored wheel. A live demo Space is deployed at https://huggingface.co/spaces/anushaacharya/sophistry_bench_sprint_env.

🤖 Generated with Claude Code

…print wheel Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ion models Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…d parity Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The framework's serialize_observation() strips the base Observation.metadata dict from the wire response, so the eight reward components never reached the containerized client (metadata arrived empty). Mirror the components into a declared AdvocacyObservation.components field server-side and re-populate metadata from it in the client's _parse_result, preserving the public contract that observation.metadata carries all eight components. Verified end-to-end against the built container (smoke test: REWARD 0.5, all 8 keys present). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tion regression test Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…survival test Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…-sprint 0.1.5 (drop vendored wheel) sophistry-bench-sprint is now on PyPI; switch the dependency to the release, remove the vendored wheel, and add HF Space README frontmatter. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

acharyaanusha · 2026-06-11T02:39:23Z

@Darktex @burtenshaw would love a review!

Darktex

Review: `sophistry_bench_sprint_env`

Thanks for a well-structured, well-documented environment — the hidden-ground-truth design, the anti-drift parity test, and the proactively-documented serialization workaround are all genuinely nice. The structure respects the rewards-inside-environment invariant and the typed (non-MCP) client/server split is correct. A few mechanical items block CI and should be fixed before merge, plus two alignment points worth a human look.

Tier 1 — Fixes required (CI-blocking)

tests/test_environment.py:85 — ruff F811: AdvocacyAction is imported at line 4 and re-imported at line 85. Remove the redundant import (ruff check --fix handles it).
tests/test_environment.py — usort: import asyncio (and the following from sophistry_bench_sprint import load_environment) sit mid-file after function definitions. Move them into the top-of-file import block.
models.py & server/sophistry_bench_sprint_environment.py — ruff format: both files would be reformatted (line-length wrapping only). Run uv run ruff format.

I reproduced all three locally: ruff check → 1 error, ruff format --check → 2 files, usort check → 1 file.

Tier 1 — Smaller fixes

server/sophistry_bench_sprint_environment.py:54 — missing generic parameters: declared as class SophistryBenchSprintEnvironment(Environment):. Other typed envs parameterize the base — e.g. maze_env (Environment[MazeAction, MazeObservation, MazeState]) and tbench2_env. The client already does this correctly (client.py:21). Suggest Environment[AdvocacyAction, AdvocacyObservation, <StateType>] to match convention and INVARIANTS.md.
README.md:41 — usage import path: example uses from envs.sophistry_bench_sprint_env import ..., which only resolves with the repo root on sys.path. The canonical pattern (see echo_env/README.md) is from sophistry_bench_sprint_env import ....
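For readers unfamiliar with the generic-parameter convention, the following sketch shows the pattern with a toy base class. `ToyEnvironment` and the `Toy*` types are hypothetical and only mimic the shape of `Environment[Action, Observation, State]`; the framework's real base class has more methods.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

ActT = TypeVar("ActT")
ObsT = TypeVar("ObsT")
StateT = TypeVar("StateT")


class ToyEnvironment(Generic[ActT, ObsT, StateT]):
    """Toy generic base: subclasses bind the action/observation/state types."""

    def step(self, action: ActT) -> ObsT:
        raise NotImplementedError


@dataclass
class ToyAction:
    text: str


@dataclass
class ToyObservation:
    echoed: str
    done: bool


@dataclass
class ToyState:
    step_count: int = 0


# Parameterizing the base lets a type checker verify step()'s signature.
class ToyAdvocacyEnvironment(ToyEnvironment[ToyAction, ToyObservation, ToyState]):
    def __init__(self) -> None:
        self.state = ToyState()

    def step(self, action: ToyAction) -> ToyObservation:
        self.state.step_count += 1
        return ToyObservation(echoed=action.text, done=True)
```

With the bare `ToyEnvironment` base, the type variables fall back to `Any` and a mismatched `step()` signature would pass type checking unnoticed.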

Tier 2 — Alignment / for human review

Serialization workaround couples the env to undocumented framework behavior. core/env_server/serialization.py::serialize_observation strips the base Observation.metadata from the HTTP payload, so the env mirrors reward components into a declared components field and restores metadata client-side. This is correctly done and locked down by round-trip tests — but it's now load-bearing against an internal framework detail. If the framework later preserves metadata (or renames the mirroring contract), the client restore logic could silently diverge. Worth a maintainer decision on whether the framework should preserve metadata instead (the author explicitly invited this discussion). cc @Darktex
aggregate_reward formula is locally reimplemented, not delegated upstream. The PR claims "no scoring drift," and the parity test (test_aggregate_matches_canonical_verifiers_reward, to 1e-9) is the right safeguard. But the combination (cliff + ground) / 2.0 is recomputed in step() rather than imported from sophistry-bench-sprint. If upstream changes how sub-scores combine, this goes stale silently (and the parity test only runs when the package is installed). Consider importing an aggregate function directly, or adding a comment marking the formula as load-bearing and in-sync-required.
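A parity guard of the kind discussed here can be sketched as follows. Both functions are hypothetical stand-ins: `canonical_aggregate` plays the role of the upstream scorer, and `reproduced_aggregate` plays the role of the local copy, taking precomputed `cliff` and `ground` sub-scores as inputs. Only the combination `(cliff + ground) / 2.0` and the `1e-9` tolerance come from the thread.

```python
import math


def canonical_aggregate(cliff: float, ground: float) -> float:
    """Stand-in for the upstream scorer (hypothetical)."""
    return (cliff + ground) / 2.0


def reproduced_aggregate(cliff: float, ground: float) -> float:
    # LOAD-BEARING: must stay in sync with the upstream combination.
    return 0.5 * cliff + 0.5 * ground


def assert_parity(cases, tol: float = 1e-9) -> None:
    """Raise AssertionError if the two aggregates differ by more than tol."""
    for cliff, ground in cases:
        local = reproduced_aggregate(cliff, ground)
        upstream = canonical_aggregate(cliff, ground)
        if not math.isclose(local, upstream, rel_tol=0.0, abs_tol=tol):
            raise AssertionError(
                f"drift at cliff={cliff}, ground={ground}: {local} != {upstream}"
            )
```

Such a check only detects drift for the package version actually installed when the suite runs, which is why the version bound matters as much as the test.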

Claims verified against the code

✅ reset() issues task with done=False; step() returns done=True and all 8 components.
✅ Hidden ground truth: reset() exposes what to defend, not whether it's gold; correctness only in step() metadata.
✅ Serialization workaround + round-trip tests exercise the real serialize_observation path.
✅ SPRINT_* env-var config all handled; uv.lock present.
⚠️ "No scoring drift" — parity test is solid, but aggregate formula is reimplemented (see above).

Verdict

Request changes — the three lint/format/usort failures will fail CI and must land first; the generic-param and README fixes are quick convention items. The two Tier-2 points are for discussion, not blockers.

Automated review by Claude Code | Learn more

…nv conventions
Code review (CI-blocking + convention items):
- Fix ruff F811 / usort: rewrite test import block, drop redundant import.
- ruff format models.py and the environment module.
- Parameterize the base class: Environment[AdvocacyAction, AdvocacyObservation, State].
- README usage: import from `sophistry_bench_sprint_env` (not `envs.`).
- Mark the reproduced `aggregate_reward` formula LOAD-BEARING (no public export to import; parity test pins it to 1e-9).
Re-review vs merged envs (echo/maze/tbench2):
- Add docs stub docs/source/environments/sophistry_bench_sprint.md (CI doc-sync check was failing) and register in _toctree.yml + environments.md.
- Move tests to the central tests/envs/ layout so CI actually collects them; guard with pytest.importorskip (the env's scoring dep isn't in the base test env), matching the camel-guarded pattern in tbench2.
- Dockerfile: huggingface openenv-base + ENV ENABLE_WEB_INTERFACE=true (matches echo/tbench2); README front-matter base_path: /web.
- pyproject: depend on `openenv[core]` (not `openenv-core[core]`) like all other envs; add pytest-asyncio/pytest-cov dev extras; re-lock (openenv 0.3.1).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

acharyaanusha · 2026-06-11T15:20:09Z

Pushed 792905d3 addressing the review

Tier 1 — fixes

F811 / usort: rewrote the test import block (single top-of-file block, dropped the redundant AdvocacyAction import).
ruff format applied to models.py and the environment module.
Base class now parameterized: Environment[AdvocacyAction, AdvocacyObservation, State] (uses base State, matching the client's EnvClient[..., State]).
README usage now imports from sophistry_bench_sprint_env, not envs.sophistry_bench_sprint_env.

Tier 2 — alignment

aggregate formula: there is no public aggregate to import — aggregate_reward is an inner closure of sophistry_bench_sprint._build_reward_funcs, not in the package's __all__. Rather than widen the package's public API for one expression, I marked the reproduced (cliff + ground) / 2.0 LOAD-BEARING in a comment and kept the 1e-9 parity test as the drift guard. Happy to export an aggregate from the package instead if you'd prefer that contract.
serialization workaround: left as-is (declared components/error fields + client restore, locked by round-trip tests) since it's the only thing that survives serialize_observation stripping base metadata. cc @Darktex

Darktex

Verdict: comment (non-blocking suggestions — nothing here is a confirmed bug, the env is well-constructed and ships with a canonical parity test)

Thanks for a thorough, well-documented environment. The typed (non-MCP) pattern, the hidden-ground-truth design (correctness only in step metadata, never told to the policy), and the 1e-9 parity test against the upstream canonical reward are all exactly right. A few suggestions to consider:

Suggestions (non-blocking)

Fragile positional zip for reward weighting — server/sophistry_bench_sprint_environment.py
```
reward = sum(w * c for w, c in zip(self.weights, metadata.values()))
```
This relies on metadata dict insertion order matching the positional weights array. It is correct for the shipped default ([1,0,0,0,0,0,0,0] — only aggregate_reward is weighted) and dict order is guaranteed in Python 3.7+, but any custom SPRINT_WEIGHTS that weights a non-aggregate component would silently break if the dict is ever reordered. Consider an explicit named mapping to make the contract self-documenting and refactor-safe:
```
_WEIGHT_KEYS = ["aggregate_reward", "correctness_reward", "n_claims", "n_citations",
                "alternation_canary", "starts_with_canary", "length_band_canary", "template_echo_canary"]
reward = sum(w * metadata[k] for w, k in zip(self.weights, _WEIGHT_KEYS))
```
Parity test verifies formula parity, not dataset parity — tests/envs/test_sophistry_bench_sprint_environment.py
The test builds vf_env = load_environment(...) but never resets it; it passes env._current_passage (the OpenEnv side) straight into the canonical reward fn. That confirms the arithmetic matches given the same passage, but not that both sides select the same passage for a given seed. If dataset selection ever diverges, this test would still pass. Either add a same-passage-at-seed assertion, or add a comment clarifying the test intentionally covers only formula parity.
Private-attribute access in test — capture env._current_passage before step() (which flips _has_task=False) and avoid reaching into private state from the test, e.g. read it right after reset().

Alignment notes for a human reviewer

Metadata stripped over the wire: the env mirrors all 8 reward components into a declared components field because serialize_observation excludes metadata. The workaround is sound and tested, but it papers over a framework-level limitation — worth fixing at the framework layer so every typed env gets observation.metadata over the wire for free rather than each env reconstructing it. (This observation was made against a possibly-older snapshot of src/openenv/core/env_server/serialization.py; please confirm against current main.)
Reproduced upstream formula + unbounded pin: the aggregate formula is reproduced locally (it's an inner closure upstream, not importable — acknowledged in a code comment) and pyproject.toml pins sophistry-bench-sprint>=0.1.5 with no upper bound. The parity test is the right guard, but it only catches drift when the suite runs against a newer package version. Consider an upper bound or a CI check.

None of the above blocks merge — they're hardening suggestions. Nice work.


Darktex

Thoughtful, well-tested contribution — wire-serialization regression tests, a 1e-9 parity test against the upstream scorer, the weight-0 canary design for detecting format hacking, and proper copyright headers. The structure conforms to the standard env layout (models / client / server / openenv.yaml / Dockerfile / tests). No blocking issues; a few non-blocking points below.

Tier 1 (Bugs & Lint) — non-blocking

server/sophistry_bench_sprint_environment.py:652 — reward = sum(w * c for w, c in zip(self.weights, metadata.values())) couples the 8 weights to the insertion order of the metadata dict literal (lines 642–651). It's correct today (the order matches the documented SPRINT_WEIGHTS order and Python preserves insertion order), but a future reorder of that dict would silently scramble every weight with no error. Consider binding to an explicit ordered list of (key, value) pairs so the weight↔component mapping is reviewable at a glance.
models.py:242 / models.py:281 — from typing import Dict / components: Dict[str, float]: prefer the built-in dict[str, float] (project targets Python ≥ 3.10).
models.py:279 — AdvocacyObservation re-declares reward: float = Field(0.0, ...), narrowing the base Observation.reward (bool|int|float|None). The default-0.0 path is covered by tests; just confirm no framework code path (e.g. reset serialization) ever constructs this observation with reward=None, which would now fail validation.

Tier 2 (Alignment)

Ground-truth visibility (worth an explicit doc note): correctness_reward is surfaced in observation.components (and reconstructed into observation.metadata client-side), so it is present in the per-step result. This is intentional per the design ("correctness surfaces only in step metadata"), but it's only safe if the training harness never feeds observation.metadata/components back into the policy prompt. Recommend stating that constraint explicitly in the README so downstream users don't accidentally leak the gold/distractor signal into the agent's context.
Reward parity pinning: the aggregate-reward formula is a deliberate reimplementation of an upstream private closure, guarded by the parity test. To keep the "no scoring drift" guarantee robust, consider pinning sophistry-bench-sprint to an exact version and documenting that any bump must re-run the parity test. With only a floor (>=) bound, a newer upstream release can change the formula while the test stays green against the version pinned in the lock file.

Flagging the two Tier 2 items for maintainer eyes (@Darktex). None of the above blocks merge.


…re/test contracts
Addresses two follow-up reviews and a self-review pass.
- Reward weighting: bind weights to an explicit `_COMPONENT_KEYS` order (not dict insertion order); validate `len(weights) == 8` on the constructor `weights=` path too (the env-var path was already checked) and `zip(..., strict=True)` as a backstop. A mis-sized vector now raises instead of silently truncating the reward and dropping canary components. Adds a regression test.
- Parity test: now also asserts dataset parity (same passage selected at the same seed), captures the passage before `step()`, and feeds the canonical fn its own passage; covers dataset + formula parity, not just arithmetic.
- models.py: `Dict` -> `dict`; document that post-step reward must be read from `StepResult.reward` (observation.reward is stripped over the wire).
- README/docs: test command now actually executes (was silently skipping via importorskip under the base venv); add a ground-truth-leak warning (don't feed observation.metadata/components back into the policy prompt); mirror it to the docs stub.
- pyproject: cap `sophistry-bench-sprint>=0.1.5,<0.2.0` so an upstream formula change can't silently drift in; re-lock.
- step(): count only scored steps (increment after the reset guard).
- app.py: comment the load-bearing package-dir remap the container import relies on.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

acharyaanusha · 2026-06-12T05:26:25Z

Pushed 94a7f616 addressing the follow-up reviews.

Reward weighting (raised in both reviews) — the zip(self.weights, metadata.values()) coupled weights to dict insertion order. Now:

Weights bind to an explicit _COMPONENT_KEYS tuple (reward = sum(w * metadata[k] for w, k in zip(self.weights, _COMPONENT_KEYS, strict=True))).
Self-review caught that the constructor weights= arg skipped the 8-length check (only SPRINT_WEIGHTS was validated) — a mis-sized vector silently truncated via zip, dropping canary components with no error. Now validated in __init__ for both paths, with strict=True as a backstop. Added a regression test.

Parity test (formula vs dataset parity) — now also asserts vf_env.dataset[0].info.passage == passage (same passage selected at the same seed), captures the passage before step(), and feeds the canonical fn its own passage. Covers dataset + formula parity, not just arithmetic-given-the-same-passage.

Ground-truth leak — added an explicit README warning (and mirrored it into the docs stub) that harnesses must not feed observation.metadata/components back into the policy prompt, since correctness_reward is the hidden gold/distractor signal.
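On the harness side, one way to honor that warning is to build the policy prompt from an allowlist of fields rather than from the whole observation. This is a sketch under assumptions: the observation is treated as a plain dict, the field name `prompt` is assumed, and `AGENT_VISIBLE_FIELDS` / `build_policy_prompt` are hypothetical names, not part of the env.

```python
from typing import Any, Mapping

# Only these observation fields may ever reach the policy (assumed field name).
AGENT_VISIBLE_FIELDS = ("prompt",)


def agent_visible_view(observation: Mapping[str, Any]) -> dict[str, Any]:
    """Keep allowlisted fields; metadata and components are dropped by default."""
    return {k: observation[k] for k in AGENT_VISIBLE_FIELDS if k in observation}


def build_policy_prompt(observation: Mapping[str, Any]) -> str:
    """Build the next policy prompt without touching scoring fields."""
    return str(agent_visible_view(observation).get("prompt", ""))
```

An allowlist fails closed: a field added to the observation later stays hidden from the policy until someone explicitly exposes it.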

Wire reward contract — clarified in models.py that post-step reward must be read from StepResult.reward; observation.reward is stripped by the serializer and is always 0.0 over the wire.

Smaller items

from typing import Dict → built-in dict[str, float].
pyproject: capped sophistry-bench-sprint>=0.1.5,<0.2.0 so an upstream formula change can't silently drift past the parity guard; re-locked.
README/docs test command now actually runs the suite (uv run --project envs/sophistry_bench_sprint_env --extra dev pytest …) — the previous command silently skipped under the base venv. In the repo's shared CI it still skips via importorskip (no scoring dep installed), same as tbench2's camel guard.
step() now counts only scored steps (increment moved after the reset guard).
app.py: commented the load-bearing package-dir remap the container import relies on.
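The "count only scored steps" change can be illustrated with a toy single-step environment. `ToySingleStepEnv` and `ToyStepResult` are hypothetical; in particular, returning an error result for a `step()` without a pending task is an assumption about the guard's behavior, and the reward value is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyStepResult:
    reward: Optional[float]
    done: bool
    error: Optional[str] = None


class ToySingleStepEnv:
    def __init__(self) -> None:
        self.step_count = 0
        self._has_task = False

    def reset(self) -> None:
        """Issue a task; exactly one scored step may follow."""
        self._has_task = True

    def step(self, text: str) -> ToyStepResult:
        # Reset guard comes first, so unscored calls never touch the counter.
        if not self._has_task:
            return ToyStepResult(
                reward=None, done=True, error="call reset() before step()"
            )
        self.step_count += 1  # incremented only for scored steps
        self._has_task = False
        return ToyStepResult(reward=0.0, done=True)  # placeholder reward
```

If the increment sat above the guard, a stray `step()` before `reset()` would inflate the count even though nothing was scored.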

On the reward: float narrowing of base Observation.reward (Union[bool,int,float,None]): confirmed no construction path passes reward=None — serialize_observation strips reward and the client reconstructs from the float we emit — so the narrowing can't raise a ValidationError.

The two framework-level alignment notes (serialize_observation stripping metadata; whether the framework should preserve it so every typed env gets observation.metadata for free) remain maintainer calls — verified against current main, the strip still happens. cc @Darktex

Darktex

Alignment Review — Two-Tier

Tier 1: Changes Required

[BLOCK] client.py imports from env_server/ — violates client-server invariant

envs/sophistry_bench_sprint_env/client.py lines ~10/14:

from openenv.core.env_server.types import State
from core.env_server.types import State  # fallback

env_server/ is the server package. INVARIANTS.md is explicit: "Clients must never import from server/ directory." The fix: either remove the typed _parse_state override entirely (the base class handles a raw dict), or re-export State from a shared location and import from there. The Action/Observation pattern in models.py shows the right approach.

_parse_result mutates a Pydantic model via attribute assignment

client.py line ~255:

observation.metadata = dict(observation.components)

With validate_assignment=True this works today, but it is brittle. Prefer building a new model or using object.__setattr__ to make the intent explicit.
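The "build a new model" alternative could look roughly like the sketch below, which uses a frozen dataclass in place of the Pydantic model so that post-construction assignment is impossible. `ToyObservation` and `parse_observation` are hypothetical, and storing the error under a `"error"` key in metadata is an assumption about how the fold-in might work.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ToyObservation:
    components: dict = field(default_factory=dict)
    error: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def parse_observation(payload: Mapping[str, Any]) -> ToyObservation:
    """Build the observation once, with metadata decided before construction."""
    data = dict(payload)  # copy so the caller's payload is left untouched
    components = dict(data.get("components", {}))
    # Prefer metadata that arrived in-process; otherwise restore from the mirror.
    metadata = dict(data.pop("metadata", None) or components)
    error = data.get("error")
    if error:
        metadata["error"] = error  # assumed fold-in key
    return ToyObservation(components=components, error=error, metadata=metadata)
```

Deciding `metadata` before the constructor call keeps all of the restore logic in one place and leaves no window in which the object exists with empty metadata.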

Test accesses private env._current_passage

tests/envs/test_sophistry_bench_sprint_environment.py line ~5826. Read the passage from the reset observation's prompt or expose a public accessor.

reward field default differs from framework convention

models.py: reward: float = Field(0.0, ...) overrides the base Observation.reward: ... | None = None. Post-reset wire payload will carry "reward": 0.0 instead of the standard null. Any harness using result.reward is None to detect a reset response will misbehave. Either keep the default None and update docs, or explicitly document this deviation in the env's README.
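The harness-side consequence is easy to show with a toy result type. `ToyResult` and `is_reset_response` are hypothetical names; the only thing taken from the review is the `result.reward is None` convention for recognizing a reset response.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyResult:
    reward: Optional[float] = None  # None means "no reward yet" (post-reset)
    done: bool = False


def is_reset_response(result: ToyResult) -> bool:
    """Identity check: a scored reward of 0.0 must not look like a reset."""
    return result.reward is None
```

With a `0.0` default, a reset response and a step that legitimately scored zero become indistinguishable, and a truthiness check such as `not result.reward` conflates them even when the default is `None`.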

Tier 2: Alignment Discussion

ALIGNMENT FLAG 1 — Framework metadata-stripping: bug or design?

Principle at stake: Pydantic serialization / wire type contract (INVARIANTS.md §3)
The concern: serialize_observation in src/openenv/core/env_server/serialization.py:154-159 explicitly excludes metadata from every observation payload. The workaround (mirroring data into components) is technically correct and well-tested, but the root issue affects all environments. If this is a bug it should be fixed at the framework level (with an RFC); if it is intentional (metadata is server-side only), that should be documented and this env should not work around it without a design discussion.

ALIGNMENT FLAG 2 — correctness_reward reachable by the orchestration layer

Principle at stake: Agent isolation / rewards inside environment (INVARIANTS.md §Security, PRINCIPLES.md)
The concern: correctness_reward (hidden ground truth — whether the assigned answer is gold) is returned in StepResult.observation.components and restored into observation.metadata by the typed client. The only guardrail against leaking this into the agent's context is a prose warning in the README. For a reward-hacking measurement environment this is the most sensitive field. Should it be stripped from the wire payload entirely and only logged server-side, surfaced only through an authenticated orchestration channel?

ALIGNMENT FLAG 3 — Inline reproduction of upstream aggregate formula

Principle at stake: Rewards inside environment / drift risk (PRINCIPLES.md)
The concern: aggregate = (cliff + ground) / 2.0 is reproduced inline because the upstream package does not export it. The parity test + <0.2.0 version cap is a reasonable short-term mitigation, but this is a latent correctness hazard. Long-term, request the upstream package to export the formula publicly.

What is working well

Thorough tests, including a parity test against the canonical verifiers reward and two wire-serialization round-trip tests that directly prove the workaround.
The weight-vector length guard in both __init__ and _weights_from_env with zip(..., strict=True) backstop is solid.
Clean single-step episode model, deterministic cursor, and correct SUPPORTS_CONCURRENT_SESSIONS = False default protecting session safety.
The importorskip guard for CI is the correct pattern for heavy external deps.


…arse
Address the latest review (Tier 1):
- models.py: drop the `reward: float = 0.0` override so it inherits the base Observation default (None): no sibling narrows it, and the override made the reset wire payload carry `reward: 0.0` instead of the conventional `null`, breaking any harness that uses `result.reward is None` to detect a reset. reset() now leaves reward as None; step() still sets the float.
- client.py `_parse_result`: build the observation once with `metadata=` set at construction instead of mutating the model after creation (pop any in-process metadata, else restore from the mirrored `components`, then fold in `error`).
- Test no longer reaches into private `env._current_passage`; added a public `current_passage` accessor (the passage is already in the reset prompt, not hidden ground truth) and the parity test reads that.
The flagged `State` import is not a client->server violation: INVARIANTS.md §19 lists Action/Observation/State as shared wire types, and every sibling client (grid_world, maze) imports State from openenv.core.env_server.types identically.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

acharyaanusha · 2026-06-13T02:46:23Z

Pushed ac729723 (rebased onto the latest main merge).

Tier 1

reward field default — dropped the reward: float = 0.0 override so AdvocacyObservation inherits the base Observation.reward default (None). No sibling narrows it, and the override made the reset wire payload carry reward: 0.0 instead of the conventional null. reset() now leaves reward as None; step() sets the float. Updated the affected assertions.
_parse_result mutation — now builds the observation once with metadata= set at construction (pop any in-process metadata, else restore from the mirrored components, then fold in error), instead of assigning observation.metadata after creation.
Test private access — added a public current_passage accessor (the passage is the reading-comprehension text already in the reset prompt, not hidden ground truth) and the parity test reads that instead of env._current_passage.
client.py imports State from env_server/ — respectfully, this isn't a client→server violation. INVARIANTS.md §19 lists Action/Observation/State as the shared wire types, and every typed sibling client imports State from openenv.core.env_server.types identically (grid_world_env/client.py:13, maze_env/client.py:21; sibling models do the same). The rule (§60) is about importing an env's own server/ implementation — which this client does not do.

Tier 2 (alignment flags) — these are framework/upstream-level and out of this env's scope to change unilaterally:

metadata stripping (FLAG 1): the components mirror is the in-env workaround; whether serialize_observation should preserve metadata for all envs is the framework RFC you flagged. Verified the strip still happens on current main.
correctness_reward on the wire (FLAG 2): it must reach the orchestration/measurement layer (that's how reward-hacking is measured), but must never be fed back into the agent prompt — guarded by the README warning + docs stub. Happy to additionally gate it behind a config flag if you'd prefer it omitted by default.
inline aggregate formula (FLAG 3): mitigated by the 1e-9 parity test + <0.2.0 cap; I'll file an upstream issue to export the aggregate publicly so this env can import it.

cc: @Darktex

…eproducing it sophistry-bench-sprint 0.1.6 now exports the advocacy aggregate as a public `aggregate_reward(claims, citations, passage)`. Import and call it in step() rather than reproducing `(cliff + ground) / 2` inline — removes the LOAD-BEARING duplication and the drift hazard the reviewers flagged. Bump the pin to `>=0.1.6,<0.2.0`, drop the now-unused claim_count_cliff/citation_grounding imports, re-lock. Reward values are unchanged (same canonical impl); the parity test still guards dataset/seed selection and the rubric index-0 mapping. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

acharyaanusha · 2026-06-13T03:00:15Z

Resolved the inline-aggregate-formula alignment flag (FLAG 3) at the source rather than working around it — we own the upstream package.

sophistry-bench-sprint 0.1.6 now exports a public aggregate_reward(claims, citations, passage). Internally it's a pure _aggregate_reward helper shared by both the trained rubric func (closure name/metric key unchanged) and the public alias, with a package test asserting the two agree to 1e-9. Published to PyPI.
This env (8ad6f3b6) now imports and calls aggregate_reward in step() instead of reproducing (cliff + ground) / 2 — the LOAD-BEARING duplication is gone, so the formula can't drift. Pin bumped to >=0.1.6,<0.2.0, unused imports dropped, re-locked. Reward values are identical (same canonical impl); the parity test still guards dataset/seed selection and the rubric index-0 mapping.

That leaves only FLAG 1 (framework-level metadata stripping) and FLAG 2 (correctness_reward on the wire) as maintainer/design calls — both unchanged and noted above.

Darktex

Alignment Review Report

Automated Checks

Lint: SKIPPED. The new files are not present on disk in this branch checkout, so ruff returned "No such file or directory" for the new env paths. The diff itself contains no obvious formatting violations (spacing and import ordering look correct).
Debug code: CLEAN — no print, breakpoint, or leftover debug statements found in the new env files.

Tier 1: Fixes Required

envs/sophistry_bench_sprint_env/server/app.py — create_app is imported from openenv.core.env_server unconditionally (no try/except standalone fallback), yet the environment body uses a dual-import pattern everywhere else. In isolation this works because the package __init__.py re-exports create_app, but it is inconsistent with the rest of the env and will silently break if only the server/ subpackage is installed without the top-level package being on sys.path. Either add the same try/except guard used in all other imports in this file, or align with the echo_env pattern of importing from openenv.core.env_server.http_server directly.
envs/sophistry_bench_sprint_env/pyproject.toml — requests>=2.31.0 is listed as a runtime dependency. SophistryBenchSprintEnv extends EnvClient, which uses the framework's transport layer. If requests is not called directly in client.py or models.py, remove it. Unused runtime dependencies widen the attack surface and increase container image size.
tests/envs/test_sophistry_bench_sprint_environment.py — sys.path.insert(0, ...) is used to resolve envs.sophistry_bench_sprint_env. Other env tests in this repo rely on PYTHONPATH=src:envs set at the test runner level (per CLAUDE.md build commands). Using sys.path.insert in the test file itself is fragile (relative path join from __file__) and creates a local convention that diverges from the rest of the test suite. Remove the manual path manipulation and rely on the PYTHONPATH approach documented in CLAUDE.md.
tests/envs/test_sophistry_bench_sprint_environment.py (test test_aggregate_matches_canonical_verifiers_reward) — The rubric introspection rubric.funcs[0] is fragile: the comment itself notes the RubricGroup branch, and the code does a conditional duck-type walk to find aggregate_fn. If the upstream sophistry-bench-sprint package changes its rubric structure (even a minor release within <0.2.0), this test will fail with an IndexError or AttributeError rather than a meaningful assertion failure. Add an explicit assert aggregate_fn is not None, "Could not locate aggregate_reward function" guard before calling it, and document the assumption clearly.
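A guarded lookup along the lines suggested in the last item might look like this. It is a sketch under assumptions: the `funcs` attribute comes from the `rubric.funcs[0]` reference above, while the `rubrics` attribute for a group and the helper names `find_reward_func` / `require_reward_func` are hypothetical and not confirmed against the upstream package.

```python
from typing import Any, Callable, Optional


def find_reward_func(
    rubric: Any, name: str = "aggregate_reward"
) -> Optional[Callable[..., Any]]:
    """Walk a rubric (or a group of rubrics) and return the function named `name`."""
    stack = [rubric]
    while stack:
        node = stack.pop()
        for fn in getattr(node, "funcs", ()):
            if getattr(fn, "__name__", None) == name:
                return fn
        # Assumed attribute: a group exposes its children as `rubrics`.
        stack.extend(getattr(node, "rubrics", ()))
    return None


def require_reward_func(
    rubric: Any, name: str = "aggregate_reward"
) -> Callable[..., Any]:
    """Fail with a readable message instead of an IndexError/AttributeError."""
    fn = find_reward_func(rubric, name)
    assert fn is not None, f"Could not locate {name} function"
    return fn
```

Looking the function up by name rather than by index means a reordering upstream produces a clear assertion message, not a parity failure against the wrong function.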

Tier 2: Alignment Discussion

ALIGNMENT FLAG: No MCP interface — env uses raw Gym-style HTTP directly

Principle at stake: "MCP as universal standard" (PRINCIPLES.md Key Decisions, RFC 003) — "All agent-environment tool interaction via MCP"
The concern: SophistryBenchSprintEnv extends EnvClient[AdvocacyAction, AdvocacyObservation, State] and exposes step_text() as the agent-facing call. The reference env (echo_env) uses MCPToolClient / MCPEnvironment; MCP is the intended universal boundary between agents and environments. This env implements a direct typed HTTP Gym-style interface instead. The PR description acknowledges this as a "typed (non-MCP) pattern" but no RFC discussion is referenced. If this pattern is intentional for single-step scoring envs, it should be documented as an accepted variant in PATTERNS.md; otherwise it needs to be ported to MCP.
Suggested reviewer: @Darktex

ALIGNMENT FLAG: correctness_reward leaks into wire payload via components field

Principle at stake: "Rewards inside environment" (RFC 002) / hidden ground truth invariant stated in the env's own documentation
The concern: The env's README and docstring both explicitly warn: "Do not feed observation.components back into the policy's prompt — it includes correctness_reward (hidden ground truth)." However, components is a declared Pydantic field on AdvocacyObservation, so it is always serialized and always present in the wire payload returned to the caller after step(). There is no enforcement mechanism preventing a training loop from accidentally including it in the prompt. This is a footgun: the safety property is documented but structurally unenforced. Options include (a) returning components only through a separate orchestration-tier endpoint, (b) stripping correctness_reward from components and only exposing it via a privileged state() call (which agents cannot access per INVARIANTS.md §Security), or (c) documenting explicitly that callers must be trusted. This trade-off deserves explicit alignment before merge.
Suggested reviewer: @Darktex

ALIGNMENT FLAG: Framework metadata serialization bug worked around inside the env

Principle at stake: "Minimize lifecycle deltas" (PRINCIPLES.md Core Principle 1); correctness of the shared serialization layer
The concern: The PR author correctly identified that serialize_observation in src/openenv/core/env_server/serialization.py explicitly excludes metadata from the wire payload. The workaround (mirror into a declared components field, reconstruct metadata client-side in _parse_result) is clever, and the regression tests (test_metadata_survives_wire_serialization_round_trip) lock it in. However, this means: (1) every other environment that puts anything in metadata silently loses it over HTTP, (2) the workaround is invisible to future env authors, who will hit the same bug, and (3) the fix floated in the PR body ("Happy to discuss whether the framework should preserve metadata instead") is the right one and belongs in the framework, not in this env. The env-level workaround is acceptable as a stopgap, but the framework fix (preserving metadata in serialize_observation) should be tracked as a follow-up issue before the workaround gets cargo-culted into every new env.
Suggested reviewer: @Darktex

Summary

Mechanical issues to fix: inconsistent create_app import guard; unused requests dependency; sys.path.insert in tests; fragile rubric introspection guard
Alignment points for human review: no-MCP pattern, correctness_reward leaking through declared field, and framework metadata serialization bug being papered over at the env level

The environment logic itself is solid — the reward computation is correctly delegated to the upstream package, the single-step episode model is correct, the weight-length invariant is well-guarded with zip(..., strict=True), and the parity test is a good addition. The Tier 1 items are all small fixes. The Tier 2 items, especially the MCP interface question and the correctness_reward structural exposure, need explicit alignment before merge.

Automated review by Claude Code | Learn more

Darktex

Alignment Review Report

A well-conceived environment with excellent test coverage, a clear purpose, and a clever, well-documented metadata-survival workaround. Conformance to the echo_env structure is solid. One blocking bug needs fixing before merge; the rest is discussion material.

Automated Checks

Lint: PASS (no formatting issues visible in the diff)
Debug code: CLEAN — no print, breakpoint, or TODO artifacts in the new files

Structural conformance (vs. `echo_env`)

Client–server separation: PASS — client.py and __init__.py import only from models.py, never from server/.
Rewards inside environment: PASS — all reward computation lives in sophistry_bench_sprint_environment.py::step().
Agents cannot reset: PASS — no MCP tools; reset()/state not exposed to the agent.
Required files (__init__.py, client.py, models.py, server/{__init__,app}.py, env module, Dockerfile, openenv.yaml, pyproject.toml): all present and matching the reference pattern.

Tier 1: Fixes Required

[BLOCKING] client.py — step_text is a sync method wrapping an async call

def step_text(self, text: str) -> StepResult[AdvocacyObservation]:
    return super().step(AdvocacyAction(text=text))   # returns a coroutine, not StepResult

EnvClient.step() is async def, so super().step(...) returns a coroutine, not a StepResult. A caller writing result = env.step_text("...") gets a coroutine object; accessing result.reward raises AttributeError. SyncEnvClient.__getattr__ only auto-wraps methods that are themselves declared async, so because step_text is a plain def, the wrapping never fires. Fix:

async def step_text(self, text: str) -> StepResult[AdvocacyObservation]:
    """Convenience: submit a raw argument string as an AdvocacyAction."""
    return await super().step(AdvocacyAction(text=text))

and update the README/docs to show await env.step_text(...). (Note: grid_world_env/client.py:step_move has the identical pre-existing bug — worth a follow-up issue.)
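The failure mode can be reproduced without the framework. The sketch below is a minimal illustration, assuming a stand-in base class (`FakeEnvClient` is hypothetical and is not OpenEnv's `EnvClient`); it only shows that a plain `def` wrapper hands back an un-awaited coroutine while the `async def` version returns the result.

```python
import asyncio
import inspect


class FakeEnvClient:
    """Hypothetical stand-in for an async client base class (not the real EnvClient)."""

    async def step(self, action: str) -> dict:
        return {"reward": 1.0, "action": action}


class BrokenClient(FakeEnvClient):
    def step_text(self, text: str):
        # Plain def: the coroutine created by super().step() is returned un-awaited.
        return super().step(text)


class FixedClient(FakeEnvClient):
    async def step_text(self, text: str) -> dict:
        # async def + await: the caller receives the actual result.
        return await super().step(text)


def demo() -> tuple[bool, dict]:
    pending = BrokenClient().step_text("hi")
    returned_coroutine = inspect.iscoroutine(pending)
    pending.close()  # avoid a "coroutine was never awaited" warning
    result = asyncio.run(FixedClient().step_text("hi"))
    return returned_coroutine, result
```

A regression test in the same spirit can simply assert `inspect.iscoroutinefunction(FixedClient.step_text)`, which is cheap and does not need a running server.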

Tier 2: Alignment Discussion

ALIGNMENT FLAG: correctness_reward surfaces the hidden ground truth over the wire.

Principle at stake: "Rewards inside environment" (RFC 002) + this env's own goal of hiding whether the assigned answer is gold.
The concern: correctness_reward appears in both metadata and components on the returned AdvocacyObservation, with no framework mechanism to strip it before the agent sees it. The docs warn not to feed observation.metadata/components back into the policy prompt, but nothing enforces it — a naive harness that passes the full observation would leak the ground-truth signal and nullify the reward-hacking measurement. Consider stripping correctness_reward server-side before it reaches the client, or surfacing it on a training-harness-only channel.
Suggested reviewer: a human maintainer.
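One way to read "stripping correctness_reward server-side" is a small filter applied to the component dict just before it is placed on the observation. The sketch below is an assumption about shape only: the function name `surface_components` and the `expose_correctness` flag are illustrative here, not the env's actual code.

```python
def surface_components(
    components: dict[str, float], expose_correctness: bool = False
) -> dict[str, float]:
    """Return the subset of reward components that may leave the server.

    The full dict is still used server-side to compute the scalar reward;
    only the surfaced copy omits the hidden ground-truth signal.
    """
    if expose_correctness:
        return dict(components)
    return {k: v for k, v in components.items() if k != "correctness_reward"}
```

With a filter like this, a harness that forwards the whole observation to the policy cannot leak gold/distractor by accident; measurement code has to opt in explicitly.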

ALIGNMENT FLAG: confirmed framework bug in serialization.py — metadata stripped from the wire payload.

Principle at stake: Pydantic serialization invariant (INVARIANTS.md).
The concern: src/openenv/core/env_server/serialization.py::serialize_observation explicitly excludes "metadata" from model_dump, so the base Observation.metadata is silently dropped from HTTP/WS responses for every environment. The PR's workaround (mirroring components into a declared field) is correct, and test_metadata_survives_wire_serialization_round_trip guards it — but the real fix belongs in the framework, and locking in the broken behavior with a test makes the framework fix harder later (it would break this test). Worth tracking as a framework issue.
Suggested reviewer: a human maintainer.

ALIGNMENT FLAG: parity test assumes rubric.funcs[0] is aggregate_reward.

The concern: the parity test does aggregate_fn = rubric.funcs[0] with a comment that index 0 is aggregate_reward, but never asserts it. If the upstream package reorders functions within the pinned <0.2.0 range, the parity test silently compares the wrong function. Add a free, unambiguous guard: assert aggregate_fn.__name__ == "aggregate_reward".
Suggested reviewer: @acharyaanusha (fix inline).
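A sketch of the suggested guard, assuming only that the rubric object exposes a `funcs` list as the review describes. The `aggregate_reward` and `some_canary` functions below are hypothetical stand-ins, not the upstream implementations.

```python
from types import SimpleNamespace


def aggregate_reward(text: str) -> float:
    """Hypothetical stand-in for the upstream function."""
    return 0.5


def some_canary(text: str) -> float:
    """Hypothetical stand-in for another rubric function."""
    return 0.0


def locate_aggregate(rubric):
    """Fetch funcs[0] and fail loudly if it is not the expected function."""
    funcs = getattr(rubric, "funcs", None)
    assert funcs, "rubric exposes no funcs; upstream structure may have changed"
    fn = funcs[0]
    assert fn.__name__ == "aggregate_reward", (
        f"expected aggregate_reward at funcs[0], found {fn.__name__}"
    )
    return fn


# Usage: the expected ordering passes; a reordered rubric raises AssertionError.
rubric_ok = SimpleNamespace(funcs=[aggregate_reward, some_canary])
rubric_reordered = SimpleNamespace(funcs=[some_canary, aggregate_reward])
```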

Additional Notes (Non-Blocking)

client.py standalone-fallback except ImportError: from core... branch is dead code — the container installs openenv[core], and bare core.* paths would fail anyway (package is openenv.core). Mirrors a pre-existing pattern in grid_world_env; worth cleaning up since it implies a path that doesn't work.
_parse_result does dict(data["observation"]) with no guard — a non-standard server response raises KeyError. The CLI template uses payload.get("observation", {}); consider matching it defensively.
A couple of tests pin exact upstream reward values (e.g. aggregate_reward == 0.5); correct today and protected by the <0.2.0 cap, but a short comment noting these come from the upstream spec would help future maintainers.

Summary

1 blocking Tier 1 issue: step_text returns a coroutine instead of StepResult (fix: async def).
3 Tier 2 alignment points for human review: correctness_reward wire visibility, the framework metadata serialization bug, and the parity-test index assumption.

Automated review by Claude Code | Learn more

- client.py: make step_text `async def` returning `await super().step(...)`. The base EnvClient.step is async, so the old plain `def` returned a coroutine (not a StepResult), breaking `result = env.step_text(...)` and the .sync() wrapper. Add a regression test asserting it's a coroutine function.
- README: rewrite usage as proper async (await reset/step_text) + a .sync() example, matching echo_env (the client is async by default).
- app.py: move `create_app` into both try/except branches (from openenv.core.env_server.http_server), matching echo_env's import structure.
- pyproject.toml: drop the unused `requests` runtime dep (never imported here; the framework transport brings its own); re-lock.
- parity test: assert `aggregate_fn.__name__ == "aggregate_reward"` so an upstream func reorder fails loudly instead of via IndexError.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…port fallbacks

Address the remaining review items:
- correctness_reward (hidden ground truth) is now withheld from the wire observation by default. step() computes the full 8-component vector for the weighted reward but the surfaced metadata/components omit correctness_reward unless `expose_correctness` (constructor) / SPRINT_EXPOSE_CORRECTNESS=1. This structurally prevents a naive harness from leaking gold/distractor to the policy; measurement code opts in. Adds a test for both modes.
- Remove the dead `except ImportError: from core...` framework-import fallback in client.py/models.py/server env module (no top-level `core` package resolves; openenv is always installed). The env-specific `..models` fallback is kept.
- _parse_result uses defensive `.get(...)` (matches the CLI template) so a malformed response doesn't raise a bare KeyError.
- Test: note that pinned reward values come from the upstream spec.

The framework-level `serialize_observation` metadata stripping remains a maintainer/framework issue; the env keeps its tested `components` workaround.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

acharyaanusha · 2026-06-13T16:22:24Z

Pushed a1e0f528 — worked through the rest of the review.

Tier 1

✅ step_text is now async def returning await super().step(...) (fixed in 8aafeae8); README usage rewritten to async + .sync().

Tier 2 alignment

correctness_reward leaking over the wire — now structurally prevented. step() computes the full 8-component vector for the weighted reward, but the surfaced metadata/components omit correctness_reward by default. Trusted measurement code opts back in via expose_correctness=True / SPRINT_EXPOSE_CORRECTNESS=1. So a naive harness that forwards the whole observation to the policy can no longer leak whether the assigned answer is gold; the reward itself is unaffected (correctness still counts toward it). Added a test covering both modes and documented the flag.
parity test rubric.funcs[0] assumption — ✅ added assert aggregate_fn.__name__ == "aggregate_reward" (in 8aafeae8).
framework metadata stripping in serialize_observation — this one genuinely can't be fixed from inside an env; it's a framework change for all envs. The components mirror + round-trip test remain as the stopgap. Flagging for a maintainer to track the framework fix. cc @Darktex

Non-blocking notes

Removed the dead except ImportError: from core... framework-import fallback in client.py/models.py/the server module — confirmed no top-level core package resolves, so it was broken-not-dead. The env-specific ..models fallback (needed for the container's top-level server) is kept.
_parse_result now uses defensive .get(...) (matching the CLI template) instead of data["..."].
Added a comment noting the pinned reward values come from the upstream spec.

13 tests pass; lint/format clean.

Darktex

Alignment Review Report

Summary

A well-structured new environment that follows the typed client/server pattern correctly, includes a thorough test suite, and ships clear documentation. Four items warrant attention: two Tier 1 items (a robustness concern and a docs-wording issue) and two Tier 2 alignment flags. The author's framework serialization claim is verified correct (details below).

Automated Checks

Lint: PASS — diff adds no changes under src/ or tests/ reachable by the repo lint pipeline; env ships its own uv.lock.
Debug code: CLEAN — no print/breakpoint/pdb in new env files.

Tier 1: Fixes / Items to Resolve

client.py _parse_result swallows malformed payloads. obs_data = dict(data.get("observation") or {}) silently constructs an empty AdvocacyObservation if the server returns "observation": null or omits the key (e.g. an error response). Prefer data["observation"] (let KeyError propagate) or raise a ValueError with a diagnostic message.
Docs vs. default weights for correctness. correctness_reward is computed locally (1.0 if gold else 0.0) but the default SPRINT_WEIGHTS weights it at 0.0, so it contributes nothing to reward unless the caller opts in. The README/docs wording around correctness "always counting toward the weighted reward" is misleading given the default zero weight. Clarify the intended semantics in the README/docs.
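The stricter `_parse_result` behaviour asked for in the first item above could look like the following. This is a sketch under assumptions: the helper name `extract_observation` and the error wording are hypothetical, and the payload is taken to be a plain dict with an `"observation"` key as the review describes.

```python
from typing import Any


def extract_observation(data: dict[str, Any]) -> dict[str, Any]:
    """Return the observation payload, or fail with a diagnostic message.

    A missing key and an explicit null are both rejected, so an error response
    cannot be mistaken for an empty observation.
    """
    obs = data.get("observation")
    if not isinstance(obs, dict):
        raise ValueError(
            "Malformed step response: expected an 'observation' object, "
            f"got {obs!r} (top-level keys: {sorted(data)})"
        )
    return dict(obs)
```

Including the top-level keys in the message makes it easier to tell an error envelope apart from a truncated response when the failure shows up in logs.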

Tier 2: Alignment Discussion

ALIGNMENT FLAG: Reward formula delegated to external PyPI package.

Principle at stake: "Rewards inside environment" (RFC 002, PRINCIPLES.md, INVARIANTS.md).
Concern: The primary reward signal aggregate_reward is imported verbatim from sophistry-bench-sprint on PyPI. The invariant requires reward computation to stay inside the environment boundary. Importing from a third-party package means (a) the formula can change independently of this repo (a future patch release could silently alter reward semantics), and (b) the reward logic isn't reviewable within OpenEnv's codebase. The version pin <0.2.0 and the parity test (test_aggregate_matches_canonical_verifiers_reward) partially mitigate drift, but this needs an explicit decision: is a pinned external scoring package an approved exception to RFC 002, or should the formula be vendored?

ALIGNMENT FLAG: Non-MCP typed client pattern.

Principle at stake: "MCP as universal standard" (RFC 003, PRINCIPLES.md).
Concern: The env uses the typed EnvClient[ActT, ObsT, StateT] Gym-style pattern rather than MCP tool-calling. This pattern exists elsewhere in the repo (grid_world_env, maze_env), so it is not unprecedented, but it's worth confirming the typed pattern is the approved alternative for single-turn non-interactive environments.

Serialization Claim Assessment — VERIFIED REAL

The author's claim is correct and verified. src/openenv/core/env_server/serialization.py (serialize_observation) explicitly does:

obs_dict = observation.model_dump(exclude={"reward", "done", "metadata"})

The base Observation.metadata dict is excluded from the wire payload — this is a real framework-level issue, not a quirk of this env. Any environment that populates observation.metadata and sends it over HTTP silently loses that data (in-process tests don't catch it because they never serialize). The workaround — mirroring reward components into a declared components: dict[str, float] field that survives serialization as a normal subclass field, guarded by a round-trip test — is sound.
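The drop and the workaround can be illustrated with standard-library dataclasses standing in for the Pydantic models. Everything here is a simplified assumption: the class bodies, the wire shape (`observation`/`reward`/`done` keys), and `parse_result` are illustrative, and only the excluded key set mirrors the quoted `model_dump` call.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Observation:
    """Stand-in for the base observation (not the real Pydantic model)."""
    reward: float = 0.0
    done: bool = False
    metadata: dict = field(default_factory=dict)


@dataclass
class AdvocacyObservation(Observation):
    # Declared subclass field: survives serialization as ordinary data.
    components: dict = field(default_factory=dict)


EXCLUDED = {"reward", "done", "metadata"}  # same keys as the quoted exclude set


def serialize_observation(obs: Observation) -> dict:
    payload = {k: v for k, v in asdict(obs).items() if k not in EXCLUDED}
    return {"observation": payload, "reward": obs.reward, "done": obs.done}


def parse_result(data: dict) -> AdvocacyObservation:
    # Client side: rebuild metadata from the mirrored components field.
    components = dict(data["observation"].get("components", {}))
    return AdvocacyObservation(
        reward=data["reward"],
        done=data["done"],
        metadata=dict(components),
        components=components,
    )
```

A round-trip check (serialize, then parse) is what catches the drop; an in-process test that inspects the observation directly never exercises the exclusion.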

Recommendation: Please open (or link) a framework-level issue tracking the metadata exclusion in serialize_observation, so future environment authors aren't caught by the same silent drop. The framework should arguably preserve metadata rather than requiring per-env workarounds.

Verdict: comment — solid env; address the _parse_result robustness fix and docs wording, and the two alignment flags / framework issue are for maintainer decision.

Automated review by Claude Code | Learn more

…ctness weighting

- client.py _parse_result now raises a diagnostic ValueError when the response has no/`null` `observation`, instead of silently building an empty observation (reverses the over-defensive `.get` from the prior round, per review). Adds a regression test for the missing/null cases.
- Clarify (README + __init__ comment) that SPRINT_EXPOSE_CORRECTNESS controls only whether correctness_reward is *surfaced*, not its weighting: correctness affects `reward` only via its SPRINT_WEIGHTS entry, which is 0 by default. The previous "always counts toward the weighted reward" wording was misleading.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

acharyaanusha · 2026-06-13T19:14:35Z

Pushed 01d96f91.

Tier 1

_parse_result robustness — reversed the prior round's over-defensive .get: it now raises a diagnostic ValueError when the response has a missing/null observation, rather than silently building an empty observation. Added a regression test for both cases. (Heads-up that this is the opposite of an earlier automated review's suggestion to use .get("observation", {}) — agreed the strict version is correct.)
Docs vs default correctness weight — fixed the misleading "always counts toward the weighted reward" wording. SPRINT_EXPOSE_CORRECTNESS controls only whether correctness_reward is surfaced; it affects reward only via its SPRINT_WEIGHTS entry, which is 0 by default. Updated both the README row and the __init__ comment.

Tier 2 (maintainer decisions)

Reward formula imported from external PyPI package vs RFC 002 — worth flagging the history here: an earlier review flagged the inline-reproduced formula as a drift risk and asked me to import it instead, so I exported aggregate_reward from sophistry-bench-sprint 0.1.6 and now import it. This round's review flags the import as a "rewards outside the env" concern. The two pulls are in tension and only a maintainer can set the policy. Current mitigations: <0.2.0 pin + the 1e-9 parity test. I'm happy to vendor the formula inline (it's a one-liner: (claim_count_cliff + citation_grounding) / 2) while keeping the parity test against the package — that satisfies RFC 002 (reward logic inside the env) and guards drift. Say the word and I'll switch it.
Non-MCP typed client — matches grid_world_env / maze_env; confirming it's the approved pattern for single-turn non-interactive envs is a maintainer call.

Serialization (metadata stripping) — verified real, as you noted. There's already an open framework issue tracking it: #616 ("serialize_observation() silently drops Observation.metadata"). The env keeps its components mirror + round-trip test as the stopgap until that's fixed framework-side.

14 tests pass; lint/format clean.

acharyaanusha · 2026-06-13T19:30:57Z

On the reward-formula-from-external-package flag (RFC 002) — decision: keep importing it from sophistry-bench-sprint, rather than vendoring. Reasoning:

The env already imports its entire scoring surface from the package, not just the aggregate: all four canaries (alternation_canary, starts_with_canary, length_band_canary, template_echo_canary), both parsers (parse_claims/parse_citations), and the dataset builder. The canaries are reward components. Vendoring only aggregate_reward would be inconsistent (the rest of the reward still comes from the package); vendoring the whole scoring module to be consistent would reintroduce the exact drift/duplication the canonical-package design — and an earlier review — set out to remove.
Precedent: reasoning_gym_env imports the external reasoning_gym PyPI package for its task/scoring logic. Importing a pinned scoring library is already an established pattern here.
RFC 002 is satisfied in substance: the reward is computed deterministically inside step() server-side; the package is pinned <0.2.0, guarded by the 1e-9 parity test (test_aggregate_matches_canonical_verifiers_reward), and is open-source and maintained by the same author as this env.

Darktex

Alignment Review Report

Automated Checks

Debug code: CLEAN — no print, breakpoint, pdb, TODO, or FIXME statements found in added code.
Lint: Manual scan finds no obvious ruff/usort violations. The try/except import pattern in server/app.py and server/sophistry_bench_sprint_environment.py follows the established convention for dual in-repo/standalone contexts (matches echo_env, websearch_env, etc.).

Tier 1: Fixes Required

envs/sophistry_bench_sprint_env/server/sophistry_bench_sprint_environment.py — _weights_from_env() calls float(p) in a list comprehension with no error handling. A malformed SPRINT_WEIGHTS value (e.g., SPRINT_WEIGHTS=1,two,3,...) raises a bare ValueError with no contextual message. Same issue for int(os.getenv("SPRINT_N_ITEMS", "50")), int(os.getenv("SPRINT_PASSAGE_CHARS", "2000")), and int(os.getenv("SPRINT_SEED", "0")) in __init__. These are container-boot-time failures, so not silently dangerous, but the errors will be confusing. Wrap each in a try/except ValueError and re-raise with a message identifying the env var name and the invalid value.

Tier 2: Alignment Discussion

ALIGNMENT FLAG: Framework metadata stripping is worked around inside the env rather than fixed at the source.

Principle at stake: "Be hands-on" and "Production-readiness from day one" (PRINCIPLES.md §Core Principles).
The concern: The PR correctly identifies that core/env_server/serialization.py::serialize_observation explicitly excludes the base Observation.metadata field from the wire payload (confirmed at src/openenv/core/env_server/serialization.py:158). The env works around this by mirroring reward components into a declared subclass field (components) and restoring metadata client-side. This workaround is technically sound and is covered by regression tests, but it means every future env that puts meaningful data in metadata will silently lose it over HTTP and need the same pattern. The PR author explicitly flags this and asks whether the framework should preserve metadata instead — that question should be answered before this workaround becomes a load-bearing pattern. At minimum, a GitHub issue should be opened to track the framework fix.
Suggested reviewer: @Darktex

ALIGNMENT FLAG: SPRINT_EXPOSE_CORRECTNESS surfaces hidden ground truth in the wire observation's metadata/components.

Principle at stake: "Rewards inside environment" (PRINCIPLES.md, RFC 002); and the env's stated design goal of measuring reward hacking without leaking the oracle signal to the policy.
The concern: When SPRINT_EXPOSE_CORRECTNESS=1, correctness_reward appears in StepResult.observation.metadata/components. The env docs warn against feeding these back to the policy, but this is a convention-based guard, not an enforced one. Any harness that logs StepResult verbosely or passes structured observations back to the model will leak the oracle. The default-off behavior is correct. What's worth discussing: should correctness_reward be exposed at all via the standard observation path, or should it be available only through a separate privileged side-channel (e.g., a dedicated /eval endpoint or a state property accessible only to orchestration)? The current design puts the correctness guard on the harness author rather than the environment boundary.
Suggested reviewer: @Darktex

ALIGNMENT FLAG: parity test assumes rubric.funcs[0] is aggregate_reward without asserting it.

The concern: the parity test does aggregate_fn = rubric.funcs[0] with a comment that index 0 is aggregate_reward, but never asserts it. If the upstream package reorders functions within the pinned range, the parity test silently compares the wrong function. Add a free, unambiguous guard: assert aggregate_fn.__name__ == "aggregate_reward".
Suggested reviewer: @acharyaanusha (fix inline).

What Looks Good

Correct Environment[AdvocacyAction, AdvocacyObservation, State] generics throughout.
Clean client-server separation: client.py imports only from openenv.core.* and models.py. No imports from the env's own server/ directory.
reset/step/state are correctly exposed only via the Gym-like WebSocket API, not as MCP tools. No MCP tools defined anywhere in this env.
Reward computation is entirely inside the environment. No external reward augmentation.
The _current_is_gold ground truth is correctly kept private; only current_passage (already visible in the reset prompt) is exposed as a property.
Anti-drift parity test locks in the aggregate_reward formula against the upstream package to 1e-9.
Wire serialization round-trip regression tests are exactly the right thing to have here given the metadata-stripping workaround.

Summary

1 mechanical issue to fix (env var parsing error messages).
2 alignment points for human review (framework metadata stripping design debt; oracle leakage via SPRINT_EXPOSE_CORRECTNESS observation path), plus a free parity-test assertion hardening.

Automated review by Claude Code | Learn more

Darktex

Alignment Review Report

Automated Checks

Lint: SKIPPED — branch not checked out; the new env directory does not exist at HEAD on main. Code reviewed manually from the diff.
Debug code: CLEAN — no print, breakpoint, or bare TODO statements in the new source files.

Tier 1: Fixes Required

envs/sophistry_bench_sprint_env/server/Dockerfile — Wrong base image registry. The Dockerfile defaults BASE_IMAGE to ghcr.io/huggingface/openenv-base:latest. Every other environment (echo_env, reasoning_gym_env, finrl_env, chess_env, …) and the canonical scaffold template at src/openenv/cli/templates/openenv_env/server/Dockerfile use ghcr.io/meta-pytorch/openenv-base:latest. Using the wrong registry means openenv build sophistry_bench_sprint_env pulls a different (possibly non-existent/stale) base and silently produces a container with a different base than the rest of the fleet. Fix: change the ARG BASE_IMAGE= default to ghcr.io/meta-pytorch/openenv-base:latest.

Tier 2: Alignment Discussion

ALIGNMENT FLAG: Framework serialization workaround locked in env code rather than fixed in the framework

Principle at stake: INVARIANTS.md — Pydantic wire types must round-trip cleanly; PRINCIPLES.md — type safety across the wire.
The concern: serialize_observation in src/openenv/core/env_server/serialization.py intentionally excludes the base Observation.metadata field. This PR works around that by mirroring components into a declared subclass field (components) and then reconstructing metadata client-side in _parse_result. The workaround is sound and covered by two regression tests, but it sets a precedent: any future env wanting structured metadata must replicate the mirror-and-reconstruct pattern, and if the framework later stops stripping metadata, this env needs updating. The team should decide: (a) fix serialize_observation to preserve the base metadata field, or (b) officially document the mirror-field workaround as the canonical pattern.
Suggested reviewer: @Darktex

ALIGNMENT FLAG: Reward-component naming asymmetry could cause misconfigured experiments

Principle at stake: PRINCIPLES.md — "Rewards inside environment" and domain-knowledge encapsulation.
The concern: _COMPONENT_KEYS[0] is aggregate_reward (the cliff-scaled, citation-grounded form score, which does NOT include correctness) and _COMPONENT_KEYS[1] is correctness_reward (weight 0 by default). The two are orthogonal, but the names suggest "aggregate" subsumes "correctness," which could lead a researcher to misconfigure SPRINT_WEIGHTS. A clearer name for index 0 (e.g. form_reward / advocacy_form_reward) would reduce that risk. Documentation/naming question, not a code bug.
Suggested reviewer: @Darktex

Summary

1 mechanical issue to fix (wrong Docker base image registry — blocks a correct container build).
2 alignment points for human review (framework metadata serialization strategy; reward-component naming clarity).

Otherwise, the implementation is solid:

Rewards are computed entirely inside the environment (SophistryBenchSprintEnvironment.step); all 8 components computed server-side and the weighted scalar returned as StepResult.reward. No external reward augmentation. ✔ "Rewards in environment".
client.py imports only from openenv.core.* and .models; nothing from server/. ✔ Client-server separation.
reset()/step() are owned by the orchestration harness via EnvClient, not exposed to the evaluated agent. ✔ Agent isolation.
The non-MCP typed EnvClient pattern is already established in the repo (reasoning_gym_env, finrl_env, atari_env) and is appropriate for single-step RL-style environments.
Environment[ActT, ObsT, StateT] generics are correctly used; SUPPORTS_CONCURRENT_SESSIONS correctly defaults to False given per-instance mutable state.
Anti-drift parity test + two wire-serialization round-trip regression tests are good additions.
Upstream PyPI package properly pinned (>=0.1.6,<0.2.0) with an explanatory comment.
correctness_reward withheld from the wire by default prevents accidental ground-truth leakage to the policy.

Automated review by Claude Code | Learn more

…vars

A bad SPRINT_WEIGHTS / SPRINT_N_ITEMS / SPRINT_PASSAGE_CHARS / SPRINT_SEED value raised a bare ValueError with no indication of which var was wrong. Add an `_int_env` helper and wrap the SPRINT_WEIGHTS float parse so each re-raises a message naming the env var and the offending value. Adds a regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

acharyaanusha · 2026-06-16T01:40:33Z

Pushed e05c72dc. Covering both recent reviews:

Fixed

Env-var parsing errors — a malformed SPRINT_WEIGHTS / SPRINT_N_ITEMS / SPRINT_PASSAGE_CHARS / SPRINT_SEED previously raised a bare ValueError. Added an _int_env helper and wrapped the SPRINT_WEIGHTS float parse so each re-raises a message naming the var and the bad value. Added a regression test.

Already addressed (flagged as missing, but present since 8aafeae8)

Parity test rubric.funcs[0] assertion — assert aggregate_fn.__name__ == "aggregate_reward" is already in the test (with a descriptive failure message). No change needed.

Dockerfile base image — moved to meta-pytorch (8f998911)
On investigation the repo is mid-migration and inconsistent across two layers: the Dockerfile ARG BASE_IMAGE defaults are still ghcr.io/huggingface/openenv-base:latest (all sibling Dockerfiles + the scaffold template), while the openenv push --base-image examples in several env doc pages (maze, reasoning_gym, textarena, websearch) already point at ghcr.io/meta-pytorch/openenv-base:latest. Rather than sit on the soon-to-be-stale registry, we moved ahead of the migration: server/Dockerfile now defaults to ghcr.io/meta-pytorch/openenv-base:latest. Verified both images are currently published and pullable (docker manifest inspect succeeds for each), so the build is unaffected today.

Maintainer / design items (unchanged)

Framework metadata stripping — tracked by existing framework issue #616 ("serialize_observation() silently drops Observation.metadata"); the env keeps its components mirror + round-trip tests as the stopgap.
SPRINT_EXPOSE_CORRECTNESS exposing ground truth via the observation path — default-off already prevents accidental leakage. Moving it to a privileged side-channel (a state-only field) is a larger design change; happy to do it if you'd prefer the boundary enforced rather than convention-guarded — your call.
Reward-component naming (aggregate_reward vs form_reward) — aggregate_reward is the upstream package's public name (renaming would be a package API break + would churn the parity test's name assertion). I can add a clarifying note in the README/SPRINT_WEIGHTS docs that index 0 is the form score (cliff × grounding) and is orthogonal to correctness_reward, if that's preferred over a rename.
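
The components mirror listed under framework metadata stripping can be illustrated with a small stand-alone sketch. The classes and functions below are stand-ins written with stdlib dataclasses, not the real OpenEnv types; the only behavior taken from this PR is that `metadata` is dropped on the wire, a declared `components` field survives, and the client restores `metadata` from it. The component value is made up for the example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class Observation:
    """Stand-in for the framework base class (hypothetical, not OpenEnv's)."""
    reward: float = 0.0
    done: bool = False
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class AdvocacyObservation(Observation):
    # Declared subclass field: this is what survives serialization.
    components: Dict[str, float] = field(default_factory=dict)


def serialize_observation(obs: Observation) -> str:
    """Mimic the reported behavior: base metadata is excluded from the payload."""
    payload = asdict(obs)
    payload.pop("metadata", None)
    return json.dumps(payload)


def deserialize_observation(wire: str) -> AdvocacyObservation:
    """Client side: rebuild the observation and restore metadata from the mirror."""
    obs = AdvocacyObservation(**json.loads(wire))
    obs.metadata = dict(obs.components)
    return obs


# Round trip: the server mirrors components into the declared field.
scores = {"claim_count_cliff": 1.0}  # illustrative value only
sent = AdvocacyObservation(reward=0.5, done=True, metadata=scores, components=scores)
wire = serialize_observation(sent)
received = deserialize_observation(wire)
```

An in-process test that compares `sent.metadata` directly would pass even without the mirror, which is why the round-trip tests have to go through the serialized payload.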

15 tests pass; lint/format clean.

cc: @Darktex ready for a re-review

Move the Dockerfile ARG BASE_IMAGE default from ghcr.io/huggingface/openenv-base to ghcr.io/meta-pytorch/openenv-base ahead of the repo's huggingface->meta-pytorch migration (the org has already moved; the per-env `openenv push --base-image` docs already point at meta-pytorch). Both base images are currently published and pullable, so the build is unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

burtenshaw · 2026-06-18T06:56:44Z

Thanks for this contribution! Very cool environment. Next, it would be cool to see this env deployed to Hugging Face, along with a working example of training or inference. But that can be in a subsequent PR. Let's go!

acharyaanusha · 2026-06-19T05:38:33Z

@burtenshaw Yes, I'll work on that once this is merged. Thank you!

acharyaanusha and others added 11 commits June 10, 2026 19:13

feat(sophistry_bench_sprint_env): scaffold OpenEnv package + vendor sprint wheel

ebf13e5

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(sophistry_bench_sprint_env): add AdvocacyAction/AdvocacyObservation models

76ed010

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(sophistry_bench_sprint_env): typed HTTP client

21b7952

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(sophistry_bench_sprint_env): environment construction + reset()

83465f0

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(sophistry_bench_sprint_env): step() scoring with canonical reward parity

b6d0355

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(sophistry_bench_sprint_env): FastAPI app + Dockerfile

4ce6be8

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

docs(sophistry_bench_sprint_env): README + build/usage

d337a1d

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(sophistry_bench_sprint_env): error path survives wire + serialization regression test

230269a

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

docs+test(sophistry_bench_sprint_env): correct image tag; error wire-survival test

b419779

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Darktex requested changes Jun 11, 2026

acharyaanusha requested a review from Darktex June 11, 2026 18:06

Darktex reviewed Jun 11, 2026

Darktex reviewed Jun 12, 2026

acharyaanusha requested a review from Darktex June 12, 2026 05:47

Merge branch 'main' into feature/sophistry_bench_sprint_env

71f51c5

Darktex requested changes Jun 13, 2026

acharyaanusha requested a review from Darktex June 13, 2026 02:47

Darktex requested changes Jun 13, 2026

acharyaanusha and others added 2 commits June 13, 2026 09:08

acharyaanusha requested a review from Darktex June 13, 2026 16:35

Darktex reviewed Jun 13, 2026

acharyaanusha requested a review from Darktex June 13, 2026 19:32

Merge branch 'main' into feature/sophistry_bench_sprint_env

a620fd2

Darktex reviewed Jun 15, 2026

Darktex requested changes Jun 16, 2026

acharyaanusha requested a review from Darktex June 16, 2026 01:46

Merge branch 'main' into feature/sophistry_bench_sprint_env

6e3e00c

Merge upstream/main into feature/sophistry_bench_sprint_env

b33a375
