A CLI ETL tool you drive with natural language. Load a CSV, type "normalize phone numbers" or "drop duplicate emails," and the LLM rewrites a small JSON spec that the runtime replays against the data. The full motivation is in spec/rationale.md; the wire-protocol idea — keeping per-turn token cost constant regardless of table size — is in spec/behavior.md.
Live: zsvedic.github.io/TamedTable — landing page · Open Web App — runs directly in the browser, no install needed.
Organized by lifecycle, not by file type:
TamedTable/                  root: README.md, MAP.md (feature + code navigation), LICENSE, .gitignore
├── marketing/               everything the public sees + the shared design base; never part of src/
│   ├── tokens.json          design token master: colors, typography, spacing
│   ├── brand/               marks, favicons, lockups, brand.md
│   ├── claude-design-app/   in-browser design canvas (scratch JSX + generated tokens.jsx)
│   ├── illustrations/       SVG feature tiles + gallery
│   └── web/                 the landing page that ships to the site root
├── process/                 how the project is built; never deployed
│   ├── journal/             historic status reports
│   ├── prompts/             reusable session starters (see AGENTS.md)
│   └── repo-tracking/       commit-size script + chart generator
├── spec/                    the contract: human-authored / human-blessed
│   ├── README.md            spec index + test-fixture naming
│   ├── rationale.md         what TamedTable is and why
│   ├── behavior.md          what the user sees + what the system does (API-free)
│   ├── code-contract.md     types, signatures, libraries, env vars, exit codes
│   ├── prompt-app-edit.md   the three LLM prompts (imported by the runtime at init)
│   ├── writing-style.md     writing style for every markdown file in the repo
│   ├── packages/            per-package specs; mirrors src/packages/; rules in its README.md
│   └── test-cases/          Gherkin features + -input/-expected/.flow fixtures
├── src/                     the implementation: self-contained, deployable unit
│   ├── package.json, …      build config; run every bun command from here
│   ├── node_modules/        gitignored
│   ├── packages/            app (core/headless/cli/web) + library packages; regenerable from spec/
│   └── tests/               app step definitions; regenerable from Gherkin
└── temp/                    scratch: test outputs, charts, logs; gitignored
behavior.md and code-contract.md are section-aligned twins: behavior.md describes what happens in plain English (no types, no library names); code-contract.md carries the matching types, signatures, env vars, and exit codes. Each section in one links to the same section in the other.
You need bun and an Anthropic API key.
- Install the project's libraries, a one-time step you repeat only if the dependencies change: cd src && bun install
- Put your API key in a .env file at the repo root (the loader walks up from src/ to find it): ANTHROPIC_API_KEY=sk-ant-...
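The "walks up from src/" lookup can be pictured with a short sketch. This is not the project's loader: the function name `findEnvFile` and the temporary directory layout are assumptions made for illustration, and only the idea of climbing parent directories until a .env file appears comes from the text above.

```typescript
import { existsSync, mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join, resolve } from "node:path";

// Hypothetical helper: climb from startDir toward the filesystem root and
// return the path of the first .env file found, or undefined if there is none.
function findEnvFile(startDir: string): string | undefined {
  let dir = resolve(startDir);
  while (true) {
    const candidate = join(dir, ".env");
    if (existsSync(candidate)) return candidate;
    const parent = dirname(dir);
    if (parent === dir) return undefined; // reached the root without a match
    dir = parent;
  }
}

// Demo: a throwaway repo layout with .env at the root and a nested src/.
const repoRoot = mkdtempSync(join(tmpdir(), "tamedtable-"));
mkdirSync(join(repoRoot, "src"));
writeFileSync(join(repoRoot, ".env"), "ANTHROPIC_API_KEY=sk-ant-...\n");

const found = findEnvFile(join(repoRoot, "src"));
console.log(found === join(repoRoot, ".env")); // prints: true
```

Starting the search in src/ and stopping at the first match is what lets every bun command run from src/ while the key stays at the repo root.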
Optional env vars and defaults if you omit them:
| Var | Default | What it does |
|---|---|---|
| TAMEDTABLE_MODEL | claude-sonnet-4-6 | Model that writes the spec patch each turn. |
| TAMEDTABLE_CELL_MODEL | claude-sonnet-4-5 | Secondary model that fills in per-row LLM cells. Override with claude-haiku-4-5 for cheaper/faster runs at some cost in per-cell fidelity. Must share the primary model's provider. |
| TAMEDTABLE_RPM | 40 | Per-process request-per-minute cap. The Anthropic org-wide ceiling is 50. |
| TAMEDTABLE_BATCH_SIZE | 20 | Rows packed into a single LLM request. The model replies with a JSON array; on a parse failure the runner falls back to per-row calls for that batch. Set to 1 to disable batching. |
| TAMEDTABLE_CHUNK_SIZE | 5 | LLM requests that fire concurrently. Orthogonal to batch size: total parallel rows = batch × chunk. |
| TAMEDTABLE_DEBUG | on | On by default: the REPL prints a per-turn debug block after a failed request (indented, dimmed, capped at 20 lines). Set to 0, false, or off to disable. |
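To make the defaults and the batch × chunk arithmetic concrete, here is a minimal sketch of how such settings could be read. Only the variable names, defaults, and the "fall back to per-row calls on a parse failure" rule come from the table; the function and type names are hypothetical, and the case-insensitive debug parsing and the array-length check are assumptions, not documented behavior.

```typescript
// Hypothetical settings reader; every name here is illustrative.
type Env = Record<string, string | undefined>;

interface Settings {
  rpm: number;
  batchSize: number;
  chunkSize: number;
  debug: boolean;
}

// Parse a positive integer, falling back to the default on anything else.
function positiveInt(raw: string | undefined, fallback: number): number {
  const n = Number(raw);
  return raw !== undefined && Number.isInteger(n) && n > 0 ? n : fallback;
}

function readSettings(env: Env): Settings {
  // Assumption: the off-values are matched case-insensitively.
  const debugRaw = (env["TAMEDTABLE_DEBUG"] ?? "on").trim().toLowerCase();
  return {
    rpm: positiveInt(env["TAMEDTABLE_RPM"], 40),
    batchSize: positiveInt(env["TAMEDTABLE_BATCH_SIZE"], 20),
    chunkSize: positiveInt(env["TAMEDTABLE_CHUNK_SIZE"], 5),
    debug: !["0", "false", "off"].includes(debugRaw),
  };
}

// Rows in flight at once: rows per request times concurrent requests.
function parallelRows(settings: Settings): number {
  return settings.batchSize * settings.chunkSize;
}

// Parse a batched reply. Returns null when the reply is not a JSON array of
// the expected length, which is the caller's cue to retry row by row.
// The length check is an assumption; the table only mentions parse failures.
function parseBatchReply(reply: string, expected: number): string[] | null {
  try {
    const parsed: unknown = JSON.parse(reply);
    if (Array.isArray(parsed) && parsed.length === expected) {
      return parsed.map((cell) => String(cell));
    }
  } catch {
    // Malformed JSON: fall through to the null return below.
  }
  return null;
}

const defaults = readSettings({});
console.log(parallelRows(defaults)); // prints: 100 (20 rows × 5 requests)
```

With the defaults, up to 100 rows are being processed at any moment, while the request rate itself stays governed by TAMEDTABLE_RPM.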
Interactive REPL: load a CSV, then type natural-language requests. REPL commands use a : prefix (/ is intercepted by Claude Code and other CLI agents):
- :help lists commands
- :undo reverts the last patch
- :save <out.jsonl> writes current rows to disk
- :save-flow <out.flow> saves the current spec for later replay
- :save-py <out.py> exports the flow as a standalone Python script
- :reorder <cols> sets the column order for the table view and saved files
- :exit (or bare exit) leaves
bun src/packages/cli/index.ts spec/test-cases/datanorm-input.csv
Email | Phone | Country
alice@example.com | 555-123-4567 | usa
...
> normalize phone numbers
running … row 1: Phone "555-123-4567" → "+15551234567"
Email | Phone | Country
alice@example.com | +15551234567 | usa
...
> exit
Ctrl-C cancels an in-progress request and rolls back the half-applied transformation.
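One way to picture the cancel-and-roll-back behavior is a snapshot taken before the transformation starts and restored if an abort signal fires part-way through. The sketch below is an assumption about the general technique, not the engine's code: `Row`, `Table`, and `applyWithRollback` are hypothetical names, and the per-row step is synchronous here for brevity where real per-row work would be asynchronous.

```typescript
// Hypothetical row and table shapes, for illustration only.
interface Row {
  Email: string;
  Phone: string;
}

interface Table {
  rows: Row[];
}

// Apply `step` to every row in place. If the signal is aborted part-way
// through, restore the snapshot so no half-applied rows remain.
// Returns true when the step completed, false when it was rolled back.
function applyWithRollback(
  table: Table,
  step: (row: Row) => void,
  signal: AbortSignal,
): boolean {
  if (signal.aborted) return false;
  const snapshot = table.rows.map((row) => ({ ...row })); // copy of each flat row
  for (const row of table.rows) {
    step(row);
    if (signal.aborted) {
      table.rows = snapshot;
      return false;
    }
  }
  return true;
}

// Demo: the abort fires after the first row has already been rewritten.
const controller = new AbortController();
const table: Table = {
  rows: [
    { Email: "alice@example.com", Phone: "555-123-4567" },
    { Email: "bob@example.com", Phone: "555-987-6543" },
  ],
};

const completed = applyWithRollback(
  table,
  (row) => {
    row.Phone = "+1" + row.Phone.replace(/-/g, "");
    controller.abort(); // stand-in for the user pressing Ctrl-C
  },
  controller.signal,
);

console.log(completed, table.rows[0]?.Phone); // prints: false 555-123-4567
```

In a CLI, the controller could plausibly be wired to the interrupt with `process.on("SIGINT", () => controller.abort())`; whether TamedTable does it this way is not stated in this document.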
Batch mode — replay a saved .flow against a CSV with no LLM call:
bun src/packages/cli/index.ts execute spec/test-cases/datanorm.flow \
--input spec/test-cases/datanorm-input.csv \
--output temp/out.jsonl
Exit codes are documented in spec/code-contract.md.
A browser front-end runs on the same engine the CLI drives.
Day to day, you need just one command. From the web package's folder:
cd src/packages/web
bun run dev
bun run dev starts a local server with live reload and prints a URL (default http://localhost:5173); open it in your browser. Leave the command running while you use the app — Ctrl-C stops it.
Here is every bun command the web UI uses, and when you need each:
| Command | Run it from | When |
|---|---|---|
| bun install | src/ | Once during Setup; again only if dependencies change. |
| bun run dev | src/packages/web/ | Every time you want to use the web UI. |
| bun run build | src/packages/web/ | Only to deploy: compiles the UI into static files in dist/ for hosting on any web server. Normal use never needs it. |
Why two directories? bun install installs libraries for the whole project at once, so it runs from the project root (src/); bun run dev and bun run build belong to the web package, so they run from that package's folder (src/packages/web/).
Once the page loads, click Settings and paste your Anthropic API key — the web UI reads the key from a per-tab settings panel, not from .env. The Settings panel also picks which Anthropic model drives requests. Then click Open URL or sample… to load a CSV or JSONL from a URL (or pick one of the bundled samples), or use its dropdown for Open local… to load a file from your computer. Type a request in the chat sidebar and watch cells stream in. Click a cell to select it, double-click to edit it, drag a column header to reorder; Undo, Save data, and Save flow mirror the CLI's :undo / :save / :save-flow. The table shows 20 rows per page with a pager along the bottom, and a status footer reports the selected cell and whether the app is idle, running, or saved.
There is no server: the web UI calls Anthropic directly from the browser through the same SDK the CLI uses. File input/output uses the File System Access API where the browser supports it, with a download/upload fallback elsewhere.
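The "File System Access API where supported, download fallback elsewhere" pattern usually comes down to a feature check. The sketch below assumes that shape; `pickSaveStrategy`, `saveText`, and the injected `download` callback are hypothetical names, and the small structural interfaces stand in for the browser's own types so the example does not depend on DOM typings. `showSaveFilePicker`, `createWritable`, `write`, and `close` are the real browser API calls.

```typescript
// Minimal structural types mirroring the parts of the File System Access API
// used here, so the sketch compiles without DOM typings.
interface WritableLike {
  write(data: string): Promise<void>;
  close(): Promise<void>;
}

interface FileHandleLike {
  createWritable(): Promise<WritableLike>;
}

interface PickerHost {
  showSaveFilePicker?: (options?: { suggestedName?: string }) => Promise<FileHandleLike>;
}

type SaveStrategy = "file-system-access" | "download";

// Feature-detect: use the picker when the host (normally `window`) offers it.
function pickSaveStrategy(host: PickerHost): SaveStrategy {
  return typeof host.showSaveFilePicker === "function" ? "file-system-access" : "download";
}

// Save text through the picker when available, otherwise through the
// caller-supplied download fallback (for example a Blob URL on an <a download>).
async function saveText(
  host: PickerHost,
  name: string,
  text: string,
  download: (name: string, text: string) => void,
): Promise<SaveStrategy> {
  if (typeof host.showSaveFilePicker === "function") {
    const handle = await host.showSaveFilePicker({ suggestedName: name });
    const writable = await handle.createWritable();
    await writable.write(text);
    await writable.close();
    return "file-system-access";
  }
  download(name, text);
  return "download";
}

// Outside a supporting browser there is no picker, so the fallback is chosen.
console.log(pickSaveStrategy({})); // prints: download
```

Passing the host and the fallback in as arguments keeps the decision testable outside a browser; in the page itself the host would simply be `window`.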
Everything runs from src/ — cd src first. (src/ is the self-contained
package: it holds package.json and node_modules, so bun runs there.)
First time on a machine, run bun run setup once. It installs the libraries
and the headless Chromium that the @web profile and test:smoke drive — a
browser binary bun install alone does not fetch.
| Command | Runs |
|---|---|
| bun run test | All tests: the bun unit tests plus all three Cucumber profiles. Offline, no API key. |
| bun run test:unit | The bun unit tests only. |
| bun run test:headless | The Cucumber @headless profile only. |
| bun run test:cli | The Cucumber @cli profile only. |
| bun run test:web | The Cucumber @web profile only. Drives the demos in headless Chromium, so it needs the browser from bun run setup (or bunx playwright install chromium). |
| bun run test:smoke | The module-demo smoke test: builds each demo with the deploy workflow's flags and drives it in headless Chromium. Needs a Chromium binary (bunx playwright install chromium); not part of bun run test. |
| bun run test:record | Re-records the cassettes (see below) against the live Anthropic API. |
| bun run typecheck | Type-check only: tsc --noEmit for the engine packages and the web package. |
Run one feature with TAMEDTABLE_FEATURES, e.g. TAMEDTABLE_FEATURES=validate bun run test.
The Cucumber suite issues real natural-language requests. A live Anthropic call
per scenario takes 7–9 minutes (rate-limited) and needs an API key, so each model
response is recorded once to src/tests/__cassettes__/<feature>.json and
replayed from disk on every later run. The recordings are committed to git;
bun run test replays them by default — seconds, offline, no key.
Each request is fingerprinted over its full prompt, so changing a prompt never
matches an old recording: replay fails loudly with "no recording for this request" instead of returning a stale answer. When that happens, or when you add
a scenario, refresh the cassettes and commit the updated files:
bun run test:record # needs ANTHROPIC_API_KEY (see Setup above)
For a live run that ignores the cassettes, set TAMEDTABLE_CASSETTE=off.
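The record-and-replay idea described above can be sketched in a few lines. This is an illustration under assumptions, not the project's cassette code: the `Cassette` shape, the function names, and the choice of SHA-256 as the fingerprint are all hypothetical. Only the behavior (keying on the full prompt, failing loudly on a miss) comes from the text.

```typescript
import { createHash } from "node:crypto";

// Hypothetical cassette shape: fingerprint -> recorded model response.
type Cassette = Record<string, string>;

// Hash the full prompt so that any edit to it produces a different key.
// SHA-256 is an assumption; the document does not name the algorithm.
function fingerprint(prompt: string): string {
  return createHash("sha256").update(prompt, "utf8").digest("hex");
}

function record(cassette: Cassette, prompt: string, response: string): void {
  cassette[fingerprint(prompt)] = response;
}

// Replay never guesses: a prompt with no recording is an error, not a stale hit.
function replay(cassette: Cassette, prompt: string): string {
  const hit = cassette[fingerprint(prompt)];
  if (hit === undefined) throw new Error("no recording for this request");
  return hit;
}

const cassette: Cassette = {};
record(cassette, "normalize phone numbers", "recorded reply");

console.log(replay(cassette, "normalize phone numbers")); // prints: recorded reply

try {
  replay(cassette, "normalize phone numbers, please");
} catch (err) {
  console.log((err as Error).message); // prints: no recording for this request
}
```

Because the key covers the whole prompt, even a one-word change misses the cassette, which is why a prompt edit has to be followed by bun run test:record.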
WoZ (Wizard-of-Oz) and SCRIBE let you iterate TamedTable's behavior interactively without running the implementation. WoZ simulates what TamedTable would do from spec/behavior.md only; when WoZ reveals a gap or surprise, SCRIBE updates the spec.
In a fresh Claude Code session at the repo root:
claude
> @process/prompts/prompt-woz.md
That loads WoZ. Every message you type is independently classified by its first character — no persistent persona switching:
| Prefix | Persona | Use for |
|---|---|---|
| > <note> | SCRIBE | Spec edits: > change the wording of :undo to …, > pin the page size at 20. One-shot: the next message without a > prefix returns to WoZ automatically. |
| anything else | WoZ | Simulate the app's response from spec/behavior.md. |
Visual: WoZ output appears in fenced code blocks (terminal-shaped — that's the simulated TamedTable output). SCRIBE responses appear as markdown blockquotes (every line prefixed with > , mirroring your input prefix).
SCRIBE edits spec/behavior.md (almost always), spec/code-contract.md (only when the API surface changes), or any LLM prompt files the spec references (prompt tuning). It never touches src/, process/journal/, or spec/test-cases/*.feature.
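The prefix rule from the table is simple enough to state as code. In practice the routing is carried out by the prompt inside the Claude Code session, not by a program, so the sketch below is purely illustrative; `routeMessage` and its types are hypothetical names.

```typescript
type Persona = "SCRIBE" | "WoZ";

interface Routed {
  persona: Persona;
  body: string;
}

// Classify each message on its own by its first character: ">" goes to
// SCRIBE, anything else to WoZ. No state carries over between messages.
function routeMessage(message: string): Routed {
  if (message.startsWith(">")) {
    return { persona: "SCRIBE", body: message.slice(1).trim() };
  }
  return { persona: "WoZ", body: message };
}

console.log(routeMessage("> pin the page size at 20").persona); // prints: SCRIBE
console.log(routeMessage("normalize phone numbers").persona); // prints: WoZ
```

Because the function takes nothing but the current message, a SCRIBE note cannot leak into the next turn, which matches the one-shot behavior described in the table.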
- Re-recording cassettes is slow. bun run test replays recorded responses in seconds, but bun run test:record makes a live API call per scenario: 7–9 minutes, most of it spent waiting on the 40 RPM throttle that stays under the 50 RPM org ceiling. Re-record only when a prompt changes.
- Golden-file fragility on LLM cells. Some datanorm scenarios assert byte equality against a frozen JSONL golden. Sonnet and Haiku produce semantically equivalent but not byte-identical outputs for ambiguous inputs (e.g. phone numbers without a country code), and the model's own minor revisions can shift the answer over time. Mismatches on LLM-driven cells aren't necessarily regressions; see the determinism note at the end of spec/behavior.md → Headless.
- CSV and JSONL only. Both load and save; other tabular formats (.xlsx, .parquet) are out of scope until their own scenarios are written.
- No {sql} in the web UI. DuckDB is a native module that cannot run in a browser, so {sql} transformations are unavailable in the web front-end; the CLI and headless library support them in full. "Save data" in the web UI writes JSONL only.