tdd-flow — test-first development by construction
tdd-flow drives a single change test-first: write pseudocode, derive failing tests from it, prove they are red, then implement until green, with a check-in gate after each step. Coverage is built in by construction rather than bolted on afterwards.
Introduced in #604 (part of the test-driven-development capability #593).
Why it exists
/warp-drive already gates on tests — but after implementation: it runs the suite in its testing phase and blocks the commit on red. That catches regressions; it does not give you test-first discipline, and it assumes a suite already exists. tdd-flow adds the test-first path on top of that gate: the test is written and proven to fail before the code exists, so the test is known to actually exercise the new behaviour.
Per the Prime Directive it is built on reuse, not reinvention:
| Concern | Reused from |
|---|---|
| Iterate-until-green loop | loop-primitive (scripts/loop/run.js) |
| "Green" definition | warp-drive's test-command resolution (one source of truth) |
| Check-in level policy | warp-drive's L1/L2/L3 automation conventions |
The flow
```text
pseudocode -> write failing tests -> implement until green -> check-in
    │                  │                       │                 │
checkin.sh       assert-red.sh          loop-primitive      checkin.sh
```
1. Pseudocode → check-in
Write the intended behaviour as plain-language pseudocode, capture it as the first inspectable artifact, then gate:
```bash
scripts/tdd-flow/artifact.sh init                     # start a new run store
scripts/tdd-flow/artifact.sh capture --step pseudocode --stdin <<<'…pseudocode…'
scripts/tdd-flow/checkin.sh --gate after-pseudocode   # resolve the level policy
scripts/tdd-flow/artifact.sh verdict --step pseudocode --decision approve   # record the check-in decision
```
2. Write failing tests → assert red
Translate the pseudocode into tests in the repo's native framework, then prove they fail before writing any implementation. For a non-default language, detect the framework first and let the unit-test-generator agent author idiomatic tests for it (cross-language authoring, #605):
```bash
scripts/tdd-flow/detect-framework.sh   # {language, framework, runner, test_convention}
# unit-test-generator authors the failing tests for the detected framework
scripts/tdd-flow/assert-red.sh         # exit 0 only when the suite is RED
scripts/tdd-flow/checkin.sh --gate after-tests
```
detect-framework.sh maps repo markers (Cargo.toml, go.mod, pyproject.toml, package.json) to the native framework and runner, so authored tests run under the project's existing runner with no bespoke harness. The agent emits idiomatic Rust, Go, Python and JS tests and never writes a vacuous (always-green) test; see the Rust and Go examples.
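A minimal sketch of the kind of marker-to-runner mapping described above, assuming first-match precedence in the order the markers are listed. The function name `detect_framework_sketch`, the framework names, runner commands and test conventions are all illustrative assumptions, not the real script's output.

```bash
# Hypothetical sketch of a marker -> framework mapping. Precedence, field
# values and output shape are assumptions, not detect-framework.sh itself.
detect_framework_sketch() {
  local dir="${1:-.}" language framework runner convention
  if [ -f "$dir/Cargo.toml" ]; then
    language=rust; framework=libtest; runner="cargo test"; convention='tests/*.rs'
  elif [ -f "$dir/go.mod" ]; then
    language=go; framework=testing; runner="go test ./..."; convention='*_test.go'
  elif [ -f "$dir/pyproject.toml" ]; then
    language=python; framework=pytest; runner=pytest; convention='tests/test_*.py'
  elif [ -f "$dir/package.json" ]; then
    language=javascript; framework=jest; runner="npx jest"; convention='*.test.js'
  else
    return 1   # no recognised marker: nothing to map
  fi
  # Emit the four fields named in the usage comment as one JSON object.
  printf '{"language":"%s","framework":"%s","runner":"%s","test_convention":"%s"}\n' \
    "$language" "$framework" "$runner" "$convention"
}
```

Running `detect_framework_sketch path/to/repo` in a Go module would print a single JSON object whose `runner` is `go test ./...`; a repo with no marker yields exit 1 and no output.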
assert-red.sh then exits 0 when the tests fail (precondition satisfied), 1 when they already pass (test-first violated: the tests assert nothing new), and 2 when no test command can be resolved. tdd-flow gates on that redness regardless of who wrote the tests, a human or the unit-test-generator agent.
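The exit-code contract can be illustrated with a small stand-in. This is a sketch, not assert-red.sh: the name `assert_red_sketch` is hypothetical, and it assumes the already-resolved test command is passed in as a string.

```bash
# Hypothetical stand-in for assert-red.sh's exit-code contract:
#   0 = suite is RED (test-first precondition met)
#   1 = suite is already GREEN (the new tests assert nothing new)
#   2 = no test command could be resolved
assert_red_sketch() {
  local test_cmd="${1:-}"   # assumption: caller passes the resolved test command
  if [ -z "$test_cmd" ]; then
    echo "assert-red: no test command resolved" >&2
    return 2
  fi
  if sh -c "$test_cmd" >/dev/null 2>&1; then
    echo "assert-red: suite already GREEN, test-first violated" >&2
    return 1
  fi
  echo "assert-red: suite is RED, precondition satisfied" >&2
  return 0
}
```

This sketch treats any non-zero exit from the test command as red. A real implementation would likely also need to tell a failing suite apart from a command that cannot run at all (for example exit 127), otherwise a typo in the test command would pass as "red".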
3. Implement until green
Run the implement loop on the existing loop-primitive; no new runner is needed:
```bash
cp ~/.claude/templates/loops/implement-until-green.json /tmp/impl.json
# set evaluator.command to the project's test command, then:
node ~/.claude/scripts/loop/run.js /tmp/impl.json --cwd "$(pwd)"
```
The template forbids editing tests to force green, and it halts when one of its guardrails trips (max_iterations, cost or time budgets), surfacing the halt as a blockable event.
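The control flow of an implement-until-green loop with a max_iterations guardrail can be sketched as below. This only illustrates the idea; the real loop-primitive is driven by the JSON template, and the function name, the "implement" hook and exit code 3 for a guardrail halt are hypothetical.

```bash
# Hypothetical iterate-until-green loop: attempt, evaluate, repeat until the
# suite passes or the max_iterations guardrail is reached.
implement_until_green_sketch() {
  local max_iterations="$1" test_cmd="$2" implement_cmd="$3" i=1
  while [ "$i" -le "$max_iterations" ]; do
    sh -c "$implement_cmd"                       # one implementation attempt
    if sh -c "$test_cmd" >/dev/null 2>&1; then   # evaluator: is the suite green?
      echo "green after $i iteration(s)"
      return 0
    fi
    i=$((i + 1))
  done
  echo "halted: max_iterations=$max_iterations reached, suite still red" >&2
  return 3   # hypothetical exit code for a blockable guardrail halt
}
```

The sketch does not model the template's other rules, such as refusing edits to the tests or enforcing cost and time budgets.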
4. Check-in
```bash
scripts/tdd-flow/checkin.sh --gate after-green
```
Inspectable artifacts & check-in verdicts
Each step lands as a durable, on-disk artifact, and each check-in records an explicit verdict, so a reviewer can intervene early and audit the run afterwards instead of relying on terminal scrollback. Both are handled by artifact.sh, backed by a per-run store at .tdd-flow/runs/<run-id>/ (one file per step plus a manifest.json ledger). The store is gitignored working state.
```bash
scripts/tdd-flow/artifact.sh init        # {run, dir}
scripts/tdd-flow/artifact.sh capture --step tests --file tests/foo_test.rs --ext rs
scripts/tdd-flow/artifact.sh verdict --step tests --decision approve
scripts/tdd-flow/artifact.sh manifest    # review the whole run
```
The verdict's exit code is what drives early intervention: a rejected step loops back for revision instead of aborting the flow.
| Decision | Exit | Meaning |
|---|---|---|
| approve | 0 | Proceed to the next step. |
| reject | 20 | Loop back: revise this step, re-capture, re-gate. |
| edit | 21 | The reviewer edited the artifact: re-capture it, then proceed. |
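A calling flow might branch on those exit codes roughly as follows. The function name and action labels are hypothetical; in practice the code would come from `$?` right after running `scripts/tdd-flow/artifact.sh verdict ...`.

```bash
# Hypothetical caller-side dispatch on artifact.sh verdict exit codes.
next_action_for_verdict() {
  case "$1" in
    0)  echo "proceed" ;;                  # approve: move to the next step
    20) echo "revise" ;;                   # reject: revise, re-capture, re-gate
    21) echo "recapture-then-proceed" ;;   # edit: reviewer changed the artifact
    *)  echo "unknown verdict exit code: $1" >&2; return 1 ;;
  esac
}
```

For example, `next_action_for_verdict 20` prints `revise`, which keeps the flow on the current step instead of aborting it.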
The ledger is append-only, so a reject → revise → approve cycle stays visible after the fact. Re-capturing a step bumps its revision so revisions are distinguishable from the original.
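An append-only ledger with per-step revision numbers can be sketched with a JSON Lines file. The real manifest.json format is not described here, so the record fields and the `ledger_capture` / `ledger_verdict` names are hypothetical.

```bash
# Hypothetical append-only ledger: one JSON object per line, never rewritten.
ledger_capture() {
  local ledger="$1" step="$2" count
  touch "$ledger"
  # Count earlier captures of this step; grep -c prints 0 (and exits 1) on no match.
  count=$(grep -c -F "\"event\":\"capture\",\"step\":\"$step\"," "$ledger" || true)
  # Revision = earlier captures + 1, so a re-capture is distinguishable.
  printf '{"event":"capture","step":"%s","revision":%d}\n' "$step" "$((count + 1))" >> "$ledger"
  echo "$((count + 1))"
}

ledger_verdict() {
  local ledger="$1" step="$2" decision="$3"
  printf '{"event":"verdict","step":"%s","decision":"%s"}\n' "$step" "$decision" >> "$ledger"
}
```

Capturing `tests`, rejecting it, re-capturing and approving leaves four lines in the ledger, with the second capture recorded as revision 2, so the reject → revise → approve cycle stays visible.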
checkin.sh and artifact.sh split the job. checkin.sh resolves how the gate behaves at the active level (ask, confirm or auto); artifact.sh captures the artifact and records the decision and the loop-back signal. At L1/L2 the calling flow asks the human and feeds the answer to verdict. At L3 the gate degrades to auto-proceed-with-record: checkin.sh returns auto-proceed, and the flow records verdict --mode auto (the decision defaults to approve, and the level is captured for audit).
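The hand-off between the two scripts can be sketched as a function that turns a checkin.sh exit code (and, at L1/L2, the human's answer) into the verdict command the flow would run. The function is hypothetical and only prints the command; the exit codes 0 and 10 come from the level table.

```bash
# Hypothetical glue between checkin.sh and artifact.sh verdict.
# Prints the verdict command a calling flow would run; it does not run it.
verdict_command_for_gate() {
  local step="$1" checkin_exit="$2" human_decision="${3:-}"
  case "$checkin_exit" in
    0)  # L3: auto-proceed-with-record; the decision defaults to approve
        echo "scripts/tdd-flow/artifact.sh verdict --step $step --mode auto" ;;
    10) # L1/L2: a human was asked; pass their decision through
        [ -n "$human_decision" ] || { echo "missing human decision" >&2; return 1; }
        echo "scripts/tdd-flow/artifact.sh verdict --step $step --decision $human_decision" ;;
    *)  echo "unexpected checkin.sh exit code: $checkin_exit" >&2; return 1 ;;
  esac
}
```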
Check-in behaviour by automation level
| Level | Behaviour | checkin.sh exit | recorded verdict |
|---|---|---|---|
| L1 (supervised) | Ask — stop for an explicit human decision | 10 | human approve/reject/edit |
| L2 (trusted dev) | Confirm — proceed on confirmation | 10 | human approve/reject/edit |
| L3 (autonomous) | Auto-proceed-with-record — don't block; log it | 0 | --mode auto approve |
Level is read from .claude/settings.local.json (_automation.active_level), defaulting to L1/ask when unset. With RDB enabled the verdict carries "rdb": true and the L1/L2 ask routes through ask_remote.
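The level lookup and its L1 fallback can be sketched as two small functions. Both names are hypothetical, and reading the settings file with jq is an assumption about tooling, not how checkin.sh necessarily does it.

```bash
# Hypothetical: map the active automation level to the gate behaviour and
# checkin.sh exit code from the table. An unset or empty level falls back to L1.
gate_for_level() {
  case "${1:-L1}" in
    L1) echo "ask 10" ;;
    L2) echo "confirm 10" ;;
    L3) echo "auto-proceed 0" ;;
    *)  echo "unknown level: $1" >&2; return 1 ;;
  esac
}

# Reads _automation.active_level with jq (assumed installed); prints nothing
# when the file, the key or jq itself is missing, which maps to L1 above.
read_active_level() {
  jq -r '._automation.active_level // empty' "${1:-.claude/settings.local.json}" 2>/dev/null || true
}
```

Combined as `gate_for_level "$(read_active_level)"`, a repo without the setting gets `ask 10`, matching the L1/ask default.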
Composition with warp-drive
Run tdd-flow within warp-drive's coding phase as the test-first way to produce a chunk; warp-drive's existing testing phase then gates the commit unchanged. Because assert-red.sh and the loop evaluator resolve the test command the same way warp-drive does, the commit gate re-confirms the same green — it does not double-gate, and nothing here overrides test_before_commit.
See also
- loop-primitive reference: the iterate-until-done runner that this flow's implement step rides on
- warp-drive how-to: the post-hoc test gate