Token Monitoring & Budget System
Warp-drive tracks token usage per chunk, per session, and across sessions. It estimates costs, enforces budgets via a circuit breaker, and generates optimization insights. This document covers the full pipeline: capture, storage, enforcement, and reporting.
Architecture
Claude Code session transcript (~/.claude/projects/<slug>/*.jsonl)
        │
        ▼
token-snapshot.js ── reads transcript, computes token delta ──► JSON snapshot
        │
        ▼
state-machine.js ── stores snapshots in .warp-drive-state.json
        │              chunk_snapshots[] (per chunk)
        │              session_total (at session end)
        │
        ├──► checkBudgets() ── enforces hard limits, triggers circuit breaker
        │
        ├──► persistTokenUsage() ── appends records to ~/.claude/token-usage.jsonl
        │                           at session completion
        │
        ▼
token-report.js ── reads JSONL, generates markdown/JSON reports
        │              /token-report skill wraps this
        ▼
Optimization insights, cost estimates, trend analysis
Key files
| File | Purpose |
|---|---|
| scripts/warp-drive/token-snapshot.js | Reads Claude Code transcript, computes token totals or deltas |
| scripts/warp-drive/token-report.js | Aggregates JSONL records into reports with cost estimates |
| scripts/warp-drive/state-machine.js | Orchestrates capture, stores state, enforces budgets, persists data |
| registry/skills/token-report/SKILL.md | User-invocable /token-report skill definition |
| ~/.claude/token-usage.jsonl | Persistent append-only log of all session and chunk records |
| .claude/.warp-drive-state.json | Per-project state file (transient, deleted at session end) |
Data Flow
When tokens are captured
| Event | What happens | State field |
|---|---|---|
| chunks_defined | Mark chunk start timestamp | token_usage.current_chunk_started_at |
| next_chunk | Capture delta since chunk started, push snapshot | token_usage.chunk_snapshots[] |
| requirement_done | Capture final chunk delta, push snapshot | token_usage.chunk_snapshots[] |
| session_ended | Capture full-session total (no --since filter) | token_usage.session_total |
| session_ended | Persist all records to ~/.claude/token-usage.jsonl | n/a (written to disk) |
How snapshots work
token-snapshot.js reads the Claude Code session transcript: the .jsonl file that Claude Code writes to ~/.claude/projects/<slug>/. Each line is a JSON record with a message.usage object containing token counts.
- Full snapshot (no --since): sums all message.usage records in the transcript.
- Delta snapshot (--since <ISO>): sums only records with timestamp >= since. Used for per-chunk deltas.

The state machine calls captureTokenSnapshot(projectRoot, sinceTimestamp), which spawns the script and parses its JSON output. If the script fails or doesn't exist, it returns null and the session continues without token data.
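The full and delta modes described above can be sketched as a single summing function. This is a minimal illustration, not the code in token-snapshot.js: the function name sumUsage is hypothetical, and the usage field names (cache_read_input_tokens, cache_creation_input_tokens) are assumptions about the transcript format. total_tokens is computed as input plus output, which matches the sample records in this document.

```javascript
// Sketch only: sums message.usage across the lines of a transcript.
// `sumUsage` is a hypothetical name; the usage field names are assumed.
function sumUsage(jsonlText, since) {
  const sinceMs = since ? Date.parse(since) : null;
  const totals = {
    input_tokens: 0,
    output_tokens: 0,
    cache_read_tokens: 0,
    cache_creation_tokens: 0,
    total_tokens: 0,
    message_count: 0,
  };
  for (const line of jsonlText.split('\n')) {
    if (!line.trim()) continue;
    let record;
    try {
      record = JSON.parse(line);
    } catch {
      continue; // skip malformed lines instead of failing the snapshot
    }
    const usage = record.message && record.message.usage;
    if (!usage) continue;
    // Delta mode: keep only records with timestamp >= since.
    if (sinceMs !== null && !(Date.parse(record.timestamp) >= sinceMs)) continue;
    totals.input_tokens += usage.input_tokens || 0;
    totals.output_tokens += usage.output_tokens || 0;
    totals.cache_read_tokens += usage.cache_read_input_tokens || 0;
    totals.cache_creation_tokens += usage.cache_creation_input_tokens || 0;
    totals.message_count += 1;
  }
  totals.total_tokens = totals.input_tokens + totals.output_tokens;
  return totals;
}
```

Calling sumUsage(text) gives the full-session total, while sumUsage(text, state.token_usage.current_chunk_started_at) gives a per-chunk delta.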
Budget System
The budget system prevents runaway sessions. It has five constraints, checked before every state transition by checkBudgets().
Constraints
| Constraint | Config key | Default | Counter | Enforcement |
|---|---|---|---|---|
| Phase timeout | max_phase_minutes | 30 | Elapsed time in current phase | Configurable: warn, block, or abort |
| Retry limit | max_retries_per_chunk | 5 | state.budgets.retry_count | Hard (enforced) |
| Coding cycles | max_coding_cycles | 3 | state.budgets.coding_cycles | Hard (enforced) |
| Total chunks | max_total_chunks | 20 | state.metrics.chunks_completed | Hard (enforced) |
| Session duration | max_session_minutes | 480 | Elapsed since session.started_at_epoch | Hard (enforced) |
Phase timeout enforcement mode is set via phase_timeout_enforcement in config:
- warn (default): advisory only; included in the response but doesn't block
- block: rejects the transition
- abort: auto-transitions to budget_exceeded
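To make the five constraints concrete, here is a rough sketch of a budget check that returns a list of issues. It is not the real checkBudgets(): the name checkBudgetsSketch and the issue shape are hypothetical, session.started_at_epoch is assumed to be in milliseconds, and "exceeded" is assumed to mean strictly greater than the limit.

```javascript
// Defaults taken from the constraints table above.
const BUDGET_DEFAULTS = {
  max_phase_minutes: 30,
  max_retries_per_chunk: 5,
  max_coding_cycles: 3,
  max_total_chunks: 20,
  max_session_minutes: 480,
  phase_timeout_enforcement: 'warn',
};

// Sketch only. Assumes started_at_epoch is milliseconds and that a limit
// is exceeded when the counter is strictly greater than it.
function checkBudgetsSketch(state, config, nowMs) {
  const cfg = { ...BUDGET_DEFAULTS, ...config };
  const issues = [];
  const minutesSince = (startMs) => (nowMs - startMs) / 60000;
  const hard = (constraint, value, limit) => {
    if (value > limit) issues.push({ constraint, value, limit, enforced: true });
  };

  // Phase timeout is the only constraint with a configurable mode.
  const phaseMinutes = minutesSince(Date.parse(state.budgets.phase_started_at));
  if (phaseMinutes > cfg.max_phase_minutes) {
    issues.push({
      constraint: 'max_phase_minutes',
      value: phaseMinutes,
      limit: cfg.max_phase_minutes,
      mode: cfg.phase_timeout_enforcement,
      enforced: cfg.phase_timeout_enforcement !== 'warn',
    });
  }

  hard('max_retries_per_chunk', state.budgets.retry_count, cfg.max_retries_per_chunk);
  hard('max_coding_cycles', state.budgets.coding_cycles, cfg.max_coding_cycles);
  hard('max_total_chunks', state.metrics.chunks_completed, cfg.max_total_chunks);
  hard('max_session_minutes', minutesSince(state.session.started_at_epoch), cfg.max_session_minutes);
  return issues;
}
```

Returning every issue with an enforced flag, rather than throwing on the first one, lets the caller report advisory warnings and hard limits in the same response.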
Circuit breaker
When a hard limit is exceeded, the state machine doesn't crash or silently continue. It transitions to the budget_exceeded phase, a first-class state in the state machine.

any phase ──[hard limit exceeded]──► budget_exceeded
                                          │
                                  ┌───────┴───────┐
                                  ▼               ▼
                           budget_continue   budget_abort
                                  │               │
                                  ▼               ▼
                               coding          aborted
                           (budgets reset)  (session ends)

The circuit breaker:
- Runs checkBudgets() before every transition.
- Filters for enforced issues (hard limits, not advisory warnings).
- If enforced issues exist and the event is not on the bypass list, transitions to budget_exceeded.
- Stores diagnostic info: exceeded_reasons, exceeded_at, exceeded_from_phase.
- Presents the user with a choice: continue (extend budget) or abort.

Bypass events: these skip the circuit breaker check to avoid deadlocks:
abort, abort_resolved, session_ended, budget_continue, budget_abort

Human checkpoint: budget_exceeded requires human approval at all automation levels, including Level 3. This is a mandatory checkpoint that cannot be auto-bypassed.
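The circuit breaker steps above might look roughly like this. It is a sketch under stated assumptions: the function name applyCircuitBreaker, the state.phase field, the issue shape ({ constraint, enforced }), and storing exceeded_reasons as an array of constraint names are all hypothetical. Only the bypass list and the three diagnostic field names come from this document.

```javascript
// Bypass events listed above; these never trip the breaker.
const BYPASS_EVENTS = new Set([
  'abort',
  'abort_resolved',
  'session_ended',
  'budget_continue',
  'budget_abort',
]);

// Sketch only. `issues` is assumed to be an array of
// { constraint, enforced } objects; `state.phase` is an assumed field name.
function applyCircuitBreaker(state, event, issues, nowIso) {
  const enforced = issues.filter((issue) => issue.enforced);
  if (enforced.length === 0 || BYPASS_EVENTS.has(event)) {
    return { tripped: false, state };
  }
  return {
    tripped: true,
    state: {
      ...state,
      phase: 'budget_exceeded',
      budgets: {
        ...state.budgets,
        // Diagnostic info for the human checkpoint.
        exceeded_reasons: enforced.map((issue) => issue.constraint),
        exceeded_at: nowIso,
        exceeded_from_phase: state.phase,
      },
    },
  };
}
```

Keeping the bypass check inside the breaker means an abort or budget_abort event can always leave budget_exceeded, even while the counters are still over their limits.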
Budget recovery
When the user chooses budget_continue:
- retry_count and coding_cycles reset to 0
- budget_extensions counter increments (tracks how many times the user extended)
- Execution returns to the coding phase
Per-chunk budget resets
After each successful chunk (next_chunk and requirement_done events), retry_count and coding_cycles reset to 0. This means per-chunk limits apply fresh to each chunk, while max_total_chunks and max_session_minutes apply across the entire session.
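The two reset paths, budget recovery and per-chunk resets, can be illustrated with one small reducer. This is a sketch, not the state machine's code: the function names and the state.phase field are hypothetical, and it deliberately leaves the exceeded_* fields alone because the document does not say whether they are cleared.

```javascript
// Sketch only: returns a copy of the budgets with per-chunk counters zeroed.
function resetPerChunkCounters(budgets) {
  return { ...budgets, retry_count: 0, coding_cycles: 0 };
}

// Hypothetical reducer covering the three events that reset counters.
function applyBudgetEvent(state, event) {
  switch (event) {
    case 'budget_continue':
      // User extended the budget: reset counters, count the extension,
      // and return to coding.
      return {
        ...state,
        phase: 'coding',
        budgets: {
          ...resetPerChunkCounters(state.budgets),
          budget_extensions: state.budgets.budget_extensions + 1,
        },
      };
    case 'next_chunk':
    case 'requirement_done':
      // Successful chunk: per-chunk limits start fresh.
      return { ...state, budgets: resetPerChunkCounters(state.budgets) };
    default:
      return state;
  }
}
```

Session-wide values such as metrics.chunks_completed are never touched here, which is why max_total_chunks and max_session_minutes keep accumulating across chunks.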
Reasoning Budget
The reasoning budget controls Claude's thinking effort per phase. High-reasoning phases get deeper analysis; standard phases get efficient execution. This is the reasoning sandwich pattern.
Defaults
| Phase | Level | Rationale |
|---|---|---|
| prerequisites | standard | Mechanical setup |
| discovering | high | Work discovery needs judgment |
| planning | high | Architecture decisions need depth |
| chunking | high | Decomposition affects everything downstream |
| coding | standard | Implementation follows the plan |
| updating_docs | standard | Straightforward documentation |
| testing | high | Test interpretation needs careful analysis |
| committing | standard | Mechanical commit creation |
| reporting | standard | Structured output |
| chunk_complete | standard | Status check |
| requirement_complete | high | Final verification: last chance to catch issues |
| merging | standard | Mechanical merge/PR |
| session_ending | standard | Reporting |
| budget_exceeded | standard | Decision presentation |
| aborted | standard | Cleanup |
Configuration
Override per-phase reasoning in .claude/settings.local.json:
{
  "_workflow": {
    "reasoning_budget": {
      "coding": "high",
      "testing": "standard"
    }
  }
}

Config overrides take precedence over defaults. The level is injected into the state machine's systemMessage for each phase transition, where it instructs Claude to adjust its reasoning effort.
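The override-then-default lookup can be sketched in a few lines. The function name reasoningLevelFor is hypothetical, and falling back to "standard" for a phase that is not in the defaults table is an assumption, not documented behavior.

```javascript
// Defaults copied from the table above.
const DEFAULT_REASONING = {
  prerequisites: 'standard',
  discovering: 'high',
  planning: 'high',
  chunking: 'high',
  coding: 'standard',
  updating_docs: 'standard',
  testing: 'high',
  committing: 'standard',
  reporting: 'standard',
  chunk_complete: 'standard',
  requirement_complete: 'high',
  merging: 'standard',
  session_ending: 'standard',
  budget_exceeded: 'standard',
  aborted: 'standard',
};

// Sketch only: config override wins, then the default, then 'standard'
// (the last fallback is an assumption).
function reasoningLevelFor(phase, settings) {
  const workflow = (settings && settings._workflow) || {};
  const overrides = workflow.reasoning_budget || {};
  return overrides[phase] || DEFAULT_REASONING[phase] || 'standard';
}
```

With the example config above, reasoningLevelFor('coding', settings) yields "high" and reasoningLevelFor('testing', settings) yields "standard", while phases without an override keep their table defaults.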
State File Structure
During a session, token and budget data lives in .claude/.warp-drive-state.json:
{
  "token_usage": {
    "session_total": null,
    "chunk_snapshots": [
      {
        "chunk_index": 0,
        "acs": ["AC-01", "AC-02"],
        "input_tokens": 61250,
        "output_tokens": 22250,
        "cache_read_tokens": 18400,
        "cache_creation_tokens": 3200,
        "total_tokens": 83500,
        "message_count": 11,
        "timestamp": "2026-04-12T14:30:00Z"
      }
    ],
    "current_chunk_started_at": "2026-04-12T15:10:00Z"
  },
  "budgets": {
    "phase_started_at": "2026-04-12T15:10:00Z",
    "retry_count": 0,
    "coding_cycles": 0,
    "merge_retries": 0,
    "push_retries": 0,
    "budget_extensions": 0,
    "exceeded_reasons": null,
    "exceeded_at": null,
    "exceeded_from_phase": null,
    "aborted_at": null,
    "aborted_from_phase": null
  },
  "metrics": {
    "commits": 0,
    "reports_filed": 0,
    "tests_run": 0,
    "chunks_completed": 0,
    "session_duration_minutes": 0
  }
}

session_total is null during the session and populated at session end by a full (unfiltered) snapshot. chunk_snapshots accumulates one entry per completed chunk. current_chunk_started_at is the ISO timestamp used as the --since argument for the next chunk delta.
Persistent Storage
token-usage.jsonl
At session completion, persistTokenUsage() appends records to ~/.claude/token-usage.jsonl. Each line is a self-contained JSON object.
Chunk record
{
  "type": "chunk",
  "session_id": "abc123",
  "project": "paulirv/bodmail",
  "requirement": "#42",
  "branch": "feat/42-email-templates",
  "level": 2,
  "started_at": "2026-04-12T14:00:00Z",
  "chunk_index": 0,
  "acs": ["AC-01", "AC-02"],
  "input_tokens": 61250,
  "output_tokens": 22250,
  "cache_read_tokens": 18400,
  "cache_creation_tokens": 3200,
  "total_tokens": 83500,
  "message_count": 11,
  "timestamp": "2026-04-12T14:30:00Z"
}

Session record
{
  "type": "session",
  "session_id": "abc123",
  "project": "paulirv/bodmail",
  "requirement": "#42",
  "branch": "feat/42-email-templates",
  "level": 2,
  "started_at": "2026-04-12T14:00:00Z",
  "chunks_completed": 3,
  "commits": 3,
  "input_tokens": 245000,
  "output_tokens": 89000,
  "cache_read_tokens": 72000,
  "cache_creation_tokens": 12800,
  "total_tokens": 334000,
  "message_count": 42,
  "timestamp": "2026-04-12T16:45:00Z"
}

Both record types share a base set of fields (session_id, project, requirement, branch, level, started_at) for filtering and grouping.
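As an illustration of how the two record shapes relate to the state file, the sketch below builds JSONL text from a state object and a set of shared base fields. It is not the real persistTokenUsage(): the function names, the order of records (chunks first, then session), and the choice to copy only the token fields from each snapshot are assumptions.

```javascript
const fs = require('fs');

// Token fields that appear in both sample records above.
const TOKEN_FIELDS = [
  'input_tokens',
  'output_tokens',
  'cache_read_tokens',
  'cache_creation_tokens',
  'total_tokens',
  'message_count',
  'timestamp',
];

// Copy only the listed keys that are present on the object.
function pick(obj, keys) {
  return Object.fromEntries(keys.filter((k) => k in obj).map((k) => [k, obj[k]]));
}

// Sketch only: one line per chunk snapshot, plus one session line if
// session_total has been captured. Record order is an assumption.
function buildUsageJsonl(state, base) {
  const records = state.token_usage.chunk_snapshots.map((snap) => ({
    type: 'chunk',
    ...base,
    chunk_index: snap.chunk_index,
    acs: snap.acs,
    ...pick(snap, TOKEN_FIELDS),
  }));
  const total = state.token_usage.session_total;
  if (total) {
    records.push({
      type: 'session',
      ...base,
      chunks_completed: state.metrics.chunks_completed,
      commits: state.metrics.commits,
      ...pick(total, TOKEN_FIELDS),
    });
  }
  return records.map((record) => JSON.stringify(record) + '\n').join('');
}

// Append-only write; the caller supplies the path to the usage log.
function appendUsage(filePath, jsonl) {
  if (jsonl) fs.appendFileSync(filePath, jsonl);
}
```

Because every line is terminated with a newline and written with an append, a reader can process the log line by line without parsing the whole file.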
CLI Tools
token-snapshot.js
Reads a Claude Code session transcript and computes token usage.
node ~/.claude/scripts/warp-drive/token-snapshot.js <project-root> [--since <ISO-timestamp>]

| Argument | Required | Description |
|---|---|---|
| <project-root> | Yes | Absolute path to the project directory |
| --since <ISO> | No | Only count tokens from messages at or after this timestamp |
Output: JSON to stdout with timestamp, session_id, project, project_root, input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens, total_tokens, message_count.
Exit codes: 0 = success, 1 = missing argument, 2 = no transcript found.
How it finds the transcript: Slugifies the project path (/Users/paul/projects/bodmail becomes -Users-paul-projects-bodmail), looks in ~/.claude/projects/<slug>/ for the most recent .jsonl file (excluding subagent transcripts).
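The slug rule is simple enough to show directly. The sketch below replaces every / and . in the project path with -, which reproduces the example above. The function names are hypothetical, and selecting the most recent .jsonl file and excluding subagent transcripts are left out because the document does not describe how they are identified.

```javascript
const os = require('os');
const path = require('path');

// Replace every "/" and "." in the project path with "-".
function slugifyProjectPath(projectRoot) {
  return projectRoot.replace(/[/.]/g, '-');
}

// Directory that should contain the session transcripts for a project.
// `home` is a parameter so the function is easy to test.
function transcriptDir(projectRoot, home = os.homedir()) {
  return path.join(home, '.claude', 'projects', slugifyProjectPath(projectRoot));
}

console.log(slugifyProjectPath('/Users/paul/projects/bodmail'));
// prints "-Users-paul-projects-bodmail"
```

Printing transcriptDir(process.cwd()) is a quick way to check which directory a snapshot would search when token counts come back empty.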
token-report.js
Aggregates ~/.claude/token-usage.jsonl into human-readable reports.
node ~/.claude/scripts/warp-drive/token-report.js [options]

| Flag | Default | Description |
|---|---|---|
| --last <N> | 5 | Show last N sessions |
| --all | - | Show all sessions |
| --project <name> | - | Filter by project name (partial match) |
| --json | - | Output as JSON instead of markdown |
| --insights | - | Include optimization insights section |
| --budget <N> | 5.00 | Budget threshold in dollars (used by insights) |
Standard output sections:
- By Project: aggregated tokens and cost per project
- Session History: per-session table with date, project, requirement, chunks, tokens, messages, and cost
- Totals: sum across displayed sessions
- Averages: per-session averages (shown when 2+ sessions)
Insights output (with --insights):
- Top Token Consumers: projects ranked by estimated cost
- Cost Efficiency per AC: cost and messages per acceptance criterion
- Sessions Over Budget: sessions exceeding the threshold, with overage amount
- Optimization Suggestions: automated analysis of the following:
  - Cache reuse ratio (read/create): flags low reuse (<5x)
  - Messages per AC: flags high back-and-forth (>50)
  - Output-to-input ratio: flags verbose sessions
  - Cost trend: compares recent 3 sessions against earlier average
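Some of the optimization checks listed above reduce to simple ratios. The sketch below covers the two checks whose thresholds are stated here (cache reuse below 5x, more than 50 messages per AC) plus the cost trend split. Function names and the returned labels are hypothetical; the output-to-input check is omitted because its threshold is not documented.

```javascript
// Cache reuse ratio: cache reads divided by cache creations.
function cacheReuseRatio(record) {
  if (!record.cache_creation_tokens) return null; // nothing created, no ratio
  return record.cache_read_tokens / record.cache_creation_tokens;
}

// Sketch only: returns hypothetical labels for the flagged conditions.
function suggestionsFor(record, acCount) {
  const flags = [];
  const ratio = cacheReuseRatio(record);
  if (ratio !== null && ratio < 5) flags.push('low_cache_reuse');
  if (acCount > 0 && record.message_count / acCount > 50) {
    flags.push('high_messages_per_ac');
  }
  return flags;
}

// Cost trend: average of the 3 most recent sessions vs. the earlier ones.
// `costs` is assumed to be in chronological order.
function costTrend(costs) {
  if (costs.length <= 3) return null; // no earlier sessions to compare with
  const avg = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;
  return { recent: avg(costs.slice(-3)), earlier: avg(costs.slice(0, -3)) };
}
```

For the sample chunk record in this document, the cache reuse ratio is 18400 / 3200 = 5.75 and the record has 11 messages across 2 ACs, so neither check would flag it under these assumptions.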
/token-report skill
The /token-report skill (provisioned from registry/skills/token-report/) wraps token-report.js for interactive use. It runs the script and presents the output as formatted markdown.
Cost Estimation
Costs are estimated using Claude Opus API pricing:
| Token type | Rate |
|---|---|
| Input | $15.00 / MTok |
| Output | $75.00 / MTok |
| Cache read | $1.50 / MTok |
| Cache creation | $18.75 / MTok |
These rates are defined in token-report.js (line 12). Update them if pricing changes. Cost estimates appear in both the standard report and insights output.
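A worked example makes the rates easier to check by hand. The sketch below applies the table to a usage record. The PRICING key names and the function name estimateCost are assumptions, as is treating the four token counts as separate, non-overlapping quantities; the actual calculation in token-report.js may differ.

```javascript
// USD per million tokens, from the table above. Key names are assumed.
const PRICING = {
  input: 15.0,
  output: 75.0,
  cache_read: 1.5,
  cache_creation: 18.75,
};

// Sketch only: treats the four counts as independent line items.
function estimateCost(record) {
  const microDollars =
    (record.input_tokens || 0) * PRICING.input +
    (record.output_tokens || 0) * PRICING.output +
    (record.cache_read_tokens || 0) * PRICING.cache_read +
    (record.cache_creation_tokens || 0) * PRICING.cache_creation;
  return microDollars / 1e6;
}

// Sample chunk record from the Persistent Storage section.
const chunkCost = estimateCost({
  input_tokens: 61250,
  output_tokens: 22250,
  cache_read_tokens: 18400,
  cache_creation_tokens: 3200,
});
console.log(chunkCost.toFixed(2)); // prints "2.68"
```

Under the same assumptions the sample session record (245000 input, 89000 output, 72000 cache read, 12800 cache creation) comes to about $10.70, which would exceed the default --budget threshold of 5.00.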
Configuration Reference
All budget and reasoning settings live in .claude/settings.local.json under the _workflow key:
{
  "_workflow": {
    "max_phase_minutes": 30,
    "max_retries_per_chunk": 5,
    "max_coding_cycles": 3,
    "max_total_chunks": 20,
    "max_session_minutes": 480,
    "phase_timeout_enforcement": "warn",
    "reasoning_budget": {
      "discovering": "high",
      "planning": "high",
      "coding": "standard",
      "testing": "high"
    }
  }
}

| Key | Type | Default | Description |
|---|---|---|---|
| max_phase_minutes | number | 30 | Minutes before phase timeout triggers |
| max_retries_per_chunk | number | 5 | Hard limit on retries per chunk |
| max_coding_cycles | number | 3 | Hard limit on code/test cycles per chunk |
| max_total_chunks | number | 20 | Hard limit on total chunks per session |
| max_session_minutes | number | 480 | Hard limit on total session duration (8 hours) |
| phase_timeout_enforcement | string | "warn" | "warn", "block", or "abort" |
| reasoning_budget | object | {} | Per-phase reasoning level overrides |
| reasoning_budget.<phase> | string | varies | "standard" or "high" |
Session Summary Integration
Warp-drive session summaries (filed as GitHub Issues with label session-summary) include a Token Usage section with input, output, cache read, and total token counts read from state.token_usage.session_total. This makes cost visible in the project's issue history without needing to run a separate report.
Troubleshooting
No data in token report
- Complete at least one warp-drive session. Data is only persisted at session end (session_ended transition).
- Check that ~/.claude/token-usage.jsonl exists and has content.
Token counts are zero or null
- token-snapshot.js couldn't find the session transcript. Verify that ~/.claude/projects/<slug>/ contains .jsonl files.
- The slug is computed by replacing / and . with - in the project path.
Budget exceeded unexpectedly
- Check state.budgets in .claude/.warp-drive-state.json for current counter values.
- The max_total_chunks default is 20 in state-machine.js but 50 in the warp-drive guide config table; the state machine default applies unless overridden in settings.local.json.
- Run node ~/.claude/scripts/warp-drive/state-machine.js status "$(pwd)" to see current budget state.
Cache reuse ratio is low
- Short sessions with few chunks create cache entries that are never reused. Longer sessions with more chunks per session improve the ratio.
- Context-busting tool calls (large file reads, many parallel agents) force cache recreation.
Pricing is outdated
- Update the PRICING object in token-report.js (line 12) when API pricing changes. The skill doc and this doc reference the same values.