Token Monitoring & Budget System

Warp-drive tracks token usage per chunk, per session, and across sessions. It estimates costs, enforces budgets via a circuit breaker, and generates optimization insights. This document covers the full pipeline: capture, storage, enforcement, and reporting.


Architecture

Claude Code session transcript (~/.claude/projects/<slug>/*.jsonl)
       │
       ▼
token-snapshot.js  ── reads transcript, computes token delta ──►  JSON snapshot
       │
       ▼
state-machine.js   ── stores snapshots in .warp-drive-state.json
       │                  chunk_snapshots[] (per chunk)
       │                  session_total    (at session end)
       │
       ├──► checkBudgets()    ── enforces hard limits, triggers circuit breaker
       │
       └──► persistTokenUsage() ── appends records to ~/.claude/token-usage.jsonl
                                        at session completion
       │
       ▼
token-report.js   ── reads JSONL, generates markdown/JSON reports
       │                  /token-report skill wraps this
       ▼
Optimization insights, cost estimates, trend analysis

Key files

| File | Purpose |
|---|---|
| scripts/warp-drive/token-snapshot.js | Reads Claude Code transcript, computes token totals or deltas |
| scripts/warp-drive/token-report.js | Aggregates JSONL records into reports with cost estimates |
| scripts/warp-drive/state-machine.js | Orchestrates capture, stores state, enforces budgets, persists data |
| registry/skills/token-report/SKILL.md | User-invocable /token-report skill definition |
| ~/.claude/token-usage.jsonl | Persistent append-only log of all session and chunk records |
| .claude/.warp-drive-state.json | Per-project state file (transient, deleted at session end) |

Data Flow

When tokens are captured

| Event | What happens | State field |
|---|---|---|
| chunks_defined | Mark chunk start timestamp | token_usage.current_chunk_started_at |
| next_chunk | Capture delta since chunk started, push snapshot | token_usage.chunk_snapshots[] |
| requirement_done | Capture final chunk delta, push snapshot | token_usage.chunk_snapshots[] |
| session_ended | Capture full-session total (no --since filter) | token_usage.session_total |
| session_ended | Persist all records to ~/.claude/token-usage.jsonl | none (written to disk) |

How snapshots work

token-snapshot.js reads the Claude Code session transcript, the .jsonl file that Claude Code writes to ~/.claude/projects/<slug>/. Each line is a JSON record with a message.usage object containing token counts.

  • Full snapshot (no --since): sums all message.usage records in the transcript.
  • Delta snapshot (--since <ISO>): sums only records with timestamp >= since. Used for per-chunk deltas.
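The two snapshot modes can be sketched as a single summing function with an optional cutoff. This is a minimal illustration, not the code in token-snapshot.js: the function name sumUsage, the usage field names (input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens), and the top-level timestamp field are assumptions about the transcript schema. Computing total_tokens as input plus output is inferred from the example records in this document.

```javascript
// Sketch: sum message.usage records from transcript lines.
// ASSUMPTION: field names below are guesses at the transcript schema.
function sumUsage(lines, since = null) {
  const sinceMs = since ? Date.parse(since) : null;
  const totals = {
    input_tokens: 0,
    output_tokens: 0,
    cache_read_tokens: 0,
    cache_creation_tokens: 0,
    total_tokens: 0,
    message_count: 0,
  };
  for (const line of lines) {
    if (!line.trim()) continue;
    let record;
    try {
      record = JSON.parse(line);
    } catch {
      continue; // skip malformed lines instead of failing the snapshot
    }
    const usage = record.message && record.message.usage;
    if (!usage) continue;
    // Delta snapshot: keep only records with timestamp >= since.
    // Records without a parseable timestamp are skipped in delta mode.
    if (sinceMs !== null && !(Date.parse(record.timestamp) >= sinceMs)) continue;
    totals.input_tokens += usage.input_tokens || 0;
    totals.output_tokens += usage.output_tokens || 0;
    totals.cache_read_tokens += usage.cache_read_input_tokens || 0;
    totals.cache_creation_tokens += usage.cache_creation_input_tokens || 0;
    totals.message_count += 1;
  }
  // ASSUMPTION: total = input + output, as in this document's example records.
  totals.total_tokens = totals.input_tokens + totals.output_tokens;
  return totals;
}
```

Calling sumUsage(lines) gives the full snapshot; sumUsage(lines, "2026-04-12T15:10:00Z") gives the delta for a chunk that started at that time.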

The state machine calls captureTokenSnapshot(projectRoot, sinceTimestamp) which spawns the script and parses its JSON output. If the script fails or doesn’t exist, it returns null and the session continues without token data.
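The null-on-failure contract described above might look like the following sketch. The scriptPath parameter and the 30-second timeout are assumptions added for illustration; the real captureTokenSnapshot takes only projectRoot and sinceTimestamp and resolves the script location itself.

```javascript
const { spawnSync } = require('child_process');
const fs = require('fs');

// Sketch: run the snapshot script and parse its JSON output.
// ASSUMPTION: scriptPath and the timeout value are illustrative only.
function captureTokenSnapshot(projectRoot, sinceTimestamp, scriptPath) {
  if (!fs.existsSync(scriptPath)) return null; // script is missing
  const args = [scriptPath, projectRoot];
  if (sinceTimestamp) args.push('--since', sinceTimestamp);
  const result = spawnSync(process.execPath, args, {
    encoding: 'utf8',
    timeout: 30000,
  });
  // Spawn failure, timeout, or non-zero exit: continue without token data.
  if (result.error || result.status !== 0) return null;
  try {
    return JSON.parse(result.stdout);
  } catch {
    return null; // output was not valid JSON
  }
}
```

Because every failure path returns null, callers only need a single null check before storing the snapshot in state.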


Budget System

The budget system prevents runaway sessions. It has five constraints, checked before every state transition by checkBudgets().

Constraints

| Constraint | Config key | Default | Counter | Enforcement |
|---|---|---|---|---|
| Phase timeout | max_phase_minutes | 30 | Elapsed time in current phase | Configurable: warn, block, or abort |
| Retry limit | max_retries_per_chunk | 5 | state.budgets.retry_count | Hard (enforced) |
| Coding cycles | max_coding_cycles | 3 | state.budgets.coding_cycles | Hard (enforced) |
| Total chunks | max_total_chunks | 20 | state.metrics.chunks_completed | Hard (enforced) |
| Session duration | max_session_minutes | 480 | Elapsed since session.started_at_epoch | Hard (enforced) |

Phase timeout enforcement mode is set via phase_timeout_enforcement in config:

  • warn (default): advisory only, included in response but doesn’t block
  • block: rejects the transition
  • abort: auto-transitions to budget_exceeded
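As a rough sketch of how the five constraints and the phase timeout mode could be evaluated together, consider the function below. It is not the code in state-machine.js. The issue shape, the strict > comparison, the now parameter, and the assumption that session.started_at_epoch is in milliseconds are all illustrative guesses.

```javascript
// Defaults copied from the constraints table.
const DEFAULTS = {
  max_phase_minutes: 30,
  max_retries_per_chunk: 5,
  max_coding_cycles: 3,
  max_total_chunks: 20,
  max_session_minutes: 480,
  phase_timeout_enforcement: 'warn',
};

// ASSUMPTIONS: issue shape, strict `>` comparison, `now` parameter,
// and started_at_epoch in milliseconds are hypothetical.
function checkBudgets(state, config = {}, now = Date.now()) {
  const cfg = { ...DEFAULTS, ...config };
  const issues = [];
  const add = (constraint, action) =>
    issues.push({ constraint, action, enforced: action === 'budget_exceeded' });

  // Phase timeout: the outcome depends on phase_timeout_enforcement.
  const phaseMinutes = (now - Date.parse(state.budgets.phase_started_at)) / 60000;
  if (phaseMinutes > cfg.max_phase_minutes) {
    const mode = cfg.phase_timeout_enforcement;
    add('max_phase_minutes', mode === 'abort' ? 'budget_exceeded' : mode);
  }

  // Hard limits: always lead to budget_exceeded.
  const sessionMinutes = (now - state.session.started_at_epoch) / 60000;
  const hardLimits = [
    ['max_retries_per_chunk', state.budgets.retry_count],
    ['max_coding_cycles', state.budgets.coding_cycles],
    ['max_total_chunks', state.metrics.chunks_completed],
    ['max_session_minutes', sessionMinutes],
  ];
  for (const [key, value] of hardLimits) {
    if (value > cfg[key]) add(key, 'budget_exceeded');
  }
  return issues;
}
```

In this sketch a warn issue is advisory, a block issue would be handled by the caller rejecting the transition, and only budget_exceeded issues are marked as enforced.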

Circuit breaker

When a hard limit is exceeded, the state machine doesn’t crash or silently continue. It transitions to the budget_exceeded phase, a first-class state in the state machine.

any phase ──[hard limit exceeded]──► budget_exceeded
                                          │
                                  ┌───────┴───────┐
                                  ▼               ▼
                           budget_continue    budget_abort
                                  │               │
                                  ▼               ▼
                              coding          aborted
                          (budgets reset)   (session ends)

The circuit breaker:

  1. Runs checkBudgets() before every transition.
  2. Filters for enforced issues (hard limits, not advisory warnings).
  3. If enforced issues exist and the event is not on the bypass list, transitions to budget_exceeded.
  4. Stores diagnostic info: exceeded_reasons, exceeded_at, exceeded_from_phase.
  5. Presents the user with a choice: continue (extend budget) or abort.

Bypass events. These events skip the circuit breaker check to avoid deadlocks: abort, abort_resolved, session_ended, budget_continue, budget_abort

Human checkpoint. budget_exceeded requires human approval at all automation levels, including Level 3. This is a mandatory checkpoint that cannot be auto-bypassed.
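The circuit breaker steps and the bypass list can be sketched as one pure function. The name applyCircuitBreaker, the state.phase field, the enforced flag on each issue, and storing exceeded_reasons as a list of constraint names are assumptions made for illustration.

```javascript
// Bypass events listed in this section.
const BYPASS_EVENTS = new Set([
  'abort',
  'abort_resolved',
  'session_ended',
  'budget_continue',
  'budget_abort',
]);

// ASSUMPTIONS: `state.phase`, the `enforced` flag, and the shape of
// exceeded_reasons are hypothetical.
function applyCircuitBreaker(state, event, issues, now = new Date()) {
  const enforced = issues.filter((issue) => issue.enforced);
  // No hard limit hit, or the event must always be allowed through.
  if (enforced.length === 0 || BYPASS_EVENTS.has(event)) {
    return { tripped: false, state };
  }
  return {
    tripped: true,
    state: {
      ...state,
      phase: 'budget_exceeded',
      budgets: {
        ...state.budgets,
        exceeded_reasons: enforced.map((issue) => issue.constraint),
        exceeded_at: now.toISOString(),
        exceeded_from_phase: state.phase,
      },
    },
  };
}
```

Keeping budget_continue and budget_abort on the bypass list is what lets the user leave budget_exceeded even though the counters that tripped the breaker are still over their limits.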

Budget recovery

When the user chooses budget_continue:

  • retry_count and coding_cycles reset to 0
  • budget_extensions counter increments (tracks how many times the user extended)
  • Execution returns to the coding phase

Per-chunk budget resets

After each successful chunk (next_chunk and requirement_done events), retry_count and coding_cycles reset to 0. This means per-chunk limits apply fresh to each chunk, while max_total_chunks and max_session_minutes apply across the entire session.
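The difference between the two reset paths is small enough to show side by side. The function names here are hypothetical; only the counter names come from the state file.

```javascript
// budget_continue: reset per-chunk counters and record the extension.
function onBudgetContinue(budgets) {
  return {
    ...budgets,
    retry_count: 0,
    coding_cycles: 0,
    budget_extensions: budgets.budget_extensions + 1,
  };
}

// next_chunk / requirement_done: reset per-chunk counters only.
function onChunkSuccess(budgets) {
  return { ...budgets, retry_count: 0, coding_cycles: 0 };
}
```

Neither path touches chunks_completed or the session start time, which is why the session-wide limits keep accumulating.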


Reasoning Budget

The reasoning budget controls Claude’s thinking effort per phase. High-reasoning phases get deeper analysis; standard phases get efficient execution. This is the reasoning sandwich pattern.

Defaults

| Phase | Level | Rationale |
|---|---|---|
| prerequisites | standard | Mechanical setup |
| discovering | high | Work discovery needs judgment |
| planning | high | Architecture decisions need depth |
| chunking | high | Decomposition affects everything downstream |
| coding | standard | Implementation follows the plan |
| updating_docs | standard | Straightforward documentation |
| testing | high | Test interpretation needs careful analysis |
| committing | standard | Mechanical commit creation |
| reporting | standard | Structured output |
| chunk_complete | standard | Status check |
| requirement_complete | high | Final verification: last chance to catch issues |
| merging | standard | Mechanical merge/PR |
| session_ending | standard | Reporting |
| budget_exceeded | standard | Decision presentation |
| aborted | standard | Cleanup |

Configuration

Override per-phase reasoning in .claude/settings.local.json:

{
  "_workflow": {
    "reasoning_budget": {
      "coding": "high",
      "testing": "standard"
    }
  }
}

Config overrides take precedence over defaults. The level is injected into the state machine’s systemMessage for each phase transition, where it instructs Claude to adjust its reasoning effort.
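Resolving the level for a phase amounts to a lookup with the override checked first. In this sketch the function name reasoningLevel and the fallback to "standard" for an unknown phase are assumptions; the default map is copied from the table in the Defaults section.

```javascript
// Per-phase defaults, as listed in the Defaults section.
const DEFAULT_REASONING = {
  prerequisites: 'standard',
  discovering: 'high',
  planning: 'high',
  chunking: 'high',
  coding: 'standard',
  updating_docs: 'standard',
  testing: 'high',
  committing: 'standard',
  reporting: 'standard',
  chunk_complete: 'standard',
  requirement_complete: 'high',
  merging: 'standard',
  session_ending: 'standard',
  budget_exceeded: 'standard',
  aborted: 'standard',
};

// ASSUMPTION: unknown phases fall back to "standard".
function reasoningLevel(phase, settings = {}) {
  const workflow = settings._workflow || {};
  const overrides = workflow.reasoning_budget || {};
  return overrides[phase] || DEFAULT_REASONING[phase] || 'standard';
}
```

With the override shown above, reasoningLevel("coding", settings) resolves to "high" and reasoningLevel("testing", settings) to "standard", while phases without an override keep their defaults.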


State File Structure

During a session, token and budget data lives in .claude/.warp-drive-state.json:

{
  "token_usage": {
    "session_total": null,
    "chunk_snapshots": [
      {
        "chunk_index": 0,
        "acs": ["AC-01", "AC-02"],
        "input_tokens": 61250,
        "output_tokens": 22250,
        "cache_read_tokens": 18400,
        "cache_creation_tokens": 3200,
        "total_tokens": 83500,
        "message_count": 11,
        "timestamp": "2026-04-12T14:30:00Z"
      }
    ],
    "current_chunk_started_at": "2026-04-12T15:10:00Z"
  },
  "budgets": {
    "phase_started_at": "2026-04-12T15:10:00Z",
    "retry_count": 0,
    "coding_cycles": 0,
    "merge_retries": 0,
    "push_retries": 0,
    "budget_extensions": 0,
    "exceeded_reasons": null,
    "exceeded_at": null,
    "exceeded_from_phase": null,
    "aborted_at": null,
    "aborted_from_phase": null
  },
  "metrics": {
    "commits": 0,
    "reports_filed": 0,
    "tests_run": 0,
    "chunks_completed": 0,
    "session_duration_minutes": 0
  }
}

session_total is null during the session and populated at session end by a full (unfiltered) snapshot. chunk_snapshots accumulates one entry per completed chunk. current_chunk_started_at is the ISO timestamp used as the --since argument for the next chunk delta. In the example above, total_tokens equals input_tokens plus output_tokens; the cache token counts are reported separately.


Persistent Storage

token-usage.jsonl

At session completion, persistTokenUsage() appends records to ~/.claude/token-usage.jsonl. Each line is a self-contained JSON object.

Chunk record

{
  "type": "chunk",
  "session_id": "abc123",
  "project": "paulirv/bodmail",
  "requirement": "#42",
  "branch": "feat/42-email-templates",
  "level": 2,
  "started_at": "2026-04-12T14:00:00Z",
  "chunk_index": 0,
  "acs": ["AC-01", "AC-02"],
  "input_tokens": 61250,
  "output_tokens": 22250,
  "cache_read_tokens": 18400,
  "cache_creation_tokens": 3200,
  "total_tokens": 83500,
  "message_count": 11,
  "timestamp": "2026-04-12T14:30:00Z"
}

Session record

{
  "type": "session",
  "session_id": "abc123",
  "project": "paulirv/bodmail",
  "requirement": "#42",
  "branch": "feat/42-email-templates",
  "level": 2,
  "started_at": "2026-04-12T14:00:00Z",
  "chunks_completed": 3,
  "commits": 3,
  "input_tokens": 245000,
  "output_tokens": 89000,
  "cache_read_tokens": 72000,
  "cache_creation_tokens": 12800,
  "total_tokens": 334000,
  "message_count": 42,
  "timestamp": "2026-04-12T16:45:00Z"
}

Both record types share a base set of fields (session_id, project, requirement, branch, level, started_at) for filtering and grouping.
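One way to picture how the shared base fields combine with the per-chunk and per-session data is the sketch below. The helper names buildRecords and appendRecords, the base parameter, and the explicit field list are assumptions; the real persistTokenUsage() may assemble its records differently.

```javascript
const fs = require('fs');

// Token fields copied from a snapshot into a persisted record.
const TOKEN_FIELDS = [
  'input_tokens',
  'output_tokens',
  'cache_read_tokens',
  'cache_creation_tokens',
  'total_tokens',
  'message_count',
  'timestamp',
];

const pick = (obj, keys) => Object.fromEntries(keys.map((k) => [k, obj[k]]));

// ASSUMPTION: `base` holds session_id, project, requirement, branch,
// level, and started_at, gathered elsewhere by the caller.
function buildRecords(state, base) {
  const { chunk_snapshots, session_total } = state.token_usage;
  const records = chunk_snapshots.map((snap) => ({
    type: 'chunk',
    ...base,
    chunk_index: snap.chunk_index,
    acs: snap.acs,
    ...pick(snap, TOKEN_FIELDS),
  }));
  if (session_total) {
    records.push({
      type: 'session',
      ...base,
      chunks_completed: state.metrics.chunks_completed,
      commits: state.metrics.commits,
      ...pick(session_total, TOKEN_FIELDS),
    });
  }
  return records;
}

// Append one JSON object per line; existing content is never rewritten.
function appendRecords(filePath, records) {
  if (records.length === 0) return;
  const lines = records.map((r) => JSON.stringify(r)).join('\n') + '\n';
  fs.appendFileSync(filePath, lines);
}
```

Appending whole lines keeps the file valid JSONL even when several sessions write to it over time.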


CLI Tools

token-snapshot.js

Reads a Claude Code session transcript and computes token usage.

node ~/.claude/scripts/warp-drive/token-snapshot.js <project-root> [--since <ISO-timestamp>]

| Argument | Required | Description |
|---|---|---|
| <project-root> | Yes | Absolute path to the project directory |
| --since <ISO> | No | Only count tokens from messages at or after this timestamp |

Output: JSON to stdout with timestamp, session_id, project, project_root, input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens, total_tokens, message_count.

Exit codes: 0 = success, 1 = missing argument, 2 = no transcript found.

How it finds the transcript: Slugifies the project path (/Users/paul/projects/bodmail becomes -Users-paul-projects-bodmail), looks in ~/.claude/projects/<slug>/ for the most recent .jsonl file (excluding subagent transcripts).
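The slug rule can be sketched in a couple of lines. The function names are hypothetical, and the replacement of "." as well as "/" follows the rule stated in the Troubleshooting section. Windows-style paths are not considered here.

```javascript
const os = require('os');
const path = require('path');

// Replace every "/" and "." in the project path with "-".
function slugifyProjectPath(projectRoot) {
  return projectRoot.replace(/[\/.]/g, '-');
}

// Directory where Claude Code is expected to keep this project's transcripts.
function transcriptDir(projectRoot) {
  return path.join(os.homedir(), '.claude', 'projects', slugifyProjectPath(projectRoot));
}
```

For example, slugifyProjectPath("/Users/paul/projects/bodmail") returns "-Users-paul-projects-bodmail", matching the example above.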

token-report.js

Aggregates ~/.claude/token-usage.jsonl into human-readable reports.

node ~/.claude/scripts/warp-drive/token-report.js [options]

| Flag | Default | Description |
|---|---|---|
| --last <N> | 5 | Show last N sessions |
| --all | none | Show all sessions |
| --project <name> | none | Filter by project name (partial match) |
| --json | none | Output as JSON instead of markdown |
| --insights | none | Include optimization insights section |
| --budget <N> | 5.00 | Budget threshold in dollars (used by insights) |

Standard output sections:

  1. By Project: aggregated tokens and cost per project
  2. Session History: per-session table with date, project, requirement, chunks, tokens, messages, cost
  3. Totals: sum across displayed sessions
  4. Averages: per-session averages (shown when 2+ sessions)

Insights output (with --insights):

  1. Top Token Consumers: projects ranked by estimated cost
  2. Cost Efficiency per AC: cost and messages per acceptance criterion
  3. Sessions Over Budget: sessions exceeding the threshold, with overage amount
  4. Optimization Suggestions: automated analysis
    • Cache reuse ratio (read/create): flags low reuse (<5x)
    • Messages per AC: flags high back-and-forth (>50)
    • Output-to-input ratio: flags verbose sessions
    • Cost trend: compares recent 3 sessions against earlier average
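A few of the optimization checks can be approximated as follows. The thresholds (5x and 50) come from the list above; the function names, input shapes, and the way AC counts are supplied are assumptions. The output-to-input check is left out because its threshold is not stated here.

```javascript
// Thresholds from the insights list; everything else is hypothetical.
const LOW_CACHE_REUSE = 5;
const HIGH_MESSAGES_PER_AC = 50;

const average = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

// Cache reuse ratio: cache reads divided by cache creations.
function cacheReuseRatio(record) {
  if (!record.cache_creation_tokens) return null; // avoid dividing by zero
  return record.cache_read_tokens / record.cache_creation_tokens;
}

function messagesPerAc(messageCount, acCount) {
  return acCount > 0 ? messageCount / acCount : null;
}

// Compare the 3 most recent session costs with the average of earlier ones.
// `costs` is ordered oldest first.
function costTrend(costs) {
  if (costs.length < 4) return null; // needs at least one earlier session
  const recent = average(costs.slice(-3));
  const earlier = average(costs.slice(0, -3));
  return { recent, earlier, rising: recent > earlier };
}

function suggestions(record, acCount) {
  const out = [];
  const reuse = cacheReuseRatio(record);
  if (reuse !== null && reuse < LOW_CACHE_REUSE) out.push('low cache reuse');
  const perAc = messagesPerAc(record.message_count, acCount);
  if (perAc !== null && perAc > HIGH_MESSAGES_PER_AC) out.push('high messages per AC');
  return out;
}
```

For the example session record in this document, the cache reuse ratio is 72000 / 12800 = 5.625, just above the 5x threshold, so it would not be flagged.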

/token-report skill

The /token-report skill (provisioned from registry/skills/token-report/) wraps token-report.js for interactive use. It runs the script and presents the output as formatted markdown.


Cost Estimation

Costs are estimated using Claude Opus API pricing:

| Token type | Rate |
|---|---|
| Input | $15.00 / MTok |
| Output | $75.00 / MTok |
| Cache read | $1.50 / MTok |
| Cache creation | $18.75 / MTok |

These rates are defined in token-report.js (line 12). Update them if pricing changes. Cost estimates appear in both the standard report and insights output.
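As a worked example of the arithmetic, the sketch below applies the rates in the table to a record. The key names inside PRICING and the function name estimateCost are assumptions; only the rates themselves come from this document.

```javascript
// USD per million tokens, from the table above.
// ASSUMPTION: key names are illustrative, not those used in token-report.js.
const PRICING = {
  input: 15.0,
  output: 75.0,
  cache_read: 1.5,
  cache_creation: 18.75,
};

// Estimated cost in USD for one chunk or session record.
function estimateCost(record) {
  return (
    ((record.input_tokens || 0) * PRICING.input +
      (record.output_tokens || 0) * PRICING.output +
      (record.cache_read_tokens || 0) * PRICING.cache_read +
      (record.cache_creation_tokens || 0) * PRICING.cache_creation) /
    1e6
  );
}
```

For the example session record (245,000 input, 89,000 output, 72,000 cache read, 12,800 cache creation), this gives 3.675 + 6.675 + 0.108 + 0.24, or about $10.70, which would exceed the default --budget threshold of 5.00.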


Configuration Reference

All budget and reasoning settings live in .claude/settings.local.json under the _workflow key:

{
  "_workflow": {
    "max_phase_minutes": 30,
    "max_retries_per_chunk": 5,
    "max_coding_cycles": 3,
    "max_total_chunks": 20,
    "max_session_minutes": 480,
    "phase_timeout_enforcement": "warn",
    "reasoning_budget": {
      "discovering": "high",
      "planning": "high",
      "coding": "standard",
      "testing": "high"
    }
  }
}

| Key | Type | Default | Description |
|---|---|---|---|
| max_phase_minutes | number | 30 | Minutes before phase timeout triggers |
| max_retries_per_chunk | number | 5 | Hard limit on retries per chunk |
| max_coding_cycles | number | 3 | Hard limit on code/test cycles per chunk |
| max_total_chunks | number | 20 | Hard limit on total chunks per session |
| max_session_minutes | number | 480 | Hard limit on total session duration (8 hours) |
| phase_timeout_enforcement | string | "warn" | "warn", "block", or "abort" |
| reasoning_budget | object | {} | Per-phase reasoning level overrides |
| reasoning_budget.<phase> | string | varies | "standard" or "high" |

Session Summary Integration

Warp-drive session summaries (filed as GitHub Issues with label session-summary) include a Token Usage section with input, output, cache read, and total token counts read from state.token_usage.session_total. This makes cost visible in the project’s issue history without needing to run a separate report.


Troubleshooting

No data in token report

  • Complete at least one warp-drive session. Data is only persisted at session end (session_ended transition).
  • Check ~/.claude/token-usage.jsonl exists and has content.

Token counts are zero or null

  • token-snapshot.js couldn’t find the session transcript. Verify ~/.claude/projects/<slug>/ contains .jsonl files.
  • The slug is computed by replacing / and . with - in the project path.

Budget exceeded unexpectedly

  • Check state.budgets in .claude/.warp-drive-state.json for current counter values.
  • max_total_chunks default is 20 in state-machine.js but 50 in the warp-drive guide config table. The state machine default applies unless overridden in settings.local.json.
  • Run node ~/.claude/scripts/warp-drive/state-machine.js status "$(pwd)" to see current budget state.

Cache reuse ratio is low

  • Short sessions with few chunks create cache entries that are never reused. Longer sessions with more chunks per session improve the ratio.
  • Context-busting tool calls (large file reads, many parallel agents) force cache recreation.

Pricing is outdated

  • Update the PRICING object in token-report.js (line 12) when API pricing changes. The skill doc and this doc reference the same values.