Appendix C: Performance Benchmarks

This appendix provides reference data for typical token usage, cost estimates, and performance characteristics across common Claude Code task types. All figures are approximate and represent typical ranges observed across real development workflows.

Token Usage by Task Type

Token counts represent total tokens (input + output) for a complete task, including context accumulation. Actual usage varies significantly based on codebase size, CLAUDE.md length, and conversation history.

Exploration and Understanding Tasks

Task	Typical Token Range	Notes
Explain a single 100-200 line file	3,000–6,000	One file read + explanation
Understand a module (5-10 files)	10,000–25,000	Multiple file reads, cross-reference
Map a full subsystem (20+ files)	30,000–80,000	Deep exploration; use subagent to contain
Answer a question about a function	1,500–4,000	Targeted read
Explain unfamiliar codebase end-to-end	50,000–150,000	Very wide exploration; plan to compact
Git log analysis (last 50 commits)	5,000–15,000	Varies with commit message length

Code Writing Tasks

Task	Typical Token Range	Notes
Write a unit test for one function	2,000–5,000	One read + generation
Implement a simple utility function	1,500–4,000	Usually no file reads needed
Implement a CRUD endpoint (standard pattern)	5,000–15,000	Reads schema + existing handlers
Implement a medium feature (3-5 files)	15,000–35,000	Multiple reads, multiple writes
Implement a complex feature (10+ files)	40,000–100,000	Consider subagents for parts
Write a complete test suite for a module	10,000–30,000	Module read + test generation
Migrate a file to a new API/library	5,000–20,000	Depends on file size and library complexity

Bug Fixing Tasks

Task	Typical Token Range	Notes
Fix a simple bug (error message provided)	3,000–8,000	Targeted file reads
Debug a complex multi-file issue	20,000–60,000	Exploration-heavy; context fills fast
Trace a performance issue	15,000–50,000	Depends on profiling depth
Fix a failing test (cause not obvious)	5,000–20,000	Test file + relevant source files
Fix a security vulnerability	10,000–30,000	Security analysis + multiple files

Code Review and Analysis Tasks

Task	Typical Token Range	Notes
Review a small PR (1-3 files)	5,000–12,000	Diff + file context
Review a medium PR (5-10 files)	15,000–35,000	Larger diff + cross-file analysis
Review a large PR (20+ files)	40,000–100,000	Consider subagent reviewer
Security audit of a module	15,000–40,000	All files in module + analysis
Performance audit	10,000–30,000	Code + query analysis
Dependency vulnerability scan	5,000–15,000	package.json + npm audit output

Automation and Scripting Tasks

Task	Typical Token Range	Notes
Write a shell script	2,000–6,000	Usually no file reads
Create a CI/CD workflow file	3,000–8,000	Reads existing config if present
Set up a GitHub Actions workflow	4,000–10,000
Batch rename/refactor (10 files)	10,000–25,000	Parallel approach more efficient
Add type annotations (single file)	3,000–8,000	Depends on file size
Documentation generation (module)	5,000–15,000	Module read + doc generation

Cost Estimates

The following estimates use approximate Anthropic API pricing as of early 2025. Subscription plan costs (Pro, Max) are flat-rate and not reflected here — these figures apply to API (pay-per-use) usage.

Pricing tiers (approximate, subject to change):

Model	Input tokens (per 1M)	Output tokens (per 1M)	Cache read (per 1M)
Claude Haiku 4.5	~$1	~$5	~$0.10
Claude Sonnet 4.6	~$3	~$15	~$0.30
Claude Opus 4.6	~$5	~$25	~$0.50

Prompt caching is automatic — Claude Code caches CLAUDE.md, system prompts, and other stable content. Cache reads cost approximately 10% of normal input price, which substantially reduces costs in long sessions where the same context is reused.

Typical task costs (Sonnet, approximate):

Task	Estimated Cost
Quick question / single function explain	$0.01–$0.05
Write a unit test	$0.03–$0.10
Implement a CRUD endpoint	$0.10–$0.30
Implement a medium feature	$0.30–$0.80
Debug a complex issue	$0.40–$1.50
Full PR review (large PR)	$0.80–$2.50
Architecture planning session	$1.00–$4.00

Model selection impact: Running the same task on Opus 4.6 vs Sonnet 4.6 costs approximately 1.7x more per token. However, Opus sessions tend to use more tokens due to deeper reasoning, so the effective cost difference is often 2-3x in practice. For tasks where Sonnet produces satisfactory results, the cost difference over a month of heavy use is still meaningful.

Speed Benchmarks

Latency is primarily determined by output length. Time-to-first-token is typically 1-3 seconds. Full response time depends on output length.

Task	Typical Total Time (Sonnet)
Short explanation	3–8 seconds
Write a function	5–15 seconds
Write a test file	10–30 seconds
Implement a medium feature	30–120 seconds
Large refactor (10+ files)	2–8 minutes
Complex debugging session	5–20 minutes (interactive)

Parallel agent speedup:

Running 3 parallel subagents for independent tasks provides roughly 2.5–3x speedup compared to sequential execution (accounting for orchestration overhead). For tasks that can be parallelized cleanly (such as reviewing 30 files), parallel execution reduces wall-clock time significantly.

Context Window Reference

Claude models have different context window sizes. As of early 2026:

Model	Context Window	Notes
Claude Haiku 4.5	200,000 tokens	Standard
Claude Sonnet 4.6	200,000 tokens	Standard
Claude Opus 4.6	200,000 tokens	Standard
Claude Opus 4.6 (1M)	1,000,000 tokens	Max/Team/Enterprise automatically; disable with `CLAUDE_CODE_DISABLE_1M_CONTEXT=1`

200,000 tokens sounds large, but fills faster than expected:

Content	Approximate Tokens
1,000 lines of TypeScript	~6,000 tokens
One CLAUDE.md file (200 lines)	~1,500 tokens
A 100-message conversation	~20,000–60,000 tokens
Full test suite output (200 tests)	~15,000–30,000 tokens
A large PR diff (500 lines changed)	~10,000–20,000 tokens

Practical limit for sustained quality: Claude Code typically maintains high performance up to roughly 60-70% of context window capacity. Above that, quality degradation becomes perceptible. Above 90%, it becomes significant.

For a 200K window, this means:

Comfortable working range: up to ~130,000 tokens
Quality begins degrading: 130,000–180,000 tokens
Compact or clear: above 180,000 tokens

Benchmarks: Subagent vs. Sequential Processing

For tasks involving many files, using subagents provides meaningful performance advantages:

Scenario: Review 10 files for security issues

Approach	Wall-Clock Time	Main Context Consumed	Quality
Sequential (all in main context)	3-5 min	~30,000-50,000 tokens	Degrades on later files
One subagent per file (parallel)	1-2 min	~5,000 tokens (summaries)	Consistent across all files

Scenario: Implement a feature with extensive codebase exploration

Approach	Context After Implementation	Context Available for Iteration
Explore everything in main context	80,000-120,000 tokens	Limited
Use subagent for exploration, summary back	20,000-30,000 tokens	Substantial

The efficiency gain from delegating exploration to subagents is most pronounced on large codebases with many related files.

Tips for Reducing Token Consumption

The following habits reduce token usage by 30-60% in typical workflows:

Be specific about files to read. "Read src/auth/session.ts, specifically the refreshToken function" vs. "Read the auth module."
Use /clear between unrelated tasks. Removing 30,000 tokens of irrelevant conversation before starting a new task saves real money on API plans.
Compact proactively. Compacting at 50% context fill produces better summaries than waiting until 90%.
Use subagents for exploration. A subagent that reads 20 files costs the same total tokens but keeps 80% of those tokens out of your main context.
Keep CLAUDE.md concise. Every line of CLAUDE.md costs tokens on every session start. A 400-line CLAUDE.md might cost 3,000 tokens per session — at 100 sessions per month, that is 300,000 tokens of pure overhead.
Avoid re-reading unchanged files. If Claude already read a file in this session, reference the information rather than asking Claude to re-read it.
Use /btw for quick lookups. Quick reference questions answered in the /btw overlay consume zero main context tokens.

Appendix C: Performance Benchmarks ​

Token Usage by Task Type ​

Exploration and Understanding Tasks ​

Code Writing Tasks ​

Bug Fixing Tasks ​

Code Review and Analysis Tasks ​

Automation and Scripting Tasks ​

Cost Estimates ​

Speed Benchmarks ​

Context Window Reference ​

Benchmarks: Subagent vs. Sequential Processing ​

Tips for Reducing Token Consumption ​