
Usage Monitoring and Cost Optimization

Your finance team wants to know how much Claude Code costs per developer per month. Your engineering manager wants to know which teams are getting the most value. Your security team wants audit logs. Without telemetry, you are guessing. With OpenTelemetry, you can build dashboards that answer each of these questions.

  • OpenTelemetry setup for metrics and event logging
  • The /cost command and status line for individual tracking
  • Team cost management with workspace limits and rate limiting
  • Token reduction strategies that cut costs without reducing effectiveness
  • A practical framework for measuring Claude Code ROI

Every developer can track their session costs in real time:

/cost

Output:

Total cost: $0.55
Total duration (API): 6m 19.7s
Total duration (wall): 6h 33m 10.2s
Total code changes: 42 lines added, 18 lines removed

For continuous visibility, configure your status line to show token usage. See the status line documentation for configuration options.

Based on Anthropic’s published data:

| Metric | Value |
| --- | --- |
| Average cost per developer per day | $6 |
| 90th percentile daily cost | $12 |
| Monthly average (Sonnet) | $100-200/developer |
| Monthly average (Opus-heavy usage) | $300-500/developer |
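
As a rough sanity check, the daily average can be scaled up to a monthly budget. The sketch below assumes 21 working days per month and a hypothetical team of 25 developers; both figures are illustrative and not part of Anthropic's data.

```shell
# Rough monthly budget estimate from the average daily cost above.
# Assumptions (illustrative): 21 working days per month, 25 developers.
daily_cost_usd=6
working_days=21
team_size=25

per_dev_monthly=$((daily_cost_usd * working_days))
team_monthly=$((per_dev_monthly * team_size))

echo "Per developer: \$${per_dev_monthly}/month"
echo "Team of ${team_size}: \$${team_monthly}/month"
```

Under these assumptions the per-developer figure comes to $126, which falls inside the $100-200 monthly range quoted for Sonnet.
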
# Enable telemetry
export CLAUDE_CODE_ENABLE_TELEMETRY=1
# Configure OTLP exporter
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# Optional: authentication
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer your-token"
# Start Claude Code
claude

Deploy via managed settings so every developer automatically reports telemetry:

{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector.company.com:4317",
    "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer company-token"
  }
}

| Metric | Type | What It Tracks |
| --- | --- | --- |
| session.count | Counter | Sessions started |
| lines_of_code | Counter | Lines added/removed by Claude |
| pull_request.count | Counter | PRs created |
| commit.count | Counter | Commits made |
| cost.usage | Counter | Dollar cost of API calls |
| token.usage | Counter | Input and output tokens |
| code_edit_tool.decision | Counter | Edit tool allow/deny decisions |
| active_time | Counter | Active session time in seconds |

| Event | What It Captures |
| --- | --- |
| user_prompt | When prompts are submitted (content optional via OTEL_LOG_USER_PROMPTS=1) |
| tool_result | Tool call results and outcomes |
| api_request | API call details (model, tokens, latency) |
| api_error | API errors and rate limits |
| tool_decision | Permission decisions for tool calls |
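
To break these metrics down by team, one option is the standard OpenTelemetry variable `OTEL_RESOURCE_ATTRIBUTES`, which attaches custom attributes to exported telemetry. The attribute names and values below are hypothetical placeholders; substitute whatever taxonomy your organization uses.

```shell
# Tag telemetry with team metadata so dashboards can group by team.
# Attribute names and values are illustrative, not required by Claude Code.
# Format: comma-separated key=value pairs; avoid spaces in values.
export OTEL_RESOURCE_ATTRIBUTES="department=engineering,team.id=platform,cost_center=eng-123"
```

The same variable can be placed in the `env` block of managed settings so that each team's machines report a consistent tag.
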

For API users, set workspace-level spend limits in the Anthropic Console:

  1. Go to console.anthropic.com
  2. Navigate to your Claude Code workspace (auto-created on first authentication)
  3. Set monthly spend limits per workspace

| Team Size | TPM per User | RPM per User |
| --- | --- | --- |
| 1-5 | 200k-300k | 5-7 |
| 5-20 | 100k-150k | 2.5-3.5 |
| 20-50 | 50k-75k | 1.25-1.75 |
| 50-100 | 25k-35k | 0.62-0.87 |
| 100-500 | 15k-20k | 0.37-0.47 |

Per-user TPM decreases with team size because not all users are active concurrently.
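
To size a limit for the whole organization, multiply the per-user guideline by the number of users. The sketch below assumes a hypothetical team of 30 developers, which falls in the 20-50 tier (50k-75k TPM per user).

```shell
# Estimate organization-level TPM from the per-user guideline.
# Assumption (illustrative): 30 developers, in the 20-50 tier.
users=30
tpm_per_user_low=50000
tpm_per_user_high=75000

org_tpm_low=$((users * tpm_per_user_low))
org_tpm_high=$((users * tpm_per_user_high))

echo "Provision roughly ${org_tpm_low}-${org_tpm_high} TPM for ${users} users"
```
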

Context size directly drives cost. Every message includes the full conversation history.

  • Clear between tasks: /clear when switching to unrelated work
  • Use targeted compaction: /compact Keep test output and code changes. Summarize discussion.
  • Add compaction instructions to CLAUDE.md:
    # Compact instructions
    When compacting, preserve test output, error traces, and file paths. Summarize discussion and reasoning.
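
Because each request resends the conversation history, cumulative input tokens grow roughly quadratically with the number of turns. The sketch below is a simplified model, assuming every turn adds a fixed 2,000 tokens and ignoring prompt caching and automatic compaction; it only illustrates why clearing between tasks helps.

```shell
# Cumulative input tokens when every turn resends the full history.
# Assumptions (illustrative): each turn adds 2,000 tokens; no caching or compaction.
tokens_per_turn=2000

# Sum of history sizes over n turns: tokens_per_turn * (1 + 2 + ... + n)
cumulative_input() {
  local n=$1
  echo $((tokens_per_turn * n * (n + 1) / 2))
}

one_session=$(cumulative_input 20)            # 20 turns without clearing
two_sessions=$((2 * $(cumulative_input 10)))  # /clear after turn 10

echo "20 turns, no clear:      ${one_session} input tokens"
echo "20 turns, clear halfway: ${two_sessions} input tokens"
```

In this model, clearing once at the halfway point cuts cumulative input from 420,000 to 220,000 tokens.
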

| Task | Recommended Model | Why |
| --- | --- | --- |
| Code review | Sonnet | Good enough, significantly cheaper |
| Bug fixes | Sonnet | Most bugs do not need Opus-level reasoning |
| Architecture decisions | Opus | Complex multi-step reasoning benefits from Opus |
| Simple file edits | Sonnet (or Haiku for subagents) | Overkill to use Opus |
| Security audits | Opus | Nuanced analysis requires deeper reasoning |

Switch models mid-session with /model or set defaults in /config.

Each MCP server adds tool definitions to your context, consuming tokens even when idle:

  • Run /context to see what consumes space
  • Disable unused servers with /mcp
  • Prefer CLI tools (gh, aws, gcloud) over MCP servers when possible
  • Set ENABLE_TOOL_SEARCH=auto:5 to defer loading tool definitions once they exceed 5% of the context window

Subagents have their own context windows. Use them for:

  • Verbose operations (reading many files, running test suites)
  • Parallel tasks that would otherwise bloat the main context
  • Repetitive operations (applying the same change across multiple files)

Configure subagents with cheaper models:

---
model: sonnet
---
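
The `model` field sits in the frontmatter of a subagent definition file. A minimal sketch of creating one from the shell follows; the agent name `test-runner`, its description, tool list, and prompt are hypothetical, and the `.claude/agents/` location assumes a project-level subagent.

```shell
# Create a project-level subagent that runs on a cheaper model.
# The name, description, tools, and prompt are illustrative placeholders.
mkdir -p .claude/agents
cat > .claude/agents/test-runner.md <<'EOF'
---
name: test-runner
description: Runs the test suite and reports only failures with their error output.
tools: Bash, Read, Grep
model: sonnet
---

Run the project's tests. Return a short summary of failing tests and the
relevant error lines. Do not include passing test output.
EOF
```

Keeping verbose test output inside the subagent's own context is what saves tokens in the main session; only the short summary comes back.
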

Telemetry data not appearing: Check that CLAUDE_CODE_ENABLE_TELEMETRY=1 is set. Verify the OTLP endpoint is reachable from developer machines. Metrics are exported every 60 seconds by default, so wait at least that long before debugging.
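
To confirm that metrics are produced at all before investigating the collector, one approach is to switch to the console exporter with a shorter export interval. The 10-second interval here is an arbitrary debugging choice.

```shell
# Debug telemetry locally: print metrics to the terminal instead of sending via OTLP.
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=console
# Export every 10 seconds (value is in milliseconds; the default is 60000).
export OTEL_METRIC_EXPORT_INTERVAL=10000
# Then start Claude Code in this shell:
# claude
```

If metrics show up on the console but not in your backend, the problem is likely in the endpoint, protocol, or authentication header rather than in Claude Code itself.
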

Costs higher than expected: Check /context to see what is consuming space. Large MCP server configurations or bloated auto-memory files inflate every request. Also check for sessions that were never cleared, since stale context accumulates.

Rate limits hit during high-usage periods: The per-user TPM guidelines assume average concurrency. During training sessions or onboarding events, temporarily increase limits or stagger usage.

Bedrock/Vertex costs not tracked: Claude Code does not send metrics from your cloud provider. Use LiteLLM or your cloud provider’s own cost tracking for Bedrock/Vertex billing.