Usage Monitoring and Cost Optimization
Your finance team wants to know how much Claude Code costs per developer per month. Your engineering manager wants to know which teams are getting the most value. Your security team wants audit logs. Without telemetry, you are guessing. With OpenTelemetry, you can build dashboards that answer each of these questions.
What You Will Walk Away With
- OpenTelemetry setup for metrics and event logging
- The `/cost` command and status line for individual tracking
- Team cost management with workspace limits and rate limiting
- Token reduction strategies that cut costs without reducing effectiveness
- A practical framework for measuring Claude Code ROI
Individual Cost Tracking
The /cost Command
Every developer can check the cost of their current session at any time:
```
/cost
```
Output:
```
Total cost: $0.55
Total duration (API): 6m 19.7s
Total duration (wall): 6h 33m 10.2s
Total code changes: 42 lines added, 18 lines removed
```
For continuous visibility, configure your status line to show token usage. See the status line documentation for configuration options.
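A status line command receives session data as JSON on stdin, so a small script can show the running cost. The sketch below assumes that JSON contains a numeric `total_cost_usd` field (it appears under `cost` in the status line documentation at the time of writing; confirm it for your version). The function name `statusline_cost` is hypothetical, and the `sed` match is a deliberately naive extraction rather than a JSON parser, chosen so the script does not depend on `jq`.

```bash
#!/usr/bin/env bash
# Hypothetical status line helper: print the session cost from the JSON on stdin.
# Assumes the input contains a numeric "total_cost_usd" field.
statusline_cost() {
  local input cost
  input=$(cat)
  # Naive field extraction with sed; this is not a full JSON parser.
  cost=$(printf '%s' "$input" |
    sed -n 's/.*"total_cost_usd"[[:space:]]*:[[:space:]]*\([0-9.]*\).*/\1/p')
  if [ -n "$cost" ]; then
    printf 'Session cost: $%.2f\n' "$cost"
  else
    echo "Session cost: n/a"
  fi
}
```

To wire it up, save the function in a script that ends with a call to `statusline_cost`, make the script executable, and point the `statusLine` setting at it, for example `"statusLine": {"type": "command", "command": "~/.claude/statusline.sh"}` (the path is only an example).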
Typical Cost Ranges
Based on Anthropic’s published data:
| Metric | Value |
|---|---|
| Average cost per developer per day | $6 |
| 90th percentile daily cost | $12 |
| Monthly average (Sonnet) | $100-200/developer |
| Monthly average (Opus-heavy usage) | $300-500/developer |
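To turn the per-day figures into a team budget, multiply by headcount and working days. This sketch assumes 21 working days per month (an assumption; adjust it for your calendar) and uses `awk` for the floating-point math. The function name `estimate_monthly_budget` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate a monthly Claude Code budget for a team.
# Usage: estimate_monthly_budget DEVELOPERS DAILY_COST_USD [WORKING_DAYS]
# WORKING_DAYS defaults to 21, an assumed typical month.
estimate_monthly_budget() {
  local devs="$1" daily="$2" days="${3:-21}"
  awk -v d="$devs" -v c="$daily" -v n="$days" 'BEGIN { printf "%.2f\n", d * c * n }'
}

# Plan with the average ($6/day) and 90th-percentile ($12/day) figures for 10 developers.
echo "Average:         \$$(estimate_monthly_budget 10 6)"
echo "90th percentile: \$$(estimate_monthly_budget 10 12)"
```

As a sanity check, one developer at the average rate works out to about $126 per month (6 × 21), which falls inside the $100-200 Sonnet range in the table.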
OpenTelemetry Setup
Quick Start
```bash
# Enable telemetry
export CLAUDE_CODE_ENABLE_TELEMETRY=1

# Configure OTLP exporter
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

# Optional: authentication
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer your-token"

# Start Claude Code
claude
```
Organization-Wide Deployment
Deploy via managed settings so every developer automatically reports telemetry:
```json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector.company.com:4317",
    "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer company-token"
  }
}
```
Available Metrics
Metric names are exported with a `claude_code.` prefix (for example, `claude_code.cost.usage`).

| Metric | Type | What It Tracks |
|---|---|---|
| `session.count` | Counter | Sessions started |
| `lines_of_code` | Counter | Lines added/removed by Claude |
| `pull_request.count` | Counter | PRs created |
| `commit.count` | Counter | Commits made |
| `cost.usage` | Counter | Cost of API calls in USD |
| `token.usage` | Counter | Tokens used, by type (input, output, cache read, cache creation) |
| `code_edit_tool.decision` | Counter | Edit tool allow/deny decisions |
| `active_time` | Counter | Active session time in seconds |
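With `cost.usage` flowing into a metrics backend, "cost per developer per month" is a group-by over a user attribute. The sketch below assumes you have already exported data points to a CSV with hypothetical columns `month,user,cost_usd`; that layout is an assumption for illustration, not a format Claude Code produces, and which attribute identifies a user depends on your setup.

```bash
#!/usr/bin/env bash
# Hypothetical input: CSV rows of month,user,cost_usd with a header row.
# Prints one "month user total" line per pair, sorted, totals rounded to cents.
cost_per_user_per_month() {
  awk -F, 'NR > 1 { total[$1 "," $2] += $3 }
           END {
             for (key in total) {
               split(key, parts, ",")
               printf "%s %s %.2f\n", parts[1], parts[2], total[key]
             }
           }' "$@" | sort
}
```

For example, `cost_per_user_per_month usage.csv` reads a file; with no argument the function reads from stdin. The `sort` is there because `awk` iterates its arrays in no particular order.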
Available Events
| Event | What It Captures |
|---|---|
| `user_prompt` | Prompt submissions (prompt text is included only when `OTEL_LOG_USER_PROMPTS=1` is set) |
| `tool_result` | Tool call results and outcomes |
| `api_request` | API call details (model, tokens, latency) |
| `api_error` | API errors and rate limits |
| `tool_decision` | Permission decisions for tool calls |
Team Cost Management
Workspace Spend Limits
For API users, set workspace-level spend limits in the Anthropic Console:
- Go to console.anthropic.com
- Navigate to your Claude Code workspace (auto-created on first authentication)
- Set monthly spend limits per workspace
Rate Limit Guidelines
| Team Size (users) | TPM per User (tokens/min) | RPM per User (requests/min) |
|---|---|---|
| 1-5 | 200k-300k | 5-7 |
| 5-20 | 100k-150k | 2.5-3.5 |
| 20-50 | 50k-75k | 1.25-1.75 |
| 50-100 | 25k-35k | 0.62-0.87 |
| 100-500 | 15k-20k | 0.37-0.47 |
Per-user TPM decreases with team size because not all users are active concurrently.
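To size an organization-wide limit from the table, multiply the per-user TPM by headcount. The function below is a hypothetical helper; where the table's ranges share an edge (5, 20, 50, 100 users), it assigns the boundary to the smaller band, which is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert the per-user TPM guideline into an
# organization-wide "low high" TPM range for a given number of users.
org_tpm_range() {
  local users="$1" lo hi
  if   [ "$users" -le 5 ];   then lo=200000; hi=300000
  elif [ "$users" -le 20 ];  then lo=100000; hi=150000
  elif [ "$users" -le 50 ];  then lo=50000;  hi=75000
  elif [ "$users" -le 100 ]; then lo=25000;  hi=35000
  elif [ "$users" -le 500 ]; then lo=15000;  hi=20000
  else
    echo "no guideline for more than 500 users" >&2
    return 1
  fi
  echo "$(( users * lo )) $(( users * hi ))"
}
```

Because the guidelines are step functions, totals can dip at a band boundary (50 users gives 2,500,000 to 3,750,000 TPM, while 100 users gives 2,500,000 to 3,500,000), so treat the output as a starting point and adjust from observed usage.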
Token Reduction Strategies
Manage Context Proactively
Context size directly drives cost: each request resends the full conversation history as input tokens.
- Clear between tasks: `/clear` when switching to unrelated work
- Use targeted compaction: `/compact Keep test output and code changes. Summarize discussion.`
- Add compaction instructions to CLAUDE.md:

  ```markdown
  # Compact instructions

  When compacting, preserve test output, error traces, and file paths. Summarize discussion and reasoning.
  ```
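A rough model shows why clearing between tasks pays off. If each turn adds about T tokens and every request resends the whole history, then N turns send about T·N(N+1)/2 input tokens in total, which grows quadratically with session length. The sketch compares one 20-turn session against two 10-turn sessions separated by `/clear`. The 2,000 tokens per turn is an illustrative assumption, and the model ignores prompt caching (which discounts repeated prefix tokens), the system prompt, and compaction.

```bash
#!/usr/bin/env bash
# Hypothetical helper: total input tokens over a session in which each turn adds
# TOKENS_PER_TURN tokens and every request resends the full history.
# Ignores prompt caching, system prompt, tool definitions, and compaction.
cumulative_input_tokens() {
  local turns="$1" per_turn="$2"
  echo $(( per_turn * turns * (turns + 1) / 2 ))
}

one_session=$(cumulative_input_tokens 20 2000)               # 420000
two_sessions=$(( 2 * $(cumulative_input_tokens 10 2000) ))   # 220000
echo "One 20-turn session:           $one_session tokens"
echo "Two 10-turn sessions (/clear): $two_sessions tokens"
```

Under these assumptions, splitting the work roughly halves the input tokens, even though the same 20 turns happen.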
Choose the Right Model
| Task | Recommended Model | Why |
|---|---|---|
| Code review | Sonnet | Good enough, significantly cheaper |
| Bug fixes | Sonnet | Most bugs do not need Opus-level reasoning |
| Architecture decisions | Opus | Complex multi-step reasoning benefits from Opus |
| Simple file edits | Sonnet (or Haiku for subagents) | Opus is overkill |
| Security audits | Opus | Nuanced analysis requires deeper reasoning |
Switch models mid-session with `/model` or set defaults in `/config`.
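If you launch Claude Code from scripts or shell aliases, the table can be encoded directly and passed to `claude --model`. The task labels and the `model_for_task` function below are hypothetical, chosen for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical mapping from a task label to a model alias, following the table.
model_for_task() {
  case "$1" in
    architecture|security-audit) echo "opus" ;;
    *)                           echo "sonnet" ;;  # review, bugfix, edit, anything unlisted
  esac
}

# Example (not run here): start a session for a security audit.
# claude --model "$(model_for_task security-audit)"
```

Defaulting unknown labels to the cheaper model keeps the expensive option opt-in.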
Reduce MCP Server Overhead
Each MCP server adds tool definitions to your context, consuming tokens even when idle:
- Run `/context` to see what consumes space
- Disable unused servers with `/mcp`
- Prefer CLI tools (`gh`, `aws`, `gcloud`) over MCP servers when possible
- Set `ENABLE_TOOL_SEARCH=auto:5` to defer tool definitions when they exceed 5% of the context window
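To judge whether your servers cross a threshold like the 5% in `auto:5`, estimate the share of the context window their tool definitions take. The figures below (40 tools, about 500 tokens per definition, a 200,000-token window) are illustrative assumptions; read the real numbers from `/context`. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percentage of the context window used by tool definitions.
# Usage: tool_context_pct TOOL_COUNT TOKENS_PER_TOOL CONTEXT_WINDOW_TOKENS
tool_context_pct() {
  awk -v n="$1" -v t="$2" -v w="$3" 'BEGIN { printf "%.1f\n", 100 * n * t / w }'
}

pct=$(tool_context_pct 40 500 200000)
echo "Tool definitions use ${pct}% of context"

# Decimal comparison against the 5% threshold, done in awk.
if awk -v p="$pct" 'BEGIN { exit !(p > 5) }'; then
  echo "Above 5%: disable unused servers or set ENABLE_TOOL_SEARCH=auto:5"
fi
```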
Delegate to Subagents
Subagents run in their own context windows, so their intermediate output stays out of your main conversation. Use them for:
- Verbose operations (reading many files, running test suites)
- Parallel tasks that would otherwise bloat the main context
- Repetitive operations (applying the same change across multiple files)
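Configure subagents with cheaper models by setting `model` in the subagent file's frontmatter (subagent files live in `.claude/agents/`; the `name` and `description` values below are placeholders):
```yaml
---
name: test-runner                                      # hypothetical example name
description: Runs the test suite and summarizes failures   # hypothetical
model: sonnet
---
```
When This Breaks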
Telemetry data not appearing: Check that `CLAUDE_CODE_ENABLE_TELEMETRY=1` is set. Verify the OTLP endpoint is reachable from developer machines. The default metrics export interval is 60 seconds, so wait at least that long before debugging.
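The first two checks can be scripted. The helpers below are a hypothetical sketch: `check_otel_env` reports missing variables, and `endpoint_reachable` attempts a TCP connection using bash's `/dev/tcp` feature with a 3-second `timeout`. Both assume bash and an endpoint URL with an explicit port; a successful TCP connection only shows the port is open, not that the collector accepts your credentials.

```bash
#!/usr/bin/env bash
# Hypothetical troubleshooting helpers for missing telemetry.

# Report required telemetry variables that are missing; non-zero if any are.
check_otel_env() {
  local status=0 var
  if [ "$CLAUDE_CODE_ENABLE_TELEMETRY" != "1" ]; then
    echo "CLAUDE_CODE_ENABLE_TELEMETRY is not set to 1"
    status=1
  fi
  for var in OTEL_METRICS_EXPORTER OTEL_EXPORTER_OTLP_ENDPOINT; do
    if [ -z "${!var}" ]; then   # indirect expansion: value of the variable named by $var
      echo "$var is not set"
      status=1
    fi
  done
  return "$status"
}

# Split an endpoint such as http://localhost:4317 into "host port".
endpoint_host_port() {
  local hostport="${1#*://}"   # drop the scheme
  hostport="${hostport%%/*}"   # drop any path
  echo "${hostport%:*} ${hostport##*:}"
}

# Try a TCP connection via bash's /dev/tcp, giving up after 3 seconds.
endpoint_reachable() {
  local host port
  read -r host port <<< "$(endpoint_host_port "$1")"
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Example (needs network access):
# check_otel_env && endpoint_reachable "$OTEL_EXPORTER_OTLP_ENDPOINT" && echo "collector reachable"
```

For faster feedback while debugging, you can temporarily set `OTEL_METRICS_EXPORTER=console` to print metrics locally and lower `OTEL_METRIC_EXPORT_INTERVAL` (in milliseconds; the default is 60000) to something like 10000.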
Costs higher than expected: Check `/context` to see what is consuming space. Large MCP server configurations or bloated auto-memory files inflate every request. Also check for sessions that were never cleared, since stale context accumulates.
Rate limits hit during high-usage periods: The per-user TPM guidelines assume average concurrency. During training sessions or onboarding events, temporarily increase limits or stagger usage.
Bedrock/Vertex costs not tracked: Claude Code does not send metrics from your cloud provider. Use LiteLLM or your cloud provider’s own cost tracking for Bedrock/Vertex billing.
What is Next
- Enterprise Integration — Organization-wide telemetry deployment
- GitHub Actions — Track CI costs alongside developer usage
- Performance and Cost Tips — 10 specific tips for reducing token usage