Understanding Context Windows and Token Limits
You are forty minutes into a debugging session. Claude Code has read twelve files, run six commands, and produced a detailed analysis. Then you ask it to implement the fix and the output is nonsensical — it “forgets” the error it just diagnosed, invents an API that does not exist, and ignores the file structure it mapped five minutes ago. You did not do anything wrong. Your context window filled up.
This is not an edge case. It is the single most common failure mode in AI-assisted development, and it is entirely preventable once you understand how context windows work.
What You’ll Walk Away With
- A concrete understanding of how tokens, context windows, and compaction work
- Per-tool strategies for monitoring and managing context usage
- Prompts for recovering from context exhaustion without starting over
- Rules of thumb for how much context different task types consume
How Context Windows Work
A context window is measured in tokens. Tokens are not words — they are chunks of text that the model processes. As a rough guide, 1,000 tokens is roughly 750 words, or about 40 lines of code.
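You can turn this rule of thumb into a quick estimator. The sketch below assumes ~4 characters per token, which is a common approximation for English text and code — a real tokenizer will disagree in detail, so treat this as a budgeting aid only:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate for English prose and code.

    Assumes ~4 characters per token. A real tokenizer will give
    different numbers; use this only for back-of-envelope budgeting.
    """
    return max(1, round(len(text) / 4))


# A 200-line source file at ~25 characters per line (plus newlines)
source = ("x" * 25 + "\n") * 200
print(estimate_tokens(source))  # 1300
```

That lands inside the 800–1,500 token range the table below gives for a typical 200-line file.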
Everything in your session consumes tokens:
| What | Approximate Token Cost |
|---|---|
| Your prompt (a few sentences) | 50-200 tokens |
| A typical source file (200 lines) | 800-1,500 tokens |
| A large file (1,000 lines) | 4,000-7,000 tokens |
| Command output (npm test, 50 lines) | 200-500 tokens |
| AI response (paragraph + code) | 300-1,000 tokens |
| Conversation history (10 exchanges) | 5,000-15,000 tokens |
Current context window sizes:
| Model | Context Window | Practical Capacity |
|---|---|---|
| Claude Opus 4.6 | 200K tokens | ~120K before degradation |
| Claude Sonnet 4.5 | 200K tokens | ~120K before degradation |
| GPT-5.3-Codex | 200K tokens | ~130K before degradation |
| GPT-5.2 | 128K tokens | ~80K before degradation |
| Gemini 3 Pro | 1M+ tokens | ~600K before degradation |
The “practical capacity” column matters more than the raw limit. Model performance does not degrade at a cliff — it degrades gradually. Instructions from early in the conversation get progressively less likely to be followed as the window fills.
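The two tables combine into a quick session budget. The sketch below uses midpoints of the ranges above — these are the table's estimates, not measurements, and the capacity figure is the "practical" threshold rather than the raw limit:

```python
# Midpoints of the token-cost ranges from the table above
COSTS = {
    "prompt": 125,          # a few sentences
    "source_file": 1_150,   # ~200 lines
    "large_file": 5_500,    # ~1,000 lines
    "command_output": 350,  # ~50 lines of output
    "ai_response": 650,     # paragraph + code
}

PRACTICAL_CAPACITY = 120_000  # Claude Opus/Sonnet before degradation


def session_estimate(counts: dict) -> int:
    """Sum the estimated token cost of a planned session."""
    return sum(COSTS[item] * n for item, n in counts.items())


# The debugging session from the intro: 12 files read, 6 commands run
used = session_estimate(
    {"prompt": 10, "source_file": 12, "command_output": 6, "ai_response": 10}
)
print(used, f"({used / PRACTICAL_CAPACITY:.0%} of practical capacity)")
# 23650 (20% of practical capacity)
```

Note that a few large files or verbose command dumps shift this total far more than extra prompts do.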
Monitoring Context Usage
You cannot manage what you do not measure. Each tool provides different ways to track context consumption.
Cursor shows token usage in the bottom status bar of the chat panel. Watch for the percentage approaching 100%.
When context is running high:
- Start a new chat for the next task
- Use @mentions to reference specific files instead of letting the agent browse
- Close irrelevant files in your editor to reduce automatic context
Cursor’s Max Mode unlocks larger context windows (Gemini 3 Pro with 1M+ tokens) for tasks that genuinely need broad context. Use it sparingly — it is slower and more expensive.
Claude Code displays context usage in the status bar. You can customize the status line to show token counts continuously:
```shell
# Add to your shell config to show tokens in the status line
# Claude Code's /status command also shows current usage
```

When context is filling up:
- Run `/compact` to summarize the conversation while preserving key decisions
- Run `/compact Focus on the API changes` to guide what gets preserved
- Run `/clear` to reset entirely between unrelated tasks
- Claude Code automatically compacts when approaching limits
Auto-compaction preserves code patterns, file states, and key decisions while discarding less relevant conversation history.
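Conceptually, compaction swaps older turns for a single summary while keeping recent turns verbatim. The sketch below is an illustration of that idea, not Claude Code's actual implementation — the `summarize` parameter stands in for a model call that condenses the old turns:

```python
def compact(history: list, keep_recent: int = 4, summarize=None) -> list:
    """Replace all but the most recent turns with one summary turn.

    `summarize` is a stand-in for a model call that condenses the
    old turns while keeping key decisions and file states.
    """
    if len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = summarize(old) if summarize else f"[summary of {len(old)} earlier turns]"
    return [{"role": "system", "content": summary}] + recent


turns = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
print(len(compact(turns)))  # 5: one summary turn plus the last four turns
```

The key design point is that everything outside the recent window survives only as a summary — which is why guiding what the summary preserves matters so much.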
Codex monitors and reports remaining context space. The model automatically compacts context by summarizing information and discarding less relevant details.
When context is running high:
- Start a new thread for the next task
- Break complex tasks into multiple focused threads
- Use cloud threads for long-running tasks — they have access to the full model context without competing with your local session
Codex can continue working across many steps through repeated compaction, making it effective for complex multi-step tasks.
Strategies for Staying Within Bounds
Section titled “Strategies for Staying Within Bounds”Strategy 1: Scope Every Task Tightly
The most effective context management strategy is also the simplest: do one thing per session. Instead of “build the notification feature,” break it into tasks like “add the notifications database table” and start a fresh context for each.
Strategy 2: Reference, Do Not Dump
Instead of pasting large files into your prompt, reference them by path. The AI reads them on demand, consuming context only when needed.
Use @ mentions to reference files precisely:
```
Fix the bug in @src/services/auth.ts using the pattern from
@src/services/billing.ts. The test is in
@src/services/__tests__/auth.test.ts.
```

Cursor’s semantic search can also find relevant code by meaning, reducing the need to specify exact files.
Reference files in your prompt and let Claude read them:
```
Fix the bug in src/services/auth.ts. Look at the pattern in
src/services/billing.ts for reference. Run the test in
src/services/__tests__/auth.test.ts afterward.
```

Claude will read only the files it needs. You can also use CLAUDE.md imports with @path/to/file syntax to pre-load critical reference files.
Codex gathers context from file contents and tool output as it works. Provide specific file references to guide it:
```
Fix the bug in src/services/auth.ts. Reference the pattern in
src/services/billing.ts. Run tests in
src/services/__tests__/auth.test.ts to verify.
```

In the IDE extension, open files appear as automatic context, so keep only relevant files open.
Strategy 3: Compact Aggressively
When you need to continue a long session, compaction preserves the essential context while freeing space.
Start a new chat but reference your previous work:
```
I was just working on the rate limiter. The implementation is in
@src/middleware/rateLimiter.ts and tests are in
@src/middleware/__tests__/rateLimiter.test.ts. Both files are
already complete.

Next task: add Redis connection pooling to the rate limiter.
```

Use `/compact` with guidance on what to preserve:
```
/compact Preserve the list of files modified, the test commands,
and the architectural decisions about the rate limiter. Discard
the debugging conversation.
```

You can also use the rewind menu (Escape+Escape) to summarize from a specific point, keeping earlier context intact while compressing recent exploration.
Codex compacts automatically. For manual control, start a new thread with a summary prompt:
```
Continuing from previous work: I implemented a rate limiter at
src/middleware/rateLimiter.ts with Redis backend. Tests pass.

Next: add connection pooling. Read the current implementation
and extend it.
```

When This Breaks
The AI “forgets” instructions from earlier in the session. This is the clearest sign of context pressure. Your early instructions have been pushed out or down-weighted. Compact or clear, then restate your key constraints in the new prompt.
Compaction loses critical details. Compaction is a summarization process — it can drop details you care about. Always provide explicit guidance on what to preserve. And keep your todo list and plan files on disk where they can be re-read, not just in conversation history.
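One way to keep the plan re-readable is to persist it as a file the agent can open at any time. A minimal sketch — the `PLAN.md` name and the checklist format are arbitrary choices, not a tool convention:

```python
from pathlib import Path

PLAN = Path("PLAN.md")  # hypothetical file name


def save_plan(done: list, todo: list) -> None:
    """Write the task list to disk so a fresh session can re-read it
    instead of relying on (possibly compacted) chat history."""
    lines = ["# Plan", "", "## Done"]
    lines += [f"- [x] {task}" for task in done]
    lines += ["", "## Todo"]
    lines += [f"- [ ] {task}" for task in todo]
    PLAN.write_text("\n".join(lines) + "\n")


save_plan(
    done=["implement rate limiter"],
    todo=["add Redis connection pooling"],
)
```

After compaction or `/clear`, a one-line prompt like “read PLAN.md and continue” recovers the state without replaying the conversation.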
The AI reads too many files. Some AI agents will read 20 files to answer a simple question. If you notice excessive file reading, interrupt and provide the specific files yourself. Tell the AI not to explore broadly.
You are paying for wasted context. Every token costs money, especially on powerful models like Claude Opus 4.6. If you find yourself regularly hitting context limits, you are probably not scoping tasks tightly enough. See the Cost per Context guide for optimization strategies.