
Understanding Context Windows and Token Limits

You are forty minutes into a debugging session. Claude Code has read twelve files, run six commands, and produced a detailed analysis. Then you ask it to implement the fix and the output is nonsensical — it “forgets” the error it just diagnosed, invents an API that does not exist, and ignores the file structure it mapped five minutes ago. You did not do anything wrong. Your context window filled up.

This is not an edge case. It is the single most common failure mode in AI-assisted development, and it is entirely preventable once you understand how context windows work.

What this guide covers:

  • A concrete understanding of how tokens, context windows, and compaction work
  • Per-tool strategies for monitoring and managing context usage
  • Prompts for recovering from context exhaustion without starting over
  • Rules of thumb for how much context different task types consume

A context window is measured in tokens. Tokens are not words — they are chunks of text that the model processes. As a rough guide, 1,000 tokens comes to about 750 words, or roughly 40 lines of code.
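To make the heuristic concrete, here is a minimal sketch that turns word and line counts into rough token estimates. The function names and constants are illustrative, not from any tool, and real tokenizers vary by model:

```python
# Rough token estimators based on the heuristics above.
# Real tokenizers vary by model; treat these as order-of-magnitude guides.

WORDS_PER_1K_TOKENS = 750   # ~1,000 tokens per 750 words of prose
LINES_PER_1K_TOKENS = 40    # ~1,000 tokens per 40 lines of code

def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate token count for prose."""
    return round(word_count * 1000 / WORDS_PER_1K_TOKENS)

def estimate_tokens_from_lines(line_count: int) -> int:
    """Estimate token count for code."""
    return round(line_count * 1000 / LINES_PER_1K_TOKENS)

print(estimate_tokens_from_words(1500))  # a 1,500-word document -> ~2,000 tokens
```

The point is not precision — it is having a mental multiplier so you notice when a single action (say, pasting a long log) eats a meaningful slice of the window.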

Everything in your session consumes tokens:

| What | Approximate Token Cost |
| --- | --- |
| Your prompt (a few sentences) | 50-200 tokens |
| A typical source file (200 lines) | 800-1,500 tokens |
| A large file (1,000 lines) | 4,000-7,000 tokens |
| Command output (npm test, 50 lines) | 200-500 tokens |
| AI response (paragraph + code) | 300-1,000 tokens |
| Conversation history (10 exchanges) | 5,000-15,000 tokens |
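Summed together, these costs explain why the session in the opening scenario ran dry. Here is a back-of-the-envelope sketch using midpoints from the table above — all the counts and values are illustrative, not measurements:

```python
# Rough budget for a session like the intro's debugging scenario:
# twelve files read, six commands run, plus prompts, responses, and history.
# Midpoints from the cost table above; purely illustrative numbers.

session_tokens = {
    "your_prompts":   5 * 125,    # a handful of prompts at ~50-200 each
    "files_read":     12 * 1150,  # typical source files at ~800-1,500 each
    "command_output": 6 * 350,    # command runs at ~200-500 each
    "ai_responses":   10 * 650,   # responses at ~300-1,000 each
    "history":        10_000,     # conversation history: 5,000-15,000
}

total = sum(session_tokens.values())
print(f"Estimated usage so far: {total:,} tokens")  # ~33,000 tokens
```

Tens of thousands of tokens, and not a single large file involved yet — a few big reads or long command outputs on top of this is how forty minutes of work fills a window.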

Current context window sizes:

| Model | Context Window | Practical Capacity |
| --- | --- | --- |
| Claude Opus 4.6 | 200K tokens | ~120K before degradation |
| Claude Sonnet 4.5 | 200K tokens | ~120K before degradation |
| GPT-5.3-Codex | 200K tokens | ~130K before degradation |
| GPT-5.2 | 128K tokens | ~80K before degradation |
| Gemini 3 Pro | 1M+ tokens | ~600K before degradation |

The “practical capacity” column matters more than the raw limit. Model performance does not degrade at a cliff — it degrades gradually. Instructions from early in the conversation get progressively less likely to be followed as the window fills.
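One way to act on this is to track your running estimate against the practical ceiling rather than the advertised one. A sketch, assuming you maintain a token estimate yourself — the model keys and the 80% warning threshold are made-up illustrations, and the capacities mirror the table above:

```python
# Warn before hitting the *practical* capacity, not the raw window size.
# Capacities mirror the table above; model keys and the 80% warning
# threshold are illustrative assumptions.

PRACTICAL_CAPACITY = {
    "claude-opus-4.6":   120_000,
    "claude-sonnet-4.5": 120_000,
    "gpt-5.3-codex":     130_000,
    "gpt-5.2":            80_000,
    "gemini-3-pro":      600_000,
}

def context_status(model: str, used_tokens: int) -> str:
    """Coarse health signal for the current session."""
    capacity = PRACTICAL_CAPACITY[model]
    if used_tokens >= capacity:
        return "degraded: compact or start a fresh session"
    if used_tokens >= 0.8 * capacity:
        return "warning: wrap up this task and hand off"
    return "ok"

print(context_status("claude-opus-4.6", 100_000))  # past 80% -> warning
```

Treating 80% of practical capacity as the point to wrap up gives you room to finish the current task and write a clean handoff prompt before quality drops.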

You cannot manage what you do not measure. Each tool provides different ways to track context consumption.

Cursor shows token usage in the bottom status bar of the chat panel. Watch for the percentage approaching 100%.

When context is running high:

  • Start a new chat for the next task
  • Use @ mentions to reference specific files instead of letting the agent browse
  • Close irrelevant files in your editor to reduce automatic context

Cursor’s Max Mode unlocks larger context windows (Gemini 3 Pro with 1M+ tokens) for tasks that genuinely need broad context. Use it sparingly — it is slower and more expensive.

The most effective context management strategy is also the simplest: do one thing per session. Instead of “build the notification feature,” break it into tasks like “add the notifications database table” and start a fresh context for each.

Instead of pasting large files into your prompt, reference them by path. The AI reads them on demand, consuming context only when needed.

Use @ mentions to reference files precisely:

Fix the bug in @src/services/auth.ts using the pattern from
@src/services/billing.ts. The test is in
@src/services/__tests__/auth.test.ts.

Cursor’s semantic search can also find relevant code by meaning, reducing the need to specify exact files.

When you need to continue a long session, compaction preserves the essential context while freeing space.

The manual version works in any tool: start a new chat, but reference your previous work:

I was just working on the rate limiter. The implementation is in
@src/middleware/rateLimiter.ts and tests are in
@src/middleware/__tests__/rateLimiter.test.ts. Both files are
already complete.
Next task: add Redis connection pooling to the rate limiter.

The AI “forgets” instructions from earlier in the session. This is the clearest sign of context pressure. Your early instructions have been pushed out or down-weighted. Compact or clear, then restate your key constraints in the new prompt.

Compaction loses critical details. Compaction is a summarization process — it can drop details you care about. Always provide explicit guidance on what to preserve. And keep your todo list and plan files on disk where they can be re-read, not just in conversation history.

The AI reads too many files. Some AI agents will read 20 files to answer a simple question. If you notice excessive file reading, interrupt and provide the specific files yourself. Tell the AI not to explore broadly.

You are paying for wasted context. Every token costs money, especially on powerful models like Claude Opus 4.6. If you find yourself regularly hitting context limits, you are probably not scoping tasks tightly enough. See the Cost per Context guide for optimization strategies.