
Understanding Context Windows and Token Limits

You are forty minutes into a debugging session. Claude Code has read twelve files, run six commands, and produced a detailed analysis. Then you ask it to implement the fix and the output is nonsensical — it “forgets” the error it just diagnosed, invents an API that does not exist, and ignores the file structure it mapped five minutes ago. You did not do anything wrong. Your context window filled up.

This is not an edge case. It is the single most common failure mode in AI-assisted development, and it is entirely preventable once you understand how context windows work.

What this guide covers:

  • A concrete understanding of how tokens, context windows, and compaction work
  • Per-tool strategies for monitoring and managing context usage
  • Prompts for recovering from context exhaustion without starting over
  • Rules of thumb for how much context different task types consume

A context window is measured in tokens. Tokens are not words — they are chunks of text that the model processes. As a rough guide, 1,000 tokens comes to about 750 words, or roughly 40 lines of code.
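To make the heuristic concrete, here is a minimal sketch that turns word and line counts into rough token estimates. The function names and constants are illustrative, not from any tool, and real tokenizers vary by model:

```python
# Rough token estimators based on the heuristics above.
# Real tokenizers vary by model; treat these as order-of-magnitude guides.

WORDS_PER_1K_TOKENS = 750   # ~1,000 tokens per 750 words of prose
LINES_PER_1K_TOKENS = 40    # ~1,000 tokens per 40 lines of code

def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate token count for prose."""
    return round(word_count * 1000 / WORDS_PER_1K_TOKENS)

def estimate_tokens_from_lines(line_count: int) -> int:
    """Estimate token count for code."""
    return round(line_count * 1000 / LINES_PER_1K_TOKENS)

print(estimate_tokens_from_words(1500))  # a 1,500-word document -> ~2,000 tokens
```

The point is not precision — it is having a mental multiplier so you notice when a single action (say, pasting a long log) eats a meaningful slice of the window.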

Everything in your session consumes tokens:

| What | Approximate Token Cost |
| --- | --- |
| Your prompt (a few sentences) | 50-200 tokens |
| A typical source file (200 lines) | 800-1,500 tokens |
| A large file (1,000 lines) | 4,000-7,000 tokens |
| Command output (npm test, 50 lines) | 200-500 tokens |
| AI response (paragraph + code) | 300-1,000 tokens |
| Conversation history (10 exchanges) | 5,000-15,000 tokens |
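Summed together, these costs explain why the session in the opening scenario ran dry. Here is a back-of-the-envelope sketch using midpoints from the table above — all the counts and values are illustrative, not measurements:

```python
# Rough budget for a session like the intro's debugging scenario:
# twelve files read, six commands run, plus prompts, responses, and history.
# Midpoints from the cost table above; purely illustrative numbers.

session_tokens = {
    "your_prompts":   5 * 125,    # a handful of prompts at ~50-200 each
    "files_read":     12 * 1150,  # typical source files at ~800-1,500 each
    "command_output": 6 * 350,    # command runs at ~200-500 each
    "ai_responses":   10 * 650,   # responses at ~300-1,000 each
    "history":        10_000,     # conversation history: 5,000-15,000
}

total = sum(session_tokens.values())
print(f"Estimated usage so far: {total:,} tokens")  # ~33,000 tokens
```

Tens of thousands of tokens, and not a single large file involved yet — a few big reads or long command outputs on top of this is how forty minutes of work fills a window.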

Current context window sizes:

| Model | Context Window | Practical Capacity |
| --- | --- | --- |
| Claude Opus 4.6 | 200K tokens | ~120K before degradation |
| Claude Sonnet 4.5 | 200K tokens | ~120K before degradation |
| GPT-5.3-Codex | 200K tokens | ~130K before degradation |
| GPT-5.2 | 128K tokens | ~80K before degradation |
| Gemini 3 Pro | 1M+ tokens | ~600K before degradation |

The “practical capacity” column matters more than the raw limit. Model performance does not degrade at a cliff — it degrades gradually. Instructions from early in the conversation get progressively less likely to be followed as the window fills.
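One way to act on this is to track your running estimate against the practical ceiling rather than the advertised one. A sketch, assuming you maintain a token estimate yourself — the model keys and the 80% warning threshold are made-up illustrations, and the capacities mirror the table above:

```python
# Warn before hitting the *practical* capacity, not the raw window size.
# Capacities mirror the table above; model keys and the 80% warning
# threshold are illustrative assumptions.

PRACTICAL_CAPACITY = {
    "claude-opus-4.6":   120_000,
    "claude-sonnet-4.5": 120_000,
    "gpt-5.3-codex":     130_000,
    "gpt-5.2":            80_000,
    "gemini-3-pro":      600_000,
}

def context_status(model: str, used_tokens: int) -> str:
    """Coarse health signal for the current session."""
    capacity = PRACTICAL_CAPACITY[model]
    if used_tokens >= capacity:
        return "degraded: compact or start a fresh session"
    if used_tokens >= 0.8 * capacity:
        return "warning: wrap up this task and hand off"
    return "ok"

print(context_status("claude-opus-4.6", 100_000))  # past 80% -> warning
```

Treating 80% of practical capacity as the point to wrap up gives you room to finish the current task and write a clean handoff prompt before quality drops.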

You cannot manage what you do not measure. Each tool provides different ways to track context consumption.

Cursor shows token usage in the bottom status bar of the chat panel. Watch for the percentage approaching 100%.

When context is running high:

  • Start a new chat for the next task
  • Use @ mentions to reference specific files instead of letting the agent browse
  • Close irrelevant files in your editor to reduce automatic context

Cursor’s Max Mode unlocks larger context windows (Gemini 3 Pro with 1M+ tokens) for tasks that genuinely need broad context. Use it sparingly — it is slower and more expensive.

The most effective context management strategy is also the simplest: do one thing per session. Instead of “build the notification feature,” break it into tasks like “add the notifications database table” and start a fresh context for each.

Instead of pasting large files into your prompt, reference them by path. The AI reads them on demand, consuming context only when needed.

Use @ mentions to reference files precisely:

Fix the bug in @src/services/auth.ts using the pattern from
@src/services/billing.ts. The test is in
@src/services/__tests__/auth.test.ts.

Cursor’s semantic search can also find relevant code by meaning, reducing the need to specify exact files.

When you need to continue a long session, compaction preserves the essential context while freeing space.

The manual version works in any tool: start a new chat, but reference your previous work:

I was just working on the rate limiter. The implementation is in
@src/middleware/rateLimiter.ts and tests are in
@src/middleware/__tests__/rateLimiter.test.ts. Both files are
already complete.
Next task: add Redis connection pooling to the rate limiter.

The AI “forgets” instructions from earlier in the session. This is the clearest sign of context pressure. Your early instructions have been pushed out or down-weighted. Compact or clear, then restate your key constraints in the new prompt.

Compaction loses critical details. Compaction is a summarization process — it can drop details you care about. Always provide explicit guidance on what to preserve. And keep your todo list and plan files on disk where they can be re-read, not just in conversation history.

The AI reads too many files. Some AI agents will read 20 files to answer a simple question. If you notice excessive file reading, interrupt and provide the specific files yourself. Tell the AI not to explore broadly.

You are paying for wasted context. Every token costs money, especially on powerful models like Claude Opus 4.6. If you find yourself regularly hitting context limits, you are probably not scoping tasks tightly enough. See the Cost per Context guide for optimization strategies.