Search and Indexing Strategies
You ask the AI “where do we handle payment webhooks?” It spends 30 seconds reading through your entire src/ directory, finds six files that mention “payment,” reads all of them, and finally gives you an answer — but it consumed 15,000 tokens of context in the process. Meanwhile, a well-indexed project would have returned the relevant file in under a second with minimal context cost.
How your AI tool finds code matters as much as what it finds. The difference between a tool that greps blindly and one that uses semantic understanding determines whether your context window survives long enough to actually implement the fix.
What You’ll Walk Away With
- A clear understanding of how each tool indexes and searches your codebase
- Strategies for optimizing search results in each tool
- Prompts that guide efficient code discovery without wasting context
- Configuration patterns for ignore files, indexing settings, and search scoping
How Each Tool Searches
The three tools use fundamentally different approaches to code discovery, and understanding these differences helps you work with them more effectively.
Cursor uses semantic search backed by AI-generated embeddings. This is the most sophisticated approach:
- When you open a workspace, Cursor scans your files and breaks them into meaningful chunks (functions, classes, logical blocks)
- Each chunk is converted into a vector embedding that captures its semantic meaning
- Embeddings are stored in a vector database optimized for fast similarity search
- When you or the agent searches, the query is also converted to a vector and compared against stored embeddings
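The four steps above can be sketched as a toy pipeline that shows the mechanics. This is not Cursor's implementation, and several parts are simplifying assumptions:

- Chunks are split at top-level `function` declarations.
- Character-trigram counts replace a learned embedding model.
- Vectors live in a Python list instead of a vector database.

The names `toy_embed`, `chunk_by_function`, and `ToyIndex` are hypothetical. Trigram vectors only capture spelling overlap, so they cannot reproduce the meaning-level matches a real embedding model provides.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: character trigrams of each lowercase word."""
    vec: Counter = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        padded = f"#{word}#"  # mark word boundaries
        for i in range(len(padded) - 2):
            vec[padded[i : i + 3]] += 1
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk_by_function(source: str) -> list[str]:
    """Hypothetical chunking rule: split before each top-level function declaration."""
    parts = re.split(r"(?m)^(?=(?:export\s+)?function\s)", source)
    return [part.strip() for part in parts if part.strip()]


class ToyIndex:
    def __init__(self) -> None:
        # Each entry: (file path, chunk text, chunk vector).
        self.entries: list[tuple[str, str, Counter]] = []

    def add_file(self, path: str, source: str) -> None:
        """Chunk a file, embed each chunk, and store the vectors."""
        for chunk in chunk_by_function(source):
            self.entries.append((path, chunk, toy_embed(chunk)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        """Embed the query the same way and rank stored chunks by similarity."""
        q = toy_embed(query)
        scored = [(cosine(q, vec), path) for path, _chunk, vec in self.entries]
        return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    index = ToyIndex()
    index.add_file("src/auth.ts", "export function verifySessionToken(token: string) {\n  return checkSignature(token);\n}\n")
    index.add_file("src/date-formatting.ts", "export function formatDate(d: Date) {\n  return d.toISOString();\n}\n")
    print(index.search("verify the session token")[0][1])  # prints src/auth.ts
```

The query and the chunks go through the same `toy_embed` function. Comparing vectors produced the same way is what makes the similarity scores meaningful.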
What this means in practice: You can search by meaning, not just text. Asking “where is authentication handled” finds auth.ts, session-manager.ts, and login-handler.ts even though none of them contain the word “authentication” in their filenames.
Cursor also uses traditional grep alongside semantic search. The agent decides which approach to use based on the query — exact pattern matches use grep, conceptual queries use semantic search.
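Cursor's agent makes this choice itself, so there is nothing to configure. As a rough mental model only, a hand-written router might look like the sketch below. The `LITERAL_SIGNALS` patterns and the `choose_search` function are hypothetical and do not describe how the agent actually decides.

```python
import re

# Hypothetical signals that a query is a literal pattern rather than a concept.
LITERAL_SIGNALS = [
    re.compile(r"[a-z][A-Z]"),          # camelCase or PascalCase identifier
    re.compile(r"\w_\w"),               # snake_case identifier
    re.compile(r"\w\.\w"),              # dotted name: file.ts, obj.method
    re.compile(r"[(){}\[\]\\^$|*]"),    # code or regex punctuation
    re.compile(r"^([\"'`]).+\1$"),      # the whole query is quoted
]


def choose_search(query: str) -> str:
    """Return "grep" for literal-looking queries and "semantic" for conceptual ones."""
    q = query.strip()
    if any(signal.search(q) for signal in LITERAL_SIGNALS):
        return "grep"
    return "semantic"


if __name__ == "__main__":
    for q in ["where is authentication handled", "handleStripeWebhook", "payment_intent.succeeded"]:
        print(f"{q!r} -> {choose_search(q)}")
```

Queries shaped like identifiers, dotted names, or quoted strings go to grep. Plain-language questions fall through to semantic search.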
Key configuration:
- Check indexing status in Cursor Settings > Indexing & Docs
- Semantic search becomes available at 80% indexing completion
- The index auto-updates every 5 minutes, processing only changed files
- Use .cursorignore to exclude files from indexing
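The "only processing changed files" behavior can be pictured as a diff of content hashes between update cycles. The sketch below assumes a SHA-256 hash per file and in-memory dictionaries, and `plan_reindex` is a hypothetical name. Treat it as an illustration of the idea, not a description of how Cursor detects changes.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(previous: dict[str, str], files: dict[str, str]):
    """Compare stored hashes with current contents.

    previous: path -> hash recorded at the last indexing pass
    files:    path -> current file contents
    Returns (changes, new_hashes). Only "added" and "modified" files need
    re-embedding; "deleted" entries should be dropped from the index.
    """
    current = {path: content_hash(text) for path, text in files.items()}
    changes = {
        "added": sorted(p for p in current if p not in previous),
        "modified": sorted(p for p in current if p in previous and previous[p] != current[p]),
        "deleted": sorted(p for p in previous if p not in current),
    }
    return changes, current


if __name__ == "__main__":
    v1 = {"src/auth.ts": "export {}", "src/old.ts": "export {}"}
    _, hashes = plan_reindex({}, v1)
    v2 = {"src/auth.ts": "export const x = 1;", "src/new.ts": "export {}"}
    changes, hashes = plan_reindex(hashes, v2)
    print(changes)
```

Files whose hash is unchanged never appear in the plan. That is what keeps a periodic update cheap on a large repository.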
Claude Code uses file tools for code discovery: Read, Grep, Glob, and Bash. There is no pre-built semantic index — Claude navigates your codebase in real-time using these tools.
- Grep: Pattern-based text search across files. Fast for exact matches
- Glob: Find files by name pattern (e.g., **/*.test.ts)
- Read: Read specific files or portions of files
- Bash: Run commands like find, ag, or rg for more complex searches
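To make the Glob and Grep steps concrete, here is a small Python approximation of the kind of results the two tools return. It is not Claude Code's implementation. The helper names and file names are hypothetical, and the demo runs in a temporary directory.

```python
import re
import tempfile
from pathlib import Path


def glob_files(root: Path, pattern: str) -> list[Path]:
    """Glob-style lookup: file paths matching a name pattern such as "**/*.test.ts"."""
    return sorted(p for p in root.glob(pattern) if p.is_file())


def grep_files(files: list[Path], regex: str) -> list[tuple[Path, int, str]]:
    """Grep-style search: (path, 1-based line number, line) for every matching line."""
    compiled = re.compile(regex)
    hits = []
    for path in files:
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if compiled.search(line):
                hits.append((path, lineno, line.strip()))
    return hits


def demo() -> list[tuple[str, int, str]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "src" / "payment-webhook-handler.ts").write_text(
            "export function handleWebhook(event: unknown) {}\n", encoding="utf-8"
        )
        (root / "src" / "payment-webhook-handler.test.ts").write_text(
            "test('rejects a bad signature', () => {\n  handleWebhook(badEvent);\n});\n",
            encoding="utf-8",
        )
        tests = glob_files(root, "**/*.test.ts")         # narrow by name first
        hits = grep_files(tests, r"\bhandleWebhook\(")   # then search contents
        return [(p.name, n, line) for p, n, line in hits]


if __name__ == "__main__":
    print(demo())
```

Narrowing with a name pattern before searching contents is the same habit that keeps the agent's searches cheap. Fewer files are scanned, and there are fewer matches to read.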
What this means in practice: Claude’s search is precise and deterministic, but it relies on knowing where to look. Descriptive filenames and clear project structure dramatically improve Claude’s ability to find relevant code quickly.
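Because Claude leans on filenames, it can help to audit a project for names that reveal nothing about their contents. The sketch below is minimal and assumes a hypothetical `GENERIC_STEMS` list that you would tune for your own project.

```python
from pathlib import PurePosixPath

# Hypothetical list of file stems that say nothing about what the file does.
GENERIC_STEMS = {"util", "utils", "helper", "helpers", "service", "handler", "common", "misc", "stuff"}


def flag_generic_names(paths: list[str]) -> list[str]:
    """Return the paths whose base name (before the first dot) is generic."""
    flagged = []
    for raw in paths:
        stem = PurePosixPath(raw).name.split(".", 1)[0].lower()
        if stem in GENERIC_STEMS:
            flagged.append(raw)
    return flagged


if __name__ == "__main__":
    print(flag_generic_names([
        "src/utils.ts",
        "src/payment-webhook-handler.ts",
        "src/helpers.test.ts",
    ]))
```

The input list could come from `git ls-files`. Paths are treated as POSIX-style strings.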
Claude Code also supports subagents for investigation. When you need broad codebase exploration, a subagent can search in its own context window and report back a summary, keeping your main session clean:
```
Use a subagent to investigate where payment webhooks are handled.
Report back the key files and the flow.
```
Codex gathers context through file contents, tool output, and its ongoing record of actions. In the IDE extension, open files automatically become context. In the App and CLI, Codex reads files as needed during its work loop.
Codex can use MCP servers for enhanced search capabilities. For example, a JetBrains MCP server can provide precise symbol navigation and code intelligence.
What this means in practice: Codex is efficient at gathering what it needs as it works, but benefits from explicit file references in your prompts. Point it at the right files instead of letting it explore broadly.
Context management is automatic — Codex monitors remaining space and compacts when needed. For large codebases, break work into focused threads that each deal with a specific area of the code.
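Compaction happens inside Codex and needs no setup. As a mental model only, the loop resembles the toy below: estimate how much of the window is in use, and once a threshold is crossed, replace older material with a summary. Several parts are assumptions, not Codex's actual values or behavior:

- the 4-characters-per-token estimate
- the 80% trigger
- the placeholder summary
- the `ContextWindow` class itself (hypothetical)

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token."""
    return max(1, len(text) // 4)


@dataclass
class ContextWindow:
    limit: int                 # hypothetical token budget for the session
    compact_at: float = 0.8    # hypothetical trigger: compact past 80% usage
    items: list[str] = field(default_factory=list)

    def used(self) -> int:
        return sum(estimate_tokens(item) for item in self.items)

    def add(self, item: str) -> None:
        self.items.append(item)
        if self.used() > self.limit * self.compact_at:
            self.compact()

    def compact(self) -> None:
        """Collapse everything except the newest item into one summary entry.

        A real agent would have the model write the summary; this placeholder
        only records how many items were collapsed.
        """
        if len(self.items) < 2:
            return
        older = self.items[:-1]
        self.items = [f"[summary of {len(older)} earlier item(s)]", self.items[-1]]


if __name__ == "__main__":
    window = ContextWindow(limit=100)
    window.add("a" * 200)   # about 50 tokens: under the trigger
    window.add("b" * 200)   # about 100 tokens: over 80, so it compacts
    print(window.items[0], window.used())
```

Splitting work into focused threads helps for the same reason the toy compacts. Each thread starts with a nearly empty window instead of carrying every earlier file it read.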
Optimizing for Search
Name Things for Discoverability
The single most impactful thing you can do for search quality across all tools: use descriptive, consistent naming.
```
# Bad: The AI has to read each file to know what's inside
src/
  utils.ts
  helpers.ts
  service.ts
  handler.ts

# Good: The AI can find what it needs from filenames alone
src/
  payment-webhook-handler.ts
  email-notification-service.ts
  password-validation.ts
  date-formatting.ts
```
Configure Ignore Files
Every tool respects .gitignore, but you should also configure tool-specific ignores for files that are tracked in git but irrelevant for AI work.
Create a .cursorignore file in your project root:
```
# Large data files
data/fixtures/*.json
scripts/migration-data/

# Generated code (read the source instead)
src/generated/
src/__generated__/

# Documentation builds
docs/build/

# Lock files (too large, too noisy)
pnpm-lock.yaml
```
Ignoring large content files improves both indexing speed and answer accuracy because the AI focuses on source code rather than noise.
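To decide what belongs in .cursorignore, it helps to see which files are largest. The sketch below is minimal and rests on a few assumptions:

- The 200 KB threshold and the skipped directories are arbitrary defaults you should adjust.
- It does not apply .gitignore rules.
- `ignore_candidates` is a hypothetical helper, and the demo files are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def ignore_candidates(root: Path, min_bytes: int = 200_000,
                      skip_dirs: tuple[str, ...] = (".git", "node_modules")) -> list[tuple[str, int]]:
    """List files of at least min_bytes under root, largest first.

    min_bytes and skip_dirs are arbitrary defaults; tune them per project.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]  # prune in place
        for name in filenames:
            path = Path(dirpath) / name
            size = path.stat().st_size
            if size >= min_bytes:
                found.append((path.relative_to(root).as_posix(), size))
    return sorted(found, key=lambda item: item[1], reverse=True)


def demo() -> list[tuple[str, int]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data" / "fixtures").mkdir(parents=True)
        (root / "data" / "fixtures" / "orders.json").write_text("x" * 300_000, encoding="utf-8")
        (root / "src").mkdir()
        (root / "src" / "payment-webhook-handler.ts").write_text("export {};\n", encoding="utf-8")
        return ignore_candidates(root)


if __name__ == "__main__":
    print(demo())
```

A hit such as the demo's data/fixtures/orders.json maps directly onto an ignore entry like data/fixtures/*.json.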
Claude Code respects .gitignore natively. For additional exclusions, add guidance to your CLAUDE.md:
```
# Files to ignore
- Do not read files in src/generated/ -- these are auto-generated
- Do not read *.lock files
- Focus on src/ for application code
- Focus on tests/ for test files
```
You can also use .claude/settings.json to configure permission boundaries that prevent Claude from reading certain paths.
Codex respects .gitignore. Add additional guidance in your AGENTS.md:
```
## Code Navigation
- Application source is in src/
- Tests are in tests/
- Ignore src/generated/ -- auto-generated, do not read or modify
- Ignore data/ -- large fixtures, not relevant for code tasks
```
Advanced Search Patterns
Tracing a Feature End-to-End
When you need to understand how a feature works across multiple layers (route -> controller -> service -> database), guide the AI to trace the path efficiently.
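Tracing a feature is largely a matter of following imports from an entry point. The sketch below does that for in-memory TypeScript sources, so you can see the route -> controller -> service -> database chain the AI is reconstructing. Its limits are assumptions to keep in mind:

- It only understands relative `from '...'` imports.
- It assumes a .ts extension.
- It ignores path aliases and index files.
- The file and function names are hypothetical.

```python
import posixpath
import re
from collections import deque

# Matches relative specifiers in "import ... from './x'" and "export ... from '../y'".
IMPORT_RE = re.compile(r"""from\s+['"](\.{1,2}/[^'"]+)['"]""")


def resolve(importer: str, spec: str) -> str:
    """Resolve a relative specifier against the importing file (assumes .ts files)."""
    joined = posixpath.join(posixpath.dirname(importer), spec)
    path = posixpath.normpath(joined)
    return path if path.endswith(".ts") else path + ".ts"


def trace(files: dict[str, str], entry: str) -> list[str]:
    """Breadth-first walk over relative imports, returning files in visit order."""
    order: list[str] = []
    seen = {entry}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        order.append(current)
        for spec in IMPORT_RE.findall(files.get(current, "")):
            target = resolve(current, spec)
            if target in files and target not in seen:
                seen.add(target)
                queue.append(target)
    return order


if __name__ == "__main__":
    project = {
        "src/routes/webhooks.ts": "import { handleWebhook } from '../controllers/webhook-controller';\n",
        "src/controllers/webhook-controller.ts": "import { recordPayment } from '../services/payment-service';\n",
        "src/services/payment-service.ts": "import { db } from '../db/client';\n",
        "src/db/client.ts": "export const db = {};\n",
    }
    print(" -> ".join(trace(project, "src/routes/webhooks.ts")))
```

Handing the AI a starting file, as `entry` does here, is what keeps a trace focused. Without it, the search has no anchor and tends to read broadly.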
Finding All Usages of a Pattern
Cursor’s semantic search can find conceptually similar code:
```
Find all places in the codebase where we handle errors from
external API calls. I want to see if we have a consistent pattern
or if each service handles errors differently.
```
Claude can combine Grep and Read for precise pattern discovery:
```
Search the codebase for all instances of try/catch blocks that
handle HTTP errors from external APIs. Use grep to find them,
then read the surrounding context for each. Group them by
pattern -- are they consistent or inconsistent?
```
Codex can be asked for the same survey directly:
```
Find all error handling patterns for external API calls in this
codebase. List each file and the pattern used. Identify if there
is a standard pattern or if they vary.
```
When This Breaks
Semantic search returns irrelevant results. This usually means the index includes too much noise (build artifacts, vendored code, large data files). Update your ignore files and let the index rebuild.
The AI reads too many files during search. Set explicit limits in your prompts: “Do not read more than 5 files.” In Claude Code, use subagents for broad searches to keep your main context clean.
Search finds the wrong version of a function. In codebases with multiple implementations of the same concept (e.g., v1 and v2 of an API), tell the AI which version to focus on. Reference the specific directory or file.
Indexing takes too long or fails. Large codebases (100K+ files) can be slow to index. In Cursor, make sure your .cursorignore excludes large directories. The index builds incrementally, and semantic search becomes available once indexing reaches 80% completion, so you can start working before it finishes.
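Before indexing a large repository, a quick count of files per top-level directory shows where the bulk lives and what .cursorignore might exclude. The sketch below assumes a git repository with git on the PATH, since it reads `git ls-files`. The counting is kept separate so it can be tried on any list of paths, and both function names are hypothetical.

```python
import subprocess
from collections import Counter
from pathlib import PurePosixPath


def tracked_files(repo: str = ".") -> list[str]:
    """List files tracked by git (requires git on PATH and a repository at repo)."""
    result = subprocess.run(
        ["git", "ls-files"], cwd=repo, capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def files_per_top_dir(paths: list[str]) -> list[tuple[str, int]]:
    """Count paths per top-level directory, largest first; root-level files count under "."."""
    counts: Counter = Counter()
    for raw in paths:
        parts = PurePosixPath(raw).parts
        counts[parts[0] if len(parts) > 1 else "."] += 1
    return counts.most_common()
```

For example, `files_per_top_dir(tracked_files())[:10]` lists the ten directories with the most tracked files. A directory full of fixtures or generated output near the top of that list is usually the first thing to exclude.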