Search and Indexing Strategies

You ask the AI “where do we handle payment webhooks?” It spends 30 seconds reading through your entire src/ directory, finds six files that mention “payment,” reads all of them, and finally gives you an answer — but it consumed 15,000 tokens of context in the process. Meanwhile, a well-indexed project would have returned the relevant file in under a second with minimal context cost.

How your AI tool finds code matters as much as what it finds. The difference between a tool that greps blindly and one that uses semantic understanding determines whether your context window survives long enough to actually implement the fix.

What You’ll Walk Away With

A clear understanding of how each tool indexes and searches your codebase
Strategies for optimizing search results in each tool
Prompts that guide efficient code discovery without wasting context
Configuration patterns for ignore files, indexing settings, and search scoping

How Each Tool Searches

The three tools use fundamentally different approaches to code discovery, and understanding these differences helps you work with them more effectively.

Cursor uses semantic search backed by AI-generated embeddings. This is the most sophisticated approach:

When you open a workspace, Cursor scans your files and breaks them into meaningful chunks (functions, classes, logical blocks)
Each chunk is converted into a vector embedding that captures its semantic meaning
Embeddings are stored in a vector database optimized for fast similarity search
When you or the agent searches, the query is also converted to a vector and compared against stored embeddings
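The pipeline above can be sketched in a few lines of Python. This is a toy illustration, not Cursor's implementation: `embed` is a hypothetical stand-in that counts character trigrams, whereas a real index uses a learned embedding model and a vector database. The file summaries are made up for the example.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: character-trigram counts (stand-in for a learned model)."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical chunks: file name -> short summary of what the chunk does.
CHUNKS = {
    "auth.ts": "authenticate user credentials and issue a token",
    "session-manager.ts": "create and expire login sessions",
    "date-formatting.ts": "format dates as ISO strings",
}

# "Indexing": embed every chunk once and keep the vectors.
INDEX = {path: embed(text) for path, text in CHUNKS.items()}

def search(query: str, top_k: int = 2) -> list[str]:
    """Embed the query and rank chunks by similarity to it."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda path: cosine(q, INDEX[path]), reverse=True)
    return ranked[:top_k]

print(search("where is authentication handled", top_k=1))  # ['auth.ts']
```

Trigram overlap only rewards shared spelling. A learned model is what lets a real index match a query to code that uses entirely different words.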

What this means in practice: You can search by meaning, not just text. Asking “where is authentication handled” finds auth.ts, session-manager.ts, and login-handler.ts even though none of them contain the word “authentication” in their filenames.

Cursor also uses traditional grep alongside semantic search. The agent decides which approach to use based on the query — exact pattern matches use grep, conceptual queries use semantic search.

Key configuration:

Check indexing status in Cursor Settings > Indexing & Docs
Semantic search becomes available at 80% indexing completion
Index auto-updates every 5 minutes, only processing changed files
Use .cursorignore to exclude files from indexing
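Incremental updates of this kind are commonly built on content hashes. The sketch below shows the general idea only and makes no claim about Cursor's actual mechanism: `plan_reindex` and its arguments are hypothetical, and files are passed as an in-memory mapping to keep the example self-contained.

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a file's contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_reindex(previous: dict[str, str], files: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare stored hashes with current contents.

    previous: path -> hash recorded at the last indexing pass
    files:    path -> current file contents
    Returns (paths to re-embed, paths to drop from the index).
    """
    stale = sorted(p for p, text in files.items() if previous.get(p) != content_hash(text))
    removed = sorted(p for p in previous if p not in files)
    return stale, removed

old = {
    "a.ts": content_hash("export const a = 1"),
    "b.ts": content_hash("export const b = 2"),
}
now = {
    "a.ts": "export const a = 1",  # unchanged: skipped
    "b.ts": "export const b = 3",  # edited: re-embedded
    "c.ts": "export const c = 4",  # new: embedded for the first time
}
print(plan_reindex(old, now))  # (['b.ts', 'c.ts'], [])
```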

Claude Code uses file tools for code discovery: Read, Grep, Glob, and Bash. There is no pre-built semantic index; Claude navigates your codebase in real time using these tools.

Grep: Pattern-based text search across files. Fast for exact matches.
Glob: Find files by name pattern (e.g., **/*.test.ts)
Read: Read specific files or portions of files
Bash: Run commands like find, ag, or rg for more complex searches
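As a rough analogy for how Glob and Grep divide the work, the Python sketch below narrows by filename first and then searches only those files. It approximates the behavior described above and is not how Claude Code implements its tools; `glob_files` and `grep_files` are hypothetical helper names.

```python
import re
from pathlib import Path

def glob_files(root: Path, pattern: str) -> list[Path]:
    """Find files by name pattern, e.g. '**/*.test.ts'."""
    return sorted(p for p in root.glob(pattern) if p.is_file())

def grep_files(paths: list[Path], regex: str) -> list[tuple[str, int, str]]:
    """Return (file name, line number, stripped line) for every matching line."""
    pattern = re.compile(regex)
    hits = []
    for path in paths:
        lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        for number, line in enumerate(lines, start=1):
            if pattern.search(line):
                hits.append((path.name, number, line.strip()))
    return hits

# Usage (run from a project root):
#   tests = glob_files(Path("."), "**/*.test.ts")
#   for name, number, line in grep_files(tests, r"webhook"):
#       print(f"{name}:{number}: {line}")
```

Both steps are exact matches on names and text, which is why the quality of the result depends on having a good pattern to start from.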

What this means in practice: Grep and Glob return exact, repeatable matches, but Claude has to know which patterns and paths to try. Descriptive filenames and a clear project structure make it much easier for Claude to find relevant code quickly.

Claude Code also supports subagents for investigation. When you need broad codebase exploration, a subagent can search in its own context window and report back a summary, keeping your main session clean:

Use a subagent to investigate where payment webhooks are handled.
Report back the key files and the flow.

Optimizing for Search

Name Things for Discoverability

The single most impactful thing you can do for search quality across all tools: use descriptive, consistent naming.

# Bad: The AI has to read each file to know what's inside
src/
  utils.ts
  helpers.ts
  service.ts
  handler.ts

# Good: The AI can find what it needs from filenames alone
src/
  payment-webhook-handler.ts
  email-notification-service.ts
  password-validation.ts
  date-formatting.ts
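A quick way to audit an existing project is to flag files whose names say nothing about their contents. The sketch below assumes a hypothetical set of generic stems; adjust it to your own conventions.

```python
from pathlib import PurePosixPath

# Hypothetical list of stems that reveal nothing about a file's contents.
GENERIC_STEMS = {"utils", "helpers", "service", "handler", "common", "misc"}

def generic_filenames(paths: list[str]) -> list[str]:
    """Return the paths whose filename (without extension) is generic."""
    return [p for p in paths if PurePosixPath(p).stem.lower() in GENERIC_STEMS]

print(generic_filenames([
    "src/utils.ts",
    "src/payment-webhook-handler.ts",
    "src/handler.ts",
]))
# ['src/utils.ts', 'src/handler.ts']
```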

Configure Ignore Files

Every tool respects .gitignore, but you should also configure tool-specific ignores for files that are tracked in git but irrelevant for AI work.

Create a .cursorignore file in your project root:

# Large data files
data/fixtures/*.json
scripts/migration-data/

# Generated code (read the source instead)
src/generated/
src/__generated__/

# Documentation builds
docs/build/

# Lock files (too large, too noisy)
pnpm-lock.yaml

Ignoring large content files improves both indexing speed and answer accuracy because the AI focuses on source code rather than noise.
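To decide what belongs in the ignore file, it helps to see which files are largest. The sketch below lists the biggest files under a directory. `largest_files` is a hypothetical helper, and the skipped directory names are assumptions to adapt to your project.

```python
import os

# Directories assumed to be excluded already (for example via .gitignore).
SKIP_DIRS = {".git", "node_modules"}

def largest_files(root: str, limit: int = 10) -> list[tuple[int, str]]:
    """Return (size in bytes, relative path) for the largest files under root."""
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full)
            except OSError:
                continue  # broken symlink or unreadable file
            sizes.append((size, os.path.relpath(full, root)))
    return sorted(sizes, reverse=True)[:limit]

# Usage (run from a project root):
#   for size, path in largest_files("."):
#       print(f"{size:>12,}  {path}")
```

Large files near the top of the list that are data or generated output, rather than source, are the natural candidates for .cursorignore.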

Claude Code respects .gitignore natively. For additional exclusions, add guidance to your CLAUDE.md:

# Files to ignore
- Do not read files in src/generated/ -- these are auto-generated
- Do not read *.lock files
- Focus on src/ for application code
- Focus on tests/ for test files

You can also use .claude/settings.json to configure permission boundaries that prevent Claude from reading certain paths.
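As a sketch of the permission approach, the helper below adds `Read(...)` deny rules to a settings dictionary without disturbing other keys. It assumes the `permissions.deny` rule format described in Claude Code's settings documentation, so confirm the exact syntax against the current docs before relying on it. `add_read_deny_rules` is a hypothetical name, and the paths are examples.

```python
import json

def add_read_deny_rules(settings: dict, patterns: list[str]) -> dict:
    """Return a copy of settings with Read(...) deny rules added, without duplicates."""
    updated = json.loads(json.dumps(settings))  # simple deep copy of JSON data
    deny = updated.setdefault("permissions", {}).setdefault("deny", [])
    for pattern in patterns:
        rule = f"Read({pattern})"  # assumed rule format; verify against the docs
        if rule not in deny:
            deny.append(rule)
    return updated

settings = add_read_deny_rules({}, ["./src/generated/**", "./data/**"])
print(json.dumps(settings, indent=2))
```

The printed JSON is what you would place in, or merge into, .claude/settings.json.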

Codex respects .gitignore. Add additional guidance in your AGENTS.md:

## Code Navigation
- Application source is in src/
- Tests are in tests/
- Ignore src/generated/ -- auto-generated, do not read or modify
- Ignore data/ -- large fixtures, not relevant for code tasks

To guide efficient code discovery in any of the tools, use a prompt like this:

I need to find where [SPECIFIC BEHAVIOR] is implemented.

Search strategy:
1. Search for relevant function/class names (not just the concept)
2. Check import statements to trace the dependency chain
3. Read only the relevant functions, not entire files
4. Report: file path, function name, and a 2-3 sentence summary

Do not read more than 4 files. If you cannot find it in 4 files,
tell me what you searched for and I will narrow it down.
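Step 2 of the strategy, following imports, is mechanical enough to sketch. The regex below is a simplification that only catches single-line `import ... from '...'` statements with relative paths in TypeScript or JavaScript source. It is not a full parser, `relative_imports` is a hypothetical helper, and the module names in the sample are made up.

```python
import re

# Matches: import <anything> from "./relative/path" (single-line imports only).
IMPORT_RE = re.compile(
    r"""^\s*import\s.*?\sfrom\s+['"](\.{1,2}/[^'"]+)['"]""",
    re.MULTILINE,
)

def relative_imports(source: str) -> list[str]:
    """Return the relative module paths imported by a source file."""
    return IMPORT_RE.findall(source)

# Hypothetical handler source used only for illustration.
source = '''
import express from "express";
import { verifySignature } from "./stripe-signature";
import { recordEvent } from "../db/payment-events";
'''
print(relative_imports(source))
# ['./stripe-signature', '../db/payment-events']
```

Package imports such as "express" are skipped on purpose: the relative paths are the ones that lead to the next file in your own codebase.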

Advanced Search Patterns

Tracing a Feature End-to-End

When you need to understand how a feature works across multiple layers (route -> controller -> service -> database), guide the AI to trace the path efficiently.

Trace the [FEATURE] flow from entry point to database:

1. Find the route/endpoint that handles [USER ACTION]
2. Follow the handler to the service layer
3. Follow the service to the data access layer
4. Identify the database tables/queries involved

At each step, note the file, function, and any middleware/interceptors.
Read only the relevant functions at each layer -- not entire files.
Produce a summary diagram of the flow.

Finding All Usages of a Pattern

Cursor’s semantic search can find conceptually similar code:

Find all places in the codebase where we handle errors from
external API calls. I want to see if we have a consistent pattern
or if each service handles errors differently.

Claude can combine Grep and Read for precise pattern discovery:

Search the codebase for all instances of try/catch blocks that
handle HTTP errors from external APIs. Use grep to find them,
then read the surrounding context for each. Group them by
pattern -- are they consistent or inconsistent?

In Codex, ask for the same survey as a self-contained task:

Find all error handling patterns for external API calls in this
codebase. List each file and the pattern used. Identify if there
is a standard pattern or if they vary.

When This Breaks

Semantic search returns irrelevant results. This usually means the index includes too much noise (build artifacts, vendored code, large data files). Update your ignore files and let the index rebuild.

The AI reads too many files during search. Set explicit limits in your prompts: “Do not read more than 5 files.” In Claude Code, use subagents for broad searches to keep your main context clean.

Search finds the wrong version of a function. In codebases with multiple implementations of the same concept (e.g., v1 and v2 of an API), tell the AI which version to focus on. Reference the specific directory or file.

Indexing takes too long or fails. Large codebases (100K+ files) can be slow to index. In Cursor, ensure your .cursorignore excludes large directories. The index builds incrementally, so you can start working once it reaches 80% completion.

What’s Next

Memory Patterns: Keep search insights persistent across sessions with auto-memory and project rules.

File Organization: Structure your codebase so search results are relevant and minimal.

Cost per Context: Understand the cost of search-heavy sessions and how to optimize.