
Search and Indexing Strategies

You ask the AI “where do we handle payment webhooks?” It spends 30 seconds reading through your entire src/ directory, finds six files that mention “payment,” reads all of them, and finally gives you an answer, but it consumes 15,000 tokens of context in the process. A well-indexed project would have surfaced the relevant file in under a second at minimal context cost.

How your AI tool finds code matters as much as what it finds. The difference between a tool that greps blindly and one that uses semantic understanding determines whether your context window survives long enough to actually implement the fix.

This section gives you:

  • A clear understanding of how each tool indexes and searches your codebase
  • Strategies for optimizing search results in each tool
  • Prompts that guide efficient code discovery without wasting context
  • Configuration patterns for ignore files, indexing settings, and search scoping

The three tools use fundamentally different approaches to code discovery, and understanding these differences helps you work with them more effectively.

Cursor uses semantic search backed by AI-generated embeddings. This is the most sophisticated approach:

  1. When you open a workspace, Cursor scans your files and breaks them into meaningful chunks (functions, classes, logical blocks)
  2. Each chunk is converted into a vector embedding that captures its semantic meaning
  3. Embeddings are stored in a vector database optimized for fast similarity search
  4. When you or the agent searches, the query is also converted to a vector and compared against stored embeddings
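
The four steps above can be sketched in a few lines. This is a toy illustration, not Cursor's implementation: the embed function is a bag-of-words stand-in for a learned embedding model, the "vector database" is a plain dict, and the chunk IDs and chunk text are hypothetical.

```python
import math
import re
from collections import Counter

# Toy stand-in for a learned embedding model: a bag-of-words count vector.
# Real semantic search uses dense vectors from a neural model; this only
# illustrates the chunk -> embed -> store -> compare flow.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Steps 1-2: hypothetical chunks, one entry per function-sized block.
chunks = {
    "auth.ts#verifyToken": "verify the session token and reject expired login",
    "date-formatting.ts#formatDate": "format a date as an ISO string",
    "payment-webhook-handler.ts#handle": "validate the payment webhook signature",
}

# Step 3: the "index" is just a dict of chunk id -> vector.
index = {chunk_id: embed(text) for chunk_id, text in chunks.items()}

def search(query: str, top_k: int = 2) -> list[tuple[str, float]]:
    # Step 4: embed the query and rank stored chunks by similarity.
    query_vec = embed(query)
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

results = search("where do we handle payment webhook events")
```

Here the top result is the payment webhook chunk, because it shares words with the query. A word-count vector can only match shared words; a learned embedding is what lets a real index match "authentication" to a chunk that only says "login".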

What this means in practice: You can search by meaning, not just text. Asking “where is authentication handled” finds auth.ts, session-manager.ts, and login-handler.ts even though none of them contain the word “authentication” in their filenames.

Cursor also uses traditional grep alongside semantic search. The agent decides which approach to use based on the query: exact pattern matches use grep, and conceptual queries use semantic search.

Key configuration:

  • Check indexing status in Cursor Settings > Indexing & Docs
  • Semantic search becomes available at 80% indexing completion
  • Index auto-updates every 5 minutes, only processing changed files
  • Use .cursorignore to exclude files from indexing
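
One way to picture "only processing changed files" is a content-hash comparison between sync passes. The sketch below is an assumption about how an incremental indexer might decide what to re-embed, not a description of Cursor's internals; plan_reindex and the in-memory file table are hypothetical names used for illustration.

```python
import hashlib

def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def plan_reindex(
    previous: dict[str, str], current_files: dict[str, bytes]
) -> tuple[list[str], list[str], dict[str, str]]:
    """Return (paths to re-embed, paths to drop from the index, new hash table)."""
    new_hashes = {path: content_hash(data) for path, data in current_files.items()}
    # A file needs work if it is new or its hash differs from the last pass.
    changed = sorted(p for p, h in new_hashes.items() if previous.get(p) != h)
    # A file that disappeared should have its embeddings removed.
    removed = sorted(p for p in previous if p not in new_hashes)
    return changed, removed, new_hashes

# First pass: everything is new, so everything gets indexed.
files = {
    "src/auth.ts": b"export const a = 1;",
    "src/date-formatting.ts": b"export const d = 2;",
}
changed, removed, hashes = plan_reindex({}, files)

# Second pass: one file edited, one deleted.
files["src/auth.ts"] = b"export const a = 2;"
del files["src/date-formatting.ts"]
changed2, removed2, hashes2 = plan_reindex(hashes, files)
```

On the second pass only src/auth.ts is scheduled for re-embedding. This is why a periodic sync stays cheap when most of the codebase is unchanged.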

The single most impactful thing you can do for search quality across all tools: use descriptive, consistent naming.

# Bad: The AI has to read each file to know what's inside
src/
  utils.ts
  helpers.ts
  service.ts
  handler.ts

# Good: The AI can find what it needs from filenames alone
src/
  payment-webhook-handler.ts
  email-notification-service.ts
  password-validation.ts
  date-formatting.ts
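
If you want to audit an existing project for catch-all filenames, a small script can flag them. This is a minimal sketch: the GENERIC_STEMS list and the generic_names helper are hypothetical, and you would tune the list to your own conventions.

```python
from pathlib import PurePosixPath

# Hypothetical list of catch-all names; adjust for your project.
GENERIC_STEMS = {
    "utils", "util", "helpers", "helper",
    "service", "handler", "common", "misc",
}

def generic_names(paths: list[str]) -> list[str]:
    """Return the paths whose filename (without extension) says nothing specific."""
    flagged = []
    for path in paths:
        stem = PurePosixPath(path).stem.lower()
        if stem in GENERIC_STEMS:
            flagged.append(path)
    return flagged

paths = [
    "src/utils.ts",
    "src/payment-webhook-handler.ts",
    "src/helpers.ts",
    "src/date-formatting.ts",
]
flagged = generic_names(paths)  # the two catch-all files
```

In a real project you could feed it the output of `git ls-files` instead of a hard-coded list.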

Every tool respects .gitignore, but you should also configure tool-specific ignores for files that are tracked in git but irrelevant for AI work.

Create a .cursorignore file in your project root:

# Large data files
data/fixtures/*.json
scripts/migration-data/

# Generated code (read the source instead)
src/generated/
src/__generated__/

# Documentation builds
docs/build/

# Lock files (too large, too noisy)
pnpm-lock.yaml

Ignoring large content files improves both indexing speed and answer accuracy because the AI focuses on source code rather than noise.
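
To decide what belongs in an ignore file, it helps to see which files are actually the largest. The following is a rough sketch: largest_files is a hypothetical helper, and the default skip_dirs set is an assumption about directories you already exclude.

```python
import os

def largest_files(
    root: str,
    top_n: int = 10,
    skip_dirs: frozenset[str] = frozenset({".git", "node_modules"}),
) -> list[tuple[str, int]]:
    """Return up to top_n (relative path, size in bytes) pairs, largest first."""
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.relpath(full, root), os.path.getsize(full)))
            except OSError:
                continue  # unreadable or vanished file
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top_n]

# Example usage from the project root:
#   for path, size in largest_files("."):
#       print(f"{size:>12,}  {path}")
```

Large fixtures, generated bundles, and lock files tend to surface at the top of this list. Those are the usual candidates for an ignore entry.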

When you need to understand how a feature works across multiple layers (route -> controller -> service -> database), guide the AI to trace the path efficiently.

Cursor’s semantic search can find conceptually similar code:

Find all places in the codebase where we handle errors from
external API calls. I want to see if we have a consistent pattern
or if each service handles errors differently.

Semantic search returns irrelevant results. This usually means the index includes too much noise (build artifacts, vendored code, large data files). Update your ignore files and let the index rebuild.

The AI reads too many files during search. Set explicit limits in your prompts: “Do not read more than 5 files.” In Claude Code, use subagents for broad searches to keep your main context clean.

Search finds the wrong version of a function. In codebases with multiple implementations of the same concept (e.g., v1 and v2 of an API), tell the AI which version to focus on. Reference the specific directory or file.

Indexing takes too long or fails. Large codebases (100K+ files) can be slow to index. In Cursor, ensure your .cursorignore excludes large directories. The index builds incrementally, so semantic search becomes usable once indexing reaches 80% completion; you do not have to wait for the full build.