Understanding Large Codebases via CLI

It is your first week on a new team. The repository has 400,000 lines of code, seven years of git history, and the original architect left two years ago. The README has not been updated since 2023. You need to ship a feature by Friday, but you do not even know where the authentication logic lives.

This is exactly the problem Claude Code was built for. Instead of spending three days reading source files and tracing call chains by hand, you can systematically explore a codebase from the terminal — using sub-agents to investigate multiple areas in parallel without burning through your main session’s context window.

This guide covers:

  • A systematic exploration workflow that works on a codebase of any size
  • Prompts that extract architecture, data flow, and conventions in minutes
  • Sub-agent patterns that keep your main context clean while investigating deeply
  • Techniques for building a mental model of code you have never seen before

Effective codebase analysis follows a top-down pattern: start with the broad architecture, identify the subsystems, then drill into the specific area you need to modify.

  1. Start with a high-level overview

    Open Claude Code at the project root and ask for the big picture. Claude reads key files such as package.json, configuration files, and entry points, along with the directory structure, to build an architectural summary.

    Give me a high-level overview of this codebase. What does it do,
    what's the tech stack, and how is the code organized? Focus on
    the main entry points and the directory structure.

    Claude may read dozens of files to answer this, but it responds with a concise summary. Use that summary as your map for deeper exploration.

  2. Identify the key subsystems

    What are the main modules or subsystems in this project? For each one,
    tell me: what it does, where the code lives, and what other modules
    it depends on. Keep it brief -- one paragraph per module.

  3. Drill into the area you need to modify

    Now that you know the landscape, focus on the specific subsystem relevant to your task.

  4. Map the data flow

    Understanding how data moves through the system is often more valuable than understanding any single file.

    Trace how a "create order" request flows through the system.
    Start from the API endpoint, follow it through validation,
    business logic, database writes, and any events/notifications
    that get triggered. Show the data shape at each step.
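
A quick way to sanity-check the overview and subsystem list Claude gives you is to compare them against a few raw counts from git. The sketch below assumes the project is a git repository and that you run it from the repository root; the function names `files_per_top_dir` and `files_per_extension` are illustrative and not part of any tool.

```shell
# Sketch: raw numbers to compare against Claude's architectural summary.
# Assumes the current directory is the root of a git repository.
# Function names are illustrative.

# Count tracked files per top-level directory, largest first.
# Files that sit directly in the root are skipped.
files_per_top_dir() {
  git ls-files | awk -F/ 'NF > 1 { print $1 }' | sort | uniq -c | sort -rn
}

# Count tracked files per file extension, ten most common first.
# The directory part is stripped first so dots in folder names are ignored.
files_per_extension() {
  git ls-files | sed 's|.*/||' | awk -F. 'NF > 1 { print $NF }' \
    | sort | uniq -c | sort -rn | head -n 10
}
```

If Claude describes a directory as a minor utility folder but it holds a large share of the tracked files, that mismatch is worth a follow-up question.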

Here is the critical technique that separates effective codebase exploration from context-burning exploration: sub-agents.

When Claude reads files to answer your questions, the contents of every file go into your session's context window. In a large codebase, a single deep investigation can consume half your context. Sub-agents solve this by running in their own context windows and reporting back only a summary.

Use sub-agents to investigate these three areas in parallel:
1. How does the payment processing work? Trace from checkout to
payment confirmation, including error handling and retries.
2. What's the caching strategy? Find all caching layers (Redis,
in-memory, CDN) and document what's cached, TTLs, and
invalidation patterns.
3. How are background jobs handled? Find the job queue system,
list all job types, and document retry/failure handling.
For each investigation, report back: key files involved,
the main flow, and any potential issues you notice.

Each sub-agent explores independently, reads as many files as needed, and returns a focused summary. Your main context stays clean.
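
Sub-agents are started by Claude inside your session, so the prompt above needs no scripting. The same isolation idea can be approximated from the shell, though. The sketch below launches several headless `claude -p` runs in the background, each a separate process with its own context, and collects their summaries as files. This is a swapped-in technique, not the sub-agent mechanism itself; the `CLAUDE_BIN` variable, the function name, and the output file names are assumptions made for illustration.

```shell
# Sketch: run independent investigations as separate headless sessions.
# This approximates sub-agent isolation from the shell; it is NOT the
# in-session sub-agent feature. CLAUDE_BIN, the function name, and the
# output file names are illustrative assumptions.

CLAUDE_BIN="${CLAUDE_BIN:-claude}"

# Usage: investigate_in_parallel reports "prompt one" "prompt two" ...
# Writes one summary file per prompt into the given directory.
investigate_in_parallel() {
  local out_dir="$1" prompt i=0
  shift
  mkdir -p "$out_dir"
  for prompt in "$@"; do
    i=$((i + 1))
    # Each run is its own process, so its file reads never touch
    # the context of your interactive session.
    "$CLAUDE_BIN" -p "$prompt" < /dev/null > "$out_dir/investigation-$i.md" &
  done
  wait  # block until every background run has finished
}
```

You can then read the resulting files yourself, or point your interactive session at them, instead of at the source files they summarize.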

Every codebase has unwritten rules. Claude can identify them by analyzing patterns across files. Asking for these conventions is especially useful when you are about to write new code and want to match the existing style without waiting for a teammate to explain it.

The git log often tells you more about a codebase’s evolution than any documentation.

Look through the git history of src/auth/ and summarize how
the authentication system evolved. Focus on:
- Major refactors (what changed and why, based on commit messages)
- Recent changes in the last month
- Files that change frequently (likely hot spots)
- Contributors who know this code best
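
The "hot spots" and "contributors" parts of that prompt correspond to plain git queries, which are handy for verifying Claude's summary. A minimal sketch, assuming you are inside a git repository; the function names `hot_files` and `top_contributors` are illustrative.

```shell
# Sketch: the raw git queries behind "hot spots" and "who knows this code".
# Assumes the current directory is inside a git repository.
# Function names are illustrative.

# Rank files under a path by the number of commits that touched them.
# Usage: hot_files src/auth "1 year ago"
hot_files() {
  local target="$1" since="${2:-1 year ago}"
  git log --since="$since" --format= --name-only -- "$target" \
    | grep -v '^$' | sort | uniq -c | sort -rn | head -n 10
}

# Rank contributors by the number of commits touching a path.
# HEAD is passed explicitly so git shortlog does not wait on stdin.
# Usage: top_contributors src/auth
top_contributors() {
  git shortlog -sn HEAD -- "$1"
}
```

Commit counts are only a rough proxy for expertise, so treat the contributor list as a starting point for whom to ask.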

For understanding why a specific piece of confusing code exists:

Show me the git blame for src/middleware/rateLimit.ts and explain
why it's implemented this way. Look at the original commit and any
PRs that modified it. There's a comment saying "DO NOT CHANGE"
on line 47 -- find out why.
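
If you want to see the raw evidence behind Claude's explanation, git can show the commit that last touched a line and the full history of that line. A minimal sketch, assuming a git repository; the function names `explain_line` and `line_history` are illustrative.

```shell
# Sketch: find the commit behind one line, then read its full message.
# Assumes the current directory is inside a git repository.
# Function names are illustrative.

# Print hash, author, date, and full message of the commit that last
# changed a given line.
# Usage: explain_line src/middleware/rateLimit.ts 47
explain_line() {
  local file="$1" line="$2" commit
  # The first porcelain line starts with the full commit hash.
  commit=$(git blame -L "$line,$line" --porcelain -- "$file" \
    | head -n 1 | cut -d' ' -f1)
  git show --no-patch --format='%h %an %ad%n%s%n%n%b' "$commit"
}

# Show every commit that changed that line, with the diff for that line only.
# Usage: line_history src/middleware/rateLimit.ts 47
line_history() {
  git log -L "$2,$2:$1"
}
```

`git blame` reports only the most recent change, which may be a reformat or a rename; `line_history` walks further back to the commit that introduced the logic.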

When you want to explore a codebase without any risk of accidentally modifying files, use Plan Mode. Claude can read files and run read-only commands, but cannot edit anything.

claude --permission-mode plan

This is particularly useful during your first day on a new project. You can ask any question, trace any flow, and read any file — all with the guarantee that Claude will not change a single line of code.

I'm in Plan Mode. Walk me through the request lifecycle for
this Express application. Start from the server entry point,
follow a request through all middleware, and explain what each
middleware layer does. I want to understand the system before
making any changes.

After exploring a codebase, capture what you learned so you do not have to re-explore in future sessions.

  1. Generate an architecture document

    Based on everything you've learned about this codebase, create
    an ARCHITECTURE.md file that covers:
    - System overview and tech stack
    - Directory structure with explanations
    - Key data flows (request lifecycle, background job processing)
    - External dependencies and integrations
    - Development workflow (how to run, test, deploy)

  2. Update CLAUDE.md with your discoveries

    Update CLAUDE.md with the conventions and gotchas you discovered.
    Include the build/test commands, code style rules, and any
    "traps" where the code does something non-obvious. Keep it
    under 50 lines.

  3. Create a glossary of domain terms

    This codebase uses domain-specific terms I keep seeing: "fulfillment",
    "settlement", "reconciliation", "provider". Create a GLOSSARY.md
    that defines each domain term as used in THIS codebase, with
    references to where the concept is implemented.

For automated or recurring analysis, use headless mode to generate reports without an interactive session.

claude -p "Analyze the test coverage in this project. List all \
source files that have no corresponding test file. Group by \
directory and sort by most recently modified." \
--output-format json > coverage-gaps.json

This is useful for onboarding reports, tech debt audits, or periodic codebase health checks that run in CI.
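
A model-generated list of untested files can miss entries, so a deterministic cross-check is worth running alongside the report. The sketch below assumes a TypeScript project where each test sits next to its source file and is named `<name>.test.ts`; both the naming convention and the function name `untested_sources` are assumptions, so adjust them to the project's actual layout.

```shell
# Sketch: deterministic cross-check for a "files without tests" report.
# Assumption: sources are *.ts and tests are sibling files named *.test.ts.
# Run from the root of a git repository. The function name is illustrative.

# Print every tracked .ts source file that has no sibling .test.ts file.
untested_sources() {
  git ls-files -- '*.ts' | grep -v '\.test\.ts$' | while IFS= read -r src; do
    [ -e "${src%.ts}.test.ts" ] || printf '%s\n' "$src"
  done
}
```

Comparing this output with the files named in `coverage-gaps.json` shows quickly whether the headless report skipped anything.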

Claude gives a shallow overview that misses important details. Ask follow-up questions that force deeper reading: “Trace the actual function calls, not just the module names” or “Show me the specific database query, not just ‘it queries the database’.”

Context fills up during exploration. Use sub-agents for any investigation that might touch more than 5-10 files. Run /compact if your main session is getting heavy, or /clear if you are switching to a different area of the codebase.

Claude misidentifies the architecture. This happens with unconventional project structures. Correct it explicitly: “This is not a standard MVC app. The ‘handlers’ directory contains business logic, not HTTP handlers. Re-analyze with that understanding.”

Old documentation contradicts the code. Tell Claude to always trust the code over documentation: “When the README and the actual code disagree, the code is correct. Flag the documentation as outdated.”

Now that you can navigate any codebase, it is time to plan the feature you need to build.