Codebase Exploration & Analysis

Imagine joining a new team and facing a 100,000-line codebase with cryptic variable names, unclear architecture, and minimal documentation. Exploration that used to take weeks can often be done in hours with Claude Code. This lesson turns you into a codebase archaeologist, uncovering hidden patterns and understanding complex systems at terminal velocity.

The Art of Codebase Discovery

Scenario: You’ve just inherited a legacy e-commerce platform. The original team is gone, documentation is sparse, and you need to add a new payment provider by Friday. Where do you even begin?

Your First Questions

Launch Claude Code at the project root
```
cd legacy-ecommerce-platform
claude
```

Get the lay of the land

> What does this project do? Give me a high-level overview

Understand the architecture

> Explain the folder structure and main architecture patterns used

Find the entry points

> Where is the main entry point? How does the application start?

Map the critical paths

> Trace the flow of a user making a purchase from start to finish

Within minutes, Claude provides:

- Project purpose and main features
- Technology stack identification
- Architectural patterns (MVC, microservices, etc.)
- Key directories and their responsibilities
- Data flow diagrams
- External dependencies and integrations

Navigation Patterns for Large Codebases

The Breadth-First Exploration

Start broad, then drill down:

Initial Survey

> Give me a comprehensive overview of this codebase, including:
> - Main technologies and frameworks
> - Project structure and organization
> - Key business domains
> - External integrations
> - Testing approach

Claude analyzes the entire project and provides a structured report:

## Project Overview: Legacy E-Commerce Platform

### Technology Stack
- Backend: Node.js with Express 4.x
- Frontend: React 16.x with Redux
- Database: PostgreSQL 12, Redis for caching
- Payment: Stripe, PayPal integrations
- Infrastructure: Docker, AWS deployment

### Architecture
- Monolithic with service-oriented modules
- RESTful API with JWT authentication
- Server-side rendering for SEO
- Event-driven order processing

### Key Domains
1. **User Management** (`/src/auth`, `/src/users`)
2. **Product Catalog** (`/src/products`, `/src/inventory`)
3. **Order Processing** (`/src/orders`, `/src/payments`)
4. **Fulfillment** (`/src/shipping`, `/src/warehouses`)

### Critical Issues Found
- Outdated dependencies (security vulnerabilities)
- Inconsistent error handling
- Missing test coverage in payment module

Once you have the overview, dive into specific areas:

> Focus on the payment processing module. Show me:
> - All payment provider integrations
> - How payment data flows through the system
> - Security measures in place
> - Recent changes or commits related to payments

Claude provides detailed analysis:

- File-by-file breakdown of payment module
- Sequence diagrams for payment flow
- Security audit findings
- Integration points and dependencies

The Hunter Pattern

When searching for specific functionality:

> I need to find where discount codes are validated.
> Search for discount, coupon, promo code logic

Claude uses its agentic search capabilities to:

- Find all relevant files containing discount logic
- Trace the execution path
- Identify edge cases and business rules
- Show related database schemas
- Highlight potential issues

Understanding Code Dependencies

Dependency Mapping

Scenario: You need to upgrade a critical library, but you’re unsure what will break.

> Analyze all dependencies of the authentication module.
> Show me what depends on it and what it depends on

Claude generates a comprehensive dependency graph:

Direct Dependencies

- bcrypt: Password hashing
- jsonwebtoken: JWT generation
- passport: Authentication strategies
- redis: Session storage

Internal Dependents

- /api/routes/*: All API endpoints
- /middleware/auth.js: Auth middleware
- /services/user.js: User service
- /workers/session-cleanup.js: Background jobs

Database Dependencies

- users table: User data
- sessions table: Active sessions
- auth_logs table: Security audit
- permissions table: Role-based access

External Integrations

- OAuth providers (Google, GitHub)
- Email service for password resets
- SMS service for 2FA
- Audit logging service
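
The "what depends on it" half of that question is a reverse lookup over the import graph. A minimal sketch of that lookup, assuming a hand-written `imports` map (the module paths are illustrative, loosely based on the lists above, and are not read from a real project):

```javascript
// Hypothetical import graph: each key imports the modules in its array.
const imports = {
  'api/routes/orders.js': ['middleware/auth.js', 'services/user.js'],
  'middleware/auth.js': ['auth/index.js'],
  'services/user.js': ['auth/index.js'],
  'workers/session-cleanup.js': ['auth/index.js'],
  'auth/index.js': [],
};

// Return every module that depends on `target`, directly or transitively.
function findDependents(graph, target) {
  // Invert the graph: module -> modules that import it.
  const importedBy = new Map();
  for (const [mod, deps] of Object.entries(graph)) {
    for (const dep of deps) {
      if (!importedBy.has(dep)) importedBy.set(dep, []);
      importedBy.get(dep).push(mod);
    }
  }

  // Breadth-first walk along the inverted edges.
  const seen = new Set();
  const queue = [target];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const parent of importedBy.get(current) ?? []) {
      if (!seen.has(parent)) {
        seen.add(parent);
        queue.push(parent);
      }
    }
  }
  return [...seen].sort();
}

const dependents = findDependents(imports, 'auth/index.js');
console.log(dependents);
```

Here the route file shows up as a dependent even though it never imports the auth module directly, which is the kind of indirect breakage an upgrade can cause.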

Impact Analysis

Before making changes:

> If I change the user authentication to use OAuth only,
> what parts of the codebase would be affected?

Claude provides:

- List of affected files and functions
- Required database migrations
- API contract changes
- Frontend components needing updates
- Test suites requiring modification

Pattern Recognition and Anti-Patterns

Identifying Code Smells

> Analyze this codebase for common anti-patterns and code smells.
> Focus on maintainability issues

Claude’s analysis might reveal:

Anti-Patterns Found

// 1. God Object - OrderController doing too much
class OrderController {
  // 2000+ lines handling:
  // - Order creation
  // - Payment processing
  // - Inventory management
  // - Email notifications
  // - Analytics tracking
}

// 2. Copy-Paste Programming
// Found 15 instances of nearly identical error handling

// 3. Magic Numbers
if (order.total > 1000) { // What does 1000 represent?
  applyDiscount(0.1); // Why 10%?
}

// 4. Callback Hell
getUserData((user) => {
  getOrderHistory(user.id, (orders) => {
    calculateLoyaltyPoints(orders, (points) => {
      updateUserProfile(user.id, points, (result) => {
        // ... deeper nesting
      });
    });
  });
});

Refactoring Suggestions

// 1. Split God Object into focused services
class OrderService { /* Order logic */ }
class PaymentService { /* Payment logic */ }
class NotificationService { /* Notifications */ }

// 2. Extract common error handling
const handleError = createErrorHandler({
  log: true,
  notify: true,
  fallback: 'default'
});

// 3. Use named constants
const PREMIUM_ORDER_THRESHOLD = 1000;
const PREMIUM_DISCOUNT_RATE = 0.1;
if (order.total > PREMIUM_ORDER_THRESHOLD) {
  applyDiscount(PREMIUM_DISCOUNT_RATE);
}

// 4. Use async/await (requires promise-returning versions of these functions)
const user = await getUserData();
const orders = await getOrderHistory(user.id);
const points = await calculateLoyaltyPoints(orders);
const result = await updateUserProfile(user.id, points);
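
`createErrorHandler` in suggestion 2 is not defined in the snippet. A minimal sketch of what such a factory could look like, assuming the three options mean "log the error", "call a notifier", and "return a fallback value" (the `logger` and `notifier` parameters are hypothetical additions so the example can run on its own):

```javascript
// Hypothetical factory: returns one handler that replaces duplicated catch blocks.
function createErrorHandler({
  log = true,
  notify = false,
  fallback = null,
  logger = console.error, // assumption: injectable so it can be swapped out
  notifier = () => {},    // assumption: e.g. a paging or chat hook
} = {}) {
  return function handleError(error, context = 'unknown') {
    if (log) logger(`[${context}] ${error.message}`);
    if (notify) notifier(error, context);
    return fallback; // callers use this in place of the failed result
  };
}

// Usage: collect output in arrays so the behavior is easy to inspect.
const logged = [];
const notified = [];
const handleError = createErrorHandler({
  log: true,
  notify: true,
  fallback: 'default',
  logger: (line) => logged.push(line),
  notifier: (error, context) => notified.push(context),
});

function loadShippingRate() {
  try {
    throw new Error('rate service unavailable');
  } catch (error) {
    return handleError(error, 'shipping');
  }
}

const rate = loadShippingRate();
console.log(rate, logged, notified);
```

Each of the near-identical catch blocks then shrinks to a single `return handleError(error, context)` line.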

Performance Bottleneck Detection

Database Query Analysis

> Identify potential performance bottlenecks in database queries

Claude examines your codebase and finds:

// ISSUE 1: N+1 Query Problem
const orders = await db.query('SELECT * FROM orders WHERE user_id = ?', [userId]);
for (const order of orders) {
  // This runs a query for EACH order!
  order.items = await db.query('SELECT * FROM order_items WHERE order_id = ?', [order.id]);
}

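// CLAUDE'S SUGGESTION: Use a single join query
// (placeholder syntax depends on the driver; PostgreSQL drivers typically use $1)
const ordersWithItems = await db.query(`
  SELECT o.*,
         -- FILTER keeps orders without items from getting [null]
         COALESCE(
           json_agg(oi.*) FILTER (WHERE oi.order_id IS NOT NULL),
           '[]'::json
         ) AS items
  FROM orders o
  LEFT JOIN order_items oi ON o.id = oi.order_id
  WHERE o.user_id = ?
  GROUP BY o.id
`, [userId]);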

// ISSUE 2: Missing Indexes
// Found queries filtering on unindexed columns:
await db.query('SELECT * FROM products WHERE sku = ?', [sku]);   // 'sku' not indexed
await db.query('SELECT * FROM users WHERE email = ?', [email]); // 'email' not indexed
// Fix: add an index on each filtered column, e.g. CREATE INDEX ON products (sku);

// ISSUE 3: Loading Unnecessary Data
const users = await db.query('SELECT * FROM users'); // Loading all columns
// But only using: user.id, user.name, user.email
// Fix: SELECT id, name, email FROM users
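
If a JSON aggregate is not an option, the N+1 loop can also be replaced by two queries and an in-memory join. A minimal sketch of the grouping step, assuming the rows have already been fetched (one query for the orders, one `WHERE order_id IN (...)` query for the items); the `attachItems` name and the sample rows are made up for the example:

```javascript
// Attach items to their orders using one pass over each array.
function attachItems(orders, items) {
  const itemsByOrderId = new Map();
  for (const item of items) {
    if (!itemsByOrderId.has(item.order_id)) itemsByOrderId.set(item.order_id, []);
    itemsByOrderId.get(item.order_id).push(item);
  }
  // Orders without items get an empty array rather than undefined.
  return orders.map((order) => ({
    ...order,
    items: itemsByOrderId.get(order.id) ?? [],
  }));
}

// Made-up rows standing in for the two query results.
const orderRows = [{ id: 1, total: 40 }, { id: 2, total: 15 }];
const itemRows = [
  { order_id: 1, sku: 'A-100' },
  { order_id: 1, sku: 'B-200' },
];

const joined = attachItems(orderRows, itemRows);
console.log(JSON.stringify(joined));
```

This costs two round trips regardless of how many orders the user has, instead of one per order.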

Memory Leak Detection

> Check for potential memory leaks or inefficient memory usage

Claude identifies problematic patterns:

// ISSUE: Event listeners not cleaned up
class OrderMonitor {
  constructor() {
    // Listeners added but never removed
    eventBus.on('order.created', this.handleOrder);
    eventBus.on('order.updated', this.handleOrder);
  }
  // Missing cleanup method
}

// ISSUE: Unbounded cache growth
const cache = {};
function cacheResult(key, value) {
  cache[key] = value; // Cache grows forever!
}

// ISSUE: Large arrays kept in memory
let allOrdersEver = []; // Grows indefinitely
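
The first two patterns have small, well-known fixes: keep a reference to each listener so it can be removed, and cap the cache size. A minimal sketch, assuming an event bus with `on`/`off` methods (the `TinyBus` class is a stand-in written for this example, and `dispose`, `BoundedCache`, and the size limit are likewise hypothetical):

```javascript
// Stand-in event bus with the on/off shape assumed above.
class TinyBus {
  constructor() { this.listeners = new Map(); }
  on(event, fn) {
    if (!this.listeners.has(event)) this.listeners.set(event, new Set());
    this.listeners.get(event).add(fn);
  }
  off(event, fn) { this.listeners.get(event)?.delete(fn); }
  count(event) { return this.listeners.get(event)?.size ?? 0; }
}

class OrderMonitor {
  constructor(bus) {
    this.bus = bus;
    // Bind once and keep the reference so the same function can be removed later.
    this.handleOrder = this.handleOrder.bind(this);
    bus.on('order.created', this.handleOrder);
    bus.on('order.updated', this.handleOrder);
  }
  handleOrder(order) { this.lastOrderId = order.id; }
  dispose() {
    this.bus.off('order.created', this.handleOrder);
    this.bus.off('order.updated', this.handleOrder);
  }
}

// Cache that evicts the least recently used entry once it is full.
class BoundedCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest.
      this.map.delete(this.map.keys().next().value);
    }
  }
}

const bus = new TinyBus();
const monitor = new OrderMonitor(bus);
monitor.dispose();

const boundedCache = new BoundedCache(2);
boundedCache.set('a', 1);
boundedCache.set('b', 2);
boundedCache.get('a');
boundedCache.set('c', 3); // evicts 'b', the least recently used key
console.log(bus.count('order.created'), [...boundedCache.map.keys()]);
```

The third pattern (an array that only ever grows) usually needs a design change, such as paging or streaming, rather than a local fix.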

Cross-Language Analysis

Polyglot Navigation

Scenario: Your platform uses Node.js for the API, Python for data processing, and Go for real-time services.

> This project uses multiple languages. Give me an overview of:
> - Which language handles what functionality
> - How services communicate
> - Data flow between different language boundaries

Claude maps the polyglot architecture:

Architecture Overview:

Node.js API (./api/)
├── REST endpoints
├── GraphQL server
└── WebSocket handlers
    ↓
    Redis Queue / RabbitMQ
    ↓
Python Workers (./workers/)
├── Data pipeline
├── ML predictions
└── Report generation
    ↓
    PostgreSQL / S3
    ↓
Go Services (./services/)
├── Real-time notifications
├── WebRTC signaling
└── High-frequency trading

Communication:
- HTTP/REST between frontend and Node.js
- Message queues between Node.js and Python
- gRPC between Python and Go services
- Shared PostgreSQL for persistent data
- Redis for caching and pub/sub

Creating Living Documentation

Auto-Generated Documentation

> Create comprehensive documentation for the order processing module.
> Include architecture decisions, flow diagrams, and API documentation

Claude generates:

Architecture Doc

# Order Processing Module

## Overview
The order processing module handles the complete lifecycle of customer orders,
from cart to delivery.

## Architecture Decisions

### Event-Driven Design
**Decision**: Use event-driven architecture for order state changes
**Rationale**:
- Decouples order processing from other systems
- Enables real-time notifications
- Supports eventual consistency
- Allows for easy addition of new order events

### State Machine Pattern
**Decision**: Implement orders as a finite state machine
**States**: pending → confirmed → processing → shipped → delivered
**Benefits**:
- Clear state transitions
- Prevents invalid state changes
- Easier testing and debugging

## Component Diagram
```mermaid
graph TD
    A[Order API] --> B[Order Service]
    B --> C[State Machine]
    B --> D[Event Publisher]
    D --> E[Notification Service]
    D --> F[Inventory Service]
    D --> G[Payment Service]
    C --> H[(Order Database)]
```

## API Endpoints

- `POST /orders` - Create new order
- `GET /orders/:id` - Get order details
- `PATCH /orders/:id/status` - Update order status
- `POST /orders/:id/cancel` - Cancel order

Flow Diagram

```mermaid
sequenceDiagram
    participant C as Customer
    participant API as Order API
    participant OS as Order Service
    participant PS as Payment Service
    participant IS as Inventory Service
    participant NS as Notification Service

    C->>API: Create Order
    API->>OS: Process Order
    OS->>PS: Charge Payment
    PS-->>OS: Payment Result

    alt Payment Success
        OS->>IS: Reserve Inventory
        IS-->>OS: Inventory Reserved
        OS->>NS: Send Confirmation
        OS-->>API: Order Confirmed
        API-->>C: Success Response
    else Payment Failed
        OS->>NS: Send Failure Notice
        OS-->>API: Order Failed
        API-->>C: Error Response
    end
```
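
The state machine decision in the generated architecture doc can be made concrete with a transition table. A minimal sketch using the five states listed there; the strictly linear transitions and the `transition` helper are assumptions for the example, not the platform's actual rules:

```javascript
// Allowed transitions, based on the states listed in the architecture doc.
const TRANSITIONS = {
  pending: ['confirmed'],
  confirmed: ['processing'],
  processing: ['shipped'],
  shipped: ['delivered'],
  delivered: [],
};

// Return a new order in the next state, or throw on an invalid change.
function transition(order, nextStatus) {
  const allowed = TRANSITIONS[order.status] ?? [];
  if (!allowed.includes(nextStatus)) {
    throw new Error(`Invalid transition: ${order.status} -> ${nextStatus}`);
  }
  return { ...order, status: nextStatus };
}

let order = { id: 42, status: 'pending' };
order = transition(order, 'confirmed');
console.log(order.status);

let rejected = false;
try {
  transition(order, 'delivered'); // skipping states is not allowed
} catch (error) {
  rejected = true;
}
console.log(rejected);
```

Keeping the rules in one table is what delivers the "prevents invalid state changes" benefit: every status update goes through the same check.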

Continuous Documentation

Set up Claude to maintain documentation:

> Create a CLAUDE.md file that captures the essential knowledge
> about this codebase for future developers

Claude generates a comprehensive CLAUDE.md:

# E-Commerce Platform Knowledge Base

## Quick Start
- `npm install` - Install dependencies
- `docker-compose up` - Start all services
- `npm run dev` - Start development server
- `npm test` - Run test suite

## Architecture Overview
Monolithic Node.js application with:
- Express.js API
- React frontend
- PostgreSQL database
- Redis caching
- Docker deployment

## Key Business Logic

### Order Processing
Orders follow a state machine pattern:
- Customer adds items to cart
- Checkout validates inventory
- Payment processed via Stripe/PayPal
- Order confirmed and inventory reserved
- Fulfillment triggered
- Shipping updates via webhooks

### Pricing Engine
- Base prices in products table
- Discounts applied via rules engine
- Tax calculated based on shipping address
- Currency conversion via exchange rate service

## Common Tasks

### Adding a Payment Provider
1. Implement provider interface in `/src/payments/providers/`
2. Add configuration to `/config/payments.js`
3. Update PaymentService to include new provider
4. Add provider-specific webhook handler
5. Update frontend payment options

### Debugging Order Issues
1. Check order state in database
2. Review order_events table for history
3. Check payment provider dashboard
4. Review logs: `docker logs ecommerce-api`

## Known Issues
- Memory leak in report generation (workaround: restart worker daily)
- Race condition in inventory reservation (use database locks)
- Slow product search (needs Elasticsearch integration)

## Testing
- Unit tests: `npm run test:unit`
- Integration tests: `npm run test:integration`
- E2E tests: `npm run test:e2e`
- Load tests: `npm run test:load`

## Deployment
- Staging: Automatic on merge to develop
- Production: Manual approval required
- Rollback: `npm run deploy:rollback`

Security Vulnerability Scanning

Security Audit

> Perform a security audit of this codebase. Look for:
> - SQL injection vulnerabilities
> - XSS possibilities
> - Authentication bypasses
> - Sensitive data exposure
> - Outdated dependencies with known vulnerabilities

For each issue it finds, Claude provides:

- Exact location in code
- Severity assessment
- Proof of concept
- Remediation steps
- Prevention strategies
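
For the SQL injection item, the usual remediation is to pass user input as a bound parameter instead of concatenating it into the query text. A minimal sketch that only builds the query objects, assuming a driver that accepts a `{ text, values }` pair with `$1`-style placeholders (as node-postgres does); the function names are made up for the example:

```javascript
// Vulnerable: user input becomes part of the SQL text.
function unsafeUserLookup(email) {
  return {
    text: `SELECT id, name FROM users WHERE email = '${email}'`,
    values: [],
  };
}

// Safer: the SQL text is constant and the input travels separately.
function safeUserLookup(email) {
  return {
    text: 'SELECT id, name FROM users WHERE email = $1',
    values: [email],
  };
}

const payload = "x' OR '1'='1";
const unsafe = unsafeUserLookup(payload);
const safe = safeUserLookup(payload);

console.log(unsafe.text); // the payload has rewritten the WHERE clause
console.log(safe.text);   // unchanged regardless of input
```

With the bound form, the database treats the payload as a literal email value, so the extra `OR` condition never becomes part of the query.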

Best Practices for Codebase Analysis

1. Start with Questions, Not Assumptions

Instead of diving into code randomly:

> What are the most important parts of this codebase to understand first?

2. Use Progressive Disclosure

Begin broad, then narrow:

- Overall architecture
- Key subsystems
- Specific modules
- Individual functions

3. Verify Understanding

> Based on my analysis, here's how I think the payment flow works: [description].
> Is this correct? What am I missing?

4. Document as You Explore

> Create a diagram showing how these components interact

5. Look for Patterns

> What coding patterns and conventions does this team follow?

Multi-Repository Analysis

Cross-Repository Navigation

Scenario: Your company’s system spans 12 repositories.

# Add multiple directories to Claude's context
claude --add-dir ../user-service ../payment-service ../notification-service

Then:

> How do these three services communicate? Trace a user registration
> flow across all three repositories

Claude provides:

- Service interaction diagrams
- API contracts between services
- Shared data models
- Message queue flows
- Common libraries and dependencies

Related Lessons

- Refactoring Guide: Transform legacy code with confidence using Claude's refactoring capabilities
- Debugging Workflows: Master complex debugging scenarios with AI-powered investigation
- Documentation Generation: Create and maintain comprehensive documentation automatically

Next Steps

You’ve learned how to navigate and understand complex codebases with Claude Code. This skill is fundamental whether you’re joining a new team, inheriting a project, or trying to optimize existing systems.

Remember: Claude Code isn’t just a search tool. It’s your intelligent guide through the labyrinth of code, helping you understand not just what the code does, but why it was written that way and how it all fits together. Use these techniques to become productive in new codebases in hours instead of weeks.
