LLM Gateway Configuration

Your compliance team requires that all AI model traffic stays within your cloud perimeter. No direct calls to api.anthropic.com — everything must go through AWS Bedrock or Google Vertex AI so you can audit usage, enforce data residency, and consolidate billing. Claude Code supports this out of the box.

This guide covers:

  • Claude Code routing through AWS Bedrock with IAM authentication
  • Google Vertex AI configuration with workload identity
  • LiteLLM proxy setup for cost tracking and multi-model routing
  • Custom API gateway patterns for advanced enterprise requirements

AWS Bedrock

To route Claude Code through Bedrock, you need:

  • An AWS account with Amazon Bedrock enabled
  • Claude models enabled in your Bedrock region
  • IAM credentials with bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream permissions
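A minimal IAM policy granting those two actions might look like the following sketch; in production, scope the Resource down to the specific model ARNs you actually use rather than a wildcard:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.*"
    }
  ]
}
```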
```shell
# Set Bedrock as the API provider
export CLAUDE_CODE_USE_BEDROCK=1

# Standard AWS credential chain applies
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=your-access-key
export AWS_SECRET_ACCESS_KEY=your-secret-key

# Or use AWS SSO / profiles
export AWS_PROFILE=bedrock-profile

claude
```

For persistent configuration, add this to your Claude Code settings file (e.g., .claude/settings.json in your project):

```json
{
  "env": {
    "CLAUDE_CODE_USE_BEDROCK": "1",
    "AWS_REGION": "us-east-1",
    "AWS_PROFILE": "bedrock-profile"
  }
}
```

If you need models from multiple regions:

```shell
export ANTHROPIC_MODEL="us.anthropic.claude-sonnet-4-5-20250929-v1:0"
export AWS_REGION=us-east-1
```

The us. prefix routes to the US cross-region inference profile, which balances requests across available US regions.
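To see which Claude models a given region actually offers, you can list Anthropic models with the AWS CLI (assumes AWS CLI v2 with Bedrock support and valid credentials):

```
# List Anthropic model IDs available in a region
aws bedrock list-foundation-models \
  --region us-east-1 \
  --by-provider anthropic \
  --query 'modelSummaries[].modelId' \
  --output table
```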

Google Vertex AI

To route Claude Code through Vertex AI, you need:

  • A Google Cloud project with Vertex AI enabled
  • Claude models enabled in your region
  • A service account (or user credentials) with the Vertex AI User role
```shell
# Set Vertex AI as the API provider
export CLAUDE_CODE_USE_VERTEX=1

# Google Cloud configuration
export CLOUD_ML_REGION=us-east5
export ANTHROPIC_VERTEX_PROJECT_ID=your-project-id

# Authenticate
gcloud auth application-default login

claude
```
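As with Bedrock, this can be made persistent in your Claude Code settings (the region and project ID here are placeholders for your own values):

```json
{
  "env": {
    "CLAUDE_CODE_USE_VERTEX": "1",
    "CLOUD_ML_REGION": "us-east5",
    "ANTHROPIC_VERTEX_PROJECT_ID": "your-project-id"
  }
}
```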

LiteLLM Proxy

LiteLLM is an open-source proxy that sits between Claude Code and any LLM provider. It adds cost tracking, rate limiting, and key management.

  • Cost tracking by API key: See spend per developer, per team, per project
  • Rate limiting: Enforce per-user token limits
  • Multi-model routing: Route different requests to different providers
  • Audit logging: Full request/response logging for compliance
```shell
# Install LiteLLM with proxy extras (quote the brackets for zsh)
pip install 'litellm[proxy]'

# Start the proxy with a single Claude model
litellm --model claude-sonnet-4-5-20250929 --port 4000
```

Configure Claude Code to use the proxy:

```shell
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=your-litellm-key
claude
```
litellm_config.yaml:

```yaml
model_list:
  - model_name: claude-sonnet-4-5-20250929
    litellm_params:
      model: claude-sonnet-4-5-20250929
      api_key: sk-ant-your-key
  - model_name: claude-opus-4-6
    litellm_params:
      model: claude-opus-4-6
      api_key: sk-ant-your-key

general_settings:
  master_key: sk-litellm-master-key
  database_url: postgresql://user:pass@localhost/litellm
```
```shell
litellm --config litellm_config.yaml --port 4000
```
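With the master key and database configured, per-developer virtual keys (the basis for per-key cost tracking) can be minted through LiteLLM's key-generation endpoint. A sketch, assuming the proxy runs locally on port 4000 and the email in metadata is illustrative:

```
# Create a virtual key with a spend budget for one developer
curl -s http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-litellm-master-key" \
  -H "Content-Type: application/json" \
  -d '{
        "models": ["claude-sonnet-4-5-20250929"],
        "max_budget": 100,
        "metadata": {"user": "dev@company.com"}
      }'
```

The returned key is what each developer sets as ANTHROPIC_API_KEY, so spend rolls up per key in LiteLLM's dashboard.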

Custom API Gateways

For organizations with existing API gateways (Kong, Apigee, AWS API Gateway), you can route Claude Code through them:

```shell
# Point Claude Code at your custom gateway
export ANTHROPIC_BASE_URL=https://ai-gateway.company.com/v1
export ANTHROPIC_API_KEY=your-gateway-key
claude
```

Your gateway needs to proxy requests to https://api.anthropic.com/v1/ with the appropriate authentication headers.
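As one illustration, an nginx-based gateway could proxy the API like this. This is a hedged sketch: the $anthropic_key variable (the real key, injected server-side) and the location path are assumptions for your environment:

```nginx
# Hypothetical reverse-proxy block for an internal AI gateway
location /v1/ {
    proxy_pass https://api.anthropic.com/v1/;
    proxy_ssl_server_name on;
    proxy_set_header Host api.anthropic.com;
    # Inject the real key server-side; clients only hold gateway keys
    proxy_set_header x-api-key $anthropic_key;
    proxy_set_header anthropic-version "2023-06-01";
}
```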

Dynamic API Keys

For environments where API keys rotate or are generated dynamically, Claude Code supports a helper script:

```json
{
  "apiKeyHelper": "/opt/scripts/get-claude-key.sh"
}
```

The script must output the API key to stdout. It runs in /bin/sh and the result is sent as both X-Api-Key and Authorization: Bearer headers.

/opt/scripts/get-claude-key.sh:

```shell
#!/bin/bash
# Example: fetch the key from AWS Secrets Manager
aws secretsmanager get-secret-value \
  --secret-id claude-api-key \
  --query SecretString \
  --output text
```
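The same stdout contract works with any secret source. A hypothetical variant that prefers an injected environment variable and falls back to a local file (both variable names are illustrative, not part of Claude Code):

```shell
# get_claude_key: print the API key to stdout, as apiKeyHelper expects.
get_claude_key() {
  if [ -n "${CLAUDE_API_KEY:-}" ]; then
    # Key injected via environment (e.g. by a CI secret store)
    printf '%s\n' "$CLAUDE_API_KEY"
  else
    # Fall back to a local secrets file
    cat "${CLAUDE_KEY_FILE:-$HOME/.claude-key}"
  fi
}
```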

Troubleshooting

Bedrock returns “model not found”: Check that Claude models are enabled in your Bedrock region. Not all regions have all models. Use the cross-region inference prefix (us.) if your region does not have the specific model version.

Vertex AI authentication fails in CI: Workload identity federation must be configured correctly; the GitHub OIDC token must map to a service account with Vertex AI permissions. Run gcloud auth application-default print-access-token to verify that credentials resolve.
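The GitHub Actions side of that mapping typically uses the google-github-actions/auth action. A sketch, where the workload identity provider path, service account, and prompt are placeholders for your setup:

```yaml
# Excerpt from a GitHub Actions workflow using workload identity federation
permissions:
  id-token: write   # required to mint the GitHub OIDC token
  contents: read

jobs:
  claude:
    runs-on: ubuntu-latest
    steps:
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/123456/locations/global/workloadIdentityPools/github/providers/github
          service_account: claude-ci@your-project-id.iam.gserviceaccount.com
      - run: |
          export CLAUDE_CODE_USE_VERTEX=1
          export CLOUD_ML_REGION=us-east5
          export ANTHROPIC_VERTEX_PROJECT_ID=your-project-id
          claude -p "your CI task here"
```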

LiteLLM proxy adds latency: LiteLLM adds a hop. For latency-sensitive workflows, consider running it on the same machine or network as your developers. Typical overhead is 50-100ms per request.

Custom gateway strips headers: Some API gateways modify or strip headers that Anthropic’s API requires. Ensure your gateway passes through anthropic-version, content-type, and x-api-key headers without modification.
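To see which headers actually survive the hop, a verbose request through the gateway shows what is sent and returned. A diagnostic sketch, reusing the gateway URL and key from the example above:

```
# Inspect request/response headers through the gateway
curl -sv https://ai-gateway.company.com/v1/messages \
  -H "x-api-key: your-gateway-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-5-20250929","max_tokens":16,"messages":[{"role":"user","content":"ping"}]}' \
  2>&1 | grep -i -E '^(>|<) (x-api-key|anthropic-version|content-type)'
```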