Building Octo-router
January 8, 2026
Octo Router
A production-ready, open-source LLM router built in Go. Route requests across multiple LLM providers with intelligent load balancing, provider-specific configurations, and comprehensive input validation.
Features
- Multi-Provider Support: OpenAI, Anthropic (Claude), and Google Gemini
- Standardized Model Naming: Consistent provider/model format with built-in pricing metadata
- Intelligent Routing:
  - Cost-Based Routing: Automatically select the cheapest model based on tier constraints
  - Round-Robin: Distribute load evenly across providers
  - Tier-Based Selection: Control quality/cost trade-offs with tier constraints (budget, standard, premium, ultra-premium)
- Fallback Chain: Automatic failover to backup providers when primary provider fails
- Provider-Specific Defaults: Configure model and token limits per provider
- Input Validation: Comprehensive request validation with detailed error messages
- Token Estimation: Local token counting using tiktoken (no API calls)
- Cost Tracking: Automatic per-request cost calculation with Prometheus metrics
- Resilience: Circuit breakers, retries with exponential backoff, and error translation
- Dynamic Provider Management: Enable/disable providers without code changes
- Health Monitoring: Built-in health check endpoint
- Structured Logging: Production-ready logging with zap
- Streaming Support: Server-Sent Events for streaming responses
Supported Providers
All models use the standardized provider/model naming format.
| Provider | Status | Models |
|---|---|---|
| OpenAI | Supported | openai/gpt-5, openai/gpt-5.1, openai/gpt-4o, openai/gpt-4o-mini, openai/gpt-3.5-turbo |
| Anthropic | Supported | anthropic/claude-opus-4.5, anthropic/claude-sonnet-4, anthropic/claude-haiku-4.5, anthropic/claude-haiku-3 |
| Google Gemini | Supported | gemini/gemini-2.5-flash, gemini/gemini-2.5-flash-lite, gemini/gemini-2.0-pro |
See Model Standardization for complete pricing and model details.
Getting Started
Prerequisites
- Go 1.21 or higher
- API keys for desired providers
Installation
git clone https://github.com/oviecodes/octo-router.git
cd octo-router
go mod download
Configuration
Create a config.yaml file in the project root:
providers:
  - name: openai
    apiKey: ${OPENAI_API_KEY}
    enabled: true
  - name: anthropic
    apiKey: ${ANTHROPIC_API_KEY}
    enabled: true
  - name: gemini
    apiKey: ${GEMINI_API_KEY}
    enabled: true

models:
  defaults:
    openai:
      model: "openai/gpt-4o-mini"
      maxTokens: 4096
    anthropic:
      model: "anthropic/claude-sonnet-4"
      maxTokens: 4096
    gemini:
      model: "gemini/gemini-2.5-flash"
      maxTokens: 8192

routing:
  strategy: round-robin
  fallbacks:
    - anthropic
    - gemini
    - openai

resilience:
  timeout: 30000
  retries:
    maxAttempts: 3
    initialDelay: 1000
    maxDelay: 10000
    backoffMultiplier: 2
  circuitBreaker:
    failureThreshold: 5
    resetTimeout: 60000

cache:
  enabled: true
  ttl: 3600
Note: Use environment variables for API keys. The router supports ${VAR_NAME} syntax for environment variable substitution.
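For reference, here is a minimal sketch of how ${VAR_NAME} substitution can be done in Go with os.ExpandEnv; the router's actual config loader may differ:

package main

import (
    "fmt"
    "os"
)

func main() {
    raw, err := os.ReadFile("config.yaml")
    if err != nil {
        panic(err)
    }
    // os.ExpandEnv replaces ${VAR} (and $VAR) references with the values of
    // the corresponding environment variables.
    fmt.Println(os.ExpandEnv(string(raw)))
}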
Running the Server
go run main.go
The server starts on localhost:8000 by default.
API Reference
Health Check
Check the router's health and the number of enabled providers.
GET /health
Response:
{
  "status": "healthy",
  "providers": 3
}
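For a quick smoke test, a minimal Go client for this endpoint, assuming the router is running on localhost:8000 as described above:

package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    resp, err := http.Get("http://localhost:8000/health")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.Status, string(body))
}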
Chat Completions
Send messages to LLM providers via the router.
POST /v1/chat/completions
Request Body:
{
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "model": "openai/gpt-4o-mini",
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| messages | array | Yes | Array of message objects with role and content |
| model | string | No | Override default model for the selected provider |
| temperature | float | No | Sampling temperature (0-2) |
| max_tokens | integer | No | Maximum tokens to generate (1-100000) |
| top_p | float | No | Nucleus sampling (0-1) |
| frequency_penalty | float | No | Frequency penalty (-2 to 2) |
| presence_penalty | float | No | Presence penalty (-2 to 2) |
| stream | boolean | No | Enable streaming responses (Server-Sent Events) |
Message Roles:
- user: User messages
- assistant: Assistant responses (for conversation history)
- system: System instructions
Response:
{
  "message": "The capital of France is Paris.",
  "role": "assistant",
  "provider": "*providers.OpenAIProvider"
}
Error Response:
{
  "error": "Validation failed",
  "details": [
    {
      "field": "messages[0].role",
      "message": "Role must be one of: user assistant system"
    }
  ]
}
Admin Endpoints
Get All Providers Configuration
POST /admin/config
Response:
{
  "providers": [
    {
      "name": "openai",
      "apiKey": "sk-***",
      "enabled": true
    }
  ]
}
Get Enabled Providers
POST /admin/providers
Response:
{
  "enabled": [
    {
      "name": "openai",
      "apiKey": "sk-***",
      "enabled": true
    }
  ],
  "count": 3
}
Architecture
Project Structure
llm-router/
├── cmd/
│   └── internal/
│       ├── providers/       # Provider implementations
│       │   ├── provider.go
│       │   ├── openai.go
│       │   ├── anthropic.go
│       │   └── gemini.go
│       ├── router/          # Routing logic
│       │   └── router.go
│       └── server/          # HTTP handlers
│           ├── server.go
│           └── validation.go
├── config/                  # Configuration loading
│   └── config.go
├── types/                   # Shared types
│   ├── completion.go
│   ├── message.go
│   ├── provider.go
│   └── router.go
├── utils/                   # Utility functions
├── config.yaml              # Configuration file
└── main.go
How It Works
- Configuration Loading: At startup, the router loads provider configurations, model defaults, and the fallback chain from config.yaml
- Provider Initialization: Enabled providers are initialized with their respective API clients
- Model Validation: Model names are validated against the centralized model catalog with pricing metadata
- Request Handling: Incoming requests are validated, routed to a provider using the configured strategy, and responses are normalized
- Fallback Chain: If a provider fails, the router automatically tries fallback providers in the configured order
- Error Handling: Provider-specific errors are translated to domain errors with retry logic
- Resilience: Circuit breakers track provider health, retries handle transient failures
- Metrics: Prometheus metrics track requests, latency, costs, and circuit breaker state
Routing Strategies
Currently supported:
- Round-robin: Distributes requests evenly across enabled providers (see the sketch below)
- Cost-based routing: Picks the cheapest model for a request, subject to the tier constraint set in config.yaml
Planned:
- Latency-based routing
- Provider-specific routing rules
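To illustrate the round-robin strategy above, here is a hypothetical selector sketch; the router's actual implementation may differ:

package router

import "sync/atomic"

// RoundRobin cycles through a fixed list of enabled provider names.
type RoundRobin struct {
    counter   atomic.Uint64
    providers []string
}

// Next returns the next provider name so that requests are spread evenly.
func (r *RoundRobin) Next() string {
    n := r.counter.Add(1)
    return r.providers[(n-1)%uint64(len(r.providers))]
}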
Fallback Chain
The fallback chain provides automatic failover when the primary provider fails, ensuring high availability and reliability.
How It Works:
- Provider Selection: The router selects a primary provider using the configured routing strategy (e.g., round-robin)
- Fallback Chain Building: A fallback chain is built by combining the primary provider with the configured fallback providers from config.yaml
- Sequential Retry: If the primary provider fails, the router automatically tries each fallback provider in order (see the sketch after this list)
- Circuit Breaker Integration: Each provider's circuit breaker state is checked and updated during the fallback process
- Deduplication: The chain automatically prevents duplicate providers (e.g., if primary is already in fallback list)
- Immediate Success Return: The first successful provider response is returned immediately
- All-Failed Error: Only returns an error if all providers in the chain fail
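A minimal sketch of this sequential-retry loop; names are illustrative, and circuit-breaker updates and logging are omitted:

package router

import (
    "context"
    "fmt"
)

// Message is a stand-in for the router's message type.
type Message struct {
    Role    string
    Content string
}

// Provider mirrors the Complete method of the interface shown under
// "Adding a New Provider".
type Provider interface {
    Complete(ctx context.Context, messages []Message) (*Message, error)
}

// completeWithFallback tries each provider in the chain in order and returns
// the first successful response; it errors only when every provider fails.
func completeWithFallback(ctx context.Context, chain []Provider, msgs []Message) (*Message, error) {
    var lastErr error
    for _, p := range chain {
        resp, err := p.Complete(ctx, msgs)
        if err == nil {
            return resp, nil // first success is returned immediately
        }
        lastErr = err // record the failure, try the next provider
    }
    return nil, fmt.Errorf("all providers in the fallback chain failed: %w", lastErr)
}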
Configuration:
routing:
strategy: round-robin
fallbacks:
- anthropic # Try Anthropic first if primary fails
- gemini # Then try Gemini
- openai # Finally try OpenAI
Example Flow:
- Round-robin selects openai as primary
- Fallback chain built: [openai, anthropic, gemini]
- OpenAI fails → automatically retries with Anthropic
- Anthropic succeeds → returns response immediately
- Circuit breaker updated for both OpenAI (failure) and Anthropic (success)
Benefits:
- High Availability: Requests succeed even when providers are down
- Transparent Failover: Clients don't need to handle provider failures
- Cost Optimization: Configure cheaper providers as fallbacks
- Latency Management: Try faster providers before slower ones
- Comprehensive Logging: Detailed logs track each fallback attempt with provider names, error details, and remaining providers
Note: Fallback chain works for both streaming and non-streaming completions. Each provider in the chain respects the configured retry policy and circuit breaker settings.
Validation
The router performs comprehensive input validation:
- Message validation: Required fields, role validation, content length limits
- Parameter validation: Range checks for temperature, max_tokens, penalties
- Business logic validation: First message role requirements, total content size limits
- Detailed error messages: Clear, actionable error messages for clients
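As a hypothetical illustration of these checks (field names and limits taken from the parameters table above, not the router's actual code):

package server

import "fmt"

// ValidationError matches the shape of entries in the "details" array of the
// error response shown earlier.
type ValidationError struct {
    Field   string `json:"field"`
    Message string `json:"message"`
}

// validateParams applies the range checks from the parameters table.
func validateParams(temperature float64, maxTokens int) []ValidationError {
    var errs []ValidationError
    if temperature < 0 || temperature > 2 {
        errs = append(errs, ValidationError{Field: "temperature", Message: "Temperature must be between 0 and 2"})
    }
    if maxTokens < 1 || maxTokens > 100000 {
        errs = append(errs, ValidationError{Field: "max_tokens", Message: "max_tokens must be between 1 and 100000"})
    }
    return errs
}

// validateRole rejects any role outside user, assistant, and system.
func validateRole(index int, role string) *ValidationError {
    switch role {
    case "user", "assistant", "system":
        return nil
    default:
        return &ValidationError{
            Field:   fmt.Sprintf("messages[%d].role", index),
            Message: "Role must be one of: user assistant system",
        }
    }
}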
Token Counting
Token counting uses tiktoken for local estimation:
- No API calls required
- No rate limiting
- Fast and accurate for cost estimation
- Works across all providers (normalized to OpenAI's encoding)
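A minimal sketch of local token estimation, assuming the github.com/pkoukk/tiktoken-go port; the router's actual counting logic may differ:

package main

import (
    "fmt"

    tiktoken "github.com/pkoukk/tiktoken-go"
)

func main() {
    // cl100k_base is OpenAI's encoding, used here to normalize estimates
    // across providers.
    enc, err := tiktoken.GetEncoding("cl100k_base")
    if err != nil {
        panic(err)
    }
    tokens := enc.Encode("What is the capital of France?", nil, nil)
    fmt.Println("estimated tokens:", len(tokens))
}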
Development
Running Tests
go test ./...
Environment Variables
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-..."
export GEMINI_API_KEY="..."
export APP_ENV="development" # or "production"
Adding a New Provider
- Create provider implementation in cmd/internal/providers/
- Implement the Provider interface:

  type Provider interface {
      Complete(ctx context.Context, messages []types.Message) (*types.Message, error)
      CountTokens(ctx context.Context, messages []types.Message) (int, error)
  }

- Add provider case to ConfigureProviders() in provider.go
- Add default configuration to config.yaml
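As a starting point, a hypothetical provider skeleton satisfying the interface above; the import path is assumed from the repository URL, and a real implementation would call the provider's API:

package providers

import (
    "context"
    "errors"

    "github.com/oviecodes/octo-router/types"
)

// MyProvider is a hypothetical provider skeleton.
type MyProvider struct {
    apiKey string
}

func NewMyProvider(apiKey string) *MyProvider {
    return &MyProvider{apiKey: apiKey}
}

// Complete would call the upstream API and map its response into *types.Message.
func (p *MyProvider) Complete(ctx context.Context, messages []types.Message) (*types.Message, error) {
    return nil, errors.New("not implemented")
}

// CountTokens would return a local token estimate for the given messages.
func (p *MyProvider) CountTokens(ctx context.Context, messages []types.Message) (int, error) {
    return 0, errors.New("not implemented")
}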
Roadmap
- Streaming support (Server-Sent Events)
- Proper error handling for different error types
- Circuit breaker for provider failures
- Fallback chain for automatic provider failover
- Model standardization with pricing metadata
- Metrics and observability (Prometheus)
- Request/response caching (semantic caching planned)
- Cost tracking and reporting (cost calculation implemented)
- Rate limiting per provider
- Custom routing strategies
- Function/Tool calling support
- Multi-tenancy support
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
Acknowledgments
Built with: