Building Octo-router
January 8, 2026
Octo Router
A production-ready, open-source LLM router built in Go. Route requests across multiple LLM providers with intelligent load balancing, provider-specific configurations, and comprehensive input validation.
Features
- Multi-Provider Support: OpenAI, Anthropic (Claude), and Google Gemini
- Standardized Model Naming: Consistent provider/model format with built-in pricing metadata
- Intelligent Routing:
  - Cost-Based Routing: Automatically select the cheapest model based on tier constraints
  - Round-Robin: Distribute load evenly across providers
  - Tier-Based Selection: Control quality/cost trade-offs with tier constraints (budget, standard, premium, ultra-premium)
- Fallback Chain: Automatic failover to backup providers when primary provider fails
- Provider-Specific Defaults: Configure model and token limits per provider
- Input Validation: Comprehensive request validation with detailed error messages
- Token Estimation: Local token counting using tiktoken (no API calls)
- Cost Tracking: Automatic per-request cost calculation with Prometheus metrics
- Resilience: Circuit breakers, retries with exponential backoff, and error translation
- Dynamic Provider Management: Enable/disable providers without code changes
- Health Monitoring: Built-in health check endpoint
- Structured Logging: Production-ready logging with zap
- Streaming Support: Server-Sent Events for streaming responses
Supported Providers
All models use the standardized provider/model naming format.
| Provider | Status | Models |
|---|---|---|
| OpenAI | Supported | openai/gpt-5, openai/gpt-5.1, openai/gpt-4o, openai/gpt-4o-mini, openai/gpt-3.5-turbo |
| Anthropic | Supported | anthropic/claude-opus-4.5, anthropic/claude-sonnet-4, anthropic/claude-haiku-4.5, anthropic/claude-haiku-3 |
| Google Gemini | Supported | gemini/gemini-2.5-flash, gemini/gemini-2.5-flash-lite, gemini/gemini-2.0-pro |
See Model Standardization for complete pricing and model details.
Getting Started
Prerequisites
- Go 1.21 or higher
- API keys for desired providers
Installation
git clone https://github.com/oviecodes/octo-router.git
cd octo-router
go mod download
Configuration
Create a config.yaml file in the project root:
providers:
  - name: openai
    apiKey: ${OPENAI_API_KEY}
    enabled: true
  - name: anthropic
    apiKey: ${ANTHROPIC_API_KEY}
    enabled: true
  - name: gemini
    apiKey: ${GEMINI_API_KEY}
    enabled: true

models:
  defaults:
    openai:
      model: "openai/gpt-4o-mini"
      maxTokens: 4096
    anthropic:
      model: "anthropic/claude-sonnet-4"
      maxTokens: 4096
    gemini:
      model: "gemini/gemini-2.5-flash"
      maxTokens: 8192

routing:
  strategy: round-robin
  fallbacks:
    - anthropic
    - gemini
    - openai

resilience:
  timeout: 30000
  retries:
    maxAttempts: 3
    initialDelay: 1000
    maxDelay: 10000
    backoffMultiplier: 2
  circuitBreaker:
    failureThreshold: 5
    resetTimeout: 60000

cache:
  enabled: true
  ttl: 3600
Note: Use environment variables for API keys. The router supports ${VAR_NAME} syntax for environment variable substitution.
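For reference, here is a minimal sketch of how ${VAR_NAME} substitution can be done in Go with os.ExpandEnv; the router's actual config loader may differ:

package main

import (
    "fmt"
    "os"
)

func main() {
    raw, err := os.ReadFile("config.yaml")
    if err != nil {
        panic(err)
    }
    // os.ExpandEnv replaces ${VAR} (and $VAR) references with the values of
    // the corresponding environment variables.
    fmt.Println(os.ExpandEnv(string(raw)))
}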
Running the Server
go run main.go
The server starts on localhost:8000 by default.
API Reference
Health Check
Check the router's health and the number of enabled providers.
GET /health
Response:
{
  "status": "healthy",
  "providers": 3
}
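For a quick smoke test, a minimal Go client for this endpoint, assuming the router is running on localhost:8000 as described above:

package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    resp, err := http.Get("http://localhost:8000/health")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.Status, string(body))
}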
Chat Completions
Send messages to LLM providers via the router.
POST /v1/chat/completions
Request Body:
{
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "model": "openai/gpt-4o-mini",
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| messages | array | Yes | Array of message objects with role and content |
| model | string | No | Override default model for the selected provider |
| temperature | float | No | Sampling temperature (0-2) |
| max_tokens | integer | No | Maximum tokens to generate (1-100000) |
| top_p | float | No | Nucleus sampling (0-1) |
| frequency_penalty | float | No | Frequency penalty (-2 to 2) |
| presence_penalty | float | No | Presence penalty (-2 to 2) |
| stream | boolean | No | Enable streaming responses (Server-Sent Events) |
Message Roles:
- user: User messages
- assistant: Assistant responses (for conversation history)
- system: System instructions
Response:
{
  "message": "The capital of France is Paris.",
  "role": "assistant",
  "provider": "*providers.OpenAIProvider"
}
Error Response:
{
  "error": "Validation failed",
  "details": [
    {
      "field": "messages[0].role",
      "message": "Role must be one of: user assistant system"
    }
  ]
}
Admin Endpoints
Get All Providers Configuration
POST /admin/config
Response:
{
  "providers": [
    {
      "name": "openai",
      "apiKey": "sk-***",
      "enabled": true
    }
  ]
}
Get Enabled Providers
POST /admin/providers
Response:
{
  "enabled": [
    {
      "name": "openai",
      "apiKey": "sk-***",
      "enabled": true
    }
  ],
  "count": 3
}
Architecture
Project Structure
llm-router/
├── cmd/
│   └── internal/
│       ├── providers/       # Provider implementations
│       │   ├── provider.go
│       │   ├── openai.go
│       │   ├── anthropic.go
│       │   └── gemini.go
│       ├── router/          # Routing logic
│       │   └── router.go
│       └── server/          # HTTP handlers
│           ├── server.go
│           └── validation.go
├── config/                  # Configuration loading
│   └── config.go
├── types/                   # Shared types
│   ├── completion.go
│   ├── message.go
│   ├── provider.go
│   └── router.go
├── utils/                   # Utility functions
├── config.yaml              # Configuration file
└── main.go
How It Works
- Configuration Loading: At startup, the router loads provider configurations, model defaults, and the fallback chain from config.yaml
- Provider Initialization: Enabled providers are initialized with their respective API clients
- Model Validation: Model names are validated against the centralized model catalog with pricing metadata
- Request Handling: Incoming requests are validated, routed to a provider using the configured strategy, and responses are normalized
- Fallback Chain: If a provider fails, the router automatically tries fallback providers in the configured order
- Error Handling: Provider-specific errors are translated to domain errors with retry logic
- Resilience: Circuit breakers track provider health, retries handle transient failures
- Metrics: Prometheus metrics track requests, latency, costs, and circuit breaker state
Routing Strategies
Currently supported:
- Round-robin: Distributes requests evenly across enabled providers (see the sketch below)
- Cost-based routing: Picks the cheapest model for a request, subject to the tier constraint set in config.yaml
Planned:
- Latency-based routing
- Provider-specific routing rules
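To illustrate the round-robin strategy above, here is a hypothetical selector sketch; the router's actual implementation may differ:

package router

import "sync/atomic"

// RoundRobin cycles through a fixed list of enabled provider names.
type RoundRobin struct {
    counter   atomic.Uint64
    providers []string
}

// Next returns the next provider name so that requests are spread evenly.
func (r *RoundRobin) Next() string {
    n := r.counter.Add(1)
    return r.providers[(n-1)%uint64(len(r.providers))]
}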
Fallback Chain
The fallback chain provides automatic failover when the primary provider fails, ensuring high availability and reliability.
How It Works:
- Provider Selection: The router selects a primary provider using the configured routing strategy (e.g., round-robin)
- Fallback Chain Building: A fallback chain is built by combining the primary provider with the configured fallback providers from config.yaml
- Sequential Retry: If the primary provider fails, the router automatically tries each fallback provider in order (see the sketch after this list)
- Circuit Breaker Integration: Each provider's circuit breaker state is checked and updated during the fallback process
- Deduplication: The chain automatically prevents duplicate providers (e.g., if primary is already in fallback list)
- Immediate Success Return: The first successful provider response is returned immediately
- All-Failed Error: Only returns an error if all providers in the chain fail
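A minimal sketch of this sequential-retry loop; names are illustrative, and circuit-breaker updates and logging are omitted:

package router

import (
    "context"
    "fmt"
)

// Message is a stand-in for the router's message type.
type Message struct {
    Role    string
    Content string
}

// Provider mirrors the Complete method of the interface shown under
// "Adding a New Provider".
type Provider interface {
    Complete(ctx context.Context, messages []Message) (*Message, error)
}

// completeWithFallback tries each provider in the chain in order and returns
// the first successful response; it errors only when every provider fails.
func completeWithFallback(ctx context.Context, chain []Provider, msgs []Message) (*Message, error) {
    var lastErr error
    for _, p := range chain {
        resp, err := p.Complete(ctx, msgs)
        if err == nil {
            return resp, nil // first success is returned immediately
        }
        lastErr = err // record the failure, try the next provider
    }
    return nil, fmt.Errorf("all providers in the fallback chain failed: %w", lastErr)
}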
Configuration:
routing:
strategy: round-robin
fallbacks:
- anthropic # Try Anthropic first if primary fails
- gemini # Then try Gemini
- openai # Finally try OpenAI
Example Flow:
- Round-robin selects openai as primary
- Fallback chain built: [openai, anthropic, gemini]
- OpenAI fails → automatically retries with Anthropic
- Anthropic succeeds → returns response immediately
- Circuit breaker updated for both OpenAI (failure) and Anthropic (success)
Benefits:
- High Availability: Requests succeed even when providers are down
- Transparent Failover: Clients don't need to handle provider failures
- Cost Optimization: Configure cheaper providers as fallbacks
- Latency Management: Try faster providers before slower ones
- Comprehensive Logging: Detailed logs track each fallback attempt with provider names, error details, and remaining providers
Note: Fallback chain works for both streaming and non-streaming completions. Each provider in the chain respects the configured retry policy and circuit breaker settings.
Validation
The router performs comprehensive input validation:
- Message validation: Required fields, role validation, content length limits
- Parameter validation: Range checks for temperature, max_tokens, penalties
- Business logic validation: First message role requirements, total content size limits
- Detailed error messages: Clear, actionable error messages for clients
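As a hypothetical illustration of these checks (field names and limits taken from the parameters table above, not the router's actual code):

package server

import "fmt"

// ValidationError matches the shape of entries in the "details" array of the
// error response shown earlier.
type ValidationError struct {
    Field   string `json:"field"`
    Message string `json:"message"`
}

// validateParams applies the range checks from the parameters table.
func validateParams(temperature float64, maxTokens int) []ValidationError {
    var errs []ValidationError
    if temperature < 0 || temperature > 2 {
        errs = append(errs, ValidationError{Field: "temperature", Message: "Temperature must be between 0 and 2"})
    }
    if maxTokens < 1 || maxTokens > 100000 {
        errs = append(errs, ValidationError{Field: "max_tokens", Message: "max_tokens must be between 1 and 100000"})
    }
    return errs
}

// validateRole rejects any role outside user, assistant, and system.
func validateRole(index int, role string) *ValidationError {
    switch role {
    case "user", "assistant", "system":
        return nil
    default:
        return &ValidationError{
            Field:   fmt.Sprintf("messages[%d].role", index),
            Message: "Role must be one of: user assistant system",
        }
    }
}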
Token Counting
Token counting uses tiktoken for local estimation:
- No API calls required
- No rate limiting
- Fast and accurate for cost estimation
- Works across all providers (normalized to OpenAI's encoding)
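A minimal sketch of local token estimation, assuming the github.com/pkoukk/tiktoken-go port; the router's actual counting logic may differ:

package main

import (
    "fmt"

    tiktoken "github.com/pkoukk/tiktoken-go"
)

func main() {
    // cl100k_base is OpenAI's encoding, used here to normalize estimates
    // across providers.
    enc, err := tiktoken.GetEncoding("cl100k_base")
    if err != nil {
        panic(err)
    }
    tokens := enc.Encode("What is the capital of France?", nil, nil)
    fmt.Println("estimated tokens:", len(tokens))
}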
Development
Running Tests
go test ./...
Environment Variables
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-..."
export GEMINI_API_KEY="..."
export APP_ENV="development" # or "production"
Adding a New Provider
- Create provider implementation in cmd/internal/providers/
- Implement the Provider interface:

  type Provider interface {
      Complete(ctx context.Context, messages []types.Message) (*types.Message, error)
      CountTokens(ctx context.Context, messages []types.Message) (int, error)
  }

- Add provider case to ConfigureProviders() in provider.go
- Add default configuration to config.yaml
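As a starting point, a hypothetical provider skeleton satisfying the interface above; the import path is assumed from the repository URL, and a real implementation would call the provider's API:

package providers

import (
    "context"
    "errors"

    "github.com/oviecodes/octo-router/types"
)

// MyProvider is a hypothetical provider skeleton.
type MyProvider struct {
    apiKey string
}

func NewMyProvider(apiKey string) *MyProvider {
    return &MyProvider{apiKey: apiKey}
}

// Complete would call the upstream API and map its response into *types.Message.
func (p *MyProvider) Complete(ctx context.Context, messages []types.Message) (*types.Message, error) {
    return nil, errors.New("not implemented")
}

// CountTokens would return a local token estimate for the given messages.
func (p *MyProvider) CountTokens(ctx context.Context, messages []types.Message) (int, error) {
    return 0, errors.New("not implemented")
}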
Roadmap
- Streaming support (Server-Sent Events)
- Proper error handling for different error types
- Circuit breaker for provider failures
- Fallback chain for automatic provider failover
- Model standardization with pricing metadata
- Metrics and observability (Prometheus)
- Request/response caching (semantic caching planned)
- Cost tracking and reporting (cost calculation implemented)
- Rate limiting per provider
- Custom routing strategies
- Function/Tool calling support
- Multi-tenancy support
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
Acknowledgments
Built with: