Provider Routing
Alephant Provider Routing lets your application use one OpenAI-compatible API surface while Alephant routes requests across AI providers, models, and custom backends.
Routing in Alephant is more than provider forwarding: it is the decision layer that connects model selection, provider access, budget policy, fallback behavior, token accounting, and request observability.
Every routed request can be:
- resolved to the correct provider and model
- checked against workspace, key, user, agent, session, and model policy
- dispatched through the right provider adapter
- retried or failed over when configured
- normalized back into an OpenAI-style response
- recorded with provider, model, token, cost, latency, cache, retry, and trace metadata
Why Provider Routing Matters
Production AI systems rarely use only one model or one provider.
Teams may use OpenAI for general chat, Anthropic for reasoning, Gemini for long context, Bedrock for enterprise deployment, Ollama for local models, and custom endpoints for private inference. Without a gateway, every application has to manage provider-specific SDKs, model names, credentials, error formats, usage fields, retry behavior, and cost reporting.
Alephant gives teams one routing layer for all AI traffic.
Routing Surfaces
Alephant supports multiple routing surfaces depending on how much control you want at request time.
For most applications, start with /v1/* and provider-prefixed model names.
Model Naming
Alephant uses model names to resolve the target provider.
Provider-prefixed model IDs
Use provider/model_id when you want the request to go to a specific provider.
Examples:
- openai/gpt-4o-mini
- anthropic/claude-3-5-sonnet
- google/gemini-1.5-pro
- bedrock/anthropic.claude-3-5-sonnet
- ollama/llama3.1
Provider-prefixed model IDs are the clearest option for production traffic because they make routing intent explicit.
Bare model IDs
Alephant can also resolve a bare model ID when the model maps to exactly one known provider.
If the bare model is unique, Alephant expands it internally to the canonical provider/model form.
gpt-4o-mini -> openai/gpt-4o-mini
If the same model ID exists under multiple providers, Alephant returns a 400 Bad Request and asks you to specify the provider.
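The resolution rules above can be sketched as a small lookup. The catalog contents, function, and error names below are illustrative assumptions, not Alephant's real internals:

```python
# Hypothetical model catalog mapping bare model IDs to the providers
# that serve them; in Alephant this catalog is maintained by the gateway.
CATALOG = {
    "gpt-4o-mini": ["openai"],
    "claude-3-5-sonnet": ["anthropic"],
    "gpt-4o": ["openai", "azure"],  # offered by two providers: ambiguous
}

class AmbiguousModelError(ValueError):
    """Corresponds to the gateway's 400 Bad Request for ambiguous IDs."""

def resolve_model(model: str) -> str:
    """Expand a model ID into canonical provider/model form."""
    if "/" in model:
        return model  # provider-prefixed IDs are already explicit
    providers = CATALOG.get(model, [])
    if len(providers) == 1:
        return f"{providers[0]}/{model}"
    raise AmbiguousModelError(
        f"model {model!r} is unknown or ambiguous; use provider/{model}"
    )
```

Under these assumptions, `resolve_model("gpt-4o-mini")` expands to `openai/gpt-4o-mini`, while `resolve_model("gpt-4o")` raises because two providers offer it.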
Request Lifecycle
A routed request moves through the same gateway lifecycle:
- Receive request. Your application or agent sends an OpenAI-compatible request to Alephant.
- Authenticate key. Alephant validates the virtual key and loads workspace, user, agent, session, and key metadata.
- Resolve provider and model. Alephant resolves the provider from the request path, router ID, provider-prefixed model, bare model ID, or key-bound provider configuration.
- Check policy before dispatch. Alephant evaluates model allowlists, provider access, rate limits, budget rules, concurrency limits, and route-level policy before any upstream provider cost is created.
- Adapt and dispatch. Alephant maps the OpenAI-style request into the selected provider format, calls the upstream provider, and applies retry or fallback behavior when configured.
- Normalize response. Provider-specific responses, usage fields, errors, streaming events, and finish reasons are normalized into a consistent gateway response.
- Record cost and trace metadata. Alephant records provider, model, tokens, cost, latency, status code, cache status, retry count, fallback path, session, agent, user, and workspace metadata.
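The lifecycle above can be sketched as a minimal in-memory pipeline. Every name and schema here is an illustrative assumption, not Alephant's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Gateway:
    keys: dict       # virtual key -> workspace metadata (hypothetical shape)
    catalog: dict    # bare model ID -> provider
    ledger: list = field(default_factory=list)

    def handle(self, request: dict) -> dict:
        # Authenticate the virtual key and load its workspace metadata.
        workspace = self.keys[request["api_key"]]
        # Resolve provider and model; bare IDs expand via the catalog.
        model = request["model"]
        if "/" in model:
            provider, model_id = model.split("/", 1)
        else:
            provider, model_id = self.catalog[model], model
        # Check policy before any upstream provider cost is created.
        if model_id not in workspace["allowed_models"]:
            raise PermissionError(f"model {model_id!r} not allowed")
        # Dispatch is stubbed out here; a real gateway calls the provider,
        # then normalizes the result into an OpenAI-style response shape.
        response = {"model": f"{provider}/{model_id}", "choices": [], "usage": {}}
        # Record cost and trace metadata in the ledger.
        self.ledger.append({"workspace": workspace["name"], "model": response["model"]})
        return response
```

The key property to notice is ordering: the policy check runs before dispatch, so a rejected request never reaches the upstream provider.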
Policy-Aware Routing
Provider routing is connected to Alephant policy and budget control.
Before dispatching to a provider, Alephant can enforce:
- workspace-level provider access
- virtual-key scoped provider access
- model allowlists and denylists
- agent-level model rules
- member or team-level budgets
- per-session budget caps
- rate limits and concurrency controls
- route-specific fallback or downgrade rules
This means routing can answer both technical and financial questions:
- Can this key use this provider?
- Can this agent call this model?
- Is the workspace still within budget?
- Should this request be blocked, throttled, downgraded, or routed normally?
- Which provider/model decision created the final cost?
Fallback and Reliability
Provider routing also carries Alephant's reliability behavior.
When configured, Alephant can retry failed upstream calls or fall back to another model or provider. This is useful when a provider is unavailable, rate-limited, overloaded, or returning transient errors.
Example fallback chain:
openai/gpt-4o-mini -> anthropic/claude-3-5-haiku -> groq/llama-3.1-70b
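The chain above can be sketched as a simple loop. The `call` interface is a hypothetical stand-in for the upstream provider dispatch:

```python
# Fallback entries are tried in order; transient failures advance to the
# next provider/model in the chain.
FALLBACK_CHAIN = [
    "openai/gpt-4o-mini",
    "anthropic/claude-3-5-haiku",
    "groq/llama-3.1-70b",
]

def dispatch_with_fallback(call, chain=FALLBACK_CHAIN):
    errors = []
    for model in chain:
        try:
            response = call(model)
            response["served_by"] = model  # recorded for the dashboard
            return response
        except Exception as exc:  # unavailable, rate-limited, transient error
            errors.append((model, exc))
    raise RuntimeError(f"all fallbacks failed: {errors}")
```

Recording `served_by` alongside the requested model is what lets the dashboard show which model actually answered.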
Fallback decisions are recorded so the dashboard can show which model actually served the request and how the fallback affected latency, cost, and reliability.
Routing and Cost Attribution
Every route decision becomes part of the Alephant cost ledger.
For each request, Alephant can show:
- requested model
- resolved provider
- resolved model
- final provider used after retry or fallback
- input tokens
- output tokens
- total token cost
- cache hit or miss
- latency
- status code
- workspace
- virtual key
- user or team member
- agent
- session
- prompt or tool-call metadata
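One ledger entry might look like the sketch below. All field names and values are illustrative assumptions; Alephant's real record schema may differ:

```python
# Illustrative shape of a single cost-ledger entry.
ledger_entry = {
    "requested_model": "gpt-4o-mini",
    "resolved_model": "openai/gpt-4o-mini",
    "final_model": "openai/gpt-4o-mini",  # after any retry or fallback
    "input_tokens": 412,
    "output_tokens": 186,
    "cache": "miss",
    "latency_ms": 742,
    "status_code": 200,
    "workspace": "acme-prod",
    "agent": "support-bot",
}
```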
This makes provider routing visible to both engineering and finance teams.
Examples
OpenAI-compatible request
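A minimal chat completion request through the gateway, sketched with the Python standard library only. The base URL and virtual key are placeholders for your own deployment:

```python
import json
import urllib.request

# Hypothetical gateway URL and virtual key; substitute your own values.
ALEPHANT_BASE_URL = "https://alephant.example.com/v1"

payload = {
    "model": "gpt-4o-mini",  # bare ID; Alephant resolves the provider
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    f"{ALEPHANT_BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer vk-your-virtual-key",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; the response body follows
# the OpenAI chat completions format.
```

Because the surface is OpenAI-compatible, any OpenAI client SDK pointed at the gateway base URL should work the same way.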
Bare model auto-resolution
If gpt-4o-mini is unique in the model catalog, Alephant resolves it automatically.
Explicit provider route
Use explicit provider routes when you need deterministic provider selection.
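For deterministic selection, only the model value changes; the prompt text here is purely illustrative:

```python
# Provider-prefixed model ID pins the request to a single provider.
payload = {
    "model": "anthropic/claude-3-5-sonnet",  # explicit provider route
    "messages": [{"role": "user", "content": "Plan the migration steps."}],
}
```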
Recommended Usage
Use provider-prefixed model IDs (provider/model_id) for production workloads.
Use bare model IDs when you want a simpler developer experience and the model name is unambiguous.
Use configured routers when routing should be controlled centrally by workspace policy rather than hardcoded in application code.
Use direct provider passthrough only when you intentionally want explicit upstream behavior.
Common Questions
Is Alephant just a model proxy?
No. Alephant routes requests, but it also applies policy, budget controls, provider adaptation, retry and fallback behavior, cost attribution, and observability in the gateway path.
Can one application use multiple providers?
Yes. A single application can send OpenAI-compatible requests to Alephant and select different providers by changing the model value or by using configured routers.
What happens if the model is not allowed?
Alephant rejects the request before dispatching it to the provider. This prevents disallowed provider usage and avoids creating upstream cost.
What happens if the model name is ambiguous?
Alephant returns a 400 Bad Request and asks you to specify the provider, such as openai/gpt-4o or azure/gpt-4o.
Does routing affect observability?
Yes. Alephant records both the requested model and the resolved provider/model so teams can see exactly how each request was routed and how much it cost.