Provider Routing

Alephant Provider Routing lets your application use one OpenAI-compatible API surface while Alephant routes requests across AI providers, models, and custom backends.

Routing in Alephant is not only provider forwarding. It is the decision layer that connects model selection, provider access, budget policy, fallback behavior, token accounting, and request observability.

Every routed request can be:

  • resolved to the correct provider and model
  • checked against workspace, key, user, agent, session, and model policy
  • dispatched through the right provider adapter
  • retried or failed over when configured
  • normalized back into an OpenAI-style response
  • recorded with provider, model, token, cost, latency, cache, retry, and trace metadata

Why Provider Routing Matters

Production AI systems rarely use only one model or one provider.

Teams may use OpenAI for general chat, Anthropic for reasoning, Gemini for long context, Bedrock for enterprise deployment, Ollama for local models, and custom endpoints for private inference. Without a gateway, every application has to manage provider-specific SDKs, model names, credentials, error formats, usage fields, retry behavior, and cost reporting.

Alephant gives teams one routing layer for all AI traffic.

Routing Surfaces

Alephant supports multiple routing surfaces depending on how much control you want at request time.

| Surface | Use Case |
| --- | --- |
| /v1/* | OpenAI-compatible gateway access for existing SDKs and agent clients |
| model="provider/model_id" | Explicitly select a provider and model from an OpenAI-style request |
| model="model_id" | Let Alephant resolve a bare model ID when it maps to one known provider |
| /router/{id}/* | Route through a configured policy router |
| /{provider}/* | Direct provider passthrough for explicit upstream control |

For most applications, start with /v1/* and provider-prefixed model names.

```
curl https://ai.alephant.io/v1/chat/completions \
  -H "Authorization: Bearer $ALEPHANT_VIRTUAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      { "role": "user", "content": "Write a short product summary." }
    ]
  }'
```

Model Naming

Alephant uses model names to resolve the target provider.

Provider-prefixed model IDs

Use provider/model_id when you want the request to go to a specific provider.

```
{
  "model": "openai/gpt-4o-mini"
}
```

Examples:

openai/gpt-4o-mini
anthropic/claude-3-5-sonnet
google/gemini-1.5-pro
bedrock/anthropic.claude-3-5-sonnet
ollama/llama3.1

Provider-prefixed model IDs are the clearest option for production traffic because they make routing intent explicit.

Bare model IDs

Alephant can also resolve a bare model ID when the model maps to exactly one known provider.

```
{
  "model": "gpt-4o-mini"
}
```

If the bare model is unique, Alephant expands it internally to the canonical provider/model form.

gpt-4o-mini -> openai/gpt-4o-mini

If the same model ID exists under multiple providers, Alephant returns a 400 Bad Request and asks you to specify the provider.

```
{
  "error": {
    "message": "Ambiguous model 'gpt-4o': matches multiple providers. Please specify one of: openai/gpt-4o, azure/gpt-4o",
    "type": "invalid_request_error",
    "code": "ambiguous_model"
  }
}
```
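The resolution rule can be sketched as follows. This is an illustrative helper, not Alephant's actual implementation; the catalog entries and the `resolveModel` name are assumptions for the example.

```typescript
// Hypothetical model catalog mapping bare model IDs to the providers
// that serve them. Entries are illustrative only.
const catalog: Record<string, string[]> = {
  "gpt-4o-mini": ["openai"],
  "gpt-4o": ["openai", "azure"],
  "claude-3-5-sonnet": ["anthropic"],
};

// Expand a model name to canonical provider/model form.
function resolveModel(model: string): string {
  if (model.includes("/")) return model; // already provider-prefixed
  const providers = catalog[model] ?? [];
  if (providers.length === 1) return `${providers[0]}/${model}`;
  if (providers.length === 0) throw new Error(`Unknown model '${model}'`);
  // Multiple providers match: mirror the 400 ambiguous_model behavior.
  const options = providers.map((p) => `${p}/${model}`).join(", ");
  throw new Error(
    `Ambiguous model '${model}': matches multiple providers. Please specify one of: ${options}`,
  );
}
```

The key property is that ambiguity is an error rather than a silent default, which keeps routing intent explicit.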

Request Lifecycle

Every routed request, regardless of surface, moves through the same gateway lifecycle:

  1. Receive request
    Your application or agent sends an OpenAI-compatible request to Alephant.
  2. Authenticate key
    Alephant validates the virtual key and loads workspace, user, agent, session, and key metadata.
  3. Resolve provider and model
    Alephant resolves the provider from the request path, router ID, provider-prefixed model, bare model ID, or key-bound provider configuration.
  4. Check policy before dispatch
    Alephant evaluates model allowlists, provider access, rate limits, budget rules, concurrency limits, and route-level policy before any upstream provider cost is created.
  5. Adapt and dispatch
    Alephant maps the OpenAI-style request into the selected provider format, calls the upstream provider, and applies retry or fallback behavior when configured.
  6. Normalize response
    Provider-specific responses, usage fields, errors, streaming events, and finish reasons are normalized into a consistent gateway response.
  7. Record cost and trace metadata
    Alephant records provider, model, tokens, cost, latency, status code, cache status, retry count, fallback path, session, agent, user, and workspace metadata.
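The ordering above can be sketched in code. Everything here is hypothetical scaffolding to illustrate the flow, not Alephant internals; the important detail is that policy runs before any upstream dispatch.

```typescript
// Illustrative lifecycle sketch: authenticate, resolve, then enforce
// policy before dispatch. Types, names, and data are assumptions.
type Key = { workspace: string; allowedModels: string[] };
type Target = { provider: string; model: string };

const keys: Record<string, Key> = {
  "vk-demo": { workspace: "acme", allowedModels: ["openai/gpt-4o-mini"] },
};

function authenticate(apiKey: string): Key {
  const key = keys[apiKey]; // 2. validate the virtual key
  if (!key) throw new Error("invalid virtual key");
  return key;
}

function resolveTarget(model: string): Target {
  const [provider, id] = model.split("/"); // 3. resolve provider and model
  return { provider, model: id };
}

function enforcePolicy(key: Key, model: string): void {
  // 4. rejected here means no upstream provider cost is ever created
  if (!key.allowedModels.includes(model)) {
    throw new Error(`model '${model}' not allowed for this key`);
  }
}

function route(apiKey: string, model: string): Target {
  const key = authenticate(apiKey);
  const target = resolveTarget(model);
  enforcePolicy(key, model);
  return target; // steps 5-7 (dispatch, normalize, record) omitted
}
```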

Policy-Aware Routing

Provider routing is connected to Alephant policy and budget control.

Before dispatching to a provider, Alephant can enforce:

  • workspace-level provider access
  • virtual-key scoped provider access
  • model allowlists and denylists
  • agent-level model rules
  • member or team-level budgets
  • per-session budget caps
  • rate limits and concurrency controls
  • route-specific fallback or downgrade rules

This means routing can answer both technical and financial questions:

  • Can this key use this provider?
  • Can this agent call this model?
  • Is the workspace still within budget?
  • Should this request be blocked, throttled, downgraded, or routed normally?
  • Which provider/model decision created the final cost?
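The budget side of that decision can be reduced to a simple pre-dispatch gate. This is a minimal sketch under assumed field names; Alephant's actual budget model covers workspace, member, and session scopes.

```typescript
// Hypothetical budget gate: block before dispatch so a request that
// would exceed the budget never creates upstream provider cost.
type Budget = { limitUsd: number; spentUsd: number };

function budgetDecision(
  budget: Budget,
  estimatedCostUsd: number,
): "route" | "block" {
  return budget.spentUsd + estimatedCostUsd <= budget.limitUsd
    ? "route"
    : "block";
}
```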

Fallback and Reliability

Provider routing can also support reliability behavior.

When configured, Alephant can retry failed upstream calls or fall back to another model or provider. This is useful when a provider is unavailable, rate-limited, overloaded, or returning transient errors.

Example fallback chain:

openai/gpt-4o-mini -> anthropic/claude-3-5-haiku -> groq/llama-3.1-70b

Fallback decisions are recorded so the dashboard can show which model actually served the request and how the fallback affected latency, cost, and reliability.
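A fallback chain like the one above can be sketched as a loop that tries each provider/model in order and records every attempt. This is a simplified, synchronous illustration; the `callUpstream` signature is an assumption, and real dispatch is asynchronous with retry and backoff.

```typescript
// Sketch of fallback routing: try each model in the chain, record
// which attempts failed, and return the model that actually served
// the request so the dashboard can attribute latency and cost.
type Attempt = { model: string; ok: boolean };

function withFallback(
  chain: string[],
  callUpstream: (model: string) => string,
): { result: string; served: string; attempts: Attempt[] } {
  const attempts: Attempt[] = [];
  for (const model of chain) {
    try {
      const result = callUpstream(model);
      attempts.push({ model, ok: true });
      return { result, served: model, attempts };
    } catch {
      attempts.push({ model, ok: false }); // transient failure: try next
    }
  }
  throw new Error(`all fallbacks failed: ${chain.join(" -> ")}`);
}
```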

Routing and Cost Attribution

Every route decision becomes part of the Alephant cost ledger.

For each request, Alephant can show:

  • requested model
  • resolved provider
  • resolved model
  • final provider used after retry or fallback
  • input tokens
  • output tokens
  • total token cost
  • cache hit or miss
  • latency
  • status code
  • workspace
  • virtual key
  • user or team member
  • agent
  • session
  • prompt or tool-call metadata

This makes provider routing visible to both engineering and finance teams.
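As a sketch, a per-request ledger entry might look like the following. The field names and the per-million-token rates are illustrative assumptions, not Alephant's actual schema or pricing.

```typescript
// Hypothetical shape of one cost-ledger entry.
interface LedgerRecord {
  requestedModel: string;
  resolvedModel: string; // final model after retry or fallback
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  latencyMs: number;
  cacheHit: boolean;
  workspace: string;
}

// Token cost from per-million-token rates (rates are examples only).
function tokenCostUsd(
  inputTokens: number,
  outputTokens: number,
  inputUsdPerMTok: number,
  outputUsdPerMTok: number,
): number {
  return (
    (inputTokens * inputUsdPerMTok + outputTokens * outputUsdPerMTok) /
    1_000_000
  );
}
```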

Examples

OpenAI-compatible request

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.alephant.io/v1",
  apiKey: process.env.ALEPHANT_API_KEY,
});

const response = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [
    { role: "user", content: "Summarize this support conversation." },
  ],
});
```

Bare model auto-resolution

```
{
  "model": "gpt-4o-mini",
  "messages": [
    { "role": "user", "content": "Explain gateway routing." }
  ]
}
```

If gpt-4o-mini is unique in the model catalog, Alephant resolves it automatically.

Explicit provider route

```
{
  "model": "anthropic/claude-3-5-sonnet",
  "messages": [
    { "role": "user", "content": "Review this architecture decision." }
  ]
}
```

Use explicit provider routes when you need deterministic provider selection.

  • Use provider-prefixed model IDs (provider/model_id) for production workloads.
  • Use bare model IDs when you want a simpler developer experience and the model name is unambiguous.
  • Use configured routers when routing should be controlled centrally by workspace policy rather than hardcoded in application code.
  • Use direct provider passthrough only when you intentionally want explicit upstream behavior.

Common Questions

Is Alephant just a model proxy?

No. Alephant routes requests, but it also applies policy, budget controls, provider adaptation, retry and fallback behavior, cost attribution, and observability in the gateway path.

Can one application use multiple providers?

Yes. A single application can send OpenAI-compatible requests to Alephant and select different providers by changing the model value or by using configured routers.

What happens if the model is not allowed?

Alephant rejects the request before dispatching it to the provider. This prevents disallowed provider usage and avoids creating upstream cost.

What happens if the model name is ambiguous?

Alephant returns a 400 Bad Request and asks you to specify the provider, such as openai/gpt-4o or azure/gpt-4o.

Does routing affect observability?

Yes. Alephant records both the requested model and the resolved provider/model so teams can see exactly how each request was routed and how much it cost.