# Routing Optimization

> Use fallback, cost-aware routing, and model policy to improve reliability and control spend

Routing optimization is the layer that decides whether a request uses the requested model, falls back to another provider, is routed to a cheaper model, or is blocked before it incurs any provider cost.

Alephant connects routing decisions to Virtual Keys, agents, budgets, policies, request metadata, and cost analytics.

## Optimization Modes

| Mode                 | Purpose                                                            |
| -------------------- | ------------------------------------------------------------------ |
| Model fallback       | Try another model or provider when the primary route fails         |
| Cost-aware routing   | Send low-risk work to lower-cost models when policy allows it      |
| Model allowlists     | Restrict which models an agent, department, or Virtual Key can use |
| Budget-aware routing | Block, throttle, or downgrade traffic when budget limits are near  |
| Prompt-aware routing | Route prompt-managed requests by template, version, or task type   |

## Model Fallback

Fallback improves availability when a provider is unavailable, rate-limited, overloaded, or returning transient errors.

Example fallback chain:

```text
openai/gpt-4o
-> anthropic/claude-3-5-sonnet
-> google/gemini-1.5-pro
```

For each fallback attempt, Alephant can record:

* Requested model
* Attempted provider and model
* Failure status or error class
* Final serving provider and model
* Added latency
* Cost impact
* Request log and run trace metadata
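
The fallback behavior described above can be sketched in a few lines. This is a minimal illustration, not Alephant's implementation: `TransientProviderError`, `Attempt`, `RoutingResult`, `call_with_fallback`, and `fake_provider` are hypothetical names, and the provider call is simulated so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class TransientProviderError(Exception):
    """Hypothetical error for retryable failures (rate limit, overload, outage)."""


@dataclass
class Attempt:
    model: str
    ok: bool
    error_class: Optional[str] = None


@dataclass
class RoutingResult:
    requested_model: str
    final_model: Optional[str] = None
    response: Optional[str] = None
    attempts: List[Attempt] = field(default_factory=list)


def call_with_fallback(chain: List[str], call: Callable[[str], str]) -> RoutingResult:
    """Try each model in order, stop at the first success, record every attempt."""
    if not chain:
        raise ValueError("fallback chain must contain at least one model")
    result = RoutingResult(requested_model=chain[0])
    for model in chain:
        try:
            response = call(model)
        except TransientProviderError as exc:
            # Record the failure and move on to the next route in the chain.
            result.attempts.append(Attempt(model, ok=False, error_class=type(exc).__name__))
            continue
        result.attempts.append(Attempt(model, ok=True))
        result.final_model = model
        result.response = response
        break
    return result


def fake_provider(model: str) -> str:
    # Simulated provider: the primary route is rate-limited in this example.
    if model == "openai/gpt-4o":
        raise TransientProviderError("rate limited")
    return f"response from {model}"


CHAIN = ["openai/gpt-4o", "anthropic/claude-3-5-sonnet", "google/gemini-1.5-pro"]
result = call_with_fallback(CHAIN, fake_provider)
print(result.final_model, [a.model for a in result.attempts])
```

Only errors classed as transient trigger a fallback in this sketch; any other exception propagates, which keeps non-retryable failures (for example, an invalid request) from being retried against every provider in the chain.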

## Cost-Aware Routing

Cost-aware routing helps avoid spending frontier-model budget on simple work.

Typical routing signals include:

* Prompt length
* Task class
* Prompt template ID
* Agent or workflow identity
* Required model capability
* Budget state
* User, department, or workspace policy

Example:

```text
Short summarization task
-> route from openai/gpt-4o to openai/gpt-4o-mini
-> record savings and final model used
```
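
One way to picture the example above is as a rule that downgrades only when the task is low-risk, the prompt is short, and the cheaper model is allowed. The sketch below is illustrative: `Request`, `choose_model`, `estimated_savings`, the `MAX_PROMPT_TOKENS` threshold, and the cost figures are all assumptions (the costs are placeholder units, not real provider prices).

```python
from dataclasses import dataclass
from typing import FrozenSet

# Placeholder cost units per 1,000 prompt tokens; not real provider pricing.
COST_PER_1K = {"openai/gpt-4o": 10.0, "openai/gpt-4o-mini": 1.0}

# Hypothetical routing configuration.
LOW_RISK_TASKS = frozenset({"summarization", "classification"})
DOWNGRADES = {"openai/gpt-4o": "openai/gpt-4o-mini"}
MAX_PROMPT_TOKENS = 2000


@dataclass(frozen=True)
class Request:
    requested_model: str
    task_class: str
    prompt_tokens: int


def choose_model(req: Request, allowed_models: FrozenSet[str]) -> str:
    """Return a cheaper model only when every routing signal permits it."""
    cheaper = DOWNGRADES.get(req.requested_model)
    if (
        cheaper is not None
        and cheaper in allowed_models
        and req.task_class in LOW_RISK_TASKS
        and req.prompt_tokens <= MAX_PROMPT_TOKENS
    ):
        return cheaper
    return req.requested_model


def estimated_savings(req: Request, final_model: str) -> float:
    """Estimate savings in placeholder cost units for the prompt tokens."""
    delta = COST_PER_1K[req.requested_model] - COST_PER_1K[final_model]
    return delta * req.prompt_tokens / 1000


allowed = frozenset(COST_PER_1K)
short_summary = Request("openai/gpt-4o", "summarization", prompt_tokens=500)
final_model = choose_model(short_summary, allowed)
print(final_model, estimated_savings(short_summary, final_model))
```

A request that fails any single check keeps its requested model, so the rule can only reduce cost and never widens the set of models a caller may reach.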

## Policy Guardrails

Routing optimization should stay inside policy boundaries. A cost rule should not override:

* Workspace model restrictions
* Agent or department allowlists
* Security policies
* Data residency requirements
* Paid endpoint policy
* Budget hard stops

When multiple controls apply to the same request, Alephant should prefer the most restrictive outcome: block, throttle, or require explicit configuration rather than silently choosing a disallowed route.
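
A simple way to model "prefer the safest decision" is to order outcomes by restrictiveness and take the maximum across all controls. The sketch below assumes a hypothetical `Decision` enum and `evaluate_route` function with only two controls (an allowlist and a budget); a real deployment would have more controls and its own thresholds.

```python
from enum import IntEnum
from typing import FrozenSet, List


class Decision(IntEnum):
    # Ordered from least to most restrictive so max() picks the safest outcome.
    ALLOW = 0
    THROTTLE = 1
    BLOCK = 2


def evaluate_route(
    model: str,
    allowlist: FrozenSet[str],
    budget_remaining: float,
    throttle_below: float,
) -> Decision:
    """Combine independent controls and return the most restrictive decision."""
    decisions: List[Decision] = []

    # Control 1: the model must be on the allowlist.
    decisions.append(Decision.ALLOW if model in allowlist else Decision.BLOCK)

    # Control 2: budget hard stop at zero, throttle when the budget runs low.
    if budget_remaining <= 0:
        decisions.append(Decision.BLOCK)
    elif budget_remaining < throttle_below:
        decisions.append(Decision.THROTTLE)
    else:
        decisions.append(Decision.ALLOW)

    return max(decisions)


allowlist = frozenset({"openai/gpt-4o-mini"})
print(evaluate_route("openai/gpt-4o-mini", allowlist, 50.0, 10.0).name)
print(evaluate_route("openai/gpt-4o-mini", allowlist, 5.0, 10.0).name)
print(evaluate_route("openai/gpt-4o", allowlist, 50.0, 10.0).name)
```

Because each control contributes its own decision, adding a new guardrail means appending one more entry to the list; no existing control can be weakened by a later one.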

## What To Monitor

Track routing optimization with these metrics:

* Fallback count
* Fallback success rate
* Average fallback latency
* Requests optimized by cost-aware routing
* Estimated savings
* Error rate after routing changes
* Quality or business outcome metrics when available
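
As a rough illustration, several of these metrics can be derived from per-request records. The `RequestRecord` fields and the `summarize` function below are hypothetical and do not reflect Alephant's log schema; they only show how the counts and averages relate to each other.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestRecord:
    fallback_used: bool        # at least one fallback attempt was made
    succeeded: bool            # the request was ultimately served
    added_latency_ms: float    # extra latency caused by fallback attempts
    cost_optimized: bool       # cost-aware routing changed the model
    estimated_savings: float   # savings in whatever cost unit is tracked


def summarize(records: List[RequestRecord]) -> Dict[str, float]:
    """Aggregate routing metrics; rates and averages are 0.0 with no fallbacks."""
    fallbacks = [r for r in records if r.fallback_used]
    n = len(fallbacks)
    return {
        "fallback_count": n,
        "fallback_success_rate": sum(r.succeeded for r in fallbacks) / n if n else 0.0,
        "avg_fallback_latency_ms": sum(r.added_latency_ms for r in fallbacks) / n if n else 0.0,
        "cost_optimized_requests": sum(r.cost_optimized for r in records),
        "estimated_savings": sum(r.estimated_savings for r in records),
    }


records = [
    RequestRecord(True, True, 120.0, False, 0.0),
    RequestRecord(True, False, 300.0, False, 0.0),
    RequestRecord(False, True, 0.0, True, 4.5),
    RequestRecord(False, True, 0.0, False, 0.0),
]
metrics = summarize(records)
print(metrics)
```

Fallback success rate and average latency are computed over fallback requests only, so a quiet period with no fallbacks does not dilute them.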

## Related Pages

* [Provider Routing](/ai-gateway/provider-routing)
* [Policies & Rules](/docs/overview/security-compliance/policies-rules)
* [Cost Analytics](/docs/overview/fin-ops-budget/cost-analytics)
* [Agent Finance](/docs/overview/fin-ops-budget/agent-finance)