Routing Optimization
Routing optimization is the layer that decides whether a request should use the requested model, fall back to another provider, route to a cheaper model, or be stopped before it incurs any provider cost.
Alephant connects routing decisions to Virtual Keys, agents, budgets, policies, request metadata, and cost analytics.
Optimization Modes
Model Fallback
Fallback improves availability when a provider is unavailable, rate-limited, overloaded, or returning transient errors.
Example fallback chain:
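As a hypothetical illustration only, a fallback chain can be sketched in Python as an ordered list of routes tried in turn. The provider and model names, the `Route` type, the `TransientProviderError` class, and the `call_provider` callback are all placeholders assumed for this sketch, not Alephant configuration or API.

```python
from dataclasses import dataclass


class TransientProviderError(Exception):
    """Hypothetical error class for rate limits, overload, or timeouts."""


@dataclass
class Route:
    provider: str
    model: str


# Hypothetical chain: the requested route first, then fallbacks in order.
FALLBACK_CHAIN = [
    Route("provider-a", "model-large"),
    Route("provider-b", "model-large-equivalent"),
    Route("provider-c", "model-medium"),
]


def call_with_fallback(prompt, chain, call_provider):
    """Try each route in order; return (route, response) from the first success."""
    last_error = None
    for route in chain:
        try:
            return route, call_provider(route, prompt)
        except TransientProviderError as exc:
            last_error = exc  # transient failure: move on to the next route
    raise RuntimeError("all routes in the fallback chain failed") from last_error
```

Only transient errors trigger the next route in this sketch; any other exception propagates immediately, so a malformed request is not retried against every provider.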
For each fallback attempt, Alephant can record:
- Requested model
- Attempted provider and model
- Failure status or error class
- Final serving provider and model
- Added latency
- Cost impact
- Request log and run trace metadata
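
The fields listed above could be captured per attempt in a small record type. The following is a minimal sketch under assumed names (`FallbackAttempt`, `FallbackTrace`, and every field name are hypothetical, not Alephant's log schema); it shows how the final serving route and the added latency can be derived from the attempt list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FallbackAttempt:
    requested_model: str
    attempted_provider: str
    attempted_model: str
    error_class: Optional[str]  # None when the attempt succeeded
    latency_ms: float
    cost_usd: float


@dataclass
class FallbackTrace:
    request_id: str
    attempts: list = field(default_factory=list)

    def final_route(self):
        """Return (provider, model) of the serving attempt, or None if all failed."""
        for attempt in self.attempts:
            if attempt.error_class is None:
                return attempt.attempted_provider, attempt.attempted_model
        return None

    def added_latency_ms(self):
        """Latency spent on failed attempts before a route served the request."""
        return sum(a.latency_ms for a in self.attempts if a.error_class is not None)
```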
Cost-Aware Routing
Cost-aware routing helps avoid spending frontier-model budget on simple work.
Typical routing signals include:
- Prompt length
- Task class
- Prompt template ID
- Agent or workflow identity
- Required model capability
- Budget state
- User, department, or workspace policy
Example:
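A hypothetical rule, sketched in Python, might combine a few of these signals as follows. The thresholds, task class names, the `RoutingSignals` type, and the `small-model` identifier are assumptions chosen for illustration, and the sketch assumes the cheaper model lacks tool use.

```python
from dataclasses import dataclass


@dataclass
class RoutingSignals:
    prompt_tokens: int
    task_class: str                # e.g. "classification", "reasoning"
    needs_tool_use: bool           # a required model capability
    budget_remaining_ratio: float  # 0.0 (exhausted) to 1.0 (untouched)


# Assumed set of task classes considered simple enough for a cheaper model.
SIMPLE_TASKS = {"classification", "extraction", "summarization"}


def choose_model(requested_model, signals, cheap_model="small-model"):
    """Return the model to serve the request under the assumed rules."""
    # A required capability wins: keep the requested model.
    if signals.needs_tool_use:
        return requested_model
    # Short, simple work goes to the cheaper model.
    if signals.task_class in SIMPLE_TASKS and signals.prompt_tokens <= 2000:
        return cheap_model
    # Budget nearly spent: downgrade rather than exhaust it.
    if signals.budget_remaining_ratio < 0.1:
        return cheap_model
    return requested_model
```

Checking the capability requirement first keeps a cost rule from routing a request to a model that cannot serve it.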
Policy Guardrails
Routing optimization should stay inside policy boundaries. A cost rule should not override:
- Workspace model restrictions
- Agent or department allowlists
- Security policies
- Data residency requirements
- Paid endpoint policy
- Budget hard stops
If multiple controls apply to the same request, Alephant should prefer the safest decision: block, throttle, or require explicit configuration rather than silently choosing a disallowed route.
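
One way to express "prefer the safest decision" is to rank outcomes by restrictiveness and take the maximum. The sketch below assumes a particular ordering (throttle below require-configuration below block) and hypothetical names (`Decision`, `resolve`, `pick_route`); the source does not specify how Alephant ranks these outcomes.

```python
from enum import IntEnum


class Decision(IntEnum):
    # Assumed ordering: a higher value is more restrictive.
    ALLOW = 0
    THROTTLE = 1
    REQUIRE_CONFIG = 2
    BLOCK = 3


def resolve(decisions):
    """Return the most restrictive decision; no controls means ALLOW."""
    return max(decisions, default=Decision.ALLOW)


def pick_route(candidates, allowed_models, budget_hard_stop):
    """Choose the first cost-ordered candidate that policy permits."""
    if budget_hard_stop:
        return Decision.BLOCK, None
    for model in candidates:
        if model in allowed_models:
            return Decision.ALLOW, model
    # No permitted candidate: surface the gap instead of picking a disallowed route.
    return Decision.REQUIRE_CONFIG, None
```

In `pick_route`, the cost ordering only ranks candidates that the allowlist already permits, so a cheaper but disallowed model is never selected.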
What To Monitor
Track routing optimization with:
- Fallback count
- Fallback success rate
- Average fallback latency
- Requests optimized by cost-aware routing
- Estimated savings
- Error rate after routing changes
- Quality or business outcome metrics when available
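
Several of these metrics can be derived from per-request logs. The following is a minimal sketch assuming each log entry is a dict with hypothetical keys (`fallback_attempts`, `served`, `fallback_latency_ms`, `cost_routed`, `baseline_cost_usd`, `actual_cost_usd`), and assuming savings are estimated as the cost of the requested model minus the cost actually incurred.

```python
def routing_metrics(logs):
    """Aggregate fallback and cost-routing metrics from request log dicts."""
    fallbacks = [r for r in logs if r["fallback_attempts"] > 0]
    succeeded = [r for r in fallbacks if r["served"]]
    optimized = [r for r in logs if r["cost_routed"]]
    n = len(fallbacks)
    return {
        "fallback_count": n,
        # Rates are None when no request used a fallback.
        "fallback_success_rate": len(succeeded) / n if n else None,
        "avg_fallback_latency_ms": (
            sum(r["fallback_latency_ms"] for r in fallbacks) / n if n else None
        ),
        "cost_routed_requests": len(optimized),
        "estimated_savings_usd": sum(
            r["baseline_cost_usd"] - r["actual_cost_usd"] for r in optimized
        ),
    }
```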