Overview

Alephant AI Gateway is an OpenAI-compatible control layer for production AI applications, available as hosted SaaS or as a self-hosted gateway. It gives developers one stable API surface while the gateway handles provider-specific adaptation, model routing, policy enforcement, layered caching, retries, fallback, usage metadata, request logging, and audit trails.

Instead of wiring every application directly to every provider, teams connect once and route across 50+ providers, 320+ models, and custom model backends. Start with Alephant Cloud for a managed workspace, or self-host the gateway when you need private infrastructure, BYO keys, and direct operational control.

```ts
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://ai.alephant.io/v1",
  defaultHeaders: {
    Authorization: `Bearer ${ALEPHANT_VIRTUAL_KEY}`,
    "Alephant-Session-Id": "session-xxx", // optional
  },
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
```

Why this exists

AI applications are moving from single-model prototypes to production systems that call many providers, agents, tools, and custom model backends. Without a gateway, every team ends up rebuilding the same operational layer: provider adapters, routing rules, key management, usage metadata, retries, caching, and request logs.

Alephant AI Gateway centralizes that layer behind one OpenAI-compatible API. It gives developers a stable integration surface while platform teams get policy before provider access, cache before repeated calls, fallback before outages, and audit trails before production incidents.

The goal is simple: make AI traffic observable, governable, and reliable without slowing developers down.

Features

| Capability | What Alephant AI Gateway provides |
| --- | --- |
| One API surface | OpenAI-compatible `/v1/*` and `/ai/*` routes for chat, responses, embeddings, images, and provider-style model names |
| Provider and model coverage | 50+ providers, 320+ models, local runtimes, OpenRouter-style catalogs, and custom/private backends |
| Provider adaptation | Request, tool, streaming, error, usage, finish-reason, and response normalization across provider APIs |
| Routing and resilience | Direct provider paths, policy routers, retries, fallback, health checks, provider 429 handling, and fail-open cache paths |
| Agent client compatibility | OpenAI-compatible formats for Cursor, Codex, opencode, and Antigravity workflows |
| IDE integration | Cursor-ready with architecture rules, workflow guides, implementation skills, and task management; opencode, Codex, and Claude Code adapters in progress |
| Policy and key control | Virtual keys, master key resolution, model policy, workspace provider allowlists, and concurrency controls |
| Caching | Gateway-side LLM KV cache and semantic cache to avoid repeated upstream calls |
| Observability | Request logs, traces, metrics, usage metadata, optional body archival, and downstream log delivery |
| Live operations | Route, virtual key, and provider key refresh from database changes without restarting the gateway |
| Deployment | Hosted SaaS through Alephant Cloud, or self-hosted Rust gateway with PostgreSQL, Redis, Qdrant, and S3-compatible integrations |
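
Several of these capabilities meet at the individual request: the optional `Alephant-Session-Id` header from the quickstart tags a call so logs, usage metadata, and cache entries can be grouped per session. A minimal sketch of per-request tagging with the OpenAI SDK's request options (the header name comes from the quickstart above; session ids are caller-defined):

```ts
// Build per-request headers that tag a call with a caller-defined session id.
// The header name follows the quickstart example above.
function sessionHeaders(sessionId: string): Record<string, string> {
  return { "Alephant-Session-Id": sessionId };
}

// Usage with the OpenAI SDK's per-request options (sketch):
// const res = await client.chat.completions.create(
//   { model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }] },
//   { headers: sessionHeaders("session-billing-report") },
// );
// res.usage then carries the gateway-normalized token counts for this call.

console.log(sessionHeaders("session-xxx"));
```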

Developer surface

| Surface | Purpose |
| --- | --- |
| `/v1/*` | Drop-in OpenAI-compatible API for existing SDKs and agent clients |
| `/router/{id}/*` | Policy-driven routing through a configured router |
| `/{provider}/*` | Direct provider passthrough when you want explicit upstream control |
| `model=provider/model_id` | Select a provider and model without changing application code |
| Custom backends | Put private models or self-hosted runtimes behind the same gateway contract |
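
As a sketch of the `model=provider/model_id` contract: the request body never changes, only the model string does. The catalog entries below are illustrative; check your workspace catalog for exact names.

```ts
// One OpenAI-style request body, reused across upstreams.
const messages = [{ role: "user", content: "Summarize this changelog." }];

// Selecting a provider is a string change, not a code change.
const requests = [
  { model: "openai/gpt-4o", messages },               // hosted OpenAI
  { model: "anthropic/claude-3-5-sonnet", messages }, // Anthropic Messages, adapted
  { model: "ollama/llama3", messages },               // local runtime behind the gateway
];

// Each entry would be sent as client.chat.completions.create(request)
// against the same baseURL; the gateway resolves the provider prefix.
console.log(requests.map((r) => r.model));
```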

Architecture & request lifecycle

*(Diagram: ai-gateway-architecture)*

Every request passes through the same gateway lifecycle: global middleware, routing, provider mapping, dispatch, cache, fallback, and async logging. The entry path depends on how much control you want:

| Path | Use it for |
| --- | --- |
| `/v1/*` | Unified OpenAI-style access with `model=provider/model_id` |
| `/router/{id}/*` | Policy-driven routing through a configured router |
| `/{provider}/*` | Direct provider passthrough when you want an explicit upstream |
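
The three entry paths differ only in the path prefix the client targets. A sketch, assuming a router named `default` and an illustrative host; verify the exact path shapes against your deployment:

```ts
// Host and router id are placeholders; the path prefixes follow the table above.
const base = "https://ai.alephant.io";

const paths = {
  unified: `${base}/v1/chat/completions`,            // model=provider/model_id
  routed: `${base}/router/default/chat/completions`, // policy router "default"
  direct: `${base}/openai/chat/completions`,         // explicit upstream: openai
};

// Point an OpenAI-compatible client at whichever prefix matches the control
// you want; the request and response shapes stay the same.
console.log(paths);
```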

Multi-provider adaptation

Use one OpenAI-style request shape across 50+ providers and 320+ models, including OpenAI-compatible APIs, Anthropic Messages, Gemini, Bedrock, Ollama, OpenRouter-style catalogs, and custom backends. The client selects a runtime with model=provider/model_id; Alephant resolves the provider, applies the right adapter, maps provider-specific fields, and returns a normalized OpenAI-style response.

Instead of listing every model in the README, this section focuses on the contract: one request format in, one consistent response out. The provider and model catalog can evolve independently without forcing application code changes.
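
That contract can be sketched as a response shape: whichever upstream served the call, clients read the same OpenAI-style fields. The interface below mirrors the OpenAI chat completion schema; the example values are invented.

```ts
// Normalized response contract: OpenAI-style fields regardless of upstream.
interface NormalizedCompletion {
  choices: {
    message: { role: string; content: string };
    finish_reason: string; // normalized across providers
  }[];
  usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
}

// Downstream code depends only on this shape, never on provider specifics.
function totalTokens(res: NormalizedCompletion): number {
  return res.usage.total_tokens;
}

const example: NormalizedCompletion = {
  choices: [{ message: { role: "assistant", content: "Hi!" }, finish_reason: "stop" }],
  usage: { prompt_tokens: 5, completion_tokens: 2, total_tokens: 7 },
};

console.log(totalTokens(example)); // 7
```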

*(Diagram: ai-gateway-multi-provider)*

| Category | Examples |
| --- | --- |
| Mainstream models | GPT-4o · GPT-4.1 · o3 · Claude 3.5/3.7 Sonnet · Claude Opus · Gemini 1.5/2.0 · Llama 3/4 · Mistral Large · Command R+ |
| Provider ecosystem | OpenAI · Anthropic · Google Gemini · AWS Bedrock · Azure OpenAI · OpenRouter · Together AI · Fireworks · Groq · Cohere · Mistral · Perplexity · DeepSeek · xAI · Ollama |
| Agent client compatibility | Cursor · Codex · opencode · Antigravity |

IDE integration

Alephant AI Gateway ships repository-level tooling for AI-assisted development inside supported IDEs.

| IDE / Agent Client | Status | What’s included |
| --- | --- | --- |
| Cursor | Ready | Project architecture & code-convention rules, development & API workflow guides, gated-module-implementation skill (Skill), file-based task management (Task Magic) — see the `.cursor` directory; also configure the gateway in Agent Settings → Models |
| opencode | In progress | Adapter and configuration under development |
| Codex | In progress | Adapter and configuration under development |
| Claude Code | In progress | Adapter and configuration under development |

Comparison

Portkey, Helicone, and LiteLLM are excellent projects, but they start from different centers of gravity. Alephant is built for teams shipping agentic AI products: a hosted SaaS workspace plus a self-hosted gateway path for agent development, cost control, provider routing, governance, and operational visibility.

| Project | Best known for | Best fit |
| --- | --- | --- |
| Portkey | Enterprise AI gateway controls, guardrails, and managed policy workflows | Teams that want a managed AI control plane |
| Helicone | LLM observability, request analytics, sessions, and cost visibility | Teams whose primary need is tracing and analytics |
| LiteLLM | Broad Python proxy/SDK ecosystem for many providers | Teams that want maximum provider breadth through a Python stack |
| Alephant AI Gateway | Agent development infrastructure, cost control, governance, provider routing, and SaaS + self-host deployment | Teams building production agents that need cost guardrails, request traceability, BYO keys, and multi-provider control |

| Capability | Portkey | Helicone | LiteLLM | Alephant AI Gateway |
| --- | --- | --- | --- | --- |
| OpenAI-compatible API | Yes | Yes | Yes | Yes |
| SaaS + self-host | Enterprise/self-host options | Hosted and self-host options | Self-hosted proxy | Yes: Alephant Cloud plus self-hosted Rust gateway |
| Provider/model coverage | Broad | Broad logging/proxy coverage | Very broad | 50+ providers, 320+ models, custom backends |
| Agent coding clients | No dedicated compatibility layer | No dedicated compatibility layer | No dedicated compatibility layer | Cursor, Codex, opencode, Antigravity workflows |
| Agent cost control | Guardrails and policy controls | Cost analytics and request visibility | Budgets and spend controls | Agent/session-aware usage visibility, cache savings, budget controls, and governance workflows |
| Provider adaptation | Gateway policies and routing | Proxy plus observability pipeline | Strong provider abstraction | Explicit mappers for requests, streaming, errors, usage, and responses |
| Routing and resilience | Routing, retries, fallbacks | Gateway controls plus observability | Router, fallback, budgets | Direct paths, policy routers, fallback, health checks, provider 429 handling |
| BYO key control | Key vault / enterprise controls | BYO keys with proxy controls | Virtual keys and self-hosted keys | BYO provider keys, master-key resolution, workspace allowlists |
| Cache | Gateway caching | Cache tracking/integrations | Cache integrations | LLM KV cache plus semantic cache |
| Observability | Logs and policy events | Core strength | Callback/logging integrations | Logs, traces, metrics, usage metadata, optional body archival |
| Governance path | Strong enterprise guardrails | Workspace controls around observability | Teams, budgets, rate limits | Agent/session governance, model policy, provider allowlists, concurrency controls, and workspace-level controls |

Alephant’s differentiator is the combination: hosted SaaS, self-hosted Rust gateway, agent-first developer compatibility, cost-control workflows, BYO-key governance, explicit provider adaptation, and workspace-level AI FinOps.

Repository structure

```
alephant-ai-gateway/
├── ai-gateway/      # Gateway service crate
├── crates/          # Shared libraries and harnesses
├── docs/            # In-repo notes; curated docs at https://api.alephant.io/
├── scripts/         # CI and local automation
├── infrastructure/  # Deployment and observability infra
├── test/            # Integration and runtime test helpers
├── AGENTS.md        # Agent collaboration conventions
├── CLAUDE.md        # Command and architecture reference
└── CHANGELOG.md     # Project changelog
```