capability.routelive

Upstream capabilities

LLMReasoning Modelschat · code · reasoning

IMGImage Modelsgenerate · edit · vision

AUDAudio Modelstranscribe · diarize · ASR

VIDVideo Modelsgenerate · edit · understand

EXTExternal Capabilitiessearch · fetch · tools

OctaFuseCapability Gateway

routekeysbudgetfailoverbillingaudit

Unified access

TXTChat APIOpenAI · Anthropic · Gemini

IMGImages APIgenerations · edits

AUDAudio APItranscriptions · ASR

TOOLTools APIagents · extensions

ADMAdmin APIusers · keys · budgets

5 capability typesN+ upstream services1 unified gateway5 access surfaces

AI Gateway & Control PlaneOctaFuse

Unify AI capabilities. Control every call.

Quick start Read the docs

AI capabilities are expanding beyond text models into image generation, Agent Tools, private models, and specialized services. Providers, compatible endpoints, coding plans, self-hosted services, and internal accounts each bring their own Base URLs, keys, quotas, pricing, and logs. Applications need something simpler: one stable entrypoint, controlled routing, clear budgets, and traceable request history.

OctaFuse is a self-hostable AI capability gateway and operations control plane. Unify official providers, third-party compatible endpoints, image generation, Agent Tools, private model services, and upstream key pools while keeping routing, sticky affinity, limits, circuit breaking, budgets, billing, audit, and Admin APIs under your control. Clients still use one Gateway URL and one user key.

Application Scenarios

Individual developers

Consolidate model subscriptions, coding plans, image generation, Agent Tools, and backup upstreams behind one Gateway. Cursor, CLIs, scripts, and personal tools only need one Gateway URL and one user key.

AI apps and agents

Build against stable model IDs and common protocols. Move model swaps, canary routes, backup providers, degradation, and default parameters into the gateway layer.

Platform and IT teams

Create separate users and API keys for departments, projects, members, or customers, with budget periods, status, and metadata outside application services.

API proxy businesses

Connect multiple upstream providers and key pools, issue downstream customer keys, and manage each customer through request logs, cost lenses, and audit records.

Private and hybrid models

Put self-hosted compatible services, internal models, Ollama / vLLM paths, and public model providers behind one routing system with a consistent external surface.

Cost and reliability ops

Observe usage and failures by model, Provider, user, and reliability view, then tune default routes, backups, budgets, time-of-day pricing, and upstream key quotas.

Product Capabilities

Three protocol entrypoints

Supports OpenAI Chat Completions, Anthropic Messages, Gemini generateContent / streamGenerateContent. Agents / SDKs use authenticated GET /v1/models; portals can use public GET /catalog/models.

Image generation / edit

OpenAI-compatible POST /v1/images/generations and edits, with token-metered and per-image (per_image) catalog pricing. Playground / Simulator cover image smoke and billing logs.

Agent Tools

Extensible product APIs for agents (/v1/tools/*). Shipping today: web tools (Search / Fetch / Deep Search); more tools can follow. Configure engine keys under Admin → Tools — per-call billing, no charge on failure.

Models and route groups

Clients can use stable model IDs or baseId:group. One model can attach multiple upstream routes by protocol, route group, and route priority — routes have no weight.

Provider and model presets

One-click Admin import for a large preset catalog: official model vendors plus aggregation platforms and Coding / Token Plans, with Base URLs and catalog pricing prefilled so you are not hunting docs to maintain endpoints by hand.

Provider key scheduling

A Provider can hold multiple upstream keys. After a route is chosen, scheduling uses key priority, headroom, key weight (near ties), circuit state, and sticky bindings before failover.

Limits, circuits, sticky

Upstream keys support RPM / TPM / max concurrency. 429, 401 / 403, 5xx, and network errors use separate cooldown policies; opt-in sticky bindings keep the same user on the same key to improve prompt cache hits.

User keys, budgets, and pricing

Budgets and reset periods belong to Users; each user can own multiple active API keys. Proxy checks budget before forwarding and records three amounts per request: supplier cost, catalog list price, and charged to user. Routes support base factors plus daily schedule multipliers (business-timezone peak / off-peak) for vendor time-based pricing.

Logs, audit, analytics

Request logs capture protocol, model, route group, Provider, provider key, tokens, status, and costs. Audit logs track budget charges, resets, and user / key lifecycle events.

Playground and Simulator

Playground tests a single Admin route. Simulator uses a real user key from the browser to verify auth, routing, billing, and logs.

Deployment and integration

Default Cloudflare Workers + D1 (individuals and light traffic can usually stay in the free tier); or Node / Docker with PostgreSQL or MySQL 8. External portals can provision users, keys, and budgets through /api/admin/*.

Next Steps

Default path is Cloudflare: try Wrangler + local D1, then one CLI onto your account; configure Provider, model, and route, create a user key, and point clients at the Gateway Base URL.

Quick start — local D1 → Cloudflare deploy
Providers · Models · Routing
Provider Catalog · Model Catalog — browse the static presets bundled for direct Gateway import
Images · Agent Tools · Analytics
Request logs · Audit logs · System config
Cloudflare deployment · Docker deployment
GitHub technical reference — APIs, architecture, migrations, and operations index
Source repository · Contributing