Multi-protocol surface
/v1/chat/completions (OpenAI), /v1/messages (Anthropic), /v1beta/* (Gemini).
OctaFuse is an AI Gateway for teams and enterprises—a unified access layer between your upstream AI providers and every product line you run.
On the upstream side, public cloud models, third-party inference services, and self-hosted models are all wired in and managed in one place. On the downstream side, each product line calls through standard OpenAI / Anthropic / Gemini interfaces without caring which provider handles the request. Keys and budgets, routing and failover, cost metering and audit trails all live in this layer—not scattered across individual service codebases.
It fits teams connecting multiple AI providers to multiple product lines and wanting reliability, cost visibility, and audit coverage lifted out of application code into dedicated infrastructure.
OctaFuse was built to own and evolve an in-house AI gateway for several internal SaaS systems.
After reviewing many open-source and commercial options, we kept seeing the same pain points:
OctaFuse aims to address that with more freedom:
Multi-protocol surface
/v1/chat/completions (OpenAI), /v1/messages (Anthropic), /v1beta/* (Gemini).
Keys and budgets
Users / API keys, caps and period resets, plus GET /v1/me for budget-style status from clients.
Routing
Providers, models, and routes; route groups and priority-based failover.
Cost layers
metered_cost, standard_cost, and charged_cost—supply-side metering vs catalog vs what you charge users.
Audit and observability
Global and per-key request logs, plus user-level audit trails for traceability and investigations.
Proxy error alerts
Optional Feishu (Lark) and WeChat Work bot webhooks in Admin—forwarding failures surface upstream incidents, quota or rate-limit pressure, and upstream API keys that may need attention or top-up.
Analytics
Time-range views in Admin for model, provider, and user usage plus reliability summaries—capacity checks, cost awareness, and upstream health comparisons.
Playground
Send a test call for one model route without spending user budgets or leaving the same metering / logs as real traffic—great for troubleshooting and pre-flight checks.
Simulator
Call your deployed gateway from the browser with a real user API key in OpenAI / Anthropic / Gemini shapes—rehearse auth, routing, billing, and logging the way production clients do.
Runtimes
Cloudflare (Worker + Pages + D1) or self-hosted (Docker / Node + Postgres or MySQL). See the deployment sections in the docs.
Decoupled from apps
SaaS and portals integrate via /api/admin/* so product code stays focused on AI use cases.
Pick based on compliance, latency, data residency, and operations. Commands and troubleshooting live under Deploy on the documentation home.
Read the production section on the documentation home for trade-offs and Quick start for the shortest path. Licensed under GNU AGPL v3 — see LICENSE.