Skip to content

AI GatewayOctaFuse

Unify your upstream AI providers and continuously deliver model capabilities to every product line.

OctaFuse is an AI Gateway for teams and enterprises—a unified access layer between your upstream AI providers and every product line you run.

On the upstream side, public cloud models, third-party inference services, and self-hosted models are all wired in and managed in one place. On the downstream side, each product line calls through standard OpenAI / Anthropic / Gemini interfaces without caring which provider handles the request. Keys and budgets, routing and failover, cost metering and audit trails all live in this layer—not scattered across individual service codebases.

It fits teams connecting multiple AI providers to multiple product lines and wanting reliability, cost visibility, and audit coverage lifted out of application code into dedicated infrastructure.

OctaFuse was built to own and evolve an in-house AI gateway for several internal SaaS systems.

After reviewing many open-source and commercial options, we kept seeing the same pain points:

  • Narrow provider coverage, making it awkward to mix public cloud, private hosting, and internal models in one stack.
  • OpenAI-only surfaces, which force extra adapters when your stack already speaks Anthropic or Gemini.
  • Rigid or shallow billing and audit trails, so it is hard to model per-route, per-user, or supply-side vs user-side costs the way internal products need.

OctaFuse aims to address that with more freedom:

  • Wire in more providers—including models you run yourself—and expose multiple client-facing protocols from one gateway.
  • Define routing, billing, and how you trace and reconcile usage across teams and routes.
  • Integrate upstream systems through a stable Admin API with less coupling.

Multi-protocol surface

/v1/chat/completions (OpenAI), /v1/messages (Anthropic), /v1beta/* (Gemini).

Keys and budgets

Users / API keys, caps and period resets, plus GET /v1/me for budget-style status from clients.

Routing

Providers, models, and routes; route groups and priority-based failover.

Cost layers

metered_cost, standard_cost, and charged_cost—supply-side metering vs catalog vs what you charge users.

Audit and observability

Global and per-key request logs, plus user-level audit trails for traceability and investigations.

Proxy error alerts

Optional Feishu (Lark) and WeChat Work bot webhooks in Admin—forwarding failures surface upstream incidents, quota or rate-limit pressure, and upstream API keys that may need attention or top-up.

Analytics

Time-range views in Admin for model, provider, and user usage plus reliability summaries—capacity checks, cost awareness, and upstream health comparisons.

Playground

Send a test call for one model route without spending user budgets or leaving the same metering / logs as real traffic—great for troubleshooting and pre-flight checks.

Simulator

Call your deployed gateway from the browser with a real user API key in OpenAI / Anthropic / Gemini shapes—rehearse auth, routing, billing, and logging the way production clients do.

Runtimes

Cloudflare (Worker + Pages + D1) or self-hosted (Docker / Node + Postgres or MySQL). See the deployment sections in the docs.

Decoupled from apps

SaaS and portals integrate via /api/admin/* so product code stays focused on AI use cases.

Pick based on compliance, latency, data residency, and operations. Commands and troubleshooting live under Deploy on the documentation home.

  • Docker (laptop or private network): fastest path to try OctaFuse, PoCs, and internal debugging.
  • Cloudflare: edge-friendly footprint with D1 for configuration and metadata.
  • Self-hosted: stricter data boundaries or alignment with existing Postgres / MySQL operations.

Read the production section on the documentation home for trade-offs and Quick start for the shortest path. Licensed under GNU AGPL v3 — see LICENSE.

  • Platform and product orgs that need one governed front door for model traffic across many lines of business or tenants.
  • Teams that want cost, reliability, and auditability lifted out of application code into dedicated infrastructure.
  • Organizations that want freedom to evolve vendors and models without bespoke failover logic in every service.