Route to Gemini, GPT-4o, Claude, and more through a single endpoint. RAG, analytics, team billing, and budget controls — all included.
// Integration Guide
From zero to streaming AI responses in under 5 minutes.
Sign up free. One key unlocks every model on the platform.
const client = new TarqaAI({
  apiKey: process.env.TARQA_API_KEY
});

Use the chat endpoint. Pass a model name, get a response. That's it.
const res = await client.chat({
  model: 'gemini-2.5-flash',
  messages: [{ role: 'user', content: 'Hello!' }]
});

Track every token, switch models, set budgets — all from one dashboard.
// Dashboard: tarqaai.com/dashboard
// Analytics, budgets, RAG,
// team management — all included.
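The quick start above waits for a complete response. The guide also promises streaming; assuming the SDK exposes streamed responses as an async iterable of delta chunks (a hypothetical shape, not confirmed by these docs), the consumption pattern would look like this. A stub generator stands in for the real network stream so the sketch is self-contained.

```javascript
// Stub stream standing in for a hypothetical
// client.chat({ ..., stream: true }) async iterable.
async function* stubStream() {
  for (const delta of ['Hel', 'lo', '!']) {
    yield { delta };
  }
}

// Consume a stream of { delta } chunks into the full response text.
async function collect(stream) {
  let text = '';
  for await (const chunk of stream) {
    text += chunk.delta; // append each streamed fragment as it arrives
  }
  return text;
}
```

In a real app you would render each `chunk.delta` as it arrives instead of buffering the whole string.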
// Platform Capabilities
Infrastructure for AI products that actually go to production.
One endpoint, every model. Switch between Gemini, GPT-4o, Claude, and LLaMA with a single parameter — no SDK rewrites, no renegotiated vendor contracts.
Index your docs, websites, and GitHub repos in one call. Query with semantic search. Get cited, grounded answers out of the box — no vector DB wrangling.
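Under the hood, semantic search ranks indexed chunks by embedding similarity. The platform's actual retrieval pipeline isn't public, but the core ranking step is cosine similarity; here is a minimal sketch with toy three-dimensional vectors standing in for real embeddings.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank indexed chunks against a query embedding, best match first.
function rank(queryVec, chunks) {
  return chunks
    .map(c => ({ ...c, score: cosine(queryVec, c.embedding) }))
    .sort((x, y) => y.score - x.score);
}

// Toy embeddings; real ones come from an embedding model.
const docs = [
  { id: 'refunds.md',  embedding: [0.9, 0.1, 0.0] },
  { id: 'shipping.md', embedding: [0.1, 0.9, 0.0] },
];
const top = rank([0.8, 0.2, 0.0], docs)[0];
// top.id is 'refunds.md'
```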
Persistent conversation state that survives sessions. Intelligent token budgeting keeps responses sharp without ballooning API costs.
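One common token-budgeting strategy is to drop the oldest turns until the history fits a budget. The platform's actual policy isn't documented; this simplified sketch uses a naive ~4-characters-per-token estimate (a heuristic, not a real tokenizer).

```javascript
// Rough token estimate: ~4 characters per token (heuristic only).
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Keep the most recent messages that fit within the token budget,
// dropping the oldest turns first.
function trimToBudget(messages, budget) {
  const kept = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > budget) break;
    kept.unshift(messages[i]); // newest-first scan, oldest-first output
    used += cost;
  }
  return kept;
}
```

A production version would also preserve the system prompt and could summarize dropped turns instead of discarding them.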
Per-request latency, token counts, model cost breakdown, error traces — one dashboard. Know exactly what your AI is spending and why.
Shared API keys, role-based access, per-seat billing, usage quotas per team member. Built for the way real engineering teams ship.
Hard caps per user or team. Threshold alerts at 50%, 75%, 90%, 100%. No surprise bills — stop runaway AI spend before it happens.
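The threshold logic itself is simple; this sketch shows which alerts fire as spend crosses the 50/75/90/100% marks of a hard cap. It is illustrative only; the platform presumably evaluates this server-side.

```javascript
const THRESHOLDS = [50, 75, 90, 100];

// Return the alert percentages newly crossed between the previous
// and current spend. A jump from $40 to $95 against a $100 cap
// fires the 50%, 75%, and 90% alerts.
function firedAlerts(prevSpend, currSpend, cap) {
  const prevPct = (prevSpend / cap) * 100;
  const currPct = (currSpend / cap) * 100;
  return THRESHOLDS.filter(t => prevPct < t && currPct >= t);
}

// Hard cap: refuse further requests once spend reaches the cap.
const overCap = (spend, cap) => spend >= cap;
```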
// Interactive Demo
Select a model, fire the request, watch the streaming response.
// Supported Providers
+ More models added every quarter. Request a model →
// Use Cases
From support bots to enterprise knowledge systems — one platform, infinite applications.
Route complex queries to smarter models, simple ones to cheaper ones. Cut support costs by up to 60%.
Ingest contracts, reports, manuals. Ask questions, get cited answers. Works on 100MB+ documents.
Code generation, review, debugging. Choose the model. Chain the calls. One bill at month-end.
Multi-step copy pipelines with structured output, brand voice enforcement, and quality gates.
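The routing pattern from the support-bot use case can be as simple as a heuristic in front of the chat call. The thresholds, keyword list, and the `gpt-4o` model identifier below are illustrative assumptions, not the platform's actual routing logic.

```javascript
// Crude complexity heuristic: long queries or reasoning keywords
// go to a stronger (pricier) model; everything else goes cheap.
function pickModel(query) {
  const hardSignals = /\b(why|explain|compare|debug|analy[sz]e)\b/i;
  const complex = query.length > 200 || hardSignals.test(query);
  return complex ? 'gpt-4o' : 'gemini-2.5-flash';
}

// pickModel('Reset my password')          -> 'gemini-2.5-flash'
// pickModel('Explain why my build fails') -> 'gpt-4o'
```

In production you might instead use a small classifier model for the routing decision, but a keyword heuristic is a cheap first cut.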
// Developer Feedback
Switched from calling OpenAI directly. The unified billing alone saves us 4 hours a month of reconciliation.
The RAG integration is shockingly good. We indexed 800 product docs in 12 minutes and had a working chatbot by lunch.
We use the team billing feature for 3 squads. Budget alerts stopped a runaway prompt loop from costing us $800.
// Simple Pricing