Cut your AI coding bill by 40%
without changing a line of code.

Cohrint routes every Claude Code, Copilot, and Cursor request to the cheapest model that meets your quality bar. You save money while you sleep. We only get paid when you save.

Start saving in 60 seconds → See real savings data from 3 engineering teams
// how it works
Routing that works while you sleep
01
Classifies every request by intent
Autocomplete, generation, refactor, or explanation — classified in under 50ms. Each intent has a different cost-quality optimum.
02
Routes to the cheapest model that qualifies
Not the cheapest model, period. The cheapest model that meets your quality bar for that specific task. Quality is continuously sampled and measured.
03
Publishes real-time savings to your dashboard
Every routing decision is logged and overridable. You see exactly what was routed where, and exactly how much was saved. No black boxes.
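The three-step loop above can be sketched in a few lines of Python. This is an illustrative model of intent-based routing, not Cohrint's implementation — the model table, prices, quality scores, and the toy `classify_intent` heuristic are all invented for the example:

```python
# Minimal sketch of intent-based routing: classify the request,
# then pick the cheapest model whose measured quality clears the bar.
# All models, prices, and quality scores below are illustrative.

MODELS = [
    # (name, $ per 1M output tokens, sampled quality score per intent)
    ("small-fast",  0.30, {"autocomplete": 0.96, "generation": 0.78, "refactor": 0.70, "explanation": 0.85}),
    ("mid-tier",    3.00, {"autocomplete": 0.97, "generation": 0.90, "refactor": 0.88, "explanation": 0.93}),
    ("frontier",   15.00, {"autocomplete": 0.98, "generation": 0.97, "refactor": 0.96, "explanation": 0.97}),
]

def classify_intent(prompt: str) -> str:
    """Toy classifier standing in for the real sub-50ms one."""
    p = prompt.lower()
    if "refactor" in p:
        return "refactor"
    if "explain" in p or "why" in p:
        return "explanation"
    if len(prompt) < 40:
        return "autocomplete"
    return "generation"

def route(prompt: str, quality_bar: float = 0.85) -> str:
    intent = classify_intent(prompt)
    # Cheapest model that qualifies for this intent — not cheapest overall.
    qualifying = [m for m in MODELS if m[2][intent] >= quality_bar]
    qualifying = qualifying or [MODELS[-1]]  # fall back to the strongest model
    return min(qualifying, key=lambda m: m[1])[0]

print(route("def fib(n):"))                        # short prompt → autocomplete → small-fast
print(route("refactor this module into classes"))  # refactor → mid-tier
```

The key property is in `route`: the quality bar filters first, and price breaks the tie — so a cheap model wins only on tasks where it measurably qualifies.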
// works everywhere
Drop into your stack in 60 seconds
Claude Code
Native MCP support — add one config block and you're live
MCP SERVER
VS Code
MCP server — add .vscode/mcp.json and ask about costs in chat
MCP SERVER
Python
Two-line drop-in proxy for OpenAI and Anthropic SDKs
SDK
TypeScript / JS
createOpenAIProxy() wraps any existing client with zero changes
SDK
MCP-compatible editors
Works with any editor supporting the Model Context Protocol — one config block
MCP SERVER
Cohrint CLI
Transparent wrapper — optimize, forward, track. Works with any AI CLI agent.
CLI TOOL
OTel Collector
Native OpenTelemetry ingestion — auto-track Claude Code, Codex CLI, and Gemini CLI
OTEL
Local Proxy
Privacy-first HTTP proxy — your keys and prompts never leave your machine
PROXY
Codex CLI
Track OpenAI Codex usage via OTel or CLI wrapper
OTEL
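Most of these integrations share one mechanism: point an existing client at a different endpoint. As a sketch of the local-proxy pattern with the standard OpenAI SDK — the port and path below are illustrative placeholders, not documented defaults:

```python
from openai import OpenAI

# Point the standard SDK at a local proxy instead of api.openai.com.
# The base_url below is a placeholder — use your proxy's actual address.
client = OpenAI(
    api_key="sk-...",                      # stays on your machine
    base_url="http://localhost:8787/v1",   # local proxy endpoint (assumed)
)

# Every call now flows through the local proxy, which forwards upstream
# and records tokens and cost locally — keys and prompts never leave the box.
```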
// integration
Two lines. Seriously, that's it.
Python
TypeScript
MCP (Claude Code)
CLI Wrapper
# Before
from openai import OpenAI

# After — only 2 lines changed
import cohrint
from cohrint.proxy.openai_proxy import OpenAI

cohrint.init(api_key="crt_your_key")
client = OpenAI(api_key="sk-...")

# Everything else is identical — Cohrint wraps transparently
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
# ✓ Tokens: 12 in, 8 out  ✓ Cost: $0.000110  ✓ Latency: 423ms
# ✓ Cheapest alternative: gemini-1.5-flash — save 94%
// Before
import OpenAI from "openai";

// After — only 2 lines changed
import { init, createOpenAIProxy } from "cohrint";
import OpenAI from "openai";

init({ apiKey: "crt_your_key" });
const openai = createOpenAIProxy(new OpenAI());

// Identical API — Cohrint wraps every call automatically
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
// ✓ Captured: tokens, cost, latency, cheapest alternative
// ~/.claude/mcp.json  (or claude-code / VS Code equivalent)
{
  "mcpServers": {
    "cohrint": {
      "command": "npx",
      "args": ["-y", "cohrint-mcp"],
      "env": {
        "COHRINT_API_KEY": "crt_your_key",
        "COHRINT_ORG_ID": "your_org_id"
      }
    }
  }
}

// Then ask Claude Code in chat:
// "How much did I spend on AI this week?"
// "Which model is cheapest for my summarization workflow?"
// "Show requests wasting the most tokens"
# Run via npx — no install needed
$ npx cohrint-cli

# Pipe mode — optimize + forward to Claude
$ echo "Could you please explain kubernetes pods" | cohrint
  ⚡ Optimized: 16 → 12 tokens (saved 4, -25%)

  A Kubernetes pod is the smallest deployable unit...

  💰 Cost: $0.0065 | 💾 Saved: 4 tokens

# REPL mode — switch agents on the fly
$ cohrint
cohrint [claude] ▸ explain load balancers
cohrint [claude] ▸ /gemini summarize this in 2 lines
cohrint [claude] ▸ /compare what is DNS
cohrint [claude] ▸ /session  # interactive mode with /compact, /clear
cohrint [claude] ▸ /summary # dashboard stats in terminal
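For the OTel path, setup is typically a handful of environment variables. The exporter variables below are standard OpenTelemetry names plus Claude Code's telemetry flag; the endpoint is a placeholder, not a real Cohrint address:

```shell
# Enable Claude Code telemetry and export it over OTLP.
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

# Placeholder endpoint — point this at your Cohrint OTel collector.
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otel.example.invalid
```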
// live demo
See it in action — interactive preview
cohrint.com / app.html
INTERACTIVE
◈ All AI Spend
⌨ CLI Wrapper
🔧 Optimizer
📡 OTel Collector
⚖ Compare
MTD Spend
$4,821
↑ 12%
Tokens Used
182M
↓ eff +8%
Developers
14
3 providers
Budget Used
61%
$3,086 left
Daily spend by provider — last 30 days
CLAUDE CODE
$2,140
8 developers
CODEX CLI
$1,430
5 developers
GEMINI CLI
$820
4 developers
Create free account → Open full dashboard →

No credit card required · Free for teams under $500/mo in AI spend

// pricing
You only pay when you save
FREE
$0
for teams under $500/mo in AI spend
Start for free →
  • Routing with quality control
  • Real-time savings dashboard
  • No credit card
ENTERPRISE
Custom
available after 30 days on Growth
Talk to sales →
  • Custom contract terms
  • Dedicated support
  • SLA & SSO
// customer savings
Real savings from real teams
DESIGN PARTNER 1
$14,200/mo saved

"We pointed Cohrint at our Claude Code fleet on a Friday. By Monday it had rerouted 60% of our autocomplete traffic to a cheaper model with no quality drop."

Platform Engineering Lead · Series B startup
RESERVED FOR
Design Partner 2

Your team's savings data could go here.

Apply now →
RESERVED FOR
Design Partner 3

Your team's savings data could go here.

Apply now →

Enterprise-grade security, privacy-first local proxy, and full audit logging. Read our trust center →

// faq
Common questions
Does Cohrint store my prompts and responses? +
Cohrint stores a short preview of your request and response (first 500 chars) to help you debug expensive or low-quality calls. You can disable this entirely in your SDK config. We never train models on your data, and you can request deletion at any time.
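The 500-character cutoff is simple truncation. A pure-Python sketch of what gets stored — illustrative, not Cohrint's actual code:

```python
def preview(text: str, limit: int = 500) -> str:
    """Truncate a prompt or response to a short, storable preview."""
    return text if len(text) <= limit else text[:limit] + "…"

long_prompt = "x" * 1200
print(len(preview(long_prompt)))  # 501 — 500 chars plus the ellipsis marker
print(preview("short prompt"))    # stored as-is when under the limit
```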
Does adding the Cohrint SDK add latency to my calls? +
No. Cohrint captures event data after your LLM call completes and sends it to our ingest API on a background thread. Your main request path is completely unaffected, so no measurable latency is added.
What happens if the Cohrint ingest API is down? +
Your AI calls continue to work normally — Cohrint is not in the critical path. Events are queued locally and retried when connectivity is restored. Your production traffic is never affected by Cohrint availability.
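The queue-and-retry behavior described here is a standard pattern. A minimal Python sketch — illustrative, not Cohrint's client code — where `send` stands in for the HTTP POST to the ingest API:

```python
import queue
import threading

events: "queue.Queue[dict]" = queue.Queue()
delivered = []

def send(event: dict) -> None:
    """Stand-in for the HTTP POST to the ingest API."""
    delivered.append(event)

def worker() -> None:
    # Drain the queue off the request path; re-queue on failure.
    while True:
        event = events.get()
        if event is None:          # shutdown sentinel
            break
        try:
            send(event)
        except Exception:
            events.put(event)      # retry later when connectivity returns
        finally:
            events.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

# The main request path only enqueues — it never blocks on the network.
events.put({"model": "gpt-4o", "tokens_in": 12, "tokens_out": 8})
events.put(None)
t.join()
print(delivered)
```

If `send` raises, the event goes back on the queue instead of being dropped — which is why an ingest outage never touches production traffic.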
Which LLM providers and models are supported? +
Cohrint supports models across 3 providers: OpenAI (GPT-4o, o1, o3, Codex), Anthropic (Claude 3.5/4 family), and Google (Gemini 1.5/2.0). Pricing is updated regularly as new models launch.
Can I self-host Cohrint on my own infrastructure? +
Yes. The Cohrint server is open-source and deployable on Railway, Render, or any VPS in under 10 minutes. Enterprise customers get a Docker Compose package and Kubernetes Helm chart for air-gapped deployments where data never leaves your VPC.
How does the free tier compare to paid plans? +
The free tier gives you 50,000 events/month, the core cost dashboard, 7-day log retention, and 1 user seat — no credit card required, forever. The Team plan ($99/mo) adds unlimited requests, cross-model pricing intelligence, budget alerts, team attribution, and up to 10 seats.
What is the Cohrint CLI and how does it work? +
The Cohrint CLI is a lightweight terminal companion that runs alongside any AI coding agent. It tracks usage costs in real time, provides session-level spend summaries, and surfaces optimization opportunities as you work — without modifying your existing workflow or requiring any configuration changes.
How does cross-platform OTel tracking work? +
Cohrint integrates with Claude Code, Codex CLI, Gemini CLI, and more via standard OpenTelemetry. Setup takes under 2 minutes — no code changes required. Token counts, cost metrics, and session data flow in automatically. For tools that don't report costs natively, Cohrint calculates USD spend from token counts using live model pricing.
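The token-to-USD conversion is straightforward multiplication. A sketch with illustrative per-million-token prices — real prices change over time, which is why Cohrint uses live pricing data:

```python
# Illustrative $ per 1M tokens — real pricing changes; use live data.
PRICES = {
    "gpt-4o": {"in": 2.50, "out": 10.00},
    "gemini-1.5-flash": {"in": 0.075, "out": 0.30},
}

def usd_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Compute USD spend from token counts and per-1M-token prices."""
    p = PRICES[model]
    return (tokens_in * p["in"] + tokens_out * p["out"]) / 1_000_000

cost = usd_cost("gpt-4o", 12, 8)
print(f"${cost:.6f}")  # $0.000110
```

With the example prices, 12 input and 8 output tokens on gpt-4o come out to $0.000110 — the same figure shown in the Python integration example above.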
CONTACT US
We're here to help

Reach out anytime — whether you have a question, need help getting started, or want to discuss an enterprise plan.

SUPPORT
Technical Help

SDK issues, integration questions, billing support.

[email protected]
GENERAL
Say Hello

Partnerships, press, feedback, or just a chat.

[email protected]
ENTERPRISE
Sales & Pricing

Custom pricing, dedicated support, SLA discussion.

[email protected]

Your team is overpaying.
We'll fix that.

// two lines of config · savings visible in 60 seconds · we only win when you save

Start saving in 60 seconds →
Coming in 2026: The Cohrint Index — the first public benchmark of AI coding efficiency across real engineering teams.