AI Spend Intelligence for Engineering Teams

Know what your AI bill will be
before it arrives — and cut it.

Per developer, per model, per team. Built for engineering leaders who need answers before the invoice arrives.

Claude Code OpenAI Codex CLI Gemini CLI Cursor Cline GitHub Copilot
Request Access → View Live Demo
$ pip install cohrint · $ npm install cohrint · $ npx cohrint-cli

↗ Open source · Free forever · No credit card

3
AI coding agents tracked via OTel — Claude Code, Codex CLI, and Gemini CLI
40%
peak token savings on real-world prompts — applied automatically before your request is sent
24
LLM models with live pricing — auto cost estimation even when your tool doesn't report costs
$0
to get started — optimizer, CLI wrapper, dashboard analytics — no credit card
// how it works
Up and running in three steps
01
Install the SDK
One pip or npm install. Drop two lines into your existing code — no architecture changes, no new infrastructure to manage.
02
Every call is tracked + optimized
Tokens, cost, latency and quality scores are captured automatically. Cohrint surfaces optimization opportunities and model recommendations — consistently delivering 20–40% token savings.
03
Get live recommendations
Real-time alerts on cost spikes, model suggestions based on your actual usage, quality regression warnings, and budget governance — in the dashboard or right in your editor via MCP.
// capabilities
Everything you need to run AI intelligently
📊
Token & cost analytics
40+ metrics
Real-time breakdown of token usage and spend per model, feature, user and team. See which 5% of requests consume 50% of your budget.
→ per-request granularity · daily trend · 365-day history
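That "5% of requests consume 50% of your budget" figure is a cumulative-spend calculation you can reproduce yourself. A minimal sketch in plain Python (the function name and sample costs are illustrative, not part of the Cohrint SDK):

```python
def smallest_share_covering(costs, budget_share=0.5):
    """Smallest fraction of requests (most expensive first) whose
    combined cost reaches `budget_share` of total spend."""
    ordered = sorted(costs, reverse=True)
    total = sum(ordered)
    running = 0.0
    for i, cost in enumerate(ordered, start=1):
        running += cost
        if running >= budget_share * total:
            return i / len(ordered)
    return 1.0

# Illustrative per-request costs: 5 expensive agent runs, 95 cheap completions
costs = [10.0] * 5 + [0.50] * 95
print(f"{smallest_share_covering(costs):.0%} of requests drive 50% of spend")
# → 5% of requests drive 50% of spend
```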
💱
Cross-model pricing intelligence
24 LLMs live
Live pricing across 24 LLMs from OpenAI, Anthropic, and Google. Compare costs side-by-side and find the cheapest model for your exact workload and quality requirements.
→ "save $3,200/mo switching to Gemini Flash"
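Side-by-side comparison is per-million-token arithmetic over each model's price sheet. A sketch with illustrative prices (check each provider's pricing page for current figures):

```python
# (input $/M tokens, output $/M tokens): illustrative prices, not live data
PRICES = {
    "gpt-4o":           (2.50, 10.00),
    "claude-sonnet":    (3.00, 15.00),
    "gemini-1.5-flash": (0.075, 0.30),
}

def monthly_cost(model, in_tokens, out_tokens):
    price_in, price_out = PRICES[model]
    return (in_tokens * price_in + out_tokens * price_out) / 1_000_000

# One month's workload: 500M input tokens, 100M output tokens
workload = (500_000_000, 100_000_000)
for model in sorted(PRICES, key=lambda m: monthly_cost(m, *workload)):
    print(f"{model:<18} ${monthly_cost(model, *workload):,.2f}/mo")  # cheapest first
```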
🏷️
Cost attribution & chargeback
per-call tags
Break down AI spend by team, feature, user or customer. Tag every call with custom dimensions. Built-in team budgets with per-team alerts and RBAC data isolation.
→ team · feature · user · environment · custom tags
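Attribution is metadata: each tracked call carries a tag dictionary, and reports roll spend up along any dimension. A simplified illustration (the event shape and field names are hypothetical, not the actual Cohrint schema):

```python
from collections import defaultdict

# Each tracked call carries custom dimensions alongside its cost (hypothetical shape)
events = [
    {"cost": 0.012, "tags": {"team": "payments", "feature": "fraud-summary"}},
    {"cost": 0.034, "tags": {"team": "payments", "feature": "dispute-triage"}},
    {"cost": 0.008, "tags": {"team": "growth",   "feature": "email-draft"}},
]

def spend_by(dimension, events):
    """Roll spend up along any tag dimension: team, feature, user, customer."""
    totals = defaultdict(float)
    for event in events:
        totals[event["tags"].get(dimension, "untagged")] += event["cost"]
    return dict(totals)

print(spend_by("team", events))     # chargeback report, one total per team
```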
🔔
Budget alerts & enforcement
4 thresholds
Configurable spend thresholds at the team and organizational level. Graduated alerts notify stakeholders progressively as budgets are approached, with automated escalation before limits are breached.
→ graduated alerts · per-team · per-org · Slack · email
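Graduated alerting boils down to checking spend against an ordered list of cut-offs. A minimal sketch with illustrative thresholds (Cohrint's defaults and configuration keys may differ):

```python
# Illustrative graduated thresholds; the actual cut-offs are configurable
THRESHOLDS = [(1.00, "breach"), (0.90, "critical"), (0.80, "warning"), (0.50, "notice")]

def alert_level(spend, budget):
    """Highest threshold the current spend has crossed, or None."""
    used = spend / budget
    for cutoff, level in THRESHOLDS:
        if used >= cutoff:
            return level
    return None

print(alert_level(850, 1_000))   # → warning
```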
📈
Exec ROI dashboard
cost / outcome
Translate token costs into business outcomes — cost per PR, per resolved ticket, per feature shipped. Built for CTOs and CFOs who need signal, not raw token counts.
→ cost/PR · cost/commit · lines-of-code per dollar
♻️
Cache & waste detection
cut redundant spend
Identify redundant AI spend before it compounds. Cohrint surfaces patterns of repeated work across your team's usage, enabling targeted caching decisions that translate directly to cost reduction.
→ redundancy analysis · savings by model · cost trend reporting
👥
Developer productivity ROI
per-developer
Track cost per PR, cost per commit, and lines of code per dollar for every developer. Correlate AI spend with actual code output. Enterprise chargeback reporting built in.
→ cost/PR · cost/commit · lines/$ · team budgets
🚨
Anomaly detection
proactive alerts
Continuous monitoring of your AI spend patterns. Detects cost spikes, traffic surges, and runaway agents — with Slack notifications before the damage reaches your invoice.
→ continuous monitoring · Slack · cost spike detection
⚙️
CI/CD cost gate
pipeline guard
Enforce AI spend policies directly in your CI/CD pipeline. Deployments that exceed defined thresholds are flagged automatically, ensuring budget governance extends into your engineering workflow.
→ GitHub Actions · configurable thresholds · automated enforcement
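A cost gate is a pipeline step that exits non-zero when spend crosses a ceiling. A sketch of the core check (how spend is fetched and where the threshold lives are deployment details, stubbed here):

```python
def cost_gate(current_spend, threshold):
    """Return a process exit code: non-zero fails the CI step that runs it."""
    if current_spend > threshold:
        print(f"✗ AI spend ${current_spend:.2f} exceeds gate ${threshold:.2f}")
        return 1
    print(f"✓ AI spend ${current_spend:.2f} within gate ${threshold:.2f}")
    return 0

# In CI, spend would come from the Cohrint API and the gate from repo config;
# the step would end with sys.exit(cost_gate(spend, gate)).
cost_gate(84.20, 100.00)   # prints: ✓ AI spend $84.20 within gate $100.00
```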
🤖
MCP — query costs from your IDE
12 tools
Native integration with Claude Code, VS Code, and any MCP-compatible editor. Ask cost questions in natural language and receive actionable insights without leaving your development environment.
→ Claude Code native · VS Code · any MCP editor · zero configuration
⌨️
Cohrint CLI agent
instant visibility
A lightweight terminal companion for AI-assisted development. Tracks spend, surfaces cost insights, and provides real-time feedback — all without interrupting your existing workflow.
→ works with any AI CLI · real-time tracking · zero friction
🔒
Local proxy — zero trust
3 privacy modes
Run the Cohrint proxy on your machine. API keys and prompts never leave your environment — only token counts and costs reach the dashboard. Strict / standard / relaxed modes.
→ strict · standard · relaxed · npm install cohrint-local-proxy
🌐
Cross-platform OTel collector
10+ tools
Unified cost visibility across every AI coding tool your team uses. Plug in once and Cohrint automatically consolidates spend, tokens, and usage data from all connected tools into a single dashboard.
→ Claude Code · GitHub Copilot · Gemini · Codex · and more
🔗
Agent trace explorer
end-to-end visibility
Visualize multi-step AI agent workflows end-to-end. Understand cost and latency distribution across every step in a pipeline — enabling precise optimization at the workflow level, not just the call level.
→ workflow visualization · cost per step · latency breakdown
💡
Live recommendations
actionable
Continuous, automated analysis of your AI usage patterns. Cohrint surfaces prioritized cost-reduction opportunities and notifies your team via Slack before overspend reaches your invoice.
→ model optimization · proactive alerts · Slack notifications
🐍
SDK — Python & TypeScript
2 lines
Drop-in proxy wrappers for OpenAI and Anthropic SDKs. Zero API changes. Streaming supported. Works in any framework. Auto-captures cost, tokens, latency and metadata per call.
→ pip install cohrint · npm install cohrint · streaming
🛡️
Security & data governance
zero plaintext
Enterprise-grade security architecture with credentials stored in hardened, non-reversible form. All session management follows current web security standards. Hosted on Cloudflare's globally distributed, SOC 2-certified edge network.
→ hardened credential storage · secure sessions · SOC 2 · Cloudflare edge
👤
RBAC & team scoping
4 roles
Four roles: owner / admin / member / viewer. Members can be scoped to a single team — they see only their team's data, never another team's. Viewers are read-only at the API level.
→ owner · admin · member · viewer · team-scoped isolation
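The stated rules (owners and admins see everything, members see only their team, viewers are read-only) can be sketched as a single permission check. This illustrates the model, not Cohrint's implementation, and it simplifies viewer scoping:

```python
ROLES = ("owner", "admin", "member", "viewer")

def can_access(role, user_team, resource_team, write=False):
    """Owners/admins see all teams; members only their own; viewers read-only."""
    if role in ("owner", "admin"):
        return True
    if role == "member":
        return user_team == resource_team
    if role == "viewer":
        return not write          # read-only, enforced at the API level
    return False

assert can_access("member", "payments", "payments")
assert not can_access("member", "payments", "growth")
assert not can_access("viewer", "payments", "payments", write=True)
```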
📋
Audit log & compliance
every action
Full event stream of every API action — who accessed what, when, from which key. Export-ready for compliance audits, security reviews, and SOC 2 evidence packages.
→ actor · action · timestamp · exportable
🎯
Hallucination & quality scoring
6 dimensions
Track hallucination rate, faithfulness, relevancy, consistency, toxicity and efficiency per model and per call. Compare quality across providers and catch regressions before users do.
→ hallucination · faithfulness · relevancy · toxicity · efficiency
🔑
Key recovery & session security
single-use tokens
Secure, time-limited account recovery delivered via email. Session tokens carry strong entropy and are bound to authenticated devices. Administrators can invalidate all active sessions instantly.
→ time-limited recovery · device-bound sessions · instant revocation
// works everywhere
Drop into your stack in 60 seconds
Claude Code
Native MCP support — add one config block and you're live
MCP SERVER
VS Code
MCP server — add .vscode/mcp.json and ask about costs in chat
MCP SERVER
Python
Two-line drop-in proxy for OpenAI and Anthropic SDKs
SDK
TypeScript / JS
createOpenAIProxy() wraps any existing client with zero changes
SDK
MCP-compatible editors
Works with any editor supporting the Model Context Protocol — one config block
MCP SERVER
Cohrint CLI
Transparent wrapper — optimize, forward, track. Works with any AI CLI agent.
CLI TOOL
OTel Collector
Native OpenTelemetry ingestion — auto-track Claude Code, Codex CLI, and Gemini CLI
OTEL
Local Proxy
Privacy-first HTTP proxy — your keys and prompts never leave your machine
PROXY
Codex CLI
Track OpenAI Codex usage via OTel or CLI wrapper
OTEL
// integration
Two lines. Seriously, that's it.
Python
TypeScript
MCP (Claude Code)
CLI Wrapper
# Before
from openai import OpenAI

# After — only 2 lines changed
import cohrint
from cohrint.proxy.openai_proxy import OpenAI

cohrint.init(api_key="crt_your_key")
client = OpenAI(api_key="sk-...")

# Everything else is identical — Cohrint wraps transparently
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
# ✓ Tokens: 12 in, 8 out  ✓ Cost: $0.000110  ✓ Latency: 423ms
# ✓ Cheapest alternative: gemini-1.5-flash — save 94%
// Before
import OpenAI from "openai";

// After — only 2 lines changed
import { init, createOpenAIProxy } from "cohrint";
import OpenAI from "openai";

init({ apiKey: "crt_your_key" });
const openai = createOpenAIProxy(new OpenAI());

// Identical API — Cohrint wraps every call automatically
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
// ✓ Captured: tokens, cost, latency, cheapest alternative
// ~/.claude/mcp.json  (or claude-code / VS Code equivalent)
{
  "mcpServers": {
    "cohrint": {
      "command": "npx",
      "args": ["-y", "cohrint-mcp"],
      "env": {
        "COHRINT_API_KEY": "crt_your_key",
        "COHRINT_ORG_ID": "your_org_id"
      }
    }
  }
}

// Then ask Claude Code in chat:
// "How much did I spend on AI this week?"
// "Which model is cheapest for my summarization workflow?"
// "Show requests wasting the most tokens"
# Run instantly with npx (no install needed)
$ npx cohrint-cli

# Pipe mode — optimize + forward to Claude
$ echo "Could you please explain kubernetes pods" | cohrint
  ⚡ Optimized: 16 → 12 tokens (saved 4, -25%)

  A Kubernetes pod is the smallest deployable unit...

  💰 Cost: $0.0065 | 💾 Saved: 4 tokens

# REPL mode — switch agents on the fly
$ cohrint
cohrint [claude] ▸ explain load balancers
cohrint [claude] ▸ /gemini summarize this in 2 lines
cohrint [claude] ▸ /compare what is DNS
cohrint [claude] ▸ /session  # interactive mode with /compact, /clear
cohrint [claude] ▸ /summary # dashboard stats in terminal
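The optimizer pass shown in the transcript strips low-information filler before forwarding the prompt. A toy sketch of that one idea (real token optimization does considerably more than drop politeness phrases):

```python
import re

# Politeness phrases carry no information the model needs (toy list)
FILLER = re.compile(
    r"^(could you please|can you please|would you kindly|kindly|please)\s+",
    re.IGNORECASE,
)

def optimize(prompt: str) -> str:
    """Strip leading filler so fewer tokens are forwarded to the agent."""
    return FILLER.sub("", prompt.strip())

print(optimize("Could you please explain kubernetes pods"))
# → explain kubernetes pods
```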
// live demo
See it in action — interactive preview
cohrint.com / app.html
INTERACTIVE
◈ All AI Spend
⌨ CLI Wrapper
🔧 Optimizer
📡 OTel Collector
⚖ Compare
MTD Spend
$4,821
↑ 12%
Tokens Used
182M
↓ eff +8%
Developers
14
3 providers
Budget Used
61%
$3,086 left
Daily spend by provider — last 30 days
CLAUDE CODE
$2,140
8 developers
CODEX CLI
$1,430
5 developers
GEMINI CLI
$820
4 developers
Create free account → Open full dashboard →

No credit card required · Free tier includes 50,000 events/month

// pricing
Simple, transparent pricing
FREE
$0
forever · no credit card
Sign up free →
  • Up to 50,000 events/month
  • All core analytics
  • 1 org
ENTERPRISE
Custom
volume pricing
Talk to sales →
  • Everything in Team
  • Dedicated support
  • SLA discussion
// used by developers at

In early access — design partners welcome

We're onboarding a small cohort of engineering teams to shape the roadmap. Early partners get the Team plan free and a direct line to the founders.

Become a design partner →

No commitment. Cancel anytime. [email protected]

// vs the alternatives
Why teams choose Cohrint
Capability coverage: ✦ Cohrint vs. API gateway tools, LLM observability SDKs, and general APM platforms. Cohrint ships all of the following; "partial" marks capabilities with at best partial support among the alternatives:
  • Real-time cost dashboard ✓ (partial elsewhere)
  • Cross-model pricing intelligence ✓
  • Token efficiency scoring ✓
  • MCP server (AI-native IDE integration) ✓
  • Budget alerts (Slack/email) ✓ (partial elsewhere)
  • Cross-platform OTel ingestion ✓
  • CLI agent wrapper + optimizer ✓
  • Privacy-first local proxy ✓
  • RBAC + full audit log ✓ (partial elsewhere)
  • Free tier ✓ generous

Capabilities based on publicly documented features of top-rated tools in each category (G2, Capterra, vendor documentation). No competitor is specifically named or implied.

// security
Security controls actually shipped
🔑 API keys stored in hardened, non-reversible form — shown once at creation
🍪 Secure session management — industry-standard web security controls
📋 Full audit log — every admin action recorded with timestamp and IP
🌐 Encrypted in transit — globally distributed edge network, no plaintext
🚦 Automated rate limiting — brute-force protection on all auth endpoints
🔒 Privacy-first local proxy — prompts never leave your machine (opt-in)
// faq
Common questions
Does Cohrint store my prompts and responses? +
Cohrint stores a short preview of your request and response (first 500 chars) to help you debug expensive or low-quality calls. You can disable this entirely in your SDK config. We never train models on your data, and you can request deletion at any time.
Does adding the Cohrint SDK add latency to my calls? +
No. Cohrint captures event data after your LLM call completes and sends it to our ingest API in a background thread, so your main request path is unaffected — no measurable latency is added.
What happens if the Cohrint ingest API is down? +
Your AI calls continue to work normally — Cohrint is not in the critical path. Events are queued locally and retried when connectivity is restored. Your production traffic is never affected by Cohrint availability.
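The queue-and-retry behavior described here can be pictured as a local buffer in front of a flaky transport. A deliberately simplified sketch (the real SDK batches and retries on a background thread):

```python
class EventBuffer:
    """Queue events locally and retry on flush, so an ingest outage
    never drops data and never blocks the caller."""
    def __init__(self, transport):
        self.transport = transport
        self.pending = []

    def record(self, event):
        self.pending.append(event)        # never raises: not in the request path

    def flush(self):
        still_pending = []
        for event in self.pending:
            try:
                self.transport(event)
            except ConnectionError:
                still_pending.append(event)   # keep for the next retry
        self.pending = still_pending
        return len(self.pending)

# Simulated outage: the first flush fails, events survive, the retry succeeds
outage = {"down": True}
def transport(event):
    if outage["down"]:
        raise ConnectionError("ingest API unreachable")

buf = EventBuffer(transport)
buf.record({"tokens": 20, "cost": 0.0001})
assert buf.flush() == 1     # queued, not lost
outage["down"] = False
assert buf.flush() == 0     # delivered on retry
```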
Which LLM providers and models are supported? +
Cohrint supports models across 3 providers: OpenAI (GPT-4o, o1, o3, Codex), Anthropic (Claude 3.5/4 family), and Google (Gemini 1.5/2.0). New models are added within 24 hours of launch.
Can I self-host Cohrint on my own infrastructure? +
Yes. The Cohrint server is open-source and deployable on Railway, Render, or any VPS in under 10 minutes. Enterprise customers get a Docker Compose package and Kubernetes Helm chart for air-gapped deployments where data never leaves your VPC.
How does the free tier compare to paid plans? +
The free tier gives you 50,000 events/month, the core cost dashboard, 7-day log retention, and 1 user seat — no credit card required, forever. The Team plan ($99/mo) adds unlimited requests, cross-model pricing intelligence, budget alerts, team attribution, and up to 10 seats.
What is the Cohrint CLI and how does it work? +
The Cohrint CLI is a lightweight terminal companion that integrates alongside any AI coding agent. It tracks usage costs in real time, provides session-level spend summaries, and surfaces optimization opportunities as you work — without modifying your existing workflow or requiring any configuration changes.
How does cross-platform OTel tracking work? +
Cohrint integrates with Claude Code, Codex CLI, Gemini CLI, and more via standard OpenTelemetry. Setup takes under 2 minutes — no code changes required. Token counts, cost metrics, and session data flow in automatically. For tools that don't report costs natively, Cohrint calculates USD spend from token counts using live model pricing.
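The fallback cost calculation is per-million-token arithmetic. For example, the 12-input/8-output gpt-4o call shown in the integration section works out to $0.000110, assuming illustrative list prices of $2.50/M input and $10.00/M output:

```python
def usd_cost(in_tokens, out_tokens, price_in_per_m, price_out_per_m):
    """USD spend from raw token counts and per-million-token pricing."""
    return (in_tokens * price_in_per_m + out_tokens * price_out_per_m) / 1_000_000

# gpt-4o at illustrative list prices: $2.50/M input, $10.00/M output
print(f"${usd_cost(12, 8, 2.50, 10.00):.6f}")
# → $0.000110
```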
CONTACT US
We're here to help

Reach out anytime — whether you have a question, need help getting started, or want to discuss an enterprise plan.

SUPPORT
Technical Help

SDK issues, integration questions, billing support.

[email protected]
GENERAL
Say Hello

Partnerships, press, feedback, or just a chat.

[email protected]
ENTERPRISE
Sales & Pricing

Custom pricing, dedicated support, SLA discussion.

[email protected]

Stop guessing.
Start measuring.

// two lines of code · full visibility in 60 seconds · free forever

Create free account →