Usage and Costs

Monitor token usage and costs across your Superagent deployment, with per-agent breakdowns, model-level detail, and daily aggregation.

Superagent tracks every LLM API call your agents make and computes costs automatically. The usage dashboard in Settings > Usage gives you a daily breakdown of spending by agent and by model, so you can identify which agents consume the most tokens and where to optimize.

What is tracked

Every time an agent sends a message to the Claude API, the response includes token usage metadata. Superagent records four token categories for each API call:

Token type	Description
Input tokens	Tokens in the prompt sent to the model (user messages, system prompt, tool results)
Output tokens	Tokens generated by the model in its response
Cache creation tokens	Tokens written into the prompt cache on a cache miss
Cache read tokens	Tokens served from the prompt cache on a cache hit

These counts are extracted from the Claude API response's usage object and written to JSONL session log files alongside each assistant message.
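As a rough illustration of that extraction, the sketch below reads one JSONL line and pulls out the four counts. The usage field names (input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens) are the ones the Claude API returns; the surrounding entry layout (a "message" object holding "usage") and the TokenUsage and parse_usage names are assumptions made for this example, not Superagent's actual log schema or code.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenUsage:
    input_tokens: int = 0
    output_tokens: int = 0
    cache_creation_tokens: int = 0
    cache_read_tokens: int = 0


def parse_usage(jsonl_line: str) -> Optional[TokenUsage]:
    """Return the token counts from one log line, or None if it has no usage."""
    entry = json.loads(jsonl_line)
    # Assumed layout: the assistant message is nested under "message".
    usage = (entry.get("message") or {}).get("usage")
    if not usage:
        return None
    return TokenUsage(
        input_tokens=usage.get("input_tokens") or 0,
        output_tokens=usage.get("output_tokens") or 0,
        cache_creation_tokens=usage.get("cache_creation_input_tokens") or 0,
        cache_read_tokens=usage.get("cache_read_input_tokens") or 0,
    )
```

Lines without a usage object (user messages, for example) return None, so a caller can skip them.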

How costs are calculated

Superagent calculates costs using per-million-token pricing for each Claude model. The pricing table covers all supported models:

Model family	Input	Output	Cache creation	Cache read
Claude Opus 4.6 / 4.7	$5.00	$25.00	$6.25	$0.50
Claude Opus 4.1 / 4	$15.00	$75.00	$18.75	$1.50
Claude Sonnet 4.5 / 4.6 / 4	$3.00	$15.00	$3.75	$0.30
Claude Haiku 4.5	$1.00	$5.00	$1.25	$0.10

All prices are per million tokens. The cost formula for a single API call is:

cost = (input_tokens * input_price
      + output_tokens * output_price
      + cache_creation_tokens * cache_creation_price
      + cache_read_tokens * cache_read_price) / 1,000,000

When a JSONL entry includes a costUSD field (provided by some proxy configurations), that value takes precedence over the calculated cost.
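The formula and the costUSD override can be sketched together as follows. The prices are copied from the table above; the dictionary keys and the call_cost function are hypothetical names chosen for this example and are not taken from Superagent's code.

```python
# USD per million tokens, copied from the pricing table.
# The keys are hypothetical labels used only in this sketch.
PRICING = {
    "opus-4.6-4.7": {"input": 5.00, "output": 25.00, "cache_creation": 6.25, "cache_read": 0.50},
    "opus-4.1-4": {"input": 15.00, "output": 75.00, "cache_creation": 18.75, "cache_read": 1.50},
    "sonnet-4.x": {"input": 3.00, "output": 15.00, "cache_creation": 3.75, "cache_read": 0.30},
    "haiku-4.5": {"input": 1.00, "output": 5.00, "cache_creation": 1.25, "cache_read": 0.10},
}


def call_cost(family, input_tokens, output_tokens,
              cache_creation_tokens=0, cache_read_tokens=0, cost_usd=None):
    """Cost in USD for one API call."""
    # A costUSD value recorded in the log entry takes precedence.
    if cost_usd is not None:
        return cost_usd
    price = PRICING[family]
    total = (input_tokens * price["input"]
             + output_tokens * price["output"]
             + cache_creation_tokens * price["cache_creation"]
             + cache_read_tokens * price["cache_read"])
    return total / 1_000_000
```

For example, a Sonnet call with 10,000 input tokens, 2,000 output tokens, and 50,000 cache read tokens works out to (30,000 + 30,000 + 15,000) / 1,000,000, or about $0.075.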

Model name normalization

Superagent normalizes model names from different providers before looking up pricing. This means usage from Bedrock (us.anthropic.claude-opus-4-6-v1), OpenRouter (anthropic/claude-4.6-opus-20260205), and the direct Anthropic API (claude-opus-4-6) all consolidate into a single entry in the usage chart.
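One way such normalization could work is shown below. This is a sketch under assumptions, not Superagent's actual rules: it assumes provider prefixes always precede the word "claude", that trailing "-v1"-style and eight-digit date suffixes can be dropped, and that the canonical form is claude-<family>-<major>-<minor>. The normalize_model_name function is a hypothetical name.

```python
import re


def normalize_model_name(raw: str) -> str:
    """Map a provider-specific model ID to a canonical claude-<family>-<version> name."""
    name = raw.strip().lower()
    start = name.find("claude")
    if start == -1:
        return raw  # not a Claude model; leave untouched
    name = name[start:]                          # drop provider prefix
    name = re.sub(r"-v\d+(?::\d+)?$", "", name)  # drop Bedrock-style "-v1" or "-v1:0"
    name = re.sub(r"-\d{8}$", "", name)          # drop trailing date stamp
    family = re.search(r"opus|sonnet|haiku", name)
    version = re.search(r"(\d+)(?:[.-](\d+))?", name)
    if not family or not version:
        return raw
    major, minor = version.groups()
    canonical = f"claude-{family.group(0)}-{major}"
    return f"{canonical}-{minor}" if minor else canonical
```

With these rules, the three identifiers mentioned above all map to claude-opus-4-6.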

Daily aggregation

Usage data is aggregated by calendar day (in the local time zone) across all session log files. For each day, Superagent computes:

Total cost across all agents
Total tokens (sum of all four token types)
Per-agent breakdown with cost and token totals for each agent
Per-model breakdown with cost for each model used that day

The aggregation scans JSONL files in each agent's Claude configuration directory. To avoid double-counting, entries are deduplicated by message ID and request ID, keeping only the snapshot with the highest output token count (since Claude streams partial usage updates as it generates a response).
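The deduplication and per-day grouping described above can be sketched as follows. The entry keys used here (timestamp, agent, model, message_id, request_id, output_tokens, total_tokens, cost) are hypothetical and stand in for whatever the real log entries contain; the functions are illustrative, not Superagent's implementation.

```python
from datetime import datetime, timezone


def dedupe(entries):
    """Keep one entry per (message_id, request_id): the one with the most output tokens."""
    best = {}
    for entry in entries:
        key = (entry["message_id"], entry["request_id"])
        if key not in best or entry["output_tokens"] > best[key]["output_tokens"]:
            best[key] = entry
    return list(best.values())


def aggregate_daily(entries, tz=None):
    """Group deduplicated entries by calendar day; tz=None means the system's local zone."""
    days = {}
    for entry in dedupe(entries):
        stamp = datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00"))
        day = stamp.astimezone(tz).date().isoformat()
        bucket = days.setdefault(
            day, {"cost": 0.0, "tokens": 0, "by_agent": {}, "by_model": {}}
        )
        bucket["cost"] += entry["cost"]
        bucket["tokens"] += entry["total_tokens"]
        agent = bucket["by_agent"].setdefault(entry["agent"], {"cost": 0.0, "tokens": 0})
        agent["cost"] += entry["cost"]
        agent["tokens"] += entry["total_tokens"]
        model = entry["model"]
        bucket["by_model"][model] = bucket["by_model"].get(model, 0.0) + entry["cost"]
    return days


# Example: pin the zone to UTC instead of the system's local zone.
# aggregate_daily(entries, tz=timezone.utc)
```

Keeping the snapshot with the highest output token count means a partial streaming update is always replaced by the final one for the same message and request.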

Data retention

Usage data is derived from session log files, so it persists as long as those files exist. Deleting an agent removes its session logs and associated usage data. There is no separate database table for usage; it is computed on the fly from the raw logs.

The usage dashboard

The usage tab in Settings displays a stacked bar chart of daily costs. You can configure it with up to three controls:

Time range

Select from Last 7 days, Last 14 days, or Last 30 days. The API supports up to 90 days.

Segmentation

Total: A single bar per day showing aggregate cost.
By Model: Stacked bars colored by model, so you can see which models drive costs.
By Agent: Stacked bars colored by agent, so you can see which agents drive costs.

Scope (auth mode)

In auth mode, admins see a scope toggle:

My Agents: Only shows usage for agents the current user has access to.
All Agents: Shows usage across the entire deployment.

Non-admin users always see only their own agents' usage.

The chart displays a running total at the bottom right (e.g., "Total: $4.72").

Context window tracking

In addition to cost tracking, Superagent monitors how much of each model's context window is being used during active sessions. Each session's metadata includes the latest context window usage percentage, calculated from the input token counts relative to the model's maximum context size.

The context percentage calculation handles both the old and new Anthropic API token counting formats:

New format: input_tokens already includes cached tokens, so it is used directly.
Old format: input_tokens counts only non-cached tokens, so cache creation and cache read tokens are added to get the total.
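The two cases can be sketched as below. How Superagent detects which format a response uses is not described here, so the sketch takes it as an explicit flag; the context_percent function, its parameter names, and the clamp at 100% are assumptions made for illustration.

```python
def context_percent(input_tokens, cache_creation_tokens, cache_read_tokens,
                    max_context_tokens, input_includes_cache):
    """Share of the context window in use, as a percentage."""
    if input_includes_cache:
        # New format: input_tokens already covers cached tokens.
        used = input_tokens
    else:
        # Old format: add the cached tokens back in.
        used = input_tokens + cache_creation_tokens + cache_read_tokens
    # Clamping at 100 is a choice made for this sketch.
    return min(100.0, 100.0 * used / max_context_tokens)
```

For a hypothetical 200,000-token window, a new-format count of 50,000 input tokens and an old-format count of 10,000 input plus 5,000 cache creation plus 35,000 cache read tokens both come out to 25%.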

This percentage is displayed in the session sidebar, giving you a real-time sense of how close an agent is to its context limit.

Optimizing agent costs

Use the usage dashboard to identify cost reduction opportunities:

Check model distribution. Switch the segmentation to "By Model" to see if agents are using more expensive models than necessary. An agent handling simple tasks may not need Opus-tier models.
Review per-agent costs. Switch to "By Agent" to find agents with disproportionately high costs. These may benefit from better prompts, more focused instructions, or lower-effort settings.
Monitor cache hit rates. High cache creation costs with low cache read costs suggest that prompt caching is not being utilized effectively. Agents with stable system prompts and tool definitions benefit most from caching.
Watch context window usage. Sessions that consistently approach 100% context utilization are likely hitting compaction (context summarization), which generates additional output tokens. Structuring tasks to complete within the context window avoids this overhead.
Use scheduled task model overrides. Scheduled tasks accept an optional model parameter. You can configure recurring background tasks to use a less expensive model without changing the agent's default.
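For the cache check in the list above, one simple signal is the share of cached tokens that were reads rather than writes. The sketch below is an illustration only; the cache_read_share function and the idea of using this particular ratio are assumptions, not a metric the dashboard reports.

```python
def cache_read_share(cache_creation_tokens, cache_read_tokens):
    """Fraction of cached tokens served from the cache; None if caching was not used."""
    total = cache_creation_tokens + cache_read_tokens
    if total == 0:
        return None
    return cache_read_tokens / total
```

A value close to 1 suggests the cache is being reused heavily, while a value close to 0 suggests the agent keeps writing cache entries that are rarely read back.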