Usage and Costs
Monitoring token usage and costs across your Superagent deployment, with per-agent breakdowns, model-level detail, and daily aggregation.
Superagent tracks every LLM API call your agents make and computes costs automatically. The usage dashboard in Settings > Usage gives you a daily breakdown of spending by agent and by model, so you can identify which agents consume the most tokens and where to optimize.
What is tracked
Every time an agent sends a message to the Claude API, the response includes token usage metadata. Superagent records four token categories for each API call:
| Token type | Description |
|---|---|
| Input tokens | Tokens in the prompt sent to the model (user messages, system prompt, tool results) |
| Output tokens | Tokens generated by the model in its response |
| Cache creation tokens | Tokens written into the prompt cache on a cache miss |
| Cache read tokens | Tokens served from the prompt cache on a cache hit |
These counts are extracted from the Claude API response's usage object and written to JSONL session log files alongside each assistant message.
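The following minimal Python sketch pulls those four counts out of one log line. It assumes each line is a JSON object with a `type` field, with the usage object nested under `message.usage`. That wrapper layout is an assumption. The keys `input_tokens`, `output_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens` are the field names the Claude API uses in its usage object. `Usage` and `extract_usage` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    """The four token counts recorded for one API call."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_creation_tokens: int = 0
    cache_read_tokens: int = 0


def extract_usage(jsonl_line: str) -> Optional[Usage]:
    """Return the token usage from one session log line, or None.

    The {"type": "assistant", "message": {"usage": {...}}} wrapper is an
    assumed log layout; the usage field names are the Claude API's own.
    """
    entry = json.loads(jsonl_line)
    if entry.get("type") != "assistant":
        return None  # only assistant messages carry usage
    usage = (entry.get("message") or {}).get("usage")
    if not usage:
        return None
    # `or 0` covers fields that are missing or explicitly null.
    return Usage(
        input_tokens=usage.get("input_tokens") or 0,
        output_tokens=usage.get("output_tokens") or 0,
        cache_creation_tokens=usage.get("cache_creation_input_tokens") or 0,
        cache_read_tokens=usage.get("cache_read_input_tokens") or 0,
    )


line = (
    '{"type": "assistant", "message": {"id": "msg_01", "usage": '
    '{"input_tokens": 12, "output_tokens": 340, '
    '"cache_creation_input_tokens": 2048, "cache_read_input_tokens": 0}}}'
)
print(extract_usage(line))
# Usage(input_tokens=12, output_tokens=340, cache_creation_tokens=2048, cache_read_tokens=0)
```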
How costs are calculated
Superagent calculates costs using per-million-token pricing for each Claude model. The pricing table covers all supported models:
| Model family | Input | Output | Cache creation | Cache read |
|---|---|---|---|---|
| Claude Opus 4.6 / 4.7 | $5.00 | $25.00 | $6.25 | $0.50 |
| Claude Opus 4.1 / 4 | $15.00 | $75.00 | $18.75 | $1.50 |
| Claude Sonnet 4.5 / 4.6 / 4 | $3.00 | $15.00 | $3.75 | $0.30 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $1.25 | $0.10 |
All prices are per million tokens. The cost formula for a single API call is:
cost = (input_tokens * input_price
+ output_tokens * output_price
+ cache_creation_tokens * cache_creation_price
+ cache_read_tokens * cache_read_price) / 1,000,000
When a JSONL entry includes a costUSD field (provided by some proxy configurations), that value takes precedence over the calculated cost.
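The formula and the `costUSD` override can be sketched in Python as follows. The prices come directly from the table above. The normalized model keys (such as `claude-sonnet-4-5`) and the `call_cost` function are illustrative assumptions, not Superagent's internal names.

```python
from typing import NamedTuple, Optional


class Price(NamedTuple):
    """USD per million tokens for each token category."""
    input: float
    output: float
    cache_creation: float
    cache_read: float


OPUS_4_6 = Price(5.00, 25.00, 6.25, 0.50)
OPUS_4_1 = Price(15.00, 75.00, 18.75, 1.50)
SONNET_4 = Price(3.00, 15.00, 3.75, 0.30)
HAIKU_4_5 = Price(1.00, 5.00, 1.25, 0.10)

# Rows of the pricing table, keyed by normalized model name.
# The exact key spellings are an assumption for this sketch.
PRICING = {
    "claude-opus-4-7": OPUS_4_6,
    "claude-opus-4-6": OPUS_4_6,
    "claude-opus-4-1": OPUS_4_1,
    "claude-opus-4": OPUS_4_1,
    "claude-sonnet-4-6": SONNET_4,
    "claude-sonnet-4-5": SONNET_4,
    "claude-sonnet-4": SONNET_4,
    "claude-haiku-4-5": HAIKU_4_5,
}


def call_cost(
    model: str,
    input_tokens: int,
    output_tokens: int,
    cache_creation_tokens: int,
    cache_read_tokens: int,
    cost_usd: Optional[float] = None,
) -> float:
    """Cost of one API call in USD."""
    if cost_usd is not None:
        return cost_usd  # a logged costUSD value takes precedence
    p = PRICING[model]  # unknown models raise KeyError in this sketch
    return (
        input_tokens * p.input
        + output_tokens * p.output
        + cache_creation_tokens * p.cache_creation
        + cache_read_tokens * p.cache_read
    ) / 1_000_000
```

Consider a Sonnet call with 1,000 input, 500 output, 2,000 cache creation, and 10,000 cache read tokens. It works out to (3,000 + 7,500 + 7,500 + 3,000) / 1,000,000 = $0.021.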
Model name normalization
Superagent normalizes model names from different providers before looking up pricing. This means usage from Bedrock (us.anthropic.claude-opus-4-6-v1), OpenRouter (anthropic/claude-4.6-opus-20260205), and the direct Anthropic API (claude-opus-4-6) all consolidate into a single entry in the usage chart.
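Here is a rough sketch of that normalization, written only against the three example IDs above. The sketch strips an `anthropic.` or `anthropic/` prefix, a `-v1`-style suffix, and an eight-digit date, then reorders version-first names. These specific rules are assumptions about how such a mapping could work, and `normalize_model` is a hypothetical name. The real rule set may differ.

```python
import re


def normalize_model(name: str) -> str:
    """Map provider-specific model IDs onto one canonical name.

    These rules are inferred from a few example IDs; they illustrate the
    idea and are not a definitive rule set.
    """
    name = name.strip().lower()
    # Drop Bedrock region/vendor prefixes ("us.anthropic.") and
    # OpenRouter vendor prefixes ("anthropic/").
    name = re.split(r"anthropic[./]", name, maxsplit=1)[-1]
    # Drop a Bedrock version suffix such as "-v1" or "-v1:0".
    name = re.sub(r"-v\d+(?::\d+)?$", "", name)
    # Drop a trailing snapshot date such as "-20260205".
    name = re.sub(r"-\d{8}$", "", name)
    # "4.6" -> "4-6"
    name = name.replace(".", "-")
    # Reorder version-first IDs: "claude-4-6-opus" -> "claude-opus-4-6".
    m = re.fullmatch(r"claude-(\d+(?:-\d+)?)-(opus|sonnet|haiku)", name)
    if m:
        name = f"claude-{m.group(2)}-{m.group(1)}"
    return name
```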
Daily aggregation
Usage data is aggregated by calendar day (in the local timezone) across all session log files. For each day, Superagent computes:
- Total cost across all agents
- Total tokens (sum of all four token types)
- Per-agent breakdown with cost and token totals for each agent
- Per-model breakdown with cost for each model used that day
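The per-day breakdown can be sketched as a single pass over already-deduplicated calls. `CallRecord`, `Totals`, `DailyUsage`, and `aggregate_daily` are hypothetical names. The `tz` parameter defaults to `None`, which makes Python's `datetime.astimezone` convert to the machine's local timezone, matching the calendar-day rule above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, tzinfo
from typing import Dict, Iterable, Optional


@dataclass
class CallRecord:
    """One deduplicated API call (field names are illustrative)."""
    timestamp: str  # ISO 8601 with a UTC offset, e.g. "2025-03-02T03:00:00Z"
    agent: str
    model: str      # already normalized
    tokens: int     # sum of all four token types
    cost: float     # USD


@dataclass
class Totals:
    cost: float = 0.0
    tokens: int = 0


@dataclass
class DailyUsage:
    total: Totals = field(default_factory=Totals)
    by_agent: Dict[str, Totals] = field(default_factory=dict)
    by_model: Dict[str, float] = field(default_factory=dict)  # cost only


def aggregate_daily(
    records: Iterable[CallRecord], tz: Optional[tzinfo] = None
) -> Dict[str, DailyUsage]:
    """Bucket calls by calendar day; tz=None means the local timezone."""
    days: Dict[str, DailyUsage] = defaultdict(DailyUsage)
    for r in records:
        ts = datetime.fromisoformat(r.timestamp.replace("Z", "+00:00"))
        day = days[ts.astimezone(tz).date().isoformat()]
        day.total.cost += r.cost
        day.total.tokens += r.tokens
        agent = day.by_agent.setdefault(r.agent, Totals())
        agent.cost += r.cost
        agent.tokens += r.tokens
        day.by_model[r.model] = day.by_model.get(r.model, 0.0) + r.cost
    return dict(days)
```

Passing a fixed offset such as `timezone(timedelta(hours=-5))` is handy for testing: a call logged at 03:00 UTC on March 2 then lands on March 1.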
The aggregation scans JSONL files in each agent's Claude configuration directory. To avoid double-counting, entries are deduplicated by message ID and request ID, keeping only the snapshot with the highest output token count (since Claude streams partial usage updates as it generates a response).
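The deduplication rule can be sketched as follows. Entries are plain dicts with `message_id`, `request_id`, and `output_tokens` keys. Those key names and the `dedupe_streamed` function are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def dedupe_streamed(entries: Iterable[dict]) -> List[dict]:
    """Keep one usage snapshot per (message ID, request ID) pair."""
    best: Dict[Tuple[str, str], dict] = {}
    for e in entries:
        key = (e["message_id"], e["request_id"])
        kept = best.get(key)
        # Partial snapshots of the same response share a key; keep the
        # one with the highest output token count.
        if kept is None or e["output_tokens"] > kept["output_tokens"]:
            best[key] = e
    return list(best.values())
```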
Data retention
Usage data is derived from session log files, so it persists as long as those files exist. Deleting an agent removes its session logs and associated usage data. There is no separate database table for usage; it is computed on the fly from the raw logs.
The usage dashboard
The usage tab in Settings displays a stacked bar chart of daily costs. You can configure it with three controls:
Time range
Select from Last 7 days, Last 14 days, or Last 30 days. The API supports up to 90 days.
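A range selection has to become one bucket per calendar day, so the chart has a bar slot for every date, even days with no usage. The following hypothetical helper sketches that step. It makes two assumptions: the current day counts as the last of the N days, and ranges outside 1 to 90 days are rejected rather than truncated.

```python
from datetime import date, timedelta
from typing import List

MAX_DAYS = 90  # upper bound the usage API is documented to support


def day_window(days: int, today: date) -> List[str]:
    """Return the ISO dates covered by a "Last N days" range, oldest first."""
    if not 1 <= days <= MAX_DAYS:
        raise ValueError(f"range must be 1..{MAX_DAYS} days, got {days}")
    start = today - timedelta(days=days - 1)
    return [(start + timedelta(days=i)).isoformat() for i in range(days)]
```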
Segmentation
- Total --- A single bar per day showing aggregate cost.
- By Model --- Stacked bars colored by model, so you can see which models drive costs.
- By Agent --- Stacked bars colored by agent, so you can see which agents drive costs.
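Turning daily breakdowns into chart data is a pivot: one series per model or agent, aligned to the list of days, with zeros where a segment had no usage. `stacked_series` and `total_series` are hypothetical helpers that sketch this step. They are not Superagent's chart code.

```python
from typing import Dict, List


def stacked_series(
    days: List[str], per_day: Dict[str, Dict[str, float]]
) -> Dict[str, List[float]]:
    """Pivot {day: {segment: cost}} into one cost series per segment.

    A segment is a model name ("By Model") or an agent name ("By Agent").
    Missing values become 0.0 so every series lines up with `days`.
    """
    segments = sorted({s for d in days for s in per_day.get(d, {})})
    return {s: [per_day.get(d, {}).get(s, 0.0) for d in days] for s in segments}


def total_series(days: List[str], per_day: Dict[str, Dict[str, float]]) -> List[float]:
    """The "Total" view: one aggregate bar per day."""
    return [sum(per_day.get(d, {}).values(), 0.0) for d in days]
```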
Scope (auth mode)
In auth mode, admins see a scope toggle:
- My Agents --- Only shows usage for agents the current user has access to.
- All Agents --- Shows usage across the entire deployment.
Non-admin users always see only their own agents' usage.
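The scope rules reduce to a small filter. This sketch makes several assumptions: the function name `agents_in_scope`, the `scope` values `"mine"` and `"all"`, and the representation of access as a set of agent names. It covers auth mode only, because behavior outside auth mode isn't described here.

```python
from typing import List, Sequence, Set


def agents_in_scope(
    all_agents: Sequence[str], accessible: Set[str], is_admin: bool, scope: str = "mine"
) -> List[str]:
    """Pick which agents' usage to include under the auth-mode scope rules."""
    if is_admin and scope == "all":
        return list(all_agents)  # "All Agents": the entire deployment
    # Non-admins always, and admins on "My Agents", see accessible agents only.
    return [a for a in all_agents if a in accessible]
```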
The chart displays a running total at the bottom right (e.g., "Total: $4.72").
Context window tracking
In addition to cost tracking, Superagent monitors how much of each model's context window is being used during active sessions. Each session's metadata includes the latest context window usage percentage, calculated from the input token counts relative to the model's maximum context size.
The context percentage calculation handles both the old and new Anthropic API token counting formats:
- New format: `input_tokens` already includes cached tokens, so it is used directly.
- Old format: `input_tokens` counts only non-cached tokens, so cache creation and cache read tokens are added to get the total.
This percentage is displayed in the session sidebar, giving you a real-time sense of how close an agent is to its context limit.
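The two formats differ only in which counts are summed before dividing by the model's context size. In this Python sketch, the `new_format` flag stands in for however the format is actually detected, which isn't specified here. The usage keys are the Claude API's field names, and `context_percent` is a hypothetical name.

```python
def context_percent(usage: dict, max_context_tokens: int, new_format: bool) -> float:
    """Share of the context window in use, as a percentage."""
    input_tokens = usage.get("input_tokens") or 0
    if new_format:
        # input_tokens already includes cached tokens.
        used = input_tokens
    else:
        # input_tokens excludes cache traffic, so add it back.
        used = (
            input_tokens
            + (usage.get("cache_creation_input_tokens") or 0)
            + (usage.get("cache_read_input_tokens") or 0)
        )
    return 100.0 * used / max_context_tokens
```

Take a 200,000-token context window (an example value) and an old-format entry with 2,000 input, 8,000 cache creation, and 40,000 cache read tokens. That entry comes to 25%.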
Optimizing agent costs
Use the usage dashboard to identify cost reduction opportunities:
- Check model distribution. Switch the segmentation to "By Model" to see if agents are using more expensive models than necessary. An agent handling simple tasks may not need Opus-tier models.
- Review per-agent costs. Switch to "By Agent" to find agents with disproportionately high costs. These may benefit from better prompts, more focused instructions, or lower-effort settings.
- Monitor cache hit rates. High cache creation costs with low cache read costs suggest that prompt caching is not being utilized effectively. Agents with stable system prompts and tool definitions benefit most from caching.
- Watch context window usage. Sessions that consistently approach 100% context utilization are likely hitting compaction (context summarization), which generates additional output tokens. Structuring tasks to complete within the context window avoids this overhead.
- Use scheduled task model overrides. Scheduled tasks accept an optional `model` parameter, so you can configure recurring background tasks to use a less expensive model without changing the agent's default.
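To check the cache pattern described above for a given agent or model, compare what it spends writing to the cache with what it spends reading from it. `cache_summary` and its `read_share` metric are hypothetical. `read_share` is cache read tokens as a fraction of all cached tokens. Prices are passed in per million tokens, as in the pricing table.

```python
from typing import Dict


def cache_summary(
    cache_creation_tokens: int,
    cache_read_tokens: int,
    creation_price: float,
    read_price: float,
) -> Dict[str, float]:
    """Compare cache write and cache read spend (prices in USD per MTok).

    A low read_share suggests cached content is rarely reused.
    """
    creation_cost = cache_creation_tokens * creation_price / 1_000_000
    read_cost = cache_read_tokens * read_price / 1_000_000
    cached = cache_creation_tokens + cache_read_tokens
    read_share = cache_read_tokens / cached if cached else 0.0
    return {
        "creation_cost": creation_cost,
        "read_cost": read_cost,
        "read_share": read_share,
    }
```

With Sonnet prices, 1,000,000 cache creation tokens cost $3.75 and 500,000 cache read tokens cost $0.15. The read share is one third.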