A Practical Guide to Implementing LLM Token Usage Tracking for AI Agents

What you'll learn

  • How to architect a token counting system that works across different LLM providers
  • Why granular token tracking prevents unexpected API bills and performance degradation
  • The difference between prompt tokens, completion tokens, and effective token cost calculation
  • How to structure logs for historical analysis and cost optimization

Introduction

Building production AI agents means operating in an environment where every API call has a real financial and performance cost. Unlike traditional software development, where resources are predictable, LLM interactions introduce variable costs per request—and without proper tracking, these expenses compound quickly.

Token usage tracking isn't just about cost monitoring. It's about understanding your AI system's behavior, identifying inefficient prompts, and detecting anomalies before they drain your budget. This guide walks you through implementing a comprehensive tracking system from scratch.

Step 1: Choose Your Instrumentation Point

The most effective tracking happens at the API response level, not at the prompt construction level. The reason is accuracy: OpenAI and most other providers return exact token counts in a usage object in the response body, so you do not have to estimate counts with a local tokenizer.

Create a wrapper function that intercepts all LLM calls:

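// Wraps a chat-completion call and records token usage, latency, and caller metadata.
// Assumes `provider` exposes createChatCompletion() and returns an OpenAI-style `usage` object.
async function callLLMWithTracking(provider, model, messages, metadata = {}) {
  const startTime = Date.now();
  const response = await provider.createChatCompletion({ model, messages });

  // Some responses may omit `usage`; record null instead of throwing or logging a false zero.
  const usage = response.usage ?? {};

  return {
    response,
    tokens: {
      prompt: usage.prompt_tokens ?? null,
      completion: usage.completion_tokens ?? null,
      total: usage.total_tokens ?? null
    },
    metadata, // agent ID, session ID, task category, and so on
    duration: Date.now() - startTime,
    timestamp: new Date().toISOString()
  };
}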

This single wrapper becomes your instrumentation backbone. Every agent interaction flows through it, creating a consistent data source.
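
Because providers name their usage fields differently, the wrapper usually needs a small normalization step before logging. The sketch below is one way to do that. The `normalizeUsage` helper and the provider keys are hypothetical, and the field names (`prompt_tokens`/`completion_tokens` for OpenAI-style responses, `input_tokens`/`output_tokens` for Anthropic-style responses) should be checked against the SDK version you use.

```javascript
// Maps provider-specific usage objects onto one common shape.
// `normalizeUsage` and the provider keys are illustrative names, not part of any SDK.
function normalizeUsage(providerName, response) {
  const usage = response?.usage ?? {};
  let prompt;
  let completion;

  switch (providerName) {
    case "openai": // OpenAI-style field names
      prompt = usage.prompt_tokens;
      completion = usage.completion_tokens;
      break;
    case "anthropic": // Anthropic-style field names
      prompt = usage.input_tokens;
      completion = usage.output_tokens;
      break;
    default:
      throw new Error(`Unknown provider: ${providerName}`);
  }

  if (!Number.isInteger(prompt) || !Number.isInteger(completion)) {
    // Surface missing counts as null rather than silently logging zero.
    return { prompt: null, completion: null, total: null };
  }
  return { prompt, completion, total: prompt + completion };
}

// Usage with hand-written sample payloads (not real API output):
const fromOpenAI = normalizeUsage("openai", {
  usage: { prompt_tokens: 120, completion_tokens: 30, total_tokens: 150 },
});
const fromAnthropic = normalizeUsage("anthropic", {
  usage: { input_tokens: 120, output_tokens: 30 },
});
console.log(fromOpenAI, fromAnthropic);
```

Keeping this mapping in one function means the rest of the pipeline only ever sees `prompt`, `completion`, and `total`.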

Step 2: Structure Your Token Log Schema

Design a schema that captures context, not just numbers. A well-structured log answers questions like "Which agent feature costs the most?" and "Which prompts are inefficient?"

Your schema should include:

  • Agent ID and session ID (enables fleet tracking)
  • Model name and provider (crucial when comparing costs across providers)
  • Token counts (prompt, completion, total)
  • Prompt hash or summary (detect repeated queries)
  • Task category (classify by feature or function)
  • Latency metrics
  • Success/failure status
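
As a rough illustration, the fields above could be assembled into a record like the one below. The `buildTokenLogRecord` helper and every field name are assumptions made for this sketch, and the prompt hash uses SHA-256 from Node's built-in crypto module, which is one reasonable choice among several.

```javascript
const crypto = require("node:crypto");

// Builds one log record from a tracked call. All field names are illustrative.
function buildTokenLogRecord({
  agentId, sessionId, provider, model, tokens,
  messages, taskCategory, latencyMs, success,
}) {
  // Hash the serialized prompt so repeated queries can be grouped
  // without storing raw prompt text in the log.
  const promptHash = crypto
    .createHash("sha256")
    .update(JSON.stringify(messages))
    .digest("hex");

  return {
    agent_id: agentId,
    session_id: sessionId,
    provider,
    model,
    prompt_tokens: tokens.prompt,
    completion_tokens: tokens.completion,
    total_tokens: tokens.total,
    prompt_hash: promptHash,
    task_category: taskCategory,
    latency_ms: latencyMs,
    success,
    timestamp: new Date().toISOString(),
  };
}

// Example with made-up values:
const sampleInput = {
  agentId: "agent-1",
  sessionId: "session-42",
  provider: "openai",
  model: "example-model",
  tokens: { prompt: 120, completion: 30, total: 150 },
  messages: [{ role: "user", content: "Summarize this ticket." }],
  taskCategory: "summarization",
  latencyMs: 840,
  success: true,
};
const record = buildTokenLogRecord(sampleInput);
console.log(JSON.stringify(record, null, 2));
```

Identical prompts produce identical hashes, so grouping by `prompt_hash` is a cheap way to spot repeated queries that might be cacheable.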

Tip: Include a cost_usd field calculated at logging time. Token prices change, so recording the actual cost at invocation time preserves historical accuracy for billing reconciliation.
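
A minimal sketch of that calculation follows. It assumes prompt and completion tokens are priced separately, which is common but worth confirming for your provider. The price table holds placeholder numbers, not real rates, and `computeCostUsd` is a hypothetical helper name.

```javascript
// Placeholder prices in USD per 1 million tokens. These are NOT real rates;
// replace them with the current figures from your provider's pricing page.
const PRICES_PER_MILLION = {
  "example-model": { input: 1.0, output: 2.0 },
};

// Returns the cost of one call in USD, or null when the model has no known price.
function computeCostUsd(model, promptTokens, completionTokens, prices = PRICES_PER_MILLION) {
  const price = prices[model];
  if (!price) return null; // do not guess a price for unknown models
  const cost =
    (promptTokens * price.input + completionTokens * price.output) / 1_000_000;
  // Round to 6 decimal places to keep stored values stable.
  return Number(cost.toFixed(6));
}

// Example: 1,000 prompt tokens and 500 completion tokens.
const costUsd = computeCostUsd("example-model", 1000, 500);
console.log(costUsd);
```

Storing the result on the log record, rather than recomputing it later, means a price change never rewrites history.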

Step 3: Implement Real-Time Aggregation

Raw logs are useful, but real-time metrics are actionable. Aggregate data into rolling windows—last hour, last 24 hours, last 7 days. This reveals trends like cost spikes or efficiency improvements.
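
The rolling windows can be sketched as a simple in-memory filter over log records. This assumes each record carries an ISO `timestamp`, a `total_tokens` count, and a `cost_usd` value (hypothetical field names). A production system would more likely push this work to a time-series store or the database's own aggregation.

```javascript
// Window sizes in milliseconds.
const WINDOWS_MS = {
  last_hour: 60 * 60 * 1000,
  last_24_hours: 24 * 60 * 60 * 1000,
  last_7_days: 7 * 24 * 60 * 60 * 1000,
};

// Sums requests, tokens, and cost for each rolling window ending at `now`.
function aggregateByWindow(records, now = Date.now()) {
  const result = {};
  for (const [name, windowMs] of Object.entries(WINDOWS_MS)) {
    const inWindow = records.filter((r) => {
      const age = now - Date.parse(r.timestamp);
      return age >= 0 && age <= windowMs;
    });
    result[name] = {
      requests: inWindow.length,
      total_tokens: inWindow.reduce((sum, r) => sum + r.total_tokens, 0),
      cost_usd: inWindow.reduce((sum, r) => sum + r.cost_usd, 0),
    };
  }
  return result;
}

// Example with made-up records and a fixed "now" so the result is repeatable.
const fixedNow = Date.parse("2024-01-08T00:00:00Z");
const sampleRecords = [
  { timestamp: "2024-01-07T23:30:00Z", total_tokens: 100, cost_usd: 0.01 }, // 30 minutes old
  { timestamp: "2024-01-07T12:00:00Z", total_tokens: 200, cost_usd: 0.02 }, // 12 hours old
  { timestamp: "2024-01-02T00:00:00Z", total_tokens: 300, cost_usd: 0.03 }, // 6 days old
  { timestamp: "2023-12-20T00:00:00Z", total_tokens: 400, cost_usd: 0.04 }, // outside all windows
];
const windows = aggregateByWindow(sampleRecords, fixedNow);
console.log(windows);
```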

Calculate three key metrics per agent:

  • Average tokens per completion
  • Cost per task (prompt tokens × input price + completion tokens × output price, since most providers price the two differently)
  • Token efficiency ratio (completion tokens ÷ prompt tokens)

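The efficiency ratio is revealing. A high ratio means a short prompt is generating substantial output. A low ratio means the agent sends a lot of context to get a concise answer, which may be fine for classification-style tasks but can also point to bloated prompts or conversation history worth trimming.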
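
The three metrics can be computed per agent along these lines. This sketch assumes one log record per task and reads "average tokens per completion" as average completion tokens per request. Both are interpretations, and the record field names are hypothetical.

```javascript
// Groups records by agent and derives the three metrics.
// Assumes each record has agent_id, prompt_tokens, completion_tokens, cost_usd.
function computeAgentMetrics(records) {
  const byAgent = new Map();
  for (const r of records) {
    const acc = byAgent.get(r.agent_id) ?? { requests: 0, prompt: 0, completion: 0, cost: 0 };
    acc.requests += 1;
    acc.prompt += r.prompt_tokens;
    acc.completion += r.completion_tokens;
    acc.cost += r.cost_usd;
    byAgent.set(r.agent_id, acc);
  }

  const metrics = {};
  for (const [agentId, acc] of byAgent) {
    metrics[agentId] = {
      avg_completion_tokens: acc.completion / acc.requests,
      cost_per_task_usd: acc.cost / acc.requests,
      // Completion tokens divided by prompt tokens; null when there is no prompt data.
      efficiency_ratio: acc.prompt > 0 ? acc.completion / acc.prompt : null,
    };
  }
  return metrics;
}

// Example with made-up records:
const agentMetrics = computeAgentMetrics([
  { agent_id: "agent-a", prompt_tokens: 100, completion_tokens: 50, cost_usd: 0.5 },
  { agent_id: "agent-a", prompt_tokens: 300, completion_tokens: 150, cost_usd: 0.25 },
  { agent_id: "agent-b", prompt_tokens: 1000, completion_tokens: 20, cost_usd: 1 },
]);
console.log(agentMetrics);
```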

Step 4: Set Up Anomaly Detection

Tracking becomes valuable when you detect problems before they escalate. Define baseline thresholds:

  • Alert when a single request exceeds 8,000 tokens (possible runaway context or prompt injection)
  • Alert when hourly agent costs exceed your daily budget ÷ 24
  • Alert when average tokens per task increases 30% month-over-month

Note: Anomalies often precede bugs. An unexpected token spike frequently signals a prompt becoming repetitive or a system message growing unchecked.
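
The thresholds above translate into a small rule check. In this sketch the `detectAnomalies` function, its input names, and the alert labels are all hypothetical, and the numeric limits simply mirror the examples in the list. Tune them to your own baseline.

```javascript
// Limits taken from the example thresholds; adjust to your own baseline.
const THRESHOLDS = {
  maxTokensPerRequest: 8000,
  maxMonthlyIncrease: 0.3, // 30% month-over-month
};

// Returns a list of alert labels for one evaluation pass.
function detectAnomalies({
  requestTokens, hourlyCostUsd, dailyBudgetUsd,
  avgTokensThisMonth, avgTokensLastMonth,
}) {
  const alerts = [];

  if (requestTokens > THRESHOLDS.maxTokensPerRequest) {
    alerts.push("request_tokens_exceeded");
  }
  // The hourly budget is the daily budget spread evenly over 24 hours.
  if (hourlyCostUsd > dailyBudgetUsd / 24) {
    alerts.push("hourly_budget_exceeded");
  }
  // Skip the trend check when there is no previous-month baseline.
  if (avgTokensLastMonth > 0) {
    const increase = (avgTokensThisMonth - avgTokensLastMonth) / avgTokensLastMonth;
    if (increase >= THRESHOLDS.maxMonthlyIncrease) {
      alerts.push("avg_tokens_increase");
    }
  }
  return alerts;
}

// Example with made-up values: one noisy pass and one quiet pass.
const noisy = detectAnomalies({
  requestTokens: 9000, hourlyCostUsd: 5, dailyBudgetUsd: 48,
  avgTokensThisMonth: 1400, avgTokensLastMonth: 1000,
});
const quiet = detectAnomalies({
  requestTokens: 500, hourlyCostUsd: 1, dailyBudgetUsd: 48,
  avgTokensThisMonth: 1100, avgTokensLastMonth: 1000,
});
console.log(noisy, quiet);
```

Returning labels instead of sending notifications directly keeps the rules easy to test and lets the alerting channel be chosen elsewhere.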

Step 5: Connect to Your Monitoring Infrastructure

For production AI agents, token tracking integrates into your broader observability stack. Tools like ClawPulse provide real-time dashboards for LLM metrics alongside your existing monitoring—giving you fleet-wide visibility into token consumption patterns, cost trends, and agent efficiency all in one place.

This unified view transforms token data from spreadsheet auditing into actionable intelligence.

Next Steps

Start with the instrumentation wrapper this week. Run it in staging, collect a week's worth of logs, then analyze patterns. Once you understand your baseline, implement anomaly detection and cost budgeting.

Ready to scale token tracking across multiple agents? The ClawPulse dashboard provides built-in LLM monitoring. Visit clawpulse.org/signup to explore how real-time token analytics help you manage AI costs at scale.
