Token Usage

Understanding and managing AI token consumption

Tokens are the units of text that AI models process. Understanding token usage helps you optimize performance and manage costs.

What Are Tokens?

Tokens are pieces of text that AI models read and generate:

  • A token is roughly 4 characters or 0.75 words in English
  • "Hello, world!" ≈ 4 tokens
  • A 1,000-word document ≈ 1,300 tokens
  • A meeting transcript might be 5,000-50,000 tokens
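
These ratios are only heuristics, but they are close enough for budgeting. Here is a minimal sketch of the ~4-characters-per-token rule above (exact counts come from the model's tokenizer, so treat the result as an order-of-magnitude estimate):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token heuristic.

    Exact counts depend on the model's tokenizer; use this only for
    order-of-magnitude planning.
    """
    return max(1, len(text) // 4)

print(estimate_tokens("Hello, world!"))  # 3 (a real tokenizer gives ~4)
print(estimate_tokens("word " * 1000))   # 1250, close to the ~1,300 above
```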

Token Examples

| Text | Approximate Tokens |
|------|--------------------|
| "Hello" | 1 |
| "Meeting summary" | 2 |
| Short sentence | 10-15 |
| Paragraph | 50-100 |
| 1 page of text | 400-500 |
| Meeting summary | 200-500 |
| Full transcript | 5,000-50,000 |

Token Components

Each AI request uses tokens in two areas:

Input Tokens (Prompt)

What you send to the AI:

  • System message
  • User message
  • Context data (meeting info, transcript)

Output Tokens (Completion)

What the AI generates:

  • The response
  • Structured output

Total tokens = Input tokens + Output tokens
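
Most chat-style APIs report both counts on every response. A sketch that reads an OpenAI-style usage object (the field names prompt_tokens and completion_tokens are an assumption about your provider; adjust as needed):

```python
# OpenAI-style usage payload; field names vary by provider.
usage = {"prompt_tokens": 950, "completion_tokens": 150, "total_tokens": 1100}

input_tokens = usage["prompt_tokens"]        # system + user messages + context
output_tokens = usage["completion_tokens"]   # the generated response
assert usage["total_tokens"] == input_tokens + output_tokens
```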

Token Limits

Per Request Limits

| Component | Limit |
|-----------|-------|
| Maximum input | ~128,000 tokens (varies by model) |
| Maximum output | ~4,000-8,000 tokens (varies by model) |

Practical Limits

For optimal performance:

  • Keep prompts under 10,000 tokens
  • Target outputs under 1,000 tokens
  • Use summaries instead of full transcripts
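
A pre-flight check can enforce this budget before a request is sent. A sketch, reusing the estimate_tokens heuristic from earlier (the 10,000-token cutoff mirrors the guideline above):

```python
MAX_PROMPT_TOKENS = 10_000

def choose_context(transcript: str, summary: str) -> str:
    """Use the full transcript only when it fits the practical budget."""
    if estimate_tokens(transcript) <= MAX_PROMPT_TOKENS:
        return transcript
    return summary  # fall back to the summary to stay under budget
```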

Reducing Token Usage

1. Use Transcript Summaries

The biggest token savings come from using summaries:

High token usage:

```liquid
{{ json.callRecording.transcript }}
{% comment %} Could be 20,000+ tokens {% endcomment %}
```

Low token usage:

```liquid
{{ json.callRecording.transcriptSummary }}
{% comment %} Typically 500-1,000 tokens {% endcomment %}
```

Impact: 10-40x reduction in input tokens

2. Include Only Relevant Data

Only include data the AI needs:

Everything:

```liquid
Meeting: {{ json.meeting | json }}
{% comment %} Includes unnecessary metadata {% endcomment %}
```

Only relevant fields:

```liquid
Meeting: {{ json.meeting.title }}
Attendees: {{ json.meeting.attendees | map: "name" | join: ", " }}
Summary: {{ json.callRecording.transcriptSummary }}
```

3. Keep Prompts Concise

Every word in your prompt uses tokens:

Verbose:

```text
I would like you to carefully analyze the following meeting
transcript and then provide me with a comprehensive summary
that captures all of the key points and discussions...
```

Concise:

```text
Summarize this meeting in 3 bullet points.
Focus on decisions and next steps.
```

4. Limit Output Length

Constrain the output to the length you actually need:

```text
Provide a summary in exactly 2-3 sentences.
Maximum 100 words.
```
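
Prompt instructions like these are soft limits; the model can still run long. Many APIs also accept a hard cap on output tokens, sketched below with the OpenAI-style max_tokens parameter (the parameter name varies by provider, and a hard cap can cut a response off mid-sentence):

```python
request = {
    "messages": [
        {"role": "user", "content": "Summarize this meeting in 2-3 sentences."}
    ],
    "max_tokens": 150,  # hard cap: generation stops here, even mid-sentence
}
```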

5. Use Appropriate Return Types

Simple types use fewer output tokens:

| Return Type | Output Tokens |
|-------------|---------------|
| boolean | 1 |
| integer | 1-2 |
| string (short) | 10-50 |
| string (long) | 100-500 |
| string_list | 20-100 |

Token Estimation

Quick Estimates

| Content Type | Tokens |
|--------------|--------|
| System prompt | 50-200 |
| User message (basic) | 20-50 |
| Meeting metadata | 30-100 |
| Transcript summary | 500-1,500 |
| Full transcript (30 min) | 10,000-20,000 |
| Full transcript (1 hour) | 20,000-40,000 |
| Summary output | 100-300 |
| List output (5 items) | 50-100 |

Calculating Total Usage

Example: Meeting Summary Workflow

```text
Input:
- System prompt: 100 tokens
- Meeting metadata: 50 tokens
- Transcript summary: 800 tokens
- Total input: 950 tokens

Output:
- Summary: 150 tokens

Total: ~1,100 tokens per execution
```

Example: Full Transcript Analysis

```text
Input:
- System prompt: 100 tokens
- Meeting metadata: 50 tokens
- Full transcript: 15,000 tokens
- Total input: 15,150 tokens

Output:
- Analysis: 500 tokens

Total: ~15,650 tokens per execution
```
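
The same arithmetic as a small helper (a sketch; the component names mirror the two examples above):

```python
def total_tokens(input_parts: dict[str, int], output_tokens: int) -> int:
    """Sum the per-component input estimates plus the expected output."""
    return sum(input_parts.values()) + output_tokens

summary_run = {"system_prompt": 100, "meeting_metadata": 50, "transcript_summary": 800}
print(total_tokens(summary_run, output_tokens=150))      # 1100

transcript_run = {"system_prompt": 100, "meeting_metadata": 50, "full_transcript": 15_000}
print(total_tokens(transcript_run, output_tokens=500))   # 15650
```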

Cost Optimization Strategies

Strategy 1: Filter Before AI

Don't send everything to AI:

```text
[Load Meeting] ──▶ [If: has recording?]
                        │
                        ├── Yes ──▶ [AI: Analyze]
                        │
                        └── No ───▶ [Simple notification]
```
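
The same gate in code (a sketch; analyze_with_ai and send_notification are hypothetical stand-ins for the workflow nodes):

```python
def analyze_with_ai(meeting: dict) -> None: ...    # hypothetical AI node
def send_notification(meeting: dict) -> None: ...  # hypothetical non-AI node

def process_meeting(meeting: dict) -> None:
    # Spend AI tokens only on meetings that actually have a recording.
    if meeting.get("callRecording"):
        analyze_with_ai(meeting)
    else:
        send_notification(meeting)  # zero AI tokens on this path
```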

Strategy 2: Use Lower Tiers

Lower model tiers often have lower per-token costs:

| Tier | Relative Cost |
|------|---------------|
| Low | $ |
| Medium | $$ |
| High | $$$ |

Strategy 3: Batch Similar Operations

One comprehensive prompt vs multiple simple prompts:

Multiple calls:

```text
[AI: Extract summary]      → context + prompt tokens
[AI: Extract action items] → context + prompt tokens (context resent)
[AI: Extract sentiment]    → context + prompt tokens (context resent)
```

Single call:

```text
[AI: Extract summary, action items, and sentiment] → context + prompt tokens (sent once)
```
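
The saving comes from sending the shared context once. A sketch of a combined prompt that asks for all three results as structured output (the wording and JSON keys are illustrative):

```python
BATCHED_PROMPT = """Analyze the meeting below. Return JSON with keys:
"summary" (2-3 sentences), "action_items" (list of strings),
"sentiment" (one word).

Meeting context:
{context}"""

# One request carries the context once; three separate calls would
# resend the same context three times, roughly tripling input tokens.
prompt = BATCHED_PROMPT.format(context="<meeting summary goes here>")
```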

Strategy 4: Cache Results

If the same meeting might be processed multiple times, consider caching AI results.
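
A minimal in-memory cache keyed by meeting ID (a sketch; a production setup would use a persistent store and an expiry policy):

```python
_ai_results: dict[str, str] = {}

def cached_analysis(meeting_id: str, run_ai) -> str:
    """Pay the AI tokens once per meeting; reuse the result afterwards."""
    if meeting_id not in _ai_results:
        _ai_results[meeting_id] = run_ai(meeting_id)
    return _ai_results[meeting_id]
```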

Monitoring Token Usage

Execution Logs

Each AI execution shows:

  • Input token count
  • Output token count
  • Total tokens used

Cost Tracking

Track over time:

  • Average tokens per workflow
  • Tokens by workflow type
  • High-token workflows

Optimization Indicators

Watch for:

  • Workflows using > 10,000 input tokens
  • Full transcripts being processed
  • Redundant AI calls in the same workflow
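
These indicators are easy to check mechanically. A sketch that flags expensive executions (the record shape is an assumption; adapt it to your execution-log format):

```python
def flag_high_token_runs(executions: list[dict], threshold: int = 10_000) -> list[dict]:
    """Return executions whose input token count exceeds the threshold."""
    return [run for run in executions if run.get("input_tokens", 0) > threshold]

runs = [
    {"workflow": "quick-triage", "input_tokens": 450},
    {"workflow": "transcript-analysis", "input_tokens": 15_150},
]
print(flag_high_token_runs(runs))  # flags the 15,150-token run
```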

Token Usage by Task

Common Tasks

| Task | Typical Input | Typical Output | Total |
|------|---------------|----------------|-------|
| Quick classification | 200-500 | 5-20 | ~500 |
| Meeting summary | 800-1,500 | 100-300 | ~1,500 |
| Action item extraction | 800-1,500 | 50-200 | ~1,500 |
| Full analysis | 1,000-2,000 | 300-500 | ~2,000 |
| Transcript analysis | 10,000-20,000 | 200-500 | ~15,000 |
| Research with agent | 2,000-5,000 | 500-1,000 | ~4,000 |

High vs Low Token Patterns

Low Token Pattern:

```text
[Load Meeting] → [AI: Is urgent? (boolean)] → [Route]
~500 tokens
```

High Token Pattern:

```text
[Load Meeting] → [AI: Full transcript analysis] → [Report]
~15,000+ tokens
```

Best Practices Summary

| Practice | Token Impact | Implementation |
|----------|--------------|----------------|
| Use summaries | 10-40x reduction | transcriptSummary |
| Concise prompts | 2-5x reduction | Remove verbose instructions |
| Limit output | 2-3x reduction | Add length constraints |
| Filter first | Avoid unnecessary calls | Add If nodes |
| Batch operations | Avoid duplicate context | Combine prompts |
| Choose right tier | Cost efficiency | Match tier to task |

Token Limits and Errors

Input Too Long

If input exceeds limits:

  • Error: "Input exceeds maximum context length"
  • Solution: Use summary instead of full content

Output Truncation

If output is cut off:

  • Response may be incomplete
  • Solution: Ask for shorter output, or split the task into multiple calls

Handling Token Errors

```text
[AI Prompt]
    ├── Success ──▶ [Use output]
    └── Error ────▶ [Retry with smaller input]
```
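
The retry branch in code (a sketch; the error-message check is illustrative and should match your provider's actual context-length error):

```python
def analyze_with_fallback(call_ai, transcript: str, summary: str) -> str:
    try:
        return call_ai(transcript)   # try the full input first
    except Exception as err:
        if "context length" in str(err).lower():
            return call_ai(summary)  # retry with a much smaller input
        raise                        # unrelated errors propagate
```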