Token Usage
Understanding and managing AI token consumption
Tokens are the units of text that AI models process. Understanding token usage helps you optimize performance and manage costs.
What Are Tokens?
Tokens are pieces of text that AI models read and generate:
- A token is roughly 4 characters or 0.75 words in English
- "Hello, world!" ≈ 4 tokens
- A 1,000-word document ≈ 1,300 tokens
- A meeting transcript might be 5,000-50,000 tokens
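Exact counts depend on the model's tokenizer. For a precise measurement you can tokenize text yourself; below is a minimal sketch using OpenAI's tiktoken library (an assumption here, since your model may use a different tokenizer):

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by many recent OpenAI models;
# substitute your model's encoding if it differs.
enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    """Exact token count for `text` under this encoding."""
    return len(enc.encode(text))

print(count_tokens("Hello, world!"))  # 4
```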
Token Examples
| Text | Approximate Tokens |
|---|---|
| "Hello" | 1 |
| "Meeting summary" | 2 |
| Short sentence | 10-15 |
| Paragraph | 50-100 |
| 1 page of text | 400-500 |
| AI-generated meeting summary | 200-500 |
| Full transcript | 5,000-50,000 |
Token Components
Each AI request consumes tokens in two categories:
Input Tokens (Prompt)
What you send to the AI:
- System message
- User message
- Context data (meeting info, transcript)
Output Tokens (Completion)
What the AI generates:
- The response
- Structured output
Total tokens = Input tokens + Output tokens
Token Limits
Per Request Limits
| Component | Limit |
|---|---|
| Maximum input | ~128,000 tokens (varies by model) |
| Maximum output | ~4,000-8,000 tokens (varies by model) |
Practical Limits
For optimal performance:
- Keep prompts under 10,000 tokens
- Target outputs under 1,000 tokens
- Use summaries instead of full transcripts
Reducing Token Usage
1. Use Transcript Summaries
The biggest token savings come from using summaries:
❌ High token usage:
{{ json.callRecording.transcript }}
// Could be 20,000+ tokens
✅ Low token usage:
{{ json.callRecording.transcriptSummary }}
// Typically 500-1,000 tokens
Impact: 10-40x reduction in input tokens
2. Include Only Relevant Data
Only include data the AI needs:
❌ Everything:
Meeting: {{ json.meeting | json }}
// Includes unnecessary metadata
✅ Only relevant fields:
Meeting: {{ json.meeting.title }}
Attendees: {{ json.meeting.attendees | map: "name" | join: ", " }}
Summary: {{ json.callRecording.transcriptSummary }}
3. Keep Prompts Concise
Every word in your prompt uses tokens:
❌ Verbose:
I would like you to carefully analyze the following meeting
transcript and then provide me with a comprehensive summary
that captures all of the key points and discussions...
✅ Concise:
Summarize this meeting in 3 bullet points.
Focus on decisions and next steps.
4. Limit Output Length
Constrain the output to the length you actually need:
Provide a summary in exactly 2-3 sentences.
Maximum 100 words.
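If you have direct access to the underlying model API, you can also enforce a hard cap on output length. A sketch using the OpenAI Python SDK's `max_tokens` parameter (assumed here; the model name is illustrative, and your workflow platform may expose an equivalent setting instead):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{
        "role": "user",
        "content": "Summarize this meeting in 2-3 sentences. Maximum 100 words.",
    }],
    # Hard cap on output tokens; the prompt instruction above keeps the
    # model from being cut off mid-sentence by this limit.
    max_tokens=150,
)
print(response.choices[0].message.content)
```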
5. Use Appropriate Return Types
Simple types use fewer output tokens:
| Return Type | Output Tokens |
|---|---|
| boolean | 1 |
| integer | 1-2 |
| string (short) | 10-50 |
| string (long) | 100-500 |
| string_list | 20-100 |
Token Estimation
Quick Estimates
| Content Type | Tokens |
|---|---|
| System prompt | 50-200 |
| User message (basic) | 20-50 |
| Meeting metadata | 30-100 |
| Transcript summary | 500-1,500 |
| Full transcript (30 min) | 10,000-20,000 |
| Full transcript (1 hour) | 20,000-40,000 |
| Summary output | 100-300 |
| List output (5 items) | 50-100 |
Calculating Total Usage
Example: Meeting Summary Workflow
Input:
- System prompt: 100 tokens
- Meeting metadata: 50 tokens
- Transcript summary: 800 tokens
- Total input: 950 tokens
Output:
- Summary: 150 tokens
Total: ~1,100 tokens per execution
Example: Full Transcript Analysis
Input:
- System prompt: 100 tokens
- Meeting metadata: 50 tokens
- Full transcript: 15,000 tokens
- Total input: 15,150 tokens
Output:
- Analysis: 500 tokens
Total: ~15,650 tokens per execution
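These component estimates compose into a simple budget check before you run a workflow. A minimal sketch (the figures mirror the meeting-summary example above):

```python
def estimate_total_tokens(input_components: dict[str, int], output: int) -> int:
    """Total tokens = sum of input components + expected output tokens."""
    return sum(input_components.values()) + output

total = estimate_total_tokens(
    {"system_prompt": 100, "meeting_metadata": 50, "transcript_summary": 800},
    output=150,
)
print(total)  # 1100, matching the ~1,100 tokens per execution above
```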
Cost Optimization Strategies
Strategy 1: Filter Before AI
Don't send everything to the AI; filter first:
[Load Meeting] ──▶ [If: has recording?]
│
├── Yes ──▶ [AI: Analyze]
│
└── No ───▶ [Simple notification]
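The same guard in code form (a sketch; the field names follow the examples on this page, and both helper functions are hypothetical placeholders):

```python
def run_ai_analysis(summary: str) -> None:
    print(f"Analyzing {len(summary)} characters of summary")  # placeholder

def send_simple_notification(title: str) -> None:
    print(f"Notification sent for: {title}")  # placeholder

def process_meeting(meeting: dict) -> None:
    """Route to the AI only when there is a recording worth analyzing."""
    recording = meeting.get("callRecording") or {}
    if recording.get("transcriptSummary"):
        run_ai_analysis(recording["transcriptSummary"])  # tokens spent here
    else:
        send_simple_notification(meeting.get("title", "Untitled"))  # zero tokens
```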
Strategy 2: Use Lower Tiers
Lower model tiers often have lower per-token costs:
| Tier | Relative Cost |
|---|---|
| Low | $ |
| Medium | $$ |
| High | $$$ |
Strategy 3: Batch Similar Operations
One comprehensive prompt is cheaper than several simple prompts, because the shared context (metadata, summary) is sent only once:
❌ Multiple calls (context re-sent each time):
[AI: Extract summary] → context + output tokens
[AI: Extract action items] → context + output tokens
[AI: Extract sentiment] → context + output tokens
✅ Single call (context sent once):
[AI: Extract summary, action items, and sentiment] → context + output tokens
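A sketch of the single-call pattern, asking for all three results in one JSON object so the shared context is sent only once (prompt wording and field names are illustrative):

```python
import json

def build_combined_prompt(meeting_summary: str) -> str:
    """One prompt carries the shared context once, instead of three times."""
    return (
        "Analyze the meeting summary below. Respond with JSON only, shaped as:\n"
        '{"summary": "...", "action_items": ["..."], "sentiment": "..."}\n\n'
        "Meeting summary:\n" + meeting_summary
    )

# Parse the model's reply once instead of making three separate calls:
# result = json.loads(model_reply)
# result["summary"], result["action_items"], result["sentiment"]
```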
Strategy 4: Cache Results
If the same meeting might be processed multiple times, consider caching AI results so repeat runs cost no tokens.
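A minimal in-memory cache keyed by meeting ID plus a hash of the prompt (a sketch; `call_model` stands in for your actual AI call, and a persistent store may suit production better):

```python
import hashlib
from typing import Callable

_cache: dict[str, str] = {}

def cached_ai_call(meeting_id: str, prompt: str,
                   call_model: Callable[[str], str]) -> str:
    """Return the cached result when the same meeting/prompt pair repeats."""
    key = f"{meeting_id}:{hashlib.sha256(prompt.encode()).hexdigest()}"
    if key not in _cache:
        _cache[key] = call_model(prompt)  # tokens are spent only on a miss
    return _cache[key]
```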
Monitoring Token Usage
Execution Logs
Each AI execution shows:
- Input token count
- Output token count
- Total tokens used
Cost Tracking
Track over time:
- Average tokens per workflow
- Tokens by workflow type
- High-token workflows
Optimization Indicators
Watch for:
- Workflows using > 10,000 input tokens
- Full transcripts being processed
- Redundant AI calls in the same workflow
Token Usage by Task
Common Tasks
| Task | Typical Input | Typical Output | Total |
|---|---|---|---|
| Quick classification | 200-500 | 5-20 | ~500 |
| Meeting summary | 800-1,500 | 100-300 | ~1,500 |
| Action item extraction | 800-1,500 | 50-200 | ~1,500 |
| Full analysis | 1,000-2,000 | 300-500 | ~2,000 |
| Transcript analysis | 10,000-20,000 | 200-500 | ~15,000 |
| Research with agent | 2,000-5,000 | 500-1,000 | ~4,000 |
High vs Low Token Patterns
Low Token Pattern:
[Load Meeting] → [AI: Is urgent? (boolean)] → [Route]
~500 tokens
High Token Pattern:
[Load Meeting] → [AI: Full transcript analysis] → [Report]
~15,000+ tokens
Best Practices Summary
| Practice | Token Impact | Implementation |
|---|---|---|
| Use summaries | 10-40x reduction | transcriptSummary |
| Concise prompts | 2-5x reduction | Remove verbose instructions |
| Limit output | 2-3x reduction | Add length constraints |
| Filter first | Avoid unnecessary calls | Add If nodes |
| Batch operations | Avoid duplicate context | Combine prompts |
| Choose right tier | Cost efficiency | Match tier to task |
Token Limits and Errors
Input Too Long
If the input exceeds the model's context limit:
- Error: "Input exceeds maximum context length"
- Solution: Use a summary instead of the full content
Output Truncation
If the output is cut off:
- The response may be incomplete
- Solution: Request a shorter output, or split the task into multiple calls
Handling Token Errors
[AI Prompt]
├── Success ──▶ [Use output]
└── Error ────▶ [Retry with smaller input]
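A sketch of that retry path in code (assumes an OpenAI-style client where a context overflow surfaces as `openai.BadRequestError`; adapt the error type to your platform):

```python
import openai
from openai import OpenAI

client = OpenAI()

def summarize(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": "Summarize:\n" + text}],
    )
    return response.choices[0].message.content

def summarize_with_fallback(transcript: str, summary: str) -> str:
    """Try the full transcript first; on context overflow, retry with the summary."""
    try:
        return summarize(transcript)
    except openai.BadRequestError:
        # Input exceeded the context window; retry with the smaller input.
        return summarize(summary)
```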