Token Usage
Understanding and managing AI token consumption
Tokens are the units of text that AI models process. Understanding token usage helps you optimize performance and manage costs.
What Are Tokens?
Tokens are pieces of text that AI models read and generate:
- A token is roughly 4 characters or 0.75 words in English
- "Hello, world!" ≈ 4 tokens
- A 1,000-word document ≈ 1,300 tokens
- A meeting transcript might be 5,000-50,000 tokens
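Exact counts depend on the model's tokenizer. For a precise measurement you can tokenize text yourself; below is a minimal sketch using OpenAI's tiktoken library (an assumption here, since your model may use a different tokenizer):

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by many recent OpenAI models;
# substitute your model's encoding if it differs.
enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    """Exact token count for `text` under this encoding."""
    return len(enc.encode(text))

print(count_tokens("Hello, world!"))  # 4
```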
Token Examples
| Text | Approximate Tokens |
|---|---|
| "Hello" | 1 |
| "Meeting summary" | 2 |
| Short sentence | 10-15 |
| Paragraph | 50-100 |
| 1 page of text | 400-500 |
| AI-generated meeting summary | 200-500 |
| Full transcript | 5,000-50,000 |
Token Components
Each AI request consumes tokens in two categories:
Input Tokens (Prompt)
What you send to the AI:
- System message
- User message
- Context data (meeting info, transcript)
Output Tokens (Completion)
What the AI generates:
- The response
- Structured output
Total tokens = Input tokens + Output tokens
Token Limits
Per Request Limits
| Component | Limit |
|---|---|
| Maximum input | ~128,000 tokens (varies by model) |
| Maximum output | ~4,000-8,000 tokens (varies by model) |
Practical Limits
For optimal performance:
- Keep prompts under 10,000 tokens
- Target outputs under 1,000 tokens
- Use summaries instead of full transcripts
Reducing Token Usage
1. Use Transcript Summaries
The biggest token savings come from using summaries:
❌ High token usage:
{{ json.callRecording.transcript }}
// Could be 20,000+ tokens
✅ Low token usage:
{{ json.callRecording.transcriptSummary }}
// Typically 500-1,000 tokens
Impact: 10-40x reduction in input tokens
2. Include Only Relevant Data
Only include data the AI needs:
❌ Everything:
Meeting: {{ json.meeting | json }}
// Includes unnecessary metadata
✅ Only relevant fields:
Meeting: {{ json.meeting.title }}
Attendees: {{ json.meeting.attendees | map: "name" | join: ", " }}
Summary: {{ json.callRecording.transcriptSummary }}
3. Keep Prompts Concise
Every word in your prompt uses tokens:
❌ Verbose:
I would like you to carefully analyze the following meeting
transcript and then provide me with a comprehensive summary
that captures all of the key points and discussions...
✅ Concise:
Summarize this meeting in 3 bullet points.
Focus on decisions and next steps.
4. Limit Output Length
Constrain the output to the length you actually need:
Provide a summary in exactly 2-3 sentences.
Maximum 100 words.
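If you have direct access to the underlying model API, you can also enforce a hard cap on output length. A sketch using the OpenAI Python SDK's `max_tokens` parameter (assumed here; the model name is illustrative, and your workflow platform may expose an equivalent setting instead):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{
        "role": "user",
        "content": "Summarize this meeting in 2-3 sentences. Maximum 100 words.",
    }],
    # Hard cap on output tokens; the prompt instruction above keeps the
    # model from being cut off mid-sentence by this limit.
    max_tokens=150,
)
print(response.choices[0].message.content)
```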
5. Use Appropriate Return Types
Simple types use fewer output tokens:
| Return Type | Output Tokens |
|---|---|
| boolean | 1 |
| integer | 1-2 |
| string (short) | 10-50 |
| string (long) | 100-500 |
| string_list | 20-100 |
Token Estimation
Quick Estimates
| Content Type | Tokens |
|---|---|
| System prompt | 50-200 |
| User message (basic) | 20-50 |
| Meeting metadata | 30-100 |
| Transcript summary | 500-1,500 |
| Full transcript (30 min) | 10,000-20,000 |
| Full transcript (1 hour) | 20,000-40,000 |
| Summary output | 100-300 |
| List output (5 items) | 50-100 |
Calculating Total Usage
Example: Meeting Summary Workflow
Input:
- System prompt: 100 tokens
- Meeting metadata: 50 tokens
- Transcript summary: 800 tokens
- Total input: 950 tokens
Output:
- Summary: 150 tokens
Total: ~1,100 tokens per execution
Example: Full Transcript Analysis
Input:
- System prompt: 100 tokens
- Meeting metadata: 50 tokens
- Full transcript: 15,000 tokens
- Total input: 15,150 tokens
Output:
- Analysis: 500 tokens
Total: ~15,650 tokens per execution
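These component estimates compose into a simple budget check before you run a workflow. A minimal sketch (the figures mirror the meeting-summary example above):

```python
def estimate_total_tokens(input_components: dict[str, int], output: int) -> int:
    """Total tokens = sum of input components + expected output tokens."""
    return sum(input_components.values()) + output

total = estimate_total_tokens(
    {"system_prompt": 100, "meeting_metadata": 50, "transcript_summary": 800},
    output=150,
)
print(total)  # 1100, matching the ~1,100 tokens per execution above
```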
Cost Optimization Strategies
Strategy 1: Filter Before AI
Don't send everything to the AI; filter first:
[Load Meeting] ──▶ [If: has recording?]
│
├── Yes ──▶ [AI: Analyze]
│
└── No ───▶ [Simple notification]
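The same guard in code form (a sketch; the field names follow the examples on this page, and both helper functions are hypothetical placeholders):

```python
def run_ai_analysis(summary: str) -> None:
    print(f"Analyzing {len(summary)} characters of summary")  # placeholder

def send_simple_notification(title: str) -> None:
    print(f"Notification sent for: {title}")  # placeholder

def process_meeting(meeting: dict) -> None:
    """Route to the AI only when there is a recording worth analyzing."""
    recording = meeting.get("callRecording") or {}
    if recording.get("transcriptSummary"):
        run_ai_analysis(recording["transcriptSummary"])  # tokens spent here
    else:
        send_simple_notification(meeting.get("title", "Untitled"))  # zero tokens
```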
Strategy 2: Use Lower Tiers
Lower model tiers often have lower per-token costs:
| Tier | Relative Cost |
|---|---|
| Low | $ |
| Medium | $$ |
| High | $$$ |
Strategy 3: Batch Similar Operations
One comprehensive prompt is cheaper than several simple prompts, because the shared context (metadata, summary) is sent only once:
❌ Multiple calls (context re-sent each time):
[AI: Extract summary] → context + output tokens
[AI: Extract action items] → context + output tokens
[AI: Extract sentiment] → context + output tokens
✅ Single call (context sent once):
[AI: Extract summary, action items, and sentiment] → context + output tokens
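A sketch of the single-call pattern, asking for all three results in one JSON object so the shared context is sent only once (prompt wording and field names are illustrative):

```python
import json

def build_combined_prompt(meeting_summary: str) -> str:
    """One prompt carries the shared context once, instead of three times."""
    return (
        "Analyze the meeting summary below. Respond with JSON only, shaped as:\n"
        '{"summary": "...", "action_items": ["..."], "sentiment": "..."}\n\n'
        "Meeting summary:\n" + meeting_summary
    )

# Parse the model's reply once instead of making three separate calls:
# result = json.loads(model_reply)
# result["summary"], result["action_items"], result["sentiment"]
```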
Strategy 4: Cache Results
If the same meeting might be processed multiple times, consider caching AI results so repeat runs cost no tokens.
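A minimal in-memory cache keyed by meeting ID plus a hash of the prompt (a sketch; `call_model` stands in for your actual AI call, and a persistent store may suit production better):

```python
import hashlib
from typing import Callable

_cache: dict[str, str] = {}

def cached_ai_call(meeting_id: str, prompt: str,
                   call_model: Callable[[str], str]) -> str:
    """Return the cached result when the same meeting/prompt pair repeats."""
    key = f"{meeting_id}:{hashlib.sha256(prompt.encode()).hexdigest()}"
    if key not in _cache:
        _cache[key] = call_model(prompt)  # tokens are spent only on a miss
    return _cache[key]
```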
Monitoring Token Usage
Execution Logs
Each AI execution shows:
- Input token count
- Output token count
- Total tokens used
Cost Tracking
Track over time:
- Average tokens per workflow
- Tokens by workflow type
- High-token workflows
Optimization Indicators
Watch for:
- Workflows using > 10,000 input tokens
- Full transcripts being processed
- Redundant AI calls in the same workflow
Token Usage by Task
Common Tasks
| Task | Typical Input | Typical Output | Total |
|---|---|---|---|
| Quick classification | 200-500 | 5-20 | ~500 |
| Meeting summary | 800-1,500 | 100-300 | ~1,500 |
| Action item extraction | 800-1,500 | 50-200 | ~1,500 |
| Full analysis | 1,000-2,000 | 300-500 | ~2,000 |
| Transcript analysis | 10,000-20,000 | 200-500 | ~15,000 |
| Research with agent | 2,000-5,000 | 500-1,000 | ~4,000 |
High vs Low Token Patterns
Low Token Pattern:
[Load Meeting] → [AI: Is urgent? (boolean)] → [Route]
~500 tokens
High Token Pattern:
[Load Meeting] → [AI: Full transcript analysis] → [Report]
~15,000+ tokens
Best Practices Summary
| Practice | Token Impact | Implementation |
|---|---|---|
| Use summaries | 10-40x reduction | transcriptSummary |
| Concise prompts | 2-5x reduction | Remove verbose instructions |
| Limit output | 2-3x reduction | Add length constraints |
| Filter first | Avoid unnecessary calls | Add If nodes |
| Batch operations | Avoid duplicate context | Combine prompts |
| Choose right tier | Cost efficiency | Match tier to task |
Token Limits and Errors
Input Too Long
If the input exceeds the model's context limit:
- Error: "Input exceeds maximum context length"
- Solution: Use a summary instead of the full content
Output Truncation
If the output is cut off:
- The response may be incomplete
- Solution: Request a shorter output, or split the task into multiple calls
Handling Token Errors
[AI Prompt]
├── Success ──▶ [Use output]
└── Error ────▶ [Retry with smaller input]
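A sketch of that retry path in code (assumes an OpenAI-style client where a context overflow surfaces as `openai.BadRequestError`; adapt the error type to your platform):

```python
import openai
from openai import OpenAI

client = OpenAI()

def summarize(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": "Summarize:\n" + text}],
    )
    return response.choices[0].message.content

def summarize_with_fallback(transcript: str, summary: str) -> str:
    """Try the full transcript first; on context overflow, retry with the summary."""
    try:
        return summarize(transcript)
    except openai.BadRequestError:
        # Input exceeded the context window; retry with the smaller input.
        return summarize(summary)
```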