Error Handling
Building resilient workflows that handle failures gracefully
Workflows interact with external systems that can fail. Building resilient workflows means handling errors gracefully, providing appropriate fallbacks, and ensuring issues are visible. This guide covers error handling strategies and best practices.
Understanding Errors
Error Types
| Type | Cause | Example |
|---|---|---|
| External service | API failures, rate limits | Slack API down |
| Data issues | Missing or malformed data | No meeting transcript |
| Configuration | Wrong settings | Invalid channel ID |
| Logic | Expression errors | Accessing null field |
| Timeout | Operation too slow | AI node exceeds 120s |
Error vs Success Channels
Many nodes have two output types:
[Node]
├─ Success ─→ Normal processing
└─ Error ───→ Error handling
- Success output: Node completed normally
- Error output: Node encountered a recoverable error
Retry Behavior
Automatic Retries
Nodes automatically retry on transient failures:
| Setting | Default |
|---|---|
| Max attempts | 3 |
| Strategy | Exponential with jitter |
| Initial backoff | 1 second |
| Max backoff | 30 seconds |
Attempt 1 ─fail─→ Wait ~1s → Attempt 2 ─fail─→ Wait ~2s → Attempt 3
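To see how the wait times grow under these defaults, here is a minimal Python sketch. It is an illustration, not the platform's implementation: the function name `backoff_delay`, the doubling factor, and the ±20% jitter range are assumptions. The source only states exponential backoff with jitter, a 1-second initial backoff, and a 30-second cap.

```python
import random

INITIAL_BACKOFF_S = 1.0  # from the defaults table
MAX_BACKOFF_S = 30.0     # from the defaults table


def backoff_delay(failed_attempt: int, jitter: float = 0.2) -> float:
    """Seconds to wait after the given failed attempt (1-based).

    Doubling per attempt and +/-20% jitter are assumptions for illustration.
    """
    base = min(INITIAL_BACKOFF_S * 2 ** (failed_attempt - 1), MAX_BACKOFF_S)
    jittered = base * random.uniform(1 - jitter, 1 + jitter)
    return min(jittered, MAX_BACKOFF_S)  # never exceed the cap


if __name__ == "__main__":
    # With 3 max attempts, a wait happens only after attempts 1 and 2.
    for attempt in (1, 2):
        print(f"after attempt {attempt}: wait {backoff_delay(attempt):.2f}s")
```

The jitter spreads retries out so that many executions failing at the same moment do not all retry at the same instant.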
What Gets Retried
✅ Retried:
- Network timeouts
- Rate limit responses (429)
- Temporary service errors (503)
- Connection failures
❌ Not retried:
- Invalid configuration
- Permission errors (403)
- Not found errors (404)
- Validation failures
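The two lists above amount to a simple classification rule. The sketch below expresses it in Python; `is_retryable` and its parameters are hypothetical names, and treating every unlisted status code as non-retryable is a conservative assumption rather than documented behavior.

```python
from typing import Optional

RETRYABLE_STATUS = {429, 503}  # rate limit, temporary service error


def is_retryable(status_code: Optional[int] = None, network_error: bool = False) -> bool:
    """Decide whether a failure is worth retrying, following the lists above."""
    if network_error:  # network timeouts and connection failures
        return True
    # 403, 404, validation and configuration errors fall through to False.
    return status_code in RETRYABLE_STATUS
```

For example, `is_retryable(429)` returns `True`, while `is_retryable(404)` returns `False` because retrying cannot make a missing resource appear.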
After Retries Are Exhausted
When the maximum number of attempts is reached:
- The error is sent to the error output (if connected)
- If no error output is connected, the execution fails
- Failed executions appear in logs
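The routing described above can be modeled in a few lines of Python. This is a sketch under stated assumptions: `run_node`, `TransientError`, and `ExecutionFailed` are hypothetical names invented for the example, and the waits between attempts are omitted to keep it short.

```python
import logging

logger = logging.getLogger("workflow")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


class ExecutionFailed(Exception):
    """Raised when retries run out and no error output is connected."""


def run_node(action, error_output=None, max_attempts=3):
    """Run `action`, retrying transient failures up to `max_attempts` times."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return action()  # success output
        except TransientError as exc:
            last_error = exc
            # Every failed attempt is logged, whether or not it is handled later.
            logger.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            # A real runner would wait here before the next attempt.
    if error_output is not None:
        return error_output(last_error)  # error output is connected
    raise ExecutionFailed(str(last_error)) from last_error
```

Passing an `error_output` callable corresponds to connecting the error output; leaving it as `None` corresponds to a disconnected output, where the whole execution fails.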
Error Handling Patterns
Pattern: Log and Continue
Log the error, then terminate the branch gracefully:
[Action Node]
├─ Success ─→ [Continue Processing]
└─ Error ───→ [Log Error] → [Sink]
Use when: The error is not critical and the workflow should still complete its other branches.
Pattern: Notify on Error
Alert when something fails:
[Action Node]
├─ Success ─→ [Continue]
└─ Error ───→ [Slack: "Error in workflow"] → [Sink]
Use when: Failures need human attention.
Pattern: Fallback Action
Try an alternative when primary fails:
[Primary Action]
├─ Success ─→ [Done]
└─ Error ───→ [Fallback Action] → [Done]
Use when: There's a reasonable alternative.
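In code terms, the fallback pattern is a try/except around the primary action. The Python sketch below is only an analogy for the node wiring; `with_fallback` is a hypothetical helper, and catching every `Exception` is a simplification for the example.

```python
import logging

logger = logging.getLogger("workflow")


def with_fallback(primary, fallback):
    """Return the primary result, or the fallback result if the primary raises."""
    try:
        return primary()
    except Exception as exc:
        # Log before falling back so the failure stays visible.
        logger.warning("Primary action failed, using fallback: %s", exc)
        return fallback()
```

Note that the failure is logged before the fallback runs, so the alternative path does not hide the original problem.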
Pattern: Graceful Degradation
Do something simpler when the full action fails:
[AI: Complex Analysis]
├─ Success ─→ [Rich Notification]
└─ Error ───→ [Simple Notification] → [Sink]
Use when: Partial success is better than complete failure.
Pattern: Validation Before Action
Check data before attempting error-prone operations:
[If: data is valid?]
├─ Yes ─→ [Action] → [Continue]
└─ No ──→ [Log: "Invalid data"] → [Sink]
Use when: You can predict failure conditions.
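The check inside the If node can be thought of as a small validation function. The Python sketch below assumes a meeting is a dictionary with `id` and `transcript` keys; those field names and the function `find_problems` are hypothetical, chosen only to illustrate the idea.

```python
def find_problems(meeting: dict) -> list:
    """Return a list of reasons the meeting data is not ready for processing."""
    problems = []
    if not meeting.get("id"):
        problems.append("missing meeting id")
    transcript = meeting.get("transcript")
    if not isinstance(transcript, str) or not transcript.strip():
        problems.append("missing or empty transcript")
    return problems
```

An empty list corresponds to the Yes branch. A non-empty list corresponds to the No branch, and its contents make a more useful log entry than a bare "Invalid data".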
Error Handling by Node Type
Loader Nodes (Load Meeting)
Common errors:
- Meeting not found
- No recording/transcript
- Permission denied
Handling:
[Load Meeting]
├─ Success ─→ [Process with data]
└─ Error ───→ [Slack: "Could not load meeting"] → [Sink]
AI Nodes
Common errors:
- Timeout (>120s)
- Parse error (output doesn't match type)
- Content filter triggered
Handling:
[AI Prompt]
├─ Success ─→ [Use AI output]
└─ Error ───→ [Use default/fallback] → [Continue]
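For parse errors in particular, the "use default" branch can be sketched in Python as follows. The expected shape (a `summary` string and an `action_items` list) and the names `parse_ai_output` and `DEFAULT_ANALYSIS` are assumptions made for the example, not the platform's output types.

```python
import copy
import json

# Hypothetical default used when the AI output cannot be trusted.
DEFAULT_ANALYSIS = {"summary": "Summary unavailable.", "action_items": []}


def parse_ai_output(raw: str) -> dict:
    """Return the parsed output if it matches the expected shape, else a default."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return copy.deepcopy(DEFAULT_ANALYSIS)
    shape_ok = (
        isinstance(data, dict)
        and isinstance(data.get("summary"), str)
        and isinstance(data.get("action_items"), list)
    )
    return data if shape_ok else copy.deepcopy(DEFAULT_ANALYSIS)
```

Returning a fresh copy of the default keeps downstream steps from accidentally modifying the shared fallback value.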
Action Nodes (Slack, Email, CRM)
Common errors:
- Authentication failure
- Rate limits
- Invalid recipient/channel
Handling:
[Slack Post]
├─ Success ─→ [Done]
└─ Error ───→ [Email fallback] → [Sink]
Building Resilient Workflows
1. Always Connect Error Outputs
❌ Bad: Error output disconnected
[Node]
├─ Success ─→ [Next]
└─ Error ───→ (nothing)
✅ Good: Error output handled
[Node]
├─ Success ─→ [Next]
└─ Error ───→ [Handle] → [Sink]
2. Validate Early
Check conditions before complex processing:
[Load Meeting]
↓
[If: has transcript?]
├─ Yes ─→ [AI Analysis] → [Send]
└─ No ──→ [Simple notification] → [Sink]
3. Provide Context in Error Messages
When alerting on errors, include useful context:
⚠️ Workflow Error
*Workflow:* Post-Meeting Summary
*Meeting:* {{ trigger.meetingPlanId }}
*Error:* {{ json.error }}
*Time:* {{ trigger.firedAt }}
Please investigate.
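If alerts are built outside the workflow editor, for example in a script that posts to the same channel, the same fields can be assembled in Python. The function `format_error_alert` and its parameter names are hypothetical; the layout simply mirrors the template above.

```python
def format_error_alert(workflow: str, meeting_plan_id: str, error: str, fired_at: str) -> str:
    """Build an alert message with the same fields as the template above."""
    return "\n".join([
        "⚠️ Workflow Error",
        f"*Workflow:* {workflow}",
        f"*Meeting:* {meeting_plan_id}",
        f"*Error:* {error}",
        f"*Time:* {fired_at}",
        "Please investigate.",
    ])
```

Whichever way the message is produced, the goal is the same: someone reading the alert should know which workflow, which meeting, and which error without opening the logs first.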
4. Don't Swallow Errors Silently
Avoid hiding problems:
❌ Bad: Error goes to Sink with no logging
✅ Good: Error is logged/alerted, then Sink
5. Use Appropriate Timeouts
AI nodes time out after 120 seconds. To keep complex prompts within that limit:
- Keep prompts concise
- Reduce input data size
- Consider a different model tier
6. Plan for Partial Failures
What if some parts succeed and others fail?
[Send Email] → Success
[Update CRM] → Error
Design workflows where partial completion is acceptable or add compensation logic.
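One way to think about compensation logic is as an "undo" action paired with each step. The Python sketch below is a generic illustration, not a platform feature: `run_with_compensation` and the shape of the `steps` list are assumptions.

```python
def run_with_compensation(steps):
    """Run steps in order; if one fails, undo the completed ones in reverse.

    `steps` is a list of (name, action, compensate) tuples, where `compensate`
    may be None for steps that cannot or need not be undone.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            compensated = []
            for done_name, undo in reversed(completed):
                if undo is not None:
                    undo()
                    compensated.append(done_name)
            return {"status": "failed", "failed_step": name,
                    "error": str(exc), "compensated": compensated}
        completed.append((name, compensate))
    return {"status": "ok", "failed_step": None, "error": None, "compensated": []}
```

In the email and CRM case above, an email cannot be unsent, so its compensating action might be a follow-up correction, or `None` if partial completion is acceptable.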
Monitoring and Debugging
Execution Logs
View detailed execution history:
- Each node's status
- Input/output data
- Error messages and stack traces
Error Indicators
Watch for patterns:
- High failure rates on specific nodes
- Time-based failures (rate limits)
- Data-related failures (specific inputs)
Alerting Strategy
For critical workflows:
- Immediate alert: Send Slack/SMS on failure
- Log all errors: Maintain history for debugging
- Track metrics: Monitor success/failure rates
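Tracking success and failure rates can start very simply. The Python sketch below computes a per-node failure rate from a list of `(node_name, status)` pairs; that input format and the function name `failure_rates` are assumptions, since the structure of exported execution logs is not described here.

```python
from collections import Counter


def failure_rates(executions):
    """Map each node name to the fraction of its runs that ended in "error"."""
    totals, failures = Counter(), Counter()
    for node, status in executions:
        totals[node] += 1
        if status == "error":
            failures[node] += 1
    return {node: failures[node] / totals[node] for node in totals}
```

A node whose rate climbs well above the others is a good first candidate for a fallback or an early validation step.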
Error Response Checklist
When a workflow fails:
1. Identify the failing node: check the execution logs
2. Understand the error: read the error message or code
3. Determine the root cause: data, service, or configuration?
4. Fix the issue: update the workflow or fix the external cause
5. Retry if needed: some failures are temporary
6. Add prevention: update the workflow to handle the edge case
Best Practices Summary
| Practice | Description |
|---|---|
| Connect all error outputs | Never leave errors unhandled |
| Log errors | Maintain visibility into failures |
| Alert on critical failures | Get human attention when needed |
| Validate data early | Catch problems before complex processing |
| Provide fallbacks | Have alternatives for non-critical failures |
| Include context | Error messages should be actionable |
| Test error paths | Verify error handling works |
| Monitor patterns | Watch for recurring issues |
Example: Complete Error Handling
[Event Trigger: MEETING_ENDED]
↓
[Load Meeting]
├─ Error ───→ [Slack: "Load failed"] → [Sink]
└─ Success
↓
[If: has transcript?]
├─ No ──→ [Slack: "No transcript"]
└─ Yes
↓
[AI: Analyze]
├─ Error ───→ [Slack: "AI failed"]
└─ Success
↓
[Slack Post]
├─ Error ───→ [Email fallback]
└─ Success
↓
[Done]
This workflow:
- Handles Load Meeting failures
- Checks for transcript availability
- Sends a Slack notification when the AI node fails
- Falls back to email when the Slack post fails
- Provides notifications at each failure point
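For readers who find control flow easier to follow in code, the diagram above maps onto the Python sketch below. Every argument is a caller-supplied callable standing in for a node, and the function name, the return labels, and the `transcript` key are hypothetical.

```python
def run_post_meeting_workflow(load_meeting, analyze, post_to_slack, send_email, notify):
    """Mirror the diagram above; each argument stands in for one node."""
    try:
        meeting = load_meeting()
    except Exception as exc:
        notify(f"Load failed: {exc}")
        return "load_failed"

    if not meeting.get("transcript"):  # validate before the expensive AI step
        notify("No transcript")
        return "no_transcript"

    try:
        summary = analyze(meeting["transcript"])
    except Exception as exc:
        notify(f"AI failed: {exc}")
        return "ai_failed"

    try:
        post_to_slack(summary)
    except Exception:
        send_email(summary)  # email fallback
        return "done_via_email"
    return "done"
```

Each early return corresponds to a branch that ends at a notification or fallback in the diagram, so no failure leaves the workflow silently.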