Error Handling

Building resilient workflows that handle failures gracefully

Workflows interact with external systems that can fail. Building resilient workflows means handling errors gracefully, providing appropriate fallbacks, and ensuring issues are visible. This guide covers error handling strategies and best practices.

Understanding Errors

Error Types

Type	Cause	Example
External service	API failures, rate limits	Slack API down
Data issues	Missing or malformed data	No meeting transcript
Configuration	Wrong settings	Invalid channel ID
Logic	Expression errors	Accessing null field
Timeout	Operation too slow	AI node exceeds 120s

Error vs Success Channels

Many nodes have two output types:

text

[Node]
├─ Success ─→ Normal processing
└─ Error ───→ Error handling

Success output: Node completed normally
Error output: Node encountered a recoverable error

Retry Behavior

Automatic Retries

Nodes automatically retry on transient failures:

Setting	Default
Max attempts	3
Strategy	Exponential with jitter
Initial backoff	1 second
Max backoff	30 seconds

text

Attempt 1 ─fail─→ Wait ~1s → Attempt 2 ─fail─→ Wait ~2s → Attempt 3
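The schedule above can be approximated in a few lines of Python. This is a minimal sketch, not the platform's implementation: the function name `backoff_delay` and the jitter formula (scaling the base delay by a random factor between 0.5 and 1.5) are assumptions chosen to reproduce the "~1s, ~2s" waits shown in the diagram.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, initial: float = 1.0, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    rng = rng or random.Random()
    # Exponential growth: 1s, 2s, 4s, ... limited by the max backoff.
    base = min(cap, initial * 2 ** (attempt - 1))
    # Assumed jitter: spread the wait around the base so that many
    # executions failing together do not all retry at the same instant.
    return min(cap, base * rng.uniform(0.5, 1.5))


# With the default of 3 attempts there are two waits: after attempt 1 and 2.
delays = [backoff_delay(n) for n in (1, 2)]
```

With the default of 3 attempts the 30-second cap is never reached; it only matters if more attempts are configured.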

What Gets Retried

✅ Retried:

Network timeouts
Rate limit responses (429)
Temporary service errors (503)
Connection failures

❌ Not retried:

Invalid configuration
Permission errors (403)
Not found errors (404)
Validation failures
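The same split is useful in any custom code that calls external services from a workflow. The sketch below encodes only the cases listed above; the function name `is_retryable` and its parameters are hypothetical, and any status code not listed is treated as non-retryable purely as a conservative assumption.

```python
# Status codes taken from the lists above.
RETRYABLE_STATUS = {429, 503}


def is_retryable(status_code=None, network_error=False):
    """Return True when a failure looks transient and worth retrying."""
    if network_error:
        # Network timeouts and connection failures are retried.
        return True
    # 403, 404, and anything else unlisted falls through to "not retried".
    return status_code in RETRYABLE_STATUS
```

For example, `is_retryable(429)` is true, while `is_retryable(404)` is false, because repeating the same request would give the same answer.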

After Retries Exhaust

When max attempts are reached:

The error is sent to the node's error output (if connected)
If no error output is connected, the execution fails
Failed executions appear in the execution logs
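The flow from retry to error output can be illustrated as follows. This is a sketch under stated assumptions: `TransientError`, `run_node`, and the `error_output_connected` flag are hypothetical names used for illustration and do not correspond to a real API, and jitter is left out for brevity.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 429, 503)."""


def run_node(action, max_attempts=3, error_output_connected=True,
             sleep=time.sleep):
    """Run `action`, retrying transient failures, then route the result."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return ("success", action())
        except TransientError as exc:
            last_error = exc
            if attempt < max_attempts:
                # Waits 1s, then 2s, capped at 30s (jitter omitted here).
                sleep(min(30.0, 2.0 ** (attempt - 1)))
    if error_output_connected:
        # Retries exhausted: hand the error to the error branch.
        return ("error", str(last_error))
    # No error branch: the whole execution fails.
    raise RuntimeError(
        f"execution failed after {max_attempts} attempts: {last_error}")
```

Any exception other than `TransientError` propagates immediately, which mirrors the "not retried" cases above.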

Error Handling Patterns

Pattern: Log and Continue

Log the error, terminate the branch gracefully:

text

[Action Node]
├─ Success ─→ [Continue Processing]
└─ Error ───→ [Log Error] → [Sink]

Use when: The error is not critical and the workflow should still complete its other branches.

Pattern: Notify on Error

Alert when something fails:

text

[Action Node]
├─ Success ─→ [Continue]
└─ Error ───→ [Slack: "Error in workflow"] → [Sink]

Use when: Failures need human attention.

Pattern: Fallback Action

Try an alternative when primary fails:

text

[Primary Action]
├─ Success ─→ [Done]
└─ Error ───→ [Fallback Action] → [Done]

Use when: There's a reasonable alternative.
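Expressed in code, the fallback pattern is a try/except around the primary action. The sketch below is illustrative only; `with_fallback` and the two callables passed to it are hypothetical stand-ins for workflow nodes.

```python
import logging

logger = logging.getLogger("workflow")


def with_fallback(primary, fallback):
    """Run `primary`; if it raises, log the error and run `fallback`."""
    try:
        return primary()
    except Exception as exc:
        # Log before falling back so the failure stays visible.
        logger.warning("Primary action failed, using fallback: %s", exc)
        return fallback()
```

A call such as `with_fallback(post_to_slack, send_email)` returns the Slack result when posting works and the email result when it does not. If the fallback itself raises, that error propagates, so the last resort should be the most reliable action available.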

Pattern: Graceful Degradation

Do something simpler when the full action fails:

text

[AI: Complex Analysis]
├─ Success ─→ [Rich Notification]
└─ Error ───→ [Simple Notification] → [Sink]

Use when: Partial success is better than complete failure.

Pattern: Validation Before Action

Check data before attempting error-prone operations:

text

[If: data is valid?]
├─ Yes ─→ [Action] → [Continue]
└─ No ──→ [Log: "Invalid data"] → [Sink]

Use when: You can predict failure conditions.
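A validation step can be as simple as a function that returns a list of problems, with an empty list meaning the data is safe to process. The field names `id` and `transcript` below are assumptions for illustration, not the actual meeting schema.

```python
from typing import List


def validate_meeting(meeting: dict) -> List[str]:
    """Return a list of problems; an empty list means the data is valid."""
    problems = []
    if not meeting.get("id"):
        problems.append("missing meeting id")
    if not meeting.get("transcript"):
        # Covers both an absent field and an empty transcript.
        problems.append("no transcript")
    return problems


problems = validate_meeting({"id": "m1"})
if problems:
    # "No" branch: log and stop instead of attempting the action.
    print("Invalid data: " + ", ".join(problems))
```

Returning every problem at once, rather than stopping at the first, makes the logged message more useful to whoever investigates.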

Error Handling by Node Type

Loader Nodes (Load Meeting)

Common errors:

Meeting not found
No recording/transcript
Permission denied

Handling:

text

[Load Meeting]
├─ Success ─→ [Process with data]
└─ Error ───→ [Slack: "Could not load meeting"] → [Sink]

AI Nodes

Common errors:

Timeout (>120s)
Parse error (output doesn't match type)
Content filter triggered

Handling:

text

[AI Prompt]
├─ Success ─→ [Use AI output]
└─ Error ───→ [Use default/fallback] → [Continue]

Action Nodes (Slack, Email, CRM)

Common errors:

Authentication failure
Rate limits
Invalid recipient/channel

Handling:

text

[Slack Post]
├─ Success ─→ [Done]
└─ Error ───→ [Email fallback] → [Sink]

Building Resilient Workflows

1. Always Connect Error Outputs

❌ Bad: Error output disconnected

text

[Node]
├─ Success ─→ [Next]
└─ Error ───→ (nothing)

✅ Good: Error output handled

text

[Node]
├─ Success ─→ [Next]
└─ Error ───→ [Handle] → [Sink]

2. Validate Early

Check conditions before complex processing:

text

      [Load Meeting]
            ↓
   [If: has transcript?]
   ├─ Yes ─→ [AI Analysis] → [Send]
   └─ No ──→ [Simple notification] → [Sink]

3. Provide Context in Error Messages

When alerting on errors, include useful context:

liquid

⚠️ Workflow Error

*Workflow:* Post-Meeting Summary
*Meeting:* {{ trigger.meetingPlanId }}
*Error:* {{ json.error }}
*Time:* {{ trigger.firedAt }}

Please investigate.

4. Don't Swallow Errors Silently

Avoid hiding problems:

❌ Bad: Error goes to Sink with no logging

✅ Good: Error is logged/alerted, then Sink

5. Use Appropriate Timeouts

AI nodes time out after 120 seconds. To keep complex prompts within that limit:

Keep prompts concise
Reduce input data size
Consider model tier

6. Plan for Partial Failures

What if some parts succeed and others fail?

text

[Send Email] → Success
[Update CRM] → Error

Design workflows so that partial completion is acceptable, or add compensation logic that reverses or corrects the steps that already succeeded.
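One way to think about compensation logic is as a list of steps, each paired with an optional "undo" action that runs if a later step fails. The sketch below is a generic illustration; `run_steps` and the step names are hypothetical, and a real workflow would express the same idea with error branches.

```python
def run_steps(steps):
    """Run (name, action, compensate) tuples in order.

    If a step raises, compensations for the steps that already
    succeeded run in reverse order. `compensate` may be None.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            compensated = []
            for done_name, undo in reversed(completed):
                if undo is not None:
                    undo()
                    compensated.append(done_name)
            return {"status": "failed", "failed_step": name,
                    "error": str(exc), "compensated": compensated}
        completed.append((name, compensate))
    return {"status": "ok"}
```

Some actions cannot be undone. A sent email, for instance, can only be followed by a correction, so its compensation is a second message rather than a true rollback.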

Monitoring and Debugging

Execution Logs

View detailed execution history:

Each node's status
Input/output data
Error messages and stack traces

Error Indicators

Watch for patterns:

High failure rates on specific nodes
Time-based failures (rate limits)
Data-related failures (specific inputs)
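If execution logs can be exported, per-node failure rates are easy to compute. The sketch below assumes the logs have been reduced to `(node_name, succeeded)` pairs; that record shape and the function name `failure_rates` are assumptions, not a documented export format.

```python
from collections import Counter


def failure_rates(executions):
    """Map each node name to its share of failed runs (0.0 to 1.0)."""
    totals, failures = Counter(), Counter()
    for node, succeeded in executions:
        totals[node] += 1
        if not succeeded:
            failures[node] += 1
    return {node: failures[node] / totals[node] for node in totals}


rates = failure_rates([
    ("Slack Post", True),
    ("Slack Post", False),
    ("AI Prompt", True),
])
# rates == {"Slack Post": 0.5, "AI Prompt": 0.0}
```

Grouping the same records by hour or by input type instead of by node name would surface the time-based and data-related patterns.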

Alerting Strategy

For critical workflows:

Immediate alert: Send Slack/SMS on failure
Log all errors: Maintain history for debugging
Track metrics: Monitor success/failure rates

Error Response Checklist

When a workflow fails:

Identify the failing node - Check execution logs
Understand the error - Read error message/code
Determine root cause - Data? Service? Config?
Fix the issue - Update workflow or fix external cause
Retry if needed - Some failures are temporary
Add prevention - Update workflow to handle edge case

Best Practices Summary

Practice	Description
Connect all error outputs	Never leave errors unhandled
Log errors	Maintain visibility into failures
Alert on critical failures	Get human attention when needed
Validate data early	Catch problems before complex processing
Provide fallbacks	Have alternatives for non-critical failures
Include context	Error messages should be actionable
Test error paths	Verify error handling works
Monitor patterns	Watch for recurring issues

Example: Complete Error Handling

text

        [Event Trigger: MEETING_ENDED]
                      ↓
                [Load Meeting]
                ↓           ↓
             Success      Error → [Slack: "Load failed"] → [Sink]
                ↓
       [If: has transcript?]
       ↓                 ↓
      Yes                No → [Slack: "No transcript"] → [Sink]
       ↓
  [AI: Analyze]
   ↓         ↓
Success    Error → [Slack: "AI failed"] → [Sink]
   ↓
[Slack Post]
 ↓         ↓
Success  Error → [Email fallback] → [Sink]
 ↓
[Done]

This workflow:

Handles Load Meeting failures
Checks for transcript availability
Alerts on AI failures
Has fallback for Slack failures
Provides notifications at each failure point