Error Handling

Building resilient workflows that handle failures gracefully

Workflows interact with external systems that can fail. Building resilient workflows means handling errors gracefully, providing appropriate fallbacks, and ensuring issues are visible. This guide covers error handling strategies and best practices.

Understanding Errors

Error Types

| Type | Cause | Example |
| --- | --- | --- |
| External service | API failures, rate limits | Slack API down |
| Data issues | Missing or malformed data | No meeting transcript |
| Configuration | Wrong settings | Invalid channel ID |
| Logic | Expression errors | Accessing null field |
| Timeout | Operation too slow | AI node exceeds 120s |

Error vs Success Channels

Many nodes have two output types:

text
[Node]
├─ Success ─→ Normal processing
└─ Error ───→ Error handling

  • Success output: Node completed normally
  • Error output: Node encountered a recoverable error

Retry Behavior

Automatic Retries

Nodes automatically retry on transient failures:

| Setting | Default |
| --- | --- |
| Max attempts | 3 |
| Strategy | Exponential with jitter |
| Initial backoff | 1 second |
| Max backoff | 30 seconds |

text
Attempt 1 ─fail─→ Wait ~1s → Attempt 2 ─fail─→ Wait ~2s → Attempt 3
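
The schedule above can be reproduced with a short Python sketch. The function name `backoff_delays` and the ±25% jitter range are assumptions made for illustration; the source only says the strategy is exponential with jitter, starting at 1 second and capped at 30 seconds.

```python
import random


def backoff_delays(max_attempts=3, initial=1.0, cap=30.0, jitter=0.25, rng=None):
    """Return the wait in seconds before each retry.

    Hypothetical helper: the jitter range (+/-25%) is an assumption,
    not a documented platform setting.
    """
    rng = rng or random.Random()
    delays = []
    # With N attempts there are N - 1 waits between them.
    for retry in range(max_attempts - 1):
        base = min(cap, initial * 2 ** retry)          # 1s, 2s, 4s, ... up to the cap
        factor = rng.uniform(1 - jitter, 1 + jitter)   # spread retries apart
        delays.append(min(cap, base * factor))
    return delays


if __name__ == "__main__":
    print(backoff_delays())  # two waits: roughly 1s, then roughly 2s
```

With the default of 3 attempts there are only two waits, which matches the diagram: about 1 second, then about 2 seconds.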

What Gets Retried

Retried:

  • Network timeouts
  • Rate limit responses (429)
  • Temporary service errors (503)
  • Connection failures

Not retried:

  • Invalid configuration
  • Permission errors (403)
  • Not found errors (404)
  • Validation failures
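
As a rough illustration of the split above, the following sketch classifies a failure as retryable or not. The function `is_retryable` and its parameters are hypothetical; only the status codes and error kinds come from the lists above.

```python
# Status codes taken from the lists above.
RETRYABLE_STATUS = {429, 503}


def is_retryable(status=None, network_error=False):
    """Return True if a failure looks transient and is worth retrying.

    Hypothetical helper: `network_error` stands in for timeouts and
    connection failures, which carry no HTTP status code.
    """
    if network_error:
        return True
    # 403, 404, validation and configuration errors fall through to False.
    return status in RETRYABLE_STATUS


if __name__ == "__main__":
    print(is_retryable(status=429))
    print(is_retryable(status=404))
```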

After Retries Are Exhausted

When max attempts are reached:

  1. The error is sent to the node's error output (if connected)
  2. If no error output is connected, the execution fails
  3. Failed executions appear in the execution logs
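
The routing described above can be sketched as a small retry loop. This is a simplified model, not the platform's implementation: `run_node`, `TransientError`, and the `error_output` callback are hypothetical names, and jitter is left out to keep the example short.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_node(action, max_attempts=3, error_output=None, sleep=time.sleep):
    """Run `action`, retrying transient failures, then route the result."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return ("success", action())
        except TransientError as exc:
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff: 1s, 2s, ... capped at 30s.
                sleep(min(30.0, 2.0 ** (attempt - 1)))
    if error_output is not None:
        # Error output connected: hand the error to the handler branch.
        return ("error", error_output(last_error))
    # No error output connected: the whole execution fails.
    raise RuntimeError(f"execution failed after {max_attempts} attempts") from last_error
```

Passing `sleep` as a parameter is a design choice that lets the loop be tested without real delays.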

Error Handling Patterns

Pattern: Log and Continue

Log the error, then terminate the branch gracefully:

text
[Action Node]
├─ Success ─→ [Continue Processing]
└─ Error ───→ [Log Error] → [Sink]

Use when: Error is not critical, workflow should complete other branches.

Pattern: Notify on Error

Alert when something fails:

text
[Action Node]
├─ Success ─→ [Continue]
└─ Error ───→ [Slack: "Error in workflow"] → [Sink]

Use when: Failures need human attention.

Pattern: Fallback Action

Try an alternative when primary fails:

text
[Primary Action]
├─ Success ─→ [Done]
└─ Error ───→ [Fallback Action] → [Done]

Use when: There's a reasonable alternative.
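
Expressed in code, the fallback pattern is a try/except around the primary action. The sketch below uses hypothetical callables in place of real nodes; `with_fallback` is not a platform function.

```python
def with_fallback(primary, fallback, log=print):
    """Run `primary`; if it raises, log the error and run `fallback` instead.

    Returns a (branch, result) pair so callers can see which path ran.
    """
    try:
        return ("primary", primary())
    except Exception as exc:
        # Keep the failure visible even though the workflow continues.
        log(f"primary failed: {exc}")
        return ("fallback", fallback())
```

Logging before falling back keeps the failure visible, so the alternative path does not hide a recurring problem.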

Pattern: Graceful Degradation

Do something simpler when the full action fails:

text
[AI: Complex Analysis]
├─ Success ─→ [Rich Notification]
└─ Error ───→ [Simple Notification] → [Sink]

Use when: Partial success is better than complete failure.

Pattern: Validation Before Action

Check data before attempting error-prone operations:

text
[If: data is valid?]
├─ Yes ─→ [Action] → [Continue]
└─ No ──→ [Log: "Invalid data"] → [Sink]

Use when: You can predict failure conditions.
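
A validation step of this kind might look like the sketch below. The field names `transcript` and `channel_id` are assumptions chosen to match the examples in this guide, not a documented schema.

```python
def validate_meeting(meeting):
    """Return a list of problems; an empty list means the data is usable.

    Hypothetical check: the field names are illustrative only.
    """
    problems = []
    if not meeting.get("transcript"):
        problems.append("missing transcript")
    if not meeting.get("channel_id"):
        problems.append("missing channel ID")
    return problems


if __name__ == "__main__":
    issues = validate_meeting({"transcript": "", "channel_id": "C123"})
    if issues:
        print("Invalid data: " + ", ".join(issues))  # log, then stop the branch
```

Returning every problem at once, rather than stopping at the first, makes the logged message more useful for debugging.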

Error Handling by Node Type

Loader Nodes (Load Meeting)

Common errors:

  • Meeting not found
  • No recording/transcript
  • Permission denied

Handling:

text
[Load Meeting]
├─ Success ─→ [Process with data]
└─ Error ───→ [Slack: "Could not load meeting"] → [Sink]

AI Nodes

Common errors:

  • Timeout (>120s)
  • Parse error (output doesn't match type)
  • Content filter triggered

Handling:

text
[AI Prompt]
├─ Success ─→ [Use AI output]
└─ Error ───→ [Use default/fallback] → [Continue]
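
For parse errors specifically, a default value can stand in when the AI output does not match the expected type. The sketch below assumes the AI node is asked for JSON with `summary` and `action_items` keys; both the keys and the helper names are hypothetical.

```python
import json


def default_summary():
    """Fallback value used when the AI output cannot be parsed."""
    return {"summary": "Summary unavailable", "action_items": []}


def parse_ai_output(raw, required=("summary", "action_items")):
    """Parse AI output as JSON, falling back to a default on any mismatch."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Not valid JSON, or not a string at all.
        return default_summary()
    if not isinstance(data, dict) or any(key not in data for key in required):
        # Valid JSON, but not the shape the workflow expects.
        return default_summary()
    return data
```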

Action Nodes (Slack, Email, CRM)

Common errors:

  • Authentication failure
  • Rate limits
  • Invalid recipient/channel

Handling:

text
[Slack Post]
├─ Success ─→ [Done]
└─ Error ───→ [Email fallback] → [Sink]

Building Resilient Workflows

1. Always Connect Error Outputs

Bad: Error output disconnected

text
[Node]
├─ Success ─→ [Next]
└─ Error ───→ (nothing)

Good: Error output handled

text
[Node]
├─ Success ─→ [Next]
└─ Error ───→ [Handle] → [Sink]

2. Validate Early

Check conditions before complex processing:

text
      [Load Meeting]
            ↓
   [If: has transcript?]
   ├─ Yes ─→ [AI Analysis] → [Send]
   └─ No ──→ [Simple notification] → [Sink]

3. Provide Context in Error Messages

When alerting on errors, include useful context:

liquid
⚠️ Workflow Error

*Workflow:* Post-Meeting Summary
*Meeting:* {{ trigger.meetingPlanId }}
*Error:* {{ json.error }}
*Time:* {{ trigger.firedAt }}

Please investigate.

4. Don't Swallow Errors Silently

Avoid hiding problems:

Bad: Error goes to Sink with no logging

Good: Error is logged or alerted, then goes to Sink

5. Use Appropriate Timeouts

AI nodes time out after 120 seconds. To keep complex prompts within that limit:

  • Keep prompts concise
  • Reduce input data size
  • Consider whether a different model tier would respond faster
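
One way to reduce input size is to trim a long transcript before it reaches the AI node. The sketch below keeps the beginning and end of the text; the 20,000-character limit and the helper name are assumptions, not platform settings.

```python
MARKER = "\n[... transcript trimmed ...]\n"


def trim_transcript(text, max_chars=20_000):
    """Shorten `text` to at most `max_chars`, keeping its head and tail."""
    if len(text) <= max_chars:
        return text
    budget = max_chars - len(MARKER)
    if budget <= 0:
        # Limit too small to fit the marker: plain truncation.
        return text[:max_chars]
    head = budget // 2
    tail = budget - head
    return text[:head] + MARKER + text[len(text) - tail:]
```

Keeping both ends is a judgment call: openings often carry the agenda and endings often carry decisions, but the right strategy depends on the data.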

6. Plan for Partial Failures

What if some parts succeed and others fail?

text
[Send Email] → Success
[Update CRM] → Error

Design workflows where partial completion is acceptable or add compensation logic.
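
Compensation logic can be sketched as running steps in order and, on failure, undoing the ones that already succeeded. The step names and callbacks below are hypothetical; a real compensation for a sent email is usually a follow-up message rather than a true undo.

```python
def run_with_compensation(steps):
    """Run (name, action, compensate) steps; compensate on failure.

    `compensate` may be None for steps that need no cleanup.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            undone = []
            # Compensate in reverse order of completion.
            for done_name, undo in reversed(completed):
                if undo is not None:
                    undo()
                    undone.append(done_name)
            return {"status": "failed", "failed_step": name,
                    "error": str(exc), "compensated": undone}
        completed.append((name, compensate))
    return {"status": "ok", "completed": [name for name, _ in completed]}
```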

Monitoring and Debugging

Execution Logs

View detailed execution history:

  • Each node's status
  • Input/output data
  • Error messages and stack traces

Error Indicators

Watch for patterns:

  • High failure rates on specific nodes
  • Time-based failures (rate limits)
  • Data-related failures (specific inputs)
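
To spot the first pattern, failure rates can be computed per node from exported execution records. The record shape used here (`node` and `status` keys) and the 20% threshold are assumptions for illustration; the actual log format may differ.

```python
from collections import defaultdict


def failure_rates(records):
    """Return {node name: fraction of runs that ended in error}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for record in records:
        totals[record["node"]] += 1
        if record["status"] == "error":
            failures[record["node"]] += 1
    return {node: failures[node] / totals[node] for node in totals}


def noisy_nodes(rates, threshold=0.2):
    """Return the nodes whose failure rate exceeds `threshold`, sorted by name."""
    return sorted(node for node, rate in rates.items() if rate > threshold)
```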

Alerting Strategy

For critical workflows:

  1. Immediate alert: Send Slack/SMS on failure
  2. Log all errors: Maintain history for debugging
  3. Track metrics: Monitor success/failure rates

Error Response Checklist

When a workflow fails:

  • Identify the failing node - Check execution logs
  • Understand the error - Read error message/code
  • Determine root cause - Data? Service? Config?
  • Fix the issue - Update workflow or fix external cause
  • Retry if needed - Some failures are temporary
  • Add prevention - Update workflow to handle edge case

Best Practices Summary

| Practice | Description |
| --- | --- |
| Connect all error outputs | Never leave errors unhandled |
| Log errors | Maintain visibility into failures |
| Alert on critical failures | Get human attention when needed |
| Validate data early | Catch problems before complex processing |
| Provide fallbacks | Have alternatives for non-critical failures |
| Include context | Error messages should be actionable |
| Test error paths | Verify error handling works |
| Monitor patterns | Watch for recurring issues |

Example: Complete Error Handling

text
        [Event Trigger: MEETING_ENDED]
                      ↓
                [Load Meeting]
                ↓           ↓
             Success      Error → [Slack: "Load failed"] → [Sink]
                ↓
       [If: has transcript?]
       ↓                 ↓
      Yes                No → [Slack: "No transcript"]
       ↓
  [AI: Analyze]
   ↓         ↓
Success    Error → [Slack: "AI failed"]
   ↓
[Slack Post]
 ↓         ↓
Success  Error → [Email fallback]
 ↓
[Done]

This workflow:

  • Handles Load Meeting failures with a Slack alert
  • Checks for transcript availability before running the AI node
  • Sends a Slack alert when AI analysis fails
  • Falls back to email when the Slack post fails
  • Surfaces every failure through a notification or a fallback
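
The same control flow can be read as ordinary code. The sketch below models each node as a callable passed in by the caller; every name in it is hypothetical and it is meant only to make the branching explicit, not to mirror how the platform executes workflows.

```python
def handle_meeting_ended(meeting_id, load_meeting, analyze, post_slack, send_email, alert):
    """Walk the workflow above; return a label for the branch that ran."""
    try:
        meeting = load_meeting(meeting_id)
    except Exception as exc:
        alert(f"Load failed: {exc}")
        return "load_failed"

    # Validate early: skip the AI node when there is nothing to analyze.
    if not meeting.get("transcript"):
        alert("No transcript")
        return "no_transcript"

    try:
        summary = analyze(meeting["transcript"])
    except Exception as exc:
        alert(f"AI failed: {exc}")
        return "ai_failed"

    try:
        post_slack(summary)
    except Exception:
        # Fallback action: deliver by email when Slack is unavailable.
        send_email(summary)
        return "sent_by_email"
    return "done"
```

Injecting the node functions as parameters makes each error path easy to exercise with stubs, which is one way to follow the "test error paths" practice.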