Monitoring & Logs

Track workflow execution status and view detailed logs

Effective monitoring helps you understand workflow behavior, identify issues, and optimize performance. This guide covers the monitoring capabilities available in Agents.

Execution Status

Status Lifecycle

Every workflow execution progresses through these states:

text
                 ┌→ completed
queued → running ┼→ failed
                 └→ canceled

Status Definitions

| Status | Description | Next States |
| --- | --- | --- |
| queued | Execution created, waiting to start | running |
| running | Actively processing nodes | completed, failed, canceled |
| completed | All nodes finished successfully | (terminal) |
| failed | Execution stopped due to error | (terminal) |
| canceled | Execution manually or automatically stopped | (terminal) |
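As a rough mental model, the lifecycle above can be expressed as a small state machine. A minimal sketch in TypeScript; the type and helper names are illustrative, not a platform API:

typescript
// Illustrative model of the status lifecycle; not a platform API.
type ExecutionStatus = "queued" | "running" | "completed" | "failed" | "canceled";

// Allowed transitions per the table above; terminal states allow none.
const NEXT: Record<ExecutionStatus, ExecutionStatus[]> = {
  queued: ["running"],
  running: ["completed", "failed", "canceled"],
  completed: [],
  failed: [],
  canceled: [],
};

const isTerminal = (s: ExecutionStatus): boolean => NEXT[s].length === 0;

function canTransition(from: ExecutionStatus, to: ExecutionStatus): boolean {
  return NEXT[from].includes(to);
}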

Status Timestamps

Each execution tracks key timestamps:

| Timestamp | Description |
| --- | --- |
| enqueuedAt | When execution was created |
| seededAt | When initial graph was built |
| startedAt | When first node began executing |
| completedAt | When execution succeeded |
| failedAt | When execution failed |
| canceledAt | When execution was canceled |
| finishedAt | When execution reached terminal state |
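If you consume executions programmatically, these timestamps suggest a record like the following. This is a hypothetical sketch mirroring the table, not a documented schema; two useful derived values are queue time and run time:

typescript
// Hypothetical timestamp record mirroring the table above (ISO 8601 strings).
interface ExecutionTimestamps {
  enqueuedAt: string;
  seededAt?: string;
  startedAt?: string;
  completedAt?: string;
  failedAt?: string;
  canceledAt?: string;
  finishedAt?: string; // set once the execution reaches a terminal state
}

// Derive queue time and run time in milliseconds; undefined until both ends exist.
function derivedDurations(t: ExecutionTimestamps) {
  const ms = (from?: string, to?: string) =>
    from && to ? Date.parse(to) - Date.parse(from) : undefined;
  return {
    queueTimeMs: ms(t.enqueuedAt, t.startedAt),
    runTimeMs: ms(t.startedAt, t.finishedAt),
  };
}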

Viewing Executions

Execution List

View all executions with filtering:

Available Filters (see the code sketch after this list):

  • Status (queued, running, completed, failed, canceled)
  • Workflow
  • Date range
  • Trigger type

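Applied programmatically, these filters reduce to a simple predicate over execution summaries. A minimal client-side sketch; the ExecutionSummary shape and field names are assumptions, not a documented schema:

typescript
// Illustrative execution summary; field names are assumptions.
interface ExecutionSummary {
  id: string;
  workflowId: string;
  status: "queued" | "running" | "completed" | "failed" | "canceled";
  triggerType: string;
  enqueuedAt: string; // ISO 8601
}

interface ExecutionFilter {
  status?: ExecutionSummary["status"];
  workflowId?: string;
  triggerType?: string;
  from?: string; // inclusive ISO 8601 lower bound on enqueuedAt
  to?: string;   // exclusive ISO 8601 upper bound on enqueuedAt
}

// ISO 8601 timestamps in the same timezone compare correctly as strings.
function filterExecutions(list: ExecutionSummary[], f: ExecutionFilter): ExecutionSummary[] {
  return list.filter((e) =>
    (!f.status || e.status === f.status) &&
    (!f.workflowId || e.workflowId === f.workflowId) &&
    (!f.triggerType || e.triggerType === f.triggerType) &&
    (!f.from || e.enqueuedAt >= f.from) &&
    (!f.to || e.enqueuedAt < f.to)
  );
}
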
Execution Detail

Each execution shows:

  1. Summary - Status, duration, trigger info
  2. Timeline - Visual execution flow
  3. Node States - Status of each node
  4. Inputs/Outputs - Data at each step
  5. Errors - Any error details

Execution details showing node states

Node-Level Monitoring

Node States

Each node in an execution has its own state:

| State | Meaning |
| --- | --- |
| Pending | Waiting for input |
| Ready | Input available, waiting to execute |
| Running | Currently executing |
| Completed | Finished successfully |
| Failed | Execution failed |
| Skipped | Branch not taken |

Node Execution Details

For each node, view:

  • Input Data - What data the node received
  • Output Data - What data the node produced
  • Duration - How long execution took
  • Attempts - Retry count if applicable
  • Error - Error message if failed

Node output details
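
Taken together, the fields above form a per-node record. An illustrative shape; the field names are assumptions, not a documented schema:

typescript
// Illustrative per-node execution record combining the fields listed above.
interface NodeExecution {
  nodeId: string;
  state: "Pending" | "Ready" | "Running" | "Completed" | "Failed" | "Skipped";
  input?: unknown;      // data the node received
  output?: unknown;     // data the node produced
  durationMs?: number;  // how long execution took
  attempts: number;     // 1 + retry count
  error?: string;       // error message when state is "Failed"
}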

Port State Tracking

For complex nodes with multiple ports:

text
Input Ports:
├── in[0]: 3 items ready
└── in[1]: 2 items ready

Output Ports:
├── success[0]: 5 items produced
└── error[0]: 0 items

Execution Logs

Log Levels

| Level | Purpose | Example |
| --- | --- | --- |
| Info | Normal operations | "Node execution completed" |
| Warning | Potential issues | "Retry attempt 2 of 3" |
| Error | Failures | "API call failed: 429 Rate Limited" |

Log Structure

Each log entry includes:

json
{
  "timestamp": "2024-01-15T10:30:45.123Z",
  "level": "info",
  "message": "Node execution completed",
  "context": {
    "executionId": "exec-123",
    "nodeId": "node-456",
    "duration": 1250
  }
}
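
In code, the same structure can be expressed as a type, which makes programmatic filtering straightforward. A sketch assuming the field names shown in the JSON above:

typescript
// Typed form of the log entry above; field names follow the JSON example.
interface LogEntry {
  timestamp: string; // ISO 8601
  level: "info" | "warning" | "error";
  message: string;
  context: {
    executionId: string;
    nodeId?: string;
    duration?: number; // milliseconds
  };
}

// All error-level entries for one execution, oldest first.
function errorsFor(logs: LogEntry[], executionId: string): LogEntry[] {
  return logs
    .filter((l) => l.level === "error" && l.context.executionId === executionId)
    .sort((a, b) => a.timestamp.localeCompare(b.timestamp));
}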

Viewing Logs

Access logs through:

  1. Execution Detail - Logs for specific execution
  2. Workflow Logs - All logs for a workflow
  3. System Logs - Platform-wide logs

Monitoring Patterns

1. Execution Success Rate

Track workflow reliability:

text
Success Rate = Completed / (Completed + Failed) × 100

  • Healthy: > 95%
  • Warning: 90-95%
  • Critical: < 90%
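
A minimal sketch of the computation and the health bands above:

typescript
// Success rate over terminal executions, with the bands listed above.
function successRate(completed: number, failed: number): number {
  const total = completed + failed;
  return total === 0 ? 100 : (completed / total) * 100;
}

function successHealth(rate: number): "healthy" | "warning" | "critical" {
  if (rate > 95) return "healthy";
  if (rate >= 90) return "warning";
  return "critical";
}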

2. Execution Duration

Monitor performance over time:

| Workflow Type | Typical Duration |
| --- | --- |
| Simple (3-5 nodes) | 5-15 seconds |
| Medium (5-10 nodes) | 15-60 seconds |
| Complex (10+ nodes) | 60-300 seconds |
| With AI | Add 5-30 seconds per AI node |

3. Queue Depth

Track pending executions:

  • Healthy: < 10 queued
  • Warning: 10-50 queued
  • Critical: > 50 queued (processing backlog)
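
The same banding applies to queue depth; a sketch:

typescript
// Queue-depth bands from above; input is the count of queued executions.
function queueHealth(queued: number): "healthy" | "warning" | "critical" {
  if (queued < 10) return "healthy";
  if (queued <= 50) return "warning";
  return "critical"; // processing backlog
}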

4. Error Patterns

Identify recurring issues:

  • Same error message across executions
  • Errors at specific times
  • Errors for specific trigger types

Error Tracking

Error Information

When a node fails, capture:

| Field | Description |
| --- | --- |
| Error Type | Classification of error |
| Error Message | Human-readable description |
| Error Code | Machine-readable code, if available |
| Stack Trace | Technical details (if available) |
| Context | Relevant data at time of error |

Error Categories

| Category | Description | Action |
| --- | --- | --- |
| Integration Error | External service issue | Check integration status |
| Validation Error | Invalid data | Review input data |
| Timeout Error | Operation too slow | Optimize or increase timeout |
| Permission Error | Access denied | Check permissions |
| Internal Error | Platform issue | Contact support |

Error Notifications

Set up alerts for:

  1. Execution Failures - Immediate notification
  2. Error Rate Threshold - Alert when the error rate exceeds a limit (see the sketch after this list)
  3. DLQ Entries - New messages in dead letter queue
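
A threshold check for the second alert type might look like this; the window shape and names are illustrative, not a platform API:

typescript
// Illustrative error-rate threshold check over a recent window of executions.
interface WindowCounts {
  completed: number;
  failed: number;
}

function exceedsErrorRate(w: WindowCounts, thresholdPct: number): boolean {
  const total = w.completed + w.failed;
  if (total === 0) return false;
  return (w.failed / total) * 100 > thresholdPct;
}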

Performance Monitoring

Key Metrics

| Metric | Description | Target |
| --- | --- | --- |
| Latency | Time from trigger to completion | < 60s for simple workflows |
| Throughput | Executions per minute | Varies by plan |
| Error Rate | Failed / Total executions | < 5% |
| Queue Time | Time spent queued | < 5 seconds |

Identifying Bottlenecks

Look for:

  1. Slow Nodes - Nodes taking longer than expected
  2. High Retry Counts - Nodes requiring multiple attempts
  3. Queue Buildup - Executions waiting to start

Optimization Opportunities

Based on monitoring data:

| Observation | Possible Optimization |
| --- | --- |
| Slow AI nodes | Use lower model tier |
| High retry rate | Add error handling |
| Long queue times | Simplify workflow |
| Timeout errors | Optimize node logic |

Dashboard Views

Workflow Dashboard

Overview of all workflow activity:

  • Total executions (24h, 7d, 30d)
  • Success/failure breakdown
  • Average duration
  • Recent executions

Execution Timeline

Visual representation of execution flow:

text
[Trigger] → [Load] → [AI] → [Send]
   0s        1.2s    4.5s    5.1s
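
The marks on the timeline are cumulative, so per-step durations are the differences between adjacent marks. A small helper:

typescript
// Per-step durations from cumulative timeline marks, e.g.
// [0, 1.2, 4.5, 5.1] -> [1.2, 3.3, 0.6] (the AI step dominates here).
function stepDurations(cumulativeSeconds: number[]): number[] {
  return cumulativeSeconds
    .slice(1)
    .map((t, i) => +(t - cumulativeSeconds[i]).toFixed(1));
}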

Error Summary

Aggregated error information:

  • Error counts by type
  • Most common errors
  • Error trends over time

Best Practices

1. Monitor Proactively

Don't wait for user reports:

  • Check dashboards regularly
  • Set up automated alerts
  • Review DLQ daily

2. Track Baselines

Know what "normal" looks like:

  • Typical execution duration
  • Expected success rate
  • Normal queue depth

3. Investigate Anomalies

When metrics deviate:

  1. Check for system issues
  2. Review recent changes
  3. Examine error patterns

4. Document Incidents

Keep records of:

  • What happened
  • Root cause
  • Resolution
  • Prevention steps

5. Use Log Filters Effectively

Filter logs to find issues quickly:

  • Filter by execution ID for specific issues
  • Filter by error level for all errors
  • Filter by time range for incidents

Troubleshooting with Monitoring

Execution Not Starting

Check:

  • Is the workflow released and active?
  • Is the trigger configured correctly?
  • Are executions piling up in the queue?

Execution Stuck in Running

Check:

  • Which node is executing?
  • Has it exceeded its timeout?
  • External service issues?

High Failure Rate

Check:

  • Common error message?
  • Specific node failing?
  • Integration issues?

Slow Executions

Check:

  • Which nodes are slow?
  • AI model tier?
  • External API latency?