
Event Debugging with Dumper

To debug Celery events in real time, you can use the Dumper utility, which prints each event as a formatted, human-readable line to standard output or a custom stream. This gives you a "tcpdump-like" view of everything happening in your Celery cluster.

Debugging via the Command Line

The most common way to use the dumper is through the Celery CLI. This starts a process that captures all events and prints them to the console.

celery events --dump

This command invokes _run_evdump in celery/bin/events.py, which calls the high-level evdump function.

Programmatic Event Capture

You can start the event dumper programmatically within your own scripts using the evdump function from celery.events.dumper.

from celery import Celery
from celery.events.dumper import evdump

app = Celery('my_app')

# This will block and print events to sys.stdout
evdump(app=app)

The evdump function manages the broker connection itself, including automatic reconnection if the connection is lost. It also accepts an out argument (default sys.stdout) that sets the stream the events are written to.

Using the Dumper Class for Custom Streams

If you need to feed events to the dumper yourself, or write them somewhere other than stdout (e.g., a file or an in-memory buffer), use the Dumper class directly and pass the target stream as out.

import io
from celery.events.dumper import Dumper

# Capture events into a string buffer
output_buffer = io.StringIO()
dumper = Dumper(out=output_buffer)

# Example event dictionary
event = {
'hostname': 'worker1.example.com',
'timestamp': 1704110400.0,
'type': 'worker-online',
'sw_ident': 'py-celery',
'sw_ver': '5.3.0',
}

# Process the event
dumper.on_event(event)

print(output_buffer.getvalue())
# Output: worker1.example.com [2024-01-01 12:00:00+00:00] started: sw_ident=py-celery, sw_ver=5.3.0
# (the exact timestamp format can differ between Celery versions)
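The example above feeds a hand-built event to the dumper. To write live events to a file, one option is to register dumper.on_event as a catch-all handler on an event receiver. The following is a minimal sketch, assuming a configured Celery app and a reachable broker; dump_events_to_file and its path argument are hypothetical names chosen for illustration, not part of Celery.

```python
def dump_events_to_file(app, path):
    """Append formatted Celery events to `path` until interrupted.

    Hypothetical helper: `app` is assumed to be a configured Celery app.
    """
    # Imported inside the function so the import only happens when it runs.
    from celery.events.dumper import Dumper

    with open(path, 'a', encoding='utf-8') as stream:
        dumper = Dumper(out=stream)
        with app.connection() as connection:
            # '*' registers a catch-all handler for every event type.
            receiver = app.events.Receiver(
                connection, handlers={'*': dumper.on_event},
            )
            # Blocks until the process is interrupted (e.g. Ctrl+C).
            receiver.capture(limit=None, timeout=None, wakeup=True)
```

Unlike evdump, this sketch makes no attempt to reconnect: a lost broker connection simply raises out of capture().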

How Event Formatting Works

The Dumper performs several transformations to make raw events human-readable:

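  1. Type Humanization: Internal event types are mapped to friendly names via humanize_type. For example, worker-online becomes started and worker-offline becomes shutdown. Types without a friendly name are lowercased and shown with hyphens replaced by spaces, so task-received is printed as task received.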
  2. Task Metadata Tracking: The dumper uses a global LRUCache named TASK_NAMES (defined in celery/events/dumper.py with a limit of 4095 entries) to remember task names and arguments.
    • When a task-received or task-sent event arrives, the dumper stores the task name, UUID, args, and kwargs in the cache.
    • Subsequent events for that task (like task-started or task-succeeded) retrieve this metadata from the cache so the output remains descriptive.
  3. Automatic Flushing: The Dumper.say method calls self.out.flush() after every message. This ensures that if you pipe the output to another utility (like grep), the data appears immediately.
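The three steps above can be sketched with the standard library alone. This is a simplified re-implementation for illustration, not Celery's code: humanize, format_event, and task_names are stand-in names, the cache is an unbounded dict rather than an LRUCache, and the sample events are invented.

```python
import io
from datetime import datetime, timezone

# Simplified stand-ins for illustration only; this is not Celery's code.
HUMAN_TYPES = {'worker-online': 'started', 'worker-offline': 'shutdown'}
task_names = {}  # uuid -> description (unbounded here, unlike TASK_NAMES)


def humanize(event_type):
    # Friendly name if one is defined, otherwise hyphens become spaces.
    return HUMAN_TYPES.get(event_type, event_type.replace('-', ' '))


def format_event(event, out):
    event = dict(event)  # work on a copy so the caller's dict is untouched
    stamp = datetime.fromtimestamp(event.pop('timestamp'), timezone.utc)
    event_type = event.pop('type')
    hostname = event.pop('hostname')
    task = ''
    if event_type.startswith('task-'):
        uuid = event.pop('uuid')
        if event_type in ('task-received', 'task-sent'):
            # First sighting of the task: remember its description.
            task = task_names[uuid] = '{}({}) args={} kwargs={}'.format(
                event.pop('name'), uuid, event.pop('args'), event.pop('kwargs'))
        else:
            # Later events carry only the UUID, so look the rest up.
            task = task_names.get(uuid, '')
    fields = ', '.join(f'{key}={event[key]}' for key in sorted(event))
    details = ' '.join(part for part in (task, fields) if part)
    line = f'{hostname} [{stamp}] {humanize(event_type)}'
    if details:
        line += f': {details}'
    # flush=True pushes each line out immediately, which matters for pipes.
    print(line, file=out, flush=True)


buffer = io.StringIO()
format_event({'hostname': 'w1', 'timestamp': 1704110400.0,
              'type': 'task-received', 'uuid': 'abc', 'name': 'tasks.add',
              'args': '(2, 2)', 'kwargs': '{}'}, buffer)
format_event({'hostname': 'w1', 'timestamp': 1704110401.0,
              'type': 'task-succeeded', 'uuid': 'abc', 'result': '4'}, buffer)
print(buffer.getvalue(), end='')
```

The second line of output still names tasks.add even though the task-succeeded event carried only the UUID, which is the effect the cache exists to provide.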

Troubleshooting and Limitations

Missing Task Names and Arguments

If you see task events that only show the UUID without the task name or arguments, it is usually due to one of two reasons:

  • Cache Eviction: The TASK_NAMES cache has reached its 4095-entry limit, and the metadata for that task was evicted.
  • Missed Events: The dumper was not running when the task-received or task-sent event occurred, so it never captured the metadata.
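To see why eviction produces bare UUIDs, consider a tiny bounded cache. This is a sketch using a hypothetical TinyLRU class built on OrderedDict, with a limit of 2 instead of 4095 so the effect is visible; it is not Celery's LRUCache.

```python
from collections import OrderedDict


class TinyLRU:
    """Minimal bounded LRU mapping (hypothetical stand-in for TASK_NAMES)."""

    def __init__(self, limit):
        self.limit = limit
        self._data = OrderedDict()

    def __setitem__(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.limit:
            self._data.popitem(last=False)  # drop the least recently used

    def get(self, key, default=None):
        if key in self._data:
            self._data.move_to_end(key)  # a hit counts as recent use
            return self._data[key]
        return default


cache = TinyLRU(limit=2)  # tiny limit for demonstration purposes
cache['uuid-1'] = 'tasks.add(uuid-1)'
cache['uuid-2'] = 'tasks.mul(uuid-2)'
cache['uuid-3'] = 'tasks.div(uuid-3)'  # pushes uuid-1 out of the cache

late_lookup = cache.get('uuid-1', '')   # metadata is gone
fresh_lookup = cache.get('uuid-3', '')  # metadata is still available
print(repr(late_lookup))
print(repr(fresh_lookup))
```

A long-running task whose task-received event was followed by thousands of other tasks ends up in the same position as uuid-1: its later events can only be printed with an empty description.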

In-place Dictionary Modification

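The Dumper.on_event method is destructive. It uses .pop() to extract timestamp, type, and hostname from the event dictionary. For task events it also pops uuid, and for task-received and task-sent events it pops name, args, and kwargs as well.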

# WARNING: This will modify the 'event' dictionary
dumper.on_event(event)

# 'type' and 'timestamp' are now missing from 'event'
assert 'type' not in event

If you need to use the event dictionary after passing it to the dumper, pass a copy instead: dumper.on_event(event.copy()).
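If the same handler is registered in several places, copying at every call site is easy to forget. One possible approach is a small wrapper that always hands the handler a copy. In this sketch, preserving and destructive_handler are hypothetical names; the stand-in handler only mimics the pop-based behavior described above so the snippet runs without Celery.

```python
def destructive_handler(event):
    """Stand-in that pops fields, mimicking the behavior described above."""
    return event.pop('hostname'), event.pop('type'), event.pop('timestamp')


def preserving(handler):
    """Wrap a handler so it always receives a shallow copy of the event."""
    def wrapper(event):
        return handler(dict(event))
    return wrapper


event = {
    'hostname': 'worker1.example.com',
    'timestamp': 1704110400.0,
    'type': 'worker-online',
}

safe_handler = preserving(destructive_handler)
safe_handler(event)
still_intact = 'type' in event      # the original dictionary is untouched

destructive_handler(event)
now_missing = 'type' not in event   # the unwrapped call consumed the keys
print(still_intact, now_missing)
```

With a real dumper the wrapper would be applied as preserving(dumper.on_event). A shallow copy is sufficient as long as the handler only removes top-level keys.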

Connection Errors

When using evdump, connection errors are caught and reported to the output stream. The utility will attempt to reconnect indefinitely, using humanize_seconds to display the retry interval.
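The general shape of such a loop can be illustrated as follows. This is a sketch under stated assumptions, not evdump's implementation: capture_with_reconnect, its fixed retry interval, and the simulated flaky_connect are all hypothetical, and the real utility relies on the broker connection's own retry machinery.

```python
import io
import time


def capture_with_reconnect(connect, capture, out, interval=2.0,
                           sleep=time.sleep):
    """Keep retrying `connect` until `capture` completes (hypothetical)."""
    while True:
        try:
            connection = connect()
            capture(connection)
            return
        except ConnectionError as exc:
            # Report the failure on the output stream, then wait and retry.
            print(f'-> Cannot connect: {exc}. '
                  f'Trying again in {interval:.0f} seconds',
                  file=out, flush=True)
            sleep(interval)


attempts = []


def flaky_connect():
    # Simulated broker that refuses the first two connection attempts.
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError('broker unreachable')
    return 'connection'


log = io.StringIO()
capture_with_reconnect(flaky_connect, capture=lambda connection: None,
                       out=log, sleep=lambda seconds: None)
print(log.getvalue(), end='')
```

Because errors go to the same stream as the events, anything parsing the dumper output should be prepared to skip lines that begin with "->".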