Overview
Celery is a distributed task queue that allows you to offload work from your main application flow to background workers. It handles the complexity of message passing, retries, and scheduling, enabling you to build responsive and scalable systems.
Why Celery?
In a typical web application, certain tasks take too long to process within a single request-response cycle (e.g., sending emails, generating PDFs, or processing images). If you run these tasks synchronously, your users will experience slow page loads or timeouts.
Celery solves this by decoupling the request from the execution. Your application sends a message to a "broker," and a separate worker process picks it up and executes it whenever it's ready.
Core Concepts
- Task: A unit of work, defined as a Python function decorated with `@app.task`.
- Broker: The message transport (e.g., RabbitMQ, Redis) that receives tasks from the application and delivers them to workers.
- Worker: A separate process that monitors the broker for new tasks and executes them.
- Result Backend: An optional storage (e.g., Redis, Database) where workers save the output or status of a task so the application can retrieve it later.
- Beat: A scheduler that kicks off tasks at regular intervals (cron-like functionality).
How It Works
- Definition: You define a task using the `Celery` app instance.
- Invocation: Your application calls the task using `.delay()` or `.apply_async()`.
- Queueing: Celery serializes the arguments and sends a message to the Broker.
- Execution: A Worker process pulls the message from the broker and runs the function.
- Result: If configured, the worker stores the return value in the Result Backend.
Use Cases
1. Background Jobs
Offload non-critical work to keep your UI snappy.
```python
from celery import Celery

app = Celery('tasks', broker='pyamqp://guest@localhost//')

@app.task
def send_welcome_email(user_id):
    # Logic to send email
    print(f"Sending email to {user_id}")

# In your web view:
send_welcome_email.delay(user_id=123)
```
2. Scheduled Tasks (Cron)
Run cleanup scripts or generate daily reports using celery beat.
```python
from celery.schedules import crontab

app.conf.beat_schedule = {
    'clear-cache-every-midnight': {
        'task': 'tasks.clear_cache',
        'schedule': crontab(minute=0, hour=0),
    },
}
```
3. Complex Workflows (Canvas)
Chain tasks together or run them in parallel using "Canvas" primitives like chain and group.
```python
from celery import chain, group
from tasks import add, mul

# Run (2 + 2) and then multiply the result by 4
workflow = chain(add.s(2, 2), mul.s(4))
result = workflow.delay()

# Run multiple tasks in parallel
job = group(add.s(i, i) for i in range(10))
result = job.delay()
```
When to Use / When Not to Use
| Use Celery When... | Avoid Celery When... |
|---|---|
| You have long-running tasks (>100ms). | Your task is extremely short and low-volume (use threads). |
| You need to scale workers independently. | You need "hard" real-time guarantees (sub-millisecond). |
| You need robust retry logic and error handling. | You don't want to manage a broker (RabbitMQ/Redis). |
| You need to schedule periodic work. | Your application is a simple CLI tool with no persistence. |
Stack Compatibility
Celery is highly pluggable and works with almost any Python web framework:
- Brokers: RabbitMQ (recommended), Redis, Amazon SQS.
- Backends: Redis, SQLAlchemy (Postgres/MySQL), MongoDB, Memcached, Cassandra.
- Frameworks: Native support for Django, Flask, Pyramid, and more.
Getting Started Pointers
- [[LINK: First Steps with Celery]] — A quick tutorial to get your first worker running.
- [[LINK: Next Steps]] — Deep dive into configuration and production best practices.
- [[LINK: Canvas: Designing Workflows]] — Learn how to compose complex task graphs.
- [[LINK: Periodic Tasks]] — Setting up the `celery beat` scheduler.
Limitations & Assumptions
- Broker Dependency: Celery requires a running broker. If the broker goes down, tasks cannot be sent or received.
- Serialization: Arguments passed to tasks must be serializable (usually JSON). Avoid passing complex objects like database model instances; pass IDs instead.
- Visibility Timeout: When using Redis or SQS as a broker, be aware of visibility timeouts which can cause tasks to be redelivered if they run too long.
- Idempotency: Tasks should ideally be idempotent (safe to run multiple times) because network issues can occasionally cause a task to be delivered twice.
FAQ
Q: Can I use Celery without a broker?
A: No. Celery is built on a message-passing architecture. You need a broker like RabbitMQ or Redis to move messages between your app and the workers.
Q: How do I see what my workers are doing?
A: You can use the CLI command `celery -A proj inspect active` or use a monitoring tool like Flower.
Q: Is Celery thread-safe?
A: Yes, the Celery app and task definitions are thread-safe. However, the code inside your tasks must also be thread-safe if you use a threaded execution pool.
Q: Can I run Celery on Windows?
A: While Celery 4.x and later officially support only Linux and macOS, Celery can run on Windows for development, though some features (such as the prefork pool) may behave differently.
Q: What happens if a worker crashes mid-task?
A: By default, Celery acknowledges a message just before executing it, so a task that dies mid-run is lost rather than redelivered. If you enable late acknowledgment (`acks_late=True`), the broker notices the closed connection and redelivers the message to another worker.