Scheduling & Periodic Tasks
Periodic task scheduling in this codebase is managed by the Celery Beat service. It operates as a central scheduler that dispatches tasks to the cluster at regular intervals.
Core Components
The scheduling system is built around three primary classes located in celery/beat.py:
- Service: The main entry point for the Beat process. It manages the scheduler instance and runs the infinite "tick" loop that triggers tasks.
- Scheduler: The base class that maintains a heap of schedule entries. It determines which tasks are due and calculates how long to sleep until the next task is ready.
- PersistentScheduler: A subclass of Scheduler (and the default used by Service) that persists the schedule state to a local database file using the Python shelve module. This ensures that task execution counts and last-run timestamps survive service restarts.
The Schedule Entry
Every periodic task is represented by a ScheduleEntry object. This class tracks:
- name: The unique identifier for the schedule.
- task: The name of the task to execute.
- last_run_at: A timestamp of the last time the task was dispatched.
- total_run_count: How many times the task has been executed.
- schedule: The actual schedule object (e.g., crontab or solar) that determines if the task is due.
Schedule Types
Schedules are defined in celery/schedules.py. All schedule types inherit from BaseSchedule and must implement is_due(last_run_at), which returns a schedstate namedtuple containing a boolean is_due and the number of seconds to wait until the next execution.
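The is_due contract can be illustrated without Celery installed. The following is a minimal sketch using only the standard library; SchedState and IntervalSketch are hypothetical names invented for this example, and the logic is a simplification rather than the code in celery/schedules.py.

```python
from collections import namedtuple
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for the schedstate namedtuple described above.
SchedState = namedtuple("SchedState", ("is_due", "next"))


class IntervalSketch:
    """Simplified interval schedule; not Celery's actual implementation."""

    def __init__(self, run_every, now=None):
        self.run_every = timedelta(seconds=run_every)
        # The clock is injectable so the behaviour can be checked deterministically.
        self.now = now or (lambda: datetime.now(timezone.utc))

    def is_due(self, last_run_at):
        remaining = (last_run_at + self.run_every - self.now()).total_seconds()
        if remaining <= 0:
            # Due now; the next check is one full interval away.
            return SchedState(True, self.run_every.total_seconds())
        # Not due yet; report how many seconds are left to wait.
        return SchedState(False, remaining)


fixed_now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
every_30s = IntervalSketch(30.0, now=lambda: fixed_now)

# Last ran 45 seconds ago: overdue for a 30-second interval.
due_state = every_30s.is_due(fixed_now - timedelta(seconds=45))
# Last ran 10 seconds ago: 20 seconds still to wait.
waiting_state = every_30s.is_due(fixed_now - timedelta(seconds=10))
```

The second field serves two purposes in this sketch: when the entry is due it is the delay before the following run, and when it is not due it is the time remaining before the first check that could succeed.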
Interval Schedules
The schedule class represents simple periodic intervals. It can be initialized with a float (seconds) or a timedelta object.
from celery.schedules import schedule
# Runs every 30 seconds
s = schedule(run_every=30.0)
Crontab Schedules
The crontab class provides a powerful, cron-like syntax for time-based execution. It supports minutes, hours, day of week, day of month, and month of year.
from celery.schedules import crontab
# Runs every Monday morning at 7:30 AM
c = crontab(hour=7, minute=30, day_of_week=1)
The crontab implementation uses a crontab_parser to expand patterns like */15 or 1-7,15-21 into sets of valid integers for comparison against the current time.
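To make the expansion step concrete, here is a small standalone sketch of how such patterns can be turned into sets of integers. The function expand_cron_field is a hypothetical helper written for this illustration; it handles only plain values, ranges, wildcards, and steps, performs no bounds validation, and is not the crontab_parser implementation.

```python
def expand_cron_field(spec, min_value, max_value):
    """Expand a cron-style field such as '*/15' or '1-7,15-21' into a set of ints."""
    values = set()
    for part in spec.split(","):
        # An optional '/step' suffix applies to the range that precedes it.
        range_part, _, step_part = part.partition("/")
        step = int(step_part) if step_part else 1
        if range_part == "*":
            start, end = min_value, max_value
        elif "-" in range_part:
            low, high = range_part.split("-")
            start, end = int(low), int(high)
        else:
            start = end = int(range_part)
        values.update(range(start, end + 1, step))
    return values


# Minute field: every 15 minutes within 0..59.
minutes = expand_cron_field("*/15", 0, 59)
# Day-of-month field: the first and third weeks of the month.
days = expand_cron_field("1-7,15-21", 1, 31)
```

With the fields expanded this way, checking whether a moment matches becomes a set-membership test on each component of the current time.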
Solar Schedules
The solar class allows scheduling based on astronomical events like sunrise or sunset. It requires the ephem library.
from celery.schedules import solar
# Runs at every sunrise in Melbourne, Australia
s = solar('sunrise', -37.81, 144.96)
Supported events include dawn_civil, sunrise, solar_noon, sunset, and dusk_civil, among others defined in solar._all_events.
Persistence and State
The PersistentScheduler uses a local file (defaulting to celerybeat-schedule) to store the state of the schedule. This is critical for ensuring that tasks with long intervals (e.g., once a week) don't run immediately every time the Beat service is restarted.
Database Resets
In PersistentScheduler.setup_schedule, the scheduler performs safety checks. If it detects a change in the timezone or enable_utc settings, it will clear the persistent database to avoid inconsistent scheduling:
# From celery/beat.py
tz = self.app.conf.timezone
stored_tz = self._store.get('tz')
if stored_tz is not None and stored_tz != tz:
    warning('Reset: Timezone changed from %r to %r', stored_tz, tz)
    self._store.clear()   # Timezone changed, reset db!
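The same reset-on-change idea can be tried in isolation with the standard library's shelve module. This is a sketch under stated assumptions: open_store, the "entries" key layout, and the file name are invented for the example and do not reproduce the on-disk format used by PersistentScheduler.

```python
import os
import shelve
import tempfile


def open_store(path, tz):
    """Open a shelve store, clearing it if the recorded timezone differs."""
    store = shelve.open(path)
    stored_tz = store.get("tz")
    if stored_tz is not None and stored_tz != tz:
        store.clear()  # timezone changed, so saved timestamps are unreliable
    store["tz"] = tz
    store.setdefault("entries", {})
    return store


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "beat-sketch")

    # First run: record some state for a hypothetical weekly entry.
    store = open_store(path, "UTC")
    entries = store["entries"]
    entries["weekly-report"] = {"total_run_count": 3}
    store["entries"] = entries  # reassign so shelve writes the change
    store.close()

    # "Restart" with the same timezone: the state survives.
    store = open_store(path, "UTC")
    kept = dict(store["entries"])
    store.close()

    # "Restart" with a different timezone: the store is reset.
    store = open_store(path, "Europe/Oslo")
    after_reset = dict(store["entries"])
    store.close()
```

Discarding the stored state is the conservative choice here: timestamps recorded under one timezone setting cannot be safely compared under another, so starting fresh is preferable to dispatching tasks at the wrong time.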
The Execution Loop
The Service.start() method runs the main loop. In each iteration, it calls scheduler.tick().
- tick():
  - Checks the top of the heap (the task due soonest).
  - Calls is_due(entry.last_run_at) on the entry's schedule.
  - If due, it calls apply_entry(entry), which dispatches the task via apply_async.
  - Returns the number of seconds to sleep until the next task is due.
- Sleep: The service sleeps for the returned interval, capped by beat_max_loop_interval (default 300 seconds).
- Sync: Periodically, the PersistentScheduler syncs its in-memory state back to the shelve database on disk.
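The tick-and-sleep cycle above can be sketched with heapq and a simulated clock. EntrySketch, tick, and MAX_LOOP_INTERVAL are hypothetical names for this illustration; returning 0 after a dispatch is a design choice of the sketch (so that the loop re-checks the heap immediately), and the code is a simplification rather than the logic in celery/beat.py.

```python
import heapq

MAX_LOOP_INTERVAL = 300.0  # mirrors the 300-second cap described above


class EntrySketch:
    """Hypothetical schedule entry with a fixed interval in seconds."""

    def __init__(self, name, interval):
        self.name = name
        self.interval = interval
        self.total_run_count = 0


def tick(heap, now, dispatch):
    """Dispatch the soonest entry if it is due; return seconds to sleep."""
    if not heap:
        return MAX_LOOP_INTERVAL
    due_at, name, entry = heap[0]
    if due_at <= now:
        heapq.heappop(heap)
        dispatch(entry)
        entry.total_run_count += 1
        # Reschedule the entry one interval from now.
        heapq.heappush(heap, (now + entry.interval, name, entry))
        return 0.0  # tick again right away in case another entry is also due
    return min(due_at - now, MAX_LOOP_INTERVAL)


dispatched = []
heap = []
# Heap items are (due_at, name, entry); unique names keep tuple comparison
# from ever reaching the entry objects, which define no ordering.
for entry in (EntrySketch("fast", 30.0), EntrySketch("slow", 1000.0)):
    heapq.heappush(heap, (entry.interval, entry.name, entry))

now = 0.0
sleeps = []
for _ in range(4):
    delay = tick(heap, now, lambda e: dispatched.append((now, e.name)))
    sleeps.append(delay)
    now += delay  # a real service would call time.sleep(delay) here

# A lone entry due in 1000 seconds still wakes the loop after the cap.
capped = tick([(1000.0, "slow", EntrySketch("slow", 1000.0))], 0.0, dispatched.append)
```

Capping the sleep means the loop wakes up regularly even when nothing is due for a long time, which gives it a chance to notice schedule changes and to sync state.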
Configuration
Periodic tasks are typically configured via the beat_schedule setting in the Celery app:
app.conf.beat_schedule = {
    'add-every-30-seconds': {
        'task': 'tasks.add',
        'schedule': 30.0,
        'args': (16, 16),
    },
}
Alternatively, tasks can be added programmatically using the add_periodic_task method on the Celery app instance (found in celery/app/base.py), which updates the beat_schedule configuration internally.
app.add_periodic_task(
    crontab(hour=0, minute=0),
    test.s(arg='daily_cleanup'),
    name='daily-cleanup',
)
Important Considerations
- Timezones: Celery Beat is timezone-aware. If enable_utc is True (default), it uses UTC. Otherwise, it uses the timezone specified in the timezone setting.
- Solar Dependencies: Using solar schedules will raise an ImportError if the ephem library is not installed in the environment.
- Scheduler Files: When running multiple Beat instances (not recommended), they must each use a unique schedule_filename to avoid database corruption, as shelve does not support concurrent access.