Scheduling & Periodic Tasks

Periodic task scheduling in this codebase is managed by the Celery Beat service. It operates as a central scheduler that dispatches tasks to the cluster at regular intervals.

Core Components

The scheduling system is built around three primary classes located in celery/beat.py:

  • Service: The main entry point for the Beat process. It manages the scheduler instance and runs the infinite "tick" loop that triggers tasks.
  • Scheduler: The base class that maintains a heap of schedule entries. It determines which tasks are due and calculates how long to sleep until the next task is ready.
  • PersistentScheduler: A subclass of Scheduler (and the default used by Service) that persists the schedule state to a local database file using the Python shelve module. This ensures that task execution counts and last-run timestamps survive service restarts.

The Schedule Entry

Every periodic task is represented by a ScheduleEntry object. This class tracks:

  • name: The unique identifier for the schedule.
  • task: The name of the task to execute.
  • last_run_at: A timestamp of the last time the task was dispatched.
  • total_run_count: How many times the entry has been dispatched by the scheduler.
  • schedule: The actual schedule object (e.g., crontab or solar) that determines if the task is due.

Schedule Types

Schedules are defined in celery/schedules.py. All schedule types inherit from BaseSchedule and must implement is_due(last_run_at), which returns a schedstate namedtuple with two fields: is_due, a boolean indicating whether the task should run now, and next, the number of seconds to wait before the schedule should be checked again.
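
The is_due contract can be illustrated with a minimal, standard-library sketch. SchedState and IntervalSketch are hypothetical stand-ins written for this example, not Celery classes, and the injectable now callable is an assumption added so the result is deterministic.

```python
from collections import namedtuple
from datetime import datetime, timedelta, timezone

# Stand-in for the (is_due, next) pair described above; not imported from Celery.
SchedState = namedtuple("SchedState", ("is_due", "next"))


class IntervalSketch:
    """Hypothetical fixed-interval schedule, for illustration only."""

    def __init__(self, run_every, now=None):
        self.run_every = timedelta(seconds=run_every)
        # Injectable clock so the behaviour can be checked deterministically.
        self.now = now or (lambda: datetime.now(timezone.utc))

    def is_due(self, last_run_at):
        # Seconds left until one full interval has passed since the last run.
        remaining = (last_run_at + self.run_every - self.now()).total_seconds()
        if remaining <= 0:
            # Due now; the next check is one full interval away.
            return SchedState(True, self.run_every.total_seconds())
        return SchedState(False, remaining)


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
early = IntervalSketch(30, now=lambda: t0 + timedelta(seconds=10)).is_due(t0)
late = IntervalSketch(30, now=lambda: t0 + timedelta(seconds=45)).is_due(t0)
```

Here early reports that the entry is not due and should be rechecked after the remaining time, while late reports that it is due and gives the full interval as the next wait.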

Interval Schedules

The schedule class represents simple periodic intervals. It can be initialized with a float (seconds) or a timedelta object.

from celery.schedules import schedule

# Runs every 30 seconds
s = schedule(run_every=30.0)

Crontab Schedules

The crontab class provides a powerful, cron-like syntax for time-based execution. It supports minutes, hours, day of week, day of month, and month of year.

from celery.schedules import crontab

# Runs every Monday morning at 7:30 AM
c = crontab(hour=7, minute=30, day_of_week=1)

The crontab implementation uses a crontab_parser to expand patterns like */15 or 1-7,15-21 into sets of valid integers for comparison against the current time.
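
As a rough illustration of the expansion step, the following sketch turns a single cron-style field into a set of integers. expand_field is a hypothetical helper written for this example; it handles only a simplified subset of the syntax and is not Celery's crontab_parser.

```python
def expand_field(spec, min_value, max_value):
    """Expand one cron-style field into the set of integers it matches.

    Simplified illustration: handles '*', 'a', 'a-b', '*/n' and 'a-b/n',
    joined by commas.
    """
    values = set()
    for part in spec.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = min_value, max_value
        elif "-" in base:
            low, high = base.split("-")
            start, end = int(low), int(high)
        else:
            start = end = int(base)
        if step < 1 or not (min_value <= start <= end <= max_value):
            raise ValueError(f"invalid cron field: {part!r}")
        values.update(range(start, end + 1, step))
    return values


quarter_hours = expand_field("*/15", 0, 59)
two_weeks = expand_field("1-7,15-21", 1, 31)
```

A schedule can then test membership (for example, whether the current minute is in quarter_hours) rather than re-parsing the pattern on every check.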

Solar Schedules

The solar class allows scheduling based on astronomical events like sunrise or sunset. It requires the ephem library.

from celery.schedules import solar

# Runs at every sunrise in Melbourne, Australia
s = solar('sunrise', -37.81, 144.96)

Supported events include dawn_civil, sunrise, solar_noon, sunset, and dusk_civil, among others defined in solar._all_events.

Persistence and State

The PersistentScheduler uses a local file (defaulting to celerybeat-schedule) to store the state of the schedule. This is critical for ensuring that tasks with long intervals (e.g., once a week) don't run immediately every time the Beat service is restarted.
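
The effect of persisting state can be sketched with shelve directly. The helper names, the file name, and the entry name below are hypothetical, and the layout of the stored data is an assumption for illustration rather than the format PersistentScheduler actually writes.

```python
import os
import shelve
import tempfile
from datetime import datetime, timezone


def record_run(path, name, when):
    """Persist the last-run timestamp for one schedule entry."""
    with shelve.open(path) as store:
        store[name] = when


def load_last_run(path, name):
    """Return the stored timestamp, or None if the entry never ran."""
    with shelve.open(path) as store:
        return store.get(name)


# Hypothetical state file in a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "beat-state-sketch")

ran_at = datetime(2024, 1, 1, 7, 30, tzinfo=timezone.utc)
record_run(path, "weekly-report", ran_at)

# A "restart" reopens the file and finds the earlier timestamp,
# so a weekly entry is not treated as if it had never run.
restored = load_last_run(path, "weekly-report")
```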

Database Resets

In PersistentScheduler.setup_schedule, the scheduler performs safety checks. If it detects a change in the timezone or enable_utc settings, it will clear the persistent database to avoid inconsistent scheduling:

# From celery/beat.py
tz = self.app.conf.timezone
stored_tz = self._store.get('tz')
if stored_tz is not None and stored_tz != tz:
    warning('Reset: Timezone changed from %r to %r', stored_tz, tz)
    self._store.clear()  # Timezone changed, reset db!

The Execution Loop

The Service.start() method runs the main loop. In each iteration, it calls scheduler.tick().

  1. tick():
    • Checks the top of the heap (the task due soonest).
    • Calls is_due(entry.last_run_at) on the entry's schedule.
    • If due, it calls apply_entry(entry), which dispatches the task via apply_async.
    • Returns the number of seconds to sleep until the next task is due.
  2. Sleep: The service sleeps for the returned interval, capped by beat_max_loop_interval (default 300 seconds).
  3. Sync: Periodically, the PersistentScheduler syncs its in-memory state back to the shelve database on disk.
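
The tick-and-sleep cycle above can be approximated with a small heap-based sketch. Entry, tick, and MAX_INTERVAL are hypothetical names for this example; the sketch assumes fixed intervals and an explicit now value, handles at most one entry per call, and leaves out persistence and error handling, so it should be read as a simplified model rather than the code in celery/beat.py.

```python
import heapq
from dataclasses import dataclass, field

MAX_INTERVAL = 300.0  # mirrors the 300-second cap mentioned above


@dataclass(order=True)
class Entry:
    due_at: float                                  # heap key: next due time
    name: str = field(compare=False)
    interval: float = field(compare=False)
    total_run_count: int = field(default=0, compare=False)


def tick(heap, now, dispatch):
    """Handle the entry at the top of the heap; return seconds to sleep."""
    if not heap:
        return MAX_INTERVAL
    entry = heap[0]
    if entry.due_at <= now:
        heapq.heappop(heap)
        dispatch(entry.name)                       # stand-in for apply_async
        entry.total_run_count += 1
        entry.due_at = now + entry.interval
        heapq.heappush(heap, entry)
        return 0.0                                 # re-check the heap right away
    return min(entry.due_at - now, MAX_INTERVAL)


sent = []
heap = [Entry(0.0, "fast", 30.0), Entry(1000.0, "slow", 3600.0)]
heapq.heapify(heap)

first = tick(heap, 0.0, sent.append)    # "fast" is due and gets dispatched
second = tick(heap, 0.0, sent.append)   # nothing due; sleep until "fast" again
```

Returning zero after a dispatch is a design choice in this sketch: it lets the caller loop straight back into tick so that several entries due at the same moment are handled one after another without sleeping in between.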

Configuration

Periodic tasks are typically configured via the beat_schedule setting in the Celery app:

app.conf.beat_schedule = {
    'add-every-30-seconds': {
        'task': 'tasks.add',
        'schedule': 30.0,
        'args': (16, 16),
    },
}

Alternatively, tasks can be added programmatically using the add_periodic_task method on the Celery app instance (found in celery/app/base.py), which updates the beat_schedule configuration internally.

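from celery.schedules import crontab

# `test` is assumed to be a task already registered with this app.
app.add_periodic_task(
    crontab(hour=0, minute=0),
    test.s(arg='daily_cleanup'),
    name='daily-cleanup',
)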

Important Considerations

  • Timezones: Celery Beat is timezone-aware. Schedules are evaluated in the timezone given by the timezone setting. If that setting is not configured, Beat uses UTC when enable_utc is True (the default) and the local timezone otherwise.
  • Solar Dependencies: Using solar schedules will raise an ImportError if the ephem library is not installed in the environment.
  • Scheduler Files: Running multiple Beat instances is not recommended. If you do, each instance must use its own schedule file (set through beat_schedule_filename or the -s option of celery beat) to avoid database corruption, because shelve does not support concurrent access.
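
To see why the effective timezone matters for time-of-day schedules, the sketch below computes the next 07:30 in two zones from the same instant. next_0730 is a hypothetical helper, and the fixed UTC+10 offset is an assumption standing in for a named zone (it ignores daylight saving time).

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+10 offset as a stand-in for a named zone (assumption; no DST).
local_tz = timezone(timedelta(hours=10))


def next_0730(now, tz):
    """Next 07:30 wall-clock time in `tz` at or after `now`."""
    local_now = now.astimezone(tz)
    candidate = local_now.replace(hour=7, minute=30, second=0, microsecond=0)
    if candidate < local_now:
        candidate += timedelta(days=1)
    return candidate


now = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
in_utc = next_0730(now, timezone.utc)
in_local = next_0730(now, local_tz)

# The same "07:30" rule fires at different instants depending on the zone.
gap = in_local - in_utc
```

With these inputs the two results are 14 hours apart, which is the kind of shift a changed timezone setting would introduce into a crontab-style schedule.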