These sequence diagrams explain the lifecycle of a Job with relation to the database operation in the context of the internal API of Airflow.
As part of AIP-44 implementation we separated the ORM Job instance from the code the job runs, introducing a concept of Job Runners. The Job Runner is a class that is responsible for running the code and it might execute either in-process when direct database is used, or remotely when the job is run remotely and communicates via internal API (this part is a work-in-progress and we will keep on updating these lifecycle diagrams).
This apply to all of the CLI components Airflow runs (Scheduler, DagFileProcessor, Triggerer, Worker) that run a job. The AIP-44 implementation is not yet complete, but when complete it will apply to some of the components (DagFileProcessor, Triggerer, Worker) and not to others (Scheduler).
sequenceDiagram
participant CLI component
participant JobRunner
participant DB
activate CLI component
CLI component-->>DB: Create Session
activate DB
CLI component->>DB: Create Job
DB->>CLI component: Job object
CLI component->>JobRunner: Create Job Runner
JobRunner ->> CLI component: JobRunner object
CLI component->>JobRunner: Run Job
activate JobRunner
JobRunner->>DB: prepare_for_execution [Job]
DB->>JobRunner: prepared
par
JobRunner->>JobRunner: execute_job
and
JobRunner->>DB: access DB (Variables/Connections etc.)
DB ->> JobRunner: returned data
and
JobRunner-->>DB: create heartbeat session
Note over DB: Note: During heartbeat<br> two DB sessions <br>are opened in parallel(!)
JobRunner->>DB: perform_heartbeat [Job]
JobRunner ->> JobRunner: Heartbeat Callback [Job]
DB ->> JobRunner: heartbeat response
DB -->> JobRunner: close heartbeat session
end
JobRunner->>DB: complete_execution [Job]
DB ->> JobRunner: completed
JobRunner ->> CLI component: completed
deactivate JobRunner
deactivate DB
deactivate CLI component
sequenceDiagram
participant CLI component
participant JobRunner
participant Internal API
participant DB
activate CLI component
CLI component->>Internal API: Create Job
Internal API-->>DB: Create Session
activate DB
Internal API ->> DB: Create Job
DB ->> Internal API: Job object
DB --> Internal API: Close Session
deactivate DB
Internal API->>CLI component: JobPydantic object
CLI component->>JobRunner: Create Job Runner
JobRunner ->> CLI component: JobRunner object
CLI component->>JobRunner: Run Job
activate JobRunner
JobRunner->>Internal API: prepare_for_execution [JobPydantic]
Internal API-->>DB: Create Session
activate DB
Internal API ->> DB: prepare_for_execution [Job]
DB->>Internal API: prepared
DB-->>Internal API: Close Session
deactivate DB
Internal API->>JobRunner: prepared
par
JobRunner->>JobRunner: execute_job
and
JobRunner ->> Internal API: access DB (Variables/Connections etc.)
Internal API-->>DB: Create Session
activate DB
Internal API ->> DB: access DB (Variables/Connections etc.)
DB ->> Internal API: returned data
DB-->>Internal API: Close Session
deactivate DB
Internal API ->> JobRunner: returned data
and
JobRunner->>Internal API: perform_heartbeat <br> [Job Pydantic]
Internal API-->>DB: Create Session
activate DB
Internal API->>DB: perform_heartbeat [Job]
Internal API ->> Internal API: Heartbeat Callback [Job]
DB ->> Internal API: heartbeat response
Internal API ->> JobRunner: heartbeat response
DB-->>Internal API: Close Session
deactivate DB
end
JobRunner->>Internal API: complete_execution <br> [Job Pydantic]
Internal API-->>DB: Create Session
Internal API->>DB: complete_execution [Job]
activate DB
DB ->> Internal API: completed
DB-->>Internal API: Close Session
deactivate DB
Internal API->>JobRunner: completed
JobRunner ->> CLI component: completed
deactivate JobRunner
deactivate CLI component