Why Robinhood uses Airflow (Robinhood Engineering)

We started off using cron to schedule these jobs, but as their number and complexity grew, it became increasingly challenging to manage them with cron:

  • Managing dependencies between jobs was difficult.

Dependency Management

Airflow uses Operators as the fundamental unit of abstraction for defining tasks, and a DAG (Directed Acyclic Graph) composed of operators to define a workflow. It provides historical views of jobs, along with tools to control their state, such as killing a running job or manually re-running one. We also use Airflow sensors to run jobs right after market close, while handling market half-days.
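
To make the DAG idea concrete, here is a minimal sketch in plain Python rather than Airflow's own API. Each hypothetical job lists its upstream dependencies, and the standard library's `graphlib.TopologicalSorter` produces an order in which every job runs after the jobs it depends on. The job names are invented for illustration; in Airflow itself the dependencies would be declared between operators inside a DAG file.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical jobs, each mapped to the set of upstream jobs it depends on.
DEPENDENCIES = {
    "load_trades": set(),
    "load_quotes": set(),
    "compute_positions": {"load_trades"},
    "daily_report": {"compute_positions", "load_quotes"},
}


def run_order(deps):
    """Return one execution order in which every job follows its upstream jobs.

    Raises graphlib.CycleError if the dependencies contain a cycle, i.e. the
    workflow is not a DAG.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(run_order(DEPENDENCIES))
    try:
        # Two jobs that depend on each other can never be scheduled.
        run_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("rejected: cyclic workflow")
```

The acyclic requirement is what makes an order like this always exist: a cycle is reported as an error instead of leaving jobs waiting on each other forever.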
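
The half-day handling can be sketched as a sensor-style check: a function the scheduler polls until it returns True, similar in spirit to the `poke` check of an Airflow sensor. This is a plain-Python illustration, not Robinhood's implementation. The 16:00 regular close, the 13:00 early close, and the `HALF_DAYS` calendar are assumptions, and timestamps are assumed to be naive exchange-local (US Eastern) times.

```python
from datetime import date, datetime, time

# Assumed close times, in exchange-local time.
REGULAR_CLOSE = time(16, 0)
HALF_DAY_CLOSE = time(13, 0)

# Hypothetical half-day calendar; a real system would load this from a
# maintained exchange calendar.
HALF_DAYS = {date(2016, 11, 25)}


def market_close(day, half_days=HALF_DAYS):
    """Return the closing datetime for `day`, honoring early-close half-days."""
    close = HALF_DAY_CLOSE if day in half_days else REGULAR_CLOSE
    return datetime.combine(day, close)


def poke(now, half_days=HALF_DAYS):
    """Sensor-style check: True once the market has closed on `now`'s date.

    A scheduler would call this repeatedly and start downstream jobs as
    soon as it returns True.
    """
    return now >= market_close(now.date(), half_days)


if __name__ == "__main__":
    print(poke(datetime(2016, 11, 25, 13, 30)))  # True: early close at 13:00
    print(poke(datetime(2016, 11, 28, 13, 30)))  # False: regular close at 16:00
```

Weekends and full market holidays are deliberately ignored here; a production sensor would also consult a full exchange calendar before deciding that a given day has a close at all.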

  • The Scheduler handles scheduled jobs and backfill jobs separately.
  • Because Airflow was built primarily for batch data processing, its designers chose to always schedule jobs for the previous interval. Hence, a job scheduled to run daily at midnight is passed the execution date “2016-12-31 00:00:00” in its context when it runs on “2017-01-01 00:00:00”.
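
For a fixed-length schedule, the previous-interval convention reduces to simple date arithmetic. The sketch below is plain Python, not Airflow code, and the `execution_date` helper is hypothetical: a run that fires at `run_time` processes the interval that just ended, so its execution date is the start of that interval.

```python
from datetime import datetime, timedelta

DAILY = timedelta(days=1)


def execution_date(run_time, interval=DAILY):
    """Execution date handed to a run that fires at `run_time`.

    The run covers the interval that just ended, so its execution date is
    the start of that interval, one full interval earlier. Assumes a
    fixed-length interval.
    """
    return run_time - interval


if __name__ == "__main__":
    # The daily midnight run on 2017-01-01 processes 2016-12-31.
    print(execution_date(datetime(2017, 1, 1)))  # 2016-12-31 00:00:00
```

For irregular cron schedules (for example, weekdays only) the previous interval is not a fixed `timedelta`, so this subtraction alone would no longer be enough.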