Set up Apache Airflow on Ubuntu to run multiple DAGs and tasks using MySQL

Ubuntu || Apache Airflow || MySQL

By default, Airflow uses a SQLite database to store its metadata. SQLite does not handle concurrent writes from multiple connections, so the default setup (SQLite with the SequentialExecutor) runs tasks one at a time. Here I use MySQL as the metadata database so that tasks can run in parallel.

MySQL Installation Guide ==> MySQL-Setup

Airflow-MySQL Setup:
  • Open a terminal and run
    • mysql -u root -p
    • mysql> CREATE DATABASE airflow;
    • mysql> CREATE USER 'airflow'@'localhost' IDENTIFIED BY 'airflow';
    • mysql> GRANT ALL PRIVILEGES ON airflow.* TO 'airflow'@'localhost';
    • mysql> FLUSH PRIVILEGES;
  • Airflow needs a home directory. ~/airflow is the default, but you can choose another location if you prefer

    export AIRFLOW_HOME=~/airflow

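  • Install Airflow using pip. The mysql extra also installs the MySQL driver that Airflow needs to reach the database (on Ubuntu, building the driver may require the libmysqlclient-dev package)

    sudo pip install 'apache-airflow[mysql]'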

  • Create a subfolder for your DAGs

    mkdir -p "$AIRFLOW_HOME/dags"

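  • Change the Airflow configuration for parallel execution: in the [core] section of $AIRFLOW_HOME/airflow.cfg, point sql_alchemy_conn at the MySQL database created above and replace the default SequentialExecutor with an executor that can run tasks in parallel, such as LocalExecutor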
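
    One way to apply this step without editing airflow.cfg by hand is through Airflow's AIRFLOW__{SECTION}__{KEY} environment variables, which override the file. The lines below are a minimal sketch, not necessarily the exact settings used for this post: the user, password, and database name come from the MySQL steps above, while the host, port 3306, and the choice of LocalExecutor are assumptions.

    ```shell
    # Assumption: LocalExecutor, which runs tasks in parallel on a single machine.
    export AIRFLOW__CORE__EXECUTOR=LocalExecutor

    # Metadata database URL: user/password/database from the MySQL steps above;
    # localhost:3306 is an assumed host and port (the MySQL default).
    export AIRFLOW__CORE__SQL_ALCHEMY_CONN='mysql://airflow:airflow@localhost:3306/airflow'
    ```

    Export these in the same shell that runs the remaining airflow commands. The equivalent airflow.cfg keys are executor and sql_alchemy_conn under [core]. Airflow's documentation for the 1.10 series also asks for explicit_defaults_for_timestamp=1 under [mysqld] in my.cnf when MySQL is the backend.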

  • Initialize the database (this is the Airflow 1.x command; Airflow 2.x uses airflow db init instead)

    airflow initdb

  • Start the web server (the default port is 8080; -D runs it as a daemon)

    airflow webserver -D

  • Start the scheduler

    airflow scheduler -D

  • Visit localhost:8080 in the browser.
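
Because -D detaches the web server and the scheduler from the terminal, a quick command-line check can confirm that both came up. This is a minimal sketch, assuming curl and pgrep are installed; webserver_is_up and scheduler_is_up are hypothetical helper names, not Airflow commands.

```shell
# Hypothetical helper: succeeds if something answers HTTP on the given URL
# (default assumed from the step above: http://localhost:8080).
webserver_is_up() {
  curl --silent --fail --output /dev/null --max-time 5 "${1:-http://localhost:8080}"
}

# Hypothetical helper: succeeds if a running process has "airflow scheduler"
# in its command line (assumes the daemon keeps that command line).
scheduler_is_up() {
  pgrep -f "airflow scheduler" > /dev/null
}

# Usage:
#   webserver_is_up && echo "web server: ok"
#   scheduler_is_up && echo "scheduler: ok"
```

If either check fails, running the same command without -D keeps it in the foreground and prints its log output to the terminal.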