Programmatically author, schedule and monitor data pipelines
Apache Airflow
Apache Airflow (or simply Airflow) is a platform to programmatically author, schedule, and monitor workflows.
When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative.
Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
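To make the idea of "following the specified dependencies" concrete, here is a minimal sketch that uses only the Python standard library (`graphlib`, Python 3.9+), not Airflow itself. The task names are hypothetical, and this is not how the Airflow scheduler is implemented; it only shows that a DAG of tasks yields a run order in which every task comes after the tasks it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

"load" and "report" both depend only on "transform", so either may come first; a real scheduler could run them in parallel. If the graph contained a cycle, `static_order()` would raise `graphlib.CycleError`, which is why the graph has to be acyclic.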
Requirements
Apache Airflow is tested with:
| | Main version (dev) | Stable version (2.7.1) |
|---|---|---|
| Python | 3.8, 3.9, 3.10, 3.11 | 3.8, 3.9, 3.10, 3.11 |
| Platform | AMD64/ARM64(*) | AMD64/ARM64(*) |
| Kubernetes | 1.24, 1.25, 1.26, 1.27 | 1.24, 1.25, 1.26, 1.27 |
| PostgreSQL | 11, 12, 13, 14, 15 | 11, 12, 13, 14, 15 |
| MySQL | 5.7, 8.0, 8.1 | 5.7, 8.0, 8.1 |
| SQLite | 3.15.0+ | 3.15.0+ |
| MSSQL | 2017(*), 2019(*) | 2017(*), 2019(*) |
* Experimental
Note: MySQL 5.x versions either cannot run multiple schedulers or have limitations when doing so; please see the Scheduler docs. MariaDB is neither tested nor recommended.
Note: SQLite is used in Airflow tests. Do not use it in production. We recommend using the latest stable version of SQLite for local development.
Note: Airflow currently runs on POSIX-compliant operating systems. For development, it is regularly tested on fairly modern Linux distributions and recent versions of macOS. On Windows you can run it via WSL2 (Windows Subsystem for Linux 2) or via Linux containers. The work to add Windows support is tracked via #10388, but it is not a high priority. Use only Linux-based distributions as the "Production" execution environment, as this is the only supported environment. The only distribution used in our CI tests and in the community-managed DockerHub image is Debian Bullseye.
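As a rough illustration of the requirements above, the following sketch checks a Python version, a SQLite version, and an OS name against the values listed for Airflow 2.7.1. The function name `check_environment` and the constants are hypothetical helpers written for this example; they are not part of Airflow, and the check covers only a subset of the table.

```python
import os
import sqlite3
import sys

# Values copied from the requirements table for Airflow 2.7.1.
TESTED_PYTHONS = {(3, 8), (3, 9), (3, 10), (3, 11)}
MIN_SQLITE = (3, 15, 0)


def check_environment(python_version, sqlite_version, os_name):
    """Return a list of problems; an empty list means no mismatch was found."""
    problems = []
    if tuple(python_version[:2]) not in TESTED_PYTHONS:
        problems.append("Python version is not in the tested set")
    if tuple(sqlite_version) < MIN_SQLITE:
        problems.append("SQLite is older than 3.15.0")
    if os_name != "posix":
        problems.append("operating system is not POSIX-compliant")
    return problems


if __name__ == "__main__":
    # Check the interpreter that is running this script.
    found = check_environment(sys.version_info, sqlite3.sqlite_version_info, os.name)
    for problem in found:
        print(problem)
```

An empty result does not guarantee a working installation; it only means none of these three checks failed.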
Getting started
Visit the official Airflow website documentation (latest stable release) for help with installing Airflow, getting started, or walking through a more complete tutorial.
Note: If you're looking for documentation for the main branch (latest development branch): you can find it on s.apache.org/airflow-docs.
For more information on Airflow Improvement Proposals (AIPs), visit the Airflow Wiki.
Documentation for dependent projects such as provider packages, the Docker image, and the Helm Chart is available in the documentation index.
Installing from PyPI
We publish Apache Airflow as the apache-airflow package on PyPI. Installing it can sometimes be tricky, however, because Airflow is a bit of both a library and an application. Libraries usually keep their dependencies open, and applications usually pin them, but we should do neither and both simultaneously. We decided to keep our dependencies as open as possible (in setup.py) so users can install different versions of libraries if needed. This means that pip install apache-airflow will occasionally fail or produce an unusable Airflow installation.
To make installation repeatable, however, we keep a set of "known-to-be-working" constraint files in the orphan constraints-main and constraints-2-0 branches. We keep those "known-to-be-working" constraint files separately per major/minor Python version. You can use them as constraint files when installing Airflow from PyPI. Note that you have to specify the correct Airflow tag/version/branch and Python version in the URL.
- Installing just Airflow:
Note: Only pip installation is currently officially supported.
While it is possible to install Airflow with tools like Poetry or pip-tools, they do not share the same workflow as pip, especially when it comes to constraint vs. requirements management. Installing via Poetry or pip-tools is not currently supported.
There are known issues with bazel that might lead to circular dependencies when using it to install Airflow. Please switch to pip if you encounter such problems. The Bazel community is working on fixing the problem in this PR (https://github.com/bazelbuild/rules_python/pull/1166), so newer versions of bazel might handle it.
If you wish to install Airflow using those tools, you should use the constraint files and convert them to the appropriate format and workflow that your tool requires.
pip install 'apache-airflow==2.7.1' \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.1/constraints-3.8.txt"
- Installing with extras (e.g., postgres, google):
pip install 'apache-airflow[postgres,google]==2.7.1' \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.1/constraints-3.8.txt"
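Because the constraint URL has to match both the Airflow version and the Python version, it can help to build it programmatically. The sketch below follows the URL pattern used in the two commands above; the helper names (`constraints_url`, `current_python`, `pip_install_args`) are hypothetical and are not part of Airflow or pip.

```python
import sys


def current_python() -> str:
    """Return the running interpreter's major.minor version, e.g. "3.8"."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


def constraints_url(airflow_version: str, python_version: str) -> str:
    """Build the constraint file URL for an Airflow release and Python version."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_install_args(airflow_version, python_version, extras=()):
    """Return the argument list for pip, mirroring the commands shown above."""
    extras_part = f"[{','.join(extras)}]" if extras else ""
    return [
        "install",
        f"apache-airflow{extras_part}=={airflow_version}",
        "--constraint",
        constraints_url(airflow_version, python_version),
    ]


print(pip_install_args("2.7.1", "3.8", extras=("postgres", "google")))
```

One way to run the result is `subprocess.run([sys.executable, "-m", "pip", *args], check=True)`, passing `current_python()` as the Python version so the constraint file matches the interpreter that performs the installation.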
For information on installing provider packages, check providers.
Official source code
Apache Airflow is an Apache Software Foundation (ASF) project, and our official source code releases:
- Follow the ASF Release Policy
- Can be downloaded from the ASF Distribution Directory
- Are cryptographically signed by the release manager
- Are officially voted on by the PMC members during the Release Approval Process
Following the ASF rules, the source packages released must be sufficient for a user to build and test the release provided they have access to the appropriate platform and tools.
Contributing
Want to help build Apache Airflow? Check out our contributing documentation.
Official Docker (container) images for Apache Airflow are described in IMAGES.rst.
Who uses Apache Airflow?
More than 400 organizations are using Apache Airflow in the wild.
Who maintains Apache Airflow?
Airflow is the work of the community, but the core committers/maintainers are responsible for reviewing and merging PRs as well as steering conversations around new feature requests. If you would like to become a maintainer, please review the Apache Airflow committer requirements.
Download files
Hashes for apache_airflow-2.7.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06214ece464bbb54910220903081b84056d97e2271400ca01ee63743c41cf4f1 |
| MD5 | c044201dbf1b75d8a7360d1bacdd9ebf |
| BLAKE2b-256 | bb2c1aae9daf5eedc153ec64cb125317d1a7680450114b690cfa484a7217a9fa |
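A downloaded wheel can be compared against the published SHA256 digest with a few lines of standard-library Python. This is a minimal sketch: the function names are hypothetical, the file path is whatever location you downloaded the wheel to, and only the SHA256 row of the table is checked.

```python
import hashlib

# SHA256 digest published above for apache_airflow-2.7.1-py3-none-any.whl.
EXPECTED_SHA256 = "06214ece464bbb54910220903081b84056d97e2271400ca01ee63743c41cf4f1"


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected hex digest."""
    return file_digest(path) == expected.lower()
```

For example, `matches_published_hash("apache_airflow-2.7.1-py3-none-any.whl")` would return True only if the local file is byte-for-byte identical to the one the digest was computed from.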