# Airflow's DataDriver plugin

## from Pandas' dataframes to Airflow pipelines

#### Why

In machine learning projects, the source code used for local, interactive modeling
tends to diverge from the source code of production pipelines.
Because we switch constantly between experimentation and production,
this gap is error-prone and, as a consequence, time-consuming.

The Datadriver project aims to solve this issue by providing glue code **based on Pandas and sklearn**
for modeling, **and on Airflow** for automating, scheduling, and monitoring training
and prediction pipelines.

#### Plugin description

**Datadriver UI (ddui)** is the Airflow plugin we developed to track our models.
Combined with the Datadriver API (pyddapi), it offers a DAG view for tracking a machine learning workflow (or dataflow).

More specifically, it shows the **Output** of any Airflow task, along with metrics and
charts:

- Choose a DAG to track.
![img/ddui_titan1.png](img/ddui_titan1.png)
- Select a task to see charts and descriptive metrics for the output_table.
![img/ddui_titan3.png](img/ddui_titan3.png)
- Look at the histograms to check that the columns are correct (distributions, number of NAs,
unique values, etc.).
![img/ddui_titan2.png](img/ddui_titan2.png)
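
The column checks listed above (number of NAs, unique values, distribution) can be illustrated with a minimal, standard-library sketch. This is not ddui's code: `is_missing` and `summarize_column` are hypothetical names, and the plugin itself works on Pandas dataframes and renders its charts with plotly.

```python
import math
from collections import Counter


def is_missing(value):
    """Treat None and float NaN as missing values (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def summarize_column(values, bins=5):
    """Return the NA count, the unique count, and a simple distribution summary."""
    present = [v for v in values if not is_missing(v)]
    summary = {
        "count": len(values),
        "na_count": len(values) - len(present),
        "unique_count": len(set(present)),
    }
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric and len(numeric) == len(present):
        # Numeric column: equal-width histogram over [min, max].
        low, high = min(numeric), max(numeric)
        width = (high - low) / bins or 1  # fall back to 1 when all values are equal
        counts = [0] * bins
        for v in numeric:
            index = min(int((v - low) / width), bins - 1)
            counts[index] += 1
        summary["histogram"] = counts
    else:
        # Non-numeric column: report the most frequent values instead.
        summary["top_values"] = Counter(present).most_common(3)
    return summary
```

For example, `summarize_column([1, 2, 2, None, 10], bins=3)` reports one missing value, three unique values, and the histogram counts `[3, 0, 1]`.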

## Getting started

    git clone git_url_of_this_project && cd this_project

Local install:

    pip install -e .
    ddui install

Docker install:

    ./run_docker.sh


## Package modules

    ddui/
        dash_app -> the application, defined as a Dash application with callbacks and event handling. It is imported later in plugin.py
        dash_components -> custom HTML components such as a Panel or an Alert Div
        orm -> functions that access the Airflow metastore and retrieve the list of DAGs and their details
        plot -> functions using plotly; they return a Graph object
        plugin -> defines the DataDriverUI plugin that implements Airflow's Plugin interface https://airflow.apache.org/plugins.html#interface
        views -> a FlaskAdminView that also embeds Dash, making it possible to include plotly charts in Airflow


###### Dependencies graph

![pydeps ddui](img/dependencies_analysis.png)

## Developer setup

There is an existing DAG in tests/dags that mocks the behavior of the Datadriver API
without any dependency on pyddapi.

You can use it to develop the user interface with the script located in tests/dev_tools:

    cd tests/dev_tools
    python run_webserver.py

It runs the Airflow webserver and overrides AIRFLOW__CORE__DAGS_FOLDER so that Airflow looks for DAGs in tests/dags.
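
The contents of run_webserver.py are not reproduced here, so the following is only a sketch of how such an override could work, not the script itself. `build_airflow_env` and `run_webserver` are hypothetical helper names. The only Airflow-specific parts are the `AIRFLOW__CORE__DAGS_FOLDER` environment variable and the `airflow webserver` command.

```python
import os
import subprocess


def build_airflow_env(dags_folder, base_env=None):
    """Return a copy of the environment with the DAGs folder overridden."""
    env = dict(os.environ if base_env is None else base_env)
    # Airflow reads AIRFLOW__{SECTION}__{KEY} variables as configuration overrides.
    env["AIRFLOW__CORE__DAGS_FOLDER"] = os.path.abspath(dags_folder)
    return env


def run_webserver(dags_folder):
    """Start `airflow webserver` in a child process using the overridden environment."""
    return subprocess.call(["airflow", "webserver"], env=build_airflow_env(dags_folder))
```

Calling `run_webserver("../dags")` from tests/dev_tools would start the webserver against the mock DAG. Passing the override through a copied environment leaves the parent shell's configuration untouched.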

### Set up your virtual env

    virtualenv venv
    source venv/bin/activate
    pip install -e .
    pip install -r ci/tests_requirements.txt
    ddui install
