# Airflow's DataDriver plugin
## from Pandas' dataframes to Airflow pipelines
#### Why
In a machine learning project, the source code used for local, interactive modeling
tends to diverge from the source code of the production pipelines.
Constantly switching between experimentation and production is error-prone and,
as a consequence, time-consuming.
The Datadriver project aims to solve this issue by providing glue code **based on Pandas and sklearn**
for modeling, **and on Airflow** for automating, scheduling, and monitoring training
and prediction pipelines.
#### Plugin description
**Datadriver UI (ddui)** is the Airflow plugin we developed to track our models.
Combined with the Datadriver API (pyddapi), it offers a DAG view for tracking a machine learning workflow (or dataflow).
More specifically, it shows the **Output** of any Airflow task, along with metrics and
charts:
- choose a DAG to track
![img/ddui_titan1.png](img/ddui_titan1.png)
- select a task to see charts and describe metrics on the output_table
![img/ddui_titan3.png](img/ddui_titan3.png)
- look at histograms to check that columns look correct (distributions, number of NAs,
unique values, etc.)
![img/ddui_titan2.png](img/ddui_titan2.png)
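
The per-column checks listed above can be reproduced locally with pandas before a task ever reaches Airflow. The snippet below is only a minimal sketch of that idea, not the plugin's actual code: the function name `summarize_columns` and the sample `output_table` are hypothetical and used purely for illustration.

```python
import pandas as pd


def summarize_columns(df):
    """Return one row per column: dtype, number of NAs, number of unique values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(dropna=True),
    })


# Hypothetical task output, standing in for a real output_table.
output_table = pd.DataFrame({
    "age": [22.0, 38.0, None, 35.0],
    "sex": ["male", "female", "female", "male"],
})

summary = summarize_columns(output_table)

# pandas' built-in descriptive statistics (count, mean, quartiles, top, freq, ...).
described = output_table.describe(include="all")
```

`summary` is indexed by column name, so `summary.loc["age", "n_missing"]` gives the number of NAs in the `age` column.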
## Getting started

    git clone git_url_of_this_project && cd this_project

Local install:

    pip install -e .
    ddui install

Docker install:

    ./run_docker.sh

## Package modules
`ddui/` contains the following modules:
- `dash_app`: the application, defined as a Dash application with callbacks and event handling. It is later imported in plugin.py
- `dash_components`: custom HTML components such as a Panel or an Alert Div
- `orm`: functions to access the Airflow metastore and retrieve the list of DAGs and their information
- `plot`: functions using plotly, which return a Graph object
- `plugin`: defines the DataDriverUI plugin that implements Airflow's Plugin interface https://airflow.apache.org/plugins.html#interface
- `views`: a FlaskAdminView that also integrates Dash, making it possible to include plotly charts in Airflow
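
For readers unfamiliar with Airflow's Plugin interface, the sketch below shows its general shape as it existed in Airflow 1.x: a subclass of `AirflowPlugin` with a `name` and lists of components to register. This is not the content of ddui's `plugin` module. The class name `DataDriverUISketch` and the plugin name are hypothetical, the lists are left empty on purpose, and the `ImportError` fallback is an assumption added only so the snippet can be imported without Airflow installed.

```python
try:
    from airflow.plugins_manager import AirflowPlugin
except ImportError:
    # Assumption: plain fallback so the sketch stays importable without Airflow.
    AirflowPlugin = object


class DataDriverUISketch(AirflowPlugin):
    """Hypothetical plugin skeleton; the real plugin registers its own views."""

    # Name under which Airflow registers the plugin.
    name = "ddui_sketch"
    # Flask-Admin views to add to the Airflow 1.x web UI (empty in this sketch).
    admin_views = []
    # Flask blueprints, e.g. for serving static assets (empty in this sketch).
    flask_blueprints = []
    # Extra menu entries (empty in this sketch).
    menu_links = []
```

Airflow discovers such classes in its plugins folder; a real plugin would put view and blueprint instances in these lists.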
###### dependencies graph
![pydeps ddui](img/dependencies_analysis.png)
## Developer setup
There is an existing DAG in tests/dags that mocks the behavior of the Datadriver API
without depending on pyddapi.
You can use it to develop the user interface with the script located in tests/dev_tools:

    cd tests/dev_tools
    python run_webserver.py

The script runs the Airflow webserver and overrides AIRFLOW__CORE__DAGS_FOLDER so that Airflow looks for DAGs in tests/dags.
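
The override described above relies on Airflow reading configuration from environment variables named `AIRFLOW__{SECTION}__{KEY}`. The following is a minimal sketch of how such a launcher could prepare its environment. It is not the actual content of run_webserver.py, and the helper names `build_webserver_env` and `webserver_command` are hypothetical.

```python
import os


def build_webserver_env(dags_folder, base_env=None):
    """Copy the environment and point Airflow's DAGs folder at dags_folder."""
    env = dict(os.environ if base_env is None else base_env)
    # Airflow maps AIRFLOW__CORE__DAGS_FOLDER to the [core] dags_folder setting.
    env["AIRFLOW__CORE__DAGS_FOLDER"] = os.path.abspath(dags_folder)
    return env


def webserver_command(port=8080):
    """Build the command line that starts the Airflow webserver."""
    return ["airflow", "webserver", "--port", str(port)]


# To actually start the server (requires Airflow to be installed):
#   import subprocess
#   subprocess.call(webserver_command(), env=build_webserver_env("tests/dags"))
```

Copying the environment rather than mutating `os.environ` keeps the override local to the child process.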
### Set up your virtual env

    virtualenv venv
    source venv/bin/activate
    pip install -e .
    pip install -r ci/tests_requirements.txt
    ddui install