phidata
Build data products as code
Phidata is a toolkit for building high-quality, reliable data products.
The goal is to create tables, metrics, and dashboards that can be used for analytics and machine learning.
Features:
- Define your data products as code.
- Build a data platform with dev and prd environments.
- Manage tables as Python objects and build a data lake as code.
- Run Airflow and Superset locally on Docker and in production on AWS.
- Manage everything in one codebase using engineering best practices.
More Information:
- Website: phidata.com
- Documentation: https://docs.phidata.com
- Chat: Discord
Quick start
This guide shows how to run Airflow, Superset, Jupyter and Postgres locally on Docker.
To follow along, you need:
- Python 3.7+
- Docker Desktop
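The two prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of phidata; the helper names (`python_ok`, `docker_on_path`, `check_prerequisites`) are made up for this example, and the Docker check only confirms that a `docker` executable is on PATH, not that Docker Desktop is running.

```python
import shutil
import sys

# Minimum Python version stated in the prerequisites above.
MIN_PYTHON = (3, 7)


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def docker_on_path() -> bool:
    """Return True if a `docker` executable is found on PATH.

    This only shows that the CLI is installed; it does not contact the daemon.
    """
    return shutil.which("docker") is not None


def check_prerequisites() -> dict:
    """Collect both checks into one report (hypothetical helper)."""
    return {"python": python_ok(), "docker": docker_on_path()}
```

Calling `check_prerequisites()` returns a dictionary with one boolean per requirement, which is enough to decide whether to continue with the install steps.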
Install phidata
Create a Python virtual environment
python3 -m venv ~/.venvs/dpenv
source ~/.venvs/dpenv/bin/activate
Install and initialize phidata
pip install phidata
phi init
Create workspace
A workspace is the directory containing the code for your data platform. It is version controlled using git and shared with your team.
Run phi ws init to create a new workspace in the current directory. Press enter to create a default workspace using the aws blueprint.
phi ws init
cd into the workspace directory
cd data-platform
Run Apps
Apps are open-source tools like Airflow, Superset and Jupyter that run the data products.
Open workspace/settings.py and enable all apps on line 24.
pg_dbs_enabled: bool = True
superset_enabled: bool = True
jupyter_enabled: bool = True
airflow_enabled: bool = True
traefik_enabled: bool = True
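To make the effect of these flags concrete, here is an illustrative sketch of how boolean flags like the ones above could be grouped and queried. It is an assumption for explanation only: the `AppFlags` class and its `enabled_apps` method are hypothetical and do not describe how phidata's workspace/settings.py is actually organized.

```python
from dataclasses import dataclass, fields


@dataclass
class AppFlags:
    """Hypothetical container mirroring the flag names shown above."""

    pg_dbs_enabled: bool = True
    superset_enabled: bool = True
    jupyter_enabled: bool = True
    airflow_enabled: bool = True
    traefik_enabled: bool = True

    def enabled_apps(self) -> list:
        """Return the names of enabled apps, in declaration order.

        The app name is the flag name with the `_enabled` suffix removed.
        """
        suffix = "_enabled"
        return [
            f.name[: -len(suffix)]
            for f in fields(self)
            if f.name.endswith(suffix) and getattr(self, f.name)
        ]
```

For example, `AppFlags(jupyter_enabled=False).enabled_apps()` lists every app except jupyter, which mirrors the idea that a flag set to False leaves that app out of the workspace.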
Then run phi ws up to create docker resources. Allow about 5 minutes for the containers to start and the apps to initialize.
phi ws up
Deploying workspace: data-platform
--**-- Docker env: dev
--**-- Confirm resources:
-+-> Network: starter-aws
-+-> Container: dev-pg-starter-aws-container
-+-> Container: airflow-db-starter-aws-container
-+-> Container: airflow-redis-starter-aws-container
-+-> Container: airflow-ws-container
-+-> Container: airflow-scheduler-container
-+-> Container: airflow-worker-container
-+-> Container: jupyter-container
-+-> Container: superset-db-starter-aws-container
-+-> Container: superset-redis-starter-aws-container
-+-> Container: superset-ws-container
-+-> Container: superset-init-container
-+-> Container: traefik
Network: starter-aws
Total 13 resources
Confirm deploy [Y/n]:
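Before confirming, it can help to see that the listed resources add up to the reported total. The sketch below parses the resource lines from the output above and counts them by type. The parsing rules are an assumption based only on the lines shown here; phidata does not document this output as a stable format, and the function names are made up for this example.

```python
import re
from collections import Counter

# Resource lines and total copied from the confirmation output above.
CONFIRM_OUTPUT = """\
-+-> Network: starter-aws
-+-> Container: dev-pg-starter-aws-container
-+-> Container: airflow-db-starter-aws-container
-+-> Container: airflow-redis-starter-aws-container
-+-> Container: airflow-ws-container
-+-> Container: airflow-scheduler-container
-+-> Container: airflow-worker-container
-+-> Container: jupyter-container
-+-> Container: superset-db-starter-aws-container
-+-> Container: superset-redis-starter-aws-container
-+-> Container: superset-ws-container
-+-> Container: superset-init-container
-+-> Container: traefik
Total 13 resources
"""

# Matches lines such as "-+-> Container: jupyter-container".
RESOURCE_LINE = re.compile(r"^-\+->\s+(\w+):\s+(\S+)$")
# Matches the summary line such as "Total 13 resources".
TOTAL_LINE = re.compile(r"^Total (\d+) resources$", re.MULTILINE)


def count_resources(output: str) -> Counter:
    """Count listed resources by type (for example Network, Container)."""
    counts = Counter()
    for line in output.splitlines():
        match = RESOURCE_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def declared_total(output: str):
    """Return the total from the summary line, or None if it is absent."""
    match = TOTAL_LINE.search(output)
    return int(match.group(1)) if match else None
```

With the output above, the listing contains one network and twelve containers, which matches the declared total of 13 resources.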
Check out Superset
Open localhost:8410 in your browser to view the Superset UI.
- User: admin
- Pass: admin
- Logs:
docker logs -f superset-ws-container
Check out Airflow
Open localhost:8310 in a separate browser or private window to view the Airflow UI.
- User: admin
- Pass: admin
- Logs:
docker logs -f airflow-ws-container
Check out Jupyter
Open localhost:8888 in a browser to view the JupyterLab UI.
- Pass: admin
- Logs:
docker logs -f jupyter-container
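Since the apps take a few minutes to initialize, a quick way to see which ones are reachable is to test the three ports used in this guide. The following is a minimal sketch using only the standard library; the function names and the `APP_PORTS` labels are chosen for this example. An open port only shows that something is listening, not that the app has finished initializing, so the logs remain the authoritative check.

```python
import socket

# Ports taken from the guide above; the keys are labels chosen for this sketch.
APP_PORTS = {"superset": 8410, "airflow": 8310, "jupyter": 8888}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def app_status(host: str = "localhost") -> dict:
    """Map each app label to whether its port currently accepts connections."""
    return {name: is_port_open(host, port) for name, port in APP_PORTS.items()}
```

Calling `app_status()` returns one boolean per app, so it can be rerun every so often until all three report True.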
Shutdown workspace
Shut down all resources using phi ws down:
phi ws down
Or shut down a single app by name:
phi ws down --app jupyter
phi ws down --app airflow
phi ws down --app superset
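When several apps need to be stopped one after another, the commands above can be generated from a list of app names. This sketch only builds the command lines and does not run them; `down_command` and `down_commands` are hypothetical helpers, and the only CLI pieces used are `phi ws down` and `--app` as shown above.

```python
import shlex


def down_command(app=None) -> list:
    """Build the argument list for `phi ws down`, optionally for one app."""
    cmd = ["phi", "ws", "down"]
    if app is not None:
        cmd += ["--app", app]
    return cmd


def down_commands(apps) -> list:
    """Return one shell-quoted command string per app name, in order."""
    return [
        " ".join(shlex.quote(part) for part in down_command(app))
        for app in apps
    ]
```

The argument lists returned by `down_command` could be passed to `subprocess.run` from the workspace directory if the commands should be executed from Python rather than typed by hand.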