phidata

Build data products as code


Phidata is a toolkit for building high-quality, reliable data products.

Our goal is to create high-quality tables, metrics, and dashboards that can be used for analytics and ML.

Features:

  • Define your data products as code.
  • Build a data platform with dev and prd environments.
  • Manage tables as Python objects and build a data lake as code.
  • Run Airflow and Superset locally on Docker and in production on AWS.
  • Manage everything in one codebase using engineering best practices.


Quick start

This guide shows how to run Airflow, Superset, Jupyter, and Postgres locally on Docker.

To follow along, you need Python 3 and Docker installed.

Install phidata

Create a Python virtual environment

python3 -m venv ~/.venvs/dpenv
source ~/.venvs/dpenv/bin/activate
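
Before installing, it can help to confirm that the virtual environment is actually active. The following is a minimal sketch using only the Python standard library; the helper name in_virtualenv is illustrative and not part of phidata.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If the environment from the step above is active, the reported prefix should be the ~/.venvs/dpenv directory.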

Install and initialize phidata

pip install phidata
phi init

Create workspace

A workspace is the directory containing the code for your data platform. It is version controlled using git and shared by your team.

Run phi ws init to create a new workspace in the current directory. Press Enter to create a default workspace using the aws blueprint.

phi ws init

cd into the workspace directory

cd data-platform

Run Apps

Apps are open-source tools like Airflow, Superset and Jupyter that run our data products.

Open workspace/settings.py and enable all apps on line 24.

pg_dbs_enabled: bool = True
superset_enabled: bool = True
jupyter_enabled: bool = True
airflow_enabled: bool = True
traefik_enabled: bool = True
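
To double-check which apps are switched on after editing, a small script can scan the settings text for these flags. This is only a sketch: it assumes the flags are written as simple `<app>_enabled: bool = True` lines, as shown above, and the function name read_app_flags is hypothetical. The real workspace/settings.py may contain more than this.

```python
import re

# Matches lines such as "airflow_enabled: bool = True".
FLAG_PATTERN = re.compile(r"^\s*(\w+)_enabled\s*:\s*bool\s*=\s*(True|False)\s*$")


def read_app_flags(settings_text: str) -> dict:
    """Map each app name to its enabled state, based on `<app>_enabled` lines."""
    flags = {}
    for line in settings_text.splitlines():
        match = FLAG_PATTERN.match(line)
        if match:
            flags[match.group(1)] = match.group(2) == "True"
    return flags


if __name__ == "__main__":
    # Sample text for illustration; in practice, read workspace/settings.py.
    sample = """
pg_dbs_enabled: bool = True
superset_enabled: bool = True
jupyter_enabled: bool = False
"""
    for app, enabled in read_app_flags(sample).items():
        print(f"{app}: {'on' if enabled else 'off'}")
```

The script reads the file as text rather than importing it, so it does not depend on phidata being importable.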

Then run phi ws up to create the Docker resources. Allow about 5 minutes for the containers to start and the apps to initialize.

phi ws up

Deploying workspace: data-platform

--**-- Docker env: dev
--**-- Confirm resources:
  -+-> Network: starter-aws
  -+-> Container: dev-pg-starter-aws-container
  -+-> Container: airflow-db-starter-aws-container
  -+-> Container: airflow-redis-starter-aws-container
  -+-> Container: airflow-ws-container
  -+-> Container: airflow-scheduler-container
  -+-> Container: airflow-worker-container
  -+-> Container: jupyter-container
  -+-> Container: superset-db-starter-aws-container
  -+-> Container: superset-redis-starter-aws-container
  -+-> Container: superset-ws-container
  -+-> Container: superset-init-container
  -+-> Container: traefik

Network: starter-aws
Total 13 resources
Confirm deploy [Y/n]:

Check out Superset

Open localhost:8410 in your browser to view the Superset UI.

  • User: admin
  • Pass: admin
  • Logs: docker logs -f superset-ws-container

Check out Airflow

Open localhost:8310 in a separate browser or private window to view the Airflow UI.

  • User: admin
  • Pass: admin
  • Logs: docker logs -f airflow-ws-container

Check out Jupyter

Open localhost:8888 in a browser to view the JupyterLab UI.

  • Pass: admin
  • Logs: docker logs -f jupyter-container
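
Because the apps can take a few minutes to come up, it may be convenient to check the three ports from a script instead of reloading the browser. The sketch below uses only the standard library. The port numbers come from this guide; the function names and the APP_PORTS mapping are illustrative. An open port only means the container accepts TCP connections, not that the app has finished initializing.

```python
import socket
import time

# Ports taken from the guide: Superset 8410, Airflow 8310, Jupyter 8888.
APP_PORTS = {"superset": 8410, "airflow": 8310, "jupyter": 8888}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, max_wait: float = 300.0, interval: float = 5.0) -> bool:
    """Poll host:port until it accepts connections or max_wait seconds pass."""
    deadline = time.monotonic() + max_wait
    while True:
        if port_is_open(host, port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # One-shot check; call wait_for_port instead to block until an app is up.
    for app, port in APP_PORTS.items():
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{app} on port {port}: {state}")
```

If a port stays unreachable, the docker logs commands listed above are the next place to look.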

Shutdown workspace

Shut down all resources using phi ws down:

phi ws down

Or shut down a single app by name:

phi ws down --app jupyter

phi ws down --app airflow

phi ws down --app superset
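
To shut down several apps in one go, the per-app command can be wrapped in a short script. This is a sketch, not part of phidata: the helper names are hypothetical, and it only assembles the phi ws down --app invocations shown above. It defaults to a dry run that prints the commands. Whether phi ws down asks for confirmation is not covered here, so run it from an interactive terminal.

```python
import shlex
import subprocess


def build_down_command(app=None):
    """Build the argument list for `phi ws down`, optionally for one app."""
    cmd = ["phi", "ws", "down"]
    if app is not None:
        cmd += ["--app", app]
    return cmd


def shut_down(apps, dry_run=True):
    """Print (dry run) or execute one `phi ws down --app <name>` per app."""
    for app in apps:
        cmd = build_down_command(app)
        if dry_run:
            print("would run:", shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if the command fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    shut_down(["jupyter", "airflow", "superset"], dry_run=True)
```

Pass dry_run=False to execute the commands once the printed output looks right.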

This version

0.3.5
