phidata
Building Blocks for Data Engineering
A Python library of data engineering building blocks.
Phidata is a library of OSS data tools; use it to deliver high-quality data products on the cheap. Honestly, our goal is just to save money by running OSS, so we run everything locally using docker and in production on AWS. And because we run OSS, we're OSS too, under the MPL-2.0 license.
How it works
- Phidata converts infrastructure, tools and data assets into Python classes (see the sketch after this list).
- These classes are then put together to build data platforms, ML APIs, AI Apps, etc.
- Run your platform locally for development using docker:
phi ws up dev:docker
- Run it in production on AWS:
phi ws up prod:aws
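To make the "everything is a Python class" idea concrete, here's a minimal, library-free sketch using plain dataclasses. It only mimics the pattern (declare assets and apps as objects, then group them per environment); the Table and App classes below are illustrative stand-ins, not phidata's actual API:

from dataclasses import dataclass, field

# Illustrative stand-ins only: phidata's real classes (e.g. CsvTableLocal,
# shown in the workflow section below) follow the same declare-then-compose pattern.
@dataclass
class Table:
    name: str
    partitions: list[str] = field(default_factory=list)

@dataclass
class App:
    name: str
    image: str

# Declare data assets and apps as plain Python objects ...
crypto_prices = Table(name="crypto_prices", partitions=["ds"])
jupyter = App(name="jupyter", image="phidata/jupyter-aws-dp:dev")

# ... then compose them into the environments that `phi ws up` deploys.
environments = {
    "dev:docker": [crypto_prices, jupyter],
    "prod:aws": [crypto_prices, jupyter],
}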
Advantages
- Automate the grunt work
- Recipes for common data tasks
- Everything is version controlled: Infra, Apps and Workflows
- Equal dev and production environments for data development at scale
- Multiple teams working together share code and define dependencies in a pythonic way
- Formatting (black), linting (ruff), type-checking (mypy) and testing (pytest) included
More Information:
- Website: phidata.com
- Documentation: https://docs.phidata.com
- Chat: Discord
Quickstart
Let's build a data product using crypto data. Open the Terminal and follow along to download sample data and analyze it in a jupyter notebook.
Setup
Create a python virtual environment
python3 -m venv ~/.venvs/dpenv
source ~/.venvs/dpenv/bin/activate
Install and initialize phidata
pip install phidata
phi init
If you encounter errors, try updating pip using
python -m pip install --upgrade pip
Create workspace
A workspace is a directory containing the source code for your data platform. Run phi ws init to create a new workspace. Press Enter to select the default name (data-platform) and template (aws-data-platform):
phi ws init
cd into the new workspace directory
cd data-platform
Run your first workflow
The first step of building a data product is collecting the data. The workflows/crypto/prices.py file contains an example task that downloads crypto data to a local CSV file. Run it using:
phi wf run crypto/prices
Note how we define the output as a CsvTableLocal object with partitions and pre-write checks:
# Step 1: Define a CsvTableLocal for storing data
# Path: `storage/tables/crypto_prices`
crypto_prices_local = CsvTableLocal(
    name="crypto_prices",
    database="crypto",
    partitions=["ds"],
    write_checks=[NotEmpty()],
)
Check out data-platform/storage/tables/crypto_prices for the CSVs.
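To sanity-check the output outside phidata, you can list the partitioned files with plain Python (run from the data-platform directory; the exact layout under the table directory may vary by version):

from pathlib import Path

# List the CSV files the crypto/prices workflow wrote under the table
# directory defined above (partitioned by "ds").
table_dir = Path("storage/tables/crypto_prices")
for path in sorted(table_dir.rglob("*.csv")):
    size_kb = path.stat().st_size / 1024
    print(f"{path.relative_to(table_dir)}  ({size_kb:.1f} KiB)")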
Run your first App
Docker is a great tool for testing locally. Your workspace comes pre-configured with a Jupyter notebook for analyzing data. Install Docker Desktop and, once the engine is running, start the workspace using:
phi ws up
Press Enter to confirm. Verify the container is running using the docker dashboard or docker ps:
docker ps --format 'table {{.Names}}\t{{.Image}}'
NAMES IMAGE
jupyter-container phidata/jupyter-aws-dp:dev
Jupyter UI
Open localhost:8888 in a new tab to view the JupyterLab UI. Password: admin
Navigate to notebooks/examples/crypto_prices.ipynb and run all cells.
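If you prefer to poke at the data in a fresh cell, something like the following works with plain pandas. The path comes from the CsvTableLocal definition above; the non-partition column names depend on what the workflow downloads, so inspect them first:

from pathlib import Path

import pandas as pd

# Load every CSV written by the crypto/prices workflow into one DataFrame.
# Adjust the path if the workspace is mounted elsewhere inside the container.
files = sorted(Path("storage/tables/crypto_prices").rglob("*.csv"))
df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

# "ds" is the partition column declared on the table.
print(df.columns.tolist())
print(df.groupby("ds").size())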
Shutdown
Play around, then stop the workspace using:
phi ws down
Next
Check out the documentation for more information or chat with us on Discord.