AWS Data Platform
This repo contains the code for building a data platform on AWS.
We enable 2 data environments:
- dev: A development env running on docker
- prd: A production env running on aws + k8s
Setup
- Create + activate a virtual env:
python3 -m venv ~/.venvs/dpenv
source ~/.venvs/dpenv/bin/activate
- Install + init phidata:
pip install phidata
phi init
From the data-platform dir:
- Setup workspace:
phi ws setup
- Copy workspace/example_secrets to workspace/secrets:
cp -r workspace/example_secrets workspace/secrets
- Deploy dev containers to docker using:
phi ws up
phi will create the following resources:
- Container: dev-pg-dp-container
- Network: dp
Optional: If something fails, try running again with debug logs:
phi ws up -d
Optional: Create .env file:
cp example.env .env
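The two copy steps above (workspace/example_secrets and example.env) behave differently when the target already exists: `cp -r` would nest the source inside an existing workspace/secrets directory, and `cp` would overwrite an edited .env. A minimal sketch of a guard, assuming you want to skip the copy in that case; `copy_if_missing` is a hypothetical helper, not part of phidata:

```shell
# copy_if_missing SRC DST: copy SRC to DST only when DST does not exist yet.
# Hypothetical helper for illustration; not provided by phidata or this repo.
copy_if_missing() {
  src="$1"
  dst="$2"
  if [ -e "$dst" ]; then
    echo "skip: $dst already exists"
  else
    cp -r "$src" "$dst"
    echo "copied: $src -> $dst"
  fi
}

# Usage from the data-platform dir:
#   copy_if_missing workspace/example_secrets workspace/secrets
#   copy_if_missing example.env .env
```

Running it a second time leaves your edited secrets and .env untouched.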
Using the dev environment
The workspace/dev directory contains the code for the dev resources. The workspace/settings.py file can be used to enable the open-source applications like:
- Postgres App: for storing dev data (runs 1 container)
- Airflow App: for running dags & pipelines (runs 5 containers)
- Superset App: for visualizing dev data (runs 4 containers)
Update the workspace/settings.py file and run:
phi ws up
TIP: The phi ws ... commands use --env dev and --config docker by default. These defaults are set in the workspace/config.py file.
Running phi ws up is equivalent to running phi ws up --env dev --config docker
Run Airflow
- Set airflow_enabled = True in workspace/settings.py and run phi ws up
- Check out the airflow webserver running in the airflow-ws-container:
- url: http://localhost:8310/
- user: admin
- pass: admin
Run Superset
- Set superset_enabled = True in workspace/settings.py and run phi ws up
- Check out the superset webserver running in the superset-ws-container:
- url: http://localhost:8410/
- user: admin
- pass: admin
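The webservers can take a while to come up after phi ws up returns. A minimal sketch for polling the URLs listed above until they respond, assuming curl is installed; `wait_for_url`, the 2-second delay and the default of 30 attempts are illustrative choices, not part of phidata:

```shell
# wait_for_url URL [TRIES]: poll URL with curl until it answers or TRIES run out.
# Hypothetical helper; the delay and default number of attempts are arbitrary.
wait_for_url() {
  url="$1"
  tries="${2:-30}"
  i=1
  while [ "$i" -le "$tries" ]; do
    # -s: silent, -o /dev/null: discard the body.
    # curl exits 0 as soon as the server returns any HTTP response.
    if curl -s -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 2   # wait before the next attempt
  done
  echo "down: $url"
  return 1
}

# Usage:
#   wait_for_url http://localhost:8310/   # airflow
#   wait_for_url http://localhost:8410/   # superset
```

This only checks that something answers on the port; it does not verify the login.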
Format + lint workspace
Format with black & lint with mypy using:
./scripts/format.sh
If you need to install packages, run:
pip install black mypy
Upgrading phidata version
- Activate virtualenv:
source ~/.venvs/dpenv/bin/activate
- Upgrade phidata:
pip install phidata --upgrade
- Rebuild local images & recreate containers:
CACHE=f phi ws up --env dev --config docker
Optional: Install workspace locally
Install the workspace & python packages locally in your virtual env using:
./scripts/install.sh
This will:
- Install python packages from requirements.txt
- Install python project in --editable mode
- Install requirements-airflow.txt without dependencies for code completion
This enables:
- Running black & mypy locally
- Running workflows locally
- Editor auto-completion
Add python packages
Following PEP-631, we should add dependencies to the pyproject.toml file.
To add a new package:
- Add the module to the pyproject.toml file.
- Run: ./scripts/upgrade.sh. This script updates the requirements.txt file.
- Optional: Run: ./scripts/install.sh to install the new dependencies in a local virtual env.
- Run CACHE=f phi ws up to recreate images + containers
Adding airflow providers
Airflow requirements are stored in the workspace/dev/airflow_resources/requirements-airflow.txt file.
To add new airflow providers:
- Add the module to the workspace/dev/airflow_resources/requirements-airflow.txt file.
- Optional: Run: ./scripts/install.sh to install the new dependencies in a local virtual env.
- Run CACHE=f phi ws up --name airflow to recreate images + containers
To force recreate all images & containers, use the CACHE env variable
CACHE=false phi ws up \
--env dev \
--config docker \
--type image|container \
--name airflow|superset|pg \
--app airflow|superset
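In the command above, `a|b` lists the allowed values for a flag; pasted literally, the shell would treat each `|` as a pipe. A minimal sketch that prints a concrete command for one type and one name, assuming --type and --name can be combined as shown; `phi_up_cmd` is a hypothetical helper that only prints and never runs phi:

```shell
# phi_up_cmd TYPE NAME: print a forced-rebuild command for one resource.
# TYPE must be "image" or "container"; NAME is one of airflow, superset, pg.
# Hypothetical helper for illustration; it only prints the command.
phi_up_cmd() {
  type_="$1"
  name="$2"
  case "$type_" in
    image|container) ;;
    *) echo "unknown type: $type_" >&2; return 1 ;;
  esac
  echo "CACHE=false phi ws up --env dev --config docker --type $type_ --name $name"
}

phi_up_cmd container airflow
# prints: CACHE=false phi ws up --env dev --config docker --type container --name airflow
```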
Shut down workspace
phi ws down
Restart all resources
phi ws restart
Restart all containers
phi ws restart --type container
Restart traefik app
phi ws restart --app traefik
Restart airflow app
phi ws restart --app airflow
Add environment/secret variables to your apps
The containers read env using the env_file and secrets using the secrets_file params.
These files are stored in the workspace/env or workspace/secrets directories.
Airflow
To add env variables to your airflow containers:
- Update the workspace/env/dev_airflow_env.yml file.
- Restart all airflow containers using:
phi ws restart --name airflow --type container
To add secret variables to your airflow containers:
- Update the workspace/secrets/dev_airflow_secrets.yml file.
- Restart all airflow containers using:
phi ws restart --name airflow --type container
Test a DAG
# ssh into airflow-worker | airflow-ws
docker exec -it airflow-ws-container zsh
docker exec -it airflow-worker-container zsh
# Test run the DAGs using module name
python -m workflow.dir.file
# Test run the DAG file
python /mnt/workspaces/data-platform/workflow/dir/file.py
# List DAGs
airflow dags list
# List tasks in DAG
airflow tasks list \
-S /mnt/workspaces/data-platform/workflow/dir/file.py \
-t dag_name
# Test airflow task
airflow tasks test dag_name task_name 2022-07-01
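The module name and the file path used above refer to the same DAG file. A minimal sketch of the mapping between the two forms, assuming the workspace is mounted at /mnt/workspaces/data-platform as in the paths above; `dag_module` is a hypothetical helper:

```shell
# Workspace mount point inside the airflow containers, taken from the paths above.
WORKSPACE_ROOT="/mnt/workspaces/data-platform"

# dag_module PATH: convert a DAG file path into the dotted name for `python -m`.
# Hypothetical helper for illustration.
dag_module() {
  rel="${1#"$WORKSPACE_ROOT"/}"   # strip the workspace root prefix, if present
  rel="${rel%.py}"                # strip the .py suffix
  echo "$rel" | tr '/' '.'        # path separators become dots
}

dag_module /mnt/workspaces/data-platform/workflow/dir/file.py
# prints: workflow.dir.file
```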
Recreate everything
Notes:
- Use data-platform as the working directory
- Deactivate existing venv using deactivate if needed
echo "*- Deleting venv"
rm -rf ~/.venvs/dpenv
echo "*- Deleting af-db-dp-volume volume"
docker volume rm af-db-dp-volume
echo "*- Recreating venv"
python3 -m venv ~/.venvs/dpenv
source ~/.venvs/dpenv/bin/activate
echo "*- Install phi"
pip install phidata
phi init
echo "*- Setup + deploying workspace"
phi ws setup
CACHE=f phi ws up
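In the script above, docker volume rm reports an error when the af-db-dp-volume volume does not exist, for example if airflow was never enabled. A minimal sketch of a guarded removal, assuming you prefer to skip that step quietly; `remove_volume_if_present` is a hypothetical helper:

```shell
# remove_volume_if_present NAME: delete a docker volume only when it exists.
# Hypothetical helper for illustration.
remove_volume_if_present() {
  # `docker volume inspect` exits non-zero when the volume is not found.
  if docker volume inspect "$1" >/dev/null 2>&1; then
    docker volume rm "$1" >/dev/null && echo "removed: $1"
  else
    echo "absent: $1"
  fi
}

# Usage:
#   remove_volume_if_present af-db-dp-volume
```

Removal can still fail if a container is using the volume, so run phi ws down first in that case.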