AWS Data Platform
This repo contains the code for building a data platform on AWS.
We enable 2 data environments:
- dev: A development env running on docker
- prd: A production env running on aws + k8s
Setup
- Create + activate a virtual env:
python3 -m venv ~/.venvs/dpenv
source ~/.venvs/dpenv/bin/activate
- Install + init phidata:
pip install phidata
phi init
From the data-platform dir:
- Setup workspace:
phi ws setup
- Copy workspace/example_secrets to workspace/secrets:
cp -r workspace/example_secrets workspace/secrets
- Deploy dev containers to docker using:
phi ws up
phi will create the following resources:
- Container: dev-pg-dp-container
- Network: dp
Optional: If something fails, try running again with debug logs:
phi ws up -d
Optional: Create .env file:
cp example.env .env
Using the dev environment
The workspace/dev directory contains the code for the dev resources. Use the workspace/settings.py file to enable open-source applications such as:
- Postgres App: for storing dev data (runs 1 container)
- Airflow App: for running dags & pipelines (runs 5 containers)
- Superset App: for visualizing dev data (runs 4 containers)
Update the workspace/settings.py file and run:
phi ws up
TIP: The phi ws ... commands use --env dev and --config docker by default. These defaults are set in the workspace/config.py file.
Running phi ws up is equivalent to running:
phi ws up --env dev --config docker
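As a quick sanity check after phi ws up, the container counts from the app list above can be tallied and compared with what docker ps shows. This is only a sketch: airflow_enabled and superset_enabled are the flag names this README uses, while pg_enabled is an assumed name for the Postgres toggle and expected_containers is a hypothetical helper, not part of phidata.

```python
# Sketch only: pg_enabled is an assumed flag name; the counts come from the
# app list in this README (Postgres 1, Airflow 5, Superset 4).
CONTAINERS_PER_APP = {"pg": 1, "airflow": 5, "superset": 4}


def expected_containers(pg_enabled: bool, airflow_enabled: bool, superset_enabled: bool) -> int:
    """Return how many dev containers the enabled apps should start."""
    enabled = {"pg": pg_enabled, "airflow": airflow_enabled, "superset": superset_enabled}
    return sum(CONTAINERS_PER_APP[app] for app, on in enabled.items() if on)


if __name__ == "__main__":
    # Postgres + Airflow enabled, Superset disabled.
    print(expected_containers(pg_enabled=True, airflow_enabled=True, superset_enabled=False))
```

If the number of running dev containers differs from this tally, rerunning with debug logs is a reasonable next step.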
Run Airflow
- Set airflow_enabled = True in workspace/settings.py and run phi ws up
- Check out the airflow webserver running in the airflow-ws-container:
  - url: http://localhost:8310/
  - user: admin
  - pass: admin
Superset webserver
- Set superset_enabled = True in workspace/settings.py and run phi ws up
- Check out the superset webserver running in the superset-ws-container:
  - url: http://localhost:8410/
  - user: admin
  - pass: admin
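The webservers can take a while to come up after phi ws up. A small script along these lines could confirm that something is answering on the two URLs listed above before you open a browser. It is a minimal sketch using only the Python standard library; is_reachable and WEBSERVERS are illustrative names, and the check only tests that an HTTP response comes back, not that login works.

```python
import urllib.error
import urllib.request

# URLs taken from the Airflow and Superset sections of this README.
WEBSERVERS = {
    "airflow": "http://localhost:8310/",
    "superset": "http://localhost:8410/",
}


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if the server at url answers an HTTP request at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status (e.g. 401 or 500).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar: nothing is listening yet.
        return False


if __name__ == "__main__":
    for name, url in WEBSERVERS.items():
        status = "up" if is_reachable(url) else "not reachable"
        print(f"{name}: {status} ({url})")
```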
Format + lint workspace
Format with black & lint with mypy using:
./scripts/format.sh
If you need to install packages, run:
pip install black mypy
Upgrading phidata version
- Activate virtualenv:
source ~/.venvs/dpenv/bin/activate
- Upgrade phidata:
pip install phidata --upgrade
- Rebuild local images & recreate containers:
CACHE=f phi ws up --env dev --config docker
Optional: Install workspace locally
Install the workspace & python packages locally in your virtual env using:
./scripts/install.sh
This will:
- Install python packages from requirements.txt
- Install python project in --editable mode
- Install requirements-airflow.txt without dependencies for code completion
This enables:
- Running black & mypy locally
- Running workflows locally
- Editor auto-completion
Add python packages
Following PEP 631, dependencies are declared in the pyproject.toml file.
To add a new package:
- Add the module to the pyproject.toml file.
- Run ./scripts/upgrade.sh. This script updates the requirements.txt file.
- Optional: Run ./scripts/install.sh to install the new dependencies in a local virtual env.
- Run CACHE=f phi ws up to recreate images + containers
Adding airflow providers
Airflow requirements are stored in the workspace/dev/airflow_resources/requirements-airflow.txt file.
To add new airflow providers:
- Add the module to the workspace/dev/airflow_resources/requirements-airflow.txt file.
- Optional: Run ./scripts/install.sh to install the new dependencies in a local virtual env.
- Run CACHE=f phi ws up --name airflow to recreate images + containers
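To force recreate all images & containers, set the CACHE env variable when running phi ws up. The --type, --name and --app flags are optional filters. In the command below, a|b lists the accepted values, so pick one value per flag rather than typing the | literally (the shell would treat it as a pipe):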
CACHE=false phi ws up \
--env dev \
--config docker \
--type image|container \
--name airflow|superset|pg \
--app airflow|superset
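To make the filter flags concrete, here is a sketch of a helper that assembles one valid phi ws up invocation from the options shown above. build_ws_up_command is a hypothetical name introduced for illustration; it only builds the argument list from flags that appear in this README and does not call phi itself.

```python
import shlex
from typing import List, Optional


def build_ws_up_command(
    env: str = "dev",
    config: str = "docker",
    type_: Optional[str] = None,  # "image" or "container"
    name: Optional[str] = None,   # "airflow", "superset" or "pg"
    app: Optional[str] = None,    # "airflow" or "superset"
) -> List[str]:
    """Build the argument list for phi ws up, adding only the filters that are set."""
    cmd = ["phi", "ws", "up", "--env", env, "--config", config]
    for flag, value in (("--type", type_), ("--name", name), ("--app", app)):
        if value is not None:
            cmd += [flag, value]
    return cmd


if __name__ == "__main__":
    cmd = build_ws_up_command(type_="container", name="airflow")
    # Print the shell form, with CACHE set as described above.
    print("CACHE=false " + shlex.join(cmd))
```

To actually run the command from Python, the list could be passed to subprocess.run together with an environment that sets CACHE, but running it directly in the shell is usually simpler.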
Shut down workspace
phi ws down
Restart all resources
phi ws restart
Restart all containers
phi ws restart --type container
Restart traefik app
phi ws restart --app traefik
Restart airflow app
phi ws restart --app airflow
Add environment/secret variables to your apps
The containers read env using the env_file and secrets using the secrets_file params.
These files are stored in the workspace/env or workspace/secrets directories.
Airflow
To add env variables to your airflow containers:
- Update the workspace/env/dev_airflow_env.yml file.
- Restart all airflow containers using:
phi ws restart --name airflow --type container
To add secret variables to your airflow containers:
- Update the workspace/secrets/dev_airflow_secrets.yml file.
- Restart all airflow containers using:
phi ws restart --name airflow --type container
Test a DAG
# Open a shell in airflow-ws or airflow-worker
docker exec -it airflow-ws-container zsh
docker exec -it airflow-worker-container zsh
# Test run the DAGs using module name
python -m workflow.dir.file
# Test run the DAG file
python /mnt/workspaces/data-platform/workflow/dir/file.py
# List DAGs
airflow dags list
# List tasks in DAG
airflow tasks list \
-S /mnt/workspaces/data-platform/workflow/dir/file.py \
-t dag_name
# Test airflow task
airflow tasks test dag_name task_name 2022-07-01
Recreate everything
Notes:
- Use data-platform as the working directory
- Deactivate existing venv using deactivate if needed
echo "*- Deleting venv"
rm -rf ~/.venvs/dpenv
echo "*- Deleting af-db-dp-volume volume"
docker volume rm af-db-dp-volume
echo "*- Recreating venv"
python3 -m venv ~/.venvs/dpenv
source ~/.venvs/dpenv/bin/activate
echo "*- Install phi"
pip install phidata
phi init
echo "*- Setup + deploying workspace"
phi ws setup
CACHE=f phi ws up