Infrastructure for AI applications and machine learning pipelines
PackYak
Packyak makes it easy to build Lakehouses, Data Pipelines, and AI applications on AWS.
Roadmap
- StreamlitSite - deploy a Streamlit application to ECS with VPC and Load Balancing; infer least-privilege IAM Policies for Streamlit scripts (home.py, pages/*.py)
- @function - host a Lambda Function; infer least-privilege IAM Policies for functions
- Bucket - work with files in S3, attach event handlers
- Queue - send messages and attach event handlers
- Stream - send and consume records through AWS Kinesis
- Table - store structured data (Parquet, ORC, etc.) in a Glue Catalog; model data using pydantic
- @asset - build data pipelines with dependency graphs
- @train - capture the inputs and outputs of a function for ML training and human feedback
- Generate audit reports for HIPAA and GDPR compliance policies
Installation
Prerequisites
- Docker (for bundling Python applications for the target runtime, e.g. an Amazon Linux Lambda Function)
- Python Poetry:
  curl -sSL https://install.python-poetry.org | python3 -
- poetry-plugin-export (see https://python-poetry.org/docs/plugins/#using-plugins):
  poetry self add poetry-plugin-export
How To: Deploy Streamlit
Custom Domain
- Create a Route 53 Hosted Zone for your domain
- Update the nameservers at your DNS provider to delegate the domain to the Hosted Zone
- Create an ACM Certificate
HTTPS
- Create a Certificate via the AWS Console
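The custom-domain and certificate steps above can also be done with the AWS CLI instead of the Console. This is a sketch, not part of Packyak itself; example.com is a placeholder for your own domain, and both commands require AWS credentials:

```shell
# Create a Route 53 Hosted Zone (example.com is a placeholder domain)
aws route53 create-hosted-zone \
  --name example.com \
  --caller-reference "packyak-$(date +%s)"

# Request an ACM certificate for HTTPS, validated via DNS records
# added to the new Hosted Zone
aws acm request-certificate \
  --domain-name example.com \
  --validation-method DNS
```

After requesting the certificate, add the DNS validation records that ACM returns to the Hosted Zone so the certificate can be issued.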
Example
🔧 Note: Packyak is in active development. Not all features are implemented. Check back to see the following example grow.
Below is the simplest possible Packyak application: a Bucket and a Function that writes to it.
Your application's infrastructure is declared in code. The Packyak compiler analyzes it to auto-provision cloud resources (here, an AWS S3 Bucket and a Lambda Function) and infer least-privilege IAM Policies.
from packyak import Bucket, asset, function

videos = Bucket("videos")

@function()
async def upload_video():
    await videos.put("key", "value")

@videos.on("create")
async def on_uploaded_video(event: Bucket.ObjectCreatedEvent):
    video = await videos.get(event.key)

@asset()
async def transcribed_videos():
    ...
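To illustrate what least-privilege inference means here: because upload_video only ever calls videos.put, the policy attached to its Lambda Function only needs s3:PutObject on that one bucket. The exact shape of Packyak's generated policies is an assumption (and the physical bucket name will differ), but a minimal policy of this kind looks like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::videos/*"
    }
  ]
}
```

The on_uploaded_video handler, by contrast, only reads, so it would receive s3:GetObject rather than write permissions.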
Nessie Setup
TODO: should be done as part of packyak init
pip install pynessie

mkdir -p ~/.config
cat <<EOF > ~/.config/nessie
auth:
    type: aws
    timeout: 10
endpoint: http://localhost:19120/api/v1
verify: yes
EOF
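With the config in place, you can sanity-check the pynessie CLI. This assumes a Nessie server is running locally at the endpoint configured above:

```shell
# Show CLI help (works without a server)
nessie --help

# List branches on the configured Nessie server
nessie branch
```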