Infrastructure for AI applications and machine learning pipelines
Yakka
Yakka is a Python library and platform for building data pipelines that clean datasets and train ML models with human supervision and feedback.
It automatically provisions all required infrastructure and guarantees a least-privilege, privacy-compliant data architecture.
Features
- Train transformation functions (using AI) that are supervised by humans and continually improved with feedback and corrections.
- Orchestrate transformations with dependency graphs (DAGs)
- Compute data sets when new data arrives or when their dependencies change
- Re-compute data sets when a transformation function changes or improves through learning
- Auto-provision all required cloud infrastructure
- Auto-configured to be compliant with privacy regulations such as HIPAA and GDPR
- Least-privilege IAM policies with auto-generated reports for regulators
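To make the re-computation features concrete, here is a minimal sketch of how a dependency graph can decide which data sets to rebuild after a change. It uses only the Python standard library and is not Yakka's API; the data set names and the `stale_datasets` helper are hypothetical, chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each data set maps to the data sets it reads from.
# These names are illustrative only and are not part of Yakka.
DEPENDENCIES = {
    "raw_videos": set(),
    "transcribed_videos": {"raw_videos"},
    "summaries": {"transcribed_videos"},
    "thumbnails": {"raw_videos"},
}


def stale_datasets(changed, dependencies):
    """Return the changed data sets plus everything downstream, in build order."""
    stale = set(changed)
    # static_order() yields every data set after all of its dependencies.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        # A data set is stale if any of its upstream dependencies is stale.
        if dependencies[name] & stale:
            stale.add(name)
    return [name for name in order if name in stale]


# Example: the transformation behind "transcribed_videos" changed.
plan = stale_datasets({"transcribed_videos"}, DEPENDENCIES)
print(plan)
```

In this sketch a change to `transcribed_videos` marks `summaries` for rebuilding but leaves `thumbnails` alone, because `thumbnails` depends only on `raw_videos`.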
Example
🔧 Note: Yakka is in active development. Not all features are implemented. Check back to see the following example grow.
Below is the simplest Yakka application: a Bucket with a Function that writes to it.
Your application's infrastructure is declared in code. The Yakka compiler analyzes it to auto-provision cloud resources (in this case, an AWS S3 bucket and a Lambda function) and to infer least-privilege IAM policies.
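# `asset` is used below, so it is assumed here to be exported by yakka as well.
from yakka import Bucket, asset, function

# Declares a bucket (provisioned as an AWS S3 bucket).
videos = Bucket("videos")

# Declares a function (provisioned as an AWS Lambda function) that writes to the bucket.
@function()
async def upload_video():
    await videos.put("key", "value")

# A derived data set; the body is left as a placeholder.
@asset()
async def transcribed_videos():
    ...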
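As a rough illustration of what least-privilege policy inference could produce for the example above, the sketch below builds an IAM policy document that grants only the S3 actions a function actually uses. This is not Yakka's implementation: the `infer_policy` helper and the operation-to-action mapping are hypothetical, and it assumes the physical bucket is literally named `videos`, which a real deployment would likely not guarantee.

```python
import json

# Hypothetical mapping from bucket operations to AWS IAM actions.
S3_ACTIONS = {
    "put": "s3:PutObject",
    "get": "s3:GetObject",
    "delete": "s3:DeleteObject",
}


def infer_policy(bucket_name, operations):
    """Build an IAM policy document allowing only the S3 actions in use."""
    actions = sorted({S3_ACTIONS[op] for op in operations})
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                # Object-level actions apply to keys inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# upload_video only calls videos.put, so only s3:PutObject is granted.
policy = infer_policy("videos", ["put"])
print(json.dumps(policy, indent=2))
```

Because `upload_video` never reads or deletes objects, the resulting policy omits `s3:GetObject` and `s3:DeleteObject`.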
Research
Inspired by (and integrating with):
- https://dagster.io/
- https://www.llamaindex.ai/
- https://unstructured.io/
- https://docs.modular.com/mojo/roadmap.html
Naming Options
- Smelt is available on PyPI
- Yakka is not available on npm or PyPI
- I may have access to alchemy on npm, but it is taken on PyPI