Infrastructure for AI applications and machine learning pipelines

Packyak is a Python library and platform for building data pipelines that clean datasets and train ML models with human supervision and feedback.

It automatically provisions all required infrastructure and guarantees a least-privilege, privacy-compliant data architecture.

Features

  1. Train transformation functions (using AI) that are supervised by humans and continually improved with feedback and corrections
  2. Orchestrate transformation with dependency graphs (DAGs)
  3. Compute data sets when new data arrives or when its dependencies change
  4. Re-compute data sets when a transformation function is changed or improves from learning
  5. Auto-provision all required cloud infrastructure
  6. Auto-configured to be compliant with privacy regulations such as HIPAA and GDPR
  7. Least-privilege IAM policies with auto-generated reports for regulators
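
Features 2 to 4 describe dependency-driven recomputation. The sketch below illustrates that general idea using only the Python standard library. It does not use Packyak's API: the data set names and the `stale_datasets` helper are hypothetical and chosen purely for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each data set maps to the data sets it is computed from.
dependencies = {
    "raw_videos": set(),
    "transcribed_videos": {"raw_videos"},
    "cleaned_transcripts": {"transcribed_videos"},
    "video_thumbnails": {"raw_videos"},
}


def stale_datasets(graph, changed):
    """Return the data sets to recompute after `changed` is updated, in a valid execution order."""
    stale = {changed}
    ordered = []
    # static_order() yields every data set after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        if name == changed or graph[name] & stale:
            stale.add(name)
            ordered.append(name)
    return ordered


print(stale_datasets(dependencies, "transcribed_videos"))
```

Only data sets downstream of the change are recomputed; `video_thumbnails` is left alone here because it depends solely on `raw_videos`.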

Example

🔧 Note: Packyak is in active development. Not all features are implemented. Check back to see the following example grow.

Below is the simplest Packyak application: a Bucket with a Function that writes to it.

Your application's infrastructure is declared in code. The Packyak compiler analyzes that code to auto-provision cloud resources (in this case an AWS S3 Bucket and a Lambda Function) and to infer least-privilege IAM Policies for them.

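# Note: importing `asset` from packyak is an assumption; the original snippet used it without an import.
from packyak import Bucket, asset, function

# Declare a Bucket (provisioned as an AWS S3 Bucket).
videos = Bucket("videos")

# Declare a Function (provisioned as an AWS Lambda Function) that writes to the Bucket.
@function()
async def upload_video():
    await videos.put("key", "value")

# Placeholder asset; its body is not shown yet.
@asset()
async def transcribed_videos():
    ...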
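
To make "least privilege" concrete, here is a hand-written sketch of the kind of IAM policy that would suffice for `upload_video`, which only writes objects to the `videos` Bucket. This is not output from the Packyak compiler: the `put_only_policy` helper is hypothetical, and a real deployment would likely use a generated physical bucket name rather than the literal `videos`.

```python
import json


def put_only_policy(bucket_name: str) -> dict:
    """Build an IAM policy document that allows writing objects to one bucket and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the write action the function needs; no read, list, or delete.
                "Action": ["s3:PutObject"],
                # Scoped to objects in this one bucket (assumed name).
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            }
        ],
    }


print(json.dumps(put_only_policy("videos"), indent=2))
```

The point of the sketch is the scoping: one action on one resource, derived from the single `videos.put(...)` call in the function body.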

Research

Inspired by (and integrating with):
