Dockerfile generator for AGI -- nothing more, nothing less.
agi-pack
A Dockerfile builder for AGI — nothing more, nothing less.
📦 agi-pack allows you to define your Docker images using a simple YAML format, and then generate them on the fly using Jinja2 templates with Pydantic-based validation. It aims to simplify the process of building Docker images for ML.
🚨 Disclaimer: More than 75% of this initial implementation was generated by GPT-4 and GitHub Copilot. See the attribution section below for more details.
Goals 🎯
- 😇 Simplicity: Make it easy to define and build docker images for ML.
- 📦 Best-practices: Bring best-practices to building docker images for ML -- good base images, multi-stage builds, minimal image sizes, etc.
- 🧩 Modular, Re-usable, Composable: Define base, dev and prod targets with multi-stage builds, and re-use them wherever possible.
- 👩‍💻 Extensible: Make the YAML / DSL easily hackable and extensible to support the ML ecosystem, as more libraries, drivers, and HW vendors come into the market.
- ☁️ Vendor-agnostic: agi-pack is not intended to be built for any specific vendor -- I need this tool for internal purposes, but I decided to build it in the open and keep it simple.
Installation 📦
pip install agi-pack
To install shell completion, run:
agi-pack --install-completion <bash|zsh|fish|powershell|pwsh>
Go through the examples and the corresponding examples/generated directory to see what agi-pack can do. If you're interested in a CUDA / CUDNN example, check out examples/agibuild.base-cu118.yaml.
Quickstart 🛠
- Create a simple YAML configuration file called agibuild.yaml. You can use agi-pack init to generate a sample configuration file.

    agi-pack init

- Edit agibuild.yaml to define your custom system and python packages.

    images:
      sklearn-base:
        base: debian:buster-slim
        system:
        - wget
        - build-essential
        python: 3.8.10
        pip:
        - loguru
        - typer
        - scikit-learn

  Let's break this down:
  - sklearn-base: name of the target you want to build. Usually, these could be variants like *-base, *-dev, *-prod, *-test etc.
  - base: base image to build from.
  - system: system packages to install via apt-get install.
  - python: specific python version to install via miniconda.
  - pip: python packages to install via pip install.

- Generate the Dockerfile using agi-pack generate.

    agi-pack generate -c agibuild.yaml

  You should see the following output:

    $ agi-pack generate -c agibuild.yaml
    📦 sklearn-base
    └── 🎉 Successfully generated Dockerfile (target=sklearn-base, filename=Dockerfile).
        └── `docker build -f Dockerfile --target sklearn-base .`

That's it! Here's the generated Dockerfile -- use it to run docker build and build the image directly.
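
To make the field-to-instruction mapping from the breakdown above concrete, here is a minimal Python sketch of how one config entry could be turned into Dockerfile lines. It is not agi-pack's actual template: the function name render_stage, the exact RUN commands, and the assumption that conda is already on the PATH are illustrative only, and plain string building stands in for the Jinja2 templates.

```python
from typing import Dict, List


def render_stage(name: str, cfg: Dict) -> str:
    """Render one hypothetical Dockerfile stage from an agibuild-style entry."""
    # base: the image (or earlier target) this stage builds from
    lines: List[str] = [f"FROM {cfg['base']} AS {name}"]

    # system: packages installed with apt-get install
    system = cfg.get("system", [])
    if system:
        pkgs = " ".join(system)
        lines.append(
            "RUN apt-get update && apt-get install -y --no-install-recommends "
            f"{pkgs} && rm -rf /var/lib/apt/lists/*"
        )

    # python: a specific interpreter version; installing miniconda itself
    # is omitted here and assumed to have happened in an earlier step
    if "python" in cfg:
        lines.append(f"RUN conda install -y python={cfg['python']}")

    # pip: packages installed with pip install
    pip = cfg.get("pip", [])
    if pip:
        lines.append("RUN pip install --no-cache-dir " + " ".join(pip))

    return "\n".join(lines)


# Same fields as the sklearn-base entry in the quickstart
config = {
    "base": "debian:buster-slim",
    "system": ["wget", "build-essential"],
    "python": "3.8.10",
    "pip": ["loguru", "typer", "scikit-learn"],
}

dockerfile = render_stage("sklearn-base", config)
print(dockerfile)
```

The real generated Dockerfile will differ in detail; the point is only that each YAML key corresponds to one kind of instruction in the stage.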
Rationale 🤔
Docker has become the standard for building and managing isolated environments for ML. However, anyone who has gone down this rabbit-hole knows how broken ML development is, especially when you need to experiment and re-configure your environments constantly. Production is another nightmare -- large docker images (10GB+), images bloated with model weights that are ~5-10GB in size, 10+ minute docker build times, and sloppy package management, to name just a few.
What makes Dockerfiles painful? Even if you roll your own Dockerfiles with all the best-practices and fully understand their internals, you'll still find yourself building, and re-building, and re-building these images across a whole host of use-cases. Having to build Dockerfile(s) for dev, prod, and test turns out to be a nightmare when you add the complexity of hardware targets (CPUs, GPUs, TPUs etc), drivers, python, virtual environments, and build and runtime dependencies.
agi-pack aims to simplify this by letting you define Dockerfiles in a concise YAML format and then generate them based on your environment needs (e.g. python version, system packages, conda/pip dependencies, GPU drivers).
For example, you should be able to easily configure your dev environment for local development, and have a separate prod environment where you'll only need the runtime dependencies, avoiding any bloat.
agi-pack hopes to also standardize the base images, so that we can really build on top of giants.
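
As a rough illustration of the "define in YAML, validate, then generate" idea described above, the sketch below checks an image definition before anything is rendered. agi-pack is described as using Pydantic for validation; to stay dependency-free this example swaps in a stdlib dataclass, and the class name ImageConfig, its defaults, and the specific checks are assumptions rather than the project's real schema.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ImageConfig:
    """Hypothetical schema for one entry under `images:`."""

    base: str
    system: List[str] = field(default_factory=list)
    python: Optional[str] = None
    pip: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A base image (or the name of another target) is required.
        if not self.base.strip():
            raise ValueError("base must be a non-empty image or target name")
        # Python versions are expected to look like 3.8 or 3.8.10.
        if self.python is not None and not re.fullmatch(
            r"\d+\.\d+(\.\d+)?", self.python
        ):
            raise ValueError(f"invalid python version: {self.python!r}")


# A well-formed entry passes validation and gets empty defaults.
ok = ImageConfig(base="debian:buster-slim", python="3.8.10", pip=["scikit-learn"])

# A malformed entry is rejected before any Dockerfile would be generated.
try:
    ImageConfig(base="debian:buster-slim", python="three-point-eight")
    error = None
except ValueError as exc:
    error = str(exc)

print(ok)
print(error)
```

Failing early on a bad config is cheaper than discovering the mistake ten minutes into a docker build.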
More Complex Example 📚
Now imagine you want to build a more complex image that has multiple stages: a base image that has all the basic dependencies, and a dev image that has additional build-time dependencies.
images:
  base-cpu:
    name: agi
    base: debian:buster-slim
    system:
    - wget
    python: 3.8.10
    pip:
    - scikit-learn
    run:
    - echo "Hello, world!"
  dev-cpu:
    base: base-cpu
    system:
    - build-essential
Once you've defined this agibuild.yaml, running agi-pack generate will produce the following output:
$ agi-pack generate -c agibuild.yaml
📦 base-cpu
└── 🎉 Successfully generated Dockerfile (target=base-cpu, filename=Dockerfile).
    └── `docker build -f Dockerfile --target base-cpu .`
📦 dev-cpu
└── 🎉 Successfully generated Dockerfile (target=dev-cpu, filename=Dockerfile).
    └── `docker build -f Dockerfile --target dev-cpu .`
As you can see, agi-pack generates a single Dockerfile that contains a stage for each of the targets defined in the YAML file. You can then build the individual images from the same Dockerfile using docker targets: docker build -f Dockerfile --target <target> . where <target> is the name of the image target you want to build.
Here's the corresponding Dockerfile that was generated.
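
The target chaining in this example can be sketched in a few lines of Python. This is an illustration, not agi-pack's generator: the helpers stage_header and build_command are hypothetical names, and the only behavior taken from the text above is that each target becomes a named stage in one Dockerfile and is built with docker build --target.

```python
from typing import Dict, List

# Targets from the example above, reduced to their base field.
images: Dict[str, Dict[str, str]] = {
    "base-cpu": {"base": "debian:buster-slim"},
    "dev-cpu": {"base": "base-cpu"},
}


def stage_header(name: str, cfg: Dict[str, str]) -> str:
    """FROM line for a stage; base may be an external image or an earlier target."""
    return f"FROM {cfg['base']} AS {name}"


def build_command(target: str, filename: str = "Dockerfile") -> List[str]:
    """Argument list for building a single stage of the shared Dockerfile."""
    return ["docker", "build", "-f", filename, "--target", target, "."]


headers = [stage_header(name, cfg) for name, cfg in images.items()]
commands = [" ".join(build_command(name)) for name in images]

print("\n".join(headers))
print("\n".join(commands))
```

Because dev-cpu names base-cpu as its base, Docker resolves it to the earlier stage in the same file, so the dev image inherits everything installed in the base stage.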
Why the name? 🤷‍♂️
agi-pack is very much intended to be tongue-in-cheek -- we are soon going to be living in a world full of quasi-AGI agents orchestrated via ML containers. At the very least, agi-pack should provide the building blocks for us to build a more modular, re-usable, and distribution-friendly container format for "AGI".
Inspiration and Attribution 🌟
TL;DR agi-pack was inspired by a combination of Replicate's cog, Baseten's truss, skaffold, and Docker Compose Services. I wanted a standalone project without any added cruft/dependencies of vendors and services.
📦 agi-pack is simply a weekend project I hacked together, that started with a conversation with ChatGPT / GPT-4.
🚨 Disclaimer: More than 75% of this initial implementation was generated by GPT-4 and GitHub Copilot.
ChatGPT Prompt
Prompt: I'm building a Dockerfile generator and builder to simplify machine learning infrastructure. I'd like for the Dockerfile to be dynamically generated (using Jinja templates) with the following parametrizations:
# Sample YAML file
images:
  base-gpu:
    base: nvidia/cuda:11.8.0-base-ubuntu22.04
    system:
    - gnupg2
    - build-essential
    - git
    python: 3.8.10
    pip:
    - torch==2.0.1
I'd like for this yaml file to generate a Dockerfile via agi-pack generate -c <name>.yaml. You are an expert in Docker and Python programming, how would I implement this builder in Python. Use Jinja2 templating and miniconda python environments wherever possible. I'd like an elegant and concise implementation that I can share on PyPI.
Contributing 🤝
Contributions are welcome! Please read the CONTRIBUTING guide for more information.
License 📄
This project is licensed under the MIT License. See the LICENSE file for details.
File details
Details for the file agi_pack-0.1.13-py3-none-any.whl.
File metadata
- Download URL: agi_pack-0.1.13-py3-none-any.whl
- Upload date:
- Size: 16.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 23033d702f39cb202bbab3ee14e983267906d85107474863f8f3c7526358e01d
MD5 | b7c2fe47d527a376b17d235bc9ffcd4d
BLAKE2b-256 | d43df48196abf67bb1f976b48580e615d901f1330d7733b154c9f3b14d194d37