DockAI
An LLM pipeline for generating optimized, production-ready Dockerfiles.
The End of Manual Dockerfiles: Automated, Intelligent, Production-Ready.
DockAI is a robust, enterprise-grade Python CLI tool designed to intelligently analyze a software repository and generate a production-ready, optimized Dockerfile. It uses a novel three-stage pipeline to understand the project structure ("The Brain"), architect the build environment ("The Architect"), and validate the result ("The Validator").
💡 Why DockAI?
Automated Dockerfiles > Human-Written Dockerfiles > Cloud Native Buildpacks
DockAI represents the next evolution in containerization.
- Better than Humans: Humans forget best practices, security patches, and layer optimizations. DockAI applies the collective knowledge of thousands of expert DevOps engineers to every single build, ensuring multi-stage optimization, non-root users, and perfect caching strategies every time.
- Better than Buildpacks: Cloud Native Buildpacks are opaque "black boxes" that add bloat and are hard to debug. DockAI generates a transparent, standard Dockerfile that you can read, audit, and modify. You get the automation of buildpacks with the control of a handwritten file.
✨ Key Features
- Zero-Config Automation: Developers never need to write a Dockerfile again. The GitHub Action automatically generates a perfect, up-to-date Dockerfile on every commit.
- Three-Stage Pipeline: Combines analysis (cheap/fast), generation (smart/expensive), and validation (agentic feedback) for maximum reliability.
- Context-Aware Validation: Intelligently distinguishes between long-running services (web servers, APIs) and short-lived scripts (CLI tools, batch jobs) to apply the correct validation logic. Services must stay running; scripts must exit successfully with code 0.
- Agentic Self-Correction: Automatically builds and runs the generated Dockerfile in a sandboxed environment to verify it works. If it fails, the agent analyzes the error logs and self-corrects until success.
- Intelligent Scanning: Uses pathspec to fully respect .gitignore and .dockerignore patterns (including wildcards like *.log or secret_*.json).
- Robust & Reliable: Built-in automatic retries with exponential backoff for all AI API calls to handle network instability.
- Real-time Cost Tracking: Displays token usage and estimated cost for every run using live pricing data from the community-maintained LiteLLM API.
- Observability: Structured logging with a --verbose mode for deep debugging and transparency.
- Security First: Generates non-root, multi-stage builds by default.
- Smart Health Checks: Automatically detects health endpoints (e.g., /health, /api/health) and generates appropriate HEALTHCHECK instructions in the Dockerfile.
- Adaptive Wait Strategies: Intelligently estimates service startup times and generates robust wait strategies to ensure services are fully ready before validation proceeds.
- Enhanced Error Handling: Provides clear, actionable error messages when generation fails or when the AI encounters ambiguous project structures.
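Several of these reliability features follow standard patterns. The automatic retries with exponential backoff, for instance, can be sketched as a small helper like the one below (the names and defaults here are illustrative assumptions, not DockAI's internal API):

```python
import random
import time

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Retry a flaky call with exponential backoff plus a little jitter.

    Illustrative sketch only: DockAI's actual retry wrapper around its
    AI API calls may use different names, delays, and exception types.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 1x, 2x, 4x, ... the base delay, with jitter to avoid thundering herds
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

The jitter term keeps many concurrent clients from retrying in lockstep against a recovering API.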
🧠 Architecture
The system operates in three distinct phases:
- The Intelligent Scanner (scanner.py):
  - Maps the entire repository file tree.
  - Automatically filters out files based on .gitignore and .dockerignore using industry-standard wildcard matching.
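The ignore matching works roughly as follows. This is a simplified stdlib sketch using fnmatch; DockAI itself uses the pathspec library, which implements full .gitignore semantics (negation, directory-only rules, anchoring, and so on):

```python
import fnmatch

def is_ignored(rel_path, patterns):
    """Return True if rel_path matches any ignore pattern.

    Simplified sketch: checks each pattern against both the full
    relative path and the bare filename. The real scanner relies on
    pathspec's 'gitwildmatch' rules instead of plain fnmatch.
    """
    name = rel_path.split("/")[-1]
    return any(
        fnmatch.fnmatch(rel_path, pat) or fnmatch.fnmatch(name, pat)
        for pat in patterns
    )
```

With patterns like *.log or secret_*.json, files such as app/debug.log and secret_key.json are skipped before any content is read.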
- Stage 1: The Brain (analyzer.py):
  - Input: JSON list of file paths.
  - Task: Identifies the technology stack (e.g., Python/Flask, Node/Express, Go/Gin) and pinpoints the exact files needed for context (e.g., package.json, requirements.txt, go.mod).
  - Output: Stack description, critical files list, and initial project type classification.
- Stage 2: The Architect (generator.py):
  - Input: Content of the critical files identified in Stage 1.
  - Task: Analyzes the code to determine whether the project is a service (long-running, listens on a port) or a script (runs once and exits).
  - Output: A multi-stage, security-focused Dockerfile with version pinning and cache optimization, plus the refined project type classification.
- Stage 3: Context-Aware Validation (validator.py):
  - Task: Builds the generated Dockerfile and runs a container in a secure, sandboxed environment with resource limits (512 MB RAM, 1 CPU, 100 PIDs max).
  - Smart Health Checks: Automatically detects and validates health endpoints (e.g., /health) to ensure the service is truly ready to accept traffic.
  - Adaptive Wait Times: Dynamically adjusts wait times based on the service type (e.g., Java apps need more time than Go apps) to prevent false negatives during validation.
  - Service Validation: Verifies the container starts and stays running.
  - Script Validation: Verifies the container runs and exits successfully with code 0.
  - Feedback: If validation fails, the error logs are fed back to "The Architect" to regenerate a fixed Dockerfile. This cycle repeats until success or MAX_RETRIES is reached.
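To make the pipeline's end product concrete, here is a hypothetical example of the kind of Dockerfile the Architect stage aims to produce for a Node.js service. This is illustrative only; actual output depends on the project being analyzed:

```dockerfile
# Hypothetical output for a Node.js service (illustrative, not generated by DockAI)
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

FROM node:20-alpine
WORKDIR /app
# Run as a non-root user by default
RUN addgroup -S app && adduser -S app -G app
COPY --from=build /app /app
USER app
EXPOSE 3000
# Health endpoint detected during analysis
HEALTHCHECK --interval=30s --timeout=3s \
  CMD wget -qO- http://localhost:3000/health || exit 1
CMD ["node", "index.js"]
```

Note the hallmarks the pipeline targets: a multi-stage build, pinned base image versions, a non-root user, dependency layers ordered for caching, and a HEALTHCHECK derived from the detected endpoint.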
🚀 Getting Started
Prerequisites
- Python 3.8+
- An OpenAI API Key
Installation
From PyPI (Recommended):
pip install dockai-cli
Configure Environment Variables:
export OPENAI_API_KEY=sk-your-api-key-here
export MODEL_ANALYZER=gpt-4o-mini
export MODEL_GENERATOR=gpt-4o
export MAX_RETRIES=3
[!NOTE]
The PyPI package is named dockai-cli, but the command you run is simply dockai (without the -cli suffix). This is the same for both installation methods.
From Source (Development):

- Clone the repository:

  git clone https://github.com/itzzjb/dockai.git
  cd dockai

- Install the package: You can install the tool locally using pip. We recommend installing in "editable" mode (-e) if you plan to modify the code.

  pip install -e .

- Configure Environment: Create a .env file in the root directory and add your OpenAI API key and model configurations:

  OPENAI_API_KEY=sk-your-api-key-here
  MODEL_ANALYZER=gpt-4o-mini
  MODEL_GENERATOR=gpt-4o
  MAX_RETRIES=3
📚 Examples
DockAI intelligently handles both long-running services and short-lived scripts across different programming languages.
Node.js Examples
Service (HTTP Server):
// index.js
const http = require('http');

const server = http.createServer((req, res) => {
  res.writeHead(200);
  res.end('Hello World');
});

server.listen(3000, () => {
  console.log('Server running at http://localhost:3000/');
});
$ dockai ./my-node-service
✓ Identified Stack: Node.js with npm
✓ Project Type: service
✓ Generated Dockerfile with multi-stage build
✓ Validation: Service is running successfully
Script (One-time Execution):
// index.js
console.log("Hello Script World");
$ dockai ./my-node-script
✓ Identified Stack: Node.js with npm
✓ Project Type: script
✓ Generated Dockerfile
✓ Validation: Script finished successfully (Exit Code 0)
Go Examples
Service (HTTP Server):
// main.go
package main

import (
    "fmt"
    "net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hello from Go Service")
}

func main() {
    http.HandleFunc("/", handler)
    fmt.Println("Server starting on port 8080...")
    http.ListenAndServe(":8080", nil)
}
$ dockai ./my-go-service
✓ Identified Stack: Go 1.21
✓ Project Type: service
✓ Generated optimized multi-stage Dockerfile
✓ Validation: Service is running successfully
Script (One-time Execution):
// main.go
package main
import "fmt"
func main() {
fmt.Println("Hello from Go Script")
}
$ dockai ./my-go-script
✓ Identified Stack: Go 1.21
✓ Project Type: script
✓ Generated Dockerfile
✓ Validation: Script finished successfully (Exit Code 0)
Why This Matters
Traditional Dockerfile generators fail with scripts because they expect all containers to stay running. DockAI understands the difference:
- Services (web servers, APIs, bots): Must listen on a port and stay alive
- Scripts (CLI tools, batch jobs, data processors): Must run once and exit cleanly
This context-awareness eliminates false validation failures and ensures your Dockerfile works correctly for your specific use case.
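The service-versus-script rule boils down to a simple decision once the container's state is known. A minimal sketch of the idea (illustrative only; not the actual validator.py logic, and the docker invocation shown in the comment is just one plausible way to obtain the inputs):

```python
def check_container(project_type, still_running, exit_code):
    """Context-aware pass/fail rule for a validated container.

    Inputs would come from running the built image in a sandbox, e.g.
    `docker run -d --memory=512m --cpus=1 --pids-limit=100 <image>`
    followed by an inspect after the adaptive wait period.
    """
    if project_type == "service":
        # A web server or API must still be alive after the wait period.
        return still_running
    # A script must have terminated on its own with exit code 0.
    return (not still_running) and exit_code == 0
```

A traditional validator that applies only the first branch would flag every successful script run as a failure, which is exactly the false negative this distinction avoids.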
🤖 Usage as GitHub Action
You can use DockAI directly in your GitHub Actions workflow to automatically generate a Dockerfile on every push. This ensures your Dockerfile is always perfectly in sync with your code changes, without any manual intervention.
Example Workflow
Create a file .github/workflows/dockai.yml:
name: Generate Dockerfile

on:
  push:              # Regenerate whenever code changes
  workflow_dispatch: # Also allows manual triggering

jobs:
  generate:
    runs-on: ubuntu-latest
    permissions:
      contents: write # Needed to push the generated Dockerfile back
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Run DockAI
        uses: itzzjb/dockai@v1
        with:
          openai_api_key: ${{ secrets.OPENAI_API_KEY }}
          model_analyzer: gpt-4o-mini
          model_generator: gpt-4o
          max_retries: 3

      - name: Commit and Push Dockerfile
        run: |
          git config --global user.name "DockAI Bot"
          git config --global user.email "bot@dockai.com"
          git add Dockerfile
          git commit -m "ci: generate optimized Dockerfile via DockAI" || echo "No changes to commit"
          git push
💻 CLI Usage
Once installed, the dockai command is available globally in your terminal.
Run the tool by pointing it to the target repository path.
dockai /path/to/target/repo
Example (Current Directory):
dockai .
Verbose Mode (for debugging):
dockai . --verbose
What to Expect
The CLI uses a rich terminal interface to show progress:
- Scanning: Locates files, respecting all ignore patterns.
- Analyzing: "The Brain" decides what matters.
- Reading: Only reads the content of critical files (privacy/token efficient).
- Generating: "The Architect" builds the Dockerfile.
- Validating: Builds and runs the container to ensure it works (auto-corrects if needed).
- Result: A Dockerfile is saved to the target directory.
🎨 Custom Instructions
DockAI supports custom instructions to tailor the Dockerfile generation to your specific needs. You can provide instructions in natural language using two methods:
Method 1: Environment Variables
Set environment variables to provide instructions:
export DOCKAI_ANALYZER_INSTRUCTIONS="Always include package-lock.json if it exists"
export DOCKAI_GENERATOR_INSTRUCTIONS="Use port 8080 and install ffmpeg"
dockai .
Or in your .env file:
DOCKAI_ANALYZER_INSTRUCTIONS="Always include package-lock.json if it exists."
DOCKAI_GENERATOR_INSTRUCTIONS="Ensure all images are based on Alpine Linux."
Method 2: .dockai File
Create a .dockai file in your project root with section-based instructions:
# Instructions for the analyzer (file selection stage)
[analyzer]
Always include package-lock.json or yarn.lock if they exist.
Look for any .env.example files to understand environment variables.
Include docker-compose.yml if present.
# Instructions for the generator (Dockerfile creation stage)
[generator]
Ensure the container runs as a non-root user named 'appuser'.
Do not expose any ports other than 8080.
Install 'curl' and 'vim' for debugging purposes.
Set the timezone to 'UTC'.
Define an environment variable 'APP_ENV' with value 'production'.
[!NOTE]
If you don't use sections ([analyzer] and [generator]), the instructions will be applied to both stages.
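A section-based file like this takes only a few lines to parse. Here is a sketch of the idea; the function name and details are hypothetical, not DockAI's real parser:

```python
def parse_dockai(text):
    """Split a .dockai file into per-stage instruction strings.

    Illustrative sketch: comments and blank lines are dropped, and any
    lines outside an [analyzer]/[generator] section apply to both stages.
    """
    sections = {"analyzer": [], "generator": []}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        if line.lower() in ("[analyzer]", "[generator]"):
            current = line.strip("[]").lower()
        elif current is None:
            # No section seen yet: instruction applies to both stages
            sections["analyzer"].append(line)
            sections["generator"].append(line)
        else:
            sections[current].append(line)
    return {k: "\n".join(v) for k, v in sections.items()}
```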
Use Cases for Custom Instructions
Analyzer Instructions:
- "Always include lock files (package-lock.json, yarn.lock, poetry.lock)"
- "Look for configuration files in the config/ directory"
- "Include any .proto files for gRPC services"
Generator Instructions:
- "Use Alpine-based images only"
- "Install system dependencies: ffmpeg, imagemagick, ghostscript"
- "Expose port 3000 instead of the default"
- "Add health check using curl to /health endpoint"
- "Set NODE_ENV to production"
- "Create a non-root user named 'nodeuser'"
GitHub Action with Custom Instructions
- name: Run DockAI
  uses: itzzjb/dockai@v1
  with:
    openai_api_key: ${{ secrets.OPENAI_API_KEY }}
    model_analyzer: gpt-4o-mini
    model_generator: gpt-4o
    analyzer_instructions: "Always include yarn.lock if present"
    generator_instructions: "Use Alpine Linux and install curl"
🛠️ Development
Running Tests
This project uses pytest for testing. To run the test suite:
pytest
Project Structure
The project follows a modern src-layout:
- src/dockai/: Source code package.
  - main.py: The CLI orchestrator using typer and rich.
  - scanner.py: Directory traversal logic with pathspec.
  - analyzer.py: Interface for the Stage 1 LLM call (with retries).
  - generator.py: Interface for the Stage 2 LLM call (with retries).
  - validator.py: Docker build and run validation logic.
- tests/: Unit and integration tests.
- pyproject.toml: Build configuration and dependency management.
File details
Details for the file dockai_cli-1.1.2.tar.gz.
File metadata
- Download URL: dockai_cli-1.1.2.tar.gz
- Upload date:
- Size: 25.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 286aec46e11ec826a1129369707c665886c47b542c1455198787a60f6f531a09 |
| MD5 | 171826e6461d13914fc91fbcfd3ba2c0 |
| BLAKE2b-256 | 5069762a8b7c09da62ac8e965934a634218ef030c566b6d31d0b5dab8c3d3f84 |
File details
Details for the file dockai_cli-1.1.2-py3-none-any.whl.
File metadata
- Download URL: dockai_cli-1.1.2-py3-none-any.whl
- Upload date:
- Size: 20.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c6c471bd1c2e1f77ae20c883fbf83de8c845eddf4cbd38cbea884f61e0cff2d |
| MD5 | 8f27767d5d6d7e13c68a1018385c6c83 |
| BLAKE2b-256 | 706c4483d67eb4b3b8fc0b2302bfddd82457e4fc313bf5df27bc29679e124f12 |