AI-assisted ETL and ELT pipelines from the command line
Loafer
A CLI-first, declarative ETL/ELT pipeline tool driven by YAML.
Overview
Modern data pipelines often become bogged down by repetitive boilerplate and fragile scripts. Loafer solves this by treating data movement and transformation as configuration.
Loafer is a CLI-first tool that allows you to extract data from databases, files, and APIs, apply powerful transformations (both custom and AI-generated), and load the results into target systems — all configured within a single, highly readable YAML file.
Key Idea: Describe your pipeline in YAML, run it from the CLI, and let Loafer handle the execution.
Features
- Declarative Configuration: Define your inputs, outputs, and processing logic in simple YAML. No complex workflow orchestrators required.
- ETL & ELT Support: Choose between Extract-Transform-Load (in-memory/streaming Python transformations) or Extract-Load-Transform (in-database SQL executions).
- AI-Powered Transformations: Optionally use natural language to auto-generate complex Python transforms or target-database SQL using modern LLMs (Gemini, OpenAI, Claude, Qwen).
- Streaming & Batch Processing: Automatically switches to memory-efficient streaming for large datasets based on configurable thresholds.
- Data Quality Guards: Built-in validation steps to catch malformed data or high null-rates before they hit your target.
- Developer-Friendly & Extensible: Use custom Python files for advanced transformations when declarative logic isn't enough.
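To make the data quality guard concrete, here is a minimal sketch of a null-rate check. It is not Loafer's implementation: the function names `null_rates` and `check_null_rate` are hypothetical, and treating a missing key as a null is an assumption made for illustration.

```python
from typing import Any, Iterable, Mapping


def null_rates(rows: Iterable[Mapping[str, Any]]) -> dict[str, float]:
    """Return the fraction of None (or absent) values for each column."""
    materialized = list(rows)
    if not materialized:
        return {}
    columns = sorted({column for row in materialized for column in row})
    return {
        column: sum(1 for row in materialized if row.get(column) is None)
        / len(materialized)
        for column in columns
    }


def check_null_rate(
    rows: Iterable[Mapping[str, Any]], max_null_rate: float
) -> dict[str, float]:
    """Return only the columns whose null rate exceeds the allowed limit."""
    return {
        column: rate
        for column, rate in null_rates(rows).items()
        if rate > max_null_rate
    }


sample = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": None},
]
# Columns listed here would fail a guard configured with max_null_rate = 0.1.
offending = check_null_rate(sample, max_null_rate=0.1)
```

A guard like this would typically run after extraction and before the load step, so a bad batch never reaches the target.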
Installation
Via pip (PyPI)
The easiest way to install Loafer is via pip:
pip install loafer-etl
Via Docker (GHCR)
Loafer is officially published to the GitHub Container Registry. Available tags include specific versions (e.g. 0.2.0, 0.2) and latest.
docker pull ghcr.io/lupppig/loafer:latest
To run Loafer using Docker, mount your current working directory so Loafer can access your configuration and local files:
docker run --rm -v $(pwd):/workspace -w /workspace ghcr.io/lupppig/loafer:latest run pipeline.yaml
Warning
Do not mount your working directory to /app (e.g., -v $(pwd):/app). The Loafer image uses /app internally for its application core and virtual environment. Overwriting this directory will break the container. Always use /workspace or another path.
From Source
To contribute or use the latest unreleased features:
git clone https://github.com/lupppig/loafer.git
cd loafer
pip install -e .
Quick Start
Create a single YAML file describing your pipeline. This example extracts orders from PostgreSQL, normalizes the data via AI, and saves it to a clean CSV.
pipeline.yaml
name: Daily Orders Pipeline
mode: etl
source:
  type: postgres
  url: ${DATABASE_URL}
  query: "SELECT * FROM orders WHERE created_at >= NOW() - INTERVAL '1 day'"
target:
  type: csv
  path: ./output/clean_orders.csv
  write_mode: overwrite
transform:
  type: ai
  instruction: >
    Drop cancelled orders, normalize currency to USD, and
    combine first_name and last_name into full_name.
llm:
  # Supported providers: gemini, openai, claude, qwen
  provider: gemini
  model: gemini-2.5-flash
  api_key: ${GEMINI_API_KEY}
Run the pipeline from your terminal:
export DATABASE_URL="postgresql://user:pass@localhost/db"
export GEMINI_API_KEY="your-api-key"
loafer run pipeline.yaml
Expected Outcome:
Loafer will connect to the Postgres database, extract the past day's orders, execute the transformation to clean the data, and successfully write the normalized output to ./output/clean_orders.csv.
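For intuition, the instruction in this example could correspond to Python along the following lines. This is only a hand-written sketch, not actual Loafer output: the `status`, `currency`, and `amount` column names, the `transform` function signature, and the fixed exchange rates are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical fixed rates, for illustration only. A real pipeline would
# need a trusted, dated source of exchange rates.
USD_RATES = {"USD": 1.0, "GBP": 1.25}


def transform(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop cancelled orders, convert amounts to USD, and build full_name."""
    cleaned = []
    for row in rows:
        if row["status"] == "cancelled":
            continue  # cancelled orders are excluded from the output
        out = {
            key: value
            for key, value in row.items()
            if key not in ("first_name", "last_name")
        }
        out["amount"] = round(row["amount"] * USD_RATES[row["currency"]], 2)
        out["currency"] = "USD"
        out["full_name"] = f"{row['first_name']} {row['last_name']}"
        cleaned.append(out)
    return cleaned
```

Reviewing generated code against a small sample like this before a full run is a cheap way to catch a misread instruction.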
How It Works
Loafer pipelines consist of three main stages, executed as a directed graph:
- Extract: Loafer connects to your source (e.g., a SQL database, CSV, or REST API) and pulls the data. Large datasets are automatically chunked and streamed to prevent memory exhaustion.
- Transform: The data is manipulated according to your YAML instructions. This can be AI-generated Python code running in a safe isolated context, a custom Python script you provide, or skipped entirely for simple EL (Extract-Load) operations.
- Load: The transformed data is pushed to your designated target connector (e.g. CSV, Snowflake, Postgres) adhering to your specified write mode (append, overwrite).
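The three stages above can be sketched as a small chunked loop. This is an illustrative model rather than Loafer's code: `extract_in_chunks` and `run_etl` are hypothetical names, an in-memory iterable stands in for the source, and a CSV text stream stands in for the target.

```python
import csv
import io
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, TextIO

Row = dict[str, Any]


def extract_in_chunks(rows: Iterable[Row], chunk_size: int) -> Iterator[list[Row]]:
    """Yield lists of at most chunk_size rows so only one chunk is in memory."""
    iterator = iter(rows)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


def run_etl(
    rows: Iterable[Row],
    transform: Callable[[list[Row]], list[Row]],
    sink: TextIO,
    chunk_size: int = 1000,
) -> int:
    """Extract, transform, and load chunk by chunk; return rows written."""
    writer = None
    written = 0
    for chunk in extract_in_chunks(rows, chunk_size):
        transformed = transform(chunk)
        if not transformed:
            continue
        if writer is None:
            # The header is taken from the first transformed row.
            writer = csv.DictWriter(sink, fieldnames=list(transformed[0]))
            writer.writeheader()
        writer.writerows(transformed)
        written += len(transformed)
    return written


source = ({"id": i, "name": f"user{i}"} for i in range(5))
buffer = io.StringIO()
count = run_etl(
    source,
    lambda chunk: [row for row in chunk if row["id"] % 2 == 0],
    buffer,
    chunk_size=2,
)
```

Because the source is consumed lazily, memory use is bounded by `chunk_size` rather than by the size of the dataset.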
In ELT mode, the steps differ slightly: data is first loaded raw into the target database, and transformations are executed as native SQL queries against the target engine.
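The ELT ordering can be illustrated with the standard-library `sqlite3` module. This is a sketch under stated assumptions: SQLite stands in for the target database, and the `raw_orders` and `clean_orders` table names, the columns, and the `run_elt` helper are all hypothetical.

```python
import sqlite3
from typing import Any

# The transformation is plain SQL executed by the target engine itself.
TRANSFORM_SQL = """
CREATE TABLE clean_orders AS
SELECT id, amount
FROM raw_orders
WHERE status <> 'cancelled'
"""


def run_elt(
    conn: sqlite3.Connection, rows: list[dict[str, Any]], transform_sql: str
) -> None:
    """Load raw rows first, then transform them inside the database."""
    conn.execute("CREATE TABLE raw_orders (id INTEGER, status TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:id, :status, :amount)", rows
    )
    conn.execute(transform_sql)
    conn.commit()


conn = sqlite3.connect(":memory:")
run_elt(
    conn,
    [
        {"id": 1, "status": "paid", "amount": 20.0},
        {"id": 2, "status": "cancelled", "amount": 5.0},
    ],
    TRANSFORM_SQL,
)
clean = conn.execute("SELECT id, amount FROM clean_orders").fetchall()
```

The raw table remains available after the transform, which is one practical reason to choose ELT: the SQL can be revised and re-run without extracting the data again.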
CLI Usage
Loafer provides a streamlined CLI tailored for day-to-day data engineering.
Command Syntax:
loafer <command> [options]
Common Commands:
- loafer run <config.yaml>: Execute a pipeline defined in the given YAML file.
  - --dry-run: Extracts and transforms data without writing to the target.
  - --verbose: Prints detailed execution logs and agent outputs.
- loafer validate <config.yaml>: Check a configuration file for syntax and connection errors without running the pipeline.
- loafer connectors: List all available source and target connectors.
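When Loafer is called from a scheduler or wrapper script, the commands above can be assembled programmatically. The sketch below uses only the commands and flags listed in this section. The helper names are hypothetical, and two behaviors are assumptions not confirmed here: that flags may follow the config path, and that `loafer validate` exits with a non-zero status on failure.

```python
import subprocess
from pathlib import Path
from typing import Union


def build_run_command(
    config: Union[str, Path], dry_run: bool = False, verbose: bool = False
) -> list[str]:
    """Build the argument list for `loafer run` without executing it."""
    command = ["loafer", "run", str(config)]
    if dry_run:
        command.append("--dry-run")
    if verbose:
        command.append("--verbose")
    return command


def validate_then_run(config: Union[str, Path]) -> int:
    """Run the pipeline only if validation succeeds; return the exit code.

    Assumes `loafer` is on PATH and signals validation errors through a
    non-zero exit status.
    """
    validation = subprocess.run(["loafer", "validate", str(config)])
    if validation.returncode != 0:
        return validation.returncode
    return subprocess.run(build_run_command(config)).returncode
```

Passing the command as a list instead of a shell string avoids quoting problems when the config path contains spaces.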
Project Structure
loafer/
├── cli.py # CLI entrypoints and command routing
├── config.py # Configuration parsing and validation
├── runner.py # Pipeline execution logic and LangGraph orchestration
├── connectors/ # Integrations (sources and targets)
│ ├── sources/ # Postgres, CSV, REST API, Excel, etc.
│ └── targets/ # Postgres, CSV, Snowflake, etc.
├── transform/ # Transformation engines (AI, custom, SQL)
├── graph/ # LangGraph state management and pipeline DAGs
├── llm/ # LLM provider integrations (Gemini, Claude, OpenAI)
└── agents/ # Individual workflow nodes (extract, transform, load)
Configuration (YAML)
Loafer pipelines are driven by a single YAML configuration file. Here is the structure:
# Pipeline metadata
name: User Sync Pipeline
mode: etl # Supports 'etl' or 'elt'
# Extract configuration
source:
  type: rest_api
  url: "https://api.example.com/users"
  method: GET
# Load configuration
target:
  type: postgres
  url: ${TARGET_DB_URL}
  table: users_dim
# Transform logic
transform:
  type: custom
  path: ./transforms/clean_users.py
# Optional: LLM Configuration (required if using type: ai or mode: elt)
llm:
  # Supported providers: gemini, openai, claude, qwen
  provider: gemini
  model: gemini-2.5-flash
  api_key: ${GEMINI_API_KEY}
# Performance & Validation
chunk_size: 1000
streaming_threshold: 10000
validation:
  strict: true
  max_null_rate: 0.1
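One plausible reading of how chunk_size and streaming_threshold interact is sketched below. The exact semantics are assumptions: that the threshold is counted in rows, that streaming starts only when the row count strictly exceeds it, and that `plan_execution` (a hypothetical name) reflects how the choice is made.

```python
import math


def plan_execution(
    row_count: int, chunk_size: int = 1000, streaming_threshold: int = 10000
) -> dict[str, object]:
    """Choose batch or streaming mode and estimate the number of chunks."""
    if row_count > streaming_threshold:
        # Large dataset: process in chunks of chunk_size rows.
        return {
            "mode": "streaming",
            "chunks": math.ceil(row_count / chunk_size),
        }
    # Small dataset: hold everything in memory and process it in one pass.
    return {"mode": "batch", "chunks": 1}


small = plan_execution(8_000)
large = plan_execution(25_500)
```

Under these assumptions, lowering streaming_threshold trades some speed on mid-sized datasets for a tighter memory bound.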
LLM Setup
To use the AI transformations (via type: ai) or dynamic SQL generation in ELT mode, you must specify an LLM provider in your configuration. We recommend setting API keys via environment variables for security.
Loafer supports four providers out-of-the-box:
1. Gemini (Default)
llm:
  provider: gemini
  model: gemini-2.5-flash
  api_key: ${GEMINI_API_KEY}
2. OpenAI
llm:
  provider: openai
  model: gpt-4o
  api_key: ${OPENAI_API_KEY}
3. Claude (Anthropic)
llm:
  provider: claude
  model: claude-3-5-sonnet-20241022
  api_key: ${ANTHROPIC_API_KEY}
4. Qwen (Alibaba)
llm:
  provider: qwen
  model: qwen-max
  api_key: ${QWEN_API_KEY}
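The ${VAR} placeholders in these snippets keep secrets out of the YAML file. How Loafer resolves them is not described here, so the following is only a sketch of one common approach; the `expand_env` helper is hypothetical, and failing on an unset variable is a design assumption.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME}, where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${NAME} in value with env[NAME]; fail if NAME is unset."""
    source = os.environ if env is None else env

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in source:
            raise KeyError(f"environment variable {name} is not set")
        return source[name]

    return _PLACEHOLDER.sub(replace, value)


api_key = expand_env("${GEMINI_API_KEY}", {"GEMINI_API_KEY": "example-key"})
```

The standard library's `os.path.expandvars` performs a similar substitution but leaves unset variables unchanged, which can let a literal `${GEMINI_API_KEY}` string reach the provider. Raising early makes the misconfiguration visible.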
Development
Setting up Loafer for local development requires uv (or standard pip / hatch).
- Clone and install dependencies:
git clone https://github.com/yourusername/loafer.git
cd loafer
# If using uv for fast dependencies:
uv sync
# Or using standard pip:
pip install -e ".[dev]"
- Run Tests: Loafer uses pytest for unit and integration testing.
pytest
- Code Style: This project uses standard Python linting and formatting tools. We recommend running ruff prior to committing:
ruff check .
ruff format .
Contributing
Contributions are highly encouraged! Whether it’s a new data connector, a bug fix, or an improvement to the documentation, we'd love your help.
- Fork the repository.
- Create a new branch for your feature (git checkout -b feature/amazing-connector).
- Commit your changes with clear messages.
- Open a Pull Request against the main branch.
Keep things simple and ensure any new features include appropriate tests.
Links
- GitHub Repository: https://github.com/lupppig/loafer
- PyPI Package: https://pypi.org/project/loafer-etl/
License
This project is licensed under the MIT License. See the LICENSE file for details.