A lightweight pipeline/workflow engine. Weave data processing nodes into DAG workflows with decorators and the >> operator.

These details have not been verified by PyPI

Project description

🧶 Dagloom

Like a loom weaving threads into fabric, Dagloom weaves data processing nodes into DAG workflows.

A lightweight pipeline/workflow engine for Python. Define nodes with decorators, connect them with the >> operator, visualize and edit in a drag-and-drop Web UI.

中文文档

✨ Why Dagloom?

Problem	Competitors	Dagloom
Overkill installation	Airflow needs PostgreSQL + Redis + Celery + Webserver	`pip install dagloom && dagloom serve`
Too many concepts	Dagster: Assets, Ops, Jobs, Resources, IO Managers...	Just `@node` and `>>`
Code/visual disconnect	Airflow UI is read-only	True bidirectional sync
Can't resume from failure	Re-run the entire pipeline	`dagloom resume` picks up where it left off
Shell-only nodes	Dagu only supports shell commands	Native Python objects (DataFrames, dicts, classes)

🚀 Quick Start

Installation

pip install dagloom

Quick Demo

# Run the built-in demo pipeline instantly
dagloom demo --run

# Or start the web server with demo pipeline
dagloom demo

Your First Pipeline

from dagloom import node, Pipeline

@node
def greet(name: str) -> str:
    """Create a greeting message."""
    return f"Hello, {name}!"

@node
def shout(message: str) -> str:
    """Convert message to uppercase."""
    return message.upper()

@node
def add_emoji(message: str) -> str:
    """Add emoji to the message."""
    return f"🎉 {message} 🎉"

# Build DAG with >> operator
pipeline = greet >> shout >> add_emoji

# Run the pipeline
result = pipeline.run(name="World")
print(result)  # 🎉 HELLO, WORLD! 🎉

Fan-out / Fan-in (Parallel)

Use parallel() to run independent nodes concurrently and merge results:

from dagloom import node, parallel

@node(cache=True, retry=3)
def fetch_users() -> list[dict]:
    return [{"id": 1, "name": "Alice"}]

@node(cache=True, retry=3)
def fetch_orders() -> list[dict]:
    return [{"id": 101, "user_id": 1}]

@node
def merge(inputs: dict[str, list]) -> dict:
    # inputs = {"fetch_users": [...], "fetch_orders": [...]}
    return {"users": inputs["fetch_users"], "orders": inputs["fetch_orders"]}

pipeline = parallel(fetch_users, fetch_orders) >> merge
result = pipeline.run()

Note: When a node has multiple predecessors, it receives a dict keyed by predecessor name: {predecessor_name: output_value}.

Conditional Branching

Use the | operator to create mutually exclusive branches — the runtime selects which branch to execute based on the upstream output:

from dagloom import node

@node
def classify(text: str) -> dict:
    """Route to different processors."""
    if "urgent" in text:
        return {"branch": "urgent_handler", "text": text}
    return {"branch": "normal_handler", "text": text}

@node
def urgent_handler(data: dict) -> str:
    return f"🚨 URGENT: {data['text']}"

@node
def normal_handler(data: dict) -> str:
    return f"📋 Normal: {data['text']}"

pipeline = classify >> (urgent_handler | normal_handler)
result = pipeline.run(text="urgent: server down!")
# 🚨 URGENT: urgent: server down!

Streaming Nodes (Generator)

Node functions can be generators — yielded values are automatically collected into a list:

@node
def stream_data(url: str):
    """Yield data chunks."""
    for i in range(5):
        yield {"chunk": i, "url": url}

@node
def aggregate(chunks: list[dict]) -> int:
    return len(chunks)

pipeline = stream_data >> aggregate
result = pipeline.run(url="https://example.com")
# 5

Execution Hooks

Monitor node execution with on_node_start / on_node_end callbacks:

import asyncio
from dagloom import node, AsyncExecutor

@node
def step(x: int) -> int:
    return x + 1

pipeline = step

def my_hook(node_name, ctx):
    print(f"  → {node_name}: {ctx.get_node_info(node_name).status}")

executor = AsyncExecutor(
    pipeline,
    on_node_start=my_hook,
    on_node_end=my_hook,
)
result = asyncio.run(executor.execute(x=1))

Pipeline Scheduling

Schedule pipelines to run automatically on cron expressions or fixed intervals:

from dagloom import node, Pipeline

@node
def fetch(url: str = "https://example.com/data.csv") -> list:
    return [1, 2, 3]

@node
def process(data: list) -> int:
    return sum(data)

# Set schedule via Pipeline constructor
pipeline = Pipeline(name="daily_etl", schedule="0 9 * * *")

# Or use interval shorthand
pipeline = Pipeline(name="frequent_check", schedule="every 30m")

# Or set after construction
pipeline = fetch >> process
pipeline.name = "my_pipeline"
pipeline.schedule = "0 9 * * 1-5"  # Weekdays at 9am

The scheduler runs in-process with dagloom serve — schedules are persisted to SQLite and auto-restored on restart.

Notifications (Email / Webhook)

Get notified when pipelines succeed or fail:

from dagloom import node, Pipeline

@node
def fetch(url: str = "https://example.com") -> dict:
    return {"data": [1, 2, 3]}

@node
def process(data: dict) -> int:
    return sum(data["data"])

pipeline = fetch >> process
pipeline.name = "daily_etl"
pipeline.notify_on = {
    "failure": ["email://ops@team.com", "webhook://https://hooks.slack.com/xxx?format=slack"],
    "success": ["webhook://https://hooks.slack.com/yyy?format=slack"],
}

Supported channels:

Email: email://recipient@example.com — SMTP delivery via aiosmtplib
Slack: webhook://https://hooks.slack.com/...?format=slack — Block Kit formatting
WeChat Work: webhook://https://qyapi.weixin.qq.com/...?format=wechat_work
Feishu: webhook://https://open.feishu.cn/...?format=feishu
Generic Webhook: webhook://https://your-endpoint.com/hook — plain JSON POST

Advanced Features

@node(retry=3, cache=True, timeout=30.0)
def fetch_data(url: str) -> pd.DataFrame:
    """Fetch CSV data with retry and caching."""
    return pd.read_csv(url)

@node(cache=True)
def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows with missing values."""
    return df.dropna()

@node
def save(df: pd.DataFrame) -> str:
    """Persist cleaned data to parquet file."""
    path = "output/cleaned.parquet"
    df.to_parquet(path)
    return path

pipeline = fetch_data >> clean >> save
pipeline.run(url="https://example.com/data.csv")

Cache dependency invalidation: When fetch_data produces a different output on re-run, Dagloom automatically invalidates the caches for clean and save so they re-execute with fresh data. No manual cache management needed.

Per-Node Executor Hints

Control execution strategy per node — run CPU-heavy work in separate processes while keeping I/O-bound nodes in the event loop:

from dagloom import node, AsyncExecutor

@node
def fetch(url: str) -> list:
    """I/O-bound: runs in thread (default)."""
    return [1, 2, 3]

@node(executor="process")
def transform(data: list) -> list:
    """CPU-bound: runs in a separate process."""
    return [x ** 2 for x in data]

@node
async def save(data: list) -> str:
    """Async: awaited directly on the event loop."""
    return f"Saved {len(data)} records"

pipeline = fetch >> transform >> save
executor = AsyncExecutor(pipeline)
result = await executor.execute(url="https://example.com")

Credential Management

Securely store and retrieve secrets with layered resolution (env vars → .env → encrypted DB):

# CLI
export DAGLOOM_MASTER_KEY=$(python -c "from dagloom.security import Encryptor; print(Encryptor.generate_key())")
dagloom secret set API_KEY "sk-abc123"
dagloom secret get API_KEY
dagloom secret list
dagloom secret delete API_KEY

# Python API
from dagloom.security import Encryptor, SecretStore
from dagloom.store.db import Database

db = Database()
await db.connect()
store = SecretStore(db=db, encryptor=Encryptor())

await store.set("API_KEY", "sk-abc123")
value = await store.get("API_KEY")  # Checks env → .env → encrypted DB

HTTP Authentication

Protect your Dagloom server with API Key or Basic authentication:

# API Key authentication
dagloom serve --auth-type API_KEY --auth-key sk-your-secret-key

# Basic authentication (username:password)
dagloom serve --auth-type BASIC_AUTH --auth-key admin:mypassword

# No authentication (default)
dagloom serve

# Client: API Key authentication
import httpx

headers = {"Authorization": "Bearer sk-your-secret-key"}
response = httpx.get("http://localhost:8000/api/pipelines", headers=headers)

# Client: Basic authentication
response = httpx.get("http://localhost:8000/api/pipelines", auth=("admin", "mypassword"))

Web UI

dagloom serve
# Open http://localhost:8000 in your browser

The Web UI provides:

DAG Editor — drag-and-drop pipeline visualization with ReactFlow
Pipeline List — sidebar listing all registered pipelines
Metrics Dashboard — per-node execution stats with bar charts (success/failure rate, p50/p95 latency)
Version History — timeline of pipeline snapshots with diff support
Execution Log — real-time log viewer via WebSocket
Node Inspector — click any node to view config (retry, cache, timeout)

For frontend development:

cd web
npm install
npm run dev    # Vite dev server with HMR, proxies /api to backend

Tech stack: React 18 + TypeScript + Vite + Tailwind CSS + ReactFlow + Recharts

🔖 Pipeline Versioning

Every DAG change is automatically versioned with a SHA-256 hash. Compare versions to see exactly what changed:

# List version history
versions = await db.list_pipeline_versions("my_pipeline")

# Diff two versions
# GET /api/versions/{hash_a}/diff/{hash_b}
# Returns: added/removed nodes, edges, and unified code diff

REST API:

GET /api/pipelines/{id}/versions — list version history
GET /api/versions/{hash} — get a specific version snapshot
GET /api/versions/{hash_a}/diff/{hash_b} — structured diff between versions

📊 Observability

Track node execution metrics (wall time, success/failure rate, retries) across pipeline runs:

from dagloom import AsyncExecutor
from dagloom.store.db import Database

db = Database()
await db.connect()

executor = AsyncExecutor(pipeline, metrics_db=db)
result = await executor.execute(url="https://example.com")

# Query aggregate stats per node
stats = await db.get_node_stats("my_pipeline")
# [{"node_id": "fetch", "total_runs": 50, "avg_ms": 120.5, "p95_ms": 350.2, ...}]

# Execution history with per-node detail
history = await db.get_execution_history("my_pipeline", limit=10)

REST API:

GET /api/metrics/{pipeline_id} — per-node stats (runs, failure rate, p50/p95 latency)
GET /api/history/{pipeline_id}?limit=20 — execution history with node metrics

🔌 Connectors

Dagloom includes built-in connectors for common data sources:

Available connectors: PostgreSQL, MySQL, S3/MinIO, HTTP API, MongoDB, Redis, Kafka

pip install dagloom[connectors]     # PostgreSQL, MySQL, S3, HTTP
pip install dagloom[mongodb]        # MongoDB (motor)
pip install dagloom[redis]          # Redis (redis-py)
pip install dagloom[kafka]          # Kafka (aiokafka)
pip install dagloom[all-connectors] # All connectors

from dagloom.connectors import ConnectionConfig
from dagloom.connectors.mongodb import MongoDBConnector

config = ConnectionConfig(host="localhost", database="mydb")
async with MongoDBConnector(config) as mongo:
    docs = await mongo.execute("find", collection="users", filter={"active": True})

🏗️ Architecture

Single Process Architecture
┌─────────────────────────────────────┐
│  CLI / Web UI                       │
├─────────────────────────────────────┤
│  FastAPI (REST API + WebSocket)     │
├─────────────────────────────────────┤
│  Scheduler (APScheduler + asyncio)  │
├─────────────────────────────────────┤
│  Core (@node + Pipeline + DAG)      │
├─────────────────────────────────────┤
│  SQLite (embedded, zero config)     │
└─────────────────────────────────────┘

📦 Project Structure

dagloom/
├── core/       # @node decorator, Pipeline class, DAG validation
├── scheduler/  # Cron/interval scheduler, asyncio executor, caching, checkpoint
├── security/   # Encrypted secret store, Fernet encryption, HTTP authentication (API Key + Basic Auth)
├── connectors/ # PostgreSQL, MySQL, S3, HTTP, MongoDB, Redis, Kafka connectors
├── server/     # FastAPI REST API + WebSocket
├── store/      # SQLite storage layer
└── cli/        # Click CLI (serve, run, list, inspect, scheduler, secret)

📖 Documentation

🤝 Contributing

Contributions are welcome! Please feel free to:

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

Apache License 2.0 — see LICENSE for details.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

1.0.2

Apr 17, 2026

1.0.1

Apr 16, 2026

1.0.0

Apr 16, 2026

0.14.0

Apr 16, 2026

0.13.0

Apr 16, 2026

0.12.0

Apr 15, 2026

0.11.0

Apr 15, 2026

0.10.0

Apr 15, 2026

0.9.0

Apr 15, 2026

0.8.0

Apr 15, 2026

0.7.0

Apr 15, 2026

0.6.0

Apr 15, 2026

0.5.0

Apr 15, 2026

0.4.0

Apr 14, 2026

0.3.0

Apr 10, 2026

0.2.0

Apr 10, 2026

0.1.1

Apr 9, 2026

0.1.0

Apr 7, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dagloom-1.0.2.tar.gz (249.8 kB view details)

Uploaded Apr 17, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

dagloom-1.0.2-py3-none-any.whl (102.7 kB view details)

Uploaded Apr 17, 2026 Python 3

File details

Details for the file dagloom-1.0.2.tar.gz.

File metadata

Download URL: dagloom-1.0.2.tar.gz
Upload date: Apr 17, 2026
Size: 249.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dagloom-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`f86b69b96a7ca922b95a4bd33abff28dc1daa71c51e1ff4ff0a27529194a5200`
MD5	`00206035a08b7f91d4c046b8d5967761`
BLAKE2b-256	`dc519eab41f3b76429e896e5c0a54795f9e126914fc2e437dd9aa6a75ceb97cc`

See more details on using hashes here.

Provenance

The following attestation bundles were made for dagloom-1.0.2.tar.gz:

Publisher: publish.yml on lucientong/dagloom

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dagloom-1.0.2.tar.gz
- Subject digest: f86b69b96a7ca922b95a4bd33abff28dc1daa71c51e1ff4ff0a27529194a5200
- Sigstore transparency entry: 1326988771
- Sigstore integration time: Apr 17, 2026
Source repository:
- Permalink: lucientong/dagloom@787ac31cd2a8b4989e9f516abb5979de9fea1b1d
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/lucientong
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@787ac31cd2a8b4989e9f516abb5979de9fea1b1d
- Trigger Event: workflow_dispatch

File details

Details for the file dagloom-1.0.2-py3-none-any.whl.

File metadata

Download URL: dagloom-1.0.2-py3-none-any.whl
Upload date: Apr 17, 2026
Size: 102.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dagloom-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5498a162578c9f19628011641fc3b0eafb7886c2935dbd0d856ac20fa3cec864`
MD5	`e34d40171d5878377c386308c2fd6db6`
BLAKE2b-256	`85a88b6a7d7ef9726e8ff2146b172e5946e18efb4477c1283fe939b99fcce39e`

See more details on using hashes here.

Provenance

The following attestation bundles were made for dagloom-1.0.2-py3-none-any.whl:

Publisher: publish.yml on lucientong/dagloom

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dagloom-1.0.2-py3-none-any.whl
- Subject digest: 5498a162578c9f19628011641fc3b0eafb7886c2935dbd0d856ac20fa3cec864
- Sigstore transparency entry: 1326988866
- Sigstore integration time: Apr 17, 2026
Source repository:
- Permalink: lucientong/dagloom@787ac31cd2a8b4989e9f516abb5979de9fea1b1d
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/lucientong
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@787ac31cd2a8b4989e9f516abb5979de9fea1b1d
- Trigger Event: workflow_dispatch

dagloom 1.0.2

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

🧶 Dagloom

✨ Why Dagloom?

🚀 Quick Start

Installation

Quick Demo

Your First Pipeline

Fan-out / Fan-in (Parallel)

Conditional Branching

Streaming Nodes (Generator)

Execution Hooks

Pipeline Scheduling

Notifications (Email / Webhook)

Advanced Features

Per-Node Executor Hints

Credential Management

HTTP Authentication

Web UI

🔖 Pipeline Versioning

📊 Observability

🔌 Connectors

🏗️ Architecture

📦 Project Structure

📖 Documentation

🤝 Contributing

📄 License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance