LLM-first conversational ETL pipeline generator

Osiris Pipeline

The deterministic compiler for AI-native data pipelines. You describe outcomes in plain English; Osiris compiles them into reproducible, production-ready manifests that run with the same behavior everywhere (local or cloud).

🚀 Quick Start

# Setup
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Initialize configuration
osiris init

# Start MCP server for AI integration (Claude Desktop, etc.)
osiris mcp

🎯 What Makes Osiris Different

  • Compiler, not orchestrator - Others schedule what you hand-craft. Osiris generates, validates, and compiles pipelines from plain English.
  • Determinism as a contract - Fingerprinted manifests guarantee reproducibility across environments.
  • Conversational → executable - Describe intent; Osiris interrogates real systems and proposes a feasible plan.
  • Run anywhere, same results - Transparent adapters deliver execution parity (local and E2B today).
  • Boring by design - Predictable, explainable, portable — industrial-grade AI, not magical fragility.
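
The "fingerprinted manifests" idea above can be illustrated with a content hash over a canonical serialization. This is a minimal sketch, not Osiris's actual fingerprinting scheme: the `fingerprint` helper, the manifest fields, the component name, and the choice of canonical JSON plus SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json


def fingerprint(manifest: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON form of the manifest.

    Hypothetical helper: Osiris's real fingerprinting may differ.
    """
    # Sorted keys and fixed separators make the byte stream independent of
    # dict insertion order and incidental whitespace.
    canonical = json.dumps(
        manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two manifests with the same content but different key order (made-up fields).
a = {"steps": [{"id": "extract", "component": "mysql.extractor"}], "name": "demo"}
b = {"name": "demo", "steps": [{"component": "mysql.extractor", "id": "extract"}]}

print(fingerprint(a) == fingerprint(b))  # True
```

Because the digest depends only on content, the same manifest yields the same fingerprint on any machine, while any change to a step yields a different one.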

📊 Visual Overview

Pipeline Execution Dashboard

Osiris Dashboard: interactive HTML dashboard showing pipeline execution metrics and performance.

Run Overview with E2B Integration

Run Overview: comprehensive run overview showing E2B cloud execution with <1% overhead.

Step-by-Step Pipeline Execution

Pipeline Steps: detailed view of pipeline steps with row counts and execution times.

Example Usage via MCP

# Start the MCP server
osiris mcp

# Use with Claude Desktop or any MCP-compatible client to:
# - Discover database schemas and sample data
# - Generate SQL queries and transformations
# - Validate and compile pipelines
# - Execute with deterministic, reproducible results

# Or run pipelines directly:
osiris run examples/inactive_customers.yaml

✨ Key Features

  • AI-native pipeline generation from plain English descriptions
  • Deterministic compilation with fingerprinted, reproducible manifests
  • Run anywhere with identical behavior (local or E2B cloud)
  • Interactive HTML reports with comprehensive observability
  • AI Operation Package (AIOP) for LLM-friendly debugging and analysis
  • LLM-friendly, machine-readable documentation for AI assistants

🤖 LLM-Friendly Documentation

Osiris provides machine-readable documentation for AI assistants.

🚀 E2B Cloud Execution

Run pipelines in isolated E2B sandboxes with <1% overhead:

# Run in cloud sandbox
osiris run pipeline.yaml --e2b

# With custom resources
osiris run pipeline.yaml --e2b --e2b-cpu 4 --e2b-mem 8

See the User Guide for complete E2B documentation.
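
When pipelines are launched from a script or CI job, it can help to assemble the command line in one place. The sketch below uses only the flags shown above (`--e2b`, `--e2b-cpu`, `--e2b-mem`). The `build_run_command` helper is hypothetical and not part of Osiris, and rejecting CPU or memory settings without `--e2b` is a design choice of this sketch, not documented CLI behavior.

```python
import shlex
from typing import List, Optional


def build_run_command(
    pipeline: str,
    e2b: bool = False,
    cpu: Optional[int] = None,
    mem: Optional[int] = None,
) -> List[str]:
    """Build an argv list for `osiris run` (hypothetical helper)."""
    cmd = ["osiris", "run", pipeline]
    if e2b:
        cmd.append("--e2b")
        if cpu is not None:
            cmd += ["--e2b-cpu", str(cpu)]
        if mem is not None:
            cmd += ["--e2b-mem", str(mem)]
    elif cpu is not None or mem is not None:
        # Sketch-level guard: resource flags are only emitted with --e2b.
        raise ValueError("cpu and mem require e2b=True")
    return cmd


cmd = build_run_command("pipeline.yaml", e2b=True, cpu=4, mem=8)
print(shlex.join(cmd))  # osiris run pipeline.yaml --e2b --e2b-cpu 4 --e2b-mem 8
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with unusual file names.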

🤖 AI Operation Package (AIOP)

Every pipeline run automatically generates a comprehensive AI Operation Package for LLM analysis:

# View AIOP export after any run
osiris logs aiop --last

# Generate human-readable summary
osiris logs aiop --last --format md

# Configure in osiris.yaml
aiop:
  enabled: true  # Auto-export after each run
  policy: core   # ≤300KB for LLM consumption

AIOP provides four semantic layers for AI understanding:

  • Evidence Layer: Timestamped events, metrics, and artifacts
  • Semantic Layer: DAG structure and component relationships
  • Narrative Layer: Natural language descriptions with citations
  • Metadata Layer: LLM primer and configuration

See AIOP Architecture for details.
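
Before handing an export to an LLM, it can be useful to check its size against the ≤300KB budget of the `core` policy and to see which top-level sections it contains. The sketch below makes several assumptions: that the export is a single JSON document, that "300KB" means 300 × 1024 bytes, and that the sample key names resemble the real ones. The `summarize_export` helper is hypothetical and does not rely on any particular AIOP schema.

```python
import json

CORE_BUDGET_BYTES = 300 * 1024  # assumption: "300KB" read as 300 KiB


def summarize_export(text: str) -> dict:
    """Report the size and top-level keys of a JSON export (hypothetical helper)."""
    doc = json.loads(text)
    size = len(text.encode("utf-8"))
    return {
        "size_bytes": size,
        "within_core_budget": size <= CORE_BUDGET_BYTES,
        "top_level_keys": sorted(doc) if isinstance(doc, dict) else [],
    }


# Made-up payload for illustration; real AIOP key names may differ.
sample = json.dumps({"evidence": {}, "semantic": {}, "narrative": {}, "metadata": {}})
print(summarize_export(sample))
```

If `osiris logs aiop --last` writes JSON to standard output (an assumption here), its output could be redirected to a file and passed to `summarize_export` as text.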

📚 Documentation

For comprehensive documentation, visit the Documentation Hub.

🚦 Roadmap

  • v0.2.0 ✅ - Conversational agent, deterministic compiler, E2B parity
  • v0.3.0 ✅ - AI Operation Package (AIOP) for LLM-friendly debugging
  • v0.3.1 ✅ - Fixed validation warnings for ADR-0020 compliant configs
  • v0.3.5 ✅ - GraphQL extractor, DuckDB processor, test infrastructure improvements
  • v0.5.3 ✅ - Python version requirement fix + CSV extractor runtime bug fix
  • v0.5.4 (Current) ✅ - CLI version display hotfix
  • M2 - Production workflows, approvals, orchestrator integration
  • M3 - Streaming, parallelism, enterprise scale
  • M4 - Iceberg tables, intelligent DWH agent

See docs/roadmap/ for details.

🛠️ Contributing

See CONTRIBUTING.md for development workflow, code quality standards, and commit guidelines.

License

Apache-2.0

Download files

Source Distribution

osiris_pipeline-0.5.5.tar.gz (442.6 kB)

Uploaded Source

Built Distribution

osiris_pipeline-0.5.5-py3-none-any.whl (487.3 kB)

Uploaded Python 3

File details

Details for the file osiris_pipeline-0.5.5.tar.gz.

File metadata

  • Download URL: osiris_pipeline-0.5.5.tar.gz
  • Size: 442.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for osiris_pipeline-0.5.5.tar.gz
Algorithm    Hash digest
SHA256       849ea43af5bac890e222c7073d521c90edabf833e0156ba1161a68449a1d8821
MD5          ec3952b1676dc1bab845671b1ebb03d7
BLAKE2b-256  6ff59c1df3994458a1926f672d8e856dddd013592b26465f7de4eb9c0bad0436

See more details on using hashes here.
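
A downloaded archive can be checked against the SHA256 digest listed above using only the Python standard library. This is a minimal sketch: the helper names and the local file path are assumptions, and the expected digest is copied from the hash listing for osiris_pipeline-0.5.5.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for osiris_pipeline-0.5.5.tar.gz
EXPECTED_SDIST_SHA256 = (
    "849ea43af5bac890e222c7073d521c90edabf833e0156ba1161a68449a1d8821"
)


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the archive was saved.
    archive = Path("osiris_pipeline-0.5.5.tar.gz")
    if archive.exists():
        print("OK" if verify(archive, EXPECTED_SDIST_SHA256) else "MISMATCH")
```

The same check works for the wheel by substituting its file name and SHA256 digest.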

File details

Details for the file osiris_pipeline-0.5.5-py3-none-any.whl.

File hashes

Hashes for osiris_pipeline-0.5.5-py3-none-any.whl
Algorithm    Hash digest
SHA256       69b6c02c40c27e56bc5f2f16fd16c9c2c27a20c13759a2d47eacf8dcccf45e43
MD5          9328273b473aacd92c6ba6acfb92595a
BLAKE2b-256  b1dfcc0996ed1f12ab5bed4d57966a1bd2bf438d3a3e48ecdb80e5c78c1d7291
