Automatic SageMaker Pipeline Generation from DAG Specifications
Cursus: Automatic SageMaker Pipeline Generation
Transform pipeline graphs into production-ready SageMaker pipelines automatically.
Cursus is a pipeline generation system that builds complete SageMaker pipelines from user-provided pipeline graphs. You define your ML workflow as a graph of steps and connections, and Cursus handles the SageMaker implementation details, dependency resolution, and configuration management.
🚀 Quick Start
Installation
# Core installation
pip install cursus
# With ML frameworks (the quotes keep shells such as zsh from expanding the brackets)
pip install "cursus[pytorch,xgboost]"
# Full installation with all features
pip install "cursus[all]"
30-Second Example
from cursus.core import compile_dag_to_pipeline
from cursus.api import PipelineDAG
# Create a simple DAG
dag = PipelineDAG()
dag.add_node("CradleDataLoading")
dag.add_node("TabularPreprocessing")
dag.add_node("XGBoostTraining")
dag.add_edge("CradleDataLoading", "TabularPreprocessing")
dag.add_edge("TabularPreprocessing", "XGBoostTraining")
# Compile to SageMaker pipeline automatically
pipeline = compile_dag_to_pipeline(dag, pipeline_name="fraud-detection")
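# Start an execution. If this is a standard SageMaker Pipeline object, register
# the definition first with pipeline.upsert(role_arn=...).
pipeline.start()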
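To make the graph idea concrete, here is a minimal sketch of how the node and edge declarations above imply an execution order. It uses only the standard library's `graphlib` and plain lists as a stand-in for the graph; the `execution_order` helper is hypothetical and is not how Cursus is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-in for a pipeline graph: plain node and edge lists.
nodes = ["CradleDataLoading", "TabularPreprocessing", "XGBoostTraining"]
edges = [
    ("CradleDataLoading", "TabularPreprocessing"),
    ("TabularPreprocessing", "XGBoostTraining"),
]


def execution_order(nodes, edges):
    """Return one valid execution order for the graph."""
    # TopologicalSorter expects a mapping {node: set_of_predecessors}.
    predecessors = {node: set() for node in nodes}
    for src, dst in edges:
        predecessors[dst].add(src)
    return list(TopologicalSorter(predecessors).static_order())


order = execution_order(nodes, edges)
print(order)
```

For this linear chain there is only one valid order: data loading, then preprocessing, then training.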
Command Line Interface
# Generate a new project
cursus init --template xgboost --name fraud-detection
# Validate your DAG
cursus validate my_dag.py
# Compile to SageMaker pipeline
cursus compile my_dag.py --name my-pipeline --output pipeline.json
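As a rough illustration of what validating a DAG can involve, the sketch below checks two structural properties: every edge refers to a declared node, and the graph has no cycles. The `validate_dag` function is a hypothetical example built on the standard library; the checks that `cursus validate` actually performs are not described here and may differ.

```python
from graphlib import CycleError, TopologicalSorter


def validate_dag(nodes, edges):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    known = set(nodes)

    # Check 1: every edge endpoint must be a declared node.
    for src, dst in edges:
        for name in (src, dst):
            if name not in known:
                problems.append(
                    f"edge ({src} -> {dst}) references undeclared node {name!r}"
                )

    # Check 2: the declared part of the graph must be acyclic.
    predecessors = {node: set() for node in nodes}
    for src, dst in edges:
        if src in known and dst in known:
            predecessors[dst].add(src)
    try:
        TopologicalSorter(predecessors).prepare()
    except CycleError as exc:
        # The second element of args holds the nodes that form the cycle.
        problems.append("cycle detected: " + " -> ".join(exc.args[1]))

    return problems


print(validate_dag(["A", "B"], [("A", "B")]))
print(validate_dag(["A", "B"], [("A", "B"), ("B", "A")]))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.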
✨ Key Features
🎯 Graph-to-Pipeline Automation
- Input: Simple pipeline graph with step types and connections
- Output: Complete SageMaker pipeline with all dependencies resolved
- Magic: Intelligent analysis of graph structure with automatic step builder selection
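One common way to implement step builder selection is a registry that maps each node name to a builder class. The sketch below shows that pattern under stated assumptions: the class names, the `BUILDER_REGISTRY` dictionary, and the step-type labels are all hypothetical and do not reflect Cursus's internal API.

```python
class StepBuilder:
    """Hypothetical base class: turns a node name into a step description."""

    step_type = "Base"

    def build(self, node_name):
        return {"name": node_name, "type": self.step_type}


class ProcessingBuilder(StepBuilder):
    step_type = "Processing"


class TrainingBuilder(StepBuilder):
    step_type = "Training"


# Assumed mapping from node name to builder class, for illustration only.
BUILDER_REGISTRY = {
    "TabularPreprocessing": ProcessingBuilder,
    "XGBoostTraining": TrainingBuilder,
}


def select_builder(node_name):
    """Look up and instantiate the builder registered for a node name."""
    try:
        return BUILDER_REGISTRY[node_name]()
    except KeyError:
        known = ", ".join(sorted(BUILDER_REGISTRY))
        raise ValueError(
            f"No builder registered for {node_name!r}; known steps: {known}"
        ) from None


steps = [select_builder(n).build(n) for n in ("TabularPreprocessing", "XGBoostTraining")]
print(steps)
```

Failing with the list of known step names makes a typo in a node name easy to spot before anything is sent to SageMaker.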
⚡ 10x Faster Development
- Before: 2-4 weeks of manual SageMaker configuration
- After: 10-30 minutes from graph to working pipeline
- Result: 95% reduction in development time
🧠 Intelligent Dependency Resolution
- Automatic step connections and data flow
- Smart configuration matching and validation
- Type-safe specifications with compile-time checks
- Semantic compatibility analysis
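The matching idea behind dependency resolution can be sketched as follows: each step declares typed outputs and typed dependencies, and a resolver connects a dependency to the type-compatible output whose name is most similar. Everything here (the `Port` dataclass, the `resolve` function, the similarity threshold, and the port names) is a hypothetical illustration using `difflib`, not the resolver that Cursus ships.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Port:
    """Hypothetical declaration of a step input or output."""

    name: str
    data_type: str


def name_similarity(a, b):
    """Crude stand-in for semantic matching: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(dependency, candidate_outputs, threshold=0.4):
    """Return the best type-compatible output for a dependency, or None."""
    # Type check first: incompatible types are never matched.
    compatible = [o for o in candidate_outputs if o.data_type == dependency.data_type]
    # Then rank the remaining candidates by name similarity.
    scored = [(name_similarity(dependency.name, o.name), o) for o in compatible]
    scored = [(score, o) for score, o in scored if score >= threshold]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]


outputs = [Port("processed_data", "S3Uri"), Port("row_count", "Integer")]
training_input = Port("input_data", "S3Uri")
match = resolve(training_input, outputs)
print(match)
```

Filtering by type before scoring names keeps a well-named but incompatible output from ever being wired into a step.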
🛡️ Production Ready
- Built-in quality gates and validation
- Enterprise governance and compliance
- Comprehensive error handling and debugging
- 98% complete with 1,650+ lines of complex code eliminated
📊 Proven Results
Based on production deployments across enterprise environments:
| Component | Code Reduction | Lines Eliminated | Key Benefit |
|---|---|---|---|
| Processing Steps | 60% | 400+ lines | Automatic input/output resolution |
| Training Steps | 60% | 300+ lines | Intelligent hyperparameter handling |
| Model Steps | 47% | 380+ lines | Streamlined model creation |
| Registration Steps | 66% | 330+ lines | Simplified deployment workflows |
| Overall System | ~55% | 1,650+ lines | Intelligent automation |
🏗️ Architecture
Cursus is organized into six layers:
- 🎯 User Interface: Fluent API and Pipeline DAG for intuitive construction
- 🧠 Intelligence Layer: Smart proxies with automatic dependency resolution
- 🏗️ Orchestration: Pipeline assembler and compiler for DAG-to-template conversion
- 📚 Registry Management: Multi-context coordination with lifecycle management
- 🔗 Dependency Resolution: Intelligent matching with semantic compatibility
- 📋 Specification Layer: Comprehensive step definitions with quality gates
📚 Usage Examples
Basic Pipeline
from cursus.core import compile_dag_to_pipeline
from cursus.api import PipelineDAG
# Create DAG
dag = PipelineDAG()
dag.add_node("CradleDataLoading")
dag.add_node("XGBoostTraining")
dag.add_edge("CradleDataLoading", "XGBoostTraining")
# Compile to SageMaker pipeline
pipeline = compile_dag_to_pipeline(dag, pipeline_name="my-ml-pipeline")
Advanced Configuration
from cursus.core import compile_dag_to_pipeline
from cursus.api import PipelineDAG
# Create DAG with more complex workflow
dag = PipelineDAG()
dag.add_node("CradleDataLoading")
dag.add_node("TabularPreprocessing")
dag.add_node("XGBoostTraining")
dag.add_node("XGBoostModelEval")
dag.add_edge("CradleDataLoading", "TabularPreprocessing")
dag.add_edge("TabularPreprocessing", "XGBoostTraining")
dag.add_edge("XGBoostTraining", "XGBoostModelEval")
# Compile with custom configuration
pipeline = compile_dag_to_pipeline(
    dag=dag,
    pipeline_name="advanced-ml-pipeline",
    config_path="config.yaml",
)
Using the Compiler Class
from cursus.core import PipelineDAGCompiler
from cursus.api import PipelineDAG
# Create DAG
dag = PipelineDAG()
dag.add_node("TabularPreprocessing")
dag.add_node("XGBoostTraining")
dag.add_edge("TabularPreprocessing", "XGBoostTraining")
# Use compiler for more control
compiler = PipelineDAGCompiler()
pipeline = compiler.compile(dag, pipeline_name="my-pipeline")
🔧 Installation Options
Core Installation
pip install cursus
Includes basic DAG compilation and SageMaker integration.
Framework-Specific
pip install "cursus[pytorch]" # PyTorch Lightning models
pip install "cursus[xgboost]" # XGBoost training pipelines
pip install "cursus[nlp]" # NLP models and processing
pip install "cursus[processing]" # Advanced data processing
Development
pip install "cursus[dev]" # Development tools
pip install "cursus[docs]" # Documentation tools
pip install "cursus[all]" # Everything included
🎯 Who Should Use Cursus?
Data Scientists & ML Practitioners
- Focus on model development, not infrastructure complexity
- Rapid experimentation with 10x faster iteration
- Business-focused interface eliminates SageMaker expertise requirements
Platform Engineers & ML Engineers
- About 55% less code overall to maintain and debug (47% to 66% depending on the component)
- Specification-driven architecture prevents common errors
- Universal patterns enable faster team onboarding
Organizations
- Accelerated innovation with faster pipeline development
- Reduced technical debt through clean architecture
- Built-in governance and compliance frameworks
📖 Documentation
📚 Complete Documentation Hub
Your gateway to all Cursus documentation - start here for comprehensive navigation
Knowledge Management Philosophy
- Zettelkasten Principles - The knowledge management principles behind our slipbox documentation system, explaining how we organize and connect information for maximum discoverability and organic growth
Core Documentation
- Developer Guide - Comprehensive guide for developing new pipeline steps and extending Cursus
- Design Documentation - Detailed architectural documentation and design principles
- Pipeline Catalog - Comprehensive collection of prebuilt pipeline templates organized by framework and task
- API Reference - Detailed API documentation including core, api, steps, and other components
- Examples - Ready-to-use pipeline blueprints and examples
Quick Links
- Getting Started - Start here for adding new pipeline steps
- Design Principles - Core architectural principles
- Best Practices - Recommended development practices
- Component Guide - Overview of key components
- Validation System - Comprehensive validation framework for pipeline alignment and quality assurance
🤝 Contributing
We welcome contributions! See our Developer Guide for comprehensive details on:
- Prerequisites - What you need before starting development
- Creation Process - Step-by-step process for adding new pipeline steps
- Validation Checklist - Comprehensive checklist for validating implementations
- Common Pitfalls - Common mistakes to avoid
For architectural insights and design decisions, see the Design Documentation.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🔗 Links
- GitHub: https://github.com/TianpeiLuke/cursus
- Issues: https://github.com/TianpeiLuke/cursus/issues
- PyPI: https://pypi.org/project/cursus/
Cursus: Making SageMaker pipeline development 10x faster through intelligent automation. 🚀
File details
Details for the file cursus-1.3.6.tar.gz.
File metadata
- Download URL: cursus-1.3.6.tar.gz
- Upload date:
- Size: 802.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 11150ed94816c8f930ea351f5c3bcc21efc462a8c0971e96b1c4e964356875ef |
| MD5 | b91fe8f318fb10a5c4029c3c6d909704 |
| BLAKE2b-256 | 7d00cc73a09c17970166efb0cc3d495764e30cfd34cc0564669df2549e26e77a |
File details
Details for the file cursus-1.3.6-py3-none-any.whl.
File metadata
- Download URL: cursus-1.3.6-py3-none-any.whl
- Upload date:
- Size: 1.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 27bdaf6731a3bfeea02249cd6108674f88928db16a0ceb8d2b7432e4bcb586ae |
| MD5 | 4303a01a9a221e413af5c352d1c3c640 |
| BLAKE2b-256 | a478cf41a9d3cd2a0fbbc0e11c1a98e94e4263b19abfb6d569edd31383d514f4 |