stagecraft
A Python library for building robust ETL (Extract, Transform, Load) pipelines with declarative stages and powerful data flow management.
Features
- Pipeline Architecture: Build complex data pipelines using declarative stages and conditions
- Type-Safe Variables: Strongly-typed variable system with support for DataFrames, NumPy arrays, and serializable data
- Memory Management: Built-in memory tracking and optimization for data-intensive workflows
- Data Sources: Out-of-the-box support for CSV, JSON, and file-based data sources
- Conditional Execution: Flexible condition system for controlling stage execution
- Exception Handling: Comprehensive exception handling with custom wrappers
- Logging: Configurable logging system for pipeline monitoring
- Utility Functions: Rich set of utility functions for file operations, string manipulation, and more
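The feature list does not say how memory tracking is implemented. As a rough illustration of the idea only, the sketch below measures the peak allocation of a single step with the standard library's tracemalloc module. The helper name run_with_peak_memory and the build_table step are hypothetical and are not part of stagecraft.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def run_with_peak_memory(func: Callable[[], Any]) -> Tuple[Any, int]:
    """Run func and return its result plus the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        result = func()
        # get_traced_memory() returns (current, peak) since tracing started.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_table():
    # Stand-in for a data-heavy stage: 1,000 rows of 100 integers each.
    return [list(range(100)) for _ in range(1_000)]


table, peak_bytes = run_with_peak_memory(build_table)
print(len(table))
print(f"peak allocation: {peak_bytes / 1024:.1f} KiB")
```

The exact figure varies by interpreter and platform, so it is best treated as a relative signal when comparing stages.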
Installation
pip install stagecraft
Quick Start
from stagecraft import (
    PipelineDefinition,
    PipelineRunner,
    ETLStage,
    DFVar,
)

# Define your pipeline stages
class LoadDataStage(ETLStage):
    def recipe(self, **kwargs):
        # Load your data
        pass

# Create pipeline definition
pipeline = PipelineDefinition(
    name="my_pipeline",
    stages=[LoadDataStage()]
)

# Run the pipeline
runner = PipelineRunner()
result = runner.run(pipeline)
Examples
Check out the examples/ directory for comprehensive, runnable examples:
- basic_pipeline.py - Simple end-to-end pipeline with CSV loading, transformation, and saving
- dataframe_pipeline.py - DataFrame operations with Pandera schema validation
- conditional_execution.py - Conditional stage execution with various condition types
Each example is self-contained and demonstrates best practices. See examples/README.md for detailed documentation.
Core Components
Pipeline System
- PipelineDefinition: Define pipeline structure and stages
- PipelineRunner: Execute pipelines with context management
- ETLStage: Base class for creating custom pipeline stages
- PipelineContext: Manage pipeline state and variables
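To make the division of labour between a definition, a runner, a stage, and a context more concrete, here is a minimal standalone sketch of the same pattern. It does not import stagecraft: SketchStage, Extract, Transform, and run_stages are hypothetical names, and the shared dictionary only stands in for whatever PipelineContext actually does.

```python
from typing import Any, Dict, List


class SketchStage:
    """Hypothetical stand-in for a stage base class (not stagecraft's ETLStage)."""

    name = "stage"

    def recipe(self, context: Dict[str, Any]) -> None:
        raise NotImplementedError


class Extract(SketchStage):
    name = "extract"

    def recipe(self, context: Dict[str, Any]) -> None:
        # Pretend these rows came from a file or a database.
        context["rows"] = [{"price": "10"}, {"price": "32"}]


class Transform(SketchStage):
    name = "transform"

    def recipe(self, context: Dict[str, Any]) -> None:
        # Convert the string prices produced by Extract into integers.
        context["prices"] = [int(row["price"]) for row in context["rows"]]


def run_stages(stages: List[SketchStage]) -> Dict[str, Any]:
    """Run each stage in order against one shared context dictionary."""
    context: Dict[str, Any] = {"completed": []}
    for stage in stages:
        stage.recipe(context)
        context["completed"].append(stage.name)
    return context


result = run_stages([Extract(), Transform()])
print(result["prices"])     # [10, 32]
print(result["completed"])  # ['extract', 'transform']
```

Keeping all state in the context means each stage depends only on the variables it reads, which is what makes the stage list declarative.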
Variables
- DFVar: pandas DataFrame variables
- NDArrayVar: NumPy array variables
- SVar: Serializable variables for general Python objects
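The constructors and methods of these variable classes are not documented here, so the following is only a sketch of what a type-checked variable slot can look like in plain Python. TypedVar and its set/get methods are hypothetical, and a dict is used in place of a DataFrame to keep the example dependency-free.

```python
from typing import Any, Optional, Type


class TypedVar:
    """Hypothetical typed slot that rejects values of the wrong type."""

    def __init__(self, name: str, expected_type: Type[Any]) -> None:
        self.name = name
        self.expected_type = expected_type
        self._value: Optional[Any] = None

    def set(self, value: Any) -> None:
        # Fail at assignment time rather than deep inside a later stage.
        if not isinstance(value, self.expected_type):
            raise TypeError(
                f"{self.name} expects {self.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        self._value = value

    def get(self) -> Any:
        if self._value is None:
            raise ValueError(f"{self.name} has not been set")
        return self._value


counts = TypedVar("counts", dict)
counts.set({"rows": 3})
print(counts.get())  # {'rows': 3}

try:
    counts.set([1, 2, 3])
except TypeError as exc:
    print(exc)  # counts expects dict, got list
```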
Data Sources
- CSVSource: Read data from CSV files
- JSONSource: Read data from JSON files
- FileSource: Read data from text files
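The idea behind a family of data sources is that every source exposes the same loading call regardless of file format. The sketch below shows that idea with the standard library only. CsvRows, JsonDocument, and their load method are hypothetical and should not be read as the signatures of CSVSource or JSONSource.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


class CsvRows:
    """Hypothetical source: reads a CSV file into a list of dicts."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> List[Dict[str, str]]:
        with self.path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))


class JsonDocument:
    """Hypothetical source: reads a JSON file into Python objects."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> Any:
        with self.path.open(encoding="utf-8") as handle:
            return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "orders.csv"
    csv_path.write_text("id,total\n1,9.50\n2,12.00\n", encoding="utf-8")
    json_path = Path(tmp) / "config.json"
    json_path.write_text('{"currency": "EUR"}', encoding="utf-8")

    # Both sources are driven through the same load() call.
    loaded = [source.load() for source in (CsvRows(csv_path), JsonDocument(json_path))]

print(loaded[0])  # [{'id': '1', 'total': '9.50'}, {'id': '2', 'total': '12.00'}]
print(loaded[1])  # {'currency': 'EUR'}
```

Note that csv.DictReader returns every field as a string, so numeric conversion is left to a later transform step.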
Conditions
- AlwaysExecute: Unconditional execution
- AndCondition/OrCondition: Combine multiple conditions
- ConfigFlagCondition: Execute based on configuration flags
- VariableExistsCondition: Check variable presence
- CustomCondition: Define custom execution logic
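Composable conditions of this kind can be modelled as predicates over the pipeline state. The following is a minimal sketch under that assumption. The function names (always, flag_set, variable_exists, all_of, any_of) and the layout of the context dictionary are hypothetical and do not mirror the constructors of the classes listed above.

```python
from typing import Any, Callable, Dict

# In this sketch a condition is just a function from context to bool.
Condition = Callable[[Dict[str, Any]], bool]


def always() -> Condition:
    return lambda context: True


def flag_set(flag: str) -> Condition:
    # True when the named configuration flag is present and truthy.
    return lambda context: bool(context.get("config", {}).get(flag, False))


def variable_exists(name: str) -> Condition:
    return lambda context: name in context.get("variables", {})


def all_of(*conditions: Condition) -> Condition:
    return lambda context: all(cond(context) for cond in conditions)


def any_of(*conditions: Condition) -> Condition:
    return lambda context: any(cond(context) for cond in conditions)


context = {"config": {"export": True}, "variables": {"orders": [1, 2]}}

should_export = all_of(flag_set("export"), variable_exists("orders"))
should_archive = any_of(flag_set("archive"), variable_exists("snapshot"))

print(should_export(context))   # True
print(should_archive(context))  # False
print(always()(context))        # True
```

A custom condition in this model is simply any other function with the same signature, so it combines with all_of and any_of without extra work.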
Utilities
- File operations: read_file, write_file, append_file
- CSV operations: read_csv, write_csv, append_csv
- JSON operations: read_json, write_json, append_json
- String utilities: camel_to_snake_case, snake_to_camel_case, and more
- Time utilities: get_timestamp, get_current_date
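For readers unfamiliar with the two naming conventions, the sketch below shows one common way to convert between them. It is not the library's implementation: the functions are deliberately named camel_to_snake and snake_to_camel to keep them distinct from stagecraft's utilities, whose handling of edge cases such as acronyms may differ.

```python
import re


def camel_to_snake(name: str) -> str:
    # Insert "_" between a lowercase letter or digit and the uppercase
    # letter that follows it, then lowercase the whole string.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()


def snake_to_camel(name: str) -> str:
    # Keep the first word as-is and capitalize each following word.
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


print(camel_to_snake("orderId"))             # order_id
print(camel_to_snake("pipelineRunnerV2"))    # pipeline_runner_v2
print(snake_to_camel("order_id"))            # orderId
print(snake_to_camel("pipeline_runner_v2"))  # pipelineRunnerV2
```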
Requirements
- Python 3.8+
Development
Install development dependencies:
pip install "stagecraft[dev]"
Run tests:
pytest
License
Apache-2.0 License - see LICENSE file for details
Contributing
This project is not accepting contributions at this time.
File details
Details for the file stagecraft-0.1.6.tar.gz.
File metadata
- Download URL: stagecraft-0.1.6.tar.gz
- Upload date:
- Size: 55.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b98d487efa7d56a747d4c048998878501898ecf351c93aec471ff77c5abff2ac |
| MD5 | 6cdb6954b12b40200e7c53feedc087a3 |
| BLAKE2b-256 | 6f7f15362419beafe4d1a5f0e08d3d15e602089f6996b1c03ac9163d01414c20 |
File details
Details for the file stagecraft-0.1.6-py3-none-any.whl.
File metadata
- Download URL: stagecraft-0.1.6-py3-none-any.whl
- Upload date:
- Size: 62.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 367014383e8ccf6035d248512b09cb7ce7c95ad50103c840a774fcb8a1db79c3 |
| MD5 | c028644bfbe96f3b62ec85a16c3c5ab7 |
| BLAKE2b-256 | de4b0db3539ac3375e4a3191c09da04d9692ff67e61ff47feefe3e8299fbb1b7 |