AtelierFlow 🎨

A lightweight, flexible Python framework for creating clear, reproducible, and scalable machine learning pipelines.

AtelierFlow helps you structure your ML code into a series of modular, reusable steps. It's designed to bring clarity and standardization to your experimentation process, from data loading to model evaluation, without imposing a heavy, restrictive structure.

✨ Features

  • Modular by Design: Build pipelines by chaining together independent, reusable Step components.

  • Centralized Configuration: Easily manage global settings like device (CPU/GPU), logging_level, and custom tags for your entire experiment from one place.

  • Extensible Library: Use pre-built steps for common tasks such as I/O, preprocessing, training, and evaluation, or create your own.

  • Framework Agnostic: While providing helpers for libraries like scikit-learn, the core is lightweight and can orchestrate any Python-based ML workflow.

  • Clear & Explicit: The framework prioritizes readability and explicit dependencies, making your pipelines easy to understand, share, and debug.

🚀 Installation

You can install the core framework via pip. Optional dependencies can be installed to add support for specific ML libraries.

# Install the core framework
pip install atelierflow

# To include support for scikit-learn-based steps
# (the quotes keep shells such as zsh from interpreting the brackets)
pip install "atelierflow[sklearn]"

# To include support for PyTorch-based steps
pip install "atelierflow[torch]"
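
After installing, it can be useful to confirm which optional backends are importable in the current environment. The snippet below is a minimal sketch using only the standard library; the helper name available_backends is hypothetical and not part of AtelierFlow.

```python
from importlib.util import find_spec


def available_backends() -> dict:
    """Report which top-level packages can be imported right now."""
    # Import names: scikit-learn is imported as `sklearn`, PyTorch as `torch`.
    packages = ["atelierflow", "sklearn", "torch"]
    # find_spec returns None when a top-level package is not installed.
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, present in available_backends().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```

If sklearn or torch is reported as missing, installing the matching extra should add it.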

Quick Start

Here’s how to build and run a simple pipeline in just a few lines of code. This example uses pre-built steps to generate data, train a model, evaluate it, and save the results.

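import logging
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from atelierflow.experiment import Experiment
from atelierflow.steps.common.save_data.save_to_avro import SaveToAvroStep

# NOTE: the steps imported below are not implemented yet
from atelierflow.steps.sklearn.evaluation import ClassificationEvaluationStep
from atelierflow.steps.sklearn.training import TrainModelStep
# GenerateDataStep (used in step 3) has no import in this example;
# supply your own data-generation step under that name.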

# 1. Define your components and schemas
model_component = RandomForestClassifier(n_estimators=50)
scores_schema = {'name': 'Scores', 'type': 'record', 'fields': [{'name': 'AUC', 'type': 'double'}]}

# 2. Create an Experiment and configure it
experiment = Experiment(
  name="Quick Start Classification",
  logging_level="INFO",
  tags={"project": "onboarding-example"}
)

# 3. Add steps to the pipeline
experiment.add_step(GenerateDataStep())
experiment.add_step(TrainModelStep(model=model_component))
experiment.add_step(ClassificationEvaluationStep(metrics={'AUC': roc_auc_score}))
experiment.add_step(
  SaveToAvroStep(
    output_path="./quick_start_results.avro",
    data_key='evaluation_scores',
    schema=scores_schema
  )
)

# 4. Run the experiment!
if __name__ == "__main__":
  final_results = experiment.run()
  logging.info("Pipeline complete. Results saved to ./quick_start_results.avro")

AtelierFlow is built around a few simple, powerful concepts.

  • Experiment: The main orchestrator. You create an Experiment instance, give it a name, and provide global configurations. It is responsible for running the steps in the correct order.

  • Step: A single, executable stage in your pipeline. A step can do anything: load data, train a model, or save a file. Every step receives the output of the previous step and the global experiment configuration.

  • StepResult: A simple key-value store that acts as the data carrier between steps. A step adds its outputs (e.g., result.add('trained_model', model)) and the next step retrieves them (input_data.get('trained_model')).
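
To make the relationship between the three concepts concrete, here is a self-contained sketch built from small stand-in classes. These are not the real AtelierFlow classes: the names mirror the ones described above, but the internals, and the LoadNumbers and SumNumbers steps, are assumptions made purely for illustration. In particular, whether the real framework merges results across steps or replaces them is not specified here; this sketch simply passes along whatever each step returns.

```python
from typing import Any, Dict, List, Optional


class StepResult:
    """Stand-in key-value carrier (not the real atelierflow class)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def add(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)


class Step:
    """Stand-in base class: every step implements run()."""

    def run(self, input_data: Optional[StepResult],
            experiment_config: Dict[str, Any]) -> StepResult:
        raise NotImplementedError


class LoadNumbers(Step):
    """Hypothetical first step: produces some data."""

    def run(self, input_data, experiment_config):
        result = StepResult()
        result.add("numbers", [1, 2, 3, 4])
        return result


class SumNumbers(Step):
    """Hypothetical second step: consumes the previous step's output."""

    def run(self, input_data, experiment_config):
        result = StepResult()
        result.add("total", sum(input_data.get("numbers")))
        return result


class Experiment:
    """Stand-in orchestrator: runs steps in the order they were added."""

    def __init__(self, name: str, **config: Any) -> None:
        self.config = {"name": name, **config}
        self.steps: List[Step] = []

    def add_step(self, step: Step) -> None:
        self.steps.append(step)

    def run(self) -> Optional[StepResult]:
        data: Optional[StepResult] = None
        for step in self.steps:
            # Each step receives the previous result and the global config.
            data = step.run(data, self.config)
        return data


experiment = Experiment(name="concept-demo", logging_level="INFO")
experiment.add_step(LoadNumbers())
experiment.add_step(SumNumbers())
final = experiment.run()
print(final.get("total"))  # 10
```

The first step receives None as its input, which is why run() accepts an optional StepResult.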

💡 Working with Pre-built Steps

When using pre-built steps like TrainModelStep or ClassificationEvaluationStep, it's crucial that the objects you pass to them adhere to the framework's core interfaces.

  • Models must implement the Model interface. Any object passed to TrainModelStep must be an instance of a class that inherits from atelierflow.core.model.Model. This ensures the step can reliably call methods like .fit() and .predict().

  • Metrics must implement the Metric interface. Similarly, any custom metric passed to an evaluation step must inherit from atelierflow.core.metric.Metric and implement the .compute() method.

This "programming to an interface" design is what gives AtelierFlow its flexibility. It allows the pre-built steps to work with any model or metric, as long as it follows the expected contract.
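
The idea can be sketched with plain Python abstract base classes. The Model and Metric classes below are stand-ins that only approximate atelierflow.core.model.Model and atelierflow.core.metric.Metric; their exact method signatures in the framework may differ. MajorityClassModel, AccuracyMetric, and train_and_score are hypothetical names used for illustration.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Any, List, Sequence


class Model(ABC):
    """Stand-in for the framework's Model interface (assumed shape)."""

    @abstractmethod
    def fit(self, X: Sequence[Any], y: Sequence[Any]) -> None: ...

    @abstractmethod
    def predict(self, X: Sequence[Any]) -> List[Any]: ...


class Metric(ABC):
    """Stand-in for the framework's Metric interface (assumed shape)."""

    @abstractmethod
    def compute(self, y_true: Sequence[Any], y_pred: Sequence[Any]) -> float: ...


class MajorityClassModel(Model):
    """Toy model: always predicts the most frequent training label."""

    def __init__(self) -> None:
        self.majority_label: Any = None

    def fit(self, X, y):
        self.majority_label = Counter(y).most_common(1)[0][0]

    def predict(self, X):
        return [self.majority_label for _ in X]


class AccuracyMetric(Metric):
    """Fraction of predictions that match the true labels."""

    def compute(self, y_true, y_pred):
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
        return correct / len(y_true)


def train_and_score(model: Model, metric: Metric, X, y) -> float:
    # This function depends only on the interfaces, never on a concrete class,
    # which is what lets a generic step work with any conforming object.
    model.fit(X, y)
    return metric.compute(y, model.predict(X))


X = [[0], [1], [2], [3]]
y = ["a", "a", "a", "b"]
score = train_and_score(MajorityClassModel(), AccuracyMetric(), X, y)
print(score)  # 0.75
```

A third-party estimator could be supported the same way, by wrapping it in a small class that inherits from the interface and forwards fit and predict to the wrapped object.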

🛠️ Creating a Custom Step

Creating your own step is the primary way to extend AtelierFlow. It's as simple as inheriting from the Step base class and implementing the run method.

  • Inherit from Step: Create a new class that inherits from atelierflow.core.step.Step.

  • Use __init__ for Configuration: Pass any parameters your step needs to its constructor.

  • Implement run: This is where your logic goes. Use the input_data to get results from previous steps and use experiment_config to access global settings.

  • Return a StepResult: Your step must return a StepResult object, even if it's empty, to continue the pipeline.

Example: A Custom Hello World Step

import logging
from atelierflow.core.step import Step
from atelierflow.core.step_result import StepResult
from typing import Dict, Any, Optional

logger = logging.getLogger(__name__)

class HelloWorldStep(Step):
  """A simple example of a custom step."""
  def __init__(self, message: str):
    # 1. Use __init__ for step-specific parameters
    self.message = message

  def run(self, input_data: Optional[StepResult], experiment_config: Dict[str, Any]) -> StepResult:
    # 2. Implement your logic in the 'run' method
    
    # You can access global config
    exp_name = experiment_config.get('name', 'Default Experiment')
    
    logger.info(f"Hello from the '{exp_name}' experiment!")
    logger.info(f"Custom message for this step: {self.message}")
    
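    # 3. Return a StepResult. Pass the incoming data through to the next step;
    #    if this is the first step (input_data is None), return an empty result
    #    (assumes StepResult() can be constructed without arguments).
    return input_data if input_data is not None else StepResult()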

🤝 Contributing

Contributions are welcome! Whether it's adding new pre-built steps, improving documentation, or reporting bugs, please feel free to open an issue or submit a pull request.
