
A flexible framework for machine learning pipelines


🔬 LabChain

The Modern ML Experimentation Framework

test_on_push • Python 3.11+ • License: AGPL-3.0 • PyPI version • Documentation

Build, experiment, and deploy ML pipelines with confidence

Documentation • Quick Start • Examples • Contributing


🎯 What is LabChain?

LabChain is a production-ready ML experimentation framework that combines the flexibility of research with the rigor of production deployment. Stop fighting with boilerplate code and focus on what matters: your models.

✨ Why LabChain?

🧩 Modular by Design

  • Compose pipelines from reusable filters
  • Plug-and-play architecture
  • No vendor lock-in

🚀 Production Ready

  • Automatic caching and versioning
  • Distributed processing support
  • Cloud-native storage backends

🔄 Reproducible

  • Version-controlled experiments
  • Deterministic pipelines
  • Full audit trails
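Deterministic pipelines and caching of this kind usually rest on hashing a canonical serialization of the pipeline configuration, so identical configs always map to identical cache keys. A minimal illustration of the idea in plain Python (not LabChain's actual implementation):

```python
import hashlib
import json

def config_hash(config: dict) -> str:
    """Hash a pipeline config deterministically.

    Sorting keys makes the serialization canonical, so logically
    equal configs always produce the same digest.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Key order does not matter: both configs hash identically.
a = config_hash({"filter": "KnnFilter", "n_neighbors": 5})
b = config_hash({"n_neighbors": 5, "filter": "KnnFilter"})
assert a == b
```

The same digest can then serve as both a cache key and an audit-trail identifier for an experiment run.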

⚡ Experimental Features

  • Remote code injection
  • Zero-deployment pipelines
  • Automatic dependency management

🚀 Quick Start

Installation

pip install framework3

Your First Pipeline (2 minutes)

from labchain import Container, F3Pipeline
from labchain.plugins.filters import StandardScalerPlugin, KnnFilter
from labchain.plugins.metrics import F1, Precission, Recall
from labchain.base import XYData
from sklearn.datasets import load_iris

# Load data
iris = load_iris()
X = XYData.mock(iris.data)
y = XYData.mock(iris.target)

# Build pipeline
pipeline = F3Pipeline(
    filters=[
        StandardScalerPlugin(),
        KnnFilter(n_neighbors=5)
    ],
    metrics=[F1("weighted"), Precission("weighted"), Recall("weighted")]
)

# Train and evaluate
pipeline.fit(X, y)
predictions = pipeline.predict(X)
results = pipeline.evaluate(X, y, predictions)

print(results)
# Example output: {'F1': 0.95, 'Precision': 0.95, 'Recall': 0.95}

That's it! 🎉 You just built, trained, and evaluated an ML pipeline.


💡 Key Features

🏗️ Modular Architecture

# Mix and match components like LEGO blocks
from labchain.plugins.filters import (
    PCAPlugin,
    StandardScalerPlugin,
    ClassifierSVMPlugin
)

pipeline = F3Pipeline(
    filters=[
        StandardScalerPlugin(),
        PCAPlugin(n_components=2),
        ClassifierSVMPlugin(kernel='rbf')
    ]
)

🔄 Smart Caching

from labchain.plugins.filters import Cached

# Cache expensive operations automatically
pipeline = F3Pipeline(
    filters=[
        Cached(
            filter=ExpensivePreprocessor(),
            cache_data=True,
            cache_filter=True
        ),
        MyModel()
    ]
)

📊 Hyperparameter Optimization

from labchain import WandbOptimizer

# Optimize with Weights & Biases
optimizer = WandbOptimizer(
    project="my-experiment",
    scorer=F1(),
    method="bayes",
    n_trials=50
)

# Define search space
pipeline = F3Pipeline(
    filters=[
        KnnFilter().grid({
            'n_neighbors': [3, 5, 7, 9]
        })
    ]
)

optimizer.optimize(pipeline)
optimizer.fit(X_train, y_train)

⚡ Remote Injection (Experimental)

Deploy pipelines without deploying code:

# On your laptop
@Container.bind(persist=True)
class MyCustomFilter(BaseFilter):
    def predict(self, x):
        return x * 2

Container.storage = S3Storage(bucket="my-models")
Container.ppif.push_all()

# On production server (no source code needed!)
from labchain.base import BasePlugin

pipeline = BasePlugin.build_from_dump(config, Container.ppif)
predictions = pipeline.predict(data)  # Just works! ✨

๐ŸŒ Distributed Processing (Experimental)

from labchain import HPCPipeline

# Automatic Spark distribution
pipeline = HPCPipeline(
    app_name="distributed-training",
    filters=[Filter1(), Filter2(), Filter3()]
)

pipeline.fit(large_dataset)

📚 Examples

Classification with Cross-Validation

from labchain import F3Pipeline, KFoldSplitter
from labchain.plugins.filters import StandardScalerPlugin, ClassifierSVMPlugin
from labchain.plugins.metrics import F1, Precission, Recall

pipeline = F3Pipeline(
    filters=[
        StandardScalerPlugin(),
        ClassifierSVMPlugin(kernel='rbf', C=1.0)
    ],
    metrics=[F1(), Precission(), Recall()]
).splitter(
    KFoldSplitter(n_splits=5, shuffle=True, random_state=42)
)

pipeline.fit(X_train, y_train)
results = pipeline.evaluate(X_test, y_test, pipeline.predict(X_test))

Parallel Processing

from labchain import LocalThreadPipeline
from labchain.plugins.filters import Filter1, Filter2, Filter3

# Process filters in parallel
pipeline = LocalThreadPipeline(
    filters=[
        Filter1(),  # Runs in parallel
        Filter2(),  # Runs in parallel
        Filter3()   # Runs in parallel
    ]
)

# Results are concatenated automatically
predictions = pipeline.predict(X)

Custom Components

from labchain import Container
from labchain.base import BaseFilter, XYData

@Container.bind()
class MyCustomFilter(BaseFilter):
    def __init__(self, threshold: float = 0.5):
        super().__init__(threshold=threshold)

    def fit(self, x: XYData, y: XYData | None = None):
        # Your training logic
        pass

    def predict(self, x: XYData) -> XYData:
        # Your prediction logic
        return XYData.mock(x.value > self.threshold)

# Use it like any other filter

pipeline = F3Pipeline(filters=[MyCustomFilter(threshold=0.7)])

Version Control & Rollback

# Version 1
@Container.bind(persist=True)
class MyModel(BaseFilter):
    def predict(self, x):
        return x * 1

Container.ppif.push_all()
hash_v1 = Container.pcm.get_class_hash(MyModel)

# Version 2
@Container.bind(persist=True)
class MyModel(BaseFilter):
    def predict(self, x):
        return x * 2

Container.ppif.push_all()
hash_v2 = Container.pcm.get_class_hash(MyModel)

# Rollback to V1
ModelV1 = Container.ppif.get_version("MyModel", hash_v1)

📖 Documentation

  • 📘 Quick Start Guide: Get up and running in 5 minutes
  • 🎓 Tutorials: Step-by-step guides and examples
  • 📚 API Reference: Complete API documentation
  • ⚡ Remote Injection: Deploy without code (experimental)
  • 🏗️ Architecture: Deep dive into design principles
  • 💡 Best Practices: Production-ready patterns

๐Ÿ› ๏ธ Supported Components

Filters

  • โœ… Classification (SVM, KNN, Random Forest, etc.)
  • โœ… Clustering (KMeans, DBSCAN, etc.)
  • โœ… Transformation (PCA, StandardScaler, etc.)
  • โœ… Text Processing (TF-IDF, Embeddings, etc.)
  • โœ… Custom filters (extend BaseFilter)

Pipelines

  • โœ… F3Pipeline: Sequential execution
  • โœ… MonoPipeline: Parallel execution
  • โœ… HPCPipeline: Spark-based distribution
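The two execution models differ only in how filter outputs combine: a sequential pipeline threads each filter's output into the next, while a parallel pipeline runs every filter on the same input and concatenates the results. In plain Python terms (illustrative only, not LabChain's internals):

```python
def run_sequential(filters, x):
    # F3Pipeline-style: each filter consumes the previous filter's output.
    for f in filters:
        x = f(x)
    return x

def run_parallel(filters, x):
    # MonoPipeline-style: every filter sees the same input; outputs are collected.
    return [f(x) for f in filters]

assert run_sequential([lambda v: v + 1, lambda v: v * 2], 3) == 8
assert run_parallel([lambda v: v + 1, lambda v: v * 2], 3) == [4, 6]
```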

Optimizers

  • ✅ Optuna: Bayesian optimization
  • ✅ Weights & Biases: Experiment tracking
  • ✅ Grid Search: Exhaustive search
  • ✅ Sklearn: Scikit-learn integration
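A grid definition like the earlier `KnnFilter().grid({...})` describes an exhaustive search space. The expansion behind grid search is simple to sketch in plain Python (illustrative only, not LabChain's implementation):

```python
from itertools import product

def expand_grid(grid: dict) -> list[dict]:
    """Expand a param grid into every concrete configuration."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]

configs = expand_grid({"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]})
# 4 x 2 = 8 candidate configurations; an optimizer fits and scores each in turn.
```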

Storage

  • ✅ Local Storage: Filesystem caching
  • ✅ S3 Storage: Cloud-native storage
  • ✅ Custom backends: Extend BaseStorage
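A custom backend only needs to honor the storage contract the built-in backends use. A minimal sketch, assuming a save/load interface (the class and method names here are illustrative stand-ins; check labchain's actual `BaseStorage` for the real signatures):

```python
from abc import ABC, abstractmethod

class BaseStorage(ABC):
    """Illustrative stand-in for labchain's BaseStorage contract."""

    @abstractmethod
    def save(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def load(self, key: str) -> bytes: ...

class InMemoryStorage(BaseStorage):
    """Toy backend: keeps artifacts in a dict, handy for unit tests."""

    def __init__(self):
        self._store: dict[str, bytes] = {}

    def save(self, key: str, data: bytes) -> None:
        self._store[key] = data

    def load(self, key: str) -> bytes:
        return self._store[key]

storage = InMemoryStorage()
storage.save("model/v1", b"weights")
```

Because pipelines talk to storage only through this interface, swapping the filesystem or S3 backend for another service means implementing these methods and nothing else.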

🚦 Roadmap

  • Core pipeline functionality
  • Automatic caching system
  • Hyperparameter optimization
  • Distributed processing (Spark)
  • Remote injection (experimental)
  • Multi-cloud storage backends (GCS, Azure)
  • Real-time inference API
  • AutoML capabilities
  • Model registry integration
  • Kubernetes deployment templates

๐Ÿค Contributing

We โค๏ธ contributions! Here's how you can help:

Ways to Contribute

  • ๐Ÿ› Report bugs by opening an issue
  • ๐Ÿ’ก Suggest features in discussions
  • ๐Ÿ“ Improve documentation
  • ๐Ÿ”ง Submit pull requests
  • โญ Star the repo to show support

Development Setup

# Clone the repository
git clone https://github.com/manucouto1/LabChain.git
cd LabChain

# Install dependencies
pip install -r requirements.txt

# Run tests
pytest tests/

# Build documentation
cd docs && mkdocs serve

Guidelines

  • Follow PEP 8 style guide
  • Add tests for new features
  • Update documentation
  • Keep commits atomic and well-described



📜 License

This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.

What this means:

  • ✅ Use LabChain for free in your projects
  • ✅ Modify and distribute the code
  • ⚠️ If you modify and distribute LabChain, you must release your changes under AGPL-3.0
  • ⚠️ If you run a modified LabChain as a network service, you must make that modified source available to its users


Made with ☕ and Python
