Rethinking Data and Feature Engineering

mloda: Revolutionary Process-Data Separation for Feature and Data Engineering

โš ๏ธ Early Version Notice: mloda is in active development. Some features described below are still being implemented. We're actively seeking feedback to shape the future of the framework. Share your thoughts!

๐Ÿš€ Transforming Feature Engineering Through Process-Data Separation

mloda revolutionizes feature engineering by separating processes (transformations) from data, enabling unprecedented flexibility, reusability, and scalability in machine learning workflows.

🤖 Built for the AI Era: While others write code, AI writes mloda plugins. Check the inline comments in our experimental plugin code - all AI-written.

🌍 Share Without Secrets: Traditional pipelines lock business logic inside - mloda plugins separate transformations from business context, enabling safe community sharing.

🎯 Try the first example now: sklearn Integration Example - see mloda transform traditional sklearn pipelines!

๐Ÿณ Think of mloda Like Cooking Recipes

Traditional Data Pipelines = Making everything from scratch

  • Want pasta? Make noodles, sauce, cheese from raw ingredients
  • Want pizza? Start over - make dough, sauce, cheese again
  • Want lasagna? Repeat everything once more
  • Can't share recipes easily - they're mixed with your kitchen setup

mloda = Using recipe components

  • Create reusable recipes: "tomato sauce", "pasta dough", "cheese blend"
  • Use same "tomato sauce" for pasta, pizza, lasagna
  • Switch kitchens (home → restaurant → food truck) - same recipes work
  • Share your "tomato sauce" recipe with friends - they don't need your whole kitchen

Real Example: You need to clean customer ages (remove outliers, fill missing values)

  • Traditional: Write age-cleaning code for training, testing, production separately
  • mloda: Create one "clean_age" plugin, use everywhere - development, testing, production, analysis

Result: Instead of rebuilding the same thing 10 times, build once and reuse everywhere!
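To make the "clean_age" idea concrete, here is a plain-function sketch. This is a hypothetical illustration only - the outlier bounds and the median-fill strategy are assumptions, and this is not mloda's plugin API:

```python
import numpy as np

def clean_age(ages, low=0, high=120):
    """Mark implausible ages as missing, then fill all gaps with the median."""
    ages = np.asarray(ages, dtype=float)
    # Treat out-of-range values as missing alongside genuine NaNs.
    valid = np.where((ages >= low) & (ages <= high), ages, np.nan)
    median = np.nanmedian(valid)
    return np.where(np.isnan(valid), median, ages)

# Outlier (200) and the missing value are both replaced by the median (30.0).
print(clean_age([25, 200, np.nan, 35]))  # [25. 30. 30. 35.]
```

Packaged once as a plugin, the same logic would serve development, testing, production, and analysis alike.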

💡 The Value Proposition

What mloda aims to enable:

| Challenge | Traditional Pain Point | mloda's Approach |
| --- | --- | --- |
| ⏰ Repetitive Work | Rebuild the same transformations for each environment | Write once, reuse across all environments |
| 🐛 Consistency Issues | Different implementations create bugs | A single implementation ensures consistency |
| 👥 Knowledge Silos | Senior expertise locked in complex pipelines | Reusable patterns everyone can use |
| 🚀 Deployment Friction | Train/serve skew causes production issues | The same logic is guaranteed everywhere |
| 💡 Innovation Bottleneck | Time spent on solved problems | Focus energy on unique business value |

Vision: Enable data teams to spend more time solving unique business problems and less time rebuilding common patterns, while reducing the risk of inconsistencies across environments.

📊 Why Process-Data Separation Changes Everything

| Aspect | Traditional Approach | mloda Approach |
| --- | --- | --- |
| 🔄 Reusability | Transformations tied to specific datasets | Same feature definitions work across all contexts |
| ⚡ Flexibility | Locked to a single compute framework | Multi-framework support with automatic optimization |
| 📝 Maintainability | Complex nested pipeline objects | Clean, declarative feature names |
| 🏭 Scalability | Framework-specific limitations | Horizontal scaling without architectural changes |

For those who know: Want Iceberg-like metadata capabilities across your entire data and feature lifecycle? That's exactly what mloda aims for.

🚀 Quick Start

Installation

pip install mloda

Your First Feature Pipeline

import numpy as np
from mloda_core.api.request import mlodaAPI
from mloda_plugins.compute_framework.base_implementations.pandas.dataframe import PandasDataframe
from mloda_core.abstract_plugins.components.input_data.creator.data_creator import DataCreator
from mloda_core.abstract_plugins.abstract_feature_group import AbstractFeatureGroup

np.random.seed(42)
n_samples = 1000

class YourFirstSyntheticDataSet(AbstractFeatureGroup):
    @classmethod
    def input_data(cls):
        return DataCreator({"age", "weight", "state", "gender"})

    @classmethod
    def calculate_feature(cls, data, features):
        return {
            "age": np.random.randint(25, 65, n_samples),
            "weight": np.random.normal(80, 20, n_samples),
            "state": np.random.choice(["WA", "OR"], n_samples),
            "gender": np.random.choice(["M", "F", "Other"], n_samples),
        }

# Define features with automatic dependency resolution
features = [
    "standard_scaled__mean_imputed__age",
    "onehot_encoded__state", 
    "robust_scaled__weight"
]

# Execute with automatic framework selection
result = mlodaAPI.run_all(features, compute_frameworks={PandasDataframe})
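The double-underscore feature names above chain transformations onto a source column. A toy parser - hypothetical, since mloda's actual dependency resolution is richer than a string split - shows how such a name decomposes:

```python
def parse_feature_name(name):
    """Split a chained feature name like 'standard_scaled__mean_imputed__age'
    into its source column and its transformation steps, innermost first."""
    *transforms, source = name.split("__")
    return source, list(reversed(transforms))

source, steps = parse_feature_name("standard_scaled__mean_imputed__age")
# source == "age"; steps == ["mean_imputed", "standard_scaled"]
```

Read right to left: start from the raw `age` column, impute missing values with the mean, then standard-scale the result.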

🔄 Write Once, Run Anywhere: Environments & Frameworks

The Core Promise: One plugin definition works across all environments and technologies.

# Traditional approach: Rebuild for each context
def clean_age_training(data): ...      # Training pipeline
def clean_age_testing(data): ...       # Testing pipeline  
def clean_age_production(data): ...    # Production API
def clean_age_spark(data): ...         # Big data processing
def clean_age_analysis(data): ...      # Analytics

# mloda approach: Write once, use everywhere
class CleanAgePlugin(AbstractFeatureGroup):
    @classmethod
    def calculate_feature(cls, data, features):
        # Single implementation for all contexts
        return process_age_data(data["age"])

# Same plugin, different environments & frameworks
mlodaAPI.run_all(["clean_age"], compute_frameworks={PandasDataframe})  # Dev
mlodaAPI.run_all(["clean_age"], compute_frameworks={SparkDataframe})   # Production
mlodaAPI.run_all(["clean_age"], compute_frameworks={PolarsDataframe})  # High performance
mlodaAPI.run_all(["clean_age"], compute_frameworks={DuckDBFramework})  # Analytics

Result: 5+ implementations → 1 plugin that adapts automatically.

Different Data Scales, Same Processing Logic

graph TB
    subgraph "📊 Data Scenarios"
        CSV["📄 Development<br/>Small CSV files<br/>~1K rows"]
        BATCH["🏋️ Training<br/>Full dataset<br/>~1M+ rows"]
        SINGLE["⚡ Inference<br/>Single row<br/>Real-time"]
        ANALYSIS["📈 Analysis<br/>Historical batch<br/>Post-deployment"]
    end

    subgraph "🎯 Same Features Applied"
        RESULT["standard_scaled__mean_imputed__age<br/>onehot_encoded__state<br/>robust_scaled__weight"]
    end
    
    CSV --> RESULT
    BATCH --> RESULT
    SINGLE --> RESULT
    ANALYSIS --> RESULT
    
    style CSV fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style BATCH fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style SINGLE fill:#e1f5fe,stroke:#0288d1,stroke-width:2px
    style ANALYSIS fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    style RESULT fill:#e8f5e8,stroke:#4caf50,stroke-width:3px

๐ŸŒ Deploy Anywhere Python Runs

Universal Deployment: mloda runs wherever Python runs - no special infrastructure needed.

| Environment | Use Case | Example |
| --- | --- | --- |
| 💻 Local Development | Prototyping & testing | Jupyter notebooks, VS Code |
| ☁️ Any Cloud | Production workloads | AWS, GCP, Azure, DigitalOcean |
| 🏢 On-Premise | Enterprise & compliance | Air-gapped environments |
| 📊 Notebooks | Data science workflows | Jupyter, Colab, Databricks |
| 🌐 Web APIs | Real-time serving | Flask, FastAPI, Django |
| ⚙️ Orchestration | Batch processing | Airflow, Prefect, Dagster |
| 🐳 Containers | Microservices | Docker, Kubernetes |
| ⚡ Serverless | Event-driven | AWS Lambda, Google Cloud Functions |

No vendor lock-in. No special runtime. Just Python.

🎯 Minimal Dependencies, Maximum Compatibility

PyArrow-Only Core: PyArrow is mloda's sole core dependency - no other Python packages are required.

Why PyArrow? It's the universal language of modern data:

  • Interoperability: Native bridge between Pandas, Polars, Spark, DuckDB
  • Performance: Zero-copy data sharing between frameworks
  • Standards: Apache Arrow is the foundation of modern data tools
  • Future-Proof: Industry standard for columnar data processing

This architectural choice enables mloda's seamless framework switching without dependency conflicts.

🔧 Complete Data Processing Capabilities

Beyond Feature Engineering: mloda provides a full set of data processing operations:

| Operation | Purpose | Example Use Case |
| --- | --- | --- |
| 🔗 Joins | Combine datasets | User profiles + transaction history |
| 🔀 Merges | Consolidate data sources | Multiple feature tables into one |
| 🔍 Filters | Data selection & quality | Remove outliers, select time ranges |
| 🏷️ Domains | Data organization & governance | Logical data grouping and access control |

All operations work seamlessly across any compute framework with the same simple API.

👥 Logical Role-Based Data Governance

Clear Role Separation: mloda logically splits data responsibilities into three distinct roles:

| Role | Responsibility | Key Activities |
| --- | --- | --- |
| 🏗️ Data Producer | Create & maintain plugins | Define data access, implement feature groups, ensure quality |
| 👤 Data User | Consume features via API | Request features, configure workflows, build ML models |
| 🛡️ Data Owner | Governance & lifecycle | Control access, manage compliance, oversee data quality |

Organizational Clarity: Each role has defined boundaries, enabling proper data governance while maintaining development flexibility. Learn more about roles

๐ŸŒ Community-Driven Plugin Ecosystem

Share Transformations, Keep Secrets: Unlike traditional pipelines where business logic is embedded, mloda separates transformation patterns from business context.

| Challenge | Traditional Pipelines | mloda Solution |
| --- | --- | --- |
| 🔒 Knowledge Sharing | Business logic embedded - can't share | Transformations separated - safe to share |
| 🔄 Reusability | Rebuild common patterns everywhere | Community library of proven patterns |
| ⚡ Innovation | Everyone reinvents the wheel | Build on collective knowledge |
| 🎯 Focus | Waste time on solved problems | Focus on unique business value |

Result: A thriving ecosystem where data teams contribute transformation patterns while protecting their competitive advantages.

📖 Documentation

🤝 Contributing

We welcome contributions! Whether you're building plugins, adding features, or improving documentation, your input is invaluable.

📄 License

This project is licensed under the Apache License, Version 2.0.

