
Rethinking Data and Feature Engineering


mloda: Make data and feature engineering shareable


⚠️ Early Version Notice: mloda is in active development. Some features described below are still being implemented. We're actively seeking feedback to shape the future of the framework. Share your thoughts!

🍳 Think of mloda Like Cooking Recipes

Traditional Data Pipelines = Making everything from scratch

  • Want pasta? Make noodles, sauce, cheese from raw ingredients
  • Want pizza? Start over - make dough, sauce, cheese again
  • Want lasagna? Repeat everything once more
  • Can't share recipes easily - they're mixed with your kitchen setup

mloda = Using recipe components

  • Create reusable recipes: "tomato sauce", "pasta dough", "cheese blend"
  • Use same "tomato sauce" for pasta, pizza, lasagna
  • Switch kitchens (home → restaurant → food truck) - same recipes work
  • Share your "tomato sauce" recipe with friends - they don't need your whole kitchen

Result: Instead of rebuilding the same thing 10 times, build once and reuse everywhere!

Installation

pip install mloda

1. The Core API Call - Your Starting Point

The One Command That Does Everything

# This is the heart of mloda. You describe what you want and mloda resolves the dependencies.
from mloda_core.api.request import mlodaAPI

result = mlodaAPI.run_all(
    features=["age", "standard_scaled__weight"]
)

# That's it! You get processed data back
data = result[0]
print(data.head())

What just happened?

  • mloda located a feature group that provides your data automatically
  • Applied the requested transformation (here, standard scaling of weight)
  • Returned a clean, ready-to-use DataFrame

Key Insight: As long as the required plugins and data access are available, mloda can derive any feature automatically.
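The chaining syntax encodes both the transformation and the source column in the feature name itself: standard_scaled__weight means "apply standard scaling to weight". Here is a minimal sketch of how such a name can be decomposed; this is an illustration of the double-underscore convention only, not mloda's actual resolver:

```python
def parse_chained_feature(name: str) -> tuple:
    """Split a chained feature name into (transformation(s), source) parts.

    Illustrative only: mirrors the double-underscore naming convention
    shown above, not mloda's internal dependency resolution.
    """
    parts = name.split("__")
    if len(parts) == 1:
        return ("identity", name)  # plain column, no transformation
    *transforms, source = parts
    return (*transforms, source)

print(parse_chained_feature("age"))                      # ('identity', 'age')
print(parse_chained_feature("standard_scaled__weight"))  # ('standard_scaled', 'weight')
```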

2. Setting Up Your Data

Using DataCreator - The mloda Way

# DataCreator: Perfect for testing, demos, and prototyping
# Use this when you need synthetic data or want to test mloda without external files
from mloda_core.abstract_plugins.components.input_data.creator.data_creator import DataCreator
from mloda_core.abstract_plugins.abstract_feature_group import AbstractFeatureGroup

class SampleDataFeature(AbstractFeatureGroup):
    @classmethod
    def input_data(cls):
        # Define what columns your data will have
        return DataCreator({
            "age", "weight", "state", "income", "target"
        })
    
    @classmethod 
    def calculate_feature(cls, data, features):
        # Generate sample data that matches your DataCreator specification
        # This is where you'd normally load from files, databases, or APIs
        return {
            'age': [25, 30, 35, None, 45, 28, 33],
            'weight': [150, 180, None, 200, 165, 140, 175], 
            'state': ['CA', 'NY', 'TX', 'CA', 'FL', 'NY', 'TX'],
            'income': [50000, 75000, 85000, 60000, None, 45000, 70000],
            'target': [1, 0, 1, 0, 1, 0, 1]
        }
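The dict returned by calculate_feature maps directly onto a tabular structure; with the pandas compute framework it would be materialized as a DataFrame. To build intuition for the sample data (note the deliberate gaps that the imputation features later fill in), here is the same data viewed with plain pandas, independent of mloda:

```python
import pandas as pd

# Same sample data as in calculate_feature above
data = pd.DataFrame({
    'age': [25, 30, 35, None, 45, 28, 33],
    'weight': [150, 180, None, 200, 165, 140, 175],
    'state': ['CA', 'NY', 'TX', 'CA', 'FL', 'NY', 'TX'],
    'income': [50000, 75000, 85000, 60000, None, 45000, 70000],
    'target': [1, 0, 1, 0, 1, 0, 1],
})

# One missing value each in age, weight, and income
print(data.isna().sum())
```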

When to Use DataCreator vs Other Data Access Methods:

  • DataCreator: For testing, demos, synthetic data, or when you want to generate data programmatically within mloda
  • File Access (DataAccessCollection with files): When your data lives in CSV, JSON, Parquet, etc.
  • Database Access (DataAccessCollection with credentials): When connecting to SQL databases, data warehouses
  • API Access: When fetching data from REST APIs or other web services

Key Insight: DataCreator is mloda's built-in data generation tool - perfect for getting started without external dependencies. Once you're ready for production, switch to file or database access methods.

Quick Start with Your Own Data:

# Replace DataCreator with real data access
from mloda_core.abstract_plugins.components.data_access_collection import DataAccessCollection

# For files
data_access = DataAccessCollection(files={"your_data.csv"})

# For databases  
data_access = DataAccessCollection(
    credential_dicts=[{"host": "your-db.com", "username": "user"}]
)

3. Understanding What You Get Back

The Result Structure

from mloda_core.api.request import mlodaAPI
from mloda_plugins.compute_framework.base_implementations.pandas.dataframe import PandasDataframe

features = ["age", "standard_scaled__weight"]
result = mlodaAPI.run_all(features, compute_frameworks={PandasDataframe})

# result is always a LIST of result objects.
# Each object matches your compute framework type: pd.DataFrame, spark.DataFrame, etc.

# Access your processed data
data = result[0]  # Most common case: single result
print(f"Shape: {data.shape}, Columns: {list(data.columns)}")

Key Insight: mloda returns a list of results. Most simple cases return a single DataFrame that you access with result[0].
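Because a request that spans multiple compute frameworks or disjoint data sources can produce more than one object, it is worth guarding the single-result assumption instead of blindly indexing. A small defensive pattern (plain Python with pandas; the helper name is our own, not part of mloda):

```python
import pandas as pd

def single_result(results: list) -> pd.DataFrame:
    """Return the one DataFrame from an mloda-style result list,
    failing loudly if the request produced zero or several results."""
    if len(results) != 1:
        raise ValueError(f"expected exactly one result, got {len(results)}")
    return results[0]

df = single_result([pd.DataFrame({"age": [25, 30]})])
print(df.shape)  # (2, 1)
```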

4. The Features Parameter

Feature Object Syntax

from mloda_core.abstract_plugins.components.feature import Feature
from mloda_core.abstract_plugins.components.options import Options
from mloda_core.abstract_plugins.plugin_loader.plugin_loader import PluginLoader

# Load all available plugins (required before using features)
PluginLoader.all()

features = [
    "age",                                   # Simple string
    Feature(
        "weight_replaced",
        options=Options(
            group={
                "imputation_method": "mean",
                "mloda_source_feature": "weight",
            }
        ),
    ),
    "onehot_encoded__state",                 # Chaining syntax
]

Three Ways to Define Features:

  • Simple strings: For basic columns like "age"
  • Feature objects: For explicit configuration and advanced options
  • Chaining syntax: Convenient shorthand for transformations

5. Compute Frameworks

Choose Your Processing Engine

# Different features can run on different processing engines in the same call
features = [
    Feature("age", compute_framework=PandasDataframe.get_class_name()),
    Feature("weight", compute_framework=PolarsDataframe.get_class_name()),
]

# mloda runs each feature on its configured engine
result = mlodaAPI.run_all(features)

6. Data Access

Tell mloda Where Your Data Lives

from mloda_core.abstract_plugins.components.data_access_collection import DataAccessCollection

# Configure data sources
data_access = DataAccessCollection(
    files={"data/customers.csv"},                    # Specific files
    folders={"data/archive/"},                       # Entire directories
    credential_dicts=[{"host": "db.example.com"}]    # Database credentials
)

result = mlodaAPI.run_all(
    features=["age", "standard_scaled__income"],
    compute_frameworks={PandasDataframe},
    data_access_collection=data_access
)

Key Insight: Configure data access once globally, and all features can use it automatically.

7. Putting It All Together

Real-World Feature Engineering Pipeline

# Complete mlodaAPI call
result = mlodaAPI.run_all(
    # What you want
    features=[
        "standard_scaled__age",
        "onehot_encoded__state",
        "mean_imputed__income",
        "target"                  # label column, used in the train/test split below
    ],
    
    # How to process it
    compute_frameworks={PandasDataframe},
    
    # Where to get it
    data_access_collection=DataAccessCollection(files={"data/customers.csv"})
)

# Get your results
processed_data = result[0]
print(f"✅ Created {len(processed_data.columns)} features from {len(processed_data)} rows")

# Use in your ML pipeline
from sklearn.model_selection import train_test_split
X = processed_data.drop('target', axis=1)
y = processed_data['target'] 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
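For intuition about what the three chained features compute, here are rough pandas equivalents. These are sketches of the standard textbook definitions of the transformations, not mloda's implementations:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25.0, 30.0, 35.0],
    "state": ["CA", "NY", "CA"],
    "income": [50000.0, None, 70000.0],
})

# standard_scaled__age: zero mean, unit (sample) standard deviation
standard_scaled_age = (df["age"] - df["age"].mean()) / df["age"].std()

# onehot_encoded__state: one 0/1 indicator column per category
onehot_state = pd.get_dummies(df["state"], prefix="state")

# mean_imputed__income: fill gaps with the column mean
mean_imputed_income = df["income"].fillna(df["income"].mean())

print(standard_scaled_age.tolist())  # [-1.0, 0.0, 1.0]
print(list(onehot_state.columns))    # ['state_CA', 'state_NY']
print(mean_imputed_income.tolist())  # [50000.0, 60000.0, 70000.0]
```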

🎉 You now understand mloda's core workflow!

📖 Documentation

🤝 Contributing

We welcome contributions! Whether you're building plugins, adding features, or improving documentation, your input is invaluable.

📄 License

This project is licensed under the Apache License, Version 2.0.
