# mloda.ai: Open Data Access for ML & AI
Declarative data access for AI agents. Describe what you need - mloda delivers it.
```shell
pip install mloda
```
## 30-Second Example

Your AI describes what it needs. mloda figures out how to get it:

```python
from mloda.user import PluginLoader, mloda

PluginLoader.all()

result = mloda.run_all(
    features=["customer_id", "income", "income__sum_aggr", "age__avg_aggr"],
    compute_frameworks=["PandasDataFrame"],
    api_data={"SampleData": {
        "customer_id": ["C001", "C002", "C003", "C004", "C005"],
        "age": [25, 35, 45, 30, 50],
        "income": [50000, 75000, 90000, 60000, 85000]
    }}
)
```
Copy, paste, run. mloda resolves dependencies, chains plugins, delivers data.
## What mloda Does

```
┌─────────────────────────────────────────────────────────────────┐
│                           DATA USERS                            │
│       AI Agents • ML Pipelines • Data Science • Analytics       │
└───────────────────────────┬─────────────────────────────────────┘
                            │ describe what they need
                            ▼
                    ┌───────────────┐
                    │     mloda     │ ← resolves HOW from WHAT
                    │   [Plugins]   │
                    └───────────────┘
                            │ delivers trusted data
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                          DATA SOURCES                           │
│        Databases • APIs • Files • Any source via plugins        │
└─────────────────────────────────────────────────────────────────┘
```
## Why mloda?
| You want to... | mloda gives you... |
|---|---|
| Give AI agents data access | Declarative API - agents describe WHAT, not HOW |
| Trace every result | Built-in lineage back to source |
| Reuse across projects | Plugins work anywhere - notebook to production |
| Mix data sources | One interface for DBs, APIs, files, anything |
## AI Use Case: LLM Tool Function
Let LLMs request data without writing code:
```python
from mloda.user import load_features_from_config, mloda

# LLM generates this JSON
llm_request = '["customer_id", {"name": "income__sum_aggr"}]'

# mloda executes it
features = load_features_from_config(llm_request, format="json")
result = mloda.run_all(
    features=features,
    compute_frameworks=["PandasDataFrame"],
    api_data={"SampleData": {"customer_id": ["C001", "C002"], "income": [50000, 75000]}}
)
```
More patterns: Context Window Assembly • RAG Pipelines
## How mloda is Different
mloda separates WHAT you need from HOW to get it - through plugins. Existing tools solve parts of this, but none bridge the full gap:
| Category | Products | What it does | Why it's not enough |
|---|---|---|---|
| Feature Stores | Feast, Tecton, Featureform | Store + serve features | Infrastructure-tied, storage-only |
| Semantic Layers | dbt Semantic Layer, Cube | Declarative metrics | SQL-only, centralized |
| DAG Frameworks | Hamilton, Kedro | Dataflows as code | Function-first, no plugin abstraction |
| Data Catalogs | DataHub, Atlan | Metadata & discovery | No execution, no contracts |
| ORMs | SQLAlchemy, Django ORM | Database abstraction | Single database, no ML lifecycle |
mloda is the connection layer - separating WHAT you compute from HOW you compute it. Plugins define transformations. Users describe requirements. mloda resolves the pipeline.
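The WHAT/HOW split can be sketched in a few lines of plain Python. This is illustrative only: `REGISTRY`, `plugin`, and `resolve` are made-up names for this sketch, not mloda's API.

```python
# Illustrative sketch of the WHAT/HOW split (not mloda's API):
# a registry maps a declared feature name (WHAT) to a callable
# that knows how to produce it (HOW).
REGISTRY = {}

def plugin(name):
    """Register a producer for a feature name."""
    def deco(fn):
        REGISTRY[name] = fn
        return fn
    return deco

@plugin("income__sum_aggr")
def income_sum(data):
    return sum(data["income"])

def resolve(feature, data):
    # The caller only names the feature; the registry supplies the HOW.
    return REGISTRY[feature](data)

print(resolve("income__sum_aggr", {"income": [50000, 75000]}))  # 125000
```

Swapping the registered producer changes the HOW without touching the caller, which is the property mloda's plugins provide at scale.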
## Plugins: The Building Blocks
mloda's architecture follows three roles: providers (define plugins), users (access data), and stewards (govern execution). The module structure reflects this: mloda.provider, mloda.user, mloda.steward.
mloda uses three types of plugins:
| Type | What it does |
|---|---|
| FeatureGroup | Defines data transformations |
| ComputeFramework | Execution backend (Pandas, Spark, etc.) |
| Extender | Hooks for logging, validation, monitoring |
Most of the time, you'll work with FeatureGroups - Python classes that define how to access and transform data (see the 30-Second Example above).
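Conceptually, a FeatureGroup declares its inputs and a transformation. Here is a plain-Python sketch of that shape; `AverageIncome` is a hypothetical class, not mloda's actual base class, which carries a richer interface.

```python
# Hypothetical sketch of the FeatureGroup shape; the real base class
# in mloda has a richer interface (validation, framework hooks, etc.).
class AverageIncome:
    def input_features(self, options, feature_name):
        # Declare dependencies -- a resolver computes these first.
        return {"income"}

    def calculate_feature(self, data, feature_name):
        # Turn the resolved inputs into the requested feature.
        return sum(data["income"]) / len(data["income"])

fg = AverageIncome()
print(fg.calculate_feature({"income": [50000, 75000]}, "avg_income"))  # 62500.0
```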
### Why plugins?
- Steps, not pipelines - Build transformations. mloda wires them together.
- Small and testable - Each plugin is a focused unit. Easy to test, easy to debug.
- AI-friendly - Small, template-like structures. Let AI generate plugins for you.
- Share what isn't secret - Your pipeline runs steps a,b,c,d. Steps b,c,d have no proprietary logic? Share them across projects, teams, even organizations.
- Experiment to production - Same plugins in your notebook and your cluster. No rewrite.
- Stand on shoulders - Combine community plugins with your own. Build on what exists.
## AI Use Case Patterns

### 1. LLM Tool Function

Give LLMs deterministic data access - they declare what, mloda handles how:
```python
from mloda.user import PluginLoader, load_features_from_config, mloda

PluginLoader.all()

# LLM generates this JSON (no Python code needed)
llm_output = '''
[
    "customer_id",
    {"name": "income__sum_aggr"},
    {"name": "age__avg_aggr"},
    {"name": "total_spend", "options": {"aggregation_type": "sum", "in_features": "income"}}
]
'''

# mloda parses JSON into Feature objects
features = load_features_from_config(llm_output, format="json")

result = mloda.run_all(
    features=features,
    compute_frameworks=["PandasDataFrame"],
    api_data={"SampleData": {
        "customer_id": ["C001", "C002", "C003"],
        "income": [50000, 75000, 90000],
        "age": [25, 35, 45]
    }}
)
```
LLM-friendly: The agent only declares what it needs - mloda handles the rest.
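Parsing such a request is straightforward; here is a dependency-free sketch of the idea. `parse_request` is an illustrative name for this sketch - mloda's `load_features_from_config` does the real work, including validation.

```python
import json

# Dependency-free sketch: normalize an LLM's JSON feature request into a
# uniform list of {name, options} entries, as described above.
llm_output = '["customer_id", {"name": "income__sum_aggr"}]'

def parse_request(raw):
    features = []
    for item in json.loads(raw):
        if isinstance(item, str):
            # Bare string -> feature with default options
            features.append({"name": item, "options": {}})
        else:
            features.append({"name": item["name"], "options": item.get("options", {})})
    return features

print(parse_request(llm_output))
# [{'name': 'customer_id', 'options': {}}, {'name': 'income__sum_aggr', 'options': {}}]
```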
### 2. Context Window Assembly

Gather context from multiple sources declaratively - mloda validates and delivers. Why not let an AI agent do it?

**Example:** This shows the API pattern; it requires custom FeatureGroup implementations for your data sources.
```python
from mloda.user import Feature, mloda

user_id, user_query = "C001", "How do I reset my password?"  # supplied by your application

# Build complete context from multiple sources
features = [
    Feature(name="system_instructions", options={"template": "support_agent"}),
    Feature(name="user_profile", options={"user_id": user_id, "include_preferences": True}),
    Feature(name="knowledge_base", options={"query": user_query, "top_k": 5}),
    Feature(name="conversation_history", options={"limit": 20, "summarize_old": True}),
    Feature(name="available_tools", options={"category": "customer_service"}),
    Feature(name="output_format", options={"format": "markdown", "max_length": 500}),
]

result = mloda.run_all(
    features=features,
    compute_frameworks=["PythonDictFramework"],
    api_data={"UserQuery": {"query": [user_query]}}
)
# Each feature is resolved via its plugin and validated
```
### 3. RAG with Feature Chaining

Build RAG pipelines declaratively - mloda chains the steps for you.

**Example:** This shows the chaining syntax; it requires custom FeatureGroup implementations for retrieval and processing.
```python
# String-based chaining: query -> validate -> retrieve -> redact
Feature(name="user_query__injection_checked__retrieved__pii_redacted")

# Configuration-based chaining: explicit pipeline
Feature(
    name="safe_context",
    options=Options(context={
        "in_features": "documents__retrieved__pii_redacted",
        "redact_types": ["email", "phone", "ssn"]
    })
)
```
mloda resolves the full chain - you declare the end result, not the steps.
Automatic dependency resolution: You only declare what you need. If pii_redacted depends on retrieved which depends on documents, just ask for pii_redacted - mloda traces back and resolves the full chain.
Beyond string-based chaining, you can declare dependencies directly via input_features(). Each plugin states what it needs, mloda resolves the rest. Because resolution depends on which plugins are registered, the same request can have different chain lengths per environment: realtime might resolve income straight from a live source, while a RAG pipeline routes it through ETL, validation, and enrichment first. The calling code stays the same:
```python
from mloda.provider import FeatureGroup
from mloda.user import Feature

class RiskAssessment(FeatureGroup):
    def input_features(self, options, feature_name):
        return {Feature("debt_to_income"), Feature("age"), Feature("employment_years")}

class DebtToIncome(FeatureGroup):
    def input_features(self, options, feature_name):
        return {Feature("debt"), Feature("income")}

# Request only risk_assessment. mloda auto-resolves:
#   risk_assessment -> debt_to_income -> {debt, income}
#                   -> age, employment_years
result = mloda.run_all(features=[Feature(name="risk_assessment")], ...)
```
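The backward resolution sketched in those comments can be illustrated with a plain-Python resolver. This is a simplification for intuition, not mloda's internals; `DEPENDS_ON` and `resolve_chain` are made-up names.

```python
# Simplified sketch of automatic dependency resolution: walk each
# feature's declared inputs back to the sources, then execute in order.
DEPENDS_ON = {
    "pii_redacted": ["retrieved"],
    "retrieved": ["documents"],
    "documents": [],  # a source: nothing further to resolve
}

def resolve_chain(feature, deps):
    order = []
    def visit(name):
        for parent in deps[name]:
            visit(parent)
        if name not in order:
            order.append(name)
    visit(feature)
    return order

# Ask only for the end result; the full chain is traced back for you.
print(resolve_chain("pii_redacted", DEPENDS_ON))
# ['documents', 'retrieved', 'pii_redacted']
```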
## Compute Frameworks

Mix multiple backends in a single pipeline - mloda routes each feature to the right framework:

```python
result = mloda.run_all(
    features=[...],
    compute_frameworks=["PandasDataFrame", "PolarsDataFrame", "SparkFramework"]
)
# Results may come from different frameworks based on plugin compatibility
```
Add your own frameworks - mloda is extensible.
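Routing by plugin compatibility could look roughly like this. `SUPPORTED` and `route` are hypothetical names for a sketch of the idea, not mloda's actual dispatch logic.

```python
# Hypothetical sketch: pick a compute framework per feature based on
# which frameworks the feature's plugin declares support for.
SUPPORTED = {
    "income__sum_aggr": {"PandasDataFrame", "PolarsDataFrame"},
    "big_join": {"SparkFramework"},
}

def route(feature, requested):
    # Take the first requested framework the plugin is compatible with.
    for fw in requested:
        if fw in SUPPORTED[feature]:
            return fw
    raise ValueError(f"no compatible framework for {feature}")

print(route("big_join", ["PandasDataFrame", "SparkFramework"]))  # SparkFramework
```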
## Extenders

Wrap plugin execution for logging, validation, or lineage tracking:

```python
import time

from mloda.steward import Extender, ExtenderHook

class LogExecutionTime(Extender):
    def wraps(self):
        return {ExtenderHook.FEATURE_GROUP_CALCULATE_FEATURE}

    def __call__(self, func, *args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print(f"Took {time.time() - start:.2f}s")
        return result

# Use it
result = mloda.run_all(features, function_extender={LogExecutionTime()})
```
Built-in and custom extenders give you full lineage - trace any result back to its source.
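The wrapping idea behind lineage can be sketched with a plain decorator. This is an illustration of the concept only; mloda's extenders hook into plugin execution as shown above rather than decorating functions directly.

```python
# Plain-Python sketch of lineage via wrapping: record which step produced
# which result, so any output can be traced back to its source.
lineage = []

def track(func):
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        lineage.append({"step": func.__name__, "result": result})
        return result
    return wrapper

@track
def load_income():
    return [50000, 75000]

@track
def sum_income():
    return sum(load_income())

sum_income()
print([entry["step"] for entry in lineage])  # ['load_income', 'sum_income']
```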
## When to Use mloda

**Use mloda when:**
- Your agents need data from multiple sources
- You want consistent, validated data access
- You need traceability (audit, debugging)
- Multiple agents share the same data patterns
**Don't use mloda for:**
- Single database, simple queries → use an ORM
- One-off scripts → just write the code
- Real-time streaming (<5ms) → use Kafka/Flink
## Documentation
- Getting Started - Installation and first steps
- Plugin Development - Build your own plugins
- API Reference - Complete API docs
## Ecosystem
Most plugins currently live in mloda_plugins/ within this repository. The goal is to gradually migrate them to standalone packages in the registry.
| Repository | Description |
|---|---|
| mloda-registry | Official plugin packages and 40+ development guides |
| mloda-plugin-template | Cookiecutter template for creating standalone plugins |
## Contributing
We welcome contributions! Build plugins, improve docs, or add features.
- GitHub Issues - Report bugs or request features
- Development Guide - How to contribute