ML Platform for your local machine using cheap cloud services for scalable resources.

mlforge

A simple feature store SDK for machine learning workflows. Build, validate, and serve ML features with point-in-time correctness.

Installation

pip install mlforge-sdk

Or with uv:

uv add mlforge-sdk

Quick Start

import mlforge as mlf
import polars as pl
from datetime import timedelta

@mlf.feature(
    keys=["user_id"],
    source="data/transactions.parquet",
    timestamp="transaction_date",
    interval=timedelta(days=1),
    metrics=[
        mlf.Rolling(
            windows=["7d", "30d"],
            aggregations={"amount": ["sum", "mean", "count"]}
        )
    ],
    validators={
        "amount": [mlf.not_null(), mlf.greater_than(0)],
        "user_id": [mlf.not_null()],
    },
    description="User spending patterns over rolling windows"
)
def user_spend(df: pl.DataFrame) -> pl.DataFrame:
    return df.select(["user_id", "transaction_date", "amount"])
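The Rolling metric above requests sum, mean, and count of amount over 7d and 30d windows. As a rough illustration of what a trailing-window aggregation computes, here is a plain-Python sketch for a single user. It does not call mlforge, and the window boundaries (start excluded, the as-of date included) are an assumption rather than documented mlforge behavior; the helper name rolling_stats is hypothetical.

```python
from datetime import date, timedelta

# Toy transactions for one user: (transaction_date, amount).
rows = [
    (date(2024, 1, 1), 10.0),
    (date(2024, 1, 3), 20.0),
    (date(2024, 1, 9), 30.0),
]


def rolling_stats(rows, as_of, window):
    """Aggregate amounts where as_of - window < transaction_date <= as_of.

    Hypothetical helper; the boundary handling is an assumption.
    """
    amounts = [amount for day, amount in rows if as_of - window < day <= as_of]
    total = sum(amounts)
    count = len(amounts)
    mean = total / count if count else None
    return {"sum": total, "mean": mean, "count": count}


# One output row per window, evaluated as of the same date.
stats_7d = rolling_stats(rows, as_of=date(2024, 1, 9), window=timedelta(days=7))
stats_30d = rolling_stats(rows, as_of=date(2024, 1, 9), window=timedelta(days=30))
```

In this toy data the 7-day window only sees the last two transactions, while the 30-day window sees all three.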

Register features and build them:

import mlforge as mlf
import my_features  # module holding the @mlf.feature definitions, e.g. user_spend above

defs = mlf.Definitions(
    name="my-project",
    features=[my_features],
    offline_store=mlf.LocalStore("./feature_store")
)

# Build features to storage
defs.build()

Retrieve features for training with point-in-time correctness:

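import mlforge as mlf
import polars as pl
from datetime import datetime

# Illustrative entity frame: entity keys plus the timestamp of each label.
labels_df = pl.DataFrame({
    "user_id": ["u1", "u2"],
    "label_time": [datetime(2024, 1, 15), datetime(2024, 2, 1)],
})

training_df = mlf.get_training_data(
    entity_df=labels_df,
    features=["user_spend"],
    store=mlf.LocalStore("./feature_store"),
    timestamp="label_time"
)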
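Point-in-time correctness means each label row only sees feature values that were already known at its label time, so no future information leaks into training. The sketch below shows the idea in plain Python for one user. It is not how mlforge implements the join; the helper name feature_as_of is hypothetical, and treating a feature timestamp equal to the label time as visible is an assumption.

```python
from bisect import bisect_right
from datetime import date

# Feature values for one user, sorted by the date each value became known.
feature_rows = [
    (date(2024, 1, 1), 10.0),
    (date(2024, 1, 8), 50.0),
    (date(2024, 1, 15), 80.0),
]


def feature_as_of(feature_rows, label_time):
    """Return the latest feature value with timestamp <= label_time, else None."""
    timestamps = [ts for ts, _ in feature_rows]
    # Index of the first feature row strictly after label_time.
    idx = bisect_right(timestamps, label_time)
    return feature_rows[idx - 1][1] if idx else None


before_any = feature_as_of(feature_rows, date(2023, 12, 31))
between = feature_as_of(feature_rows, date(2024, 1, 10))
on_boundary = feature_as_of(feature_rows, date(2024, 1, 15))
```

A label dated before the first feature row gets no value, and a label dated between two rows gets the earlier one, never the later one.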

Features

  • Feature Definition: Define features with the @mlf.feature decorator
  • Rolling Aggregations: Compute time-windowed metrics with mlf.Rolling
  • Data Validation: Validate data with built-in validators (mlf.not_null(), mlf.greater_than(), etc.)
  • Storage Backends: Local filesystem and Amazon S3 support
  • Point-in-Time Joins: Retrieve training data with temporal correctness
  • Feature Metadata: Automatic tracking of schemas, row counts, and lineage
  • CLI: Build, validate, and inspect features from the command line

CLI Usage

Build all features:

mlforge build

Build specific features:

mlforge build --features user_spend,merchant_spend

Build features by tag:

mlforge build --tags users

Validate features without building:

mlforge validate

List registered features:

mlforge list

Inspect feature metadata:

mlforge inspect user_spend

Validators

Built-in validators for data quality:

import mlforge as mlf

@mlf.feature(
    keys=["id"],
    source="data.parquet",
    validators={
        "email": [mlf.not_null(), mlf.matches_regex(r"^[\w.-]+@[\w.-]+\.\w+$")],
        "age": [mlf.not_null(), mlf.in_range(0, 120)],
        "status": [mlf.is_in(["active", "inactive"])],
        "score": [mlf.greater_than_or_equal(0), mlf.less_than_or_equal(100)],
    }
)
def validated_feature(df):
    return df

Available validators: not_null, unique, greater_than, less_than, greater_than_or_equal, less_than_or_equal, in_range, matches_regex, is_in
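To make the checks above concrete, here is a plain-Python sketch of what the email and age rules accept and reject. It uses only the standard library, not mlforge's validators; the helper names are hypothetical, and treating the in_range bounds as inclusive is an assumption.

```python
import re

# Same pattern as the matches_regex example above.
EMAIL_PATTERN = re.compile(r"^[\w.-]+@[\w.-]+\.\w+$")


def email_ok(value):
    """Not null and matching the email pattern (hypothetical helper)."""
    return value is not None and EMAIL_PATTERN.match(value) is not None


def age_ok(value, low=0, high=120):
    """Not null and within [low, high]; inclusive bounds are an assumption."""
    return value is not None and low <= value <= high


rows = [
    {"email": "ana@example.com", "age": 34},
    {"email": "bob@example", "age": 130},  # no top-level domain, age too high
    {"email": None, "age": 51},            # null email
]

checks = {"email": email_ok, "age": age_ok}

# Map each column to the indices of the rows that fail its check.
failures = {
    column: [i for i, row in enumerate(rows) if not check(row[column])]
    for column, check in checks.items()
}
```

Collecting failing row indices per column, rather than stopping at the first failure, makes it easier to see how much of a dataset is affected.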

Storage Backends

Local Storage

import mlforge as mlf

store = mlf.LocalStore("./feature_store")

S3 Storage

import mlforge as mlf

store = mlf.S3Store(
    bucket="my-features",
    prefix="prod/features",
    region="us-west-2"
)

Documentation

Full documentation is available at https://chonalchendo.github.io/mlforge

Contributing

Contributions are welcome! Please see the repository for development setup and guidelines.

License

MIT License - see LICENSE for details.

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

mlforge_sdk-0.4.0-py3-none-any.whl (42.1 kB view details)

Uploaded Python 3

File details

Details for the file mlforge_sdk-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: mlforge_sdk-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 42.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mlforge_sdk-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 dcde98c7da847bed83343df32a3dbbf32f755fde67a09539d79ecaa3d3e34dc2
MD5 c32293fe532bf6e9949d986a4b318ff9
BLAKE2b-256 8eabc3849ae6c50dd6fb0e9f949bb03706c8faf29daecee49462eb734da92b89

Provenance

The following attestation bundles were made for mlforge_sdk-0.4.0-py3-none-any.whl:

Publisher: publish.yaml on chonalchendo/mlforge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
