ML Platform for your local machine using cheap cloud services for scalable resources.

mlforge

A simple feature store SDK for machine learning workflows. Build, version, and serve ML features with point-in-time correctness.

Installation

pip install mlforge-sdk

Or with uv:

uv add mlforge-sdk

Quick Start

1. Initialize a project

mlforge init my-features --profile
cd my-features

This creates:

my-features/
├── src/my_features/
│   ├── definitions.py
│   ├── features.py
│   └── entities.py
├── data/
├── feature_store/
├── mlforge.yaml
└── pyproject.toml

2. Configure environments

Edit mlforge.yaml to configure your stores:

default_profile: dev

profiles:
  dev:
    offline_store:
      KIND: local
      path: ./feature_store

  production:
    offline_store:
      KIND: s3
      bucket: ${oc.env:S3_BUCKET}
      prefix: features
    online_store:
      KIND: redis
      host: ${oc.env:REDIS_HOST}

3. Define features

Edit src/my_features/features.py:

import mlforge as mlf
import polars as pl

@mlf.feature(
    keys=["user_id"],
    source="data/transactions.parquet",
    timestamp="transaction_date",
    metrics=[
        mlf.Rolling(
            windows=["7d", "30d"],
            aggregations={"amount": ["sum", "mean", "count"]}
        )
    ],
    validators={
        "amount": [mlf.not_null(), mlf.greater_than(0)],
    },
)
def user_spend(df: pl.DataFrame) -> pl.DataFrame:
    return df.select(["user_id", "transaction_date", "amount"])
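
To make the `windows=["7d", "30d"]` setting concrete, here is a minimal pure-Python sketch of a trailing 7-day sum per key. It is not mlforge's implementation: the `rolling_sum` function is hypothetical, and the window boundary (exclusive at the start, inclusive at the row's own timestamp) is an assumption that the library may handle differently.

```python
from datetime import datetime, timedelta


def rolling_sum(rows, window: timedelta):
    """For each (key, ts, amount) row, sum amounts for the same key in (ts - window, ts]."""
    out = []
    for key, ts, _ in rows:
        total = sum(
            amount
            for other_key, other_ts, amount in rows
            if other_key == key and ts - window < other_ts <= ts
        )
        out.append((key, ts, total))
    return out


transactions = [
    ("u1", datetime(2024, 1, 1), 10.0),
    ("u1", datetime(2024, 1, 5), 20.0),
    ("u1", datetime(2024, 1, 10), 5.0),
    ("u2", datetime(2024, 1, 5), 7.0),
]
result = rolling_sum(transactions, timedelta(days=7))
# u1 on Jan 10 sees only the Jan 5 and Jan 10 rows: Jan 1 has left the window.
```

The quadratic scan keeps the sketch short; a real engine such as Polars computes the same quantity with sorted, grouped window operations.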

4. Build features

mlforge build

5. Retrieve for training

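from my_features.definitions import defs

# labels_df is your own label table; it is assumed here to contain the
# feature key (user_id) and the timestamp column named below (label_time).
training_df = defs.get_training_data(
    features=["user_spend"],
    entity_df=labels_df,
    timestamp="label_time"
)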
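
Point-in-time correctness means each label row only sees feature values that existed at or before its own timestamp. The sketch below illustrates that idea as an as-of join in plain Python. The `point_in_time_join` function and the tuple layout are hypothetical and say nothing about how `get_training_data` is implemented internally.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(labels, feature_rows):
    """Attach to each label the latest feature value at or before the label time.

    labels: list of (key, label_time)
    feature_rows: list of (key, feature_time, value)
    """
    by_key = {}
    for key, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        times, values = by_key.setdefault(key, ([], []))
        times.append(ts)
        values.append(value)

    joined = []
    for key, label_time in labels:
        times, values = by_key.get(key, ([], []))
        # Number of feature rows for this key with feature_time <= label_time.
        idx = bisect_right(times, label_time)
        joined.append((key, label_time, values[idx - 1] if idx else None))
    return joined


features = [
    ("u1", datetime(2024, 1, 1), 10.0),
    ("u1", datetime(2024, 1, 8), 30.0),
    ("u2", datetime(2024, 1, 3), 7.0),
]
labels = [
    ("u1", datetime(2024, 1, 5)),
    ("u1", datetime(2024, 1, 9)),
    ("u2", datetime(2024, 1, 1)),
]
joined = point_in_time_join(labels, features)
```

The label for `u2` gets `None` because its only feature row is dated after the label, which is exactly the leakage a point-in-time join is meant to prevent.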

Features

  • Feature Definition: Define features with the @mlf.feature decorator
  • Rolling Aggregations: Compute time-windowed metrics with mlf.Rolling
  • Data Validation: Built-in validators for data quality
  • Semantic Versioning: Automatic version detection (MAJOR/MINOR/PATCH)
  • Storage Backends: Local filesystem, Amazon S3, Google Cloud Storage
  • Online Serving: Redis-backed real-time feature retrieval
  • Entity Keys: Surrogate key generation with mlf.Entity
  • Point-in-Time Joins: Training data with temporal correctness
  • Environment Profiles: Configure dev/staging/prod via mlforge.yaml
  • CLI Tools: Build, validate, inspect, diff, rollback features
  • Git Collaboration: Share definitions via Git, sync data locally

CLI Reference

# Project setup
mlforge init my-project --profile    # Create new project with mlforge.yaml
mlforge profile --validate           # Validate store connectivity

# Build and validate
mlforge build                        # Build all features
mlforge build --online               # Build to online store (Redis)
mlforge build --profile production   # Use production profile
mlforge validate                     # Validate without building

# Discovery
mlforge list features                # List all features
mlforge list entities                # List all entities
mlforge list versions user_spend     # List feature versions
mlforge inspect feature user_spend   # Inspect feature metadata

# Version management
mlforge diff user_spend              # Compare versions
mlforge rollback user_spend --previous  # Rollback to previous version

# Team collaboration
mlforge sync                         # Rebuild from Git metadata
mlforge sync --dry-run               # Preview sync

Environment Profiles

Configure different environments in mlforge.yaml:

default_profile: dev

profiles:
  dev:
    offline_store:
      KIND: local
      path: ./feature_store

  staging:
    offline_store:
      KIND: s3
      bucket: staging-features
      prefix: v1

  production:
    offline_store:
      KIND: s3
      bucket: prod-features
      prefix: v1
    online_store:
      KIND: redis
      host: ${oc.env:REDIS_HOST}
      password: ${oc.env:REDIS_PASSWORD}

Switch profiles:

mlforge build --profile production
# or
export MLFORGE_PROFILE=production
mlforge build
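
The two ways of switching profiles suggest a lookup order along the lines of the sketch below. The precedence shown (command-line flag, then `MLFORGE_PROFILE`, then `default_profile`) is an assumption rather than documented behavior, and `resolve_profile` is a hypothetical helper, not an mlforge function.

```python
import os


def resolve_profile(cli_profile, config, env=None):
    """Pick the active profile name.

    Assumed precedence: explicit --profile flag, then the MLFORGE_PROFILE
    environment variable, then default_profile from mlforge.yaml.
    """
    env = os.environ if env is None else env
    name = cli_profile or env.get("MLFORGE_PROFILE") or config["default_profile"]
    if name not in config["profiles"]:
        raise ValueError(f"unknown profile {name!r}")
    return name


# A dict standing in for the parsed mlforge.yaml shown above.
config = {
    "default_profile": "dev",
    "profiles": {"dev": {}, "staging": {}, "production": {}},
}
active = resolve_profile(None, config, env={"MLFORGE_PROFILE": "production"})
```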

Online Serving

Build features to Redis for real-time inference:

mlforge build --online --profile production

Retrieve features:

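from my_features.definitions import defs

# request_df holds the entity keys for the incoming request(s).
# merchant_revenue stands for a second feature defined in your project.
features = defs.get_online_features(
    features=["user_spend", "merchant_revenue"],
    entity_df=request_df,
)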
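
Conceptually, an online store keeps only the most recent value of each feature per entity so that a lookup is a single key read. The sketch below uses a plain dictionary in place of Redis. The `feature:entity` key layout and both helper functions are hypothetical and do not describe mlforge's actual Redis schema.

```python
def materialize_latest(feature_name, rows):
    """Keep only the newest value per entity.

    rows: iterable of (entity_key, timestamp, value)
    Returns a dict keyed by "<feature_name>:<entity_key>" (assumed layout).
    """
    latest = {}
    for key, ts, value in rows:
        store_key = f"{feature_name}:{key}"
        if store_key not in latest or ts > latest[store_key][0]:
            latest[store_key] = (ts, value)
    return {store_key: value for store_key, (_, value) in latest.items()}


def get_online(store, feature_name, entity_keys):
    """Look up one feature for several entities; missing entities yield None."""
    return [store.get(f"{feature_name}:{key}") for key in entity_keys]


# Integer timestamps keep the example short.
store = materialize_latest(
    "user_spend",
    [("u1", 1, 10.0), ("u1", 3, 30.0), ("u2", 2, 7.0), ("u1", 2, 20.0)],
)
values = get_online(store, "user_spend", ["u1", "u2", "u3"])
```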

Entity Keys

Define entities for automatic surrogate key generation:

import mlforge as mlf
import polars as pl

user = mlf.Entity(
    name="user",
    join_key="user_id",
    from_columns=["first", "last", "dob"],
)

@mlf.feature(
    keys=["user_id"],
    source="data/transactions.parquet",
    entities=[user],
)
def user_spend(df):
    return df.group_by("user_id").agg(pl.col("amount").sum())

Entity keys are generated automatically during build and retrieval.
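
A surrogate key is typically a deterministic function of the source columns, so the same person always maps to the same `user_id`. The sketch below shows one common approach, hashing the joined column values. The hash algorithm, the separator, and the `surrogate_key` helper are all assumptions for illustration; mlforge may derive its keys differently.

```python
import hashlib


def surrogate_key(row: dict, from_columns: list[str]) -> str:
    """Derive a stable key from the given columns of a row (hypothetical helper)."""
    parts = [str(row[col]) for col in from_columns]
    # The unit separator keeps ("ab", "c") from colliding with ("a", "bc").
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


row = {"first": "Ada", "last": "Lovelace", "dob": "1815-12-10"}
key = surrogate_key(row, ["first", "last", "dob"])
```

Because the key depends only on the column values, it can be recomputed at retrieval time from the same columns without a lookup table.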

Automatic Versioning

mlforge versions features using semantic versioning:

  • MAJOR (2.0.0): Breaking changes (columns removed, dtype changed)
  • MINOR (1.1.0): Additive changes (columns added, config changed)
  • PATCH (1.0.1): Data refresh (same schema and config)

mlforge build                    # Creates v1.0.0
mlforge build --force            # Creates v1.0.1 (PATCH)
mlforge diff user_spend          # Compare versions
mlforge rollback user_spend 1.0.0  # Rollback if needed
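
The MAJOR/MINOR/PATCH rules listed above can be expressed as a small decision function. This is a sketch of those rules only: `next_version` is a hypothetical helper, schemas are modeled as plain `{column: dtype}` dictionaries, and how mlforge actually detects schema or config changes is not shown here.

```python
def next_version(current: str, old_schema: dict, new_schema: dict,
                 config_changed: bool = False) -> str:
    """Return the next semantic version given old and new {column: dtype} schemas."""
    major, minor, patch = (int(part) for part in current.split("."))
    removed = old_schema.keys() - new_schema.keys()
    added = new_schema.keys() - old_schema.keys()
    retyped = {
        col
        for col in old_schema.keys() & new_schema.keys()
        if old_schema[col] != new_schema[col]
    }
    if removed or retyped:
        return f"{major + 1}.0.0"  # breaking: column removed or dtype changed
    if added or config_changed:
        return f"{major}.{minor + 1}.0"  # additive: column added or config changed
    return f"{major}.{minor}.{patch + 1}"  # data refresh only


base = {"user_id": "str", "amount_sum_7d": "f64"}
refreshed = next_version("1.0.0", base, base)
```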

Git Collaboration

Share feature definitions via Git:

# Developer 1: Build and commit
mlforge build --features user_spend
git add feature_store/user_spend/
git commit -m "feat: add user_spend"
git push

# Developer 2: Pull and sync
git pull
mlforge sync  # Rebuilds data from metadata

Documentation

Full documentation: https://chonalchendo.github.io/mlforge

Requirements

  • Python >= 3.13
  • Polars >= 1.35.2

License

MIT License - see LICENSE for details.
