ML Platform for your local machine using cheap cloud services for scalable resources.
mlforge
A simple feature store SDK for machine learning workflows. Build, version, and serve ML features with point-in-time correctness.
Installation
pip install mlforge-sdk
Or with uv:
uv add mlforge-sdk
Quick Start
1. Initialize a project
mlforge init my-features --profile
cd my-features
This creates:
my-features/
├── src/my_features/
│ ├── definitions.py
│ ├── features.py
│ └── entities.py
├── data/
├── feature_store/
├── mlforge.yaml
└── pyproject.toml
2. Configure environments
Edit mlforge.yaml to configure your stores:
default_profile: dev
profiles:
  dev:
    offline_store:
      KIND: local
      path: ./feature_store
  production:
    offline_store:
      KIND: s3
      bucket: ${oc.env:S3_BUCKET}
      prefix: features
    online_store:
      KIND: redis
      host: ${oc.env:REDIS_HOST}
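The ${oc.env:NAME} placeholders use OmegaConf's environment-variable resolver syntax. As a rough illustration of what that substitution amounts to, the sketch below resolves such placeholders with the standard library only. It is not mlforge's actual config loader; resolve_env is a hypothetical helper, and the optional ",default" form mirrors OmegaConf's documented behaviour.

```python
import os
import re

# Matches ${oc.env:NAME} or ${oc.env:NAME,default}
_ENV_PATTERN = re.compile(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)(?:,([^}]*))?\}")


def resolve_env(value: str, environ=None) -> str:
    """Replace ${oc.env:NAME} placeholders with values from the environment.

    Hypothetical helper for illustration; not part of mlforge.
    """
    env = os.environ if environ is None else environ

    def _substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"environment variable {name!r} is not set")

    return _ENV_PATTERN.sub(_substitute, value)


# Resolve the production offline store settings against a fake environment.
profile = {"bucket": "${oc.env:S3_BUCKET}", "prefix": "features"}
resolved = {k: resolve_env(v, {"S3_BUCKET": "my-bucket"}) for k, v in profile.items()}
print(resolved)
```

Failing loudly on a missing variable, rather than substituting an empty string, keeps a misconfigured production profile from silently pointing at the wrong bucket.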
3. Define features
Edit src/my_features/features.py:
import mlforge as mlf
import polars as pl

@mlf.feature(
    keys=["user_id"],
    source="data/transactions.parquet",
    timestamp="transaction_date",
    metrics=[
        mlf.Rolling(
            windows=["7d", "30d"],
            aggregations={"amount": ["sum", "mean", "count"]}
        )
    ],
    validators={
        "amount": [mlf.not_null(), mlf.greater_than(0)],
    },
)
def user_spend(df: pl.DataFrame) -> pl.DataFrame:
    return df.select(["user_id", "transaction_date", "amount"])
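To make the rolling metrics above concrete, here is a minimal pure-Python sketch of a per-key, time-windowed sum, mean, and count. It illustrates the idea only and is not how mlforge computes mlf.Rolling. The rolling_stats helper, the toy rows, and the window convention (excluding the start, including the as-of date) are all assumptions.

```python
from datetime import date, timedelta

# Toy transactions: (user_id, transaction_date, amount)
rows = [
    ("u1", date(2024, 1, 1), 10.0),
    ("u1", date(2024, 1, 5), 20.0),
    ("u1", date(2024, 1, 20), 5.0),
    ("u2", date(2024, 1, 3), 7.0),
]


def rolling_stats(rows, user_id, as_of, window_days):
    """Sum, mean and count of amounts in the window (as_of - window_days, as_of].

    Hypothetical helper; the boundary convention is an assumption.
    """
    start = as_of - timedelta(days=window_days)
    amounts = [
        amount
        for uid, day, amount in rows
        if uid == user_id and start < day <= as_of
    ]
    count = len(amounts)
    total = sum(amounts)
    return {"sum": total, "mean": total / count if count else None, "count": count}


week = rolling_stats(rows, "u1", date(2024, 1, 5), 7)
month = rolling_stats(rows, "u1", date(2024, 1, 20), 30)
print(week)
print(month)
```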
4. Build features
mlforge build
5. Retrieve for training
from my_features.definitions import defs

training_df = defs.get_training_data(
    features=["user_spend"],
    entity_df=labels_df,
    timestamp="label_time"
)
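Point-in-time correctness means each label row only sees feature values that existed at or before its label_time, so no future information leaks into training. The sketch below shows that rule with a standard-library backward lookup. It is a conceptual illustration, not mlforge's join implementation; point_in_time_lookup and the toy data are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Feature snapshots per user, sorted by feature timestamp: (timestamp, value)
feature_history = {
    "u1": [(date(2024, 1, 1), 10.0), (date(2024, 1, 10), 30.0)],
}

# Label rows: (user_id, label_time)
labels = [
    ("u1", date(2024, 1, 5)),
    ("u1", date(2024, 1, 15)),
    ("u1", date(2023, 12, 1)),
]


def point_in_time_lookup(history, user_id, label_time):
    """Return the latest feature value with timestamp <= label_time, else None."""
    snapshots = history.get(user_id, [])
    timestamps = [ts for ts, _ in snapshots]
    # Index of the first snapshot strictly after label_time.
    idx = bisect_right(timestamps, label_time)
    return snapshots[idx - 1][1] if idx else None


training_rows = [
    (uid, t, point_in_time_lookup(feature_history, uid, t)) for uid, t in labels
]
for row in training_rows:
    print(row)
```

The label dated before any snapshot gets no feature value rather than borrowing a later one, which is the behaviour that distinguishes a point-in-time join from a plain key join.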
Features
- Feature Definition: Define features with the @mlf.feature decorator
- Rolling Aggregations: Compute time-windowed metrics with mlf.Rolling
- Data Validation: Built-in validators for data quality
- Semantic Versioning: Automatic version detection (MAJOR/MINOR/PATCH)
- Storage Backends: Local filesystem, Amazon S3, Google Cloud Storage
- Online Serving: Redis-backed real-time feature retrieval
- Entity Keys: Surrogate key generation with mlf.Entity
- Point-in-Time Joins: Training data with temporal correctness
- Environment Profiles: Configure dev/staging/prod via mlforge.yaml
- CLI Tools: Build, validate, inspect, diff, rollback features
- Git Collaboration: Share definitions via Git, sync data locally
CLI Reference
# Project setup
mlforge init my-project --profile # Create new project with mlforge.yaml
mlforge profile --validate # Validate store connectivity
# Build and validate
mlforge build # Build all features
mlforge build --online # Build to online store (Redis)
mlforge build --profile production # Use production profile
mlforge validate # Validate without building
# Discovery
mlforge list features # List all features
mlforge list entities # List all entities
mlforge list versions user_spend # List feature versions
mlforge inspect feature user_spend # Inspect feature metadata
# Version management
mlforge diff user_spend # Compare versions
mlforge rollback user_spend --previous # Rollback to previous version
# Team collaboration
mlforge sync # Rebuild from Git metadata
mlforge sync --dry-run # Preview sync
Environment Profiles
Configure different environments in mlforge.yaml:
default_profile: dev
profiles:
  dev:
    offline_store:
      KIND: local
      path: ./feature_store
  staging:
    offline_store:
      KIND: s3
      bucket: staging-features
      prefix: v1
  production:
    offline_store:
      KIND: s3
      bucket: prod-features
      prefix: v1
    online_store:
      KIND: redis
      host: ${oc.env:REDIS_HOST}
      password: ${oc.env:REDIS_PASSWORD}
Switch profiles:
mlforge build --profile production
# or
export MLFORGE_PROFILE=production
mlforge build
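A profile can therefore come from the --profile flag, the MLFORGE_PROFILE variable, or default_profile. The sketch below shows one plausible way to resolve the active profile from those three sources. The precedence order (flag, then environment, then default) is an assumption that the text above does not confirm, and resolve_profile is a hypothetical helper rather than an mlforge function.

```python
import os


def resolve_profile(cli_profile, config, environ=None):
    """Pick a profile name: CLI flag, then MLFORGE_PROFILE, then default_profile.

    The precedence order is assumed for illustration.
    """
    env = os.environ if environ is None else environ
    name = cli_profile or env.get("MLFORGE_PROFILE") or config["default_profile"]
    if name not in config["profiles"]:
        raise ValueError(f"unknown profile {name!r}")
    return name


config = {
    "default_profile": "dev",
    "profiles": {"dev": {}, "staging": {}, "production": {}},
}

print(resolve_profile(None, config, {}))
print(resolve_profile(None, config, {"MLFORGE_PROFILE": "production"}))
print(resolve_profile("staging", config, {"MLFORGE_PROFILE": "production"}))
```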
Online Serving
Build features to Redis for real-time inference:
mlforge build --online --profile production
Retrieve features:
from my_features.definitions import defs

features = defs.get_online_features(
    features=["user_spend", "merchant_revenue"],
    entity_df=request_df,
)
Entity Keys
Define entities for automatic surrogate key generation:
import mlforge as mlf
import polars as pl

user = mlf.Entity(
    name="user",
    join_key="user_id",
    from_columns=["first", "last", "dob"],
)

@mlf.feature(
    keys=["user_id"],
    source="data/transactions.parquet",
    entities=[user],
)
def user_spend(df):
    return df.group_by("user_id").agg(pl.col("amount").sum())
Entity keys are generated automatically during build and retrieval.
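One common way to build a surrogate key from several source columns is to hash their values in a fixed order. The sketch below illustrates that idea only. The hash function, separator, and surrogate_key helper are assumptions, and mlforge's actual key format may differ.

```python
import hashlib


def surrogate_key(row: dict, from_columns: list[str]) -> str:
    """Derive a deterministic key by hashing the chosen columns in order.

    Hypothetical helper; SHA-256 and the separator are illustrative choices.
    """
    # A separator that is unlikely to occur in the values avoids collisions
    # such as ("ab", "c") vs ("a", "bc").
    raw = "\x1f".join(str(row[col]) for col in from_columns)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


columns = ["first", "last", "dob"]
key_a = surrogate_key({"first": "Ada", "last": "Lovelace", "dob": "1815-12-10"}, columns)
key_b = surrogate_key({"first": "Ada", "last": "Lovelace", "dob": "1815-12-10"}, columns)
key_c = surrogate_key({"first": "Alan", "last": "Turing", "dob": "1912-06-23"}, columns)
print(key_a == key_b, key_a == key_c)
```

Because the key depends only on the column values, the same entity maps to the same user_id at build time and at retrieval time.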
Automatic Versioning
mlforge versions features using semantic versioning:
- MAJOR (2.0.0): Breaking changes (columns removed, dtype changed)
- MINOR (1.1.0): Additive changes (columns added, config changed)
- PATCH (1.0.1): Data refresh (same schema and config)
mlforge build # Creates v1.0.0
mlforge build --force # Creates v1.0.1 (PATCH)
mlforge diff user_spend # Compare versions
mlforge rollback user_spend 1.0.0 # Rollback if needed
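The bump rules listed above can be expressed as a small decision function over the old and new schemas. This is a sketch of those rules as described here, not mlforge's version-detection code. The next_version helper and the column-to-dtype dictionaries are hypothetical.

```python
def next_version(current, old_schema, new_schema, config_changed=False):
    """Return the next semantic version given old and new column->dtype schemas."""
    major, minor, patch = (int(part) for part in current.split("."))
    removed = old_schema.keys() - new_schema.keys()
    added = new_schema.keys() - old_schema.keys()
    retyped = {
        col
        for col in old_schema.keys() & new_schema.keys()
        if old_schema[col] != new_schema[col]
    }
    if removed or retyped:
        # Breaking change: columns removed or dtype changed.
        return f"{major + 1}.0.0"
    if added or config_changed:
        # Additive change: columns added or config changed.
        return f"{major}.{minor + 1}.0"
    # Same schema and config: data refresh.
    return f"{major}.{minor}.{patch + 1}"


base = {"user_id": "str", "amount": "f64"}
print(next_version("1.0.0", base, base))
print(next_version("1.0.0", base, {**base, "count": "i64"}))
print(next_version("1.1.0", base, {"user_id": "str"}))
```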
Git Collaboration
Share feature definitions via Git:
# Developer 1: Build and commit
mlforge build --features user_spend
git add feature_store/user_spend/
git commit -m "feat: add user_spend"
git push
# Developer 2: Pull and sync
git pull
mlforge sync # Rebuilds data from metadata
Documentation
Full documentation: https://chonalchendo.github.io/mlforge
Requirements
- Python >= 3.13
- Polars >= 1.35.2
License
MIT License - see LICENSE for details.