ML Platform for your local machine using cheap cloud services for scalable resources.
mlforge
A simple feature store SDK for machine learning workflows. Build, validate, and serve ML features with point-in-time correctness.
Installation
pip install mlforge-sdk
Or with uv:
uv add mlforge-sdk
Quick Start
import mlforge as mlf
import polars as pl
from datetime import timedelta

@mlf.feature(
    keys=["user_id"],
    source="data/transactions.parquet",
    timestamp="transaction_date",
    interval=timedelta(days=1),
    metrics=[
        mlf.Rolling(
            windows=["7d", "30d"],
            aggregations={"amount": ["sum", "mean", "count"]}
        )
    ],
    validators={
        "amount": [mlf.not_null(), mlf.greater_than(0)],
        "user_id": [mlf.not_null()],
    },
    description="User spending patterns over rolling windows"
)
def user_spend(df: pl.DataFrame) -> pl.DataFrame:
    return df.select(["user_id", "transaction_date", "amount"])
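To make the rolling metrics above concrete, here is a minimal standard-library sketch of what a "7d" or "30d" window of sum, mean, and count could compute for a single user. It does not use mlforge and is not its implementation: the helper `rolling_stats`, the sample transactions, and the window convention (start excluded, end included) are all assumptions made for illustration.

```python
from datetime import date, timedelta


def rolling_stats(rows, as_of, window_days):
    """Sum, mean, and count of amounts dated within (as_of - window_days, as_of].

    The half-open window convention is an assumption of this sketch;
    mlforge may treat window boundaries differently.
    """
    start = as_of - timedelta(days=window_days)
    amounts = [amount for day, amount in rows if start < day <= as_of]
    count = len(amounts)
    total = sum(amounts)
    mean = total / count if count else None
    return {"sum": total, "mean": mean, "count": count}


# Hypothetical transactions for one user: (transaction_date, amount)
transactions = [
    (date(2024, 1, 1), 20.0),
    (date(2024, 1, 5), 30.0),
    (date(2024, 1, 9), 10.0),
]

stats_7d = rolling_stats(transactions, as_of=date(2024, 1, 10), window_days=7)
stats_30d = rolling_stats(transactions, as_of=date(2024, 1, 10), window_days=30)
```

With this sample data, the 7-day window excludes the January 1 transaction, while the 30-day window includes all three.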
Register features and build them:
import mlforge as mlf
import my_features
defs = mlf.Definitions(
    name="my-project",
    features=[my_features],
    offline_store=mlf.LocalStore("./feature_store")
)
# Build features to storage
defs.build()
Retrieve features for training with point-in-time correctness:
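import mlforge as mlf

# labels_df is assumed to be an existing DataFrame holding the entity keys
# (e.g. user_id) and a label_time column.
training_df = mlf.get_training_data(
    entity_df=labels_df,
    features=["user_spend"],
    store=mlf.LocalStore("./feature_store"),
    timestamp="label_time"
)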
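Point-in-time correctness means each label row only sees feature values that already existed at its label time, so no future information leaks into training. The sketch below illustrates the idea with the standard library only. It is not how mlforge performs the join: `point_in_time_lookup`, the sample data, and the choice to include a feature value stamped exactly at the label time are assumptions.

```python
from bisect import bisect_right
from datetime import date


def point_in_time_lookup(feature_rows, label_time):
    """Return the latest feature value with timestamp <= label_time, or None.

    feature_rows must be sorted by timestamp. Treating an equal timestamp
    as visible is an assumption of this sketch.
    """
    timestamps = [ts for ts, _ in feature_rows]
    idx = bisect_right(timestamps, label_time)
    return feature_rows[idx - 1][1] if idx else None


# Hypothetical feature history per user, sorted by timestamp
feature_history = {
    "u1": [(date(2024, 1, 1), 100.0), (date(2024, 1, 8), 250.0)],
}

# Hypothetical labels: (user_id, label_time)
labels = [
    ("u1", date(2023, 12, 31)),  # before any feature value exists
    ("u1", date(2024, 1, 5)),    # sees only the January 1 value
    ("u1", date(2024, 1, 8)),    # sees the January 8 value
    ("u2", date(2024, 1, 5)),    # user with no feature history
]

training_rows = [
    (user_id, label_time,
     point_in_time_lookup(feature_history.get(user_id, []), label_time))
    for user_id, label_time in labels
]
```

A plain join on `user_id` alone would attach the January 8 value to the January 5 label, which is exactly the leakage a point-in-time join avoids.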
Features
- Feature Definition: Define features with the @mlf.feature decorator
- Rolling Aggregations: Compute time-windowed metrics with mlf.Rolling
- Data Validation: Validate data with built-in validators (mlf.not_null(), mlf.greater_than(), etc.)
- Storage Backends: Local filesystem and Amazon S3 support
- Point-in-Time Joins: Retrieve training data with temporal correctness
- Feature Metadata: Automatic tracking of schemas, row counts, and lineage
- CLI: Build, validate, and inspect features from the command line
CLI Usage
Build all features:
mlforge build
Build specific features:
mlforge build --features user_spend,merchant_spend
Build features by tag:
mlforge build --tags users
Validate features without building:
mlforge validate
List registered features:
mlforge list
Inspect feature metadata:
mlforge inspect user_spend
Validators
Built-in validators for data quality:
import mlforge as mlf

@mlf.feature(
    keys=["id"],
    source="data.parquet",
    validators={
        "email": [mlf.not_null(), mlf.matches_regex(r"^[\w.-]+@[\w.-]+\.\w+$")],
        "age": [mlf.not_null(), mlf.in_range(0, 120)],
        "status": [mlf.is_in(["active", "inactive"])],
        "score": [mlf.greater_than_or_equal(0), mlf.less_than_or_equal(100)],
    }
)
def validated_feature(df):
    return df
Available validators: not_null, unique, greater_than, less_than, greater_than_or_equal, less_than_or_equal, in_range, matches_regex, is_in
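As a rough illustration of what per-column validators like these check, the sketch below applies a few stand-in predicates to plain Python rows. The functions reuse some validator names for readability but are written for this example and are not mlforge's implementations; inclusive range bounds, letting null values pass every check except `not_null`, and the `find_violations` helper are all assumptions.

```python
import re


# Stand-in checks written for this sketch; not mlforge's implementations.
def not_null():
    return lambda value: value is not None


def in_range(low, high):
    # Inclusive bounds are an assumption; None passes so not_null reports it.
    return lambda value: value is None or low <= value <= high


def is_in(allowed):
    return lambda value: value is None or value in allowed


def matches_regex(pattern):
    compiled = re.compile(pattern)
    return lambda value: value is None or compiled.search(value) is not None


def find_violations(rows, rules):
    """Return (row_index, column) pairs where any check for that column fails."""
    return [
        (i, column)
        for i, row in enumerate(rows)
        for column, checks in rules.items()
        if not all(check(row.get(column)) for check in checks)
    ]


rules = {
    "email": [not_null(), matches_regex(r"^[\w.-]+@[\w.-]+\.\w+$")],
    "age": [not_null(), in_range(0, 120)],
    "status": [is_in(["active", "inactive"])],
}

# Hypothetical input rows
rows = [
    {"email": "ana@example.com", "age": 34, "status": "active"},
    {"email": "not-an-email", "age": 150, "status": "paused"},
    {"email": None, "age": 20, "status": "inactive"},
]

violations = find_violations(rows, rules)
```

Here the first row passes, the second fails all three columns, and the third fails only the `email` null check.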
Storage Backends
Local Storage
import mlforge as mlf
store = mlf.LocalStore("./feature_store")
S3 Storage
import mlforge as mlf
store = mlf.S3Store(
    bucket="my-features",
    prefix="prod/features",
    region="us-west-2"
)
Documentation
Full documentation is available at https://chonalchendo.github.io/mlforge
Contributing
Contributions are welcome! Please see the repository for development setup and guidelines.
License
MIT License - see LICENSE for details.