Fast, sklearn-compatible Factorization Machines and Field-aware Factorization Machines
modern_fm
Fast, sklearn-compatible Factorization Machines (FM) and Field-aware Factorization Machines (FFM) for Python.
Documentation: https://matapanino.github.io/modern_fm/ — install, quickstart, API reference, math specs.
Status: v1.0 (stable). The public API is frozen under the SemVer contract in docs/compat_policy.md. A Rust CPU backend, parity-tested against pure-NumPy reference implementations, drives the sklearn-style estimators:

- FMClassifier, FMRegressor, FFMClassifier and FFMRegressor (binary, multiclass softmax and regression), plus FwFMClassifier (Field-weighted FM)
- SGD, AdaGrad, Adam and FTRL-Proximal optimizers; FTRL's L1 penalties (l1_linear / l1_factors) yield exact-zero weights
- mini-batch gradient averaging (batch_size) and multi-core training via rayon (n_jobs)
- early stopping for every estimator/task combination, and partial_fit / warm_start streaming
- sample_weight / class_weight and label_smoothing
- a CategoricalEncoder, top_interactions model inspection, and save_model / load_model
- load_libffm / dump_libffm to read and write the libffm text format

The estimators are scikit-learn check_estimator-compatible (they drop into Pipeline, GridSearchCV and CalibratedClassifierCV) and accept pandas and polars DataFrames. An optional CUDA backend (backend="cuda") accelerates both prediction and training across FM/FFM/FwFM and binary/regression/multiclass tasks on NVIDIA GPUs with compute capability ≥ 6.0.
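For readers new to factorization machines, here is the standard second-order FM score written out in plain Python, with the pairwise term computed both by the naive double loop and by the well-known O(k·n) rewrite. It illustrates the model equation only; the function names are hypothetical and this is not modern_fm's Rust kernel.

```python
def fm_score_naive(w0, w, V, x):
    """Second-order FM score with an explicit O(k * n^2) pairwise loop."""
    n = len(x)
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return linear + pairwise


def fm_score_fast(w0, w, V, x):
    """Same score via the O(k * n) identity:
    sum_{i<j} <v_i, v_j> x_i x_j
      = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s2)
    return linear + pairwise


# Tiny example: 4 features, k = 2 latent factors
w0 = 0.1
w = [0.5, -0.2, 0.0, 0.3]
V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.4], [-0.2, 0.1]]
x = [1.0, 0.0, 1.0, 1.0]
print(fm_score_naive(w0, w, V, x), fm_score_fast(w0, w, V, x))  # both ≈ 1.02
```

The rewrite is what keeps FMs cheap on sparse one-hot data: only the nonzero x_i contribute to each per-factor sum.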
Installation
```shell
pip install modern-fm  # prebuilt wheels for Linux/macOS/Windows; no Rust toolchain needed
```
The Linux wheels are CUDA-ready out of the box: wherever an NVIDIA driver
(CUDA 12+) is present — e.g. Colab/Kaggle GPU runtimes — backend="cuda"
just works; on CPU-only machines the same wheel behaves exactly like a CPU
build. macOS/Windows wheels are CPU-only.
To build from source instead (e.g. on a platform without a prebuilt wheel), see Development below; it requires a Rust toolchain.
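If one script has to run both on GPU runtimes and on CPU-only machines, one option is to pass backend="cuda" only when an NVIDIA driver seems to be present. The helper below is a hypothetical sketch: treating nvidia-smi on PATH as evidence of a driver is an assumption, and it does not verify CUDA 12+ or the compute-capability requirement.

```python
import shutil


def cuda_backend_kwargs():
    """Hypothetical helper: request backend="cuda" only when an NVIDIA
    driver appears to be installed.

    `nvidia-smi` on PATH is only a heuristic; it does not check the
    CUDA version or the GPU's compute capability.
    """
    if shutil.which("nvidia-smi") is not None:
        return {"backend": "cuda"}
    # Omit the argument so the estimator keeps its default backend.
    return {}


kwargs = cuda_backend_kwargs()
print(kwargs)
```

Usage would then look like FMClassifier(n_factors=16, **cuda_backend_kwargs()).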
Usage
```python
from modern_fm import FMClassifier, FFMClassifier

# X_train, y_train, X_test and field_ids are placeholders for your own data.
model = FMClassifier(
    n_factors=16,
    optimizer="adagrad",
    learning_rate=0.05,
    max_iter=100,
    batch_size=256,  # mini-batch gradient averaging (1 = per-row SGD)
    n_jobs=-1,  # train batches across all CPU cores
    l2_linear=1e-5,
    l2_factors=1e-5,
    random_state=42,
)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)

# FTRL-Proximal with L1 for sparse linear weights (classic CTR setup)
sparse = FMClassifier(optimizer="ftrl", l1_linear=1.0, batch_size=256, random_state=42)
sparse.fit(X_train, y_train)

ffm = FFMClassifier(n_factors=8, n_jobs=-1, random_state=42)
ffm.fit(X_train, y_train, field_ids=field_ids)
```
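The exact zeros produced by optimizer="ftrl" with l1_linear come from FTRL-Proximal's soft-thresholding step. Below is a minimal per-coordinate sketch of the textbook FTRL-Proximal update (McMahan et al.); the class name and the alpha/beta defaults are illustrative assumptions and are not claimed to match modern_fm's parameters or internals.

```python
import math


class FTRLCoordinate:
    """Per-coordinate FTRL-Proximal state for a single weight."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = 0.0  # accumulated (adjusted) gradients
        self.n = 0.0  # accumulated squared gradients

    def weight(self):
        # L1 soft-threshold: the weight is exactly 0.0 while |z| <= l1
        if abs(self.z) <= self.l1:
            return 0.0
        sign = 1.0 if self.z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n)) / self.alpha + self.l2
        return -(self.z - sign * self.l1) / denom

    def update(self, grad):
        w = self.weight()
        sigma = (math.sqrt(self.n + grad * grad) - math.sqrt(self.n)) / self.alpha
        self.z += grad - sigma * w
        self.n += grad * grad
        return self.weight()


c = FTRLCoordinate(l1=1.0)
for g in [0.1] * 5:
    w = c.update(g)
print(w)  # 0.0: |z| = 0.5 has not crossed l1 = 1.0
```

Because the zero comes from a threshold on z rather than from a weight that merely shrinks, rarely-useful features stay at exactly 0.0 instead of drifting to tiny nonzero values.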
FMRegressor, multiclass FMClassifier (just pass a target with >2 classes),
early stopping (early_stopping=True or eval_set=(X_val, y_val)), and the
CategoricalEncoder are demonstrated in examples/basic_usage.py.
benchmarks/bench_synthetic.py reports fit time and predict throughput against
the NumPy reference floor.
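As a rough picture of what one-hot encoding categorical columns involves, and of the field:feature:value layout of the libffm text format, here is a plain-Python sketch. The helper names are hypothetical; this is not how CategoricalEncoder or dump_libffm are implemented, and details such as unseen-category handling are not modeled.

```python
def fit_vocab(rows):
    """Assign one global feature index per (field, category) pair."""
    vocab = {}
    for row in rows:
        for field, value in enumerate(row):
            key = (field, value)
            if key not in vocab:
                vocab[key] = len(vocab)
    return vocab


def to_libffm_line(label, row, vocab):
    """Format one row as 'label field:feature:value ...' (value 1 for one-hot)."""
    tokens = [str(label)]
    for field, value in enumerate(row):
        tokens.append(f"{field}:{vocab[(field, value)]}:1")
    return " ".join(tokens)


rows = [("mobile", "ad_7"), ("desktop", "ad_7"), ("mobile", "ad_9")]
labels = [1, 0, 0]
vocab = fit_vocab(rows)
lines = [to_libffm_line(y, r, vocab) for y, r in zip(labels, rows)]
print("\n".join(lines))
```

Each feature index belongs to exactly one field, which is the extra information field-aware models such as FFM and FwFM rely on.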
Benchmarks
On synthetic CTR data (40k train / 20k test; 16 one-hot categorical fields →
256 features) with planted pairwise interactions between field pairs — signal
a linear model cannot represent — FM/FFM recover most of it. n_jobs=-1 uses all
cores (8 here); absolute numbers vary by machine.
| Model | Test AUC | Fit (s) | Predict (rows/s) |
|---|---|---|---|
| LogisticRegression (sklearn) | 0.694 | 0.01 | 60M |
| FMClassifier (batch=1) | 0.817 | 1.34 | 4.3M |
| FMClassifier (batch=512) | 0.816 | 0.45 | 4.8M |
| FMClassifier (batch=512, n_jobs=-1) | 0.816 | 0.33 | 5.0M |
| FFMClassifier (batch=512) | 0.846 | 1.68 | 2.3M |
| FFMClassifier (batch=512, n_jobs=-1) | 0.846 | 1.46 | 2.1M |
- Interactions matter: AUC climbs 0.69 → 0.82 (FM) → 0.85 (FFM) as the model captures the pairwise / field-aware structure the linear baseline misses.
- Mini-batch: batch_size=512 trains ~3× faster than per-row SGD at essentially equal AUC.
- Multi-core: n_jobs=-1 speeds up fitting by a further ~1.15–1.4× here (more on larger/denser data); predict throughput is roughly unchanged.
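The batch_size semantics (mini-batch gradient averaging, 1 = per-row SGD) can be sketched for a plain logistic-regression weight vector: per-row gradients are averaged over each batch and one step is taken per batch, so a pass over n rows makes about n / batch_size updates instead of n. This is a generic illustration with hypothetical names; modern_fm's actual update (factor gradients, optimizer state, rayon parallelism) is not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_minibatch(X, y, batch_size, lr=0.5, epochs=20, seed=0):
    """Logistic regression trained with mini-batch gradient averaging.

    batch_size=1 reduces to per-row SGD. Returns the weights and the
    number of parameter updates performed.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features
    order = list(range(len(X)))
    updates = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad = [0.0] * n_features
            for i in batch:
                # Gradient of log-loss for one row is (p - y) * x
                err = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i]))) - y[i]
                for j in range(n_features):
                    grad[j] += err * X[i][j]
            # Average over the batch, then take a single step
            for j in range(n_features):
                w[j] -= lr * grad[j] / len(batch)
            updates += 1
    return w, updates


# Toy data: a bias column plus one noisy binary feature that predicts the label
X = [[1.0, float(v)] for v in (0, 0, 0, 0, 1, 1, 1, 1)]
y = [0, 0, 0, 1, 1, 1, 1, 0]
w_sgd, n_sgd = train_minibatch(X, y, batch_size=1)
w_mb, n_mb = train_minibatch(X, y, batch_size=4)
print(n_sgd, n_mb)  # 160 vs 40 updates over 20 epochs
```

Fewer, averaged updates are one reason larger batches fit faster, and each batch's per-row gradients are independent, which is what makes spreading a batch across cores possible.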
Reproduce with python benchmarks/bench_vs_baseline.py. xlearn is auto-included
if importable, but it does not build on every platform (it failed to build here on
macOS/arm64 + CPython 3.11).
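To see why planted pairwise signal is out of reach for a linear model, consider a toy XOR-style case with two balanced ±1 features and a label that is purely their product. This is a hypothetical illustration, not the generator used by the benchmark script: each least-squares linear coefficient comes out as zero, while a single pairwise weight fits the label exactly.

```python
# Two balanced binary features encoded as +/-1 and a label that is purely
# their interaction (an XOR-style pattern).
points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
y = [x1 * x2 for x1, x2 in points]


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return mean([(ai - ma) * (bi - mb) for ai, bi in zip(a, b)])


x1s = [p[0] for p in points]
x2s = [p[1] for p in points]

# x1 and x2 are uncorrelated, so each least-squares linear coefficient is
# cov(y, x_i) / var(x_i); both covariances are zero.
print(cov(x1s, x2s), cov(y, x1s), cov(y, x2s))

# A single pairwise weight on x1 * x2 reproduces the label exactly.
w12 = 1.0
print(all(w12 * a * b == t for (a, b), t in zip(points, y)))
```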
Real click data (KDD Cup 2012 sample)
On real CTR data — the KDD Cup 2012 track-2 sample from OpenML
(Click_prediction_small; 200k impressions subsampled with seed 0, 9
id-categorical fields → 373k one-hot features, 4.4% CTR, stratified 80/20
split) — with libFM-style fixed hyperparameters (AdaGrad, L2 1e-4, built-in
early stopping; not tuned to this benchmark):
| Model | Test AUC | Fit (s) | Predict (krows/s) |
|---|---|---|---|
| LogisticRegression (sklearn) | 0.6908 | 3.5 | 14 594 |
| FMClassifier (k=8) | 0.6810 | 1.8 | 2 402 |
| FFMClassifier (k=4) | 0.6721 | 5.1 | 1 211 |
| FwFMClassifier (k=8) | 0.6891 | 2.8 | 2 481 |
Honest read: this 9-field sample is dominated by rare ids (373k features for
160k train rows), so the second-order factor models at best roughly match a
well-regularized linear baseline and do not beat it; FwFMClassifier comes
closest (0.6891 vs 0.6908 AUC) at about a sixth of LR's predict throughput.
The planted-interaction synthetic table above shows the regime where factor
models pull ahead. Machine: macOS arm64 (Apple Silicon), Python 3.11.
Reproduce with python benchmarks/bench_criteo_like.py; the original
Criteo/Avazu samples are no longer publicly downloadable without credentials,
so the bench uses this real CTR dataset via fetch_openml (details in the
script docstring).
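To compare the three second-order models in the table, the sketch below writes out their pairwise terms for a one-hot row (only active features, so every x_i = 1) and counts latent-factor parameters. It follows the textbook FM, FFM and FwFM formulations; the function names are hypothetical, the counts ignore linear weights and bias, and none of this is read from modern_fm's internals.

```python
from itertools import combinations


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def fm_pairwise(active, V):
    """FM: each feature has one latent vector shared across all pairs."""
    return sum(dot(V[i], V[j]) for i, j in combinations(active, 2))


def ffm_pairwise(active, field_of, W):
    """FFM: feature i uses its vector for field_of[j] when paired with j."""
    return sum(
        dot(W[i][field_of[j]], W[j][field_of[i]])
        for i, j in combinations(active, 2)
    )


def fwfm_pairwise(active, field_of, V, R):
    """FwFM: FM's shared vectors, scaled by a learned weight per field pair."""
    return sum(
        R[frozenset((field_of[i], field_of[j]))] * dot(V[i], V[j])
        for i, j in combinations(active, 2)
    )


def factor_params(n_features, n_fields, k, model):
    """Latent-factor parameter count under the textbook formulations."""
    if model == "fm":
        return n_features * k
    if model == "fwfm":
        # shared vectors plus one weight per unordered field pair
        return n_features * k + n_fields * (n_fields - 1) // 2
    if model == "ffm":
        # one k-dim vector per (feature, field) combination
        return n_features * n_fields * k
    raise ValueError(f"unknown model: {model}")


# One-hot row with three active features, each from a different field
active = [0, 1, 2]
field_of = {0: 0, 1: 1, 2: 2}
V = {0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.0, 2.0]}
# With every field-specific vector equal to the shared one, FFM reduces to FM
W = {i: {f: V[i] for f in range(3)} for i in V}
# With every field-pair weight equal to 1, FwFM reduces to FM
R = {frozenset(p): 1.0 for p in combinations(range(3), 2)}
print(fm_pairwise(active, V), ffm_pairwise(active, field_of, W),
      fwfm_pairwise(active, field_of, V, R))

# Approximate sizes for the setup above (~373k features, 9 fields)
for model, k in [("fm", 8), ("ffm", 4), ("fwfm", 8)]:
    print(model, k, factor_params(373_000, 9, k, model))
```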
Development
Requires Python >= 3.10 and a recent Rust toolchain (1.74+; rustup update).
```shell
python3 -m venv .venv
.venv/bin/pip install -e ".[dev]"  # builds the Rust extension via maturin
.venv/bin/pytest -q
.venv/bin/ruff check .
```
pip install -e . compiles rust/ and installs the extension as
modern_fm._rust (maturin mixed layout, config in pyproject.toml).
After editing Rust code, re-run pip install -e . to rebuild. Rust-only
checks:
```shell
cd rust
PYO3_PYTHON=$PWD/../.venv/bin/python3 cargo test
PYO3_PYTHON=$PWD/../.venv/bin/python3 cargo clippy
```
Without the extension built, the package still works: modern_fm._backend
falls back to the pure-NumPy reference implementations, and the parity tests
in tests/test_rust_parity.py are skipped.
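The fallback described above is the common "try the compiled extension, else use the reference path" pattern. Below is a minimal sketch of that pattern; the module name is a deliberately nonexistent placeholder, and this is not the code in modern_fm._backend.

```python
import importlib


def load_backend(preferred="hypothetical_fm_ext._rust"):
    """Return (name, module) for the compiled extension, or fall back.

    The default module name is a placeholder for illustration only.
    """
    try:
        return "rust", importlib.import_module(preferred)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError; fall back to
        # a pure-Python/NumPy reference path.
        return "reference", None


name, module = load_backend()
print(name)  # "reference", because the placeholder module does not exist
```

On the test side, calling pytest.importorskip("modern_fm._rust") at the top of a test module is one common way to get the skip behaviour described; whether tests/test_rust_parity.py does exactly this is not stated here.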
Design documents live in docs/ — start with docs/requirements.md and
docs/math_spec.md. The roadmap is in docs/roadmap.md.
Download files
File details
Details for the file modern_fm-1.1.1.tar.gz.
- Download URL: modern_fm-1.1.1.tar.gz
- Size: 90.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

| Algorithm | Hash digest |
|---|---|
| SHA256 | 111bbbe653b6fe630811d093f17cbd09d749017d76bf558421dd715b2fc88407 |
| MD5 | 1d56c3ae598dcfc867b797bf87bfbb09 |
| BLAKE2b-256 | 2acc669d89c41c654d49fe6dc755e8ea56d80da438a8c7e34c3fdbc63e871fae |

Provenance
The following attestation bundles were made for modern_fm-1.1.1.tar.gz:
- Publisher: release.yml on Matapanino/modern_fm
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: modern_fm-1.1.1.tar.gz
- Subject digest: 111bbbe653b6fe630811d093f17cbd09d749017d76bf558421dd715b2fc88407
- Sigstore transparency entry: 2046729847
- Permalink: Matapanino/modern_fm@191d19e933b6d431fad7b59039d87589588066ab
- Branch / Tag: refs/tags/v1.1.1
- Owner: https://github.com/Matapanino
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@191d19e933b6d431fad7b59039d87589588066ab
- Trigger Event: push
File details
Details for the file modern_fm-1.1.1-cp310-abi3-win_amd64.whl.
- Download URL: modern_fm-1.1.1-cp310-abi3-win_amd64.whl
- Size: 403.3 kB
- Tags: CPython 3.10+, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

| Algorithm | Hash digest |
|---|---|
| SHA256 | 72a3267dc0a2f7d493652e853fb974481e376380ba19163155f2c5134944328d |
| MD5 | 18310c24cd038c3060ca40a139ade4fe |
| BLAKE2b-256 | 6cfcf4b4eb37ead67ff4c6a3aae329afc2c14cecc22584a9f4903bb881841d7a |

Provenance
The following attestation bundles were made for modern_fm-1.1.1-cp310-abi3-win_amd64.whl:
- Publisher: release.yml on Matapanino/modern_fm
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: modern_fm-1.1.1-cp310-abi3-win_amd64.whl
- Subject digest: 72a3267dc0a2f7d493652e853fb974481e376380ba19163155f2c5134944328d
- Sigstore transparency entry: 2046729879
- Permalink: Matapanino/modern_fm@191d19e933b6d431fad7b59039d87589588066ab
- Branch / Tag: refs/tags/v1.1.1
- Owner: https://github.com/Matapanino
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@191d19e933b6d431fad7b59039d87589588066ab
- Trigger Event: push
File details
Details for the file modern_fm-1.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
- Download URL: modern_fm-1.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 611.6 kB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

| Algorithm | Hash digest |
|---|---|
| SHA256 | d24c3428a57bda2ff8679a8e9d4845769b10ad9a4c7ff16cfc0caaf35bc91006 |
| MD5 | 6e373800e0a36583eb2f8c9900bf3ccf |
| BLAKE2b-256 | c416d7c78ca291cdcf5d244cf5151f6aa39b604b3be4f8c898eb5416f67ac8d6 |

Provenance
The following attestation bundles were made for modern_fm-1.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
- Publisher: release.yml on Matapanino/modern_fm
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: modern_fm-1.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: d24c3428a57bda2ff8679a8e9d4845769b10ad9a4c7ff16cfc0caaf35bc91006
- Sigstore transparency entry: 2046729858
- Permalink: Matapanino/modern_fm@191d19e933b6d431fad7b59039d87589588066ab
- Branch / Tag: refs/tags/v1.1.1
- Owner: https://github.com/Matapanino
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@191d19e933b6d431fad7b59039d87589588066ab
- Trigger Event: push
File details
Details for the file modern_fm-1.1.1-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.
- Download URL: modern_fm-1.1.1-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
- Size: 849.6 kB
- Tags: CPython 3.10+, macOS 10.12+ universal2 (ARM64, x86-64), macOS 10.12+ x86-64, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

| Algorithm | Hash digest |
|---|---|
| SHA256 | 6a950d037e98085ad543b0c5e34e2ab7dff20fa3961d4c02e68aa7c255303688 |
| MD5 | 52ba768be93e9626d0167aa7eeafeb3a |
| BLAKE2b-256 | 7b5dae1690ca475b6d9a6c3b93de4f531b24416e26d56c6b4f5558301a10fff9 |

Provenance
The following attestation bundles were made for modern_fm-1.1.1-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:
- Publisher: release.yml on Matapanino/modern_fm
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: modern_fm-1.1.1-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
- Subject digest: 6a950d037e98085ad543b0c5e34e2ab7dff20fa3961d4c02e68aa7c255303688
- Sigstore transparency entry: 2046729870
- Permalink: Matapanino/modern_fm@191d19e933b6d431fad7b59039d87589588066ab
- Branch / Tag: refs/tags/v1.1.1
- Owner: https://github.com/Matapanino
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@191d19e933b6d431fad7b59039d87589588066ab
- Trigger Event: push