
Fast, sklearn-compatible Factorization Machines and Field-aware Factorization Machines


modern_fm


Fast, sklearn-compatible Factorization Machines (FM) and Field-aware Factorization Machines (FFM) for Python.

Documentation: https://matapanino.github.io/modern_fm/ — install, quickstart, API reference, math specs.
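For orientation, these are the textbook second-order forms of the three model families: Rendle's FM, Juan et al.'s FFM and Pan et al.'s FwFM. Here f(i) is the field of feature i and k is the latent dimension, presumably what the n_factors parameter sets. This is a reference sketch, not the package's specification. Exact conventions, such as whether each model keeps the bias/linear terms, are defined in docs/math_spec.md. For classifiers, the score is typically passed through a sigmoid (binary) or a softmax over per-class scores (multiclass).

```latex
% Factorization Machine (degree 2): one latent vector v_i \in \mathbb{R}^k per feature
\hat{y}_{\mathrm{FM}}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j

% Field-aware FM: feature i keeps one latent vector per field, v_{i,f} \in \mathbb{R}^k
\hat{y}_{\mathrm{FFM}}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_{i,f(j)}, \mathbf{v}_{j,f(i)} \rangle \, x_i x_j

% Field-weighted FM: one vector per feature, one scalar weight r per field pair
\hat{y}_{\mathrm{FwFM}}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} r_{f(i),f(j)} \, \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j

% Linear-time rewrite of the FM pairwise term, O(kn) instead of O(kn^2)
\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
  = \frac{1}{2} \sum_{l=1}^{k} \Big[ \Big( \sum_{i=1}^{n} v_{i,l} \, x_i \Big)^2 - \sum_{i=1}^{n} v_{i,l}^2 \, x_i^2 \Big]
```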

Status: v1.0 (stable). The public API is frozen under the SemVer contract in docs/compat_policy.md. A Rust CPU backend, parity-tested against pure-NumPy reference implementations, drives the scikit-learn-style estimators:

  • Models: FMClassifier, FMRegressor, FFMClassifier and FFMRegressor (binary, multiclass softmax and regression), plus FwFMClassifier (Field-weighted FM).
  • Optimizers: SGD, AdaGrad, Adam and FTRL-Proximal. FTRL's L1 penalties (l1_linear / l1_factors) yield exact-zero weights.
  • Training: mini-batch gradient averaging (batch_size), multi-core training via rayon (n_jobs), early stopping for every model/task combination, partial_fit / warm_start streaming, sample_weight / class_weight, and label_smoothing.
  • Utilities: a CategoricalEncoder, top_interactions for model inspection, and save_model / load_model.
  • Interoperability: the estimators are compatible with scikit-learn's check_estimator, so they drop into Pipeline / GridSearchCV / CalibratedClassifierCV. They accept pandas / polars DataFrames, and load_libffm / dump_libffm read and write the libffm text format.
  • GPU: an optional CUDA backend (backend="cuda") accelerates prediction and training for every model/task combination (FM/FFM/FwFM × binary/regression/multiclass) on NVIDIA GPUs with compute capability ≥ 6.0.
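To make "second-order FM" concrete, here is a minimal pure-Python sketch of one FM prediction on a dense input. It is not the package's NumPy reference or Rust kernel, and the helper names fm_predict_naive / fm_predict_fast are hypothetical. It compares the naive pairwise sum with Rendle's sum-of-squares rewrite, which makes the interaction term linear in the number of features. The comparison is the same kind of check a parity test performs.

```python
import random


def fm_predict_naive(x, w0, w, V):
    """Second-order FM score: bias + linear term + every pairwise factor interaction."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))  # <v_i, v_j>
            score += dot * x[i] * x[j]
    return score


def fm_predict_fast(x, w0, w, V):
    """Same score in O(k*n) via 0.5 * sum_l [(sum_i v_il x_i)^2 - sum_i v_il^2 x_i^2]."""
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for l in range(k):
        s = sum(V[i][l] * x[i] for i in range(n))           # sum_i v_il * x_i
        s2 = sum((V[i][l] * x[i]) ** 2 for i in range(n))   # sum_i v_il^2 * x_i^2
        score += 0.5 * (s * s - s2)
    return score


rng = random.Random(0)
n, k = 6, 4
x = [rng.choice([0.0, 1.0]) for _ in range(n)]   # binary, one-hot-style input
w0 = 0.1
w = [rng.gauss(0.0, 0.1) for _ in range(n)]
V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n)]
print(fm_predict_naive(x, w0, w, V), fm_predict_fast(x, w0, w, V))
```

For sparse one-hot rows, only the active features contribute to either sum, which is why FM prediction cost scales with the number of non-zeros rather than with the full feature count.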

Installation

pip install modern-fm        # prebuilt wheels for Linux/macOS/Windows, no Rust toolchain needed

The Linux wheels are CUDA-ready out of the box: wherever an NVIDIA driver (CUDA 12+) is present — e.g. Colab/Kaggle GPU runtimes — backend="cuda" just works; on CPU-only machines the same wheel behaves exactly like a CPU build. macOS/Windows wheels are CPU-only.

To build from source instead (e.g. on a platform without a prebuilt wheel), see Development below; it requires a Rust toolchain.

Usage

from modern_fm import FMClassifier, FFMClassifier

model = FMClassifier(
    n_factors=16,
    optimizer="adagrad",
    learning_rate=0.05,
    max_iter=100,
    batch_size=256,        # mini-batch gradient averaging (1 = per-row SGD)
    n_jobs=-1,             # train batches across all CPU cores
    l2_linear=1e-5,
    l2_factors=1e-5,
    random_state=42,
)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)

# FTRL-Proximal with L1 for sparse linear weights (classic CTR setup)
sparse = FMClassifier(optimizer="ftrl", l1_linear=1.0, batch_size=256, random_state=42)
sparse.fit(X_train, y_train)

ffm = FFMClassifier(n_factors=8, n_jobs=-1, random_state=42)
ffm.fit(X_train, y_train, field_ids=field_ids)
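The FTRL example above relies on the closed-form proximal step of FTRL-Proximal (McMahan et al., 2013), which sets a weight to exactly zero whenever its accumulated gradient stays within the L1 threshold. The following is a minimal per-coordinate sketch of the standard algorithm. The class name and the alpha / beta hyperparameters follow the paper and are assumptions here, not modern_fm parameters. This README does not document how the package maps learning_rate, l1_linear or l1_factors onto them, nor its exact FTRL variant.

```python
import math


class FTRLCoordinate:
    """One weight trained with standard FTRL-Proximal (hypothetical helper, not modern_fm's API)."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = 0.0  # accumulated gradients, adjusted by the per-coordinate step-size change
        self.n = 0.0  # accumulated squared gradients
        self.w = 0.0

    def weight(self):
        # Proximal step: while |z| <= l1 the L1 term pins the weight to an exact 0.0.
        if abs(self.z) <= self.l1:
            return 0.0
        sign = 1.0 if self.z > 0 else -1.0
        return -(self.z - sign * self.l1) / ((self.beta + math.sqrt(self.n)) / self.alpha + self.l2)

    def update(self, g):
        """Apply one gradient g of the loss w.r.t. this weight; return the new weight."""
        w = self.weight()
        sigma = (math.sqrt(self.n + g * g) - math.sqrt(self.n)) / self.alpha
        self.z += g - sigma * w
        self.n += g * g
        self.w = self.weight()
        return self.w


small = FTRLCoordinate(l1=1.0)
large = FTRLCoordinate(l1=1.0)
print(small.update(0.5), large.update(5.0))  # weak signal stays at 0.0; strong one does not
```

This is why FTRL with L1 produces genuinely sparse linear weights: rarely seen or weakly informative features never leave zero, rather than drifting to tiny non-zero values as they would under plain L1-penalized SGD.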

FMRegressor, multiclass FMClassifier (just pass a target with more than two classes), early stopping (early_stopping=True or eval_set=(X_val, y_val)) and the CategoricalEncoder are demonstrated in examples/basic_usage.py. The script benchmarks/bench_synthetic.py reports fit time and prediction throughput against the pure-NumPy reference implementation as a performance floor.

Benchmarks

On synthetic CTR data (40k train / 20k test; 16 one-hot categorical fields → 256 features) with planted pairwise interactions between field pairs — signal a linear model cannot represent — FM/FFM recover most of it. n_jobs=-1 uses all cores (8 here); absolute numbers vary by machine.
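The benchmark's generator is not reproduced in this README. As a hypothetical illustration of "planted pairwise interactions", the sketch below draws 16 categorical fields with 16 levels each (16 × 16 = 256 one-hot columns) and builds the click logit from products of per-level ±1 signs on a few field pairs. The pair list, bias and strength are invented for the example and are not the script's settings.

```python
import math
import random

N_FIELDS, N_LEVELS = 16, 16               # 16 fields x 16 levels = 256 one-hot columns
PLANTED_PAIRS = [(0, 1), (2, 3), (4, 5)]  # hypothetical interacting field pairs
BIAS, STRENGTH = -1.0, 1.5                # hypothetical base logit and interaction size


def make_ctr_rows(n_rows, seed=0):
    """Return (rows, labels); each row lists the active one-hot column of every field."""
    rng = random.Random(seed)
    # One random +/-1 sign per (field, level). A planted pair contributes sign_a * sign_b:
    # a product term that, in general, no additive per-column weighting can reproduce.
    signs = [[rng.choice((-1.0, 1.0)) for _ in range(N_LEVELS)] for _ in range(N_FIELDS)]
    rows, labels = [], []
    for _ in range(n_rows):
        levels = [rng.randrange(N_LEVELS) for _ in range(N_FIELDS)]
        logit = BIAS + STRENGTH * sum(
            signs[a][levels[a]] * signs[b][levels[b]] for a, b in PLANTED_PAIRS
        )
        p = 1.0 / (1.0 + math.exp(-logit))         # click probability
        labels.append(1 if rng.random() < p else 0)
        # Column index of the active one-hot feature: field * N_LEVELS + level.
        rows.append([f * N_LEVELS + lvl for f, lvl in enumerate(levels)])
    return rows, labels


rows, labels = make_ctr_rows(1000)
print(len(rows), sum(labels))
```

Each planted term is a rank-1 product. A factor model with a few latent dimensions can represent it exactly by giving each level's latent vector that level's sign on a dimension reserved for its pair. The field of column c is c // N_LEVELS, which is the kind of column-to-field mapping a field-aware model needs.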

| Model                                | Test AUC | Fit (s) | Predict (rows/s) |
|--------------------------------------|---------:|--------:|-----------------:|
| LogisticRegression (sklearn)         |    0.694 |    0.01 |              60M |
| FMClassifier (batch=1)               |    0.817 |    1.34 |             4.3M |
| FMClassifier (batch=512)             |    0.816 |    0.45 |             4.8M |
| FMClassifier (batch=512, n_jobs=-1)  |    0.816 |    0.33 |             5.0M |
| FFMClassifier (batch=512)            |    0.846 |    1.68 |             2.3M |
| FFMClassifier (batch=512, n_jobs=-1) |    0.846 |    1.46 |             2.1M |

  • Interactions matter: AUC climbs 0.69 → 0.82 (FM) → 0.85 (FFM) as the model captures the pairwise / field-aware structure the linear baseline misses.
  • Mini-batch: batch_size=512 trains ~3× faster than per-row SGD at equal AUC.
  • Multi-core: n_jobs=-1 adds a further ~1.2–1.4× here (more on larger/denser data).
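The batch_size parameter controls how many rows' gradients are averaged before each parameter update. The following minimal pure-Python sketch shows that mechanism on the linear part of a logistic model only, with no factors and no regularization. It is an illustration, not the library's Rust kernel, and sgd_epoch is a hypothetical helper. With batch_size=1 it reduces to per-row SGD.

```python
import math


def sgd_epoch(rows, labels, w, lr=0.1, batch_size=1):
    """One epoch of mini-batch SGD on logistic loss over sparse binary (one-hot) rows.

    rows: list of active feature indices per row. Gradients are summed over each
    mini-batch, divided by the batch size, and applied once per batch. Mutates w.
    """
    for start in range(0, len(rows), batch_size):
        batch = range(start, min(start + batch_size, len(rows)))
        grad = {}
        for r in batch:
            z = sum(w[j] for j in rows[r])       # linear score of a binary row
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - labels[r]                    # d(log loss)/dz
            for j in rows[r]:
                grad[j] = grad.get(j, 0.0) + g
        scale = lr / len(batch)                  # averaging keeps the step size batch-independent
        for j, gj in grad.items():
            w[j] -= scale * gj
    return w


rows = [[0, 1], [0, 2]]
labels = [1, 0]
w_batch = sgd_epoch(rows, labels, [0.0, 0.0, 0.0], lr=0.1, batch_size=2)
w_row = sgd_epoch(rows, labels, [0.0, 0.0, 0.0], lr=0.1, batch_size=1)
print(w_batch, w_row)
```

Averaging rather than summing means larger batches make fewer, not larger, updates per epoch. Fewer parameter writes are one plausible source of the speedup in the table, though the measured gains come from the Rust implementation, which this sketch does not model.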

Reproduce with python benchmarks/bench_vs_baseline.py. xlearn is auto-included if importable, but it does not build on every platform (it failed to build here on macOS/arm64 + CPython 3.11).

Real click data (KDD Cup 2012 sample)

On real CTR data — the KDD Cup 2012 track-2 sample from OpenML (Click_prediction_small; 200k impressions subsampled with seed 0, 9 id-categorical fields → 373k one-hot features, 4.4% CTR, stratified 80/20 split) — with libFM-style fixed hyperparameters (AdaGrad, L2 1e-4, built-in early stopping; not tuned to this benchmark):

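| Model                        | Test AUC | Fit (s) | Predict (krows/s) |
|------------------------------|---------:|--------:|------------------:|
| LogisticRegression (sklearn) |   0.6908 |     3.5 |            14,594 |
| FMClassifier (k=8)           |   0.6810 |     1.8 |             2,402 |
| FFMClassifier (k=4)          |   0.6721 |     5.1 |             1,211 |
| FwFMClassifier (k=8)         |   0.6891 |     2.8 |             2,481 |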

Honest read: this 9-field sample is dominated by rare ids (373k features for 160k training rows), so the second-order factor models only match, not beat, a well-regularized linear baseline. FwFMClassifier comes closest (0.6891 vs 0.6908 AUC) at roughly one sixth of LR's predict throughput. The planted-interaction synthetic table above shows the regime where factor models pull ahead.

Machine: macOS arm64 (Apple Silicon), Python 3.11. Reproduce with python benchmarks/bench_criteo_like.py. The original Criteo/Avazu samples are no longer publicly downloadable without credentials, so the benchmark uses this real CTR dataset via fetch_openml; details are in the script docstring.

Development

Requires Python >= 3.10 and a recent Rust toolchain (1.74+; rustup update).

python3 -m venv .venv
.venv/bin/pip install -e ".[dev]"   # builds the Rust extension via maturin
.venv/bin/pytest -q
.venv/bin/ruff check .

pip install -e . compiles rust/ and installs the extension as modern_fm._rust (maturin mixed layout, config in pyproject.toml). After editing Rust code, re-run pip install -e . to rebuild. Rust-only checks:

cd rust
PYO3_PYTHON=$PWD/../.venv/bin/python3 cargo test
PYO3_PYTHON=$PWD/../.venv/bin/python3 cargo clippy

Without the extension built, the package still works: modern_fm._backend falls back to the pure-NumPy reference implementations, and the parity tests in tests/test_rust_parity.py are skipped.
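The fallback described above can be sketched as "try the compiled extension, otherwise use the NumPy path". The following is a minimal, hypothetical version of that pattern. load_backend and the "numpy-reference" marker are illustrative names, and the real logic in modern_fm._backend may differ.

```python
import importlib
from types import ModuleType
from typing import Optional, Tuple


def load_backend(candidates=("modern_fm._rust",)) -> Tuple[str, Optional[ModuleType]]:
    """Return (name, module) for the first importable compiled backend.

    Returns ("numpy-reference", None) when none imports, signalling callers to
    use the pure-NumPy code paths instead.
    """
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:  # includes ModuleNotFoundError (missing package or unbuilt extension)
            continue
    return "numpy-reference", None


name, module = load_backend()
print("using backend:", name)
```

On the test side, pytest.importorskip("modern_fm._rust") is one common way to skip parity tests when the extension is absent. This README does not say whether tests/test_rust_parity.py uses exactly that mechanism.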

Design documents live in docs/ — start with docs/requirements.md and docs/math_spec.md. The roadmap is in docs/roadmap.md.

Download files

Download the file for your platform.

Source distribution

  • modern_fm-1.1.1.tar.gz (90.1 kB)

Built distributions (CPython 3.10+, abi3)

  • modern_fm-1.1.1-cp310-abi3-win_amd64.whl (403.3 kB): Windows x86-64
  • modern_fm-1.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (611.6 kB): manylinux, glibc 2.17+, x86-64
  • modern_fm-1.1.1-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (849.6 kB): macOS universal2 (x86-64 on macOS 10.12+, ARM64 on macOS 11.0+)
