Fast, sklearn-compatible Factorization Machines and Field-aware Factorization Machines

modern_fm

Fast, sklearn-compatible Factorization Machines (FM) and Field-aware Factorization Machines (FFM) for Python.

Documentation: https://matapanino.github.io/modern_fm/ — install, quickstart, API reference, math specs.
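
For orientation, the standard second-order FM model (Rendle, 2010) scores a row x as w0 + Σ w_i x_i + Σ_{i<j} <v_i, v_j> x_i x_j. The sketch below is a minimal pure-Python illustration of that formula and of the well-known O(k·n) rewrite of the pairwise sum. It is not modern_fm's implementation; the function names are hypothetical, and the math specs in the documentation remain the authority on what the library computes.

```python
from typing import Sequence


def fm_score(
    x: Sequence[float],
    w0: float,
    w: Sequence[float],
    V: Sequence[Sequence[float]],
) -> float:
    """Second-order FM score for one dense row (hypothetical helper).

    V[i] is the k-dimensional latent vector of feature i. The pairwise term uses
        sum_{i<j} <v_i, v_j> x_i x_j
            = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    which costs O(k*n) instead of O(k*n^2).
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_score_naive(x, w0, w, V) -> float:
    """Direct O(k*n^2) double loop over feature pairs, for cross-checking."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score


x = [1.0, 0.0, 1.0]
w0, w = 0.1, [0.2, -0.3, 0.4]
V = [[0.1, 0.2], [0.3, 0.4], [0.5, -0.6]]
print(fm_score(x, w0, w, V), fm_score_naive(x, w0, w, V))  # both ≈ 0.63
```

Only the rewritten form scales to the hundreds of thousands of one-hot features used in the benchmarks below; the naive loop is there purely to check the identity.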

Status: v1.0 (stable). The public API is frozen under the SemVer contract in docs/compat_policy.md.

A Rust CPU backend, parity-tested against pure-NumPy reference implementations, drives five scikit-learn-style estimators: FMClassifier, FMRegressor, FFMClassifier and FFMRegressor (together covering binary classification, multiclass softmax classification and regression), plus FwFMClassifier (Field-weighted FM). Highlights:

  • Optimizers: SGD, AdaGrad, Adam and FTRL-Proximal. FTRL's L1 penalties (l1_linear / l1_factors) yield exact-zero weights.
  • Mini-batch gradient averaging (batch_size) and multi-core training via rayon (n_jobs).
  • Early stopping for every estimator and task.
  • Streaming training with partial_fit / warm_start.
  • sample_weight, class_weight and label_smoothing.
  • A CategoricalEncoder, top_interactions for model inspection, and save_model / load_model.
  • scikit-learn check_estimator compatibility, so the estimators drop into Pipeline, GridSearchCV and CalibratedClassifierCV.
  • pandas and polars DataFrame inputs.
  • load_libffm / dump_libffm to read and write the libffm text format.
  • An optional CUDA backend (backend="cuda") that accelerates prediction and training for every model (FM, FFM, FwFM) and task (binary, regression, multiclass) on NVIDIA GPUs with compute capability 6.0 or higher.
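
Why FTRL's L1 gives exact zeros follows from the closed-form FTRL-Proximal weight (McMahan et al., 2013): a coordinate whose accumulated gradient z_i stays within the L1 strength is set to exactly 0. Below is a minimal per-coordinate sketch for linear weights. The class name and the alpha/beta defaults are illustrative assumptions, and l1/l2 here only play the role that l1_linear/l2_linear play in the estimators; they are not guaranteed to match the library's exact semantics.

```python
import math


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for a linear weight vector (illustrative)."""

    def __init__(self, n_features, alpha=0.1, beta=1.0, l1=1.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * n_features  # accumulated (adjusted) gradients
        self.n = [0.0] * n_features  # accumulated squared gradients

    def weight(self, i):
        """Closed-form weight; returns exactly 0.0 while |z_i| <= l1."""
        z_i = self.z[i]
        if abs(z_i) <= self.l1:
            return 0.0
        sign = 1.0 if z_i > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        return -(z_i - sign * self.l1) / denom

    def update(self, i, g):
        """Fold the loss gradient g w.r.t. weight i into the accumulators."""
        w_i = self.weight(i)
        sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
        self.z[i] += g - sigma * w_i
        self.n[i] += g * g


opt = FTRLProximal(n_features=3, l1=1.0)
for _ in range(5):
    opt.update(0, 0.1)  # weak signal: |z_0| reaches only 0.5, below l1
for _ in range(10):
    opt.update(1, 1.0)  # strong, consistent gradient pushes |z_1| past l1
print([opt.weight(i) for i in range(3)])  # weights 0 and 2 are exactly 0.0; weight 1 is negative
```

The zero is a true 0.0, not a small float, which is what makes FTRL attractive for very wide one-hot CTR feature spaces.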

Installation

pip install modern-fm        # prebuilt wheels for Linux/macOS/Windows, no Rust toolchain needed

The Linux wheels are CUDA-ready out of the box: wherever an NVIDIA driver (CUDA 12+) is present — e.g. Colab/Kaggle GPU runtimes — backend="cuda" just works; on CPU-only machines the same wheel behaves exactly like a CPU build. macOS/Windows wheels are CPU-only.

To build from source instead (e.g. on a platform without a prebuilt wheel), see Development below; it requires a Rust toolchain.

Usage

from modern_fm import FMClassifier, FFMClassifier

# X_train, y_train, X_test (and field_ids for FFM) are assumed to be prepared already

model = FMClassifier(
    n_factors=16,
    optimizer="adagrad",
    learning_rate=0.05,
    max_iter=100,
    batch_size=256,        # mini-batch gradient averaging (1 = per-row SGD)
    n_jobs=-1,             # train batches across all CPU cores
    l2_linear=1e-5,
    l2_factors=1e-5,
    random_state=42,
)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)

# FTRL-Proximal with L1 for sparse linear weights (classic CTR setup)
sparse = FMClassifier(optimizer="ftrl", l1_linear=1.0, batch_size=256, random_state=42)
sparse.fit(X_train, y_train)

ffm = FFMClassifier(n_factors=8, n_jobs=-1, random_state=42)
ffm.fit(X_train, y_train, field_ids=field_ids)
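
FFMClassifier needs field_ids because an FFM keeps one latent vector per (feature, field) pair: when features i and j interact, i uses its vector for j's field and j uses its vector for i's field (the formulation of Juan et al., 2016). The sketch below scores one sparse row that way. It assumes a per-feature field map field_of[j]; whether modern_fm's field_ids argument takes exactly this shape is an assumption, so check the API reference. ffm_score is a hypothetical helper.

```python
from typing import Sequence

Row = Sequence[tuple[int, float]]  # (feature_index, value) pairs for non-zero features


def ffm_score(
    row: Row,
    field_of: Sequence[int],
    w0: float,
    w: Sequence[float],
    V: Sequence[Sequence[Sequence[float]]],
) -> float:
    """Field-aware FM score for one sparse row (hypothetical helper).

    field_of[j]: field that feature j belongs to
    V[j][f]:     latent vector feature j uses when it meets a feature of field f
    """
    score = w0 + sum(w[j] * x for j, x in row)
    for a in range(len(row)):
        i, x_i = row[a]
        for b in range(a + 1, len(row)):
            j, x_j = row[b]
            v_i = V[i][field_of[j]]  # i's vector for j's field
            v_j = V[j][field_of[i]]  # j's vector for i's field
            score += sum(p * q for p, q in zip(v_i, v_j)) * x_i * x_j
    return score


# 3 features in 2 fields: feature 0 -> field 0, features 1 and 2 -> field 1
field_of = [0, 1, 1]
V = [
    [[0.0, 0.0], [0.5, 0.5]],  # feature 0: vectors for field 0, field 1
    [[0.0, 0.0], [0.0, 0.0]],  # feature 1
    [[1.0, 2.0], [9.0, 9.0]],  # feature 2 (its field-1 vector is unused below)
]
print(ffm_score([(0, 1.0), (2, 1.0)], field_of, 0.0, [0.1, 0.2, 0.3], V))  # ≈ 1.9
```

The extra per-field vectors are why FFM has roughly F times more latent parameters than FM for F fields, which matches its slower fit and predict numbers in the benchmarks below.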

FMRegressor, multiclass FMClassifier (just pass a target with more than 2 classes), early stopping (early_stopping=True or eval_set=(X_val, y_val)) and the CategoricalEncoder are demonstrated in examples/basic_usage.py. The benchmarks/bench_synthetic.py script reports fit time and predict throughput against the pure-NumPy reference implementation, which serves as the speed floor.
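
As a rough picture of what categorical encoding produces for FM/FFM input: each (field, value) pair seen in training gets its own one-hot feature index, and every feature remembers which field it came from. The sketch below is a hypothetical stand-in; it does not reproduce modern_fm's CategoricalEncoder API or its handling of unseen values (here they are simply dropped).

```python
from collections.abc import Hashable, Sequence


def fit_vocab(rows: Sequence[Sequence[Hashable]]) -> list[dict]:
    """Give every (field, value) pair seen in training its own global feature index."""
    vocab: list[dict] = [{} for _ in range(len(rows[0]))]
    next_index = 0
    for f, mapping in enumerate(vocab):
        for row in rows:
            if row[f] not in mapping:
                mapping[row[f]] = next_index
                next_index += 1
    return vocab


def feature_fields(vocab: list[dict]) -> list[int]:
    """field_of[j] is the field that global feature j came from."""
    field_of = [0] * sum(len(mapping) for mapping in vocab)
    for f, mapping in enumerate(vocab):
        for j in mapping.values():
            field_of[j] = f
    return field_of


def encode(row: Sequence[Hashable], vocab: list[dict]) -> list[int]:
    """Active one-hot feature indices for one row; unseen values are dropped."""
    return [vocab[f][v] for f, v in enumerate(row) if v in vocab[f]]


train = [["mobile", "US"], ["desktop", "US"], ["mobile", "DE"]]
vocab = fit_vocab(train)
print(encode(["desktop", "DE"], vocab))  # [1, 3]
print(feature_fields(vocab))             # [0, 0, 1, 1]
```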

Benchmarks

On synthetic CTR data (40k train / 20k test; 16 one-hot categorical fields → 256 features) with planted pairwise interactions between field pairs — signal a linear model cannot represent — FM/FFM recover most of it. n_jobs=-1 uses all cores (8 here); absolute numbers vary by machine.

| Model | Test AUC | Fit (s) | Predict (rows/s) |
|---|---:|---:|---:|
| LogisticRegression (sklearn) | 0.694 | 0.01 | 60M |
| FMClassifier (batch=1) | 0.817 | 1.34 | 4.3M |
| FMClassifier (batch=512) | 0.816 | 0.45 | 4.8M |
| FMClassifier (batch=512, n_jobs=-1) | 0.816 | 0.33 | 5.0M |
| FFMClassifier (batch=512) | 0.846 | 1.68 | 2.3M |
| FFMClassifier (batch=512, n_jobs=-1) | 0.846 | 1.46 | 2.1M |

  • Interactions matter: AUC climbs 0.69 → 0.82 (FM) → 0.85 (FFM) as the model captures the pairwise / field-aware structure the linear baseline misses.
  • Mini-batch: batch_size=512 trains ~3× faster than per-row SGD at equal AUC.
  • Multi-core: n_jobs=-1 cuts fit time by a further ~1.15–1.35× here (0.45 s → 0.33 s for FM, 1.68 s → 1.46 s for FFM), with larger gains on larger/denser data.
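
Mini-batch gradient averaging, as described for batch_size, means computing the loss gradient for every row in a batch, averaging, and applying one update per batch instead of one per row (batch_size=1). Below is a minimal sketch for a dense-input FM with logistic loss and plain SGD. It is illustrative only (modern_fm's kernels are in Rust and support other optimizers) and every name is hypothetical. The factor gradient uses ∂ŷ/∂v_if = x_i·(Σ_j v_jf x_j) − v_if·x_i².

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x, w0, w, V):
    """FM score of a dense row, plus the per-factor sums s_f = sum_i v_if * x_i."""
    n, k = len(x), len(V[0])
    s = [sum(V[i][f] * x[i] for i in range(n)) for f in range(k)]
    pair = 0.0
    for f in range(k):
        pair += 0.5 * (s[f] ** 2 - sum((V[i][f] * x[i]) ** 2 for i in range(n)))
    return w0 + sum(w[i] * x[i] for i in range(n)) + pair, s


def minibatch_step(batch, w0, w, V, lr):
    """One SGD update with the logistic-loss gradient averaged over the batch."""
    n, k, m = len(w), len(V[0]), len(batch)
    g0, gw = 0.0, [0.0] * n
    gV = [[0.0] * k for _ in range(n)]
    for x, y in batch:
        y_hat, s = forward(x, w0, w, V)
        d = sigmoid(y_hat) - y  # dLoss/dy_hat for labels y in {0, 1}
        g0 += d
        for i in range(n):
            if x[i] == 0.0:
                continue  # absent features receive no gradient
            gw[i] += d * x[i]
            for f in range(k):
                gV[i][f] += d * (x[i] * s[f] - V[i][f] * x[i] ** 2)
    # average over the batch, then take a single step
    w0 = w0 - lr * g0 / m
    w = [w[i] - lr * gw[i] / m for i in range(n)]
    V = [[V[i][f] - lr * gV[i][f] / m for f in range(k)] for i in range(n)]
    return w0, w, V


def log_loss(data, w0, w, V):
    total = 0.0
    for x, y in data:
        p = sigmoid(forward(x, w0, w, V)[0])
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


# toy data where the label simply follows feature 0
data = [([1.0, 0.0, 1.0], 1), ([1.0, 1.0, 0.0], 1), ([0.0, 1.0, 1.0], 0), ([0.0, 0.0, 1.0], 0)]
params = (0.0, [0.0, 0.0, 0.0], [[0.1, -0.1], [0.05, 0.2], [-0.2, 0.1]])
before = log_loss(data, *params)
for _ in range(100):  # epochs
    for start in range(0, len(data), 2):  # batch_size = 2
        params = minibatch_step(data[start:start + 2], *params, lr=0.1)
after = log_loss(data, *params)
print(before, after)  # the averaged-gradient updates should lower the loss
```

Because the update is an average rather than a sum, a batch of two identical rows moves the parameters exactly as far as that single row would, so the learning rate keeps roughly the same meaning across batch sizes. The speedup comes from doing one parameter write per batch instead of per row.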

Reproduce with python benchmarks/bench_vs_baseline.py. xlearn is auto-included if importable, but it does not build on every platform (it failed to build here on macOS/arm64 + CPython 3.11).
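
To experiment with the same regime without the benchmark script, here is a hypothetical generator in the spirit of the synthetic setup described above: 16 categorical fields with 16 values each (256 one-hot features), with labels driven only by interaction tables on a few field pairs. It is not the code in benchmarks/bench_vs_baseline.py; the pair count, effect size and function name are arbitrary assumptions.

```python
import math
import random


def make_planted_ctr(n_rows, n_fields=16, n_values=16, n_pairs=4, seed=0):
    """Hypothetical generator: one active value per field, labels from field-pair tables.

    Returns (rows, labels, pairs); each row lists its active one-hot feature
    indices, with feature index = field * n_values + value.
    """
    rng = random.Random(seed)
    fields = list(range(n_fields))
    pairs = [tuple(rng.sample(fields, 2)) for _ in range(n_pairs)]
    # One random +/-1.5 effect per (value_a, value_b) cell of each planted pair.
    # Such a table is in general not a sum of per-field effects, so a linear
    # model over the one-hot features can capture only part of the signal.
    tables = [
        [[rng.choice((-1.5, 1.5)) for _ in range(n_values)] for _ in range(n_values)]
        for _ in pairs
    ]
    rows, labels = [], []
    for _ in range(n_rows):
        values = [rng.randrange(n_values) for _ in fields]
        logit = sum(t[values[a]][values[b]] for (a, b), t in zip(pairs, tables))
        p = 1.0 / (1.0 + math.exp(-logit))
        rows.append([f * n_values + v for f, v in enumerate(values)])
        labels.append(1 if rng.random() < p else 0)
    return rows, labels, pairs


rows, labels, pairs = make_planted_ctr(1000)
print(len(rows), sum(labels) / len(labels), pairs)
```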

Real click data (KDD Cup 2012 sample)

On real CTR data — the KDD Cup 2012 track-2 sample from OpenML (Click_prediction_small; 200k impressions subsampled with seed 0, 9 id-categorical fields → 373k one-hot features, 4.4% CTR, stratified 80/20 split) — with libFM-style fixed hyperparameters (AdaGrad, L2 1e-4, built-in early stopping; not tuned to this benchmark):

| Model | Test AUC | Fit (s) | Predict (krows/s) |
|---|---:|---:|---:|
| LogisticRegression (sklearn) | 0.6908 | 3.5 | 14,594 |
| FMClassifier (k=8) | 0.6810 | 1.8 | 2,402 |
| FFMClassifier (k=4) | 0.6721 | 5.1 | 1,211 |
| FwFMClassifier (k=8) | 0.6891 | 2.8 | 2,481 |

Honest read: this 9-field sample is dominated by rare ids (373k features for 160k training rows), so the second-order factor models at best roughly match a well-regularized linear baseline and do not beat it. FwFMClassifier comes closest (0.6891 vs 0.6908 AUC) at about a sixth of LR's predict throughput. The planted-interaction synthetic table above shows the regime where factor models pull ahead. Machine: macOS arm64 (Apple Silicon), Python 3.11. Reproduce with python benchmarks/bench_criteo_like.py; because the original Criteo/Avazu samples are no longer publicly downloadable without credentials, the script uses this real CTR dataset via fetch_openml instead (details in the script docstring).
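
FwFMClassifier's model, in the formulation of Pan et al. (2018), keeps FM's single latent vector per feature and scales each pairwise interaction by a learned weight r for the pair of fields involved. That adds at most F×F parameters over FM (for F fields), whereas FFM roughly multiplies the latent-vector count by F. Below is a minimal scoring sketch under that assumption, with hypothetical names; the library's exact parameterization (for example, whether r is constrained to be symmetric) may differ.

```python
from typing import Sequence


def fwfm_score(
    row: Sequence[tuple[int, float]],
    field_of: Sequence[int],
    w0: float,
    w: Sequence[float],
    V: Sequence[Sequence[float]],
    r: Sequence[Sequence[float]],
) -> float:
    """Field-weighted FM score for one sparse row (hypothetical helper).

    V[j]:    the single latent vector of feature j (as in plain FM)
    r[f][g]: learned weight of the interaction between fields f and g
    """
    score = w0 + sum(w[j] * x for j, x in row)
    for a in range(len(row)):
        i, x_i = row[a]
        for b in range(a + 1, len(row)):
            j, x_j = row[b]
            dot = sum(p * q for p, q in zip(V[i], V[j]))
            score += r[field_of[i]][field_of[j]] * dot * x_i * x_j
    return score


field_of = [0, 1, 2]
V = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
r = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]  # fields 0-1 half-weighted, field 2 muted
row = [(0, 1.0), (1, 1.0), (2, 1.0)]
print(fwfm_score(row, field_of, 0.0, [0.0, 0.0, 0.0], V, r))  # 0.5 * <V[0], V[1]> = 5.5
```

With every r entry set to 1, the score reduces to plain FM, which is one way to see FwFM as a cheap middle ground between FM and FFM.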

Development

Requires Python >= 3.10 and a recent Rust toolchain (1.74+; rustup update).

python3 -m venv .venv
.venv/bin/pip install -e ".[dev]"   # builds the Rust extension via maturin
.venv/bin/pytest -q
.venv/bin/ruff check .

pip install -e . compiles rust/ and installs the extension as modern_fm._rust (maturin mixed layout, config in pyproject.toml). After editing Rust code, re-run pip install -e . to rebuild. Rust-only checks:

cd rust
PYO3_PYTHON=$PWD/../.venv/bin/python3 cargo test
PYO3_PYTHON=$PWD/../.venv/bin/python3 cargo clippy

Without the extension built, the package still works: modern_fm._backend falls back to the pure-NumPy reference implementations, and the parity tests in tests/test_rust_parity.py are skipped.
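
The fallback described above is the common optional-extension pattern: try to import the compiled module and, if that fails, use the reference code. Below is a minimal sketch with importlib. The helper name and the module names in the usage lines are hypothetical; this is not modern_fm._backend's actual code.

```python
import importlib
import math


def load_backend(native_name: str, fallback: object) -> tuple[object, bool]:
    """Return (backend, is_native): the compiled module if importable, else the fallback."""
    try:
        return importlib.import_module(native_name), True
    except ImportError:  # also covers ModuleNotFoundError
        return fallback, False


# A module that exists is used as-is ...
backend, native = load_backend("math", fallback=None)
print(backend is math, native)  # True True

# ... while a missing compiled extension falls back to the reference implementation.
backend, native = load_backend("hypothetical_missing_ext", fallback="numpy-reference")
print(backend, native)  # numpy-reference False
```

On the test side, pytest.importorskip("modern_fm._rust") is one standard way to skip parity tests when the extension is absent; whether tests/test_rust_parity.py uses exactly that call is not stated here.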

Design documents live in docs/ — start with docs/requirements.md and docs/math_spec.md. The roadmap is in docs/roadmap.md.


Download files

Source Distribution

modern_fm-1.1.0.tar.gz (90.1 kB)

Uploaded Source

Built Distributions

modern_fm-1.1.0-cp310-abi3-win_amd64.whl (402.7 kB)

Uploaded CPython 3.10+, Windows x86-64

modern_fm-1.1.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (611.7 kB)

Uploaded CPython 3.10+, manylinux: glibc 2.17+ x86-64

modern_fm-1.1.0-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (849.3 kB)

Uploaded CPython 3.10+, macOS 10.12+ universal2 (ARM64, x86-64), macOS 10.12+ x86-64, macOS 11.0+ ARM64

File details

Details for the file modern_fm-1.1.0.tar.gz.

File metadata

  • Download URL: modern_fm-1.1.0.tar.gz
  • Size: 90.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for modern_fm-1.1.0.tar.gz
Algorithm Hash digest
SHA256 7b9149666cf0d2493d4e88bd999b4862369ad3070043a4d4d3afa15cdd720f89
MD5 7ae0c248f8661f03319ba578fc7b284d
BLAKE2b-256 c4101389bbb653a02cdd5ef46f272488036d19f77c46de766b3cb379b6322041

Provenance

The following attestation bundles were made for modern_fm-1.1.0.tar.gz:

Publisher: release.yml on Matapanino/modern_fm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file modern_fm-1.1.0-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: modern_fm-1.1.0-cp310-abi3-win_amd64.whl
  • Size: 402.7 kB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for modern_fm-1.1.0-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 bedd45586be8c14b05eb2bb466f282d4f32d6ffe1ba975f0a45fcd032714dd66
MD5 27fd214a653e018ac63b4526f5bbda17
BLAKE2b-256 e9773fbbcc4ff7dbf135b34ff3233cdbf6a335b68a0a179526be694c606a3e10

Provenance

The following attestation bundles were made for modern_fm-1.1.0-cp310-abi3-win_amd64.whl:

Publisher: release.yml on Matapanino/modern_fm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file modern_fm-1.1.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for modern_fm-1.1.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 b13090c09f94b047e87db62f93b1cbb7c912f1a4c4e349339e4e4f4400c2088a
MD5 967a9891c050fe4fc79a94c57bc0ba34
BLAKE2b-256 108cb4995cc089ea3475aed012dc433d375ddc1856fb724f9d8092fba0cab5ef

Provenance

The following attestation bundles were made for modern_fm-1.1.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on Matapanino/modern_fm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file modern_fm-1.1.0-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File hashes

Hashes for modern_fm-1.1.0-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 71d399c7af36846d40cf284fa6636a2b59d874ca13ffae09e1c27713a65c862a
MD5 1657f4979a6d165abf973425e911abff
BLAKE2b-256 b64ecb2fbf3e2ea061534abee9ac7a346506e2cc237e1f0f60407dadb70684a1

Provenance

The following attestation bundles were made for modern_fm-1.1.0-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: release.yml on Matapanino/modern_fm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
