
Fast, sklearn-compatible Factorization Machines and Field-aware Factorization Machines


modern_fm

License: MIT

Fast, sklearn-compatible Factorization Machines (FM) and Field-aware Factorization Machines (FFM) for Python.

Documentation: https://matapanino.github.io/modern_fm/ — install, quickstart, API reference, math specs.

Status: v1.0 (stable). The public API is frozen under the SemVer contract in docs/compat_policy.md.

  • Backend: a Rust CPU backend, parity-tested against pure-NumPy reference implementations, drives the sklearn-style estimators.
  • Estimators: FMClassifier, FMRegressor, FFMClassifier, FFMRegressor (binary, multiclass softmax, and regression) and FwFMClassifier (Field-weighted FM).
  • Optimizers: SGD, AdaGrad, Adam, and FTRL-Proximal. FTRL's L1 (l1_linear / l1_factors) yields exact-zero weights.
  • Training: mini-batch gradient averaging (batch_size), multi-core training via rayon (n_jobs), early stopping for every cell, partial_fit / warm_start streaming, sample_weight / class_weight, and label_smoothing.
  • Tooling: a CategoricalEncoder, top_interactions model inspection, and save_model / load_model.
  • Interop: the estimators are scikit-learn check_estimator-compatible (drop into Pipeline / GridSearchCV / CalibratedClassifierCV) and accept pandas / polars DataFrames. load_libffm / dump_libffm read and write the libffm text format.
  • GPU: an optional CUDA backend (backend="cuda", a source-build feature) accelerates FM/FFM prediction and FM/FFM binary/regression training on NVIDIA GPUs (compute capability ≥ 6.0).
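The exact-zero behaviour of FTRL's L1 follows from the closed-form per-coordinate solution of FTRL-Proximal (McMahan et al., 2013). The sketch below is a minimal pure-Python illustration of that textbook update, not modern_fm's Rust implementation. The names alpha, beta, ftrl_weight, and ftrl_step are assumptions made for this example, and l1 only stands in for the role that l1_linear / l1_factors play in the estimators.

```python
import math


def ftrl_weight(z, n, alpha=0.1, beta=1.0, l1=1.0, l2=0.0):
    """Closed-form FTRL-Proximal weight for a single coordinate."""
    if abs(z) <= l1:
        # Soft-threshold: the weight is exactly 0.0, not merely small.
        return 0.0
    sign = 1.0 if z > 0 else -1.0
    return -(z - sign * l1) / ((beta + math.sqrt(n)) / alpha + l2)


def ftrl_step(z, n, grad, alpha=0.1, beta=1.0, l1=1.0, l2=0.0):
    """Fold one gradient into the accumulators (z, n) and return them."""
    w = ftrl_weight(z, n, alpha, beta, l1, l2)
    sigma = (math.sqrt(n + grad * grad) - math.sqrt(n)) / alpha
    return z + grad - sigma * w, n + grad * grad


# A coordinate with weak, conflicting gradients never leaves the L1 dead zone.
z, n = 0.0, 0.0
for g in [0.3, -0.2, 0.4]:
    z, n = ftrl_step(z, n, g)
weak = ftrl_weight(z, n)

# A coordinate with a consistent gradient accumulates |z| > l1 and becomes nonzero.
z, n = 0.0, 0.0
for g in [-1.0] * 5:
    z, n = ftrl_step(z, n, g)
strong = ftrl_weight(z, n)

print(weak, strong)
```

Because the threshold is applied to the accumulator z rather than to the weight itself, a coordinate stays at exactly zero until its accumulated gradient exceeds l1 in magnitude. Plain SGD with an L1 penalty does not guarantee this.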

Installation

pip install modern-fm        # prebuilt wheels for Linux/macOS/Windows, no Rust toolchain needed

To build from source instead (e.g. on a platform without a prebuilt wheel), see Development below; it requires a Rust toolchain.

Usage

# Assumes X_train, y_train, X_test, and field_ids already exist in scope.
from modern_fm import FMClassifier, FFMClassifier

model = FMClassifier(
    n_factors=16,
    optimizer="adagrad",
    learning_rate=0.05,
    max_iter=100,
    batch_size=256,        # mini-batch gradient averaging (1 = per-row SGD)
    n_jobs=-1,             # train batches across all CPU cores
    l2_linear=1e-5,
    l2_factors=1e-5,
    random_state=42,
)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)

# FTRL-Proximal with L1 for sparse linear weights (classic CTR setup)
sparse = FMClassifier(optimizer="ftrl", l1_linear=1.0, batch_size=256, random_state=42)
sparse.fit(X_train, y_train)

ffm = FFMClassifier(n_factors=8, n_jobs=-1, random_state=42)
ffm.fit(X_train, y_train, field_ids=field_ids)
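To show what field_ids contributes, here is a minimal pure-Python sketch of the standard field-aware FM score, in which feature i uses a different latent vector depending on the field of the feature it interacts with. It assumes field_ids holds one field id per feature column. That layout, and the names ffm_score, w0, w, and V, are assumptions for illustration and not a description of modern_fm's internals (the library's own formulation is in docs/math_spec.md).

```python
def ffm_score(x, field_ids, w0, w, V):
    """Field-aware FM score for one sparse row.

    x         : dict {feature_index: value} of active features
    field_ids : field_ids[i] is the field of feature i (assumed layout)
    V         : V[i][f] is the latent vector feature i uses against field f
    """
    score = w0 + sum(w[i] * xi for i, xi in x.items())
    active = sorted(x)
    for a, i in enumerate(active):
        for j in active[a + 1:]:
            vi = V[i][field_ids[j]]  # feature i, as seen by j's field
            vj = V[j][field_ids[i]]  # feature j, as seen by i's field
            score += sum(p * q for p, q in zip(vi, vj)) * x[i] * x[j]
    return score


# Three one-hot features in two fields, latent dimension 2.
field_ids = [0, 0, 1]
w0 = 0.1
w = [0.2, 0.0, 0.3]
V = [
    [[0.0, 0.0], [1.0, 2.0]],    # feature 0: vectors toward field 0, field 1
    [[0.0, 0.0], [0.0, 0.0]],    # feature 1
    [[0.5, 0.25], [0.0, 0.0]],   # feature 2
]
row = {0: 1.0, 2: 1.0}           # features 0 and 2 are active
score = ffm_score(row, field_ids, w0, w, V)
print(score)
```

A plain FM is the special case where every feature has a single latent vector shared across all fields, which is why FFM has more parameters and is slower to fit.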

FMRegressor, multiclass FMClassifier (just pass a target with >2 classes), early stopping (early_stopping=True or eval_set=(X_val, y_val)), and the CategoricalEncoder are demonstrated in examples/basic_usage.py. benchmarks/bench_synthetic.py reports fit time and predict throughput against the NumPy reference floor.

Benchmarks

On synthetic CTR data (40k train / 20k test; 16 one-hot categorical fields → 256 features) with planted pairwise interactions between field pairs — signal a linear model cannot represent — FM/FFM recover most of it. n_jobs=-1 uses all cores (8 here); absolute numbers vary by machine.

| Model | Test AUC | Fit (s) | Predict (rows/s) |
|---|---|---|---|
| LogisticRegression (sklearn) | 0.694 | 0.01 | 60M |
| FMClassifier (batch=1) | 0.817 | 1.34 | 4.3M |
| FMClassifier (batch=512) | 0.816 | 0.45 | 4.8M |
| FMClassifier (batch=512, n_jobs=-1) | 0.816 | 0.33 | 5.0M |
| FFMClassifier (batch=512) | 0.846 | 1.68 | 2.3M |
| FFMClassifier (batch=512, n_jobs=-1) | 0.846 | 1.46 | 2.1M |

  • Interactions matter: AUC climbs 0.69 → 0.82 (FM) → 0.85 (FFM) as the model captures the pairwise / field-aware structure the linear baseline misses.
  • Mini-batch: batch_size=512 trains ~3× faster than per-row SGD at equal AUC.
  • Multi-core: n_jobs=-1 adds a further ~1.2–1.4× here (more on larger/denser data).
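The speedup figures in the bullets can be rechecked directly from the fit times in the table. The short script below only repeats that arithmetic. The dictionary keys are labels made up for this example.

```python
# Fit times in seconds, copied from the synthetic benchmark table.
fit_seconds = {
    "fm_batch1": 1.34,
    "fm_batch512": 0.45,
    "fm_batch512_all_cores": 0.33,
    "ffm_batch512": 1.68,
    "ffm_batch512_all_cores": 1.46,
}

# Mini-batch vs per-row SGD (FM).
minibatch_speedup = fit_seconds["fm_batch1"] / fit_seconds["fm_batch512"]

# Additional gain from n_jobs=-1 on top of batch_size=512.
fm_multicore = fit_seconds["fm_batch512"] / fit_seconds["fm_batch512_all_cores"]
ffm_multicore = fit_seconds["ffm_batch512"] / fit_seconds["ffm_batch512_all_cores"]

print(f"mini-batch: {minibatch_speedup:.2f}x")
print(f"multi-core: FM {fm_multicore:.2f}x, FFM {ffm_multicore:.2f}x")
```

The ratios come out at roughly 2.98× for mini-batching and 1.36× (FM) and 1.15× (FFM) for multi-core, consistent with the rounded figures quoted in the bullets.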

Reproduce with python benchmarks/bench_vs_baseline.py. xlearn is auto-included if importable, but it does not build on every platform (it failed to build here on macOS/arm64 + CPython 3.11).

Real click data (KDD Cup 2012 sample)

On real CTR data — the KDD Cup 2012 track-2 sample from OpenML (Click_prediction_small; 200k impressions subsampled with seed 0, 9 id-categorical fields → 373k one-hot features, 4.4% CTR, stratified 80/20 split) — with libFM-style fixed hyperparameters (AdaGrad, L2 1e-4, built-in early stopping; not tuned to this benchmark):

| Model | Test AUC | Fit (s) | Predict (krows/s) |
|---|---|---|---|
| LogisticRegression (sklearn) | 0.6908 | 3.5 | 14,594 |
| FMClassifier (k=8) | 0.6810 | 1.8 | 2,402 |
| FFMClassifier (k=4) | 0.6721 | 5.1 | 1,211 |
| FwFMClassifier (k=8) | 0.6891 | 2.8 | 2,481 |

Honest read: this 9-field sample is dominated by rare ids (373k features for 160k train rows), so second-order factor models at best approach, and do not beat, a well-regularized linear baseline. FwFMClassifier comes closest (AUC 0.6891 vs 0.6908 for LR) at roughly one sixth of LR's predict throughput (2,481 vs 14,594 krows/s). The planted-interaction synthetic table above shows the regime where factor models pull ahead. Machine: macOS arm64 (Apple Silicon), Python 3.11. Reproduce with python benchmarks/bench_criteo_like.py. The original Criteo/Avazu samples are no longer publicly downloadable without credentials, so the bench uses this real CTR dataset via fetch_openml (details in the script docstring).
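A quick back-of-the-envelope calculation shows why rare ids dominate this sample. It assumes each row activates exactly one feature per field and that the rounded 373k feature count applies to the training columns. Both are assumptions for this sketch, not facts stated by the benchmark script.

```python
n_rows = 200_000
train_rows = n_rows * 8 // 10          # 80/20 split -> 160,000 training rows
n_fields = 9
n_features = 373_000                   # rounded count quoted in the text

# One active one-hot feature per field per row (assumption).
total_activations = train_rows * n_fields
avg_rows_per_feature = total_activations / n_features

print(train_rows, round(avg_rows_per_feature, 2))
```

Under these assumptions the average feature is seen in fewer than four training rows, which leaves very little data for estimating a latent vector per feature.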

Development

Requires Python >= 3.10 and a recent Rust toolchain (1.74+; rustup update).

python3 -m venv .venv
.venv/bin/pip install -e ".[dev]"   # builds the Rust extension via maturin
.venv/bin/pytest -q
.venv/bin/ruff check .

pip install -e . compiles rust/ and installs the extension as modern_fm._rust (maturin mixed layout, config in pyproject.toml). After editing Rust code, re-run pip install -e . to rebuild. Rust-only checks:

cd rust
PYO3_PYTHON=$PWD/../.venv/bin/python3 cargo test
PYO3_PYTHON=$PWD/../.venv/bin/python3 cargo clippy

Without the extension built, the package still works: modern_fm._backend falls back to the pure-NumPy reference implementations, and the parity tests in tests/test_rust_parity.py are skipped.
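The fallback described above is the usual optional-extension pattern: try to import the compiled module and, if that fails, use the pure-Python path. The sketch below illustrates the pattern only. The function load_backend and its return values are hypothetical, and the actual logic in modern_fm._backend may differ.

```python
import importlib


def load_backend(extension="modern_fm._rust"):
    """Return ("rust", module) if the compiled extension imports, else ("numpy", None)."""
    try:
        return "rust", importlib.import_module(extension)
    except ImportError:
        # Extension not built or not installed: fall back to the reference path.
        return "numpy", None


name, module = load_backend()
print(f"active backend: {name}")

# A module path that cannot exist always takes the fallback branch.
fallback_name, _ = load_backend("no_such_pkg_for_demo._rust")
print(f"fallback backend: {fallback_name}")
```

Catching ImportError rather than a bare Exception keeps genuine bugs inside the extension visible instead of silently switching to the slower path.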

Design documents live in docs/ — start with docs/requirements.md and docs/math_spec.md. The roadmap is in docs/roadmap.md.
