
a python implementation of the generic factorization machines model class adapted for collaborative filtering recommendation problems with implicit feedback user-item interaction data and (optionally) additional user/item side features


RankFM

License: GPL v3

[original author]

RankFM is a Python implementation of the general Factorization Machines model class adapted for collaborative filtering recommendation/ranking problems with implicit feedback user/item interaction data. It uses Bayesian Personalized Ranking (BPR) and a variant of Weighted Approximate-Rank Pairwise (WARP) loss to learn model weights via Stochastic Gradient Descent (SGD). It can (optionally) incorporate sample weights and user/item auxiliary features to augment the main interaction data.
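To make the pairwise idea behind BPR concrete, here is a minimal sketch in plain Python. It assumes the simplest case with no side features, where a user/item score reduces to an item bias plus a latent-factor dot product. The function names and toy numbers are hypothetical and are not RankFM's internal code.

```python
import math

def score(item_bias, user_factors, item_factors):
    """Utility of an item for a user: item bias plus latent-factor dot product."""
    return item_bias + sum(u * v for u, v in zip(user_factors, item_factors))

def bpr_log_likelihood(pos_score, neg_score):
    """ln(sigmoid(pos_score - neg_score)), computed in a numerically stable way."""
    margin = pos_score - neg_score
    if margin >= 0:
        return -math.log1p(math.exp(-margin))
    return margin - math.log1p(math.exp(margin))

# toy (made-up) latent factors for one user and two items
v_u = [0.5, -0.2]
observed = score(0.1, v_u, [0.8, 0.1])      # item the user interacted with
unobserved = score(0.0, v_u, [-0.3, 0.4])   # sampled item with no interaction

# SGD nudges the weights so that this quantity increases
ll = bpr_log_likelihood(observed, unobserved)
```

The log-likelihood approaches zero from below as the observed item is scored further above the unobserved one, which is why an increasing value during training indicates progress.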

The core (training, prediction, recommendation) methods are written in Cython, making it possible to scale to millions of user/item interactions. Designed for ease-of-use, RankFM accepts both pd.DataFrame and np.ndarray inputs - you do not have to convert your data to scipy.sparse matrices or re-map user/item identifiers prior to use. RankFM internally maps all user/item identifiers to zero-based integer indexes, but always converts its output back to the original user/item identifiers from your data, which can be arbitrary (non-zero-based, non-consecutive) integers or even strings.
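The identifier handling described above can be pictured with a small sketch. This is an illustration of the general technique only, using hypothetical helper names, and not RankFM's actual internals.

```python
def build_index(ids):
    """Map each distinct identifier to a zero-based index (sorted for determinism)."""
    id_to_index = {raw: idx for idx, raw in enumerate(sorted(set(ids)))}
    index_to_id = {idx: raw for raw, idx in id_to_index.items()}
    return id_to_index, index_to_id

# identifiers can be strings or arbitrary non-consecutive integers
interactions = [("u42", 1007), ("u07", 233), ("u42", 233)]
user_to_idx, idx_to_user = build_index(u for u, _ in interactions)
item_to_idx, idx_to_item = build_index(i for _, i in interactions)

# encode for model training, then decode model output back to the original ids
encoded = [(user_to_idx[u], item_to_idx[i]) for u, i in interactions]
decoded = [(idx_to_user[u], idx_to_item[i]) for u, i in encoded]
```

Here `encoded` holds only small consecutive integers suitable for array indexing, while `decoded` recovers the original identifiers exactly.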

In addition to the familiar fit(), predict(), recommend() methods, RankFM includes additional utilities similar_users() and similar_items() to find the most similar users/items to a given user/item based on latent factor space embeddings. A number of popular recommendation/ranking evaluation metric functions have been included in the separate evaluation module to streamline model tuning and validation.

  • see the Quickstart section below to get started with the basic functionality
  • see the /examples folder for more in-depth Jupyter notebook walkthroughs with several popular open-source data sets
  • see the Online Documentation for more comprehensive documentation on the main model class and separate evaluation module
  • see the Medium Article for contextual motivation and a detailed mathematical description of the algorithm

Dependencies

  • Python 3.6+
  • numpy >= 1.15
  • pandas >= 0.24

Installation

Prerequisites

To install RankFM's C extensions you will need the GNU Compiler Collection (GCC). Check to see whether you already have it installed:

gcc --version

If you don't already have it, you can install it with Homebrew on macOS or with your Linux distribution's package manager:

# macOS
brew install gcc

# Linux (yum shown here; use apt, dnf, or your distribution's equivalent as appropriate)
sudo yum install gcc

# ensure [gcc] has been installed correctly and is on the system PATH
gcc --version

Package Installation

You can install the latest published version from PyPI using pip:

pip install rankfmc

Or alternatively install the current development build directly from GitHub:

pip install git+https://github.com/etlundquist/rankfm.git#egg=rankfm

It's highly recommended that you use an Anaconda base environment to ensure that all core numpy C extensions and linear algebra libraries have been installed and configured correctly. Anaconda: it just works.

Quickstart

Let's work through a simple example of fitting a model, generating recommendations, evaluating performance, and assessing some item-item similarities. The data we'll be using here may already be somewhat familiar: you know it, you love it, it's the MovieLens 1M!

Let's first look at the required shape of the interaction data:

user_id item_id
3 233
5 377
8 610

It has just two columns: a user_id and an item_id (you can name these fields whatever you want or use a numpy array instead). Notice that there is no rating column - this library is for implicit feedback data (e.g. watches, page views, purchases, clicks) as opposed to explicit feedback data (e.g. 1-5 ratings, thumbs up/down). Implicit feedback is far more common in real-world recommendation contexts and doesn't suffer from the missing-not-at-random problem of pure explicit feedback approaches.
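The snippets that follow refer to interactions_train and interactions_valid without showing how they were produced. One simple possibility is a random hold-out split, sketched here with the standard library only. The function name, the 25% validation fraction, and the seed are assumptions chosen for illustration.

```python
import random

def split_interactions(interactions, valid_fraction=0.25, seed=1492):
    """Shuffle (user_id, item_id) pairs and hold out a fraction for validation."""
    rows = list(interactions)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(rows)
    n_valid = int(len(rows) * valid_fraction)
    return rows[n_valid:], rows[:n_valid]

# toy (made-up) interaction pairs in the two-column shape shown above
interactions = [
    (3, 233), (5, 377), (8, 610), (3, 610),
    (5, 233), (8, 377), (3, 377), (8, 233),
]
interactions_train, interactions_valid = split_interactions(interactions)
```

The result is a pair of plain lists of tuples. Since the text above names pd.DataFrame and np.ndarray as the accepted inputs, you would likely wrap each list first, for example with pd.DataFrame(rows, columns=['user_id', 'item_id']).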

Now let's import the library, initialize our model, and fit on the training data:

from rankfm.rankfm import RankFM
model = RankFM(factors=20, loss='warp', max_samples=20, alpha=0.01, sigma=0.1, learning_rate=0.1, learning_schedule='invscaling')
model.fit(interactions_train, epochs=20, verbose=True)
# NOTE: this takes about 30 seconds for 750,000 interactions on my 2.3 GHz i5 8GB RAM MacBook

If you set verbose=True the model will print the current epoch number as well as the epoch's log-likelihood during training. This can be useful to gauge both computational speed and training gains by epoch. If the log-likelihood is not increasing, try raising the learning_rate or lowering the (alpha, beta) regularization strength terms. If the log-likelihood starts to bounce up and down, try lowering the learning_rate or using learning_schedule='invscaling' to decrease the learning rate over time. If you run into overflow errors, decrease the feature and/or sample-weight magnitudes and try raising beta, especially if you have a small number of dense user-features and/or item-features. Selecting BPR loss will lead to faster training times, but WARP loss typically yields superior model performance.
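As a rough picture of what an inverse-scaling schedule does, the sketch below divides the initial learning rate by a power of the epoch number. The exact formula and exponent RankFM uses are not stated in this README, so the form and the power=0.25 default here are assumptions.

```python
def inverse_scaling(initial_rate, epoch, power=0.25):
    """Learning rate for a zero-based epoch: initial_rate / (epoch + 1) ** power."""
    return initial_rate / (epoch + 1) ** power

# learning rate per epoch for learning_rate=0.1 over 20 epochs
rates = [inverse_scaling(0.1, epoch) for epoch in range(20)]
```

Under these assumptions the rate starts at 0.1 and has halved by the sixteenth epoch, which damps the bouncing behaviour described above while still allowing large early updates.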

Now let's generate some user-item model scores from the validation data:

valid_scores = model.predict(interactions_valid, cold_start='nan')

This will produce an array of real-valued model scores generated using the Factorization Machines model equation. You can interpret it as a measure of the predicted utility of item (i) for user (u). The cold_start='nan' option can be used to set scores to np.nan for user/item pairs not found in the training data, or cold_start='drop' can be specified to drop those pairs so the results contain no missing values.
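The difference between the two cold_start options can be sketched as follows. This is a plain-Python illustration of the described behaviour, with a hypothetical function name and a stand-in scoring function, not the library's implementation.

```python
import math

def predict_with_cold_start(pairs, score_fn, known_users, known_items, cold_start="nan"):
    """Score (user, item) pairs, handling pairs unseen in training per cold_start."""
    if cold_start not in ("nan", "drop"):
        raise ValueError("cold_start must be 'nan' or 'drop'")
    results = []
    for user, item in pairs:
        if user in known_users and item in known_items:
            results.append((user, item, score_fn(user, item)))
        elif cold_start == "nan":
            results.append((user, item, math.nan))
        # with cold_start == 'drop' the unseen pair is simply skipped
    return results

known_users, known_items = {3, 5}, {233, 377}
pairs = [(3, 233), (9, 233), (5, 610)]   # user 9 and item 610 were never seen
constant_score = lambda user, item: 1.0  # stand-in for the real model score

with_nan = predict_with_cold_start(pairs, constant_score, known_users, known_items, "nan")
dropped = predict_with_cold_start(pairs, constant_score, known_users, known_items, "drop")
```

With 'nan' the output stays aligned row-for-row with the input, which is convenient when attaching scores back to a data frame. With 'drop' the output is shorter but free of missing values.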

Now let's generate our topN recommended movies for each user:

valid_recs = model.recommend(valid_users, n_items=10, filter_previous=True, cold_start='drop')

The input should be a pd.Series, np.ndarray or list of user_id values. You can use filter_previous=True to prevent generating recommendations that include any items observed by the user in the training data, which could be useful depending on your application context. The result will be a pd.DataFrame where user_id values will be the index and the rows will be each user's top recommended items in descending order (best item is in column 0):

         0     1     2     3     4     5     6     7     8     9
3     2396  1265   357    34  2858  3175     1  2028    17   356
5      608  1617  1610  3418   590   474   858   377   924  1036
8      589  1036  2571  2028  2000  1220  1197   110   780  1954
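Conceptually, producing one row of this table means ranking all items by score for a user and optionally skipping items the user has already seen. A minimal sketch of that step, with a hypothetical function name and made-up scores:

```python
import heapq

def top_n(item_scores, seen_items, n_items=10, filter_previous=True):
    """Return the n_items best-scoring item ids, best first."""
    candidates = (
        (item, score)
        for item, score in item_scores.items()
        if not (filter_previous and item in seen_items)
    )
    best = heapq.nlargest(n_items, candidates, key=lambda pair: pair[1])
    return [item for item, _ in best]

# toy (made-up) scores for one user who has already watched item 377
scores_for_user = {233: 0.9, 377: 1.4, 610: 0.2, 1036: 1.1}
filtered = top_n(scores_for_user, seen_items={377}, n_items=2)
unfiltered = top_n(scores_for_user, seen_items={377}, n_items=2, filter_previous=False)
```

In this toy case the already-watched item 377 has the highest score, so it leads the unfiltered list and is absent from the filtered one.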

Now let's see how the model is performing with respect to the included validation metrics evaluated on the hold-out data:

from rankfm.evaluation import hit_rate, reciprocal_rank, discounted_cumulative_gain, precision, recall

valid_hit_rate = hit_rate(model, interactions_valid, k=10)
valid_reciprocal_rank = reciprocal_rank(model, interactions_valid, k=10)
valid_dcg = discounted_cumulative_gain(model, interactions_valid, k=10)
valid_precision = precision(model, interactions_valid, k=10)
valid_recall = recall(model, interactions_valid, k=10)

hit_rate: 0.796
reciprocal_rank: 0.339
dcg: 0.734
precision: 0.159
recall: 0.077

That's a Bingo!
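For intuition about what these numbers measure, here is a sketch of the metrics for a single user using common textbook definitions. The evaluation module may differ in details (for example in how it averages over users or normalizes DCG), so treat the function name and formulas as assumptions.

```python
import math

def rank_metrics(recommended, relevant, k=10):
    """Top-k ranking metrics for one user given ranked recommendations and held-out items."""
    top_k = recommended[:k]
    hits = [1 if item in relevant else 0 for item in top_k]
    n_hits = sum(hits)
    # 1-based rank of the first relevant recommendation, if any
    first_hit = next((rank for rank, hit in enumerate(hits, start=1) if hit), None)
    return {
        "hit_rate": 1.0 if n_hits else 0.0,
        "reciprocal_rank": 1.0 / first_hit if first_hit else 0.0,
        "dcg": sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1)),
        "precision": n_hits / k,
        "recall": n_hits / len(relevant) if relevant else 0.0,
    }

# toy (made-up) example: 4 recommendations, 3 held-out items, 2 of them recommended
metrics = rank_metrics([2396, 1265, 357, 34], relevant={1265, 34, 999}, k=4)
```

Averaging each per-user value over all validation users gives a single figure per metric of the kind reported above.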

Now let's find the most similar other movies for a few movies based on their embedding representations in latent factor space:

# Terminator 2: Judgment Day (1991)
model.similar_items(589, n_items=10)

2571                       Matrix, The (1999)
1527                Fifth Element, The (1997)
2916                      Total Recall (1990)
3527                          Predator (1987)
780             Independence Day (ID4) (1996)
1909    X-Files: Fight the Future, The (1998)
733                          Rock, The (1996)
1376     Star Trek IV: The Voyage Home (1986)
480                      Jurassic Park (1993)
1200                            Aliens (1986)

I hope you like explosions...

# Being John Malkovich (1999)
model.similar_items(2997, n_items=10)

2599           Election (1999)
3174    Man on the Moon (1999)
2858    American Beauty (1999)
3317        Wonder Boys (2000)
223              Clerks (1994)
3897      Almost Famous (2000)
2395           Rushmore (1998)
2502       Office Space (1999)
2908     Boys Don't Cry (1999)
3481      High Fidelity (2000)

Let's get weird...
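Finding similar items from latent factor embeddings can be sketched as a nearest-neighbour lookup. Cosine similarity is used here as an assumption; the library may use a different measure, such as the raw dot product. The function names and two-dimensional vectors are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(item_id, item_factors, n_items=10):
    """Rank all other items by similarity of their latent factors to item_id."""
    target = item_factors[item_id]
    scored = [
        (other, cosine(target, factors))
        for other, factors in item_factors.items()
        if other != item_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:n_items]]

# toy (made-up) 2-dimensional embeddings keyed by item_id
item_factors = {
    589: [0.9, 0.1],
    2571: [0.8, 0.2],
    2997: [-0.1, 0.9],
    2599: [0.0, 1.0],
}
neighbours = most_similar(589, item_factors, n_items=2)
```

Items whose embeddings point in nearly the same direction rank first, which is the intuition behind action movies clustering together in the output above.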
