
RecSys Library


RePlay

RePlay is a library providing tools for all stages of creating a recommendation system, from data preprocessing to model evaluation and comparison.

RePlay uses PySpark to handle big data.

You can

  • Filter and split data
  • Train models
  • Optimize hyperparameters
  • Evaluate predictions with metrics
  • Combine predictions from different models
  • Create a two-level model
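To make the "split data" step more concrete, here is a minimal pure-Python sketch of a per-user holdout split. It is an illustration only and not RePlay's implementation: the function name `split_by_user` and the tuple layout of `log` are assumptions made for this example, and the arguments only loosely mirror the `item_test_size`, `user_test_size`, and `seed` options that appear in the Quickstart. The exact behavior of RePlay's splitters should be checked in the documentation.

```python
import random
from collections import defaultdict


def split_by_user(log, item_test_size, user_test_size, seed=42):
    """Hold out `item_test_size` interactions for `user_test_size` users.

    `log` is a list of (user, item, relevance, timestamp) tuples.
    Hypothetical helper for illustration; not part of RePlay.
    """
    rng = random.Random(seed)

    # Group interactions by user.
    by_user = defaultdict(list)
    for row in log:
        by_user[row[0]].append(row)

    # Only users with more than `item_test_size` interactions are eligible,
    # so every test user keeps at least one interaction in train.
    eligible = sorted(
        user for user, rows in by_user.items() if len(rows) > item_test_size
    )
    test_users = set(rng.sample(eligible, min(user_test_size, len(eligible))))

    train, test = [], []
    for user in sorted(by_user):
        rows = list(by_user[user])
        if user in test_users:
            # Randomly pick which interactions go to test.
            rng.shuffle(rows)
            test.extend(rows[:item_test_size])
            train.extend(rows[item_test_size:])
        else:
            train.extend(rows)
    return train, test
```

Because test users are drawn only from users that keep some history in train, no test user is "cold" in this sketch.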

Documentation is available here.


Installation

Use a Linux machine with Python 3.7-3.9, Java 8+, and a C++ compiler.

pip install replay-rec

To get the latest development version of RePlay, install it from the GitHub repository. It is preferable to use a virtual environment for your installation.

If you encounter an error during RePlay installation, check the troubleshooting guide.
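Before installing, it can help to check the requirements listed above. The following is a rough sketch using only the standard library. The function name `check_prerequisites` and the list of compiler executables are assumptions made for this example. It only checks that executables are on `PATH`, so it does not verify the Java version or the operating system.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means no problems found."""
    problems = []

    # Python 3.7-3.9, as stated in the installation notes.
    major, minor = version_info[0], version_info[1]
    if not ((3, 7) <= (major, minor) <= (3, 9)):
        problems.append(f"Python {major}.{minor} is outside the 3.7-3.9 range")

    # Java must be on PATH (this does not confirm that it is version 8+).
    if which("java") is None:
        problems.append("java executable not found on PATH")

    # Assumption: any of these common names counts as a C++ compiler.
    if not any(which(name) for name in ("g++", "clang++", "c++")):
        problems.append("no C++ compiler (g++, clang++, c++) found on PATH")

    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current interpreter and `PATH`; the parameters exist only so the checks can be exercised with substitute values.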

Quickstart

from rs_datasets import MovieLens

from replay.data_preparator import DataPreparator, Indexer
from replay.experiment import Experiment
from replay.metrics import HitRate, NDCG
from replay.models import ItemKNN
from replay.session_handler import State
from replay.splitters import UserSplitter

# number of recommendations per user, used for inference and metrics
K = 5

spark = State().session

ml_1m = MovieLens("1m")

# data preprocessing
preparator = DataPreparator()
log = preparator.transform(
    columns_mapping={
        'user_id': 'user_id',
        'item_id': 'item_id',
        'relevance': 'rating',
        'timestamp': 'timestamp'
    },
    data=ml_1m.ratings
)
indexer = Indexer(user_col='user_id', item_col='item_id')
indexer.fit(users=log.select('user_id'), items=log.select('item_id'))
log_replay = indexer.transform(df=log)

# data splitting
user_splitter = UserSplitter(
    item_test_size=10,
    user_test_size=500,
    drop_cold_items=True,
    drop_cold_users=True,
    shuffle=True,
    seed=42,
)
train, test = user_splitter.split(log_replay)

# model training
model = ItemKNN()
model.fit(train)

# model inference
recs = model.predict(
    log=train,
    k=K,
    users=test.select('user_idx').distinct(),
    filter_seen_items=True,
)

# model evaluation
metrics = Experiment(test, {NDCG(): K, HitRate(): K})
metrics.add_result("knn", recs)
# collected metric values for all added models
metrics.results
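To clarify what the two metrics in the evaluation step measure, here is a small pure-Python sketch of HitRate@K and NDCG@K. It assumes binary relevance (an item is either in the user's test set or not) and plain dictionaries as inputs. The function names and data layout are assumptions made for this example. RePlay computes its metrics on Spark dataframes, and its implementation may differ in details.

```python
import math


def hit_rate_at_k(recs, ground_truth, k):
    """Share of users with at least one relevant item among their top-k."""
    hits = 0
    for user, relevant in ground_truth.items():
        top_k = recs.get(user, [])[:k]
        if any(item in relevant for item in top_k):
            hits += 1
    return hits / len(ground_truth)


def ndcg_at_k(recs, ground_truth, k):
    """Mean NDCG@k over users, with binary relevance."""
    total = 0.0
    for user, relevant in ground_truth.items():
        top_k = recs.get(user, [])[:k]
        # Discounted gain: a hit at 0-based position `rank` adds 1 / log2(rank + 2).
        dcg = sum(
            1.0 / math.log2(rank + 2)
            for rank, item in enumerate(top_k)
            if item in relevant
        )
        # Ideal ordering puts all relevant items (up to k) first.
        ideal_hits = min(k, len(relevant))
        idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
        total += dcg / idcg if idcg > 0 else 0.0
    return total / len(ground_truth)


# Toy data: ranked recommendations per user and held-out relevant items.
recs = {1: [10, 20, 30], 2: [40, 50, 60]}
ground_truth = {1: {20}, 2: {70}}

print(hit_rate_at_k(recs, ground_truth, k=3))
print(ndcg_at_k(recs, ground_truth, k=3))
```

In the toy data, user 1 has a relevant item at the second position and user 2 has none, so the hit rate is 0.5, while NDCG additionally penalizes the hit for not being ranked first.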

Resources

Usage examples

  1. 01_replay_basics.ipynb - Getting started with RePlay.
  2. 02_models_comparison.ipynb - Reproducible model comparison on the MovieLens-1M dataset.
  3. 03_features_preprocessing_and_lightFM.ipynb - LightFM example with PySpark for feature preprocessing.
  4. 04_splitters.ipynb - An example of using RePlay data splitters.
  5. 05_feature_generators.ipynb - Feature generation with RePlay.

Videos and papers

Contributing to RePlay

We welcome community contributions. For details please check our contributing guidelines.

Download files

Source Distribution: replay_rec-0.11.0.tar.gz (117.4 kB)

Built Distribution: replay_rec-0.11.0-py3-none-any.whl (148.7 kB)
