
RecSys Library

Project description

RePlay

RePlay is a library providing tools for all stages of creating a recommendation system, from data preprocessing to model evaluation and comparison.

RePlay uses PySpark to handle big data.

With RePlay, you can:

  • Filter and split data
  • Train models
  • Optimize hyperparameters
  • Evaluate predictions with metrics
  • Combine predictions from different models
  • Create a two-level model
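As one illustration of combining predictions, the sketch below fuses two ranked recommendation lists for a single user with reciprocal rank fusion (RRF). RRF is a generic, swapped-in technique chosen for brevity; RePlay's own tools for combining models may work differently, and the function name and the constant `k=60` are assumptions, not part of RePlay's API.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked item lists for one user into a single ranking.

    Hypothetical helper (not RePlay API): each item scores
    sum(1 / (k + rank)) over the lists that contain it, with rank starting at 1.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item id so the output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


if __name__ == "__main__":
    knn_recs = ["i1", "i2", "i3"]      # e.g. top items from a kNN model
    popular_recs = ["i2", "i4", "i1"]  # e.g. top items from a popularity model
    print(reciprocal_rank_fusion([knn_recs, popular_recs]))  # ['i2', 'i1', 'i4', 'i3']
```

Because RRF uses only ranks, it does not require the models' scores to be on comparable scales.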

Documentation is available here.


Installation

Use a Linux machine with Python 3.7-3.9, Java 8+ and a C++ compiler.

pip install replay-rec

To get the latest development version of RePlay, install it from the GitHub repository. Using a virtual environment for the installation is recommended.

If you encounter an error during RePlay installation, check the troubleshooting guide.
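Before installing, a quick check of the stated requirements can save a failed build. The sketch below is a hypothetical helper (not part of RePlay) that tests the Python version range, the platform, and whether `java` and a C++ compiler are on `PATH`; it assumes common executable names and does not verify that Java is version 8 or newer.

```python
import shutil
import sys
from typing import List, Optional, Tuple


def check_environment(python_version: Optional[Tuple[int, int]] = None) -> List[str]:
    """Return human-readable problems; an empty list means the basic checks passed.

    Hypothetical helper: it only checks what the installation notes list.
    """
    problems: List[str] = []
    major, minor = python_version or tuple(sys.version_info[:2])
    if not (3, 7) <= (major, minor) <= (3, 9):
        problems.append(f"Python {major}.{minor} is outside the supported 3.7-3.9 range")
    if not sys.platform.startswith("linux"):
        problems.append(f"platform {sys.platform!r} is not Linux")
    if shutil.which("java") is None:
        problems.append("no 'java' executable found on PATH (Java 8+ is required)")
    # Assumed compiler names; other toolchains may use different executables.
    if not any(shutil.which(cxx) for cxx in ("c++", "g++", "clang++")):
        problems.append("no C++ compiler (c++, g++ or clang++) found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```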

Quickstart

from rs_datasets import MovieLens

from replay.data_preparator import DataPreparator, Indexer
from replay.experiment import Experiment
from replay.metrics import HitRate, NDCG
from replay.models import ItemKNN
from replay.session_handler import State
from replay.splitters import UserSplitter

spark = State().session

ml_1m = MovieLens("1m")
K = 10  # number of recommendations per user

# data preprocessing
preparator = DataPreparator()
log = preparator.transform(
    columns_mapping={
        'user_id': 'user_id',
        'item_id': 'item_id',
        'relevance': 'rating',
        'timestamp': 'timestamp'
    },
    data=ml_1m.ratings
)
indexer = Indexer(user_col='user_id', item_col='item_id')
indexer.fit(users=log.select('user_id'), items=log.select('item_id'))
log_replay = indexer.transform(df=log)

# data splitting
user_splitter = UserSplitter(
    item_test_size=10,
    user_test_size=500,
    drop_cold_items=True,
    drop_cold_users=True,
    shuffle=True,
    seed=42,
)
train, test = user_splitter.split(log_replay)

# model training
model = ItemKNN()
model.fit(train)

# model inference
recs = model.predict(
    log=train,
    k=K,
    users=test.select('user_idx').distinct(),
    filter_seen_items=True,
)

# model evaluation
metrics = Experiment(test, {NDCG(): K, HitRate(): K})
metrics.add_result("knn", recs)
print(metrics.results)
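The quickstart evaluates recommendations with `HitRate` and `NDCG` at cutoff `K`. The plain-Python sketch below shows the common textbook definitions of these metrics with binary relevance, averaged over test users. It is an illustration only: RePlay's implementation may differ in details, such as how users without recommendations are handled, and all function names here are hypothetical.

```python
import math
from typing import Callable, Dict, List, Set


def hit_rate_at_k(recs: List[str], relevant: Set[str], k: int) -> float:
    """1.0 if at least one of the top-k recommended items is relevant, else 0.0."""
    return 1.0 if any(item in relevant for item in recs[:k]) else 0.0


def ndcg_at_k(recs: List[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the top-k list divided by the ideal DCG."""
    # Position pos is 0-based, so the discount log2(rank + 1) becomes log2(pos + 2).
    dcg = sum(1.0 / math.log2(pos + 2) for pos, item in enumerate(recs[:k]) if item in relevant)
    # Ideal list: every one of the first min(|relevant|, k) positions is a hit.
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0


def mean_over_users(
    metric: Callable[[List[str], Set[str], int], float],
    recs_by_user: Dict[str, List[str]],
    test_by_user: Dict[str, Set[str]],
    k: int,
) -> float:
    """Average a per-user metric over users in the test set (users without recs score 0)."""
    if not test_by_user:
        return 0.0
    total = sum(metric(recs_by_user.get(u, []), items, k) for u, items in test_by_user.items())
    return total / len(test_by_user)


if __name__ == "__main__":
    recs = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
    test = {"u1": {"b"}, "u2": {"x"}}
    print(mean_over_users(hit_rate_at_k, recs, test, k=2))  # 0.5
    print(mean_over_users(ndcg_at_k, recs, test, k=2))
```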

Resources

Usage examples

  1. 01_replay_basics.ipynb - get started with RePlay.
  2. 02_models_comparison.ipynb - reproducible models comparison on the MovieLens-1M dataset.
  3. 03_features_preprocessing_and_lightFM.ipynb - LightFM example with PySpark for feature preprocessing.
  4. 04_splitters.ipynb - an example of using RePlay data splitters.
  5. 05_feature_generators.ipynb - feature generation with RePlay.
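The `UserSplitter` in the quickstart (and the splitters notebook) holds out interactions for a random subset of users. The sketch below is a simplified, pure-Python reading of its parameters on a list of `(user_id, item_id, timestamp)` tuples: sample `user_test_size` users, move `item_test_size` of each sampled user's interactions to test (random ones when `shuffle=True`, otherwise the latest ones), then drop test rows whose user or item never appears in train. The function name and these exact semantics are assumptions, not RePlay's implementation.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Interaction = Tuple[str, str, int]  # (user_id, item_id, timestamp)


def user_split(
    log: List[Interaction],
    user_test_size: int,
    item_test_size: int,
    shuffle: bool = True,
    seed: int = 42,
) -> Tuple[List[Interaction], List[Interaction]]:
    """Hypothetical user-based train/test split (not RePlay's UserSplitter)."""
    rng = random.Random(seed)

    # Group interactions by user so each user's history can be split separately.
    by_user: Dict[str, List[Interaction]] = defaultdict(list)
    for row in log:
        by_user[row[0]].append(row)

    # Sample the users whose interactions are partly moved to the test set.
    users = sorted(by_user)
    test_users = set(rng.sample(users, min(user_test_size, len(users))))

    train: List[Interaction] = []
    test: List[Interaction] = []
    for user in users:
        rows = list(by_user[user])
        if user not in test_users:
            train.extend(rows)
            continue
        if shuffle:
            rng.shuffle(rows)
        else:
            rows.sort(key=lambda row: row[2])  # oldest first, so the newest are held out
        cut = max(len(rows) - item_test_size, 0)
        train.extend(rows[:cut])
        test.extend(rows[cut:])

    # Drop "cold" rows: test users or items that never occur in train.
    train_users = {u for u, _, _ in train}
    train_items = {i for _, i, _ in train}
    test = [r for r in test if r[0] in train_users and r[1] in train_items]
    return train, test


if __name__ == "__main__":
    log = [(f"u{u}", f"i{t % 3}", t) for u in range(3) for t in range(4)]
    train, test = user_split(log, user_test_size=1, item_test_size=2, shuffle=False)
    print(len(train), len(test))  # 10 2
```

Dropping cold rows keeps the test set evaluable by models that cannot score unseen users or items, at the cost of excluding those cases from the metrics.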

Videos and papers

Contributing to RePlay

For more details, visit the development section of the documentation.



Download files


Source Distribution

replay_rec-0.10.0.tar.gz (91.6 kB)

Uploaded Source

Built Distribution


replay_rec-0.10.0-py3-none-any.whl (118.9 kB)

Uploaded Python 3

File details

Details for the file replay_rec-0.10.0.tar.gz.

File metadata

  • Download URL: replay_rec-0.10.0.tar.gz
  • Size: 91.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.9.7 Darwin/21.5.0

File hashes

Hashes for replay_rec-0.10.0.tar.gz
  • SHA256: 671bb3bdbc501fdac1662fe97a69cc5dfae85174a579a6241357e2edf4844822
  • MD5: f9ba1b12c026350e8aa58d976f7e9598
  • BLAKE2b-256: 0daa66c0e1bf586effb01788d36f75c54fe7cb721b93d63c7f4e175773322b69


File details

Details for the file replay_rec-0.10.0-py3-none-any.whl.

File metadata

  • Download URL: replay_rec-0.10.0-py3-none-any.whl
  • Size: 118.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.9.7 Darwin/21.5.0

File hashes

Hashes for replay_rec-0.10.0-py3-none-any.whl
  • SHA256: d78f6929ddc17a9a8df1246be9f98c496d88a3ac7f6bdd0a46f3e870e252c05d
  • MD5: 7c0c6fe51158df5688a2e71f714d446e
  • BLAKE2b-256: a85028e8adf115d95319fc4dd3cb625e6f4cde753125830d13be7b3f28222cde

