
RecSys Library



Join the community on GitHub Discussions

RePlay is an advanced framework designed to facilitate the development and evaluation of recommendation systems. It provides a robust set of tools covering the entire lifecycle of a recommendation system pipeline:

🚀 Features:

  • Data Preprocessing and Splitting: Streamlines the data preparation process for recommendation systems, ensuring optimal data structure and format for efficient processing.
  • Wide Range of Recommendation Models: Lets you build recommendation models, from state-of-the-art approaches to commonly used baselines, and evaluate their performance and quality.
  • Hyperparameter Optimization: Offers tools for fine-tuning model parameters to achieve the best possible performance, reducing the complexity of the optimization process.
  • Comprehensive Evaluation Metrics: Incorporates a wide range of evaluation metrics to assess the accuracy and effectiveness of recommendation models.
  • Model Ensemble and Hybridization: Supports combining predictions from multiple models and creating two-level (ensemble) models to enhance the quality of recommendations.
  • Seamless Mode Transition: Facilitates easy transition from offline experimentation to online production environments, ensuring scalability and flexibility.

💻 Hardware and Environment Compatibility:

  1. Diverse Hardware Support: Compatible with various hardware configurations including CPU, GPU, Multi-GPU.
  2. Cluster Computing Integration: Integrates with PySpark for distributed computing, enabling scalability for large-scale recommendation systems.

📖 Documentation is available here.


🔧 Installation

Installing with the pip package manager is recommended:

pip install replay-rec

This installs the core package without the PySpark and PyTorch dependencies. The experimental submodule is not installed either.

To install the experimental submodule, specify a version with the rc0 suffix. For example:

pip install replay-rec==XX.YY.ZZrc0

Extras

In addition to the core package, several extras are also provided, including:

  • [spark]: Install PySpark functionality
  • [torch]: Install PyTorch and Lightning functionality
  • [all]: Install both [spark] and [torch]

Example:

# Install core package with PySpark dependency
pip install replay-rec[spark]

# Install package with experimental submodule and PySpark dependency
pip install replay-rec[spark]==XX.YY.ZZrc0
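
After installation, it can be useful to confirm which optional dependencies are actually importable in the current environment. The following is a minimal sketch using only the Python standard library. The mapping from extras to module names (pyspark, torch, lightning) is an assumption based on the extras' descriptions above, and available_extras is a hypothetical helper, not part of RePlay.

```python
from importlib.util import find_spec

# Assumed mapping from each extra to the top-level modules it provides.
# These module names are an assumption, not taken from RePlay's metadata.
EXTRA_MODULES = {
    "spark": ["pyspark"],
    "torch": ["torch", "lightning"],
}


def available_extras(extra_modules=EXTRA_MODULES):
    """Return {extra_name: bool}; True when every module of the extra is importable."""
    return {
        extra: all(find_spec(module) is not None for module in modules)
        for extra, modules in extra_modules.items()
    }


# Report the status of each extra without importing the heavy packages themselves.
for extra, present in available_extras().items():
    status = "available" if present else "missing"
    print(f"[{extra}]: {status}")
```

find_spec only locates a module without importing it, so the check stays cheap even when PySpark or PyTorch are installed.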

To build RePlay from source, please follow the build instructions.

If you encounter an error during RePlay installation, check the troubleshooting guide.

📈 Quickstart

from rs_datasets import MovieLens

from replay.data import Dataset, FeatureHint, FeatureInfo, FeatureSchema, FeatureType
from replay.data.dataset_utils import DatasetLabelEncoder
from replay.metrics import HitRate, NDCG, Experiment
from replay.models import ItemKNN
from replay.utils import convert2spark
from replay.utils.session_handler import State
from replay.splitters import RatioSplitter

spark = State().session

ml_1m = MovieLens("1m")
K=10

# data preprocessing
interactions = convert2spark(ml_1m.ratings)

# data splitting
splitter = RatioSplitter(
    test_size=0.3,
    divide_column="user_id",
    query_column="user_id",
    item_column="item_id",
    timestamp_column="timestamp",
    drop_cold_items=True,
    drop_cold_users=True,
)
train, test = splitter.split(interactions)

# dataset creating
feature_schema = FeatureSchema(
    [
        FeatureInfo(
            column="user_id",
            feature_type=FeatureType.CATEGORICAL,
            feature_hint=FeatureHint.QUERY_ID,
        ),
        FeatureInfo(
            column="item_id",
            feature_type=FeatureType.CATEGORICAL,
            feature_hint=FeatureHint.ITEM_ID,
        ),
        FeatureInfo(
            column="rating",
            feature_type=FeatureType.NUMERICAL,
            feature_hint=FeatureHint.RATING,
        ),
        FeatureInfo(
            column="timestamp",
            feature_type=FeatureType.NUMERICAL,
            feature_hint=FeatureHint.TIMESTAMP,
        ),
    ]
)

train_dataset = Dataset(
    feature_schema=feature_schema,
    interactions=train,
)
test_dataset = Dataset(
    feature_schema=feature_schema,
    interactions=test,
)

# data encoding
encoder = DatasetLabelEncoder()
train_dataset = encoder.fit_transform(train_dataset)
test_dataset = encoder.transform(test_dataset)

# model training
model = ItemKNN()
model.fit(train_dataset)

# model inference
encoded_recs = model.predict(
    dataset=train_dataset,
    k=K,
    queries=test_dataset.query_ids,
    filter_seen_items=True,
)

recs = encoder.query_and_item_id_encoder.inverse_transform(encoded_recs)

# model evaluation
metrics = Experiment(
    [NDCG(K), HitRate(K)],
    test,
    query_column="user_id",
    item_column="item_id",
    rating_column="rating",
)
metrics.add_result("ItemKNN", recs)
print(metrics.results)
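
To make the evaluation step above more concrete, here is a plain-Python sketch of what HitRate@K and NDCG@K measure. It uses a common binary-relevance definition and is only an illustration: the helper functions below are hypothetical, and RePlay's own implementations may differ in details such as how ratings are weighted.

```python
import math


def hit_rate_at_k(recommended, relevant, k):
    """Return 1.0 if any of the top-k recommended items is relevant, else 0.0."""
    return float(any(item in relevant for item in recommended[:k]))


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG@k: DCG of the list divided by the best achievable DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so the first position has discount log2(2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def mean_over_users(metric, recs_by_user, truth_by_user, k):
    """Average a per-user metric over all users that have ground-truth items."""
    scores = [
        metric(recs_by_user.get(user, []), truth_by_user[user], k)
        for user in truth_by_user
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (made up for illustration): ordered recommendations and held-out items per user.
recs = {1: ["a", "b", "c"], 2: ["d", "e", "f"]}
truth = {1: {"b"}, 2: {"x"}}

print("HitRate@3:", mean_over_users(hit_rate_at_k, recs, truth, 3))
print("NDCG@3:", mean_over_users(ndcg_at_k, recs, truth, 3))
```

HitRate only asks whether at least one relevant item appears in the top K, while NDCG also rewards placing relevant items closer to the top of the list.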

📑 Resources

Usage examples

  1. 01_replay_basics.ipynb - get started with RePlay.
  2. 02_models_comparison.ipynb - reproducible models comparison on MovieLens-1M dataset.
  3. 03_features_preprocessing_and_lightFM.ipynb - LightFM example with PySpark for feature preprocessing.
  4. 04_splitters.ipynb - An example of using RePlay data splitters.
  5. 05_feature_generators.ipynb - Feature generation with RePlay.
  6. 06_item2item_recommendations.ipynb - Item to Item recommendations example.
  7. 07_filters.ipynb - An example of using filters.
  8. 08_recommending_for_categories.ipynb - An example of recommendation for product categories.
  9. 09_sasrec_example.ipynb - An example of using transformers to generate recommendations.

Videos and papers

💡 Contributing to RePlay

We welcome community contributions. For details please check our contributing guidelines.
