RecSys Library
RePlay
RePlay is a library providing tools for all stages of creating a recommendation system, from data preprocessing to model evaluation and comparison.
RePlay uses PySpark to handle big data.
With RePlay, you can:
- Filter and split data
- Train models
- Optimize hyperparameters
- Evaluate predictions with metrics
- Combine predictions from different models
- Create a two-level model
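To illustrate the "filter" step in isolation, the snippet below keeps only interactions whose user and item each occur at least a minimum number of times. It is a minimal plain-Python sketch on a list of (user, item) pairs. The function name `filter_min_interactions` and the default thresholds are hypothetical and not part of the RePlay API, which works on Spark DataFrames.

```python
from collections import Counter
from typing import List, Tuple

Interaction = Tuple[str, str]  # (user_id, item_id)


def filter_min_interactions(
    log: List[Interaction], min_user: int = 2, min_item: int = 2
) -> List[Interaction]:
    """Hypothetical helper: drop interactions of rare users or rare items.

    Counts are computed once on the input log (a single pass, not
    repeated until no more rows are removed).
    """
    user_counts = Counter(user for user, _ in log)
    item_counts = Counter(item for _, item in log)
    return [
        (user, item)
        for user, item in log
        if user_counts[user] >= min_user and item_counts[item] >= min_item
    ]


log = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a")]
print(filter_min_interactions(log))  # u3 has a single interaction and is dropped
```

A single pass keeps the sketch simple; note that removing rare users can make some items rare in turn, so a stricter filter would repeat the pass until nothing changes.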
Documentation is available here.
Installation
Use a Linux machine with Python 3.7-3.9, Java 8+, and a C++ compiler.
pip install replay-rec
To get the latest development version of RePlay, install it from the GitHub repository. It is preferable to use a virtual environment for your installation.
If you encounter an error during RePlay installation, check the troubleshooting guide.
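Before installing, it can help to confirm the prerequisites listed above. The script below is a hypothetical helper, not part of RePlay: it checks the OS, the running Python version, and whether `java` and a C++ compiler (`g++`, `clang++`, or `c++`) are on `PATH`. It does not verify the Java version itself.

```python
import shutil
import sys


def check_prerequisites() -> dict:
    """Hypothetical helper: report which RePlay prerequisites appear present."""
    # Compare (major, minor) of the running interpreter against 3.7..3.9
    python_ok = (3, 7) <= sys.version_info[:2] <= (3, 9)
    return {
        "linux": sys.platform.startswith("linux"),
        "python_3_7_to_3_9": python_ok,
        # Only checks that an executable named `java` is on PATH, not its version
        "java_on_path": shutil.which("java") is not None,
        "cpp_compiler_on_path": any(
            shutil.which(compiler) for compiler in ("g++", "clang++", "c++")
        ),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```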
Quickstart
from rs_datasets import MovieLens

from replay.data_preparator import DataPreparator, Indexer
from replay.experiment import Experiment
from replay.metrics import HitRate, NDCG
from replay.models import ItemKNN
from replay.session_handler import State
from replay.splitters import UserSplitter

spark = State().session

ml_1m = MovieLens("1m")
K = 10  # number of recommendations per user

# data preprocessing
preparator = DataPreparator()
log = preparator.transform(
    columns_mapping={
        'user_id': 'user_id',
        'item_id': 'item_id',
        'relevance': 'rating',
        'timestamp': 'timestamp'
    },
    data=ml_1m.ratings
)
# map raw user/item ids to integer indices (user_idx, item_idx)
indexer = Indexer(user_col='user_id', item_col='item_id')
indexer.fit(users=log.select('user_id'), items=log.select('item_id'))
log_replay = indexer.transform(df=log)

# data splitting
user_splitter = UserSplitter(
    item_test_size=10,
    user_test_size=500,
    drop_cold_items=True,
    drop_cold_users=True,
    shuffle=True,
    seed=42,
)
train, test = user_splitter.split(log_replay)

# model training
model = ItemKNN()
model.fit(train)

# model inference
recs = model.predict(
    log=train,
    k=K,
    users=test.select('user_idx').distinct(),
    filter_seen_items=True,
)

# model evaluation
metrics = Experiment(test, {NDCG(): K, HitRate(): K})
metrics.add_result("knn", recs)
print(metrics.results)
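The quickstart trains `ItemKNN`, an item-based nearest-neighbours model, and predicts with `filter_seen_items=True`. The snippet below is a simplified plain-Python sketch of the general item-based KNN idea, not RePlay's implementation: it assumes binary interactions, cosine similarity between items computed from the sets of users who interacted with them, and scores each candidate item by summing its similarity to the items the user has already seen. The function name `item_knn_recommend` is hypothetical, and RePlay's model has its own options that may change the scores.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def item_knn_recommend(
    log: List[Tuple[str, str]], k: int, filter_seen_items: bool = True
) -> Dict[str, List[str]]:
    """Hypothetical, simplified item-based KNN on binary interactions."""
    users_of: Dict[str, Set[str]] = defaultdict(set)  # item -> users who interacted
    items_of: Dict[str, Set[str]] = defaultdict(set)  # user -> items they interacted with
    for user, item in log:
        users_of[item].add(user)
        items_of[user].add(item)

    def cosine(a: str, b: str) -> float:
        # |U_a & U_b| / sqrt(|U_a| * |U_b|) for binary user vectors
        common = len(users_of[a] & users_of[b])
        return common / math.sqrt(len(users_of[a]) * len(users_of[b]))

    recs: Dict[str, List[str]] = {}
    for user, seen in items_of.items():
        scores: Dict[str, float] = defaultdict(float)
        for candidate in users_of:
            if filter_seen_items and candidate in seen:
                continue  # skip items the user already interacted with
            for item in seen:
                if item != candidate:
                    scores[candidate] += cosine(item, candidate)
        # Highest score first; ties broken by item id for determinism
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        recs[user] = [item for item, score in ranked[:k] if score > 0]
    return recs


log = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "c"), ("u3", "b"), ("u3", "c")]
print(item_knn_recommend(log, k=2))  # {'u1': ['c'], 'u2': ['b'], 'u3': ['a']}
```

With `filter_seen_items=False`, already-seen items compete with new ones, so `u1` would receive `['c', 'a']` instead; filtering is usually what you want when the test set contains only new interactions.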
Resources
Usage examples
- 01_replay_basics.ipynb - get started with RePlay.
- 02_models_comparison.ipynb - reproducible comparison of models on the MovieLens-1M dataset.
- 03_features_preprocessing_and_lightFM.ipynb - LightFM example with PySpark feature preprocessing.
- 04_splitters.ipynb - an example of using RePlay data splitters.
- 05_feature_generators.ipynb - feature generation with RePlay.
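The quickstart's `UserSplitter` holds out part of the interactions of a subset of users. The snippet below is a rough plain-Python approximation of that idea, inferred only from the parameter names used in the quickstart (`user_test_size`, `item_test_size`, `shuffle`, `seed`, `drop_cold_items`, `drop_cold_users`); it is not RePlay's implementation. The function name `user_holdout_split` is hypothetical, and so are its details: without shuffling it holds out each test user's last interactions in input order (assumed chronological), and cold users and items are removed from the test part only.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Interaction = Tuple[str, str]  # (user_id, item_id), assumed in chronological order


def user_holdout_split(
    log: List[Interaction],
    user_test_size: int,
    item_test_size: int,
    shuffle: bool = False,
    seed: int = 42,
    drop_cold_items: bool = True,
    drop_cold_users: bool = True,
) -> Tuple[List[Interaction], List[Interaction]]:
    """Hypothetical per-user holdout split (illustrative approximation only)."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_user: Dict[str, List[Interaction]] = defaultdict(list)
    for row in log:
        by_user[row[0]].append(row)

    users = sorted(by_user)
    test_users = set(rng.sample(users, min(user_test_size, len(users))))

    train: List[Interaction] = []
    test: List[Interaction] = []
    for user in users:
        rows = list(by_user[user])
        if user not in test_users:
            train.extend(rows)
            continue
        if shuffle:
            rng.shuffle(rows)
        # Hold out the last `item_test_size` rows (random rows if shuffled)
        cut = max(len(rows) - item_test_size, 0)
        train.extend(rows[:cut])
        test.extend(rows[cut:])

    # "Cold" users/items: present in test but absent from train
    train_users = {u for u, _ in train}
    train_items = {i for _, i in train}
    if drop_cold_users:
        test = [r for r in test if r[0] in train_users]
    if drop_cold_items:
        test = [r for r in test if r[1] in train_items]
    return train, test


log = [("u1", "a"), ("u1", "b"), ("u1", "c"), ("u2", "a"),
       ("u2", "b"), ("u3", "c"), ("u3", "a")]
train, test = user_holdout_split(log, user_test_size=3, item_test_size=1)
print(test)  # [('u1', 'c'), ('u2', 'b'), ('u3', 'a')]
```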
Videos and papers
- Video guides
- Research papers:
  - Yan-Martin Tamm, Rinchin Damdinov, Alexey Vasilev. Quality Metrics in Recommender Systems: Do We Calculate Metrics Consistently?
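As the paper title suggests, metric definitions can differ between libraries. For reference, the snippet below implements one common binary-relevance definition of HitRate@K and NDCG@K (with a log2(rank + 1) discount) for a single user. It is a sketch for illustration, not RePlay's code; RePlay's `HitRate` and `NDCG` may differ in details, and a library would typically average these per-user values over all test users.

```python
import math
from typing import List, Set


def hit_rate_at_k(recs: List[str], relevant: Set[str], k: int) -> float:
    """1.0 if at least one of the top-k recommendations is relevant, else 0.0."""
    return 1.0 if any(item in relevant for item in recs[:k]) else 0.0


def ndcg_at_k(recs: List[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k with a 1 / log2(rank + 1) discount."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recs[:k], start=1)
        if item in relevant
    )
    # Ideal DCG: all relevant items placed at the top (at most k of them)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


recs = ["a", "b", "c"]
relevant = {"b", "d"}
print(hit_rate_at_k(recs, relevant, k=3))        # 1.0
print(round(ndcg_at_k(recs, relevant, k=3), 4))  # 0.3869
```

Choices such as whether the ideal DCG is capped at k or at the number of relevant items change the numbers, which is exactly the kind of inconsistency worth checking when comparing results across libraries.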
Contributing to RePlay
For more details, visit the development section in the docs.
Download files
Source Distribution
replay_rec-0.10.0.tar.gz (91.6 kB)
Built Distribution
replay_rec-0.10.0-py3-none-any.whl (118.9 kB)
Hashes for replay_rec-0.10.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d78f6929ddc17a9a8df1246be9f98c496d88a3ac7f6bdd0a46f3e870e252c05d
MD5 | 7c0c6fe51158df5688a2e71f714d446e
BLAKE2b-256 | a85028e8adf115d95319fc4dd3cb625e6f4cde753125830d13be7b3f28222cde