RecSys Library
RePlay
RePlay is a library providing tools for all stages of creating a recommendation system, from data preprocessing to model evaluation and comparison.
RePlay uses PySpark to handle big data.
You can
- Filter and split data
- Train models
- Optimize hyperparameters
- Evaluate predictions with metrics
- Combine predictions from different models
- Create a two-level model
Documentation is available here.
Installation
RePlay requires a Linux machine with Python 3.7-3.9, Java 8+, and a C++ compiler.
pip install replay-rec
To get the latest development version of RePlay, install it from the GitHub repository. It is preferable to use a virtual environment for your installation.
If you encounter an error during RePlay installation, check the troubleshooting guide.
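Before installing, it can help to confirm that the machine meets the requirements listed above. The following is a minimal sketch, not part of RePlay: the function name `check_environment` and the list of compiler executables are assumptions made for illustration, and the check only looks for a `java` executable without verifying that it is version 8 or newer.

```python
import shutil
import sys


def check_environment(version_info=sys.version_info, which=shutil.which,
                      platform=sys.platform):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Python 3.7-3.9 is the range stated in the installation requirements.
    major_minor = (version_info[0], version_info[1])
    if not ((3, 7) <= major_minor <= (3, 9)):
        problems.append(
            f"Python {major_minor[0]}.{major_minor[1]} is outside the 3.7-3.9 range"
        )

    if not platform.startswith("linux"):
        problems.append(f"platform '{platform}' is not Linux")

    # Only presence on PATH is checked here, not the Java version.
    if which("java") is None:
        problems.append("no 'java' executable found on PATH")

    # The compiler names below are an assumption; other compilers may work too.
    if not any(which(name) for name in ("g++", "c++", "clang++")):
        problems.append("no C++ compiler (g++, c++, clang++) found on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

An empty result does not guarantee a successful installation; it only rules out the most common mismatches.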
Quickstart
from rs_datasets import MovieLens
from replay.data_preparator import DataPreparator, Indexer
from replay.experiment import Experiment
from replay.metrics import HitRate, NDCG
from replay.models import ItemKNN
from replay.session_handler import State
from replay.splitters import UserSplitter

# number of recommendations per user (value chosen here for illustration)
K = 5

spark = State().session
ml_1m = MovieLens("1m")

# data preprocessing
preparator = DataPreparator()
log = preparator.transform(
    columns_mapping={
        'user_id': 'user_id',
        'item_id': 'item_id',
        'relevance': 'rating',
        'timestamp': 'timestamp'
    },
    data=ml_1m.ratings
)
indexer = Indexer(user_col='user_id', item_col='item_id')
indexer.fit(users=log.select('user_id'), items=log.select('item_id'))
log_replay = indexer.transform(df=log)

# data splitting
user_splitter = UserSplitter(
    item_test_size=10,
    user_test_size=500,
    drop_cold_items=True,
    drop_cold_users=True,
    shuffle=True,
    seed=42,
)
train, test = user_splitter.split(log_replay)

# model training
model = ItemKNN()
model.fit(train)

# model inference
recs = model.predict(
    log=train,
    k=K,
    users=test.select('user_idx').distinct(),
    filter_seen_items=True,
)

# model evaluation
metrics = Experiment(test, {NDCG(): K, HitRate(): K})
metrics.add_result("knn", recs)
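To make the two metrics in the evaluation step concrete, here is a minimal pure-Python sketch of HitRate@K and NDCG@K for a single user. It uses common textbook definitions with binary relevance and is not RePlay's implementation; libraries can differ in details such as the discount or the normalization. The function names and the sample data are assumptions made for illustration.

```python
import math


def hit_rate_at_k(recommended, relevant, k):
    """Return 1.0 if any of the top-k recommended items is relevant, else 0.0."""
    return float(any(item in relevant for item in recommended[:k]))


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG@k with a log2 position discount."""
    # Discounted gain of the relevant items that appear in the top k.
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    # Ideal ordering: all relevant items (at most k) placed at the top.
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical data: a ranked recommendation list and the user's held-out items.
recommended = [10, 20, 30, 40, 50]
relevant = {20, 60}

print("HitRate@5:", hit_rate_at_k(recommended, relevant, k=5))
print("NDCG@5:", ndcg_at_k(recommended, relevant, k=5))
```

In an evaluation over many users, per-user values like these are typically averaged to produce a single score per metric and cutoff.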
Resources
Usage examples
- 01_replay_basics.ipynb - get started with RePlay.
- 02_models_comparison.ipynb - reproducible models comparison on MovieLens-1M dataset.
- 03_features_preprocessing_and_lightFM.ipynb - LightFM example with PySpark for feature preprocessing.
- 04_splitters.ipynb - an example of using RePlay data splitters.
- 05_feature_generators.ipynb - Feature generation with RePlay.
Videos and papers
- Video guides:
- Research papers:
  - Yan-Martin Tamm, Rinchin Damdinov, Alexey Vasilev. Quality Metrics in Recommender Systems: Do We Calculate Metrics Consistently?
Contributing to RePlay
We welcome community contributions. For details, please check our contributing guidelines.
Download files
File details
Details for the file replay_rec-0.11.0.tar.gz.
File metadata
- Download URL: replay_rec-0.11.0.tar.gz
- Upload date:
- Size: 117.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.9.7 Darwin/21.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5e89562701fd8d1fe4ac14c9775896ef90cc8dd735da8e84cadda721d9c92ede |
| MD5 | 73d03a04f36fb4bc31575df5bb494a21 |
| BLAKE2b-256 | d967ac07ab0fe195b7142252bb34e0432bb3bb9fe845a3a4fe9a51fefe295e17 |
File details
Details for the file replay_rec-0.11.0-py3-none-any.whl.
File metadata
- Download URL: replay_rec-0.11.0-py3-none-any.whl
- Upload date:
- Size: 148.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.9.7 Darwin/21.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 17fa7ec6ea1f29e9f46b08fad3b7093fb0662c76d2efd31f9517f984535d3143 |
| MD5 | 4abf8ef81604747c1f119cbf0fc4034d |
| BLAKE2b-256 | 0b581452e84f2e80aee6a79911d92b3f2ed1d42b60d293bc4ba1238bab49e0ab |