Simulator for recommendation algorithms
Project description
Simulator
Simulator is a framework for training and evaluating recommendation algorithms on real or synthetic data. The framework is built on the PySpark library so that it can work with big data. As part of the simulation process, the framework incorporates data generators, response functions, and other tools that make the simulator flexible to use.
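A response function turns a recommendation into simulated user feedback. The quickstart example chains a noise stage with a Bernoulli stage (`NoiseResponse` followed by `BernoulliResponse`). The plain-Python sketch below illustrates that chaining idea without Spark. The function names are hypothetical. Clipping the noisy score to [0, 1] is an assumption of this sketch, not documented sim4rec behavior.

```python
import random


def noise_response(mu: float, sigma: float, rng: random.Random) -> float:
    """Draw a score from N(mu, sigma), clipped to [0, 1] (assumption) so it can act as a probability."""
    return min(1.0, max(0.0, rng.gauss(mu, sigma)))


def bernoulli_response(p: float, rng: random.Random) -> int:
    """Return 1 with probability p and 0 otherwise."""
    return 1 if rng.random() < p else 0


def respond(pairs, mu=0.5, sigma=0.2, seed=0):
    """Apply both stages in order to each (user, item) pair: noisy score first, then a binary draw."""
    rng = random.Random(seed)
    rows = []
    for user, item in pairs:
        p = noise_response(mu, sigma, rng)
        rows.append((user, item, p, bernoulli_response(p, rng)))
    return rows


responses = respond([(1, 4), (10, 25), (50, 26)])
for row in responses:
    print(row)  # (user, item, noisy score, binary response)
```

Keeping the two stages as separate functions mirrors the pipeline design: you can swap the noise model without touching the step that converts scores into binary responses.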
Installation
```bash
pip install sim4rec
```
To install the dependencies with poetry, run:
```bash
pip install --upgrade pip wheel poetry lightfm==1.17
poetry install
```
Quickstart
The following example shows how to use the simulator to train a model iteratively by refitting the recommendation algorithm on the newly arriving history log:
```python
import numpy as np
import pandas as pd

import pyspark.sql.types as st
from pyspark.ml import PipelineModel

from sim4rec.utils import pandas_to_spark
from sim4rec.modules import RealDataGenerator, Simulator
from sim4rec.response import NoiseResponse, BernoulliResponse

# `ucb` is not part of sim4rec; a local ucb.py module providing a UCB model is assumed
from ucb import UCB
from replay.metrics import NDCG

# Schema of the interaction log
LOG_SCHEMA = st.StructType([
    st.StructField('user_idx', st.LongType(), True),
    st.StructField('item_idx', st.LongType(), True),
    st.StructField('relevance', st.DoubleType(), False),
    st.StructField('response', st.IntegerType(), False)
])

# Random features for 100 users (15 attributes) and 30 items (10 attributes)
users_df = pd.DataFrame(
    data=np.random.normal(0, 1, size=(100, 15)),
    columns=[f'user_attr_{i}' for i in range(15)]
)
items_df = pd.DataFrame(
    data=np.random.normal(1, 1, size=(30, 10)),
    columns=[f'item_attr_{i}' for i in range(10)]
)

# Initial history log with four interactions
history_df = pandas_to_spark(pd.DataFrame({
    'user_idx' : [1, 10, 10, 50],
    'item_idx' : [4, 25, 26, 25],
    'relevance' : [1.0, 0.0, 1.0, 1.0],
    'response' : [1, 0, 1, 1]
}), schema=LOG_SCHEMA)

# Add integer keys and convert the features to Spark DataFrames
users_df['user_idx'] = np.arange(len(users_df))
items_df['item_idx'] = np.arange(len(items_df))
users_df = pandas_to_spark(users_df)
items_df = pandas_to_spark(items_df)

# Generators based on the real user and item data
user_gen = RealDataGenerator(label='users_real')
item_gen = RealDataGenerator(label='items_real')
user_gen.fit(users_df)
item_gen.fit(items_df)
_ = user_gen.generate(100)
_ = item_gen.generate(30)

sim = Simulator(
    user_gen=user_gen,
    item_gen=item_gen,
    data_dir='test_simulator',
    user_key_col='user_idx',
    item_key_col='item_idx',
    log_df=history_df
)

# Response pipeline: a noisy score in '__noise', then a Bernoulli draw into 'response'
noise_resp = NoiseResponse(mu=0.5, sigma=0.2, outputCol='__noise')
br = BernoulliResponse(inputCol='__noise', outputCol='response')
pipeline = PipelineModel(stages=[noise_resp, br])

model = UCB()
model.fit(log=history_df)

ndcg = NDCG()
train_ndcg = []
for i in range(10):
    # Sample 10% of the users and recommend 5 unseen items to each of them
    users = sim.sample_users(0.1).cache()
    recs = model.predict(log=sim.log, k=5, users=users, items=items_df, filter_seen_items=True).cache()

    # Simulate the users' responses to the recommendations
    true_resp = sim.sample_responses(
        recs_df=recs,
        user_features=users,
        item_features=items_df,
        action_models=pipeline
    ).select('user_idx', 'item_idx', 'relevance', 'response').cache()

    # Append the responses to the simulator log and score NDCG@5 on positive responses
    sim.update_log(true_resp, iteration=i)
    train_ndcg.append(ndcg(recs, true_resp.filter(true_resp['response'] >= 1), 5))

    # Refit the model on the whole log, using the response as relevance
    model.fit(sim.log.drop('relevance').withColumnRenamed('response', 'relevance'))

    users.unpersist()
    recs.unpersist()
    true_resp.unpersist()

print(train_ndcg)
```
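The quickstart loop follows a cycle: recommend, collect responses, extend the log, refit. The sketch below mirrors that cycle in plain Python without Spark so the mechanics are easy to inspect. `ToyUCB` is a hypothetical UCB1-style scorer, not the `UCB` model imported above. The hidden per-item probabilities stand in for the response pipeline. Both are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


class ToyUCB:
    """Hypothetical UCB1-style item scorer, refitted from scratch on the whole log."""

    def __init__(self, coef: float = 2.0):
        self.coef = coef
        self.shows = defaultdict(int)   # times each item appears in the log
        self.clicks = defaultdict(int)  # positive responses per item
        self.total = 0

    def fit(self, log):
        self.shows.clear()
        self.clicks.clear()
        for _, item, response in log:
            self.shows[item] += 1
            self.clicks[item] += response
        self.total = len(log)

    def score(self, item):
        n = self.shows[item]
        if n == 0:
            return math.inf  # never-shown items are explored first
        # empirical response rate plus an exploration bonus that shrinks as n grows
        return self.clicks[item] / n + math.sqrt(self.coef * math.log(self.total) / n)

    def predict(self, users, items, k, log):
        seen = {(u, i) for u, i, _ in log}
        recs = {}
        for u in users:
            # drop items the user has already seen (the filter_seen_items idea)
            candidates = [i for i in items if (u, i) not in seen]
            candidates.sort(key=lambda i: (-self.score(i), i))
            recs[u] = candidates[:k]
        return recs


rng = random.Random(42)
n_users, n_items, k = 100, 30, 5
true_prob = [rng.random() for _ in range(n_items)]  # hidden response probability per item

# (user, item, response) triples: the same four interactions as history_df above
log = [(1, 4, 1), (10, 25, 0), (10, 26, 1), (50, 25, 1)]
model = ToyUCB()
model.fit(log)

positives_per_iter = []
for it in range(10):
    users = rng.sample(range(n_users), 10)  # 10% of users per iteration
    recs = model.predict(users, range(n_items), k, log)
    new_rows = [
        (u, i, 1 if rng.random() < true_prob[i] else 0)
        for u, items in recs.items()
        for i in items
    ]
    log.extend(new_rows)  # update the log
    model.fit(log)        # refit on the whole, grown log
    positives_per_iter.append(sum(r for _, _, r in new_rows))

print(positives_per_iter)
```

Refitting on the full log, instead of updating counts incrementally, matches the quickstart, which calls `model.fit` on `sim.log` after each iteration.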
Examples
You can find useful examples in the notebooks folder. They demonstrate how to use synthetic data generators and composite generators, evaluate the scores of generators, iteratively refit a recommendation algorithm, use response functions, and more.
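The sketch below gives a rough illustration of what a composite generator does. It combines a generator that resamples real rows with one that draws synthetic Gaussian rows, and splits the requested sample size between them by weight. All class names are hypothetical and are not the sim4rec API. Resampling with replacement and the weight-splitting rule are assumptions of this sketch.

```python
import random


class ResampleGenerator:
    """Hypothetical generator that resamples fitted rows with replacement."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.rows = []

    def fit(self, rows):
        self.rows = list(rows)
        return self

    def generate(self, n: int):
        return [self.rng.choice(self.rows) for _ in range(n)]


class GaussianGenerator:
    """Hypothetical synthetic generator: every attribute is drawn from N(mu, sigma)."""

    def __init__(self, n_attrs: int, mu: float = 0.0, sigma: float = 1.0, seed: int = 0):
        self.rng = random.Random(seed)
        self.n_attrs, self.mu, self.sigma = n_attrs, mu, sigma

    def generate(self, n: int):
        return [
            tuple(self.rng.gauss(self.mu, self.sigma) for _ in range(self.n_attrs))
            for _ in range(n)
        ]


class CompositeGenerator:
    """Splits a request for n rows among sub-generators in proportion to their weights."""

    def __init__(self, generators, weights):
        total = sum(weights)
        self.generators = list(generators)
        self.weights = [w / total for w in weights]

    def generate(self, n: int):
        counts = [int(n * w) for w in self.weights]
        counts[-1] += n - sum(counts)  # the rounding remainder goes to the last generator
        rows = []
        for gen, count in zip(self.generators, counts):
            rows.extend(gen.generate(count))
        return rows


real_rows = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)]
real_gen = ResampleGenerator(seed=1).fit(real_rows)
synth_gen = GaussianGenerator(n_attrs=2, seed=2)
mixed = CompositeGenerator([real_gen, synth_gen], weights=[0.7, 0.3])
sample = mixed.generate(100)
print(len(sample))
```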
Build from sources
```bash
poetry build
pip install ./dist/sim4rec-0.0.1-py3-none-any.whl
```
Compile documentation
```bash
cd docs
make clean && make html
```
Tests
The tests use the pytest library. To run the tests for all modules, run the following command from the repository root directory:
```bash
pytest
```
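pytest discovers files named `test_*.py` and runs the functions in them whose names start with `test`. The hypothetical test module below shows that shape. It checks a toy binary-relevance NDCG@k helper, similar in spirit to the metric used in the quickstart. The helper and the file name are illustrative assumptions, and replay's `NDCG` may compute the metric differently. You could run a single file like this with `pytest test_toy_ndcg.py`.

```python
# test_toy_ndcg.py (hypothetical file name; pytest collects test_*.py files)
import math


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG@k: DCG of the top k divided by the best achievable DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0


def test_perfect_ranking_scores_one():
    assert ndcg_at_k([1, 2, 3], {1, 2}, k=3) == 1.0


def test_no_hits_scores_zero():
    assert ndcg_at_k([3, 4], {1}, k=2) == 0.0


def test_single_hit_at_second_position():
    # one relevant item at rank 2: DCG = 1/log2(3), ideal DCG = 1/log2(2) = 1
    assert math.isclose(ndcg_at_k([7, 1, 8], {1}, k=3), 1 / math.log2(3))
```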
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file sim4rec-0.0.1.tar.gz
File metadata
- Download URL: sim4rec-0.0.1.tar.gz
- Upload date:
- Size: 21.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.9.7 Darwin/21.6.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8196ed367421e24f0b69a5f90eff5dc72e4ca1fe74cba979a3e359f35833be00
MD5 | 065077c6959ccfa581e307c62a53d997
BLAKE2b-256 | 94912a20a8abb3588675bb3b3911f839e00fa8afaf4d1b0e7a8fe546652bcc85
File details
Details for the file sim4rec-0.0.1-py3-none-any.whl
File metadata
- Download URL: sim4rec-0.0.1-py3-none-any.whl
- Upload date:
- Size: 25.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.9.7 Darwin/21.6.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | dbef3dee3330bcf0a140a3924a98ff3a5f6585693ed260c07ed80992058608f8
MD5 | edc83f3eb6dbd46725b7ac1f117d37a6
BLAKE2b-256 | 6703e4752e124991f49e818031d5f7e139847a23197b28b31b8d8c072d8ff2e9