Skip to main content

Simulator for recommendation algorithms

Project description

Simulator

Simulator is framework for training and evaluating recommendation algorithms on real or synthetic data. Framework is based on pyspark library to work with big data. As a part of simulation process the framework incorporates data generators, response functions and other tools, that can provide flexible usage of simulator.

Table of contents

Installation

pip install sim4rec

To install dependencies with poetry run

pip install --upgrade pip wheel poetry lightfm==1.17
poetry install

Quickstart

The following example shows how to use simulator to train model iteratively by refitting recommendation algorithm on the new upcoming history log

import numpy as np
import pandas as pd

import pyspark.sql.types as st
from pyspark.ml import PipelineModel
from sim4rec.utils import pandas_to_spark
from sim4rec.modules import RealDataGenerator, Simulator
from sim4rec.response import NoiseResponse, BernoulliResponse

from ucb import UCB
from replay.metrics import NDCG

LOG_SCHEMA = st.StructType([
    st.StructField('user_idx', st.LongType(), True),
    st.StructField('item_idx', st.LongType(), True),
    st.StructField('relevance', st.DoubleType(), False),
    st.StructField('response', st.IntegerType(), False)
])

users_df = pd.DataFrame(
    data=np.random.normal(0, 1, size=(100, 15)),
    columns=[f'user_attr_{i}' for i in range(15)]
)
items_df = pd.DataFrame(
    data=np.random.normal(1, 1, size=(30, 10)),
    columns=[f'item_attr_{i}' for i in range(10)]
)
history_df = pandas_to_spark(pd.DataFrame({
    'user_idx' : [1, 10, 10, 50],
    'item_idx' : [4, 25, 26, 25],
    'relevance' : [1.0, 0.0, 1.0, 1.0],
    'response' : [1, 0, 1, 1]
}), schema=LOG_SCHEMA)

users_df['user_idx'] = np.arange(len(users_df))
items_df['item_idx'] = np.arange(len(items_df))

users_df = pandas_to_spark(users_df)
items_df = pandas_to_spark(items_df)

user_gen = RealDataGenerator(label='users_real')
item_gen = RealDataGenerator(label='items_real')
user_gen.fit(users_df)
item_gen.fit(items_df)
_ = user_gen.generate(100)
_ = item_gen.generate(30)

sim = Simulator(
    user_gen=user_gen,
    item_gen=item_gen,
    data_dir='test_simulator',
    user_key_col='user_idx',
    item_key_col='item_idx',
    log_df=history_df
)

noise_resp = NoiseResponse(mu=0.5, sigma=0.2, outputCol='__noise')
br = BernoulliResponse(inputCol='__noise', outputCol='response')
pipeline = PipelineModel(stages=[noise_resp, br])

model = UCB()
model.fit(log=history_df)

ndcg = NDCG()

train_ndcg = []
for i in range(10):
    users = sim.sample_users(0.1).cache()

    recs = model.predict(log=sim.log, k=5, users=users, items=items_df, filter_seen_items=True).cache()

    true_resp = sim.sample_responses(
        recs_df=recs,
        user_features=users,
        item_features=items_df,
        action_models=pipeline
    ).select('user_idx', 'item_idx', 'relevance', 'response').cache()

    sim.update_log(true_resp, iteration=i)

    train_ndcg.append(ndcg(recs, true_resp.filter(true_resp['response'] >= 1), 5))

    model.fit(sim.log.drop('relevance').withColumnRenamed('response', 'relevance'))

    users.unpersist()
    recs.unpersist()
    true_resp.unpersist()

print(train_ndcg)

Examples

You can find useful examples in notebooks folder, which demonstrates how to use synthetic data generators, composite generators, evaluate scores of the generators, iteratively refit recommendation algorithm, use response functions and more.

Build from sources

poetry build
pip install ./dist/sim4rec-0.0.1-py3-none-any.whl

Compile documentation

cd docs
make clean && make html

Tests

For tests the pytest python library is used and to run tests for all modules you can run the following command from repository root directory

pytest

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sim4rec-0.0.1.tar.gz (21.9 kB view details)

Uploaded Source

Built Distribution

sim4rec-0.0.1-py3-none-any.whl (25.5 kB view details)

Uploaded Python 3

File details

Details for the file sim4rec-0.0.1.tar.gz.

File metadata

  • Download URL: sim4rec-0.0.1.tar.gz
  • Upload date:
  • Size: 21.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.9.7 Darwin/21.6.0

File hashes

Hashes for sim4rec-0.0.1.tar.gz
Algorithm Hash digest
SHA256 8196ed367421e24f0b69a5f90eff5dc72e4ca1fe74cba979a3e359f35833be00
MD5 065077c6959ccfa581e307c62a53d997
BLAKE2b-256 94912a20a8abb3588675bb3b3911f839e00fa8afaf4d1b0e7a8fe546652bcc85

See more details on using hashes here.

File details

Details for the file sim4rec-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: sim4rec-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 25.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.9.7 Darwin/21.6.0

File hashes

Hashes for sim4rec-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 dbef3dee3330bcf0a140a3924a98ff3a5f6585693ed260c07ed80992058608f8
MD5 edc83f3eb6dbd46725b7ac1f117d37a6
BLAKE2b-256 6703e4752e124991f49e818031d5f7e139847a23197b28b31b8d8c072d8ff2e9

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page