MatchBox

Industrial recommender systems typically have two main phases: matching and ranking. In the first phase, candidate item matching (also known as candidate retrieval) pulls a small set of relevant items from a large item corpus, with efficiency and high recall as the goals. MatchBox is an open-source library for candidate item matching, designed for configurability, tunability, and reproducibility.
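To make the matching phase concrete, here is a minimal sketch of embedding-based retrieval in plain Python. It is not MatchBox code: the function names, the toy embeddings, and the choice of dot-product scoring are assumptions made for illustration only.

```python
import heapq


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(user_vec, item_vecs, k):
    """Return the ids of the k items with the highest dot-product score."""
    scored = ((dot(user_vec, vec), item_id) for item_id, vec in item_vecs.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


# Toy embeddings (hypothetical values, two dimensions for readability).
user_vec = [0.9, 0.1]
item_vecs = {
    "item_a": [1.0, 0.0],
    "item_b": [0.0, 1.0],
    "item_c": [0.7, 0.7],
    "item_d": [-1.0, 0.2],
}

# The matching phase keeps only the k best-scoring candidates;
# a separate ranking model would then reorder them.
candidates = retrieve_top_k(user_vec, item_vecs, k=2)
print(candidates)
```

A production system would typically replace the exhaustive scan with an approximate nearest-neighbor index, but the contract is the same: a large corpus in, a short candidate list out.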

Model Zoo

Publication	Model	Paper	Benchmark
UAI'09	MF-BPR	BPR: Bayesian Personalized Ranking from Implicit Feedback	:arrow_upper_right:
RecSys'16	YoutubeNet	Deep Neural Networks for YouTube Recommendations	:arrow_upper_right:
CIKM'21	MF-CCL / SimpleX	SimpleX: A Simple and Strong Baseline for Collaborative Filtering	:arrow_upper_right:
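Two of the models above are characterized mainly by their training losses. The following is a minimal sketch of both in plain Python rather than PyTorch, and it is not MatchBox's own code: `bpr_loss` is the pairwise BPR objective without its regularization term, and `ccl_loss` follows the cosine contrastive loss as described in the SimpleX paper. The function names and the `margin` and `neg_weight` parameter names are chosen here for illustration.

```python
import math


def bpr_loss(pos_score, neg_score):
    """Pairwise BPR loss for one (positive, negative) pair:
    -log(sigmoid(pos_score - neg_score)), computed in a numerically stable way.
    """
    diff = pos_score - neg_score
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def ccl_loss(pos_sim, neg_sims, margin, neg_weight):
    """Cosine contrastive loss for one user.

    pos_sim:  cosine similarity between the user and the positive item
    neg_sims: cosine similarities between the user and sampled negatives
    Negatives only contribute when their similarity exceeds the margin.
    """
    neg_term = sum(max(0.0, s - margin) for s in neg_sims) / len(neg_sims)
    return (1.0 - pos_sim) + neg_weight * neg_term


print(bpr_loss(2.0, 0.0))
print(ccl_loss(0.8, [0.1, 0.9, 0.4], margin=0.5, neg_weight=1.0))
```

BPR compares one positive against one negative at a time, while the contrastive loss scores a positive against a whole set of negatives and ignores the easy ones below the margin.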

Dependency

We recommend the following environment, which is the only one MatchBox has been tested on.

python 3.6.x
torch 1.0.x
PyYAML<5.0
pandas
scikit-learn
numpy
h5py
tqdm
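A quick way to see which of these packages are importable in the current environment is sketched below. It uses only the standard library and checks presence, not versions. The import names are the usual ones for these packages (PyYAML imports as `yaml`, scikit-learn as `sklearn`); the helper name `missing_modules` is made up for this example.

```python
import importlib.util
import sys

# Import names for the packages listed above.
REQUIRED_MODULES = ["torch", "yaml", "pandas", "sklearn", "numpy", "h5py", "tqdm"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python", sys.version.split()[0])
missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```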

Get Started

The code workflow is structured as follows:

# Set the data config and model config
feature_cols = [{...}] # define feature columns
label_col = {...} # define label column
params = {...} # set data params and model params

# Set the feature encoding specs
feature_encoder = FeatureEncoder(feature_cols, label_col, ...) # define the feature encoder
datasets.build_dataset(feature_encoder, ...) # fit feature_encoder and build dataset 

# Load data generators
train_gen, valid_gen, test_gen = h5_generator(feature_encoder, ...)

# Define a model
model = SimpleX(...)

# Train the model
model.fit(train_gen, valid_gen, ...)

# Evaluation
model.evaluate(test_gen)
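For intuition about what the evaluation step measures in a retrieval setting, here is a minimal sketch of two common top-K metrics, Recall@K and NDCG@K with binary relevance. This is an assumption about the kind of metric involved, not MatchBox's implementation; the metrics actually reported are determined by the model config, and the function names below are illustrative.

```python
import math


def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)


def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d"]   # model's ranking for one user (toy data)
relevant = {"a", "d", "z"}      # held-out items for that user (toy data)
print(recall_at_k(ranked, relevant, k=2))
print(ndcg_at_k(ranked, relevant, k=2))
```

In practice these per-user values are averaged over all test users.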

Run the benchmark

To reproduce the experimental results, run the benchmarking script with the corresponding config. The script takes the following arguments:

--config: The config directory where dataset config and model config are located.
--expid: The experiment id defined in a model config file to denote a specific setting of hyper-parameters.
--gpu: The GPU index to use for the experiment; use -1 to run on CPU.

cd model_zoo/SimpleX
python run_expid.py --config ./config/SimpleX_yelp18_m1 --expid SimpleX_yelp18_m1 --gpu 0
python run_expid.py --config ./config/SimpleX_amazonbooks_m1 --expid SimpleX_amazonbooks_m1 --gpu 0
python run_expid.py --config ./config/SimpleX_gowalla_m1 --expid SimpleX_gowalla_m1 --gpu 0

The running logs are also available in each config directory.
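The three flags can be modeled with a small `argparse` parser. This is a hypothetical sketch, not the actual contents of `run_expid.py`: the defaults, the help strings, and the `device_name` helper are assumptions, and the `cuda:<index>` string simply follows PyTorch's device naming convention.

```python
import argparse


def build_parser():
    """Parser mirroring the three documented flags (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Sketch of a benchmark launcher CLI")
    parser.add_argument("--config", type=str, required=True,
                        help="directory holding the dataset and model configs")
    parser.add_argument("--expid", type=str, required=True,
                        help="experiment id defined in the model config file")
    parser.add_argument("--gpu", type=int, default=-1,
                        help="GPU index, or -1 for CPU")
    return parser


def device_name(gpu_index):
    """Map the --gpu value to a PyTorch-style device string."""
    return "cpu" if gpu_index < 0 else "cuda:{}".format(gpu_index)


# Parse the same arguments as the first command above.
args = build_parser().parse_args(
    ["--config", "./config/SimpleX_yelp18_m1",
     "--expid", "SimpleX_yelp18_m1",
     "--gpu", "0"]
)
print(args.config, args.expid, device_name(args.gpu))
```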

Release history

0.0.4 (May 16, 2023)
0.0.3 (Feb 21, 2023)
0.0.2.post0 (Feb 17, 2023)
0.0.2 (Feb 17, 2023, this version)
0.0.1 (Feb 17, 2023)

Download files

Source Distribution

recbox-0.0.2.tar.gz (24.5 kB)

Uploaded Feb 17, 2023 Source

Built Distribution

recbox-0.0.2-py3-none-any.whl (36.3 kB)

Uploaded Feb 17, 2023 Python 3

Hashes for recbox-0.0.2.tar.gz

Algorithm	Hash digest
SHA256	`7872e3b8f6f1b520436f94c7df50eda5053362803a97a1f1ff8fd346fe5543f6`
MD5	`91e6777e4b55f118d620a6b4f7745963`
BLAKE2b-256	`025296098a1563c24ce9e6472739a9fc7b419fdb3f7d0441eea4c4c13a669205`

Hashes for recbox-0.0.2-py3-none-any.whl

Algorithm	Hash digest
SHA256	`8e202b428a6674341ce68caa15cf421263f520d46782a110312c1b9d0efeac24`
MD5	`7041554c949d2553abefe67609caa7f1`
BLAKE2b-256	`2329876ead2745250a5df0b26a177d98f3d708807acd2ad86382717bdfd2921c`
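A downloaded file can be checked against the SHA256 or MD5 digests listed above with the standard-library `hashlib` module. The sketch below is one way to do it; the helper names `file_digest` and `matches` are made up for this example, and the local file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hex digest of a file, read in chunks so large files need little memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path, expected_hex, algorithm="sha256"):
    """True if the file's digest equals the expected hex string (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Usage (assumes the sdist was saved to the current directory):
# ok = matches(
#     "recbox-0.0.2.tar.gz",
#     "7872e3b8f6f1b520436f94c7df50eda5053362803a97a1f1ff8fd346fe5543f6",
# )
```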