A configurable, tunable, and reproducible library for candidate item matching

Project description

MatchBox

Industrial recommender systems typically have two main phases: matching and ranking. In the first phase, candidate item matching (also known as candidate retrieval) aims for efficient, high-recall retrieval from a large item corpus. MatchBox is an open-source library for candidate item matching, with an emphasis on configurability, tunability, and reproducibility.
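
To make the matching phase concrete, here is a minimal NumPy sketch of embedding-based candidate retrieval and the Recall@K metric commonly used to judge it. It does not use MatchBox itself: the function names top_k_items and recall_at_k, the inner-product scoring, and the random embeddings are all assumptions made for illustration.

```python
import numpy as np

def top_k_items(user_emb, item_embs, k):
    """Return indices of the k highest-scoring items, best first.

    Scores are plain inner products between one user embedding and
    every item embedding (an assumed scoring function).
    """
    scores = item_embs @ user_emb
    # argpartition picks the k best candidates without sorting the whole corpus
    top = np.argpartition(-scores, k - 1)[:k]
    # sort only those k candidates by descending score
    return top[np.argsort(-scores[top])]

def recall_at_k(retrieved, relevant):
    """Fraction of the relevant items that appear among the retrieved ones."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = sum(1 for item in retrieved if item in relevant)
    return hits / len(relevant)

# Toy corpus: 1000 items with 16-dimensional random embeddings (illustrative only)
rng = np.random.default_rng(0)
item_embs = rng.normal(size=(1000, 16))
user_emb = rng.normal(size=16)

candidates = top_k_items(user_emb, item_embs, k=20)
print(len(candidates))  # 20 candidates handed on to the ranking phase
```

In a real system the exhaustive scoring step is usually replaced by an approximate nearest-neighbor index, but the evaluation logic stays the same.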

Model Zoo

Publication | Model | Paper | Benchmark
UAI'09 | MF-BPR | BPR: Bayesian Personalized Ranking from Implicit Feedback | ↗
RecSys'16 | YoutubeNet | Deep Neural Networks for YouTube Recommendations | ↗
CIKM'21 | MF-CCL / SimpleX | SimpleX: A Simple and Strong Baseline for Collaborative Filtering | ↗

Dependency

We recommend the following environment, which is the only one MatchBox has been tested on.

  • python 3.6.x
  • torch 1.0.x
  • PyYAML<5.0
  • pandas
  • scikit-learn
  • numpy
  • h5py
  • tqdm
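
As a quick sanity check, the version pins above can be compared against the running interpreter. The sketch below is not part of MatchBox; the helper matches_series is a hypothetical name, and it only handles "major.minor.x" style pins such as the two listed here.

```python
import sys

def matches_series(version, series):
    """Return True if `version` (e.g. "3.6.9") falls in `series` (e.g. "3.6.x")."""
    # Keep only the fixed components of the pin, dropping the trailing "x"
    wanted = [part for part in series.split(".") if part != "x"]
    have = version.split(".")
    return have[:len(wanted)] == wanted

# Compare the running interpreter against the tested Python series
python_version = "{}.{}.{}".format(*sys.version_info[:3])
if not matches_series(python_version, "3.6.x"):
    print("Warning: Python {} is outside the tested 3.6.x series".format(python_version))
```

The same helper can be applied to torch.__version__ with the series "1.0.x" once PyTorch is installed.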

Get Started

The code workflow is structured as follows:

# Set the data config and model config
feature_cols = [{...}] # define feature columns
label_col = {...} # define label column
params = {...} # set data params and model params

# Set the feature encoding specs
feature_encoder = FeatureEncoder(feature_cols, label_col, ...) # define the feature encoder
datasets.build_dataset(feature_encoder, ...) # fit feature_encoder and build dataset 

# Load data generators
train_gen, valid_gen, test_gen = h5_generator(feature_encoder, ...)

# Define a model
model = SimpleX(...)

# Train the model
model.fit(train_gen, valid_gen, ...)

# Evaluation
model.evaluate(test_gen)

Run the benchmark

To reproduce the experiment results, run the benchmarking script with the corresponding config, using the following arguments:

  • --config: The config directory containing the dataset config and the model config.
  • --expid: The experiment ID defined in the model config file, identifying a specific hyper-parameter setting.
  • --gpu: The GPU index to use for the experiment; use -1 to run on CPU.

cd model_zoo/SimpleX
python run_expid.py --config ./config/SimpleX_yelp18_m1 --expid SimpleX_yelp18_m1 --gpu 0
python run_expid.py --config ./config/SimpleX_amazonbooks_m1 --expid SimpleX_amazonbooks_m1 --gpu 0
python run_expid.py --config ./config/SimpleX_gowalla_m1 --expid SimpleX_gowalla_m1 --gpu 0

The running logs are also available in each config directory.
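
For readers who want to see how the three flags described above fit together, here is a minimal argparse sketch. It is an illustration only, not the actual contents of run_expid.py: the defaults, the help strings, and the helper device_name are assumptions.

```python
import argparse

def parse_args(argv=None):
    """Parse the three benchmark flags (defaults here are assumed, not MatchBox's)."""
    parser = argparse.ArgumentParser(description="Run one benchmark experiment (sketch)")
    parser.add_argument("--config", type=str, default="./config/",
                        help="directory holding the dataset and model configs")
    parser.add_argument("--expid", type=str, required=True,
                        help="experiment ID defined in the model config")
    parser.add_argument("--gpu", type=int, default=-1,
                        help="GPU index, or -1 for CPU")
    return parser.parse_args(argv)

def device_name(gpu_index):
    """Map the --gpu value to a PyTorch-style device string."""
    return "cpu" if gpu_index < 0 else "cuda:{}".format(gpu_index)

# Same arguments as the first benchmark command above
args = parse_args(["--config", "./config/SimpleX_yelp18_m1",
                   "--expid", "SimpleX_yelp18_m1",
                   "--gpu", "0"])
print(args.expid, device_name(args.gpu))
```

Passing --gpu -1 yields the device string "cpu", matching the convention in the flag description.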

Download files

Source Distribution: recbox-0.0.2.tar.gz (24.5 kB)

Built Distribution: recbox-0.0.2-py3-none-any.whl (36.3 kB, Python 3)
