A configurable, tunable, and reproducible library for candidate item matching
MatchBox
Industrial recommender systems typically have two main phases: matching and ranking. In the first phase, candidate item matching (also known as candidate retrieval) aims to retrieve items from a large item corpus efficiently and with high recall. MatchBox is an open-source library for candidate item matching that emphasizes configurability, tunability, and reproducibility.
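To make the matching phase concrete, here is a minimal sketch of candidate retrieval as top-k scoring over item embeddings, together with a recall measurement. It is an illustration only and does not use MatchBox's API: the function names, the toy embeddings, and the choice of dot-product scoring are all assumptions made for this example.

```python
import heapq


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(user_vec, item_vecs, k):
    """Return the ids of the k items with the highest dot-product score."""
    scored = ((dot(user_vec, vec), item_id) for item_id, vec in item_vecs.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


def recall_at_k(retrieved, relevant):
    """Fraction of the relevant items that appear among the retrieved ones."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Toy embeddings (hypothetical values, chosen only for illustration).
item_vecs = {
    "item_a": [0.9, 0.1],
    "item_b": [0.2, 0.8],
    "item_c": [0.7, 0.6],
    "item_d": [-0.5, 0.3],
}
user_vec = [1.0, 0.5]

candidates = retrieve_top_k(user_vec, item_vecs, k=2)
print(candidates)
print(recall_at_k(candidates, ["item_a", "item_b"]))
```

In a real system the exhaustive scan would usually be replaced by an approximate nearest-neighbor index, and the retrieved candidates would then be passed to the ranking phase.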
Model Zoo
Publication | Model | Paper | Benchmark |
---|---|---|---|
UAI'09 | MF-BPR | BPR: Bayesian Personalized Ranking from Implicit Feedback | :arrow_upper_right: |
RecSys'16 | YoutubeNet | Deep Neural Networks for YouTube Recommendations | :arrow_upper_right: |
CIKM'21 | MF-CCL / SimpleX | SimpleX: A Simple and Strong Baseline for Collaborative Filtering | :arrow_upper_right: |
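For orientation, the pairwise objective behind the MF-BPR entry can be sketched in a few lines. This is a plain-Python illustration of the loss from the BPR paper, -log(sigmoid(s_pos - s_neg)), and not MatchBox's implementation; the function name and the example scores are assumptions. In matrix factorization the two scores would typically be dot products of a user embedding with a positive and a sampled negative item embedding.

```python
import math


def bpr_loss(pos_score, neg_score):
    """BPR loss for one (positive, negative) pair.

    -log(sigmoid(pos - neg)) equals softplus(neg - pos); the form below
    evaluates softplus without overflowing for large inputs.
    """
    diff = neg_score - pos_score
    return max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# The loss shrinks as the positive item is scored further above the negative one.
print(bpr_loss(2.0, 0.0))
print(bpr_loss(0.0, 0.0))
print(bpr_loss(0.0, 2.0))
```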
Dependency
We recommend the following environment, which is the only one MatchBox has been tested with.
- python 3.6.x
- torch 1.0.x
- PyYAML<5.0
- pandas
- scikit-learn
- numpy
- h5py
- tqdm
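A quick way to see which of the packages above are importable in the current environment is sketched below. It is a convenience script and not part of MatchBox; the helper name is hypothetical. Note that PyYAML is imported as `yaml` and scikit-learn as `sklearn`. The script checks only presence, not the version constraints listed above.

```python
import importlib.util
import sys

# Import names for the packages listed above.
REQUIRED_MODULES = ["torch", "yaml", "pandas", "sklearn", "numpy", "h5py", "tqdm"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version:", ".".join(str(part) for part in sys.version_info[:3]))
missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```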
Get Started
The code workflow is structured as follows:
# Set the data config and model config
feature_cols = [{...}] # define feature columns
label_col = {...} # define label column
params = {...} # set data params and model params
# Set the feature encoding specs
feature_encoder = FeatureEncoder(feature_cols, label_col, ...) # define the feature encoder
datasets.build_dataset(feature_encoder, ...) # fit feature_encoder and build dataset
# Load data generators
train_gen, valid_gen, test_gen = h5_generator(feature_encoder, ...)
# Define a model
model = SimpleX(...)
# Train the model
model.fit(train_gen, valid_gen, ...)
# Evaluation
model.evaluate(test_gen)
Run the benchmark
To reproduce the experiment results, run the benchmarking script with the corresponding configs as follows.
- --config: The config directory where dataset config and model config are located.
- --expid: The experiment id defined in a model config file to denote a specific setting of hyper-parameters.
- --gpu: The GPU index to use for the experiment; use -1 to run on CPU.
cd model_zoo/SimpleX
python run_expid.py --config ./config/SimpleX_yelp18_m1 --expid SimpleX_yelp18_m1 --gpu 0
python run_expid.py --config ./config/SimpleX_amazonbooks_m1 --expid SimpleX_amazonbooks_m1 --gpu 0
python run_expid.py --config ./config/SimpleX_gowalla_m1 --expid SimpleX_gowalla_m1 --gpu 0
The running logs are also available in each config directory.
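The three flags described above could be handled along the following lines. This is a hypothetical sketch, not the actual contents of `run_expid.py`: the parser setup, the defaults, and the `device_name` helper are assumptions, and the `cuda:N` string simply follows PyTorch's device naming convention.

```python
import argparse


def parse_args(argv=None):
    """Parse the --config, --expid, and --gpu flags described above."""
    parser = argparse.ArgumentParser(description="Run one benchmark experiment.")
    parser.add_argument("--config", type=str, required=True,
                        help="Directory holding the dataset and model config files.")
    parser.add_argument("--expid", type=str, required=True,
                        help="Experiment id defined in the model config file.")
    parser.add_argument("--gpu", type=int, default=-1,
                        help="GPU index to use; -1 selects the CPU.")
    return parser.parse_args(argv)


def device_name(gpu_index):
    """Map a GPU index to a device string ("cpu" for negative indices)."""
    return "cpu" if gpu_index < 0 else f"cuda:{gpu_index}"


args = parse_args([
    "--config", "./config/SimpleX_yelp18_m1",
    "--expid", "SimpleX_yelp18_m1",
    "--gpu", "0",
])
print(args.config, args.expid, device_name(args.gpu))
```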