A configurable, tunable, and reproducible library for CTR prediction
FuxiCTR
This repository contains the latest development version of the official release at huawei-noah/benchmark/FuxiCTR.
Click-through rate (CTR) prediction is a critical task for many industrial applications such as online advertising, recommender systems, and sponsored search. FuxiCTR provides an open-source library for CTR prediction, with a strong focus on configurability, tunability, and reproducibility. It also supports building the BARS-CTR-Benchmark, which aims at open benchmarking for CTR prediction.
Model List
Dependency
FuxiCTR requires the following dependencies. While the implementation should support a wider range of PyTorch versions, we have currently tested it only on PyTorch v1.0~1.1.
- python 3.6
- pytorch v1.0/v1.1
- pyyaml >=5.1
- scikit-learn
- pandas
- numpy
- h5py
- tqdm
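Since the package is distributed on PyPI, the library itself can presumably be installed with pip; PyTorch and the other dependencies above still need to be installed separately at the listed versions.
pip install fuxictr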
Get Started
1. Run the demo
Please follow the examples in the demo directory to get started. The code workflow is structured as follows:
# Set the data config and model config
feature_cols = [{...}] # define feature columns
label_col = {...} # define label column
params = {...} # set data params and model params
# Set the feature encoding specs
feature_encoder = FeatureEncoder(feature_cols, label_col, ...) # define the feature encoder
feature_encoder.fit(...) # fit and transform the data
# Load data generators
train_gen, valid_gen, test_gen = data_generator(feature_encoder, ...)
# Define a model
model = DeepFM(...)
# Train the model
model.fit_generator(train_gen, validation_data=valid_gen, ...)
# Evaluation
model.evaluate_generator(test_gen)
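For concreteness, the placeholder configs above might be filled in roughly as follows for a small CSV dataset. The column names, file paths, and parameter keys here are illustrative assumptions, not the demo's exact settings; please check the scripts in the demo directory for the authoritative format.
# Illustrative sketch only -- field names, paths, and keys below are assumptions
feature_cols = [
    {"name": ["user_id", "item_id", "category"],   # hypothetical categorical fields
     "active": True, "dtype": "str", "type": "categorical"}
]
label_col = {"name": "click", "dtype": float}      # hypothetical binary label column
params = {
    "model_id": "DeepFM_demo",                     # assumed identifiers and paths
    "dataset_id": "tiny_csv",
    "train_data": "../data/tiny_csv/train.csv",
    "valid_data": "../data/tiny_csv/valid.csv",
    "test_data": "../data/tiny_csv/test.csv",
    "data_format": "csv",
    "embedding_dim": 10,
    "batch_size": 64,
    "epochs": 1,
    "learning_rate": 1e-3,
    "optimizer": "adam",
    "loss": "binary_crossentropy",
    "metrics": ["logloss", "AUC"],
}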
2. Run the benchmark with a given experiment_id in the config file
To reproduce an experiment result, run the benchmarking script with the corresponding config file as follows.
- --config: The directory containing the data and model config files.
- --expid: The specific experiment_id that denotes the detailed data and model settings.
- --gpu: The GPU index used for the experiment; use -1 for CPU.
In the following example, we create a demo model_config.yaml and dataset_config.yaml in benchmarks/expid_config, and set the experiment_id to FM_test.
cd benchmarks
python run_expid.py --config ./expid_config --expid FM_test --gpu 0
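For orientation, the two demo config files might look roughly like the sketch below. The dataset_id (tiny_data), file paths, and most keys are illustrative assumptions; please consult the configs shipped in benchmarks/expid_config for the exact schema.
# dataset_config.yaml -- illustrative sketch; dataset_id, paths, and keys are assumptions
tiny_data:
    data_root: ../data/
    data_format: csv
    train_data: ../data/tiny_data/train.csv
    valid_data: ../data/tiny_data/valid.csv
    test_data: ../data/tiny_data/test.csv
    feature_cols:
        - {name: [user_id, item_id, category], active: True, dtype: str, type: categorical}
    label_col: {name: click, dtype: float}

# model_config.yaml -- illustrative sketch; hyper-parameter keys are assumptions
FM_test:
    model: FM
    dataset_id: tiny_data
    task: binary_classification
    loss: binary_crossentropy
    metrics: [logloss, AUC]
    optimizer: adam
    learning_rate: 1.e-3
    embedding_dim: 4
    batch_size: 128
    epochs: 1
    seed: 2019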
3. Tune the model hyper-parameters
To tune model hyper-parameters, you can apply grid search over the specified tuning space with the following script.
- --config: The config file that defines the tuning space.
- --tag: (optional) Specify the tag to determine which expid to run (e.g., 001 for the first expid). This is useful for rerunning one specific experiment_id that contains the tag.
- --gpu: The available GPUs for parameter tuning; multiple GPUs can be used (e.g., --gpu 0 1 for two GPUs).
In the following example, we use the hyper-parameters of FM_test in benchmarks/expid_config as the base setting, and create a tuner config file FM_tuner_config.yaml in benchmarks/tuner_config, which defines the tuning space for parameter tuning. In particular, if a key in tuner_space has its values stored in a list, those values will be grid-searched; otherwise, the default value in FM_test will be applied. After the tuning finishes, all search results can be accessed from FM_tuner_config.csv in the ./benchmarks folder.
cd benchmarks
python run_param_tuner.py --config ./tuner_config/FM_tuner_config.yaml --gpu 0 1
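To make the grid-search rule above concrete, a minimal tuner config might look roughly like the following sketch; the key names and values are illustrative assumptions, so check the configs in benchmarks/tuner_config for the exact schema.
# FM_tuner_config.yaml -- illustrative sketch; key names are assumptions
base_config: ./expid_config/       # assumed key pointing to the base FM_test expid
base_expid: FM_test
tuner_space:
    embedding_dim: [16, 32, 64]    # a list of values -> grid-searched
    learning_rate: [1.e-3, 5.e-4]  # a list of values -> grid-searched
    batch_size: 128                # not a list -> the default value from FM_test applies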
For more examples, please refer to the benchmarking results in BARS-CTR-Benchmark.
Code Structure
See the overview of the code structure for more details on the API design.
Discussion
You are welcome to join our WeChat group for questions and discussions.
Join Us
We have open positions for internships and full-time jobs. If you are interested in research and practice in recommender systems, please send your CV to jamie.zhu@huawei.com.