A comprehensive recommendation library with match, ranking, and multi-task learning models
Table of Contents
- Introduction
- Installation
- Architecture
- 5-Minute Quick Start
- CLI Usage
- Platform Compatibility
- Supported Models
- Contributing
Introduction
NextRec is a modern recommendation framework built on PyTorch, delivering a unified experience for modeling, training, and evaluation. It is designed with rich model implementations, data-processing utilities, and engineering-ready training components. NextRec focuses on large-scale industrial recommendation scenarios on Spark clusters, training on massive offline features (parquet/csv).
Why NextRec
- Unified feature engineering & data pipeline: NextRec provides unified Dense/Sparse/Sequence feature definitions, a DataProcessor, and a batch-optimized RecDataLoader, matching offline feature training/inference in industrial big-data settings.
- Multi-scenario coverage: Ranking (CTR/CVR), retrieval, multi-task learning, and more marketing/rec models, with a continuously expanding model zoo.
- Developer-friendly experience: Stream processing/distributed training/inference for csv/parquet/pathlike data, plus GPU/MPS acceleration and visualization support.
- Efficient training & evaluation: Standardized engine with optimizers, LR schedulers, early stopping, checkpoints, and detailed logging out of the box.
Architecture
NextRec adopts a modular design, enabling full-pipeline reusability and scalability across data processing → model construction → training & evaluation → inference & deployment. Its core components include: a Feature-Spec-driven Embedding architecture, the BaseModel abstraction, a set of independent reusable Layers, a unified DataLoader for both training and inference, and a ready-to-use Model Zoo.
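To make the idea of a Feature-Spec-driven Embedding architecture more concrete, here is a minimal sketch in plain Python. It is not NextRec's implementation: `FeatureSpec` and `build_embedding_tables` are hypothetical names used only for illustration. The sketch assumes that features declaring the same `embedding_name` share one embedding table, so they must agree on the embedding dimension, and the table is sized for the largest vocabulary among them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical feature description; not NextRec's actual class."""
    name: str
    embedding_name: str
    vocab_size: int
    embedding_dim: int


def build_embedding_tables(specs):
    """Map each embedding_name to one (vocab_size, embedding_dim) table shape.

    Features that share an embedding_name share a table, so they must agree
    on embedding_dim; the table is sized for the largest vocabulary.
    """
    tables = {}
    for spec in specs:
        if spec.embedding_name not in tables:
            tables[spec.embedding_name] = (spec.vocab_size, spec.embedding_dim)
            continue
        vocab, dim = tables[spec.embedding_name]
        if dim != spec.embedding_dim:
            raise ValueError(
                f"{spec.name}: embedding_dim {spec.embedding_dim} conflicts "
                f"with {dim} already registered for {spec.embedding_name!r}"
            )
        tables[spec.embedding_name] = (max(vocab, spec.vocab_size), dim)
    return tables


specs = [
    FeatureSpec("item_id", "item_emb", vocab_size=8000, embedding_dim=32),
    FeatureSpec("click_history", "item_emb", vocab_size=7500, embedding_dim=32),
    FeatureSpec("user_id", "user_emb", vocab_size=1000, embedding_dim=32),
]
tables = build_embedding_tables(specs)
print(tables)
```

In this sketch the item ID and the click history resolve to the same table, which is the usual reason for naming embeddings separately from features: a behavior sequence of item IDs can reuse the item embedding instead of learning a second one.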
The project borrows ideas from excellent open-source recommendation libraries such as torch-rechub, which is mature in both architecture and models. The author has contributed a little to that project as well, so feel free to check it out.
Installation
You can install the latest NextRec with pip install nextrec; Python 3.10+ is required. To run the tutorial code, clone this project first:
git clone https://github.com/zerolovesea/NextRec.git
cd NextRec/
pip install nextrec # or pip install -e .
Tutorials
See tutorials/ for examples covering ranking, retrieval, multi-task learning, and data processing:
- movielen_ranking_deepfm.py: DeepFM training on the MovieLens 100k dataset
- example_ranking_din.py: DIN (Deep Interest Network) training on an e-commerce dataset
- example_multitask.py: ESMM multi-task learning training on an e-commerce dataset
- movielen_match_dssm.py: DSSM retrieval model training on the MovieLens 100k dataset
- run_all_ranking_models.py: Quickly validate availability of all ranking models
- run_all_multitask_models.py: Quickly validate availability of all multi-task models
- run_all_match_models.py: Quickly validate availability of all retrieval models
Jupyter notebooks are also available for a deeper look at the NextRec framework.
5-Minute Quick Start
We provide a detailed quick-start guide and paired datasets to help you get familiar with the different features of the NextRec framework. In datasets/ you'll find a test dataset for an e-commerce scenario that looks like this:
| user_id | item_id | dense_0 | dense_1 | dense_2 | dense_3 | dense_4 | dense_5 | dense_6 | dense_7 | sparse_0 | sparse_1 | sparse_2 | sparse_3 | sparse_4 | sparse_5 | sparse_6 | sparse_7 | sparse_8 | sparse_9 | sequence_0 | sequence_1 | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 7817 | 0.14704075 | 0.31020382 | 0.77780896 | 0.944897 | 0.62315375 | 0.57124174 | 0.77009535 | 0.3211029 | 315 | 260 | 379 | 146 | 168 | 161 | 138 | 88 | 5 | 312 | [170,175,97,338,105,353,272,546,175,545,463,128,0,0,0] | [368,414,820,405,548,63,327,0,0,0,0,0,0,0,0] | 0 |
| 1 | 3579 | 0.77811223 | 0.80359334 | 0.5185201 | 0.91091245 | 0.043562356 | 0.82142705 | 0.8803686 | 0.33748195 | 149 | 229 | 442 | 6 | 167 | 252 | 25 | 402 | 7 | 168 | [179,48,61,551,284,165,344,151,0,0,0,0,0,0,0] | [814,0,0,0,0,0,0,0,0,0,0,0,0,0,0] | 1 |
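The sequence columns in the table above are fixed-length ID lists padded with trailing zeros, and a CSV file stores them as text. The following sketch, using only the standard library and a shortened two-row sample made up from the table (the real file has more columns and longer sequences), shows one way to turn such cells back into integer lists and to derive a vocabulary size and the unpadded lengths. The helper name `parse_sequence` is illustrative, and treating 0 as the padding ID is an assumption based on the sample rows.

```python
import ast
import csv
import io

# Two-row sample mirroring the table above (only a few columns kept,
# sequences shortened to five entries for readability).
sample = io.StringIO(
    'user_id,item_id,sequence_0,label\n'
    '1,7817,"[170,175,97,0,0]",0\n'
    '1,3579,"[179,48,0,0,0]",1\n'
)


def parse_sequence(text):
    """Turn the textual list from a CSV cell back into a list of ints."""
    values = ast.literal_eval(text)  # evaluates literals only, unlike eval()
    return [int(v) for v in values]


rows = list(csv.DictReader(sample))
sequences = [parse_sequence(row["sequence_0"]) for row in rows]

# IDs are used directly as embedding row indices, so the table needs
# (largest ID + 1) rows; index 0 is assumed to be the padding slot.
vocab_size = max(max(seq) for seq in sequences) + 1

# Number of real (non-padding) entries in each sequence.
lengths = [sum(1 for v in seq if v != 0) for seq in sequences]

print(sequences[0], vocab_size, lengths)
```

`ast.literal_eval` is used here because it only accepts Python literals, which makes it a safer choice than `eval` for text read from a data file.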
Below is a short example showing how to train a DIN (Deep Interest Network) model. You can also run python tutorials/example_ranking_din.py directly to execute the training and inference code.
After training starts, you can find detailed training logs at nextrec_logs/din_tutorial.
import ast

import pandas as pd
from nextrec.models.ranking.din import DIN
from nextrec.basic.features import DenseFeature, SparseFeature, SequenceFeature

df = pd.read_csv('dataset/ranking_task.csv')
# csv loads lists as text; convert them back to Python lists
for col in df.columns:
    if 'sequence' in col:
        df[col] = df[col].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
# Define feature columns
dense_features = [DenseFeature(name=f'dense_{i}', input_dim=1) for i in range(8)]
sparse_features = [
    SparseFeature(name='user_id', embedding_name='user_emb', vocab_size=int(df['user_id'].max() + 1), embedding_dim=32),
    SparseFeature(name='item_id', embedding_name='item_emb', vocab_size=int(df['item_id'].max() + 1), embedding_dim=32),
]
sparse_features.extend([
    SparseFeature(name=f'sparse_{i}', embedding_name=f'sparse_{i}_emb', vocab_size=int(df[f'sparse_{i}'].max() + 1), embedding_dim=32)
    for i in range(10)
])
sequence_features = [
    SequenceFeature(name='sequence_0', vocab_size=int(df['sequence_0'].apply(lambda x: max(x)).max() + 1), embedding_dim=32, padding_idx=0, embedding_name='item_emb'),
    SequenceFeature(name='sequence_1', vocab_size=int(df['sequence_1'].apply(lambda x: max(x)).max() + 1), embedding_dim=16, padding_idx=0, embedding_name='sparse_0_emb'),
]
mlp_params = {
    "dims": [256, 128, 64],
    "activation": "relu",
    "dropout": 0.3,
}
model = DIN(
    dense_features=dense_features,
    sparse_features=sparse_features,
    sequence_features=sequence_features,
    mlp_params=mlp_params,
    attention_hidden_units=[80, 40],
    attention_activation='sigmoid',
    attention_use_softmax=True,
    target=['label'],  # target variable
    device='mps',
    embedding_l1_reg=1e-6,
    embedding_l2_reg=1e-5,
    dense_l1_reg=1e-5,
    dense_l2_reg=1e-4,
    session_id="din_tutorial",  # experiment id for logs
)
# Compile model with optimizer and loss
model.compile(
    optimizer="adam",
    optimizer_params={"lr": 1e-3, "weight_decay": 1e-5},
    loss="focal",
    loss_params={"gamma": 2.0, "alpha": 0.25},
)
model.fit(
    train_data=df,
    metrics=['auc', 'gauc', 'logloss'],  # metrics to track
    epochs=3,
    batch_size=512,
    shuffle=True,
    user_id_column='user_id',  # used for GAUC
)
# Evaluate after training
metrics = model.evaluate(
    df,
    metrics=['auc', 'gauc', 'logloss'],
    batch_size=512,
    user_id_column='user_id',
)
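The example above passes user_id_column because GAUC (group AUC) is computed per user and then averaged. As a rough illustration of the metric, here is a minimal pure-Python sketch. It assumes the common definition in which each user's AUC is weighted by that user's number of samples and users with only one label class are skipped; NextRec's own implementation may weight or filter differently. The function names `auc` and `gauc` and the toy data are made up for this example.

```python
from collections import defaultdict


def auc(labels, scores):
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when only one class is present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gauc(user_ids, labels, scores):
    """Sample-count-weighted mean of per-user AUC, skipping single-class users."""
    groups = defaultdict(lambda: ([], []))
    for u, y, s in zip(user_ids, labels, scores):
        groups[u][0].append(y)
        groups[u][1].append(s)
    weighted_sum, total_weight = 0.0, 0
    for ys, ss in groups.values():
        user_auc = auc(ys, ss)
        if user_auc is None:
            continue
        weighted_sum += len(ys) * user_auc
        total_weight += len(ys)
    return weighted_sum / total_weight if total_weight else None


user_ids = [1, 1, 1, 2, 2, 3]
labels = [1, 0, 0, 1, 0, 1]
scores = [0.9, 0.3, 0.95, 0.2, 0.6, 0.7]
result = gauc(user_ids, labels, scores)
print(result)
```

In the toy data, user 1 has an AUC of 0.5 over three samples, user 2 has an AUC of 0 over two samples, and user 3 is skipped because all of that user's labels are positive, so the weighted average is (3 × 0.5 + 2 × 0) / 5 = 0.3. The pairwise loop is quadratic per user and is meant for clarity, not for large datasets.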
CLI Usage
NextRec provides a powerful command-line interface for model training and prediction using YAML configuration files. For detailed CLI documentation, see:
- NextRec CLI User Guide - Complete guide for using the CLI
- NextRec CLI Configuration Examples - CLI configuration file examples
# Train a model
nextrec --mode=train --train_config=path/to/train_config.yaml
# Run prediction
nextrec --mode=predict --predict_config=path/to/predict_config.yaml
As of version 0.4.4, NextRec CLI supports single-machine training; distributed training features are currently under development.
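If you want to launch the CLI from a Python script (for example from a scheduler job), a thin wrapper around the two commands shown above might look like the sketch below. It only uses the flags that appear in this section (--mode, --train_config, --predict_config); the helper names `build_nextrec_command` and `run_nextrec` are hypothetical and not part of NextRec.

```python
import subprocess
from pathlib import Path


def build_nextrec_command(mode, config_path):
    """Build the argv list for the two invocations shown above."""
    flag = {"train": "--train_config", "predict": "--predict_config"}
    if mode not in flag:
        raise ValueError(f"unsupported mode: {mode!r}")
    return ["nextrec", f"--mode={mode}", f"{flag[mode]}={config_path}"]


def run_nextrec(mode, config_path):
    """Run the CLI; raise if the config file is missing or the command fails."""
    if not Path(config_path).is_file():
        raise FileNotFoundError(config_path)
    subprocess.run(build_nextrec_command(mode, config_path), check=True)


command = build_nextrec_command("train", "path/to/train_config.yaml")
print(command)
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` makes a non-zero exit status surface as an exception instead of being silently ignored.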
Platform Compatibility
The current version is 0.4.4. All models and test code have been validated on the following platforms. If you encounter compatibility issues, please report them in the issue tracker with your system version:
| Platform | Configuration |
|---|---|
| macOS latest | MacBook Pro M4 Pro 24GB RAM |
| Ubuntu latest | AutoDL 4070D Dual GPU |
| CentOS 7 | Intel Xeon 5138Y 96 cores 377GB RAM |
Supported Models
Ranking Models
| Model | Paper | Year | Status |
|---|---|---|---|
| FM | Factorization Machines | ICDM 2010 | Supported |
| AFM | Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks | IJCAI 2017 | Supported |
| DeepFM | DeepFM: A Factorization-Machine based Neural Network for CTR Prediction | IJCAI 2017 | Supported |
| Wide&Deep | Wide & Deep Learning for Recommender Systems | DLRS 2016 | Supported |
| xDeepFM | xDeepFM: Combining Explicit and Implicit Feature Interactions | KDD 2018 | Supported |
| FiBiNET | FiBiNET: Combining Feature Importance and Bilinear Feature Interaction for CTR Prediction | RecSys 2019 | Supported |
| PNN | Product-based Neural Networks for User Response Prediction | ICDM 2016 | Supported |
| AutoInt | AutoInt: Automatic Feature Interaction Learning | CIKM 2019 | Supported |
| DCN | Deep & Cross Network for Ad Click Predictions | ADKDD 2017 | Supported |
| DCN v2 | DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems | WWW 2021 | In Progress |
| DIN | Deep Interest Network for CTR Prediction | KDD 2018 | Supported |
| DIEN | Deep Interest Evolution Network | AAAI 2019 | Supported |
| MaskNet | MaskNet: Introducing Feature-Wise Multiplication to CTR Ranking Models by Instance-Guided Mask | 2021 | Supported |
Retrieval Models
| Model | Paper | Year | Status |
|---|---|---|---|
| DSSM | Learning Deep Structured Semantic Models | CIKM 2013 | Supported |
| DSSM v2 | DSSM with pairwise BPR-style optimization | - | Supported |
| YouTube DNN | Deep Neural Networks for YouTube Recommendations | RecSys 2016 | Supported |
| MIND | Multi-Interest Network with Dynamic Routing | CIKM 2019 | Supported |
| SDM | Sequential Deep Matching Model | CIKM 2019 | Supported |
Multi-task Models
| Model | Paper | Year | Status |
|---|---|---|---|
| MMOE | Modeling Task Relationships in Multi-task Learning | KDD 2018 | Supported |
| PLE | Progressive Layered Extraction | RecSys 2020 | Supported |
| ESMM | Entire Space Multi-task Model | SIGIR 2018 | Supported |
| ShareBottom | Multitask Learning | - | Supported |
| POSO | POSO: Personalized Cold-start Modules for Large-scale Recommender Systems | 2021 | Supported |
Generative Models
| Model | Paper | Year | Status |
|---|---|---|---|
| TIGER | Recommender Systems with Generative Retrieval | NeurIPS 2023 | In Progress |
| HSTU | Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations | ICML 2024 | Supported |
Contributing
We welcome contributions of any form!
How to Contribute
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add AmazingFeature')
- Push your branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Before submitting a PR, please run python test/run_tests.py and python scripts/format_code.py to ensure all tests pass and code style is consistent.
Code Style
- Follow PEP8
- Provide unit tests for new functionality
- Update documentation accordingly
Reporting Issues
When submitting issues on GitHub, please include:
- Description of the problem
- Reproduction steps
- Expected behavior
- Actual behavior
- Environment info (Python version, PyTorch version, etc.)
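To gather the environment details for an issue report, a small script along these lines can help. This is a convenience sketch, not a tool shipped with NextRec; the function names and the default package list are assumptions, and you can adjust the list to whatever your setup uses.

```python
import platform
import sys
from importlib import metadata


def package_version(name):
    """Installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("nextrec", "torch", "pandas")):
    """Collect the details an issue report usually needs, one per line."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    lines += [f"{name}: {package_version(name)}" for name in packages]
    return "\n".join(lines)


print(environment_report())
```

Paste the printed lines into the issue together with the reproduction steps.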
License
This project is licensed under the Apache 2.0 License.
Contact
- GitHub Issues: Submit an issue
- Email: zyaztec@gmail.com
Acknowledgements
NextRec is inspired by the following great open-source projects:
- torch-rechub — Flexible, easy-to-extend recommendation framework
- FuxiCTR — Configurable, tunable, and reproducible CTR library
- RecBole — Unified, comprehensive, and efficient recommendation library
Special thanks to all open-source contributors!