
A comprehensive recommendation library with match, ranking, and multi-task learning models


A Unified, Efficient, and Scalable Recommendation System Framework


Introduction

NextRec is a modern recommendation framework built on PyTorch that delivers a unified experience for modeling, training, and evaluation. It is designed with rich model implementations, data-processing utilities, and engineering-ready training components. NextRec focuses on large-scale industrial recommendation scenarios on Spark clusters, training on massive offline features (parquet/csv).

Why NextRec

  • Unified feature engineering & data pipeline: NextRec provides unified Dense/Sparse/Sequence feature definitions, a DataProcessor, and a batch-optimized RecDataLoader, matching offline feature training/inference in industrial big-data settings.
  • Multi-scenario coverage: Ranking (CTR/CVR), retrieval, multi-task learning, and more marketing/rec models, with a continuously expanding model zoo.
  • Developer-friendly experience: Stream processing/distributed training/inference for csv/parquet/pathlike data, plus GPU/MPS acceleration and visualization support.
  • Efficient training & evaluation: Standardized engine with optimizers, LR schedulers, early stopping, checkpoints, and detailed logging out of the box.

Architecture

NextRec adopts a modular design, enabling full-pipeline reusability and scalability across data processing → model construction → training & evaluation → inference & deployment. Its core components include: a Feature-Spec-driven Embedding architecture, the BaseModel abstraction, a set of independent reusable Layers, a unified DataLoader for both training and inference, and a ready-to-use Model Zoo.

NextRec Architecture
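
To make the idea of a feature-spec-driven embedding layer concrete, here is a minimal standard-library sketch. It assumes that features declaring the same embedding_name share one embedding table, and that the table is sized for the largest vocabulary among them. FeatureSpec and plan_embedding_tables are hypothetical names used for illustration only; they are not NextRec classes or functions, and the library's real behavior may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical stand-in for a feature definition (not a NextRec class)."""
    name: str
    embedding_name: str
    vocab_size: int


def plan_embedding_tables(specs):
    """Map each embedding_name to the vocab size its shared table must cover."""
    tables = {}
    for spec in specs:
        current = tables.get(spec.embedding_name, 0)
        tables[spec.embedding_name] = max(current, spec.vocab_size)
    return tables


specs = [
    FeatureSpec("item_id", "item_emb", vocab_size=8000),
    FeatureSpec("sequence_0", "item_emb", vocab_size=7500),  # shares item_emb
    FeatureSpec("user_id", "user_emb", vocab_size=1200),
]
tables = plan_embedding_tables(specs)
print(tables)  # {'item_emb': 8000, 'user_emb': 1200}
```

Driving table creation from the specs means a behavior sequence and the item ID it refers to can reuse the same vectors without any extra wiring in the model code.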

The project borrows ideas from excellent open-source recommendation libraries such as torch-rechub, which is mature in both architecture and model coverage. The author has also contributed to torch-rechub; feel free to check it out.


Installation

You can install the latest NextRec with pip install nextrec; Python 3.10+ is required. To run the tutorial code, clone this project first:

git clone https://github.com/zerolovesea/NextRec.git
cd NextRec/
pip install nextrec # or pip install -e .

Tutorials

See tutorials/ for examples covering ranking, retrieval, multi-task learning, and data processing.

Jupyter notebooks that dig deeper into the details of the NextRec framework are also available.

5-Minute Quick Start

We provide a detailed quick-start guide and paired datasets to help you get familiar with the different features of the NextRec framework. In datasets/ you'll find a test dataset for an e-commerce scenario that looks like this:

user_id item_id dense_0 dense_1 dense_2 dense_3 dense_4 dense_5 dense_6 dense_7 sparse_0 sparse_1 sparse_2 sparse_3 sparse_4 sparse_5 sparse_6 sparse_7 sparse_8 sparse_9 sequence_0 sequence_1 label
1 7817 0.14704075 0.31020382 0.77780896 0.944897 0.62315375 0.57124174 0.77009535 0.3211029 315 260 379 146 168 161 138 88 5 312 [170,175,97,338,105,353,272,546,175,545,463,128,0,0,0] [368,414,820,405,548,63,327,0,0,0,0,0,0,0,0] 0
1 3579 0.77811223 0.80359334 0.5185201 0.91091245 0.043562356 0.82142705 0.8803686 0.33748195 149 229 442 6 167 252 25 402 7 168 [179,48,61,551,284,165,344,151,0,0,0,0,0,0,0] [814,0,0,0,0,0,0,0,0,0,0,0,0,0,0] 1
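
The sequence columns in the sample rows are stored as bracketed, zero-padded ID lists, so they arrive as plain strings when read from CSV. The sketch below uses only the standard library to show one way to parse such a cell, count the items before the trailing padding, and derive a vocabulary size as the largest ID plus one. The helper names parse_sequence and real_length are illustrative assumptions, not part of NextRec.

```python
import ast


def parse_sequence(cell):
    """Turn a CSV cell such as "[3,7,0,0]" into a list of ints."""
    values = ast.literal_eval(cell)  # safe literal parsing, unlike eval()
    return [int(v) for v in values]


def real_length(sequence, padding_idx=0):
    """Count the items that precede the trailing padding."""
    length = len(sequence)
    while length > 0 and sequence[length - 1] == padding_idx:
        length -= 1
    return length


# sequence_1 of the second sample row shown above
cell = "[814,0,0,0,0,0,0,0,0,0,0,0,0,0,0]"
seq = parse_sequence(cell)
print(real_length(seq), max(seq) + 1)  # 1 815
```

Using ast.literal_eval rather than eval means that a malformed or malicious cell raises an error instead of executing code.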

Below is a short example showing how to train a DIN (Deep Interest Network) model. You can also run python tutorials/example_ranking_din.py directly to execute the training and inference code.

After training starts, you can find detailed training logs at nextrec_logs/din_tutorial.

import ast

import pandas as pd

from nextrec.models.ranking.din import DIN
from nextrec.basic.features import DenseFeature, SparseFeature, SequenceFeature

df = pd.read_csv('dataset/ranking_task.csv')

# CSV stores lists as text; parse the sequence columns back into Python lists
for col in df.columns:
    if 'sequence' in col:
        df[col] = df[col].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)

# Define feature columns
dense_features = [DenseFeature(name=f'dense_{i}', input_dim=1) for i in range(8)]

sparse_features = [
    SparseFeature(name='user_id', embedding_name='user_emb', vocab_size=int(df['user_id'].max() + 1), embedding_dim=32),
    SparseFeature(name='item_id', embedding_name='item_emb', vocab_size=int(df['item_id'].max() + 1), embedding_dim=32),
]

sparse_features.extend([
    SparseFeature(name=f'sparse_{i}', embedding_name=f'sparse_{i}_emb', vocab_size=int(df[f'sparse_{i}'].max() + 1), embedding_dim=32)
    for i in range(10)
])

sequence_features = [
    SequenceFeature(name='sequence_0', vocab_size=int(df['sequence_0'].apply(lambda x: max(x)).max() + 1), embedding_dim=32, padding_idx=0, embedding_name='item_emb'),
    SequenceFeature(name='sequence_1', vocab_size=int(df['sequence_1'].apply(lambda x: max(x)).max() + 1), embedding_dim=16, padding_idx=0, embedding_name='sparse_0_emb'),
]

mlp_params = {
    "dims": [256, 128, 64],
    "activation": "relu",
    "dropout": 0.3,
}

model = DIN(
    dense_features=dense_features,
    sparse_features=sparse_features,
    sequence_features=sequence_features,
    mlp_params=mlp_params,
    attention_hidden_units=[80, 40],
    attention_activation='sigmoid',
    attention_use_softmax=True,
    target=['label'],                                     # target variable
    device='mps',                                         # Apple Silicon GPU; pick another device (e.g. 'cpu') on other hardware
    embedding_l1_reg=1e-6,
    embedding_l2_reg=1e-5,
    dense_l1_reg=1e-5,
    dense_l2_reg=1e-4,
    session_id="din_tutorial",                            # experiment id for logs
)

# Compile model with optimizer and loss
model.compile(
    optimizer="adam",
    optimizer_params={"lr": 1e-3, "weight_decay": 1e-5},
    loss="focal",
    loss_params={"gamma": 2.0, "alpha": 0.25},
)

model.fit(
    train_data=df,
    metrics=['auc', 'gauc', 'logloss'],  # metrics to track
    epochs=3,
    batch_size=512,
    shuffle=True,
    user_id_column='user_id'             # used for GAUC
)

# Evaluate after training
metrics = model.evaluate(
    df,
    metrics=['auc', 'gauc', 'logloss'],
    batch_size=512,
    user_id_column='user_id'
)
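
The gauc metric above is paired with user_id_column because group AUC is computed per user and then averaged. The following is a minimal sketch of that idea, assuming the common definition: an impression-weighted mean of per-user AUC that skips users whose samples all share one label. The functions auc and gauc are written here for illustration and are not NextRec's API; the library's implementation may weight users or handle ties differently.

```python
from collections import defaultdict


def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when only one class is present
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gauc(user_ids, labels, scores):
    """Impression-weighted mean of per-user AUC, skipping single-class users."""
    groups = defaultdict(lambda: ([], []))
    for u, y, s in zip(user_ids, labels, scores):
        groups[u][0].append(y)
        groups[u][1].append(s)
    weighted_sum, total_weight = 0.0, 0
    for ys, ss in groups.values():
        user_auc = auc(ys, ss)
        if user_auc is None:
            continue
        weighted_sum += len(ys) * user_auc
        total_weight += len(ys)
    return weighted_sum / total_weight if total_weight else None


user_ids = [1, 1, 1, 2, 2, 3]
labels = [1, 0, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.95, 0.3, 0.6, 0.5]
result = gauc(user_ids, labels, scores)
print(result)
```

In this toy input, user 1 has a per-user AUC of 0.5 over three impressions, user 2 has 0.0 over two, and user 3 is skipped for having only positive samples, so the weighted result is (3 × 0.5 + 2 × 0.0) / 5 = 0.3.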

CLI Usage

NextRec provides a powerful command-line interface for model training and prediction using YAML configuration files. For detailed CLI documentation, see:

# Train a model
nextrec --mode=train --train_config=path/to/train_config.yaml

# Run prediction
nextrec --mode=predict --predict_config=path/to/predict_config.yaml

As of version 0.4.5, NextRec CLI supports single-machine training; distributed training features are currently under development.
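
If you want to launch these commands from a Python script (for example, from a scheduler), a small wrapper can assemble the argument list. This sketch relies only on the --mode, --train_config, and --predict_config flags shown above; build_nextrec_command is a hypothetical helper, not something shipped with NextRec.

```python
import shlex


def build_nextrec_command(mode, config_path):
    """Build the argv list for the two CLI modes shown above."""
    flag_by_mode = {"train": "--train_config", "predict": "--predict_config"}
    if mode not in flag_by_mode:
        raise ValueError(f"unsupported mode: {mode!r}")
    return ["nextrec", f"--mode={mode}", f"{flag_by_mode[mode]}={config_path}"]


cmd = build_nextrec_command("train", "path/to/train_config.yaml")
print(shlex.join(cmd))
# To actually launch it (requires nextrec to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell-quoting problems when the config path contains spaces.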

Platform Compatibility

The current version is 0.4.5. All models and test code have been validated on the following platforms. If you encounter compatibility issues, please report them in the issue tracker with your system version:

Platform | Configuration
macOS latest | MacBook Pro M4 Pro 24GB RAM
Ubuntu latest | AutoDL 4070D Dual GPU
CentOS 7 | Intel Xeon 5138Y 96 cores 377GB RAM

Supported Models

Ranking Models

Model | Paper | Venue / Year | Status
FM | Factorization Machines | ICDM 2010 | Supported
AFM | Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks | IJCAI 2017 | Supported
DeepFM | DeepFM: A Factorization-Machine based Neural Network for CTR Prediction | IJCAI 2017 | Supported
Wide&Deep | Wide & Deep Learning for Recommender Systems | DLRS 2016 | Supported
xDeepFM | xDeepFM: Combining Explicit and Implicit Feature Interactions | KDD 2018 | Supported
FiBiNET | FiBiNET: Combining Feature Importance and Bilinear Feature Interaction for CTR Prediction | RecSys 2019 | Supported
PNN | Product-based Neural Networks for User Response Prediction | ICDM 2016 | Supported
AutoInt | AutoInt: Automatic Feature Interaction Learning | CIKM 2019 | Supported
DCN | Deep & Cross Network for Ad Click Predictions | ADKDD 2017 | Supported
DCN v2 | DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems | WWW 2021 | In Progress
DIN | Deep Interest Network for CTR Prediction | KDD 2018 | Supported
DIEN | Deep Interest Evolution Network | AAAI 2019 | Supported
MaskNet | MaskNet: Feature-wise Gating Blocks for High-dimensional Sparse Recommendation Data | 2020 | Supported

Retrieval Models

Model | Paper | Venue / Year | Status
DSSM | Learning Deep Structured Semantic Models | CIKM 2013 | Supported
DSSM v2 | DSSM with pairwise BPR-style optimization | - | Supported
YouTube DNN | Deep Neural Networks for YouTube Recommendations | RecSys 2016 | Supported
MIND | Multi-Interest Network with Dynamic Routing | CIKM 2019 | Supported
SDM | Sequential Deep Matching Model | - | Supported

Multi-task Models

Model | Paper | Venue / Year | Status
MMOE | Modeling Task Relationships in Multi-task Learning | KDD 2018 | Supported
PLE | Progressive Layered Extraction | RecSys 2020 | Supported
ESMM | Entire Space Multi-task Model | SIGIR 2018 | Supported
ShareBottom | Multitask Learning | - | Supported
POSO | POSO: Personalized Cold-start Modules for Large-scale Recommender Systems | 2021 | Supported

Generative Models

Model | Paper | Venue / Year | Status
TIGER | Recommender Systems with Generative Retrieval | NeurIPS 2023 | In Progress
HSTU | Hierarchical Sequential Transduction Units | - | Supported

Contributing

We welcome contributions of any form!

How to Contribute

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add AmazingFeature')
  4. Push your branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Before submitting a PR, please run python test/run_tests.py and python scripts/format_code.py to ensure all tests pass and code style is consistent.

Code Style

  • Follow PEP8
  • Provide unit tests for new functionality
  • Update documentation accordingly

Reporting Issues

When submitting issues on GitHub, please include:

  • Description of the problem
  • Reproduction steps
  • Expected behavior
  • Actual behavior
  • Environment info (Python version, PyTorch version, etc.)
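
To gather the environment details for an issue report, a short script along these lines may help. It is a sketch using only the Python standard library; the helper names package_version and environment_report are illustrative, and the package names torch and nextrec are assumed to be the distribution names installed by pip.

```python
import platform
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_report():
    """Collect the version details that an issue report usually needs."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "torch": package_version("torch"),
        "nextrec": package_version("nextrec"),
    }


for key, value in environment_report().items():
    print(f"{key}: {value or 'not installed'}")
```

Paste the printed lines into the issue alongside the reproduction steps.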

License

This project is licensed under the Apache 2.0 License.


Contact


Acknowledgements

NextRec is inspired by the following great open-source projects:

  • torch-rechub — Flexible, easy-to-extend recommendation framework
  • FuxiCTR — Configurable, tunable, and reproducible CTR library
  • RecBole — Unified, comprehensive, and efficient recommendation library

Special thanks to all open-source contributors!

