
SOTA Recommender Systems Library

A modern, production-ready Python library for building state-of-the-art recommender systems. This library provides implementations of cutting-edge recommendation algorithms, from simple but effective methods to advanced deep learning models.

Requires Python 3.8 or later. Licensed under the MIT License.

Features

✨ SOTA Algorithms

  • Simple but Effective

    • 🚀 EASE - Embarrassingly Shallow Autoencoders (closed-form solution, incredibly fast)
    • 📊 SLIM - Sparse Linear Methods with L1/L2 regularization
  • Matrix Factorization

    • 📐 SVD - Singular Value Decomposition
    • SVD++ - SVD with implicit feedback
    • 🔄 ALS - Alternating Least Squares for implicit feedback
  • Deep Learning (requires PyTorch)

    • 🧠 NCF - Neural Collaborative Filtering (GMF + MLP)
    • 🔗 LightGCN - Graph Neural Network for recommendations ✅
    • 📝 SASRec - Self-Attentive Sequential Recommendations ✅

🛠️ Production-Ready Features

  • Comprehensive Evaluation Metrics: Precision@K, Recall@K, NDCG@K, MAP@K, MRR, Hit Rate, Coverage, Diversity
  • Data Processing: Built-in dataset loaders (MovieLens, Amazon, etc.), negative sampling, preprocessing
  • Flexible Architecture: Unified API for all models, easy to extend
  • Performance: Optimized for both speed and accuracy

Installation

Basic Installation

Install from PyPI:

pip install sota-recommender

Or from a local checkout of the repository:

pip install .

With Deep Learning Support

The deep learning models (NCF, LightGCN, SASRec) additionally require PyTorch:

pip install torch

Quick Start

from recommender import (
    EASERecommender,
    load_movielens,
    InteractionDataset,
    Evaluator
)

# Load data
df = load_movielens(size='100k')

# Create dataset
dataset = InteractionDataset(df, implicit=True)
train, test = dataset.split(test_size=0.2)

# Train model
model = EASERecommender(l2_reg=500.0)
model.fit(train.data)

# Generate recommendations
user_ids = [1, 2, 3]
recommendations = model.recommend(user_ids, k=10)

# Evaluate
evaluator = Evaluator(metrics=['precision', 'recall', 'ndcg'])
results = evaluator.evaluate(model, test, task='ranking', train_data=train)
evaluator.print_results(results)

Usage Examples

1. EASE - Fast and Effective

EASE is perfect for large-scale implicit feedback datasets. It has a closed-form solution, making it extremely fast.

from recommender import EASERecommender, load_movielens, InteractionDataset

# Load MovieLens data
df = load_movielens(size='1m')
dataset = InteractionDataset(df, implicit=True, min_user_interactions=5)

# Train/test split
train, test = dataset.split(test_size=0.2, strategy='random')

# Train EASE
model = EASERecommender(l2_reg=500.0)
model.fit(train.data)

# Get recommendations
recommendations = model.recommend([1, 2, 3], k=10, exclude_seen=True)
print(recommendations)

# Save model
model.save('ease_model.pkl')
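The closed-form solution behind EASE's speed fits in a few lines. The sketch below is a minimal dense-NumPy illustration of the math (B = P / -diag(P) with a zero diagonal, where P is the inverse of the regularized Gram matrix), not the library's implementation:

```python
import numpy as np

def ease_weights(X, l2_reg=500.0):
    """Closed-form EASE item-item weights (Steck, WWW '19)."""
    G = X.T @ X + l2_reg * np.eye(X.shape[1])  # regularized Gram matrix
    P = np.linalg.inv(G)
    B = P / (-np.diag(P))                      # scale column j by -1 / P[j, j]
    np.fill_diagonal(B, 0.0)                   # items never recommend themselves
    return B

# Scoring all items for all users is a single matrix product.
X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 1]], dtype=float)     # toy user-item matrix
B = ease_weights(X, l2_reg=1.0)
scores = X @ B
```

Because the only expensive step is one matrix inversion over the item-item Gram matrix, training cost depends on the number of items rather than the number of interactions.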

2. SLIM - Sparse Item-Item Model

SLIM learns a sparse item-item similarity matrix, providing interpretable recommendations.

from recommender import SLIMRecommender

# Train SLIM
model = SLIMRecommender(
    l1_reg=0.1,      # L1 regularization for sparsity
    l2_reg=0.1,      # L2 regularization
    max_iter=100,
    positive_only=True
)
model.fit(train.data)

# Get similar items
similar_items = model.get_similar_items(item_id=123, k=10)
print(f"Items similar to 123: {similar_items}")
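Under the hood, SLIM solves one sparsity-constrained regression per item column. The sketch below uses scikit-learn's ElasticNet on a dense toy matrix to illustrate the idea; the mapping from l1_reg/l2_reg to sklearn's alpha/l1_ratio is approximate (sklearn scales its penalties differently), and this is not the library's implementation:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def slim_weights(X, l1_reg=0.1, l2_reg=0.1, max_iter=100):
    """Learn a sparse, non-negative item-item matrix W, one column per item."""
    n_items = X.shape[1]
    alpha = l1_reg + l2_reg            # combined penalty strength (approximate mapping)
    l1_ratio = l1_reg / alpha          # fraction of the penalty that is L1
    W = np.zeros((n_items, n_items))
    for j in range(n_items):
        enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, positive=True,
                          fit_intercept=False, max_iter=max_iter)
        Xj = X.copy()
        Xj[:, j] = 0.0                 # mask the target column so w_jj = 0
        enet.fit(Xj, X[:, j])
        W[:, j] = enet.coef_
    return W

X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 1],
              [1, 1, 1, 0]], dtype=float)
W = slim_weights(X)
```

The non-zero entries of each column are directly interpretable as "users who interacted with these items also interacted with item j", which is where SLIM's interpretability comes from.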

3. SVD++ - Matrix Factorization with Implicit Feedback

SVD++ incorporates implicit feedback for better predictions on explicit ratings.

from recommender import SVDPlusPlusRecommender

# Load explicit ratings
df = load_movielens(size='100k')  # Contains ratings 1-5
dataset = InteractionDataset(df, implicit=False)
train, test = dataset.split(test_size=0.2)

# Train SVD++
model = SVDPlusPlusRecommender(
    n_factors=20,
    n_epochs=20,
    lr=0.005,
    reg=0.02
)
model.fit(train.data)

# Predict ratings
user_ids = [1, 1, 2]
item_ids = [10, 20, 30]
predictions = model.predict(user_ids, item_ids)
print(f"Predicted ratings: {predictions}")
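The prediction rule SVD++ learns can be written out directly: the rating estimate is the global mean plus user and item biases, with the user factor augmented by a normalized sum of implicit-feedback item vectors. The sketch below illustrates that formula with randomly initialized factors (all names and shapes here are illustrative, not the library's internals):

```python
import numpy as np

def svdpp_predict(mu, b_u, b_i, p_u, q_i, Y_Nu):
    """SVD++ estimate: mu + b_u + b_i + q_i . (p_u + |N(u)|^-0.5 * sum_j y_j)."""
    implicit = Y_Nu.sum(axis=0) / np.sqrt(len(Y_Nu))  # normalized implicit term
    return mu + b_u + b_i + q_i @ (p_u + implicit)

rng = np.random.default_rng(0)
n_factors = 20
p_u = rng.normal(0, 0.1, n_factors)        # user latent factors
q_i = rng.normal(0, 0.1, n_factors)        # item latent factors
Y_Nu = rng.normal(0, 0.1, (5, n_factors))  # y_j vectors for 5 items user u rated
r_hat = svdpp_predict(mu=3.5, b_u=0.1, b_i=-0.2, p_u=p_u, q_i=q_i, Y_Nu=Y_Nu)
```

The extra y_j vectors are what let SVD++ use *which* items a user rated, not just how they rated them.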

4. ALS - Implicit Feedback at Scale

ALS is excellent for large-scale implicit feedback datasets.

from recommender import ALSRecommender

# Train ALS
model = ALSRecommender(
    n_factors=50,
    n_iterations=15,
    reg=0.01,
    alpha=40.0  # Confidence scaling
)
model.fit(train.data)

# Get recommendations
recommendations = model.recommend([1, 2, 3], k=20)
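The alpha parameter implements the confidence weighting of Hu et al.: each observed interaction gets confidence c_ui = 1 + alpha * r_ui, and ALS alternates ridge-regression solves for user and item factors. Below is a minimal dense sketch of one user-side half-step under those assumptions (illustrative only, not the library's solver):

```python
import numpy as np

def als_user_step(R, V, reg=0.01, alpha=40.0):
    """One ALS half-step: re-solve all user factors with item factors V fixed."""
    n_users = R.shape[0]
    k = V.shape[1]
    U = np.zeros((n_users, k))
    VtV = V.T @ V
    for u in range(n_users):
        c_u = 1.0 + alpha * R[u]                          # confidence c_ui = 1 + alpha * r_ui
        # Solve (V^T C_u V + reg*I) x = V^T C_u p_u, where p_ui = 1 iff r_ui > 0
        A = VtV + (V.T * (c_u - 1.0)) @ V + reg * np.eye(k)
        b = (V.T * c_u) @ (R[u] > 0).astype(float)
        U[u] = np.linalg.solve(A, b)
    return U

rng = np.random.default_rng(0)
R = (rng.random((6, 8)) > 0.6).astype(float)  # toy implicit-interaction matrix
V = rng.normal(0, 0.1, (8, 4))                # fixed item factors
U = als_user_step(R, V)
```

The item-side half-step is symmetric, and the `VtV` precomputation is what makes each iteration cheap even on large matrices.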

5. NCF - Deep Learning (requires PyTorch)

Neural Collaborative Filtering combines matrix factorization with deep learning.

from recommender import NCFRecommender

# Train NCF
model = NCFRecommender(
    embedding_dim=64,
    hidden_layers=[128, 64, 32],
    learning_rate=0.001,
    batch_size=256,
    epochs=20,
    device='cuda'  # or 'cpu'
)
model.fit(train.data)

# Get recommendations
recommendations = model.recommend([1, 2, 3], k=10)
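Architecturally, NCF fuses two branches: a GMF branch (element-wise product of user and item embeddings) and an MLP branch over the concatenated embeddings, joined by a final prediction layer. The forward pass can be sketched in plain NumPy to show the data flow (dimensions and weight names here are illustrative, not the library's model):

```python
import numpy as np

def ncf_forward(p_gmf, q_gmf, p_mlp, q_mlp, mlp_weights, h):
    """NCF score: concat(GMF product, MLP tower output) -> sigmoid."""
    gmf = p_gmf * q_gmf                    # GMF branch: element-wise product
    x = np.concatenate([p_mlp, q_mlp])     # MLP branch: concatenated embeddings
    for W in mlp_weights:
        x = np.maximum(W @ x, 0.0)         # ReLU hidden layers
    z = np.concatenate([gmf, x])           # fuse both branches
    return 1.0 / (1.0 + np.exp(-h @ z))    # sigmoid interaction probability

rng = np.random.default_rng(0)
d = 8                                                     # toy embedding dim
p_gmf, q_gmf = rng.normal(size=d), rng.normal(size=d)
p_mlp, q_mlp = rng.normal(size=d), rng.normal(size=d)
mlp_weights = [rng.normal(size=(16, 2 * d)) * 0.1,        # 16 -> 16
               rng.normal(size=(8, 16)) * 0.1]            # 16 -> 8
h = rng.normal(size=d + 8) * 0.1                          # final prediction layer
score = ncf_forward(p_gmf, q_gmf, p_mlp, q_mlp, mlp_weights, h)
```

Keeping separate embeddings for the two branches, as in the original paper, lets the linear GMF part and the nonlinear MLP part specialize independently.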

6. Custom Data Processing

from recommender.data import (
    filter_by_interaction_count,
    binarize_implicit_feedback,
    create_sequences,
    temporal_split
)
import pandas as pd

# Load your custom data
df = pd.read_csv('your_data.csv')

# Filter sparse users/items
df = filter_by_interaction_count(
    df,
    min_user_interactions=5,
    min_item_interactions=5
)

# Convert to implicit feedback
df = binarize_implicit_feedback(df, threshold=4.0)

# Temporal split (if you have timestamps)
train, test = temporal_split(df, test_size=0.2)
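A temporal split holds out the chronologically latest interactions, which avoids training on the future. A minimal pandas sketch of the idea (assuming a `timestamp` column; this is not the library's `temporal_split` implementation):

```python
import pandas as pd

def temporal_split_sketch(df, test_size=0.2, time_col='timestamp'):
    """Hold out the latest fraction of interactions as the test set."""
    df = df.sort_values(time_col)
    cutoff = int(len(df) * (1.0 - test_size))  # index of the first test row
    return df.iloc[:cutoff], df.iloc[cutoff:]

df = pd.DataFrame({'user_id': [1] * 10,
                   'item_id': range(10),
                   'timestamp': [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]})
train, test = temporal_split_sketch(df, test_size=0.2)
```

Compared to a random split, every test interaction is guaranteed to occur after every training interaction, which better matches production conditions.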

7. Advanced Evaluation

from recommender import Evaluator

# Create evaluator with custom metrics
evaluator = Evaluator(
    metrics=['precision', 'recall', 'ndcg', 'map', 'mrr', 'hit_rate', 'coverage', 'diversity'],
    k_values=[5, 10, 20, 50]
)

# Evaluate model
results = evaluator.evaluate(
    model,
    test_data=test,
    task='ranking',
    exclude_train=True,
    train_data=train
)

# Pretty print results
evaluator.print_results(results)

# Access specific metrics
ndcg_10 = results['ndcg@10']
recall_20 = results['recall@20']
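For intuition on what these metrics measure, NDCG@K rewards placing relevant items near the top of the ranked list by discounting gains logarithmically with position, then normalizing by the best achievable ordering. A self-contained sketch of the binary-relevance case (not the library's Evaluator internals):

```python
import numpy as np

def ndcg_at_k(ranked_items, relevant_items, k=10):
    """NDCG@K: DCG of the ranked list divided by the ideal DCG."""
    relevant = set(relevant_items)
    gains = [1.0 if item in relevant else 0.0 for item in ranked_items[:k]]
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

ndcg_at_k([1, 2, 3], [1, 2, 3], k=3)  # perfect ranking -> 1.0
```

Unlike Precision@K, NDCG@K distinguishes a relevant item at rank 1 from one at rank 10, which is why the two metrics can disagree on model ordering.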

8. Cross-Validation

from recommender import cross_validate

# Perform 5-fold cross-validation
cv_results = cross_validate(
    model_class=EASERecommender,
    dataset=dataset,
    n_folds=5,
    metrics=['precision', 'recall', 'ndcg'],
    k_values=[10, 20],
    l2_reg=500.0  # Model hyperparameters
)

9. Negative Sampling

from recommender.data import UniformSampler, PopularitySampler, create_negative_samples

# Uniform negative sampling
sampler = UniformSampler(n_items=dataset.n_items, seed=42)

# Popularity-based sampling
item_popularity = train.data['item_id'].value_counts().to_dict()
sampler = PopularitySampler(n_items=dataset.n_items, item_popularity=item_popularity)

# Create training data with negatives
train_with_negatives = create_negative_samples(
    interactions_df=train.data,
    sampler=sampler,
    n_negatives_per_positive=4
)
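The core idea of uniform negative sampling is simple: for every observed (user, item) pair, draw a fixed number of items the user has never interacted with. A minimal sketch of that loop (illustrative, not the library's UniformSampler):

```python
import numpy as np

def sample_negatives(user_items, n_items, n_negatives=4, seed=42):
    """For each (user, positive) pair, draw unseen items as negatives."""
    rng = np.random.default_rng(seed)
    samples = []
    for user, positives in user_items.items():
        seen = set(positives)
        for pos in positives:
            negatives = []
            while len(negatives) < n_negatives:
                cand = int(rng.integers(0, n_items))
                if cand not in seen:          # rejection-sample unseen items
                    negatives.append(cand)
            samples.append((user, pos, negatives))
    return samples

samples = sample_negatives({0: [1, 2]}, n_items=10, n_negatives=4)
```

Popularity-based sampling replaces the uniform draw with one weighted by item frequency, producing harder negatives at the cost of some popularity bias.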

Benchmarks

Performance on MovieLens-1M (80/20 split, implicit feedback):

Model   NDCG@10   Recall@10   Precision@10   Training Time
EASE    0.3845    0.2156      0.1723         ~5s
SLIM    0.3721    0.2089      0.1654         ~2min
ALS     0.3567    0.1998      0.1589         ~30s
SVD     0.3289    0.1845      0.1456         ~10s
NCF     0.3923    0.2234      0.1789         ~5min

Note: Results may vary based on hyperparameters and hardware.

API Reference

Core Classes

BaseRecommender

Abstract base class for all recommenders.

Methods:

  • fit(interactions) - Train the model
  • predict(user_ids, item_ids) - Predict scores for user-item pairs
  • recommend(user_ids, k, exclude_seen) - Generate top-K recommendations
  • save(path) - Save model to disk
  • load(path) - Load model from disk

InteractionDataset

Dataset wrapper for user-item interactions.

Methods:

  • to_csr_matrix() - Convert to sparse CSR matrix
  • split(test_size, val_size, strategy) - Split into train/val/test
  • get_user_items(user_id) - Get items for a user

Evaluator

Comprehensive model evaluation.

Methods:

  • evaluate(model, test_data, task) - Evaluate model
  • evaluate_ranking(model, test_data) - Ranking metrics
  • evaluate_rating_prediction(model, test_data) - Rating prediction metrics
  • print_results(results) - Pretty print results

Models

All models inherit from BaseRecommender and follow the same API:

model = ModelClass(**hyperparameters)
model.fit(train_data)
recommendations = model.recommend(user_ids, k=10)

Available Models:

  • EASERecommender
  • SLIMRecommender
  • SVDRecommender
  • SVDPlusPlusRecommender
  • ALSRecommender
  • NCFRecommender (requires PyTorch)

Datasets

Built-in dataset loaders:

from recommender.data import (
    load_movielens,
    load_amazon,
    load_book_crossing,
    create_synthetic_dataset
)

# MovieLens
df = load_movielens(size='100k')  # '100k', '1m', '10m', '20m', '25m'

# Amazon Reviews
df = load_amazon(category='Books', max_reviews=100000)

# Book-Crossing
df = load_book_crossing()

# Synthetic data for testing
df = create_synthetic_dataset(n_users=1000, n_items=500, n_interactions=10000)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this library in your research, please cite:

@software{sota_recommender_library,
  author = {Lobachevskiy, Semen},
  title = {SOTA Recommender Systems Library},
  year = {2025},
  url = {https://github.com/hichnicksemen/svd-recommender}
}

References

  • EASE: Harald Steck. 2019. Embarrassingly Shallow Autoencoders for Sparse Data. WWW '19.
  • SLIM: Xia Ning and George Karypis. 2011. SLIM: Sparse Linear Methods for Top-N Recommender Systems. ICDM '11.
  • SVD++: Yehuda Koren. 2008. Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model. KDD '08.
  • ALS: Yifan Hu et al. 2008. Collaborative Filtering for Implicit Feedback Datasets. ICDM '08.
  • NCF: Xiangnan He et al. 2017. Neural Collaborative Filtering. WWW '17.
  • LightGCN: Xiangnan He et al. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. SIGIR '20.
  • SASRec: Wang-Cheng Kang and Julian McAuley. 2018. Self-Attentive Sequential Recommendation. ICDM '18.

Acknowledgments

This library builds upon research and implementations from the recommender systems community.
