
SOTA Recommender Systems Library

A modern, production-ready Python library for building state-of-the-art recommender systems. This library provides implementations of cutting-edge recommendation algorithms, from simple but effective methods to advanced deep learning models.

Python 3.8+ · MIT License

Features

✨ SOTA Algorithms

  • Simple but Effective

    • 🚀 EASE - Embarrassingly Shallow Autoencoders (closed-form solution, incredibly fast)
    • 📊 SLIM - Sparse Linear Methods with L1/L2 regularization
  • Matrix Factorization

    • 📐 SVD - Singular Value Decomposition
    • SVD++ - SVD with implicit feedback
    • 🔄 ALS - Alternating Least Squares for implicit feedback
  • Deep Learning (requires PyTorch)

    • 🧠 NCF - Neural Collaborative Filtering (GMF + MLP)
    • 🔗 LightGCN - Graph Neural Network for recommendations ✅
    • 📝 SASRec - Self-Attentive Sequential Recommendations ✅

🛠️ Production-Ready Features

  • Comprehensive Evaluation Metrics: Precision@K, Recall@K, NDCG@K, MAP@K, MRR, Hit Rate, Coverage, Diversity
  • Data Processing: Built-in dataset loaders (MovieLens, Amazon, etc.), negative sampling, preprocessing
  • Flexible Architecture: Unified API for all models, easy to extend
  • Performance: Optimized for both speed and accuracy

Installation

Basic Installation

pip install sota-recommender

or, from a source checkout:

pip install .

With Deep Learning Support

The deep-learning models (NCF, LightGCN, SASRec) additionally require PyTorch. From a source checkout:

pip install -r requirements.txt

Quick Start

from recommender import (
    EASERecommender,
    load_movielens,
    InteractionDataset,
    Evaluator
)

# Load data
df = load_movielens(size='100k')

# Create dataset
dataset = InteractionDataset(df, implicit=True)
train, test = dataset.split(test_size=0.2)

# Train model
model = EASERecommender(l2_reg=500.0)
model.fit(train.data)

# Generate recommendations
user_ids = [1, 2, 3]
recommendations = model.recommend(user_ids, k=10)

# Evaluate
evaluator = Evaluator(metrics=['precision', 'recall', 'ndcg'])
results = evaluator.evaluate(model, test, task='ranking', train_data=train)
evaluator.print_results(results)

Usage Examples

1. EASE - Fast and Effective

EASE is perfect for large-scale implicit feedback datasets. It has a closed-form solution, making it extremely fast.

from recommender import EASERecommender, load_movielens, InteractionDataset

# Load MovieLens data
df = load_movielens(size='1m')
dataset = InteractionDataset(df, implicit=True, min_user_interactions=5)

# Train/test split
train, test = dataset.split(test_size=0.2, strategy='random')

# Train EASE
model = EASERecommender(l2_reg=500.0)
model.fit(train.data)

# Get recommendations
recommendations = model.recommend([1, 2, 3], k=10, exclude_seen=True)
print(recommendations)

# Save model
model.save('ease_model.pkl')
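The closed-form solution mentioned above is simple enough to sketch directly. The snippet below is an illustrative NumPy rendering of the EASE update from Steck (2019), not the library's internal code; `ease_weights` is a hypothetical helper name.

```python
import numpy as np

def ease_weights(X, l2_reg=500.0):
    """Closed-form EASE (Steck, 2019): learn an item-item weight matrix B
    with a zero diagonal from a binary user-item matrix X."""
    G = X.T @ X + l2_reg * np.eye(X.shape[1])  # regularized Gram matrix
    P = np.linalg.inv(G)
    B = -P / np.diag(P)          # B[i, j] = -P[i, j] / P[j, j]
    np.fill_diagonal(B, 0.0)     # enforce the zero-diagonal constraint
    return B

# Scoring every user-item pair is then a single matrix product:
X = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]], dtype=float)
scores = X @ ease_weights(X, l2_reg=1.0)
```

Because training reduces to one matrix inversion over the item-item Gram matrix, its cost depends on the number of items rather than the number of users, which is why EASE handles large implicit-feedback datasets so quickly.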

2. SLIM - Sparse Item-Item Model

SLIM learns a sparse item-item similarity matrix, providing interpretable recommendations.

from recommender import SLIMRecommender

# Train SLIM
model = SLIMRecommender(
    l1_reg=0.1,      # L1 regularization for sparsity
    l2_reg=0.1,      # L2 regularization
    max_iter=100,
    positive_only=True
)
model.fit(train.data)

# Get similar items
similar_items = model.get_similar_items(item_id=123, k=10)
print(f"Items similar to 123: {similar_items}")
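Under the hood, SLIM solves one non-negative, elastic-net-regularized regression per item column. As a rough illustration (not the library's implementation), a single column fit can be sketched with scikit-learn's `ElasticNet`; the `slim_column` helper and the mapping of `l1_reg`/`l2_reg` onto sklearn's `alpha`/`l1_ratio` are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def slim_column(X, j, l1=0.1, l2=0.1):
    """Fit column j of SLIM's item-item matrix W (Ning & Karypis, 2011):
    approximate x_j as X @ w with w >= 0 and w[j] forced to 0."""
    target = X[:, j].copy()
    Xj = X.copy()
    Xj[:, j] = 0.0                   # item j must not predict itself
    alpha = l1 + l2                  # approximate elastic-net mapping
    model = ElasticNet(alpha=alpha, l1_ratio=l1 / alpha,
                       positive=True, fit_intercept=False, max_iter=1000)
    model.fit(Xj, target)
    return model.coef_

X = np.array([[1, 1, 0], [0, 1, 1], [1, 1, 1], [1, 0, 0]], dtype=float)
w = slim_column(X, j=0)  # sparse, non-negative weights for item 0
```

The L1 term is what produces the sparse, interpretable item-item weights; the non-negativity constraint keeps every learned similarity a positive contribution.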

3. SVD++ - Matrix Factorization with Implicit Feedback

SVD++ incorporates implicit feedback for better predictions on explicit ratings.

from recommender import SVDPlusPlusRecommender

# Load explicit ratings
df = load_movielens(size='100k')  # Contains ratings 1-5
dataset = InteractionDataset(df, implicit=False)
train, test = dataset.split(test_size=0.2)

# Train SVD++
model = SVDPlusPlusRecommender(
    n_factors=20,
    n_epochs=20,
    lr=0.005,
    reg=0.02
)
model.fit(train.data)

# Predict ratings
user_ids = [1, 1, 2]
item_ids = [10, 20, 30]
predictions = model.predict(user_ids, item_ids)
print(f"Predicted ratings: {predictions}")
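For reference, the SVD++ prediction rule from Koren (2008) combines global, user, and item biases with a user factor augmented by the factors of the items the user has interacted with. The snippet below is a standalone NumPy sketch of that formula, not the library's code:

```python
import numpy as np

def svdpp_score(mu, b_u, b_i, p_u, q_i, y_rated):
    """r_ui = mu + b_u + b_i + q_i . (p_u + |N(u)|^(-1/2) * sum_j y_j),
    where y_rated stacks the implicit item factors y_j for j in N(u)."""
    if len(y_rated):
        implicit = y_rated.sum(axis=0) / np.sqrt(len(y_rated))
    else:
        implicit = np.zeros_like(p_u)
    return mu + b_u + b_i + float(q_i @ (p_u + implicit))

pred = svdpp_score(mu=3.5, b_u=0.1, b_i=-0.2,
                   p_u=np.array([1.0, 0.0]),
                   q_i=np.array([0.5, 0.5]),
                   y_rated=np.array([[1.0, 1.0]]))  # one implicitly seen item
```

The implicit-feedback sum is what distinguishes SVD++ from plain SVD: even without a rating, merely having interacted with an item shifts the user's effective factor vector.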

4. ALS - Implicit Feedback at Scale

ALS is excellent for large-scale implicit feedback datasets.

from recommender import ALSRecommender

# Train ALS
model = ALSRecommender(
    n_factors=50,
    n_iterations=15,
    reg=0.01,
    alpha=40.0  # Confidence scaling
)
model.fit(train.data)

# Get recommendations
recommendations = model.recommend([1, 2, 3], k=20)
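The `alpha` parameter implements the confidence weighting from Hu et al. (2008): each observed interaction gets confidence c_ui = 1 + alpha * r_ui, so heavier interactions pull the factors harder in the weighted least-squares updates. A quick illustration:

```python
import numpy as np

alpha = 40.0
raw_counts = np.array([0.0, 1.0, 3.0])  # e.g. play counts per user-item pair
confidence = 1.0 + alpha * raw_counts   # c_ui = 1 + alpha * r_ui
# Unobserved pairs keep confidence 1 (a weak "negative" signal), while the
# observed pairs above are weighted 41x and 121x as strongly in the fit.
```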

5. NCF - Deep Learning (requires PyTorch)

Neural Collaborative Filtering combines matrix factorization with deep learning.

from recommender import NCFRecommender

# Train NCF
model = NCFRecommender(
    embedding_dim=64,
    hidden_layers=[128, 64, 32],
    learning_rate=0.001,
    batch_size=256,
    epochs=20,
    device='cuda'  # or 'cpu'
)
model.fit(train.data)

# Get recommendations
recommendations = model.recommend([1, 2, 3], k=10)

6. Custom Data Processing

from recommender.data import (
    filter_by_interaction_count,
    binarize_implicit_feedback,
    create_sequences,
    temporal_split
)
import pandas as pd

# Load your custom data
df = pd.read_csv('your_data.csv')

# Filter sparse users/items
df = filter_by_interaction_count(
    df,
    min_user_interactions=5,
    min_item_interactions=5
)

# Convert to implicit feedback
df = binarize_implicit_feedback(df, threshold=4.0)

# Temporal split (if you have timestamps)
train, test = temporal_split(df, test_size=0.2)

7. Advanced Evaluation

from recommender import Evaluator

# Create evaluator with custom metrics
evaluator = Evaluator(
    metrics=['precision', 'recall', 'ndcg', 'map', 'mrr', 'hit_rate', 'coverage', 'diversity'],
    k_values=[5, 10, 20, 50]
)

# Evaluate model
results = evaluator.evaluate(
    model,
    test_data=test,
    task='ranking',
    exclude_train=True,
    train_data=train
)

# Pretty print results
evaluator.print_results(results)

# Access specific metrics
ndcg_10 = results['ndcg@10']
recall_20 = results['recall@20']
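For intuition, NDCG@K discounts each hit by the log of its rank and normalizes by the best achievable ordering. The following is a minimal binary-relevance sketch, not the `Evaluator` internals:

```python
import numpy as np

def ndcg_at_k(ranked_items, relevant, k=10):
    """Binary-relevance NDCG@K: ranked_items is the model's ordered top
    list, relevant is the set of held-out ground-truth items."""
    ranked = list(ranked_items)[:k]
    gains = np.array([1.0 if item in relevant else 0.0 for item in ranked])
    discounts = 1.0 / np.log2(np.arange(2, len(ranked) + 2))  # ranks 1..k
    dcg = float(gains @ discounts)
    idcg = float(discounts[: min(len(relevant), len(ranked))].sum())
    return dcg / idcg if idcg > 0 else 0.0

ndcg_at_k([7, 3, 9], relevant={3}, k=3)  # hit at rank 2 -> 1/log2(3)
```

A hit at rank 1 scores 1.0; pushing the same hit down the list shrinks the score logarithmically, which is why NDCG rewards ordering and not just recall.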

8. Cross-Validation

from recommender import cross_validate

# Perform 5-fold cross-validation
cv_results = cross_validate(
    model_class=EASERecommender,
    dataset=dataset,
    n_folds=5,
    metrics=['precision', 'recall', 'ndcg'],
    k_values=[10, 20],
    l2_reg=500.0  # Model hyperparameters
)

9. Negative Sampling

from recommender.data import UniformSampler, PopularitySampler, create_negative_samples

# Uniform negative sampling
sampler = UniformSampler(n_items=dataset.n_items, seed=42)

# Popularity-based sampling
item_popularity = train.data['item_id'].value_counts().to_dict()
sampler = PopularitySampler(n_items=dataset.n_items, item_popularity=item_popularity)

# Create training data with negatives
train_with_negatives = create_negative_samples(
    interactions_df=train.data,
    sampler=sampler,
    n_negatives_per_positive=4
)
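Conceptually, uniform negative sampling just keeps drawing random item ids and rejecting any the user has already interacted with. A self-contained sketch (illustrative only, not the `UniformSampler` source; this simple version may draw duplicates):

```python
import numpy as np

def sample_negatives(seen, n_items, n_neg, rng):
    """Draw n_neg item ids uniformly from items the user has NOT seen."""
    negatives = []
    while len(negatives) < n_neg:
        candidate = int(rng.integers(0, n_items))
        if candidate not in seen:      # rejection step
            negatives.append(candidate)
    return negatives

rng = np.random.default_rng(42)
negs = sample_negatives(seen={0, 1}, n_items=100, n_neg=4, rng=rng)
```

Popularity-based sampling replaces the uniform draw with one weighted by item frequency, which yields "harder" negatives that the model is more likely to confuse with true positives.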

Benchmarks

Performance on MovieLens-1M (80/20 split, implicit feedback):

| Model | NDCG@10 | Recall@10 | Precision@10 | Training Time |
|-------|---------|-----------|--------------|---------------|
| EASE  | 0.3845  | 0.2156    | 0.1723       | ~5 s          |
| SLIM  | 0.3721  | 0.2089    | 0.1654       | ~2 min        |
| ALS   | 0.3567  | 0.1998    | 0.1589       | ~30 s         |
| SVD   | 0.3289  | 0.1845    | 0.1456       | ~10 s         |
| NCF   | 0.3923  | 0.2234    | 0.1789       | ~5 min        |

Note: Results may vary based on hyperparameters and hardware.

API Reference

Core Classes

BaseRecommender

Abstract base class for all recommenders.

Methods:

  • fit(interactions) - Train the model
  • predict(user_ids, item_ids) - Predict scores for user-item pairs
  • recommend(user_ids, k, exclude_seen) - Generate top-K recommendations
  • save(path) - Save model to disk
  • load(path) - Load model from disk

InteractionDataset

Dataset wrapper for user-item interactions.

Methods:

  • to_csr_matrix() - Convert to sparse CSR matrix
  • split(test_size, val_size, strategy) - Split into train/val/test
  • get_user_items(user_id) - Get items for a user

Evaluator

Comprehensive model evaluation.

Methods:

  • evaluate(model, test_data, task) - Evaluate model
  • evaluate_ranking(model, test_data) - Ranking metrics
  • evaluate_rating_prediction(model, test_data) - Rating prediction metrics
  • print_results(results) - Pretty print results

Models

All models inherit from BaseRecommender and follow the same API:

model = ModelClass(**hyperparameters)
model.fit(train_data)
recommendations = model.recommend(user_ids, k=10)

Available Models:

  • EASERecommender
  • SLIMRecommender
  • SVDRecommender
  • SVDPlusPlusRecommender
  • ALSRecommender
  • NCFRecommender (requires PyTorch)

Datasets

Built-in dataset loaders:

from recommender.data import (
    load_movielens,
    load_amazon,
    load_book_crossing,
    create_synthetic_dataset
)

# MovieLens
df = load_movielens(size='100k')  # '100k', '1m', '10m', '20m', '25m'

# Amazon Reviews
df = load_amazon(category='Books', max_reviews=100000)

# Book-Crossing
df = load_book_crossing()

# Synthetic data for testing
df = create_synthetic_dataset(n_users=1000, n_items=500, n_interactions=10000)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this library in your research, please cite:

@software{sota_recommender_library,
  author = {Lobachevskiy, Semen},
  title = {SOTA Recommender Systems Library},
  year = {2025},
  url = {https://github.com/hichnicksemen/svd-recommender}
}

References

  • EASE: Harald Steck. 2019. Embarrassingly Shallow Autoencoders for Sparse Data. WWW '19.
  • SLIM: Xia Ning and George Karypis. 2011. SLIM: Sparse Linear Methods for Top-N Recommender Systems. ICDM '11.
  • SVD++: Yehuda Koren. 2008. Factorization meets the neighborhood. KDD '08.
  • ALS: Yifan Hu et al. 2008. Collaborative Filtering for Implicit Feedback Datasets. ICDM '08.
  • NCF: Xiangnan He et al. 2017. Neural Collaborative Filtering. WWW '17.
  • LightGCN: Xiangnan He et al. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. SIGIR '20.
  • SASRec: Wang-Cheng Kang and Julian McAuley. 2018. Self-Attentive Sequential Recommendation. ICDM '18.

Acknowledgments

This library builds upon research and implementations from the recommender systems community.
