A modern, production-ready library for state-of-the-art recommender systems
Project description
SOTA Recommender Systems Library
A modern, production-ready Python library for building state-of-the-art recommender systems. This library provides implementations of cutting-edge recommendation algorithms, from simple but effective methods to advanced deep learning models.
Features
✨ SOTA Algorithms
-
Simple but Effective
- 🚀 EASE - Embarrassingly Shallow Autoencoders (closed-form solution, incredibly fast)
- 📊 SLIM - Sparse Linear Methods with L1/L2 regularization
-
Matrix Factorization
- 📐 SVD - Singular Value Decomposition
- ⭐ SVD++ - SVD with implicit feedback
- 🔄 ALS - Alternating Least Squares for implicit feedback
-
Deep Learning (requires PyTorch)
- 🧠 NCF - Neural Collaborative Filtering (GMF + MLP)
- 🔗 LightGCN - Graph Neural Network for recommendations ✅
- 📝 SASRec - Self-Attentive Sequential Recommendations ✅
🛠️ Production-Ready Features
- Comprehensive Evaluation Metrics: Precision@K, Recall@K, NDCG@K, MAP@K, MRR, Hit Rate, Coverage, Diversity
- Data Processing: Built-in dataset loaders (MovieLens, Amazon, etc.), negative sampling, preprocessing
- Flexible Architecture: Unified API for all models, easy to extend
- Performance: Optimized for both speed and accuracy
Installation
Basic Installation
pip install .
With Deep Learning Support
pip install -r requirements.txt
Quick Start
from recommender import (
EASERecommender,
load_movielens,
InteractionDataset,
Evaluator
)
# Load data
df = load_movielens(size='100k')
# Create dataset
dataset = InteractionDataset(df, implicit=True)
train, test = dataset.split(test_size=0.2)
# Train model
model = EASERecommender(l2_reg=500.0)
model.fit(train.data)
# Generate recommendations
user_ids = [1, 2, 3]
recommendations = model.recommend(user_ids, k=10)
# Evaluate
evaluator = Evaluator(metrics=['precision', 'recall', 'ndcg'])
results = evaluator.evaluate(model, test, task='ranking', train_data=train)
evaluator.print_results(results)
Usage Examples
1. EASE - Fast and Effective
EASE is perfect for large-scale implicit feedback datasets. It has a closed-form solution, making it extremely fast.
from recommender import EASERecommender, load_movielens, InteractionDataset
# Load MovieLens data
df = load_movielens(size='1m')
dataset = InteractionDataset(df, implicit=True, min_user_interactions=5)
# Train/test split
train, test = dataset.split(test_size=0.2, strategy='random')
# Train EASE
model = EASERecommender(l2_reg=500.0)
model.fit(train.data)
# Get recommendations
recommendations = model.recommend([1, 2, 3], k=10, exclude_seen=True)
print(recommendations)
# Save model
model.save('ease_model.pkl')
2. SLIM - Sparse Item-Item Model
SLIM learns a sparse item-item similarity matrix, providing interpretable recommendations.
from recommender import SLIMRecommender
# Train SLIM
model = SLIMRecommender(
l1_reg=0.1, # L1 regularization for sparsity
l2_reg=0.1, # L2 regularization
max_iter=100,
positive_only=True
)
model.fit(train.data)
# Get similar items
similar_items = model.get_similar_items(item_id=123, k=10)
print(f"Items similar to 123: {similar_items}")
3. SVD++ - Matrix Factorization with Implicit Feedback
SVD++ incorporates implicit feedback for better predictions on explicit ratings.
from recommender import SVDPlusPlusRecommender
# Load explicit ratings
df = load_movielens(size='100k') # Contains ratings 1-5
dataset = InteractionDataset(df, implicit=False)
train, test = dataset.split(test_size=0.2)
# Train SVD++
model = SVDPlusPlusRecommender(
n_factors=20,
n_epochs=20,
lr=0.005,
reg=0.02
)
model.fit(train.data)
# Predict ratings
user_ids = [1, 1, 2]
item_ids = [10, 20, 30]
predictions = model.predict(user_ids, item_ids)
print(f"Predicted ratings: {predictions}")
4. ALS - Implicit Feedback at Scale
ALS is excellent for large-scale implicit feedback datasets.
from recommender import ALSRecommender
# Train ALS
model = ALSRecommender(
n_factors=50,
n_iterations=15,
reg=0.01,
alpha=40.0 # Confidence scaling
)
model.fit(train.data)
# Get recommendations
recommendations = model.recommend([1, 2, 3], k=20)
5. NCF - Deep Learning (requires PyTorch)
Neural Collaborative Filtering combines matrix factorization with deep learning.
from recommender import NCFRecommender
# Train NCF
model = NCFRecommender(
embedding_dim=64,
hidden_layers=[128, 64, 32],
learning_rate=0.001,
batch_size=256,
epochs=20,
device='cuda' # or 'cpu'
)
model.fit(train.data)
# Get recommendations
recommendations = model.recommend([1, 2, 3], k=10)
6. Custom Data Processing
from recommender.data import (
filter_by_interaction_count,
binarize_implicit_feedback,
create_sequences,
temporal_split
)
import pandas as pd
# Load your custom data
df = pd.read_csv('your_data.csv')
# Filter sparse users/items
df = filter_by_interaction_count(
df,
min_user_interactions=5,
min_item_interactions=5
)
# Convert to implicit feedback
df = binarize_implicit_feedback(df, threshold=4.0)
# Temporal split (if you have timestamps)
train, test = temporal_split(df, test_size=0.2)
7. Advanced Evaluation
from recommender import Evaluator
# Create evaluator with custom metrics
evaluator = Evaluator(
metrics=['precision', 'recall', 'ndcg', 'map', 'mrr', 'hit_rate', 'coverage', 'diversity'],
k_values=[5, 10, 20, 50]
)
# Evaluate model
results = evaluator.evaluate(
model,
test_data=test,
task='ranking',
exclude_train=True,
train_data=train
)
# Pretty print results
evaluator.print_results(results)
# Access specific metrics
ndcg_10 = results['ndcg@10']
recall_20 = results['recall@20']
8. Cross-Validation
from recommender import cross_validate
# Perform 5-fold cross-validation
cv_results = cross_validate(
model_class=EASERecommender,
dataset=dataset,
n_folds=5,
metrics=['precision', 'recall', 'ndcg'],
k_values=[10, 20],
l2_reg=500.0 # Model hyperparameters
)
9. Negative Sampling
from recommender.data import UniformSampler, PopularitySampler, create_negative_samples
# Uniform negative sampling
sampler = UniformSampler(n_items=dataset.n_items, seed=42)
# Popularity-based sampling
item_popularity = train.data['item_id'].value_counts().to_dict()
sampler = PopularitySampler(n_items=dataset.n_items, item_popularity=item_popularity)
# Create training data with negatives
train_with_negatives = create_negative_samples(
interactions_df=train.data,
sampler=sampler,
n_negatives_per_positive=4
)
Benchmarks
Performance on MovieLens-1M (80/20 split, implicit feedback):
| Model | NDCG@10 | Recall@10 | Precision@10 | Training Time |
|---|---|---|---|---|
| EASE | 0.3845 | 0.2156 | 0.1723 | ~5s |
| SLIM | 0.3721 | 0.2089 | 0.1654 | ~2min |
| ALS | 0.3567 | 0.1998 | 0.1589 | ~30s |
| SVD | 0.3289 | 0.1845 | 0.1456 | ~10s |
| NCF | 0.3923 | 0.2234 | 0.1789 | ~5min |
Note: Results may vary based on hyperparameters and hardware.
API Reference
Core Classes
BaseRecommender
Abstract base class for all recommenders.
Methods:
fit(interactions)- Train the modelpredict(user_ids, item_ids)- Predict scores for user-item pairsrecommend(user_ids, k, exclude_seen)- Generate top-K recommendationssave(path)- Save model to diskload(path)- Load model from disk
InteractionDataset
Dataset wrapper for user-item interactions.
Methods:
to_csr_matrix()- Convert to sparse CSR matrixsplit(test_size, val_size, strategy)- Split into train/val/testget_user_items(user_id)- Get items for a user
Evaluator
Comprehensive model evaluation.
Methods:
evaluate(model, test_data, task)- Evaluate modelevaluate_ranking(model, test_data)- Ranking metricsevaluate_rating_prediction(model, test_data)- Rating prediction metricsprint_results(results)- Pretty print results
Models
All models inherit from BaseRecommender and follow the same API:
model = ModelClass(**hyperparameters)
model.fit(train_data)
recommendations = model.recommend(user_ids, k=10)
Available Models:
EASERecommenderSLIMRecommenderSVDRecommenderSVDPlusPlusRecommenderALSRecommenderNCFRecommender(requires PyTorch)
Datasets
Built-in dataset loaders:
from recommender.data import (
load_movielens,
load_amazon,
load_book_crossing,
create_synthetic_dataset
)
# MovieLens
df = load_movielens(size='100k') # '100k', '1m', '10m', '20m', '25m'
# Amazon Reviews
df = load_amazon(category='Books', max_reviews=100000)
# Book-Crossing
df = load_book_crossing()
# Synthetic data for testing
df = create_synthetic_dataset(n_users=1000, n_items=500, n_interactions=10000)
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use this library in your research, please cite:
@software{sota_recommender_library,
author = {Lobachevskiy, Semen},
title = {SOTA Recommender Systems Library},
year = {2025},
url = {https://github.com/hichnicksemen/svd-recommender}
}
References
- EASE: Harald Steck. 2019. Embarrassingly Shallow Autoencoders for Sparse Data. WWW '19.
- SLIM: Xia Ning and George Karypis. 2011. SLIM: Sparse Linear Methods for Top-N Recommender Systems. ICDM '11.
- SVD++: Yehuda Koren. 2008. Factorization meets the neighborhood. KDD '08.
- ALS: Yifan Hu et al. 2008. Collaborative Filtering for Implicit Feedback Datasets. ICDM '08.
- NCF: Xiangnan He et al. 2017. Neural Collaborative Filtering. WWW '17.
- LightGCN: Xiangnan He et al. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. SIGIR '20.
- SASRec: Wang-Cheng Kang and Julian McAuley. 2018. Self-Attentive Sequential Recommendation. ICDM '18.
Acknowledgments
This library builds upon research and implementations from the recommender systems community.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sota_recommender-0.3.1.tar.gz.
File metadata
- Download URL: sota_recommender-0.3.1.tar.gz
- Upload date:
- Size: 56.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.24
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
965258c92deb3eed7797bee6e395e13735c925579b1ca82883c3adbb61b912cc
|
|
| MD5 |
0b23b74b281111b6908461f87a07b8e7
|
|
| BLAKE2b-256 |
7db5732f1d28257472341262bcf6521c33c05f292c6eb2f6236323194c7a8d89
|
File details
Details for the file sota_recommender-0.3.1-py3-none-any.whl.
File metadata
- Download URL: sota_recommender-0.3.1-py3-none-any.whl
- Upload date:
- Size: 68.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.24
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fd1d28597ef6a39392e3137423d97a1906b7a7862fa7bbb36ef0234207de1a5b
|
|
| MD5 |
8f566c4ba94b849079026017efae7faa
|
|
| BLAKE2b-256 |
d1f9c9eb5c8d5418943ddc1f2fb2413b49b870eb57605cf15bd1d2f41c3ecd55
|