A collaborative-filtering and content-based recommender system for both explicit and implicit datasets.
LibRecommender
Overview
LibRecommender is an easy-to-use recommender system focused on end-to-end recommendation. The main features are:
- Implements a number of popular recommendation algorithms such as FM, DIN, and LightGCN. See the full algorithm list.
- A hybrid recommender system that allows users to use either collaborative-filtering or content-based features. New features can be added on the fly.
- Low memory usage: categorical and multi-value categorical features are automatically converted to sparse representations.
- Supports training on both explicit and implicit datasets; negative sampling can be used for implicit datasets.
- Provides an end-to-end workflow: data handling / preprocessing -> model training -> evaluation -> serving.
- Supports cold-start prediction and recommendation.
- Provides a unified and friendly API for all algorithms. Models are easy to retrain with new users/items.
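To make the idea of negative sampling for implicit data more concrete, here is a minimal standard-library sketch. It is not LibRecommender's implementation; the function name `sample_negatives` and its parameters are hypothetical and exist only for illustration. For every observed (user, item) pair it keeps the pair as a positive sample and draws items the user has never interacted with as negatives.

```python
import random


def sample_negatives(interactions, all_items, num_neg=1, seed=42):
    """Return (user, item, label) rows: each positive plus up to `num_neg` negatives."""
    rng = random.Random(seed)
    # Items each user has already interacted with; these must not be sampled.
    seen = {}
    for user, item in interactions:
        seen.setdefault(user, set()).add(item)

    samples = []
    for user, item in interactions:
        samples.append((user, item, 1))  # observed interaction -> label 1
        candidates = [i for i in all_items if i not in seen[user]]
        # Guard against users who have interacted with (almost) every item.
        k = min(num_neg, len(candidates))
        for neg in rng.sample(candidates, k):
            samples.append((user, neg, 0))  # unobserved item -> label 0
    return samples


interactions = [(1, 10), (1, 11), (2, 10)]
all_items = [10, 11, 12, 13]
samples = sample_negatives(interactions, all_items, num_neg=1)
for row in samples:
    print(row)
```

In the library itself, negative sampling is requested through the dataset objects, as in the feature example in the Usage section.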
Usage
Pure collaborative-filtering example:
import numpy as np
import pandas as pd
from libreco.data import random_split, DatasetPure
from libreco.algorithms import SVDpp # pure data, algorithm SVD++
from libreco.evaluation import evaluate
# a multi-character separator such as "::" requires pandas' python parser engine
data = pd.read_csv("examples/sample_data/sample_movielens_rating.dat", sep="::",
                   names=["user", "item", "label", "time"], engine="python")
# split whole data into three folds for training, evaluating and testing
train_data, eval_data, test_data = random_split(data, multi_ratios=[0.8, 0.1, 0.1])
train_data, data_info = DatasetPure.build_trainset(train_data)
eval_data = DatasetPure.build_evalset(eval_data)
test_data = DatasetPure.build_testset(test_data)
print(data_info) # n_users: 5894, n_items: 3253, data sparsity: 0.4172 %
svdpp = SVDpp(task="rating", data_info=data_info, embed_size=16, n_epochs=3, lr=0.001,
              reg=None, batch_size=256)
# monitor metrics on eval_data during training
svdpp.fit(train_data, verbose=2, eval_data=eval_data, metrics=["rmse", "mae", "r2"])
# do final evaluation on test data
print("evaluate_result: ", evaluate(model=svdpp, data=test_data,
                                    metrics=["rmse", "mae"]))
# predict preference of user 2211 to item 110
print("prediction: ", svdpp.predict(user=2211, item=110))
# recommend 7 items for user 2211
print("recommendation: ", svdpp.recommend_user(user=2211, n_rec=7))
# cold-start prediction
print("cold prediction: ", svdpp.predict(user="ccc", item="not item",
                                         cold_start="average"))
# cold-start recommendation
print("cold recommendation: ", svdpp.recommend_user(user="are we good?",
                                                    n_rec=7,
                                                    cold_start="popular"))
Example including features:
import numpy as np
import pandas as pd
from libreco.data import split_by_ratio_chrono, DatasetFeat
from libreco.algorithms import YouTubeRanking # feat data, algorithm YouTubeRanking
data = pd.read_csv("examples/sample_data/sample_movielens_merged.csv", sep=",", header=0)
data["label"] = 1 # convert to implicit data and do negative sampling afterwards
# split into train and test data based on time
train_data, test_data = split_by_ratio_chrono(data, test_size=0.2)
# specify complete columns information
sparse_col = ["sex", "occupation", "genre1", "genre2", "genre3"]
dense_col = ["age"]
user_col = ["sex", "age", "occupation"]
item_col = ["genre1", "genre2", "genre3"]
train_data, data_info = DatasetFeat.build_trainset(
    train_data, user_col, item_col, sparse_col, dense_col
)
test_data = DatasetFeat.build_testset(test_data)
train_data.build_negative_samples(data_info) # sample negative items for each record
test_data.build_negative_samples(data_info)
print(data_info) # n_users: 5962, n_items: 3226, data sparsity: 0.4185 %
ytb_ranking = YouTubeRanking(task="ranking", data_info=data_info, embed_size=16,
                             n_epochs=3, lr=1e-4, batch_size=512, use_bn=True,
                             hidden_units="128,64,32")
ytb_ranking.fit(train_data, verbose=2, shuffle=True, eval_data=test_data,
                metrics=["loss", "roc_auc", "precision", "recall", "map", "ndcg"])
# predict preference of user 2211 to item 110
print("prediction: ", ytb_ranking.predict(user=2211, item=110))
# recommend 7 items for user 2211
print("recommendation(id, probability): ", ytb_ranking.recommend_user(user=2211, n_rec=7))
# cold-start prediction
print("cold prediction: ", ytb_ranking.predict(user="ccc", item="not item",
                                               cold_start="average"))
# cold-start recommendation
print("cold recommendation: ", ytb_ranking.recommend_user(user="are we good?",
                                                          n_rec=7,
                                                          cold_start="popular"))
For more examples and usage, see the User Guide.
Data Format
The data format is a plain one: each line represents a sample. One important point is that the model assumes the user, item, and label column indices are 0, 1, and 2, respectively. You may need to change the column order if that's not the case. Take, for example, the movielens-1m dataset:
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
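If a table does not already have user, item, and label in the first three positions, the columns can be reordered before the data is passed to the library. The snippet below is a minimal pandas sketch; the `raw` frame is made-up illustrative data that reuses values from the lines above, with column names following the Usage examples.

```python
import pandas as pd

# Hypothetical raw table whose columns are not in (user, item, label) order.
raw = pd.DataFrame(
    {
        "time": [978300760, 978302109],
        "label": [5, 3],
        "item": [1193, 661],
        "user": [1, 1],
    }
)

# Move user, item and label to positions 0, 1 and 2; keep the rest after them.
leading = ["user", "item", "label"]
remaining = [col for col in raw.columns if col not in leading]
data = raw[leading + remaining]

print(list(data.columns))
```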
Besides, if you want to use other meta features (e.g., age, sex, category), you need to tell the model which columns are sparse_col, dense_col, user_col, and item_col, which means all features must be in the same table. See the YouTubeRanking example above.
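When user and item features live in separate tables, they can be joined onto the interaction table first so that everything ends up in a single table. The following is a sketch using pandas; the three small frames are hypothetical, and their column names only mirror those used in the feature example.

```python
import pandas as pd

# Hypothetical separate tables for interactions, user features and item features.
ratings = pd.DataFrame({"user": [1, 1, 2], "item": [10, 11, 10], "label": [5, 3, 4]})
users = pd.DataFrame({"user": [1, 2], "sex": ["F", "M"], "age": [25, 35]})
items = pd.DataFrame({"item": [10, 11], "genre1": ["Drama", "Comedy"]})

# Left joins keep every interaction row and attach the matching features.
merged = ratings.merge(users, on="user", how="left").merge(items, on="item", how="left")

print(list(merged.columns))
```

Because the interaction table is on the left of both joins, user, item, and label stay in the first three column positions.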
Also note that your data should not contain missing values.
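A quick way to check for missing values, and one possible way to fill them, is sketched below with pandas. The data and the filling strategy (a placeholder category for a sparse column, the median for a dense column) are assumptions chosen for illustration, not a recommendation from the library.

```python
import pandas as pd

# Hypothetical merged table with a few missing feature values.
data = pd.DataFrame(
    {
        "user": [1, 2, 3],
        "item": [10, 11, 12],
        "label": [5, 3, 4],
        "sex": ["F", None, "M"],
        "age": [25.0, 35.0, None],
    }
)

# Count missing values per column before building the dataset.
missing_counts = data.isnull().sum()
print(missing_counts[missing_counts > 0])

# One possible strategy; the right choice depends on the data.
cleaned = data.copy()
cleaned["sex"] = cleaned["sex"].fillna("missing")
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())

# Total number of missing values remaining after filling.
print(int(cleaned.isnull().sum().sum()))
```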
Serving
For how to serve a trained model in LibRecommender, see the Serving Guide.
Installation & Dependencies
From PyPI:
$ pip install LibRecommender
To build from source, you'll first need Cython and NumPy:
$ # pip install numpy cython
$ git clone https://github.com/massquantity/LibRecommender.git
$ cd LibRecommender
$ python setup.py install
Basic Dependencies for libreco:
- Python >= 3.6
- TensorFlow >= 1.15
- PyTorch >= 1.10
- Numpy >= 1.19.5
- Cython >= 0.29.0
- Pandas >= 1.0.0
- Scipy >= 1.2.1
- scikit-learn >= 0.20.0
- gensim >= 4.0.0
- tqdm
- nmslib (optional, see User Guide)
- DGL (optional, see Implementation Details)
If you are using Python 3.6, you also need to install the dataclasses package, since dataclasses only became part of the standard library in Python 3.7.
LibRecommender is tested under TensorFlow 1.15, 2.5, 2.8 and 2.10. If you encounter any problems while running it, feel free to open an issue.
Known issue: Sometimes one may encounter errors like ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject. In this case, try upgrading numpy; version 1.22.0 or higher is probably a safe option.
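To see whether the installed numpy is older than the 1.22.0 version suggested above, a small check such as the following can be run. It is only a sketch: the helpers `version_tuple` and `is_at_least` are hypothetical names defined here, and they compare just the leading numeric parts of the version string.

```python
def version_tuple(version):
    """Convert a version string such as '1.22.4' to a 3-tuple of integers."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '1.22'
    return tuple(parts)


def is_at_least(version, minimum):
    """Return True if `version` is greater than or equal to `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)


try:
    import numpy as np

    if is_at_least(np.__version__, "1.22.0"):
        print(f"numpy {np.__version__} is 1.22.0 or newer")
    else:
        print(f"numpy {np.__version__} is older than 1.22.0; consider upgrading")
except ImportError:
    print("numpy is not installed")
```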
The table below shows some compatible version combinations:
Python | Numpy | TensorFlow | OS |
---|---|---|---|
3.6 | 1.19.5 | 1.15, 2.5 | linux, windows, macos |
3.7 | 1.20.3, 1.21.6 | 1.15, 2.5, 2.8, 2.10 | linux, windows, macos |
3.8 | 1.22.4, 1.23.2 | 2.5, 2.8, 2.10 | linux, windows, macos |
3.9 | 1.22.4, 1.23.2 | 2.5, 2.8, 2.10 | linux, windows, macos |
3.10 | 1.22.4, 1.23.2 | 2.8, 2.10 | linux, windows, macos |
Optional Dependencies for libserving:
- Python >= 3.7
- sanic >= 22.3
- requests
- aiohttp
- pydantic
- ujson
- redis
- redis-py >= 4.2.0
- faiss >= 1.5.2
- TensorFlow Serving == 2.8.2
Docker
The library can also be used in a Docker container without installing dependencies; see Docker.