A collaborative-filtering and content-based recommender system for both explicit and implicit datasets.
LibRecommender
Overview
LibRecommender is an easy-to-use recommender system focused on end-to-end recommendation. The main features are:
- Implements a number of popular recommendation algorithms such as SVD++, DeepFM, and BPR; see the full algorithm list.
- A hybrid recommender system, which allows users to use collaborative-filtering features, content-based features, or both. New features can be added on the fly.
- Low memory usage: categorical and multi-value categorical features are automatically converted to a sparse representation.
- Supports training on both explicit and implicit datasets; negative sampling can be used for implicit datasets.
- Makes use of Cython or TensorFlow for high-speed model training.
- Provides an end-to-end workflow, i.e. data handling / preprocessing -> model training -> evaluation -> serving.
- Supports cold-start prediction and recommendation.
- Provides a unified and friendly API for all algorithms. Models can easily be retrained with new users/items.
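The negative sampling mentioned in the feature list can be pictured with a small standalone sketch. This is only an illustration of the general idea (pair each observed interaction with randomly drawn items the user has not interacted with); the helper name `sample_negatives` and its parameters are hypothetical and are not part of the libreco API, whose own sampler may work differently.

```python
import random


def sample_negatives(user_items, n_items, num_neg=1, seed=0):
    """Return (user, item, label) triples: label 1 = observed, 0 = sampled negative.

    user_items maps a user id to the set of item ids (0 .. n_items - 1) that the
    user interacted with. Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    samples = []
    for user, positives in user_items.items():
        if len(positives) >= n_items:
            raise ValueError(f"user {user} has no unseen items to sample from")
        for item in positives:
            samples.append((user, item, 1))  # the observed interaction
            drawn = 0
            while drawn < num_neg:
                candidate = rng.randrange(n_items)
                if candidate not in positives:  # reject items the user has seen
                    samples.append((user, candidate, 0))
                    drawn += 1
    return samples


interactions = {0: {1, 2}, 1: {0}}
triples = sample_negatives(interactions, n_items=5, num_neg=2)
```

Rejection sampling like this is simple and works well when each user has seen only a small share of the catalogue, which is the usual situation for sparse interaction data.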
Usage
Pure collaborative-filtering example:
import numpy as np
import pandas as pd
from libreco.data import random_split, DatasetPure
from libreco.algorithms import SVDpp # pure data, algorithm SVD++
from libreco.evaluation import evaluate
data = pd.read_csv("examples/sample_data/sample_movielens_rating.dat", sep="::",
                   names=["user", "item", "label", "time"])
# split whole data into three folds for training, evaluating and testing
train_data, eval_data, test_data = random_split(data, multi_ratios=[0.8, 0.1, 0.1])
train_data, data_info = DatasetPure.build_trainset(train_data)
eval_data = DatasetPure.build_evalset(eval_data)
test_data = DatasetPure.build_testset(test_data)
print(data_info) # n_users: 5894, n_items: 3253, data sparsity: 0.4172 %
svdpp = SVDpp(task="rating", data_info=data_info, embed_size=16, n_epochs=3, lr=0.001,
              reg=None, batch_size=256)
# monitor metrics on eval_data during training
svdpp.fit(train_data, verbose=2, eval_data=eval_data, metrics=["rmse", "mae", "r2"])
# do final evaluation on test data
print("evaluate_result: ", evaluate(model=svdpp, data=test_data,
                                    metrics=["rmse", "mae"]))
# predict preference of user 2211 to item 110
print("prediction: ", svdpp.predict(user=2211, item=110))
# recommend 7 items for user 2211
print("recommendation: ", svdpp.recommend_user(user=2211, n_rec=7))
# cold-start prediction
print("cold prediction: ", svdpp.predict(user="ccc", item="not item",
                                         cold_start="average"))
# cold-start recommendation
print("cold recommendation: ", svdpp.recommend_user(user="are we good?",
                                                    n_rec=7,
                                                    cold_start="popular"))
Example with features included:
import numpy as np
import pandas as pd
from libreco.data import split_by_ratio_chrono, DatasetFeat
from libreco.algorithms import YouTubeRanking # feat data, algorithm YouTubeRanking
data = pd.read_csv("examples/sample_data/sample_movielens_merged.csv", sep=",", header=0)
data["label"] = 1 # convert to implicit data and do negative sampling afterwards
# split into train and test data based on time
train_data, test_data = split_by_ratio_chrono(data, test_size=0.2)
# specify complete columns information
sparse_col = ["sex", "occupation", "genre1", "genre2", "genre3"]
dense_col = ["age"]
user_col = ["sex", "age", "occupation"]
item_col = ["genre1", "genre2", "genre3"]
train_data, data_info = DatasetFeat.build_trainset(
    train_data, user_col, item_col, sparse_col, dense_col
)
test_data = DatasetFeat.build_testset(test_data)
train_data.build_negative_samples(data_info) # sample negative items for each record
test_data.build_negative_samples(data_info)
print(data_info) # n_users: 5962, n_items: 3226, data sparsity: 0.4185 %
ytb_ranking = YouTubeRanking(task="ranking", data_info=data_info, embed_size=16,
                             n_epochs=3, lr=1e-4, batch_size=512, use_bn=True,
                             hidden_units="128,64,32")
ytb_ranking.fit(train_data, verbose=2, shuffle=True, eval_data=test_data,
                metrics=["loss", "roc_auc", "precision", "recall", "map", "ndcg"])
# predict preference of user 2211 to item 110
print("prediction: ", ytb_ranking.predict(user=2211, item=110))
# recommend 7 items for user 2211
print("recommendation(id, probability): ", ytb_ranking.recommend_user(user=2211, n_rec=7))
# cold-start prediction
print("cold prediction: ", ytb_ranking.predict(user="ccc", item="not item",
                                               cold_start="average"))
# cold-start recommendation
print("cold recommendation: ", ytb_ranking.recommend_user(user="are we good?",
                                                          n_rec=7,
                                                          cold_start="popular"))
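The time-based split in the example above can be pictured with plain pandas. The sketch below assumes that a chronological ratio split means "sort each user's interactions by time and hold out the most recent share"; this is an assumption for illustration, and the helper `chrono_split_sketch` is hypothetical. The actual behaviour of `split_by_ratio_chrono` may differ in details such as rounding.

```python
import pandas as pd


def chrono_split_sketch(df, test_size=0.2, user_col="user", time_col="time"):
    """Hold out roughly the most recent `test_size` share of each user's rows.

    Hypothetical helper for illustration; not the libreco implementation.
    """
    ordered = df.sort_values([user_col, time_col], kind="stable")
    # position of each row within its user's timeline: 0, 1, 2, ...
    rank = ordered.groupby(user_col).cumcount()
    # number of rows each user has, repeated on every row of that user
    size = ordered.groupby(user_col)[user_col].transform("count")
    # rows at or past this position in a user's timeline go to the test set
    n_train = (size * (1 - test_size)).round().astype(int)
    is_test = rank >= n_train
    return ordered[~is_test], ordered[is_test]


toy = pd.DataFrame({
    "user": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "item": [10, 11, 12, 13, 14, 20, 21, 22, 23, 24],
    "time": [5, 1, 3, 2, 4, 9, 8, 7, 6, 10],
})
train_part, test_part = chrono_split_sketch(toy, test_size=0.2)
```

Splitting by time rather than at random keeps the evaluation closer to deployment, where a model is always asked about interactions that happen after the ones it was trained on.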
For more examples and usage, see the User Guide.
Data Format
Just a normal data format: each line represents a sample. One important point is that the model assumes the user, item, and label column indices are 0, 1, and 2, respectively. You may need to change the column order if that's not the case. Take the movielens-1m dataset as an example:
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
Besides, if you want to use other meta features (e.g., age, sex, category), you need to tell the model which columns are [sparse_col, dense_col, user_col, item_col], which means all features must be in the same table. See the YouTubeRanking example above.
Also note that your data should not contain missing values.
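As a minimal sketch of the two requirements above, the snippet below reorders a table so that user, item, and label come first and rejects data with missing values. The helper `to_libreco_layout` and the in-memory sample (whose columns are deliberately in the wrong order) are hypothetical and only illustrate the layout; they are not part of libreco.

```python
import io

import pandas as pd


def to_libreco_layout(df, user="user", item="item", label="label"):
    """Reorder columns to user, item, label, <rest> and reject missing values."""
    rest = [col for col in df.columns if col not in (user, item, label)]
    out = df[[user, item, label] + rest]
    if out.isnull().values.any():
        raise ValueError("data contains missing values")
    return out


# Hypothetical file whose columns are ordered time, label, item, user.
raw = io.StringIO(
    "978300760::5::1193::1\n"
    "978302109::3::661::1\n"
)
# A multi-character separator such as "::" needs the python parsing engine.
shuffled = pd.read_csv(raw, sep="::", names=["time", "label", "item", "user"],
                       engine="python")
data = to_libreco_layout(shuffled)
```

After the call, `data` has its columns in the order user, item, label, time, matching the movielens-1m lines shown above.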
Serving
To learn how to serve a trained model in LibRecommender, see the Serving Guide.
Installation & Dependencies
From PyPI:
$ pip install LibRecommender==0.8.4
To build from source, you'll first need Cython and NumPy:
$ # pip install numpy cython
$ git clone https://github.com/massquantity/LibRecommender.git
$ cd LibRecommender
$ python setup.py install
Basic dependencies in libreco:
- Python >= 3.6
- TensorFlow >= 1.15
- PyTorch >= 1.10
- Numpy >= 1.19.5
- Cython >= 0.29.0
- Pandas >= 1.0.0
- Scipy >= 1.2.1
- scikit-learn >= 0.20.0
- gensim >= 4.0.0
- tqdm >= 4.46.0
- hnswlib
LibRecommender is tested under TensorFlow 1.15, 2.5 and 2.8. If you encounter any problems while running it, feel free to open an issue.
Known issue: Sometimes one may encounter errors like ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject. In this case, try upgrading numpy; version 1.22.0 or higher is probably a safe option.
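To see whether the installed numpy is already at or above the suggested version, a small standard-library check like the one below may help. It is only a sketch: the helpers `parse_release`, `is_at_least`, and `check_numpy` are hypothetical names, the parsing handles plain dotted release numbers only, and `importlib.metadata` requires Python 3.8 or newer.

```python
from importlib import metadata


def parse_release(version):
    """Turn '1.22.0' into (1, 22, 0), ignoring any non-numeric suffix."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed, minimum):
    """Compare two dotted versions after padding the shorter one with zeros."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_numpy(minimum="1.22.0"):
    """Return True/False for the installed numpy, or None if it is not installed."""
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None
    return is_at_least(installed, minimum)


print("numpy >= 1.22.0:", check_numpy())
```

If the check reports `False`, upgrading with `pip install --upgrade numpy` is the usual next step.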
The table below shows some compatible version combinations:
| Python | Numpy | TensorFlow | OS |
|---|---|---|---|
| 3.6 | 1.19.5 | 1.15, 2.5 | linux, windows, macos |
| 3.7 | 1.20.3, 1.21.5 | 1.15, 2.5, 2.8 | linux, windows, macos |
| 3.8 | 1.22.2 | 2.5, 2.8 | linux, windows, macos |
| 3.9 | 1.22.2 | 2.5, 2.8 | linux, windows, macos |
| 3.10 | 1.22.2 | 2.8 | linux, windows, macos |
Optional Serving Dependencies:
- flask >= 1.0.0
- requests >= 2.22.0
- redis == 3.0.6
- redis-py >= 3.3.5
- faiss == 1.5.2
- TensorFlow Serving
Docker
One can also use the library in a Docker container without installing dependencies; see Docker.
References
- Category: pure means collaborative-filtering algorithms which only use behavior data; feat means other side-features can be included.
- Sequence: Algorithms that leverage user behavior sequence.
- Graph: Algorithms that leverage graph information, including Graph Embedding (GE) and Graph Neural Network (GNN).
- Embedding: Algorithms that can generate final user and item embeddings.
License
MIT