A collaborative-filtering and content-based recommender system for both explicit and implicit datasets.
LibRecommender
Overview
LibRecommender is an easy-to-use recommender system focused on end-to-end recommendation. The main features are:
- Implements a number of popular recommendation algorithms such as SVD++, DeepFM and BPR; see the full algorithm list.
- Works as a hybrid recommender system: users can rely on collaborative filtering, content-based features, or both. New features can be added on the fly.
- Keeps memory usage low by automatically converting categorical and multi-value categorical features to a sparse representation.
- Supports training on both explicit and implicit datasets; negative sampling can be used for implicit datasets.
- Uses Cython or TensorFlow for high-speed model training.
- Provides an end-to-end workflow: data handling / preprocessing -> model training -> evaluation -> serving.
- Supports cold-start prediction and recommendation.
- Provides a unified and friendly API for all algorithms, and makes it easy to retrain a model with new users/items.
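To make the negative-sampling feature concrete, here is a minimal standard-library sketch of the general idea: every observed (user, item) pair is kept as a positive with label 1, and unseen items are drawn at random as negatives with label 0. The function name `sample_negatives` and its parameters are hypothetical and chosen for illustration; this is not LibRecommender's own implementation, which may sample differently.

```python
import random


def sample_negatives(interactions, all_items, num_neg=1, seed=42):
    """Return (user, item, label) triples.

    Each observed pair gets label 1 and is followed by up to `num_neg`
    randomly drawn items the user has NOT interacted with, labelled 0.
    """
    rng = random.Random(seed)
    items = sorted(all_items)

    # Collect the set of items each user has already interacted with.
    seen = {}
    for user, item in interactions:
        seen.setdefault(user, set()).add(item)

    samples = []
    for user, item in interactions:
        samples.append((user, item, 1))
        candidates = [i for i in items if i not in seen[user]]
        # A user who has seen almost everything may have fewer candidates
        # than requested, so cap the sample size.
        for neg in rng.sample(candidates, min(num_neg, len(candidates))):
            samples.append((user, neg, 0))
    return samples


interactions = [(1, "a"), (1, "b"), (2, "c")]
samples = sample_negatives(interactions, {"a", "b", "c", "d"}, num_neg=1)
for row in samples:
    print(row)
```

With one negative per positive, the toy data above yields six rows, half of them labelled 0.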
Usage
Pure collaborative-filtering example:
import numpy as np
import pandas as pd
from libreco.data import random_split, DatasetPure
from libreco.algorithms import SVDpp # pure data, algorithm SVD++
from libreco.evaluation import evaluate
data = pd.read_csv("examples/sample_data/sample_movielens_rating.dat", sep="::",
                   names=["user", "item", "label", "time"])
# split whole data into three folds for training, evaluating and testing
train_data, eval_data, test_data = random_split(data, multi_ratios=[0.8, 0.1, 0.1])
train_data, data_info = DatasetPure.build_trainset(train_data)
eval_data = DatasetPure.build_evalset(eval_data)
test_data = DatasetPure.build_testset(test_data)
print(data_info) # n_users: 5894, n_items: 3253, data sparsity: 0.4172 %
svdpp = SVDpp(task="rating", data_info=data_info, embed_size=16, n_epochs=3, lr=0.001,
              reg=None, batch_size=256)
# monitor metrics on eval_data during training
svdpp.fit(train_data, verbose=2, eval_data=eval_data, metrics=["rmse", "mae", "r2"])
# do final evaluation on test data
print("evaluate_result: ", evaluate(model=svdpp, data=test_data,
                                    metrics=["rmse", "mae"]))
# predict preference of user 2211 to item 110
print("prediction: ", svdpp.predict(user=2211, item=110))
# recommend 7 items for user 2211
print("recommendation: ", svdpp.recommend_user(user=2211, n_rec=7))
# cold-start prediction
print("cold prediction: ", svdpp.predict(user="ccc", item="not item",
                                         cold_start="average"))
# cold-start recommendation
print("cold recommendation: ", svdpp.recommend_user(user="are we good?",
                                                    n_rec=7,
                                                    cold_start="popular"))
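For readers unfamiliar with the `rmse` and `mae` metrics requested in the example above, the following sketch computes both from their textbook definitions on a tiny hand-made set of ratings. It is only an illustration of what the numbers mean; the helper names and sample values are made up here, and LibRecommender's `evaluate` function may compute the metrics differently internally.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute differences."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


# Hypothetical true ratings and model predictions for three samples.
labels = [5.0, 3.0, 4.0]
preds = [4.0, 3.0, 6.0]

print("rmse:", rmse(labels, preds))
print("mae:", mae(labels, preds))
```

Because RMSE squares each error before averaging, the single two-point miss in this sample weighs more heavily in RMSE than in MAE.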
Example that includes features:
import numpy as np
import pandas as pd
from libreco.data import split_by_ratio_chrono, DatasetFeat
from libreco.algorithms import YouTubeRanking # feat data, algorithm YouTubeRanking
data = pd.read_csv("examples/sample_data/sample_movielens_merged.csv", sep=",", header=0)
data["label"] = 1 # convert to implicit data and do negative sampling afterwards
# split into train and test data based on time
train_data, test_data = split_by_ratio_chrono(data, test_size=0.2)
# specify complete columns information
sparse_col = ["sex", "occupation", "genre1", "genre2", "genre3"]
dense_col = ["age"]
user_col = ["sex", "age", "occupation"]
item_col = ["genre1", "genre2", "genre3"]
train_data, data_info = DatasetFeat.build_trainset(
    train_data, user_col, item_col, sparse_col, dense_col
)
test_data = DatasetFeat.build_testset(test_data)
train_data.build_negative_samples(data_info) # sample negative items for each record
test_data.build_negative_samples(data_info)
print(data_info) # n_users: 5962, n_items: 3226, data sparsity: 0.4185 %
ytb_ranking = YouTubeRanking(task="ranking", data_info=data_info, embed_size=16,
                             n_epochs=3, lr=1e-4, batch_size=512, use_bn=True,
                             hidden_units="128,64,32")
ytb_ranking.fit(train_data, verbose=2, shuffle=True, eval_data=test_data,
                metrics=["loss", "roc_auc", "precision", "recall", "map", "ndcg"])
# predict preference of user 2211 to item 110
print("prediction: ", ytb_ranking.predict(user=2211, item=110))
# recommend 7 items for user 2211
print("recommendation(id, probability): ", ytb_ranking.recommend_user(user=2211, n_rec=7))
# cold-start prediction
print("cold prediction: ", ytb_ranking.predict(user="ccc", item="not item",
                                               cold_start="average"))
# cold-start recommendation
print("cold recommendation: ", ytb_ranking.recommend_user(user="are we good?",
                                                          n_rec=7,
                                                          cold_start="popular"))
For more examples and usage, see the User Guide.
Data Format
The data format is ordinary: each line represents one sample. One important point is that the model assumes the user, item, and label columns are at indices 0, 1, and 2, respectively. You may need to change the column order if that's not the case. Take the movielens-1m dataset as an example:
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
Besides, if you want to use other meta features (e.g., age, sex, category), you need to tell the model which columns are [sparse_col, dense_col, user_col, item_col], which means all features must be in the same table. See the YouTubeRanking example above.
Also note that your data should not contain missing values.
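The two requirements above (user, item and label in positions 0, 1 and 2, and no missing values) can be checked before any data reaches the library. The sketch below uses only the standard library and the sample lines shown earlier; `parse_ratings` and `reorder_columns` are hypothetical helpers written for this illustration and are not part of LibRecommender.

```python
def parse_ratings(lines, sep="::"):
    """Parse 'user::item::label[::extra...]' lines, rejecting missing values."""
    rows = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.strip().split(sep)
        if len(fields) < 3 or any(f.strip() == "" for f in fields):
            raise ValueError(f"line {line_no}: missing value in {line!r}")
        user, item, label = fields[0], fields[1], fields[2]
        # Extra columns (such as the timestamp) are kept as strings.
        rows.append((int(user), int(item), float(label), *fields[3:]))
    return rows


def reorder_columns(header, rows, user, item, label):
    """Move the user, item and label columns to positions 0, 1 and 2."""
    lead = [header.index(name) for name in (user, item, label)]
    rest = [i for i in range(len(header)) if i not in lead]
    order = lead + rest
    new_header = [header[i] for i in order]
    new_rows = [tuple(row[i] for i in order) for row in rows]
    return new_header, new_rows


sample = [
    "1::1193::5::978300760",
    "1::661::3::978302109",
    "1::914::3::978301968",
    "1::3408::4::978300275",
]
rows = parse_ratings(sample)
print(rows[0])

# A table whose columns are in the wrong order for the library.
header, fixed = reorder_columns(
    ["time", "label", "item", "user"],
    [(978300760, 5, 1193, 1)],
    user="user", item="item", label="label",
)
print(header, fixed)
```

The reordered rows could then be loaded into a pandas DataFrame with the new header as column names, as the usage examples above expect.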
Serving
For how to serve a trained model in LibRecommender, see the Serving Guide.
Installation & Dependencies
From PyPI:
$ pip install LibRecommender==0.8.4
To build from source, you'll first need Cython and NumPy:
$ # pip install numpy cython
$ git clone https://github.com/massquantity/LibRecommender.git
$ cd LibRecommender
$ python setup.py install
Basic dependencies in libreco:
- Python >= 3.6
- TensorFlow >= 1.15
- PyTorch >= 1.10
- Numpy >= 1.19.5
- Cython >= 0.29.0
- Pandas >= 1.0.0
- Scipy >= 1.2.1
- scikit-learn >= 0.20.0
- gensim >= 4.0.0
- tqdm >= 4.46.0
- hnswlib
LibRecommender is tested under TensorFlow 1.15, 2.5 and 2.8. If you encounter any problems while running it, feel free to open an issue.
Known issue: Sometimes one may encounter errors like ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject. In this case, try upgrading numpy; version 1.22.0 or higher is probably a safe option.
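As a quick way to act on the known issue above, the following sketch compares a NumPy version string against the 1.22.0 threshold mentioned in the text. The helper names `version_tuple` and `numpy_probably_safe` are hypothetical, the parsing is deliberately simple (it ignores suffixes such as `rc1`), and passing the check is only a hint, not a guarantee of binary compatibility.

```python
def version_tuple(version):
    """Turn '1.22.2' or '1.23.0rc1' into a comparable (major, minor, patch) tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad '1.22' to (1, 22, 0)
    return tuple(parts)


def numpy_probably_safe(version, minimum="1.22.0"):
    """Return True if `version` is at least `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)


for candidate in ["1.19.5", "1.21.5", "1.22.0", "1.22.2"]:
    print(candidate, numpy_probably_safe(candidate))
```

In an environment where NumPy is installed, the same check could be applied to `numpy.__version__`.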
The table below shows some compatible version combinations:
Python | Numpy | TensorFlow | OS |
---|---|---|---|
3.6 | 1.19.5 | 1.15, 2.5 | linux, windows, macos |
3.7 | 1.20.3, 1.21.5 | 1.15, 2.5, 2.8 | linux, windows, macos |
3.8 | 1.22.2 | 2.5, 2.8 | linux, windows, macos |
3.9 | 1.22.2 | 2.5, 2.8 | linux, windows, macos |
3.10 | 1.22.2 | 2.8 | linux, windows, macos |
Optional Serving Dependencies:
- flask >= 1.0.0
- requests >= 2.22.0
- redis == 3.0.6
- redis-py >= 3.3.5
- faiss == 1.5.2
- TensorFlow Serving
References
- Category: pure means collaborative-filtering algorithms which only use behavior data; feat means other side-features can be included.
- Sequence: Algorithms that leverage user behavior sequence.
- Graph: Algorithms that leverage graph information, including Graph Embedding (GE) and Graph Neural Network (GNN).
- Embedding: Algorithms that can generate final user and item embeddings.
License
MIT