A collaborative-filtering and content-based recommender system for both explicit and implicit datasets.
LibRecommender
Overview
LibRecommender is an easy-to-use recommender system focused on end-to-end recommendation. The main features are:
- Implements a number of popular recommendation algorithms, such as SVD++, DeepFM, and BPR.
- A hybrid recommender system, which allows users to use either collaborative-filtering or content-based features.
- Low memory usage: categorical features are automatically converted to a sparse representation.
- Suitable for both explicit and implicit datasets; negative sampling can be used for implicit datasets.
- Uses Cython or TensorFlow to accelerate model training.
- Provides an end-to-end workflow, i.e. data handling / preprocessing -> model training -> evaluation -> serving.
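To make the negative-sampling idea concrete, here is a minimal sketch in plain Python. It is not LibRecommender's implementation; the function name `sample_negatives` and its parameters are hypothetical, and it assumes the simplest scheme: for every observed (user, item) pair, draw items the user has never interacted with and label them 0.

```python
import random

def sample_negatives(interactions, all_items, num_neg=1, seed=42):
    """Return (user, item, label) triples: each positive plus sampled negatives.

    Hypothetical helper for illustration only, not part of libreco.
    """
    rng = random.Random(seed)
    # Collect the items each user has already interacted with.
    seen = {}
    for user, item in interactions:
        seen.setdefault(user, set()).add(item)
    items = sorted(all_items)
    samples = []
    for user, item in interactions:
        samples.append((user, item, 1))  # observed interaction -> label 1
        candidates = [i for i in items if i not in seen[user]]
        # Cannot draw more negatives than there are unseen items.
        k = min(num_neg, len(candidates))
        for neg in rng.sample(candidates, k):
            samples.append((user, neg, 0))  # unseen item -> label 0
    return samples

positives = [(1, 10), (1, 11), (2, 10)]
samples = sample_negatives(positives, all_items={10, 11, 12, 13}, num_neg=1)
for triple in samples:
    print(triple)
```

Each positive record is followed by one sampled negative, so the output contains as many label-0 rows as label-1 rows when `num_neg=1`.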
Usage
Pure collaborative-filtering example:
import numpy as np
import pandas as pd
from libreco.data import random_split, DatasetPure
from libreco.algorithms import SVDpp # pure data, algorithm SVD++
data = pd.read_csv("examples/sample_data/sample_movielens_rating.dat", sep="::",
                   names=["user", "item", "label", "time"])
# split whole data into three folds for training, evaluating and testing
train_data, eval_data, test_data = random_split(data, multi_ratios=[0.8, 0.1, 0.1])
train_data, data_info = DatasetPure.build_trainset(train_data)
eval_data = DatasetPure.build_testset(eval_data)
test_data = DatasetPure.build_testset(test_data)
print(data_info) # n_users: 5894, n_items: 3253, data sparsity: 0.4172 %
svdpp = SVDpp(task="rating", data_info=data_info, embed_size=16, n_epochs=3, lr=0.001,
              reg=None, batch_size=256)
# monitor metrics on eval_data during training
svdpp.fit(train_data, verbose=2, eval_data=eval_data, metrics=["rmse", "mae", "r2"])
# do final evaluation on test data
svdpp.evaluate(test_data, metrics=["rmse", "mae"])
# predict preference of user 1 to item 2333
print("prediction: ", svdpp.predict(user=1, item=2333))
# recommend 7 items for user 1
print("recommendation: ", svdpp.recommend_user(user=1, n_rec=7))
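For readers unfamiliar with the rating metrics named above, the following sketch shows the standard definitions of RMSE and MAE in plain Python. It only illustrates what the numbers mean; how `evaluate` computes them internally is not shown in this README, and the sample ratings are made up.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

true_ratings = [5, 3, 4]   # made-up ground-truth ratings
predicted = [4, 3, 2]      # made-up model predictions

print("rmse:", rmse(true_ratings, predicted))
print("mae:", mae(true_ratings, predicted))
```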
Example with features:
import numpy as np
import pandas as pd
from libreco.data import split_by_ratio_chrono, DatasetFeat
from libreco.algorithms import YouTubeRanking # feat data, algorithm YouTubeRanking
data = pd.read_csv("examples/sample_data/sample_movielens_merged.csv", sep=",", header=0)
data["label"] = 1 # convert to implicit data and do negative sampling afterwards
# split into train and test data based on time
train_data, test_data = split_by_ratio_chrono(data, test_size=0.2)
# specify complete columns information
sparse_col = ["sex", "occupation", "genre1", "genre2", "genre3"]
dense_col = ["age"]
user_col = ["sex", "age", "occupation"]
item_col = ["genre1", "genre2", "genre3"]
train_data, data_info = DatasetFeat.build_trainset(
    train_data, user_col, item_col, sparse_col, dense_col)
test_data = DatasetFeat.build_testset(test_data, sparse_col, dense_col)
train_data.build_negative_samples(data_info) # sample negative items for each record
test_data.build_negative_samples(data_info)
print(data_info) # n_users: 5962, n_items: 3226, data sparsity: 0.4185 %
ytb_ranking = YouTubeRanking(task="ranking", data_info=data_info, embed_size=16,
                             n_epochs=3, lr=1e-4, batch_size=512, use_bn=True,
                             hidden_units="128,64,32")
ytb_ranking.fit(train_data, verbose=2, shuffle=True, eval_data=test_data,
                metrics=["loss", "roc_auc", "precision", "recall", "map", "ndcg"])
# predict preference of user 1 to item 2333
print("prediction: ", ytb_ranking.predict(user=1, item=2333))
# recommend 7 items for user 1
print("recommendation: ", ytb_ranking.recommend_user(user=1, n_rec=7))
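The time-based split used in the example above can be pictured with a small sketch. This is an assumption about the general idea, not the actual code behind `split_by_ratio_chrono`: it sorts each user's records by time and holds out the most recent fraction for testing. The function name `chrono_split` and the sample records are hypothetical.

```python
from collections import defaultdict

def chrono_split(records, test_size=0.2):
    """Split (user, item, label, time) records so each user's latest ones go to test.

    Hypothetical illustration; the library's own splitting may differ.
    """
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec[0]].append(rec)
    train, test = [], []
    for user_records in by_user.values():
        user_records.sort(key=lambda r: r[3])        # oldest first
        n_test = int(len(user_records) * test_size)  # size of the held-out tail
        cut = len(user_records) - n_test
        train.extend(user_records[:cut])
        test.extend(user_records[cut:])
    return train, test

# Made-up data: user 1 has 10 records, user 2 has 5.
records = [(1, i, 1, 100 + i) for i in range(10)]
records += [(2, i, 1, 200 - i) for i in range(5)]
train, test = chrono_split(records, test_size=0.2)
print(len(train), len(test))
```

Splitting by time rather than at random keeps the model from being evaluated on interactions that happened before the ones it was trained on.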
Data Format
Just a normal data format: each line represents a sample. One important point: the model assumes that the user, item, and label columns are at index 0, 1, and 2, respectively. You may need to reorder the columns if that's not the case. Take, for example, the movielens-1m dataset:
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
Besides, if you want to use other meta features (e.g., age, sex, category), you need to tell the model which columns are [sparse_col, dense_col, user_col, item_col], which means all features must be in the same table. See the YouTubeRanking example above.
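If your file stores the columns in a different order, they can be rearranged before building the dataset. The sketch below uses only the standard library; the helper `reorder_columns` and the input column order are hypothetical. With a pandas DataFrame, the same effect comes from selecting the columns in the desired order, e.g. `data[["user", "item", "label", "time"]]`.

```python
def reorder_columns(rows, header, first=("user", "item", "label")):
    """Move the user, item, and label columns to positions 0, 1, and 2.

    Hypothetical helper for illustration; remaining columns keep their relative order.
    """
    order = [header.index(name) for name in first]
    order += [i for i in range(len(header)) if i not in order]
    new_header = [header[i] for i in order]
    new_rows = [[row[i] for i in order] for row in rows]
    return new_header, new_rows

# Assumed input: the same movielens-style records, but stored as time::item::label::user.
header = ["time", "item", "label", "user"]
raw_lines = ["978300760::1193::5::1", "978302109::661::3::1"]
rows = [line.split("::") for line in raw_lines]

new_header, new_rows = reorder_columns(rows, header)
print(new_header)
for row in new_rows:
    print("::".join(row))
```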
Installation & Dependencies
From PyPI: pip install LibRecommender
Basic dependencies in libreco:
- Python >= 3.6
- tensorflow >= 1.14 (but not tf 2.0 :)
- numpy >= 1.15.4
- pandas >= 0.23.4
- scipy >= 1.2.1
- scikit-learn >= 0.20.0
Optional Serving Dependencies:
- flask >= 1.0.0
- requests >= 2.22.0
- redis == 3.0.6
- redis-py >= 3.3.5
- faiss == 1.5.2
- Tensorflow Serving
License
MIT