Rsdiv: Diversity improvement framework for recommender systems
rsdiv is a Python package that measures and improves the diversity of recommender-system results.
Its features include:
- various metrics to measure the diversity of recommender systems quantitatively.
- various implementations of diversification algorithms and models.
- various implementations of core recommender algorithms.
- benchmarks for comparison and further analysis.
- hyperparameter optimization based on Optuna.
Installation
You can simply install the pre-built binaries with:
$ pip install rsdiv
Or you may want to build from source:
$ cd rsdiv && pip install .
Basic Usage
Prepare a benchmark dataset
Load a benchmark, say, the MovieLens 1M Dataset. This tabular benchmark dataset contains about 1 million ratings from 6,000 users on 4,000 movies.
>>> import rsdiv as rs
>>> loader = rs.MovieLens1MDownLoader()
Get the user-item interactions (ratings):
>>> ratings = loader.read_ratings()
| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1193 | 5 | 2000-12-31 22:12:40 |
| 1 | 1 | 661 | 3 | 2000-12-31 22:35:09 |
| ... | ... | ... | ... | ... |
| 1000207 | 6040 | 1096 | 4 | 2000-04-26 02:20:48 |
| 1000208 | 6040 | 1097 | 4 | 2000-04-26 02:19:29 |
Get the users' information:
>>> users = loader.read_users()
| | userId | gender | age | occupation | zipcode |
|---|---|---|---|---|---|
| 0 | 1 | F | 1 | 10 | 48067 |
| 1 | 2 | M | 56 | 16 | 70072 |
| ... | ... | ... | ... | ... | ... |
| 6038 | 6039 | F | 45 | 0 | 01060 |
| 6039 | 6040 | M | 25 | 6 | 11106 |
Get the items' information:
>>> movies = loader.read_items()
| | movieId | title | genres | release_date |
|---|---|---|---|---|
| 0 | 1 | Toy Story | ['Animation', "Children's", 'Comedy'] | 1995 |
| 1 | 2 | Jumanji | ['Adventure', "Children's", 'Fantasy'] | 1995 |
| ... | ... | ... | ... | ... |
| 3881 | 3951 | Two Family House | ['Drama'] | 2000 |
| 3882 | 3952 | Contender, The | ['Drama', 'Thriller'] | 2000 |
Evaluate the results from various aspects
Load the evaluator to analyse the results, say, with the Gini coefficient metric:
>>> metrics = rs.DiversityMetrics()
>>> metrics.gini_coefficient(ratings['movieId'])
0.6335616301416965
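To give a feel for what this number measures, the sketch below computes a Gini coefficient over how often each distinct item occurs in the input, using the standard formula on ascending counts. This is an assumption about what rsdiv measures, not its implementation; its normalisation may differ, so the sketch is not guaranteed to reproduce the value above. `item_gini` is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, Iterable


def item_gini(items: Iterable[Hashable]) -> float:
    """Gini coefficient of the item frequency distribution (illustrative sketch).

    0 means every distinct item occurs equally often; values close to 1
    mean a few items account for most of the occurrences.
    """
    counts = sorted(Counter(items).values())  # frequencies, ascending
    n, total = len(counts), sum(counts)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula over ascending values x_i with 1-based ranks i:
    # G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n
    weighted = sum(rank * count for rank, count in enumerate(counts, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(item_gini(["a", "a", "b", "c"]))  # ≈ 0.167: "a" is twice as frequent
```
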
Nested input (List[List[str]]-like) is also supported. This is especially useful for evaluating diversity at the topic level:
>>> metrics.gini_coefficient(movies['genres'])
0.5158655846858095
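The README does not say how nested input is aggregated. One plausible reading, assumed here, is that every inner tag counts as one occurrence, so the genre lists are flattened before a frequency-based metric is applied. The sketch shows only that flattening step, using four rows of the movies table above as toy data; `tag_counts` is a hypothetical helper.

```python
from collections import Counter
from itertools import chain
from typing import Hashable, Iterable


def tag_counts(nested: Iterable[Iterable[Hashable]]) -> Counter:
    """Flatten List[List[str]]-like input and count each inner tag once per row."""
    return Counter(chain.from_iterable(nested))


genres = [
    ["Animation", "Children's", "Comedy"],
    ["Adventure", "Children's", "Fantasy"],
    ["Drama"],
    ["Drama", "Thriller"],
]
print(tag_counts(genres))
```
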
The Shannon index and the effective catalog size are also available, with the same usage.
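For reference, here is a sketch of one common definition of each metric, computed over item frequencies. It assumes a natural-log Shannon entropy and a rank-weighted effective catalog size (ECS) with items ranked from most to least popular. rsdiv may use a different log base or normalisation, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, List


def _shares(items: Iterable[Hashable]) -> List[float]:
    """Item shares p_i, sorted from most to least frequent (non-empty input)."""
    counts = sorted(Counter(items).values(), reverse=True)
    total = sum(counts)
    return [c / total for c in counts]


def shannon_index(items: Iterable[Hashable]) -> float:
    """Shannon entropy H = -sum(p_i * ln p_i) of the item distribution."""
    return -sum(p * math.log(p) for p in _shares(items))


def effective_catalog_size(items: Iterable[Hashable]) -> float:
    """ECS = 2 * sum(i * p_i) - 1 with rank i = 1 for the most popular item.

    Equals 1 when a single item takes all occurrences and N when N items
    are equally popular.
    """
    return 2 * sum(rank * p for rank, p in enumerate(_shares(items), start=1)) - 1


print(shannon_index(["a", "b"]), effective_catalog_size(["a", "b", "c", "d"]))
```
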
Draw a Lorenz curve graph for insights
A Lorenz curve is a graphical representation of a distribution: the cumulative proportion of species is plotted against the cumulative proportion of individuals. rsdiv supports drawing it to help practitioners with their analysis.
>>> metrics.get_lorenz_curve(ratings['movieId'])
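get_lorenz_curve draws the plot itself. The sketch below only computes the points such a curve is made of, assuming each distinct item is a "species" and each occurrence an "individual". `lorenz_points` is a hypothetical name, not part of rsdiv.

```python
from collections import Counter
from itertools import accumulate
from typing import Hashable, Iterable, List, Tuple


def lorenz_points(items: Iterable[Hashable]) -> List[Tuple[float, float]]:
    """Lorenz curve points for the item frequency distribution.

    x: cumulative share of distinct items, least popular first
    y: cumulative share of occurrences accounted for by those items
    """
    counts = sorted(Counter(items).values())  # ascending popularity
    n, total = len(counts), sum(counts)
    points = [(0.0, 0.0)]
    for k, running in enumerate(accumulate(counts), start=1):
        points.append((k / n, running / total))
    return points


print(lorenz_points(["a", "a", "a", "b"]))
```

Plotting x against y and comparing with the diagonal y = x gives the usual picture: the further the curve sags below the diagonal, the more the occurrences are concentrated on a few items.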
Train a recommender
rsdiv provides various implementations of core recommender algorithms. To start with, a wrapper for LightFM is supported:
>>> rc = rs.FMRecommender(ratings, 0.3).fit()
Here 30% of the interactions are held out as the test set, and the precision at top 5 can be calculated with:
>>> rc.precision_at_top_k(5)
0.14464477
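For intuition about this number, here is a generic sketch of a random 70/30 hold-out split and of precision at k. It is not rsdiv's or LightFM's code: the random split strategy and the choice to average over every user with recommendations are assumptions, and `holdout_split` and `precision_at_k` are hypothetical names.

```python
import random
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Interaction = Tuple[Hashable, Hashable]  # (userId, movieId)


def holdout_split(
    interactions: Sequence[Interaction], test_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[Interaction], List[Interaction]]:
    """Shuffle the interactions and hold out `test_fraction` of them for testing."""
    shuffled = list(interactions)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def precision_at_k(
    recommended: Dict[Hashable, List[Hashable]],
    held_out: Dict[Hashable, Set[Hashable]],
    k: int,
) -> float:
    """Mean over users of (# top-k recommendations found in the test set) / k.

    Users with no matching held-out items contribute a precision of 0.
    """
    if not recommended:
        return 0.0
    per_user = []
    for user, ranked in recommended.items():
        relevant = held_out.get(user, set())
        hits = sum(1 for item in ranked[:k] if item in relevant)
        per_user.append(hits / k)
    return sum(per_user) / len(per_user)
```
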
The prediction scores of a given user for each item can be accessed with predict_for_userId (predict_for_userId_unseen returns the results with seen items removed):
>>> rc.predict_for_userId(42)
array([-3.0786333, -2.8600938, -5.5952744, ..., -5.9792733, -7.8316765, -6.2370725], dtype=float32)
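One plausible way to drop seen items from such a score array is to mask their positions so they can never be ranked. The sketch assumes array positions index items, which the output above suggests but the README does not state; it is not necessarily how predict_for_userId_unseen works, and `mask_seen` is a hypothetical helper.

```python
from typing import Iterable

import numpy as np


def mask_seen(scores: np.ndarray, seen_positions: Iterable[int]) -> np.ndarray:
    """Copy `scores` and set the positions of already-seen items to -inf.

    Masked items then sort below every real score, so they can never
    appear in a top-n list built from the returned array.
    """
    masked = scores.astype(np.float64)  # astype returns a copy by default
    idx = np.fromiter(seen_positions, dtype=np.intp)
    masked[idx] = -np.inf
    return masked


scores = np.array([1.0, 3.0, 2.0], dtype=np.float32)
print(mask_seen(scores, {1}))
```
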
The scores of the top 5 recommended items for userId 1024 are given by:
>>> rc.predict_top_n_unseen(1024, 5)
{1296: 1.7469575, 916: 1.773555, 915: 1.63063, 2067: 1.3016684, 28: 1.2860104}
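Selecting the top n from a score array can be sketched with NumPy as below. The `item_ids` array mapping positions to movieIds is an assumption, and `top_n` is a hypothetical name, not rsdiv's predict_top_n_unseen.

```python
from typing import Dict, Sequence

import numpy as np


def top_n(scores: np.ndarray, item_ids: Sequence[int], n: int) -> Dict[int, float]:
    """Return the n highest-scoring items as {item_id: score}, best first."""
    n = min(n, len(scores))
    if n <= 0:
        return {}
    # argpartition gathers the n largest scores without a full sort ...
    candidates = np.argpartition(-scores, n - 1)[:n]
    # ... and only those n candidates are then sorted, highest score first.
    best_first = candidates[np.argsort(-scores[candidates])]
    return {int(item_ids[i]): float(scores[i]) for i in best_first}


print(top_n(np.array([0.1, 0.9, 0.5, 0.7]), [10, 20, 30, 40], 2))
```

Ordering the dict best first is a choice made in this sketch. The rsdiv output above is not strictly ordered by score (916 scores higher than 1296 but comes second), so rsdiv orders its result differently.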
Improve the diversity
TODO.
For developers
Make sure you have pre-commit installed:
pip install pre-commit
pre-commit install