Recommender Systems Dataset from FINN.no containing the presented items and whether and what the user clicked on.

FINN.no Recommender Systems Slate Dataset

Repository containing the recommender systems slates dataset

We release the FINN.no recommender systems slate dataset to improve recommender systems research. The dataset includes both search and recommendation interactions between users and the platform over a 30-day period. It logs both exposures and clicks, including interactions where the user did not click on any of the items in the slate. To our knowledge, no other large-scale dataset of this kind exists, and we hope this contribution helps researchers construct improved models and better offline evaluation metrics.

A visualization of a presented slate to the user on the frontpage of FINN.no

For each user u and interaction step t, we recorded all items in the visible slate (up to the scroll length) and the user's click response. The dataset consists of 37.4 million interactions, |U| ≈ 2.3 million users, and |I| ≈ 1.3 million items that belong to one of G = 290 item groups. For a detailed description of the data, please see the paper.
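To make the shape of one logged interaction concrete, the sketch below models a single step in plain Python. The field names (user_id, slate, click), the made-up item ids, and the convention that None marks a no-click step are assumptions for illustration only; they are not the dataset's actual storage format, which is described in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interaction:
    """One logged step: what was shown and what (if anything) was clicked."""
    user_id: int
    slate: List[int]       # item ids visible to the user, in display order
    click: Optional[int]   # clicked item id, or None when nothing was clicked

    def clicked_position(self) -> Optional[int]:
        """Return the slate position of the click, or None for a no-click step."""
        if self.click is None:
            return None
        return self.slate.index(self.click)


def click_rate(interactions: List[Interaction]) -> float:
    """Fraction of interactions that ended in a click."""
    if not interactions:
        return 0.0
    clicks = sum(1 for step in interactions if step.click is not None)
    return clicks / len(interactions)


# Toy log with made-up ids: two steps for user 7, one step for user 9.
log = [
    Interaction(user_id=7, slate=[101, 205, 330], click=205),
    Interaction(user_id=7, slate=[412, 101], click=None),
    Interaction(user_id=9, slate=[330, 518, 205, 77], click=77),
]
```

Keeping the no-click steps in the log, rather than dropping them, is what lets a model learn from exposures that did not lead to a click.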

FINN.no is the leading marketplace in the Norwegian classifieds market and provides users with a platform to buy and sell general merchandise, cars, real estate, as well as house rentals and job offerings. For questions, email simen.eide@finn.no or file an issue.

Organization

The repository is organized as follows:

Download and prepare dataset

The data files can be obtained either by cloning this repository with git lfs or (preferably) by calling the datahelper.download_data_files() function, which downloads the same dataset from Google Drive. PyTorch users can call dataset_torch.load_dataloaders() directly to get ready-to-use dataloaders for the training, validation and test datasets.
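The two calls named above could be combined roughly as follows. This is a minimal sketch under stated assumptions: that the modules are importable as datahelper and dataset_torch (for example from a checkout of the repository), and that both functions can be called without arguments as written in the text. The wrapper name prepare_finn_slate_data is hypothetical, and the structure of the returned object is not specified here.

```python
def prepare_finn_slate_data():
    """Download the data files, then build the PyTorch dataloaders.

    ASSUMPTION: `datahelper` and `dataset_torch` are importable under these
    names and both functions accept being called without arguments. Check
    the repository for the actual import path and parameters.
    """
    # Imported lazily so that defining this helper does not require the
    # repository code or PyTorch to be installed.
    import datahelper
    import dataset_torch

    # Fetch the data files (the text says they come from Google Drive).
    datahelper.download_data_files()

    # Build the training, validation and test dataloaders.
    return dataset_torch.load_dataloaders()
```

Calling prepare_finn_slate_data() triggers the download, so it needs network access and enough disk space for the data files.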

Quickstart dataset

We provide a quickstart Jupyter notebook that runs on Google Colab (quickstart-finn-recsys-slate-data.ipynb) and covers all the steps above.

NB: This quickstart notebook is currently incompatible with the main branch. We will update the notebook as soon as we have published a pip package. In the meantime, please use the v1.0 release of the repository.

Citations

This repository accompanies the paper "Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling" by Simen Eide, David S. Leslie and Arnoldo Frigessi. The article is under review, and the pre-print is available on arXiv (2104.15046).

If you use the code, data or paper, please consider citing the paper.

@article{eide2021dynamic,
      title={Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling}, 
      author={Simen Eide and David S. Leslie and Arnoldo Frigessi},
      year={2021},
      eprint={2104.15046},
      archivePrefix={arXiv},
      primaryClass={stat.ML}
}

Todo

This repository is currently a work in progress, and we will provide descriptions and tutorials. Suggestions and contributions that make the material more accessible are welcome. We are working on the following features:

  • Release the dataset as NumPy objects instead of PyTorch arrays. This will make the data easier to use for non-PyTorch users.
  • Maintain a PyTorch dataset for easy usage.
  • Create a pip package for easier installation and usage. The package should download the dataset using a function.
  • Make the quickstart guide compatible with the pip package and NumPy format.
  • Add easy-to-use functions that compute relevant metrics such as hit rate, log-likelihood, etc.
  • Distribute the data on other platforms such as Kaggle.
  • Add a short description of the data directly in the readme.md.

Because the repository is at an early stage, some of the changes above may not be backward compatible. We expect to complete them within the next couple of months.
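The metric functions mentioned in the list above are not part of the repository yet. As an illustration of what such a helper might look like, here is a minimal sketch of hit rate at k in plain Python. The function name, its signature, and the choice to skip no-click interactions are assumptions of this sketch, not the authors' definition.

```python
from typing import Optional, Sequence


def hit_rate_at_k(
    ranked_items: Sequence[Sequence[int]],
    clicked_items: Sequence[Optional[int]],
    k: int,
) -> float:
    """Share of click interactions whose clicked item is in the top-k ranking.

    Interactions without a click (None) are skipped, which is one possible
    convention and an assumption of this sketch.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(ranked_items) != len(clicked_items):
        raise ValueError("inputs must have the same length")

    hits = 0
    evaluated = 0
    for ranking, clicked in zip(ranked_items, clicked_items):
        if clicked is None:
            continue  # no-click step: nothing to hit
        evaluated += 1
        if clicked in ranking[:k]:
            hits += 1
    return hits / evaluated if evaluated else 0.0


# Toy example with made-up item ids: four interactions, one without a click.
rankings = [[5, 3, 9], [2, 8, 1], [7, 4, 6], [1, 2, 3]]
clicks = [3, 1, None, 9]
```

With this toy data, hit_rate_at_k(rankings, clicks, k=2) counts one hit out of three evaluated interactions, and raising k to 3 adds a second hit.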

Install

pip install recsys_slates_dataset
