
VSKNN model for recommendations

Project description

WSKNN: k-NN recommender for session-based data


Weighted session-based k-NN - Intro

Are you building a recommender system for your website? The k-nearest neighbors algorithm is a good choice if you are looking for a simple, fast, and explainable solution. Weighted session-based k-NN recommendations are close to the state of the art, and we don't need to tune multiple hyperparameters or build complex deep learning models to achieve a good result.

How does it work?

You provide two input structures as training data:

sessions : dict
        sessions = {
            session id: (
                [sequence of items with user interaction],
                [timestamp of user interaction per item],
                [sequence of weighting factors]
            )
        }
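As an illustration, the sketch below builds a sessions dictionary in the format above from raw click events given as (session id, item id, timestamp) tuples. It is a minimal sketch, not part of the wsknn API: the helper name build_sessions, the event format, and the default weight of 1.0 for every interaction are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, int]  # (session id, item id, timestamp): assumed input format
Sessions = Dict[str, Tuple[List[str], List[int], List[float]]]


def build_sessions(events: Iterable[Event], default_weight: float = 1.0) -> Sessions:
    """Hypothetical helper: group events into {session id: (items, timestamps, weights)}.

    Every interaction gets the same default weighting factor (an assumption).
    """
    grouped = defaultdict(list)
    for session_id, item_id, timestamp in events:
        grouped[session_id].append((timestamp, item_id))

    sessions: Sessions = {}
    for session_id, interactions in grouped.items():
        interactions.sort(key=lambda pair: pair[0])  # chronological order
        items = [item for _, item in interactions]
        timestamps = [ts for ts, _ in interactions]
        weights = [default_weight] * len(items)
        sessions[session_id] = (items, timestamps, weights)
    return sessions


events = [
    ("s1", "product A", 1650000010),
    ("s1", "product B", 1650000020),
    ("s2", "product B", 1650000100),
    ("s1", "product C", 1650000015),
]
sessions = build_sessions(events)
print(sessions["s1"])
# (['product A', 'product C', 'product B'], [1650000010, 1650000015, 1650000020], [1.0, 1.0, 1.0])
```

Events are sorted by timestamp within each session, so the input does not need to be ordered.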

items : dict
        items = {
            item id: (
                [sequence of sessions with an item],
                [the first timestamp of each session with an item]
            )
        }
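The items map is essentially the inverse of the sessions map, so it can be derived from it. The sketch below assumes, because the description above does not make it explicit, that "the first timestamp of each session with an item" means the earliest timestamp at which that item appears in the session; the helper name build_items is hypothetical.

```python
from typing import Dict, List, Tuple

Sessions = Dict[str, Tuple[List[str], List[int], List[float]]]
Items = Dict[str, Tuple[List[str], List[int]]]


def build_items(sessions: Sessions) -> Items:
    """Hypothetical helper: invert {session id: (items, timestamps, weights)} into
    {item id: ([session ids], [first timestamp of the item in each session])}.
    """
    first_seen: Dict[str, Dict[str, int]] = {}
    for session_id, (items, timestamps, _weights) in sessions.items():
        for item_id, timestamp in zip(items, timestamps):
            per_item = first_seen.setdefault(item_id, {})
            # Keep only the earliest interaction with this item in this session.
            if session_id not in per_item or timestamp < per_item[session_id]:
                per_item[session_id] = timestamp

    items_map: Items = {}
    for item_id, per_session in first_seen.items():
        session_ids = list(per_session)
        items_map[item_id] = (session_ids, [per_session[s] for s in session_ids])
    return items_map


sessions = {
    "s1": (["product A", "product B", "product A"], [10, 20, 30], [1.0, 1.0, 1.0]),
    "s2": (["product B"], [100], [1.0]),
}
items = build_items(sessions)
print(items["product B"])  # (['s1', 's2'], [20, 100])
```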

Then you ask the model to recommend products based on the user session:

user session: {session id: [[sequence of items], [sequence of timestamps]]}
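The description above does not detail how recommendations are scored. The sketch below is a simplified, unweighted session-based k-NN scorer written only to illustrate the general idea; it is not the WSKNN implementation. It finds past sessions that share at least one item with the user session (via the items map), ranks them by the cosine similarity of their item sets, keeps the k most similar ones, and lets each neighbour vote for its items with its similarity as the vote weight. The function name recommend, the cosine similarity, and the parameters k and n are assumptions; the timestamps and weighting factors that WSKNN uses are ignored here.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Sessions = Dict[str, Tuple[List[str], List[int], List[float]]]
Items = Dict[str, Tuple[List[str], List[int]]]


def recommend(user_items: List[str], sessions: Sessions, items: Items,
              k: int = 2, n: int = 3) -> List[Tuple[str, float]]:
    """Hypothetical, simplified session-kNN scorer (ignores timestamps and weights)."""
    user_set = set(user_items)

    # Candidate neighbours: past sessions sharing at least one item with the user.
    candidates = set()
    for item_id in user_set:
        if item_id in items:
            candidates.update(items[item_id][0])

    # Cosine similarity between the binary item sets of two sessions.
    similarities = {}
    for session_id in candidates:
        other = set(sessions[session_id][0])
        overlap = len(user_set & other)
        similarities[session_id] = overlap / math.sqrt(len(user_set) * len(other))

    # Keep the k most similar sessions (ties broken by session id).
    neighbours = sorted(similarities.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

    # Each neighbour votes for the items the user has not interacted with yet.
    scores: Dict[str, float] = defaultdict(float)
    for session_id, similarity in neighbours:
        for item_id in set(sessions[session_id][0]) - user_set:
            scores[item_id] += similarity

    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


sessions = {
    "s1": (["A", "B", "C"], [1, 2, 3], [1.0, 1.0, 1.0]),
    "s2": (["A", "D"], [5, 6], [1.0, 1.0]),
    "s3": (["E", "F"], [7, 8], [1.0, 1.0]),
}
items = {
    "A": (["s1", "s2"], [1, 5]), "B": (["s1"], [2]), "C": (["s1"], [3]),
    "D": (["s2"], [6]), "E": (["s3"], [7]), "F": (["s3"], [8]),
}
result = recommend(["A", "B"], sessions, items)
print([(item, round(score, 3)) for item, score in result])  # [('C', 0.816), ('D', 0.5)]
```

The items map makes neighbour lookup cheap: only sessions that share an item with the user session are ever compared, which is why the model keeps both maps in memory.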

The package is lightweight. It depends only on numpy and pyyaml.

Moreover, the package can be provided to non-programmers, who can control the model's behavior through settings.yaml.

Why should we use WSKNN?

  • training is faster than for deep learning or XGBoost models, because the model only memorizes the session-items and item-sessions maps,
  • recommendations are easy to control: we can change how the algorithm works in just a few lines... of text,
  • it is a good baseline for comparison with deep learning / XGBoost architectures,
  • it allows swift prototyping,
  • it is easy to run in production.

The model was developed alongside multiple other approaches, based on RNNs (GRU/LSTM), matrix factorization, and others. Its performance was always very close to that of fine-tuned neural networks, but it was much easier and faster to train.

What are the limitations of WSKNN?

  • the model memorizes the session-items and item-sessions maps, so if your product base is large and you use sessions from an extended period, the model may be too big to fit in the available memory; in this case, you can categorize products and train a separate model for each category,
  • response time may be slower than for other models, especially when many sessions are available,
  • there is additional overhead related to preparing the input.
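The workaround for the memory limit mentioned above (one model per product category) needs the training sessions to be split by category first. The sketch below assumes a hypothetical category_of mapping from item id to category; for each category it keeps only the interactions with items of that category, together with their timestamps and weights, and sessions with no such item are left out. Sending items without a known category to an "unknown" bucket is also an assumption. Each resulting dictionary could then be used to train a separate model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sessions = Dict[str, Tuple[List[str], List[int], List[float]]]


def split_by_category(sessions: Sessions, category_of: Dict[str, str]) -> Dict[str, Sessions]:
    """Hypothetical helper: split {session id: (items, timestamps, weights)} per category."""
    per_category: Dict[str, Sessions] = defaultdict(dict)
    for session_id, (items, timestamps, weights) in sessions.items():
        # One (items, timestamps, weights) bucket per category seen in this session.
        buckets = defaultdict(lambda: ([], [], []))
        for item_id, timestamp, weight in zip(items, timestamps, weights):
            bucket = buckets[category_of.get(item_id, "unknown")]
            bucket[0].append(item_id)
            bucket[1].append(timestamp)
            bucket[2].append(weight)
        # Sessions without any item of a category simply do not appear in it.
        for category, bucket in buckets.items():
            per_category[category][session_id] = bucket
    return dict(per_category)


sessions = {
    "s1": (["shoe 1", "hat 1", "shoe 2"], [1, 2, 3], [1.0, 0.5, 1.0]),
    "s2": (["hat 2"], [4], [1.0]),
}
category_of = {"shoe 1": "shoes", "shoe 2": "shoes", "hat 1": "hats", "hat 2": "hats"}
split = split_by_category(sessions, category_of)
print(split["shoes"])  # {'s1': (['shoe 1', 'shoe 2'], [1, 3], [1.0, 1.0])}
```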

Example

from wsknn import fit
from wsknn.utils import load_pickled

# Load data
ITEMS = 'demo-data/items.pkl'
SESSIONS = 'demo-data/sessions.pkl'

items = load_pickled(ITEMS)
sessions = load_pickled(SESSIONS)

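# Train the model; it memorizes the session-items and item-sessions maps
trained_model = fit(sessions, items)

# A user session: {session id: [[sequence of items], [sequence of timestamps]]}
test_session = {'unique id': [
    ['product id 1', 'product id 2'],
    ['timestamp #1', 'timestamp #2']
]}

# Ask for the three highest-scored recommendations
recommendations = trained_model.predict(test_session, number_of_recommendations=3)
print(recommendations)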

Output:

[
 ('product id 3', 0.7),
 ('product id 4', 0.33),
 ('product id 5', 0.059)
]

Setup

Version 0.1 of the package can be installed with pip:

pip install wsknn

It works with Python 3.6 or newer.

Requirements

Package version | Python versions | Other packages
0.1             | 3.6+            | numpy, pyyaml

Developers

  • Szymon Moliński (Sales Intelligence : Digitree Group SA)

Citation

Szymon Moliński. (2022). WSKNN - Weighted Session-based k-NN Recommendations in Python (0.1). Zenodo. https://doi.org/10.5281/zenodo.6393177

Bibliography

Data used in a demo example

  • David Ben-Shimon, Alexander Tsikinovsky, Michael Friedmann, Bracha Shapira, Lior Rokach, and Johannes Hoerle. 2015. RecSys Challenge 2015 and the YOOCHOOSE Dataset. In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys '15). Association for Computing Machinery, New York, NY, USA, 357–358. DOI: https://doi.org/10.1145/2792838.2798723

Comparison between DL and WSKNN

  • Twardowski, B., Zawistowski, P., Zaborowski, S. (2021). Metric Learning for Session-Based Recommendations. In: Hiemstra, D., Moens, MF., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science, vol 12656. Springer, Cham. https://doi.org/10.1007/978-3-030-72113-8_43

Funding


  • Development of the package was partially based on the research project E-commerce Shopping Patterns Prediction System, which was funded under Priority Axis 1.1 of the Smart Growth Operational Programme 2014-2020 for Poland, co-funded by the European Regional Development Fund. Project number: POIR.01.01.01-00-0632/18



Download files


Source Distribution

wsknn-0.1.4.dev2.tar.gz (21.8 kB)

Built Distribution

wsknn-0.1.4.dev2-py3-none-any.whl (25.2 kB)

File details

Details for the file wsknn-0.1.4.dev2.tar.gz.

File metadata

  • Download URL: wsknn-0.1.4.dev2.tar.gz
  • Upload date:
  • Size: 21.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.13

File hashes

Hashes for wsknn-0.1.4.dev2.tar.gz
Algorithm Hash digest
SHA256 9123b7cfa269a1254a5c8a5818241f8c34157a9a28b537fc824ade28859ed52b
MD5 7ea621fd2db1efdb5b3796477a756274
BLAKE2b-256 9370006df43842cc4e22549bf1712563f233355bf20b79960bd14b72f87daa87


File details

Details for the file wsknn-0.1.4.dev2-py3-none-any.whl.

File metadata

  • Download URL: wsknn-0.1.4.dev2-py3-none-any.whl
  • Upload date:
  • Size: 25.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.13

File hashes

Hashes for wsknn-0.1.4.dev2-py3-none-any.whl
Algorithm Hash digest
SHA256 5e75bb25531ad82495676f05822966ad40755eed23b992372e7f869531a3d31b
MD5 8e70854c99a46a5e2051b6ee8ebe0ae8
BLAKE2b-256 9aa52d2f6db969988ab35d5a44b99e312bba9d261f1e1b3012a72c9a069e7537

