
Python Framework for Content-Based Recommender Systems

Project description

Badges: Build Status · Coverage Status · Docker Cloud Build Status · Python 3.8

Orange_cb_recsys

A framework for content-based recommender systems

Installation

pip install orange-cb-recsys

PyLucene is required but will not be installed automatically with the other dependencies; you will need to install it yourself.

You also need to manually copy the files runnable_instances.xz and categories.xz from the source directory into the installation directory.

Usage

There are two ways to use this framework: through the API or through a config file.

API Usage

The use through the API is the classic use of a library: classes and methods are invoked directly.

Example:
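The linked example is not reproduced here. As a stand-in, the sketch below illustrates the core content-based idea, recommending the item whose textual content is most similar to a given one. It uses plain word-overlap (Jaccard) similarity; this is a toy illustration, not the orange_cb_recsys API, and the item IDs and plots are made up:

```python
# Toy content-based recommendation: score items by the word overlap
# (Jaccard similarity) of their plot descriptions.
def jaccard(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

# Hypothetical items keyed by a made-up ID, with short plot texts.
items = {
    "tt001": "a space opera with lasers and aliens",
    "tt002": "a romantic comedy set in paris",
    "tt003": "an alien invasion with lasers",
}

def most_similar(item_id):
    # Compare the target item's plot against every other item's plot
    # and return the ID with the highest similarity score.
    others = [(jaccard(items[item_id], desc), other)
              for other, desc in items.items() if other != item_id]
    return max(others)[1]

print(most_similar("tt001"))  # tt003
```

A real content-based pipeline replaces the word-overlap score with the field representations described below (search index, embeddings, tf-idf), but the recommend-by-content-similarity structure is the same.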

Config Usage

The use through the config file is automated: just indicate which algorithms you want to use and change variables where necessary, without having to call classes or methods yourself. This mode is intended for users who want to use many framework features.

Config File

The Config File shows an example of using the framework with this methodology.

As mentioned above, you need to change certain variables for the framework to work properly. Here are some examples of these variables:

"content_type": "ITEM"

This can be ITEM, USER or RATING, depending on what you are processing

"output_directory": "movielens_test"

Change this value to whatever output directory you want

"raw_source_path": "../../datasets/movies_info_reduced.json" 

This is the path to the ITEM, USER or RATING source file you are using

"source_type": "json"

Here you can specify the source type, which can be JSON, CSV or SQL

"id_field_name": ["imdbID", "Title"]

Specify the field name of the ID

"search_index": "True"

True if you want to use the text indexing technique, otherwise False
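Assembled together, the scalar settings above form the top of the config file. A minimal sketch using Python's stdlib json module to round-trip the fragment (the values are the walkthrough's examples, adjust them for your own data):

```python
import json

# The top-level settings shown above, assembled into one fragment.
# Values are the walkthrough's examples; adjust them for your own data.
config_head = {
    "content_type": "ITEM",            # ITEM, USER or RATING
    "output_directory": "movielens_test",
    "raw_source_path": "../../datasets/movies_info_reduced.json",
    "source_type": "json",             # json, csv or sql
    "id_field_name": ["imdbID", "Title"],
    "search_index": "True",            # string booleans, as in the config file
}

# Round-trip through json to confirm the fragment is valid JSON.
print(json.dumps(config_head, indent=2))
```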

"fields": [
{
  "field_name": "Plot",
  "lang": "EN",
  "memory_interface": "None",
  "memory_interface_path": "None",

In the "fields" field you can specify the name of each field on which to use the technique, its language and the memory interface.

The language is specified per field, so a single file can index ITEMs or USERs in many languages

"pipeline_list": [
    {
    "field_content_production": {"class": "search_index"},
    "preprocessing_list": [
      ]
    },
    {
    "field_content_production": {"class": "embedding",
      "combining_technique": {"class":  "centroid"},
      "embedding_source": {"class": "binary_file", "file_path": "../../datasets/doc2vec/doc2vec.bin", "embedding_type":  "doc2vec"},
      "granularity": "doc"},
    "preprocessing_list": [
      {"class": "nltk", "url_tagging":"True", "strip_multiple_whitespaces": "True"}
      ]
    },
    {
    "field_content_production": {"class": "lucene_tf-idf"},
    "preprocessing_list": [
      {"class": "nltk", "lemmatization": "True"}
      ]
    }
]

Here, instead, you can define the pipeline:

For each field you can create many representations; in this example: search_index, embedding and tf-idf.

For each representation we can specify the preprocessing list to be used.

For example, for tf-idf the nltk class is used, which analyzes the natural-language text, and lemmatization is applied

When using nltk, these are the variables that can be changed: stopwords_removal, stemming, lemmatization, strip_multiple_whitespaces and url_tagging
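For reference, a preprocessing entry with every nltk switch listed above turned on would look like the sketch below (string booleans, as elsewhere in the config; in practice enable only what you need):

```python
import json

# An nltk preprocessing entry with every switch listed above enabled.
# The config file uses string booleans ("True"/"False") throughout.
nltk_step = {
    "class": "nltk",
    "stopwords_removal": "True",
    "stemming": "True",
    "lemmatization": "True",
    "strip_multiple_whitespaces": "True",
    "url_tagging": "True",
}
print(json.dumps(nltk_step, indent=2))
```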

When specifying embedding as the field_content_production, you must also specify the combining_technique (currently only centroid), the embedding source, and the granularity, which can be word, doc or sentence
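As an illustration of the centroid combining technique at doc granularity, here is a minimal pure-Python sketch (this mirrors the idea only, not the framework's own implementation): the document vector is the element-wise mean of its word vectors.

```python
# Toy sketch of "centroid" combining at "doc" granularity: the document
# vector is the element-wise average of the word embedding vectors.
def centroid(word_vectors):
    n, dims = len(word_vectors), len(word_vectors[0])
    return [sum(vec[i] for vec in word_vectors) / n for i in range(dims)]

# Two fake 3-dimensional word embeddings for a two-word document.
doc = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
print(centroid(doc))  # [2.0, 3.0, 4.0]
```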

Project details


Release history

This version

3.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

orange_cb_recsys-3.0.tar.gz (82.0 kB)

Uploaded Source

Built Distribution

orange_cb_recsys-3.0-py3-none-any.whl (136.3 kB)

Uploaded Python 3

File details

Details for the file orange_cb_recsys-3.0.tar.gz.

File metadata

  • Download URL: orange_cb_recsys-3.0.tar.gz
  • Upload date:
  • Size: 82.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/49.1.2 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.2

File hashes

Hashes for orange_cb_recsys-3.0.tar.gz:

  • SHA256: 93406a8d8564927401e9ae122935b0a70f1d5b54cce9dadc84da1990d9c35664
  • MD5: 642cb184085217dd13a794648b2b2378
  • BLAKE2b-256: 2c6248fa834a8466a67aa79dc34be5a03c9b8da7a2cb4d059ffaaa0deffa2d90

See more details on using hashes here.

File details

Details for the file orange_cb_recsys-3.0-py3-none-any.whl.

File metadata

  • Download URL: orange_cb_recsys-3.0-py3-none-any.whl
  • Upload date:
  • Size: 136.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/49.1.2 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.2

File hashes

Hashes for orange_cb_recsys-3.0-py3-none-any.whl:

  • SHA256: 3274e2abb178b68f56376b0698cb61b1bacb9469480ab42676984baa4d90a1cb
  • MD5: 35f1009e6787b7a868fdd0265c535de7
  • BLAKE2b-256: 41647cdf18e87668ebad075d35293b2406f63205326b163490d31cd9e3a5adbc

See more details on using hashes here.
