Story Recommender System based on LightFM

Overview

This project implements a personalized story recommendation system using the LightFM library. The system recommends stories (books, comics, audiobooks, etc.) to users based on their explicit ratings, reading behavior, language, and preferences.

The code is structured in an object-oriented way, making it easy to maintain and extend. It uses Python type hints for clarity and robustness.

Project Structure

DataManager: Handles loading, preprocessing, and validation of all datasets (stories, users, ratings, behavioral data).
RecommenderSystem: Builds the recommendation model, processes user-story interactions, trains the LightFM model, and generates recommendations.

Data Sources

Story Dataset (story_dataset.csv)
- Contains metadata for each story: genres, keywords, language, age rating, etc.
User Dataset (user_dataset.csv)
- Contains user profiles: favorite genres, preferred keywords, language, age rating limitation, etc.
Rating Dataset (simple_rating_dataset.csv)
- Contains explicit user ratings for stories, with timestamps.
Behavioral Dataset (behavioral_dataset.csv)
- Contains user behavior logs, specifically:
  - reads: List of strings in the format story-uuid-language-percentage_finished (e.g., abc123-en-0.75)
  - impressions: List of story IDs the user has seen
  - timestamp: When the behavior occurred
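
The reads format above could be parsed along the following lines. This is only a sketch, not the package's own parser: the names parse_read and ReadEvent are hypothetical, and it assumes that the language code and the percentage never contain a hyphen and that the percentage lies between 0 and 1 (as in the 0.75 example). Splitting from the right keeps hyphens inside the story UUID intact.

```python
from typing import NamedTuple


class ReadEvent(NamedTuple):
    """One parsed entry of the reads column (hypothetical helper type)."""
    story_id: str
    language: str
    completion: float


def parse_read(entry: str) -> ReadEvent:
    """Split 'story-uuid-language-percentage_finished' into its three parts."""
    # Split from the right: a UUID may itself contain hyphens, while the
    # last two fields (language, percentage) are assumed not to.
    story_id, language, percentage = entry.rsplit("-", 2)
    completion = float(percentage)
    if not 0.0 <= completion <= 1.0:
        raise ValueError(f"completion out of range in {entry!r}")
    return ReadEvent(story_id, language, completion)


print(parse_read("abc123-en-0.75"))
```

A language tag that itself contains a hyphen (for example a regional variant) would break this assumption and need a different delimiter strategy.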

Key Features

Strict Language Enforcement: Only recommends stories in the user's preferred language. This filter can be disabled.
Hybrid Recommendation: Uses both explicit ratings (most important) and implicit behavioral signals (e.g., how much of a story was read).
Completion Percentage Weighting: For behavioral reads, the percentage of the story finished is used as the interaction strength.
Time Decay: More recent interactions are weighted more heavily.

How It Works

1. Data Loading and Preprocessing

The DataManager:
- Loads all datasets.
- Parses columns that contain lists (e.g., genres, keywords, reads).
- Builds feature lists for each story and user for use in LightFM.
- Validates that all stories are in a language supported by at least one user.
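
As an illustration of the list parsing and feature building described above, here is a minimal sketch. It assumes that list-valued cells are stored as Python-style list literals such as "['fantasy', 'drama']", which the source does not confirm. The function names, the column names genres, keywords and language, and the "genre:" style prefixes are all hypothetical.

```python
import ast
from typing import Dict, List


def parse_list_cell(cell: str) -> List[str]:
    """Parse a CSV cell holding a list literal such as "['fantasy', 'drama']"."""
    if not cell or not cell.strip():
        return []
    # literal_eval only evaluates Python literals, never arbitrary code.
    value = ast.literal_eval(cell)
    if not isinstance(value, (list, tuple)):
        raise ValueError(f"expected a list, got {type(value).__name__}")
    return [str(v) for v in value]


def story_features(row: Dict[str, str]) -> List[str]:
    """Turn one story row into prefixed feature tokens (assumed scheme)."""
    features = [f"genre:{g}" for g in parse_list_cell(row.get("genres", ""))]
    features += [f"keyword:{k}" for k in parse_list_cell(row.get("keywords", ""))]
    if row.get("language"):
        features.append(f"language:{row['language']}")
    return features
```

Prefixing each token with its column name keeps a genre and a keyword that share the same text from collapsing into a single feature.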

2. Interaction Matrix Construction

Explicit Ratings: For each user-story pair with a rating, an interaction is created with the rating as the weight.
Behavioral Reads: For each read, the code extracts the story ID, language, and completion percentage. If the language matches both the user's preference and the story's language, and there is no explicit rating for that pair, an interaction is created with the completion percentage as the weight.
Impressions: Optionally included as weak implicit interactions.
Time Decay: All interactions are further weighted by recency, so recent interactions are more important.
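
The weighting rules above might be combined roughly as follows. This is a sketch under stated assumptions, not the package's implementation: the source does not specify the decay function, so an exponential decay with a 30-day half-life is assumed here, and the function names and tuple layouts are hypothetical. Language checks on reads are assumed to have been applied before this step.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (user_id, story_id)


def time_decay(ts: datetime, now: datetime, half_life_days: float = 30.0) -> float:
    """Exponential decay (assumed form): the weight halves every half_life_days."""
    age_days = max((now - ts).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def build_interactions(
    ratings: Iterable[Tuple[str, str, float, datetime]],
    reads: Iterable[Tuple[str, str, float, datetime]],
    now: datetime,
) -> Dict[Pair, float]:
    """Combine explicit ratings and reads into one weight per user-story pair."""
    weights: Dict[Pair, float] = {}
    for user, story, rating, ts in ratings:
        weights[(user, story)] = rating * time_decay(ts, now)
    rated = set(weights)
    for user, story, completion, ts in reads:
        if (user, story) in rated:
            continue  # an explicit rating takes precedence over a read
        decayed = completion * time_decay(ts, now)
        # Keep the strongest read if the same story was read more than once.
        weights[(user, story)] = max(weights.get((user, story), 0.0), decayed)
    return weights
```

Impressions could be added with the same pattern and a small constant weight, applied only to pairs that have neither a rating nor a read.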

3. Model Training

The RecommenderSystem builds a LightFM dataset with all users, stories, and their features.
The interaction matrix (with weights) is used to train a LightFM model using the WARP loss function.
User and item features are included to help with cold-start scenarios.
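
A training step of this kind can be sketched with LightFM's public Dataset and LightFM classes. The function names, argument layout, and the no_components value below are assumptions made for illustration; the package's own RecommenderSystem may organize this differently.

```python
from typing import Dict, Iterable, List, Tuple


def feature_vocabulary(feature_map: Dict[str, List[str]]) -> List[str]:
    """Collect the sorted set of feature names used by any user or story."""
    return sorted({name for names in feature_map.values() for name in names})


def train_warp_model(
    user_features: Dict[str, List[str]],
    story_features: Dict[str, List[str]],
    interactions: Iterable[Tuple[str, str, float]],
    epochs: int = 30,
):
    """Build a LightFM dataset and fit a WARP model on weighted interactions."""
    # Imported here so the helper above works without LightFM installed.
    from lightfm import LightFM
    from lightfm.data import Dataset

    dataset = Dataset()
    dataset.fit(
        users=user_features.keys(),
        items=story_features.keys(),
        user_features=feature_vocabulary(user_features),
        item_features=feature_vocabulary(story_features),
    )
    # build_interactions returns a binary matrix and a matching weight matrix.
    matrix, weights = dataset.build_interactions(interactions)
    user_matrix = dataset.build_user_features(user_features.items())
    item_matrix = dataset.build_item_features(story_features.items())

    model = LightFM(loss="warp", no_components=32)  # 32 is an arbitrary choice
    model.fit(
        matrix,
        sample_weight=weights,
        user_features=user_matrix,
        item_features=item_matrix,
        epochs=epochs,
    )
    return model, dataset
```

Returning the dataset alongside the model keeps the ID-to-index mappings available, which are needed later to translate user and story IDs into the integer indices that LightFM's predict method expects.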

4. Generating Recommendations

For a given user, the system:
- Filters out stories in the wrong language.
- Removes stories the user has already interacted with (read or rated).
- Scores the remaining stories using the trained model.
- Returns the top-N recommendations, including story metadata.
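
The filtering and ranking steps above can be sketched independently of the model. In this hypothetical helper, score stands in for whatever the trained model returns for a story (for LightFM, a wrapper around its predict method); the function name, parameters, and the strict_language flag are assumptions, and metadata lookup is left out.

```python
from typing import Callable, Dict, List, Set, Tuple


def recommend(
    user_language: str,
    seen: Set[str],
    story_languages: Dict[str, str],
    score: Callable[[str], float],
    top_n: int = 5,
    strict_language: bool = True,
) -> List[Tuple[str, float]]:
    """Filter candidates, score the rest, and return the top_n (story, score) pairs."""
    candidates = [
        story
        for story, language in story_languages.items()
        if story not in seen
        and (not strict_language or language == user_language)
    ]
    scored = [(story, score(story)) for story in candidates]
    # Highest score first; sort is stable, so ties keep catalogue order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Filtering before scoring means the model is only evaluated on stories that could actually be returned.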

Example Usage

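# DataManager and RecommenderSystem come from this package; import them first.
dm = DataManager(
    story_path='story_dataset.csv',
    user_path='user_dataset.csv',
    rating_path='simple_rating_dataset.csv',
    behavior_path='behavioral_dataset.csv',
)
dm.load_data()

recommender = RecommenderSystem(dm)
recommender.prepare_dataset()
recommender.train_model(epochs=30)

# Use the first user's ID as an example; .iloc is positional, so select
# the row by position first and then read the 'userId' column.
example_user = dm.users.iloc[0]['userId']
recommendations = recommender.recommend(example_user, top_n=5)
print(recommendations)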

Requirements

Python 3.9+
pandas
numpy
lightfm

Authors

Adam Mitrenga
Jan Voracek

Project details

Release history

0.1.4: May 12, 2025
0.1.2: May 2, 2025
0.1.0 (this version): May 1, 2025

Download files

Source Distribution

literaryuniverse_recommender-0.1.0.tar.gz (8.5 kB)

Uploaded May 1, 2025 Source

Built Distribution

literaryuniverse_recommender-0.1.0-py3-none-any.whl (7.9 kB)

Uploaded May 1, 2025 Python 3

File details

Details for the file literaryuniverse_recommender-0.1.0.tar.gz.

File metadata

Download URL: literaryuniverse_recommender-0.1.0.tar.gz
Upload date: May 1, 2025
Size: 8.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for literaryuniverse_recommender-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`9e80d926b890db79fd748efe0aa1e0c2650cde3d29e30fd4753ce5a06be8ed0d`
MD5	`75306bec3b1dd8600ae1805ed150c069`
BLAKE2b-256	`6514e9da6369883f790e231773505c0f86c789af04a2cebcc07bd4e67dad414f`

Provenance

The following attestation bundles were made for literaryuniverse_recommender-0.1.0.tar.gz:

Publisher: pypi.yml on Edems10/literaryuniverse-recommender

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: literaryuniverse_recommender-0.1.0.tar.gz
- Subject digest: 9e80d926b890db79fd748efe0aa1e0c2650cde3d29e30fd4753ce5a06be8ed0d
- Sigstore transparency entry: 205669722
- Sigstore integration time: May 1, 2025
Source repository:
- Permalink: Edems10/literaryuniverse-recommender@a22622c5ce5acc5397a52f4b18df80d2a5985e91
- Branch / Tag: refs/heads/master
- Owner: https://github.com/Edems10
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@a22622c5ce5acc5397a52f4b18df80d2a5985e91
- Trigger Event: workflow_dispatch

File details

Details for the file literaryuniverse_recommender-0.1.0-py3-none-any.whl.

File metadata

Download URL: literaryuniverse_recommender-0.1.0-py3-none-any.whl
Upload date: May 1, 2025
Size: 7.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for literaryuniverse_recommender-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4e511cb1b505d5908233846536f26cff58f61d68a75ebffab9250f40a3fe83ff`
MD5	`4707915984cf9bb34a510590a5ba3d80`
BLAKE2b-256	`22297416fbf160bec09918b818d88bdd4a05ad0a321c707f2ca87fdedeae1c43`

Provenance

The following attestation bundles were made for literaryuniverse_recommender-0.1.0-py3-none-any.whl:

Publisher: pypi.yml on Edems10/literaryuniverse-recommender

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: literaryuniverse_recommender-0.1.0-py3-none-any.whl
- Subject digest: 4e511cb1b505d5908233846536f26cff58f61d68a75ebffab9250f40a3fe83ff
- Sigstore transparency entry: 205669728
- Sigstore integration time: May 1, 2025
Source repository:
- Permalink: Edems10/literaryuniverse-recommender@a22622c5ce5acc5397a52f4b18df80d2a5985e91
- Branch / Tag: refs/heads/master
- Owner: https://github.com/Edems10
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@a22622c5ce5acc5397a52f4b18df80d2a5985e91
- Trigger Event: workflow_dispatch
