A package for matrix factorization using alternating least squares
ALSolver
ALSolver implements alternating least squares (ALS) for matrix factorization, intended for building recommender systems.
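For readers new to the technique, the following is a minimal NumPy sketch of alternating least squares on a tiny dense matrix. It is not ALSolver's implementation: the function name `als_factorize`, its parameters (`k`, `lam`, `n_epochs`), and the toy ratings are assumptions made for illustration only.

```python
import numpy as np


def als_factorize(R, mask, k=2, lam=0.1, n_epochs=20, seed=0):
    """Approximate R by U @ V.T using only entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    reg = lam * np.eye(k)
    for _ in range(n_epochs):
        # Hold V fixed and solve a small ridge regression per user.
        for u in range(n_users):
            Vu = V[mask[u]]
            U[u] = np.linalg.solve(Vu.T @ Vu + reg, Vu.T @ R[u, mask[u]])
        # Hold U fixed and solve a small ridge regression per item.
        for i in range(n_items):
            Ui = U[mask[:, i]]
            V[i] = np.linalg.solve(Ui.T @ Ui + reg, Ui.T @ R[mask[:, i], i])
    return U, V


# Toy ratings matrix; 0.0 marks a missing rating (illustrative data).
R = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 0.0, 4.0],
    [0.0, 1.0, 5.0, 4.0],
])
mask = R > 0
U, V = als_factorize(R, mask, k=2, lam=0.1)
pred = U @ V.T
train_rmse = float(np.sqrt(np.mean((R[mask] - pred[mask]) ** 2)))
print(train_rmse)
```

Each inner solve is a k-by-k linear system, which is why the per-user and per-item updates are cheap and independent of one another.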
Installation
pip install pyalsolver
Development
ALSolver is managed with uv. To get started, clone this repo and run:
$ uv venv --python 3.12
$ uv sync
After any edits or updates, run ruff to format the code and check for linting issues:
$ uv run ruff format
$ uv run ruff check
Usage
Loading Data
from pyalsolver.utils import MovieLens, download_dataset
# This will download the dataset if it doesn't exist
# or return its path if it exists
data_path = download_dataset('ml-32m')
print(data_path)
ml_dataset = MovieLens(
data_path
)
Training a model
from pyalsolver import ALSMF, ENGINE
from pyalsolver.utils import plot_rmse_history
model = ALSMF(0.2, 0.01, 0.01, k=20)
train_rmse_history, valid_rmse_history, loss_history = model.fit(
ml_dataset.Rui_train, ml_dataset.Riu_train, ml_dataset.Rui_valid,
n_epochs=10, engine=ENGINE.NUMBA
)
# to plot rmse history
plot_rmse_history(20, train_rmse_history, valid_rmse_history)
Recommending movies to an existing user
uid = 10
pred_ratings, pred_item_indices = model.recommend(uid, topk=30)
pred_item_ids = [ml_dataset.idx_to_item_id[i] for i in pred_item_indices]
pred_item_titles = [ml_dataset.item_id_to_title[i] for i in pred_item_ids]
print(pred_ratings)
print(pred_item_titles)
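Conceptually, top-k recommendation from a factorized model scores every item against the user's latent vector and keeps the highest scores. The following is a minimal sketch assuming plain dot-product scoring; `top_k_items` and the toy factors are hypothetical and are not part of the pyalsolver API.

```python
import numpy as np


def top_k_items(user_vec, item_factors, topk=3):
    """Return (scores, indices) of the topk highest-scoring items, best first."""
    scores = item_factors @ user_vec
    # argpartition selects the topk candidates without a full sort,
    # then only those candidates are sorted by descending score.
    candidates = np.argpartition(-scores, topk - 1)[:topk]
    order = candidates[np.argsort(-scores[candidates])]
    return scores[order], order


# Illustrative latent factors for five items and one user.
item_factors = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [0.5, 0.5],
    [-1.0, 0.0],
])
user_vec = np.array([2.0, 1.0])

scores, indices = top_k_items(user_vec, item_factors, topk=3)
print(scores)
print(indices)
```

The returned values are positional indices into the factor matrix, which is why the usage example above maps them back to item IDs and titles before printing.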
Cold-starting a user with a single rating
import numpy as np
pred_ratings, pred_item_indices = model.coldstart(
np.array([5]),
np.array([628]),
topk=40,
min_popularity=50 # Only consider items with at least 50 ratings
)
pred_item_ids = [ml_dataset.idx_to_item_id[i] for i in pred_item_indices]
pred_item_titles = [ml_dataset.item_id_to_title[i] for i in pred_item_ids]
print(pred_ratings)
print(pred_item_titles)
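One common way to cold-start a user in an ALS model is to hold the trained item factors fixed and solve a single regularized least-squares problem for the new user's latent vector. The sketch below illustrates that idea together with a popularity filter; it is an assumption about the general technique, not a description of how `model.coldstart` is implemented, and `coldstart_scores`, `lam`, and `item_counts` are hypothetical names.

```python
import numpy as np


def coldstart_scores(ratings, item_indices, item_factors, item_counts,
                     lam=0.1, topk=2, min_popularity=0):
    """Fit a latent vector for a new user, then rank sufficiently popular items."""
    V_rated = item_factors[item_indices]
    k = item_factors.shape[1]
    # Ridge regression for the new user's vector with item factors held fixed.
    user_vec = np.linalg.solve(
        V_rated.T @ V_rated + lam * np.eye(k), V_rated.T @ ratings
    )
    scores = item_factors @ user_vec
    # Exclude items that have fewer than min_popularity ratings.
    scores = np.where(item_counts >= min_popularity, scores, -np.inf)
    order = np.argsort(-scores)[:topk]
    return scores[order], order


# Illustrative item factors and per-item rating counts.
item_factors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]])
item_counts = np.array([120, 3, 80, 60])

scores, indices = coldstart_scores(
    np.array([5.0]), np.array([0]), item_factors, item_counts,
    lam=0.1, topk=2, min_popularity=50,
)
print(scores)
print(indices)
```

In this sketch item 1 would score second highest, but it is filtered out because it has only 3 ratings. The item the user already rated is not excluded here, so it can appear in the results.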
The package provides three engines for computation:
- ENGINE.NUMPY: uses NumPy and is recommended for small datasets.
- ENGINE.NUMBA: uses JIT-compiled Numba code with no Python objects and is recommended for large datasets.
- ENGINE.PARALLEL: uses process-based parallelism, spinning up multiple processes that each update a different portion of the latent factors. It is only recommended when the overhead of starting the processes does not exceed the time needed to compute one iteration.
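Splitting the work across processes is possible because, within one ALS half-step, each user's update depends only on the fixed item factors and that user's own ratings. The sketch below checks this property by updating the rows in chunks and comparing against a single pass. It runs the chunks sequentially to stay simple; `update_user_rows` and the random data are hypothetical and do not reflect how the package's parallel engine is written.

```python
import numpy as np


def update_user_rows(rows, R, mask, V, lam):
    """Solve the regularized least-squares update for each user in `rows`."""
    k = V.shape[1]
    out = np.empty((len(rows), k))
    for n, u in enumerate(rows):
        Vu = V[mask[u]]
        out[n] = np.linalg.solve(Vu.T @ Vu + lam * np.eye(k), Vu.T @ R[u, mask[u]])
    return out


rng = np.random.default_rng(0)
R = rng.integers(1, 6, size=(8, 5)).astype(float)  # illustrative ratings
mask = rng.random((8, 5)) < 0.7                    # which ratings are observed
V = rng.normal(size=(5, 2))                        # fixed item factors

all_rows = np.arange(8)
full = update_user_rows(all_rows, R, mask, V, lam=0.1)

# Each chunk could be handed to a separate worker process;
# here the chunks run one after another.
chunks = np.array_split(all_rows, 3)
chunked = np.vstack([update_user_rows(c, R, mask, V, lam=0.1) for c in chunks])

print(np.allclose(full, chunked))  # → True
```

With real worker processes, each chunk also pays for process start-up and for transferring its data, which is the overhead the note on the parallel engine refers to.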