SimilariPy

High-performance KNN similarity functions in Python, optimized for sparse matrices.

SimilariPy is primarily designed for Recommender Systems and Information Retrieval (IR) tasks, but can be applied to other domains as well.

The package also includes a set of normalization functions useful for pre-processing data before the similarity computation.

The official documentation is available in the 📘 SimilariPy Guide.

🔍 Similarity Functions

SimilariPy provides a range of high-performance similarity functions for sparse matrices.
All functions are multi-threaded and implemented in Cython + OpenMP for fast parallel computation on CSR matrices.

Core

Dot Product – Simple raw inner product between vectors.
Cosine – Normalized dot product based on L2 norm.
Asymmetric Cosine – Skewed cosine similarity using an alpha parameter.
Jaccard, Dice, Tversky – Set-based generalized similarities.
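
As a rough illustration of what the core measures compute, here is a pure-Python sketch based on the common textbook definitions, with sparse vectors stored as `{index: value}` dicts. The helper names below are hypothetical and are not SimilariPy functions; the library's own definitions (which may include additional terms such as shrinkage) are given in the guide.

```python
import math

# Sparse vectors as {index: value} dicts; only non-zero entries are stored.
def dot(x, y):
    """Raw inner product over the shared non-zero indices."""
    if len(x) > len(y):
        x, y = y, x  # iterate over the shorter vector
    return sum(v * y[i] for i, v in x.items() if i in y)

def cosine(x, y):
    """Dot product divided by the product of the L2 norms."""
    denom = math.sqrt(dot(x, x)) * math.sqrt(dot(y, y))
    return dot(x, y) / denom if denom else 0.0

def asymmetric_cosine(x, y, alpha=0.5):
    """Cosine with the squared norms weighted by alpha and 1 - alpha.

    alpha = 0.5 reduces to the ordinary cosine.
    """
    denom = dot(x, x) ** alpha * dot(y, y) ** (1 - alpha)
    return dot(x, y) / denom if denom else 0.0

def tversky(x, y, alpha=1.0, beta=1.0):
    """Set-based Tversky index on the supports (non-zero indices) of x and y."""
    a, b = set(x), set(y)
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0

def jaccard(x, y):
    """Tversky with alpha = beta = 1."""
    return tversky(x, y, 1.0, 1.0)

def dice(x, y):
    """Tversky with alpha = beta = 0.5."""
    return tversky(x, y, 0.5, 0.5)
```

Writing Jaccard and Dice as special cases of Tversky makes the "generalized" relationship between the three set-based measures explicit.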

Graph-Based

P3α – Graph-based similarity computed through random walk propagation with exponentiation.
RP3β – Similar to P3α but includes popularity penalization using a beta parameter.
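
The sketch below illustrates the idea behind both graph-based measures on a tiny binary user-item graph, following the usual formulation from the literature: transition probabilities are raised to the power `alpha`, and the score of a target item is divided by its popularity raised to `beta`. The function name `rp3beta_sketch` and the input format are assumptions made for this example; SimilariPy's exact definitions are in the guide.

```python
from collections import defaultdict

def rp3beta_sketch(interactions, alpha=1.0, beta=0.0):
    """Item-item scores from a random walk item -> user -> item.

    interactions: iterable of (user, item) pairs (binary feedback).
    beta = 0 gives plain P3-alpha; beta > 0 penalizes popular target items.
    """
    items_of = defaultdict(set)  # user -> items the user interacted with
    users_of = defaultdict(set)  # item -> users who interacted with it
    for u, i in interactions:
        items_of[u].add(i)
        users_of[i].add(u)

    sim = defaultdict(dict)
    for i, users in users_of.items():
        p_iu = (1.0 / len(users)) ** alpha  # item -> user transition
        for u in users:
            p_uj = (1.0 / len(items_of[u])) ** alpha  # user -> item transition
            for j in items_of[u]:
                if j != i:
                    sim[i][j] = sim[i].get(j, 0.0) + p_iu * p_uj

    # Popularity penalty on the target item j (no effect when beta = 0)
    for i in sim:
        for j in sim[i]:
            sim[i][j] /= len(users_of[j]) ** beta
    return {i: dict(row) for i, row in sim.items()}
```

With `beta > 0`, a rarely consumed item can outrank a popular one that has a higher raw walk probability, which is the intended effect of the penalization.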

Advanced

S-Plus – A hybrid model combining Tversky and Cosine components, with full control over weights and smoothing.

For mathematical definitions and parameter details, see the 📘 SimilariPy Guide.

🧮 Normalization Functions

SimilariPy provides a suite of normalization functions for sparse matrix pre-processing.
All functions are implemented in Cython and can operate in-place on CSR matrices for maximum performance and memory efficiency.

L1, L2 – Applies row- or column-wise normalization.
TF-IDF – Computes TF-IDF weighting with customizable term-frequency and IDF modes.
BM25 – Applies classic BM25 weighting used in information retrieval.
BM25+ – Variant of BM25 with additive smoothing for low-frequency terms.
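
To make the weighting schemes more concrete, here is a minimal pure-Python sketch of row-wise L2 normalization and of BM25 weighting, with each sparse row stored as a `{column: value}` dict. It uses one common form of the BM25 formula; the function names, the default values of `k1` and `b`, and the IDF variant are assumptions for this example and may differ from what SimilariPy implements.

```python
import math

def l2_normalize_rows(rows):
    """Scale each sparse row ({col: value}) to unit L2 norm."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row.values()))
        out.append({c: v / norm for c, v in row.items()} if norm else dict(row))
    return out

def bm25_rows(rows, k1=1.2, b=0.75, delta=0.0):
    """BM25 weighting of sparse rows; delta > 0 adds a BM25+ style constant."""
    if not rows:
        return []
    n = len(rows)
    # Average row length (sum of values); fall back to 1.0 if all rows are empty
    avg_len = sum(sum(r.values()) for r in rows) / n or 1.0
    # Document frequency: number of rows in which each column appears
    df = {}
    for r in rows:
        for c in r:
            df[c] = df.get(c, 0) + 1
    out = []
    for r in rows:
        length_norm = 1 - b + b * sum(r.values()) / avg_len
        out.append({
            c: math.log(1 + (n - df[c] + 0.5) / (df[c] + 0.5))
               * (tf * (k1 + 1) / (tf + k1 * length_norm) + delta)
            for c, tf in r.items()
        })
    return out
```

Both helpers return new rows rather than modifying their input, which keeps the sketch simple; the library functions described above can instead work in-place.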

For more details, check the 📘 SimilariPy Guide.

🚀 Getting Started

Here’s a minimal example to get you up and running with SimilariPy:

import similaripy as sim
import scipy.sparse as sps

# Create a random User-Rating Matrix (URM)
urm = sps.random(1000, 2000, density=0.025)

# Normalize the URM using BM25
urm = sim.normalization.bm25(urm)

# Train an item-item cosine similarity model
similarity_matrix = sim.cosine(urm.T, k=50)

# Compute recommendations for users 1, 14, and 8,
# filtering out already-seen items
recommendations = sim.dot_product(
    urm,
    similarity_matrix.T,
    k=100,
    target_rows=[1, 14, 8],
    filter_cols=urm
)
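
For readers who want to see what this pipeline computes conceptually, the following pure-Python sketch reproduces the same steps on a toy dataset: an item-item cosine model truncated to the top `k_sim` neighbours, followed by a user-by-model dot product that skips already-seen items. It leaves out the BM25 step, and the function `item_knn_recommend` and its dict-based input format are hypothetical, written only for illustration; it is far slower than the library and is not how SimilariPy is implemented.

```python
import math
from collections import defaultdict

def item_knn_recommend(urm, user, k_sim=50, k_rec=100):
    """urm: {user: {item: non-zero rating}}. Returns [(item, score)] for unseen items."""
    # Transpose the URM: item -> {user: rating}
    item_rows = defaultdict(dict)
    for u, row in urm.items():
        for i, r in row.items():
            item_rows[i][u] = r
    norms = {i: math.sqrt(sum(r * r for r in col.values()))
             for i, col in item_rows.items()}

    def top_similar(i):
        # Cosine between item i and every co-rated item, keeping the k_sim largest
        dots = {}
        for u, r in item_rows[i].items():
            for j, r2 in urm[u].items():
                if j != i:
                    dots[j] = dots.get(j, 0.0) + r * r2
        scored = [(j, d / (norms[i] * norms[j])) for j, d in dots.items()]
        return sorted(scored, key=lambda t: -t[1])[:k_sim]

    # Item-item model: one truncated row of neighbours per item
    model = {j: dict(top_similar(j)) for j in item_rows}

    # Score of item j = sum over the user's items i of rating(i) * model[j][i]
    seen = urm[user]
    scores = {}
    for j, neighbours in model.items():
        if j in seen:
            continue  # filter out already-seen items
        s = sum(r * neighbours.get(i, 0.0) for i, r in seen.items())
        if s > 0:
            scores[j] = s
    return sorted(scores.items(), key=lambda t: -t[1])[:k_rec]
```

Truncating each model row to its top neighbours is what keeps the similarity matrix sparse, which is the role of the `k` parameter in the example above.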

📦 Installation

SimilariPy can be installed from PyPI with:

pip install similaripy

🔧 GCC Compiler - Required

To install the package and compile the Cython code, a GCC-compatible compiler with OpenMP is required.

Ubuntu / Debian

Install the official dev-tools:

sudo apt update && sudo apt install build-essential

macOS (Intel & Apple Silicon)

Install GCC with Homebrew:

brew install gcc

Windows

Install the official Visual C++ Build Tools.

⚠️ On Windows, use the default format_output='coo' in all similarity functions, as 'csr' is currently not supported.
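
If the same script has to run on several platforms, one option is to choose the output format at runtime. The helper below is a small sketch using only the standard library; `pick_format_output` is a hypothetical name, and the commented usage assumes SimilariPy is installed and that `urm` is a sparse matrix as in the Getting Started example.

```python
import platform

def pick_format_output(system=None):
    """Return 'coo' on Windows (per the note above) and 'csr' elsewhere."""
    system = system or platform.system()
    return "coo" if system == "Windows" else "csr"

# Hypothetical usage:
#   import similaripy as sim
#   model = sim.cosine(urm.T, k=50, format_output=pick_format_output())
```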

Optional Optimization: Intel MKL for Intel CPUs

For Intel CPUs, using SciPy/NumPy with MKL (Math Kernel Library) is highly recommended for best performance. The easiest way to achieve this is to install them via Anaconda.

📦 Requirements

Package	Version
numpy	>= 1.21
scipy	>= 1.10.1
tqdm	>= 4.65.2
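
To check an existing environment against these minimums before installing, a short standard-library script along the following lines can help. It is only a sketch: `as_tuple` and `check_requirements` are hypothetical helpers, and the version comparison is deliberately simplistic (it compares the leading numeric components and ignores pre-release suffixes).

```python
from importlib import metadata

# Minimum versions taken from the table above
MINIMUMS = {"numpy": "1.21", "scipy": "1.10.1", "tqdm": "4.65.2"}

def as_tuple(version):
    """Turn '1.10.1' into (1, 10, 1); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def check_requirements(minimums=MINIMUMS):
    """Return {package: status} with status 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = as_tuple(installed) >= as_tuple(minimum)
        report[name] = "ok" if ok else "too old"
    return report

if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```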

📜 History

This library originated during the Spotify RecSys Challenge 2018.

Our team, The Creamy Fireflies, struggled to compute large similarity models on a dataset with over 66 million interactions. Standard Python/NumPy solutions were too slow: computing a single model took a whole day.

To overcome this, I developed high-performance versions of the core similarity functions in Cython and OpenMP. Encouraged by my teammates, I open-sourced this work to help others solve similar challenges.

Thanks to my Creamy Fireflies friends for the support! 🙏

📄 License

This project is released under the MIT License.

🔖 Citation

If you use SimilariPy in your research, please cite:

@misc{boglio_simone_similaripy,
  author       = {Boglio Simone},
  title        = {bogliosimone/similaripy},
  doi          = {10.5281/zenodo.2583851},
  url          = {https://doi.org/10.5281/zenodo.2583851}
}

