YASEM - Yet Another Splade|Sparse Embedder - A simple and efficient library for SPLADE embeddings

YASEM (Yet Another Splade|Sparse Embedder)

YASEM is a simple and efficient library for running SPLADE (Sparse Lexical and Expansion Model for Information Retrieval) models and creating sparse vectors. Its interface is modeled on SentenceTransformers, so it is straightforward to integrate into existing projects.
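
To make "creating sparse vectors" concrete, here is a minimal pure-Python sketch of the pooling step commonly used by SPLADE-style models: each input token produces one logit per vocabulary entry, the logits pass through log(1 + ReLU(x)), and the maximum over input tokens is kept per vocabulary entry. This is an illustration under those assumptions, not YASEM's own code; `splade_pool` and the toy logits are hypothetical.

```python
import math


def splade_pool(token_logits):
    """Collapse per-token vocabulary logits into one sparse vector.

    token_logits: one row per input token, one logit per vocabulary entry
    (hypothetical toy input, not real model output).
    """
    vocab_size = len(token_logits[0])
    pooled = [0.0] * vocab_size
    for row in token_logits:
        for j, logit in enumerate(row):
            # log(1 + ReLU(x)): negative logits contribute exactly zero
            weight = math.log1p(max(logit, 0.0))
            # max pooling over input tokens
            pooled[j] = max(pooled[j], weight)
    return pooled


# Two input tokens scored against a four-entry toy vocabulary
logits = [
    [2.0, -1.0, 0.0, -3.0],
    [0.5, -0.2, 1.0, -0.1],
]
vector = splade_pool(logits)

# Only vocabulary entries with a positive logit somewhere remain nonzero
nonzero = {j: w for j, w in enumerate(vector) if w > 0.0}
print(nonzero)
```

Because most vocabulary entries end up at exactly zero, the result can be stored sparsely, which is what makes the token-level inspection in the Quick Start possible.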

Why YASEM?

  • Simplicity: YASEM focuses on providing a clean and simple implementation of SPLADE without unnecessary complexity.
  • Efficiency: Generate sparse embeddings quickly and easily.
  • Flexibility: Works with both NumPy and PyTorch backends.
  • Convenience: Includes helpful utilities like get_token_values for inspecting feature representations.

Installation

You can install YASEM using pip:

pip install yasem

Quick Start

Here's a simple example of how to use YASEM:

from yasem import SpladeEmbedder

# Initialize the embedder
embedder = SpladeEmbedder("naver/splade-v3")

# Prepare some sentences
sentences = [
    "Hello, my dog is cute",
    "Hello, my cat is cute",
    "Hello, I like a ramen",
    "Hello, I like a sushi",
]

# Generate embeddings
embeddings = embedder.encode(sentences)
# Or return a scipy.sparse.csr_matrix instead
# embeddings = embedder.encode(sentences, convert_to_csr_matrix=True)

# Compute similarity
similarity = embedder.similarity(embeddings, embeddings)
print(similarity)
# [[148.62903569 106.88184372  18.86930016  22.87525314]
#  [106.88184372 122.79656474  17.45339064  21.44758757]
#  [ 18.86930016  17.45339064  61.00272733  40.92700849]
#  [ 22.87525314  21.44758757  40.92700849  73.98511539]]


# Inspect token values for the first sentence
token_values = embedder.get_token_values(embeddings[0])
print(token_values)
# {'hello': 6.89453125, 'dog': 6.48828125, 'cute': 4.6015625,
#  'message': 2.38671875, 'greeting': 2.259765625,
#    ...

# Inspect token values for the fourth sentence
token_values = embedder.get_token_values(embeddings[3])
print(token_values)
# {'##shi': 3.63671875, 'su': 3.470703125, 'eat': 3.25,
#  'hello': 2.73046875, 'you': 2.435546875, 'like': 2.26953125, 'taste': 1.8203125,
#    ...
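
The token inspection above can be pictured as a lookup from nonzero vector dimensions to vocabulary entries, sorted by weight. The sketch below shows that idea in pure Python; it is an assumption about the general technique, not YASEM's implementation, and `sketch_token_values`, the toy vocabulary, and the weights are all hypothetical.

```python
def sketch_token_values(vector, vocab, top_k=None):
    """Map nonzero dimensions to their tokens, highest weight first."""
    pairs = [(vocab[i], w) for i, w in enumerate(vector) if w > 0.0]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        pairs = pairs[:top_k]
    # dicts preserve insertion order, so the result stays sorted by weight
    return dict(pairs)


# Hypothetical five-entry vocabulary and a made-up embedding over it
vocab = ["hello", "dog", "cat", "cute", "greeting"]
vector = [6.5, 6.0, 0.0, 4.5, 2.0]

result = sketch_token_values(vector, vocab)
print(result)
```

Tokens such as 'greeting' can appear even when they are absent from the input sentence, which mirrors the expansion terms ('message', 'greeting', 'eat', 'taste') visible in the outputs above.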

Features

  • Easy-to-use API inspired by SentenceTransformers
  • Support for both NumPy and scipy.sparse.csr_matrix
  • Efficient dot product similarity computation
  • Utility function to inspect token values in embeddings
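
As a rough illustration of dot product similarity over sparse embeddings, the sketch below represents each embedding as a token-to-weight dict and multiplies only the tokens two embeddings share. It is pure Python for clarity and is not how YASEM computes similarity internally; `sparse_dot`, `similarity_matrix`, and the weights are hypothetical.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {token: weight} dicts."""
    # Iterate over the smaller dict so the cost tracks the sparser vector
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[t] for t, w in a.items() if t in b)


def similarity_matrix(queries, docs):
    """Pairwise dot products, one row per query."""
    return [[sparse_dot(q, d) for d in docs] for q in queries]


# Made-up embeddings; real SPLADE vectors have many more nonzero entries
docs = [
    {"hello": 2.0, "dog": 3.0, "cute": 1.0},
    {"hello": 2.0, "cat": 3.0, "cute": 1.0},
    {"hello": 1.0, "ramen": 4.0},
]
scores = similarity_matrix(docs, docs)
for row in scores:
    print(row)
```

These are raw dot products rather than cosine similarities, so the diagonal is not 1. That matches the Quick Start output, where the self-similarity values are well above 1.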

License

This project is licensed under the MIT License. See the LICENSE file for the full license text.

Copyright (c) 2024 Yuichi Tateno (@hotchpotch)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgements

This library is inspired by the SPLADE model and aims to provide a simple interface for using it. Special thanks to the authors of the original SPLADE paper and the developers of the model.

Download files

Source Distribution

yasem-0.3.2.tar.gz (5.6 kB)

Built Distribution

yasem-0.3.2-py3-none-any.whl (6.1 kB, Python 3)
