Hugging-Mapper

A lightweight Python tool for easy text similarity scoring using Hugging Face models



Installation

pip install hugging-mapper
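
To confirm the install worked, you can ask Python which version of the distribution it sees. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of Hugging-Mapper.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("hugging-mapper"))
```

A `None` result usually means `pip` installed into a different environment than the interpreter you are running.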

Features

  • Easily compare how similar two pieces of text are
  • Customizable model selection at initialization
  • Works with Hugging Face models that create sentence embeddings
  • Batch scoring for lists of sentence pairs

Usage

Embedding text using Hugging Face models

from hugger.mapper import HuggingMapper

# Initialize the mapper with a Hugging Face model.
# If model_name is omitted, the default is
# 'cambridgeltl/SapBERT-from-PubMedBERT-fulltext'.
mapper = HuggingMapper(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Generate an embedding for a single piece of text
embedding = mapper.embed_text("I hope you'll find this helpful.")
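
Once two texts are embedded, a similarity score is commonly computed as the cosine of the angle between the two vectors. The sketch below is not Hugging-Mapper's own code, and the metric the library uses is not stated here. It assumes an embedding can be treated as a flat sequence of floats and uses short hand-written vectors in place of real model output. `cosine_similarity` and `score_pairs` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def score_pairs(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]
) -> List[float]:
    """Score a batch of (vector, vector) pairs, one score per pair."""
    return [cosine_similarity(a, b) for a, b in pairs]


# Toy stand-ins for embeddings (assumption: real ones are much longer).
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 0.0, 1.0]
vec_c = [0.0, 1.0, 0.0]

# First pair is identical (score close to 1), second is orthogonal (score 0).
scores = score_pairs([(vec_a, vec_b), (vec_a, vec_c)])
print(scores)
```

Scores near 1 indicate vectors pointing in nearly the same direction, and scores near 0 indicate unrelated directions.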

Similarity search over your own data

from hugger.mapper import NodeMapper
import pandas as pd

# demo data
data = pd.DataFrame({
    "id": ["node1", "node2", "node3"], 
    "text": ["Disease", "Gene", "Drug"]
})

# Generate embeddings for the data using the default Hugging Face model
node_mapper = NodeMapper(data)

# Get the most similar entries for a term.
# A threshold of 0 returns all data, sorted by similarity to the given term.
most_similar = node_mapper.get_similar("protein", threshold=0)

# Get the matching node for a term, using a similarity threshold of 0.7
node_id, metadata = node_mapper.get_match("genetics", threshold=0.7)
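
The `threshold` argument in both calls can be read as a lower bound on the similarity score. The following sketch illustrates that reading with precomputed scores. It is not the library's implementation: the helper names `rank_similar` and `best_match` are hypothetical, the scores are invented for illustration, and the inclusive `>=` comparison is an assumption. What `get_match` returns when nothing clears the threshold is not documented here, so returning `None` is also an assumption.

```python
from typing import Dict, List, Optional, Tuple


def rank_similar(
    scores: Dict[str, float], threshold: float = 0.0
) -> List[Tuple[str, float]]:
    """Keep entries scoring at or above threshold, highest score first."""
    kept = [(node_id, s) for node_id, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def best_match(
    scores: Dict[str, float], threshold: float
) -> Optional[Tuple[str, float]]:
    """Return the top-ranked entry above the threshold, or None."""
    ranked = rank_similar(scores, threshold)
    return ranked[0] if ranked else None


# Invented similarity scores between a query term and three nodes.
toy_scores = {"node1": 0.41, "node2": 0.83, "node3": 0.37}

print(rank_similar(toy_scores, threshold=0))
# [('node2', 0.83), ('node1', 0.41), ('node3', 0.37)]
print(best_match(toy_scores, threshold=0.7))
# ('node2', 0.83)
print(best_match(toy_scores, threshold=0.9))
# None
```

A higher threshold trades recall for precision: fewer candidates pass, but those that do are closer to the query term.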

Documentation

Tutorials and documentation are available on Read the Docs.

License

This project is licensed under the MIT License.


