A lightweight Python tool for effortless text similarity scoring using Hugging Face models

Hugging-Mapper

A lightweight Python tool for easy text similarity scoring using Hugging Face models

Installation

pip install hugging-mapper

Features

  • Easily compare how similar two pieces of text are
  • Customizable model selection at initialization
  • Works with Hugging Face models that create sentence embeddings
  • Batch scoring for lists of sentence pairs

Usage

Embedding text using Hugging Face models

from hugger.mapper import HuggingMapper

# Initialize the mapper. If model_name is omitted, the default is
# 'cambridgeltl/SapBERT-from-PubMedBERT-fulltext'.
mapper = HuggingMapper(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# generate embedding
embedding = mapper.embed_text("I hope you'll find this helpful.")
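
This README does not say what `embed_text` returns or how two embeddings are scored against each other. A minimal sketch, assuming each embedding is a flat sequence of floats and that similarity is measured with cosine similarity (a common choice for sentence embeddings, not confirmed for this package). The helper `cosine_similarity` is hypothetical, and the vectors are toy stand-ins, not real model output:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for two embeddings (assumption: not real model output).
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 0.0, 0.0]
score = cosine_similarity(vec_a, vec_b)
```

If `embed_text` returns a NumPy array or a tensor instead, it would likely need converting to a flat list (or the helper rewritten with array operations) before being passed in.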

Similarity search of given data

from hugger.mapper import NodeMapper
import pandas as pd

# demo data
data = pd.DataFrame({
    "id": ["node1", "node2", "node3"], 
    "text": ["Disease", "Gene", "Drug"]
})

# generate embeddings for the data using the default Hugging Face model
node_mapper = NodeMapper(data)

# get the most similar entries
# a threshold of 0 returns all data, sorted by similarity to the given term
most_similar = node_mapper.get_similar("protein", threshold=0)

# get matching node
node_id, metadata = node_mapper.get_match("genetics", threshold=0.7)
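
One way to read the `threshold` argument in the calls above, sketched in plain Python over precomputed scores. The helpers `rank_by_similarity` and `best_match` are hypothetical and are not the package's API. This README does not state whether `NodeMapper` compares with `>=` or `>`, or what `get_match` returns when nothing passes the threshold, so both choices below are assumptions:

```python
from typing import Dict, List, Optional, Tuple


def rank_by_similarity(
    scores: Dict[str, float], threshold: float = 0.0
) -> List[Tuple[str, float]]:
    """Keep entries scoring at least `threshold`, highest score first."""
    kept = [(node_id, s) for node_id, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def best_match(
    scores: Dict[str, float], threshold: float
) -> Optional[Tuple[str, float]]:
    """Return the top-scoring entry above the threshold, or None if there is none."""
    ranked = rank_by_similarity(scores, threshold)
    return ranked[0] if ranked else None


# Made-up scores for the three demo nodes (assumption: not real model output).
demo_scores = {"node1": 0.41, "node2": 0.83, "node3": 0.37}

all_ranked = rank_by_similarity(demo_scores, threshold=0.0)  # nothing filtered out
match = best_match(demo_scores, threshold=0.7)               # only node2 qualifies
no_match = best_match(demo_scores, threshold=0.9)            # no entry qualifies
```

Under this reading, a low threshold is useful for inspecting the full ranking, while a high one turns the lookup into a strict match that can come back empty.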

Documentation

Tutorials and documentation are available on Read the Docs.

License

This project is licensed under the MIT License.
