
A lightweight Python tool for effortless text similarity scoring using Hugging Face models


Hugging-Mapper

A lightweight Python tool for easy text similarity scoring using Hugging Face models


Installation

```shell
pip install hugging-mapper
```

Features

  • Easily compare how similar two pieces of text are
  • Customizable model selection at initialization
  • Works with Hugging Face models that create sentence embeddings
  • Batch scoring for lists of sentence pairs
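The description does not say which similarity measure hugging-mapper uses. A common choice for sentence embeddings is cosine similarity, so the sketch below assumes it; it works on plain lists of floats standing in for embeddings. `cosine_similarity` and `score_pairs` are hypothetical helpers for illustration, not part of the hugging-mapper API.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (ranges from -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)


def score_pairs(pairs: Sequence[Tuple[Vector, Vector]]) -> List[float]:
    """Batch scoring: one cosine similarity per (embedding, embedding) pair."""
    return [cosine_similarity(a, b) for a, b in pairs]


# Toy 3-dimensional "embeddings" (hypothetical values, not model output)
pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),     # same direction
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),     # orthogonal
    ([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]),  # opposite direction
]
print(score_pairs(pairs))
```

If `embed_text` returns a one-dimensional array-like, its output could in principle be passed to these helpers directly; the return type is not documented here, so treat that as an assumption.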

Usage

Embedding text using Hugging Face models

```python
from hugger.mapper import HuggingMapper

# initialise the mapper
# (the default model_name is 'cambridgeltl/SapBERT-from-PubMedBERT-fulltext')
mapper = HuggingMapper(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# generate an embedding for a single piece of text
embedding = mapper.embed_text("I hope you'll find this helpful.")
```
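A transformer model produces one vector per token, so something has to reduce those to a single sentence embedding. The model card for `sentence-transformers/all-MiniLM-L6-v2` uses mean pooling over the non-padding tokens. Whether `embed_text` pools this way (or uses the [CLS] token, or something else) is not stated here, so the sketch below is only an illustration of mean pooling on made-up token vectors; `mean_pool` is a hypothetical name.

```python
from typing import List, Sequence


def mean_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1.

    Padding tokens (mask 0) are excluded, so padding a sentence to the
    length of a longer one in the same batch does not change its embedding.
    """
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("need one mask entry per token")
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Hypothetical 2-d token outputs: 3 real tokens followed by 1 padding token
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
print(mean_pool(tokens, mask))  # [3.0, 4.0]
```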

Similarity search over your own data

```python
from hugger.mapper import NodeMapper
import pandas as pd

# demo data: one row per node, with an id and the text to embed
data = pd.DataFrame({
    "id": ["node1", "node2", "node3"],
    "text": ["Disease", "Gene", "Drug"]
})

# generate embeddings for the data using the (default) Hugging Face model
node_mapper = NodeMapper(data)

# get the most similar nodes
# threshold=0 returns all data, sorted by similarity to the given term
most_similar = node_mapper.get_similar("protein", threshold=0)

# get the matching node, using a similarity threshold of 0.7
node_id, metadata = node_mapper.get_match("genetics", threshold=0.7)
```
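Going by the comments above, `get_similar` ranks rows by similarity to the query and keeps those at or above `threshold`, while `get_match` returns the single best row. The sketch below imitates that behaviour with hypothetical pre-computed vectors and cosine similarity. It is not the library's implementation: `rank_by_similarity` and `best_match` are illustrative names, `best_match` returns a score rather than the real method's metadata, and returning `None` when nothing clears the threshold is an assumption. The toy vectors are non-negative, so every score is at least 0, which is what "threshold 0 returns all data" requires.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def rank_by_similarity(
    query: Vector, index: Dict[str, Vector], threshold: float
) -> List[Tuple[str, float]]:
    """All (node_id, score) pairs with score >= threshold, highest score first."""
    scored = [(node_id, cosine(query, vec)) for node_id, vec in index.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def best_match(
    query: Vector, index: Dict[str, Vector], threshold: float
) -> Optional[Tuple[str, float]]:
    """Top-ranked (node_id, score), or None if no node reaches threshold."""
    ranked = rank_by_similarity(query, index, threshold)
    return ranked[0] if ranked else None


# Hypothetical embeddings for the demo nodes (made-up values)
index = {
    "node1": [0.9, 0.1, 0.0],  # "Disease"
    "node2": [0.1, 0.9, 0.2],  # "Gene"
    "node3": [0.0, 0.2, 0.9],  # "Drug"
}
query = [0.2, 0.8, 0.1]  # stand-in for an embedding of "genetics"

print(rank_by_similarity(query, index, threshold=0))
print(best_match(query, index, threshold=0.7))
```

A linear scan like this is fine for small tables; for large ones a vector index would usually replace it.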

Documentation

Tutorials and documentation are available on Read the Docs :)

License

This project is licensed under the MIT License.

Download files


Source Distribution

hugging_mapper-1.0.4.tar.gz (194.0 kB)

File details

Details for the file hugging_mapper-1.0.4.tar.gz.

File metadata

  • Download URL: hugging_mapper-1.0.4.tar.gz
  • Upload date:
  • Size: 194.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hugging_mapper-1.0.4.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 1adacace755ce55adf3d1804a04a1d5e3a5402b51f348f5268458197ff399013 |
| MD5 | 459ad724c46aa609deb382acc9f51ed2 |
| BLAKE2b-256 | 155db4d7ad0a4c1fd366cf55ba2e790f0a38db975613fdf86a54d26bae285fc2 |
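To check that a downloaded copy of hugging_mapper-1.0.4.tar.gz matches the published SHA256 digest, hash the file and compare. The sketch below uses only Python's standard `hashlib`; the helper names are illustrative. pip can also do this check itself in hash-checking mode, where each requirement line carries a `--hash=sha256:<digest>` option.

```python
import hashlib
from pathlib import Path
from typing import Iterable

# SHA256 digest published for hugging_mapper-1.0.4.tar.gz
EXPECTED_SHA256 = "1adacace755ce55adf3d1804a04a1d5e3a5402b51f348f5268458197ff399013"


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex SHA256 digest of a byte stream supplied as chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large archives need little memory."""
    with path.open("rb") as fh:
        return sha256_of_chunks(iter(lambda: fh.read(chunk_size), b""))


def digests_match(actual_hex: str, expected_hex: str) -> bool:
    """Compare two hex digests, ignoring case and surrounding whitespace."""
    return actual_hex.strip().lower() == expected_hex.strip().lower()


if __name__ == "__main__":
    archive = Path("hugging_mapper-1.0.4.tar.gz")
    if archive.exists():
        ok = digests_match(sha256_of_file(archive), EXPECTED_SHA256)
        print("OK" if ok else "MISMATCH")
    else:
        print(f"{archive} not found in the current directory")
```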
