Hugging-Mapper

A lightweight Python tool for easy text similarity scoring using Hugging Face models.

Installation

pip install hugging-mapper

Features

  • Fast text similarity scoring
  • Customizable model selection at initialization
  • Supports Hugging Face models with sentence embedding capability
  • Batch scoring for lists of sentence pairs

Usage

Embedding text using Hugging Face models

from hugger.mapper import HuggingMapper

# Initialize the mapper with a specific model.
# If model_name is omitted, the default is 'cambridgeltl/SapBERT-from-PubMedBERT-fulltext'.
mapper = HuggingMapper(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# generate embedding
embedding = mapper.embed_text("I hope you'll find this helpful.")
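
Once two texts are embedded, a similarity score is commonly computed as the cosine of the angle between the two vectors. The sketch below is a minimal, dependency-free illustration of that idea. It assumes the embedding can be treated as a flat sequence of floats, which this README does not state, and `cosine_similarity` is a hypothetical helper, not part of hugging-mapper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors stand in for real embeddings (assumption: flat float sequences).
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 0.0, 0.0]
score = cosine_similarity(v1, v2)
```

If the real embedding is a NumPy array or a tensor, converting it to a flat list first (for example with `.tolist()`) should make it usable with this helper, though the actual return type should be checked in the package documentation.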

Similarity search over given data

from hugger.mapper import NodeMapper
import pandas as pd

# demo data
data = pd.DataFrame({
    "id": ["node1", "node2", "node3"], 
    "text": ["Disease", "Gene", "Drug"]
})

# generate embeddings for the data using the default Hugging Face model
node_mapper = NodeMapper(data)

# get the most similar entries
# a threshold of 0 returns all rows, sorted by similarity to the given term
most_similar = node_mapper.get_similar("protein", threshold=0)

# get matching node
node_id, metadata = node_mapper.get_match("genetics", threshold=0.7)
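
The role of `threshold` in the two calls above can be pictured as a filter-then-sort step over per-node similarity scores. The following is a rough sketch of that logic under stated assumptions, not the library's implementation: `rank_similar` and `best_match` are hypothetical names, the scores are made-up illustration values rather than model output, and the choice of `>=` over `>` as well as returning `None` when nothing matches are guesses that the README does not confirm.

```python
from typing import Dict, List, Optional, Tuple


def rank_similar(
    scores: Dict[str, float], threshold: float = 0.0
) -> List[Tuple[str, float]]:
    """Keep nodes scoring at or above the threshold, best first (hypothetical)."""
    kept = [(node_id, s) for node_id, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def best_match(
    scores: Dict[str, float], threshold: float
) -> Optional[Tuple[str, float]]:
    """Return the top-ranked node, or None if nothing clears the threshold."""
    ranked = rank_similar(scores, threshold)
    return ranked[0] if ranked else None


# Made-up similarity scores for the three demo nodes (illustration only).
toy_scores = {"node1": 0.35, "node2": 0.82, "node3": 0.41}

ranked = rank_similar(toy_scores, threshold=0.0)   # all three nodes, best first
match = best_match(toy_scores, threshold=0.7)      # only node2 clears 0.7
no_match = best_match(toy_scores, threshold=0.9)   # nothing clears 0.9
```

A higher threshold trades recall for precision: fewer candidates survive, but those that do are closer to the query term.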

License

This project is licensed under the MIT License.
