Hugging-Mapper
A lightweight Python tool for easy text similarity scoring using Hugging Face models.
Installation
pip install hugging-mapper
Features
- Easily compare how similar two pieces of text are
- Customizable model selection at initialization
- Works with Hugging Face models that create sentence embeddings
- Batch scoring for lists of sentence pairs
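Batch scoring of sentence pairs can be pictured as embedding both sides of each pair and taking the cosine similarity of the two vectors. The sketch below is an illustration only and does not use Hugging-Mapper's API: `cosine_similarity`, `score_pairs`, and `toy_embed` are hypothetical names, and `toy_embed` is a letter-count stand-in for a real sentence-embedding model. Whether the package scores with cosine similarity is an assumption.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def score_pairs(
    pairs: Sequence[Tuple[str, str]],
    embed: Callable[[str], Vector],
) -> List[float]:
    """Return one similarity score per (left, right) text pair."""
    return [cosine_similarity(embed(left), embed(right)) for left, right in pairs]


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a model: counts of the letters a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


# One score per pair, in input order.
scores = score_pairs([("abc", "abc"), ("abc", "xyz")], toy_embed)
```

With a real model, `embed` would be replaced by a function that returns the model's sentence embedding for a string.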
Usage
Embedding text using Hugging Face models
from hugger.mapper import HuggingMapper
# Initialize the mapper.
# The default model_name is 'cambridgeltl/SapBERT-from-PubMedBERT-fulltext'.
mapper = HuggingMapper(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
# Generate an embedding for a single piece of text.
embedding = mapper.embed_text("I hope you'll find this helpful.")
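For readers wondering how a transformer's per-token vectors can become a single text embedding, one common technique is masked mean pooling: average the token vectors while skipping padding positions. This is a minimal sketch of that general technique, not a description of what `embed_text` does internally (the source does not say, and some models use the first token's vector instead). The function name `mean_pool` and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def mean_pool(
    token_vectors: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose mask entry is non-zero."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("need exactly one mask entry per token vector")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: two real tokens and one padding token (mask 0) that is ignored.
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0])
```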
Similarity search of given data
from hugger.mapper import NodeMapper
import pandas as pd
# Demo data
data = pd.DataFrame({
    "id": ["node1", "node2", "node3"],
    "text": ["Disease", "Gene", "Drug"]
})
# Generate embeddings for the data using the (default) Hugging Face model
node_mapper = NodeMapper(data)
# get most similar
# threshold 0 returns all data sorted by similarity to the given term
most_similar = node_mapper.get_similar("protein", threshold=0)
# get matching node
node_id, metadata = node_mapper.get_match("genetics", threshold=0.7)
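The role of `threshold` in the calls above can be sketched independently of the library: given one similarity score per node, keep the nodes at or above the threshold and sort them from most to least similar. This is an illustration under stated assumptions, not `NodeMapper`'s actual implementation. The names `rank_by_score` and `best_match`, the `>=` comparison, the `None` result when nothing qualifies, and the score values are all made up for the example.

```python
from typing import Dict, List, Optional, Tuple


def rank_by_score(
    scores: Dict[str, float],
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Keep entries scoring at or above the threshold, best first."""
    kept = [(node_id, score) for node_id, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def best_match(
    scores: Dict[str, float],
    threshold: float,
) -> Optional[Tuple[str, float]]:
    """Return the top-ranked entry, or None if nothing reaches the threshold."""
    ranked = rank_by_score(scores, threshold)
    return ranked[0] if ranked else None


# Invented similarity scores for the three demo nodes (not real model output).
example_scores = {"node1": 0.41, "node2": 0.83, "node3": 0.27}

everything_sorted = rank_by_score(example_scores, threshold=0.0)
strict_match = best_match(example_scores, threshold=0.7)
```

A higher threshold trades recall for precision: fewer nodes qualify, but those that do are closer to the query term.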
Documentation
Tutorials and documentation are available on Read the Docs :)
License
This project is licensed under the MIT License.
Download files
Source Distribution
hugging_mapper-1.0.4.tar.gz (194.0 kB)
File details
Details for the file hugging_mapper-1.0.4.tar.gz.
File metadata
- Download URL: hugging_mapper-1.0.4.tar.gz
- Upload date:
- Size: 194.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1adacace755ce55adf3d1804a04a1d5e3a5402b51f348f5268458197ff399013 |
| MD5 | 459ad724c46aa609deb382acc9f51ed2 |
| BLAKE2b-256 | 155db4d7ad0a4c1fd366cf55ba2e790f0a38db975613fdf86a54d26bae285fc2 |
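A downloaded archive can be checked against the published SHA256 digest with Python's standard library. This is a minimal sketch: the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the local path in the usage comment is an assumption about where the file was saved.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


# Usage, assuming the archive was saved to the current directory:
# matches_expected(
#     "hugging_mapper-1.0.4.tar.gz",
#     "1adacace755ce55adf3d1804a04a1d5e3a5402b51f348f5268458197ff399013",
# )
```

Reading in chunks keeps memory use flat regardless of archive size.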