Sentence Similarity
Package to calculate the similarity score between two sentences
Examples
Using Transformers
from sentence_similarity import sentence_similarity
sentence_a = "paris is a beautiful city"
sentence_b = "paris is a gorgeous city"
Supported Models
You can access some of the official models through the sentence_similarity class, or pass any HuggingFace model name, such as bert-base-uncased or distilbert-base-uncased, when instantiating sentence_similarity.
See all the available models at huggingface.co/models.
model = sentence_similarity(model_name='distilbert-base-uncased', embedding_type='cls_token_embedding')
Because BERT is bidirectional, the [CLS] token is encoded with representative information from all tokens through the multi-layer encoding procedure, so its representation differs from sentence to sentence.
Set embedding_type to cls_token_embedding to compute the similarity score between two sentences based on the [CLS] token.
Paper link: https://arxiv.org/pdf/1810.04805.pdf
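As an illustration, the cls_token_embedding mode can be approximated directly with the transformers library. This is a sketch, not the package's actual internals; the helper names cls_embedding and cosine are hypothetical.

```python
# Sketch (assuming `torch` and `transformers` are installed) of extracting a
# [CLS]-token embedding; helper names are illustrative, not package internals.
import torch
from transformers import AutoModel, AutoTokenizer

def cls_embedding(sentence, model_name="distilbert-base-uncased"):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    return hidden[0, 0]  # index 0 of the sequence is the [CLS] position

def cosine(u, v):
    # cosine similarity between two 1-D embedding vectors
    return torch.nn.functional.cosine_similarity(u, v, dim=0).item()
```

Something like cosine(cls_embedding(sentence_a), cls_embedding(sentence_b)) would then play the role of get_score with metric="cosine".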
score = model.get_score(sentence_a, sentence_b, metric="cosine")
print(score)
Available metrics are euclidean, manhattan, minkowski, and cosine.
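For reference, the four metrics can be sketched in plain NumPy, applied here to toy vectors rather than real embeddings. Note that euclidean, manhattan, and minkowski are distances (lower means more similar), while cosine is a similarity (higher means more similar).

```python
# The four metrics named above, sketched with NumPy on toy vectors.
import numpy as np

def cosine_similarity(a, b):
    # 1.0 for identical directions, 0.0 for orthogonal vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean(a, b):
    # straight-line (L2) distance
    return float(np.linalg.norm(a - b))

def manhattan(a, b):
    # sum of absolute coordinate differences (L1)
    return float(np.abs(a - b).sum())

def minkowski(a, b, p=3):
    # generalises euclidean (p=2) and manhattan (p=1)
    return float((np.abs(a - b) ** p).sum() ** (1 / p))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 4.0])
print(round(cosine_similarity(a, b), 4))  # → 0.982
```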
Using Sentence Transformers
from sentence_similarity import sentence_similarity
sentence_a = "paris is a beautiful city"
sentence_b = "paris is a gorgeous city"
Supported Models
You can access all the pretrained models of Sentence-Transformers.
See all the available models at sbert/models.
model = sentence_similarity(model_name='distilbert-base-uncased', embedding_type='sentence_embedding')
Sentence-BERT (SBERT) is a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity.
Set embedding_type to sentence_embedding (the default embedding_type) to compute the similarity score between two sentences based on SBERT.
Paper link: https://arxiv.org/pdf/1908.10084.pdf
score = model.get_score(sentence_a, sentence_b, metric="cosine")
print(score)
Available metrics are euclidean, manhattan, minkowski, and cosine.
Hashes for sentence_similarity-1.0.0.tar.gz

Algorithm | Hash digest
---|---
SHA256 | fa7c67fea77e37f1e7fb4a3d46474f8e6eb24573f1b724c5dd3f845d01d4a71e
MD5 | a1bc430596e50f89d9137ace13500a22
BLAKE2b-256 | be26acc525a2ac2198df7cd1994d518df21d9f7e5c83d7add101adf230da692a
Hashes for sentence_similarity-1.0.0-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | aefb73cce7733e1b95289119aedb9a734c16b1209ead4e9bc5516a8e26087128
MD5 | c243e7a5bf78a4c9c15a5243a3368ca5
BLAKE2b-256 | f60adf518576521e80a6edc77b10b38207acef3078ccac67dcffb8ff8eb86679