NLP library that extracts, compares, and transforms phrases, and sorts them into buckets.
Project description
python-semantic-compare
Extracts, compares, and transforms phrases, and sorts them into buckets.
Installation
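The project requires a spaCy language model for natural language processing. For English, download the large model with the command below (the advanced example later on uses en_core_web_sm, which you can download the same way):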
$ python -m spacy download en_core_web_lg
Usage
Extract phrases
Simple Usage
from semantic_compare import SemanticComparator as sc
comparator = sc(sentencizer=True)
phrases = comparator.extract_phrases("Create, promote and develop a business.")
print(phrases)
Output:
['Create a business', 'promote a business', 'develop a business']
sentencizer: when True, the text is split into sentences at punctuation marks (period, question mark, exclamation mark) before phrases are extracted.
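To illustrate what the simple example does, here is a minimal, spaCy-free sketch: split the text into sentences at terminal punctuation, then distribute a shared object over a list of coordinated verbs. This is a toy string heuristic, not the library's implementation (which relies on spaCy's parser); the names split_sentences, expand_coordination and extract_phrases_sketch are hypothetical.

```python
import re


def split_sentences(text):
    """Toy sentencizer: split after '.', '?' or '!' followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]


def expand_coordination(sentence):
    """Toy heuristic: 'A, B and C obj.' -> ['A obj', 'B obj', 'C obj'].

    Assumes bare verbs joined by commas and 'and', with the shared
    object attached to the last verb. Real parsing would use spaCy.
    """
    sentence = sentence.rstrip(".?!")
    chunks = re.split(r",\s*|\s+and\s+", sentence)
    last_verb, _, obj = chunks[-1].partition(" ")
    if not obj:
        return [sentence]  # nothing to distribute
    verbs = chunks[:-1] + [last_verb]
    return [f"{verb} {obj}" for verb in verbs]


def extract_phrases_sketch(text):
    phrases = []
    for sentence in split_sentences(text):
        phrases.extend(expand_coordination(sentence))
    return phrases


print(extract_phrases_sketch("Create, promote and develop a business."))
# ['Create a business', 'promote a business', 'develop a business']
```

The heuristic breaks on anything beyond this simple pattern, which is why a dependency parse is the better tool in practice.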
Advanced Usage
from semantic_compare import SemanticComparator as sc

# Sentence splitter
def our_sentencizer(doc):
    """
    Split a document into sentences at bullet points,
    terminal punctuation and new lines.
    """
    for i, token in enumerate(doc[:-2]):
        if "•" in token.text:
            # A bullet point always starts a new sentence
            doc[i].is_sent_start = True
        elif token.text in {".", "...", "?", "!", "\n"} and doc[i + 1].is_title:
            # Terminal punctuation or a new line followed by a title-case word
            doc[i + 1].is_sent_start = True
        else:
            doc[i + 1].is_sent_start = False
    return doc
# Use the small English model without merging named entities
comparator = sc(merge_entities=False, spacy_model='en_core_web_sm')
# Add a custom pipe for text preprocessing
comparator.add_custom_pipe(our_sentencizer, before='parser')
phrases = comparator.extract_phrases('''
Must Have:
* Experience shaping the BI strategy from C-Level to Technical developers.
* Extensive delivery of platform within a Business Intelligence and Analytics function.
* Communication with stakeholders on all levels.
''')
print('\n'.join(phrases))
add_custom_pipe adds your own component to the spaCy processing pipeline. Here, before='parser' makes the custom sentence splitter run before the dependency parser.
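The rules in our_sentencizer can be tried out without spaCy. The sketch below applies approximately the same rules to a plain list of token strings and returns one sentence-start flag per token. The function name sentence_starts is hypothetical, and str.istitle() stands in for spaCy's Token.is_title (an assumption); unlike the original, it scans every token instead of stopping two tokens before the end.

```python
TERMINATORS = {".", "...", "?", "!", "\n"}


def sentence_starts(tokens):
    """Return one flag per token; True marks the start of a sentence."""
    flags = [False] * len(tokens)
    if tokens:
        flags[0] = True  # the first token always opens a sentence
    for i in range(len(tokens) - 1):
        if "•" in tokens[i]:
            # Bullet point: the bullet itself starts a sentence
            flags[i] = True
        elif tokens[i] in TERMINATORS and tokens[i + 1].istitle():
            # Terminator followed by a title-case word
            flags[i + 1] = True
    return flags


tokens = ["•", "Experience", "matters", ".", "Communication", "counts", "!"]
print(sentence_starts(tokens))
# [True, False, False, False, True, False, False]
```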
Compare phrases (Semantic similarity)
compare_phrases returns a semantic similarity score for two phrases. Example 1:
phrase1 = 'Understand customer needs'
phrase2 = 'Capture business requirements'
similarity = comparator.compare_phrases(phrase1, phrase2)
print(similarity)
Output:
0.38569751
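A common way to score phrase similarity (and what spaCy's Doc.similarity does by default) is the cosine similarity of averaged word vectors. Whether compare_phrases uses exactly this is an assumption. The minimal sketch below uses tiny made-up 3-dimensional vectors in place of a real spaCy model; VECTORS, phrase_vector and compare_phrases_sketch are hypothetical names.

```python
import math

# Made-up toy vectors for illustration only (real models use ~300 dims).
VECTORS = {
    "understand": [0.9, 0.1, 0.0],
    "capture": [0.7, 0.3, 0.1],
    "customer": [0.1, 0.8, 0.2],
    "business": [0.2, 0.7, 0.3],
    "needs": [0.0, 0.2, 0.9],
    "requirements": [0.1, 0.3, 0.8],
}


def phrase_vector(phrase):
    """Average the vectors of known (lower-cased) words; None if none known."""
    vecs = [VECTORS[w] for w in phrase.lower().split() if w in VECTORS]
    if not vecs:
        return None
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compare_phrases_sketch(p1, p2):
    v1, v2 = phrase_vector(p1), phrase_vector(p2)
    if v1 is None or v2 is None:
        return 0.0  # no known words: treat as unrelated
    return cosine(v1, v2)


print(round(compare_phrases_sketch("Understand customer needs",
                                   "Capture business requirements"), 3))
```

Identical phrases score 1.0, and the score is symmetric in its two arguments.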
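Example 2: build_similarity_matrix returns a two-dimensional matrix of pairwise similarity scores between two lists of phrases. In the output below, row i, column j holds the similarity between phrases_2[i] and phrases_1[j].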
phrases_1 = [
'Communication with stakeholders',
'Understand customer needs',
'Experience shaping the BI strategy',
'shaping the BI strategy',
'Delivery of platform Analytics function',
]
phrases_2 = [
'Extensive delivery of platform within a Business Intelligence and Analytics function',
'shaping the BI strategy',
'Experience shaping the BI strategy from C-Level to Technical developers',
'Communication with stakeholders on all levels',
'Capture business requirements',
'Play computer games',
]
similarity = comparator.build_similarity_matrix(phrases_1, phrases_2)
print(similarity)
Output:
[[-0.03689054 0.0372301 0.17840812 0.09079809 0.65748763]
[ 0.18079719 0.12055688 0.77624094 1. 0.22749564]
[ 0.08472343 0.11505745 0.7030021 0.48876476 0.13252231]
[ 0.7132235 0.07449755 0.178031 0.15712512 0.0676512 ]
[ 0.11637229 0.38569745 0.23005028 0.25646406 0.26493344]
[ 0.17955953 0.15243992 0.11233422 0.16087453 0.03144675]]
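The orientation of the printed matrix (six rows for the six entries of phrases_2, five columns for the five entries of phrases_1, with 1.0 where "shaping the BI strategy" meets itself) can be reproduced with a nested list comprehension. This minimal sketch takes any similarity function; the word-overlap (Jaccard) score used here is a stand-in, not the library's semantic score, and both function names are hypothetical.

```python
def build_similarity_matrix_sketch(phrases_1, phrases_2, similarity):
    """Row i, column j = similarity(phrases_2[i], phrases_1[j])."""
    return [[similarity(p2, p1) for p1 in phrases_1] for p2 in phrases_2]


def jaccard(a, b):
    """Stand-in similarity: share of words the two phrases have in common."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


phrases_1 = ["shaping the BI strategy", "Communication with stakeholders"]
phrases_2 = [
    "Play computer games",
    "shaping the BI strategy",
    "Communication with stakeholders on all levels",
]
matrix = build_similarity_matrix_sketch(phrases_1, phrases_2, jaccard)
for row in matrix:
    print(row)
# [0.0, 0.0]
# [1.0, 0.0]
# [0.0, 0.5]
```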
Bucket sorting
When comparing two documents, bucket_sorting shows which phrases are present in both documents and which appear in only one of them.
phrases_1 = [
'Communication with stakeholders',
'Understand customer needs',
'Experience shaping the BI strategy',
'shaping the BI strategy',
'Delivery of platform Analytics function',
]
phrases_2 = [
'Extensive delivery of platform within a Business Intelligence and Analytics function',
'shaping the BI strategy',
'Experience shaping the BI strategy from C-Level to Technical developers',
'Communication with stakeholders on all levels',
'Capture business requirements',
'Play computer games',
]
# Similarity matrix of the two phrase lists
similarity = comparator.build_similarity_matrix(phrases_1, phrases_2)
# cut_off: minimum similarity score two phrases must exceed to be considered similar (default 0.3)
in_both, in_doc1, in_doc2 = comparator.bucket_sorting(
    phrases_1, phrases_2, similarity, cut_off=0.5)
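One plausible reading of bucket sorting is sketched below, applied to the similarity matrix printed in Example 2. The rule is an assumption, not confirmed by the library: a phrase from phrases_1 goes into "in both" (paired with its best match) if its highest score exceeds cut_off, and a phrase from phrases_2 is "only in doc 2" if none of its scores exceed cut_off. bucket_sorting_sketch is a hypothetical name.

```python
phrases_1 = [
    "Communication with stakeholders",
    "Understand customer needs",
    "Experience shaping the BI strategy",
    "shaping the BI strategy",
    "Delivery of platform Analytics function",
]
phrases_2 = [
    "Extensive delivery of platform within a Business Intelligence and Analytics function",
    "shaping the BI strategy",
    "Experience shaping the BI strategy from C-Level to Technical developers",
    "Communication with stakeholders on all levels",
    "Capture business requirements",
    "Play computer games",
]
# Matrix printed in Example 2: row i = phrases_2[i], column j = phrases_1[j].
MATRIX = [
    [-0.03689054, 0.0372301, 0.17840812, 0.09079809, 0.65748763],
    [0.18079719, 0.12055688, 0.77624094, 1.0, 0.22749564],
    [0.08472343, 0.11505745, 0.7030021, 0.48876476, 0.13252231],
    [0.7132235, 0.07449755, 0.178031, 0.15712512, 0.0676512],
    [0.11637229, 0.38569745, 0.23005028, 0.25646406, 0.26493344],
    [0.17955953, 0.15243992, 0.11233422, 0.16087453, 0.03144675],
]


def bucket_sorting_sketch(phrases_1, phrases_2, matrix, cut_off=0.3):
    in_both, only_1 = [], []
    for j, p1 in enumerate(phrases_1):
        column = [row[j] for row in matrix]  # scores of p1 vs every phrases_2 entry
        best = max(range(len(column)), key=column.__getitem__, default=None)
        if best is not None and column[best] > cut_off:
            in_both.append((p1, phrases_2[best]))
        else:
            only_1.append(p1)
    # A phrases_2 entry is unmatched if no score in its row exceeds cut_off.
    only_2 = [p2 for p2, row in zip(phrases_2, matrix)
              if max(row, default=0.0) <= cut_off]
    return in_both, only_1, only_2


in_both, in_doc1, in_doc2 = bucket_sorting_sketch(
    phrases_1, phrases_2, MATRIX, cut_off=0.5)
print(in_doc1)  # ['Understand customer needs']
print(in_doc2)  # ['Capture business requirements', 'Play computer games']
```

These are the outputs of this sketch's rule on the printed scores; the library's own buckets may be structured differently.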
Transform phrases
transform_phrase returns every intermediate step of transforming one phrase into another. Example:
print(comparator.transform_phrase(
'Understand customer needs',
'Capture business requirements',
))
Output:
["Understand customer needs", "Capture customer needs", "Capture business requirements"]
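The output above is consistent with replacing one segment at a time, left to right: first the verb, then the whole object. The minimal sketch below reproduces that behaviour under an explicit assumption: each phrase is split into a verb (the first word) and an object (the remaining words). The library's real segmentation is not documented here, and transform_phrase_sketch is a hypothetical name.

```python
def transform_phrase_sketch(source, target):
    """Swap one segment (verb, then object) at a time, recording each step."""

    def segments(phrase):
        # Assumption: first word = verb, rest = object (may be empty).
        verb, _, obj = phrase.partition(" ")
        return [verb, obj]

    current, goal = segments(source), segments(target)
    steps = [" ".join(s for s in current if s)]
    for k in range(len(goal)):
        if current[k] != goal[k]:
            current[k] = goal[k]
            steps.append(" ".join(s for s in current if s))
    return steps


print(transform_phrase_sketch("Understand customer needs",
                              "Capture business requirements"))
# ['Understand customer needs', 'Capture customer needs', 'Capture business requirements']
```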
Download files
Source Distribution
File details
Details for the file semantic_compare-0.9.0.tar.gz.
File metadata
- Download URL: semantic_compare-0.9.0.tar.gz
- Upload date:
- Size: 6.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 29b590136073fa2d0dca1c3a35bc181587bdb894b1fc8b5a9143cde580f4dedd |
| MD5 | c5fb4f7d59c07baaaee38e92e2a42b82 |
| BLAKE2b-256 | 02e37c8b9f4bd8f9e98e61cfd80028a0abbea661a52276851b95faa352edc3f2 |