NLP library that extracts, compares, transforms, and bucket-sorts phrases.

python-semantic-compare

Extracts, compares, transforms, and bucket-sorts phrases.

Installation

Install the library

$ pip install semantic-compare

The library requires a spaCy model for natural language processing. To use English, run:

$ python -m spacy download en_core_web_lg
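
Before constructing a comparator, it can help to check that the model is actually installed. The following is a minimal sketch using only the standard library. It relies on the fact that models fetched with spacy download are installed as regular Python packages. The helper name model_installed is hypothetical and is not part of semantic-compare.

```python
import importlib.util


def model_installed(name):
    """Return True if a package called `name` is importable (hypothetical helper)."""
    return importlib.util.find_spec(name) is not None


# en_core_web_lg is the model name used in the command above.
if not model_installed("en_core_web_lg"):
    print("Model missing, run: python -m spacy download en_core_web_lg")
```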

Usage

Extract phrases

Simple Usage

from semantic_compare import SemanticComparator as sc
comparator = sc(sentencizer=True)
phrases = comparator.extract_phrases("Create, promote and develop a business.")

Output:

['Create a business','promote a business','develop a business']

sentencizer enables a sentence splitter that breaks text on punctuation (period, question mark, exclamation mark).
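
To illustrate what splitting on punctuation means, here is a rough standard-library sketch. It is not the library's or spaCy's implementation, only an approximation of the idea; the function name split_sentences is hypothetical.

```python
import re


def split_sentences(text):
    """Split text after '.', '?' or '!' when followed by whitespace (illustrative only)."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [part for part in parts if part]


sentences = split_sentences("Create a business. Will it grow? Promote it!")
print(sentences)
```

A real sentencizer works on tokens rather than raw strings, so it can handle cases such as abbreviations that this regular expression would split incorrectly.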

Advanced Usage

from semantic_compare import SemanticComparator as sc

# Sentence splitter
def our_sentencizer(doc):
    """
    Sentence splitter that starts a new sentence at bullet characters and
    after sentence-ending punctuation or a newline, when the next token
    is title-cased.
    """
    sentence_enders = (".", "...", "?", "!", "\n")
    for i, token in enumerate(doc[:-2]):
        if "•" in token.text:
            # A bullet always opens a new sentence.
            doc[i].is_sent_start = True
        elif token.text in sentence_enders and doc[i + 1].is_title:
            # Punctuation or newline followed by a capitalized token.
            doc[i + 1].is_sent_start = True
        else:
            doc[i + 1].is_sent_start = False
    return doc


# Load the small English spaCy model (any spaCy model can be used)
comparator = sc(spacy_model='en_core_web_sm')
    
# Add a custom pipe for text preprocessing
comparator.add_custom_pipe(our_sentencizer, before='parser')

phrases = comparator.extract_phrases('''
Must Have:
* Experience shaping the BI strategy from C-Level to Technical developers.
* Extensive delivery of platform within a Business Intelligence and Analytics function.
* Communication with stakeholders on all levels.
''')
print('\n'.join(phrases))

With add_custom_pipe you can add your own component to the spaCy text-processing pipeline.

Compare phrases (Semantic similarity)

Get the similarity between two phrases. Example 1:

phrase1 = 'Understand customer needs'
phrase2 = 'Capture business requirements'
similarity = comparator.compare_phrases(phrase1, phrase2)
print(similarity)

Output:

0.38569751
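
The source does not say how the score is computed. One plausible reading, stated here as an assumption, is a cosine similarity between phrase vectors, which is what spaCy's own similarity methods use by default. The sketch below shows that measure on toy vectors; cosine_similarity is a hypothetical helper and the vectors are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])                # 0
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])                 # -1
print(same_direction, orthogonal, opposite)
```

Under this assumption, scores range from -1 to 1, and a value near 1 means the two phrases point in nearly the same direction in vector space.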

Example 2: Get a two-dimensional matrix of pairwise similarities between two lists of phrases.

phrases_1 = [
    'Communication with stakeholders',
    'Understand customer needs',
    'Experience shaping the BI strategy',
    'shaping the BI strategy',
    'Delivery of platform Analytics function',
]

phrases_2 = [
    'Extensive delivery of platform within a Business Intelligence and Analytics function',
    'shaping the BI strategy',
    'Experience shaping the BI strategy from C-Level to Technical developers',
    'Communication with stakeholders on all levels',
    'Capture business requirements',
    'Play computer games',
]
similarity = comparator.build_similarity_matrix(phrases_1, phrases_2)
print(similarity)

Output (each row corresponds to a phrase in phrases_2, each column to a phrase in phrases_1):

[[-0.03689054  0.0372301   0.17840812  0.09079809  0.65748763]
 [ 0.18079719  0.12055688  0.77624094  1.          0.22749564]
 [ 0.08472343  0.11505745  0.7030021   0.48876476  0.13252231]
 [ 0.7132235   0.07449755  0.178031    0.15712512  0.0676512 ]
 [ 0.11637229  0.38569745  0.23005028  0.25646406  0.26493344]
 [ 0.17955953  0.15243992  0.11233422  0.16087453  0.03144675]]
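
One way to read this matrix is to look up the best-scoring phrase from phrases_1 for each phrase in phrases_2. The sketch below copies the values printed above into a plain list of lists (an assumption made so the example runs without the library) and treats rows as phrases_2 and columns as phrases_1, which matches the 1. entry pairing the two identical phrases.

```python
phrases_1 = [
    'Communication with stakeholders',
    'Understand customer needs',
    'Experience shaping the BI strategy',
    'shaping the BI strategy',
    'Delivery of platform Analytics function',
]
phrases_2 = [
    'Extensive delivery of platform within a Business Intelligence and Analytics function',
    'shaping the BI strategy',
    'Experience shaping the BI strategy from C-Level to Technical developers',
    'Communication with stakeholders on all levels',
    'Capture business requirements',
    'Play computer games',
]
# Values copied from the printed output; rows follow phrases_2, columns follow phrases_1.
matrix = [
    [-0.03689054, 0.0372301, 0.17840812, 0.09079809, 0.65748763],
    [0.18079719, 0.12055688, 0.77624094, 1.0, 0.22749564],
    [0.08472343, 0.11505745, 0.7030021, 0.48876476, 0.13252231],
    [0.7132235, 0.07449755, 0.178031, 0.15712512, 0.0676512],
    [0.11637229, 0.38569745, 0.23005028, 0.25646406, 0.26493344],
    [0.17955953, 0.15243992, 0.11233422, 0.16087453, 0.03144675],
]

best_matches = []
for phrase, row in zip(phrases_2, matrix):
    # Index of the highest score in this row.
    best_col = max(range(len(row)), key=lambda j: row[j])
    best_matches.append((phrase, phrases_1[best_col], row[best_col]))

for phrase, match, score in best_matches:
    print(f"{score:.2f}  {phrase!r} -> {match!r}")
```

A best match is not necessarily a good match: 'Play computer games' still gets a nearest neighbour, but with a score below 0.2.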

Bucket sorting

When you compare two documents, you can see which phrases are present in both and which appear in only one of them.

phrases_1 = [
    'Communication with stakeholders',
    'Understand customer needs',
    'Experience shaping the BI strategy',
    'shaping the BI strategy',
    'Delivery of platform Analytics function',
]

phrases_2 = [
    'Extensive delivery of platform within a Business Intelligence and Analytics function',
    'shaping the BI strategy',
    'Experience shaping the BI strategy from C-Level to Technical developers',
    'Communication with stakeholders on all levels',
    'Capture business requirements',
    'Play computer games',
]
# cut_off: similarity threshold above which two phrases are considered similar (default=0.3)
in_both, in_doc1, in_doc2 = comparator.bucket_sorting(
    phrases_1, phrases_2, similarity, cut_off=0.5)
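
The source does not show what bucket_sorting returns or how it decides. As a rough sketch of the idea, assuming a pair counts as shared when its score exceeds cut_off, the three buckets could be derived from the similarity matrix as follows. The function bucket_sort_sketch and its return shapes are hypothetical; the matrix values are copied from the output shown earlier, with rows for phrases_2 and columns for phrases_1.

```python
def bucket_sort_sketch(phrases_1, phrases_2, matrix, cut_off=0.3):
    """Split phrases into matched pairs and leftovers (illustrative, not the library's code)."""
    in_both = [
        (phrases_1[j], phrases_2[i])
        for i, row in enumerate(matrix)
        for j, score in enumerate(row)
        if score > cut_off
    ]
    matched_1 = {p1 for p1, _ in in_both}
    matched_2 = {p2 for _, p2 in in_both}
    only_1 = [p for p in phrases_1 if p not in matched_1]
    only_2 = [p for p in phrases_2 if p not in matched_2]
    return in_both, only_1, only_2


phrases_1 = [
    'Communication with stakeholders',
    'Understand customer needs',
    'Experience shaping the BI strategy',
    'shaping the BI strategy',
    'Delivery of platform Analytics function',
]
phrases_2 = [
    'Extensive delivery of platform within a Business Intelligence and Analytics function',
    'shaping the BI strategy',
    'Experience shaping the BI strategy from C-Level to Technical developers',
    'Communication with stakeholders on all levels',
    'Capture business requirements',
    'Play computer games',
]
matrix = [
    [-0.03689054, 0.0372301, 0.17840812, 0.09079809, 0.65748763],
    [0.18079719, 0.12055688, 0.77624094, 1.0, 0.22749564],
    [0.08472343, 0.11505745, 0.7030021, 0.48876476, 0.13252231],
    [0.7132235, 0.07449755, 0.178031, 0.15712512, 0.0676512],
    [0.11637229, 0.38569745, 0.23005028, 0.25646406, 0.26493344],
    [0.17955953, 0.15243992, 0.11233422, 0.16087453, 0.03144675],
]

pairs, only_doc1, only_doc2 = bucket_sort_sketch(phrases_1, phrases_2, matrix, cut_off=0.5)
print(only_doc1)  # phrases from document 1 with no match above 0.5
print(only_doc2)  # phrases from document 2 with no match above 0.5
```

In this sketch the threshold matters: at 0.5, 'Understand customer needs' and 'Capture business requirements' (score 0.3857) land in separate buckets, while at 0.3 they would count as a shared pair.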

Transform phrases

Get all steps of transformation from one phrase to another. Example:

print(comparator.transform_phrase(
    'Understand customer needs',
    'Capture business requirements',
))

Output:

["Understand customer needs", "Capture customer needs", "Capture business requirements"]
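
The output above changes one part of the phrase per step. A minimal sketch of that idea follows, assuming both phrases have already been split into aligned chunks; how the library finds and aligns the chunks is not stated in the source, and transformation_steps is a hypothetical helper.

```python
def transformation_steps(source_chunks, target_chunks):
    """Replace source chunks with target chunks one at a time, recording each phrase."""
    current = list(source_chunks)
    steps = [" ".join(current)]
    for i, target in enumerate(target_chunks[:len(current)]):
        if current[i] != target:
            current[i] = target
            steps.append(" ".join(current))
    return steps


steps = transformation_steps(
    ["Understand", "customer needs"],
    ["Capture", "business requirements"],
)
print(steps)
```

With these hand-picked chunks the sketch reproduces the three phrases shown in the output above.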
