NLP library that extracts, compares, transforms, and bucket-sorts phrases.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

python-semantic-compare

Extracts, compares, transforms, and bucket-sorts phrases.

Installation

The project requires a spaCy model for natural language processing. To use English, download a model with this command:

$ python -m spacy download en_core_web_lg
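spaCy models are installed as regular Python packages, so you can check whether one is available before constructing a comparator. The sketch below uses only the standard library; the helper name model_is_installed is illustrative and is not part of semantic_compare.

```python
import importlib.util


def model_is_installed(model_name):
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(model_name) is not None


if __name__ == "__main__":
    # Report the status of the two models mentioned in this README.
    for name in ("en_core_web_lg", "en_core_web_sm"):
        status = "installed" if model_is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If a model is reported as missing, run the download command above with that model's name.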

Usage

Extract phrases

Simple Usage

from semantic_compare import SemanticComparator as sc
comparator = sc(sentencizer=True)
phrases = comparator.extract_phrases("Create, promote and develop a business.")

Output:

['Create a business', 'promote a business', 'develop a business']

sentencizer: enables a sentence splitter that breaks text on punctuation (period, question mark, exclamation mark).
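To illustrate the idea of splitting on punctuation, here is a minimal standard-library sketch. It is not the library's or spaCy's implementation; split_sentences is a hypothetical helper, and the rule is deliberately naive (it would also split after abbreviations such as "e.g.").

```python
import re

# A boundary is any whitespace that follows a period, question mark,
# or exclamation mark.
_BOUNDARY = re.compile(r"(?<=[.?!])\s+")


def split_sentences(text):
    """Split text into sentences on ., ? and ! (naive illustration)."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


print(split_sentences("Create a business. Promote it! Will it grow?"))
# ['Create a business.', 'Promote it!', 'Will it grow?']
```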

Advanced Usage

from semantic_compare import SemanticComparator as sc

# Sentence splitter
def our_sentencizer(doc):
    """
    Custom sentence splitter: starts a new sentence at bullet characters
    and after sentence-ending punctuation or a newline that is followed
    by a title-cased token.
    """
    sentence_enders = {".", "...", "?", "!", "\n"}
    for i, token in enumerate(doc[:-2]):
        if "•" in token.text:
            # A bullet always opens a new sentence
            doc[i].is_sent_start = True
        elif token.text in sentence_enders and doc[i + 1].is_title:
            doc[i + 1].is_sent_start = True
        else:
            doc[i + 1].is_sent_start = False
    return doc


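# Create the comparator with entity merging disabled and the small English model
# (this model must be downloaded separately: python -m spacy download en_core_web_sm)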
comparator = sc(merge_entities=False, spacy_model='en_core_web_sm')
    
# Add a custom pipe for text preprocessing
comparator.add_custom_pipe(our_sentencizer, before='parser')

phrases = comparator.extract_phrases('''
Must Have:
* Experience shaping the BI strategy from C-Level to Technical developers.
* Extensive delivery of platform within a Business Intelligence and Analytics function.
* Communication with stakeholders on all levels.
''')
print('\n'.join(phrases))

add_custom_pipe adds your own component to the spaCy pipeline for text preprocessing; in this example, our_sentencizer runs before the parser.
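To see which tokens the rule in our_sentencizer marks as sentence starts, the same logic can be traced on a plain list of token strings, without spaCy. This is only an approximation: mark_sentence_starts is a hypothetical helper, Python's str.istitle stands in for spaCy's is_title, and None marks tokens the rule never touches (spaCy would leave those for later pipeline components to decide).

```python
SENTENCE_ENDERS = {".", "...", "?", "!", "\n"}


def mark_sentence_starts(tokens):
    """Return one flag per token: True, False, or None (left undecided)."""
    starts = [None] * len(tokens)
    for i, text in enumerate(tokens[:-2]):
        if "•" in text:
            # A bullet always opens a new sentence.
            starts[i] = True
        elif text in SENTENCE_ENDERS and tokens[i + 1].istitle():
            # Punctuation followed by a title-cased token: new sentence.
            starts[i + 1] = True
        else:
            starts[i + 1] = False
    return starts


tokens = ["•", "Create", "a", "plan", ".", "Then", "ship", "it", "."]
for token, flag in zip(tokens, mark_sentence_starts(tokens)):
    print(repr(token), flag)
```

In this trace, the bullet and "Then" are flagged as sentence starts, while the token right after the bullet and the final token stay undecided.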

Compare phrases (Semantic similarity)

Get the similarity between two phrases. Example 1:

phrase1 = 'Understand customer needs'
phrase2 = 'Capture business requirements'
similarity = comparator.compare_phrases(phrase1, phrase2)
print(similarity)

Output:

0.38569751
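Scores like this are commonly computed as the cosine similarity between phrase vectors. Whether compare_phrases does exactly this is an assumption; the sketch below only shows the metric itself on toy vectors, and cosine_similarity is an illustrative name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors: 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite vectors: -1.0
```

Cosine similarity can be slightly negative for unrelated inputs, which is consistent with the small negative entries that can appear in a similarity matrix.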

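Example 2: Build a two-dimensional matrix of pairwise similarities between two lists of phrases. In the output, each row corresponds to a phrase in phrases_2 and each column to a phrase in phrases_1.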

phrases_1 = [
    'Communication with stakeholders',
    'Understand customer needs',
    'Experience shaping the BI strategy',
    'shaping the BI strategy',
    'Delivery of platform Analytics function',
]

phrases_2 = [
    'Extensive delivery of platform within a Business Intelligence and Analytics function',
    'shaping the BI strategy',
    'Experience shaping the BI strategy from C-Level to Technical developers',
    'Communication with stakeholders on all levels',
    'Capture business requirements',
    'Play computer games',
]
similarity = comparator.build_similarity_matrix(phrases_1, phrases_2)
print(similarity)

Output:

[[-0.03689054  0.0372301   0.17840812  0.09079809  0.65748763]
 [ 0.18079719  0.12055688  0.77624094  1.          0.22749564]
 [ 0.08472343  0.11505745  0.7030021   0.48876476  0.13252231]
 [ 0.7132235   0.07449755  0.178031    0.15712512  0.0676512 ]
 [ 0.11637229  0.38569745  0.23005028  0.25646406  0.26493344]
 [ 0.17955953  0.15243992  0.11233422  0.16087453  0.03144675]]
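The layout of such a matrix (one row per phrase in the second list, one column per phrase in the first) can be reproduced with any pairwise scorer. The sketch below uses word-overlap (Jaccard) similarity as a stand-in, which is not what the library uses; similarity_matrix and jaccard are hypothetical names.

```python
def jaccard(a, b):
    """Share of distinct lowercase words that two phrases have in common."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def similarity_matrix(phrases_1, phrases_2, score=jaccard):
    """Rows follow phrases_2, columns follow phrases_1."""
    return [[score(p2, p1) for p1 in phrases_1] for p2 in phrases_2]


matrix = similarity_matrix(
    ["shaping the BI strategy", "Understand customer needs"],
    ["shaping the BI strategy", "Play computer games", "Capture customer needs"],
)
for row in matrix:
    print(row)
# [1.0, 0.0]
# [0.0, 0.0]
# [0.0, 0.5]
```

Identical phrases score 1.0, just as the two 'shaping the BI strategy' entries do in the output above.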

Bucket sorting

When you compare two documents, you can see which phrases are present in both and which appear in only one of them.

phrases_1 = [
    'Communication with stakeholders',
    'Understand customer needs',
    'Experience shaping the BI strategy',
    'shaping the BI strategy',
    'Delivery of platform Analytics function',
]

phrases_2 = [
    'Extensive delivery of platform within a Business Intelligence and Analytics function',
    'shaping the BI strategy',
    'Experience shaping the BI strategy from C-Level to Technical developers',
    'Communication with stakeholders on all levels',
    'Capture business requirements',
    'Play computer games',
]
# cut_off: similarity threshold; two phrases count as similar only if their score is above it (default=0.3)
in_both, in_doc1, in_doc2 = comparator.bucket_sorting(
    phrases_1, phrases_2, similarity, cut_off=0.5)
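The exact structure of the three returned buckets is not documented here. As a rough sketch of the idea, the hypothetical bucket_sort below treats a phrase as shared when at least one of its scores exceeds cut_off. It assumes the matrix has one row per phrase in phrases_2 and one column per phrase in phrases_1, and it uses the similarity values from the matrix example, rounded to two decimals.

```python
def bucket_sort(phrases_1, phrases_2, matrix, cut_off=0.3):
    """matrix[j][i] is the similarity of phrases_2[j] and phrases_1[i]."""
    matched_1 = {i for row in matrix for i, s in enumerate(row) if s > cut_off}
    matched_2 = {j for j, row in enumerate(matrix) if any(s > cut_off for s in row)}
    in_both = [p for i, p in enumerate(phrases_1) if i in matched_1]
    only_1 = [p for i, p in enumerate(phrases_1) if i not in matched_1]
    only_2 = [p for j, p in enumerate(phrases_2) if j not in matched_2]
    return in_both, only_1, only_2


doc1 = [
    "Communication with stakeholders",
    "Understand customer needs",
    "Experience shaping the BI strategy",
    "shaping the BI strategy",
    "Delivery of platform Analytics function",
]
doc2 = [
    "Extensive delivery of platform within a Business Intelligence and Analytics function",
    "shaping the BI strategy",
    "Experience shaping the BI strategy from C-Level to Technical developers",
    "Communication with stakeholders on all levels",
    "Capture business requirements",
    "Play computer games",
]
scores = [
    [-0.04, 0.04, 0.18, 0.09, 0.66],
    [0.18, 0.12, 0.78, 1.00, 0.23],
    [0.08, 0.12, 0.70, 0.49, 0.13],
    [0.71, 0.07, 0.18, 0.16, 0.07],
    [0.12, 0.39, 0.23, 0.26, 0.26],
    [0.18, 0.15, 0.11, 0.16, 0.03],
]

in_both, only_1, only_2 = bucket_sort(doc1, doc2, scores, cut_off=0.5)
print(only_1)  # ['Understand customer needs']
print(only_2)  # ['Capture business requirements', 'Play computer games']
```

With cut_off=0.5, 'Understand customer needs' and 'Capture business requirements' end up in separate buckets because their score of about 0.39 is below the threshold; with the default of 0.3 they would count as similar.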

Transform phrases

Get all steps of transformation from one phrase to another. Example:

print(comparator.transform_phrase(
    'Understand customer needs',
    'Capture business requirements',
))

Output:

["Understand customer needs", "Capture customer needs", "Capture business requirements"]
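The output above changes one part of the phrase per step. A minimal sketch of that stepwise replacement follows, assuming the two phrases have already been split into aligned chunks of equal count. How the library derives and aligns chunks is not documented here, so the chunk lists are supplied by hand and transformation_steps is a hypothetical helper.

```python
def transformation_steps(source_chunks, target_chunks):
    """Replace source chunks with target chunks one position at a time."""
    if len(source_chunks) != len(target_chunks):
        raise ValueError("chunk lists must have the same length")
    current = list(source_chunks)
    steps = [" ".join(current)]
    for i, target in enumerate(target_chunks):
        if current[i] != target:
            current[i] = target
            steps.append(" ".join(current))
    return steps


print(transformation_steps(
    ["Understand", "customer needs"],
    ["Capture", "business requirements"],
))
# ['Understand customer needs', 'Capture customer needs', 'Capture business requirements']
```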

Release history

0.9.1 (Mar 16, 2020)

0.9.0 (Mar 16, 2020), this version

Download files

Source Distribution

semantic_compare-0.9.0.tar.gz (6.1 kB)

Uploaded Mar 16, 2020 Source

File details

Details for the file semantic_compare-0.9.0.tar.gz.

File metadata

Download URL: semantic_compare-0.9.0.tar.gz
Upload date: Mar 16, 2020
Size: 6.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.4

File hashes

Hashes for semantic_compare-0.9.0.tar.gz
Algorithm	Hash digest
SHA256	`29b590136073fa2d0dca1c3a35bc181587bdb894b1fc8b5a9143cde580f4dedd`
MD5	`c5fb4f7d59c07baaaee38e92e2a42b82`
BLAKE2b-256	`02e37c8b9f4bd8f9e98e61cfd80028a0abbea661a52276851b95faa352edc3f2`
