fuzzy-sentences-clustering · PyPI

Clustering similar sentences based on their fuzzy similarity.

Project description

For word stemming, the package uses the Snowball stemmers from the NLTK library, so the following languages are supported:

Arabic
Danish
Dutch
English
Finnish
French
German
Hungarian
Italian
Norwegian
Portuguese
Romanian
Russian
Spanish
Swedish

Purpose of the Package

Popular topic-mining algorithms such as LDA (Latent Dirichlet Allocation) do not work very well on small datasets, for example a set of about a thousand sentences.

This package tries to handle such small datasets by making the following naive assumption:

If, after removing all stopwords and reducing every word to its stem, two sentences become similar, they are probably about the same or a similar subject.
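
A minimal sketch of this normalization step, written in plain Python so it runs without extra dependencies. The names STOPWORDS, crude_stem and normalize are hypothetical and are not part of the package's API. The tiny stopword set and the crude suffix stripper are assumptions standing in for NLTK's stopword lists and Snowball stemmers, which do considerably more.

```python
import re

# Hypothetical, deliberately tiny English stopword set (assumption); the
# package presumably relies on a full per-language list instead.
STOPWORDS = {"a", "have", "i", "in", "the", "there", "to", "would"}

# Stand-in for NLTK's Snowball stemmer (assumption): strips one common
# English suffix, nothing more.
SUFFIXES = ("ing", "ed", "s")


def crude_stem(word: str) -> str:
    """Remove the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(sentence: str) -> str:
    """Lowercase, split into word tokens, drop stopwords, stem the rest."""
    tokens = re.findall(r"\w+", sentence.lower())
    return " ".join(crude_stem(t) for t in tokens if t not in STOPWORDS)


print(normalize("I have a dog"))     # dog
print(normalize("I have two dogs"))  # two dog
```

After normalization, "I have a dog" and "I have two dogs" differ only by the word "two", which is the kind of closeness the assumption above relies on.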

The goal is to form clusters of at least two similar sentences. An isolated sentence (one that does not resemble any other sentence in the set) does not get a cluster of its own; instead, it receives the label -1.
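
One way to obtain labels with this behavior is sketched below: treat every pair of sentences whose score reaches the threshold as linked, find connected components with a union-find, number components with two or more members 1, 2, ... in order of first appearance, and give every singleton -1. This is an assumption about how such labels could be produced, not a description of how look_for_clusters works internally. label_clusters and token_sort_score are hypothetical names, and the score uses Python's difflib rather than the fuzzy-matching library the package relies on.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def token_sort_score(a: str, b: str) -> int:
    """Hypothetical 0-100 score: difflib ratio of the token-sorted strings."""
    sa = " ".join(sorted(a.lower().split()))
    sb = " ".join(sorted(b.lower().split()))
    return round(100 * SequenceMatcher(None, sa, sb).ratio())


def label_clusters(
    sentences: List[str],
    threshold: int,
    similarity: Callable[[str, str], int] = token_sort_score,
) -> List[int]:
    """Link pairs scoring >= threshold; label groups 1, 2, ... and singletons -1."""
    parent = list(range(len(sentences)))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once and merge the components of similar sentences.
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if similarity(sentences[i], sentences[j]) >= threshold:
                parent[find(j)] = find(i)

    roots = [find(i) for i in range(len(sentences))]
    sizes = Counter(roots)
    cluster_ids: Dict[int, int] = {}
    labels: List[int] = []
    for root in roots:
        if sizes[root] < 2:
            labels.append(-1)  # isolated sentence: no cluster of its own
        else:
            # Number clusters in order of their first member.
            labels.append(cluster_ids.setdefault(root, len(cluster_ids) + 1))
    return labels


print(label_clusters(["x y z", "a b c", "c b a", "z y x", "q"], threshold=90))
# [1, 2, 2, 1, -1]
```

The toy input only illustrates the labeling scheme; the pairwise comparison is quadratic in the number of sentences, which is acceptable for the small datasets this package targets.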

Usage

You can choose one of several methods to measure the similarity between sentences:

ratio
partial_ratio
token_sort_ratio (default)
token_set_ratio

Pass the chosen name as the method argument, as in the German example below. To learn more about these methods, click here.
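
To get a feel for how the methods differ, here is a rough approximation of three of them built on Python's difflib. It is a sketch under assumptions, not the implementation the package uses: real fuzzy-matching libraries compute similar but not always identical scores, the preprocessing (lowercasing and replacing non-alphanumeric characters with spaces) is assumed, and partial_ratio is omitted.

```python
import re
from difflib import SequenceMatcher
from typing import List


def _tokens(s: str) -> List[str]:
    """Lowercase and split on non-alphanumerics (assumed preprocessing)."""
    return re.sub(r"[\W_]+", " ", s.lower()).split()


def ratio(a: str, b: str) -> int:
    """Character-level similarity of the two strings, scaled to 0-100."""
    if not a or not b:
        return 0
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_sort_ratio(a: str, b: str) -> int:
    """Sort each side's tokens alphabetically first, so word order is ignored."""
    return ratio(" ".join(sorted(_tokens(a))), " ".join(sorted(_tokens(b))))


def token_set_ratio(a: str, b: str) -> int:
    """Compare shared tokens with each side's full set, so extra words matter less."""
    ta, tb = set(_tokens(a)), set(_tokens(b))
    common = " ".join(sorted(ta & tb))
    with_a = " ".join([common, *sorted(ta - tb)]).strip()
    with_b = " ".join([common, *sorted(tb - ta)]).strip()
    return max(ratio(common, with_a), ratio(common, with_b), ratio(with_a, with_b))


a, b = "I want to buy a car", "a car I want to buy"
print(ratio(a, b), token_sort_ratio(a, b))  # word order lowers ratio only
print(token_set_ratio("New York", "Ohh New York, I lived there in 2005"))  # 100
```

token_set_ratio scores 100 here because every token of the shorter sentence also appears in the longer one, which is why it tolerates extra words better than token_sort_ratio.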

>>> from fuzzy_sentences_clustering import look_for_clusters
>>> eng_sentences = [
...     "I live in New York",
...     "I want to buy a car",
...     "a car I would like to buy",
...     "Ohh New York, I lived there in 2005",
...     "I have a dog",
... ]
>>> ger_sentences = [
...     "ich lebe in New York",
...     "Ich möchte ein Auto kaufen",
...     "ein Auto, das ich kaufen möchte",
...     "Oh New York, Ich habe dort 2005 gelebt",
...     "Ich habe einen Hund",
... ]
>>> look_for_clusters(eng_sentences, similarity_threshold=60)
[1, 2, 2, 1, -1]
>>> look_for_clusters(ger_sentences, language="german", method="token_set_ratio", similarity_threshold=80)
[1, 2, 2, 1, -1]
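
The result holds one label per input sentence, in input order. A small helper like the hypothetical group_by_label below (not part of the package) turns those labels into groups of sentences; the labels are copied from the English example above rather than computed here.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_label(sentences: List[str], labels: List[int]) -> Dict[int, List[str]]:
    """Collect sentences under their cluster label; -1 holds the isolated ones."""
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    groups: Dict[int, List[str]] = defaultdict(list)
    for sentence, label in zip(sentences, labels):
        groups[label].append(sentence)
    return dict(groups)


eng_sentences = [
    "I live in New York",
    "I want to buy a car",
    "a car I would like to buy",
    "Ohh New York, I lived there in 2005",
    "I have a dog",
]
labels = [1, 2, 2, 1, -1]  # copied from the example output, not computed
for label, members in group_by_label(eng_sentences, labels).items():
    print(label, members)
```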

Contribution

Contributions are welcome.

If you find a bug, please let me know.

Author

Cloves Paiva.

Project details

Release history

1.1.2 (this version): Jul 28, 2022
1.1.1: Jul 28, 2022
0.0.1: Jul 18, 2022
0.0.0: Jul 18, 2022

Download files

Source distribution: fuzzy-sentences-clustering-1.1.2.tar.gz (4.1 kB), uploaded Jul 28, 2022

Built distribution: fuzzy_sentences_clustering-1.1.2-py3-none-any.whl (4.2 kB, Python 3), uploaded Jul 28, 2022

File details

Details for the file fuzzy-sentences-clustering-1.1.2.tar.gz.

File metadata

Download URL: fuzzy-sentences-clustering-1.1.2.tar.gz
Upload date: Jul 28, 2022
Size: 4.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.14 CPython/3.8.10 Linux/5.10.16.3-microsoft-standard-WSL2

File hashes

Hashes for fuzzy-sentences-clustering-1.1.2.tar.gz
Algorithm	Hash digest
SHA256	`fcd12cbb20a9fae5f7ec12c08705632d2da96a8795727c99d63f9114759be4cc`
MD5	`b18e94e2a24b0df7fc0692fbf1ff977a`
BLAKE2b-256	`17b2aed54f165cecaaba2d0c098296ee318707f290fd1d9b416eeb3eeb12a1da`

File details

Details for the file fuzzy_sentences_clustering-1.1.2-py3-none-any.whl.

File metadata

Download URL: fuzzy_sentences_clustering-1.1.2-py3-none-any.whl
Upload date: Jul 28, 2022
Size: 4.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.14 CPython/3.8.10 Linux/5.10.16.3-microsoft-standard-WSL2

File hashes

Hashes for fuzzy_sentences_clustering-1.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d799afaa656773042ce47030cc8b118de0429ff24b1210085fb07bcc34b139c4`
MD5	`a6fb88d98ba3428ad8f1f384c5f9a56d`
BLAKE2b-256	`7de623d7b0281ce6ff9211f8bab4cc17c1d8bbf1dbde2d81df15a6a40032580d`
