A very light Python library for comparing the similarity between texts/strings

These details have not been verified by PyPI

Project description

pysimilar

A Python library for computing the similarity between two strings (text) based on cosine similarity, made by kalebu.

How does it work?

It uses a TF-IDF vectorizer to transform each text into a vector of numbers, and then computes the cosine similarity between the two vectors. The resulting score indicates how similar the texts are.
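
To make the two steps above concrete, here is a minimal standard-library sketch of TF-IDF weighting followed by cosine similarity. It is not pysimilar's source code: the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`, `similarity`) are hypothetical, and the tokenization and smoothed IDF formula are assumptions chosen to mirror common TF-IDF defaults.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase the text and keep word tokens of 2+ characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf_vectors(documents):
    """Return one TF-IDF vector (list of floats) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocabulary = sorted(df)
    # Assumption: smoothed inverse document frequency, ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # raw term frequency; missing terms count as 0
        vectors.append([counts[t] * idf[t] for t in vocabulary])
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document shares nothing with any other
    return dot / (norm_u * norm_v)


def similarity(first, second):
    # Fit the weights on just the two texts being compared.
    u, v = tfidf_vectors([first, second])
    return cosine_similarity(u, v)


score = similarity('very light indeed', 'how fast is light')
print(round(score, 4))  # 0.1708
```

The only shared term is "light", which appears in both texts and therefore gets the lowest weight, so the score stays low even though the texts overlap.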

Example of usage

Pysimilar lets you either pass the strings you want to compare directly or specify the paths to files containing the text you want to compare.

compare() strings

Here is an example of how to compare strings directly:

>>> from pysimilar import compare
>>> compare('very light indeed', 'how fast is light')
0.17077611319011649

compare() files

Here is how to compare files containing text:

>>> compare('README.md', 'LICENSE', isfile=True)
0.25545580376557886

You can also compare all documents with a particular extension in a given directory. For instance, to compare every .txt document in a documents directory, you would do the following.

The documents directory used by the examples below looks like this:

documents/
├── anomalie.zeta
├── hello.txt
├── hi.txt
└── welcome.txt

compare_documents()

Here is how to compare files with a particular extension:

>>> import pysimilar
>>> from pprint import pprint
>>> pysimilar.extensions = '.txt'
>>> comparison_result = pysimilar.compare_documents('documents')
>>> pprint(comparison_result)
[['welcome.txt vs hi.txt', 0.6053485081062917],
 ['welcome.txt vs hello.txt', 0.0],
 ['hi.txt vs hello.txt', 0.0]]

sorting the outputs

You can also sort the comparison results by score by changing the ascending parameter, as shown below:

>>> comparison_result = pysimilar.compare_documents('documents', ascending=True)
>>> pprint(comparison_result)
[['welcome.txt vs hello.txt', 0.0],
 ['hi.txt vs hello.txt', 0.0],
 ['welcome.txt vs hi.txt', 0.6053485081062917]]
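
The directory comparison and sorting shown above can be pictured roughly as follows. This is only a sketch under stated assumptions, not pysimilar's implementation: `pairwise_scores` and its parameters are hypothetical names, the default scorer is `difflib.SequenceMatcher` from the standard library (a stand-in, not TF-IDF cosine similarity), and descending order as the default is inferred from the example output rather than confirmed.

```python
import itertools
import os
import tempfile
from difflib import SequenceMatcher


def pairwise_scores(directory, extensions=(".txt",), ascending=False, score=None):
    """Hypothetical helper: score every pair of matching files in `directory`."""
    if score is None:
        # Stand-in scorer from the standard library, NOT TF-IDF cosine similarity.
        def score(a, b):
            return SequenceMatcher(None, a, b).ratio()

    # Keep only the files whose names end with one of the wanted extensions.
    names = sorted(n for n in os.listdir(directory) if n.endswith(tuple(extensions)))
    texts = {}
    for name in names:
        with open(os.path.join(directory, name), encoding="utf-8") as handle:
            texts[name] = handle.read()

    # One entry per unordered pair of files, shaped like the output above.
    results = [
        [f"{a} vs {b}", score(texts[a], texts[b])]
        for a, b in itertools.combinations(names, 2)
    ]
    # Assumption: descending by default, ascending on request.
    results.sort(key=lambda item: item[1], reverse=not ascending)
    return results


# Usage on a throwaway directory with made-up file contents.
with tempfile.TemporaryDirectory() as folder:
    samples = {
        "hello.txt": "hello world",
        "hi.txt": "hello there",
        "welcome.txt": "welcome aboard",
        "notes.md": "ignored because of its extension",
    }
    for name, text in samples.items():
        with open(os.path.join(folder, name), "w", encoding="utf-8") as handle:
            handle.write(text)
    demo = pairwise_scores(folder, ascending=True)
    print(demo)
```

Passing a different `score` callable swaps in another similarity measure without changing the pairing or sorting logic.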

multiple extensions

You can also set pysimilar to include files with multiple extensions:

>>> import pysimilar
>>> from pprint import pprint
>>> pysimilar.extensions = ['.txt', '.zeta']
>>> comparison_result = pysimilar.compare_documents('documents', ascending=True)
>>> pprint(comparison_result)
[['welcome.txt vs hello.txt', 0.0],
 ['hi.txt vs hello.txt', 0.0],
 ['anomalie.zeta vs hi.txt', 0.4968161174826459],
 ['welcome.txt vs hi.txt', 0.6292275146695526],
 ['welcome.txt vs anomalie.zeta', 0.7895651507603823]]

Contributions

If you have anything valuable to add to the library, whether it's documentation, a typo fix, or source code, please don't hesitate to contribute. Just fork it and submit your pull request, and I will try to be as friendly as I can in helping you with your contribution.

All the Credits

All the credits go to kalebu and other future contributors.

Project details

Release history

0.5 (this version): May 19, 2022
0.4: May 1, 2021
0.3: Apr 30, 2021
0.2: Apr 17, 2021
0.1: Apr 7, 2021

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

pysimilar-0.5-py3-none-any.whl (5.2 kB view details)

Uploaded May 19, 2022 Python 3

File details

Details for the file pysimilar-0.5-py3-none-any.whl.

File metadata

Download URL: pysimilar-0.5-py3-none-any.whl
Upload date: May 19, 2022
Size: 5.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for pysimilar-0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fb729c5c0d1389eb2d304297279da7fcaaf566be174501027ce3fc891d73e7fd`
MD5	`732fd2bb5d7dada94d3e90c14d3d86d0`
BLAKE2b-256	`158cd285efb796b21531fbe412eb86055e2b1a93d7faae51bc73de12486d4011`
