Algorithm to define similarity rating between objects

These details have not been verified by PyPI

Project links

Homepage

Project description

FindSimilar

User-friendly library to find similar objects

To Get Acquainted

To get familiar with FindSimilar, try our Demo Project.
If you prefer Swagger, try the Demo Project API directly.
You can also visit our official website.

Mission Statement

The mission of the "Find Similar" project is to provide a powerful and versatile open source library that empowers developers to efficiently find similar objects and perform comparisons across a variety of data types. Whether dealing with texts, images, audio, or more, our project aims to simplify the process of identifying similarities and enhancing decision-making.

Key Objectives

Extensibility: We strive to build a flexible framework that goes beyond textual comparisons, with plans to expand compatibility to various data formats, including images, audio, and more.
Ease of Integration: Our library will offer an intuitive interface that integrates seamlessly into existing applications and workflows, making it accessible to developers regardless of their experience level.
Scalability: Our focus is on creating efficient algorithms and data structures that can handle datasets of varying sizes, ensuring performance and accuracy as the project scales.
Community Collaboration: By embracing the principles of open source development, we invite a diverse community of contributors to collaborate, improve, and innovate upon the project, fostering a culture of shared knowledge and expertise.
Documentation and Education: We are committed to providing comprehensive documentation, tutorials, and resources to help users and contributors understand the library's capabilities and use them effectively.
Privacy and Ethics: As we expand into various data types, we are dedicated to upholding privacy and ethical considerations, ensuring that our library is built and used responsibly.

Join Us

We invite developers, data scientists, and enthusiasts from all backgrounds to join our mission. Together, we can shape the future of "Find Similar," creating a powerful tool that enhances decision-making, discovery, and innovation across diverse fields.

Open Source Collaboration

"FindSimilar" is an open source project, fostering collaboration and innovation. We welcome contributors from all backgrounds to join us in shaping the future of similarity comparisons across various data types.

To Get Involved

Get started with:

Installation:

From PyPI

pip install find-similar

This installs the core package from PyPI. If you also want the tests and the lab scripts, install find-similar from the Python package instead.

From Python package

git clone https://github.com/findsimilar/find-similar
cd find-similar
pip3 install wheel
python setup.py bdist_wheel
pip3 install dist/*.whl
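
After either installation route, it can help to confirm which version ended up in the environment. The snippet below is a minimal sketch that uses only the standard library; the helper name installed_version is hypothetical and not part of find-similar.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the installed version of find-similar, or None when it is missing.
print(installed_version("find-similar"))
```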

Usage example:

Simple usage

from find_similar import find_similar

texts = ['one two', 'two three', 'three four']

text_to_compare = 'one four'
result = find_similar(text_to_compare, texts, count=10)
for item in result:
    print(item.text)
    print(item.cos)

Expected result:

one two
0.5
three four
0.5
two three
0.0
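
To see where scores such as 0.5 and 0.0 can come from, here is a minimal sketch of cosine similarity over whitespace-separated tokens. It is an illustration under stated assumptions, not the library's actual implementation: the attribute name cos suggests a cosine measure, but the real tokenizer and scoring may differ, and the functions cosine and rank below are hypothetical.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity between whitespace-token count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Dot product over the tokens that both texts share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(
        sum(c * c for c in va.values()) * sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def rank(query: str, texts: list[str], count: int = 10) -> list[tuple[str, float]]:
    """Return up to `count` texts, most similar first (ties keep input order)."""
    scored = [(text, cosine(query, text)) for text in texts]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:count]


for text, score in rank('one four', ['one two', 'two three', 'three four']):
    print(text, score)
```

With this sketch, 'one four' shares one of two tokens with 'one two' and with 'three four', giving 1 / (sqrt(2) * sqrt(2)) = 0.5, and shares nothing with 'two three', giving 0.0.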

Development

find_similar - the main package to install and use
analytics - helper functions for improving the main algorithm
lab - Python scripts for research

Lab

You can run any of the utility scripts in the lab package:

cd lab

Use load_data_from_file.py to load test data

python load_data_from_file.py /my/path/to/file.xlsx

Use check_total_rating.py to analyze algorithm accuracy

python check_total_rating.py

Example result:

Search performed for 529 items:
top 1 -- 353 (66.73 %)
top 5 -- 442 (83.55 %)
top 10 -- 468 (88.47 %)
top 25 -- 501 (94.71 %)
top 50 -- 515 (97.35 %)
top 100 -- 519 (98.11 %)
top 500 -- 523 (98.87 %)
top 1000 -- 529 (100.0 %)
top 2000 -- 529 (100.0 %)
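
Each percentage in the report is the count divided by the total, for example 353 / 529 = 66.73 %. The sketch below shows one way such a report could be computed, assuming each count is the number of searches whose expected match appeared within the first N results. That reading, the function name top_k_hit_rates, and the sample ranks are assumptions for illustration, not taken from check_total_rating.py.

```python
def top_k_hit_rates(ranks, cutoffs):
    """Map each cutoff k to (hits, percent), where a hit is a rank <= k.

    `ranks` holds the 1-based position of the expected match for each query,
    or None when the expected match was not returned at all.
    """
    total = len(ranks)
    report = {}
    for k in cutoffs:
        hits = sum(1 for r in ranks if r is not None and r <= k)
        percent = round(100 * hits / total, 2) if total else 0.0
        report[k] = (hits, percent)
    return report


# Made-up ranks for five queries, purely for illustration.
example = top_k_hit_rates([1, 1, 3, 7, None], cutoffs=(1, 5, 10))
for k, (hits, percent) in example.items():
    print(f"top {k} -- {hits} ({percent} %)")
```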

Use check_time_one_item.py to measure how long the algorithm takes to process one item

python check_time_one_item.py

Example result:

Load base items...
1999 items loaded
RESULT TIME FOR ONE ITEM (REPEAT 1 times) = 0.03772415800085582
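
A similar measurement can be reproduced for any search call with the standard timeit module. This is a sketch only: the helper time_one_item is hypothetical, the workload is a stand-in, and the script in the lab package may measure time differently.

```python
import timeit


def time_one_item(search, repeat=1):
    """Return the total seconds taken to call `search()` `repeat` times."""
    return timeit.timeit(search, number=repeat)


# Stand-in workload; replace the lambda with a real call such as
# lambda: find_similar(text_to_compare, texts)
elapsed = time_one_item(lambda: sorted(range(1999)), repeat=1)
print(f"RESULT TIME FOR ONE ITEM (REPEAT 1 times) = {elapsed}")
```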

Use compare_two.py to compare two different texts. You can change the texts in the compare_two.txt file

python compare_two.py

Use tokenize_one.py to check how a single text is tokenized. You can set the text in the tokenize_one.txt file

python tokenize_one.py

Project details

Release history

Version	Release date
2.2.1	Apr 16, 2024
2.2.0	Apr 16, 2024
2.1.0	Nov 12, 2023
2.0.1	Nov 12, 2023
2.0.0	Nov 10, 2023
1.6.1	Nov 1, 2023
1.6.0	Nov 1, 2023
1.5.1	Oct 25, 2023
1.5.0	Oct 6, 2023 (this version)
1.4.2	Sep 19, 2023
1.3.2	Sep 12, 2023
1.3.1	Sep 12, 2023
1.3.0	Sep 12, 2023
1.2.1	Sep 10, 2023
1.2.0	Sep 8, 2023
1.0.1	Aug 19, 2023
1.0	Aug 18, 2023

Download files

Download the file for your platform.

Source Distribution

find-similar-1.5.0.tar.gz (16.8 kB)

Uploaded Oct 6, 2023 Source

Built Distribution

find_similar-1.5.0-py3-none-any.whl (17.2 kB)

Uploaded Oct 6, 2023 Python 3

File details

Details for the file find-similar-1.5.0.tar.gz.

File metadata

Download URL: find-similar-1.5.0.tar.gz
Upload date: Oct 6, 2023
Size: 16.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for find-similar-1.5.0.tar.gz
Algorithm	Hash digest
SHA256	`48b15d6ca0bbe37661b7bf14c879baaeed35c41f8e69cdb4f077e0350245e34d`
MD5	`9e7f530e6ad6948c80ac9809047d02da`
BLAKE2b-256	`5f90d772cabfa6aba0d93237a1f33badc173efc24eb2aa40c8027e6af8ae081a`
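
A downloaded file can be checked against the published SHA256 digest with the standard hashlib module. The helpers below are a minimal sketch with hypothetical names; the file path passed in is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with the published value."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("find-similar-1.5.0.tar.gz", "48b15d6ca0bbe37661b7bf14c879baaeed35c41f8e69cdb4f077e0350245e34d") should return True for an intact copy of the source distribution saved in the current directory.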

File details

Details for the file find_similar-1.5.0-py3-none-any.whl.

File metadata

Download URL: find_similar-1.5.0-py3-none-any.whl
Upload date: Oct 6, 2023
Size: 17.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for find_similar-1.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`18da5f24d65e5bbf8d2b9e2be959f686afa7b01a8f01b2cad16dd5570092795f`
MD5	`8b0a8ff24d9f88cb8ce1428d12b57d70`
BLAKE2b-256	`8835bed9bcbd527536863e6a9222833c9af5992647235aa620609a50747ba132`

find-similar 1.5.0
