Algorithm to define similarity rating between objects

FindSimilar

User-friendly library to find similar objects

To Get Acquainted

Mission Statement

The mission of the "Find Similar" project is to provide a powerful and versatile open source library that empowers developers to efficiently find similar objects and perform comparisons across a variety of data types. Whether dealing with texts, images, audio, or more, our project aims to simplify the process of identifying similarities and enhancing decision-making.

Key Objectives

  1. Extensibility: We strive to build a flexible framework that goes beyond textual comparisons, with plans to expand compatibility to various data formats, including images, audio, and more.
  2. Ease of Integration: Our library will offer an intuitive interface that integrates seamlessly into existing applications and workflows, making it accessible to developers regardless of their experience level.
  3. Scalability: Our focus is on creating efficient algorithms and data structures that can handle datasets of varying sizes, ensuring performance and accuracy as the project scales.
  4. Community Collaboration: By embracing the principles of open source development, we invite a diverse community of contributors to collaborate, improve, and innovate upon the project, fostering a culture of shared knowledge and expertise.
  5. Documentation and Education: We are committed to providing comprehensive documentation, tutorials, and resources to help users and contributors understand the library's capabilities and use them effectively.
  6. Privacy and Ethics: As we expand into various data types, we are dedicated to upholding privacy and ethical considerations, ensuring that our library is built and used responsibly.

Join Us

We invite developers, data scientists, and enthusiasts from all backgrounds to join our mission. Together, we can shape the future of "Find Similar," creating a powerful tool that enhances decision-making, discovery, and innovation across diverse fields.

Open Source Collaboration

"FindSimilar" is an open source project, fostering collaboration and innovation. We welcome contributors from all backgrounds to join us in shaping the future of similarity comparisons across various data types.

To Get Involved

Get started with:

Installation:

From PyPI

pip install find-similar

This installs the core package from PyPI. If you also want the tests and the lab scripts, install find-similar from source instead.

From source

git clone https://github.com/findsimilar/find-similar
cd find-similar
pip3 install wheel
python setup.py bdist_wheel
pip3 install dist/*

Usage example:

Simple usage

from find_similar import find_similar

texts = ['one two', 'two three', 'three four']

text_to_compare = 'one four'
result = find_similar(text_to_compare, texts, count=10)
for item in result:
    print(item.text)
    print(item.cos)

Expected result:

one two
0.5
three four
0.5
two three
0.0
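
The scores above are consistent with a cosine similarity computed over sets of tokens. The sketch below is not the library's implementation; it assumes plain whitespace tokenization, and `token_cosine` is a hypothetical helper used only to show where 0.5 and 0.0 could come from.

```python
import math


def token_cosine(a: str, b: str) -> float:
    """Cosine similarity between the sets of whitespace-separated tokens.

    Hypothetical helper for illustration; the real tokenizer in
    find_similar may normalize text differently.
    """
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a or not tokens_b:
        return 0.0
    shared = len(tokens_a & tokens_b)
    return shared / math.sqrt(len(tokens_a) * len(tokens_b))


texts = ['one two', 'two three', 'three four']
query = 'one four'

# Score every candidate, then rank from most to least similar.
# sorted() is stable, so ties keep their original order.
scores = {text: token_cosine(query, text) for text in texts}
ranked = sorted(texts, key=lambda t: scores[t], reverse=True)

for text in ranked:
    print(text)
    print(scores[text])
```

Under these assumptions, 'one four' shares one of two tokens with 'one two' and with 'three four', giving 1 / sqrt(2 * 2) = 0.5, and shares nothing with 'two three'.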

Development

  • find_similar - the main package to install and use
  • analytics - helper functions for improving the main algorithm
  • lab - Python scripts for research

Lab

You can run any of the scripts in the lab package:

cd lab
  • Use load_data_from_file.py to load test data
python load_data_from_file.py /my/path/to/file.xlsx
  • Use check_total_rating.py to analyze algorithm accuracy
python check_total_rating.py

Example result:

Search performed for 529 items:
top 1 -- 353 (66.73 %)
top 5 -- 442 (83.55 %)
top 10 -- 468 (88.47 %)
top 25 -- 501 (94.71 %)
top 50 -- 515 (97.35 %)
top 100 -- 519 (98.11 %)
top 500 -- 523 (98.87 %)
top 1000 -- 529 (100.0 %)
top 2000 -- 529 (100.0 %)
  • Use check_time_one_item.py to measure how long the algorithm takes to process one item
python check_time_one_item.py

Example result:

Load base items...
1999 items loaded
RESULT TIME FOR ONE ITEM (REPEAT 1 times) = 0.03772415800085582
  • Use compare_two.py to compare two different texts. You can change the texts in the compare_two.txt file
python compare_two.py
  • Use tokenize_one.py to check how a single text is tokenized. You can set the text in the tokenize_one.txt file
python tokenize_one.py
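
The top-N lines in the example result of check_total_rating.py above can be read as a hit rate: for each N, how many queries had their correct match ranked within the first N results. The sketch below shows one way such a report could be computed. It is not taken from the lab script; `top_k_report` and the sample ranks are hypothetical.

```python
def top_k_report(ranks, ks):
    """Return (k, hits, percent) rows.

    ranks: 1-based position of the correct match for each query,
           or None when the correct match was not returned at all.
    ks:    the cut-offs to report, e.g. (1, 5, 10).
    """
    total = len(ranks)
    rows = []
    for k in ks:
        # A query counts as a hit when its correct match is within the top k.
        hits = sum(1 for rank in ranks if rank is not None and rank <= k)
        percent = round(100 * hits / total, 2) if total else 0.0
        rows.append((k, hits, percent))
    return rows


# Hypothetical data: eight queries and the rank of each correct match.
ranks = [1, 1, 3, 7, 1, 12, None, 2]

for k, hits, percent in top_k_report(ranks, ks=(1, 5, 10)):
    print(f"top {k} -- {hits} ({percent} %)")
```

The same arithmetic reproduces the first row of the example result: 353 hits out of 529 queries is 100 * 353 / 529, which rounds to 66.73 %.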
