Simple Text similarity python
pykosinus
pykosinus is an open-source Python library for text similarity search and scoring. It calculates cosine similarity scores with an emphasis on speed and low memory use, which makes it suitable for a range of text similarity applications. The library aims to be user-friendly, and contributions from the community are welcome.
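For readers new to the metric, here is a minimal sketch of cosine similarity over plain word counts. It uses only the standard library and is not pykosinus's implementation; the function name `cosine_similarity` and the simple regex tokenizer are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the bag-of-words term counts of two texts."""
    # Lowercase and extract word characters to get simple tokens (illustrative tokenizer).
    counts_a = Counter(re.findall(r"\w+", text_a.lower()))
    counts_b = Counter(re.findall(r"\w+", text_b.lower()))

    # Dot product over the terms that appear in both texts.
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared_terms)

    # Euclidean norms of the two count vectors.
    norm_a = math.sqrt(sum(count * count for count in counts_a.values()))
    norm_b = math.sqrt(sum(count * count for count in counts_b.values()))

    # An empty text has no direction, so treat its similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A score near 1 means the two texts use the same words in similar proportions, and a score of 0 means they share no words.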
Installation
To install pykosinus, make sure you have Python 3.8.17 or higher installed. Then, you can install the library using pip:
pip install pykosinus
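If you want to confirm the interpreter meets the stated minimum before installing, a small check like the following may help. The helper name `python_is_supported` is hypothetical; the version tuple comes from the requirement above.

```python
import sys

# Minimum version stated in the installation note above.
MIN_PYTHON = (3, 8, 17)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given (major, minor, micro) version meets the minimum."""
    return tuple(version_info[:3]) >= MIN_PYTHON
```

Calling `python_is_supported()` with no arguments checks the interpreter that is currently running.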
Additional Library for Mac Users
If you are using pykosinus on a Mac, you may need to install the GCC compiler to enable certain features. GCC is a widely used compiler for various programming languages.
To install GCC on macOS, you can use Homebrew, a popular package manager for macOS. Follow these steps to install GCC using Homebrew:
- Open a terminal window.
- Install Homebrew by running the following command:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install GCC by running the following command:
brew install gcc
- Verify the installation by running the following command:
gcc --version
- Set gfortran as the Fortran compiler:
export FC=gfortran
- Verify the gfortran installation:
gfortran --version
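- Install OpenBLAS and point pkg-config at it (the path below is Homebrew's default location on Apple Silicon; on Intel Macs Homebrew installs under /usr/local instead):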
brew install openblas
export PKG_CONFIG_PATH="/opt/homebrew/opt/openblas/lib/pkgconfig"
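After running the steps above, one way to confirm the tools are visible on your PATH is a short Python check. This is only a convenience sketch: the helper name `find_tools` and the default tool list are assumptions, and it reports where each executable was found rather than verifying that it works.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_tools(names: Iterable[str] = ("brew", "gcc", "gfortran")) -> Dict[str, Optional[str]]:
    """Map each tool name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}
```

Any entry that maps to `None` points to a step that still needs to be completed, or to a shell session that has not picked up the new PATH yet.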
Usage
To use pykosinus in your Python project, you can follow these steps:
- Import the necessary modules and classes:
from pykosinus import Content
from pykosinus.lib.scoring import TextScoring
- Create an instance of the TextScoring class, providing the collection name as a parameter:
similarity = TextScoring(collection_name)
- Set the contents to be searched using the push_contents method, passing a list of Content objects:
contents = [
    Content(
        content="Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
        identifier="blog-1",
        section="blog_title",
    ),
    Content(
        content="Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.",
        identifier="blog-2",
        section="blog_title",
    ),
    Content(
        content="Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris.",
        identifier="blog-3",
        section="blog_title",
    ),
    # Add more contents as needed
]
similarity.push_contents(contents)
- Initialize the similarity search by calling the initialize method:
similarity.initialize()
- Perform a similarity search by calling the search method, providing a keyword and an optional threshold:
results = similarity.search(keyword="search keyword", threshold=0.2)
- The search method returns a list of ScoringResult objects, which contain the relevant information about the search results. You can access the properties of each result, such as identifier, content, section, similar, and score.
for result in results:
    print(
        result.identifier, result.content, result.section, result.similar, result.score
    )
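The steps above can be wrapped in a small helper that preserves their order: push the contents, initialize, then search. The function name `run_search` is hypothetical, and its default threshold of 0.2 simply mirrors the example above rather than any documented library default. The helper takes the scorer as an argument, so it only relies on the `push_contents`, `initialize`, and `search` methods shown in this section.

```python
from typing import Any, Iterable, List


def run_search(
    scorer: Any,
    contents: Iterable[Any],
    keyword: str,
    threshold: float = 0.2,  # mirrors the example above; not a documented default
) -> List[Any]:
    """Push contents, initialize the scorer, and return the search results."""
    # The contents must be registered before the scorer is initialized.
    scorer.push_contents(list(contents))
    scorer.initialize()
    return scorer.search(keyword=keyword, threshold=threshold)


# Example call with pykosinus installed ("blog" is a made-up collection name):
#   from pykosinus.lib.scoring import TextScoring
#   results = run_search(TextScoring("blog"), contents, "lorem ipsum")
```

Passing the scorer in, rather than constructing it inside the helper, keeps the collection name under the caller's control and makes the helper easy to exercise with a stand-in object.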
Contributing
pykosinus welcomes contributions from the community. If you would like to contribute to the library, please follow these steps:
- Fork the pykosinus repository on GitHub.
- Create a new branch for your feature or bug fix.
- Make your changes and commit them with descriptive commit messages.
- Push your changes to your forked repository.
- Submit a pull request to the upstream pykosinus repository, explaining the changes you have made.
Versioning
pykosinus is currently at version 0.1.7. Ongoing development and contributions that improve and expand the library are encouraged.
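To see which version is installed in your environment, the standard library's `importlib.metadata` can be queried. The helper name `installed_version` is hypothetical; it returns `None` when the package is not installed instead of raising.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "pykosinus") -> Optional[str]:
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```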
License
pykosinus is released under the MIT License.
File details
Details for the file pykosinus-0.1.7.tar.gz.
File metadata
- Download URL: pykosinus-0.1.7.tar.gz
- Upload date:
- Size: 11.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/40.0 requests/2.28.2 requests-toolbelt/1.0.0 urllib3/1.26.15 tqdm/4.65.0 importlib-metadata/6.7.0 keyring/24.2.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 285fb1da784c168aa75dc78c8d2c07f041773a9c5108bcea2c55c15e101c5585
MD5 | 75d3aadf221e43a7869daef4d3053793
BLAKE2b-256 | 76cfc73cfc8c3296af8ac7349cb53aaf2af8d423affd199e145a1bfb78ddb11e
File details
Details for the file pykosinus-0.1.7-py3-none-any.whl.
File metadata
- Download URL: pykosinus-0.1.7-py3-none-any.whl
- Upload date:
- Size: 9.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/40.0 requests/2.28.2 requests-toolbelt/1.0.0 urllib3/1.26.15 tqdm/4.65.0 importlib-metadata/6.7.0 keyring/24.2.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9a76c2169109a6d2a9f39711a145c82d1f156981b6540e9832cff0afc00140e6
MD5 | 00b73117490f8c6054547bc5da1a4ffd
BLAKE2b-256 | 7e974ac60772843e001408096005e0b497826a506fd73c8a30d710f338a9ddc9