findlike is a package to retrieve similar documents
findlike
findlike is a command-line tool that finds documents similar to a reference file or an ad-hoc query. It is written in Python and relies on well-known libraries that are optimized for performance.
Features:
- Choose between BM25 and TF-IDF + cosine similarity for similarity calculation
- Recursive search option
- Control over output format, document size to consider, maximum results to show, etc.
- Multilingual support
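To make the TF-IDF option concrete, here is a minimal pure-Python sketch of ranking documents by TF-IDF weights and cosine similarity. It is an illustration of the general technique, not findlike's actual implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`, `rank`) are hypothetical, the tokenizer is deliberately naive (no stemming or stopword removal), and the smoothed IDF formula is an assumption borrowed from common practice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (no stemming)."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption): stays positive for ubiquitous terms.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(reference, candidates):
    """Return (score, candidate_index) pairs, most similar first."""
    vectors = tfidf_vectors([reference] + candidates)
    ref_vec, cand_vecs = vectors[0], vectors[1:]
    scores = [(cosine_similarity(ref_vec, v), i) for i, v in enumerate(cand_vecs)]
    return sorted(scores, reverse=True)
```

A candidate that shares no terms with the reference scores 0.0, which is why a score threshold is a natural way to drop unrelated files.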
Getting Started
These instructions will guide you through the process of installing and using findlike on your local machine.
Prerequisites
- Python 3.7 or higher
- Additional dependencies as listed in the requirements.txt file
Installation
To install findlike, follow the steps below:
pip install --user findlike
If you prefer to download the repository instead:
# Clone this repository
git clone https://github.com/brunoarine/findlike.git
# Navigate into the findlike directory
cd findlike
# Install the required dependencies
pip install -r requirements.txt
# Add an alias for the findlike command (Optional)
echo "alias findlike='python /path/to/findlike/main.py'" >> ~/.bashrc
source ~/.bashrc
Usage
Here is the basic usage of findlike:
findlike [OPTIONS] [REFERENCE_FILE]
findlike scans a given directory and returns the documents most similar to either a reference file or a query passed with the --query option.
Options
Here is a breakdown of the options available in findlike:
--version                     Show the version and exit.
-q, --query TEXT              query option if no reference file is provided
-d, --directory PATH          directory to scan for similar files [default: (current directory)]
-f, --filename-pattern TEXT   filename pattern matching [default: *.*]
-R, --recursive               recursive search
-a, --algorithm [bm25|tfidf]  text similarity algorithm [default: tfidf]
-l, --language TEXT           stemmer and stopwords language [default: english]
-c, --min-chars INTEGER       minimum document size (in number of characters) to be considered [default: 1]
-A, --absolute-paths          show absolute rather than relative paths
-m, --max-results INTEGER     maximum number of results [default: 10]
-p, --prefix TEXT             result lines prefix
-s, --show-scores             show similarity scores
-h, --hide-reference          remove REFERENCE_FILE from results
-H, --heading TEXT            results list heading
-F, --format [plain|json]     output format [default: plain]
-t, --threshold FLOAT         minimum score for a result to be shown [default: 0.0]
--help                        Show this message and exit.
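The interplay of --min-chars, --threshold, and --max-results can be pictured as a simple filter over scored documents. The sketch below is illustrative only: `select_results` is a hypothetical helper, the defaults mirror the help text above, and both the order of the steps and the inclusive comparisons are assumptions rather than confirmed findlike behavior.

```python
def select_results(scored, min_chars=1, threshold=0.0, max_results=10):
    """Filter and truncate scored documents.

    `scored` is a list of (path, text, score) tuples. Documents shorter than
    `min_chars` characters or scoring below `threshold` are dropped (assumed
    inclusive bounds), then the best `max_results` are returned.
    """
    kept = [
        (path, score)
        for path, text, score in scored
        if len(text) >= min_chars and score >= threshold
    ]
    # Highest score first, then cut the list to the requested size.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_results]
```

For example, with `min_chars=5` and `threshold=0.2`, a two-character file and a file scoring 0.1 would both be excluded regardless of how many results are requested.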
Examples
To find similar documents in a directory (recursively):
findlike -R -d /path/to/directory reference_file.md
To search files using a query instead of a reference file while filtering by extension:
findlike -q "black holes" -d /path/to/ayreon/lyrics -f "*.txt"
To show similarity scores and filenames in JSON format:
findlike -s -F json reference_file.md
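JSON output is convenient when calling findlike from another program. The following sketch wraps the command with Python's subprocess module, using only options listed in the help text above. The helper names `build_command` and `run_findlike` are hypothetical, and because the JSON schema is not documented here, the parsed value is returned as-is without assuming any field names.

```python
import json
import subprocess


def build_command(reference_file, directory=None, recursive=False, max_results=None):
    """Build the findlike argument list for scored JSON output."""
    cmd = ["findlike", "--show-scores", "--format", "json"]
    if directory is not None:
        cmd += ["--directory", str(directory)]
    if recursive:
        cmd.append("--recursive")
    if max_results is not None:
        cmd += ["--max-results", str(max_results)]
    cmd.append(str(reference_file))
    return cmd


def run_findlike(reference_file, **options):
    """Run findlike (must be on PATH) and return its parsed JSON output."""
    cmd = build_command(reference_file, **options)
    # check=True raises CalledProcessError on a non-zero exit status.
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.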
To print the results as a Markdown list under a heading:
findlike -H "# List of similar documents" -p "- " reference_file.txt
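The effect of --heading and --prefix can be pictured with a small formatting function. This is a hedged sketch of the idea only: `format_results` is a hypothetical name, and the exact layout findlike prints (blank lines, trailing newline) is an assumption.

```python
def format_results(paths, heading="", prefix=""):
    """Join result paths into text, with an optional heading and line prefix."""
    lines = [heading] if heading else []
    # Each result line is the prefix followed by the document path.
    lines.extend(f"{prefix}{path}" for path in paths)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_results(["a.md", "b.md"], heading="# List of similar documents", prefix="- "))
```

With a "# ..." heading and a "- " prefix, the result is a Markdown heading followed by a bullet list that can be pasted into a note.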
License
This project is licensed under the terms of the MIT license. See LICENSE for more details.
File details
Details for the file findlike-1.0.0.tar.gz.
File metadata
- Download URL: findlike-1.0.0.tar.gz
- Upload date:
- Size: 10.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cf79088e4349b3ebbd0a89587a7019d48f15e5fe0ad91eb13ff3021498d0a5de |
| MD5 | 04f4a69f8e56d229a78e3d59ffb62259 |
| BLAKE2b-256 | 917b5a4d7ca703de40024888988c76b4b59e9ac881284893210a5230782ae618 |
File details
Details for the file findlike-1.0.0-py3-none-any.whl.
File metadata
- Download URL: findlike-1.0.0-py3-none-any.whl
- Upload date:
- Size: 9.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4787a206ac6b7373a23763af93effe475565a5efa0d71973a8f05e4eb1fec325 |
| MD5 | c202668221837ec58a5aed87a6e61d94 |
| BLAKE2b-256 | e4de45cddf858714af8db4f44b4744fb32e269031811dd5efbd041d440cc4307 |