Python library for fast fuzzy search over a big file written in Rust
About The Project
Fastzy is a library written in Rust for searching a file for text based on its Levenshtein distance from a pattern. The library uses the mbleven algorithm for k-bounded Levenshtein distance measurement. When the requested maximum distance is above 3, where mbleven is slower, the distance algorithm is replaced with Wagner–Fischer. The library loads the whole file into memory and creates a lightweight index based on the lengths of the lines. Because a line whose length differs from the pattern's by more than the maximum distance can never match, the index narrows the lookups down to only the potential lines.
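To make the length-based index described above concrete, here is a minimal pure-Python sketch of the idea. It is not Fastzy's Rust implementation: the names `bounded_levenshtein` and `LengthIndexedSearcher` are hypothetical, and the distance function is a plain Wagner–Fischer table with an early exit, standing in for mbleven.

```python
from collections import defaultdict


def bounded_levenshtein(a, b, max_distance):
    """Return the edit distance between a and b, or max_distance + 1
    as soon as the distance is known to exceed the bound.

    Wagner-Fischer with two rows; a stand-in for mbleven (assumption).
    """
    # Each edit changes the length by at most one character.
    if abs(len(a) - len(b)) > max_distance:
        return max_distance + 1

    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        # Row minima never decrease, so the bound is already exceeded.
        if min(current) > max_distance:
            return max_distance + 1
        previous = current
    return min(previous[-1], max_distance + 1)


class LengthIndexedSearcher:
    """Hypothetical in-memory searcher that buckets lines by length."""

    def __init__(self, lines):
        self.by_length = defaultdict(list)
        for line in lines:
            self.by_length[len(line)].append(line)

    def search(self, pattern, max_distance):
        matches = []
        shortest = max(0, len(pattern) - max_distance)
        longest = len(pattern) + max_distance
        # Only buckets within max_distance of the pattern length can match.
        for length in range(shortest, longest + 1):
            for candidate in self.by_length.get(length, ()):
                if bounded_levenshtein(pattern, candidate, max_distance) <= max_distance:
                    matches.append(candidate)
        return matches
```

With the lines `test`, `texts`, `next` and `banana`, searching for `text` at distance 1 in this sketch only ever inspects the buckets for lengths 3 to 5, so `banana` is skipped without computing a distance.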
Built With
Performance
Library | Text Size | Function | Time |
---|---|---|---|
python-Levenshtein | 500 MB | Levenshtein.distance('text') | 13.93s |
fastzy | 500 MB | fastzy.search('text') | 0.023s |
Installation
pip3 install fastzy
Usage
import fastzy

# open a file and index it in memory
searcher = fastzy.Searcher(
    file_path='input_text_file.txt',
    separator='',
)

# search for the pattern 'text' within a maximum distance of 1
searcher.search(
    pattern='text',
    max_distance=1,
)
# returns ['test', 'texts', 'next']
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Gal Ben David - gal@intsights.com
Project Link: https://github.com/Intsights/fastzy