Python library, written in Rust, for fast fuzzy search over a big file

About The Project

Fastzy is a library written in Rust for searching a file for text based on Levenshtein distance. It uses the mbleven algorithm for k-bounded Levenshtein distance measurement. When the requested maximum distance is above 3, where mbleven is slower, the distance algorithm is replaced with Wagner–Fischer. The library loads the whole file into memory and creates a lightweight index based on the lengths of the lines. Because a line whose length differs from the pattern's length by more than the maximum distance can never match, the index narrows the lookups to potential lines only.
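
The length-based narrowing described above can be sketched in pure Python. This is an illustrative sketch only, not Fastzy's actual Rust implementation: the names `bounded_levenshtein`, `build_length_index`, and `search` are hypothetical, the index is assumed to be a simple mapping from line length to lines, and the distance function is plain Wagner–Fischer with an early cutoff rather than mbleven.

```python
from collections import defaultdict


def bounded_levenshtein(a, b, max_distance):
    """Wagner-Fischer distance, or None when it exceeds max_distance."""
    # Strings whose lengths differ by more than the bound cannot match.
    if abs(len(a) - len(b)) > max_distance:
        return None
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        # Row minima never decrease, so the search can stop early.
        if min(current) > max_distance:
            return None
        previous = current
    return previous[-1] if previous[-1] <= max_distance else None


def build_length_index(lines):
    """Group lines by length (hypothetical stand-in for the real index)."""
    index = defaultdict(list)
    for line in lines:
        index[len(line)].append(line)
    return index


def search(index, pattern, max_distance):
    """Compare the pattern only against lines of a compatible length."""
    matches = []
    low = max(0, len(pattern) - max_distance)
    high = len(pattern) + max_distance
    for length in range(low, high + 1):
        for candidate in index.get(length, ()):
            if bounded_levenshtein(pattern, candidate, max_distance) is not None:
                matches.append(candidate)
    return matches


index = build_length_index(['test', 'texts', 'next', 'testing', 'context'])
print(search(index, 'text', 1))  # ['test', 'next', 'texts']
```

In this sketch, 'testing' and 'context' are never passed to the distance function, because their length (7) falls outside the range 3 to 5 allowed for a 4-character pattern with a maximum distance of 1.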

Performance

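| Library            | Text Size | Function                     | Time   |
|--------------------|-----------|------------------------------|--------|
| python-Levenshtein | 500 MB    | Levenshtein.distance('text') | 13.93s |
| fastzy             | 500 MB    | fastzy.search('text')        | 0.023s |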

Installation

pip3 install fastzy

Usage

import fastzy

# open a file and index it in memory
searcher = fastzy.Searcher(
    file_path='input_text_file.txt',
    separator='',
)

# search for lines within a maximum distance of 1 from the input text 'text'
searcher.search(
    pattern='text',
    max_distance=1,
)
['test', 'texts', 'next']

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Gal Ben David - gal@intsights.com

Project Link: https://github.com/Intsights/fastzy
