Skip to main content

Python/Cython Boyer-Moore string-search algorithm

Project description

Build Status codecov PyPI version PyPI - Python Version Code style: black

pybmoore

Python/Cython implementation of Boyer-Moore string-search algorithm.

Installing

Install and update using pip:

pip install pybmoore

notice: gcc must be available on the system.

Usage

Single term

The search method in the pybmoore module will return a list of tuples with all occurrences, where the tuple have the initial and final position. For example:

import pybmoore


TEXT = """The Boyer–Moore string-search algorithm is 
an efficient string-searching algorithm that is the 
standard benchmark for practical string-search literature.
"""

matches = pybmoore.search('string', TEXT)
print(f"Occurrences: {len(matches)}")
# output: Occurrences: 3

print(matches)
# output: [(16, 22), (57, 63), (130, 136)]

for x, y in matches:
    print(f"({x},{y}) - {TEXT[x:y]}")

notice: search method it's case sensitive.

import pybmoore


TEXT = """The algorithm preprocesses the string being searched for (the pattern), 
but not the string being searched in (the text). It is thus well-suited for 
applications in which the pattern is much shorter than the text or where it 
persists across multiple searches.
"""

pybmoore.search('algorithm', TEXT)
# output: [(4, 13)]

pybmoore.search('Algorithm', TEXT)
# output: []

Multiple terms

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

import pybmoore


TEXT = """The Boyer-Moore algorithm searches for occurrences of P in T by 
performing explicit character comparisons at different alignments. Instead of a 
brute-force search of all alignments (of which there are m − n + 1, Boyer-Moore 
uses information gained by preprocessing P to skip as many alignments as possible.
"""

# Using a list of patterns
pybmoore.search_m(['brute-force', 'Boyer-Moore'], TEXT, ProcessPoolExecutor)
# output: {'brute-force': [(146, 157)], 'Boyer-Moore': [(4, 15), (214, 225)]}

# Using a set of patterns
pybmoore.search_m({'brute-force', 'Boyer-Moore'}, TEXT, ThreadPoolExecutor)
# output: {'brute-force': [(146, 157)], 'Boyer-Moore': [(4, 15), (214, 225)]}

# Using a tuple of patterns
pybmoore.search_m(('brute-force', 'Boyer-Moore'), TEXT, ThreadPoolExecutor, max_workers=4)
# output: {'brute-force': [(146, 157)], 'Boyer-Moore': [(4, 15), (214, 225)]}

Details

For granular control of the pool, use the parameters listed in the module documentation. For example:

Development

To build pybmoore locally first install requirements-dev.txt dependencies and run:

make build # without Cython

make build USE_CYTHON=1 # with Cython

in some cases it's necesary run make clean before make build.

Type make in the command line to see all available targets.

Links

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pybmoore-2.0.0.tar.gz (66.8 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page