A package for matching a set of strings and textual patterns in a given text file
Project description
pystringmatcher
description
a small utility tool for finding substrings and text patterns in an input file
installation:
pip install pystringmatcher
usage:
- using the python module
python -m pystringmatcher -h
Finding text patterns in input text file
optional arguments:
-h, --help show this help message and exit
-f FILE_PATH, --file FILE_PATH
the input file to search the patterns in
-p PATTERNS, --patterns PATTERNS
the pattern\s to search in the file separated by ,
-n NUM_LINES_PER_CHUNK, --num-lines NUM_LINES_PER_CHUNK
the number of lines per chunk of text from the input file
- or by using the included console script
stringmatcher -h
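The options listed in the help output can be illustrated with a small `argparse` sketch. This is a hypothetical reconstruction, not the package's actual source: only the flag names, metavars and help strings come from the help text above, while the `dest` names, the `int` conversion and the default of 1000 lines are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the CLI; only the flag names, metavars
    # and help strings are taken from the help output above.
    parser = argparse.ArgumentParser(
        prog="stringmatcher",
        description="Finding text patterns in input text file",
    )
    parser.add_argument("-f", "--file", dest="file_path", metavar="FILE_PATH",
                        help="the input file to search the patterns in")
    parser.add_argument("-p", "--patterns", dest="patterns", metavar="PATTERNS",
                        help="the pattern\\s to search in the file separated by ,")
    # type=int and default=1000 are assumptions, not documented behaviour.
    parser.add_argument("-n", "--num-lines", dest="num_lines_per_chunk",
                        metavar="NUM_LINES_PER_CHUNK", type=int, default=1000,
                        help="the number of lines per chunk of text from the input file")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["-f", "/path/to/file.txt", "-p", "alpha,beta", "-n", "500"]
)
patterns = args.patterns.split(",")  # comma-separated, as the help text says
```

With the real tool, the equivalent invocation would presumably pass the same three flags to `stringmatcher` or to `python -m pystringmatcher`.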
- In your own program
import os
from concurrent.futures import ThreadPoolExecutor

from pystringmatcher.Algorithms import RabinKarp
from pystringmatcher.Objects import Aggregator, Matcher
from pystringmatcher.Types import TextFile

try:
    file_path = r"/path/to/file.txt"
    text = TextFile(file_path=file_path)  # raises FileNotFoundError if the file doesn't exist
    algorithm = RabinKarp()  # implemented in the package, in .Algorithms
    chunks = text.divide_into_chunks(num_of_lines_each_chunk=1000)
    matchers = []
    patterns = "alpha,beta,charlie,delta".split(",")
    # One matcher per chunk; leaving the with block waits for all submitted tasks.
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as executor:
        for chunk in chunks:
            matcher = Matcher(text_chunk=chunk, patterns=patterns, algorithm=algorithm)
            matchers.append(matcher)
            executor.submit(matcher.find_matches)
    aggregator = Aggregator(matchers=matchers)
    aggregator.aggregate_matches()
    if aggregator.aggregated_matches:
        print("Found matches")
        print(aggregator.aggregated_matches)
except FileNotFoundError:
    print(f"The file: {file_path} was not found and may not exist")
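The chunk, match and aggregate flow used above can be sketched with the standard library alone. This is a minimal illustration of the pattern, not the package's internals: `split_into_chunks`, `find_in_chunk` and `search` are hypothetical names, and `str.find` stands in for the matching algorithm.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text, lines_per_chunk):
    """Return (start_offset, chunk_text) pairs of up to lines_per_chunk lines each."""
    lines = text.splitlines(keepends=True)
    chunks, offset = [], 0
    for i in range(0, len(lines), lines_per_chunk):
        chunk = "".join(lines[i:i + lines_per_chunk])
        chunks.append((offset, chunk))
        offset += len(chunk)
    return chunks


def find_in_chunk(start_offset, chunk, patterns):
    """Return {pattern: [absolute offsets]} for one chunk (str.find as a stand-in matcher)."""
    found = {}
    for pattern in patterns:
        pos = chunk.find(pattern)
        while pos != -1:
            found.setdefault(pattern, []).append(start_offset + pos)
            pos = chunk.find(pattern, pos + 1)
    return found


def search(text, patterns, lines_per_chunk=1000):
    """Search every chunk in a thread pool and merge the per-chunk results."""
    aggregated = {}
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(find_in_chunk, off, chunk, patterns)
                   for off, chunk in split_into_chunks(text, lines_per_chunk)]
    # The with block has waited for all tasks; result() re-raises worker exceptions.
    for future in futures:
        for pattern, offsets in future.result().items():
            aggregated.setdefault(pattern, []).extend(offsets)
    return aggregated


sample = "alpha one\nbeta two\nalpha three\n"
result = search(sample, ["alpha", "beta"], lines_per_chunk=2)
print(result)  # {'alpha': [0, 19], 'beta': [10]}
```

Keeping the `Future` objects and calling `result()` on them means an exception raised inside a worker surfaces in the caller; when the return value of `executor.submit` is discarded, such an exception goes unnoticed. Note also that in this sketch a pattern spanning a chunk boundary (one containing a newline) would not be found.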
- Implementing your own matching algorithm
from pystringmatcher.Algorithms import Algorithm
from pystringmatcher.Types import Match

class MyAlgorithm(Algorithm):
    def preprocess(self, pattern, text, *args, **kwargs):
        """Some preprocessing logic goes here, if needed."""

    def run(self, pattern, text, *args, **kwargs):
        matches = []
        # The matching algorithm logic goes here.
        # For every match: matches.append(Match(char_offset=start_index_of_match))
        return matches
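As an example of the kind of logic that could go inside `run`, here is a minimal standalone sketch of Rabin-Karp matching with a rolling hash. It is not the package's `RabinKarp` implementation: the function name `rabin_karp_offsets` and the `base` and `modulus` values are assumptions chosen for illustration. In a subclass, each returned offset would be wrapped as `Match(char_offset=offset)`, as shown above.

```python
def rabin_karp_offsets(pattern, text, base=256, modulus=1_000_000_007):
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, modulus)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % modulus
        t_hash = (t_hash * base + ord(text[i])) % modulus
    offsets = []
    for start in range(n - m + 1):
        # Compare the actual strings on a hash hit to rule out collisions.
        if p_hash == t_hash and text[start:start + m] == pattern:
            offsets.append(start)
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % modulus
    return offsets


offsets = rabin_karp_offsets("aba", "abababa")
print(offsets)  # [0, 2, 4]
```

The rolling update keeps each window's hash at constant cost, so the string comparison only runs when the hashes agree.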
Download files
Source Distribution
pystringmatcher-0.0.6.tar.gz (9.5 kB)
Built Distribution
Hashes for pystringmatcher-0.0.6-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e66c9ed40ce65714b18ca4fc9f09df4e128990c76ea33d800003b5df833d9093
MD5 | d3ff232b864336f583a55b40fcfc22f2
BLAKE2b-256 | bb14af43cbeb1d7b67c42e0a94e95b12288d5dda63a0f6a7f7a4d08a3021462a