A package for matching a set of strings and textual patterns in a given text file
pystringmatcher
Description
A small utility for finding substrings and text patterns in an input file. The tool splits the file's text into chunks of a configurable number of lines and processes each chunk in a separate process using Python's multiprocessing module.
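The chunk-and-process approach described above can be illustrated with the standard library alone. This is a minimal sketch of the general idea, not pystringmatcher's own code: `split_into_chunks`, `search_chunk` and `find_in_lines` are hypothetical names, plain `str.find` stands in for the package's matching algorithms, and each match is reported as a (pattern, 1-based line number, column) tuple.

```python
from multiprocessing import Pool


def split_into_chunks(lines, lines_per_chunk):
    """Hypothetical helper: yield (first_line_index, chunk_lines) pairs."""
    for start in range(0, len(lines), lines_per_chunk):
        yield start, lines[start:start + lines_per_chunk]


def search_chunk(args):
    """Hypothetical helper: find every occurrence of each pattern in one chunk.

    Returns (pattern, line_number, column) tuples; line numbers are 1-based and
    relative to the whole file. str.find stands in for a real matching algorithm.
    """
    first_line, chunk_lines, patterns = args
    found = []
    for i, line in enumerate(chunk_lines):
        for pattern in patterns:
            col = line.find(pattern)
            while col != -1:
                found.append((pattern, first_line + i + 1, col))
                col = line.find(pattern, col + 1)
    return found


def find_in_lines(lines, patterns, lines_per_chunk=1000, processes=None):
    """Search all lines, one worker-process task per chunk."""
    jobs = [
        (start, chunk, patterns)
        for start, chunk in split_into_chunks(lines, lines_per_chunk)
    ]
    # Each chunk is searched in a separate worker process
    with Pool(processes=processes) as pool:
        per_chunk = pool.map(search_chunk, jobs)
    # pool.map keeps job order, so flattening yields matches in file order
    return [match for chunk_matches in per_chunk for match in chunk_matches]


# The guard is needed where workers are started with "spawn" (Windows, macOS)
if __name__ == "__main__":
    text = "alpha beta\ngamma alpha\ndelta\nbeta beta\n"
    print(find_in_lines(text.splitlines(), ["alpha", "beta"], lines_per_chunk=2))
    # → [('alpha', 1, 0), ('beta', 1, 6), ('alpha', 2, 6), ('beta', 4, 0), ('beta', 4, 5)]
```

Because this sketch cuts chunks on line boundaries and searches line by line, a pattern that contains no newline can never straddle two chunks, so nothing is lost at chunk edges. Passing `processes=None` lets `Pool` start one worker per available CPU.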
Installation:

    pip install pystringmatcher

Usage:
- Using the Python module:

    python -m pystringmatcher -h

    Finding text patterns in input text file

    optional arguments:
      -h, --help            show this help message and exit
      -f FILE_PATH, --file FILE_PATH
                            the input file to search the patterns in
      -p PATTERNS, --patterns PATTERNS
                            the pattern\s to search in the file separated by ,
      -n NUM_LINES_PER_CHUNK, --num-lines NUM_LINES_PER_CHUNK
                            the number of lines per chunk of text from the input file

- Or by using the included console script:

    stringmatcher -h

- In your own program:

    import os
    from multiprocessing.pool import Pool

    from pystringmatcher.Algorithms import RabinKarp
    from pystringmatcher.Objects import Aggregator, Matcher
    from pystringmatcher.Types import TextFile


    def main():
        file_path = "/path/to/file"
        patterns = "alpha,beta,charlie,delta,echo,foxtrot".split(",")
        try:
            text = TextFile(file_path=file_path)
            algorithm = RabinKarp()
            # Split the file into chunks of 1000 lines each
            chunks = text.divide_into_chunks(num_of_lines_each_chunk=1000)
            print(f"[X] - Start finding the patterns : {patterns} in the file: {text}")
            # One Matcher per chunk; each chunk is searched in a separate worker process
            matchers = [
                Matcher(text_chunk=chunk, patterns=patterns, algorithm=algorithm)
                for chunk in chunks
            ]
            with Pool(processes=os.cpu_count()) as pool:
                matchers = pool.map(Matcher.find_matches, matchers)
            # Merge the per-chunk results into a single collection of matches
            aggregator = Aggregator(matchers=matchers)
            aggregator.aggregate_matches()
            if aggregator.aggregated_matches:
                print(aggregator.aggregated_matches)
        except FileNotFoundError:
            # Use file_path, not text: text is unbound if TextFile() itself raised
            print(f"The file: {file_path} was not found and may not exist")


    # Required when worker processes are started with "spawn" (Windows, macOS)
    if __name__ == "__main__":
        main()

- Implementing your own matching algorithm:

    from pystringmatcher.Algorithms import Algorithm
    from pystringmatcher.Types import Match


    class MyAlgorithm(Algorithm):
        def preprocess(self, pattern, text, *args, **kwargs):
            """Some preprocessing logic goes here, if needed."""

        def run(self, pattern, text, *args, **kwargs):
            matches = []
            # The matching algorithm logic goes here.
            # For every match found:
            #     matches.append(Match(char_offset=start_index_of_match))
            return matches
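As an illustration of what the body of `run` could contain, here is a textbook Rabin-Karp search: a rolling hash over a window the length of the pattern, with a direct string comparison on every hash hit to rule out collisions. It is a sketch, not the code behind the package's `RabinKarp` class. To keep it runnable on its own, `Match` below is a hypothetical stand-in dataclass with only the `char_offset` field used in the template above (in a real subclass you would import `Match` from pystringmatcher instead), and the `BASE` and `MODULUS` values are arbitrary choices.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical stand-in for pystringmatcher.Types.Match."""
    char_offset: int


BASE = 256               # arbitrary: treat characters as base-256 digits
MODULUS = 1_000_000_007  # arbitrary large prime that keeps hashes small


def rabin_karp(pattern, text):
    """Return a Match for every start index where pattern occurs in text."""
    m, n = len(pattern), len(text)
    # Choice made here: an empty pattern yields no matches
    if m == 0 or m > n:
        return []
    # BASE**(m-1) mod MODULUS: weight of the character leaving the window
    high = pow(BASE, m - 1, MODULUS)
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * BASE + ord(pattern[i])) % MODULUS
        t_hash = (t_hash * BASE + ord(text[i])) % MODULUS
    matches = []
    for start in range(n - m + 1):
        # Equal hashes may be a collision, so confirm with a direct comparison
        if p_hash == t_hash and text[start:start + m] == pattern:
            matches.append(Match(char_offset=start))
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m]
            t_hash = ((t_hash - ord(text[start]) * high) * BASE
                      + ord(text[start + m])) % MODULUS
    return matches


print([match.char_offset for match in rabin_karp("aba", "abababa")])  # [0, 2, 4]
```

Inside the template's `run`, the body would then reduce to `return rabin_karp(pattern, text)`, assuming `run` receives the pattern and the text chunk as plain strings, which the signature suggests but the description does not state. Overlapping occurrences are reported, as the "aba" example shows.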
Download files
Source Distribution
pystringmatcher-0.0.7.tar.gz (9.6 kB)

Built Distribution
pystringmatcher-0.0.7-py3-none-any.whl

Hashes for pystringmatcher-0.0.7-py3-none-any.whl:

Algorithm | Hash digest
---|---
SHA256 | 1eaf49a4f2e601a287b08d070f94e7e0993942cfd330d89cb8a10765930460be
MD5 | c0ee00a7436685ac388e9302d07dae71
BLAKE2b-256 | 0a81067ee9cffce44cae67b08bbf4a74c99d570abc386f95e20a2a43654e43ee