
A package for matching a set of strings and textual patterns in a given text file

Project description


pystringmatcher

description

A small utility for finding substrings and text patterns in an input file. The tool splits the text of the file into chunks and processes each chunk in a separate process, using Python's multiprocessing module.
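The chunk-and-dispatch idea described above can be sketched roughly as follows. This is a minimal illustration and not the package's actual implementation: the helper names `split_into_chunks`, `search_chunk` and `search_in_parallel` are hypothetical, and a plain `str.find` stands in for the matching algorithm. Because chunks are cut on line boundaries, this sketch does not find patterns that span more than one line.

```python
from multiprocessing import Pool


def split_into_chunks(lines, lines_per_chunk):
    """Yield (first_line_number, chunk_lines) pairs of at most lines_per_chunk lines."""
    for start in range(0, len(lines), lines_per_chunk):
        yield start + 1, lines[start:start + lines_per_chunk]


def search_chunk(task):
    """Search one chunk and return (line_number, column, pattern) for every hit."""
    first_line_number, chunk_lines, patterns = task
    hits = []
    for offset, line in enumerate(chunk_lines):
        for pattern in patterns:
            if not pattern:
                continue  # ignore empty patterns
            column = line.find(pattern)
            while column != -1:
                hits.append((first_line_number + offset, column, pattern))
                column = line.find(pattern, column + 1)
    return hits


def search_in_parallel(lines, patterns, lines_per_chunk=1000, processes=None):
    """Hypothetical driver: one task per chunk, spread over a pool of worker processes."""
    tasks = [(number, chunk, patterns)
             for number, chunk in split_into_chunks(lines, lines_per_chunk)]
    with Pool(processes=processes) as pool:
        per_chunk_hits = pool.map(search_chunk, tasks)
    # Flatten the per-chunk result lists, keeping chunk order.
    return [hit for hits in per_chunk_hits for hit in hits]
```

When calling `search_in_parallel` from a script, it is safest to do so under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module.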

installation:

pip install pystringmatcher

usage:

  • using the python module
python -m pystringmatcher -h

Finding text patterns in input text file

optional arguments:
  -h, --help            show this help message and exit
  -f FILE_PATH, --file FILE_PATH
                        the input file to search the patterns in
  -p PATTERNS, --patterns PATTERNS
                        the pattern\s to search in the file separated by ,
  -n NUM_LINES_PER_CHUNK, --num-lines NUM_LINES_PER_CHUNK
                        the number of lines per chunk of text from the input file
  • or by using the included console script
stringmatcher -h 
  • In your own program
from pystringmatcher.Algorithms import RabinKarp
from pystringmatcher.Types import TextFile


file_path = "/path/to/file"

try:
    text = TextFile(file_path=file_path)
    algorithm = RabinKarp()
    # Split the file into chunks of 1000 lines each
    chunks = text.divide_into_chunks(num_of_lines_each_chunk=1000)
    patterns = "alpha,beta,charlie,delta,echo,foxtrot".split(",")
    print(f"[X] - Start finding the patterns: {patterns} in the file: {text}")
    matches = text.find_matches(chunks=chunks, patterns=patterns, algorithm=algorithm)

    if matches:
        print("Found matches")
        print(matches)
    else:
        print("No matches were found")
except FileNotFoundError:
    # Use file_path here: `text` is never assigned if TextFile() raises
    print(f"The file: {file_path} was not found and may not exist")
  • Implementing your own matching algorithm
from pystringmatcher.Algorithms import Algorithm
from pystringmatcher.Types import Match


class MyAlgorithm(Algorithm):

    def preprocess(self, pattern, text, *args, **kwargs):
        """Some preprocessing logic goes here if needed."""

    def run(self, pattern, text, *args, **kwargs):
        """The matching algorithm logic goes here."""
        matches = []
        # For every match found:
        #     matches.append(Match(char_offset=start_index_of_match))
        return matches
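
As an illustration of what the body of `run` might compute, here is a standalone sketch of Rabin-Karp matching with a rolling hash. It is not the package's own `RabinKarp` class; the function name `rabin_karp_offsets` and its `base` and `modulus` parameters are assumptions made for this example. It returns the start offset of every occurrence of the pattern, including overlapping ones.

```python
def rabin_karp_offsets(pattern, text, base=256, modulus=1_000_000_007):
    """Return the start offsets of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in a window of length m.
    high = pow(base, m - 1, modulus)

    # Hash the pattern and the first window of the text.
    pattern_hash = window_hash = 0
    for i in range(m):
        pattern_hash = (pattern_hash * base + ord(pattern[i])) % modulus
        window_hash = (window_hash * base + ord(text[i])) % modulus

    offsets = []
    for start in range(n - m + 1):
        # Compare the strings only when the hashes agree (guards against collisions).
        if pattern_hash == window_hash and text[start:start + m] == pattern:
            offsets.append(start)
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m].
            window_hash = ((window_hash - ord(text[start]) * high) * base
                           + ord(text[start + m])) % modulus
    return offsets
```

Inside a custom `run` method, each returned offset could presumably be wrapped as `Match(char_offset=offset)`, following the pattern shown in the class skeleton above.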



Download files


Source Distribution

pystringmatcher-0.0.9.tar.gz (9.5 kB)

Uploaded Source

Built Distribution


pystringmatcher-0.0.9-py3-none-any.whl (14.1 kB)

Uploaded Python 3

File details

Details for the file pystringmatcher-0.0.9.tar.gz.

File metadata

  • Download URL: pystringmatcher-0.0.9.tar.gz
  • Upload date:
  • Size: 9.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.0 CPython/3.8.6

File hashes

Hashes for pystringmatcher-0.0.9.tar.gz
Algorithm    Hash digest
SHA256       c2546e1acbb9381ef7057af4cbb5d9f0d2fe2a0778bb8fef618c2e7b878ea3a3
MD5          127f8fa4bf643d1c9828bf0f0a6f0c23
BLAKE2b-256  c3954244114cc6d2d45d56150a82ab3baab0c418bf0e8b30d343d09e99492882


File details

Details for the file pystringmatcher-0.0.9-py3-none-any.whl.

File metadata

  • Download URL: pystringmatcher-0.0.9-py3-none-any.whl
  • Upload date:
  • Size: 14.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.0 CPython/3.8.6

File hashes

Hashes for pystringmatcher-0.0.9-py3-none-any.whl
Algorithm    Hash digest
SHA256       bf482c89486a0f237c1bab99e5ee0ba5b5e0382cbf9fbac1c94088104c9d95a0
MD5          3ec8f8f329a35e11b94e7866994809c5
BLAKE2b-256  a156fbe433c70112f829e2ff3ab6f97c283075d6b8c6cfb9d7c15cf403c7d4b2

