Simple parser for small text chunks

regexparser

I frequently have to parse text into float, int, and date values, to name a few examples. The TextParser class isolates the parsing task, instead of leaving the parsing rules (or functions) spread all over the code.

Install

pip install regexparser

To install directly from GitHub:

pip install git+https://github.com/wilsonfreitas/regexparser.git

Using

Create a class that inherits from TextParser and write methods whose names start with parse. Each method must accept two arguments after self: the text to be parsed and the MatchObject returned by applying the method's regular expression to that text. A parse* method is called only if its regular expression matches, and the regular expression is set in the method's docstring.

TextParser provides a compact way of applying transformation rules, so the rules don't have to be spread throughout the code.
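
To make the mechanism described above concrete, here is a minimal, self-contained sketch of docstring-driven dispatch. It is not the library's actual implementation: MiniTextParser, SampleRules, and the two sample rules are hypothetical names invented for this illustration, and the rule lookup order (alphabetical, as returned by inspect.getmembers) is an assumption of the sketch.

```python
import inspect
import re
from datetime import date


class MiniTextParser:
    """Hypothetical stand-in for TextParser; not the library's actual code."""

    def __init__(self):
        # Collect (compiled regex, bound method) pairs from every parse*
        # method that carries a docstring.
        self._rules = []
        for name, method in inspect.getmembers(self, predicate=inspect.ismethod):
            if name.startswith('parse') and name != 'parse' and method.__doc__:
                self._rules.append((re.compile(method.__doc__.strip()), method))

    def parse(self, text):
        # The first rule whose regular expression matches handles the text.
        for regex, method in self._rules:
            match = regex.match(text)
            if match:
                return method(text, match)
        # No rule matched: hand the text back unchanged.
        return text


class SampleRules(MiniTextParser):
    def parse_integer(self, text, match):
        r'^-?\d+$'
        return int(text)

    def parse_iso_date(self, text, match):
        r'^(\d{4})-(\d{2})-(\d{2})$'
        year, month, day = (int(group) for group in match.groups())
        return date(year, month, day)


rules = SampleRules()
print(rules.parse('42'))
print(rules.parse('2020-01-31'))
print(rules.parse('hello'))
```

Because the sketch reads the patterns from `__doc__`, it would stop finding rules when Python runs with the -OO flag, which strips docstrings.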

The following code shows how to create text parsing rules for a few text chunks in Portuguese.

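# Assumption: TextParser is importable from the regexparser package.
from regexparser import TextParser


class PortugueseRulesParser(TextParser):
    # transform Sim and Não into boolean True and False
    # (only the capitalizations listed in the pattern are accepted)
    def parseBoolean_ptBR(self, text, match):
        r'^(sim|Sim|SIM|n.o|N.o|N.O)$'
        return text[0].lower() == 's'

    # transform Verdadeiro and Falso (or V and F) into boolean True and False
    def parseBoolean_ptBR2(self, text, match):
        r'^(verdadeiro|VERDADEIRO|falso|FALSO|V|F|v|f)$'
        return text[0].lower() == 'v'

    # parses a decimal number that uses a comma as the decimal separator
    def parse_number_decimal_ptBR(self, text, match):
        r'^-?\s*\d+,\d+?$'
        # drop any whitespace after the sign, then switch to a decimal point
        text = ''.join(text.split()).replace(',', '.')
        return float(text)

    # parses a number with dots as thousands separators
    def parse_number_with_thousands_ptBR(self, text, match):
        r'^-?\s*(\d+\.)+\d+,\d+?$'
        # remove whitespace and thousands separators, then fix the decimal mark
        text = ''.join(text.split()).replace('.', '')
        text = text.replace(',', '.')
        return float(text)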

parser = PortugueseRulesParser()

assert parser.parse('1,1') == 1.1
assert parser.parse('-1,1') == -1.1
assert parser.parse('- 1,1') == -1.1
assert parser.parse('Wálson') == 'Wálson'
assert parser.parse('1.100,01') == 1100.01
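
As a quick sanity check on the rules above, the sketch below applies three of the docstring patterns directly with the standard re module and reports which ones match a handful of sample inputs. The helper matching_rules and the pattern labels are hypothetical names used only for this illustration; the patterns themselves are copied from the example.

```python
import re

# Patterns copied from the docstrings of the rules in the example.
BOOLEAN = re.compile(r'^(sim|Sim|SIM|n.o|N.o|N.O)$')
DECIMAL = re.compile(r'^-?\s*\d+,\d+?$')
THOUSANDS = re.compile(r'^-?\s*(\d+\.)+\d+,\d+?$')


def matching_rules(text):
    """Return the labels of the patterns that match text (hypothetical helper)."""
    patterns = {'boolean': BOOLEAN, 'decimal': DECIMAL, 'thousands': THOUSANDS}
    return [label for label, regex in patterns.items() if regex.match(text)]


# 'N\u00e3o' is 'Não' written with a precomposed ã, so the '.' wildcard
# in the pattern consumes exactly one character.
for sample in ['N\u00e3o', '1,1', '- 1,1', '1.100,01', 'W\u00e1lson']:
    print(sample, matching_rules(sample))
```

Each numeric sample matches exactly one pattern, and the name matches none, which is consistent with the assertion that unmatched text is returned unchanged. Note that the `.` in `n.o` accepts any single character, so inputs such as 'nao' also match the boolean pattern.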

The idea of storing the regular expression in the method's docstring (__doc__) is borrowed from PLY.
