regexparser

Simple parser for small text chunks

Project description

I frequently have to parse text into floats, ints and dates, to name a few examples. The TextParser class isolates the parsing task, so the parsing rules (or functions) are not spread all over the code.

Install

pip install regexparser

pip install from github:

pip install git+https://github.com/wilsonfreitas/regexparser.git

Using

Create a class that inherits from TextParser and write methods whose names start with parse. Each method must accept two arguments after self: the text to be parsed and the match object returned by applying the regular expression to the text. The regular expression of each parse* method is set in the method's docstring, and the method is called only if that regular expression matches.

TextParser provides a compact way of applying transformation rules, so the rules don't have to be spread throughout the code.

The following code shows how to create text parsing rules for a few text chunks in Portuguese.

from regexparser import TextParser  # assumes TextParser is exported by the regexparser package

class PortugueseRulesParser(TextParser):
    # transform Sim and Não into boolean True and False
    # (lower case, capitalized and upper case spellings only)
    def parseBoolean_ptBR(self, text, match):
        r'^(sim|Sim|SIM|n.o|N.o|N.O)$'
        return text[0].lower() == 's'

    # transform Verdadeiro and Falso (or V and F) into boolean True and False
    def parseBoolean_ptBR2(self, text, match):
        r'^(verdadeiro|VERDADEIRO|falso|FALSO|V|F|v|f)$'
        return text[0].lower() == 'v'

    # parse a decimal number that uses a comma as the decimal separator
    def parse_number_decimal_ptBR(self, text, match):
        r'^-?\s*\d+,\d+$'
        # drop any whitespace after the sign, then swap the comma for a dot
        text = ''.join(text.split()).replace(',', '.')
        return float(text)

    # parse a number with dots as thousands separators
    def parse_number_with_thousands_ptBR(self, text, match):
        r'^-?\s*(\d+\.)+\d+,\d+$'
        # remove whitespace and thousands separators before converting
        text = ''.join(text.split()).replace('.', '')
        text = text.replace(',', '.')
        return float(text)

parser = PortugueseRulesParser()

assert parser.parse('1,1') == 1.1
assert parser.parse('-1,1') == -1.1
assert parser.parse('- 1,1') == -1.1
# text that matches no rule is returned unchanged
assert parser.parse('Wálson') == 'Wálson'
assert parser.parse('1.100,01') == 1100.01

I copied the idea of using a regular expression in __doc__ from PLY.
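To make the docstring idea concrete, here is a minimal sketch of how such a dispatch could work. It is not the library's actual implementation: MiniTextParser, DemoParser and their rules are hypothetical names invented for this illustration, and the sketch assumes that rules are tried in alphabetical order of method name and that the first match wins.

```python
import re
from datetime import date


class MiniTextParser:
    """Hypothetical stand-in for TextParser, written only for illustration."""

    def __init__(self):
        # Collect every method whose name starts with "parse" (except the
        # public parse entry point) and whose docstring holds a regex.
        self._rules = []
        for name in sorted(dir(self)):
            if not name.startswith('parse') or name == 'parse':
                continue
            method = getattr(self, name)
            if callable(method) and method.__doc__:
                self._rules.append((re.compile(method.__doc__), method))

    def parse(self, text):
        # Try the rules in name order; the first regex that matches wins.
        for regex, method in self._rules:
            match = regex.match(text)
            if match:
                return method(text, match)
        # No rule matched: hand the text back unchanged.
        return text


class DemoParser(MiniTextParser):
    def parse_int(self, text, match):
        r'^-?\d+$'
        return int(text)

    def parse_iso_date(self, text, match):
        r'^(\d{4})-(\d{2})-(\d{2})$'
        year, month, day = (int(group) for group in match.groups())
        return date(year, month, day)


demo = DemoParser()
print(demo.parse('42'))
print(demo.parse('2023-02-19'))
print(demo.parse('hello'))
```

Because the rules live in docstrings, a design like this stops working when Python runs with the -OO option, which strips docstrings.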

This version

0.1.0

Feb 19, 2023

Download files

Source Distribution

regexparser-0.1.0.tar.gz (4.4 kB)

Uploaded Feb 19, 2023 Source

Built Distribution

regexparser-0.1.0-py3-none-any.whl (4.6 kB)

Uploaded Feb 19, 2023 Python 3

Hashes for regexparser-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`a4f021db06ed3c1aa0d9a97a151be2a116ae0bd6b0001880580c48f3ea4070d0`
MD5	`c6f3a8e00ab575a8cd8ecd6825dcb064`
BLAKE2b-256	`bf24b2c0ab8ea331145bda3b10c929e8e4aaa808d300480109951da666ca513f`

Hashes for regexparser-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7ef0094b1fc58379b209c3887c357a931e33488496ae1c1b0ab44822d1995e33`
MD5	`b2a58d6fa2c7d5829acbb6dcb4b754cc`
BLAKE2b-256	`794a9b2219eb2fd0cd8551b3a6c99d2206a006367706aca5b58f64384b83f6b2`