Simple parser for small text chunks
Project description
regexparser
I frequently have to parse text into types such as `float`, `int` and `date`. The `TextParser` class isolates that parsing task in one place, instead of leaving the parsing rules (or functions) spread all over the code.
Install
pip install regexparser
Or install from GitHub with pip:
pip install git+https://github.com/wilsonfreitas/regexparser.git
Using
Create a class inheriting from `TextParser` and write methods with names starting with `parse`. These methods must accept two more arguments after `self`: the text that will be parsed and the `MatchObject` that is returned by applying the regular expression to the text. Each `parse*` method is called only if its regular expression matches, and that regular expression is set in the method's docstring.

`textparser` provides a compact way of applying transformation rules, so those rules don't have to be spread throughout the code.

The following code shows how to create text parsing rules for a few text chunks in Portuguese.
# Assumption: TextParser is importable from the package's top level.
from regexparser import TextParser


class PortugueseRulesParser(TextParser):
    # transform Sim and Não into boolean True and False
    # (the dot in n.o stands in for the accented ã)
    def parseBoolean_ptBR(self, text, match):
        r'^(sim|Sim|SIM|n.o|N.o|N.O)$'
        return text[0].lower() == 's'

    # transform Verdadeiro and Falso into boolean True and False
    def parseBoolean_ptBR2(self, text, match):
        r'^(verdadeiro|VERDADEIRO|falso|FALSO|V|F|v|f)$'
        return text[0].lower() == 'v'

    # parses a decimal number written with a decimal comma
    def parse_number_decimal_ptBR(self, text, match):
        r'^-?\s*\d+,\d+?$'
        text = text.replace(',', '.')
        # eval only ever sees text that already matched the regex above
        return eval(text)

    # parses a number with thousands separators (dots) and a decimal comma
    def parse_number_with_thousands_ptBR(self, text, match):
        r'^-?\s*(\d+\.)+\d+,\d+?$'
        text = text.replace('.', '')
        text = text.replace(',', '.')
        return eval(text)


parser = PortugueseRulesParser()

assert parser.parse('1,1') == 1.1
assert parser.parse('-1,1') == -1.1
assert parser.parse('- 1,1') == -1.1
# text that matches no rule is returned unchanged
assert parser.parse('Wálson') == 'Wálson'
assert parser.parse('1.100,01') == 1100.01
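To make the dispatch mechanism described above more concrete, here is a minimal, self-contained sketch of how a docstring-driven parser could work. `MiniTextParser` and `NumberParser` are hypothetical names invented for this illustration; this is not the library's actual implementation, and the order in which rules are tried (alphabetical, via `dir()`) is an assumption of the sketch.

```python
import re


class MiniTextParser:
    """Hypothetical stand-in for TextParser; not the library's actual code."""

    def __init__(self):
        # Collect (compiled regex, bound method) pairs from every method
        # whose name starts with "parse" and that carries a docstring.
        self._rules = []
        for name in dir(self):
            if name == 'parse' or not name.startswith('parse'):
                continue
            method = getattr(self, name)
            if callable(method) and method.__doc__:
                self._rules.append((re.compile(method.__doc__), method))

    def parse(self, text):
        # Try each rule in turn; the first regex that matches wins.
        for regex, method in self._rules:
            match = regex.match(text)
            if match:
                return method(text, match)
        # No rule matched: hand the text back unchanged.
        return text


class NumberParser(MiniTextParser):
    def parse_integer(self, text, match):
        r'^-?\d+$'
        return int(text)

    def parse_decimal_ptBR(self, text, match):
        r'^(-?\d+),(\d+)$'
        # The match object exposes the groups captured by the docstring regex.
        return float(match.group(1) + '.' + match.group(2))


mini = NumberParser()
```

With this sketch, `mini.parse('42')` goes through `parse_integer`, `mini.parse('-1,5')` goes through `parse_decimal_ptBR`, and any text that matches no rule comes back as the original string.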
I copied the idea of using a regular expression in `__doc__` from PLY.
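The docstring trick can be shown in isolation with plain functions. In the sketch below, `parse_date_ptBR` and `apply_rule` are hypothetical helpers written for this illustration (they are not part of regexparser or PLY); the point is only that a function's `__doc__` can hold the regular expression, and that the resulting match object gives the rule access to the captured groups.

```python
import re
from datetime import date


def parse_date_ptBR(text, match):
    r'^(\d{2})/(\d{2})/(\d{4})$'
    # Groups arrive in day/month/year order, as written in Brazilian dates.
    day, month, year = (int(group) for group in match.groups())
    return date(year, month, day)


def apply_rule(rule, text):
    # The rule's docstring is the regular expression to try.
    match = re.match(rule.__doc__, text)
    # Call the rule only on a match; otherwise return the text untouched.
    return rule(text, match) if match else text
```

For example, `apply_rule(parse_date_ptBR, '25/12/2020')` yields `date(2020, 12, 25)`, while text that does not look like a date is returned as is.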
File details
Details for the file `regexparser-0.1.0.tar.gz`.
File metadata
- Download URL: regexparser-0.1.0.tar.gz
- Upload date:
- Size: 4.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.2 CPython/3.7.16 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | a4f021db06ed3c1aa0d9a97a151be2a116ae0bd6b0001880580c48f3ea4070d0
MD5 | c6f3a8e00ab575a8cd8ecd6825dcb064
BLAKE2b-256 | bf24b2c0ab8ea331145bda3b10c929e8e4aaa808d300480109951da666ca513f
File details
Details for the file `regexparser-0.1.0-py3-none-any.whl`.
File metadata
- Download URL: regexparser-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.2 CPython/3.7.16 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7ef0094b1fc58379b209c3887c357a931e33488496ae1c1b0ab44822d1995e33
MD5 | b2a58d6fa2c7d5829acbb6dcb4b754cc
BLAKE2b-256 | 794a9b2219eb2fd0cd8551b3a6c99d2206a006367706aca5b58f64384b83f6b2
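If you download one of these files by hand, the SHA256 digests listed above can be checked locally. The following is a small sketch using only Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are made up for this example, and the file path you pass in is whatever location you saved the download to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    # Hash the file in chunks so large downloads need not fit in memory.
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    # Compare case-insensitively and ignore stray surrounding whitespace.
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Calling `matches_published_hash` with the path of the downloaded file and the SHA256 value from the matching table returns `True` only when the file is byte-for-byte identical to the published one.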