Text parser.
About
A text parser written in the Python language.
The project has one goal, speed! See the benchmark below for more details.
Project homepage: https://github.com/eerimoq/textparser
Documentation: http://textparser.readthedocs.org/en/latest
Credits
Thanks to PyParsing for its user-friendly interface. Many of textparser's class names are taken from that project.
Installation
pip install textparser
Example usage
The Hello World example parses the string Hello, World! and outputs its parse tree ['Hello', ',', 'World', '!'].
The script:
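from pprint import pprint
import textparser
from textparser import Sequence


class Parser(textparser.Parser):

    def token_specs(self):
        # (kind, regex) or (kind, name, regex); when a name is given, the
        # grammar and the token tree refer to the token by that name.
        return [
            ('SKIP',     r'[ \r\n\t]+'),
            ('WORD',     r'\w+'),
            ('EMARK',    '!', r'!'),
            ('COMMA',    ',', r','),
            ('MISMATCH', r'.')
        ]

    def grammar(self):
        # A word, a comma, a word and an exclamation mark, in that order.
        return Sequence('WORD', ',', 'WORD', '!')


# Parse tree of token values, and the same tree with Token objects.
tree = Parser().parse('Hello, World!')
token_tree = Parser().parse('Hello, World!', token_tree=True)

print('Tree:', tree)
print()
print('Token tree:')
pprint(token_tree)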
Script execution:
$ env PYTHONPATH=. python3 examples/hello_world.py
Tree: ['Hello', ',', 'World', '!']

Token tree:
[Token(kind='WORD', value='Hello', offset=0),
 Token(kind=',', value=',', offset=5),
 Token(kind='WORD', value='World', offset=7),
 Token(kind='!', value='!', offset=12)]
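The token tree above shows two things that happen before the grammar is matched: whitespace matched by SKIP never reaches the tree, and a three-element spec such as ('EMARK', '!', r'!') reports its tokens under its second element ('!'), which is also the name the grammar uses. The standard-library-only sketch below imitates both stages, assuming a tokenizer built from one combined regular expression and a Sequence that compares token names one by one. It illustrates the general technique and is not textparser's implementation; Token, tokenize() and match_sequence() are hypothetical names, and raising ValueError on MISMATCH is an assumption.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for textparser's Token; the fields mirror the
# token tree printed above.
Token = namedtuple('Token', ['kind', 'value', 'offset'])

# Same specs as in the Hello World parser.
TOKEN_SPECS = [
    ('SKIP',     r'[ \r\n\t]+'),
    ('WORD',     r'\w+'),
    ('EMARK',    '!', r'!'),
    ('COMMA',    ',', r','),
    ('MISMATCH', r'.')
]


def tokenize(text, specs):
    """Split text into tokens with one regex built from named groups."""
    names = {}
    groups = []

    for spec in specs:
        # (kind, regex) or (kind, name, regex); the name defaults to the kind.
        kind, *rest = spec
        names[kind] = rest[0] if len(rest) == 2 else kind
        groups.append('(?P<{}>{})'.format(kind, rest[-1]))

    tokens = []

    # Alternatives are tried in spec order; MISMATCH ('.') matches any
    # character, so every position of the text is covered by some match.
    for mo in re.finditer('|'.join(groups), text, re.DOTALL):
        kind = mo.lastgroup

        if kind == 'SKIP':
            continue

        if kind == 'MISMATCH':
            # Assumption: report the bad offset with a plain ValueError.
            raise ValueError('invalid character at offset {}'.format(mo.start()))

        tokens.append(Token(names[kind], mo.group(), mo.start()))

    return tokens


def match_sequence(tokens, expected):
    """Match tokens, in order, against a flat sequence of token names."""
    if len(tokens) != len(expected):
        raise ValueError('expected {} tokens, got {}'.format(len(expected),
                                                              len(tokens)))

    for token, name in zip(tokens, expected):
        if token.kind != name:
            raise ValueError('expected {!r} at offset {}'.format(name,
                                                                 token.offset))

    # The parse tree is simply the list of matched token values.
    return [token.value for token in tokens]


if __name__ == '__main__':
    tokens = tokenize('Hello, World!', TOKEN_SPECS)
    print(tokens)
    print(match_sequence(tokens, ['WORD', ',', 'WORD', '!']))
    # ['Hello', ',', 'World', '!']
```

Because the alternatives are tried in order, MISMATCH has to come last in this sketch; placed earlier, its `.` pattern would shadow the other specs.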
Benchmark
A benchmark comparing the CPU time of 10 JSON parsers (lark is measured twice, once with its LALR and once with its Earley parser) when parsing a 276 kB file.
$ env PYTHONPATH=. python3 examples/benchmarks/json/cpu.py
Parsed 'examples/benchmarks/json/data.json' 1 time(s) in:
PACKAGE         SECONDS   RATIO  VERSION
textparser         0.10    100%  0.14.0
lark (LALR)        0.26    265%  0.6.2
funcparserlib      0.34    358%  unknown
parsimonious       0.41    423%  unknown
textx              0.53    548%  1.7.1
pyparsing          0.69    715%  2.2.0
pyleri             0.81    836%  1.2.2
parsy              0.94    976%  1.2.0
lark (Earley)      1.88   1949%  0.6.2
parsita            2.31   2401%  unknown
$
NOTE 1: The parsers are not necessarily optimized for speed. Optimizing them will likely affect the measurements.
NOTE 2: The structure of the resulting parse trees varies and additional processing may be required to make them fit the user application.
NOTE 3: Only JSON parsers are compared. Parsing other languages may give vastly different results.
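The RATIO column appears to be each package's CPU time relative to textparser's, which is listed as 100%. As a minimal sketch of how such a measurement can be taken, the code below times a parse function with the standard library's time.process_time(), which counts CPU time rather than wall-clock time, and builds the same kind of ratio table. It is not the project's examples/benchmarks/json/cpu.py: cpu_time() and ratio_table() are hypothetical helpers, and the timed parsers are two decoders from Python's json module standing in for the packages above.

```python
import json
import time


def cpu_time(parse, text, iterations=1):
    """Return the CPU time in seconds spent calling parse(text) iterations times."""
    # process_time() counts user + system CPU time of this process, not
    # wall-clock time, so time spent sleeping or waiting is not included.
    start = time.process_time()

    for _ in range(iterations):
        parse(text)

    return time.process_time() - start


def ratio_table(timings):
    """Sort {package: seconds} by time and add each ratio to the fastest, in percent."""
    rows = sorted(timings.items(), key=lambda item: item[1])
    fastest = rows[0][1]

    return [(package, seconds, round(100 * seconds / fastest))
            for package, seconds in rows]


if __name__ == '__main__':
    # Hypothetical input: a generated document instead of data.json, and two
    # standard library decoders instead of the packages in the table above.
    text = json.dumps([{'id': i, 'tags': ['a', 'b'], 'ok': True}
                       for i in range(10000)])
    timings = {
        'json.loads': cpu_time(json.loads, text, iterations=10),
        'JSONDecoder': cpu_time(json.JSONDecoder().decode, text, iterations=10),
    }

    print('PACKAGE        SECONDS  RATIO')

    for package, seconds, ratio in ratio_table(timings):
        print('{:<14} {:7.3f} {:5}%'.format(package, seconds, ratio))
```

Measuring CPU time keeps the comparison less sensitive to other load on the machine than wall-clock time, although, as the notes above say, the results still depend on how each parser is configured.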
Contributing
1. Fork the repository.
2. Install prerequisites.

   pip install -r requirements.txt

3. Implement the new feature or bug fix.
4. Implement test case(s) to ensure that future changes do not break existing functionality.
5. Run the tests.

   make test

6. Create a pull request.
Download files
Source Distribution: textparser-0.15.0.tar.gz
Built Distribution: textparser-0.15.0-py2.py3-none-any.whl
File details
Details for the file textparser-0.15.0.tar.gz.
File metadata
- Download URL: textparser-0.15.0.tar.gz
- Upload date:
- Size: 11.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.8.1 pkginfo/1.3.2 requests/2.18.3 setuptools/38.5.0 requests-toolbelt/0.7.0 clint/0.5.1 CPython/2.7.14 Linux/4.13.0-46-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | 835173490032f64681fb7edd315968fc281ea0f01d0fe9bedc76cc847dbb5792
MD5 | 5c0ed2da9e63ad5895688dd9c147965e
BLAKE2b-256 | 1e3a8a8e7be17305b2d32778efafbd7b40a83c040ca6fecddc8c1ac0a4027fcf
File details
Details for the file textparser-0.15.0-py2.py3-none-any.whl.
File metadata
- Download URL: textparser-0.15.0-py2.py3-none-any.whl
- Upload date:
- Size: 9.4 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.8.1 pkginfo/1.3.2 requests/2.18.3 setuptools/38.5.0 requests-toolbelt/0.7.0 clint/0.5.1 CPython/2.7.14 Linux/4.13.0-46-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3dc730bacd60e106647516e3410138e5957b36d9cae0f31060575221d2710e8a
MD5 | de019588b0adb8834fdd50778a2be941
BLAKE2b-256 | b8f32f2f13211c80818e69e0dc0c2b916383e911c5ca821671926744566aecde
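Each distribution file is published with SHA256, MD5 and BLAKE2b-256 digests. As a minimal sketch, a downloaded file can be checked against its published SHA256 digest with Python's standard hashlib module. The helper names file_digest() and verify(), the script name and its command-line usage are hypothetical; the digests are the SHA256 values listed in the file details.

```python
import hashlib
import os
import sys


def file_digest(path, algorithm='sha256', chunk_size=65536):
    """Return the hex digest of the file at path, read in chunks."""
    digest = hashlib.new(algorithm)

    with open(path, 'rb') as fin:
        # Read fixed-size chunks until read() returns b'' at end of file.
        for chunk in iter(lambda: fin.read(chunk_size), b''):
            digest.update(chunk)

    return digest.hexdigest()


def verify(path, expected, algorithm='sha256'):
    """Return True if the file's digest equals the published hex digest."""
    return file_digest(path, algorithm) == expected.strip().lower()


# SHA256 digests copied from the file details.
PUBLISHED_SHA256 = {
    'textparser-0.15.0.tar.gz':
        '835173490032f64681fb7edd315968fc281ea0f01d0fe9bedc76cc847dbb5792',
    'textparser-0.15.0-py2.py3-none-any.whl':
        '3dc730bacd60e106647516e3410138e5957b36d9cae0f31060575221d2710e8a',
}


if __name__ == '__main__':
    # Hypothetical usage: python verify_download.py textparser-0.15.0.tar.gz
    for path in sys.argv[1:]:
        expected = PUBLISHED_SHA256.get(os.path.basename(path))

        if expected is None:
            print(path, 'no published digest')
        else:
            print(path, 'OK' if verify(path, expected) else 'MISMATCH')
```

Reading the file in chunks keeps memory use constant regardless of file size; passing algorithm='md5' compares against the MD5 column instead.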