Text parser.

About

A text parser written in the Python language.

The project has one goal: speed. See the benchmark below for more details.

Project homepage: https://github.com/eerimoq/textparser

Documentation: http://textparser.readthedocs.org/en/latest

Credits

  • Thanks to PyParsing for a user-friendly interface. Many of textparser’s class names are taken from this project.

Installation

pip install textparser

Example usage

The Hello World example parses the string Hello, World! and outputs its parse tree ['Hello', ',', 'World', '!'].

The script:

from pprint import pprint

import textparser
from textparser import Sequence


class Parser(textparser.Parser):

    def token_specs(self):
        return [
            ('SKIP',          r'[ \r\n\t]+'),
            ('WORD',          r'\w+'),
            ('EMARK',    '!', r'!'),
            ('COMMA',    ',', r','),
            ('MISMATCH',      r'.')
        ]

    def grammar(self):
        return Sequence('WORD', ',', 'WORD', '!')


tree = Parser().parse('Hello, World!')
token_tree = Parser().parse('Hello, World!', token_tree=True)

print('Tree:', tree)
print()
print('Token tree:')
pprint(token_tree)

Script execution:

$ env PYTHONPATH=. python3 examples/hello_world.py
Tree: ['Hello', ',', 'World', '!']

Token tree:
[Token(kind='WORD', value='Hello', offset=0),
 Token(kind=',', value=',', offset=5),
 Token(kind='WORD', value='World', offset=7),
 Token(kind='!', value='!', offset=12)]
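
To make the tokenization step more concrete, here is a minimal standard-library sketch of how token specs like the ones above could be turned into tokens. It is an illustration under assumptions, not textparser's actual implementation: the Token namedtuple and the tokenize function are hypothetical stand-ins, and the reading of a three-element spec as (name, kind, regex) is inferred from the example output, where the COMMA spec yields kind=','.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the Token objects printed in the example output.
Token = namedtuple('Token', ['kind', 'value', 'offset'])

# Same specs as in the example. Assumption: a 3-tuple is (name, kind, regex)
# and a 2-tuple is (name, regex) with the name reused as the kind.
TOKEN_SPECS = [
    ('SKIP',          r'[ \r\n\t]+'),
    ('WORD',          r'\w+'),
    ('EMARK',    '!', r'!'),
    ('COMMA',    ',', r','),
    ('MISMATCH',      r'.')
]


def tokenize(text, specs=TOKEN_SPECS):
    """Split text into tokens using one combined regular expression."""
    kinds = {}
    parts = []

    for spec in specs:
        if len(spec) == 3:
            name, kind, regex = spec
        else:
            name, regex = spec
            kind = name

        kinds[name] = kind
        # One named group per spec; earlier specs win when several match.
        parts.append('(?P<{}>{})'.format(name, regex))

    pattern = re.compile('|'.join(parts))
    tokens = []

    for mo in pattern.finditer(text):
        name = mo.lastgroup

        if name == 'SKIP':
            # Whitespace is dropped and never reaches the grammar.
            continue

        if name == 'MISMATCH':
            raise ValueError(
                'Invalid character {!r} at offset {}.'.format(mo.group(),
                                                              mo.start()))

        tokens.append(Token(kinds[name], mo.group(), mo.start()))

    return tokens


for token in tokenize('Hello, World!'):
    print(token)
```

Because MISMATCH is listed last and matches any single character, it only fires when no other spec matches, which makes it a convenient catch-all for reporting invalid input.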

Benchmark

A benchmark comparing the CPU time of 10 JSON parsers, each parsing the same 276 kB file.

$ env PYTHONPATH=. python3 examples/benchmarks/json/cpu.py
Parsed 'examples/benchmarks/json/data.json' 1 time(s) in:

PACKAGE         SECONDS   RATIO  VERSION
textparser         0.10    100%  0.14.0
lark (LALR)        0.26    265%  0.6.2
funcparserlib      0.34    358%  unknown
parsimonious       0.41    423%  unknown
textx              0.53    548%  1.7.1
pyparsing          0.69    715%  2.2.0
pyleri             0.81    836%  1.2.2
parsy              0.94    976%  1.2.0
lark (Earley)      1.88   1949%  0.6.2
parsita            2.31   2401%  unknown
$

NOTE 1: The parsers are not necessarily optimized for speed. Optimizing them will likely affect the measurements.

NOTE 2: The structure of the resulting parse trees varies and additional processing may be required to make them fit the user application.

NOTE 3: Only JSON parsers are compared. Parsing other languages may give vastly different results.
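
A comparable measurement can be approximated with the standard library alone. The sketch below is not the project's cpu.py script: cpu_seconds and ratios are hypothetical helpers, and json.loads merely stands in for a parser under test. The RATIO column is assumed to be each parser's time relative to the fastest one. The table's ratios were presumably computed from unrounded timings, so feeding the rounded SECONDS values into this sketch gives slightly different percentages.

```python
import json
import time


def cpu_seconds(parse, text, iterations=1):
    """Return the CPU time spent calling parse(text) the given number of times."""
    start = time.process_time()

    for _ in range(iterations):
        parse(text)

    return time.process_time() - start


def ratios(seconds_by_package):
    """Map each package to its time as a percentage of the fastest package.

    The result is ordered from fastest to slowest.
    """
    fastest = min(seconds_by_package.values())
    ordered = sorted(seconds_by_package.items(), key=lambda item: item[1])

    return {name: round(100 * seconds / fastest) for name, seconds in ordered}


# Time a stand-in parser on a small generated document.
document = json.dumps({'values': list(range(1000))})
print('json.loads:', cpu_seconds(json.loads, document, iterations=100))

# Ratios computed from two of the rounded timings in the table above.
print(ratios({'lark (LALR)': 0.26, 'textparser': 0.10}))
```

time.process_time counts CPU time of the current process only, so time spent sleeping or waiting on other processes does not inflate the result.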

Contributing

  1. Fork the repository.

  2. Install prerequisites.

    pip install -r requirements.txt

  3. Implement the new feature or bug fix.

  4. Implement test case(s) to ensure that future changes do not break existing functionality.

  5. Run the tests.

    make test

  6. Create a pull request.

