Text parser.

About

A text parser written in the Python language.

The project has one goal, speed! See the benchmark below for more details.

Project homepage: https://github.com/eerimoq/textparser

Documentation: http://textparser.readthedocs.org/en/latest

Credits

  • Thanks to PyParsing for its user-friendly interface. Many of textparser’s class names are taken from that project.

Installation

pip install textparser

Example usage

The Hello World example parses the string 'Hello, World!' and outputs its parse tree ['Hello', ',', 'World', '!'].

The script:

import textparser
from textparser import Sequence


class Parser(textparser.Parser):

    def token_specs(self):
        return [
            ('SKIP',          r'[ \r\n\t]+'),
            ('WORD',          r'\w+'),
            ('EMARK',    '!', r'!'),
            ('COMMA',    ',', r','),
            ('MISMATCH',      r'.')
        ]

    def grammar(self):
        return Sequence('WORD', ',', 'WORD', '!')


tree = Parser().parse('Hello, World!')

print('Tree:', tree)

Script execution:

$ env PYTHONPATH=. python3 examples/hello_world.py
Tree: ['Hello', ',', 'World', '!']
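The script works in two steps: token_specs() splits the input into tokens, and grammar() is matched against the resulting token kinds. The stdlib-only sketch below illustrates that idea under a few assumptions: SKIP tokens are discarded, a MISMATCH token is an error, and the middle element of a three-item spec (such as '!' in ('EMARK', '!', r'!')) replaces the kind so that the grammar can refer to it directly. It is not textparser's implementation; tokenize() and match_sequence() are hypothetical helpers written only for this illustration.

```python
import re
from typing import Dict, List, Optional, Tuple

# Same shape as the example: (kind, regex) or (kind, name, regex).
# Assumption: when a name is given, it replaces the kind in the output.
TOKEN_SPECS = [
    ('SKIP',          r'[ \r\n\t]+'),
    ('WORD',          r'\w+'),
    ('EMARK',    '!', r'!'),
    ('COMMA',    ',', r','),
    ('MISMATCH',      r'.'),
]


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Return (kind, value) pairs; SKIP is dropped and MISMATCH raises."""
    names: Dict[str, str] = {}
    patterns = []
    for spec in TOKEN_SPECS:
        if len(spec) == 3:
            kind, name, regex = spec
            names[kind] = name
        else:
            kind, regex = spec
        patterns.append(f'(?P<{kind}>{regex})')
    # One alternation of named groups; the first alternative that matches wins.
    # DOTALL lets the catch-all MISMATCH pattern also match newlines.
    master = re.compile('|'.join(patterns), re.DOTALL)
    tokens = []
    for mo in master.finditer(text):
        kind = mo.lastgroup
        if kind == 'SKIP':
            continue
        if kind == 'MISMATCH':
            raise ValueError(f'invalid character at offset {mo.start()}')
        tokens.append((names.get(kind, kind), mo.group()))
    return tokens


def match_sequence(tokens: List[Tuple[str, str]],
                   *expected: str) -> Optional[List[str]]:
    """Match token kinds in order, in the spirit of Sequence(...)."""
    if len(tokens) != len(expected):
        return None
    values = []
    for (kind, value), want in zip(tokens, expected):
        if kind != want:
            return None
        values.append(value)
    return values


tokens = tokenize('Hello, World!')
print(tokens)  # [('WORD', 'Hello'), (',', ','), ('WORD', 'World'), ('!', '!')]
print(match_sequence(tokens, 'WORD', ',', 'WORD', '!'))  # ['Hello', ',', 'World', '!']
```

Joining all patterns into one regular expression of named groups keeps tokenization to a single pass over the input, and the catch-all MISMATCH pattern at the end ensures that every character is either consumed or reported.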

Benchmark

A benchmark comparing the speed of 10 JSON parsers parsing a 276 kB file.

$ env PYTHONPATH=. python3 examples/benchmarks/json/speed.py

Parsed 'examples/benchmarks/json/data.json' 1 time(s) in:

PACKAGE         SECONDS   RATIO  VERSION
textparser         0.10    100%  0.21.1
parsimonious       0.17    169%  unknown
lark (LALR)        0.27    267%  0.7.0
funcparserlib      0.34    340%  unknown
textx              0.54    546%  1.8.0
pyparsing          0.68    684%  2.4.0
pyleri             0.88    886%  1.2.2
parsy              0.92    925%  1.2.0
parsita            2.28   2286%  unknown
lark (Earley)      2.34   2348%  0.7.0

NOTE 1: The parsers are not necessarily optimized for speed. Optimizing them will likely affect the measurements.

NOTE 2: The structure of the resulting parse trees varies and additional processing may be required to make them fit the user application.

NOTE 3: Only JSON parsers are compared. Parsing other languages may give vastly different results.
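The RATIO column gives each parser's time as a percentage of textparser's time. The SECONDS column appears to be rounded, which would explain why parsimonious shows 169% rather than 0.17 / 0.10 = 170%. A minimal timing harness in the same spirit is sketched below, assuming each parser is wrapped in a callable that takes the input text. It is not the project's speed.py; benchmark() is a hypothetical helper, and the stdlib json module stands in for the compared packages.

```python
import json
import timeit
from typing import Callable, Dict, List, Tuple


def benchmark(parsers: Dict[str, Callable[[str], object]],
              text: str,
              baseline: str,
              number: int = 1) -> List[Tuple[str, float, int]]:
    """Time each parser on text and express it relative to baseline.

    Returns (name, seconds, ratio_percent) rows sorted fastest first.
    """
    # Bind each callable via a default argument so every lambda keeps its own parser.
    seconds = {
        name: timeit.timeit(lambda p=parse: p(text), number=number)
        for name, parse in parsers.items()
    }
    base = seconds[baseline]
    rows = [(name, secs, round(100 * secs / base))
            for name, secs in seconds.items()]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical input and stand-in parsers from the stdlib json module.
data = json.dumps({'items': [{'id': i, 'name': f'item{i}'} for i in range(10000)]})
results = benchmark({'json.loads': json.loads,
                     'json.JSONDecoder': json.JSONDecoder().decode},
                    data, baseline='json.loads')

print('PACKAGE            SECONDS   RATIO')
for name, secs, ratio in results:
    print(f'{name:<18} {secs:7.2f} {ratio:6d}%')
```

Raising number repeats each parse several times, which reduces timing noise for small inputs at the cost of a longer run.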

Contributing

  1. Fork the repository.

  2. Implement the new feature or bug fix.

  3. Implement test case(s) to ensure that future changes do not break existing functionality.

  4. Run the tests.

    python3 -m unittest

  5. Create a pull request.

Download files

Source Distribution

textparser-0.24.0.tar.gz (13.5 kB)

Built Distribution

textparser-0.24.0-py3-none-any.whl (8.3 kB)

File details

Details for the file textparser-0.24.0.tar.gz.

File metadata

  • Download URL: textparser-0.24.0.tar.gz
  • Size: 13.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for textparser-0.24.0.tar.gz
ALGORITHM    HASH DIGEST
SHA256       56f708e75aa9d002adb76d823ba6ef166d7ecec1e3e4ca4c1ca103f817568335
MD5          5c02de66ceee040ea9f115c5e225b96a
BLAKE2b-256  64906a829d064411788144dbc5567c0d95e7d0403ad3c372dc9a4b3ea202e26b


File details

Details for the file textparser-0.24.0-py3-none-any.whl.

File metadata

  • Download URL: textparser-0.24.0-py3-none-any.whl
  • Size: 8.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for textparser-0.24.0-py3-none-any.whl
ALGORITHM    HASH DIGEST
SHA256       379d25cdb21332f403bfa37b9ef11192b7796340d2602d88fc9246bfdba2a1cf
MD5          2c00353f19e78e2dc088c0bb17a8395f
BLAKE2b-256  68f9a9e18dea98b73f24b5575d742a9ad6d7db0762973429e3c36353dad4dd0d

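The published digests can be used to check a downloaded file. A minimal stdlib sketch using hashlib follows, assuming the wheel has been saved under its original name in the current directory; sha256_of() and verify() are hypothetical helpers, not part of textparser or pip.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open('rb') as f:
        # iter() with a sentinel stops when read() returns b'' at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()


# SHA256 digest published for textparser-0.24.0-py3-none-any.whl.
WHEEL_SHA256 = '379d25cdb21332f403bfa37b9ef11192b7796340d2602d88fc9246bfdba2a1cf'

if __name__ == '__main__':
    wheel = Path('textparser-0.24.0-py3-none-any.whl')
    if wheel.exists():
        print('OK' if verify(wheel, WHEEL_SHA256) else 'MISMATCH')
```

pip can also enforce hashes itself in hash-checking mode, by listing the package with a --hash=sha256:... option in a requirements file and installing with --require-hashes.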
