structifytext

Structure semi-structured text

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Structures semi-structured text, useful when parsing command line output from networking devices.

What is it

If you’re reading this, you’ve probably been tasked with programmatically retrieving information from a CLI-driven device, and you’ve got to the point where you have a nice string of text and say to yourself, “wow, I wish it just returned something structured that I could deal with, like JSON or some other key/value format”.

Well that’s where structifytext tries to help. It lets you define the payload you wish came back to you, and with a sprinkle of the right regular expressions it does!

Usage

At less than 100 lines of code, it’s quite simple. The parse_struct method expects a “structure” and the output string converted to a list of lines (I found the easiest way to do this is to use StringIO.readlines()).
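As a rough illustration of the conversion step, here is a small sketch that turns a raw output string into a list of lines. It assumes Python 3, where StringIO lives in the io module; the sample text is made up. The import path for parse_struct is not shown on this page, so the call itself is left out.

```python
import io

# Raw CLI output as one string (sample text, made up for illustration).
raw_output = (
    "[TABLE 0] Total entries: 3\n"
    "    [FLOW_ID1]\n"
    "    info = related to table 0 flow 1\n"
)

# readlines() returns one list entry per line and keeps the trailing newline.
lines = io.StringIO(raw_output).readlines()

# str.splitlines() is an alternative that drops the newlines.
stripped = raw_output.splitlines()
```

Either list should be usable as the "output string converted to a list" described above, though which form the parser prefers is an assumption worth checking against the tests.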

The Struct

A struct (or structure, or payload, or what have you) is just a dictionary that resembles what you wish to get back. Each value is either a dictionary {}, a list [], or a regular expression string such as [a-z](\d) with one group (to populate the value).

The structure is recursively parsed, to populate the dictionary/structure that was provided with values from the input string list.
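To make the "one group" idea concrete, here is a minimal sketch using only the standard re module. It shows how a single capture group can pull a value out of one line. Whether structifytext itself uses re.search or re.match is not stated here, so re.search is an assumption, and the sample line is made up.

```python
import re

# A regular expression with exactly one capture group.
pattern = re.compile(r'info\s+=\s+(.*)')

# One line of device output (sample text).
line = "    info = related to table 0 flow 1"

match = pattern.search(line)
# Group 1 is the text that would populate the value in the struct.
value = match.group(1) if match else None
```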

Quite often, similar sections of semi-structured text are repeated in the text you are trying to parse. To parse these sections, define a dictionary with a key of either id or block_start; the difference is that the block_start key/value is dropped from the resulting output. This id or block_start regex marks both the beginning and the end of each “chunk” that you’d like parsed: a chunk starts at one match and runs until the next. You can forcefully mark the end of a “chunk” by specifying a block_end key with a regex value.
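The chunking idea can be sketched in a few lines. This is not the library's code, just an illustration under my own assumptions: split_chunks is a hypothetical helper, a chunk runs from one start match to the next, and an optional end regex closes a chunk early.

```python
import re

def split_chunks(lines, start_regex, end_regex=None):
    """Group lines into chunks; each chunk begins at a line matching start_regex."""
    start = re.compile(start_regex)
    end = re.compile(end_regex) if end_regex else None
    chunks, current = [], None
    for line in lines:
        if start.search(line):
            # A new start match closes the previous chunk, if any.
            if current is not None:
                chunks.append(current)
            current = [line]
        elif current is not None:
            if end is not None and end.search(line):
                # An explicit end match closes the chunk early.
                chunks.append(current)
                current = None
            else:
                current.append(line)
    if current is not None:
        chunks.append(current)
    return chunks

sample = [
    "[TABLE 0] Total entries: 3",
    "    [FLOW_ID1]",
    "[TABLE 1] Total entries: 31",
    "    [FLOW_ID1]",
]
chunks = split_chunks(sample, r'\[TABLE (\d{1,2})\]')
```

Lines that appear before the first start match, or after an end match, are simply ignored in this sketch.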

An example is useful here. Take the following structure:

{
    'tables': [
        {
            'id': r'\[TABLE (\d{1,2})\]',
            'flows': [
                {
                    'id': r'\[FLOW_ID(\d+)\]',
                    'info': r'info\s+=\s+(.*)'
                }
            ]
        }
    ]
}

It creates a “chunk/block” for each match in the following output:

[TABLE 0] Total entries: 3
    [FLOW_ID1]
    info = related to table 0 flow 1
[TABLE 1] Total entries: 31
    [FLOW_ID1]
    info = related to table 1 flow 1

The result is parsed as:

{
    'tables': [{
        'id': '0',
        'flows': [{ 'id': '1', 'info': 'related to table 0 flow 1' }]
    }, {
        'id': '1',
        'flows': [{ 'id': '1', 'info': 'related to table 1 flow 1' }]
    }]
}
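To show how a recursive fill could arrive at that result, here is a toy re-implementation. It is a sketch under stated assumptions, not structifytext's actual code: toy_parse, first_group and chunk_by are hypothetical names, only the id key is handled (no block_start or block_end), and re.search is assumed for matching.

```python
import re

def first_group(regex, lines):
    """Return group 1 of the first line that matches regex, else None."""
    for line in lines:
        match = re.search(regex, line)
        if match:
            return match.group(1)
    return None

def chunk_by(regex, lines):
    """Split lines into chunks, each starting at a line that matches regex."""
    chunks = []
    for line in lines:
        if re.search(regex, line):
            chunks.append([line])
        elif chunks:
            chunks[-1].append(line)
    return chunks

def toy_parse(struct, lines):
    """Recursively build a new dict shaped like struct (illustration only)."""
    result = {}
    for key, value in struct.items():
        if isinstance(value, list):
            # A list holds one template dict, applied once per chunk.
            template = value[0]
            result[key] = [toy_parse(template, chunk)
                           for chunk in chunk_by(template['id'], lines)]
        elif isinstance(value, dict):
            result[key] = toy_parse(value, lines)
        else:
            # A string is treated as a regex with one capture group.
            result[key] = first_group(value, lines)
    return result

struct = {
    'tables': [{
        'id': r'\[TABLE (\d{1,2})\]',
        'flows': [{
            'id': r'\[FLOW_ID(\d+)\]',
            'info': r'info\s+=\s+(.*)',
        }],
    }],
}

output = """[TABLE 0] Total entries: 3
    [FLOW_ID1]
    info = related to table 0 flow 1
[TABLE 1] Total entries: 31
    [FLOW_ID1]
    info = related to table 1 flow 1
"""

parsed = toy_parse(struct, output.splitlines())
```

For this sample input, parsed has the same shape and values as the result shown above. The real parser may differ in details such as how unmatched keys are reported.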

See tests/test_parser_api.py for more usage examples.


Release history

0.2.1

Apr 28, 2017

This version

0.2.0

Apr 24, 2017

0.1.1

Apr 24, 2017

Download files

Source Distribution

structifytext-0.2.0.tar.gz (7.6 kB)

Uploaded Apr 24, 2017 Source

Built Distribution

structifytext-0.2.0-py2.py3-none-any.whl (7.9 kB)

Uploaded Apr 24, 2017 Python 2 Python 3

Hashes for structifytext-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`fcbaba8be8b033b347948579bd1c0242d039935617439b74314e11807bd92dc3`
MD5	`adcd1040857b1487dcaddf0532aab0fc`
BLAKE2b-256	`421e9c79b82430956cfcadfdca89c2745b30c68823de131d28f503a7d7065c3e`

Hashes for structifytext-0.2.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`2a5a543f7e25f7109dbaeb78aa46b5e40b99ae77856711d0fcaff2ba3497697f`
MD5	`15d381b6bb1f7948fac2e2793de3d6ab`
BLAKE2b-256	`3b6e80b5b3351424d34dc212002ee18c540750a04e2e9b672897f0ec40213f90`