
Structure semi-structured text

Project description

Structures semi-structured text, useful when parsing command line output from networking devices.

What is it

If you’re reading this, you’ve probably been tasked with programmatically retrieving information from a CLI-driven device, and you’ve got to the point where you have a nice string of text and say to yourself, “wow, I wish it just returned something structured that I could deal with, like JSON or some other key/value format”.

Well, that’s where structifytext tries to help. It lets you define the payload you wish came back to you, and with a sprinkle of the right regular expressions, it does!

Installation

With pip:

pip install structifytext

From source:

make install

Usage

Pass your text and a “structure” (a Python dictionary) to the parser module’s parse method.

from structifytext import parser

output = """
  eth0      Link encap:Ethernet  HWaddr 00:11:22:3a:c4:ac
            inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:147142475 errors:0 dropped:293854 overruns:0 frame:0
            TX packets:136237118 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:17793317674 (17.7 GB)  TX bytes:46525697959 (46.5 GB)

  eth1      Link encap:Ethernet  HWaddr 00:11:33:4a:c8:ad
            inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0
            inet6 addr: fe80::225:90ff:fe4a:c8ad/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:51085118 errors:0 dropped:251 overruns:0 frame:0
            TX packets:3447162 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:4999277179 (4.9 GB)  TX bytes:657283496 (657.2 MB)
  """

# Each regex value has one capturing group; the captured text becomes the value.
struct = {
    'interfaces': [{
        'id': r'(eth\d{1,2})',
        'ipv4_address': r'inet addr:(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})',
        'mac_address': r'HWaddr\s((?:[a-fA-F0-9]{2}[:\-]?){6})'
    }]
}

parsed = parser.parse(output, struct)
print(parsed)

parsed is then the following Python dictionary:

{
  'interfaces': [
      {
          'id': 'eth0',
          'ipv4_address': '192.168.1.2',
          'mac_address': '00:11:22:3a:c4:ac'
      },
      {
          'id': 'eth1',
          'ipv4_address': '192.168.1.3',
          'mac_address': '00:11:33:4a:c8:ad'
      }
  ]
}

You can then use it however you please, for example returning it as JSON from a REST service.
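For the REST case, Python’s standard json module is enough. The sketch below assumes you already hold the dictionary returned by parser.parse (it is pasted in literally here so the snippet runs without structifytext installed).

```python
import json

# The dictionary shown above, as returned by parser.parse(output, struct).
parsed = {
    'interfaces': [
        {'id': 'eth0', 'ipv4_address': '192.168.1.2', 'mac_address': '00:11:22:3a:c4:ac'},
        {'id': 'eth1', 'ipv4_address': '192.168.1.3', 'mac_address': '00:11:33:4a:c8:ad'},
    ]
}

# Serialise to a JSON string, e.g. for the body of an HTTP response.
body = json.dumps(parsed, indent=2)
print(body)
```

Because the result contains only dicts, lists and strings, it serialises without any custom encoder.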

The Struct

A struct (or structure, or payload, or whatever you like to call it) is just a dictionary that resembles what you wish to get back.
Each value is either a dictionary {}, a list [], or a regular expression string such as [a-z](\d) containing exactly one group, whose match populates the value.
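The one-group rule for regex values can be checked with the standard library, since a compiled pattern exposes its number of capturing groups as `groups` (non-capturing `(?:...)` groups are not counted). The sketch below is a hypothetical helper, not structifytext’s API, and it assumes the first match in the text wins and that no match yields None.

```python
import re


def resolve_leaf(pattern, text):
    """Hypothetical helper: return the single captured group of the first match, or None."""
    compiled = re.compile(pattern)
    # The struct format expects exactly one capturing group per regex value.
    if compiled.groups != 1:
        raise ValueError(f"expected exactly one group, got {compiled.groups}: {pattern!r}")
    match = compiled.search(text)
    return match.group(1) if match else None


line = "inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0"
print(resolve_leaf(r'inet addr:(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})', line))  # 192.168.1.2
```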

The structure is recursively parsed, populating the dictionary/structure that was provided with values from the input text.

Quite often, similar sections of semi-structured text are repeated in the text you are trying to parse.
To parse these sections, define a dictionary (inside a list) with a key of either id or block_start; the difference is that the block_start key/value is dropped from the resulting output.
The id or block_start regex marks the beginning and end of each “chunk” that you’d like parsed.
You can explicitly mark the end of a chunk by specifying a block_end key with a regex value.
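One way to picture the chunking is the hypothetical sketch below; it is not structifytext’s implementation. It assumes each chunk starts at a match of the id/block_start regex and runs until the next such match (or the end of the text), and, as a further assumption, that block_end cuts the chunk right after the first block_end match inside it.

```python
import re


def split_chunks(text, start_pattern, end_pattern=None):
    """Hypothetical sketch: cut text into chunks that each begin at a start_pattern match.

    A chunk runs until the next start match (or the end of the text); if
    end_pattern is given, the chunk is cut just after its first end_pattern match.
    """
    starts = [m.start() for m in re.finditer(start_pattern, text)]
    chunks = []
    for i, begin in enumerate(starts):
        stop = starts[i + 1] if i + 1 < len(starts) else len(text)
        chunk = text[begin:stop]
        if end_pattern:
            end = re.search(end_pattern, chunk)
            if end:
                chunk = chunk[:end.end()]
        chunks.append(chunk)
    return chunks


sample = (
    "[TABLE 0] Total entries: 3\n"
    "    [FLOW_ID1]\n"
    "    info = related to table 0 flow 1\n"
    "[TABLE 1] Total entries: 31\n"
    "    [FLOW_ID1]\n"
    "    info = related to table 1 flow 1\n"
)

for chunk in split_chunks(sample, r'\[TABLE (\d{1,2})\]'):
    print(repr(chunk))
```

Passing an end pattern such as r'info\s+=.*' would stop each table chunk at the end of its info line instead of at the next [TABLE n] header.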

An example is useful here.

The following structure:

{
    'tables': [
        {
            'id': r'\[TABLE (\d{1,2})\]',
            'flows': [
                {
                    'id': r'\[FLOW_ID(\d+)\]',
                    'info': r'info\s+=\s+(.*)'
                }
            ]
        }
    ]
}

creates a “chunk” (block) for each table, and for each flow within a table, in the following output:

[TABLE 0] Total entries: 3
    [FLOW_ID1]
    info = related to table 0 flow 1
[TABLE 1] Total entries: 31
    [FLOW_ID1]
    info = related to table 1 flow 1

That will be parsed as:

{
    'tables': [
        {
            'id': '0',
            'flows': [{'id': '1', 'info': 'related to table 0 flow 1'}]
        },
        {
            'id': '1',
            'flows': [{'id': '1', 'info': 'related to table 1 flow 1'}]
        }
    ]
}
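To make these rules concrete, here is a toy re-implementation that reproduces the nested result for this example. It is a hypothetical sketch written from the description above, not structifytext’s code, and it assumes: a list holds one template dict; a chunk runs from one id/block_start match to the next; a regex leaf takes its first match (None if there is none); block_end is not handled.

```python
import re


def parse_sketch(text, struct):
    """Toy parser (hypothetical, not structifytext's code) for dict/list/regex structs."""
    result = {}
    for key, value in struct.items():
        if key == 'block_start':
            continue  # block_start only delimits chunks; it is dropped from the output
        if isinstance(value, list):
            # A list holds one template dict whose 'id' or 'block_start' regex
            # marks where each chunk begins; a chunk runs until the next match.
            template = value[0]
            start = template.get('id') or template.get('block_start')
            starts = [m.start() for m in re.finditer(start, text)]
            bounds = zip(starts, starts[1:] + [len(text)])
            result[key] = [parse_sketch(text[a:b], template) for a, b in bounds]
        elif isinstance(value, dict):
            result[key] = parse_sketch(text, value)  # nested dict: same text
        else:
            match = re.search(value, text)  # regex leaf: first match wins
            result[key] = match.group(1) if match else None
    return result


text = """[TABLE 0] Total entries: 3
    [FLOW_ID1]
    info = related to table 0 flow 1
[TABLE 1] Total entries: 31
    [FLOW_ID1]
    info = related to table 1 flow 1
"""

struct = {
    'tables': [{
        'id': r'\[TABLE (\d{1,2})\]',
        'flows': [{
            'id': r'\[FLOW_ID(\d+)\]',
            'info': r'info\s+=\s+(.*)',
        }],
    }]
}

print(parse_sketch(text, struct))
```

Swapping 'id' for 'block_start' in a template keeps the same chunking but omits that key from each parsed chunk.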

See tests/test_parser_api.py for more usage examples.



Download files


File                                        Size    Type    Python version
structifytext-0.2.1-py2.py3-none-any.whl    7.9 kB  Wheel   py2.py3
structifytext-0.2.1.tar.gz                  8.2 kB  Source  n/a
