Parse semi-structured text into Python dictionaries without well-defined schemas
🗂️ Tabularize
Purpose
Tabularize parses semi-structured, table-like text into Python dictionaries while requiring minimal knowledge of the expected data format.
While packages such as csv, pandas, and TextFSM exist, they require more structured input: clearly distinguishable delimiters, fixed column widths, or enough knowledge of the data to deduce where a column starts and ends from its data types. Tabularize is designed for cases where some guesswork is unavoidable because the input does not follow these constraints.
This package's design is influenced by the Name/Finger protocol, whose non-standardized, human-readable status reports tend to be difficult for machines to parse.
Tabularize is probably not the solution for you: modern protocols are usually machine-readable or offer a way to produce machine-readable output. It shines when you need to parse semi-structured, tabular data whose schema is unknown (a situation you should avoid) or when you need tabular data parsed quickly.
Usage
Tabularize is offered as both an API for developers and a command-line tool. To install it:
python3 -m pip install tabularize
Command-Line Usage
The tabularize command is available upon installation. It takes a list of files as parameters. For each file, it
locates the first non-blank line to determine the headers, then prints a JSON object for each subsequent parsed
entry. For example:
tabularize path-to-file path-to-another-file
Automatic header detection may not work as expected when the header line is ambiguous, because Tabularize analyzes only the single header line, not the content, to derive column names. For example, given the following data:
Line User Host(s) Idle Location
1 vty 0 idle 00:00:05 192.168.1.1
* 2 vty 1 idle 00:00:00 192.168.1.2
By default, Tabularize misinterprets the headers and assumes that a single Idle Location header exists rather than two
separate Idle and Location headers. Because Tabularize works sequentially, specifying an Idle header resolves the
error without also having to specify a Location header:
tabularize -H Idle path-to-finger-output
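The ambiguity can be illustrated with a small sketch. This is not Tabularize's actual algorithm: the split_headers function, its hints parameter, and the column spacing of the header string are all assumptions made for illustration. The heuristic treats words separated by a single space as one header unless a hinted name says otherwise.

```python
import re


def split_headers(line, hints=()):
    """Split a header line into (start_offset, name) pairs.

    Illustrative heuristic only (hypothetical, not Tabularize's code):
    words separated by a single space are assumed to belong to the same
    header, unless a hinted name marks where that header ends.
    """
    columns = []
    # Each match is a run of words joined by single spaces.
    for match in re.finditer(r"\S+(?: \S+)*", line):
        start, text = match.start(), match.group()
        for hint in hints:
            if text.startswith(hint + " "):
                # The hint names a complete header, so the rest is a new column.
                columns.append((start, hint))
                start += len(hint) + 1
                text = text[len(hint) + 1:]
                break
        columns.append((start, text))
    return columns


# Assumed spacing: only "Idle Location" is separated by a single space.
header = "Line       User       Host(s)              Idle Location"

default_names = [name for _, name in split_headers(header)]
hinted_names = [name for _, name in split_headers(header, hints=("Idle",))]
print(default_names)
print(hinted_names)
```

Under these assumptions, one hint is enough: once Idle is known to be a complete header, the remaining text falls out as its own column.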
The tabularize command also supports piped input. To read from a pipe, use the file name -:
cat file-to-parse | tabularize -
Tabularize operates at the byte level; however, it prints out data as JSON, which does not support bytes. As a result,
it decodes the data before printing it to the terminal. You can customize the encoding and error resolution strategy
using the --encoding and --errors options:
tabularize --encoding utf-8 --errors backslashreplace path-to-file
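The option values resemble the names of Python's codec error handlers (backslashreplace is one of them). Assuming the options are passed through to Python's bytes decoding, which this page does not confirm, the following sketch shows how three common handlers treat a byte that is not valid UTF-8:

```python
# Illustration of Python's built-in decode error handlers. Whether the
# tabularize command forwards --errors to bytes.decode is an assumption.
raw = b"user\xff"  # 0xff can never appear in valid UTF-8


def decode_with(errors):
    """Decode the sample bytes as UTF-8 using the given error handler."""
    return raw.decode("utf-8", errors=errors)


escaped = decode_with("backslashreplace")  # keeps the byte as a visible escape
replaced = decode_with("replace")          # substitutes U+FFFD

try:
    decode_with("strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(escaped), repr(replaced), strict_failed)
```

The backslashreplace handler is useful when the original bytes need to stay recoverable from the JSON output, whereas replace discards them.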
API Usage
Programs integrating Tabularize must determine for themselves which line holds the headers and which lines form the body. The parsed headers are then reused to parse each body line. For example:
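import tabularize

# Sample input as bytes; the first line holds the headers.
data = b"""Name Ice Cream Preference
James Mint Chocolate Chip
""".splitlines()

# Derive the column headers from the header line.
headers = tabularize.parse_headers(data[0])

# Reuse the headers to parse each body line.
for line in data[1:]:
    print(tabularize.parse_body(headers, line))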
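Since the API leaves header selection to the caller, one option is to mirror what the command-line tool is described as doing: take the first non-blank line as the header. The helper below is a minimal sketch of that step only. The name split_header_and_body is hypothetical, and skipping blank body lines is an assumption rather than documented Tabularize behavior.

```python
def split_header_and_body(lines):
    """Return (header_line, body_lines), using the first non-blank line as header.

    Hypothetical helper: blank lines before the header are ignored, and blank
    body lines are dropped (an assumption, not documented behavior).
    """
    iterator = iter(lines)
    for line in iterator:
        if line.strip():
            # Everything after the header is body; keep only non-blank lines.
            return line, [body for body in iterator if body.strip()]
    raise ValueError("input contains no non-blank lines")


raw = b"\n\nName   Ice Cream Preference\nJames  Mint Chocolate Chip\n\n".splitlines()
header, body = split_header_and_body(raw)
print(header)
print(body)
```

The returned header can then be passed to tabularize.parse_headers and each body line to tabularize.parse_body, as in the example above.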
Samples
Tabularize is particularly useful for parsing Name/Finger protocol output when the fingerd server implementation is
unknown, since the protocol does not standardize its output. However, if the server implementation is known, consider
a regular expression-based solution such as TextFSM instead, as the data types can
help indicate where each column starts and ends.
🐧 Debian fingerd
Login Name Tty Idle Login Time Office Office Phone
alfred *pts/0 1d Oct 06 19:56 (192.168.1.1)
bert pts/1 2d Oct 06 12:34 (:pts/0:S.0)
chase pts/2 3d Oct 06 05:43 (:pts/0:S.1)
[
{"Login": "alfred", "Tty": "*pts/0", "Idle": "1d", "Login Time": "Oct 06 19:56", "Office": "(192.168.1.1)"},
{"Login": "bert", "Tty": "pts/1", "Idle": "2d", "Login Time": "Oct 06 12:34", "Office": "(:pts/0:S.0)"},
{"Login": "chase", "Tty": "pts/2", "Idle": "3d", "Login Time": "Oct 06 05:43", "Office": "(:pts/0:S.1)"}
]
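For comparison, when the server implementation is known, the regular expression approach mentioned earlier might look like the following sketch. The pattern is an assumption derived only from the three sample rows above (empty Name column, every other field present), and the field names are illustrative. Real fingerd output may not fit it.

```python
import re

# Assumed row shape, based solely on the sample rows above: the Name column
# is empty and every other field is present. Real output may differ.
FINGER_ROW = re.compile(
    r"^(?P<login>\S+)\s+"
    r"(?P<tty>\*?\S+)\s+"
    r"(?P<idle>\S+)\s+"
    r"(?P<login_time>[A-Z][a-z]{2} \d{2} \d{2}:\d{2})\s+"
    r"(?P<office>\(.*\))$"
)


def parse_row(line):
    """Return the named fields of one row, or None if the line does not fit."""
    match = FINGER_ROW.match(line.strip())
    return match.groupdict() if match else None


row = parse_row("alfred *pts/0 1d Oct 06 19:56 (192.168.1.1)")
print(row)
```

Here the data types do the work: the month, day, and time pattern anchors the Login Time field regardless of column alignment, which is only possible because the format is known in advance.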
📡 Cisco fingerd
Line User Host(s) Idle Location
1 vty 0 idle 00:00:00
[
{"Line": "1 vty 0", "Host(s)": "idle", "Idle": "00:00:00"}
]