Template Semi-Structured Text Parser
tsstp (Template Semi-Structured Text Parser) is a Python module for parsing semi-structured text with user-defined templates. It was developed to give programmatic access to the semi-structured text output of computational materials science software, but it can parse any semi-structured text that contains distinctive repeating patterns, and it can be extended to other text with special patterns or notation.
Unlike a file-specific parser, which needs only the input data, tsstp takes two inputs: the data to be parsed and a parsing template. It returns a result structure containing the extracted information.
The same data can be parsed with different templates, and each template produces its own result. Templates are easy to create, so users are encouraged to write their own tsstp templates to extract exactly the data they need. This makes data reuse simple, in line with the FAIR principles.
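To make the idea of "one data set, several templates" concrete, here is a minimal sketch in plain Python. It is not tsstp's implementation and does not use its API: the helper `extract_fields` and its simplified rules (one template line per data line, a `{{ name }}` placeholder captures whatever follows the literal prefix, a line without a placeholder is skipped) are assumptions made purely for illustration.

```python
import re

# Hypothetical helper for illustration only; NOT how tsstp works internally.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def extract_fields(data: str, template: str) -> dict:
    """Match each template line against the data line at the same position.

    Simplified rules (assumptions of this sketch):
    - a template line with a placeholder captures the text that follows
      the literal prefix on the corresponding data line;
    - a template line without a placeholder skips that data line.
    """
    result = {}
    data_lines = data.strip().splitlines()
    template_lines = template.strip().splitlines()
    for data_line, template_line in zip(data_lines, template_lines):
        match = PLACEHOLDER.search(template_line)
        if match is None:
            continue  # no placeholder: nothing to extract from this line
        prefix = template_line[:match.start()].strip()
        text = data_line.strip()
        if text.startswith(prefix):
            result[match.group(1)] = text[len(prefix):].strip()
    return result


data = """
O Fe Ni
50 33 1
Direct configuration= 1
"""

# Template A keeps the element symbols and the configuration number.
template_a = """
{{ elements }}
skip
Direct configuration= {{ number }}
"""

# Template B keeps only the per-element atom counts.
template_b = """
skip
{{ counts }}
"""

fields_a = extract_fields(data, template_a)
fields_b = extract_fields(data, template_b)
print(fields_a)
print(fields_b)
```

The data string never changes; only the template decides which fields end up in the result. tsstp's real template syntax is richer than this sketch.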
install
pip install tsstp
how to use
from tsstp import DataTemplate
# raw string: keeps the backslash in the header line literal and avoids an invalid-escape warning
data_to_parse = r"""
POSCAR\(4)
3
1.00000000000000
8.3879995346000005 0.0000000000000000 0.0000000000000000
0.0000000000000000 8.3879995346000005 0.0000000000000000
0.0000000000000000 0.0000000000000000 23.0000000000000000
O Fe Ni
50 33 1
Direct configuration= 1
"""
template = """
{{ head }}
{{ loop_num }}
{{ Scaling }}
{{ Coordinates1 }} ~ loop_num
{{ Coordinates2 }} ~ 3
{{ Coordinates3 }} ~ 3
{{ elements }} ~ n
{{ elements_num }} ~ n
Direct configuration= {{ number }}
"""
# create parser object and parse data using template:
parser = DataTemplate(data=data_to_parse, template=template)
parser.parse()
# print result in JSON format
results = parser.result(format='json')
print(results)
output
{
"head": "POSCAR\(4)",
"loop_num": "3",
"Scaling": "1.00000000000000",
"Coordinates1": ["8.3879995346000005", "0.0000000000000000","0.0000000000000000"],
"Coordinates2": ["0.0000000000000000", "8.3879995346000005", "0.0000000000000000"],
"Coordinates3": ["0.0000000000000000", "0.0000000000000000", "23.0000000000000000"],
"elements": ["O", "Fe", "Ni"],
"elements_num": ["50", "33", "1"],
"number": "1"
}
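Every extracted value in the result is a string, so numeric work needs an explicit conversion. The following is a minimal sketch, assuming the result has already been loaded into a Python dict (for example with `json.loads`, if `parser.result(format='json')` returns a string; this README does not say which). The variable names are illustrative, and the `head` entry is left out.

```python
# Values copied from the example result in this README ("head" omitted).
result = {
    "loop_num": "3",
    "Scaling": "1.00000000000000",
    "Coordinates1": ["8.3879995346000005", "0.0000000000000000", "0.0000000000000000"],
    "Coordinates2": ["0.0000000000000000", "8.3879995346000005", "0.0000000000000000"],
    "Coordinates3": ["0.0000000000000000", "0.0000000000000000", "23.0000000000000000"],
    "elements": ["O", "Fe", "Ni"],
    "elements_num": ["50", "33", "1"],
    "number": "1",
}

# Convert the three lattice vectors to floats and apply the scaling factor.
scale = float(result["Scaling"])
lattice = [
    [scale * float(value) for value in result[key]]
    for key in ("Coordinates1", "Coordinates2", "Coordinates3")
]

# Pair each element symbol with its integer atom count.
atom_counts = {
    symbol: int(count)
    for symbol, count in zip(result["elements"], result["elements_num"])
}
total_atoms = sum(atom_counts.values())

print(lattice)
print(atom_counts)  # {'O': 50, 'Fe': 33, 'Ni': 1}
print(total_atoms)  # 84
```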
Built Distribution
File details
Details for the file tsstp-0.2.0-py3-none-any.whl.
File metadata
- Download URL: tsstp-0.2.0-py3-none-any.whl
- Upload date:
- Size: 32.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | c7cc7f08f4bacd61b780d0fcfd21ad7e36e34e5ad1b95e02ca60a29422d8b4c1
MD5 | f41c05756ee7230a55ff7c3b2ede6c89
BLAKE2b-256 | 08439ca68de72603b774cce8e04d4cb255e0b1898c34e1a46a7577c0c796aaba
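A downloaded wheel can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch: the helper name `sha256_of_file` and the assumption that the wheel sits in the current directory are illustrative, not part of tsstp.

```python
import hashlib
from pathlib import Path

# SHA256 digest copied from the file hashes table above.
EXPECTED_SHA256 = "c7cc7f08f4bacd61b780d0fcfd21ad7e36e34e5ad1b95e02ca60a29422d8b4c1"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed download location; adjust the path to wherever the wheel was saved.
wheel = Path("tsstp-0.2.0-py3-none-any.whl")
if wheel.exists():
    actual = sha256_of_file(wheel)
    print("digest matches" if actual == EXPECTED_SHA256 else "digest MISMATCH")
else:
    print(f"{wheel} not found; download it first")
```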