Template Semi-Structured Text Parser

tsstp (Template Semi-Structured Text Parser) is a Python module for parsing semi-structured text with user-defined templates. It was developed to give programmatic access to the semi-structured text output of computational materials science software, but it can parse any semi-structured text that contains distinctive repeating patterns, and it can be extended to other texts with special patterns or notation.

Unlike conventional file-specific parsers, which take only the data as input, tsstp takes two inputs, the data to be parsed and a parsing template, and returns a structure containing the extracted information.

The same data can be parsed with different templates, each producing a result shaped by that template. Templates are easy to create, so users can define templates that extract exactly the data they need. Users are encouraged to write their own templates, which makes data reuse simple and supports the FAIR principles.
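
To make the idea of "same data, different templates" concrete, here is a toy illustration written with the standard library only. It is not tsstp's implementation and does not use tsstp's template syntax beyond the `{{ name }}` placeholder; the function `toy_extract` and the convention that lines without a placeholder are skipped are assumptions made for this sketch.

```python
import re

# Matches a placeholder such as "{{ head }}" and captures the field name.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def toy_extract(data, template):
    """Pair template lines with data lines by position and capture named lines."""
    result = {}
    data_lines = data.strip().splitlines()
    template_lines = template.strip().splitlines()
    for data_line, template_line in zip(data_lines, template_lines):
        match = PLACEHOLDER.fullmatch(template_line.strip())
        if match:  # template lines without a placeholder are skipped
            result[match.group(1)] = data_line.strip()
    return result


data = """
POSCAR
3
1.00000000000000
"""

# Two templates applied to the same data give differently shaped results.
full = toy_extract(data, "{{ head }}\n{{ loop_num }}\n{{ Scaling }}")
scale_only = toy_extract(data, "skip\nskip\n{{ Scaling }}")

print(full)
print(scale_only)
```

The first template names every line, while the second keeps only the scaling factor; the data itself is unchanged.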

Install

pip install tsstp

How to use

from tsstp import DataTemplate

# raw string, so the backslash in "POSCAR\(4)" is kept literally
data_to_parse = r"""
POSCAR\(4)
3
1.00000000000000
 8.3879995346000005    0.0000000000000000    0.0000000000000000
 0.0000000000000000    8.3879995346000005    0.0000000000000000
 0.0000000000000000    0.0000000000000000   23.0000000000000000
O    Fe   Ni
50    33     1
Direct configuration= 1
"""

template = """
{{ head }}
{{ loop_num }}
{{ Scaling }}
{{ Coordinates1 }} ~ loop_num
{{ Coordinates2 }} ~ 3
{{ Coordinates3 }} ~ 3
{{ elements }} ~ n
{{ elements_num }} ~ n
Direct configuration= {{ number }}
"""

# create parser object and parse data using template:
parser = DataTemplate(data=data_to_parse, template=template)
parser.parse()

# print result in JSON format
results = parser.result(format='json')
print(results)

Output:

{
"head": "POSCAR\(4)",
"loop_num": "3",
"Scaling": "1.00000000000000",
"Coordinates1": ["8.3879995346000005", "0.0000000000000000", "0.0000000000000000"],
"Coordinates2": ["0.0000000000000000", "8.3879995346000005", "0.0000000000000000"],
"Coordinates3": ["0.0000000000000000", "0.0000000000000000", "23.0000000000000000"],
"elements": ["O", "Fe", "Ni"],
"elements_num": ["50", "33", "1"],
"number": "1"
}
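
All values in the result shown above are strings, so numeric work needs a conversion step first. The following is a minimal sketch, assuming the result has already been loaded into a Python dict with the keys shown above. `to_structure` is a hypothetical helper written for illustration and is not part of tsstp.

```python
def to_structure(result):
    """Convert the string fields of a parsed result into numeric types."""
    # Three lattice rows, each a list of three floats.
    lattice = [
        [float(value) for value in result[key]]
        for key in ("Coordinates1", "Coordinates2", "Coordinates3")
    ]
    # Map each element symbol to its atom count.
    composition = {
        element: int(count)
        for element, count in zip(result["elements"], result["elements_num"])
    }
    return {
        "scaling": float(result["Scaling"]),
        "lattice": lattice,
        "composition": composition,
    }


# Values copied from the example output above (the "head" field is not needed here).
result = {
    "Scaling": "1.00000000000000",
    "Coordinates1": ["8.3879995346000005", "0.0000000000000000", "0.0000000000000000"],
    "Coordinates2": ["0.0000000000000000", "8.3879995346000005", "0.0000000000000000"],
    "Coordinates3": ["0.0000000000000000", "0.0000000000000000", "23.0000000000000000"],
    "elements": ["O", "Fe", "Ni"],
    "elements_num": ["50", "33", "1"],
}

structure = to_structure(result)
total_atoms = sum(structure["composition"].values())
print(structure["composition"])
print(total_atoms)
```

Keeping the conversion outside the template means the same template can serve callers that want the raw strings as well as callers that want numbers.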

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

tsstp-0.2.0-py3-none-any.whl (32.7 kB, Python 3)
