Template semi-structured text parser

Project description

Template Semi-Structured Text Parser

tsstp (Template Semi-Structured Text Parser) is a Python module for custom parsing of semi-structured text data using templates. It was developed to give programmatic access to the semi-structured text output of computational materials science software, but it can parse any semi-structured text that contains unique repetitive patterns, and it can be extended to other texts with special patterns and textual notation.

Unlike regular file-specific parsers, which require only the input data, tsstp takes two inputs: the data to be parsed and the parsing template. It returns a result structure containing the extracted information.

The same data can be parsed with different templates, and each template produces its own result. Templates are easy to create, so users can define templates that extract exactly the data they need. Users are encouraged to write their own tsstp templates; this makes data reuse simple, in line with the FAIR principles.
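To illustrate the idea of parsing one piece of data with two templates, here is a small standard-library sketch. It is not tsstp's implementation and does not use the tsstp API. The names `toy_parse` and `line_to_regex`, and the rule that template lines are matched to data lines one by one, are assumptions made only for this illustration.

```python
import re

# Matches a placeholder such as "{{ head }}" and captures its name.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def line_to_regex(template_line):
    """Turn one template line into a regex with one named group per placeholder."""
    parts, pos = [], 0
    for match in PLACEHOLDER.finditer(template_line):
        parts.append(re.escape(template_line[pos:match.start()]))  # literal text
        parts.append(f"(?P<{match.group(1)}>.+?)")                 # captured field
        pos = match.end()
    parts.append(re.escape(template_line[pos:]))
    return re.compile(r"\s*" + "".join(parts) + r"\s*")


def toy_parse(data, template):
    """Pair template lines with data lines in order and collect captured fields."""
    result = {}
    pairs = zip(template.strip().splitlines(), data.strip().splitlines())
    for template_line, data_line in pairs:
        match = line_to_regex(template_line.strip()).fullmatch(data_line)
        if match:
            result.update(match.groupdict())
    return result


data = """
sample title
1.5
Direct configuration= 1
"""

# Template 1: extract every line into its own field.
template_all = """
{{ head }}
{{ scaling }}
Direct configuration= {{ number }}
"""

# Template 2: same data, but only the configuration number is of interest.
template_number_only = """
sample title
1.5
Direct configuration= {{ config_id }}
"""

result_a = toy_parse(data, template_all)
result_b = toy_parse(data, template_number_only)
print(result_a)  # {'head': 'sample title', 'scaling': '1.5', 'number': '1'}
print(result_b)  # {'config_id': '1'}
```

The two results differ only because the templates differ; the data is untouched. This is the property that lets each user extract just the fields they need.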

Install

pip install tsstp

How to use

from tsstp import DataTemplate

# raw string, so the backslash in the first data line is kept literally
data_to_parse = r"""
POSCAR\(4)
3
1.00000000000000
 8.3879995346000005    0.0000000000000000    0.0000000000000000
 0.0000000000000000    8.3879995346000005    0.0000000000000000
 0.0000000000000000    0.0000000000000000   23.0000000000000000
O    Fe   Ni
50    33     1
Direct configuration= 1
"""

template = """
{{ head }}
{{ loop_num }}
{{ Scaling }}
{{ Coordinates1 }} ~ loop_num
{{ Coordinates2 }} ~ 3
{{ Coordinates3 }} ~ 3
{{ elements }} ~ n
{{ elements_num }} ~ n
Direct configuration= {{ number }}
"""

# create parser object and parse data using template:
parser = DataTemplate(data=data_to_parse, template=template)
parser.parse()

# print result in JSON format
results = parser.result(format='json')
print(results)
{
"head": "POSCAR\(4)",
"loop_num": "3",
"Scaling": "1.00000000000000",
"Coordinates1": ["8.3879995346000005", "0.0000000000000000", "0.0000000000000000"],
"Coordinates2": ["0.0000000000000000", "8.3879995346000005", "0.0000000000000000"],
"Coordinates3": ["0.0000000000000000", "0.0000000000000000", "23.0000000000000000"],
"elements": ["O", "Fe", "Ni"],
"elements_num": ["50", "33", "1"],
"number": "1"
}
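Every value in the sample output above is a string, so numeric work needs a conversion step. The sketch below shows one way to do it with the standard library only. The helper name `to_typed` and the output keys `scaling`, `vectors` and `composition` are hypothetical. The dictionary is a literal copy of the sample output, so the code runs without tsstp installed. Whether `parser.result(format='json')` returns a JSON string or a Python dictionary is an assumption not confirmed here, so the helper accepts both.

```python
import json

# Literal copy of the sample output shown above (not produced by calling tsstp).
sample_result = {
    "head": r"POSCAR\(4)",
    "loop_num": "3",
    "Scaling": "1.00000000000000",
    "Coordinates1": ["8.3879995346000005", "0.0000000000000000", "0.0000000000000000"],
    "Coordinates2": ["0.0000000000000000", "8.3879995346000005", "0.0000000000000000"],
    "Coordinates3": ["0.0000000000000000", "0.0000000000000000", "23.0000000000000000"],
    "elements": ["O", "Fe", "Ni"],
    "elements_num": ["50", "33", "1"],
    "number": "1",
}


def to_typed(result):
    """Convert the string fields of a parsed result into floats and ints."""
    if isinstance(result, str):
        # Assumption: a string result is JSON text and can be decoded directly.
        result = json.loads(result)
    rows = ("Coordinates1", "Coordinates2", "Coordinates3")
    return {
        "scaling": float(result["Scaling"]),
        # Three coordinate rows as lists of floats.
        "vectors": [[float(x) for x in result[key]] for key in rows],
        # Map each element symbol to its integer count.
        "composition": dict(
            zip(result["elements"], (int(n) for n in result["elements_num"]))
        ),
    }


typed = to_typed(sample_result)
print(typed["composition"])                # {'O': 50, 'Fe': 33, 'Ni': 1}
print(sum(typed["composition"].values()))  # 84
```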
