Python parser for USFM files, based on tree-sitter-usfm
USFM-Grammar
A Python library that facilitates:
- Parsing and validation of USFM files using tree-sitter-usfm
- Conversion of USFM files to other formats (USX, dict, list, etc.)
- Extraction of specific content from USFM files, such as scripture alone (clean verses) or notes (footnotes, cross-references)
Built on Python 3.10
Installation
pip install usfm-grammar
Usage
By importing the library in Python code
from usfm_grammar import USFMParser, Filter
# input_usfm_str = open("sample.usfm","r", encoding='utf8').read()
input_usfm_str = '''
\\id GEN
\\c 1
\\p
\\v 1 test verse
'''
my_parser = USFMParser(input_usfm_str)
errors = my_parser.errors
print(errors)
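The `errors` attribute can be checked before any conversion is attempted. The following is a minimal sketch, assuming `errors` is empty (or otherwise falsy) when the input parsed cleanly. `require_clean_parse` is a hypothetical helper written for illustration and is not part of usfm-grammar.

```python
def require_clean_parse(parser):
    """Return the parser unchanged, or raise if it reported parse errors.

    Assumption: `parser.errors` is falsy (for example an empty list)
    when the USFM input is valid.
    """
    errors = parser.errors
    if errors:
        raise ValueError(f"USFM input has parse errors: {errors!r}")
    return parser

# Possible usage with the parser created above:
# my_parser = require_clean_parse(my_parser)
```

Failing early like this keeps a partially parsed file from being silently converted into incomplete output.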
To convert to USX
from lxml import etree
usx_elem = my_parser.to_usx() # default filter=ALL
print(etree.tostring(usx_elem, encoding="unicode", pretty_print=True))
To convert to Dict
output = my_parser.to_dict() # default filter=SCRIPTURE_BCV
#output = my_parser.to_dict(Filter.ALL)
#output = my_parser.to_dict(Filter.NOTES)
#output = my_parser.to_dict(Filter.NOTES_TEXT)
#output = my_parser.to_dict(Filter.SCRIPTURE_PARAGRAPH)
print(output)
To save as JSON
import json
dict_output = my_parser.to_dict()
with open("file_path.json", "w", encoding='utf-8') as fp:
    json.dump(dict_output, fp)
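Scripture text is often written in non-Latin scripts, which `json.dump` escapes as `\uXXXX` sequences by default. As a sketch of one way to keep the saved file human-readable, the standard `ensure_ascii=False` and `indent` options can be used. The helper names and the `sample` dict are hypothetical; the real structure returned by `to_dict()` may differ.

```python
import json

def to_readable_json(data):
    """Serialise data to pretty-printed JSON without escaping non-ASCII text."""
    return json.dumps(data, ensure_ascii=False, indent=2)

def save_json(data, path):
    """Write data to `path` as UTF-8 encoded, human-readable JSON."""
    with open(path, "w", encoding="utf-8") as fp:
        fp.write(to_readable_json(data))

# Hypothetical stand-in for my_parser.to_dict(); not the library's real output.
sample = {"book": "GEN", "verse": "परीक्षा"}
text = to_readable_json(sample)
```

With the parser from the earlier example, this could be called as `save_json(my_parser.to_dict(), "file_path.json")`.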
To convert to a list or table-like format
list_output = my_parser.to_list()
#list_output = my_parser.to_list(Filter.NOTES)
table_output = "\n".join(["\t".join(row) for row in list_output])
print(table_output)
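Joining cells by hand with `"\t".join(row)` raises a `TypeError` if any cell is not a string, and it does not quote tabs or newlines that occur inside a cell. Whether either case arises depends on the library's output, so the following is only a defensive sketch using the standard `csv` module. `rows_to_tsv` and `sample_rows` are hypothetical; the columns shown are not the library's actual layout.

```python
import csv
import io

def rows_to_tsv(rows):
    """Serialise an iterable of rows to tab-separated text.

    csv.writer converts non-string cells with str() and quotes cells
    that contain the delimiter or a newline.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()

# Hypothetical stand-in for my_parser.to_list().
sample_rows = [
    ["Book", "Chapter", "Verse", "Text"],
    ["GEN", 1, 1, "test verse"],
]
tsv_text = rows_to_tsv(sample_rows)
```

With the parser from the earlier example, `rows_to_tsv(my_parser.to_list())` could replace the manual join.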
From CLI
usage: usfm-grammar [-h] [--format {json,table,usx,markdown,syntax-tree}]
                    [--filter {scripture-bcv,notes,scripture-paragraph,all}]
                    [--csv_col_sep CSV_COL_SEP] [--csv_row_sep CSV_ROW_SEP]
                    infile

Uses the tree-sitter-usfm grammar to parse and convert USFM to Syntax-tree,
JSON, CSV, USX etc.

positional arguments:
  infile                input usfm file

options:
  -h, --help            show this help message and exit
  --format {json,table,usx,markdown,syntax-tree}
                        output format
  --filter {scripture-bcv,notes,scripture-paragraph,all}
                        the type of contents to be included
  --csv_col_sep CSV_COL_SEP
                        column separator or delimiter. Only useful with
                        format=table.
  --csv_row_sep CSV_ROW_SEP
                        row separator or delimiter. Only useful with
                        format=table.
Example
$ python3 -m usfm_grammar sample.usfm --format usx
$ usfm-grammar sample.usfm --format usx
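When the command-line tool is driven from a Python script, the argument list can be assembled and checked against the choices listed in the help text before anything is run. This is a sketch only: `build_cli_args` is a hypothetical helper, and the `FORMATS` and `FILTERS` tuples are copied from the usage message above rather than read from the library.

```python
import sys

# Choices copied from the CLI help text above.
FORMATS = ("json", "table", "usx", "markdown", "syntax-tree")
FILTERS = ("scripture-bcv", "notes", "scripture-paragraph", "all")

def build_cli_args(infile, fmt=None, content_filter=None):
    """Build an argv list for `python -m usfm_grammar`, validating choices."""
    if fmt is not None and fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    if content_filter is not None and content_filter not in FILTERS:
        raise ValueError(f"unknown filter: {content_filter!r}")
    args = [sys.executable, "-m", "usfm_grammar", infile]
    if fmt is not None:
        args += ["--format", fmt]
    if content_filter is not None:
        args += ["--filter", content_filter]
    return args

# To actually run the command (requires usfm-grammar to be installed):
# import subprocess
# result = subprocess.run(build_cli_args("sample.usfm", fmt="usx"),
#                         capture_output=True, text=True, check=True)
# print(result.stdout)
```

Using `sys.executable` runs the module with the same interpreter as the calling script, which avoids picking up a different Python installation from the `PATH`.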