Python parser for USFM files, based on tree-sitter-usfm3
USFM-Grammar
A Python library that facilitates:
- Parsing and validation of USFM files using tree-sitter-usfm3
- Conversion of USFM files to other formats (USX, dict, list, etc.)
- Extraction of specific contents from USFM files, such as scripture alone (clean verses) or notes (footnotes, cross-references)
Built on Python 3.10
Installation
pip install usfm-grammar
This requires a C compiler. On Windows, Microsoft Visual C++ 14.0 or above is required.
It is recommended that you update pip, setuptools and wheel.
Usage
By importing the library in Python code
from usfm_grammar import USFMParser, Filter
# input_usfm_str = open("sample.usfm","r", encoding='utf8').read()
input_usfm_str = '''
\\id GEN
\\c 1
\\p
\\v 1 test verse
'''
my_parser = USFMParser(input_usfm_str)
errors = my_parser.errors
print(errors)
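The snippet above only prints `errors`. As a minimal sketch, the check can be turned into a guard that stops before any conversion is attempted. `ensure_parsed_cleanly` is a hypothetical helper, not part of usfm-grammar; it assumes only that `errors` is empty (or `None`) when parsing succeeded, and it does not assume any particular shape for the individual error entries. Stand-in objects are used so the sketch runs without the library installed.

```python
from types import SimpleNamespace


def ensure_parsed_cleanly(parser):
    """Return the parser unchanged, or raise if it reported parse errors.

    Hypothetical helper: it relies only on the `errors` attribute shown
    above and treats an empty or missing value as "no errors".
    """
    errors = getattr(parser, "errors", None)
    if errors:
        # Entries are rendered with repr() because their structure is not assumed.
        details = "\n".join(repr(err) for err in errors)
        raise ValueError(f"USFM input has {len(errors)} parse error(s):\n{details}")
    return parser


# Stand-ins for parser objects; in real use, pass a USFMParser instance.
clean = SimpleNamespace(errors=[])
broken = SimpleNamespace(errors=["made-up error entry"])

ensure_parsed_cleanly(clean)  # passes silently
try:
    ensure_parsed_cleanly(broken)
except ValueError as exc:
    print(exc)
```

With a real parser, the call would be `ensure_parsed_cleanly(my_parser)` placed before `to_usx()`, `to_usj()` or `to_list()`.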
To convert to USX
from lxml import etree
usx_elem = my_parser.to_usx() # default filter=ALL
print(etree.tostring(usx_elem, encoding="unicode", pretty_print=True))
To convert to Dict
output = my_parser.to_usj() # default all markers
#output = my_parser.to_usj([Filter.SCRIPTURE_TEXT])
#output = my_parser.to_usj([Filter.NOTES])
#output = my_parser.to_usj([Filter.NOTES, Filter.ATTRIBUTES])
#output = my_parser.to_usj([Filter.SCRIPTURE_TEXT, Filter.TITLES, Filter.PARAGRAPHS])
print(output)
To save as json
import json
dict_output = my_parser.to_usj()
with open("file_path.json", "w", encoding='utf-8') as fp:
    json.dump(dict_output, fp)
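Scripture text is often in non-Latin scripts, and `json.dump` escapes non-ASCII characters by default. The sketch below shows one way to keep the saved file human-readable using the standard `ensure_ascii=False` and `indent` options. `save_readable_json` is a hypothetical helper, and `sample_output` is a made-up stand-in for the dict returned by `to_usj()`; its keys do not reflect the real output structure.

```python
import json
import tempfile
from pathlib import Path


def save_readable_json(data, path):
    """Write `data` as indented JSON, keeping non-ASCII text unescaped."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(data, fp, ensure_ascii=False, indent=2)


# Made-up stand-in for the dict returned by to_usj().
sample_output = {"sample": ["आदि में"]}

# A temporary directory keeps the sketch from touching the working directory.
out_path = Path(tempfile.mkdtemp()) / "file_path.json"
save_readable_json(sample_output, out_path)
print(out_path.read_text(encoding="utf-8"))
```

In real use, `save_readable_json(my_parser.to_usj(), "file_path.json")` would replace the stand-in data.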
To convert to List or table like format
list_output = my_parser.to_list()
#list_output = my_parser.to_list([Filter.SCRIPTURE_TEXT])
table_output = "\n".join(["\t".join(row) for row in list_output])
print(table_output)
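Joining with `"\t".join(row)` works only when every cell is a string and no cell contains a tab or newline. As a hedged alternative, the standard `csv` module converts non-string cells and quotes awkward ones. `rows_to_delimited` is a hypothetical helper, and `sample_rows` is a made-up stand-in for the result of `to_list()`; the column names are illustrative, not the library's actual header.

```python
import csv
import io


def rows_to_delimited(rows, col_sep="\t"):
    """Serialize a list of rows to delimited text using the csv module."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=col_sep, lineterminator="\n")
    # csv.writer stringifies numbers and writes None as an empty field.
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows standing in for my_parser.to_list().
sample_rows = [
    ["Book", "Chapter", "Verse", "Text"],
    ["GEN", 1, 1, "test verse"],
    ["GEN", 1, 2, None],
]
print(rows_to_delimited(sample_rows), end="")
```

Passing `col_sep=","` yields ordinary CSV from the same rows.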
From CLI
usage: usfm-grammar [-h] [--format {json,table,syntax-tree,usx,markdown}]
                    [--filter {book_headers,paragraphs,titles,scripture_text,notes,attributes,milestones,study_bible}]
                    [--csv_col_sep CSV_COL_SEP] [--csv_row_sep CSV_ROW_SEP]
                    infile
Uses the tree-sitter-usfm grammar to parse and convert USFM to Syntax-tree, JSON, CSV, USX etc.
positional arguments:
  infile                input usfm file
options:
  -h, --help            show this help message and exit
  --format {json,table,syntax-tree,usx,markdown}
                        output format
  --filter {book_headers,paragraphs,titles,scripture_text,notes,attributes,milestones,study_bible}
                        the type of contents to be included
  --csv_col_sep CSV_COL_SEP
                        column separator or delimiter. Only useful with format=table.
  --csv_row_sep CSV_ROW_SEP
                        row separator or delimiter. Only useful with format=table.
Example
$ python3 -m usfm_grammar sample.usfm --format usx
$ usfm-grammar sample.usfm --format usx
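When the CLI is driven from a script, it can help to assemble the argument list in one place. The sketch below builds a command from the options listed in the help text above. `build_command` is a hypothetical helper, not something the package provides, and the choice sets are copied from that help text. Whether `--filter` may be repeated is not stated there, so the sketch accepts a single value only.

```python
import shlex

# Choices copied from the help text above.
FORMATS = {"json", "table", "syntax-tree", "usx", "markdown"}
FILTERS = {
    "book_headers", "paragraphs", "titles", "scripture_text",
    "notes", "attributes", "milestones", "study_bible",
}


def build_command(infile, output_format=None, content_filter=None):
    """Build an argument list for the usfm-grammar CLI (hypothetical helper)."""
    if output_format is not None and output_format not in FORMATS:
        raise ValueError(f"unknown format: {output_format!r}")
    if content_filter is not None and content_filter not in FILTERS:
        raise ValueError(f"unknown filter: {content_filter!r}")
    cmd = ["usfm-grammar", infile]
    if output_format is not None:
        cmd += ["--format", output_format]
    if content_filter is not None:
        cmd += ["--filter", content_filter]
    return cmd


cmd = build_command("sample.usfm", output_format="usx")
print(shlex.join(cmd))
# To execute it (requires the package to be installed):
# subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Passing the list form to `subprocess.run` avoids shell quoting issues with file names that contain spaces.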