Python parser for USFM files, based on tree-sitter-usfm3

USFM-Grammar

A Python library that facilitates:

  • Parsing and validation of USFM files using tree-sitter-usfm3
  • Conversion of USFM files to other formats (USX, dict, list, etc.)
  • Extraction of specific contents from USFM files, such as scripture alone (clean verses) or notes (footnotes, cross-references)

Built on Python 3.10.

Installation

pip install usfm-grammar

This requires a C compiler. On Windows, Microsoft Visual C++ 14.0 or above is required. It is recommended that you update pip, setuptools and wheel.
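
To confirm that the installation succeeded, the installed version can be looked up from Python. The following is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration and is not part of usfm-grammar.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "usfm-grammar") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "usfm-grammar is not installed")
```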

Usage

By importing the library in Python code

from usfm_grammar import USFMParser, Filter

# input_usfm_str = open("sample.usfm","r", encoding='utf8').read()
input_usfm_str = '''
\\id GEN
\\c 1
\\p
\\v 1 test verse
'''

my_parser = USFMParser(input_usfm_str)

errors = my_parser.errors
print(errors)
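
Before converting, it may be useful to stop early when the parser reports problems. The sketch below assumes only that `errors` is an empty (falsy) collection when parsing succeeds; the format of each entry is not assumed, and the helper `require_clean_parse` is hypothetical, not part of the library. Stand-in objects are used so the example runs on its own.

```python
from types import SimpleNamespace


def require_clean_parse(parser):
    """Raise ValueError if the parser reported errors; otherwise return it."""
    errors = getattr(parser, "errors", None)
    if errors:
        # The item format is not assumed here; each entry is shown via repr().
        details = "\n".join(repr(err) for err in errors)
        raise ValueError(f"USFM input has {len(errors)} error(s):\n{details}")
    return parser


# Stand-in objects so the sketch runs without usfm-grammar installed.
clean = SimpleNamespace(errors=[])
broken = SimpleNamespace(errors=[("line 4", "\\v 1")])

require_clean_parse(clean)  # passes silently
try:
    require_clean_parse(broken)
except ValueError as exc:
    print(exc)
```

With the real library, the same helper would be called as `require_clean_parse(my_parser)` before any conversion.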

To convert to USX

from lxml import etree

usx_elem = my_parser.to_usx() # default filter=ALL
print(etree.tostring(usx_elem, encoding="unicode", pretty_print=True))

To convert to Dict

output = my_parser.to_usj() # default all markers
#output = my_parser.to_usj([Filter.SCRIPTURE_TEXT])
#output = my_parser.to_usj([Filter.NOTES])
#output = my_parser.to_usj([Filter.NOTES, Filter.ATTRIBUTES])
#output = my_parser.to_usj([Filter.SCRIPTURE_TEXT, Filter.TITLES, Filter.PARAGRAPHS])

print(output)
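
Once the dict is available, plain text can be pulled out by walking it recursively. This is a sketch under a stated assumption: that the output nests child nodes in a `content` list whose items are either strings or further dicts. The sample below is hand-written in that assumed shape and is not actual library output; `collect_text` is a hypothetical helper.

```python
def collect_text(node):
    """Recursively gather every text string found under a USJ-like node."""
    if isinstance(node, str):
        return [node]
    texts = []
    if isinstance(node, dict):
        for child in node.get("content", []):
            texts.extend(collect_text(child))
    return texts


# Hand-written sample in the assumed shape; not actual library output.
sample = {
    "type": "USJ",
    "content": [
        {"type": "book", "marker": "id", "code": "GEN", "content": []},
        {"type": "chapter", "marker": "c", "number": "1"},
        {
            "type": "para",
            "marker": "p",
            "content": [
                {"type": "verse", "marker": "v", "number": "1"},
                "test verse",
            ],
        },
    ],
}

print(collect_text(sample))  # prints ['test verse']
```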

To save as json

import json
dict_output = my_parser.to_usj()
with open("file_path.json", "w", encoding='utf-8') as fp:
    json.dump(dict_output, fp)
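
Scripture text is often in non-Latin scripts, and `json.dump` escapes non-ASCII characters by default. The sketch below shows one way to keep the saved file human-readable using the standard `ensure_ascii=False` and `indent` options. The `dict_output` value here is a hand-written stand-in, and `save_json` is a hypothetical helper.

```python
import json
import os
import tempfile


def save_json(data, path):
    """Write data as UTF-8 JSON, keeping non-ASCII characters unescaped."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(data, fp, ensure_ascii=False, indent=2)


# Hand-written stand-in for the dict returned by the parser.
dict_output = {"type": "USJ", "content": ["Ἐν ἀρχῇ"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "file_path.json")
    save_json(dict_output, out_path)
    with open(out_path, encoding="utf-8") as fp:
        print(fp.read())
```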

To convert to a list or table-like format

list_output = my_parser.to_list()
#list_output = my_parser.to_list([Filter.SCRIPTURE_TEXT])

table_output = "\n".join(["\t".join(row) for row in list_output])
print(table_output)
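
Joining cells with `"\t".join(row)` works only when every cell is a string and no cell contains a tab or newline. As an alternative sketch, the standard `csv` module converts non-string cells and quotes awkward values. The rows below are hand-written, their column names are assumptions rather than the library's actual layout, and `rows_to_tsv` is a hypothetical helper.

```python
import csv
import io


def rows_to_tsv(rows):
    """Serialize an iterable of rows to tab-separated text with proper quoting."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hand-written rows; the column layout is an assumption for illustration.
sample_rows = [
    ["Book", "Chapter", "Verse", "Text"],
    ["GEN", 1, 1, "test verse"],
    ["GEN", 1, 2, "text with a\ttab"],
]

print(rows_to_tsv(sample_rows))
```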

From CLI

usage: usfm-grammar [-h] [--format {json,table,syntax-tree,usx,markdown}]
                    [--filter {book_headers,paragraphs,titles,scripture_text,notes,attributes,milestones,study_bible}]
                    [--csv_col_sep CSV_COL_SEP] [--csv_row_sep CSV_ROW_SEP]
                    infile

Uses the tree-sitter-usfm grammar to parse and convert USFM to Syntax-tree, JSON, CSV, USX etc.

positional arguments:
  infile                input usfm file

options:
  -h, --help            show this help message and exit
  --format {json,table,syntax-tree,usx,markdown}
                        output format
  --filter {book_headers,paragraphs,titles,scripture_text,notes,attributes,milestones,study_bible}
                        the type of contents to be included
  --csv_col_sep CSV_COL_SEP
                        column separator or delimiter. Only useful with format=table.
  --csv_row_sep CSV_ROW_SEP
                        row separator or delimiter. Only useful with format=table.

Example

$ python3 -m usfm_grammar sample.usfm --format usx

$ usfm-grammar sample.usfm --format usx
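
The CLI can also be driven from a Python script. The sketch below builds the argument list from the options shown in the help text above and runs it with `subprocess`. The helper names `build_command` and `run_cli` are hypothetical, and the choices are copied from the help output rather than read from the library.

```python
import subprocess
import sys

# Choices copied from the CLI help text above.
FORMATS = ("json", "table", "syntax-tree", "usx", "markdown")
FILTERS = (
    "book_headers", "paragraphs", "titles", "scripture_text",
    "notes", "attributes", "milestones", "study_bible",
)


def build_command(infile, fmt=None, content_filter=None):
    """Build the argument list for `python -m usfm_grammar`."""
    cmd = [sys.executable, "-m", "usfm_grammar", infile]
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError(f"unknown format: {fmt!r}")
        cmd += ["--format", fmt]
    if content_filter is not None:
        if content_filter not in FILTERS:
            raise ValueError(f"unknown filter: {content_filter!r}")
        cmd += ["--filter", content_filter]
    return cmd


def run_cli(infile, fmt=None, content_filter=None):
    """Run the CLI and return its standard output (requires usfm-grammar)."""
    result = subprocess.run(
        build_command(infile, fmt, content_filter),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


print(build_command("sample.usfm", fmt="usx")[1:])
```

Calling `run_cli("sample.usfm", fmt="usx")` would then correspond to the first command in the example above.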

Source Distributions

No source distribution files are available for this release.

Built Distributions

usfm_grammar-3.0.0a6-cp311-cp311-win_amd64.whl (166.0 kB): CPython 3.11, Windows x86-64
usfm_grammar-3.0.0a6-cp311-cp311-win32.whl (168.7 kB): CPython 3.11, Windows x86
usfm_grammar-3.0.0a6-cp311-cp311-musllinux_1_1_x86_64.whl (166.2 kB): CPython 3.11, musllinux: musl 1.1+ x86-64
usfm_grammar-3.0.0a6-cp311-cp311-musllinux_1_1_i686.whl (174.6 kB): CPython 3.11, musllinux: musl 1.1+ i686
usfm_grammar-3.0.0a6-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (165.9 kB): CPython 3.11, manylinux: glibc 2.17+ x86-64, glibc 2.5+ x86-64
usfm_grammar-3.0.0a6-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (174.4 kB): CPython 3.11, manylinux: glibc 2.17+ i686, glibc 2.5+ i686
usfm_grammar-3.0.0a6-cp311-cp311-macosx_10_9_x86_64.whl (159.2 kB): CPython 3.11, macOS 10.9+ x86-64
usfm_grammar-3.0.0a6-cp310-cp310-win_amd64.whl (166.0 kB): CPython 3.10, Windows x86-64
usfm_grammar-3.0.0a6-cp310-cp310-win32.whl (168.7 kB): CPython 3.10, Windows x86
usfm_grammar-3.0.0a6-cp310-cp310-musllinux_1_1_x86_64.whl (166.2 kB): CPython 3.10, musllinux: musl 1.1+ x86-64
usfm_grammar-3.0.0a6-cp310-cp310-musllinux_1_1_i686.whl (174.6 kB): CPython 3.10, musllinux: musl 1.1+ i686
usfm_grammar-3.0.0a6-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (165.9 kB): CPython 3.10, manylinux: glibc 2.17+ x86-64, glibc 2.5+ x86-64
usfm_grammar-3.0.0a6-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (174.4 kB): CPython 3.10, manylinux: glibc 2.17+ i686, glibc 2.5+ i686
usfm_grammar-3.0.0a6-cp310-cp310-macosx_10_9_x86_64.whl (159.2 kB): CPython 3.10, macOS 10.9+ x86-64
