Python parser for USFM files, based on tree-sitter-usfm3

Project description

USFM-Grammar

A Python library that facilitates:

  • Parsing and validation of USFM files using tree-sitter-usfm3
  • Conversion of USFM files to other formats (USX, dict, list, etc.)
  • Extraction of specific content from USFM files, such as scripture alone (clean verses) or notes (footnotes, cross-references)

Built on Python 3.10

Installation

pip install usfm-grammar

This requires a C compiler. On Windows, Microsoft Visual C++ 14.0 or above is required. It is recommended that you update pip, setuptools and wheel.

Usage

By importing the library in Python code

from usfm_grammar import USFMParser, Filter

# input_usfm_str = open("sample.usfm","r", encoding='utf8').read()
input_usfm_str = '''
\\id GEN
\\c 1
\\p
\\v 1 test verse
'''

my_parser = USFMParser(input_usfm_str)

errors = my_parser.errors
print(errors)
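
Conversion is usually only meaningful when parsing succeeded, so a caller may want to check `errors` before converting. The following is a minimal sketch, assuming `errors` is empty (falsy) when the input is valid. `usj_or_raise` is a hypothetical helper, not part of usfm-grammar; it accepts any object that exposes `errors` and `to_usj()`.

```python
def usj_or_raise(parser):
    """Return parser.to_usj() if parsing reported no errors, else raise.

    Assumption: `parser.errors` is empty (falsy) for valid input.
    """
    if parser.errors:
        raise ValueError(f"USFM input has errors: {parser.errors}")
    return parser.to_usj()
```

With the parser created above, this would be called as `output = usj_or_raise(my_parser)`.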

To convert to USX

from lxml import etree

usx_elem = my_parser.to_usx() # default filter=ALL
print(etree.tostring(usx_elem, encoding="unicode", pretty_print=True))

To convert to Dict

output = my_parser.to_usj() # default all markers
#output = my_parser.to_usj([Filter.SCRIPTURE_TEXT])
#output = my_parser.to_usj([Filter.NOTES])
#output = my_parser.to_usj([Filter.NOTES, Filter.ATTRIBUTES])
#output = my_parser.to_usj([Filter.SCRIPTURE_TEXT, Filter.TITLES, Filter.PARAGRAPHS])

print(output)

To save as json

import json
dict_output = my_parser.to_usj()
with open("file_path.json", "w", encoding='utf-8') as fp:
    json.dump(dict_output, fp)
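
Scripture text is often written in non-Latin scripts, and the `json` module escapes non-ASCII characters by default. The sketch below shows the difference that `ensure_ascii=False` and `indent` make. The `sample` dict is a made-up stand-in for the result of `to_usj()`, not the library's actual output shape.

```python
import json

# Hypothetical stand-in for my_parser.to_usj(); the real structure may differ.
sample = {"book": "GEN", "text": "आदि में"}

# Default behaviour: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(sample)

# Keep the original script readable and pretty-print with two-space indents.
readable = json.dumps(sample, ensure_ascii=False, indent=2)

print(readable)
```

The same keyword arguments are accepted by `json.dump` when writing to a file opened with `encoding='utf-8'`.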

To convert to List or table like format

list_output = my_parser.to_list() 
#list_output = my_parser.to_list([Filter.SCRIPTURE_TEXT])

table_output = "\n".join(["\t".join(row) for row in list_output])
print(table_output)
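
Joining with `"\t".join(row)` raises a `TypeError` if a row contains non-string values, and it does not escape tabs that occur inside a cell. A possible alternative, sketched here with the standard `csv` module, handles both cases. `rows_to_tsv` is a hypothetical helper, and the sample rows are invented for illustration; the actual columns returned by `to_list()` may differ.

```python
import csv
import io


def rows_to_tsv(rows):
    """Serialize rows (iterables of values) as tab-separated text."""
    buffer = io.StringIO()
    # csv.writer converts non-string values with str() and quotes any
    # cell that contains the delimiter.
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows; not the documented layout of to_list().
rows = [["Book", "Chapter", "Verse", "Text"], ["GEN", 1, 1, "test verse"]]
print(rows_to_tsv(rows), end="")
```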

From CLI

usage: usfm-grammar [-h] [--format {json,table,syntax-tree,usx,markdown}]
                    [--filter {book_headers,paragraphs,titles,scripture_text,notes,attributes,milestones,study_bible}]
                    [--csv_col_sep CSV_COL_SEP] [--csv_row_sep CSV_ROW_SEP]
                    infile

Uses the tree-sitter-usfm grammar to parse and convert USFM to Syntax-tree, JSON, CSV, USX etc.

positional arguments:
  infile                input usfm file

options:
  -h, --help            show this help message and exit
  --format {json,table,syntax-tree,usx,markdown}
                        output format
  --filter {book_headers,paragraphs,titles,scripture_text,notes,attributes,milestones,study_bible}
                        the type of contents to be included
  --csv_col_sep CSV_COL_SEP
                        column separator or delimiter. Only useful with format=table.
  --csv_row_sep CSV_ROW_SEP
                        row separator or delimiter. Only useful with format=table.

Example

$ python3 -m usfm_grammar sample.usfm --format usx

$ usfm-grammar sample.usfm --format usx
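
The CLI can also be driven from a Python script. The sketch below only builds the argument list, using the `--format` choices shown in the help text above. `build_command` is a hypothetical helper, and the commented-out `subprocess.run` call assumes that usfm-grammar is installed and that the converted output is written to standard output.

```python
import subprocess

# Choices copied from the --format option in the help text.
FORMATS = {"json", "table", "syntax-tree", "usx", "markdown"}


def build_command(infile, output_format=None, content_filter=None):
    """Build the argument list for the usfm-grammar CLI."""
    if output_format is not None and output_format not in FORMATS:
        raise ValueError(f"unsupported format: {output_format}")
    command = ["usfm-grammar", infile]
    if output_format is not None:
        command += ["--format", output_format]
    if content_filter is not None:
        command += ["--filter", content_filter]
    return command


print(build_command("sample.usfm", output_format="usx"))

# To actually run it (assumes usfm-grammar is installed and prints to stdout):
# result = subprocess.run(build_command("sample.usfm", "usx"),
#                         capture_output=True, text=True, check=True)
# print(result.stdout)
```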

Source Distributions

No source distribution files are available for this release.

Built Distributions

CPython 3.11

  • usfm_grammar-3.0.0a7-cp311-cp311-win_amd64.whl (166.0 kB): Windows x86-64
  • usfm_grammar-3.0.0a7-cp311-cp311-win32.whl (168.7 kB): Windows x86
  • usfm_grammar-3.0.0a7-cp311-cp311-musllinux_1_1_x86_64.whl (166.2 kB): musllinux, musl 1.1+ x86-64
  • usfm_grammar-3.0.0a7-cp311-cp311-musllinux_1_1_i686.whl (174.6 kB): musllinux, musl 1.1+ i686
  • usfm_grammar-3.0.0a7-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (165.9 kB): manylinux, glibc 2.17+ x86-64 and glibc 2.5+ x86-64
  • usfm_grammar-3.0.0a7-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (174.4 kB): manylinux, glibc 2.17+ i686 and glibc 2.5+ i686
  • usfm_grammar-3.0.0a7-cp311-cp311-macosx_10_9_x86_64.whl (159.2 kB): macOS 10.9+ x86-64

CPython 3.10

  • usfm_grammar-3.0.0a7-cp310-cp310-win_amd64.whl (166.0 kB): Windows x86-64
  • usfm_grammar-3.0.0a7-cp310-cp310-win32.whl (168.7 kB): Windows x86
  • usfm_grammar-3.0.0a7-cp310-cp310-musllinux_1_1_x86_64.whl (166.2 kB): musllinux, musl 1.1+ x86-64
  • usfm_grammar-3.0.0a7-cp310-cp310-musllinux_1_1_i686.whl (174.6 kB): musllinux, musl 1.1+ i686
  • usfm_grammar-3.0.0a7-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (165.9 kB): manylinux, glibc 2.17+ x86-64 and glibc 2.5+ x86-64
  • usfm_grammar-3.0.0a7-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (174.4 kB): manylinux, glibc 2.17+ i686 and glibc 2.5+ i686
  • usfm_grammar-3.0.0a7-cp310-cp310-macosx_10_9_x86_64.whl (159.2 kB): macOS 10.9+ x86-64
