Python parser for USFM files, based on tree-sitter-usfm3

USFM-Grammar

A Python library that facilitates:

  • Parsing and validation of USFM files using tree-sitter-usfm3
  • Conversion of USFM files to other formats (USX, dict, list, etc.)
  • Extraction of specific content from USFM files, such as scripture alone (clean verses) or notes (footnotes, cross-references)

Built on Python 3.10

Installation

pip install usfm-grammar

This requires a C compiler. On Windows, Microsoft Visual C++ 14.0 or above is required. It is recommended that you update pip, setuptools and wheel.

Usage

By importing the library in Python code

from usfm_grammar import USFMParser, Filter

# input_usfm_str = open("sample.usfm","r", encoding='utf8').read()
input_usfm_str = '''
\\id GEN
\\c 1
\\p
\\v 1 test verse
'''

my_parser = USFMParser(input_usfm_str)

errors = my_parser.errors
print(errors)
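
One way to act on `errors` is to refuse conversion when the parse was not clean. The sketch below assumes that `errors` is empty (falsy) for a valid file, which this README does not state explicitly. `convert_if_valid` is a hypothetical helper and not part of usfm-grammar; it relies only on the `errors` attribute and the `to_dict()` method used in this README.

```python
def convert_if_valid(parser):
    """Return parser.to_dict() only when the parser reports no errors.

    Assumption: `parser.errors` is empty (falsy) for a clean parse.
    `convert_if_valid` is a hypothetical helper, not a usfm-grammar API.
    """
    if parser.errors:
        raise ValueError(f"USFM input has errors: {parser.errors}")
    return parser.to_dict()
```

With the parser created above, this would be called as `output = convert_if_valid(my_parser)`.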

To convert to USX

from lxml import etree

usx_elem = my_parser.to_usx() # default filter=ALL
print(etree.tostring(usx_elem, encoding="unicode", pretty_print=True))

To convert to Dict

output = my_parser.to_dict() # default all markers
#output = my_parser.to_dict([Filter.SCRIPTURE_TEXT])
#output = my_parser.to_dict([Filter.NOTES])
#output = my_parser.to_dict([Filter.NOTES, Filter.ATTRIBUTES])
#output = my_parser.to_dict([Filter.SCRIPTURE_TEXT, Filter.TITLES, Filter.PARAGRAPHS])

print(output)

To save as json

import json
dict_output = my_parser.to_dict()
with open("file_path.json", "w", encoding='utf-8') as fp:
    json.dump(dict_output, fp)
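
Scripture text is often written in non-Latin scripts, and `json.dump` escapes every non-ASCII character by default. A small sketch using the standard library's `ensure_ascii=False` and `indent` options keeps the saved file readable. `sample_output` is a made-up stand-in for the result of `to_dict()`, whose real structure is not shown here, and `save_json` is a hypothetical helper.

```python
import json

# Hypothetical stand-in for my_parser.to_dict(); the real structure may differ.
sample_output = {"book": "GEN", "text": "आदि में"}

def save_json(data, path):
    """Write data as indented UTF-8 JSON without escaping non-ASCII text."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(data, fp, ensure_ascii=False, indent=2)
```

Calling `save_json(my_parser.to_dict(), "file_path.json")` would then replace the plain `json.dump` call above.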

To convert to a list or table-like format

list_output = my_parser.to_list() 
#list_output = my_parser.to_list([Filter.SCRIPTURE_TEXT])

table_output = "\n".join(["\t".join(row) for row in list_output])
print(table_output)
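
Joining cells with `"\t"` produces ambiguous rows when a cell itself contains a tab or a newline. As a hedged alternative, the standard `csv` module can quote such cells. The rows below are hypothetical stand-ins for the result of `to_list()`; the real column layout may differ, and `rows_to_tsv` is an illustrative helper rather than a library function.

```python
import csv
import io

# Hypothetical rows standing in for my_parser.to_list(); real columns may differ.
sample_rows = [
    ["Book", "Chapter", "Verse", "Text"],
    ["GEN", "1", "1", "test verse\twith a tab"],
]

def rows_to_tsv(rows):
    """Serialize rows as tab-separated text, quoting cells that need it."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()
```

Reading the text back with `csv.reader(..., delimiter="\t")` recovers the original cells, including the embedded tab.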

From CLI

usage: usfm-grammar [-h] [--format {json,table,syntax-tree,usx,markdown}]
                    [--filter {book_headers,paragraphs,titles,scripture_text,notes,attributes,milestones,study_bible}]
                    [--csv_col_sep CSV_COL_SEP] [--csv_row_sep CSV_ROW_SEP]
                    infile

Uses the tree-sitter-usfm grammar to parse and convert USFM to Syntax-tree, JSON, CSV, USX etc.

positional arguments:
  infile                input usfm file

options:
  -h, --help            show this help message and exit
  --format {json,table,syntax-tree,usx,markdown}
                        output format
  --filter {book_headers,paragraphs,titles,scripture_text,notes,attributes,milestones,study_bible}
                        the type of contents to be included
  --csv_col_sep CSV_COL_SEP
                        column separator or delimiter. Only useful with format=table.
  --csv_row_sep CSV_ROW_SEP
                        row separator or delimiter. Only useful with format=table.

Example

$ python3 -m usfm_grammar sample.usfm --format usx

$ usfm-grammar sample.usfm --format usx
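
The CLI can also be driven from a script. The following is a minimal sketch, assuming the `usfm-grammar` executable is on `PATH` and that it writes the converted output to standard output (the help text does not say so). `build_command` and `run_cli` are hypothetical helper names; the option names and allowed values are copied from the help text above.

```python
import subprocess

# Choices copied from the CLI help text.
FORMATS = {"json", "table", "syntax-tree", "usx", "markdown"}
FILTERS = {"book_headers", "paragraphs", "titles", "scripture_text",
           "notes", "attributes", "milestones", "study_bible"}

def build_command(infile, fmt=None, content_filter=None):
    """Build the argument list for the usfm-grammar CLI (hypothetical helper)."""
    if fmt is not None and fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    if content_filter is not None and content_filter not in FILTERS:
        raise ValueError(f"unknown filter: {content_filter}")
    command = ["usfm-grammar", infile]
    if fmt is not None:
        command += ["--format", fmt]
    if content_filter is not None:
        command += ["--filter", content_filter]
    return command

def run_cli(infile, fmt=None, content_filter=None):
    """Run the CLI and return whatever it prints to standard output.

    Assumption: the converted result is written to stdout.
    """
    result = subprocess.run(build_command(infile, fmt, content_filter),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Validating the choices before spawning the process gives an early `ValueError` in the script instead of an argparse usage error from the child process.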
