The Greek Room will be a suite of tools supporting Biblical natural language processing.
Project description
greekroom
greekroom is a suite of tools to support Biblical natural language processing (in progress)
When using the GitHub version, we recommend that your PYTHONPATH includes the outer greekroom directory, i.e. the one that includes this README.md; additionally you might want to include in PATH the Greek Room's executable directories such as greekroom/greekroom/gr_utilities:greekroom/greekroom/owl .
gr_utilities
gr_utilities is a set of Greek Room utilities.
gr-wb-file-props A CLI Python script to analyze file properties such as script direction, quotations.
usage: gr-wb-file-props [-h]
[-i INPUT_FILENAME]
[-s INPUT_STRING]
[-j JSON_OUT_FILENAME]
[-o HTML_OUT_FILENAME]
[--lang_code LANG_CODE]
[--lang_name LANG_NAME]
options:
-h, --help show this help message and exit
-i INPUT_FILENAME, --input_filename INPUT_FILENAME
-s INPUT_STRING, --input_string INPUT_STRING
-j JSON_OUT_FILENAME, --json_out_filename JSON_OUT_FILENAME
-o HTML_OUT_FILENAME, --html_out_filename HTML_OUT_FILENAME
--lang_code LANG_CODE
--lang_name LANG_NAME
Notes:
- Typically, either an INPUT_FILENAME or an INPUT_STRING is provided (but not both).
- Typically, a JSON_OUT_FILENAME or a HTML_OUT_FILENAME is provided (or both).
Sample calls
gr-wb-file-props -h
gr-wb-file-props -s """She asked: “Whatʼs a ‘PyPi’?”
He replied: “I don't know.”""" -j test.json
cat test.json
gr_utilities.wb_file_props.script_punct A Python function to analyze file properties such as script direction, quotations.
import json
from greekroom.gr_utilities import wb_file_props
## Apply script to string
text = """She asked: “Whatʼs a ‘PyPi’?”
He replied: “I don't know.”"""
result_dict = wb_file_props.script_punct(None, text, "eng", "English")
print(result_dict)
## Apply script to file content
# Write text to file
filename = "test.txt"
with open(filename, "w") as f_out:
f_out.write(text)
# Apply script
result_dict2 = wb_file_props.script_punct(filename)
# Print result as JSON string
print(json.dumps(result_dict2))
# Write result to HTML file
html_output = "test.html"
with open(html_output, "w") as f_html:
wb_file_props.print_to_html(result_dict2, f_html)
owl
owl is a battery of smaller Bible Translation checks.
gr-repeated-words A CLI Python script to check a file for repeated words, e.g. "the the".
usage: gr-repeated-words [-h]
[-j JSON]
[-i IN_FILENAME]
[-r REF_FILENAME]
[-o OUT_FILENAME]
[--html HTML]
[--project_name PROJECT_NAME]
[--lang_code LANGUAGE-CODE]
[--lang_name LANG_NAME]
[--message_id MESSAGE_ID]
[-d DATA_FILENAMES]
[--verbose]
options:
-h, --help show this help message and exit
-j JSON, --json JSON input (alternative 1)
-i IN_FILENAME, --in_filename IN_FILENAME
text file (alternative 2)
-r REF_FILENAME, --ref_filename REF_FILENAME
ref file (alt. 2)
-o OUT_FILENAME, --out_filename OUT_FILENAME
output JSON filename
--html HTML output HTML filename
--project_name PROJECT_NAME
full name of Bible translation project
--lang_code LANGUAGE-CODE
ISO 639-3, e.g. 'fas' for Persian
--lang_name LANG_NAME
--message_id MESSAGE_ID
-d DATA_FILENAMES, --data_filenames DATA_FILENAMES
--verbose
Notes:
- Typically, either a JSON INPUT_FILENAME or a JSON INPUT_STRING is provided (but not both).
- Typically, a JSON_OUT_FILENAME or a HTML_OUT_FILENAME is provided (or both).
Sample calls
gr-repeated-words -h
gr-repeated-words -j '{"jsonrpc": "2.0",
"id": "eng-sample-01",
"method": "BibleTranslationCheck",
"params": [{"lang-code": "eng", "lang-name": "English",
"project-id": "eng-sample",
"project-name": "English Bible",
"selectors": [{"tool": "GreekRoom", "checks": ["RepeatedWords"]}],
"check-corpus": [{"snt-id": "GEN 1:1", "text": "In in the beginning ..."},
{"snt-id": "JHN 12:24", "text": "Truly truly, I say to you ..."}]}]}' -o test.json
cat test.json
owl.repeated_words.check_mcp A Python function to check a file for repeated words, e.g. "the the".
import json
from greekroom.owl import repeated_words
task_s = '''{"jsonrpc": "2.0",
"id": "eng-sample-01",
"method": "BibleTranslationCheck",
"params": [{"lang-code": "eng", "lang-name": "English",
"project-id": "eng-sample",
"project-name": "English Bible",
"selectors": [{"tool": "GreekRoom", "checks": ["RepeatedWords"]}],
"check-corpus": [{"snt-id": "GEN 1:1", "text": "In in the beginning ..."},
{"snt-id": "JHN 12:24", "text": "Truly truly, I say to you ..."}]}]}'''
# load_data_filename() loads <i>legitimate_duplicates.jsonl</i> (see below); call this function only once, even for multiple checks.
data_filename_dict = repeated_words.load_data_filename()
corpus = repeated_words.new_corpus("eng-sample-01")
mcp_d, misc_data_dict, check_corpus_list = repeated_words.check_mcp(task_s, data_filename_dict, corpus)
print(json.dumps(mcp_d))
print(misc_data_dict)
print(check_corpus_list)
# print to HTML file
feedback = repeated_words.get_feedback(mcp_d, 'GreekRoom', 'RepeatedWords')
corpus = repeated_words.update_corpus_if_empty(corpus, check_corpus_list)
repeated_words.write_to_html(feedback, misc_data_dict, corpus, "test.html", "eng", "English", "English Bible")
# result will be in test.html
legitimate_duplicates.jsonl Data files describing legitimate repeated words.
Samples:
{"lang-code": "eng", "text": "truly, truly"}
{"lang-code": "eng", "text": "her her", "snt-ids": ["HOS 2:17", "EST 2:9", "JDT 10:4"], "context-examples": ["give her her vineyards", "gave her her things for purification"]}
{"lang-code": "grc", "text": "ἀμὴν ἀμὴν", "rom": "amen amen", "gloss": {"eng": "truly truly [I say to you]"}}
{"lang-code": "hin", "text": "जब जब", "rom": "jab jab", "gloss": {"eng": "whenever"}}
{"lang-code": "hin", "text": "कुछ कुछ", "rom": "kuch kuch", "gloss": {"eng": "something, somewhat, some of, part of"}}
{"lang-code": "eng", "text": "they they", "delete": true}
Notes:
- Searches for files owl/data/legitimate_duplicates.jsonl in directories "greekroom", "$XDG_DATA_HOME", "/usr/share", "$HOME/.local/share"
- later entries overwrite prior entries
- "delete": true entries delete prior entries
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file greekroom-0.0.20.tar.gz.
File metadata
- Download URL: greekroom-0.0.20.tar.gz
- Upload date:
- Size: 20.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7d22881f98e595f1cf72f1d1c845abbb51d3f019e956167756d4ccced88df95a
|
|
| MD5 |
411d24415e382988316f65f4b4619a90
|
|
| BLAKE2b-256 |
21510c36dce1765edb8875adbe837e39a747ba99a4127ebb177637d46f3b1711
|
File details
Details for the file greekroom-0.0.20-py3-none-any.whl.
File metadata
- Download URL: greekroom-0.0.20-py3-none-any.whl
- Upload date:
- Size: 25.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d2d59ff8824249d7ef21e6cc5f616c09af67e6d24d171a1c13cea2be67d1094b
|
|
| MD5 |
5cf43d5d4a3412277b767ba476c38588
|
|
| BLAKE2b-256 |
9b6f673381dfa7d381ebf0ca034531b11ea3fd8cdb8447db111b9e327b32c128
|