Digital Humanities Utilities
Python 3.6+ package containing various utilities relevant in the field of digital humanities.
$ pip install dh-utils
Unicode utilities
Decompose any Unicode string into its constituent characters, listing the code point and name of each:
>>> from dh_utils import unicode as u
>>> u.decompose('λόγος')
λ U+03bb GREEK SMALL LETTER LAMDA
ο U+03bf GREEK SMALL LETTER OMICRON
́ U+0301 COMBINING ACUTE ACCENT
γ U+03b3 GREEK SMALL LETTER GAMMA
ο U+03bf GREEK SMALL LETTER OMICRON
ς U+03c2 GREEK SMALL LETTER FINAL SIGMA
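For readers curious how such a listing can be produced, the following is a minimal sketch using only the standard library's unicodedata module. It assumes the output above corresponds to canonical decomposition (NFD); decompose_sketch is a hypothetical name, not part of dh_utils, and the package's actual implementation may differ.

```python
import unicodedata


def decompose_sketch(text):
    """Return (character, code point, name) triples for the NFD form of text.

    Hypothetical helper for illustration; not the dh_utils implementation.
    """
    rows = []
    for ch in unicodedata.normalize("NFD", text):
        code_point = f"U+{ord(ch):04x}"
        # Fall back to a placeholder for characters without a Unicode name.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((ch, code_point, name))
    return rows


for ch, code_point, name in decompose_sketch("λόγος"):
    print(ch, code_point, name)
```

Normalizing to NFD first is what splits an accented letter into a base letter followed by its combining mark.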
TEI utilities
Tag languages
Tag languages in a given string based on its script:
>>> from dh_utils import tei as t
>>> t.tag('A line containing the Hebrew אגוז מלך inline', 'Hebr')
'A line containing the Hebrew <foreign xml:lang="he-Hebr">אגוז מלך</foreign> inline'
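The idea of script-based tagging can be illustrated with a regular expression over a Unicode block. This is only a sketch under stated assumptions: it handles Hebrew alone, treats the block U+0590 to U+05FF as "Hebrew script", and joins words separated by whitespace into one run. The names HEBREW_RUN and tag_hebrew_sketch are hypothetical and not part of dh_utils.

```python
import re

# One or more Hebrew-block words, allowing whitespace between them
# so that a multi-word phrase is wrapped in a single element.
HEBREW_RUN = re.compile(r"[\u0590-\u05FF]+(?:\s+[\u0590-\u05FF]+)*")


def tag_hebrew_sketch(text, language_code="he-Hebr"):
    """Wrap each run of Hebrew-block characters in a <foreign> element.

    Illustrative only; dh_utils may detect scripts differently.
    """
    def wrap(match):
        return f'<foreign xml:lang="{language_code}">{match.group(0)}</foreign>'

    return HEBREW_RUN.sub(wrap, text)


print(tag_hebrew_sketch("A line with אגוז מלך inline"))
```

Text without any characters from the block is returned unchanged.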
It is also possible to tag a language by its script throughout a TEI XML document (NB: the file will be overwritten!):
>>> t.tag_xml('path/to/file.xml', 'Arab')
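Because tag_xml overwrites its input, it may be prudent to keep a copy of the file first. A minimal sketch using only the standard library; backup_file and the ".bak" suffix are illustrative choices, not part of dh_utils.

```python
import shutil
from pathlib import Path


def backup_file(path, suffix=".bak"):
    """Copy path to a sibling file with suffix appended and return the copy's path.

    Hypothetical helper; the suffix is an arbitrary choice for illustration.
    """
    source = Path(path)
    target = source.with_name(source.name + suffix)
    shutil.copy2(source, target)  # copy2 also preserves timestamps where possible
    return target


# Intended use (paths are placeholders):
#   backup_file('path/to/file.xml')
#   t.tag_xml('path/to/file.xml', 'Arab')
```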
The available scripts are stored in AVAILABLE_SCRIPTS and are enumerated below:
>>> t.AVAILABLE_SCRIPTS
['Arab', 'Copt', 'Hebr', 'Latn', 'Cyrl']
By default, scripts are tagged with the language-script codes stored in DEFAULT_LCS. A different code can be supplied with the language_code keyword argument:
>>> t.tag_xml('path/to/file.xml', 'Cyrl', language_code='ov-Cyrs')
Refsdecl generator
The generator creates refsdecl elements as etree XML elements, either for a single file or for the files under a path:
from dh_utils.tei import refsdecl_generator
refs_decl = refsdecl_generator.generate_for_file("./path/to/file")
refs_decls = refsdecl_generator.generate_for_path("./path/to/files")
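The shape of the generated element is not shown here. As a rough orientation, a TEI refsDecl for CTS-style citation typically holds one cRefPattern per citation level, each with a matchPattern and a replacementPattern. The sketch below builds such an element with the standard library's ElementTree; build_refs_decl, the n="CTS" attribute, and the sample XPath are assumptions for illustration and need not match what refsdecl_generator produces.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("tei", TEI_NS)


def build_refs_decl(levels):
    """Build a refsDecl element from (name, match_pattern, xpath) triples.

    Hypothetical helper; not the dh_utils implementation.
    """
    refs_decl = ET.Element(f"{{{TEI_NS}}}refsDecl", {"n": "CTS"})
    for name, match_pattern, xpath in levels:
        ET.SubElement(
            refs_decl,
            f"{{{TEI_NS}}}cRefPattern",
            {
                "n": name,
                "matchPattern": match_pattern,
                "replacementPattern": f"#xpath({xpath})",
            },
        )
    return refs_decl


example = build_refs_decl(
    [("line", r"(\w+)", "/tei:TEI/tei:text/tei:body/tei:div/tei:l[@n='$1']")]
)
print(ET.tostring(example, encoding="unicode"))
```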
It can also be used through the command line interface:
python -m dh_utils.tei.refsdecl_generator [--update] [PATH]
By default, it does not update the file but prints the refsdecl XML to the terminal. If the --update flag is given, the file is updated with the generated refsdecl.
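A command line with this shape (an optional --update switch and an optional PATH) can be modelled with argparse. The sketch below is not the package's own parser: build_parser is a hypothetical name, and the default of "." for PATH is an assumption, since the default is not documented here.

```python
import argparse


def build_parser():
    """Return a parser mirroring: refsdecl_generator [--update] [PATH].

    Illustrative only; the real CLI may define its arguments differently.
    """
    parser = argparse.ArgumentParser(prog="refsdecl_generator")
    parser.add_argument(
        "--update",
        action="store_true",
        help="write the generated refsdecl into the file instead of printing it",
    )
    parser.add_argument(
        "path",
        nargs="?",
        default=".",  # assumed default, not confirmed by the package
        metavar="PATH",
        help="file or directory to process",
    )
    return parser


args = build_parser().parse_args(["--update", "path/to/file.xml"])
print(args.update, args.path)
```

Using store_true keeps the non-destructive behaviour as the default: the file is only modified when the flag is passed explicitly.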
MyCapytain-compatible critical apparatus
The Python API MyCapytain only serves the main text of a CTS-structured text version and does not support stand-off annotation, bibliographies, critical apparatuses, etc. To address the last of these limitations, we have developed a script that generates a separate text version of the critical apparatus, which can be served through MyCapytain. Brill's Scholarly Editions uses these separate text versions, which can be displayed in parallel with the main text.
The following snippet creates such a critapp file from textgroup.work.edition-extension.xml, located in path/to/data/textgroup/work, and saves it as textgroup.work.edition-appcrit1.xml:
>>> import crit_app as ca
>>> data_dir = "path/to/data/textgroup/work"
>>> filename = "textgroup.work.edition-extension.xml"
>>> ca_ext = "appcrit1" # Or any other extension
>>> ca.create(filename, ca_ext, data_dir)
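The naming convention in this example, where the part after the last hyphen of the file stem is replaced by the chosen extension, can be sketched as follows. This is inferred from the two filenames above only; critapp_filename is a hypothetical helper, and ca.create may derive the name differently.

```python
from pathlib import Path


def critapp_filename(filename, ca_ext):
    """Replace the text after the last hyphen of the stem with ca_ext.

    Hypothetical helper inferred from the example filenames.
    """
    path = Path(filename)
    base, separator, _old_ext = path.stem.rpartition("-")
    if not separator:
        # No hyphen in the stem: append the extension to the whole stem.
        base = path.stem
    return f"{base}-{ca_ext}{path.suffix}"


print(critapp_filename("textgroup.work.edition-extension.xml", "appcrit1"))
```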