Python package containing various utilities relevant in the field of digital humanities.

Digital Humanities Utilities

Python 3.6+ package containing various utilities relevant in the field of digital humanities.

$ pip install dh-utils

Unicode utilities

Decompose any Unicode string into its individual code points:

>>> from dh_utils import unicode as u
>>> u.decompose('λόγος')
λ U+03bb GREEK SMALL LETTER LAMDA
ο U+03bf GREEK SMALL LETTER OMICRON
́ U+0301 COMBINING ACUTE ACCENT
γ U+03b3 GREEK SMALL LETTER GAMMA
ο U+03bf GREEK SMALL LETTER OMICRON
ς U+03c2 GREEK SMALL LETTER FINAL SIGMA
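The output above can be approximated with the standard library alone. The following is a minimal sketch, assuming the decomposition corresponds to Unicode canonical decomposition (NFD); `decompose_lines` is a hypothetical helper and not part of `dh_utils`.

```python
import unicodedata


def decompose_lines(text):
    """Return one 'char U+xxxx NAME' line per code point of the NFD form.

    Assumption: canonical decomposition (NFD) is what splits an accented
    letter into base letter plus combining mark.
    """
    lines = []
    for char in unicodedata.normalize("NFD", text):
        # Fall back to a placeholder for code points without a Unicode name.
        name = unicodedata.name(char, "<unnamed>")
        lines.append(f"{char} U+{ord(char):04x} {name}")
    return lines


# 'λόγος' written with escapes so the precomposed omicron is unambiguous.
for line in decompose_lines("\u03bb\u03cc\u03b3\u03bf\u03c2"):
    print(line)
```

The precomposed omicron with tonos (U+03CC) is split into a plain omicron and a combining acute accent, which is why the five-letter word yields six lines.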

TEI utilities

Tag languages

Tag languages in a given string based on its script:

>>> from dh_utils import tei as t
>>> t.tag('A line containing the Hebrew אגוז מלך inline', 'Hebr')
'A line containing the Hebrew <foreign xml:lang="he-Hebr">אגוז מלך</foreign> inline'
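To illustrate the idea of script-based tagging, here is a rough sketch that wraps runs of characters from one Unicode block in a `<foreign>` element. It is not the package's implementation: `tag_script`, `SCRIPT_RANGES` and `DEFAULT_CODES` are hypothetical names, the code-point ranges are simplified to a single block per script, and the `ar-Arab` default is an assumption.

```python
import re

# Hypothetical, simplified ranges (one Unicode block per script).
SCRIPT_RANGES = {
    "Hebr": "\u0590-\u05FF",
    "Arab": "\u0600-\u06FF",
}

# Hypothetical default language-script codes for this sketch only.
DEFAULT_CODES = {"Hebr": "he-Hebr", "Arab": "ar-Arab"}


def tag_script(text, script, language_code=None):
    """Wrap each run of `script` characters in a TEI <foreign> element."""
    chars = SCRIPT_RANGES[script]
    code = language_code or DEFAULT_CODES[script]
    # A run is one or more words in the script, joined by whitespace,
    # so that multi-word phrases end up inside a single element.
    run = re.compile(f"[{chars}]+(?:\\s+[{chars}]+)*")
    return run.sub(
        lambda m: f'<foreign xml:lang="{code}">{m.group(0)}</foreign>', text
    )
```

Matching whole whitespace-joined runs rather than single words keeps a phrase such as the two Hebrew words above in one `<foreign>` element instead of two.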

It is also possible to tag a given language based on its script in a TEI XML document (NB: file will be overwritten!):

>>> t.tag_xml('path/to/file.xml', 'Arab')
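Because the file is overwritten in place, it may be worth keeping a copy before tagging. The sketch below is one possible approach using only the standard library; `with_backup` is a hypothetical helper, not something provided by `dh_utils`.

```python
import shutil
from pathlib import Path


def with_backup(path, action, suffix=".bak"):
    """Copy `path` to `path + suffix`, then run `action(path)`.

    Returns the path of the backup copy. An existing backup with the
    same name is overwritten.
    """
    source = Path(path)
    backup = source.with_name(source.name + suffix)
    shutil.copy2(source, backup)  # copy contents and metadata first
    action(str(source))           # only then let the action modify the file
    return backup
```

With the import shown earlier, a call might look like `with_backup('path/to/file.xml', lambda p: t.tag_xml(p, 'Arab'))`, which leaves the untagged original in `path/to/file.xml.bak`.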

The available scripts are stored in AVAILABLE_SCRIPTS and are enumerated below:

>>> t.AVAILABLE_SCRIPTS
['Arab', 'Copt', 'Hebr', 'Latn', 'Cyrl']

By default, each script is tagged with a language-script code taken from DEFAULT_LCS. To use a different code, pass the language_code keyword argument:

>>> t.tag_xml('path/to/file.xml', 'Cyrl', language_code='ov-Cyrs')

MyCapytain-compatible critical apparatus

The Python API MyCapytain serves only the main text of a CTS-structured text version and does not support stand-off annotation, bibliographies, critical apparatuses, etc. To work around the last of these limitations, we have developed a script that generates a separate text version of the critical apparatus, which can then be served through MyCapytain. Brill's Scholarly Editions uses these separate text versions, which can be displayed in parallel.

The following snippet creates such a critical apparatus file from textgroup.work.edition-extension.xml, located in path/to/data/textgroup/work, and saves it as textgroup.work.edition-appcrit1.xml:

>>> import crit_app as ca
>>> data_dir = "path/to/data/textgroup/work"
>>> filename = "textgroup.work.edition-extension.xml"
>>> ca_ext = "appcrit1" # Or any other extension
>>> ca.create(filename, ca_ext, data_dir)
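The relation between the input and output file names can be sketched as follows. This is an illustration inferred from the two file names given above, assuming the part after the last hyphen in the file stem is swapped for the chosen extension; `critapp_path` is a hypothetical helper and may not match how the package derives the name.

```python
from pathlib import Path


def critapp_path(filename, ca_ext, data_dir):
    """Derive the output path by replacing the text after the last hyphen."""
    stem = Path(filename).stem            # e.g. "textgroup.work.edition-extension"
    base, sep, _ = stem.rpartition("-")   # split off the old extension
    if not sep:
        # No hyphen in the name: keep the whole stem as the base.
        base = stem
    return Path(data_dir) / f"{base}-{ca_ext}.xml"


print(critapp_path(
    "textgroup.work.edition-extension.xml",
    "appcrit1",
    "path/to/data/textgroup/work",
).as_posix())
```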
