Digital Humanities Utilities

Python 3.6+ package containing various utilities relevant in the field of digital humanities.

$ pip install dh-utils

Unicode utilities

Convert Greek beta code to unicode

>>> from dh_utils.unicode import beta2uni
>>> beta2uni('lo/gos')
'λόγος'

This is a wrapper around the CLTK converter. We also used this converter to create the inverse:

>>> from dh_utils.unicode import uni2beta
>>> uni2beta('λόγος')
'lo/gos'
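
To make the mapping concrete, here is a minimal, standard-library-only sketch of the idea behind beta code conversion. It is not the CLTK converter that beta2uni wraps: the tables are deliberately partial (lowercase letters and a handful of diacritics), and the name beta_to_unicode_sketch is purely illustrative.

```python
import unicodedata

# Partial tables for illustration only; real beta code covers far more
# (capitals, iota subscript, diaeresis, punctuation, ...).
LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "c": "ξ", "o": "ο", "p": "π", "r": "ρ", "t": "τ",
    "u": "υ", "f": "φ", "x": "χ", "y": "ψ", "w": "ω",
}
DIACRITICS = {
    "/": "\u0301",   # acute accent
    "\\": "\u0300",  # grave accent
    "=": "\u0342",   # circumflex (perispomeni)
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
}


def beta_to_unicode_sketch(beta):
    """Convert a small subset of lowercase beta code to precomposed Greek."""
    text = beta.lower()
    out = []
    for i, ch in enumerate(text):
        if ch == "s":
            # A sigma not followed by another letter is written as final sigma.
            following = text[i + 1:i + 2]
            out.append("σ" if following.isalpha() else "ς")
        elif ch in LETTERS:
            out.append(LETTERS[ch])
        elif ch in DIACRITICS:
            out.append(DIACRITICS[ch])
        else:
            out.append(ch)  # spaces, punctuation, anything unknown
    # NFC merges base letters with their combining marks where possible.
    return unicodedata.normalize("NFC", "".join(out))


print(beta_to_unicode_sketch("lo/gos"))
```

The accent is emitted as a combining character after its vowel and composed only at the end, so the tables stay small.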

NB: Since cltk and its dependency nltk are relatively large, cltk is an optional dependency. To use the beta2uni converter, either install cltk separately using pip install cltk or install dh-utils including this optional dependency with pip install dh-utils[betacode].
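
Because cltk is optional, calling code may want to check for it before relying on the converter. The following sketch uses only the standard library; the helper name has_module is hypothetical and not part of dh-utils.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


if has_module("cltk"):
    print("cltk found: beta code conversion should be available")
else:
    print("cltk missing: try pip install dh-utils[betacode]")
```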

Decompose a unicode string

>>> from dh_utils import unicode as u  # assumption: u refers to the dh_utils.unicode module
>>> u.decompose('λόγος')
λ U+03bb GREEK SMALL LETTER LAMDA
ο U+03bf GREEK SMALL LETTER OMICRON
́ U+0301 COMBINING ACUTE ACCENT
γ U+03b3 GREEK SMALL LETTER GAMMA
ο U+03bf GREEK SMALL LETTER OMICRON
ς U+03c2 GREEK SMALL LETTER FINAL SIGMA
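
Output of this shape can be approximated with the standard unicodedata module. The sketch below is an assumption about the approach (canonical decomposition, NFD, followed by a name lookup for each code point), not the package's own implementation; decompose_sketch is an illustrative name.

```python
import unicodedata


def decompose_sketch(text):
    """Return one 'char U+xxxx NAME' line per code point of the NFD form."""
    lines = []
    for char in unicodedata.normalize("NFD", text):
        name = unicodedata.name(char, "<unnamed>")
        lines.append(f"{char} U+{ord(char):04x} {name}")
    return lines


for line in decompose_sketch("λόγος"):
    print(line)
```

NFD splits a precomposed accented vowel into its base letter and a combining mark, which is why the accent appears on a line of its own.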

TEI utilities

Convert markdown to TEI

A basic converter from markdown to TEI has been added. It will convert a markdown file like:

Some paragraph block

> A blockquote

1. An
2. Ordered
3. List

Another paragraph block with _italics_ and __bold__, and:

* An
* Unordered
* List

using a Python snippet like

>>> from dh_utils.tei import md2tei
>>> with open('file.md') as f:
...     tei = md2tei(f.read())

to the following TEI XML:

<p>Some paragraph block</p>
<quote>
  <p>A blockquote</p>
</quote>
<list rend="numbered">
  <item>An</item>
  <item>Ordered</item>
  <item>List</item>
</list>
<p>Another paragraph block with <hi rend="italic">italics</hi> and <hi rend="bold">bold</hi>, and:</p>
<list rend="bulleted">
  <item>An</item>
  <item>Unordered</item>
  <item>List</item>
</list>

The function md2tei is syntactic sugar for the markdown extension ToTEI, which can be used in combination with other extensions as follows:

>>> from markdown import markdown
>>> from dh_utils.tei import ToTEI
>>> markdown('some text', extensions=[ToTEI()]) # Other extensions can be added to this list

The extension ToTEI in turn consists solely of the postprocessor TEIPostprocessor. It has priority 0, which usually means that it will run after all other postprocessors have finished. If any other behaviour or prioritization is required, the processor TEIPostprocessor can also be imported directly (from dh_utils.tei import TEIPostprocessor) and used in a custom markdown extension.

Tag languages

Tag languages in a given string based on its script:

>>> from dh_utils.tei import tag_script
>>> tag_script('A line containing the Hebrew אגוז מלך inline', 'Hebr')
'A line containing the Hebrew <foreign xml:lang="he-Hebr">אגוז מלך</foreign> inline'

It is also possible to tag a given language based on its script in a TEI XML document (NB: file will be overwritten!):

>>> from dh_utils.tei import tag_script_from_file
>>> tag_script_from_file('path/to/file.xml', 'Arab')

The available scripts are stored in dh_utils.tei.AVAILABLE_SCRIPTS and are enumerated below:

>>> from dh_utils.tei import AVAILABLE_SCRIPTS
>>> AVAILABLE_SCRIPTS
['Arab', 'Copt', 'Hebr', 'Latn', 'Cyrl']

Scripts are tagged with default language-script codes (stored in DEFAULT_LCS). These can be overridden using the language_code keyword argument:

>>> tag_script_from_file('path/to/file.xml', 'Cyrl', language_code='ov-Cyrs')
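
Script-based tagging can be pictured as a search for runs of characters from one Unicode block. The sketch below handles Hebrew only and assumes the basic Hebrew block (U+0590 to U+05FF); the ranges and logic the package actually uses may differ, and tag_hebrew_sketch is a hypothetical name.

```python
import re

# Assumption: one run is a sequence of Hebrew-block words separated by whitespace.
HEBREW_RUN = re.compile(r"[\u0590-\u05FF]+(?:\s+[\u0590-\u05FF]+)*")


def tag_hebrew_sketch(text, language_code="he-Hebr"):
    """Wrap each run of Hebrew-script text in a <foreign> element."""
    def wrap(match):
        return f'<foreign xml:lang="{language_code}">{match.group(0)}</foreign>'

    return HEBREW_RUN.sub(wrap, text)


print(tag_hebrew_sketch("A line containing the Hebrew אגוז מלך inline"))
```

Passing a different language_code changes only the xml:lang value, mirroring the keyword argument described above.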

Refsdecl generator

The refsdecl generator creates refsdecl elements as etree XML elements:

from dh_utils.tei import refsdecl_generator

refs_decl = refsdecl_generator.generate_for_file("./path/to/file")
refs_decls = refsdecl_generator.generate_for_path("./path/to/files")

It can also be used through the command line interface:

python -m dh_utils.tei.refsdecl_generator [--update] [PATH]

By default, it does not update the file but outputs the refsdecl xml to the terminal. If the --update flag is given, the file is updated with the generated refsdecl.
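
For orientation, the sketch below assembles a refsDecl element by hand with the standard library. The element layout (a refsDecl with n="CTS" holding one cRefPattern per citation level, deepest level first) follows the usual CTS convention and is an assumption here; the elements the generator actually emits may differ. The function name and its levels parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_refsdecl_sketch(levels):
    """Build a refsDecl from (level name, XPath step) pairs, outermost first."""
    refs_decl = ET.Element("refsDecl", {"n": "CTS"})
    # One cRefPattern per citation depth, starting with the deepest level.
    for depth in range(len(levels), 0, -1):
        name = levels[depth - 1][0]
        # One captured group per level, joined by literal dots: (\w+)\.(\w+)
        match_pattern = r"\.".join([r"(\w+)"] * depth)
        steps = "".join(
            f"/{step}[@n='${i}']"
            for i, (_, step) in enumerate(levels[:depth], start=1)
        )
        ET.SubElement(refs_decl, "cRefPattern", {
            "n": name,
            "matchPattern": match_pattern,
            "replacementPattern": f"#xpath(/tei:TEI/tei:text/tei:body/tei:div{steps})",
        })
    return refs_decl


decl = build_refsdecl_sketch([("book", "tei:div"), ("line", "tei:l")])
print(ET.tostring(decl, encoding="unicode"))
```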

MyCapytain-compatible critical apparatus

The Python API MyCapytain only serves the main text of a CTS-structured text version, and does not support stand-off annotation, bibliographies, critical apparatuses, etc. To overcome the last problem, we have developed a script that generates a separate text version of the critical apparatus that can be served through MyCapytain. Brill's Scholarly Editions uses these separate text versions, which can be displayed in parallel.

The following snippet creates such a critapp file from textgroup.work.edition-extension.xml located in path/to/data/textgroup/work and saves it as textgroup.work.edition-appcrit1.xml:

>>> from dh_utils.tei import crit_app as ca
>>> data_dir = "path/to/data/textgroup/work"
>>> filename = "textgroup.work.edition-extension.xml"
>>> ca_ext = "appcrit1" # Or any other extension
>>> ca.create(filename, ca_ext, data_dir)
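
The naming pattern in this example (the part after the last hyphen is replaced by the chosen extension) can be sketched as follows. This only illustrates the pattern visible in the file names above; how ca.create derives the output name internally is an assumption, and critapp_filename_sketch is a hypothetical helper.

```python
from pathlib import Path


def critapp_filename_sketch(filename, ca_ext):
    """Swap the text after the last hyphen of the file stem for ca_ext."""
    path = Path(filename)
    base, hyphen, _old_ext = path.stem.rpartition("-")
    if not hyphen:
        raise ValueError(f"no '-extension' part found in {filename!r}")
    return f"{base}-{ca_ext}{path.suffix}"


print(critapp_filename_sketch("textgroup.work.edition-extension.xml", "appcrit1"))
```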

If a file contains multiple critical apparatuses, these can be distinguished using /listApp[@type], e.g.:

<listApp type="superior">
  <app/>
  <app/>
  ...
</listApp>
<listApp type="inferior">
  <app/>
  <app/>
  ...
</listApp>

The above snippet combines these apparatuses into one file. To convert them to separate files instead, pass the additional argument app_type to ca.create (e.g., ca.create(filename, ca_ext, data_dir, app_type="superior")) to convert a single apparatus.
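
The effect of selecting by /listApp[@type] can be illustrated with the standard library. The sample document below is made up and omits the TEI namespace for brevity (real files would need it); select_apps_sketch is an illustrative name and not part of crit_app.

```python
import xml.etree.ElementTree as ET

# Made-up sample without namespaces, for illustration only.
SAMPLE = """
<body>
  <listApp type="superior"><app n="1"/><app n="2"/></listApp>
  <listApp type="inferior"><app n="3"/></listApp>
</body>
"""


def select_apps_sketch(xml_text, app_type=None):
    """Return all <app> elements, optionally from one listApp type only."""
    root = ET.fromstring(xml_text)
    if app_type is None:
        path = ".//listApp"
    else:
        path = f".//listApp[@type='{app_type}']"
    return [app for list_app in root.findall(path) for app in list_app.findall("app")]


print(len(select_apps_sketch(SAMPLE)))
print([app.get("n") for app in select_apps_sketch(SAMPLE, "superior")])
```

Without app_type every apparatus is collected together; with it, only the entries of the matching listApp are returned.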

Release history

0.1.23, Jun 17, 2021 (this version)
0.1.22, Jun 14, 2021
0.1.21, Feb 19, 2021
0.1.20, Feb 9, 2021
0.1.19, Feb 9, 2021
0.1.18, Jan 8, 2021
0.1.17, Jan 8, 2021
0.1.16, Nov 27, 2020
0.1.15, Nov 27, 2020
0.1.14, Nov 7, 2020
0.1.13, Nov 3, 2020
0.1.12, Sep 30, 2020
0.1.11, Sep 30, 2020
0.1.10, Sep 30, 2020
0.1.9, Sep 30, 2020
0.1.8, Jul 20, 2020
0.1.7, May 27, 2020
0.1.6, May 26, 2020
0.1.5, May 11, 2020
0.1.4, May 9, 2020
0.1.3, May 8, 2020
0.1.1, May 8, 2020
0.1.0, May 8, 2020

Download files

Source Distribution

dh-utils-0.1.23.tar.gz (15.8 kB)

Uploaded Jun 17, 2021 Source

Built Distribution

dh_utils-0.1.23-py3-none-any.whl (16.7 kB)

Uploaded Jun 17, 2021 Python 3

File details

Details for the file dh-utils-0.1.23.tar.gz.

File metadata

Download URL: dh-utils-0.1.23.tar.gz
Upload date: Jun 17, 2021
Size: 15.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/54.1.2 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.9.2

File hashes

Hashes for dh-utils-0.1.23.tar.gz
Algorithm	Hash digest
SHA256	`1fc9f56187b1f13027afb9836096aaca7b8879b5d7acfc31240b4671f35e359c`
MD5	`5588ebaf85ee23d548595ac45bd4226f`
BLAKE2b-256	`c2f39f2893618b75ee7216282147a5b0004d61f1d44ba2e69f7c8dbbf9e4ac67`

File details

Details for the file dh_utils-0.1.23-py3-none-any.whl.

File metadata

Download URL: dh_utils-0.1.23-py3-none-any.whl
Upload date: Jun 17, 2021
Size: 16.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/54.1.2 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.9.2

File hashes

Hashes for dh_utils-0.1.23-py3-none-any.whl
Algorithm	Hash digest
SHA256	`97727e104d4cf2112c1d95104fc37411edcac24970fa56dddb4ea3711fc04a3b`
MD5	`5c21e8e01ab36be9c1a647e654fe56de`
BLAKE2b-256	`f43d30beb13569fb9c1e47901beee21d2e782c9d45fa1e17e304e167fabd1ab9`
