
Encoding tools for DDHI

Project description

A collection of command-line utilities to assist in the creation of TEI-encoded oral history interviews for the Dartmouth Digital History Initiative.

Installation

Use pip to install this package:

pip install ddhi-encoder

To perform named-entity tagging with ddhi_tag, you will need a spaCy model. Before running ddhi_tag, install spaCy’s small English model:

python -m spacy download en_core_web_sm

See the spaCy documentation for more information.
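
Because ddhi_tag depends on the model being present, it can help to check for it before starting a tagging run. The following is a minimal sketch, not part of ddhi-encoder: the helper name model_installed is hypothetical, and it relies only on the fact that spaCy models fetched with the download command are installed as ordinary Python packages.

```python
import importlib.util


def model_installed(name="en_core_web_sm"):
    """Return True if a package called `name` can be imported.

    Hypothetical helper: this only checks that the package is
    importable; it does not load the model.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if not model_installed():
        print("Model missing; run: python -m spacy download en_core_web_sm")
```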

Use

Use ddhi_convert to transform a DOCX-encoded transcription into a simply structured TEI document.

ddhi_convert ~/Desktop/transcripts/zien_jimmy_transcript_final.docx -o tmp.tei.xml

Use ddhi_tag to add named-entity tags to a TEI-encoded transcription:

ddhi_tag -o zien.tei.xml tmp.tei.xml
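
The two steps above can also be chained from a script. This is a sketch under stated assumptions: it uses only the arguments shown in the commands above, the function names build_commands and run_pipeline are hypothetical, and it assumes both utilities are on the PATH.

```python
import os
import subprocess


def build_commands(docx_path, tagged_path, intermediate_path="tmp.tei.xml"):
    """Build the argument lists for the convert and tag steps.

    Mirrors the command lines shown in this section; no other
    options are assumed.
    """
    convert = ["ddhi_convert", docx_path, "-o", intermediate_path]
    tag = ["ddhi_tag", "-o", tagged_path, intermediate_path]
    return [convert, tag]


def run_pipeline(docx_path, tagged_path):
    """Run both steps in order, stopping at the first failure."""
    # subprocess does not expand "~" the way a shell does, so do it here.
    docx_path = os.path.expanduser(docx_path)
    for cmd in build_commands(docx_path, tagged_path):
        subprocess.run(cmd, check=True)
```

Calling run_pipeline("~/Desktop/transcripts/zien_jimmy_transcript_final.docx", "zien.tei.xml") would then reproduce the two commands above.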

Encoders are then expected to edit the text of the interview, correcting automatically generated named-entity tags and adding new ones.
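
When reviewing the automatically generated tags, a quick tally of the elements in the file can show where to look first. The sketch below is illustrative only: it assumes the document uses the standard TEI namespace, the sample text is invented, and persName and placeName are standard TEI elements chosen for the example, not a statement of what ddhi_tag emits.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"  # standard TEI namespace


def count_tei_elements(xml_text):
    """Count elements in the TEI namespace by local name."""
    root = ET.fromstring(xml_text)
    prefix = "{" + TEI_NS + "}"
    counts = Counter()
    for el in root.iter():
        if isinstance(el.tag, str) and el.tag.startswith(prefix):
            counts[el.tag[len(prefix):]] += 1
    return counts


# Invented sample, for illustration only.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>'
    "<persName>A. Speaker</persName> moved from "
    "<placeName>Hanover</placeName> to <placeName>Boston</placeName>."
    "</p></body></text></TEI>"
)

if __name__ == "__main__":
    for name, n in sorted(count_tei_elements(SAMPLE).items()):
        print(name, n)
```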

Use ddhi_generate_standoff to create a <standOff> element in the interview and link the entities to names in the text.

Use ddhi_mentioned_places to extract the places in a TEI file’s standoff markup and print them as tab-separated values:

ddhi_mentioned_places lovely.tei.xml > lovely.tsv

Then use OpenRefine or another tool to refine this list with identifiers and other metadata.
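
If you would rather inspect the list in a script before or after refining it, the file can be read with Python's csv module. This is a minimal sketch: the column layout of the output is not described here, so the hypothetical helper read_tsv_rows returns each row as a plain list and assumes nothing about headers or column order.

```python
import csv


def read_tsv_rows(handle):
    """Read tab-separated rows from an open text file, skipping blank lines.

    No header row or column meaning is assumed.
    """
    return [row for row in csv.reader(handle, delimiter="\t") if row]


if __name__ == "__main__":
    # lovely.tsv is the file produced by the command above.
    with open("lovely.tsv", newline="", encoding="utf-8") as fh:
        for row in read_tsv_rows(fh):
            print(row)
```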

Use ddhi_update_places to update the places in a TEI file’s standoff markup with identifiers and geo-coordinates obtained via OpenRefine or another procedure:

ddhi_update_places lovely.tei.xml lovely_updates.tsv > updated_lovely.tei.xml

Similarly, use ddhi_mentioned_events and ddhi_update_events to perform the same operations for events.


