Encoding tools for DDHI
A collection of command-line utilities to assist in the creation of TEI-encoded oral history interviews for the Dartmouth Digital History Initiative.
Installation
Use pip to install this package:
pip install ddhi-encoder
To perform named-entity tagging with ddhi_tag, you will need a spaCy model. Before running ddhi_tag, install spaCy's small English model:
python -m spacy download en_core_web_sm
See the spaCy documentation for more information.
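To confirm that the model is available before running ddhi_tag, a check along these lines may help. It is a minimal sketch that relies only on the fact that spaCy models install as importable Python packages; the helper name model_installed is hypothetical and not part of ddhi-encoder.

```python
import importlib.util


def model_installed(name: str = "en_core_web_sm") -> bool:
    """Return True if a top-level package with this name can be imported."""
    # find_spec returns None when no top-level package of that name is found.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if model_installed():
        print("en_core_web_sm is installed")
    else:
        print("Model missing; run: python -m spacy download en_core_web_sm")
```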
Use
Use ddhi_convert to transform a DOCX-encoded transcription into a simply structured TEI document:
ddhi_convert ~/Desktop/transcripts/zien_jimmy_transcript_final.docx -o tmp.tei.xml
Use ddhi_tag to add named-entity tags to a TEI-encoded transcription:
ddhi_tag -o zien.tei.xml tmp.tei.xml
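The two steps above can also be chained from a Python script, for example when processing many transcripts. The sketch below uses only the arguments shown in the commands above (the input file and -o); the function names build_commands and convert_and_tag are hypothetical, and the script assumes both tools are on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(docx: Path, intermediate: Path, tagged: Path) -> List[List[str]]:
    """Build the argument lists for ddhi_convert followed by ddhi_tag."""
    return [
        ["ddhi_convert", str(docx), "-o", str(intermediate)],
        ["ddhi_tag", "-o", str(tagged), str(intermediate)],
    ]


def convert_and_tag(docx: Path, intermediate: Path, tagged: Path) -> None:
    """Run both commands in order, stopping at the first failure."""
    for cmd in build_commands(docx, intermediate, tagged):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)


# Example call (requires ddhi-encoder to be installed):
# convert_and_tag(Path("interview.docx"), Path("tmp.tei.xml"), Path("interview.tei.xml"))
```

Note that subprocess does not expand ~ the way a shell does; call Path.expanduser() on such paths first.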
Encoders are then expected to edit the text of the interview, correcting automatically generated named-entity tags and adding new ones.
Use ddhi_generate_standoff to create a <standOff> element in the interview and link the entities to names in the text.
Use ddhi_mentioned_places to extract the places in a TEI file's standoff markup and print them as tab-separated values:
ddhi_mentioned_places lovely.tei.xml > lovely.tsv
Then use OpenRefine or another tool to refine this list with identifiers and other metadata.
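If you would rather inspect the list in Python before or after refining it, the standard csv module can read the tab-separated output. This is a minimal sketch: the column layout of the file is not documented here, so rows are returned as plain lists with no assumed column names, and the helper name read_tsv is hypothetical.

```python
import csv
from typing import List


def read_tsv(path: str) -> List[List[str]]:
    """Read a tab-separated file into a list of rows, skipping blank lines."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [row for row in csv.reader(handle, delimiter="\t") if row]


# Example call, using the file produced above:
# for row in read_tsv("lovely.tsv"):
#     print(row)
```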
Use ddhi_update_places to update the places in a TEI file's standoff markup with identifiers and geo-coordinates obtained via OpenRefine or another procedure:
ddhi_update_places lovely.tei.xml lovely_updates.tsv > updated_lovely.tei.xml
Similarly, use ddhi_mentioned_events and ddhi_update_events to perform the same operations for events.
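Because these tools write to standard output, a script that drives them has to do the redirection itself. The sketch below shows one way to do that with subprocess; run_to_file is a hypothetical helper, and the argument order shown for the event commands in the comments is an assumption based on the place commands above.

```python
import subprocess
from pathlib import Path
from typing import List


def run_to_file(cmd: List[str], out_path: Path) -> None:
    """Run a command and write its standard output to out_path."""
    with open(out_path, "wb") as out:
        # Equivalent to `cmd > out_path` in a shell; raises on a non-zero exit.
        subprocess.run(cmd, stdout=out, check=True)


# Example calls (assumed to mirror the place commands; requires ddhi-encoder):
# run_to_file(["ddhi_mentioned_events", "lovely.tei.xml"], Path("lovely_events.tsv"))
# run_to_file(
#     ["ddhi_update_events", "lovely.tei.xml", "lovely_event_updates.tsv"],
#     Path("updated_lovely.tei.xml"),
# )
```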