Encoding tools for DDHI
A collection of command-line utilities to assist in the creation of TEI-encoded oral history interviews. Part of the Dartmouth Digital History Initiative.
DDHI Encoder
The ddhi-encoder package helps encoders in the DDHI project mark up oral history interview transcripts in TEI. At present, it contains two command-line utilities:
ddhi_convert: convert a Dartmouth DVP transcript from docx to tei.xml.
ddhi_tag: perform named-entity tagging on a DDHI TEI transcription.
Installation
You can use pip to install this package:
pip install ddhi-encoder
To perform named-entity tagging with ddhi_tag, you will need a spaCy model. Before running ddhi_tag, install spaCy's small English model:
python -m spacy download en_core_web_sm
See the spaCy documentation for more information.
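Since ddhi_tag needs the model to be present, it can be useful to check for it before tagging. The following is a minimal sketch and not part of ddhi-encoder: it relies on spaCy models being installed as ordinary Python packages, so the standard library can look one up by name. The helper name model_installed is hypothetical.

```python
import importlib.util


def model_installed(name: str = "en_core_web_sm") -> bool:
    """Return True if a top-level package with this name can be found."""
    # find_spec returns None when no package of that name is installed.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if model_installed():
        print("en_core_web_sm is installed")
    else:
        print("Model missing; run: python -m spacy download en_core_web_sm")
```

This only confirms that the package can be located; it does not load the model or verify that its version is compatible with the installed spaCy.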
Use
Use ddhi_convert to transform a DOCX-encoded transcription into a simply structured TEI document:
ddhi_convert ~/Desktop/transcripts/zien_jimmy_transcript_final.docx -o tmp.tei.xml
Use ddhi_tag to add named-entity tags to a TEI-encoded transcription:
ddhi_tag -o zien.tei.xml tmp.tei.xml
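The two steps above can also be chained from a script. This is a sketch under stated assumptions, not a feature of the package: it uses only the invocations shown above (an input path plus -o for the output file), assumes both commands are on your PATH, and the function names and the naming of the intermediate file are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(docx: Path, out_dir: Path) -> List[List[str]]:
    """Build the convert and tag command lines for one transcript."""
    # Hypothetical naming scheme for the intermediate and final files.
    untagged = out_dir / f"{docx.stem}.untagged.tei.xml"
    tagged = out_dir / f"{docx.stem}.tei.xml"
    return [
        # Step 1: DOCX -> simply structured TEI
        ["ddhi_convert", str(docx), "-o", str(untagged)],
        # Step 2: add named-entity tags to the converted TEI
        ["ddhi_tag", "-o", str(tagged), str(untagged)],
    ]


def encode(docx: Path, out_dir: Path) -> None:
    """Run both steps in order, stopping at the first failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(docx, out_dir):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
```

For example, encode(Path("interview.docx"), Path("tei")) would write tei/interview.tei.xml, provided both commands succeed. Keeping command construction separate from execution makes it easy to print or inspect the commands before running them.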
Built Distribution
Hashes for ddhi_encoder-1.0.7-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6e12d4b17512c80c0dd003ac2894194320f9ea06efba9013bf4fdb4748359e1b
MD5 | afa46314d3fda189677fe96867555cb4
BLAKE2b-256 | 6db8b7c4dd046488e894e9d800c7e52d4e6a11ffa357356a4bac798dde86a967