Encoding tools for DDHI
A collection of command-line utilities to assist in the creation of TEI-encoded oral history interviews. Part of the Dartmouth Digital History Initiative.
DDHI Encoder
The ddhi-encoder package helps encoders in the DDHI project encode oral history interview transcripts in TEI. It currently contains two command-line utilities:
ddhi_convert: convert a Dartmouth DVP transcript from docx to tei.xml.
ddhi_tag: perform named-entity tagging on a DDHI TEI transcription.
Installation
You can use pip to install this package:
pip install ddhi-encoder
To perform named-entity tagging with ddhi_tag, you will need a spaCy model. Before running ddhi_tag, install spaCy's small English model:
python -m spacy download en_core_web_sm
See the spaCy documentation for more information.
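spaCy models downloaded this way are installed as ordinary Python packages, so you can check for the model before running ddhi_tag. The following is a minimal sketch using only the standard library; the helper name model_installed is hypothetical and is not part of ddhi-encoder:

```python
import importlib.util


def model_installed(name: str = "en_core_web_sm") -> bool:
    """Return True if a top-level package with this name can be found."""
    # Assumption: the model was installed with `python -m spacy download`,
    # which places it on the import path like any other package.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if model_installed():
        print("en_core_web_sm is available")
    else:
        print("Model missing; run: python -m spacy download en_core_web_sm")
```

This only confirms that the package can be found. It does not load the model or check that its version matches your spaCy installation.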
Use
Use ddhi_convert to transform a DOCX-encoded transcription into a simply structured TEI document:
ddhi_convert ~/Desktop/transcripts/zien_jimmy_transcript_final.docx -o tmp.tei.xml
Use ddhi_tag to add named-entity tags to a TEI-encoded transcription:
ddhi_tag -o zien.tei.xml tmp.tei.xml
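The two steps above can also be chained from a script. This is a rough sketch that assumes both commands are on your PATH and uses only the argument forms shown above (an input path plus -o for the output). The function names and the intermediate file naming are illustrative choices, not part of ddhi-encoder:

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_commands(docx: Path, tei_out: Path) -> List[List[str]]:
    """Build the convert and tag command lines for one transcript."""
    # Hypothetical naming: keep the untagged TEI next to the final output.
    intermediate = tei_out.with_suffix(".untagged.xml")
    convert = ["ddhi_convert", str(docx), "-o", str(intermediate)]
    tag = ["ddhi_tag", "-o", str(tei_out), str(intermediate)]
    return [convert, tag]


def run_pipeline(docx: Path, tei_out: Path) -> None:
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands(docx, tei_out):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Usage: python pipeline.py transcript.docx output.tei.xml
    run_pipeline(Path(sys.argv[1]), Path(sys.argv[2]))
```

Passing each command as a list of arguments avoids shell quoting problems when transcript paths contain spaces.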
File details
Details for the file ddhi_encoder-1.0.8-py2.py3-none-any.whl.
File metadata
- Download URL: ddhi_encoder-1.0.8-py2.py3-none-any.whl
- Upload date:
- Size: 14.7 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 959c8e576313a9c90b8b53d2453cebc95befe7085470c420f9ddeef9f8294fa9
MD5 | 610189397ae374d0584fba096bf45cba
BLAKE2b-256 | deb6b6b82cb33d0001f96f8ac654b22ff8602092db90fb187f23b657f64d5900