a Python library for managing, annotating, and converting natural language corpuses using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF)
speach
Speach (formerly texttaglib) is a Python 3 library for managing, annotating, and converting natural language corpora using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF, etc.)
Main functions are:
- Text corpus management
- Manipulating ELAN transcription files directly in ELAN Annotation Format (eaf)
- TIG - A human-friendly interlinear gloss format for linguistic documentation
- Multiple storage formats (text, CSV, JSON, SQLite databases)
Useful Links
- Speach documentation: https://speach.readthedocs.io/
- Source code: https://github.com/neocl/speach/
Installation
Speach is available on PyPI.
pip install speach
ELAN support
Speach can be used to extract annotations as well as metadata from ELAN transcripts, for example:
from speach import elan

# Test ELAN reader function in speach
eaf = elan.open_eaf('./test/data/test.eaf')

# accessing tiers & annotations
for tier in eaf:
    print(f"{tier.ID} | Participant: {tier.participant} | Type: {tier.type_ref}")
    for ann in tier:
        print(f"{ann.ID.rjust(4, ' ')}. [{ann.from_ts} :: {ann.to_ts}] {ann.text}")
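To make the tier and annotation fields above more concrete, here is a minimal sketch that reads the same kind of information straight from EAF XML using only the Python standard library. It does not use speach and is not how speach is implemented; the sample document, the function name read_annotations, and the dictionary keys are all made up for illustration. It only handles time-aligned annotations (ALIGNABLE_ANNOTATION), not reference annotations.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written EAF fragment (illustrative data only).
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="utterance" PARTICIPANT="A" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello world</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_annotations(eaf_xml):
    """Return one dict per time-aligned annotation (hypothetical helper)."""
    root = ET.fromstring(eaf_xml)

    # Map time slot IDs to millisecond values; unaligned slots have no value.
    slots = {}
    for ts in root.iter("TIME_SLOT"):
        value = ts.get("TIME_VALUE")
        slots[ts.get("TIME_SLOT_ID")] = int(value) if value is not None else None

    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            rows.append({
                "tier": tier.get("TIER_ID"),
                "participant": tier.get("PARTICIPANT"),
                "id": ann.get("ANNOTATION_ID"),
                "from_ms": slots.get(ann.get("TIME_SLOT_REF1")),
                "to_ms": slots.get(ann.get("TIME_SLOT_REF2")),
                "text": ann.findtext("ANNOTATION_VALUE", default=""),
            })
    return rows


for row in read_annotations(SAMPLE_EAF):
    print(row)
```

For real work the speach reader shown above is the better choice, since it hides the time-slot lookup and exposes tiers and annotations as objects.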
Speach also provides command line tools for processing EAF files.
# this command converts an eaf file into csv
python -m speach eaf2csv input_elan_file.eaf -o output_file_name.csv
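If several EAF files need converting, the command above can be driven from a short Python script. The following is a sketch under stated assumptions: it only wraps the exact command line shown above, the helper names build_eaf2csv_command and convert_folder are hypothetical, and writing each CSV next to its source file is a choice made here for illustration.

```python
import subprocess
import sys
from pathlib import Path


def build_eaf2csv_command(eaf_path, csv_path):
    """Build the argument list for `python -m speach eaf2csv IN -o OUT`."""
    # sys.executable keeps the call inside the current Python environment.
    return [sys.executable, "-m", "speach", "eaf2csv",
            str(eaf_path), "-o", str(csv_path)]


def convert_folder(folder):
    """Convert every .eaf file in `folder`, writing a .csv beside each one."""
    outputs = []
    for eaf_path in sorted(Path(folder).glob("*.eaf")):
        csv_path = eaf_path.with_suffix(".csv")
        # check=True raises CalledProcessError if the conversion fails.
        subprocess.run(build_eaf2csv_command(eaf_path, csv_path), check=True)
        outputs.append(csv_path)
    return outputs


if __name__ == "__main__":
    # Show the command without running it.
    print(build_eaf2csv_command("input_elan_file.eaf", "output_file_name.csv"))
```

Calling convert_folder requires speach to be installed in the same environment as the script.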
Read the Speach documentation for more information.