A Python library for managing, annotating, and converting natural language corpora using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF)

Project description

speach

Speach (formerly texttaglib) is a Python 3 library for managing, annotating, and converting natural language corpora using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF, etc.).

The main features are:

  • Text corpus management
  • Manipulating ELAN transcription files directly in ELAN Annotation Format (eaf)
  • TIG - A human-friendly interlinear gloss format for linguistic documentation
  • Multiple storage formats (text, CSV, JSON, SQLite databases)

Installation

Speach is available on PyPI and can be installed with pip:

pip install speach
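
To confirm that the installation succeeded, the installed version can be looked up from Python. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name installed_version is an illustrative assumption, not part of Speach.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it relies only on the standard
    library and does not import the package itself.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None
```

Calling installed_version("speach") returns a version string once the package is installed and None otherwise.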

ELAN support

Speach can be used to extract annotations as well as metadata from ELAN transcripts, for example:

from speach import elan

# Test ELAN reader function in speach
eaf = elan.open_eaf('./test/data/test.eaf')

# accessing tiers & annotations
for tier in eaf:
    print(f"{tier.ID} | Participant: {tier.participant} | Type: {tier.type_ref}")
    for ann in tier:
        print(f"{ann.ID.rjust(4, ' ')}. [{ann.from_ts} :: {ann.to_ts}] {ann.text}")
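
The same loop can be used to collect annotations into rows for further processing. The sketch below assumes only the attributes used in the example above (tier.ID, ann.ID, ann.from_ts, ann.to_ts, ann.text) and that timestamps can be converted with str(). StandInTier and StandInAnnotation are hypothetical stand-ins so the sketch runs without an EAF file; the helper names are illustrative and not part of Speach.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class StandInAnnotation:
    """Hypothetical stand-in exposing the annotation attributes used above."""
    ID: str
    from_ts: str
    to_ts: str
    text: str


@dataclass
class StandInTier:
    """Hypothetical stand-in for a tier: has an ID and iterates over annotations."""
    ID: str
    annotations: list = field(default_factory=list)

    def __iter__(self):
        return iter(self.annotations)


def annotations_to_rows(tiers):
    """Flatten tiers into (tier ID, annotation ID, start, end, text) tuples."""
    return [
        (tier.ID, ann.ID, str(ann.from_ts), str(ann.to_ts), ann.text)
        for tier in tiers
        for ann in tier
    ]


def rows_to_csv(rows):
    """Serialise rows to CSV text, preceded by a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["tier", "annotation", "from", "to", "text"])
    writer.writerows(rows)
    return buffer.getvalue()


# Demo data (invented for illustration only).
demo_tiers = [
    StandInTier("Utterance", [
        StandInAnnotation("a1", "00:00:01.000", "00:00:02.500", "hello, world"),
        StandInAnnotation("a2", "00:00:03.000", "00:00:04.000", "goodbye"),
    ]),
]
demo_csv = rows_to_csv(annotations_to_rows(demo_tiers))
```

With Speach installed, the object returned by elan.open_eaf could presumably be passed to annotations_to_rows in place of demo_tiers, since it is iterated the same way in the example above.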

Speach also provides command line tools for processing EAF files.

# this command converts an eaf file into csv
python -m speach eaf2csv input_elan_file.eaf -o output_file_name.csv
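
To convert many files at once, the command above can be driven from a short script. This is a sketch under stated assumptions: it uses only the eaf2csv invocation and -o option shown above, names each output after its input with a .csv suffix, and the helper names build_eaf2csv_command and convert_all are illustrative, not part of Speach.

```python
import subprocess
import sys
from pathlib import Path


def build_eaf2csv_command(eaf_path):
    """Build the argument list for converting one EAF file to CSV.

    The output path is the input path with its suffix replaced by .csv
    (a naming choice made for this sketch).
    """
    eaf_path = Path(eaf_path)
    csv_path = eaf_path.with_suffix(".csv")
    return [
        sys.executable, "-m", "speach", "eaf2csv",
        str(eaf_path), "-o", str(csv_path),
    ]


def convert_all(directory):
    """Run the conversion for every .eaf file directly inside a directory."""
    for eaf_path in sorted(Path(directory).glob("*.eaf")):
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(build_eaf2csv_command(eaf_path), check=True)
```

Using sys.executable keeps the conversion in the same Python environment where Speach was installed. A call such as convert_all("recordings") would process a hypothetical recordings directory.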

Read the Speach documentation for more information.

Download files

Source Distribution

speach-0.1a5.tar.gz (28.0 kB)