A Python library for managing, annotating, and converting natural language corpuses using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF)

Project description

speach

Speach (formerly texttaglib) is a Python 3 library for managing, annotating, and converting natural language corpuses using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF, etc.).

Main functions are:

Text corpus management
Manipulating ELAN transcription files directly in ELAN Annotation Format (eaf)
TIG - A human-friendly interlinear gloss format for linguistic documentation
Multiple storage formats (text, CSV, JSON, SQLite databases)

Useful Links

Speach documentation: https://speach.readthedocs.io/
Source code: https://github.com/neocl/speach/

Installation

Speach is available on PyPI.

pip install speach

ELAN support

The speach library includes a command-line tool for converting EAF files to CSV.

python -m speach eaf2csv input_elan_file.eaf -o output_file_name.csv
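
To convert a whole folder of transcripts, the command above can be wrapped in a short script. The following is a minimal sketch and not part of speach: the helper names build_eaf2csv_command and convert_folder are hypothetical, and the only thing taken from speach is the eaf2csv subcommand with its -o option exactly as shown above.

```python
import subprocess
import sys
from pathlib import Path


def build_eaf2csv_command(eaf_path, csv_path=None):
    """Build the argument list for the eaf2csv call shown above (hypothetical helper)."""
    eaf_path = Path(eaf_path)
    # Default output: same location and name as the input, with a .csv extension
    if csv_path is None:
        csv_path = eaf_path.with_suffix(".csv")
    return [sys.executable, "-m", "speach", "eaf2csv", str(eaf_path), "-o", str(csv_path)]


def convert_folder(folder):
    """Run eaf2csv once per .eaf file in a folder; requires speach to be installed."""
    for eaf_path in sorted(Path(folder).glob("*.eaf")):
        # check=True raises CalledProcessError if a conversion fails
        subprocess.run(build_eaf2csv_command(eaf_path), check=True)
```

Calling convert_folder("./transcripts") would then write one CSV file next to each EAF file. Using sys.executable keeps the call inside the same Python environment in which speach was installed.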

For more complex analyses, speach Python scripts can be used to extract metadata and annotations from ELAN transcripts, for example:

from speach import elan

# Test ELAN reader function in speach
eaf = elan.open_eaf('./test/data/test.eaf')

# accessing metadata
print(f"Author: {eaf.author} | Date: {eaf.date} | Format: {eaf.fileformat} | Version: {eaf.version}")
print(f"Media file: {eaf.media_file}")
print(f"Time units: {eaf.time_units}")
print(f"Media URL: {eaf.media_url} | MIME type: {eaf.mime_type}")
print(f"Media relative URL: {eaf.relative_media_url}")

# accessing tiers & annotations
for tier in eaf.tiers():
    print(f"{tier.ID} | Participant: {tier.participant} | Type: {tier.type_ref}")
    for ann in tier.annotations:
        print(f"{ann.ID.rjust(4, ' ')}. [{ann.from_ts.ts} -- {ann.to_ts.ts}] {ann.value}")
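
For readers unfamiliar with the file format, it may help to see what the tiers and annotations look like in the underlying XML. The sketch below uses only the Python standard library and a hand-written, heavily reduced EAF-like document (the sample content and the read_tiers helper are assumptions made for illustration). It is not how speach is implemented, it handles only time-aligned annotations, and it assumes every time slot carries a TIME_VALUE.

```python
import xml.etree.ElementTree as ET

# Hand-written, minimal EAF-like document for illustration only (not a real transcript)
SAMPLE_EAF = """
<ANNOTATION_DOCUMENT AUTHOR="demo" DATE="2021-04-28T00:00:00+00:00" FORMAT="3.0" VERSION="3.0">
  <HEADER MEDIA_FILE="" TIME_UNITS="milliseconds"/>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="utterance" PARTICIPANT="P001" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>I am a sentence.</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_tiers(xml_text):
    """Return a list of (tier_id, participant, annotations) tuples (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    # Map time slot IDs to their values in milliseconds
    slots = {ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE")) for ts in root.iter("TIME_SLOT")}
    tiers = []
    for tier in root.iter("TIER"):
        annotations = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            annotations.append((
                ann.get("ANNOTATION_ID"),
                slots[ann.get("TIME_SLOT_REF1")],  # start time
                slots[ann.get("TIME_SLOT_REF2")],  # end time
                ann.findtext("ANNOTATION_VALUE", default=""),
            ))
        tiers.append((tier.get("TIER_ID"), tier.get("PARTICIPANT"), annotations))
    return tiers


for tier_id, participant, annotations in read_tiers(SAMPLE_EAF):
    print(f"{tier_id} | Participant: {participant}")
    for ann_id, start, end, value in annotations:
        print(f"{ann_id.rjust(4, ' ')}. [{start} -- {end}] {value}")
```

For real transcripts, the speach reader shown above is the better choice, since it also exposes the metadata fields and hides these format details.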

Text corpus

>>> from speach import ttl
>>> doc = ttl.Document('mydoc')
>>> sent = doc.new_sent("I am a sentence.")
>>> sent
#1: I am a sentence.
>>> sent.ID
1
>>> sent.text
'I am a sentence.'
>>> sent.import_tokens(["I", "am", "a", "sentence", "."])
>>> sent.tokens
[`I`<0:1>, `am`<2:4>, `a`<5:6>, `sentence`<7:15>, `.`<15:16>]
>>> doc.write_ttl()

The script above will generate the following corpus files:

-rw-rw-r--.  1 tuananh tuananh       0 Mar 29 13:10 mydoc_concepts.txt
-rw-rw-r--.  1 tuananh tuananh       0 Mar 29 13:10 mydoc_links.txt
-rw-rw-r--.  1 tuananh tuananh      20 Mar 29 13:10 mydoc_sents.txt
-rw-rw-r--.  1 tuananh tuananh       0 Mar 29 13:10 mydoc_tags.txt
-rw-rw-r--.  1 tuananh tuananh      58 Mar 29 13:10 mydoc_tokens.txt
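
The <start:end> pairs printed for sent.tokens in the session above appear to be character offsets into the sentence text. As a minimal sketch of how such offsets can be derived from a token list, the following standalone function scans the text from left to right. The name locate_tokens is hypothetical and this is not speach's own implementation; it only reproduces the offsets shown in the example.

```python
def locate_tokens(text, tokens):
    """Return (token, start, end) triples by scanning text left to right."""
    spans = []
    cursor = 0  # position in text after the previously matched token
    for token in tokens:
        start = text.find(token, cursor)
        if start == -1:
            raise ValueError(f"Token {token!r} not found after position {cursor}")
        end = start + len(token)
        spans.append((token, start, end))
        cursor = end
    return spans


spans = locate_tokens("I am a sentence.", ["I", "am", "a", "sentence", "."])
print(spans)
# [('I', 0, 1), ('am', 2, 4), ('a', 5, 6), ('sentence', 7, 15), ('.', 15, 16)]
```

Searching from the end of the previous token, rather than from the start of the text, keeps repeated tokens from all being mapped to their first occurrence.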

Project details

Release history

0.1a15.post1 (pre-release): Mar 17, 2022
0.1a15 (pre-release): Mar 15, 2022
0.1a14 (pre-release): Mar 14, 2022
0.1a13 (pre-release): Jan 14, 2022
0.1a12 (pre-release): Nov 3, 2021
0.1a11 (pre-release): Aug 26, 2021
0.1a10 (pre-release): Jul 27, 2021
0.1a9.post2 (pre-release): Jun 7, 2021
0.1a9 (pre-release): May 27, 2021
0.1a8 (pre-release): May 27, 2021
0.1a7 (pre-release): May 14, 2021
0.1a6 (pre-release): May 1, 2021
0.1a5 (pre-release): Apr 29, 2021
0.1a4 (pre-release): Apr 28, 2021
0.1a3 (pre-release, this version): Apr 28, 2021
0.1a2 (pre-release): Apr 28, 2021
0.1a1 (pre-release): Apr 28, 2021

Download files

Source Distribution

speach-0.1a3.tar.gz (28.7 kB)

Uploaded Apr 28, 2021 Source

Hashes for speach-0.1a3.tar.gz

Algorithm	Hash digest
SHA256	`158456eccd57a8fa1313b0f8dcbff8d96a9eab238e04d8dd1a1d1f35b855750b`
MD5	`446a8286725815497cc8a5d249c8a596`
BLAKE2b-256	`939434ec2155d3495bccc5cca8887e04c7e4e892d6d43fa85a5fa0601a294bd9`