Client library for communicating with LaBB-CAT servers
nzilbb-labbcat
Client library for communicating with LaBB-CAT servers using Python.
For example:
import labbcat
# Connect to the LaBB-CAT corpus
corpus = labbcat.LabbcatView("https://labbcat.canterbury.ac.nz/demo", "demo", "demo")
# Find all tokens of a word
matches = corpus.getMatches({"orthography":"quake"})
# Get the recordings of the matching utterances
audio = corpus.getSoundFragments(matches)
# Get Praat TextGrids for the utterances
textgrids = corpus.getFragments(
matches, ["utterance", "word","segment"],
"text/praat-textgrid")
LaBB-CAT is a web-based linguistic annotation store that stores audio or video recordings, text transcripts, and other annotations.
Annotations of various types can be automatically generated or manually added.
LaBB-CAT servers are usually password-protected linguistic corpora, and can be accessed manually via a web browser, or programmatically using a client library like this one.
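Because most servers require a login, it can help to keep credentials out of scripts. The following is a minimal sketch, not part of the library: the function `connection_settings` and the environment variable names `LABBCAT_URL`, `LABBCAT_USER`, and `LABBCAT_PASSWORD` are hypothetical names chosen for illustration.

```python
import os


def connection_settings(env=None):
    """Return (url, username, password) read from environment variables.

    The variable names are illustrative assumptions, not defined by
    nzilbb-labbcat.
    """
    env = os.environ if env is None else env
    names = ("LABBCAT_URL", "LABBCAT_USER", "LABBCAT_PASSWORD")
    # Report every missing or empty variable at once
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)


# Usage (needs the variables to be set and the labbcat module installed):
# import labbcat
# corpus = labbcat.LabbcatView(*connection_settings())
```

The returned tuple is in the same order as the URL, username, and password arguments used when connecting in the example above.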
The current version of this library requires LaBB-CAT version 20220307.1126.
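Version identifiers such as 20220307.1126 look like a date followed by a time, so they can be compared numerically. The sketch below rests on two assumptions: that every version string has this `YYYYMMDD.HHMM` shape, and that any version at or after the stated one is acceptable. The helper names `parse_version` and `meets_minimum` are hypothetical, and obtaining the server's version string is left out.

```python
def parse_version(version):
    """Split a 'YYYYMMDD.HHMM' style version string into a tuple of ints."""
    date_part, _, time_part = version.partition(".")
    return int(date_part), int(time_part or 0)


def meets_minimum(server_version, minimum="20220307.1126"):
    """Return True if server_version is at or after the minimum version."""
    # Tuples compare element by element: date first, then time
    return parse_version(server_version) >= parse_version(minimum)
```

Comparing tuples of integers avoids the rounding surprises that could come from treating the identifier as a floating-point number.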
Documentation
Detailed documentation is available here
Basic usage
nzilbb-labbcat is available from the Python Package Index (PyPI).
To install the module:
pip install nzilbb-labbcat
The following example shows how to:
- upload a transcript to LaBB-CAT,
- wait for the automatic annotation tasks to finish,
- extract the annotation labels, and
- delete the transcript from LaBB-CAT.
import labbcat
# Connect to the LaBB-CAT corpus
corpus = labbcat.LabbcatEdit("http://localhost:8080/labbcat", "labbcat", "labbcat")
# List the corpora on the server
corpora = corpus.getCorpusIds()
# List the transcript types
transcript_type_layer = corpus.getLayer("transcript_type")
transcript_types = transcript_type_layer["validLabels"]
# Upload a transcript
corpus_id = corpora[0]
transcript_type = next(iter(transcript_types))
taskId = corpus.newTranscript(
"test/labbcat-py.test.txt", None, None, transcript_type, corpus_id, "test")
# wait for the annotation generation to finish
corpus.waitForTask(taskId)
corpus.releaseTask(taskId)
# get the "POS" layer annotations
annotations = corpus.getAnnotations("labbcat-py.test.txt", "pos")
labels = [annotation["label"] for annotation in annotations]
# delete the transcript from the corpus
corpus.deleteTranscript("labbcat-py.test.txt")
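If the annotation task or the label extraction fails part-way, the uploaded transcript would be left on the server. One way to guard against that is to wrap the same calls in `try`/`finally`. This is a sketch only: the function `extract_labels` and its parameter names are hypothetical, and it uses just the methods that appear in the example above.

```python
def extract_labels(corpus, path, transcript_id, transcript_type, corpus_id,
                   layer_id="pos", last_argument="test"):
    """Upload a transcript, wait for annotation, and return one layer's labels.

    `corpus` is any object offering the methods used in the example above
    (for instance a labbcat.LabbcatEdit instance). `last_argument` mirrors
    the "test" value passed to newTranscript in that example.
    """
    task_id = corpus.newTranscript(
        path, None, None, transcript_type, corpus_id, last_argument)
    try:
        corpus.waitForTask(task_id)
        corpus.releaseTask(task_id)
        annotations = corpus.getAnnotations(transcript_id, layer_id)
        return [annotation["label"] for annotation in annotations]
    finally:
        # Runs whether or not the steps above succeeded
        corpus.deleteTranscript(transcript_id)
```

With the values from the example, a call might look like `extract_labels(corpus, "test/labbcat-py.test.txt", "labbcat-py.test.txt", transcript_type, corpus_id)`.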
For batch uploading and other example code, see the examples subdirectory.
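As a rough idea of what a batch upload can look like, the sketch below loops over the files in a directory and repeats the upload steps from the example above for each one. It is not the code from the examples subdirectory: the function `upload_directory`, its parameters, and the choice to record failures rather than stop are all assumptions made for illustration.

```python
from pathlib import Path


def upload_directory(corpus, directory, transcript_type, corpus_id,
                     last_argument="test", pattern="*.txt"):
    """Upload every file in `directory` matching `pattern`, one at a time.

    `corpus` is any object offering newTranscript, waitForTask, and
    releaseTask, as used in the example above. Returns a dict mapping each
    file name to "ok" or to the error message from its upload.
    """
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        try:
            task_id = corpus.newTranscript(
                str(path), None, None, transcript_type, corpus_id,
                last_argument)
            corpus.waitForTask(task_id)
            corpus.releaseTask(task_id)
            results[path.name] = "ok"
        except Exception as error:
            # Record the failure and carry on with the remaining files
            results[path.name] = str(error)
    return results
```

Uploading one file at a time keeps the sketch simple; a failed upload is recorded and does not stop the rest of the batch.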
Developers
To build, test, release, and document the module, the following prerequisites are required:
pip3 install twine
pip3 install pathlib
apt install python3-sphinx
Unit tests
python3 -m unittest
...or for specific tests:
python3 -m unittest test.TestLabbcatAdmin
Documentation generation
cd docs
make clean
make
Publishing
rm dist/*
python3 setup.py sdist bdist_wheel
twine check dist/*
twine upload dist/*
Hashes for nzilbb_labbcat-0.7.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 282f078a174c9d4cd6a163d00196e545da9fe1228ce895b67cacd3a15f978623
MD5 | 040d170580dd4afeda05100b36b5333a
BLAKE2b-256 | 2375dcb203b41a6d224008db94d3d584ce9392888c8b1f015aed5f06cd2bf792