LAAC@LSCP

ChildProject

Introduction

Day-long (audio) recordings of children are increasingly common, but there is no established scientific standard for organizing and analyzing such data. ChildProject provides standardized specifications and tools for storing and managing day-long recordings of children, together with their metadata and annotations.

File organization structure

We assume that the data are of three distinct types:

  1. Audio, for which we distinguish the raw audio extracted from the hardware from a version that has been converted into a standardized format. Both are long-form recordings. For the time being, we do not foresee including clips extracted from these long-form recordings; we assume that any such process will generate some form of annotation that can then be mapped back in time onto the long-form audio.
  2. Annotations, for which we again distinguish raw and standardized versions. At present, we can import from Praat's TextGrid, ELAN's eaf, and VTC's rttm formats.
  3. Metadata corresponding to the children, recordings, and annotations, which will therefore also describe the converted recordings.
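
The three data types above suggest a simple sanity check on a dataset folder. The following is a minimal sketch, not the package's own validation: the directory and file names (`metadata`, `recordings/raw`, `annotations`, `metadata/children.csv`, `metadata/recordings.csv`) and the helper `missing_entries` are assumptions made for illustration, so refer to the documentation for the authoritative layout.

```python
from pathlib import Path

# Assumed layout, for illustration only; the package documentation
# describes the authoritative structure.
EXPECTED_DIRS = ("metadata", "recordings/raw", "annotations")
EXPECTED_FILES = ("metadata/children.csv", "metadata/recordings.csv")


def missing_entries(dataset_root):
    """Return the expected directories and files absent from dataset_root."""
    root = Path(dataset_root)
    # Directories are checked first, then files, in the order declared above.
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

An empty return value means that every assumed entry is present; it says nothing about whether the contents of the files are valid.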

Available tools

Day-long audio recordings are often collected with a LENA recorder and analyzed with the LENA software. However, open-source alternatives to the commercial LENA environment are emerging, some of which are shown in the following figure.

[Figure: Overview of some tools in the day-long recordings environment]

For instance, alternative hardware includes the babylogger and any other lightweight recording device with enough battery and storage to record for several hours.

Alternative automated analysis options include the Voice Type Classifier, which segments the audio into different talker types (key child, female adult, etc.), and ALICE, an automated linguistic unit counter.

As for manual annotation options, ELAN can be used, for instance with the ACLEW DAS annotation scheme. Seshat can be used to assign annotations to individuals and to evaluate them. Finally, Zooniverse can be used to crowd-source certain aspects of the classification with the help of citizen scientists.

In this context, we provide tools and a procedure to:

  • Validate datasets (making sure that metadata, recordings and annotations are in the right place and format)
  • Convert your raw recordings into the desired format
  • Import annotations (from LENA, ELAN, Praat, CHAT files, and VTC/ALICE/VCM rttm files) into a standardized format
  • Generate reliability metrics by comparing annotators (confusion matrices, agreement coefficients, pyannote metrics)
  • Extract metrics from the annotations (e.g. average vocalization rates and durations)
  • Sample segments of the recordings to annotate, using one of several sampling algorithms
  • Add clips to an annotation pipeline in Zooniverse, and retrieve the ensuing annotations
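
To illustrate what comparing annotators can involve, here is a minimal standard-library sketch that discretizes two annotators' segments into fixed time bins, then computes a confusion matrix and Cohen's kappa. It is not the package's implementation: the function names, the `(onset_ms, offset_ms, label)` tuple format, the 100 ms bin size and the example labels are all assumptions made for illustration.

```python
from collections import Counter


def labels_per_bin(segments, duration_ms, bin_ms=100, default="none"):
    """Discretize (onset_ms, offset_ms, label) segments into fixed-size bins.

    A bin takes the label of the last listed segment that covers its start
    time, or `default` when no segment covers it.
    """
    n_bins = -(-duration_ms // bin_ms)  # ceiling division
    labels = [default] * n_bins
    for onset, offset, label in segments:
        first = -(-onset // bin_ms)  # first bin starting at or after onset
        last = -(-offset // bin_ms)  # one past the last bin starting before offset
        for i in range(first, min(last, n_bins)):
            labels[i] = label
    return labels


def confusion_matrix(labels_a, labels_b):
    """Count how often each label of annotator A co-occurs with each label of B."""
    return Counter(zip(labels_a, labels_b))


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)
```

Binning both annotators on the same time grid is one simple way to make segment-based annotations comparable; a smaller bin size gives a finer comparison at the cost of more bins to process.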

These tools can be used either from the command line or from within your Python code, by importing our modules.
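
As a hedged sketch of the Python route, the wrapper below validates a dataset from code. It assumes that the package is installed, that the `ChildProject` class can be imported from `ChildProject.projects`, and that its `validate()` method returns a tuple of errors and warnings; the wrapper `validate_dataset` itself is hypothetical. Check the documentation for the exact interface of your installed version.

```python
def validate_dataset(path):
    """Validate the dataset stored at `path` and return (errors, warnings)."""
    try:
        # Assumed import path; requires the ChildProject package to be installed.
        from ChildProject.projects import ChildProject
    except ImportError as exc:
        raise RuntimeError("the ChildProject package is not installed") from exc

    project = ChildProject(path)
    # Assumption: validate() returns a tuple (errors, warnings).
    errors, warnings = project.validate()
    return errors, warnings
```

A caller would typically treat a non-empty list of errors as a failed validation and report the warnings separately.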

Installation

You can find instructions to install and use our package in our documentation.

Citation

If you are using this project for your research, please cite our introductory paper:

@article{gautheron_rochat_cristia_2021,
    title={Managing, storing, and sharing long-form recordings and their annotations},
    url={https://link.springer.com/article/10.1007/s10579-022-09579-3},
    DOI={10.1007/s10579-022-09579-3},
    publisher={Springer},
    journal={Language Resources and Evaluation},
    author={Gautheron, Lucas and Rochat, Nicolas and Cristia, Alejandrina},
    year={2022},
    month={Feb}
}
