Pipeline for the ACQDIV database

ACQDIV

This repository contains the code and configuration files for transforming the child language acquisition corpora into the ACQDIV database.

Resources

Download the ACQDIV database (only open-source corpora):

For the complete database, please refer to ...


Supported Corpora

We provide parsers (see acqdiv.parsers.corpora.main) for the following corpora:


Running the Pipeline

To run the pipeline yourself:

Download the corpora:

For the CHAT corpora, proceed as follows:

  • Download the transcripts from the CHILDES TalkBank website, where available (see the "Download transcripts" link).
  • Unzip the data.
  • Copy the Python script src/acqdiv/util/cha_extractor.py into the unzipped folder.
  • Run the script: python cha_extractor.py. A directory cha/ will be created.
  • Place the cha/ directory in src/acqdiv/corpora/<corpus_name>/ (see the corresponding ini file in src/acqdiv/ini/<corpus_name> for the corpus name to use).
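
The last step above can be scripted. The following is a minimal sketch, not part of the acqdiv package: the helper name place_cha_dir and its parameters are hypothetical, and it assumes cha_extractor.py has already produced a cha/ directory inside the unzipped download.

```python
import shutil
from pathlib import Path


def place_cha_dir(extracted_dir, repo_root, corpus_name):
    """Copy <extracted_dir>/cha to src/acqdiv/corpora/<corpus_name>/cha.

    Hypothetical helper for illustration; corpus_name should match the
    name used by the corresponding ini file.
    """
    source = Path(extracted_dir) / "cha"
    if not source.is_dir():
        raise FileNotFoundError(f"{source} not found; run cha_extractor.py first")

    target = Path(repo_root) / "src" / "acqdiv" / "corpora" / corpus_name / "cha"
    target.parent.mkdir(parents=True, exist_ok=True)
    # dirs_exist_ok (Python 3.8+) allows repeating the copy after a re-download.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

Called as place_cha_dir("downloads/MyCorpus", ".", "MyCorpus") from the repository root, this would copy the transcripts into place; both path arguments here are placeholders.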

For the Toolbox corpora, proceed as follows:

  • Download the Toolbox and IMDI files.
  • Place the Toolbox files in src/acqdiv/corpora/Tuatschin/toolbox/ and the IMDI files in src/acqdiv/corpora/Tuatschin/imdi/.
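
Before running the pipeline, it can help to confirm that both input directories are in place. The sketch below is an illustration only: missing_toolbox_dirs is a hypothetical helper, and treating an empty directory as missing is an assumption rather than a rule taken from the pipeline.

```python
from pathlib import Path


def missing_toolbox_dirs(repo_root, corpus_name="Tuatschin"):
    """Return the expected input directories that are absent or empty.

    Hypothetical helper; the directory names follow the step above.
    """
    corpus_dir = Path(repo_root) / "src" / "acqdiv" / "corpora" / corpus_name
    missing = []
    for sub in ("toolbox", "imdi"):
        directory = corpus_dir / sub
        # Assumption: a directory without any entries counts as missing.
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(directory)
    return missing
```

An empty return value means both toolbox/ and imdi/ exist and contain at least one entry.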

Create the database:

First, install the acqdiv package, following the instructions in INSTALL.txt.

Run the pipeline:
$ acqdiv load -f

Run the unit tests:
$ pytest tests/unittests

Run the integrity tests on the database:
$ pytest tests/systemtests

For more options:
$ acqdiv load -h

The database will be created in the directory acqdiv/database/.
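
The load and test commands above can also be chained from Python. This is a rough sketch under stated assumptions: the acqdiv package is installed, the script is started from the repository root, and the names PIPELINE_STEPS and run_steps are hypothetical. The commands themselves are the ones listed above.

```python
import subprocess

# Commands taken from the steps above, in the order they are listed.
PIPELINE_STEPS = [
    ["acqdiv", "load", "-f"],
    ["pytest", "tests/unittests"],
    ["pytest", "tests/systemtests"],
]


def run_steps(steps=PIPELINE_STEPS, runner=subprocess.run):
    """Run each command in turn and return how many were run.

    With check=True, subprocess.run raises CalledProcessError on a
    non-zero exit status, so later steps are skipped after a failure.
    """
    for step in steps:
        runner(step, check=True)
    return len(steps)
```

Calling run_steps() builds the database and then runs both test suites; the runner parameter exists only so the sequence can be exercised without invoking the real commands.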
