Pipeline for the ACQDIV database
ACQDIV
This repository contains the code and configuration files for transforming child language acquisition corpora into the ACQDIV database.
Resources
Download the ACQDIV database (only open-source corpora):
For the complete database, please refer to ...
Supported Corpora
We provide parsers (see acqdiv.parsers.corpora.main) for the following corpora:
- Chintang Language Corpus (Chintang)
- Cree Child Language Acquisition Study (Cree)
- Manchester Corpus (English)
- MPI-EVA Jakarta Child Language Database (Indonesian)
- Allen Inuktitut Child Language Corpus (Inuktitut)
- Japanese MiiPro (Japanese)
- Japanese Miyata (Japanese)
- Sarvasy Nungon Corpus (Nungon)
- Qaqet
- Ku Waru
- Stoll Russian Corpus
- Demuth Corpus (Sesotho)
- Tuatschin
- Koç University Longitudinal Language Development Database (Turkish)
- Pfeiler Yucatec Child Language Corpus (Yucatec)
Running the Pipeline
To run the pipeline yourself:
Download the corpora:
For the CHAT corpora, proceed as follows:
- Download the transcripts from the CHILDES TalkBank website, where available (see the Download transcripts link).
- Unzip the data.
- Copy the Python script src/acqdiv/util/cha_extractor.py into the unzipped folder.
- Run the script: python cha_extractor.py. A directory cha/ will be created.
- Place the cha/ directory in src/acqdiv/corpora/<corpus_name>/ (see also the corresponding ini file in src/acqdiv/ini/<corpus_name> for which corpus name to use).
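The last step can also be scripted. The following is a minimal sketch, not part of the acqdiv package: the helper place_cha_dir is hypothetical, and it assumes cha_extractor.py was run inside the unzipped folder and left its output in a cha/ subdirectory there. It moves (rather than copies) that directory to src/acqdiv/corpora/<corpus_name>/cha/ and refuses to overwrite an existing one.

```python
import shutil
from pathlib import Path


def place_cha_dir(extracted_root: Path, repo_root: Path, corpus_name: str) -> Path:
    """Move the cha/ directory produced by cha_extractor.py into the corpus folder.

    Hypothetical helper (not part of acqdiv). Assumes cha_extractor.py was run
    inside extracted_root and created extracted_root / "cha". corpus_name must
    match the name used in the corresponding ini file in src/acqdiv/ini/.
    """
    source = extracted_root / "cha"
    if not source.is_dir():
        raise FileNotFoundError(
            f"no cha/ directory in {extracted_root}; run cha_extractor.py first"
        )

    # Target layout from the steps above: src/acqdiv/corpora/<corpus_name>/cha/
    corpus_dir = repo_root / "src" / "acqdiv" / "corpora" / corpus_name
    corpus_dir.mkdir(parents=True, exist_ok=True)
    target = corpus_dir / "cha"
    if target.exists():
        # Do not silently merge with or overwrite a previously placed corpus
        raise FileExistsError(f"{target} already exists; remove it first")

    shutil.move(str(source), str(target))
    return target
```

The function would be called once per corpus, passing the unzipped folder, the repository root, and the corpus name taken from the matching ini file.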
For the Toolbox corpora, proceed as follows:
- Download the Toolbox and IMDI files.
- Place the Toolbox files in src/acqdiv/corpora/Tuatschin/toolbox/ and the IMDI files in src/acqdiv/corpora/Tuatschin/imdi/.
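Before running the pipeline, a quick check can confirm that the Toolbox and IMDI files ended up where the steps above expect them. This is a hypothetical pre-flight helper (missing_toolbox_dirs is not part of acqdiv); it only checks that the two folders exist and are non-empty, not that the files inside are valid.

```python
from pathlib import Path


def missing_toolbox_dirs(repo_root: Path, corpus_name: str = "Tuatschin") -> list[str]:
    """Return the expected input folders that are missing or empty.

    Hypothetical check, not part of acqdiv: verifies only that
    src/acqdiv/corpora/<corpus_name>/toolbox/ and .../imdi/ exist
    and contain at least one entry.
    """
    corpus_dir = repo_root / "src" / "acqdiv" / "corpora" / corpus_name
    missing = []
    for sub in ("toolbox", "imdi"):
        folder = corpus_dir / sub
        # any() on iterdir() is False for an empty directory
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(folder.relative_to(repo_root).as_posix())
    return missing
```

An empty list means both folders are in place; otherwise the returned paths (relative to the repository root) show what still needs to be copied.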
Create the database:
First, install the acqdiv package following the instructions in INSTALL.txt.
Run the pipeline:
acqdiv load -f
Run the unit tests:
$ pytest tests/unittests
Run the integrity tests on the database:
$ pytest tests/systemtests
For more options:
acqdiv load -h
The database will be created in the directory acqdiv/database/.
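Since the integrity tests run on the database that acqdiv load creates, the commands above are meant to run in that order. A minimal sketch of chaining them, assuming it is started from the repository root with the acqdiv package installed; run_steps is a hypothetical wrapper, not an acqdiv command, and it stops at the first command that exits with an error.

```python
import subprocess

# Commands from the steps above, in order: build the database,
# then run the unit tests and the integrity tests on the database.
STEPS = [
    ["acqdiv", "load", "-f"],
    ["pytest", "tests/unittests"],
    ["pytest", "tests/systemtests"],
]


def run_steps(steps: list[list[str]] = STEPS, dry_run: bool = False) -> list[list[str]]:
    """Run each command in sequence, stopping at the first failure.

    Hypothetical wrapper, not part of acqdiv. Assumes the current directory
    is the repository root so the tests/ paths resolve. With dry_run=True
    the commands are only printed, not executed.
    """
    executed = []
    for cmd in steps:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status
            subprocess.run(cmd, check=True)
        executed.append(cmd)
    return executed
```

Calling run_steps(dry_run=True) first shows exactly which commands would run without touching the database.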