A CLI for performing actions on the Open Speech Corpus
Open Speech Corpus CLI
This repository contains the code required to download audio data from openspeechcorpus.com
So far, Open Speech Corpus is composed of three subcorpora:
- Tales: A crowdsourced corpus based on readings of Latin American short tales
- Aphasia: A crowdsourced corpus based on words categorized into 4 levels of difficulty
- Isolated words: A crowdsourced corpus based on isolated words
To download files from the Tales Project, use:
ops \
--output_folder tales/ \
--output_file tales.txt \
--corpus tales
To download files from the Isolated Words Project, use:
ops \
--output_folder isolated_words/ \
--output_file isolated_words.txt \
--corpus words
To download files from the Aphasia Project, use:
ops \
--output_folder aphasia/ \
--output_file aphasia.txt \
--corpus aphasia
By default the page size is 500. To change it, use the --from and --to arguments, for example:
ops \
--from 500 \
--to 1000 \
--output_folder aphasia/ \
--output_file aphasia.txt \
--corpus aphasia
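The paging arguments above can be scripted when a larger range has to be fetched in several calls. The following is a minimal Python sketch, not part of the package: the helper names page_windows and build_command are hypothetical, and it assumes that --to is an exclusive upper bound, which the text above does not state. Because it is also not documented whether repeated calls append to or overwrite the output file, the sketch writes one transcript file per window. It only prints the commands; nothing is executed.

```python
import shlex

PAGE_SIZE = 500  # default page size stated in this README


def page_windows(start, stop, page_size=PAGE_SIZE):
    """Yield (lower, upper) pairs covering start..stop in page-sized steps.

    Assumption: the upper bound is exclusive; adjust if --to is inclusive.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for lower in range(start, stop, page_size):
        yield lower, min(lower + page_size, stop)


def build_command(corpus, lower, upper, output_folder, output_file):
    """Build the argument list for one ops call, using only documented flags."""
    return [
        "ops",
        "--from", str(lower),
        "--to", str(upper),
        "--output_folder", output_folder,
        "--output_file", output_file,
        "--corpus", corpus,
    ]


# One command per window; a separate transcript file per window (assumption).
commands = [
    build_command("aphasia", lower, upper, "aphasia/", f"aphasia_{lower}_{upper}.txt")
    for lower, upper in page_windows(0, 1500)
]

for command in commands:
    # Print only. To execute, pass each list to subprocess.run(command, check=True).
    print(shlex.join(command))
```

Building each command as a list rather than a single string avoids shell quoting problems if a folder name contains spaces.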
You can download the whole corpus using the --download_all flag:
ops \
--output_folder aphasia/ \
--output_file aphasia.txt \
--corpus aphasia \
--download_all
If you use the --download_all flag together with --from, the download starts at the specified --from value and uses a page size of 500.
Recursive Convert
The Open Speech Corpus stores its files in mp4 format, which is unsuitable for most audio processing tasks. To convert
the files to wav format, first install ffmpeg, then run the recursive_convert utility, which takes the source folder
containing the mp4 files as its first argument and the output folder as its second, for example:
recursive_convert aphasia aphasia_wav
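For readers who want to see what such a conversion involves, here is a rough Python sketch of a recursive mp4 to wav conversion driven by ffmpeg. It is an illustration under assumptions, not the actual implementation of recursive_convert: the function names are hypothetical, the output tree is assumed to mirror the source tree, and ffmpeg is assumed to be available on the PATH.

```python
import subprocess
from pathlib import Path


def plan_conversions(source_folder, output_folder):
    """Pair every .mp4 under source_folder with a mirrored .wav path."""
    source = Path(source_folder)
    output = Path(output_folder)
    pairs = []
    for mp4_path in sorted(source.rglob("*.mp4")):
        relative = mp4_path.relative_to(source)
        pairs.append((mp4_path, output / relative.with_suffix(".wav")))
    return pairs


def ffmpeg_command(mp4_path, wav_path):
    """Build an ffmpeg call: -y overwrites, -i names the input, -vn drops video."""
    return ["ffmpeg", "-y", "-i", str(mp4_path), "-vn", str(wav_path)]


def convert_tree(source_folder, output_folder):
    """Convert every planned pair, creating output directories as needed."""
    for mp4_path, wav_path in plan_conversions(source_folder, output_folder):
        wav_path.parent.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(ffmpeg_command(mp4_path, wav_path), check=True)
```

Calling convert_tree("aphasia", "aphasia_wav") would then correspond to the command shown above. Separating the planning step from the ffmpeg call makes it possible to inspect the file pairs before converting anything.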