ValidDataSet - TTS Lj Speech Dataset Validator

ValidDataSet

About

ValidDataSet (VDS) validates datasets built in the LJ Speech Dataset format (for Tacotron, Flowtron, WaveGlow, or RadTTS).

VDS is built around plugins; support for plugins added dynamically by the user is planned for the future.

Descriptions of current plugins can be found in the Plugins section.

Plugins

Below is a list of currently used plugins (new ones will be added over time).

ID    Name                                 Version  Description
F001  WavsTranscriptionChecker             23.3.9   Check if all files have been added to the transcription files
F002  WavPropertiesChecker                 23.3.9   Check if all files are mono, 22050 Hz with length between 2 and 10 seconds
T001  DatasetStructureChecker              23.3.9   Check if the "wavs" folder and transcription files exist in the dataset
T002  EmptyLineChecker                     23.3.9   Check if there are empty lines in the transcriptions
T003  FilesInTranscriptionChecker          23.3.9   Check if all files added to transcription exist
T004  ExistingWavFileTranscriptionChecker  23.3.9   Check if all files added to transcription have a transcription
T005  PunctuationMarksChecker              23.3.9   Check if all transcriptions end with punctuation marks: ".", "?" or "!"
T006  PunctuationMarksChecker              23.3.9   Check if all lines have the same number of PIPE characters
T007  DuplicatedTranscriptionChecker       23.3.9   Check if there are any duplicate paths to WAV files in the transcriptions
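
To make the transcription checks more concrete, here is a minimal Python sketch of the kind of per-line validation that T002, T005, T006 and T007 describe. It is not the VDS implementation: the function name check_transcription_lines and the "path|text" line layout are assumptions made for illustration.

```python
END_MARKS = (".", "?", "!")


def check_transcription_lines(lines, number_of_pipes=1):
    """Return (line_number, message) pairs for problems found in `lines`.

    Hypothetical helper: assumes each line looks like "path|text".
    """
    problems = []
    seen_paths = set()
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            # Similar in spirit to T002: empty lines are reported.
            problems.append((number, "empty line"))
            continue
        if line.count("|") != number_of_pipes:
            # Similar in spirit to T006: every line has the same pipe count.
            problems.append((number, "unexpected number of pipes"))
        path, _, text = line.partition("|")
        if not text.strip().endswith(END_MARKS):
            # Similar in spirit to T005: text ends with ".", "?" or "!".
            problems.append((number, "missing end punctuation"))
        if path in seen_paths:
            # Similar in spirit to T007: a WAV path is listed only once.
            problems.append((number, "duplicate wav path"))
        seen_paths.add(path)
    return problems
```

Each problem is reported with its 1-based line number, so the offending entry can be located in the transcription file directly.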

Installation

To install ValidDataSet, use the following command:

pip install vds

Usage

Command in Linux: vds or vds-win

Command in Windows: vds-win

List of parameters supported by VDS:

 -v, --verbose                    Print additional information
 -o, --output                     Save output to file

     --plugins.list               List plugins
     --plugins.disable            List of plugins to disable like: F001,T002,T006

     --args.path                  Path to dataset
     --args.files                 Set transcription file names like: train.txt,val.txt
     --args.dir-name              wavs folder name (default: wavs)
     --args.sample-rate           Set sample rate (default: 22050)
     --args.number-of-channels    Set number of channels (default: 1 [mono])
     --args.min-duration          Set minimum duration in milliseconds (1000 ms = 1 second)
     --args.max-duration          Set maximum duration in milliseconds (1000 ms = 1 second)
     --args.number-of-pipes       Set number of pipes (|) (default: 1)
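
As an illustration of what the sample-rate, channel and duration parameters control, the sketch below checks a single WAV file with Python's standard wave module. It is not taken from VDS: the function name check_wav_properties is hypothetical, and the 2000 ms and 10000 ms duration defaults are assumed from the F002 description rather than from documented option defaults.

```python
import wave


def check_wav_properties(path, sample_rate=22050, number_of_channels=1,
                         min_duration_ms=2000, max_duration_ms=10000):
    """Return a list of problem messages for one WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        # Duration in milliseconds, matching the unit used by the options.
        duration_ms = wav.getnframes() * 1000 / rate

    problems = []
    if rate != sample_rate:
        problems.append("unexpected sample rate")
    if channels != number_of_channels:
        problems.append("unexpected number of channels")
    if duration_ms < min_duration_ms:
        problems.append("shorter than minimum duration")
    if duration_ms > max_duration_ms:
        problems.append("longer than maximum duration")
    return problems
```

An empty list means the file matches all expected properties; the keyword arguments mirror the command-line options only by name.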

Sample commands and their descriptions:

List all plugins:

vds --plugins.list

Run VDS with all plugins without additional information:

vds --args.path /media/username/Disk/Dataset_name/

Run VDS with all plugins with additional information:

vds --args.path /media/username/Disk/Dataset_name/ -v

Run VDS without plugins F001,T002,T006 with additional information:

vds --args.path /media/username/Disk/Dataset_name/ --plugins.disable F001,T002,T006 -v

Run VDS without plugins F001,T002,T006, with custom transcription file names and with additional information:

vds --args.path /media/username/Disk/Dataset_name/ --plugins.disable F001,T002,T006 --args.files train.txt,val.txt -v
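
For context on why the transcription file names matter, the following sketch cross-checks the WAV folder against the listed transcription files, in the spirit of F001 and T003. It is an illustration only, not VDS code: the function name cross_check is hypothetical, and it assumes the first pipe-separated field of each line is a path to a WAV file.

```python
from pathlib import Path


def cross_check(dataset_path, files=("train.txt", "val.txt"), dir_name="wavs"):
    """Return (wavs missing from transcriptions, listed wavs missing on disk).

    Hypothetical helper; the file names are examples, not VDS defaults.
    """
    root = Path(dataset_path)
    on_disk = {p.name for p in (root / dir_name).glob("*.wav")}

    listed = set()
    for name in files:
        text = (root / name).read_text(encoding="utf-8")
        for line in text.splitlines():
            if line.strip():
                # Assumed layout: "wavs/file.wav|transcription text"
                listed.add(Path(line.split("|", 1)[0]).name)

    return sorted(on_disk - listed), sorted(listed - on_disk)
```

The first list corresponds to audio files that no transcription mentions, and the second to transcription entries whose audio file does not exist.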

Run VDS and report files that are longer than 20 seconds, shorter than 2 seconds, or do not have 2 channels (stereo):

vds --args.path /media/username/Disk/Dataset_name/ --args.min-duration 2000 --args.max-duration 20000 --args.number-of-channels 2 -v
