Inspect, modify, and add metadata to DeepSpeech (speech-to-text) datasets in CSV format.
Description
This tool lets you quickly inspect, edit, and add metadata to a DeepSpeech dataset.
Typical flow:
- A server holds training sets stored in the DeepSpeech CSV input format.
- Some stakeholders would like to quickly inspect the data without having to download all of it.
- They would like to extend the set with extra metadata; for example, one might want to look at a subset of samples and tag them as noisy vs. clean [functionality not added yet].
- They would like to save/export the modified version of the original input CSV.
This tool can be installed and run on the server where the data resides. It exposes a web interface that users can connect to.
This is a Python library. The user can load the CSV with Pandas, do whatever filtering or slicing is needed, then call stt_sample_inspector.serve_df(dataframe), which starts the server. When the user is done inspecting/modifying the DataFrame, the function returns the modified DataFrame.
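The load, filter, and serve flow described above might look roughly like the sketch below. The column names wav_filename, wav_filesize, and transcript come from the DeepSpeech CSV format; the sample rows, the size threshold, and the helper names select_small_clips and inspect are hypothetical and only serve as illustration. The stt_sample_inspector import is placed inside inspect so that the filtering part runs even where the package is not installed.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for a DeepSpeech CSV on the server.
SAMPLE_CSV = """wav_filename,wav_filesize,transcript,abs_wav_filename
clips/a.wav,32044,hello world,/data/train/clips/a.wav
clips/b.wav,912044,a much longer utterance,/data/train/clips/b.wav
clips/c.wav,48044,good morning,/data/train/clips/c.wav
"""


def select_small_clips(df, max_bytes):
    """Keep only the rows whose audio file is at most max_bytes in size."""
    return df[df["wav_filesize"] <= max_bytes].reset_index(drop=True)


def inspect(df):
    """Start the web interface for df and return the modified DataFrame."""
    # Imported lazily: this call blocks until the user is done editing.
    import stt_sample_inspector

    return stt_sample_inspector.serve_df(df)


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
subset = select_small_clips(df, max_bytes=100_000)
print(len(subset))  # 2 rows remain: a.wav and c.wav
```

With the package installed, calling inspect(subset) would then hand the filtered rows to the web interface and give back the edited DataFrame once the session ends.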
The module also provides convenience functions to make relative paths in the CSV absolute: stt_sample_inspector.utils.read_csv_and_absolutify (read from a file path) and stt_sample_inspector.utils.create_abs_column (create the column given a DataFrame and the folder to make paths relative to). The abs_wav_filename column created by those functions is required by the tool.
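To illustrate what an abs_wav_filename column contains, here is a pandas-only sketch of the idea. It is not the package's implementation: the function name add_abs_wav_column, its arguments, and the example folder /data/train are assumptions, and the exact signatures of the utilities in stt_sample_inspector.utils are not documented here. The sketch assumes relative paths are resolved against a base folder and that paths which are already absolute are left alone.

```python
import os

import pandas as pd


def add_abs_wav_column(df, base_dir):
    """Return a copy of df with an abs_wav_filename column.

    Hypothetical helper: relative wav_filename entries are joined onto
    base_dir, while entries that are already absolute are kept as they are.
    """
    out = df.copy()
    out["abs_wav_filename"] = [
        path if os.path.isabs(path) else os.path.normpath(os.path.join(base_dir, path))
        for path in out["wav_filename"]
    ]
    return out


# Made-up rows for demonstration only.
df = pd.DataFrame(
    {
        "wav_filename": ["clips/a.wav", "clips/b.wav"],
        "transcript": ["hello world", "good morning"],
    }
)
result = add_abs_wav_column(df, "/data/train")
print(result["abs_wav_filename"].tolist())
```

Returning a copy keeps the original DataFrame unchanged, which can be convenient when the same CSV is sliced several times before serving.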
In addition, the package provides a CLI tool that takes two CSV files as parameters: one as input and one as output. The modified DataFrame is written to the output file once the user is done editing:
stt_sample_inspector input.csv output.csv
File details
Details for the file stt_sample_inspector-0.0.1.tar.gz.
File metadata
- Download URL: stt_sample_inspector-0.0.1.tar.gz
- Upload date:
- Size: 838.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | b1ae03e26ce1df48d97effe41534bdb06f9edd141e5dc4ef149d8710067c35ec
MD5 | a3e5312f8f5eeca533eca90b8a1c5971
BLAKE2b-256 | 5c9813b1f6064dcdb367dbf73218b5c5c4aad5348564d40d6d3204b0285eb1c7
File details
Details for the file stt_sample_inspector-0.0.1-py3-none-any.whl.
File metadata
- Download URL: stt_sample_inspector-0.0.1-py3-none-any.whl
- Upload date:
- Size: 830.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 66789cf0ec9875b28e2ce28364e8db3bd57016d3b09602e569a3c8d0699169f5
MD5 | dd7c0b27222e5b8f95d4cf56d488a508
BLAKE2b-256 | 081046cd079cffc56f92d753f78c07300e3c117bec4aef017cb7327ac422af85