Inspect, modify, and add metadata to DeepSpeech (speech-to-text) datasets in CSV format.

Project description

Description

This tool lets you quickly inspect, edit, and add metadata to a DeepSpeech dataset.

Typical flow:

  • A server holds training sets stored in the DeepSpeech CSV input format.

  • Some stakeholders would like to quickly inspect the data without having to download all of it.

  • They would like to extend the set with extra metadata; for example, one might look at a subset of samples and tag them as noisy vs. clean [functionality not added yet].

  • They would like to save/export the modified version of the original input CSV.

This tool can be installed and run on the server where the data resides. It exposes a web interface that users can connect to.

This is a Python library. The user can load the CSV with pandas, do whatever filtering or slicing is needed, then call stt_sample_inspector.serve_df(dataframe), which starts the server. When the user is done inspecting/modifying the DataFrame, the function returns the modified DataFrame.
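The load, filter, and serve flow described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the package's documented usage: the column names follow the DeepSpeech CSV format (wav_filename, wav_filesize, transcript), the sample rows and the helpers select_small_files and inspect_subset are hypothetical, and the only package call used is stt_sample_inspector.serve_df, which the description names. The sample data already carries an abs_wav_filename column, since the tool requires it.

```python
import io

import pandas as pd

# A tiny in-memory stand-in for a DeepSpeech CSV (made-up rows for illustration).
# abs_wav_filename is assumed to have been added already; the tool requires it.
SAMPLE_CSV = """wav_filename,wav_filesize,transcript,abs_wav_filename
clips/a.wav,32044,hello world,/data/set/clips/a.wav
clips/b.wav,960044,a much longer utterance,/data/set/clips/b.wav
clips/c.wav,48044,good morning,/data/set/clips/c.wav
"""


def select_small_files(df: pd.DataFrame, max_bytes: int) -> pd.DataFrame:
    """Keep rows whose wav_filesize is at most max_bytes (hypothetical helper)."""
    return df[df["wav_filesize"] <= max_bytes].reset_index(drop=True)


def inspect_subset(df: pd.DataFrame) -> pd.DataFrame:
    """Serve df in the web interface and return the edited DataFrame."""
    # Imported lazily so the filtering above works without the package installed.
    import stt_sample_inspector

    # Per the description, this call returns once the user is done editing.
    return stt_sample_inspector.serve_df(df)


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
subset = select_small_files(df, max_bytes=100_000)
print(subset["wav_filename"].tolist())
```

On the server, calling inspect_subset(subset) would then start the web interface for just the filtered rows; filtering first keeps the number of samples shown small.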

The module also provides convenience functions to make relative paths in the CSV absolute: stt_sample_inspector.utils.read_csv_and_absolutify (reads from a file path) and stt_sample_inspector.utils.create_abs_column (creates the column given a DataFrame and the folder the paths are relative to). The abs_wav_filename column created by those functions is required by the tool.
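To illustrate what the abs_wav_filename column contains, here is a pandas-only sketch of the idea. It is not the package's implementation, and it deliberately avoids calling the two utility functions because their exact signatures are not given here; add_abs_wav_filename and the sample paths are hypothetical.

```python
import os

import pandas as pd


def add_abs_wav_filename(df: pd.DataFrame, base_dir: str) -> pd.DataFrame:
    """Return a copy of df with an abs_wav_filename column.

    Hypothetical stand-in, not the package's code: relative wav_filename
    values are resolved against base_dir; already-absolute values pass
    through os.path.join unchanged and are only normalized.
    """
    base = os.path.abspath(base_dir)
    out = df.copy()
    out["abs_wav_filename"] = [
        os.path.normpath(os.path.join(base, str(p))) for p in out["wav_filename"]
    ]
    return out


df = pd.DataFrame(
    {
        "wav_filename": [os.path.join("clips", "a.wav"), os.path.join("clips", "b.wav")],
        "wav_filesize": [32044, 960044],
        "transcript": ["hello world", "a much longer utterance"],
    }
)
with_abs = add_abs_wav_filename(df, base_dir=os.path.join("data", "set"))
print(with_abs["abs_wav_filename"].tolist())
```

Working on a copy leaves the original DataFrame untouched, so the unmodified wav_filename values remain available for export.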

In addition, the package provides a CLI tool that takes two CSV files as parameters, one as input and one as output; the modified DataFrame is written to the output file once the user is done editing:

stt_sample_inspector input.csv output.csv
