Thunder speech

A hackable speech recognition library.

What to expect from this project:

  • End-to-end speech recognition models
  • Simple fine-tuning to new languages
  • Inference support as a first-class feature
  • Developer-oriented API

What it's not:

  • A general-purpose speech toolkit
  • A collection of complex systems that require thousands of GPU-hours and expert knowledge and focus only on state-of-the-art results

Quick usage guide

Install

Install the library from PyPI:

pip install thunder-speech

Optionally, to train wav2vec 2.0 models, install the transformers extra (the quotes keep shells such as zsh from expanding the brackets):

pip install "thunder-speech[transformers]"

Import desired models

from thunder.quartznet.module import QuartznetModule, QuartznetCheckpoint

# Tab completion works to discover other QuartznetCheckpoint.*
model = QuartznetModule.load_from_nemo(QuartznetCheckpoint.QuartzNet5x5LS_En)

Load audio and predict

import torchaudio
audio, sr = torchaudio.load("my_sample_file.wav")

transcriptions = model.predict(audio)
# transcriptions is a list of strings with the captions.

More quick tips

The documentation explains how to export the models using TorchScript, access the raw probabilities and decode them manually, or fine-tune the models.
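As a rough illustration of what decoding raw probabilities manually can involve, here is a minimal sketch of greedy CTC decoding in plain Python. It assumes the model emits one probability vector per audio frame over a vocabulary that includes a blank token. The function name, the toy vocabulary, and the blank index are hypothetical and not taken from the library.

```python
from typing import Sequence


def greedy_ctc_decode(
    frame_probs: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_idx: int = 0,  # assumption: the blank token sits at index 0
) -> str:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    # Pick the most likely token index for each time frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_probs]

    chars = []
    prev = None
    for idx in best:
        # Emit a token only when it differs from the previous frame
        # and is not the blank token.
        if idx != prev and idx != blank_idx:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars)


# Toy example with a hypothetical three-token vocabulary.
vocab = ["<blank>", "h", "i"]
probs = [
    [0.10, 0.80, 0.10],  # "h"
    [0.20, 0.70, 0.10],  # "h" again, collapsed into the previous one
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.80],  # "i"
]
print(greedy_ctc_decode(probs, vocab))  # prints "hi"
```

A blank between two identical tokens keeps both of them, which is how CTC represents doubled letters.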

Contributing

The first step to contribute is to do an editable installation of the library:

git clone https://github.com/scart97/thunder-speech.git
cd thunder-speech
pip install -e ".[dev,testing]"
pre-commit install

Then, make sure that everything is working by running the test suite, which is based on pytest:

RUN_SLOW=1 pytest

Here the RUN_SLOW flag runs all the tests, including those marked as slow because they might download checkpoints or do small training runs. If you don't have a CUDA-capable GPU, some tests will be skipped unconditionally.
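For readers unfamiliar with this pattern, an environment flag like RUN_SLOW can gate slow tests roughly as sketched below. This is not the project's actual test configuration: the helper name is hypothetical, and treating only the value "1" as enabling the slow tests is an assumption.

```python
import os
from typing import Mapping, Optional


def slow_tests_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the RUN_SLOW flag is set to "1".

    Hypothetical helper; `environ` defaults to the process environment
    and can be overridden to make the check easy to test.
    """
    env = os.environ if environ is None else environ
    return env.get("RUN_SLOW", "0") == "1"


print(slow_tests_enabled({"RUN_SLOW": "1"}))  # prints True
print(slow_tests_enabled({}))                 # prints False
```

In a pytest suite, a check like this could feed `pytest.mark.skipif(not slow_tests_enabled(), reason="set RUN_SLOW=1 to run")` on the slow tests.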

Influences

This library is heavily influenced by the best practices of the PyTorch ecosystem. The original model code, including checkpoints, is based on the NeMo ASR toolkit, which also inspired the fine-tuning and prediction APIs.

The data loading and processing is loosely based on my experience using fast.ai. It tries to decouple transforms that happen at the item level from the ones that are efficiently implemented for the whole batch on the GPU. It also adopts the idea that default parameters should be great.
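The split between item-level and batch-level transforms can be sketched as follows. This is only an illustration in plain Python, with lists standing in for tensors. All three function names and the specific transforms (peak normalization, zero padding, a fixed gain) are hypothetical and do not come from the library.

```python
from typing import List


def item_transform(samples: List[float]) -> List[float]:
    """Item level: peak-normalize a single variable-length clip."""
    peak = max((abs(s) for s in samples), default=0.0)
    return [s / peak for s in samples] if peak > 0 else list(samples)


def collate(items: List[List[float]], pad_value: float = 0.0) -> List[List[float]]:
    """Pad every clip to the longest one so the batch is rectangular."""
    max_len = max(len(x) for x in items)
    return [x + [pad_value] * (max_len - len(x)) for x in items]


def batch_transform(batch: List[List[float]], gain: float = 0.5) -> List[List[float]]:
    """Batch level: one uniform operation over the whole padded batch.

    With real tensors this would be a single vectorized call on the GPU.
    """
    return [[s * gain for s in row] for row in batch]


clips = [[0.5, -1.0], [0.25, 0.5, -0.5]]
batch = batch_transform(collate([item_transform(c) for c in clips]))
print(batch)  # prints [[0.25, -0.5, 0.0], [0.25, 0.5, -0.5]]
```

Item-level steps run on each clip before collation, where lengths still differ. Batch-level steps run once on the padded batch, which is where vectorized GPU code pays off.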

The overall organization of code and decoupling follows the pytorch-lightning ideals, with self-contained modules that try to reduce the boilerplate necessary.

Finally, the transformers library inspired the simple model implementations, which are clearly separated into folders containing the specific code that you need to understand each architecture and its preprocessing. Its strong test suite was also an inspiration.

Note

This project has been set up using PyScaffold 3.3. For details and usage information on PyScaffold see https://pyscaffold.org/.
