Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition.

Project description

Wav2Vec2 STT Python

Beta Software

Requirements:

  • Python 3.7+
  • Platform: Linux x64 (Windows support is a work in progress; macOS may work; PRs welcome)
  • Python package requirements: cffi, numpy
  • wav2vec 2.0 model (must be converted to a compatible format)
    • Several are available ready-to-go on this project's releases page and below.
    • You can convert your own models by following the instructions here.
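
The requirements above can be checked before installing. The following is a minimal sketch, not part of this library: the function name check_requirements and its parameters are hypothetical, and a platform other than Linux x64 is reported as untested rather than unsupported, since Windows and macOS may still work.

```python
import importlib.util
import platform
import sys


def check_requirements(version=None, system=None, machine=None,
                       packages=("cffi", "numpy")):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper: the arguments default to the running interpreter
    and machine, and can be overridden to test the logic.
    """
    version = tuple(sys.version_info[:2]) if version is None else version
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine

    problems = []
    if version < (3, 7):
        problems.append("Python 3.7+ is required, found %d.%d" % version)
    if system != "Linux" or machine.lower() not in ("x86_64", "amd64"):
        problems.append("untested platform: %s %s" % (system, machine))
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append("missing Python package: %s" % name)
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```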

Models:

Model                                          Download Size
Facebook Wav2Vec2 2.0 Base (960h)              360 MB
Facebook Wav2Vec2 2.0 Large (960h)             1.18 GB
Facebook Wav2Vec2 2.0 Large LV60 (960h)        1.18 GB
Facebook Wav2Vec2 2.0 Large LV60 Self (960h)   1.18 GB

Usage

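import wave

from wav2vec2_stt import Wav2Vec2STT

# Load a converted wav2vec 2.0 model from its directory.
decoder = Wav2Vec2STT('model_dir')

# Read the raw audio frames; the context manager closes the file afterwards.
with wave.open('tests/test.wav', 'rb') as wav_file:
    wav_samples = wav_file.readframes(wav_file.getnframes())

assert decoder.decode(wav_samples).strip().lower() == 'it depends on the context'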
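
Because decode() receives raw frames with no header, it can help to check a file's format before passing its samples along. The sketch below uses only the standard wave module. It assumes the decoder wants 16 kHz, mono, 16-bit PCM audio: 16 kHz is the rate wav2vec 2.0 speech models are trained on, but this README does not state the library's exact input format, so treat the expected values as assumptions. The helper name read_wav_checked is hypothetical.

```python
import wave

# Assumption: 16 kHz mono 16-bit PCM; adjust if the library expects otherwise.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample


def read_wav_checked(path, expected_rate=EXPECTED_RATE):
    """Return the raw frames of a WAV file, or raise ValueError on a format mismatch."""
    with wave.open(path, 'rb') as wav_file:
        actual = (wav_file.getnchannels(),
                  wav_file.getsampwidth(),
                  wav_file.getframerate())
        expected = (EXPECTED_CHANNELS, EXPECTED_SAMPLE_WIDTH, expected_rate)
        if actual != expected:
            raise ValueError(
                "unexpected WAV format (channels, bytes/sample, rate): "
                "got %r, expected %r" % (actual, expected))
        return wav_file.readframes(wav_file.getnframes())
```

The returned bytes can then be passed to decoder.decode() in place of wav_samples.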

The package also includes a simple CLI for recognizing WAV files:

$ python -m wav2vec2_stt decode model test.wav
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt decode model test.wav test.wav
IT DEPENDS ON THE CONTEXT
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt -h
usage: python -m wav2vec2_stt [-h] {decode} ...

positional arguments:
  {decode}    sub-command
    decode    decode one or more WAV files

optional arguments:
  -h, --help  show this help message and exit
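
The decode sub-command shown above can also be driven from a script. This is a rough sketch using the standard subprocess module. The helper names are hypothetical, and it assumes the CLI prints exactly one transcript line per input file, in order, with nothing else on standard output, as in the sample session.

```python
import subprocess
import sys


def build_decode_command(model_dir, wav_paths, python=sys.executable):
    """Build the argument list for `python -m wav2vec2_stt decode MODEL FILE...`."""
    if not wav_paths:
        raise ValueError("at least one WAV file is required")
    return [python, "-m", "wav2vec2_stt", "decode", model_dir, *wav_paths]


def decode_files(model_dir, wav_paths):
    """Run the CLI and return its output lines (assumed: one per input file)."""
    result = subprocess.run(
        build_decode_command(model_dir, wav_paths),
        capture_output=True,  # requires Python 3.7+
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.splitlines()
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.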

Installation/Building

Recommended installation via wheel from pip (requires a recent version of pip):

python -m pip install wav2vec2_stt

See setup.py for more details on building it yourself.

Author

License

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE file for details. If this license is problematic for you, please contact me.

Acknowledgments

  • Contains and uses code from PyTorch and torchaudio, licensed under the BSD 2-Clause License.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release.

Built Distribution

wav2vec2_stt-0.2.0-py2.py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (101.1 MB view details)

Uploaded for: Python 2, Python 3; manylinux: glibc 2.17+ x86-64

File details

Details for the file wav2vec2_stt-0.2.0-py2.py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for wav2vec2_stt-0.2.0-py2.py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm    Hash digest
SHA256       231df1c52cb3aaf3e36edc4a2fd710e68b23d775b5d07e1d0562aca5a8aecfab
MD5          76372978cdba7115336969cdb4148b08
BLAKE2b-256  337534edab90ccc60170d7f522dabd20c33b42baec256d025aa03d06e0186e4a
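
A downloaded wheel can be checked against the SHA256 digest listed above. The following is a minimal sketch using the standard hashlib module. The function names are hypothetical, and the file is read in chunks because the wheel is roughly 100 MB.

```python
import hashlib

# SHA256 digest of the 0.2.0 manylinux wheel, copied from the list above.
EXPECTED_SHA256 = "231df1c52cb3aaf3e36edc4a2fd710e68b23d775b5d07e1d0562aca5a8aecfab"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` has the expected SHA256 digest."""
    return sha256_of_file(path) == expected.lower()
```

pip can perform the same check itself when a requirements file pins the hash and is installed with --require-hashes.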
