Wav2Vec2-based transcriptor fine-tuned on Chilean lessons

Project description

CLTranscriptor

Wrapper for a Spanish speech-to-text model based on Hugging Face's Wav2Vec2ForCTC and fine-tuned on Chilean lessons, combined with PySpellChecker's Spanish spell-checking algorithm.

Install

To install, simply use pip:

pip install cltranscriptor

Usage

To use, initialize a Transcriptor object:

from cltranscriptor.cltranscriptor import Transcriptor
transcriptor = Transcriptor()

By default, spell checking is enabled and the model is the one available at dannersm/wav2vec2-large-xlsr-53-chilean-lessons, which is based on Jonatas Grosman's model and fine-tuned on a 6-hour set of Chilean lessons.

To transcribe a file, call Transcriptor.transcribe():

transcriptor.transcribe('/path/to/your/audio_file.wav')

By default, the file is streamed in 10-second intervals (to avoid loading it all into memory), and transcribe() returns a list with the transcript for each segment. To transcribe a relatively short file all at once, pass interval=None:

transcriptor.transcribe('my_file.wav', interval=None)
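In the default streaming mode the result is a list with one transcript per segment, so a common follow-up step is to merge that list into a single text. The following is a minimal sketch of one way to do this; join_segments is a hypothetical helper (not part of cltranscriptor), and the example list is made-up stand-in data rather than real model output.

```python
def join_segments(segments):
    """Join per-segment transcripts into one string, skipping empty segments.

    Hypothetical helper: it only assumes that `segments` is a list of strings,
    as described for the default (streamed) mode of transcribe().
    """
    cleaned = (s.strip() for s in segments)
    return " ".join(s for s in cleaned if s)


# Stand-in for the list returned by transcriptor.transcribe('my_file.wav')
example_segments = ["hola a todos ", "", " hoy vamos a ver fracciones"]
full_text = join_segments(example_segments)
print(full_text)
```

Empty segments (for example, stretches of silence) are dropped here so that the joined text does not contain double spaces.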

You can also pass the offset and duration parameters (both in seconds), which are passed on to librosa.stream to set the start time and the maximum duration of the transcription:

transcriptor.transcribe('my_file.wav', offset=600, duration=120) # transcribe 2 minutes of audio starting from minute 10
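Since offset and duration are given in seconds, it can help to compute them from minutes and to estimate how many segments to expect. The sketch below is illustrative only: window_kwargs and expected_segments are hypothetical helpers, and the segment count assumes that a trailing partial interval is returned as its own segment, which this description does not confirm.

```python
import math


def window_kwargs(start_minute, length_minutes):
    """Convert a window given in minutes into offset/duration in seconds."""
    return {"offset": start_minute * 60, "duration": length_minutes * 60}


def expected_segments(duration, interval=10):
    """Estimate the number of segments for `duration` seconds of audio.

    Assumption: a final, shorter-than-interval chunk still yields one segment.
    """
    return math.ceil(duration / interval)


kwargs = window_kwargs(10, 2)  # 2 minutes starting at minute 10
print(kwargs)
print(expected_segments(kwargs["duration"]))  # default 10-second interval

# The keyword arguments could then be forwarded, for example:
# transcriptor.transcribe('my_file.wav', **kwargs)
```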

Finally, you can control the length of the streamed segments by passing interval:

transcriptor.transcribe('my_file.wav', interval=15) # transcribe in 15-second segments
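Because each list entry corresponds to one interval, approximate start times can be attached to the transcripts. This is a minimal sketch under the assumption that segments are returned in order and that each one covers exactly interval seconds; with_timestamps and format_time are hypothetical helpers, and the sample list is invented for illustration.

```python
def with_timestamps(segments, interval, offset=0):
    """Pair each transcript with its assumed start time in seconds."""
    return [(offset + i * interval, text) for i, text in enumerate(segments)]


def format_time(seconds):
    """Format a number of seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


# Stand-in for transcriptor.transcribe('my_file.wav', offset=600, interval=15)
example_segments = ["buenos dias", "abran el libro", "pagina veinte"]
for start, text in with_timestamps(example_segments, interval=15, offset=600):
    print(format_time(start), text)
```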


Download files

Source Distribution

CLTranscriptor-0.0.6.tar.gz (4.0 kB)

Built Distribution

CLTranscriptor-0.0.6-py3-none-any.whl (4.5 kB, Python 3)
