Create force-aligned transcription TextGrids from audio
Transcribe Allign TextGrid
A small wrapper package around whisper-timestamped. It creates force-aligned transcription TextGrids from raw audio.
Installation
Requirements
- Python 3.9
  Other Python versions might work, but the dependency onnxruntime is quite iffy.
  - Use the executable python3.9 on Unix, available in most package managers, or py -3.9 on Windows.
  - The command line executable of Python 3.9 will be referred to as [python-executable] for the rest of the instructions.
  - Install pip on old Python versions with [python-executable] -m ensurepip --default-pip
- ffmpeg
  Usually preinstalled on Linux. For Windows, see the installation instructions on the whisper repository.
- git
  Usually preinstalled on Linux. For Windows, visit the git site.
  - Needed for installation of whisper-timestamped, as it is not available on PyPI.
  - Note that it needs to be available from the command line; git-bash might not work.
Light installation
If you don't have an NVIDIA GPU, or don't want to use it, you cannot use the CUDA platform that Whisper normally runs on. In this case, install a light, CPU-only version of torch before installing whisper-timestamped (and thus this application). Do this with:
[python-executable] -m pip install \
torch==1.13.1+cpu \
torchaudio==0.13.1+cpu \
-f https://download.pytorch.org/whl/torch_stable.html
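To see which of the two setups applies on a given machine, a check along these lines may help. It is only a sketch and not part of this package: pick_device is a hypothetical helper name, and it relies on nothing more than torch.cuda.is_available(), falling back to "cpu" when torch is not installed at all.

```python
import importlib.util


def pick_device():
    """Return "cuda" if torch is installed and sees a CUDA GPU, else "cpu"."""
    # If torch is not installed yet, there is nothing to query.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(pick_device())
```

If this prints cpu, the light installation above is the one to use.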
Installing
Whisper-timestamped is not on PyPI, so it needs a separate git+ install. (If you only want to use the package as a library instead of a CLI, whisper-timestamped is not a dependency, and the manual install is not needed.)
Once the requirements are satisfied, you can install whisper-timestamped and this package:
[python-executable] -m pip install git+https://github.com/linto-ai/whisper-timestamped
[python-executable] -m pip install transcribe_allign_textgrid
Running from the command line
Once the application is installed, you can run it with:
[python-executable] -m transcribe_allign_textgrid [path]
Here, path is the path to the audio file or files.
- If a directory path is passed, all audio files in the directory will be transcribed, and force-aligned transcription TextGrids of the same name will be generated in this directory.
- If a file path is passed, a force-aligned transcription TextGrid will be generated in the same directory as the original file.
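The two cases above can be sketched as a small path-mapping function. This is an illustration, not the package's own code: planned_outputs and AUDIO_SUFFIXES are hypothetical names, and both the list of audio extensions and the .TextGrid suffix are assumptions.

```python
from pathlib import Path

# Assumption: which extensions count as audio is illustrative only.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def planned_outputs(path):
    """Map each audio file under `path` to the TextGrid expected next to it."""
    path = Path(path)
    if path.is_dir():
        # Directory: every audio file directly inside it is transcribed.
        audio_files = sorted(
            p
            for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
        )
    else:
        # Single file: only that file is transcribed.
        audio_files = [path]
    # Assumption: the output keeps the file name and swaps in a .TextGrid suffix.
    return {audio: audio.with_suffix(".TextGrid") for audio in audio_files}
```

In both cases the TextGrid ends up in the same directory as the audio it was made from.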
Selecting a different model
By default, this will run on tiny, the smallest model, which is the fastest and least accurate. To run with another model, pass it as an argument:
[python-executable] -m transcribe_allign_textgrid [path] --model [model]
The available models are:
Name | Parameters | Required VRAM | Relative speed |
---|---|---|---|
tiny | 39 M | ~1 GB | ~32x |
base | 74 M | ~1 GB | ~16x |
small | 244 M | ~2 GB | ~6x |
medium | 769 M | ~5 GB | ~2x |
large | 1550 M | ~10 GB | 1x |
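As a rough aid for choosing from the table, the sketch below picks the largest model whose approximate VRAM requirement fits a given budget. The numbers are copied from the table; largest_model_for is a hypothetical helper and not something the package provides.

```python
# (name, approximate required VRAM in GB), ordered from smallest to largest model.
MODELS = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def largest_model_for(vram_gb):
    """Return the largest model whose approximate VRAM need fits, or None."""
    fitting = [name for name, need in MODELS if need <= vram_gb]
    return fitting[-1] if fitting else None
```

For example, largest_model_for(4) returns "small", which can then be passed to --model. Since the VRAM figures are approximate, it is worth leaving some headroom.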
Specifying what language to use
By default, the application will try to detect the language automatically. However, you can also specify it manually:
[python-executable] -m transcribe_allign_textgrid [path] --language [language]
# Or also specifying what model to use:
[python-executable] -m transcribe_allign_textgrid [path] --model [model] --language [language]
To see what languages are available, please see the tokenizer.py file in the Whisper source. (Yes, the OpenAI team themselves recommend finding it this way, too.) Both the long and the short name of a language work.
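When the tool is driven from a script, the commands above can be assembled programmatically. The sketch below uses only the --model and --language flags shown in this section; build_command and run_transcription are hypothetical helper names, and the executable name passed in is whatever [python-executable] is on your system.

```python
import subprocess


def build_command(python_executable, path, model=None, language=None):
    """Assemble the command line shown above as an argument list."""
    cmd = [python_executable, "-m", "transcribe_allign_textgrid", str(path)]
    if model is not None:
        cmd += ["--model", model]
    if language is not None:
        cmd += ["--language", language]
    return cmd


def run_transcription(python_executable, path, model=None, language=None):
    """Run the CLI; raises CalledProcessError if it exits with an error."""
    cmd = build_command(python_executable, path, model, language)
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.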
Using as a library
The tool can also be used as a library. It exports one function, whisper_to_textgrid(), which takes a transcription object (a nested dict) from whisper-timestamped and returns a Textgrid object from praatio.
The library part of the package does not depend on whisper-timestamped, so that it is fully installable and usable as a requirement via PyPI.
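A minimal sketch of library use might look like the following. Several things here are assumptions rather than documented facts: that whisper_to_textgrid can be imported from the top-level transcribe_allign_textgrid module, that it takes the transcription dict as its only argument, and that whisper-timestamped is installed separately. The wrapper name audio_to_textgrid is hypothetical.

```python
def audio_to_textgrid(audio_path, model_name="tiny", language=None):
    """Transcribe one audio file and convert the result to a praatio Textgrid."""
    # Imported lazily, since whisper-timestamped is not a dependency of the library.
    import whisper_timestamped as whisper

    # Assumption: the function is exposed at the top level of the package.
    from transcribe_allign_textgrid import whisper_to_textgrid

    audio = whisper.load_audio(audio_path)
    model = whisper.load_model(model_name, device="cpu")
    # language=None leaves language detection to the model.
    transcription = whisper.transcribe(model, audio, language=language)
    return whisper_to_textgrid(transcription)
```

How to save the returned object depends on the installed praatio version, so check its documentation for the Textgrid save method.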
Output
The output TextGrids have four TextGridTiers:
- segment_text: The text in a given segment (speaker's turn)
- segment_confidence: The confidence the model has that this is the correct labeling and segmentation for the segment
- words_text: The text of a given word
- word_confidence: The confidence the model has that this is the correct labeling and segmentation for this word
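To make the four tiers concrete, the sketch below flattens a whisper-timestamped style transcription dict into plain (start, end, label) entries per tier. It is not the package's implementation: tiers_from_transcription is a hypothetical name, the example data is made up, and storing confidences as strings and stripping whitespace from segment text are assumptions.

```python
def tiers_from_transcription(transcription):
    """Collect (start, end, label) entries for each of the four tiers."""
    tiers = {
        "segment_text": [],
        "segment_confidence": [],
        "words_text": [],
        "word_confidence": [],
    }
    for segment in transcription["segments"]:
        start, end = segment["start"], segment["end"]
        tiers["segment_text"].append((start, end, segment["text"].strip()))
        tiers["segment_confidence"].append((start, end, str(segment["confidence"])))
        for word in segment.get("words", []):
            w_start, w_end = word["start"], word["end"]
            tiers["words_text"].append((w_start, w_end, word["text"]))
            tiers["word_confidence"].append((w_start, w_end, str(word["confidence"])))
    return tiers


# Made-up example in the shape whisper-timestamped returns.
example = {
    "segments": [
        {
            "start": 0.0,
            "end": 1.2,
            "text": " hello world",
            "confidence": 0.91,
            "words": [
                {"text": "hello", "start": 0.0, "end": 0.5, "confidence": 0.95},
                {"text": "world", "start": 0.6, "end": 1.2, "confidence": 0.87},
            ],
        }
    ]
}
tiers = tiers_from_transcription(example)
```

Each segment contributes one interval to the two segment tiers, and each word one interval to the two word tiers, so the text and confidence tiers always line up.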
In Praat, it will look a little like this:
Development
The package is quite trivial, but if you do want to work on it, here are some instructions.
Style
All code is formatted with the Black code formatter. As for casing, Python conventions are used, except where dependencies deviate from them.
I am dyslectic, and quite likely to make spelling errors in variables. If you find any, don't hesitate to send me a pull request!
Running Tests
After cloning the repository, moving into it, and installing pytest and pytest-cov with pip, run the tests with:
# Install current version of package locally to be able to test it.
[python-executable] -m pip install -e .
[python-executable] -m pytest --cov=transcribe_allign_textgrid tests/