
Create force-aligned transcription TextGrids from audio

Project description

Transcribe Allign TextGrid

A small wrapper package around whisper-timestamped. Create force-aligned transcription TextGrids from raw audio.

Installation

Requirements

  • Python 3.8 to Python 3.11.
    • Use the executable python3.x on Unix, available in most package managers, or py -3.x on Windows.
    • This command-line executable will be referred to as [python-executable] for the rest of the instructions.
    • Install pip on old Python versions with [python-executable] -m ensurepip --default-pip
  • ffmpeg: usually preinstalled on Linux. For Windows, see the installation instructions in the Whisper repository.
  • git: usually preinstalled on Linux. For Windows, visit the git site.
    • Needed to install whisper-timestamped, as it is not available on PyPI.
    • Note that it needs to be available from the command line; git-bash might not work.
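
If you want to verify these requirements before installing, a small script along the following lines may help. It is only a sketch: check_requirements is a hypothetical helper written for this README, not part of the package, and it only checks that the tools can be found on the PATH, not that they work.

```python
import shutil
import sys

# Supported version range, taken from the requirements list above.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def check_requirements(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all is well."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    if not (MIN_VERSION <= version <= MAX_VERSION):
        problems.append(f"Python {version[0]}.{version[1]} is outside 3.8 to 3.11")
    for tool in ("ffmpeg", "git"):
        # shutil.which returns None when the tool is not on the PATH.
        if which(tool) is None:
            problems.append(f"{tool} was not found on the PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("Problem:", problem)
```

Run it with the same [python-executable] you plan to install with; no output means nothing was flagged.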

Installing Torch

Torch, on which Whisper is built, is a fairly low-level library, so the version you need depends on your OS and type of GPU. On Mac and Windows, pip installs a non-accelerated CPU version of the library by default. On Linux, it presumes you have a CUDA-capable (which is to say Nvidia-branded) GPU. If you are on Windows and have an Nvidia GPU you want to use, or are on Linux and either have no GPU or have an AMD GPU, check the more detailed torch installation instructions.

This should be done before installing transcribe_allign_textgrid and whisper_timestamped.
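
To see which kind of torch build ended up installed, something like the following can be run afterwards. This is a sketch, not part of this package; describe_torch_backend is a hypothetical name, and the script simply reports that torch is missing if it has not been installed yet.

```python
import importlib.util


def describe_torch_backend():
    """Return a short description of the installed torch build, if any."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    if torch.cuda.is_available():
        # Device 0 is the first GPU that torch can see.
        return f"torch {torch.__version__} with GPU: {torch.cuda.get_device_name(0)}"
    return f"torch {torch.__version__} running on CPU only"


print(describe_torch_backend())
```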

Installing

Once the requirements are satisfied, you can install whisper-timestamped and this package:

whisper-timestamped is not on PyPI, so a separate git+ install is needed. (If you only want to use the package as a library rather than as a CLI, whisper-timestamped is not a dependency, and this manual install is not needed.)

[python-executable] -m pip install git+https://github.com/linto-ai/whisper-timestamped
[python-executable] -m pip install transcribe_allign_textgrid

Running from the command line

Once the application is installed, you can run it with:

[python-executable] -m transcribe_allign_textgrid [path]

Here, [path] is the path to the audio files.

  • If a directory path is passed, all audio files in the directory will be transcribed, and force-aligned transcription TextGrids of the same name will be generated in this directory.
  • If a file path is passed, a force-aligned transcription TextGrid will be generated into the same directory with the same name as the original file.
  • If a glob is passed, the glob will be resolved and all matches will be processed as if the files were passed individually.
  • By default, if a non-audio file is passed, an error is raised. To skip those instead, pass the --skip flag.
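
The rules above can be sketched in Python roughly as follows. This is an illustration only, not the package's actual code: resolve_inputs, textgrid_path, the AUDIO_EXTENSIONS set, and the .TextGrid output extension are all assumptions made for this example.

```python
import glob
from pathlib import Path

# Assumption: a hypothetical set of extensions; the real tool may decide differently.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def is_audio(path):
    return path.suffix.lower() in AUDIO_EXTENSIONS


def resolve_inputs(path, skip=False):
    """Expand a directory, file, or glob into a sorted list of audio files."""
    target = Path(path)
    if target.is_dir():
        # Directory: take every audio file directly inside it.
        return sorted(p for p in target.iterdir() if p.is_file() and is_audio(p))
    if target.is_file():
        candidates = [target]
    else:
        # Anything else is treated as a glob pattern.
        candidates = [Path(match) for match in glob.glob(str(path))]
    audio = []
    for candidate in candidates:
        if is_audio(candidate):
            audio.append(candidate)
        elif not skip:
            # Mirrors the default behaviour: non-audio input is an error.
            raise ValueError(f"Not an audio file: {candidate}")
    return sorted(audio)


def textgrid_path(audio_path):
    """Same directory and name as the audio file, assumed .TextGrid extension."""
    return Path(audio_path).with_suffix(".TextGrid")
```

With skip=True, non-audio matches are dropped silently, which corresponds to passing --skip on the command line.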

Selecting a different model

By default, this will run on the smallest, that is, least accurate and fastest, model, tiny. To run with another model, pass it as an argument:

[python-executable] -m transcribe_allign_textgrid [path] --model [model]

The available models are:

Name     Parameters   Required VRAM   Relative speed
tiny     39 M         ~1 GB           ~32x
base     74 M         ~1 GB           ~16x
small    244 M        ~2 GB           ~6x
medium   769 M        ~5 GB           ~2x
large    1550 M       ~10 GB          1x
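
As a rough guide to choosing from this table, the following sketch picks the largest model whose listed VRAM requirement fits a given budget. largest_model_for is a hypothetical helper, not part of the package, and the VRAM figures are the approximate values from the table, so treat the result as a starting point.

```python
# (name, parameters in millions, approximate VRAM in GB, approximate relative speed)
MODELS = [
    ("tiny", 39, 1, 32),
    ("base", 74, 1, 16),
    ("small", 244, 2, 6),
    ("medium", 769, 5, 2),
    ("large", 1550, 10, 1),
]


def largest_model_for(vram_gb):
    """Return the name of the largest model that fits in vram_gb, or None."""
    fitting = [model for model in MODELS if model[2] <= vram_gb]
    if not fitting:
        return None
    # "Largest" here means the most parameters.
    return max(fitting, key=lambda model: model[1])[0]


print(largest_model_for(4))
```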

Specifying what language to use

By default, the application will try to detect what language is used automatically. However, you can also specify this manually:

[python-executable] -m transcribe_allign_textgrid [path] --language [language]

# Or also specifying what model to use:
[python-executable] -m transcribe_allign_textgrid [path] --model [model] --language [language]
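
If you want to drive the command line from a Python script, the invocations above can be assembled like this. It is a sketch: build_command and run are hypothetical helpers, "recordings" is a placeholder path, and "en" is only an assumed example of a language value. The flags themselves are the ones described in this README.

```python
import subprocess
import sys


def build_command(path, model=None, language=None, skip=False):
    """Build the argument list for one run of the command-line tool."""
    # sys.executable plays the role of [python-executable].
    command = [sys.executable, "-m", "transcribe_allign_textgrid", str(path)]
    if model is not None:
        command += ["--model", model]
    if language is not None:
        command += ["--language", language]
    if skip:
        command.append("--skip")
    return command


def run(path, **options):
    """Run the tool and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(path, **options), check=True)


# Print the command instead of running it, so this snippet has no side effects.
print(" ".join(build_command("recordings", model="small", language="en")))
```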

To see which languages are available, see the tokenizer.py file in the Whisper source. (Yes, the OpenAI team themselves recommend finding it this way, too.)
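
If Whisper is already installed, the same list can also be read from Python instead of from the source file. A minimal sketch, assuming the openai-whisper package is importable as whisper; available_languages is a hypothetical helper, and whether the command line expects the code or the full name is not covered here.

```python
def available_languages():
    """Return Whisper's {code: name} language mapping, or None without Whisper."""
    try:
        # LANGUAGES is the dictionary defined in Whisper's tokenizer.py.
        from whisper.tokenizer import LANGUAGES
    except ImportError:
        return None
    return dict(LANGUAGES)


languages = available_languages()
if languages is None:
    print("Whisper is not installed")
else:
    for code, name in sorted(languages.items()):
        print(code, name)
```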

Using as a library

The tool can also be used as a library. It exports one function, whisper_to_textgrid(), which takes a transcription object (a nested dictionary) from whisper-timestamped and returns a Textgrid object from praatio. The typical JSON output from whisper-timestamped works, too.

The library part of the package does not depend on whisper-timestamped, so that it is fully installable and usable as a requirement via PyPI.
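
A usage sketch, with several assumptions: that whisper_to_textgrid can be imported from the top level of transcribe_allign_textgrid, and that a trimmed-down transcription like the hand-written sample below is enough for it. Real whisper-timestamped output contains more fields per segment. The import is guarded so the snippet still runs when the package is not installed.

```python
import json

# Hand-written sample in the general shape of whisper-timestamped JSON output.
SAMPLE_JSON = """
{
  "text": " Hello world.",
  "language": "en",
  "segments": [
    {
      "start": 0.0,
      "end": 1.2,
      "text": " Hello world.",
      "confidence": 0.93,
      "words": [
        {"text": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.95},
        {"text": "world.", "start": 0.6, "end": 1.2, "confidence": 0.91}
      ]
    }
  ]
}
"""

transcription = json.loads(SAMPLE_JSON)

try:
    # Assumption: the function is importable from the package's top level.
    from transcribe_allign_textgrid import whisper_to_textgrid
except ImportError:
    whisper_to_textgrid = None

if whisper_to_textgrid is None:
    print("transcribe_allign_textgrid is not installed")
else:
    textgrid = whisper_to_textgrid(transcription)
    print(type(textgrid))
```

The returned object is a praatio Textgrid, so saving it to disk is done with praatio's own methods.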

Output

The output TextGrids have four TextGridTiers:

  • segments_text: The text in a given segment (speaker's turn).
  • segments_confidence: The confidence the model has that this is the correct labeling and segmentation for the segment.
  • words_text: The text of a given word.
  • words_confidence: The confidence the model has that this is the correct labeling and segmentation for this word.

If one of these tiers would be empty according to the output of whisper-timestamped, a tier with a single empty interval (0.0, 0.1) is generated instead, to satisfy Praat's error handling.
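
To make the tier layout concrete, here is a sketch of how the four tiers relate to a whisper-timestamped result. It is not the package's implementation: tiers_from_transcription is a hypothetical function that uses plain tuples of (start, end, label) instead of praatio objects, and it assumes each segment and word carries start, end, text, and confidence keys.

```python
# Placeholder interval used when a tier would otherwise be empty.
EMPTY_INTERVAL = (0.0, 0.1, "")


def tiers_from_transcription(result):
    """Map a transcription dictionary to four lists of (start, end, label)."""
    tiers = {
        "segments_text": [],
        "segments_confidence": [],
        "words_text": [],
        "words_confidence": [],
    }
    for segment in result.get("segments", []):
        span = (segment["start"], segment["end"])
        tiers["segments_text"].append(span + (segment["text"].strip(),))
        tiers["segments_confidence"].append(span + (str(segment["confidence"]),))
        for word in segment.get("words", []):
            word_span = (word["start"], word["end"])
            tiers["words_text"].append(word_span + (word["text"],))
            tiers["words_confidence"].append(word_span + (str(word["confidence"]),))
    for intervals in tiers.values():
        # An empty tier gets a single empty interval instead.
        if not intervals:
            intervals.append(EMPTY_INTERVAL)
    return tiers
```

The text and confidence tiers share the same boundaries, so in Praat each confidence value lines up with the segment or word it belongs to.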

In Praat, it will look a little like this:

Development

The package is quite trivial, but if you want to work on it, here are some instructions.

Style

All code is formatted with the Black code formatter. As for casing, Python standards are used, except where dependencies deviate from them.

I am dyslectic, and quite likely to make spelling errors in variables. If you find any, don't hesitate to send me a pull request!

Running Tests

After cloning the repository, moving into it, and installing pytest and pytest-cov with pip, run tests with:

# Install the current version of the package locally to be able to test it.
[python-executable] -m pip install -e .

[python-executable] -m pytest --cov=transcribe_allign_textgrid tests/
