Automated transcription and diarization of linguistic data

fave-asr: Automated transcription of interview data

The FAVE-asr package provides a system for the automated transcription of sociolinguistic interview data on local machines for use by aligners like FAVE or the Montreal Forced Aligner. The package provides functions to label different speakers in the same audio (diarization), transcribe speech, and output TextGrids with phrase- or word-level alignments.

Example Use Cases

  • You want a transcription of an interview for more detailed hand correction.
  • You want to transcribe a large corpus and your analysis can tolerate a small error rate.
  • You want to make an audio corpus into a text corpus.
  • You want to know the number of speakers in an audio file.

For examples of how to use the package, see the Usage pages.

Installation

To install fave-asr using pip, run the following command in your terminal:

pip install fave-asr

Not another transcription service

Several services already automate the process of transcribing audio.

Unlike other services, fave-asr does not require uploading your data to other servers and instead focuses on processing audio on your own computer. Audio data can contain highly confidential information, and uploading this data to other services may not comply with ethical or legal data protection obligations. The goal of fave-asr is to serve those use cases where data protection makes local transcription necessary while making the process as seamless as cloud-based transcription services.

Example

As an example, we'll transcribe an audio interview of Snoop Dogg by the 85 South Media podcast and output it as a TextGrid.

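import os

import fave_asr

# Read the HuggingFace token from the environment (see "Using gated models")
data = fave_asr.transcribe_and_diarize(
    audio_file='usage/resources/SnoopDogg_85SouthMedia.wav',
    hf_token=os.environ['HF_TOKEN'],
    model_name='small.en',
    device='cpu'
)
# Convert the result to a TextGrid and save it
tg = fave_asr.to_TextGrid(data)
tg.write('SnoopDogg_85SouthMedia.TextGrid')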
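
For the large-corpus use case, the single-file example can be wrapped in a loop over a directory. The following is a minimal sketch, assuming that transcribe_and_diarize and to_TextGrid behave as in the example above. The helper names textgrid_path and transcribe_corpus are hypothetical and are not part of fave-asr.

```python
from pathlib import Path


def textgrid_path(audio_path, out_dir):
    """Map e.g. 'interviews/a.wav' to '<out_dir>/a.TextGrid'."""
    return Path(out_dir) / (Path(audio_path).stem + ".TextGrid")


def transcribe_corpus(audio_dir, out_dir, hf_token,
                      model_name="small.en", device="cpu"):
    """Transcribe every .wav file in audio_dir and write one TextGrid each.

    Hypothetical helper: it only reuses the two fave_asr calls shown in
    the example above.
    """
    # Imported here so textgrid_path can be used without fave_asr installed
    import fave_asr

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        data = fave_asr.transcribe_and_diarize(
            audio_file=str(wav),
            hf_token=hf_token,
            model_name=model_name,
            device=device,
        )
        tg = fave_asr.to_TextGrid(data)
        target = textgrid_path(wav, out_dir)
        tg.write(str(target))
        written.append(target)
    return written
```

Each output file takes the name of its audio file, so a run can be checked by comparing the two directories.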

Using gated models

Artificial intelligence models are powerful and in the wrong hands can be dangerous. The models used by fave-asr are free of charge, but you need to accept additional terms of use before you can download them.

To use these models:

  1. On HuggingFace, create an account or log in
  2. Accept the terms and conditions for the segmentation model
  3. Accept the terms and conditions for the diarization model
  4. Create an access token or copy your existing token

Keep track of your token and keep it safe (e.g. don't accidentally upload it to GitHub). We suggest creating an environment variable for your token so that you don't need to paste it into your files.

Creating an environment variable for your token

Storing your tokens as environment variables is a good way to avoid accidentally leaking them. Instead of typing the token into your code and deleting it before you commit, you can read it from Python with os.environ["HF_TOKEN"]. This also makes your code more readable, since it is obvious what HF_TOKEN is, while a string of numbers and letters is not.
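
A plain os.environ["HF_TOKEN"] lookup raises a bare KeyError when the variable is missing. As a small sketch of a friendlier alternative, the hypothetical helper get_hf_token below (not part of fave-asr) reads the variable and fails with a readable message if it is unset or empty.

```python
import os


def get_hf_token(var_name="HF_TOKEN"):
    """Return the token stored in the given environment variable.

    Raises RuntimeError with a readable message if the variable is
    missing or empty.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {var_name} is not set. "
            "Set it to your HuggingFace access token and open a new shell."
        )
    return token
```

The result can then be passed as the hf_token argument, for example hf_token=get_hf_token().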

Linux and Mac

On Linux and Mac you can store your token in .bashrc (or in .zshrc if your shell is zsh, which is the default on recent versions of macOS).

  1. Open $HOME/.bashrc in a text editor
  2. At the end of that file, add the following line, replacing <your token> with your HuggingFace token: HF_TOKEN='<your token>' ; export HF_TOKEN
  3. Apply the changes to your current session by running source $HOME/.bashrc

Windows

On Windows, use the setx command to create an environment variable.

setx HF_TOKEN <your token>

You need to restart the command line afterwards to make the environment variable available. setx only affects command windows opened after it runs, so the variable will not be set in the window where you ran the command.

Other software required

  • ffmpeg
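
Before transcribing, it can help to confirm that ffmpeg is reachable from Python. This is a minimal sketch using the standard library only; find_executable is a hypothetical helper name, and the check assumes ffmpeg is expected on your PATH.

```python
import shutil


def find_executable(name="ffmpeg"):
    """Return the full path to an executable on PATH, or None if absent."""
    return shutil.which(name)


ffmpeg_path = find_executable()
if ffmpeg_path is None:
    print("ffmpeg was not found on your PATH; install it before transcribing.")
else:
    print(f"Found ffmpeg at {ffmpeg_path}")
```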

Authors

Luís Roque contributed substantially to the main speaker diarization pipeline. Initial modifications to that code were made by Christian Brickhouse for stability and use as part of the fave-asr library. For licensing of the test audio, see the README in that directory.
