Aligning text transcripts with their audio recordings.

Timething

Timething is a library for aligning text transcripts with audio. You provide an audio file and a text file containing the complete transcript. Timething outputs time codes for every word and every character, indicating when each was spoken in the audio. Timething strives to be fast and accurate, and can run on both GPUs and CPUs.

Timething uses Wav2Vec-based speech recognition models hosted by the Hugging Face AI community. The approach is described in this PyTorch tutorial, as well as in this paper.

Installation

To install Timething, you'll need an installation of Python 3.7 or 3.8. You can then install it using pip:

pip install timething

Aligning recordings and transcripts

Timething currently expects to find a folder containing one or more chapters in the following form:

└── dir/
    ├── text.csv
    ├── aligned/
    └── audio/
        ├── chapter01.mp3
        ├── chapter02.mp3
        └── chapter03.mp3

Timething can process many audio formats, including MP3, WAV, FLAC and Ogg Vorbis.

The file text.csv should contain one entry per audio file, with the audio path and the transcript separated by a pipe character:

audio/chapter01.mp3|The transcript for chapter01 on a single line here
audio/chapter02.mp3|The transcript for chapter02 on a single line here
audio/chapter03.mp3|The transcript for chapter03 on a single line here
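If the transcripts start out as separate strings or files, a small script can produce text.csv. The following is a minimal sketch, not part of Timething: the helper names metadata_line and write_metadata are hypothetical, and it assumes that a transcript must not contain the pipe character, since the format above shows no escaping rule.

```python
def metadata_line(audio_path, transcript):
    """Build one `path|transcript` line in the format shown above."""
    # Collapse newlines and repeated whitespace so the transcript fits on one line.
    single_line = " ".join(transcript.split())
    # Assumption: no escaping mechanism exists, so reject the separator outright.
    if "|" in audio_path or "|" in single_line:
        raise ValueError("the pipe character is reserved as the field separator")
    return f"{audio_path}|{single_line}"


def write_metadata(rows, path):
    """Write (audio_path, transcript) pairs to a metadata file, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        for audio_path, transcript in rows:
            f.write(metadata_line(audio_path, transcript) + "\n")


if __name__ == "__main__":
    write_metadata(
        [("audio/chapter01.mp3", "The transcript for chapter01\non a single line here")],
        "text.csv",
    )
```

Paths are written relative to the folder that holds text.csv, matching the layout above.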

You can now run Timething on your CPU or GPU, for example:

timething align --metadata text.csv --alignments-dir aligned

You can also specify more options, e.g.:

timething align \
  --model german \
  --metadata text.csv \
  --alignments-dir aligned \
  --batch-size 8 \
  --n-workers 8

Run timething --help for a full description.

Results are written into the given folder, e.g. aligned. Each recording gets its own JSON file, named after its audio id, containing both the character-level and the word-level alignments. For word-level alignments, each word has a start time in seconds, an end time in seconds, a confidence score and the word label. Character-level alignments carry the same fields for each character.

You can find an example dataset with alignments output in fixtures/. Here's what the alignment for "one.mp3", which contains only the word "one", looks like:

{
    "n_model_frames": 72,
    "n_audio_samples": 23392,
    "sampling_rate": 16000,
    "chars": [
        {
            "label": "O",
            "start": 0.5888611111111111,
            "end": 0.6497777777777777,
            "score": 0.9999777873357137
        },
        {
            "label": "n",
            "start": 0.6497777777777777,
            "end": 0.7106944444444444,
            "score": 0.99994424978892
        },
        {
            "label": "e!",
            "start": 0.7106944444444444,
            "end": 0.731,
            "score": 0.9999799728393555
        }
    ],
    "chars_cleaned": [
        {
            "label": "o",
            "start": 0.5888611111111111,
            "end": 0.6497777777777777,
            "score": 0.9999777873357137
        },
        {
            "label": "n",
            "start": 0.6497777777777777,
            "end": 0.7106944444444444,
            "score": 0.99994424978892
        },
        {
            "label": "e",
            "start": 0.7106944444444444,
            "end": 0.731,
            "score": 0.9999799728393555
        }
    ],
    "words": [
        {
            "label": "One!",
            "start": 0.5888611111111111,
            "end": 0.731,
            "score": 0.9999637263161796
        }
    ],
    "words_cleaned": [
        {
            "label": "one",
            "start": 0.5888611111111111,
            "end": 0.731,
            "score": 0.9999637263161796
        }
    ]
}
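The numbers in this example are consistent with time codes that fall on model frame boundaries: 23392 samples at 16000 Hz is 1.462 seconds, spread over 72 frames that is roughly 0.0203 seconds per frame, and the first character starts at 29 frames. The sketch below reads these fields back with the standard json module. It is an illustration based only on the example above; the helper names are hypothetical, and the frame interpretation is an observation about this file, not a documented guarantee.

```python
import json

# A trimmed copy of the example alignment (only the fields used here).
EXAMPLE = json.loads("""
{
    "n_model_frames": 72,
    "n_audio_samples": 23392,
    "sampling_rate": 16000,
    "words_cleaned": [
        {"label": "one", "start": 0.5888611111111111, "end": 0.731,
         "score": 0.9999637263161796}
    ]
}
""")


def seconds_per_frame(alignment):
    """Audio duration in seconds divided by the number of model frames."""
    duration = alignment["n_audio_samples"] / alignment["sampling_rate"]
    return duration / alignment["n_model_frames"]


def word_spans(alignment, key="words_cleaned"):
    """Return (label, start, end, duration) tuples for one alignment file."""
    return [
        (w["label"], w["start"], w["end"], w["end"] - w["start"])
        for w in alignment[key]
    ]


if __name__ == "__main__":
    step = seconds_per_frame(EXAMPLE)
    for label, start, end, duration in word_spans(EXAMPLE):
        frames = (round(start / step), round(end / step))
        print(f"{label}: {start:.3f}s to {end:.3f}s, frames {frames}")
```

To process a real result, replace EXAMPLE with json.load on one of the files in the alignments folder.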

Re-cutting recordings

Once you've run alignment, you can cut your files down to smaller files and write the results into a new folder. For example, if you don't want any of your recordings to exceed 8 seconds, then you can create a new directory and re-cut your data into it like this:

timething recut \
  --from-meta text.csv \
  --to-meta ~/smaller-recordings/text.csv \
  --alignments-dir alignments \
  --cut-threshold-seconds 8.0

Results in this example are written into ~/smaller-recordings.
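To give an intuition for what cutting at a threshold involves, the sketch below groups word-level alignments into consecutive segments that each stay within a maximum duration. This is not Timething's implementation, which is not described here; the greedy strategy and the function name greedy_segments are assumptions made purely for illustration.

```python
def greedy_segments(words, max_seconds):
    """Group aligned words into segments no longer than max_seconds.

    `words` is a list of dicts with "label", "start" and "end" keys, as in
    the word-level alignment output. A single word longer than max_seconds
    still becomes its own segment, since it cannot be split further here.
    """
    segments = []
    current = []
    for word in words:
        # Start a new segment if adding this word would exceed the limit.
        if current and word["end"] - current[0]["start"] > max_seconds:
            segments.append(current)
            current = []
        current.append(word)
    if current:
        segments.append(current)
    # Summarise each segment as (start, end, text).
    return [
        (seg[0]["start"], seg[-1]["end"], " ".join(w["label"] for w in seg))
        for seg in segments
    ]


if __name__ == "__main__":
    words = [
        {"label": "a", "start": 0.0, "end": 3.0},
        {"label": "b", "start": 3.0, "end": 6.0},
        {"label": "c", "start": 6.0, "end": 9.0},
        {"label": "d", "start": 9.0, "end": 12.0},
    ]
    for start, end, text in greedy_segments(words, max_seconds=8.0):
        print(start, end, text)
```

Cutting only at word boundaries keeps every new recording paired with an exact slice of the transcript, which is why the alignment step has to run first.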

Supported languages

Currently supported languages can be found in models.yaml. This currently includes English, German, Dutch, Polish, Italian, Portuguese, Spanish, French, Russian, Japanese, Greek and Arabic models. We have only tested the German model so far.

Because a large number of CTC speech models are available from the Hugging Face AI community, new languages can be added to Timething easily. Alternatively, Wav2Vec can be fine-tuned as described here, using any of the Common Voice languages, 87 at the time of writing.

Support for text cleaning is currently minimal, and may need to be extended for new languages.
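As an illustration of what cleaning means here, the example output above maps "One!" to "one" in words_cleaned. The sketch below reproduces that mapping by lower-casing and dropping Unicode punctuation. It is an assumption about the kind of rule involved, not Timething's actual cleaning code, and the function name clean_label is hypothetical.

```python
import unicodedata


def clean_label(label):
    """Lower-case a label and drop every Unicode punctuation character."""
    return "".join(
        ch
        for ch in label.lower()
        # Unicode general categories starting with "P" are punctuation.
        if not unicodedata.category(ch).startswith("P")
    )


if __name__ == "__main__":
    print(clean_label("One!"))
    print(clean_label("¡Hola!"))
```

A rule this simple also strips apostrophes and hyphens inside words, and it does nothing about digits or abbreviations, which is the kind of gap that a new language may require extending.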

Alternatives

There are many mature libraries that already perform forced alignment like Timething does, e.g. the Montreal Forced Aligner or aeneas. One list of tools is maintained here.

Thanks

Thanks to why do birds for allowing the initial work on this library to be open sourced.

Release history

0.0.27 (Aug 22, 2022)
0.0.26 (Aug 18, 2022)
0.0.25 (Aug 18, 2022)
0.0.24 (Aug 17, 2022)
0.0.23 (Aug 17, 2022)
0.0.22 (Aug 17, 2022)
0.0.21 (Aug 10, 2022)
0.0.20 (Jun 8, 2022)
0.0.19 (Jun 7, 2022)
0.0.18 (May 27, 2022)
0.0.16 (May 26, 2022)
0.0.15 (May 25, 2022)
0.0.14 (May 25, 2022)
0.0.13 (May 25, 2022), this version
0.0.12 (May 25, 2022)
0.0.11 (May 25, 2022)
0.0.10 (May 25, 2022)
0.0.9 (May 24, 2022)
0.0.8 (May 23, 2022)
0.0.7 (May 23, 2022)
0.0.6 (May 22, 2022)
0.0.5 (May 21, 2022)
0.0.4 (May 17, 2022)
0.0.3 (May 17, 2022)
0.0.2 (May 15, 2022)
0.0.2.dev2 (May 15, 2022), pre-release
0.0.2.dev1 (May 14, 2022), pre-release
0.0.2.dev0 (May 14, 2022), pre-release
0.0.1 (May 13, 2022)

Download files

Source Distribution

timething-0.0.13.tar.gz (21.1 kB)

Uploaded May 25, 2022 Source

Built Distribution

timething-0.0.13-py3-none-any.whl (18.2 kB)

Uploaded May 25, 2022 Python 3

File details

Details for the file timething-0.0.13.tar.gz.

File metadata

Download URL: timething-0.0.13.tar.gz
Upload date: May 25, 2022
Size: 21.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for timething-0.0.13.tar.gz
Algorithm	Hash digest
SHA256	`0e11fce574b3a6250cac2dd61464c5c3e06513932e486958203209e34f832e04`
MD5	`dc74baf8096ea779247fd9e8164a49d9`
BLAKE2b-256	`9a918173826808fc3483c63f5cee9d121a9ef82dea846c445b76c7f0a7b14196`

File details

Details for the file timething-0.0.13-py3-none-any.whl.

File metadata

Download URL: timething-0.0.13-py3-none-any.whl
Upload date: May 25, 2022
Size: 18.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for timething-0.0.13-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2d76dec037c7ac01e7cfd68935cf7822a16296be70af99a98e5dd515e1aeff39`
MD5	`f8885d0a502f78db2e2134ad10f6dad3`
BLAKE2b-256	`db6fa45f50c955d53d17e316c3a744c4d29f682c40bf5889ed01bf0f9ba098fc`
