Aligning very long audio and text pairs through the VAC pipeline

vac_aligner

VAC - VAD-ASR-CER (Matching) Pipeline

A comprehensive pipeline for processing long audio recordings in three main stages: Voice Activity Detection (VAD), Automatic Speech Recognition (ASR), and Character Error Rate (CER) Matching. The pipeline is well suited to improving speech recognition models and to training Text-to-Speech (TTS) systems with high accuracy.

Features

  • Robust alignment for long audio sequences

  • Support for multiple languages (mostly Western ones)

Usage

The pipeline takes multi-hour audio recordings and their texts and produces many multi-second audio chunks with corresponding texts, at an accuracy of 97%. This accuracy comes from matching algorithms that correct common ASR errors such as repeated characters, incomplete words, and incorrectly predicted words.
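To make the CER part of the matching stage concrete, here is a minimal sketch of how Character Error Rate is commonly computed: the Levenshtein (edit) distance between a reference and a hypothesis, divided by the reference length. The function names `edit_distance` and `cer` are illustrative and are not taken from vac_aligner; the library's own implementation may differ.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # delete a ref character
                            curr[j - 1] + 1,       # insert a hyp character
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalized by reference length."""
    if not ref:
        # Convention chosen for this sketch: empty vs. empty is a perfect match.
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


# A repeated character, one of the ASR errors mentioned above, costs one edit.
print(cer("hello", "helllo"))  # 0.2
```

A low CER between an ASR prediction and a candidate span of the original text suggests that the span is the correct text for that audio chunk.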

The library can be used in a couple of ways. To run the full pipeline in one call:

from vac_aligner import run_pipeline

run_pipeline(
   manifest_file="path/to/save/manifest.json", batch_size=64,
   asr_input_file="path/to/manifest.json", # or path to a folder containing audio chunks (.wav files)
   target_base="path/where/to/save/artifacts", # otherwise, will use `asr_input_file`
   init_aligner_after_asr=True  # If there is no long transcript available and you need to extract it
)

In many scenarios only part of the pipeline is needed. In that case the classes can be used directly.

Scenario 1

You need to run ASR over short audio chunks and store the predictions in a nemo_manifest.

import os

from vac_aligner.dto import ASRConfig
from vac_aligner.asr import ASR_MAPPING, ASR

language = "hy"  # Armenian
model_name = ASR_MAPPING[language] # "Yeroyan/stt_arm_conformer_ctc_large"
asr_config = ASRConfig(hf_token=os.environ['HF_TOKEN'], batch_size=24)
asr = ASR(model_name, asr_config)
asr.run(
    save_dir="where/to/save/asr/predictions", # .txt(s)
    save_manifest="path/to/save/manifest.json",
    wav_files="path/to/wav/files",
    test_manifest='path/to/manifest/with/predictions.json'
)
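For orientation, a NeMo-style manifest is a JSON Lines file: one JSON object per line, typically with `audio_filepath`, `duration`, and `text` keys. The sketch below writes and reads such a file with the standard library only. The helper names, the `pred_text` key, and the sample values are assumptions for illustration; the exact fields that vac_aligner writes may differ.

```python
import json


def write_manifest(entries, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            # ensure_ascii=False keeps non-Latin scripts (e.g. Armenian) readable
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_manifest(path):
    """Read a JSON Lines manifest into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical entries: paths, durations, and the `pred_text` key are made up.
entries = [
    {"audio_filepath": "chunks/0001.wav", "duration": 4.2, "pred_text": "բարև ձեզ"},
    {"audio_filepath": "chunks/0002.wav", "duration": 6.8, "pred_text": "շնորհակալություն"},
]
```

Calling `write_manifest(entries, "manifest.json")` followed by `read_manifest("manifest.json")` returns the same list of dictionaries.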

Scenario 2

You already have a predictions manifest and/or a long transcript, and want to run the Matching stage to obtain the correct chunk texts that replace the ASR predictions.

from vac_aligner.matching import ArmenianAlignerVAC

output_file = 'path/to/save/combined/transcript.txt'
predictions_manifest = 'path/to/manifest/with/predictions.json'
chunks, combined_transcript = ArmenianAlignerVAC.combine_transcript(predictions_manifest,
                                                                    output_file,
                                                                    ending_punctuations="․,։")

and then matching

from vac_aligner.matching import ArmenianAlignerVAC

target_base = 'path/to/save/artifacts'
matches_sorted = ArmenianAlignerVAC(combined_transcript,
                                    chunks, output_file.replace(".txt", ".json"),
                                    target_base=target_base).align(0.35)
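The idea behind CER-based matching can be sketched in a few lines. This is not the `ArmenianAlignerVAC` implementation, only a simplified greedy illustration: for each ASR prediction, try spans of the long transcript that start at the current position, keep the span with the lowest CER, and accept it if the CER is under a threshold. The meaning of the `0.35` argument to `align` is not documented here; the sketch assumes it behaves like a maximum CER. `match_chunks`, `max_cer`, and `slack` are hypothetical names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def match_chunks(transcript, predictions, max_cer=0.35, slack=3):
    """Greedily map each prediction to a span of transcript words.

    Returns a list of (prediction, matched_text_or_None, best_cer) tuples.
    """
    words = transcript.split()
    pos = 0  # index of the first transcript word not yet consumed
    matches = []
    for pred in predictions:
        n = len(pred.split())
        best_text, best_cer, best_end = None, float("inf"), pos
        # Try span lengths close to the predicted word count.
        for length in range(max(1, n - slack), n + slack + 1):
            end = pos + length
            if end > len(words):
                break
            candidate = " ".join(words[pos:end])
            score = edit_distance(candidate, pred) / max(len(candidate), 1)
            if score < best_cer:
                best_text, best_cer, best_end = candidate, score, end
        if best_text is not None and best_cer <= max_cer:
            matches.append((pred, best_text, best_cer))
            pos = best_end
        else:
            # No acceptable span: record a miss and skip ahead by the predicted length.
            matches.append((pred, None, best_cer))
            pos = min(pos + n, len(words))
    return matches


transcript = "the quick brown fox jumps over the lazy dog"
predictions = ["the quik brown fox", "jumps ovr the lazy dog"]
for pred, text, score in match_chunks(transcript, predictions):
    print(f"{pred!r} -> {text!r} (CER {score:.3f})")
```

A greedy left-to-right search is the simplest choice for a sketch; a single bad match shifts every later chunk, so a real aligner would need some way to recover from misses.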

and get the benchmark

from vac_aligner.matching.benchmark_on_mcv import Benchmark

benchmark = Benchmark(target_base, predictions_manifest)
stats = benchmark.get_benchmark()
benchmark.analyze_and_save_benchmark(stats, output_file)

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.0.2 (2024-06-01)

  • Fix template errors (dates, package name, etc.)

  • Add usage examples (partial components of the pipeline) in README.md

0.0.1 (2024-05-31)

  • First release on PyPI.

  1. Design library structure

  2. Implement 2 main steps of the VAC
    • ASR

    • CER based Matching

