Aligning Very Long audio and text pairs through VAC pipeline

vac_aligner

VAC - VAD-ASR-CER (Matching) Pipeline

A comprehensive pipeline for processing long audio recordings in three main stages: Voice Activity Detection (VAD), Automatic Speech Recognition (ASR), and Character Error Rate (CER) Matching. It is well suited to improving speech recognition models and to training Text-to-Speech (TTS) systems on accurately aligned data.

Features

  • Robust alignment for long audio sequences

  • Support for multiple languages (mostly Western)

Usage

The pipeline takes multi-hour audio recordings and their texts and produces many multi-second audio chunks with corresponding texts, with an accuracy of 97%. This accuracy is achieved by matching algorithms that correct common ASR errors such as repeated characters, incomplete words, and incorrect word predictions.
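To make the matching criterion concrete, here is a minimal sketch of how a Character Error Rate can be computed between a reference text and an ASR prediction. It uses the standard definition (Levenshtein edit distance divided by the reference length). The function names `levenshtein` and `cer` are illustrative and are not part of the vac_aligner API; the library's own implementation may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance normalized by the reference length."""
    if not reference:
        # Convention chosen for this sketch: empty reference matches only empty text.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)
```

A prediction with a dropped character, such as `cer("hello", "helo")`, yields one edit over five reference characters. Lower values mean the prediction is closer to the reference.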

The library can be used in a couple of ways. To run the full pipeline:

from vac_aligner import run_pipeline

run_pipeline(
   manifest_file="path/to/save/manifest.json", batch_size=64,
   asr_input_file="path/to/manifest.json",  # or path to a folder containing audio chunks (.wav files)
   target_base="path/where/to/save/artifacts",  # if not given, `asr_input_file` is used
   init_aligner_after_asr=True  # set if no long transcript is available and it needs to be extracted
)

In many scenarios only part of the pipeline is needed. In those cases the classes can be used directly.

Scenario 1

You need to run ASR over short chunks and store the results in a nemo_manifest.

import os

from vac_aligner.dto import ASRConfig
from vac_aligner.asr import ASR_MAPPING, ASR

language = "hy"  # Armenian
model_name = ASR_MAPPING[language] # "Yeroyan/stt_arm_conformer_ctc_large"
asr_config = ASRConfig(hf_token=os.environ['HF_TOKEN'], batch_size=24)
asr = ASR(model_name, asr_config)
asr.run(
    save_dir="where/to/save/asr/predictions", # .txt(s)
    save_manifest="path/to/save/manifest.json",
    wav_files="path/to/wav/files",
    test_manifest='path/to/manifest/with/predictions.json'
)
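For readers unfamiliar with the manifest format, the sketch below writes and reads a NeMo-style manifest: a text file with one JSON object per line. The fields `audio_filepath`, `duration`, and `text` follow the usual NeMo convention. The `pred_text` field for predictions and the helper names `write_manifest` and `read_manifest` are assumptions made for illustration; the exact fields vac_aligner reads or writes are not stated here.

```python
import json


def write_manifest(entries, manifest_path):
    """Write one JSON object per line (NeMo-style manifest)."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        for entry in entries:
            # ensure_ascii=False keeps non-Latin scripts (e.g. Armenian) readable
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_manifest(manifest_path):
    """Read a JSON-lines manifest back into a list of dicts, skipping blank lines."""
    with open(manifest_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

A single entry could look like `{"audio_filepath": "chunks/0001.wav", "duration": 4.2, "text": "", "pred_text": "..."}`, where the path and values are hypothetical.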

Scenario 2

You already have a predictions manifest and/or a long transcript, and want to run the matching step to obtain the correct chunk texts that replace the ASR predictions.

from vac_aligner.matching import ArmenianAlignerVAC

output_file = 'path/to/save/combined/transcript.txt'
predictions_manifest = 'path/to/manifest/with/predictions.json'
chunks, combined_transcript = ArmenianAlignerVAC.combine_transcript(predictions_manifest,
                                                                    output_file,
                                                                    ending_punctuations="․,։")

Then run the matching:

from vac_aligner.matching import ArmenianAlignerVAC

target_base = 'path/to/save/artifacts'
matches_sorted = ArmenianAlignerVAC(combined_transcript,
                                    chunks, output_file.replace(".txt", ".json"),
                                    target_base=target_base).align(0.35)
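The idea behind CER-based matching can be illustrated with a small standalone sketch: slide a window over the long transcript, score each window against a chunk's ASR prediction, and keep the best window only if its error rate is under a threshold. This is not the library's algorithm. The function names, the `slack` parameter, and the reading of `0.35` as a maximum CER are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def best_match(prediction: str, transcript: str,
               threshold: float = 0.35, slack: int = 2):
    """Return (start, end, cer) of the transcript window closest to `prediction`,
    or None if no window has a CER at or below `threshold`.

    Windows range from len(prediction) - slack to len(prediction) + slack
    characters, so small ASR insertions or deletions are tolerated.
    """
    n = len(prediction)
    best = None
    for start in range(len(transcript)):
        for length in range(max(1, n - slack), n + slack + 1):
            end = start + length
            if end > len(transcript):
                break
            window = transcript[start:end]
            # The transcript window plays the role of the reference text.
            score = edit_distance(window, prediction) / len(window)
            if best is None or score < best[2]:
                best = (start, end, score)
    if best is None or best[2] > threshold:
        return None
    return best
```

With the transcript `"the quick brown fox jumps over the lazy dog"` and the noisy prediction `"quikc brown fx"`, the selected window is `"quick brown fox"`, which can then replace the prediction as the chunk text. This brute-force search is quadratic in the transcript length and is meant only to show the principle; multi-hour transcripts need a more efficient search.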

Finally, compute the benchmark:

from vac_aligner.matching.benchmark_on_mcv import Benchmark

benchmark = Benchmark(target_base, predictions_manifest)
stats = benchmark.get_benchmark()
benchmark.analyze_and_save_benchmark(stats, output_file)

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.0.2 (2024-06-01)

  • Fix template errors (dates, package name, etc.)

  • Add usage examples (partial components of the pipeline) in README.md

0.0.1 (2024-05-31)

  • First release on PyPI.

  1. Design library structure

  2. Implement 2 main steps of the VAC
    • ASR

    • CER-based Matching
