Aligning very long audio and text pairs through the VAC pipeline

vac_aligner

VAC - VAD-ASR-CER (Matching) Pipeline

A pipeline for processing long audio recordings in three main stages: Voice Activity Detection (VAD), Automatic Speech Recognition (ASR), and Character Error Rate (CER) Matching. Its output is intended as accurately aligned data for improving speech recognition models and for training Text-to-Speech (TTS) systems.
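Since the third stage is built around CER, a short illustration of the metric may help. The sketch below uses the standard definition (character-level Levenshtein distance divided by the reference length); the function names `edit_distance` and `cer` are hypothetical and are not taken from vac_aligner.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Character-level Levenshtein distance, computed row by row."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        cur = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cur.append(min(
                prev[j] + 1,                           # reference char missing from hypothesis
                cur[j - 1] + 1,                        # extra char in hypothesis
                prev[j - 1] + (ref_char != hyp_char),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance normalised by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Two characters are missing from the hypothesis, out of 11 reference characters.
print(round(cer("hello world", "helo wrld"), 3))  # → 0.182
```

A CER of 0 means the two strings are identical; values can exceed 1 when the hypothesis is much longer than the reference.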

Features

  • Robust alignment for long audio sequences

  • Support for multiple languages (mostly Western)

Usage

The pipeline takes multi-hour audio recordings with their texts and produces many multi-second audio chunks with corresponding texts, at an accuracy of 97%. This accuracy comes from the matching step, which corrects common ASR errors such as repeated characters, incomplete words, and incorrect word predictions.
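The matching idea described above, finding the span of the long transcript that best fits a noisy ASR chunk, can be illustrated with approximate substring matching (a semi-global edit-distance alignment). This is only a minimal sketch under that assumption, not the package's actual algorithm; `best_span` and the example strings are hypothetical.

```python
def best_span(chunk: str, transcript: str):
    """Return (start, end, cost) of the transcript span closest to `chunk`.

    Semi-global alignment: the match may start and end anywhere in the
    transcript, so the first DP row costs nothing.
    """
    n = len(transcript)
    prev_cost = [0] * (n + 1)          # an empty chunk matches anywhere for free
    prev_start = list(range(n + 1))    # a match ending at j with no chars starts at j
    for i, ch in enumerate(chunk, start=1):
        cur_cost = [i] + [0] * n
        cur_start = [0] * (n + 1)
        for j in range(1, n + 1):
            sub = prev_cost[j - 1] + (ch != transcript[j - 1])
            skip_chunk = prev_cost[j] + 1     # chunk char with no counterpart
            skip_text = cur_cost[j - 1] + 1   # transcript char missing from chunk
            best = min(sub, skip_chunk, skip_text)
            cur_cost[j] = best
            # Carry along where the chosen alignment started.
            if best == sub:
                cur_start[j] = prev_start[j - 1]
            elif best == skip_chunk:
                cur_start[j] = prev_start[j]
            else:
                cur_start[j] = cur_start[j - 1]
        prev_cost, prev_start = cur_cost, cur_start
    end = min(range(n + 1), key=prev_cost.__getitem__)
    return prev_start[end], end, prev_cost[end]


transcript = "the quick brown fox jumps over the lazy dog"
asr_chunk = "brownn fox"  # ASR output with a repeated character
start, end, cost = best_span(asr_chunk, transcript)
print(transcript[start:end], cost)  # → brown fox 1
```

The recovered transcript span replaces the ASR prediction, which is how an error like the repeated character disappears from the final chunk text. The cost of this sketch grows with chunk length times transcript length, so a practical implementation would presumably restrict the search to a window of the transcript.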

The library can be used in a couple of ways. To run the full pipeline:

from vac_aligner import run_pipeline

run_pipeline(
    manifest_file="path/to/save/manifest.json",
    batch_size=64,
    asr_input_file="path/to/manifest.json",  # or a path to a folder containing the audio chunks (.wav files)
    target_base="path/where/to/save/artifacts",  # otherwise, `asr_input_file` is used
    init_aligner_after_asr=True,  # set when no long transcript is available and it has to be extracted
)

In many scenarios only part of the pipeline is needed. In that case the classes can be used directly.

Scenario 1

You need to run ASR over short audio chunks and store the predictions in a nemo_manifest.

import os

from vac_aligner.dto import ASRConfig
from vac_aligner.asr import ASR_MAPPING, ASR

language = "hy"  # Armenian
model_name = ASR_MAPPING[language] # "Yeroyan/stt_arm_conformer_ctc_large"
asr_config = ASRConfig(hf_token=os.environ['HF_TOKEN'], batch_size=24)
asr = ASR(model_name, asr_config)
asr.run(
    save_dir="where/to/save/asr/predictions",  # predictions are saved as .txt files
    save_manifest="path/to/save/manifest.json",
    wav_files="path/to/wav/files",
    test_manifest='path/to/manifest/with/predictions.json'
)
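For readers unfamiliar with the manifest format, the sketch below writes and reads a NeMo-style manifest: one JSON object per line. The field names `audio_filepath`, `duration`, and `pred_text` follow the usual NeMo convention; that vac_aligner uses exactly these fields is an assumption, and the helper names and file paths are hypothetical.

```python
import json
import os
import tempfile


def write_manifest(entries, path):
    """Write one JSON object per line (JSON Lines), keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_manifest(path):
    """Read a JSON Lines manifest back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical entries: paths, durations, and texts are placeholders.
entries = [
    {"audio_filepath": "chunks/chunk_0001.wav", "duration": 4.2, "pred_text": "բարև ձեզ"},
    {"audio_filepath": "chunks/chunk_0002.wav", "duration": 6.8, "pred_text": "ինչպես եք"},
]

manifest_path = os.path.join(tempfile.mkdtemp(), "manifest.json")
write_manifest(entries, manifest_path)
loaded = read_manifest(manifest_path)
print(len(loaded), loaded[0]["audio_filepath"])
```

Note that, despite the `.json` extension, such a file is not a single JSON document, so it has to be parsed line by line rather than with one `json.load` call.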

Scenario 2

You already have a predictions manifest and/or a long transcript, and want to run the matching step to obtain the correct chunk texts that replace the ASR predictions.

from vac_aligner.matching import ArmenianAlignerVAC

output_file = 'path/to/save/combined/transcript.txt'
predictions_manifest = 'path/to/manifest/with/predictions.json'
chunks, combined_transcript = ArmenianAlignerVAC.combine_transcript(predictions_manifest,
                                                                    output_file,
                                                                    ending_punctuations="․,։")

Then run the matching:

from vac_aligner.matching import ArmenianAlignerVAC

target_base = 'path/to/save/artifacts'
matches_sorted = ArmenianAlignerVAC(combined_transcript,
                                    chunks, output_file.replace(".txt", ".json"),
                                    target_base=target_base).align(0.35)

Finally, compute the benchmark:

from vac_aligner.matching.benchmark_on_mcv import Benchmark

benchmark = Benchmark(target_base, predictions_manifest)
stats = benchmark.get_benchmark()
benchmark.analyze_and_save_benchmark(stats, output_file)

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.0.2 (2024-06-01)

  • Fix template errors (dates, package name, etc.)

  • Add usage examples (partial components of the pipeline) in README.md

0.0.1 (2024-05-31)

  • First release on PyPI.

  1. Design library structure

  2. Implement 2 main steps of the VAC
    • ASR

    • CER based Matching
