Aligning very long audio and text pairs through the VAC pipeline
vac_aligner
VAC - VAD-ASR-CER (Matching) Pipeline
A pipeline for processing long audio recordings in three main stages: Voice Activity Detection (VAD), Automatic Speech Recognition (ASR), and Character Error Rate (CER) Matching. The aligned audio-text pairs it produces are intended for improving speech recognition models and training Text-to-Speech (TTS) systems.
Free software: Apache Software License 2.0
Documentation: https://vac-aligner.readthedocs.io
Features
Robust alignment for long audio sequences
Support for multiple languages (mostly Western)
Usage
The pipeline takes multi-hour audio recordings and their transcripts and produces many multi-second audio chunks with corresponding texts, with a reported accuracy of 97%. This accuracy comes from matching algorithms that correct common ASR errors such as repeated characters, incomplete words, and incorrect word predictions.
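The matching stage is built around the character error rate. As a minimal sketch (not the library's own implementation), CER can be computed as the Levenshtein distance between a reference text and an ASR hypothesis, divided by the reference length. The names `edit_distance` and `cer` are illustrative and are not part of vac_aligner.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # drop a character of ref
                curr[j - 1] + 1,     # add a character of hyp
                prev[j - 1] + cost,  # substitute, or keep on a match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of `hyp` measured against `ref`."""
    if not ref:
        # Convention chosen for this sketch: an empty reference only
        # matches an empty hypothesis.
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)
```

For example, `cer("hello", "helo")` counts one missing character out of five. A low CER between a chunk's ASR prediction and a span of the long transcript suggests that the span is the chunk's true text.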
There are a couple of ways to use the library. To run the full pipeline, call run_pipeline:
from vac_aligner import run_pipeline

run_pipeline(
    manifest_file="path/to/save/manifest.json",
    batch_size=64,
    asr_input_file="path/to/manifest.json",  # or a path to a folder containing audio chunks (.wav files)
    target_base="path/where/to/save/artifacts",  # otherwise, will use `asr_input_file`
    init_aligner_after_asr=True,  # if no long transcript is available and you need to extract it
)
Often only part of the pipeline is needed. In that case, the classes can be used directly.
Scenario 1
You need to perform ASR over short chunks and store the predictions in a nemo_manifest.
import os

from vac_aligner.dto import ASRConfig
from vac_aligner.asr import ASR_MAPPING, ASR

language = "hy"  # Armenian
model_name = ASR_MAPPING[language]  # "Yeroyan/stt_arm_conformer_ctc_large"

asr_config = ASRConfig(hf_token=os.environ['HF_TOKEN'], batch_size=24)
asr = ASR(model_name, asr_config)
asr.run(
    save_dir="where/to/save/asr/predictions",  # .txt(s)
    save_manifest="path/to/save/manifest.json",
    wav_files="path/to/wav/files",
    test_manifest='path/to/manifest/with/predictions.json',
)
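The manifest files above are presumably NeMo-style JSON-lines files, with one JSON object per line. The sketch below shows how such a file could be written and read back for inspection. The field names `audio_filepath`, `duration`, and `pred_text` follow common NeMo conventions and are an assumption here, as are the helper names `write_manifest` and `read_manifest`; check the files vac_aligner actually produces before relying on them.

```python
import json
import os
import tempfile


def write_manifest(entries, path):
    """Write one JSON object per line (JSON-lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            # ensure_ascii=False keeps non-Latin text (e.g. Armenian) readable.
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_manifest(path):
    """Read a JSON-lines manifest into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Assumed field names; the values are made up for illustration.
    entries = [
        {"audio_filepath": "chunks/0001.wav", "duration": 4.2, "pred_text": "first chunk"},
        {"audio_filepath": "chunks/0002.wav", "duration": 3.7, "pred_text": "second chunk"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        manifest_path = os.path.join(tmp, "manifest.json")
        write_manifest(entries, manifest_path)
        for entry in read_manifest(manifest_path):
            print(entry["audio_filepath"], entry["pred_text"])
```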
Scenario 2
You already have a predictions manifest and/or a long transcript, and want to run the matching step to obtain the correct chunk texts that replace the ASR predictions.
from vac_aligner.matching import ArmenianAlignerVAC

output_file = 'path/to/save/combined/transcript.txt'
predictions_manifest = 'path/to/manifest/with/predictions.json'

chunks, combined_transcript = ArmenianAlignerVAC.combine_transcript(
    predictions_manifest,
    output_file,
    ending_punctuations="․,։",
)
Then run the matching:
from vac_aligner.matching import ArmenianAlignerVAC

target_base = 'path/to/save/artifacts'
matches_sorted = ArmenianAlignerVAC(
    combined_transcript,
    chunks,
    output_file.replace(".txt", ".json"),
    target_base=target_base,
).align(0.35)
Finally, compute the benchmark:
from vac_aligner.matching.benchmark_on_mcv import Benchmark
benchmark = Benchmark(target_base, predictions_manifest)
stats = benchmark.get_benchmark()
benchmark.analyze_and_save_benchmark(stats, output_file)
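To give an intuition for the matching step in Scenario 2, here is a simplified sketch of CER-based matching. It is not the algorithm used by ArmenianAlignerVAC: it walks through the long transcript word by word, tries a few candidate spans for each chunk prediction, and keeps the span with the lowest CER if it falls under a threshold. Reading the `0.35` passed to `align` as such a CER threshold is an assumption, and `align_chunks`, `slack`, and the other names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of `hyp` against `ref`."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def align_chunks(predictions, transcript, threshold=0.35, slack=3):
    """For each ASR prediction, pick the transcript span with the lowest CER.

    Returns one entry per prediction: the matched transcript text, or None
    when no candidate span scores at or below `threshold`.
    """
    words = transcript.split()
    cursor = 0  # chunks are assumed to appear in transcript order
    results = []
    for pred in predictions:
        n = len(pred.split())
        best = None  # (score, start, end)
        # Allow the span to start a few words late and to be a bit
        # shorter or longer than the prediction.
        for start in range(cursor, min(cursor + slack, len(words)) + 1):
            for length in range(max(1, n - slack), n + slack + 1):
                end = start + length
                if end > len(words):
                    break
                score = cer(" ".join(words[start:end]), pred)
                if best is None or score < best[0]:
                    best = (score, start, end)
        if best is not None and best[0] <= threshold:
            results.append(" ".join(words[best[1]:best[2]]))
            cursor = best[2]  # continue after the matched span
        else:
            results.append(None)  # leave the chunk unmatched
    return results


if __name__ == "__main__":
    transcript = "the quick brown fox jumps over the lazy dog"
    predictions = ["the quik brown fox", "jumps ovr the lazy dog"]
    for pred, match in zip(predictions, align_chunks(predictions, transcript)):
        print(repr(pred), "->", repr(match))
```

The misspelled predictions are replaced by the corresponding transcript spans, which is the effect the matching stage aims for. A real implementation has to cope with much longer transcripts and with drift after unmatched chunks, which this sketch ignores.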
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.0.2 (2024-06-01)
- Fix template errors (dates, package name, etc.)
- Add usage examples (partial components of the pipeline) in README.md
0.0.1 (2024-05-31)
- First release on PyPI.
- Design library structure
- Implement 2 main steps of the VAC
  - ASR
  - CER-based Matching