TorchAudio Forced Aligner

torchfa

torchfa is a Python package for forced alignment of audio files using Torchaudio's MMS model. It aligns audio with a text transcript and reports a start time, duration, and score for each word (each character, in the Chinese example below), which makes it useful for speech analysis, subtitling, and other applications that need accurate speech-text synchronization.

Features

  • High-accuracy forced alignment using Torchaudio's MMS model
  • Support for both Chinese and English text
  • Batch processing capabilities for multiple audio files
  • Output aligned segments in various formats including TextGrid

Installation

pip install torchfa

Usage

Basic Usage

from torchfa import TorchaudioForcedAligner

aligner = TorchaudioForcedAligner()

audio = "assets/clean_speech.wav"
transcript = "关服务高端产品仍处于供不应求的局面"
cut = aligner.align_audios(audio, transcript)

# Save aligned audio segments
cut.trim_to_alignments("word").save_audios("./")

# Print alignment results
for alignment in cut.supervisions[0].alignment["word"]:
    print(alignment)

Output (start and duration are in seconds):

AlignmentItem(symbol='关', start=0.02, duration=0.121, score=0.21)
AlignmentItem(symbol='服', start=0.241, duration=0.141, score=0.07)
AlignmentItem(symbol='务', start=0.502, duration=0.101, score=0.49)
AlignmentItem(symbol='高', start=0.724, duration=0.181, score=0.97)
AlignmentItem(symbol='端', start=0.945, duration=0.141, score=0.52)
AlignmentItem(symbol='产', start=1.126, duration=0.201, score=0.81)
AlignmentItem(symbol='品', start=1.367, duration=0.141, score=0.35)
AlignmentItem(symbol='仍', start=1.608, duration=0.201, score=0.89)
AlignmentItem(symbol='处', start=1.869, duration=0.121, score=0.72)
AlignmentItem(symbol='于', start=2.09, duration=0.06, score=0.96)
AlignmentItem(symbol='供', start=2.251, duration=0.161, score=0.95)
AlignmentItem(symbol='不', start=2.452, duration=0.06, score=0.69)
AlignmentItem(symbol='应', start=2.573, duration=0.161, score=0.63)
AlignmentItem(symbol='求', start=2.754, duration=0.141, score=0.95)
AlignmentItem(symbol='的', start=2.935, duration=0.08, score=0.99)
AlignmentItem(symbol='局', start=3.075, duration=0.101, score=0.98)
AlignmentItem(symbol='面', start=3.256, duration=0.221, score=0.94)
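Each item carries a start and a duration rather than an end time, and a per-symbol score. The sketch below shows one way to derive end times and to flag low-scoring symbols for manual review. It uses a stand-in `Item` tuple that only mirrors the fields printed above; `Item`, `to_intervals`, `low_confidence`, and the 0.5 threshold are illustrative assumptions, not part of torchfa.

```python
from typing import NamedTuple, Optional


class Item(NamedTuple):
    """Stand-in mirroring the printed fields (hypothetical, not torchfa's class)."""
    symbol: str
    start: float
    duration: float
    score: Optional[float] = None


def to_intervals(items):
    """Return (symbol, start, end) triples, with end = start + duration."""
    return [(it.symbol, it.start, round(it.start + it.duration, 3)) for it in items]


def low_confidence(items, threshold=0.5):
    """Return symbols whose score is below the (arbitrary) threshold."""
    return [it.symbol for it in items if it.score is not None and it.score < threshold]


# First four items from the output above.
items = [
    Item("关", 0.02, 0.121, 0.21),
    Item("服", 0.241, 0.141, 0.07),
    Item("务", 0.502, 0.101, 0.49),
    Item("高", 0.724, 0.181, 0.97),
]

print(to_intervals(items))
print(low_confidence(items))
```

With real results, the same functions should work on `cut.supervisions[0].alignment["word"]` as long as the items expose the same four attributes.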

Saving to TextGrid Format

from torchfa import TorchaudioForcedAligner
from torchfa.utils import save_text_grid

aligner = TorchaudioForcedAligner()

audio = "assets/clean_speech.wav"
transcript = "关服务高端产品仍处于供不应求的局面"
cut = aligner.align_audios(audio, transcript)

# Save as TextGrid file
save_text_grid(cut.supervisions[0].alignment["word"], "output.TextGrid", "long")
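For subtitling, the same alignment data can also be written as SRT cues. The following is a minimal sketch, not a torchfa feature: `srt_timestamp` and `to_srt` are hypothetical helpers, the input is assumed to be plain `(symbol, start, duration)` tuples in seconds, and the fixed-size grouping is a simplification (a real subtitler would likely split on pauses or punctuation).

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(items, max_symbols=8, joiner=""):
    """Group consecutive (symbol, start, duration) tuples into numbered SRT cues.

    Use joiner="" for Chinese characters and joiner=" " for English words.
    """
    cues = []
    for i in range(0, len(items), max_symbols):
        group = items[i:i + max_symbols]
        start = group[0][1]
        end = group[-1][1] + group[-1][2]  # end of the last symbol in the cue
        text = joiner.join(symbol for symbol, _, _ in group)
        cues.append(
            f"{len(cues) + 1}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(cues)


# First three items from the Basic Usage output.
items = [("关", 0.02, 0.121), ("服", 0.241, 0.141), ("务", 0.502, 0.101)]
print(to_srt(items, max_symbols=2))
```

To feed it real results, convert each alignment item to a tuple first, for example `[(a.symbol, a.start, a.duration) for a in cut.supervisions[0].alignment["word"]]`.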

Batch Processing

from torchfa import TorchaudioForcedAligner

aligner = TorchaudioForcedAligner(batch_size=4)  # Process 4 files at once

audio_paths = [
    "audio1.wav",
    "audio2.wav",
    "audio3.wav"
]
transcripts = [
    "This is the first transcript.",
    "This is the second transcript.",
    "This is the third transcript."
]

cuts = aligner.align_audios(audio_paths, transcripts)

for cut in cuts:
    for alignment in cut.supervisions[0].alignment["word"]:
        print(alignment)
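Batch alignment takes two parallel lists, so a missing or misordered entry silently pairs audio with the wrong transcript. How torchfa itself handles mismatched inputs is not stated here; the sketch below is a hypothetical pre-check (`pair_inputs` is not part of the package) that can run before calling `align_audios`.

```python
def pair_inputs(audio_paths, transcripts):
    """Validate parallel lists and return (path, transcript) pairs."""
    if len(audio_paths) != len(transcripts):
        raise ValueError(
            f"{len(audio_paths)} audio files but {len(transcripts)} transcripts"
        )
    pairs = []
    for path, text in zip(audio_paths, transcripts):
        cleaned = text.strip()
        if not cleaned:
            raise ValueError(f"empty transcript for {path}")
        pairs.append((path, cleaned))
    return pairs


pairs = pair_inputs(
    ["audio1.wav", "audio2.wav"],
    ["This is the first transcript.", " This is the second transcript. "],
)
for path, text in pairs:
    print(path, "->", text)
```

The validated lists can then be rebuilt with `paths, texts = map(list, zip(*pairs))` and passed to `aligner.align_audios(paths, texts)`.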

License

MIT
