IISY ASR Pipeline

An automatic speech recognition (ASR) pipeline with speech enhancement, transcription, and speaker identification capabilities.

Overview

The IISY ASR Pipeline processes live audio input in real time by chaining three processing stages:

  1. Speech Enhancement - Using DeepFilterNet to improve audio quality
  2. Speech Transcription - Converting speech to text with Faster Whisper
  3. Speaker Identification - Identifying speakers using SpeechBrain models
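
Conceptually, the three stages form a linear chain: each captured audio segment is enhanced, then transcribed, then attributed to a speaker. A minimal sketch of that flow, with placeholder functions standing in for the real DeepFilterNet, Faster Whisper, and SpeechBrain wrappers (these function names are illustrative, not the package's API):

```python
# Illustrative stand-ins for the three pipeline stages; the real
# implementations wrap DeepFilterNet, Faster Whisper, and SpeechBrain.
def enhance(audio: list[float]) -> list[float]:
    """Stage 1: speech enhancement (noise suppression)."""
    return audio  # placeholder: pass audio through unchanged

def transcribe(audio: list[float]) -> str:
    """Stage 2: speech-to-text."""
    return "<transcript>"  # placeholder transcript

def identify_speaker(audio: list[float]) -> str:
    """Stage 3: speaker identification."""
    return "<speaker>"  # placeholder speaker label

def run_stages(audio: list[float]) -> dict:
    """Chain the three stages over one audio segment."""
    clean = enhance(audio)
    return {"text": transcribe(clean), "speaker": identify_speaker(clean)}

result = run_stages([0.0] * 16000)  # one second of audio at 16 kHz
```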

Installation

Requirements

  • Python 3.11.10
  • CUDA-compatible GPU (recommended for optimal performance)

Installing from PyPI

You can install the package directly from PyPI:

# For CPU-only installation
pip install iisy-asr-pipeline

# For GPU support (CUDA)
pip install iisy-asr-pipeline[cuda]

For GPU support, you'll need to manually install the CUDA-compatible version of PyTorch first:

# Install CUDA-compatible PyTorch (example for CUDA 11.8)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118

# Then install the package with CUDA support
pip install iisy-asr-pipeline[cuda]

You can adjust the CUDA version (cu117, cu118, cu121, etc.) based on your system's requirements.

Usage

Listing Available Audio Devices

Before running the pipeline, you may want to identify the correct audio input device:

python -m iisy.run_pipeline --list-devices

Basic Usage

Run the ASR pipeline with default settings:

python -m iisy.run_pipeline --input-device-index 1

Command Line Options

The pipeline can be customized with various command line arguments:

python -m iisy.run_pipeline [OPTIONS]

Device Settings

  • --device - Device to run models on (cuda or cpu, default: cuda if available, otherwise cpu)
  • --input-device-index - Input audio device index (default: 1)
  • --list-devices - List all available audio devices and exit

Audio Parameters

  • --chunk-size - Number of audio frames per buffer (default: 2048)
  • --channels - Number of audio channels (1=mono, 2=stereo, default: 1)
  • --buffer-size - Size of the audio buffer (default: 1000)
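
For scale: at the 16 kHz sample rate used in the examples below, one 2048-frame chunk covers 2048 / 16000 = 0.128 s of audio, so a 1000-chunk buffer spans roughly two minutes. The behavior of a fixed-size chunk buffer like the package's ContextWindow can be sketched with a deque (an assumption about its semantics, not its actual implementation):

```python
from collections import deque

class RingBuffer:
    """Fixed-capacity FIFO buffer of audio chunks, sketching what a
    bounded context window might look like. Once capacity is reached,
    the oldest chunk is discarded on each add."""

    def __init__(self, capacity: int):
        self._chunks = deque(maxlen=capacity)

    def add(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def __len__(self) -> int:
        return len(self._chunks)

buf = RingBuffer(3)
for i in range(5):
    buf.add(bytes([i]))
# capacity is 3, so only the 3 most recent chunks remain
```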

Model Parameters

  • --whisper-model - Whisper model size (tiny, base, small, medium, large, turbo, default: medium)
  • --speaker-model - Speaker identification model path (default: speechbrain/spkrec-resnet-voxceleb)

Silence Detection Parameters

  • --silence-threshold - Energy threshold for silence detection (default: 0.01)
  • --min-silence-duration - Minimum duration of silence for sentence boundary in seconds (default: 2.0)
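
A sketch of how these two parameters typically interact in energy-based endpointing: each chunk's RMS energy is compared against the silence threshold, and a sentence boundary is declared once consecutive quiet chunks accumulate to the minimum silence duration. This is a generic illustration, not the package's actual detector:

```python
def rms(chunk: list[float]) -> float:
    """Root-mean-square energy of one chunk of float samples."""
    return (sum(x * x for x in chunk) / len(chunk)) ** 0.5

def find_boundaries(chunks, chunk_duration=0.128,
                    silence_threshold=0.01, min_silence_duration=2.0):
    """Yield the index of each chunk at which accumulated silence
    first reaches the minimum duration (a sentence boundary)."""
    silent_time = 0.0
    for i, chunk in enumerate(chunks):
        if rms(chunk) < silence_threshold:
            silent_time += chunk_duration
            if silent_time >= min_silence_duration:
                yield i
                silent_time = 0.0  # reset after declaring a boundary
        else:
            silent_time = 0.0  # speech resets the silence counter

# 5 loud chunks followed by 20 quiet ones (~2.56 s of silence,
# which crosses the 2.0 s minimum exactly once)
chunks = [[0.5] * 4] * 5 + [[0.0] * 4] * 20
print(list(find_boundaries(chunks)))
```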

Other Parameters

  • --speaker-threshold - Threshold for speaker identification (default: 0.55)
  • --verbose - Enable verbose logging
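
Speaker verification models such as the SpeechBrain ones used here score pairs of speaker embeddings with cosine similarity, and --speaker-threshold decides when a score counts as a match. A toy illustration with 3-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_same_speaker(emb_a, emb_b, threshold=0.55):
    """Accept the match only when similarity clears the threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold

enrolled = [0.9, 0.1, 0.2]       # toy embedding of a known speaker
probe_close = [0.8, 0.2, 0.1]    # similar voice: high similarity
probe_far = [-0.5, 0.9, 0.0]     # different voice: low similarity
```

Raising the threshold makes matches stricter (fewer false accepts, more false rejects); lowering it does the opposite.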

Example Commands

Run with a larger Whisper model for better transcription accuracy:

python -m iisy.run_pipeline --whisper-model large

Use a different microphone (device index 2) and enable verbose logging:

python -m iisy.run_pipeline --input-device-index 2 --verbose

Use ECAPA-TDNN model for speaker identification:

python -m iisy.run_pipeline --speaker-model speechbrain/spkrec-ecapa-voxceleb

Advanced Usage

Programmatic Integration

You can integrate the ASR pipeline into your own Python applications:

import threading
import pyaudio
import torch
from iisy.context_window import ContextWindow
from iisy.pipeline.asr_pipeline import AsrPipeline

# Initialize audio capture
p = pyaudio.PyAudio()
in_stream = p.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=16000,
    input=True,
    input_device_index=1,
    frames_per_buffer=2048
)

# Create audio buffer
audio_buffer = ContextWindow(1000)

# Configure pipeline
pipeline_config = {
    "speaker": {
        "model": "speechbrain/spkrec-resnet-voxceleb",
        "savedir": "spkrec-resnet-voxceleb",
        "speaker_threshold": 0.55
    },
    "whisper": {
        "model_size": "medium",
        "device_index": 0,
        "compute_type": "float16"
    }
}

# Create pipeline
pipeline = AsrPipeline(
    input_sr=16000,
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
    min_silence_duration=2.0,
    verbose=True,
    **pipeline_config
)

# Set up audio capture
def audio_capture():
    while True:
        try:
            audio_data = in_stream.read(2048, exception_on_overflow=False)
            audio_buffer.add(audio_data)
        except Exception as e:
            print(f"Audio capture error: {e}")
            break

# Start capture thread
capture_thread = threading.Thread(target=audio_capture, daemon=True)
capture_thread.start()

# Run pipeline
try:
    pipeline.run(audio_buffer)
finally:
    in_stream.stop_stream()
    in_stream.close()
    p.terminate()

Custom Processing Steps

You can customize each processing step of the pipeline:

from iisy.pipeline.speech_enhancement_step import SpeechEnhancementStep
from iisy.pipeline.speech_transcription_step import SpeechTranscriptionStep
from iisy.pipeline.speaker_identification_step import SpeakerIdentificationStep

# Create custom steps
enhancement_step = SpeechEnhancementStep(...)
transcription_step = SpeechTranscriptionStep(...)
identification_step = SpeakerIdentificationStep(...)

# Create pipeline with custom steps
pipeline = AsrPipeline(
    enhancement_step=enhancement_step,
    transcription_step=transcription_step,
    identification_step=identification_step
)

Troubleshooting

Common Issues

  1. Audio device not found: Verify your input device with --list-devices and select the correct index.

  2. CUDA out of memory: Try using a smaller Whisper model (--whisper-model small or --whisper-model base).

  3. Poor transcription quality: Try the following:

    • Use a larger Whisper model
    • Ensure your microphone is positioned correctly and close to the speaker
    • Adjust --min-silence-duration for better sentence boundaries

  4. Speaker identification issues: Adjust the --speaker-threshold value; higher values require more confidence before a speaker is matched.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

This project builds on several open-source libraries, including DeepFilterNet, Faster Whisper, SpeechBrain, PyAudio, and PyTorch.
