Python input audio.

input_audio

Real-time mic recording to WAV with optional VAD segmentation and noise reduction.

Millisecond-based API: the buffer duration (buffer_size / sample_rate, in seconds) must come out to a whole number of milliseconds. For example, 512 samples at 16000 Hz is exactly 32 ms. Audio is processed as float32 and written as 16-bit PCM WAV.

Features

  • Voice Activity Detection (VAD): Detect speech start/end and emit segments
  • Noise Reduction (NR): Built-in denoiser for cleaner audio
  • Streaming to File: Continuous WAV writing with periodic processing
  • Millisecond-first API: Simple timing controls using milliseconds
  • Observability: Uses logging for friendly debug output when enabled

Installation

pip install input-audio

Or install from source:

git clone https://github.com/allen2c/input_audio.git
cd input_audio
pip install -e .

Quick Start

Record to a WAV file for 5 seconds:

from input_audio import input_audio

input_audio(
    "./recordings/quick.wav",
    max_recording_duration_ms=5000,
)

Enable noise reduction and verbose logging:

import logging
from input_audio import input_audio, NoiseReductionConfig

logging.basicConfig(level=logging.INFO)

input_audio(
    "./recordings/nr.wav",
    enable_noise_reduction=True,
    noise_reduction_config=NoiseReductionConfig(prop_decrease=0.8),
    max_recording_duration_ms=5000,
    verbose=True,
)

Enable VAD, collect segments into a queue, and also save segments to a folder:

import queue
from input_audio import input_audio, VADConfig

segments_q: queue.Queue = queue.Queue()

input_audio(
    "./recordings/full.wav",
    enable_vad=True,
    vad_config=VADConfig(pre_speech_padding_ms=300, post_speech_padding_ms=500),
    vad_segments_queue=segments_q,
    vad_dirpath="./tmp_vad",  # optional: segment WAVs written here
    max_recording_duration_ms=10000,
)

# Read emitted segments from the queue (each item has start_ms, end_ms, audio_url)
while not segments_q.empty():
    seg = segments_q.get()
    print(seg.start_ms, seg.end_ms, len(seg.audio_url.data))
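
The loop above reads segments once recording has finished. To handle segments while recording is still running in another thread, a polling consumer is one option. The following is a minimal sketch using only the standard library; consume_segments, done, and handle are hypothetical names that are not part of input_audio, and done is assumed to be a threading.Event that the caller sets after input_audio(...) returns.

```python
import queue
import threading
from typing import Any, Callable


def consume_segments(
    q: "queue.Queue[Any]",
    done: threading.Event,
    handle: Callable[[Any], None],
    poll_s: float = 0.1,
) -> int:
    """Pass queued items to `handle` until `done` is set and the queue is empty.

    Returns the number of items handled.
    """
    count = 0
    while True:
        try:
            # Wait briefly so the loop can notice when recording has finished.
            seg = q.get(timeout=poll_s)
        except queue.Empty:
            if done.is_set():
                return count
            continue
        handle(seg)
        count += 1
```

Run it in its own thread with vad_segments_queue as q, and set done once the recording call has returned so the consumer exits after the last segment.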

Customize audio settings (16 kHz mono, 512-sample buffer, batch processing every 320 ms):

from input_audio import input_audio, AudioConfig

cfg = AudioConfig(
    sample_rate=16000,
    channels=1,
    buffer_size=512,
    batch_process_ms=320,
    gain_db=20.0,
)

input_audio(
    "./recordings/custom.wav",
    audio_config=cfg,
    max_recording_duration_ms=5000,
)

Notes:

  • input_audio(...) returns b""; the primary outputs are the continuously written WAV file and (optionally) VAD segments.
  • Timing constraints are enforced and will raise ValueError if violated.
  • NR order: noise reduction is applied before gain for consistent loudness.
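
To make the gain and output format concrete: a gain in decibels maps to an amplitude factor of 10^(gain_db / 20), so gain_db=20.0 multiplies samples by 10. The sketch below shows the standard dB-to-linear conversion followed by clipping and float-to-16-bit conversion. It illustrates the general technique only; the library's own scaling, clipping, and rounding are not documented here, and db_to_linear and to_pcm16 are hypothetical helper names.

```python
import struct
from typing import Iterable


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def to_pcm16(samples: Iterable[float], gain_db: float = 0.0) -> bytes:
    """Apply gain to float samples in [-1.0, 1.0] and pack as little-endian int16.

    Assumption: symmetric scaling by 32767 with hard clipping; the library
    may scale or round differently.
    """
    factor = db_to_linear(gain_db)
    ints = []
    for s in samples:
        v = max(-1.0, min(1.0, s * factor))  # clip after gain to avoid wrap-around
        ints.append(int(round(v * 32767)))
    return struct.pack(f"<{len(ints)}h", *ints)
```

With a 20 dB gain, any input sample above 0.1 in magnitude clips at full scale, which is one reason to lower gain_db for loud sources.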

API Reference (v0.2.0)

input_audio(
    output_audio_filepath: str | Path,
    *,
    audio_config: Optional[AudioConfig] = None,
    enable_vad: bool = False,
    vad_config: Optional[VADConfig] = None,
    vad_model: Optional[torch.nn.Module] = None,
    vad_segments_queue: Optional[queue.Queue[VADSegment]] = None,
    vad_dirpath: Optional[str | Path] = None,
    enable_noise_reduction: bool = False,
    noise_reduction_config: Optional[NoiseReductionConfig] = None,
    stop_event: Optional[threading.Event] = None,
    max_recording_duration_ms: int = 60000,
    verbose: bool = False,
) -> bytes
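
The stop_event parameter suggests that recording can be ended early from another thread. The sketch below assumes that setting the event makes input_audio return before max_recording_duration_ms elapses; this README does not spell out that behavior. start_recording is a hypothetical helper, and the recording function is passed in as an argument so the helper itself depends only on the standard library.

```python
import threading
from typing import Any, Callable, Tuple


def start_recording(
    record_fn: Callable[..., Any],
    output_path: str,
    **kwargs: Any,
) -> Tuple[threading.Event, threading.Thread]:
    """Run `record_fn(output_path, stop_event=..., **kwargs)` in a background thread.

    Returns the stop event and the worker thread so the caller can end the
    recording and wait for it to finish.
    """
    stop_event = threading.Event()
    worker = threading.Thread(
        target=record_fn,
        args=(output_path,),
        kwargs={**kwargs, "stop_event": stop_event},
        daemon=True,
    )
    worker.start()
    return stop_event, worker
```

Usage would look like stop, worker = start_recording(input_audio, "./recordings/bg.wav"), followed later by stop.set() and worker.join().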

Key models:

AudioConfig(
    format=pyaudio.paInt16,  # 16‑bit PCM
    channels=1,
    sample_rate=16000,
    buffer_size=512,
    rolling_working_audio_buffer_ms=5000,
    batch_process_ms=320,
    gain_db=20.0,
)

VADConfig(
    threshold=0.5,
    pre_speech_padding_ms=300,
    post_speech_padding_ms=500,
)

NoiseReductionConfig(
    sample_rate=16000,
    stationary=True,
    prop_decrease=0.8,
    n_std_thresh_stationary=1.5,
    n_fft=1024,
)

Constraints:

  • buffer_size * 1000 % sample_rate == 0 (buffer duration must be whole ms)
  • batch_process_ms must be a multiple of the buffer duration (ms)
  • AudioConfig.sample_rate must match NoiseReductionConfig.sample_rate
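
These constraints can be checked before starting a recording. The following is a minimal sketch of the arithmetic, not the library's own validation code; check_timing and its parameter nr_sample_rate are hypothetical names.

```python
from typing import Optional


def check_timing(
    sample_rate: int,
    buffer_size: int,
    batch_process_ms: int,
    nr_sample_rate: Optional[int] = None,
) -> int:
    """Validate the timing constraints and return the number of buffers per batch."""
    if sample_rate <= 0 or buffer_size <= 0 or batch_process_ms <= 0:
        raise ValueError("sample_rate, buffer_size and batch_process_ms must be positive")
    # Buffer duration must be a whole number of milliseconds.
    if (buffer_size * 1000) % sample_rate != 0:
        raise ValueError("buffer duration is not a whole number of milliseconds")
    buffer_ms = buffer_size * 1000 // sample_rate
    # Batch interval must be a multiple of the buffer duration.
    if batch_process_ms % buffer_ms != 0:
        raise ValueError("batch_process_ms is not a multiple of the buffer duration")
    # Noise reduction, if configured, must use the same sample rate.
    if nr_sample_rate is not None and nr_sample_rate != sample_rate:
        raise ValueError("noise reduction sample rate does not match audio sample rate")
    return batch_process_ms // buffer_ms
```

With the defaults (16000 Hz, 512 samples, 320 ms) the buffer lasts 32 ms and each batch covers 10 buffers. A 512-sample buffer at 44100 Hz fails the first check because 512000 is not divisible by 44100.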

Changelog — v0.2.0

Breaking changes:

  • Renamed VADConfig.keep_before_speech_ms → pre_speech_padding_ms
  • Renamed VADConfig.keep_after_speech_ms → post_speech_padding_ms
  • Removed VADConfig.sample_rate and VADConfig.buffer_size (VAD shares audio settings)

Behavioral and quality updates:

  • Apply noise reduction before gain (consistent final loudness)
  • Replace prints with logging.getLogger(__name__)
  • Enforce timing constraints with clear error messages
  • Integer-safe latency checks; fade-in/out uses float32 consistently

Requirements

  • Python 3.11+
  • PyAudio (microphone access)
  • PyTorch and torchaudio (VAD model, WAV encoding)
  • NumPy, noisereduce, silero_vad
  • See requirements.txt for full list

License

MIT License — see LICENSE for details.
