Python input audio.

input_audio

Real-time mic recording to WAV with optional VAD segmentation and noise reduction.

Millisecond-based API: the buffer duration (buffer_size / sample_rate) must be a whole number of milliseconds. Audio is processed as float32 and written as 16-bit PCM WAV.
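As a rough illustration of the float32 to 16-bit PCM step, the sketch below clips samples to [-1.0, 1.0] and scales them by 32767. The helper name float_to_pcm16 and the scaling constant are assumptions for illustration; the library's own conversion may differ.

```python
import struct


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes.

    Hypothetical helper, not part of input_audio.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip so out-of-range values do not wrap around
        ints.append(int(round(s * 32767)))
    # "<" = little-endian, "h" = signed 16-bit integer
    return struct.pack(f"<{len(ints)}h", *ints)
```

Each sample becomes two bytes, so one second of 16 kHz mono audio takes 32000 bytes of PCM data.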

Features

Voice Activity Detection (VAD): Detect speech start/end and emit segments
Noise Reduction (NR): Built-in denoiser for cleaner audio
Streaming to File: Continuous WAV writing with periodic processing
Millisecond-first API: Simple timing controls using milliseconds
Observability: Uses logging for friendly debug output when enabled

Installation

pip install input-audio

Or install from source:

git clone https://github.com/allen2c/input_audio.git
cd input_audio
pip install -e .

Quick Start

Record to a WAV file for 5 seconds:

from input_audio import input_audio

input_audio(
    "./recordings/quick.wav",
    max_recording_duration_ms=5000,
)

Enable noise reduction and verbose logging:

import logging
from input_audio import input_audio, NoiseReductionConfig

logging.basicConfig(level=logging.INFO)

input_audio(
    "./recordings/nr.wav",
    enable_noise_reduction=True,
    noise_reduction_config=NoiseReductionConfig(prop_decrease=0.8),
    max_recording_duration_ms=5000,
    verbose=True,
)

Enable VAD, collect segments into a queue, and also save segments to a folder:

import queue
from input_audio import input_audio, VADConfig

segments_q: queue.Queue = queue.Queue()

input_audio(
    "./recordings/full.wav",
    enable_vad=True,
    vad_config=VADConfig(pre_speech_padding_ms=300, post_speech_padding_ms=500),
    vad_segments_queue=segments_q,
    vad_dirpath="./tmp_vad",  # optional: segment WAVs written here
    max_recording_duration_ms=10000,
)

# Read emitted segments from the queue (each item has start_ms, end_ms, audio_url)
while not segments_q.empty():
    seg = segments_q.get()
    print(seg.start_ms, seg.end_ms, len(seg.audio_url.data))
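If you prefer to collect segments without relying on empty(), a non-blocking drain is one option. This is a minimal sketch: drain_segments and total_speech_ms are hypothetical helpers, and they only assume that each queued item exposes start_ms and end_ms as described in the comment above.

```python
import queue


def drain_segments(q):
    """Return every item currently in the queue without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


def total_speech_ms(segments):
    """Sum the duration of all segments in milliseconds."""
    return sum(seg.end_ms - seg.start_ms for seg in segments)
```

Calling drain_segments(segments_q) after input_audio(...) returns gives a plain list that can be sorted, filtered, or summed.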

Customize audio settings (16 kHz mono, 512-sample buffer, batch processing every 320 ms):

from input_audio import input_audio, AudioConfig

cfg = AudioConfig(
    sample_rate=16000,
    channels=1,
    buffer_size=512,
    batch_process_ms=320,
    gain_db=20.0,
)

input_audio(
    "./recordings/custom.wav",
    audio_config=cfg,
    max_recording_duration_ms=5000,
)

Notes:

input_audio(...) returns b""; the primary outputs are the continuously written WAV file and (optionally) VAD segments.
Timing constraints are enforced and will raise ValueError if violated.
NR order: noise reduction is applied before gain for consistent loudness.
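For a sense of scale, a gain in decibels maps to a linear amplitude factor of 10^(dB/20) under the usual amplitude convention, so gain_db=20.0 would multiply samples by 10. The sketch below assumes that convention and clips the result to [-1.0, 1.0]; db_to_linear and apply_gain are hypothetical names, not the library's functions.

```python
def db_to_linear(gain_db):
    """Amplitude factor for a gain in decibels (assumes the 20*log10 convention)."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Scale float samples and clip them to [-1.0, 1.0]."""
    factor = db_to_linear(gain_db)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]
```

Running the denoiser first means it sees the signal at its recorded level, and the gain then scales the cleaned result.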

API Reference (v0.2.0)

input_audio(
    output_audio_filepath: str | Path,
    *,
    audio_config: Optional[AudioConfig] = None,
    enable_vad: bool = False,
    vad_config: Optional[VADConfig] = None,
    vad_model: Optional[torch.nn.Module] = None,
    vad_segments_queue: Optional[queue.Queue[VADSegment]] = None,
    vad_dirpath: Optional[str | Path] = None,
    enable_noise_reduction: bool = False,
    noise_reduction_config: Optional[NoiseReductionConfig] = None,
    stop_event: Optional[threading.Event] = None,
    max_recording_duration_ms: int = 60000,
    verbose: bool = False,
) -> bytes
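The signature accepts a stop_event, but its behavior is not described here. The sketch below assumes that setting the event ends the recording early, and shows one way to set it from a timer. record_with_deadline is a hypothetical helper; the recording call is passed in as a callable so the helper does not depend on input_audio itself.

```python
import threading


def record_with_deadline(record, timeout_s):
    """Call record(stop_event) and set the event after timeout_s seconds."""
    stop_event = threading.Event()
    timer = threading.Timer(timeout_s, stop_event.set)
    timer.daemon = True  # do not keep the interpreter alive for the timer
    timer.start()
    try:
        return record(stop_event)
    finally:
        timer.cancel()  # no-op if the timer has already fired


# Possible usage (assumes stop_event stops the recording when set):
# record_with_deadline(
#     lambda ev: input_audio("./recordings/stop.wav", stop_event=ev),
#     timeout_s=3.0,
# )
```

The same event could instead be set from a signal handler or a UI callback.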

Key models:

AudioConfig(
    format=pyaudio.paInt16,  # 16‑bit PCM
    channels=1,
    sample_rate=16000,
    buffer_size=512,
    rolling_working_audio_buffer_ms=5000,
    batch_process_ms=320,
    gain_db=20.0,
)

VADConfig(
    threshold=0.5,
    pre_speech_padding_ms=300,
    post_speech_padding_ms=500,
)

NoiseReductionConfig(
    sample_rate=16000,
    stationary=True,
    prop_decrease=0.8,
    n_std_thresh_stationary=1.5,
    n_fft=1024,
)

Constraints:

buffer_size * 1000 % sample_rate == 0 (buffer duration must be whole ms)
batch_process_ms must be a multiple of the buffer duration (ms)
AudioConfig.sample_rate must match NoiseReductionConfig.sample_rate
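The constraints above can be checked ahead of time with a few integer operations. This is a sketch of the stated rules only: check_timing is a hypothetical helper, and the library's own validation and error messages may differ.

```python
def check_timing(sample_rate, buffer_size, batch_process_ms, nr_sample_rate=None):
    """Validate the timing rules and return the buffer duration in ms."""
    if sample_rate <= 0 or buffer_size <= 0:
        raise ValueError("sample_rate and buffer_size must be positive")
    # The buffer duration must be a whole number of milliseconds.
    if (buffer_size * 1000) % sample_rate != 0:
        raise ValueError("buffer duration is not a whole number of milliseconds")
    buffer_ms = buffer_size * 1000 // sample_rate
    # A batch must span a whole number of buffers.
    if batch_process_ms % buffer_ms != 0:
        raise ValueError("batch_process_ms is not a multiple of the buffer duration")
    # The noise reduction sample rate, if given, must match the audio sample rate.
    if nr_sample_rate is not None and nr_sample_rate != sample_rate:
        raise ValueError("noise reduction sample_rate does not match")
    return buffer_ms
```

With the defaults (16000 Hz, 512 samples, 320 ms), the buffer lasts 32 ms and each batch covers 10 buffers.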

Changelog — v0.2.0

Breaking changes:

Renamed VADConfig.keep_before_speech_ms → pre_speech_padding_ms
Renamed VADConfig.keep_after_speech_ms → post_speech_padding_ms
Removed VADConfig.sample_rate and VADConfig.buffer_size (VAD shares audio settings)
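When migrating keyword arguments written for an earlier version, the renames and removals listed above can be applied mechanically. The sketch below works on a plain dict; migrate_vad_kwargs is a hypothetical helper and is not shipped with the package.

```python
# Field renames and removals taken from the breaking changes listed above.
RENAMED = {
    "keep_before_speech_ms": "pre_speech_padding_ms",
    "keep_after_speech_ms": "post_speech_padding_ms",
}
REMOVED = {"sample_rate", "buffer_size"}


def migrate_vad_kwargs(old):
    """Return a copy of old VADConfig kwargs using the v0.2.0 field names."""
    new = {}
    for key, value in old.items():
        if key in REMOVED:
            continue  # VAD now shares these settings with the audio configuration
        new[RENAMED.get(key, key)] = value
    return new
```

The result can then be passed as VADConfig(**migrate_vad_kwargs(old_kwargs)).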

Behavioral and quality updates:

Apply noise reduction before gain (consistent final loudness)
Replace prints with logging.getLogger(__name__)
Enforce timing constraints with clear error messages
Integer-safe latency checks; fade-in/out uses float32 consistently
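Because messages go through logging.getLogger(__name__), it should be possible to raise the log level for this package alone. The sketch below assumes the logger names start with "input_audio" (the import name); enable_package_debug is a hypothetical helper.

```python
import logging


def enable_package_debug(package="input_audio"):
    """Keep other loggers at WARNING and set one package's logger to DEBUG."""
    logging.basicConfig(level=logging.WARNING)  # no-op if logging is already configured
    logger = logging.getLogger(package)
    logger.setLevel(logging.DEBUG)
    return logger
```

Child loggers inherit the level from the package logger, so every module under it is covered by one call.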

Requirements

Python 3.11+
PyAudio (microphone access)
PyTorch and torchaudio (VAD model, WAV encoding)
NumPy, noisereduce, silero_vad
See requirements.txt for full list

License

MIT License — see LICENSE for details.

Release history

0.2.0: Aug 14, 2025 (this version)
0.1.0: May 24, 2025
0.0.1: May 24, 2025

