Python input audio.
input_audio
Real-time mic recording to WAV with optional VAD segmentation and noise reduction.
Millisecond-based API: the buffer duration (buffer_size / sample_rate) must be a whole number of milliseconds. Audio is processed in float32 and written as 16-bit PCM WAV.
Features
- Voice Activity Detection (VAD): Detect speech start/end and emit segments
- Noise Reduction (NR): Built-in denoiser for cleaner audio
- Streaming to File: Continuous WAV writing with periodic processing
- Millisecond-first API: Simple timing controls using milliseconds
- Observability: Uses logging for friendly debug output when enabled
Installation
pip install input-audio
Or install from source:
git clone https://github.com/allen2c/input_audio.git
cd input_audio
pip install -e .
Quick Start
Record to a WAV file for 5 seconds:
from input_audio import input_audio
input_audio(
    "./recordings/quick.wav",
    max_recording_duration_ms=5000,
)
Enable noise reduction and verbose logging:
import logging
from input_audio import input_audio, NoiseReductionConfig
logging.basicConfig(level=logging.INFO)
input_audio(
    "./recordings/nr.wav",
    enable_noise_reduction=True,
    noise_reduction_config=NoiseReductionConfig(prop_decrease=0.8),
    max_recording_duration_ms=5000,
    verbose=True,
)
Enable VAD, collect segments into a queue, and also save segments to a folder:
import queue
from input_audio import input_audio, VADConfig
segments_q: queue.Queue = queue.Queue()
input_audio(
    "./recordings/full.wav",
    enable_vad=True,
    vad_config=VADConfig(pre_speech_padding_ms=300, post_speech_padding_ms=500),
    vad_segments_queue=segments_q,
    vad_dirpath="./tmp_vad",  # optional: segment WAVs written here
    max_recording_duration_ms=10000,
)
# Read emitted segments from the queue (each item has start_ms, end_ms, audio_url)
while not segments_q.empty():
    seg = segments_q.get()
    print(seg.start_ms, seg.end_ms, len(seg.audio_url.data))
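Checking empty() before get() is fine once recording has finished, as above. If the queue is read while another thread may still be adding segments, a non-blocking drain is a little more robust. The sketch below illustrates that pattern only; FakeSegment is a hypothetical stand-in that carries just the start_ms and end_ms fields mentioned above and is not the library's segment class.

```python
import queue
from dataclasses import dataclass


@dataclass
class FakeSegment:
    """Hypothetical stand-in for a VAD segment (not the library's class)."""
    start_ms: int
    end_ms: int


def drain(q: queue.Queue) -> list:
    """Remove and return every item currently in the queue without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


q: queue.Queue = queue.Queue()
q.put(FakeSegment(start_ms=0, end_ms=1200))
q.put(FakeSegment(start_ms=2500, end_ms=4100))

# Duration of each drained segment in milliseconds.
durations_ms = [seg.end_ms - seg.start_ms for seg in drain(q)]
print(durations_ms)
```

The same drain helper could be called periodically from a consumer thread while recording is still in progress.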
Customize audio settings (16kHz mono, 512 buffer, batch processing every 320ms):
from input_audio import input_audio, AudioConfig
cfg = AudioConfig(
    sample_rate=16000,
    channels=1,
    buffer_size=512,
    batch_process_ms=320,
    gain_db=20.0,
)
input_audio(
    "./recordings/custom.wav",
    audio_config=cfg,
    max_recording_duration_ms=5000,
)
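The numbers in this configuration can be checked by hand. The sketch below works through the arithmetic. The decibel-to-linear conversion uses the standard amplitude convention (factor = 10 ** (dB / 20)); that is an assumption about how gain_db is applied, not something this page states.

```python
sample_rate = 16000
buffer_size = 512
batch_process_ms = 320
gain_db = 20.0

# Duration of one buffer in milliseconds: 512 samples at 16 kHz.
buffer_ms = buffer_size * 1000 // sample_rate

# Number of buffers consumed by each processing batch.
buffers_per_batch = batch_process_ms // buffer_ms

# Assumed amplitude convention: linear factor = 10 ** (dB / 20).
linear_gain = 10 ** (gain_db / 20)

print(buffer_ms, buffers_per_batch, linear_gain)
```

Under these assumptions each buffer lasts 32 ms, a batch spans 10 buffers, and 20 dB corresponds to a tenfold amplitude increase.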
Notes:
- input_audio(...) returns b""; the primary outputs are the continuously written WAV file and (optionally) VAD segments.
- Timing constraints are enforced and will raise ValueError if violated.
- NR order: noise reduction is applied before gain for consistent loudness.
API Reference (v0.2.0)
input_audio(
    output_audio_filepath: str | Path,
    *,
    audio_config: Optional[AudioConfig] = None,
    enable_vad: bool = False,
    vad_config: Optional[VADConfig] = None,
    vad_model: Optional[torch.nn.Module] = None,
    vad_segments_queue: Optional[queue.Queue[VADSegment]] = None,
    vad_dirpath: Optional[str | Path] = None,
    enable_noise_reduction: bool = False,
    noise_reduction_config: Optional[NoiseReductionConfig] = None,
    stop_event: Optional[threading.Event] = None,
    max_recording_duration_ms: int = 60000,
    verbose: bool = False,
) -> bytes
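The stop_event parameter suggests that a recording can be ended from another thread. The following is a sketch of one way to wire that up with a timer. It assumes the recording call returns soon after the event is set, which the signature implies but this page does not confirm. fake_record is a hypothetical stand-in so the example runs without a microphone; in real use it would be replaced by a call such as input_audio(path, stop_event=stop_event).

```python
import threading
import time


def fake_record(stop_event: threading.Event, max_ms: int) -> float:
    """Hypothetical stand-in for a recording call: waits until stopped or timed out."""
    started = time.monotonic()
    stop_event.wait(timeout=max_ms / 1000)
    return time.monotonic() - started


stop_event = threading.Event()

# Request a stop after 0.2 s from a background timer thread.
timer = threading.Timer(0.2, stop_event.set)
timer.start()

elapsed_s = fake_record(stop_event, max_ms=5000)
timer.join()

print(f"stopped after about {elapsed_s:.1f} s, event set: {stop_event.is_set()}")
```

Keeping max_recording_duration_ms as an upper bound alongside the event means the call still ends if the stop signal never arrives.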
Key models:
AudioConfig(
    format=pyaudio.paInt16,  # 16-bit PCM
    channels=1,
    sample_rate=16000,
    buffer_size=512,
    rolling_working_audio_buffer_ms=5000,
    batch_process_ms=320,
    gain_db=20.0,
)
VADConfig(
    threshold=0.5,
    pre_speech_padding_ms=300,
    post_speech_padding_ms=500,
)
NoiseReductionConfig(
    sample_rate=16000,
    stationary=True,
    prop_decrease=0.8,
    n_std_thresh_stationary=1.5,
    n_fft=1024,
)
Constraints:
- buffer_size * 1000 % sample_rate == 0 (buffer duration must be whole ms)
- batch_process_ms must be a multiple of the buffer duration (ms)
- AudioConfig.sample_rate must match NoiseReductionConfig.sample_rate
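These constraints can be checked before a recording is started. The function below is a minimal sketch of the three rules as listed; check_timing is a hypothetical helper written for illustration, not part of the package, and the library's own error messages may differ.

```python
from typing import Optional


def check_timing(
    sample_rate: int,
    buffer_size: int,
    batch_process_ms: int,
    nr_sample_rate: Optional[int] = None,
) -> int:
    """Return the buffer duration in ms, or raise ValueError if a rule fails."""
    if sample_rate <= 0 or buffer_size <= 0:
        raise ValueError("sample_rate and buffer_size must be positive")
    if (buffer_size * 1000) % sample_rate != 0:
        raise ValueError("buffer duration is not a whole number of milliseconds")
    buffer_ms = buffer_size * 1000 // sample_rate
    if batch_process_ms % buffer_ms != 0:
        raise ValueError("batch_process_ms is not a multiple of the buffer duration")
    if nr_sample_rate is not None and nr_sample_rate != sample_rate:
        raise ValueError("noise reduction sample rate does not match audio sample rate")
    return buffer_ms


# The documented defaults satisfy all three rules.
print(check_timing(16000, 512, 320, nr_sample_rate=16000))

# 512 samples at 44.1 kHz is not a whole number of milliseconds.
try:
    check_timing(44100, 512, 320)
except ValueError as exc:
    print("rejected:", exc)
```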
Changelog — v0.2.0
Breaking changes:
- Renamed VADConfig.keep_before_speech_ms → pre_speech_padding_ms
- Renamed VADConfig.keep_after_speech_ms → post_speech_padding_ms
- Removed VADConfig.sample_rate and VADConfig.buffer_size (VAD shares audio settings)
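For code that builds VADConfig arguments from a dictionary, the renames and removals above can be applied mechanically. The helper below is a hypothetical migration sketch, not something shipped with the package; it only rewrites keyword names and does not construct a VADConfig.

```python
# Old keyword name -> v0.2.0 keyword name.
RENAMED = {
    "keep_before_speech_ms": "pre_speech_padding_ms",
    "keep_after_speech_ms": "post_speech_padding_ms",
}
# Keywords that VADConfig no longer accepts in v0.2.0.
REMOVED = {"sample_rate", "buffer_size"}


def migrate_vad_kwargs(old: dict) -> dict:
    """Translate pre-0.2.0 VADConfig keyword arguments to their v0.2.0 names."""
    new = {}
    for key, value in old.items():
        if key in REMOVED:
            continue  # VAD now shares the audio settings
        new[RENAMED.get(key, key)] = value
    return new


old_kwargs = {
    "threshold": 0.5,
    "keep_before_speech_ms": 300,
    "keep_after_speech_ms": 500,
    "sample_rate": 16000,
}
print(migrate_vad_kwargs(old_kwargs))
```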
Behavioral and quality updates:
- Apply noise reduction before gain (consistent final loudness)
- Replace prints with logging.getLogger(__name__)
- Enforce timing constraints with clear error messages
- Integer-safe latency checks; fade-in/out uses float32 consistently
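Because the package now logs through logging.getLogger(__name__), its output can likely be tuned separately from the rest of an application. The sketch below assumes the loggers live under the package's import name, input_audio; that namespace and the child module name are assumptions, not documented behavior.

```python
import logging

# Keep the rest of the application at WARNING.
logging.basicConfig(level=logging.WARNING)

# Assumed logger namespace: the package's import name.
pkg_logger = logging.getLogger("input_audio")
pkg_logger.setLevel(logging.DEBUG)

# Child loggers inherit the level set on their parent namespace.
child = logging.getLogger("input_audio.some_module")  # hypothetical module name
print(child.getEffectiveLevel() == logging.DEBUG)
```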
Requirements
- Python 3.11+
- PyAudio (microphone access)
- PyTorch and torchaudio (VAD model, WAV encoding)
- NumPy, noisereduce, silero_vad
- See requirements.txt for full list
License
MIT License — see LICENSE for details.