A cross-platform utility for capturing live audio from a microphone using FFmpeg.
Live Audio Capture
Live Audio Capture is a cross-platform Python package designed for capturing, processing, and analyzing live audio from a microphone in real-time. It provides a robust and flexible interface for voice activity detection (VAD), noise reduction, audio visualization, and more. Whether you're building a voice assistant, a transcription tool, or a real-time audio analysis application, this package has you covered.
Why Use Live Audio Capture?
Key Advantages
- Cross-Platform Support: Works seamlessly on Windows, macOS, and Linux.
- Real-Time Processing: Captures and processes audio in real-time with minimal latency.
- Voice Activity Detection (VAD): Dynamically detects speech and stops recording during silence.
- Noise Reduction: Advanced noise reduction algorithms powered by the noisereduce package for cleaner audio.
- Customizable: Highly configurable parameters for sampling rate, chunk duration, noise reduction, and more.
- Real-Time Visualization: Visualize audio waveforms, frequency spectra, and spectrograms in real-time.
- Easy to Use: Simple API for quick integration into your projects.
Use Cases
- Voice Assistants: Capture and process user commands in real-time.
- Transcription Tools: Record and transcribe audio with noise reduction.
- Real-Time Audio Analysis: Analyze audio signals for frequency, volume, and other metrics.
- Educational Tools: Teach audio processing and visualization concepts.
- Security Systems: Detect and record audio events in real-time.
Features
- Live Audio Capture: Capture audio from the microphone in real-time.
- Voice Activity Detection (VAD): Automatically detect speech and stop recording during silence.
- Noise Reduction: Reduce background noise using the noisereduce package, which employs spectral gating techniques.
- Real-Time Visualization: Visualize audio waveforms, frequency spectra, and spectrograms.
- Multiple Output Formats: Save recordings in WAV, MP3, or OGG formats.
- Customizable Parameters:
  - Sampling rate
  - Chunk duration
  - VAD aggressiveness
  - Noise reduction settings
  - Low-pass filter cutoff frequency
- Cross-Platform: Works on Windows, macOS, and Linux.
Installation
Requirements
- Python 3.9 or higher
- FFmpeg (for audio file handling)
- Microphone access
Install the Package
You can install the package via pip:
pip install live_audio_capture
Install FFmpeg
- Linux (Debian/Ubuntu, using apt):
sudo apt update
sudo apt install ffmpeg
- macOS (using Homebrew):
brew install ffmpeg
- Windows: Download FFmpeg from https://ffmpeg.org/download.html and add it to your system's PATH.
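Before running any of the examples, it can help to confirm that FFmpeg is actually reachable from Python. The following is a minimal sketch using only the standard library; the helper name find_ffmpeg is hypothetical and is not part of live_audio_capture.

```python
import shutil


def find_ffmpeg(executable="ffmpeg"):
    """Return the full path to the FFmpeg binary, or None if it is not on PATH."""
    return shutil.which(executable)


ffmpeg_path = find_ffmpeg()
if ffmpeg_path is None:
    print("FFmpeg was not found on PATH; install it before capturing audio.")
else:
    print("Found FFmpeg at:", ffmpeg_path)
```

If the check reports that FFmpeg is missing right after installation, opening a new terminal usually picks up the updated PATH.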
Usage
Basic Example
Capture audio with voice activity detection and save it to a file:
from live_audio_capture import LiveAudioCapture
# Initialize the audio capture
capture = LiveAudioCapture(
    sampling_rate=16000,  # Sample rate in Hz
    chunk_duration=0.1,  # Duration of each audio chunk in seconds
    enable_noise_canceling=True,  # Enable noise reduction
    aggressiveness=2,  # VAD aggressiveness level (0-3)
)
# Start recording with VAD
capture.listen_and_record_with_vad(
    output_file="output.wav",  # Save the recording to this file
    silence_duration=2.0,  # Stop recording after 2 seconds of silence
    format="wav",  # Output format
)
# Stop the capture
capture.stop()
Real-Time Visualization
Visualize audio in real-time:
from live_audio_capture import LiveAudioCapture, AudioVisualizer
# Initialize the audio capture
capture = LiveAudioCapture(sampling_rate=44100, chunk_duration=0.1)
# Initialize the audio visualizer
visualizer = AudioVisualizer(sampling_rate=44100, chunk_duration=0.1)
# Stream audio and visualize it until interrupted with Ctrl+C
try:
    for audio_chunk in capture.stream_audio():
        visualizer.add_audio_chunk(audio_chunk)
except KeyboardInterrupt:
    pass
finally:
    # Stop the capture
    capture.stop()
Advanced Example
Use all available parameters for maximum customization:
from live_audio_capture import LiveAudioCapture
# Initialize the audio capture with all parameters
capture = LiveAudioCapture(
    sampling_rate=16000,
    chunk_duration=0.1,
    audio_format="f32le",
    channels=1,
    aggressiveness=3,
    enable_beep=True,
    enable_noise_canceling=True,
    low_pass_cutoff=7500.0,
    stationary_noise_reduction=True,
    prop_decrease=1.0,
    n_std_thresh_stationary=1.5,
    n_jobs=1,
    use_torch=False,
    device="cpu",
    calibration_duration=2.0,
    use_adaptive_threshold=True,
)
# Start recording with VAD
capture.listen_and_record_with_vad(
    output_file="output.wav",
    silence_duration=2.0,
    format="wav",
)
# Stop the capture
capture.stop()
Features and Arguments
LiveAudioCapture Parameters
- sampling_rate: Sample rate in Hz (default: 16000).
- chunk_duration: Duration of each audio chunk in seconds (default: 0.1).
- audio_format: Audio format for FFmpeg output (default: "f32le").
- channels: Number of audio channels (default: 1 for mono).
- aggressiveness: VAD aggressiveness level (0-3, default: 1).
- enable_beep: Play beep sounds when recording starts/stops (default: True).
- enable_noise_canceling: Enable noise reduction using the noisereduce package (default: False).
- low_pass_cutoff: Low-pass filter cutoff frequency (default: 7500.0).
- stationary_noise_reduction: Enable stationary noise reduction (default: False).
- prop_decrease: Proportion to reduce noise by (default: 1.0).
- n_std_thresh_stationary: Threshold for stationary noise reduction (default: 1.5).
- n_jobs: Number of parallel jobs for noise reduction (default: 1).
- use_torch: Use PyTorch for noise reduction (default: False).
- device: Device for PyTorch noise reduction (default: "cpu").
- calibration_duration: Duration of calibration for adaptive thresholding (default: 2.0).
- use_adaptive_threshold: Enable adaptive thresholding for VAD (default: True).
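To see how sampling_rate, chunk_duration, channels, and audio_format relate, the sketch below works out the size of one raw chunk and decodes f32le bytes (32-bit little-endian floats, 4 bytes per sample) with the standard library. The helpers chunk_size and decode_f32le are hypothetical names for illustration, and it is an assumption that the package reads chunks of exactly this size internally.

```python
import struct

BYTES_PER_SAMPLE_F32LE = 4  # f32le = 32-bit float, little-endian


def chunk_size(sampling_rate, chunk_duration, channels=1):
    """Return (samples_per_channel, total_bytes) for one raw f32le chunk."""
    samples = int(round(sampling_rate * chunk_duration))
    return samples, samples * channels * BYTES_PER_SAMPLE_F32LE


def decode_f32le(raw):
    """Decode raw f32le bytes into a list of Python floats."""
    count = len(raw) // BYTES_PER_SAMPLE_F32LE
    usable = raw[: count * BYTES_PER_SAMPLE_F32LE]  # ignore a trailing partial sample
    return list(struct.unpack("<%df" % count, usable))


samples, n_bytes = chunk_size(16000, 0.1)
print(samples, n_bytes)  # 1600 6400
```

With the defaults (16000 Hz, 0.1 s, mono), each chunk therefore carries 1600 samples in 6400 bytes; at 44100 Hz the same chunk duration holds 4410 samples per channel.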
Recommendations
- Use Threading for Real-Time Listening: It is highly recommended to use threading for real-time audio listening. This allows you to easily stop the audio capture in any script using the .stop() method without blocking the main program.
- Use a High-Quality Microphone: For best results, use a microphone with good noise cancellation.
- Adjust VAD Aggressiveness: Higher aggressiveness levels may reduce false positives but can also miss softer speech.
- Enable Noise Reduction: If you're working in a noisy environment, enable noise reduction for cleaner audio.
- Test on Your Platform: Test the package on your target platform to ensure compatibility.
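The threading recommendation above can be sketched as follows. To keep the example runnable without a microphone, it uses FakeCapture, a hypothetical stand-in that only mimics the listen_and_record_with_vad() and stop() methods shown in the Usage section. It assumes that the real stop() can be called from another thread and causes the recording call to return; check this against the package before relying on it.

```python
import threading
import time


class FakeCapture:
    """Hypothetical stand-in for LiveAudioCapture (no microphone needed)."""

    def __init__(self):
        self._stop_event = threading.Event()
        self.chunks_seen = 0

    def listen_and_record_with_vad(self, output_file="output.wav",
                                   silence_duration=2.0, format="wav"):
        # Pretend to handle one chunk every 10 ms until stop() is called.
        while not self._stop_event.is_set():
            self.chunks_seen += 1
            time.sleep(0.01)

    def stop(self):
        self._stop_event.set()


capture = FakeCapture()  # in a real script: LiveAudioCapture(...)
worker = threading.Thread(
    target=capture.listen_and_record_with_vad,
    kwargs={"output_file": "output.wav", "silence_duration": 2.0, "format": "wav"},
    daemon=True,  # do not keep the interpreter alive if the main thread exits
)
worker.start()

time.sleep(0.1)           # the main thread is free to do other work here
capture.stop()            # ask the capture loop to finish
worker.join(timeout=2.0)  # wait briefly for the worker to wind down
print("worker still running:", worker.is_alive())
```

Running the blocking call in a worker thread keeps the main thread responsive, so stop() can be triggered by a timer, a key press, or any other event in your application.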
Technical Details
Voice Activity Detection (VAD)
The VAD system uses an energy-based approach with adaptive thresholding. It calculates the energy of each audio chunk and compares it to a dynamically adjusted threshold. Hysteresis is applied to avoid rapid toggling between speech and silence states.
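The idea described above can be illustrated with a small pure-Python sketch. It is not the package's implementation: the class EnergyVAD, its parameters (calibration_chunks, on_ratio, off_ratio, adapt_rate), and the specific update rule are all assumptions chosen to show energy measurement, an adaptive noise floor, and hysteresis in one place.

```python
import math


def rms_energy(chunk):
    """Root-mean-square energy of one chunk of float samples."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


class EnergyVAD:
    """Toy energy-based VAD with an adaptive noise floor and hysteresis."""

    def __init__(self, calibration_chunks=5, on_ratio=3.0, off_ratio=1.5,
                 adapt_rate=0.05):
        self.calibration_chunks = calibration_chunks
        self.on_ratio = on_ratio      # energy must exceed floor * on_ratio to start speech
        self.off_ratio = off_ratio    # and drop below floor * off_ratio to end it
        self.adapt_rate = adapt_rate  # how quickly the floor tracks background noise
        self.noise_floor = None
        self.speaking = False
        self._calibration = []

    def process(self, chunk):
        """Return True if the chunk is classified as speech."""
        energy = rms_energy(chunk)
        if self.noise_floor is None:
            # Calibration phase: average the first few chunks as the noise floor.
            self._calibration.append(energy)
            if len(self._calibration) >= self.calibration_chunks:
                mean = sum(self._calibration) / len(self._calibration)
                self.noise_floor = max(mean, 1e-6)
            return False
        # Hysteresis: use a lower threshold while already in the speech state.
        ratio = self.off_ratio if self.speaking else self.on_ratio
        self.speaking = energy > self.noise_floor * ratio
        if not self.speaking:
            # Adapt the floor only during silence so speech does not raise it.
            self.noise_floor += self.adapt_rate * (energy - self.noise_floor)
            self.noise_floor = max(self.noise_floor, 1e-6)
        return self.speaking


quiet, medium, loud = [0.01] * 160, [0.02] * 160, [0.2] * 160
vad = EnergyVAD()
for _ in range(5):
    vad.process(quiet)  # calibration chunks
decisions = [vad.process(c) for c in (quiet, medium, loud, medium, quiet)]
print(decisions)  # [False, False, True, True, False]
```

The same medium-level chunk is rejected before speech starts but accepted once the detector is already in the speech state, which is the effect hysteresis is meant to have.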
Noise Reduction
The noisereduce package is used for noise reduction. It employs spectral gating techniques to remove background noise while preserving speech. You can choose between stationary and non-stationary noise reduction, and even use PyTorch for GPU-accelerated processing.
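To give a feel for what spectral gating means, here is a deliberately simplified, pure-Python toy for the stationary case. It is not how noisereduce is implemented: it uses a naive DFT on a single frame with no windowing, overlap, or mask smoothing. The function names are hypothetical, and the parameters n_std and prop_decrease only loosely mirror n_std_thresh_stationary and prop_decrease from the parameter list.

```python
import cmath
import math

N = 32  # frame length in samples (tiny, so the naive DFT stays fast)


def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def noise_thresholds(noise_frames, n_std=1.5):
    """Per-bin gate threshold: mean + n_std * std of the noise magnitudes."""
    mags = [[abs(c) for c in dft(f)] for f in noise_frames]
    thresholds = []
    for k in range(len(mags[0])):
        column = [m[k] for m in mags]
        mean = sum(column) / len(column)
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
        thresholds.append(mean + n_std * std)
    return thresholds


def spectral_gate(frame, thresholds, prop_decrease=1.0):
    """Attenuate every frequency bin whose magnitude is below its threshold."""
    gated = [c if abs(c) > thr else c * (1.0 - prop_decrease)
             for c, thr in zip(dft(frame), thresholds)]
    return idft(gated)


def sine(bin_index, amplitude, phase=0.0):
    return [amplitude * math.sin(2 * math.pi * bin_index * t / N + phase)
            for t in range(N)]


# "Noise" here is a weak tone in bin 10 whose level varies between frames.
noise_frames = [sine(10, a, p) for a, p in ((0.04, 0.0), (0.05, 1.0), (0.06, 2.0))]
thresholds = noise_thresholds(noise_frames, n_std=1.5)

tone = sine(4, 0.5)  # the signal we want to keep
noisy = [s + n for s, n in zip(tone, sine(10, 0.05, 0.7))]
cleaned = spectral_gate(noisy, thresholds, prop_decrease=1.0)
residual = max(abs(c - s) for c, s in zip(cleaned, tone))
print("largest residual after gating:", residual)
```

In this toy setup the noise lives in a frequency bin the signal does not use, so the gate removes it almost entirely. Real recordings overlap in frequency, which is why production implementations add windowing, overlap, and smoothing.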
Real-Time Visualization
The visualization module provides insights into:
- Waveform: The amplitude of the audio signal over time.
- Frequency Spectrum: The distribution of frequencies in the audio signal.
- Spectrogram: A visual representation of the spectrum of frequencies over time.
- Volume Meter: Real-time volume levels.
- Volume History: A history of volume levels over time.
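As a rough illustration of the volume meter and volume history items above, the sketch below converts a chunk of float samples to an RMS level in dBFS and keeps a bounded history. The function volume_dbfs, the -100 dB floor, and the history length of 50 are assumptions for the example, not values taken from the package.

```python
import math
from collections import deque


def volume_dbfs(chunk, floor_db=-100.0):
    """RMS level of float samples in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
    if rms <= 0.0:
        return floor_db  # avoid log10(0) for digital silence
    return max(20.0 * math.log10(rms), floor_db)


history = deque(maxlen=50)  # keeps only the most recent 50 readings
for amplitude in (0.0, 0.1, 1.0):
    history.append(volume_dbfs([amplitude] * 160))
print([round(v, 1) for v in history])  # [-100.0, -20.0, 0.0]
```

A bounded deque drops the oldest reading automatically, which keeps memory use constant however long the stream runs.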
Contributing
Contributions are welcome! Please read the Contributing Guidelines for details.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Support
For questions, issues, or feature requests, please open an issue on GitHub.
Final Words
If you find this package useful, please consider leaving a ⭐ star on the GitHub repository. Your support motivates us to keep improving! If you have any suggestions for optimization or new features, don't hesitate to reach out. We'd love to hear from you!
File details
Details for the file live_audio_capture-0.4.1.tar.gz.
File metadata
- Download URL: live_audio_capture-0.4.1.tar.gz
- Upload date:
- Size: 665.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.9.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bebfc37a20e91b6a467543cfcbb6ba27cbaf9ca60297ecfb00d960099d735faa |
| MD5 | f03119c4c4ff037c1f45277ac73ad58b |
| BLAKE2b-256 | 4671b67aecb434b0607a427110455927b083db053a5b81eda11cd8e27eee5561 |
File details
Details for the file live_audio_capture-0.4.1-py3-none-any.whl.
File metadata
- Download URL: live_audio_capture-0.4.1-py3-none-any.whl
- Upload date:
- Size: 22.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.9.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d03b7a2e3e4d3d6ff3171195914367af056eb42f135ab0dd586e408858ba255d |
| MD5 | 526796ef1b9ca2979df7beb359c9176c |
| BLAKE2b-256 | c69e1f1da5ea03c9fd24e326c347ed4ccf793260e0e8209dc5200c6aee189c44 |