HPSS-based voice denoiser optimized for ASR preprocessing (STT, diarization, speaker embedding)

HPSS Voice Denoiser

A production-ready audio denoising pipeline optimized for ASR preprocessing (Speech-to-Text, Speaker Diarization, Voice Embedding).

Built on Harmonic-Percussive Source Separation (HPSS) with context-aware mixing to preserve voice quality while removing environmental noise.

Features

Optimized for ASR: Preserves voice characteristics critical for STT, diarization, and speaker embedding
Stateless Processing: Each audio chunk is processed independently (perfect for streaming)
Voice-Preserving: 99% voice band preservation, consonants intact
Low Latency: Suitable for real-time applications
Simple API: Easy to integrate as a library or use via CLI

Benchmark Results

Tested on real audio (88 seconds total). Run benchmarks/benchmark.py for full analysis.

Metric	Value	Description
STT Confidence	+16% improvement	Whisper word probability increased after denoising
Speaker Embedding	93.5% similar	Voice identity preserved (cosine similarity before/after)
Diarization	98% consistent	Speaker segments unchanged by denoising
Voice Band (300-3kHz)	75% preserved	Mid frequencies containing voice fundamentals
High Freq (3k-8kHz)	48% preserved	Reduced by design (noise lives here)

Installation

From PyPI (recommended)

pip install hpss-voice-denoiser

With visualization support

pip install "hpss-voice-denoiser[visualization]"

From source

git clone https://github.com/42atomys/hpss-voice-denoiser.git
cd hpss-voice-denoiser
pip install -e .

Quick Start

As a Library

from hpss_denoiser import HPSSDenoiser

# Create denoiser with default settings
denoiser = HPSSDenoiser()

# Process PCM audio (16kHz, 16-bit, mono)
with open("input.pcm", "rb") as f:
    pcm_data = f.read()

# Denoise
cleaned_pcm = denoiser.process(pcm_data)

# Save result
with open("output.pcm", "wb") as f:
    f.write(cleaned_pcm)
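To try the snippet above without a recording, a test file in the expected format (16 kHz, 16-bit, mono) can be synthesized. This is a minimal sketch using only the standard library; the helper name `make_test_pcm`, the 220 Hz tone, and the noise level are illustrative assumptions and not part of the package.

```python
import math
import random
import struct


def make_test_pcm(seconds=1.0, sample_rate=16000, tone_hz=220.0, noise=0.05, seed=0):
    """Return raw 16-bit little-endian mono PCM: a sine tone plus white noise."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    n = int(seconds * sample_rate)
    samples = []
    for i in range(n):
        value = 0.5 * math.sin(2.0 * math.pi * tone_hz * i / sample_rate)
        value += noise * rng.uniform(-1.0, 1.0)
        # Clip to [-1, 1] and scale to the int16 range
        value = max(-1.0, min(1.0, value))
        samples.append(int(value * 32767))
    return struct.pack(f"<{n}h", *samples)


if __name__ == "__main__":
    # Write one second of noisy tone as input.pcm
    with open("input.pcm", "wb") as f:
        f.write(make_test_pcm())
```

One second of audio at these settings is 32000 bytes (16000 samples, 2 bytes each).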

With NumPy Arrays

import numpy as np
from hpss_denoiser import HPSSDenoiser

denoiser = HPSSDenoiser()

# Float audio (-1.0 to 1.0)
audio = np.random.randn(16000).astype(np.float64) * 0.1

# Process
cleaned = denoiser.process_array(audio)
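When the source audio is raw PCM but the array API is preferred, the bytes have to be converted to floats and back. The helpers below are a sketch under the assumption that `process_array` works on float samples in the -1.0 to 1.0 range, as the comment above indicates; `pcm16_to_float` and `float_to_pcm16` are hypothetical names, not functions exported by the package.

```python
import numpy as np


def pcm16_to_float(pcm_bytes):
    """Decode 16-bit little-endian PCM bytes to float64 samples in [-1.0, 1.0)."""
    return np.frombuffer(pcm_bytes, dtype="<i2").astype(np.float64) / 32768.0


def float_to_pcm16(audio):
    """Encode float samples as 16-bit little-endian PCM bytes, clipping out-of-range values."""
    clipped = np.clip(audio, -1.0, 1.0)
    return (clipped * 32767.0).round().astype("<i2").tobytes()
```

Clipping before the cast matters: without it, a sample slightly above 1.0 would wrap around to a large negative value when converted to int16.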

Custom Configuration

from hpss_denoiser import HPSSDenoiser, DenoiserConfig

# Adjust for your use case
config = DenoiserConfig(
    sample_rate=16000,
    
    # More aggressive noise reduction in silence
    no_context_perc_gain=0.02,
    
    # Preserve more consonants
    voice_context_perc_gain=0.25,
)

denoiser = HPSSDenoiser(config)

CLI Usage

# Basic usage
hpss-denoise input.pcm output.pcm

# Process with intermediate stages (for debugging)
hpss-denoise input.pcm output.pcm --stages

# Generate analysis visualization
hpss-denoise input.pcm --analyze --output-image analysis.png

# Custom sample rate
hpss-denoise input.pcm output.pcm --sample-rate 8000

# Show all options
hpss-denoise --help

Audio Format

The denoiser expects and produces:

Format: Raw PCM
Sample Rate: 16000 Hz (configurable)
Bit Depth: 16-bit signed integer
Channels: Mono

Converting from other formats

# WAV to PCM
ffmpeg -i input.wav -f s16le -acodec pcm_s16le -ar 16000 -ac 1 input.pcm

# MP3 to PCM
ffmpeg -i input.mp3 -f s16le -acodec pcm_s16le -ar 16000 -ac 1 input.pcm

# PCM to WAV (for playback)
ffmpeg -f s16le -ar 16000 -ac 1 -i output.pcm output.wav
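If ffmpeg is not available, the WAV conversions can also be done with Python's standard `wave` module. This sketch only wraps or unwraps the container; it does not resample or downmix, so the WAV file must already be 16-bit mono at the desired rate. The function names are illustrative.

```python
import wave


def pcm_to_wav(pcm_path, wav_path, sample_rate=16000):
    """Wrap headerless 16-bit mono PCM in a WAV container."""
    with open(pcm_path, "rb") as src:
        pcm_data = src.read()
    with wave.open(wav_path, "wb") as dst:
        dst.setnchannels(1)            # mono
        dst.setsampwidth(2)            # 16-bit samples = 2 bytes
        dst.setframerate(sample_rate)
        dst.writeframes(pcm_data)


def wav_to_pcm(wav_path, pcm_path):
    """Extract raw frames from a 16-bit mono WAV file and return its sample rate."""
    with wave.open(wav_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono WAV")
        sample_rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    with open(pcm_path, "wb") as dst:
        dst.write(frames)
    return sample_rate
```

Checking the returned sample rate against the denoiser's configured `sample_rate` catches mismatched inputs early.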

How It Works

Pipeline Architecture

Audio Input (PCM 16kHz, 16-bit)
    │
    ▼
┌─────────────────────────────────────┐
│  High-pass Filter (80 Hz)           │  Remove DC offset & rumble
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  STFT Analysis                      │  Time-frequency representation
│  (25ms frames, 6ms hop)             │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  HPSS Separation                    │  Split into harmonic (voice)
│  (median filtering)                 │  and percussive (transients)
└─────────────────────────────────────┘
    │
    ├─── Harmonic ───┐
    │                ▼
    │    ┌─────────────────────────────┐
    │    │  Envelope Tightening        │  Reduce HPSS echo artifacts
    │    │  (asymmetric follower)      │
    │    └─────────────────────────────┘
    │                │
    │                ▼
    │    ┌─────────────────────────────┐
    ├───▶│  Context-Based Mixing       │  Detect voice activity
    │    │  - Voice: keep 20% perc     │  Mix based on context
    │    │  - Silence: keep 4% perc    │
    │    └─────────────────────────────┘
    │                │
    └── Percussive ──┘
                     │
                     ▼
┌─────────────────────────────────────┐
│  Low-Frequency Denoising            │  Spectral subtraction <350Hz
│  (percentile-based)                 │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  ISTFT Synthesis                    │  Reconstruct audio
└─────────────────────────────────────┘
    │
    ▼
Audio Output (PCM 16kHz, 16-bit)
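The STFT stage's 25 ms frames and 6 ms hop translate into sample counts as follows. This is a worked-arithmetic sketch, assuming an unpadded STFT; the package may pad or center frames, in which case the frame count differs slightly.

```python
def stft_geometry(duration_s, sample_rate=16000, frame_size_ms=25, hop_size_ms=6):
    """Frame length, hop length, frame count, and overlap for an unpadded STFT."""
    frame_len = sample_rate * frame_size_ms // 1000   # 400 samples at 16 kHz
    hop_len = sample_rate * hop_size_ms // 1000       # 96 samples at 16 kHz
    n_samples = int(duration_s * sample_rate)
    if n_samples < frame_len:
        n_frames = 0
    else:
        n_frames = 1 + (n_samples - frame_len) // hop_len
    overlap = 1.0 - hop_len / frame_len               # fraction shared by adjacent frames
    return frame_len, hop_len, n_frames, overlap
```

For one second of 16 kHz audio this gives 400-sample frames, a 96-sample hop, 163 frames, and 76% overlap between adjacent frames.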

Why HPSS?

Harmonic-Percussive Source Separation uses median filtering on the spectrogram:

Harmonic components (voice fundamentals, vowels) appear as horizontal lines
Percussive components (transients, consonants, noise) appear as vertical lines

By separating these, we can:

Keep the harmonic component (clean voice)
Selectively mix percussive based on voice context
During speech: include percussive (consonants like 't', 's', 'k')
During silence: suppress percussive (noise transients)
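The separation described above can be sketched with NumPy alone. The code below follows the general median-filtering idea (filter along time to enhance horizontal structure, along frequency to enhance vertical structure) and derives binary masks; it is an illustration, not the package's implementation, and the `margin` comparison is an assumed interpretation of a separation margin. Kernel sizes must be odd.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_1d(mag, kernel, axis):
    """Median-filter a 2-D array along one axis, using edge padding."""
    pad = kernel // 2
    widths = [(0, 0), (0, 0)]
    widths[axis] = (pad, pad)
    padded = np.pad(mag, widths, mode="edge")
    # The window dimension is appended as the last axis
    windows = sliding_window_view(padded, kernel, axis=axis)
    return np.median(windows, axis=-1)


def hpss_masks(mag, harmonic_kernel=9, percussive_kernel=9, margin=1.0):
    """Return binary (harmonic, percussive) masks for a magnitude spectrogram.

    mag has shape (freq_bins, frames).
    """
    harm = median_filter_1d(mag, harmonic_kernel, axis=1)    # smooth along time
    perc = median_filter_1d(mag, percussive_kernel, axis=0)  # smooth along frequency
    harmonic_mask = harm > margin * perc
    percussive_mask = perc > margin * harm
    return harmonic_mask, percussive_mask
```

On a toy spectrogram with one horizontal line and one vertical line, the horizontal line lands in the harmonic mask and the vertical line in the percussive mask; bins where both estimates are equal fall into neither.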

Key Innovation: Context-Aware Mixing

The challenge with HPSS for voice is that consonants are percussive. Naive suppression of the percussive component removes 't', 's', 'f', etc.

Our solution: detect voice context using harmonic energy in the 200-4000 Hz band:

If voice is present: mix more percussive (preserve consonants)
If silence: aggressively suppress percussive (remove noise)
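The mixing rule can be illustrated with a per-frame gain function. This is a sketch under several assumptions: the harmonic level is already computed per frame in dB for the 200-4000 Hz band, the threshold is relative to the loudest frame, and the voice context is extended symmetrically by `context_window` frames. The defaults mirror the values in the configuration reference, but the package's actual logic may differ.

```python
def percussive_gains(harmonic_db, threshold_db=-20.0, context_window=10,
                     voice_gain=0.20, silence_gain=0.04):
    """Per-frame gain applied to the percussive component.

    harmonic_db: per-frame harmonic energy in dB (assumed relative to the peak frame).
    """
    n = len(harmonic_db)
    in_context = [False] * n
    for i, level in enumerate(harmonic_db):
        if level >= threshold_db:
            # Extend the voice context to neighbouring frames so consonants
            # just before or after a vowel are kept
            lo = max(0, i - context_window)
            hi = min(n, i + context_window + 1)
            for j in range(lo, hi):
                in_context[j] = True
    return [voice_gain if flag else silence_gain for flag in in_context]
```

The mixed output for a frame would then be the harmonic component plus the gain times the percussive component.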

Configuration Reference

@dataclass
class DenoiserConfig:
    """Configuration for HPSS voice denoiser."""
    
    # Audio parameters
    sample_rate: int = 16000          # Input/output sample rate
    
    # STFT parameters
    frame_size_ms: int = 25           # Analysis frame size
    hop_size_ms: int = 6              # Frame hop size
    
    # HPSS separation
    harmonic_kernel: int = 9          # Median filter size (time)
    percussive_kernel: int = 9        # Median filter size (freq)
    hpss_margin: float = 2.5          # Separation hardness
    
    # Context detection
    context_window: int = 10          # Frames to extend voice context
    harmonic_threshold_db: float = -20.0  # Voice detection threshold
    
    # Percussive mixing
    voice_context_perc_gain: float = 0.20  # Keep 20% during voice
    no_context_perc_gain: float = 0.04     # Keep 4% during silence
    
    # Envelope tightening (echo reduction)
    envelope_tightening: bool = True
    envelope_attack_frames: int = 2
    envelope_release_frames: int = 3
    envelope_min_gain: float = 0.15
    
    # Low-frequency denoising
    noise_reduction_strength: float = 0.8
    noise_reduction_max_freq: float = 350.0
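The `envelope_attack_frames` and `envelope_release_frames` fields point to an asymmetric envelope follower, which tracks a level with different speeds for rising and falling input. The sketch below shows a generic one-pole version of that technique; the mapping from frame counts to smoothing coefficients is an assumption, and how the package turns the envelope into a gain (including `envelope_min_gain`) is not documented here.

```python
import math


def follow_envelope(levels, attack_frames=2, release_frames=3):
    """One-pole envelope follower with separate attack and release time constants."""
    attack = 1.0 - math.exp(-1.0 / attack_frames)
    release = 1.0 - math.exp(-1.0 / release_frames)
    envelope = []
    current = 0.0
    for level in levels:
        # Rising input uses the attack coefficient, falling input the release one
        coeff = attack if level > current else release
        current += coeff * (level - current)
        envelope.append(current)
    return envelope
```

A shorter release makes the envelope fall faster after a sound ends, which is the intuition behind reducing `envelope_release_frames` to shorten lingering tails.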

Use Cases

Speech-to-Text (STT)

from hpss_denoiser import HPSSDenoiser
import numpy as np
import whisper

denoiser = HPSSDenoiser()

# Denoise before transcription
with open("noisy_audio.pcm", "rb") as f:
    noisy = f.read()

cleaned = denoiser.process(noisy)

# Whisper cannot read headerless PCM files, so pass the samples directly:
# convert 16-bit PCM to float32 in [-1.0, 1.0] (Whisper expects 16 kHz mono)
audio = np.frombuffer(cleaned, dtype=np.int16).astype(np.float32) / 32768.0

# Use with Whisper
model = whisper.load_model("base")
result = model.transcribe(audio)

Speaker Diarization

from hpss_denoiser import HPSSDenoiser

# Denoising improves speaker boundary detection
denoiser = HPSSDenoiser()

# Process chunks for streaming diarization
chunk_size = 30 * 16000 * 2  # 30 seconds x 16000 samples/s x 2 bytes/sample

with open("meeting.pcm", "rb") as f:
    while chunk := f.read(chunk_size):
        cleaned_chunk = denoiser.process(chunk)
        # Send to diarization pipeline
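The chunking loop above can be wrapped in a small generator so that chunk sizes always fall on a sample boundary. This is a sketch; `iter_pcm_chunks` is a hypothetical helper, not part of the package.

```python
import io


def iter_pcm_chunks(stream, seconds, sample_rate=16000, bytes_per_sample=2):
    """Yield successive chunks of raw mono PCM covering `seconds` of audio each."""
    # Count whole samples first so the byte size is a multiple of the sample width
    chunk_size = int(seconds * sample_rate) * bytes_per_sample
    if chunk_size <= 0:
        raise ValueError("chunk must contain at least one sample")
    while chunk := stream.read(chunk_size):
        yield chunk


# Example: 5 seconds of silence split into 2-second chunks
silence = io.BytesIO(bytes(16000 * 2 * 5))
chunk_lengths = [len(chunk) for chunk in iter_pcm_chunks(silence, seconds=2)]
```

The final chunk is shorter when the audio length is not a multiple of the chunk duration. With pipes or sockets, `read` may also return short chunks before the end of the stream, so those sources need extra buffering.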

Voice Embedding

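from hpss_denoiser import HPSSDenoiser

# Clean audio produces more stable embeddings
denoiser = HPSSDenoiser()

# Load enrollment and verification audio (16 kHz, 16-bit, mono PCM);
# the file names are placeholders
with open("enrollment.pcm", "rb") as f:
    enrollment_pcm = f.read()
with open("verification.pcm", "rb") as f:
    verification_pcm = f.read()

# Process enrollment audio
enrollment_clean = denoiser.process(enrollment_pcm)

# Process verification audio
verification_clean = denoiser.process(verification_pcm)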

# Compare embeddings (using your embedding model)
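Embedding comparison is commonly done with cosine similarity, the same measure quoted in the benchmark table. The function below is a dependency-free sketch; it assumes the embedding model returns plain sequences of floats, and the embedding model itself is outside the scope of this package.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)
```

A value near 1.0 means the two embeddings point in nearly the same direction; the decision threshold depends on the embedding model in use.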

Performance

Processing Speed

Audio Duration	Processing Time	Real-time Factor
1 second	~44 ms	~23x
10 seconds	~420 ms	~23x
88 seconds	~3.8 s	~23x

Tested on macOS (Darwin), Python 3.12, single-threaded
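The real-time factor in the table is the audio duration divided by the processing time. The sketch below reproduces that arithmetic and adds a small timing helper for measuring it on other hardware; `measure` is a hypothetical helper that accepts any callable taking PCM bytes, such as `denoiser.process`.

```python
import time


def real_time_factor(audio_seconds, processing_seconds):
    """How many seconds of audio are processed per second of wall-clock time."""
    return audio_seconds / processing_seconds


def measure(process, pcm_data, sample_rate=16000, bytes_per_sample=2):
    """Time one call to `process` and return (audio_seconds, elapsed_seconds)."""
    start = time.perf_counter()
    process(pcm_data)
    elapsed = time.perf_counter() - start
    audio_seconds = len(pcm_data) / (sample_rate * bytes_per_sample)
    return audio_seconds, elapsed


# Figures from the table above
for audio_s, proc_s in [(1, 0.044), (10, 0.42), (88, 3.8)]:
    print(f"{audio_s:>3} s audio: {real_time_factor(audio_s, proc_s):.1f}x real time")
```

The three rows work out to roughly 22.7x, 23.8x, and 23.2x, consistent with the rounded ~23x in the table.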

Memory Usage

~50 MB base memory
~2 MB per second of audio being processed
Streaming-friendly: process in chunks

Troubleshooting

Muffled output

Increase voice_context_perc_gain:

config = DenoiserConfig(voice_context_perc_gain=0.30)

Too much noise remaining

Decrease no_context_perc_gain:

config = DenoiserConfig(no_context_perc_gain=0.02)

Echo/reverb artifacts

Reduce envelope release time:

config = DenoiserConfig(envelope_release_frames=2)

Consonants being cut

Increase context window:

config = DenoiserConfig(context_window=15)

Development

Setup

git clone https://github.com/42atomys/hpss-voice-denoiser.git
cd hpss-voice-denoiser
pip install -e ".[dev]"

Run tests

pytest

Type checking

mypy src/hpss_denoiser

Linting

ruff check src/
ruff format src/

Algorithm References

HPSS: Fitzgerald, D. (2010). "Harmonic/Percussive Separation using Median Filtering"
Spectral Subtraction: Boll, S. (1979). "Suppression of Acoustic Noise in Speech Using Spectral Subtraction"

License

MIT License - see LICENSE for details.

Contributing

Contributions welcome! Please open an issue first to discuss what you would like to change.

Acknowledgments

Developed to improve audio from a wearable device project.

Release history

1.23.1 (Dec 30, 2025)
1.23.0 (Dec 30, 2025), this version

File details

Details for the file hpss_voice_denoiser-1.23.0.tar.gz.

File metadata

Download URL: hpss_voice_denoiser-1.23.0.tar.gz
Upload date: Dec 30, 2025
Size: 19.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hpss_voice_denoiser-1.23.0.tar.gz
Algorithm	Hash digest
SHA256	`63f3d06273c5aef043d3eb679eeb5c18a7b3c4f9d1d1d5c7bb6fd957223e97eb`
MD5	`75dbc952b52455d629449638b69148a6`
BLAKE2b-256	`963856efdfa525f9a4503aa2b5365de3b3a54396400c7deb65d2bcdf6d2a86b2`

Provenance

The following attestation bundles were made for hpss_voice_denoiser-1.23.0.tar.gz:

Publisher: publish.yml on 42atomys/hpss-voice-denoiser

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hpss_voice_denoiser-1.23.0.tar.gz
- Subject digest: 63f3d06273c5aef043d3eb679eeb5c18a7b3c4f9d1d1d5c7bb6fd957223e97eb
- Sigstore transparency entry: 782520798
- Sigstore integration time: Dec 30, 2025
Source repository:
- Permalink: 42atomys/hpss-voice-denoiser@21dedb3b92322abee41a0bea0314a8d7acf2cb7b
- Branch / Tag: refs/tags/v1.23.0
- Owner: https://github.com/42atomys
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@21dedb3b92322abee41a0bea0314a8d7acf2cb7b
- Trigger Event: push

File details

Details for the file hpss_voice_denoiser-1.23.0-py3-none-any.whl.

File metadata

Download URL: hpss_voice_denoiser-1.23.0-py3-none-any.whl
Upload date: Dec 30, 2025
Size: 25.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hpss_voice_denoiser-1.23.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`036ecb308a08803824ebea0f8c4e9eed39893629a2ecca87fd077e0ff8d40134`
MD5	`d68fa81816b744740bb05defbc021794`
BLAKE2b-256	`c423d7b2784edd171057ecf3b88d08b23586ac5bb2e06b32a9ee45e449af9389`

Provenance

The following attestation bundles were made for hpss_voice_denoiser-1.23.0-py3-none-any.whl:

Publisher: publish.yml on 42atomys/hpss-voice-denoiser

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hpss_voice_denoiser-1.23.0-py3-none-any.whl
- Subject digest: 036ecb308a08803824ebea0f8c4e9eed39893629a2ecca87fd077e0ff8d40134
- Sigstore transparency entry: 782520800
- Sigstore integration time: Dec 30, 2025
Source repository:
- Permalink: 42atomys/hpss-voice-denoiser@21dedb3b92322abee41a0bea0314a8d7acf2cb7b
- Branch / Tag: refs/tags/v1.23.0
- Owner: https://github.com/42atomys
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@21dedb3b92322abee41a0bea0314a8d7acf2cb7b
- Trigger Event: push
