
ASR Quality Enhancement Layer for Parakeet Multilingual ASR

Project description

ASR Quality Enhancement Layer

A production-grade post-processing pipeline for improving Parakeet Multilingual ASR outputs. The system addresses common ASR failure modes through low-confidence word detection, numeric sequence reconstruction, domain vocabulary correction, and LLM-based contextual polishing.

🎯 Overview

The ASR Enhancement Layer sits between the Parakeet ASR engine and downstream applications, providing:

  • Error Detection: Identifies low-confidence spans, anomalies, and incomplete sequences
  • Secondary ASR: Re-transcribes problematic segments using Whisper/Riva
  • Numeric Reconstruction: Recovers missing digits in phone numbers, OTPs, amounts
  • Domain Vocabulary: Applies domain-specific terminology corrections
  • LLM Polishing: Fixes grammar and coherence with anti-hallucination safeguards
  • Hypothesis Fusion: Combines multiple ASR outputs using weighted scoring

๐Ÿ“ Architecture

┌──────────────────────────────────────────────────────────────────┐
│                  ASR QUALITY ENHANCEMENT LAYER                   │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐        │
│  │   Parakeet   │───▶│    Error     │───▶│   Re-ASR     │        │
│  │   ASR Input  │    │  Detection   │    │  Processing  │        │
│  └──────────────┘    └──────────────┘    └──────────────┘        │
│         │                   │                   │                │
│         │           ┌───────┴───────┐           │                │
│         │           │ • Confidence  │           │                │
│         │           │ • Anomalies   │           │                │
│         │           │ • Numeric     │           │                │
│         │           └───────────────┘           │                │
│         ▼                                       ▼                │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐        │
│  │   Numeric    │───▶│   Domain     │───▶│  Hypothesis  │        │
│  │ Reconstruct  │    │   Vocab      │    │    Fusion    │        │
│  └──────────────┘    └──────────────┘    └──────────────┘        │
│         │                   │                   │                │
│         │           ┌───────┴───────┐           │                │
│         │           │ • Lexicons    │           │                │
│         │           │ • Fuzzy Match │           │                │
│         │           │ • Phonetic    │           │                │
│         │           └───────────────┘           │                │
│         ▼                                       ▼                │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐        │
│  │     LLM      │───▶│  Validation  │───▶│   Enhanced   │        │
│  │  Polishing   │    │  & Scoring   │    │   Output     │        │
│  └──────────────┘    └──────────────┘    └──────────────┘        │
│         │                   │                                    │
│         │           ┌───────┴───────┐                            │
│         │           │ • Consistency │                            │
│         │           │ • Perplexity  │                            │
│         │           │ • Completeness│                            │
│         │           └───────────────┘                            │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

๐Ÿ“ Project Structure

asr_enhancer/
├── __init__.py              # Package exports
├── core.py                  # Main EnhancementPipeline orchestrator
├── detectors/               # Error detection modules
│   ├── confidence_detector.py  # Low-confidence span detection
│   ├── anomaly_detector.py     # Segmentation/repetition anomalies
│   └── numeric_gap_detector.py # Incomplete number sequences
├── resynthesis/             # Secondary ASR processing
│   ├── segment_extractor.py    # Audio segment extraction
│   ├── secondary_asr.py        # ASR backend abstraction
│   ├── whisper_backend.py      # Whisper integration
│   └── riva_backend.py         # NVIDIA Riva integration
├── numeric/                 # Numeric reconstruction
│   ├── pattern_analyzer.py     # Number pattern detection
│   ├── sequence_reconstructor.py # Digit recovery
│   └── validators.py           # Phone/OTP/card validation
├── vocab/                   # Domain vocabulary
│   ├── lexicon_loader.py       # Lexicon loading
│   ├── term_matcher.py         # Term matching (fuzzy/phonetic)
│   └── corrector.py            # Vocabulary correction
├── llm/                     # LLM integration
│   ├── context_restorer.py     # Main LLM processor
│   ├── prompt_templates.py     # Anti-hallucination prompts
│   └── providers.py            # OpenAI/Ollama/Anthropic
├── fusion/                  # Hypothesis fusion
│   ├── fusion_engine.py        # N-best combination
│   ├── scorers.py              # Acoustic/LM scoring
│   └── selector.py             # Candidate selection
├── validators/              # Quality validation
│   ├── consistency_checker.py  # Content consistency
│   ├── perplexity_scorer.py    # Fluency scoring
│   └── completeness_validator.py # Gap detection
├── utils/                   # Utilities
│   ├── config.py               # Configuration management
│   ├── logging.py              # Structured logging
│   ├── audio.py                # Audio utilities
│   └── text.py                 # Text utilities
├── api/                     # FastAPI service
│   ├── main.py                 # Application entry
│   ├── routes.py               # API endpoints
│   └── schemas.py              # Pydantic models
└── cli/                     # Command-line interface
    └── __init__.py             # CLI commands

🚀 Quick Start

Installation

# Clone the repository
git clone <repository-url>
cd sound-web

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -e ".[all]"

Basic Usage

import asyncio

from asr_enhancer import EnhancementPipeline
from asr_enhancer.utils import Config


async def main():
    # Initialize pipeline
    config = Config(
        confidence_threshold=0.7,
        llm_provider="ollama",
        llm_model="llama3.1",
    )
    pipeline = EnhancementPipeline(config)

    # Enhance transcript (enhance() is a coroutine, so it must be
    # awaited inside an async context)
    result = await pipeline.enhance(
        transcript="my phone number is nine one two tree four five six seven ate nine",
        word_timestamps=[
            {"word": "my", "start": 0.0, "end": 0.2},
            {"word": "phone", "start": 0.2, "end": 0.5},
            # ... more timestamps
        ],
        word_confidences=[0.95, 0.92, 0.89, 0.98, 0.85, 0.91, 0.88, 0.45,
                          0.92, 0.87, 0.90, 0.93, 0.38, 0.91],
    )

    print(f"Enhanced: {result.enhanced_transcript}")
    print(f"Confidence improvement: {result.confidence_improvement:.2%}")


asyncio.run(main())

API Server

# Start the API server
asr-enhancer serve --host 0.0.0.0 --port 8000

# Or with Docker
docker-compose up -d

CLI Usage

# Enhance a transcript file
asr-enhancer enhance input.json -o output.json --format json

# Analyze without enhancement
asr-enhancer analyze input.json

# Check dependencies
asr-enhancer check

🔌 API Endpoints

POST /api/v1/enhance

Enhance a transcript using the full pipeline.

{
  "transcript": "raw transcript text",
  "word_timestamps": [{"word": "...", "start": 0.0, "end": 0.1}],
  "word_confidences": [0.9, 0.8, ...],
  "audio_path": "/path/to/audio.wav",  // optional
  "domain_lexicon": {"term": ["variant1", "variant2"]}  // optional
}
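As a sketch of client-side usage, the payload above can be assembled and sanity-checked before POSTing. The field names come from the schema above; the length check (one timestamp and one confidence per word) is an assumption about what the server expects, and the helper itself is illustrative rather than part of the package:

```python
import json


def build_enhance_request(transcript, word_timestamps, word_confidences,
                          audio_path=None, domain_lexicon=None):
    """Assemble a /api/v1/enhance request body, verifying that the
    per-word metadata lines up with the transcript's word count."""
    words = transcript.split()
    if len(word_timestamps) != len(words) or len(word_confidences) != len(words):
        raise ValueError("timestamps/confidences must match the word count")
    payload = {
        "transcript": transcript,
        "word_timestamps": word_timestamps,
        "word_confidences": word_confidences,
    }
    # Optional fields are only included when provided
    if audio_path:
        payload["audio_path"] = audio_path
    if domain_lexicon:
        payload["domain_lexicon"] = domain_lexicon
    return json.dumps(payload)
```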

POST /api/v1/analyze

Analyze transcript without enhancement.

GET /api/v1/diagnostics

Get pipeline diagnostics and configuration.

GET /health

Health check endpoint.

โš™๏ธ Configuration

Configuration can be set via:

  1. Configuration file (config.json)
  2. Environment variables
  3. Code

Key Settings

Setting               | Default    | Description
----------------------|------------|------------------------------------------------
confidence_threshold  | 0.7        | Threshold for low-confidence detection
sliding_window_size   | 3          | Window size for confidence smoothing
secondary_asr_backend | "whisper"  | Backend for re-ASR ("whisper", "riva")
llm_provider          | "ollama"   | LLM provider ("openai", "ollama", "anthropic")
llm_model             | "llama3.1" | LLM model name
fusion_alpha          | 0.4        | Weight for original ASR confidence
fusion_beta           | 0.35       | Weight for language model score
fusion_gamma          | 0.25       | Weight for acoustic similarity

Environment Variables

export ASR_CONFIDENCE_THRESHOLD=0.7
export ASR_LLM_PROVIDER=ollama
export ASR_LLM_MODEL=llama3.1
export ASR_LLM_API_KEY=your-api-key  # For OpenAI/Anthropic
export ASR_LOG_LEVEL=INFO
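A loader for these variables might look like the sketch below. The variable names and defaults are taken from the tables above; the loader function itself is illustrative, not the package's actual `utils.config` implementation:

```python
import os


def load_config_from_env():
    """Read ASR_* environment variables, falling back to the
    documented defaults when a variable is unset."""
    return {
        "confidence_threshold": float(os.environ.get("ASR_CONFIDENCE_THRESHOLD", "0.7")),
        "llm_provider": os.environ.get("ASR_LLM_PROVIDER", "ollama"),
        "llm_model": os.environ.get("ASR_LLM_MODEL", "llama3.1"),
        "log_level": os.environ.get("ASR_LOG_LEVEL", "INFO"),
    }
```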

📊 Fusion Formula

The hypothesis fusion uses weighted scoring:

$$Score = \alpha \cdot P_{confidence} + \beta \cdot S_{LM} + \gamma \cdot S_{acoustic}$$

Where:

  • $\alpha$ = Original ASR confidence weight (default: 0.4)
  • $\beta$ = Language model score weight (default: 0.35)
  • $\gamma$ = Acoustic similarity weight (default: 0.25)
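In code, the fusion step reduces to a weighted sum over each hypothesis's three scores. The sketch below uses the default weights; the tuple layout for hypotheses is an assumption, not the project's actual data structure:

```python
def fused_score(p_confidence, s_lm, s_acoustic,
                alpha=0.4, beta=0.35, gamma=0.25):
    """Score = alpha * P_confidence + beta * S_LM + gamma * S_acoustic,
    using the documented default weights."""
    return alpha * p_confidence + beta * s_lm + gamma * s_acoustic


def select_best(hypotheses):
    """Pick the hypothesis with the highest fused score.
    Each hypothesis is (text, p_confidence, s_lm, s_acoustic)."""
    return max(hypotheses, key=lambda h: fused_score(h[1], h[2], h[3]))[0]
```

Note that a hypothesis with weaker raw ASR confidence can still win if its language-model and acoustic scores are strong enough.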

๐Ÿ›ก๏ธ Anti-Hallucination Safeguards

The LLM polishing stage includes multiple safeguards:

  1. Number Preservation: All numeric sequences must appear unchanged
  2. Overlap Validation: Enhanced text must maintain >50% word overlap
  3. Grounding Prompts: Explicit instructions to only fix errors, not add content
  4. Retry Logic: Multiple attempts with validation between each
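Safeguards 1 and 2 can be approximated in a few lines of validation. This sketch treats number preservation as an ordered digit-sequence match and overlap as a word-set ratio; both are simplifications of whatever the pipeline's validators actually do:

```python
import re


def digits_preserved(original, enhanced):
    """Safeguard 1: every numeric sequence in the original must appear
    unchanged, and in the same order, in the enhanced text."""
    return re.findall(r"\d+", original) == re.findall(r"\d+", enhanced)


def overlap_ok(original, enhanced, threshold=0.5):
    """Safeguard 2: the enhanced text must retain more than `threshold`
    of the original words (simple set-overlap approximation)."""
    orig = set(original.lower().split())
    enh = set(enhanced.lower().split())
    return len(orig & enh) / max(len(orig), 1) > threshold
```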

🧪 Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=asr_enhancer --cov-report=html

# Run specific test file
pytest tests/test_detectors.py -v

๐Ÿณ Docker Deployment

# Build and run with Docker Compose
docker-compose up -d

# View logs
docker-compose logs -f asr-enhancer

# Pull Ollama model (first time)
docker exec asr-enhancer-ollama ollama pull llama3.1

📈 Next Implementation Steps

Phase 1: Core Implementation (Current)

  • Project scaffolding
  • Module stubs with interfaces
  • FastAPI service structure
  • CLI tool skeleton
  • Docker configuration

Phase 2: Detection & Analysis

  • Implement sliding window confidence detection
  • Add acoustic anomaly detection
  • Build numeric gap pattern matching
  • Unit tests for detectors
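A sliding-window detector of the kind planned here might look like the following sketch, using the documented defaults (threshold 0.7, window size 3). The exact windowing and flagging rules are assumptions, not the project's implementation:

```python
def low_confidence_spans(confidences, threshold=0.7, window=3):
    """Flag word indices whose windowed mean confidence falls below
    the threshold, so a single low-confidence word also implicates
    its neighbors."""
    flagged = []
    for i in range(len(confidences)):
        # Center the window on word i, clipping at both ends
        lo = max(0, i - window // 2)
        hi = min(len(confidences), i + window // 2 + 1)
        if sum(confidences[lo:hi]) / (hi - lo) < threshold:
            flagged.append(i)
    return flagged
```

Smoothing like this is what turns an isolated dip (e.g. the 0.45 on "tree" in the Quick Start example) into a contiguous span worth re-transcribing.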

Phase 3: Secondary ASR

  • Whisper backend integration
  • Audio segment extraction
  • Batch processing support
  • Latency optimization

Phase 4: Numeric Reconstruction

  • Pattern analyzer for phone/OTP/amounts
  • Acoustic confusion correction
  • Sequence completion rules
  • Validation with Luhn checks
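Two of these steps are easy to sketch: correcting acoustic confusions such as "tree" → "three" with a lookup table, and validating card numbers with the Luhn checksum. The confusion table below is illustrative, not the project's actual lexicon:

```python
# Common ASR digit-word confusions (illustrative, not exhaustive)
DIGIT_CONFUSIONS = {"tree": "three", "ate": "eight", "won": "one"}


def fix_digit_words(words):
    """Replace known digit-word confusions, leaving other words alone."""
    return [DIGIT_CONFUSIONS.get(w, w) for w in words]


def luhn_valid(number):
    """Luhn checksum: double every second digit from the right,
    subtract 9 from results over 9, and require a total divisible by 10."""
    digits = [int(d) for d in str(number)][::-1]
    total = sum(digits[0::2])
    for d in digits[1::2]:
        d *= 2
        total += d - 9 if d > 9 else d
    return total % 10 == 0
```

In practice a confusion table like this would only be applied inside spans already identified as numeric, so ordinary uses of "ate" or "won" survive untouched.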

Phase 5: Domain Vocabulary

  • Lexicon file format and loading
  • Fuzzy matching implementation
  • Phonetic matching (Soundex/Metaphone)
  • Case-preserving correction
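For the fuzzy-matching step, Python's standard-library difflib can stand in for the project's matcher. This is a sketch under that substitution, not the package's `term_matcher` implementation:

```python
import difflib


def correct_term(word, lexicon, cutoff=0.8):
    """Snap an ASR word onto the closest domain term when the
    similarity ratio clears the cutoff; otherwise return it as-is."""
    matches = difflib.get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word
```

A phonetic pass (Soundex/Metaphone, as planned above) would catch confusions that edit distance misses, such as homophones spelled very differently.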

Phase 6: LLM Integration

  • Prompt template refinement
  • Multi-provider support testing
  • Anti-hallucination validation
  • Fallback strategies

Phase 7: Fusion & Validation

  • N-best hypothesis fusion
  • Language model perplexity scoring
  • Consistency validation
  • Completeness checks

Phase 8: Production Hardening

  • Performance benchmarks
  • Memory optimization
  • Streaming support
  • Monitoring & metrics
  • Load testing

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests: pytest
  5. Run linting: ruff check . && black --check .
  6. Submit a pull request

๐Ÿ“ License

MIT License - see LICENSE file for details.

🔗 Related Projects



Download files

Download the file for your platform.

Source Distribution

asr_enhancer-0.2.1.tar.gz (106.8 kB)

Uploaded Source

Built Distribution


asr_enhancer-0.2.1-py3-none-any.whl (118.8 kB)

Uploaded Python 3

File details

Details for the file asr_enhancer-0.2.1.tar.gz.

File metadata

  • Download URL: asr_enhancer-0.2.1.tar.gz
  • Upload date:
  • Size: 106.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for asr_enhancer-0.2.1.tar.gz
Algorithm   | Hash digest
------------|-----------------------------------------------------------------
SHA256      | 8c849b67e99882cb3c7d556ea026975da312de00a843871bd1740731a704dee3
MD5         | f2f83150ef7ca3f509da552250650e2b
BLAKE2b-256 | f1b121076fe91202fa580ed45a57379b24dca1cba6e6f9c2d48661f56d2bf51e


File details

Details for the file asr_enhancer-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: asr_enhancer-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 118.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for asr_enhancer-0.2.1-py3-none-any.whl
Algorithm   | Hash digest
------------|-----------------------------------------------------------------
SHA256      | a3c89355db206c5e210c9df46a6f62957912b5df51ff431e2e31d4ca27850612
MD5         | a1ede3b1fe568408a537e5f96218e71d
BLAKE2b-256 | 86cbd6b0d6e3d9a2091c498749b11fefc78bf6856a23efaa7db79f2aa471ccc3

