Build ASR bias artifacts from presentation decks

ASR Bias Builder

Build ASR bias artifacts from presentation decks for Whisper and Google Speech-to-Text

ASR Bias Builder extracts high-value entities from PDF/PPTX decks and produces:

  • deck_terms.txt – Whisper initial_prompt hotwords
  • phrase_set.json – Google Speech-to-Text v2 Speech Adaptation PhraseSet
  • Structured LLM candidates + verification telemetry for auditability
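As a rough illustration of how the first two artifacts might be consumed, the sketch below turns a term list into a Whisper initial_prompt string and a PhraseSet-shaped dictionary. It assumes deck_terms.txt holds one term per line, which this README does not confirm. The helper names (load_terms, build_initial_prompt, build_phrase_set), the 800-character cap, and the flat boost of 10.0 are all illustrative and not part of asr-bias-builder.

```python
from pathlib import Path


def load_terms(path):
    """Read one term per line (assumed format), skipping blanks and
    case-insensitive duplicates while keeping first-seen order."""
    seen, terms = set(), []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        term = line.strip()
        if term and term.lower() not in seen:
            seen.add(term.lower())
            terms.append(term)
    return terms


def build_initial_prompt(terms, max_chars=800):
    """Join terms into a comma-separated prompt, stopping before the
    result would exceed max_chars (an arbitrary stand-in for Whisper's
    limited prompt window)."""
    kept, length = [], 0
    for term in terms:
        extra = len(term) + (2 if kept else 0)  # 2 accounts for ", "
        if length + extra > max_chars:
            break
        kept.append(term)
        length += extra
    return ", ".join(kept)


def build_phrase_set(terms, boost=10.0):
    """Shape terms like a Speech-to-Text v2 PhraseSet body:
    a list of phrases, each with a value and a boost."""
    return {"phrases": [{"value": term, "boost": boost} for term in terms]}
```

With the openai-whisper package, the resulting string would typically be passed as model.transcribe(audio_path, initial_prompt=prompt). The actual phrase_set.json written by the tool may carry different boosts or extra fields.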

Quick Start

pip install -e .
asr-bias-builder pipeline deck.pdf

Artifacts land in ./asr-bias-output/<deck-name>/ by default: deck text, seeds, verified terms, prompt list, phrase set, and review markdown. The cross-run summary is written to ./asr-bias-summary.csv. Override either location with --output-dir or --summary-csv.
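The default locations can be pictured with a small path helper. This is only a sketch: it assumes <deck-name> is the deck file's stem and that --output-dir replaces the ./asr-bias-output base rather than the per-deck folder, neither of which is stated here. The function name resolve_locations is hypothetical.

```python
from pathlib import Path


def resolve_locations(deck_path, output_dir=None, summary_csv=None):
    """Return (artifact_dir, summary_path) for a deck.

    Assumptions (not confirmed by the README): the deck name is the
    file stem, and output_dir overrides the base directory.
    """
    deck_name = Path(deck_path).stem
    base = Path(output_dir) if output_dir else Path("asr-bias-output")
    summary = Path(summary_csv) if summary_csv else Path("asr-bias-summary.csv")
    return base / deck_name, summary


artifact_dir, summary_path = resolve_locations("slides/deck.pdf")
print(artifact_dir)   # per-deck artifact folder
print(summary_path)   # cross-run summary CSV
```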

Features

  • Deterministic PDF/PPTX extraction with OCR normalization
  • Seed mining with configurable filters, section weighting, and per-deck overrides
  • LLM integration via Claude Code CLI (stdin, streaming JSON, read tool)
  • Verification layer merges deterministic + LLM terms with alias tracking
  • Artifact builders for Whisper prompts and Google Speech Adaptation PhraseSets
  • Review markdown + CSV summaries for production runs
  • Docker, CI/CD, docs, tests, and examples ready for GitHub publishing
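To make the verification step in the list above more concrete, here is a minimal sketch of merging deterministic seeds with LLM candidates while tracking aliases. It is not the package's implementation: the VerifiedTerm class, the normalize rule (lowercase, collapsed whitespace), and the input shapes are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VerifiedTerm:
    canonical: str
    sources: set = field(default_factory=set)   # "deterministic" and/or "llm"
    aliases: set = field(default_factory=set)   # alternate spellings seen


def normalize(term):
    """Hypothetical matching key: lowercase with collapsed whitespace."""
    return " ".join(term.lower().split())


def merge_terms(deterministic, llm_candidates):
    """Merge plain seed strings with (term, aliases) pairs from an LLM.

    Terms that normalize to the same key share one entry; the first
    spelling seen becomes the canonical form.
    """
    merged = {}
    for term in deterministic:
        entry = merged.setdefault(normalize(term), VerifiedTerm(canonical=term))
        entry.sources.add("deterministic")
    for term, aliases in llm_candidates:
        key = normalize(term)
        entry = merged.setdefault(key, VerifiedTerm(canonical=term))
        entry.sources.add("llm")
        # Keep only aliases that differ from the canonical key.
        entry.aliases.update(a for a in aliases if normalize(a) != key)
    return list(merged.values())
```

Recording which source produced each term is what would let a review step flag LLM-only terms for closer inspection.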

Repository Layout

asr-bias-builder/
├── asr_bias_builder/        # Python package
├── config/                  # Default + example configs
├── docs/                    # User + architecture docs
├── examples/                # Synthetic sample decks & configs
├── scripts/                 # Shell helpers (make_bias.sh, claudecode, etc.)
├── tests/                   # Pytest suite and fixtures
├── docker/                  # Dockerfiles & compose for dev/prod
└── .github/                 # Workflows, templates, Dependabot

See the docs for detailed installation, configuration, and architecture notes.

Download files

Source Distribution

asr_bias_builder-0.1.0.tar.gz (31.2 kB, source)

Built Distribution

asr_bias_builder-0.1.0-py3-none-any.whl (46.3 kB, Python 3)

File details

Details for the file asr_bias_builder-0.1.0.tar.gz.

File metadata

  • Download URL: asr_bias_builder-0.1.0.tar.gz
  • Size: 31.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for asr_bias_builder-0.1.0.tar.gz
Algorithm Hash digest
SHA256 59dcd08c756f8f3a50165fbd3202df2155ad7c6fef6242e75eb3a99dad5fa004
MD5 f679072613ef3e4ab1bfc8bacec1c564
BLAKE2b-256 4618690bba218c18dad587a349c9e5630aeb8450e2bdd740f27fdadde738f245
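A downloaded file can be checked against the SHA256 digest listed above with the standard library alone. The sketch below is generic; the function names sha256_of and matches are illustrative, and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare the file's SHA256 with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches("asr_bias_builder-0.1.0.tar.gz", "<published SHA256>") should return True for an intact download.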

Provenance

The following attestation bundles were made for asr_bias_builder-0.1.0.tar.gz:

Publisher: release.yml on yaniv-golan/asr-bias-builder

File details

Details for the file asr_bias_builder-0.1.0-py3-none-any.whl.

File hashes

Hashes for asr_bias_builder-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 18ea9ba122526be7ab4cf15720a1dabbc0f4daf991d1e3058a2dffdbf208b995
MD5 c24e6518dcb418ff677b72e87c9c04c8
BLAKE2b-256 ef9e3c848e3ceac82738ded04f8fc3c02e0e22e1136cd78d198517dc112f6780

Provenance

The following attestation bundles were made for asr_bias_builder-0.1.0-py3-none-any.whl:

Publisher: release.yml on yaniv-golan/asr-bias-builder
