Build ASR bias artifacts from presentation decks

ASR Bias Builder

Build ASR bias artifacts from presentation decks for Whisper and Google Speech-to-Text

License: MIT

ASR Bias Builder extracts high-value entities from PDF/PPTX decks and produces:

  • deck_terms.txt – Whisper initial_prompt hotwords
  • phrase_set.json – Google Speech-to-Text v2 Speech Adaptation PhraseSet
  • Structured LLM candidates + verification telemetry for auditability
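As a rough illustration of how the two artifacts might be consumed or reproduced, the sketch below reads a term list, joins it into a prompt string suitable for Whisper's initial_prompt, and builds a dictionary using the phrases / value / boost fields of a Speech-to-Text v2 PhraseSet. It assumes deck_terms.txt holds one term per line, which the project description does not confirm. The helper names (load_terms, whisper_prompt, phrase_set), the max_chars cap, and the default boost are hypothetical and are not part of asr-bias-builder.

```python
import json
from pathlib import Path


def load_terms(path):
    """Read one term per line (assumed format), dropping blanks and
    case-insensitive duplicates while keeping first-seen order."""
    seen, terms = set(), []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        term = line.strip()
        if term and term.lower() not in seen:
            seen.add(term.lower())
            terms.append(term)
    return terms


def whisper_prompt(terms, max_chars=800):
    """Join terms with ", " and stop before exceeding max_chars.

    max_chars is a crude, hypothetical stand-in for Whisper's limited
    prompt window; a token-based budget would be more precise.
    """
    kept, length = [], 0
    for term in terms:
        extra = len(term) + (2 if kept else 0)  # 2 accounts for ", "
        if length + extra > max_chars:
            break
        kept.append(term)
        length += extra
    return ", ".join(kept)


def phrase_set(terms, boost=10.0):
    """Build a dict with PhraseSet-style fields; the exact layout that
    phrase_set.json uses may differ."""
    return {"phrases": [{"value": t, "boost": boost} for t in terms]}


def dump_phrase_set(terms, path, boost=10.0):
    """Write the PhraseSet-style dict to disk as pretty-printed JSON."""
    Path(path).write_text(json.dumps(phrase_set(terms, boost), indent=2),
                          encoding="utf-8")
```

With the openai-whisper package, the resulting string would typically be passed as model.transcribe(audio_path, initial_prompt=prompt).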

Quick Start

pip install -e .
asr-bias-builder pipeline deck.pdf

By default, artifacts (deck text, seeds, verified terms, prompt list, phrase set, review markdown) are written to ./asr-bias-output/<deck-name>/, and the cross-run summary is written to ./asr-bias-summary.csv. Override either location with --output-dir or --summary-csv.
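For scripted runs, the same command can be assembled from Python. This is a minimal sketch: the pipeline subcommand and the --output-dir / --summary-csv flags come from the text above, but their position after the deck path and the space-separated value form are assumptions. build_command and run_pipeline are hypothetical helpers, not part of the package.

```python
import subprocess


def build_command(deck, output_dir=None, summary_csv=None):
    """Assemble the argv list for one pipeline run.

    Flag placement and the "--flag value" form are assumed; check
    `asr-bias-builder pipeline --help` for the actual syntax.
    """
    cmd = ["asr-bias-builder", "pipeline", str(deck)]
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    if summary_csv is not None:
        cmd += ["--summary-csv", str(summary_csv)]
    return cmd


def run_pipeline(deck, **kwargs):
    """Run the CLI and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(deck, **kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems when deck paths contain spaces.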

Features

  • Deterministic PDF/PPTX extraction with OCR normalization
  • Seed mining with configurable filters, section weighting, and per-deck overrides
  • LLM integration via Claude Code CLI (stdin, streaming JSON, read tool)
  • Verification layer merges deterministic + LLM terms with alias tracking
  • Artifact builders for Whisper prompts and Google Speech Adaptation PhraseSets
  • Review markdown + CSV summaries for production runs
  • Docker, CI/CD, docs, tests, and examples ready for GitHub publishing
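The verification step in the list above merges deterministic and LLM terms while tracking aliases. One way such a merge could work is sketched below, assuming that terms differing only in case or whitespace refer to the same entity. VerifiedTerm, normalize, and merge_terms are hypothetical names, and the package's actual matching rules may be quite different.

```python
from dataclasses import dataclass, field


@dataclass
class VerifiedTerm:
    canonical: str                               # first-seen spelling
    sources: set = field(default_factory=set)    # "deterministic" and/or "llm"
    aliases: set = field(default_factory=set)    # other spellings seen


def normalize(term):
    """Collapse whitespace and case-fold so surface variants share a key."""
    return " ".join(term.split()).casefold()


def merge_terms(deterministic, llm):
    """Merge both term lists; deterministic spellings win as canonical
    because they are processed first."""
    merged = {}
    for source, terms in (("deterministic", deterministic), ("llm", llm)):
        for term in terms:
            cleaned = " ".join(term.split())
            key = normalize(term)
            if not key:
                continue  # skip empty or whitespace-only entries
            entry = merged.get(key)
            if entry is None:
                entry = merged[key] = VerifiedTerm(canonical=cleaned)
            elif cleaned != entry.canonical:
                entry.aliases.add(cleaned)
            entry.sources.add(source)
    return list(merged.values())
```

Recording which source contributed each term keeps the merge auditable: a term seen only by the LLM can be flagged for review, while one confirmed by both sources needs less scrutiny.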

Repository Layout

asr-bias-builder/
├── asr_bias_builder/        # Python package
├── config/                  # Default + example configs
├── docs/                    # User + architecture docs
├── examples/                # Synthetic sample decks & configs
├── scripts/                 # Shell helpers (make_bias.sh, claudecode, etc.)
├── tests/                   # Pytest suite and fixtures
├── docker/                  # Dockerfiles & compose for dev/prod
└── .github/                 # Workflows, templates, Dependabot

See the docs for detailed installation, configuration, and architecture notes.

Download files

Source Distribution

asr_bias_builder-0.1.1.tar.gz (31.6 kB)

Uploaded Source

Built Distribution

asr_bias_builder-0.1.1-py3-none-any.whl (46.5 kB)

Uploaded Python 3

File details

Details for the file asr_bias_builder-0.1.1.tar.gz.

File metadata

  • Download URL: asr_bias_builder-0.1.1.tar.gz
  • Upload date:
  • Size: 31.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for asr_bias_builder-0.1.1.tar.gz
Algorithm Hash digest
SHA256 919a96f230dd6168c36fcc82e33fd64b94377d1c7d79e841f962933f336104da
MD5 9c09d6cbe3ab2cc3196e9110fd9e9698
BLAKE2b-256 c48c0289e7f6d24aaf976d586463f98b15e32c94d7f0aaae9956648243f3b779

Provenance

The following attestation bundles were made for asr_bias_builder-0.1.1.tar.gz:

Publisher: release.yml on yaniv-golan/asr-bias-builder

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file asr_bias_builder-0.1.1-py3-none-any.whl.

File hashes

Hashes for asr_bias_builder-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1bf0ee3c016019a97a3c132586398a6fd1e47879cac3e8e12ba63939e31ee446
MD5 929bd8bb8524ee74a120728d8f124b77
BLAKE2b-256 0c36c899c12aeae1b724fb3d90d1043dc3365c613c8b67c583fd04b5532d0322

Provenance

The following attestation bundles were made for asr_bias_builder-0.1.1-py3-none-any.whl:

Publisher: release.yml on yaniv-golan/asr-bias-builder

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
