Build ASR bias artifacts from presentation decks
ASR Bias Builder
Build ASR bias artifacts from presentation decks for Whisper and Google Speech-to-Text
ASR Bias Builder extracts high-value entities from PDF/PPTX decks and produces:
- deck_terms.txt – Whisper initial_prompt / hotwords
- phrase_set.json – Google Speech-to-Text v2 Speech Adaptation PhraseSet
- Structured LLM candidates + verification telemetry for auditability
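The sketch below shows roughly what the two artifacts feed into. It turns a term list into a comma-separated Whisper prompt and into a dict shaped like a Google Speech-to-Text v2 PhraseSet, which is a phrases list of value/boost entries. It assumes deck_terms.txt holds one term per line. The helper names, the 800-character budget, and the boost of 10.0 are hypothetical choices, not what the tool actually emits.

```python
import json


def parse_terms(text: str) -> list[str]:
    """Split deck_terms.txt content into terms, assuming one term per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_whisper_prompt(terms: list[str], max_chars: int = 800) -> str:
    """Join terms with commas, stopping before the prompt would exceed max_chars.

    max_chars is a hypothetical budget: Whisper limits the prompt in tokens,
    so a character cap is only a rough proxy.
    """
    prompt = ""
    for term in terms:
        candidate = f"{prompt}, {term}" if prompt else term
        if len(candidate) > max_chars:
            break
        prompt = candidate
    return prompt


def build_phrase_set(terms: list[str], boost: float = 10.0) -> dict:
    """Build a dict shaped like a Speech-to-Text v2 PhraseSet body (assumed minimal subset)."""
    return {"phrases": [{"value": term, "boost": boost} for term in terms]}


# Illustrative input; a real run would read the generated deck_terms.txt instead.
sample = "PhraseSet\nSpeech Adaptation\n\nWhisper\n"
terms = parse_terms(sample)
print(build_whisper_prompt(terms))
print(json.dumps(build_phrase_set(terms), indent=2))
```

If you use the openai-whisper package, you could pass the prompt string as model.transcribe(audio_path, initial_prompt=prompt).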
Quick Start
pip install -e .
asr-bias-builder pipeline deck.pdf
By default, artifacts land in ./asr-bias-output/<deck-name>/ (deck text, seeds, verified terms, prompt list, phrase set, and review markdown), and the cross-run summary is written to ./asr-bias-summary.csv. Override either location with --output-dir or --summary-csv.
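Here is a minimal sketch of how the default locations resolve. It assumes <deck-name> is the deck's file name without its extension. default_output_paths is a hypothetical helper, and its two keyword arguments stand in for what --output-dir and --summary-csv override.

```python
from pathlib import Path


def default_output_paths(
    deck: str,
    output_root: str = "asr-bias-output",
    summary_csv: str = "asr-bias-summary.csv",
) -> tuple[Path, Path]:
    """Return (per-deck output dir, cross-run summary CSV).

    Assumes <deck-name> is the deck file's stem (name without extension).
    """
    deck_name = Path(deck).stem
    return Path(output_root) / deck_name, Path(summary_csv)


out_dir, summary = default_output_paths("slides/deck.pdf")
print(out_dir, summary)
```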
Features
- Deterministic PDF/PPTX extraction with OCR normalization
- Seed mining with configurable filters, section weighting, and per-deck overrides
- LLM integration via Claude Code CLI (stdin, streaming JSON, read tool)
- Verification layer merges deterministic + LLM terms with alias tracking
- Artifact builders for Whisper prompts and Google Speech Adaptation PhraseSets
- Review markdown + CSV summaries for production runs
- Docker, CI/CD, docs, tests, and examples ready for GitHub publishing
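The Features list mentions a verification layer that merges deterministic and LLM terms with alias tracking. The sketch below shows one way such a merge could work. It is not the package's actual logic: the normalization rule (case-folding plus collapsing whitespace and hyphens), the MergedTerm type, and the policy that deterministic spellings become canonical are all assumptions.

```python
import re
from dataclasses import dataclass, field


def normalize(term: str) -> str:
    """Case-fold and collapse runs of whitespace/hyphens so spelling variants share one key."""
    return re.sub(r"[\s\-]+", " ", term).strip().casefold()


@dataclass
class MergedTerm:
    canonical: str                                  # first spelling seen for this key
    sources: set[str] = field(default_factory=set)  # "deterministic" and/or "llm"
    aliases: set[str] = field(default_factory=set)  # other spellings folded into this entry


def merge_terms(deterministic: list[str], llm: list[str]) -> list[MergedTerm]:
    merged: dict[str, MergedTerm] = {}
    # Deterministic seeds are processed first, so their spelling becomes canonical (assumed policy).
    for source, terms in (("deterministic", deterministic), ("llm", llm)):
        for term in terms:
            key = normalize(term)
            if not key:
                continue  # skip blank candidates
            entry = merged.setdefault(key, MergedTerm(canonical=term.strip()))
            entry.sources.add(source)
            if term.strip() != entry.canonical:
                entry.aliases.add(term.strip())
    return list(merged.values())


for item in merge_terms(["Speech-to-Text", "PhraseSet"], ["speech to text", "Whisper"]):
    print(item.canonical, sorted(item.sources), sorted(item.aliases))
```

Keeping the source set on each entry records whether the deterministic miner, the LLM, or both proposed a term. Reviewers could use that when auditing the LLM's contributions.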
Repository Layout
asr-bias-builder/
├── asr_bias_builder/ # Python package
├── config/ # Default + example configs
├── docs/ # User + architecture docs
├── examples/ # Synthetic sample decks & configs
├── scripts/ # Shell helpers (make_bias.sh, claudecode, etc.)
├── tests/ # Pytest suite and fixtures
├── docker/ # Dockerfiles & compose for dev/prod
└── .github/ # Workflows, templates, Dependabot
See the docs for detailed installation, configuration, and architecture notes.
Download files
File details
Details for the file asr_bias_builder-0.1.1.tar.gz.
File metadata
- Download URL: asr_bias_builder-0.1.1.tar.gz
- Size: 31.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 919a96f230dd6168c36fcc82e33fd64b94377d1c7d79e841f962933f336104da |
| MD5 | 9c09d6cbe3ab2cc3196e9110fd9e9698 |
| BLAKE2b-256 | c48c0289e7f6d24aaf976d586463f98b15e32c94d7f0aaae9956648243f3b779 |
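Before installing a manually downloaded file, you can check it against the SHA256 digest published for the source distribution. Here is a minimal sketch using only the standard library. The helper names sha256_of and verify are illustrative, not part of the package.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for asr_bias_builder-0.1.1.tar.gz in the file hashes table.
EXPECTED_SDIST_SHA256 = "919a96f230dd6168c36fcc82e33fd64b94377d1c7d79e841f962933f336104da"


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """Return True when the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


# Example: verify("asr_bias_builder-0.1.1.tar.gz") returns True for an intact download.
```

pip can also enforce this for you in hash-checking mode. Add a requirements line such as asr-bias-builder==0.1.1 --hash=sha256:919a96f230dd6168c36fcc82e33fd64b94377d1c7d79e841f962933f336104da --hash=sha256:1bf0ee3c016019a97a3c132586398a6fd1e47879cac3e8e12ba63939e31ee446, then run pip install --require-hashes -r requirements.txt. Listing both the sdist and wheel digests lets pip accept either file.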
Provenance
The following attestation bundles were made for asr_bias_builder-0.1.1.tar.gz:
Publisher: release.yml on yaniv-golan/asr-bias-builder
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: asr_bias_builder-0.1.1.tar.gz
- Subject digest: 919a96f230dd6168c36fcc82e33fd64b94377d1c7d79e841f962933f336104da
- Sigstore transparency entry: 706021620
- Permalink: yaniv-golan/asr-bias-builder@7bc1c271db6b6cc8d747f82bf43f92bdad6af8eb
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/yaniv-golan
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@7bc1c271db6b6cc8d747f82bf43f92bdad6af8eb
- Trigger Event: push
File details
Details for the file asr_bias_builder-0.1.1-py3-none-any.whl.
File metadata
- Download URL: asr_bias_builder-0.1.1-py3-none-any.whl
- Size: 46.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1bf0ee3c016019a97a3c132586398a6fd1e47879cac3e8e12ba63939e31ee446 |
| MD5 | 929bd8bb8524ee74a120728d8f124b77 |
| BLAKE2b-256 | 0c36c899c12aeae1b724fb3d90d1043dc3365c613c8b67c583fd04b5532d0322 |
Provenance
The following attestation bundles were made for asr_bias_builder-0.1.1-py3-none-any.whl:
Publisher: release.yml on yaniv-golan/asr-bias-builder
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: asr_bias_builder-0.1.1-py3-none-any.whl
- Subject digest: 1bf0ee3c016019a97a3c132586398a6fd1e47879cac3e8e12ba63939e31ee446
- Sigstore transparency entry: 706021645
- Permalink: yaniv-golan/asr-bias-builder@7bc1c271db6b6cc8d747f82bf43f92bdad6af8eb
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/yaniv-golan
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@7bc1c271db6b6cc8d747f82bf43f92bdad6af8eb
- Trigger Event: push