Constraint-aware audio resynthesis and distillation pipeline.

CARD Framework

This repository is the current implementation of CARD: Constraint-aware Audio Resynthesis and Distillation, the project described in EEE_196_CARD_UCL.md.

The paper is the conceptual and academic baseline. The codebase, however, has already moved beyond parts of the manuscript's original implementation plan. This README therefore prioritizes what the repository actually does now. When the paper and the current code diverge, treat the code, config, and coder_docs as the source of truth for day-to-day development.

Paper Metadata

Authors

Rei Dennis Agustin, 2022-03027, BS Electronics Engineering
Sean Luigi P. Caranzo, 2022-05398, BS Computer Engineering
Johnbell R. De Leon, 2021-01437, BS Computer Engineering
Christian Klein C. Ramos, 2022-03126, BS Electronics Engineering

Research Adviser

Rowel D. Atienza

Affiliation

University of the Philippines Diliman
December 2025

Abstract

CARD addresses the long-form podcast consumption bottleneck by generating a shorter conversational audio output that retains speaker identity and prosodic character instead of collapsing everything into plain text. The project combines transcript generation, speaker-aware summarization, voice-cloned resynthesis, and conversational overlap handling so a multi-speaker recording can be compressed toward a user-defined duration without discarding the listening experience that makes the original medium valuable.

High-Level Architecture

```mermaid
flowchart LR
    A[Source Audio] --> B[Stage 1<br/>Audio Ingestion]
    B --> C[Transcript JSON<br/>Speaker Metadata]
    C --> D[Stage 2<br/>Summarizer + Critic Loop]
    D --> E[Summary XML<br/>Speaker-Tagged Turns]
    E --> F[Stage 3<br/>Voice Clone Resynthesis]
    F --> G[Cloned Summary Audio]
    G --> H[Stage 4<br/>Interjector / Backchannels]
    H --> I[Final Conversational Audio]

    C -. Optional evaluation input .-> J[Benchmarks]
    E -. Optional evaluation input .-> J

    K[Hydra Config + Provider Adapters] -. controls .-> B
    K -. controls .-> D
    K -. controls .-> F
    K -. controls .-> H
```
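
The hand-off between stages in the diagram can also be written down as plain data. The sketch below is illustrative only: the `Stage` dataclass, the `STAGES` list, and `check_chain` are hypothetical names that do not come from the repository. Only the stage names and artifact labels are taken from the flowchart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One pipeline stage and the artifact it hands on (hypothetical type)."""

    number: int
    name: str
    consumes: str
    produces: str


# Labels mirror the flowchart; the data structure itself is an assumption.
STAGES = [
    Stage(1, "Audio Ingestion", "Source Audio", "Transcript JSON"),
    Stage(2, "Summarizer + Critic Loop", "Transcript JSON", "Summary XML"),
    Stage(3, "Voice Clone Resynthesis", "Summary XML", "Cloned Summary Audio"),
    Stage(4, "Interjector / Backchannels", "Cloned Summary Audio", "Final Conversational Audio"),
]


def check_chain(stages):
    """Return True when each stage consumes what the previous stage produces."""
    return all(prev.produces == nxt.consumes for prev, nxt in zip(stages, stages[1:]))


print(check_chain(STAGES))
```

Writing the chain this way makes the contract between stages explicit: each stage depends only on the artifact emitted by the one before it.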

What CARD Does

CARD is a multi-stage pipeline for converting long-form multi-speaker audio into a shorter, speaker-aware, resynthesized conversational output.

At a high level, the repository currently supports:

Stage 1: Audio ingestion and transcript generation
- Source separation
- ASR, diarization, and alignment
- Transcript JSON generation with speaker metadata
Stage 2: Constraint-aware summarization
- Summarizer and critic agent loop
- Duration-first summary generation with speaker-tagged XML output
- Retrieval-backed or full-transcript summarization paths
Stage 3: Voice cloning and resynthesis
- Speaker sample generation
- Voice-cloned rendering of summary turns
- Live-draft voice cloning during summarizer edits
Stage 4: Conversational interjection
- Optional overlap and backchannel synthesis on top of the cloned summary
Benchmarking and evaluation
- Summarization benchmark workflows
- Source-grounded QA benchmark workflows
- Diarization benchmark workflows
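
To make the idea of speaker-tagged XML concrete, here is a minimal sketch of reading such a document with the standard library. The element name `turn`, the attribute name `speaker`, and the sample text are assumptions made for illustration; the actual summary schema is defined by the repository and is not shown in this README.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are assumptions,
# not the repository's actual summary schema.
SAMPLE_XML = """
<summary>
  <turn speaker="SPEAKER_00">Welcome back to the show.</turn>
  <turn speaker="SPEAKER_01">Thanks, glad to be here.</turn>
</summary>
"""


def read_turns(xml_text):
    """Return (speaker, text) pairs in document order."""
    root = ET.fromstring(xml_text.strip())
    return [
        (turn.get("speaker"), (turn.text or "").strip())
        for turn in root.iter("turn")
    ]


for speaker, text in read_turns(SAMPLE_XML):
    print(f"{speaker}: {text}")
```

Keeping the speaker on every turn is what lets a downstream stage pick the right cloned voice for each line.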

Paper vs. Current Repository

EEE_196_CARD_UCL.md explains the original CARD paper, problem framing, and proposed module design. The repository now reflects a more developed engineering system than that initial write-up.

Important differences from the manuscript-level description include:

- The repo is now configuration-driven through Hydra instead of being tied to one fixed experimental path.
- The runtime is now duration-first, centered on target_seconds and tolerance checks, rather than a simple word-budget-only workflow.
- The summary output contract is now speaker-tagged XML, which feeds the downstream voice-clone and interjector stages.
- The default stage-2/stage-3 flow can use live-draft voice cloning, where turn audio is rendered during summary editing instead of only after the final draft is approved.
- The repository includes substantial benchmarking, evaluation, and operator tooling that goes beyond the initial paper narrative.
- Provider support has expanded: the codebase is organized around adapters and config-selected backends rather than a single hardcoded model stack.

In short: the paper explains why CARD exists; this repository captures how CARD currently works.
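
A duration-first tolerance check can be sketched in a few lines. This README does not say how the repository expresses tolerance (absolute seconds, a ratio, or something else), so the sketch assumes an absolute window in seconds. The function names and the `accept` / `shorten` / `lengthen` labels are hypothetical.

```python
def within_tolerance(actual_seconds, target_seconds, tolerance_seconds):
    """True when the rendered duration is within +/- tolerance_seconds of the target."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    if tolerance_seconds < 0:
        raise ValueError("tolerance_seconds must not be negative")
    return abs(actual_seconds - target_seconds) <= tolerance_seconds


def duration_feedback(actual_seconds, target_seconds, tolerance_seconds):
    """Return a coarse hint that a critic-style loop could act on (assumed labels)."""
    if within_tolerance(actual_seconds, target_seconds, tolerance_seconds):
        return "accept"
    return "shorten" if actual_seconds > target_seconds else "lengthen"


print(duration_feedback(340.0, 300.0, 15.0))
```

Measuring seconds rather than words matters because speaking rate differs between speakers, so a fixed word budget does not map to a fixed listening time.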

Repository Layout

```
src/card_framework/
  agents/           A2A executors, DTOs, tool loops, client transport
  audio_pipeline/   Audio ingestion, speaker samples, voice cloning, interjector
  benchmark/        Summarization, QA, and diarization benchmarks
  cli/              Runtime, setup, calibration, matrix, and eval entrypoints
  config/           Hydra configuration
  orchestration/    Transcript DTOs and stage orchestration
  prompts/          Jinja2 prompt templates
  providers/        LLM and embedding provider adapters
  retrieval/        Transcript indexing and retrieval
  runtime/          Runtime planning and execution support
  shared/           Shared utilities, events, and logging
  _vendor/index_tts/
```

Other important locations:

- artifacts/: generated transcripts, cloned audio, benchmark outputs, and other runtime artifacts
- checkpoints/: local model/runtime checkpoints
- coder_docs/: repository-specific architecture, workflow, and maintenance guidance

Common Commands

```
uv sync --dev
uv run python -m card_framework.cli.main --help
uv run python -m card_framework.cli.setup_and_run --help
uv run python -m card_framework.cli.calibrate --help
uv run python -m card_framework.cli.run_summary_matrix --help
uv run python -m card_framework.benchmark.run --help
uv run python -m card_framework.benchmark.diarization --help
uv run python -m card_framework.benchmark.qa --help
uv run ruff check .
uv run pytest
```

Common execution entrypoints:

```
uv run python -m card_framework.cli.setup_and_run --audio-path <path-to-audio>
uv run python -m card_framework.cli.main
uv run python -m card_framework.cli.calibrate
```

Package Usage

The repository now exposes a library entrypoint for installed-package use:

```
pip install card-framework
```

```
from card_framework import infer

result = infer(
    "audio.wav",
    "outputs/run_001",
    300,
    device="cpu",
    vllm_url="http://localhost:8000/v1",
)
print(result.summary_xml_path)
print(result.final_audio_path)
```

infer(audio_wav, output_dir, target_duration_seconds, *, device, ...) runs the full stage-1 to stage-4 pipeline and returns an InferenceResult with the main emitted artifact paths. target_duration_seconds is required for every call and overrides any duration target declared in the loaded config file. device is also required and must be either cpu or cuda. vllm_url is the first-class packaged-runtime override for OpenAI-compatible endpoints, and it forces the shared summarizer, critic, and interjector LLM path onto the provided vLLM-compatible server for that call. The call writes into output_dir using this high-level layout:

```
outputs/run_001/
  transcript.json
  summary.xml
  agent_interactions.log
  audio_stage/
    voice_clone/
    interjector/
```
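
After a run, it can be useful to confirm that the output directory has the shape listed above. The following is a minimal sketch, not part of the package: `missing_artifacts` and the two constant tuples are hypothetical helpers, and the relative paths are copied from the layout above. It does not call `infer(...)`.

```python
from pathlib import Path

# Relative paths copied from the documented layout; helper names are hypothetical.
EXPECTED_FILES = ("transcript.json", "summary.xml", "agent_interactions.log")
EXPECTED_DIRS = ("audio_stage/voice_clone", "audio_stage/interjector")


def missing_artifacts(output_dir):
    """Return the expected relative paths that are absent under output_dir."""
    root = Path(output_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


# An empty list means every documented artifact location is present.
print(missing_artifacts("outputs/run_001"))
```

For the exact artifact paths of a given run, prefer the fields on the returned InferenceResult, such as summary_xml_path and final_audio_path.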

Installed-package runtime notes:

- Supported public packaged-runtime platform as of March 9, 2026: Windows only. macOS and Linux are not yet validated for the public pip install card-framework whole-pipeline path, and infer(...) fails fast on those platforms instead of attempting a partial run.
- CARD_FRAMEWORK_CONFIG: optional path to a full YAML config file when you need to override the default packaged provider/runtime config for infer(...).
- CARD_FRAMEWORK_HOME: optional writable runtime home used for extracted IndexTTS assets, checkpoints, and bootstrap state. If unset, the package uses the platform-appropriate user data directory.
- CARD_FRAMEWORK_VLLM_URL: optional environment-variable equivalent of the vllm_url= argument.
- CARD_FRAMEWORK_VLLM_API_KEY: optional environment-variable equivalent of the vllm_api_key= argument. If omitted for vLLM, the packaged runtime uses EMPTY, which matches the common local keyless vLLM setup.
- If you choose device="cuda", the packaged runtime currently supports only CUDA 12.6. infer(...) inspects the installed PyTorch build first and, when the host itself reports CUDA 12.6, automatically replaces CPU-only or mismatched torch and torchaudio wheels with the CUDA 12.6 build before it proceeds. In a uv-managed project it uses uv pip; otherwise it falls back to python -m pip.
- The packaged default is vLLM-first. If the effective config selects another provider, infer(...) resolves required credentials before it starts the subprocess runtime:
  - interactive terminals: infer(...) securely prompts for missing API keys or access tokens without echoing them and without placing them on the subprocess command line
  - non-interactive runs: infer(...) fails fast with an actionable error that names the missing config field and the supported environment variable
- Supported credential environment variables for the packaged path include DEEPSEEK_API_KEY, GEMINI_API_KEY or GOOGLE_API_KEY, ZAI_API_KEY, HUGGINGFACE_TOKEN or HF_TOKEN, and the configured audio.diarization.pyannote.auth_token_env value.
- CARD_FRAMEWORK_FFMPEG_EXECUTABLE: optional path to a custom ffmpeg binary. When unset, packaged infer(...) falls back to the bundled imageio-ffmpeg executable and prepends its directory to PATH for nested subprocesses.
- CARD_FRAMEWORK_UV_EXECUTABLE: optional path to a custom uv binary. When unset, packaged infer(...) resolves the installed uv console script from the active environment before bootstrapping the vendored IndexTTS runtime.
- The package no longer lists ctc-forced-aligner in Requires-Dist. Instead, packaged infer(...) tries to install the pinned upstream source on demand when stage-1 forced alignment needs it. If that bootstrap cannot complete, packaged inference falls back to approximate segment-derived timing instead of failing the whole run.
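
The environment-variable fallbacks in the notes above follow a common pattern: use the explicit argument if given, otherwise the environment, otherwise a default. The sketch below illustrates that pattern only. The helper names are hypothetical, and the precedence of arguments over environment variables is an assumption, since this README describes the variables as equivalents without stating which one wins.

```python
import os


def resolve_vllm_settings(vllm_url=None, vllm_api_key=None, environ=None):
    """Return (url, api_key); argument-over-environment precedence is assumed."""
    env = os.environ if environ is None else environ
    url = vllm_url or env.get("CARD_FRAMEWORK_VLLM_URL")
    # "EMPTY" is the documented fallback key for keyless local vLLM servers.
    key = vllm_api_key or env.get("CARD_FRAMEWORK_VLLM_API_KEY") or "EMPTY"
    return url, key


def first_set_env(names, environ=None):
    """Return (name, value) for the first non-empty variable, else None."""
    env = os.environ if environ is None else environ
    for name in names:
        value = env.get(name)
        if value:
            return name, value
    return None


def validate_device(device):
    """Mirror the documented rule that device must be cpu or cuda."""
    if device not in ("cpu", "cuda"):
        raise ValueError(f"device must be 'cpu' or 'cuda', got {device!r}")
    return device


print(resolve_vllm_settings(environ={"CARD_FRAMEWORK_VLLM_URL": "http://localhost:8000/v1"}))
print(first_set_env(("GEMINI_API_KEY", "GOOGLE_API_KEY"), environ={"GOOGLE_API_KEY": "example"}))
```

Passing `environ` as a plain dictionary keeps the helpers easy to test without touching the real process environment.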

Public PyPI Release

This repository now includes a GitHub Actions trusted-publishing workflow at .github/workflows/publish-pypi.yml that publishes tags matching v* to PyPI.

The public PyPI project already exists. As of March 9, 2026:

- 1.0.1 is the first public release, but it published the wrong bare ctc-forced-aligner dependency name for downstream pip users.
- v1.0.2 was tagged but never published because PyPI rejected the direct Git dependency metadata.
- 1.0.4 is the current public release.
- The next install-path fix must ship under a new version such as 1.0.5; do not reuse a failed or already-published version number.
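
The rule against reusing version numbers can be expressed as a small helper that skips every number already published or tagged. This is an illustrative sketch, not a tool from the repository: `parse_version` and `next_free_patch` are hypothetical names, and the helper only handles plain MAJOR.MINOR.PATCH strings with an optional leading `v`.

```python
def parse_version(text):
    """Turn '1.0.4' or 'v1.0.4' into a (major, minor, patch) tuple of ints."""
    major, minor, patch = (int(part) for part in text.lstrip("v").split("."))
    return major, minor, patch


def next_free_patch(current, burned):
    """Return the lowest patch version above `current` that is not in `burned`."""
    major, minor, patch = parse_version(current)
    taken = {parse_version(item) for item in burned}
    candidate = patch + 1
    while (major, minor, candidate) in taken:
        candidate += 1
    return f"{major}.{minor}.{candidate}"


# Version strings taken from the list above; v1.0.2 counts as burned even though
# it was never published, because the tag already exists.
print(next_free_patch("1.0.4", ["1.0.1", "v1.0.2", "1.0.4"]))
```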

Repository-side release steps:

1. Create a dedicated release-preparation branch such as release/v1.0.5 from the target integration branch, then run the release preflight in coder_docs/github_actions_release_spec.md, including build, targeted tests, and artifact-scoped uv publish --dry-run.
2. Merge the reviewed release branch, then tag the merged integration-branch commit and push it, for example:
   ```
   git tag -a v1.0.5 -m v1.0.5
   git push origin v1.0.5
   ```
3. Do not assume the release is complete just because the tag push succeeded. Watch the GitHub Actions run to completion and inspect failures directly if needed:
   ```
   gh run list --workflow "Publish PyPI Package" --limit 1
   gh run watch <run-id> --exit-status
   gh run view <run-id> --log-failed
   ```

After the workflow succeeds, verify the public release:

```
python -m pip install --no-cache-dir card-framework
python -c "from card_framework import infer; print(infer)"
```
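
To confirm which version actually landed in the environment, the standard-library `importlib.metadata` module can report the installed distribution version. The sketch below is a small addition to the check above; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="card-framework"):
    """Return the installed version string, or None when the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Compare this against the tag you just pushed, for example 1.0.5 for v1.0.5.
print(installed_version())
```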

For the repo-specific release build standards and post-tag verification rules, see coder_docs/github_actions_release_spec.md.

Documentation

- EEE_196_CARD_UCL.md: the CARD paper and project manuscript
- coder_docs/codebase_guide.md: current architecture, runtime flow, commands, and maintenance expectations
- coder_docs/memory/errors_and_notes.md: repository memory for recurring pitfalls and prior fixes
- coder_docs/fault_localization_workflow.md: bug triage and failing-test workflow

If you are changing behavior, prompts, workflows, or commands, start with coder_docs/codebase_guide.md.

License

This repository is source-available under LICENSE.md, using the PolyForm Noncommercial 1.0.0 license. Noncommercial use is allowed; commercial use requires separate permission from the licensors.

Project details

Maintainers

lolfaceftw

Release history

- 1.2.0: Mar 18, 2026
- 1.1.0: Mar 15, 2026
- 1.0.7: Mar 10, 2026
- 1.0.6: Mar 10, 2026
- 1.0.5: Mar 9, 2026 (this version)
- 1.0.4: Mar 9, 2026
- 1.0.3: Mar 9, 2026
- 1.0.1: Mar 9, 2026

Download files

Source Distribution

card_framework-1.0.5.tar.gz (982.6 kB)

Uploaded Mar 9, 2026 Source

Built Distribution

card_framework-1.0.5-py3-none-any.whl (1.1 MB)

Uploaded Mar 9, 2026 Python 3

File details

Details for the file card_framework-1.0.5.tar.gz.

File metadata

Download URL: card_framework-1.0.5.tar.gz
Upload date: Mar 9, 2026
Size: 982.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.10.0

File hashes

Hashes for card_framework-1.0.5.tar.gz
Algorithm	Hash digest
SHA256	`c953a1f3d462f693ee796219194cfbfd00db5a2d8af6e8ad92a680ccebb81ecd`
MD5	`0c93dfcd2de7d42e00ae1806263c22f0`
BLAKE2b-256	`4b4379b088cffe0e04f3d2f07a5280b8e924051056f37d738705ba582b90ad22`

File details

Details for the file card_framework-1.0.5-py3-none-any.whl.

File metadata

Download URL: card_framework-1.0.5-py3-none-any.whl
Upload date: Mar 9, 2026
Size: 1.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.10.0

File hashes

Hashes for card_framework-1.0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ca5194ea19656c0559bc1c1cb18e089e03adc44de1e8e2fb3e391c31e6f9c9df`
MD5	`8c5d049ccbe4102d14bdd80387b535d7`
BLAKE2b-256	`9f1fee07486b9f83657c501df280a86263dcfb6eb05c65d15e989408492e6d80`
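
A downloaded file can be checked against the published SHA256 digest with the standard library. The sketch below is one way to do that; the helper names are hypothetical, and the local file path in the commented usage line is an assumption about where you saved the download.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare the file's SHA256 with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage sketch (assumes the wheel was saved in the current directory):
# matches_published(
#     "card_framework-1.0.5-py3-none-any.whl",
#     "ca5194ea19656c0559bc1c1cb18e089e03adc44de1e8e2fb3e391c31e6f9c9df",
# )
```

A mismatch means the file differs from what was uploaded, so it should be downloaded again rather than installed.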
