Command-line tool for ASR evaluation and analysis.

Corti Canal

Canal is a command-line tool that helps you measure how well a speech-recognition model is performing. Give it a CSV with the correct transcripts alongside the model-generated output, and it produces a self-contained HTML report with overall accuracy metrics and a word-by-word comparison so you can see exactly where the model gets things right — and where it doesn't.

Installation

The recommended way to install Canal is with pipx, which installs CLI tools in isolated environments:

pipx install corti-canal
Don't have pipx installed?
Install pipx first, then install Canal:
# macOS
brew install pipx
pipx ensurepath

# Linux (Debian/Ubuntu)
sudo apt install pipx
pipx ensurepath

# With pip (any platform)
pip install --user pipx
pipx ensurepath

After installing pipx, restart your terminal and then run:

pipx install corti-canal

Usage

canal report [OPTIONS] INPUT_PATH

Generate an ASR evaluation report from a CSV file. The CSV must contain at least two columns: one with reference transcripts (ground truth) and one with hypothesis transcripts (model output).

Use canal report --help to view the full command description and available options.

Arguments

| Argument | Description |
| --- | --- |
| INPUT_PATH | Path to the CSV file containing the evaluation data. |

Options

| Option | Default | Description |
| --- | --- | --- |
| --output-path | canal-report-{time}.html | Path for the generated report. Defaults to a timestamped file in the current directory (e.g., canal-report-20251224-173020.html). |
| --ref-col | ref | Name of the CSV column containing the ground-truth reference transcripts. |
| --gen-col | gen | Name of the CSV column containing the model-generated transcripts. |
| --medical-terms | None | Path to a file containing medical terms (one per line) used to compute the Medical Term Recall metric. If not provided, Medical Term Recall will not be computed. |
| --alignment-type | lv | Algorithm used to align reference and hypothesis words. lv (Levenshtein) is a conventional word-level alignment used for metrics like WER and CER. ea (ErrorAlign) is a more advanced algorithm that aligns words closer to the way a human reader would. |
| --overwrite | False | Allow overwriting an existing report file. Without this flag, Canal will refuse to write over an existing file. |

Example

# Columns are named "ref" and "gen" (the defaults)
canal report data.csv

# Custom column names and output path
canal report --ref-col text --gen-col transcript --output-path report.html data.csv

# With medical term recall
canal report --medical-terms medical_terms.txt data.csv
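
If you want to drive Canal from a script (for example, to evaluate several CSV files in a loop), the options above can be assembled into an argument list. The sketch below is only an illustration: `build_canal_command` is a hypothetical helper, not part of Canal, and it uses only the flags documented in the Options table.

```python
import shlex
from typing import List, Optional


def build_canal_command(
    input_path: str,
    ref_col: str = "ref",
    gen_col: str = "gen",
    output_path: Optional[str] = None,
    medical_terms: Optional[str] = None,
    alignment_type: str = "lv",
    overwrite: bool = False,
) -> List[str]:
    """Assemble the argv list for `canal report` (hypothetical helper)."""
    cmd = [
        "canal", "report",
        "--ref-col", ref_col,
        "--gen-col", gen_col,
        "--alignment-type", alignment_type,
    ]
    if output_path is not None:
        cmd += ["--output-path", output_path]
    if medical_terms is not None:
        cmd += ["--medical-terms", medical_terms]
    if overwrite:
        cmd.append("--overwrite")
    cmd.append(input_path)  # INPUT_PATH is the positional argument
    return cmd


cmd = build_canal_command(
    "data.csv", ref_col="text", gen_col="transcript", output_path="report.html"
)
# Print the command as it would be typed in a shell.
print(shlex.join(cmd))

# To actually run it (requires Canal to be installed and on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when file or column names contain spaces.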

Input CSV format

The CSV file should contain at minimum a reference column and a hypothesis column. Any additional columns are ignored.

ref,gen
"The patient was prescribed amoxicillin.","The patient was prescribed amoxicilin."
"The colonoscopy revealed no significant abnormalities.","The colonoscopy revealed significant abnormalities."
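
If your transcripts live in Python rather than in a spreadsheet, a file in this shape can be produced with the standard `csv` module, which takes care of quoting fields that contain commas or quotation marks. This is a minimal sketch: `write_eval_csv` is a hypothetical helper, and UTF-8 encoding is an assumption, since the expected encoding is not stated here.

```python
import csv
import os
import tempfile
from typing import Iterable, Tuple


def write_eval_csv(
    path: str,
    pairs: Iterable[Tuple[str, str]],
    ref_col: str = "ref",
    gen_col: str = "gen",
) -> None:
    """Write (reference, hypothesis) pairs as a two-column CSV with a header row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:  # UTF-8 is an assumption
        writer = csv.writer(f)
        writer.writerow([ref_col, gen_col])
        writer.writerows(pairs)


pairs = [
    ("The patient was prescribed amoxicillin.",
     "The patient was prescribed amoxicilin."),
    ("The colonoscopy revealed no significant abnormalities.",
     "The colonoscopy revealed significant abnormalities."),
]

# Write to a temporary directory and read the file back to check the layout.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    write_eval_csv(path, pairs)
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

print(rows[0]["ref"])
print(rows[0]["gen"])
```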

Understanding the Report

See it in action: The example/ folder contains sample input files and a pre-generated report you can open in your browser.

The generated HTML report is fully self-contained (no external dependencies) and can be opened in any browser. It has two main sections:

Summary

The summary section contains two tables side by side:

Dataset Metrics

| Metric | What it measures |
| --- | --- |
| Word Error Rate (WER) | The proportion of words the model got wrong, whether by misspelling them, adding extra words, or leaving words out. A WER of 0% means every word was correct; 6% means roughly 6 in 100 words had an error. Lower is better. |
| Character Error Rate (CER) | Like WER, but counted per character instead of per word. This gives a more fine-grained view: a single-letter typo (e.g. "amoxicillin" → "amoxicilin") counts as one full word error in WER but only a small character error in CER. Lower is better. |
| Medical Term Recall (MTR) | The proportion of medical terms from your keyword list that the model transcribed correctly. 100% means every term was captured. Only shown when the --medical-terms option is used. Higher is better. |
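
To make the difference between WER and CER concrete, here is a minimal sketch of the textbook definitions: the Levenshtein edit distance divided by the reference length, counted over words or over characters. It is not Canal's implementation. Canal may normalize text (casing, punctuation) or aggregate over the dataset differently, so its numbers can differ. The function names are illustrative.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference item missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra item in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate for one pair; assumes a non-empty reference."""
    ref_words = ref.split()  # whitespace tokenization is an assumption
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate for one pair; assumes a non-empty reference."""
    return edit_distance(ref, hyp) / len(ref)


ref = "The patient was prescribed amoxicillin."
hyp = "The patient was prescribed amoxicilin."
print(f"WER={wer(ref, hyp):.1%} CER={cer(ref, hyp):.1%}")
# prints: WER=20.0% CER=2.6%
```

The single dropped letter makes one of five words wrong (20% WER) but only one of 39 characters wrong (about 2.6% CER).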

Dataset Summary

| Field | What it shows |
| --- | --- |
| Number of examples | How many transcript pairs were evaluated. |
| Number of reference words / characters | Total words and characters in the ground truth. |
| Number of generated words / characters | Total words and characters in the model output. |
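
As a rough sanity check on these fields, similar totals can be computed directly from the input CSV. The sketch below assumes words are separated by whitespace and characters are counted on the raw text; Canal's own tokenization is not documented here, so small differences are possible. `dataset_summary` is a hypothetical helper.

```python
import csv
import io
from typing import Dict

SAMPLE_CSV = '''ref,gen
"The patient was prescribed amoxicillin.","The patient was prescribed amoxicilin."
"The colonoscopy revealed no significant abnormalities.","The colonoscopy revealed significant abnormalities."
'''


def dataset_summary(csv_text: str, ref_col: str = "ref", gen_col: str = "gen") -> Dict[str, int]:
    """Count examples, words, and characters for both columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        "examples": len(rows),
        "ref_words": sum(len(r[ref_col].split()) for r in rows),
        "ref_chars": sum(len(r[ref_col]) for r in rows),
        "gen_words": sum(len(r[gen_col].split()) for r in rows),
        "gen_chars": sum(len(r[gen_col]) for r in rows),
    }


summary = dataset_summary(SAMPLE_CSV)
for field, value in summary.items():
    print(f"{field}: {value}")
```

A large gap between the reference and generated word counts is a quick hint that the model is dropping or inserting words.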

Examples (Word Alignments)

Below the summary, the report lists every transcript pair with a word-level alignment visualization. Each example shows two rows:

  • Ref. -- the ground-truth reference transcript
  • Gen. -- the model-generated transcript

Words are color-coded to show the alignment result:

| Color | Label | Meaning |
| --- | --- | --- |
| Black | Correct | The word was transcribed correctly. |
| Orange | Misspelling | The model produced a different word than the reference. |
| Teal | Extra Word | The model inserted a word that is not in the reference. |
| Red | Missing Word | A reference word was missing from the model output. |
| Grey | Padding | Visual padding to keep the reference and hypothesis rows aligned. |
| Blue box | Medical Term | Indicates a designated medical term for tracking recall. Only shown when the --medical-terms option is used. |

This visualization makes it easy to spot patterns -- for example, if the model consistently misspells certain medical terms or drops words at the end of sentences.
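
The labels in the table can be illustrated with a plain word-level Levenshtein alignment, in the spirit of the lv alignment type. The sketch below is an assumption about how such labels can be derived, not Canal's code: it fills the edit-distance table and walks it backwards, and when several alignments are equally cheap it may pick a different one than Canal does. A `None` entry marks the position where a report would need padding.

```python
from typing import List, Optional, Tuple

# (label, reference word or None, generated word or None)
Op = Tuple[str, Optional[str], Optional[str]]


def align_words(ref: List[str], gen: List[str]) -> List[Op]:
    """Align two word lists with Levenshtein distance and label each position."""
    n, m = len(ref), len(gen)
    # dist[i][j] = edit distance between ref[:i] and gen[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == gen[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # reference word missing
                dist[i][j - 1] + 1,         # extra generated word
                dist[i - 1][j - 1] + cost,  # same word or different word
            )

    # Walk back from the bottom-right corner to recover one optimal alignment.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == gen[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                label = "Correct" if cost == 0 else "Misspelling"
                ops.append((label, ref[i - 1], gen[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("Missing Word", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("Extra Word", None, gen[j - 1]))
            j -= 1
    ops.reverse()
    return ops


ref = "The colonoscopy revealed no significant abnormalities.".split()
gen = "The colonoscopy revealed significant abnormalities.".split()
for label, ref_word, gen_word in align_words(ref, gen):
    print(f"{label:<13} ref={ref_word!r} gen={gen_word!r}")
```

In this example the only non-matching position is the reference word "no", which is labeled Missing Word, even though dropping it reverses the meaning of the sentence.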
