Command-line tool for ASR evaluation and analysis.
Canal is a command-line tool that helps you measure how well a speech-recognition model is performing. Give it a CSV with the correct transcripts alongside the model-generated output, and it produces a self-contained HTML report with overall accuracy metrics and a word-by-word comparison so you can see exactly where the model gets things right — and where it doesn't.
Installation
The recommended way to install Canal is with pipx, which installs CLI tools in isolated environments:
```bash
pipx install corti-canal
```
Don't have pipx installed?
Install pipx first, then install Canal:
```bash
# macOS
brew install pipx
pipx ensurepath

# Linux (Debian/Ubuntu)
sudo apt install pipx
pipx ensurepath

# With pip (any platform)
pip install --user pipx
pipx ensurepath
```
After installing pipx, restart your terminal and then run:
```bash
pipx install corti-canal
```
Usage
```
canal report [OPTIONS] INPUT_PATH
```
Generate an ASR evaluation report from a CSV file. The CSV must contain at least two columns: one with reference transcripts (ground truth) and one with hypothesis transcripts (model output).
Use `canal report --help` to view the full command description and available options.
Arguments
| Argument | Description |
|---|---|
| `INPUT_PATH` | Path to the CSV file containing the evaluation data. |
Options
| Option | Default | Description |
|---|---|---|
| `--output-path` | `canal-report-{time}.html` | Path for the generated report. Defaults to a timestamped file in the current directory (e.g., `canal-report-20251224-173020.html`). |
| `--ref-col` | `ref` | Name of the CSV column containing the ground-truth reference transcripts. |
| `--gen-col` | `gen` | Name of the CSV column containing the model-generated transcripts. |
| `--medical-terms` | None | Path to a file containing medical terms (one per line) used to compute the Medical Term Recall metric. If not provided, Medical Term Recall is not computed. |
| `--alignment-type` | `lv` | Algorithm used to align reference and hypothesis words. `lv` (Levenshtein) is a conventional word-level alignment used for metrics like WER and CER. `ea` (ErrorAlign) is a more advanced algorithm that aligns words closer to the way a human reader would. |
| `--overwrite` | `False` | Allow overwriting an existing report file. Without this flag, Canal refuses to write over an existing file. |
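To make the `lv` option more concrete, here is a minimal Python sketch of a textbook Levenshtein word alignment: a unit-cost edit-distance table followed by a backtrace that labels each aligned pair. It is an illustration under assumptions (whitespace tokenization, no case or punctuation normalization), not Canal's implementation; the function name `levenshtein_align` and the operation labels are hypothetical, and ErrorAlign (`ea`) is not sketched here.

```python
from typing import List, Optional, Tuple

# Each aligned pair is (ref_word, gen_word, op); None marks a gap.
Alignment = List[Tuple[Optional[str], Optional[str], str]]


def levenshtein_align(ref: List[str], gen: List[str]) -> Alignment:
    """Align two word sequences with minimum unit-cost edit distance."""
    n, m = len(ref), len(gen)
    # dist[i][j] = edit distance between ref[:i] and gen[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == gen[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub,  # match or substitution
                dist[i - 1][j] + 1,        # deletion (missing word)
                dist[i][j - 1] + 1,        # insertion (extra word)
            )
    # Walk back from the bottom-right corner to recover one optimal path,
    # preferring the diagonal move when several moves tie.
    pairs: Alignment = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if ref[i - 1] == gen[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + sub:
                op = "correct" if sub == 0 else "substitution"
                pairs.append((ref[i - 1], gen[j - 1], op))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((ref[i - 1], None, "deletion"))
            i -= 1
        else:
            pairs.append((None, gen[j - 1], "insertion"))
            j -= 1
    pairs.reverse()
    return pairs
```

Aligning "The colonoscopy revealed no significant abnormalities." against "The colonoscopy revealed significant abnormalities." gives five `correct` pairs and one `deletion` for "no". When several alignments share the same minimum cost, this backtrace just takes the diagonal move, so its choice between equally cheap alignments is arbitrary rather than linguistically motivated.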
Example
```bash
# Columns are named "ref" and "gen" (the defaults)
canal report data.csv

# Custom column names and output path
canal report --ref-col text --gen-col transcript --output-path report.html data.csv

# With medical term recall
canal report --medical-terms medical_terms.txt data.csv
```
Input CSV format
The CSV file should contain at minimum a reference column and a hypothesis column. Any additional columns are ignored.
```csv
ref,gen
"The patient was prescribed amoxicillin.","The patient was prescribed amoxicilin."
"The colonoscopy revealed no significant abnormalities.","The colonoscopy revealed significant abnormalities."
```
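If your transcripts already live in Python, the standard-library `csv` module can produce a file in this format and takes care of quoting fields that contain commas or quotes. A minimal sketch; `write_canal_csv` and the sample rows are illustrative, not part of Canal:

```python
import csv
import io
from typing import Iterable, TextIO, Tuple

# Illustrative (reference, model output) pairs; the second contains a comma.
PAIRS = [
    ("The patient was prescribed amoxicillin.", "The patient was prescribed amoxicilin."),
    ("Blood pressure was 120/80, pulse 72.", "Blood pressure was 120/80 pulse 72."),
]


def write_canal_csv(rows: Iterable[Tuple[str, str]], f: TextIO) -> None:
    """Write (ref, gen) pairs under Canal's default column names."""
    writer = csv.writer(f)  # default QUOTE_MINIMAL quotes only fields that need it
    writer.writerow(["ref", "gen"])
    writer.writerows(rows)


# Preview the CSV in memory before writing it to disk.
preview = io.StringIO()
write_canal_csv(PAIRS, preview)
print(preview.getvalue())
```

To write to disk, pass a file opened with `open("data.csv", "w", newline="", encoding="utf-8")`. `newline=""` is what the `csv` module documentation recommends so the writer controls line endings itself; UTF-8 is an assumption about what Canal reads.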
Understanding the Report
See it in action: the `example/` folder contains sample input files and a pre-generated report you can open in your browser.
The generated HTML report is fully self-contained (no external dependencies) and can be opened in any browser. It has two main sections:
Summary
The summary section contains two tables side by side:
Dataset Metrics
| Metric | What it measures |
|---|---|
| Word Error Rate (WER) | The proportion of words the model got wrong — whether by misspelling them, adding extra words, or leaving words out. A WER of 0% means every word was correct; 6% means roughly 6 in 100 words had an error. Lower is better. |
| Character Error Rate (CER) | Like WER, but counted per character instead of per word. This gives a more fine-grained view: a single-letter typo (e.g. "amoxicillin" → "amoxicilin") counts as one full word error in WER but only a small character error in CER. Lower is better. |
| Medical Term Recall (MTR) | The proportion of medical terms from your keyword list that the model transcribed correctly. 100% means every term was captured. Only shown when the --medical-terms option is used. Higher is better. |
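The README does not spell out the exact formulas, so the following Python sketch shows the standard definitions under stated assumptions: WER and CER are (substitutions + deletions + insertions) divided by the number of reference words or characters, summed over the whole dataset; text is compared as-is after whitespace splitting (no lowercasing or punctuation stripping); and MTR counts a term as recalled when it appears as a whole word, case-insensitively, in both the reference and the model output. Canal's own normalization and term matching may differ, and all function names are hypothetical. Because insertions count as errors, WER computed this way can exceed 100%.

```python
import re
from typing import Iterable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (reference transcript, generated transcript)


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j - 1] + sub,  # match or substitution
                prev[j] + 1,        # deletion
                curr[j - 1] + 1,    # insertion
            )
        prev = curr
    return prev[-1]


def wer(pairs: Iterable[Pair]) -> float:
    """Dataset-level WER: total word edits / total reference words."""
    edits = words = 0
    for ref, gen in pairs:
        ref_words, gen_words = ref.split(), gen.split()
        edits += edit_distance(ref_words, gen_words)
        words += len(ref_words)
    return edits / max(words, 1)


def cer(pairs: Iterable[Pair]) -> float:
    """Dataset-level CER: total character edits / total reference characters (spaces included)."""
    edits = chars = 0
    for ref, gen in pairs:
        edits += edit_distance(ref, gen)
        chars += len(ref)
    return edits / max(chars, 1)


def contains_term(text: str, term: str) -> bool:
    """Case-insensitive whole-word match (an assumption about Canal's matching)."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


def medical_term_recall(pairs: Iterable[Pair], terms: List[str]) -> float:
    """Share of (example, term) hits in the references that the output also contains."""
    found = recalled = 0
    for ref, gen in pairs:
        for term in terms:
            if contains_term(ref, term):
                found += 1
                recalled += int(contains_term(gen, term))
    return recalled / found if found else float("nan")


# The two rows from the example CSV above.
PAIRS = [
    ("The patient was prescribed amoxicillin.", "The patient was prescribed amoxicilin."),
    ("The colonoscopy revealed no significant abnormalities.",
     "The colonoscopy revealed significant abnormalities."),
]
terms = ["amoxicillin", "colonoscopy"]
print(f"WER {wer(PAIRS):.1%}  CER {cer(PAIRS):.1%}  MTR {medical_term_recall(PAIRS, terms):.0%}")
# WER 18.2%  CER 4.3%  MTR 50%
```

Here the single-letter typo costs one of 11 reference words but only one of 93 reference characters, which is the WER/CER contrast described in the table.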
Dataset Summary
| Field | What it shows |
|---|---|
| Number of examples | How many transcript pairs were evaluated. |
| Number of reference words / characters | Total words and characters in the ground truth. |
| Number of generated words / characters | Total words and characters in the model output. |
Examples (Word Alignments)
Below the summary, the report lists every transcript pair with a word-level alignment visualization. Each example shows two rows:
- Ref. -- the ground-truth reference transcript
- Gen. -- the model-generated transcript
Words are color-coded to show the alignment result:
| Color | Label | Meaning |
|---|---|---|
| Black | Correct | The word was transcribed correctly. |
| Orange | Misspelling | The model produced a different word than the reference. |
| Teal | Extra Word | The model inserted a word that is not in the reference. |
| Red | Missing Word | A reference word was missing from the model output. |
| Grey | Padding | Visual padding to keep the reference and hypothesis rows aligned. |
| Blue box | Medical Term | Indicates a designated medical term for tracking recall. Only shown when the --medical-terms option is used. |
This visualization makes it easy to spot patterns -- for example, if the model consistently misspells certain medical terms or drops words at the end of sentences.
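The legend maps onto alignment operations: a substitution is a Misspelling, an insertion is an Extra Word, a deletion is a Missing Word, and padding fills the empty slot in the other row. A minimal plain-text Python sketch, assuming the alignment is given as hypothetical `(ref, gen)` columns with `None` for a gap and using `_` in place of the grey padding (the Medical Term box is not modeled):

```python
from typing import List, Optional, Tuple

# One aligned column: (reference word, generated word); None marks a gap.
Column = Tuple[Optional[str], Optional[str]]


def classify(ref: Optional[str], gen: Optional[str]) -> str:
    """Map one aligned column to the report's legend label."""
    if ref is None:
        return "Extra Word"    # teal: the model inserted a word
    if gen is None:
        return "Missing Word"  # red: the model dropped a reference word
    return "Correct" if ref == gen else "Misspelling"  # black / orange


def render(columns: List[Column], pad: str = "_") -> Tuple[str, str, List[str]]:
    """Build column-aligned Ref./Gen. rows plus one legend label per column."""
    ref_cells: List[str] = []
    gen_cells: List[str] = []
    labels: List[str] = []
    for ref, gen in columns:
        width = max(len(ref or ""), len(gen or ""))
        # A gap becomes a run of padding; a shorter word is space-filled to the same width.
        ref_cells.append(ref.ljust(width) if ref is not None else pad * width)
        gen_cells.append(gen.ljust(width) if gen is not None else pad * width)
        labels.append(classify(ref, gen))
    return "Ref. " + " ".join(ref_cells), "Gen. " + " ".join(gen_cells), labels


columns: List[Column] = [
    ("The", "The"), ("colonoscopy", "colonoscopy"), ("revealed", "revealed"),
    ("no", None), ("significant", "significant"), ("abnormalities.", "abnormalities."),
]
ref_row, gen_row, labels = render(columns)
print(ref_row)    # Ref. The colonoscopy revealed no significant abnormalities.
print(gen_row)    # Gen. The colonoscopy revealed __ significant abnormalities.
print(labels[3])  # Missing Word
```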