A small Python implementation of common ASR corrections
CC - CommonCorrections
A simple repo for correcting common ASR outputs. The aim is not to fix recognition mistakes but to reconcile different ways of transcribing the same thing, favoring how something sounds when spoken over its shortened written form. The primary use case is to align the ground truth and the ASR output just before the word error rate (WER) is calculated.
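To illustrate why this alignment matters, here is a minimal, self-contained sketch of WER computed as word-level Levenshtein distance divided by the number of reference words. It does not use this package; the `wer` function is a hypothetical helper written only for this example. The same utterance written two different ways scores as 100% error until the ground truth is rewritten into the spoken form.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words and first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    # Guard against an empty reference to avoid dividing by zero
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


asr_output = "one two three"
print(wer("1 2 3", asr_output))          # 1.0: every word counts as a substitution
print(wer("one two three", asr_output))  # 0.0: after normalizing the ground truth
```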
Static Examples
there's -> there is
google.com -> google dot com
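Static corrections are fixed rewrites that do not depend on the value being matched. The sketch below shows one way such rules could be applied with a lookup table of regular expressions; `STATIC_CORRECTIONS` and `apply_static` are hypothetical names, and this is an assumption about the approach, not the package's internal implementation.

```python
import re

# Hypothetical table of fixed rewrites (regex pattern -> replacement text).
STATIC_CORRECTIONS = {
    r"\bthere's\b": "there is",
    r"\bit's\b": "it is",
    # Spell out the dot in a few web domains, e.g. "google.com" -> "google dot com"
    r"(?<=\w)\.(?=(?:com|org|net)\b)": " dot ",
}


def apply_static(text: str) -> str:
    """Apply each fixed rewrite in turn; matching is case-insensitive and output is lowercase."""
    for pattern, replacement in STATIC_CORRECTIONS.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


print(apply_static("there's google.com"))  # there is google dot com
```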
Dynamic Examples
1 2 3 -> one two three
53.4 -> fifty three point four
23:59 -> twenty three fifty nine
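Dynamic corrections depend on the value being rewritten, so they need code rather than a fixed table. The following is a hedged sketch that reproduces the three examples above for integers below 100, decimals and HH:MM times. All function names are hypothetical, the handling of minutes such as `:00` and `:05` is an assumption, and numbers of 100 or more are simply left untouched; real number normalization needs many more rules.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def int_to_words(n: int) -> str:
    """Spell out 0 <= n < 100 (this sketch does not cover larger numbers)."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def time_to_words(match: re.Match) -> str:
    hours, minutes = int(match.group(1)), int(match.group(2))
    if minutes == 0:
        return f"{int_to_words(hours)} o'clock"  # assumption: "1:00" -> "one o'clock"
    if minutes < 10:
        return f"{int_to_words(hours)} oh {int_to_words(minutes)}"  # assumption: "1:05" -> "one oh five"
    return f"{int_to_words(hours)} {int_to_words(minutes)}"


def decimal_to_words(match: re.Match) -> str:
    whole, fraction = match.group(1), match.group(2)
    # Digits after the point are read one at a time: 53.4 -> "fifty three point four"
    digits = " ".join(ONES[int(d)] for d in fraction)
    return f"{int_to_words(int(whole))} point {digits}"


def apply_dynamic(text: str) -> str:
    # Order matters: times and decimals are rewritten before bare integers.
    text = re.sub(r"\b(\d{1,2}):(\d{2})\b", time_to_words, text)
    text = re.sub(r"\b(\d{1,2})\.(\d+)\b", decimal_to_words, text)
    text = re.sub(r"\b\d{1,2}\b", lambda m: int_to_words(int(m.group())), text)
    return text


print(apply_dynamic("1 2 3"))  # one two three
print(apply_dynamic("53.4"))   # fifty three point four
print(apply_dynamic("23:59"))  # twenty three fifty nine
```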
Features
- Designed to work with pandas DataFrames and to be fast(ish)
- Lots of built-in corrections for free
- Easy to extend with your own private corrections
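The package's own mechanism for private corrections is not described on this page. As a generic illustration only, one common way to layer private rules over built-in ones is `collections.ChainMap`, where the first mapping wins on conflicting keys. `BUILT_IN`, `PRIVATE` and `correct` are hypothetical names, not part of the package.

```python
import re
from collections import ChainMap

# Hypothetical built-in and private correction tables (phrase -> replacement).
BUILT_IN = {"there's": "there is", "ok": "okay"}
PRIVATE = {"ok": "o k", "acme.io": "acme dot io"}

# Lookups check PRIVATE first, so private entries override built-in ones.
corrections = ChainMap(PRIVATE, BUILT_IN)


def correct(text: str) -> str:
    # Try longer phrases first so they are not pre-empted by shorter overlapping keys.
    for phrase in sorted(corrections, key=len, reverse=True):
        replacement = corrections[phrase]
        # Match whole whitespace-delimited tokens only, so "ok" inside "book" is untouched.
        pattern = rf"(?<!\S){re.escape(phrase)}(?!\S)"
        text = re.sub(pattern, lambda _m: replacement, text)
    return text


print(correct("ok there's acme.io"))  # o k there is acme dot io
```

Because the rules run one after another, a replacement could in principle be matched again by a later, shorter rule; keeping the replacement text in its final spoken form avoids that.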
Getting Started
- Install with:
pip install commoncorrections
- Import with:
from commoncorrections import CommonCorrections
Usage Examples
Turn numbers into words:
>>> cc = CommonCorrections()
>>> print(cc.correct_str("1 2 3"))
one two three
Turn times into words:
>>> cc = CommonCorrections()
>>> print(cc.correct_str("23:59"))
twenty three fifty nine
Correct a pandas DataFrame:
import pandas as pd
from commoncorrections import CommonCorrections

df = pd.DataFrame(data={"transcript": ["5 4 3", "123 the time is 1:23"],
                        "asr_1": ["five four three", "one two three the time is one twenty three"],
                        "filename": ["./my_local_file.wav", "file2.wav"]})
cc = CommonCorrections()
# to correct only specific columns
new_df = cc.correct_df(df, column_list=["transcript", "asr_1"])
# to apply to the whole dataframe
new_whole_df = cc.correct_df(df)
mypy Type Checks
I used mypy to check that the type annotations are consistent:
(py) rob@rob-T480s:~/projects/CommonCorrections/commoncorrections (master)$ mypy commoncorrections.py
Success: no issues found in 1 source file
Change Log
- v1.0.0 - First release
- v1.0.1 - Fixed packaging issue
- v1.0.3 - Fixed pip packaging issue
- v1.0.4 - Fixed pip packaging issue
Download files
Source Distributions
No source distribution files are available for this release.
Built Distribution
Hashes for commoncorrections-1.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | de68332fc9bea62b37a6870a2cf0b09a702087ece0ea3465fe5cd50c1e21dab7
MD5 | f76bab813e76aaa5369cdcc640f646de
BLAKE2b-256 | 51bb6351ffe0ae84bf48fdea3bf18d042f0ee55ba51a9b7c7e05678550e52090