Skip to main content

Text normalization (TN/ITN) and ASR evaluation framework for Bambara (Bamanankan) language processing

Project description

Bambara Text Normalizer

Text Normalization & ASR Evaluation Framework for Bambara (Bamanankan)

InstallationNormalizationASR EvaluationModesCLILinguisticsReferences


Purpose

This tool serves two complementary purposes for Bambara language processing:

Purpose Description
Text Normalization Standardize Bambara text for any downstream NLP task (TTS, MT, NER, etc.)
ASR Evaluation Fair WER/CER computation that accounts for valid orthographic variation

[!NOTE] Bambara orthography allows variation: the same utterance can be written as k'a ta or ka a ta both are correct. Without normalization, evaluation metrics unfairly penalize models for human writing inconsistencies rather than actual recognition errors.


Installation

pip install git+https://github.com/sudoping01/bambara-text-normalization.git

Text Normalization

from bambara_normalizer import normalize

# Default: expand contractions
normalize("⁠Ne k’a ma ko ayi")           
normalize("⁠K’ale t’a fɛ k’a kɛ")    
normalize("⁠K’i k’i janto i yɛrɛ la")        


# Contract mode: collapse expanded forms
normalize("Ne ko a ma ko ayi", mode="contract")    
normalize("Ko ale tɛ a fɛ ka o kɛ", mode="contract")    
normalize("Ko i ka i janto i yɛrɛ la", mode="contract")    

# Preserve mode: don't touch contractions
normalize("K’i k’i janto i yɛrɛ la", mode="preserve")     

Custom Settings

from bambara_normalizer import normalize

# Full control over normalization
text = normalize(
    "Ka na son k’o k’a la",
    mode="expand",                      # "expand" | "contract" | "preserve"
    preserve_tones=False,               
    normalize_legacy_orthography=True, 
    lowercase=True,                     
    remove_punctuation=False,           
    normalize_whitespace=True,         
    normalize_apostrophes=True,         
    normalize_special_chars=True,    
    expand_dates = False,
    expand_numbers=False,  
    expand_times=False,            
    remove_diacritics_except_tones=False,  
    handle_french_loanwords=True,   
    strip_repetitions=False,       
    normalize_compounds=True, 
)

Using BambaraNormalizer Class

For repeated normalization with consistent settings:

from bambara_normalizer import BambaraNormalizer, BambaraNormalizerConfig


config = BambaraNormalizerConfig(contraction_mode="expand") # change it to "contract" or preserve
normalizer = BambaraNormalizer(config)
normalizer("A y'a fɔ")      
normalizer("k'a la")   

# Contraction mode
config = BambaraNormalizerConfig(contraction_mode="contract")
normalizer = BambaraNormalizer(config)
normalizer("bɛ a fɔ")     
normalizer("ka a ta")     

Predefined Configuration Presets

from bambara_normalizer import BambaraNormalizer, BambaraNormalizerConfig

# For WER evaluation (aggressive normalization, removes tones)
normalizer = BambaraNormalizer(BambaraNormalizerConfig.for_wer_evaluation())

# For WER with contract mode
normalizer = BambaraNormalizer(BambaraNormalizerConfig.for_wer_evaluation(mode="contract"))

# For CER evaluation
normalizer = BambaraNormalizer(BambaraNormalizerConfig.for_cer_evaluation())

# Preserve tone marks
normalizer = BambaraNormalizer(BambaraNormalizerConfig.preserving_tones())

# Minimal normalization (only essential fixes)
normalizer = BambaraNormalizer(BambaraNormalizerConfig.minimal())

Number Normalization

The normalizer supports bidirectional number conversion between digits and Bambara words (TN/ITN).

With Normalizer

from bambara_normalizer import normalize

normalize("A ye 100 sɔrɔ", expand_numbers=True)   # => "a ye kɛmɛ sɔrɔ"
normalize("A ye 100 sɔrɔ", expand_numbers=False)  # => "a ye 100 sɔrɔ"

# WER preset has expand_numbers=True by default
normalize("A ye 5 ta", preset="wer")  # => "a ye duuru ta"

Digits to Words (Text Normalization)

from bambara_normalizer import number_to_bambara, normalize_numbers_in_text


number_to_bambara(5)        # => "duuru"
number_to_bambara(123)      # => "kɛmɛ ni mugan ni saba"
number_to_bambara(1000)     # => "waa kelen"
number_to_bambara(5.3)      # => "duuru tomi saba"

# In text
normalize_numbers_in_text("A ye 5 wari di")      # => "A ye duuru wari di"
normalize_numbers_in_text("Mɔgɔ 100 nana")       # => "Mɔgɔ kɛmɛ nana"
normalize_numbers_in_text("N ye shekɛ 1000 sɔrɔ") # => "N ye shekɛ waa kelen sɔrɔ"

Words to Digits (Inverse Text Normalization)

from bambara_normalizer import bambara_to_number, denormalize_numbers_in_text

bambara_to_number("duuru")                    # => 5
bambara_to_number("kɛmɛ ni mugan ni saba")    # => 123
bambara_to_number("duuru tomi saba")          # => 5.3

# In text
denormalize_numbers_in_text("A ye duuru di a ma")  # => "A ye 5 di a ma"
denormalize_numbers_in_text("Mɔgɔ kɛmɛ nana")      # => "Mɔgɔ 100 nana"

Number Vocabulary

Value Bambara Value Bambara
0 fu 10 tan
1 kelen 20 mugan
2 fila 30 bi saba
3 saba 40 bi naani
4 naani 50 bi duuru
5 duuru 100 kɛmɛ
6 wɔɔrɔ 1000 waa
7 wolonwula 1,000,000 miliyɔn
8 seegin decimal tomi
9 kɔnɔntɔn connector ni


Date Normalization

The normalizer supports bidirectional date conversion between standard formats and Bambara expressions (TN/ITN).

With Normalizer

from bambara_normalizer import normalize

normalize("A bɛ na 13-10-2024 la", expand_dates=True)   # ==> "a bɛ na oktɔburu tile tan ni saba san Baa fila ni mugan ni naani la"
normalize("A bɛ na 13-10-2024 la", expand_dates=False)  # => "a bɛ na 13-10-2024 la"

# WER preset has expand_dates=True by default
normalize("A bɛ na 25-01-2008 la", preset="wer")  # => "a bɛ na zanwuye tile mugan ni duuru san Baa fila ni seegin la"

Date to Bambara (Text Normalization)

from bambara_normalizer import date_to_bambara, format_date_bambara, normalize_dates_in_text
from datetime import date

# Single dates
date_to_bambara(2024, 10, 13)      # => "Oktɔburu tile tan ni saba san baa fila ni mugan ni naani"
date_to_bambara(2008, 1, 25)       # => "Zanwuye tile mugan ni duuru san baa fila ni seegin"

# With "kalo" (month) included
date_to_bambara(2024, 10, 13, include_kalo=True)  # => "Oktɔburu kalo tile tan ni saba san ..."

# With day of week
date_to_bambara(2024, 10, 13, include_day_of_week=True)  # => "Kari Oktɔburu tile ..." (Sunday)

# From date object or string
format_date_bambara(date(2024, 10, 13))  # => "Oktɔburu tile tan ni saba san ..."
format_date_bambara("13-10-2024")        # => "Oktɔburu tile tan ni saba san ..."

# In text
normalize_dates_in_text("A bɛ na 13-10-2024 la")  # => "A bɛ na Oktɔburu tile tan ni saba san baa fila ni mugan ni naani la"

Bambara to Date (Inverse Text Normalization)

from bambara_normalizer import bambara_to_date

bambara_to_date("Oktɔburu tile tan ni saba san baa fila ni mugan ni naani")
# => datetime.date(2024, 10, 13)

bambara_to_date("Zanwuye tile mugan ni duuru san baa fila ni seegin")
# => datetime.date(2008, 1, 25)

Date Format

Bambara dates follow this structure:

[Month] (kalo) tile [day] san [year]

Example: 13-10-2024 => Oktɔburu tile tan ni saba san baa fila ni mugan ni naani

Literal translation: "October day thirteen year two thousand twenty-four"

Days of the Week

Day Bambara Day Bambara
Monday Tɛnɛn Friday Juma
Tuesday Tarata Saturday Sibiri
Wednesday Araba Sunday Kari
Thursday Alamisa

Months of the Year

Month Bambara Month Bambara
January Zanwuye July Zuluye
February Feburuye August Uti
March Marsi September Sɛtanburu
April Awirili October Oktɔburu
May November Nɔwanburu
June Zuwen December Desanburu

Time Normalization

The normalizer supports bidirectional time and duration conversion between standard formats and Bambara expressions (TN/ITN).

With Normalizer

from bambara_normalizer import normalize

normalize("A nana 7:30 la", expand_times=True)   # => "a nana nɛgɛ kaɲɛ wolonwula ni sanga bi saba la"
normalize("A nana 7:30 la", expand_times=False)  # => "a nana 7:30 la"

# WER preset has expand_times=True by default
normalize("A nana 13:50 la", preset="wer")  # => "a nana nɛgɛ kaɲɛ tan ni saba ni sanga bi duuru la"

Clock Time to Bambara (Text Normalization)

from bambara_normalizer import time_to_bambara, format_time_bambara, normalize_times_in_text
from datetime import time

# Clock times
time_to_bambara(1, 0)       # => "Nɛgɛ kaɲɛ kelen"
time_to_bambara(1, 5)       # => "Nɛgɛ kaɲɛ kelen ni sanga duuru"
time_to_bambara(7, 30)      # => "Nɛgɛ kaɲɛ wolonwula  ni sanga bi saba"
time_to_bambara(13, 50)     # => "Nɛgɛ kaɲɛ tan ni saba ni sanga bi duuru"

# From time object or string
format_time_bambara(time(7, 30))  # => "Nɛgɛ kaɲɛ wolonwula ni sanga bi saba"
format_time_bambara("13:50")      # => "Nɛgɛ kaɲɛ tan ni sab ni sanga bi duuru"

# In text
normalize_times_in_text("A nana 7:30 la")  # => "A nana nɛgɛ kaɲɛ wolonwula ni sanga bi saba la"

Bambara to Clock Time (Inverse Text Normalization)

from bambara_normalizer import bambara_to_time

bambara_to_time("Nɛgɛ kaɲɛ kelen")
# => datetime.time(1, 0)

bambara_to_time("Nɛgɛ kaɲɛ wolonwula ni sanga bi saba")
# => datetime.time(7, 30)

Duration to Bambara

from bambara_normalizer import duration_to_bambara, format_duration_bambara

# Durations
duration_to_bambara(minutes=30)                      # => "miniti bi saba"
duration_to_bambara(hours=1, minutes=30)             # => "lɛrɛ kelen ni miniti bi saba"
duration_to_bambara(hours=1, minutes=30, seconds=10) # => "lɛrɛ kelen ni miniti bi saba ni segɔni tan"

# From string format
format_duration_bambara("30min")      # => "miniti bi saba"
format_duration_bambara("1h30min")    # => "lɛrɛ kelen ni miniti bi saba"
format_duration_bambara("1h30min10s") # => "lɛrɛ kelen ni miniti bi saba ni segɔni tan"

Bambara to Duration (Inverse Text Normalization)

from bambara_normalizer import bambara_to_duration

bambara_to_duration("miniti bi saba")
# => (0, 30, 0)  # (hours, minutes, seconds)

bambara_to_duration("lɛrɛ kelen ni miniti bi saba")
# => (1, 30, 0)

bambara_to_duration("lɛrɛ kelen ni miniti bi saba ni segɔni tan")
# => (1, 30, 10)

Time Format

Clock time follows this structure:

Nɛgɛ kaɲɛ [hour] ( ni sanga [minutes])

Example: 7:30 => Nɛgɛ kaɲɛ wolonwula ni sanga bi saba

Literal translation: "Clock needle seven passed with minute thirty"

Duration follows this structure:

(lɛrɛ [hours] ni) (miniti [minutes] ni) (segɔni [seconds])

Example: 1h30min10s => lɛrɛ kelen ni miniti bi saba ni segɔni tan

ASR Evaluation Framework

Quick Evaluation

from bambara_normalizer import evaluate


result = evaluate(
    reference="B'a fɔ ka taa",
    hypothesis="bɛ a fɔ ka taa"

)

print(f"WER: {result.wer:.2%}")  
print(f"CER: {result.cer:.2%}")
print(f"MER: {result.mer:.2%}")

Evaluator with Mode Selection

[!IMPORTANT] The mode parameter determines how contractions are handled during evaluation. This significantly impacts WER scores when reference and hypothesis use different orthographic conventions.

from bambara_normalizer import evaluate


result = evaluate(
    reference="k'a ta", 
    hypothesis="ka a ta"
    mode="expand" # contract | preserve 
    )
print(f"WER: {result.wer:.2%}")   

Flexible Configuration

For full control use the evalution class and define the normalization configuration:

from bambara_normalizer import (
    BambaraNormalizer, 
    BambaraNormalizerConfig, 
    BambaraEvaluator
)

# Define custom normalizer: same then the config we did upside
config = BambaraNormalizerConfig(
    contraction_mode="contract",
    preserve_tones=False,
    lowercase=True,
    remove_punctuation=True,
    normalize_legacy_orthography=True,
)


evaluator = BambaraEvaluator(config=config)


result = evaluator.evaluate(
    reference="K'a fɔ́!",
    hypothesis="ka a fo"
)
print(f"WER: {result.wer:.2%}")

Batch Evaluation

from bambara_normalizer import BambaraEvaluator

evaluator = BambaraEvaluator(mode="contract")

references = ["k'a ta", "b'a fɔ", "n'a ma"]
hypotheses = ["ka a ta", "bɛ a fɔ", "na a ma"]

aggregate, individual = evaluator.evaluate_batch(references, hypotheses)

print(f"Overall WER: {aggregate.wer:.2%}")
for i, result in enumerate(individual):
    print(f"  [{i}] WER: {result.wer:.2%}")

Available Metrics

Metric Method Description
WER evaluator.wer(ref, hyp) Word Error Rate
CER evaluator.cer(ref, hyp) Character Error Rate
MER evaluator.mer(ref, hyp) Match Error Rate
WIL evaluator.wil(ref, hyp) Word Information Lost
WIP evaluator.wip(ref, hyp) Word Information Preserved
DER result.der Diacritic Error Rate (tone accuracy)

Contraction Modes

[!WARNING] Choosing the right mode is critical for fair ASR evaluation. Using the wrong mode can inflate or deflate WER scores artificially.

Version 2.0 introduces three contraction modes to handle bidirectional Bambara orthography:

Mode Direction When to Use
expand b'a => bɛ a Default. Full linguistic analysis with k'/n' disambiguation
contract bɛ a => b'a Simpler, more forgiving. No disambiguation ambiguity
preserve No change Debugging, or when you want raw comparison

Why Contract Mode Matters

Expansion is complex the contraction k' can expand to three different words:

Contraction Possible Expansions Meaning
k'a ka a infinitive marker
k'a kɛ a verb "to do"
k'a ko a verb "to say"

The normalizer uses context to disambiguate, but some cases are genuinely ambiguous.

Contraction is simple all variants collapse to the same form:

ka a  ─┐
kɛ a  ─┼─>  k'a
ko a  ─┘

[!TIP] For ASR evaluation, contract mode is more forgiving because it doesn't penalize the model for disambiguation differences when both forms are linguistically valid.

Contraction Mappings

Expanded Contracted Function
+ vowel b' Affirmative imperfective
+ vowel t' Negative imperfective
ye + vowel y' Perfective marker
ni + vowel n' Conjunction
na + vowel n' Verb "come"
ka + vowel k' Infinitive marker
+ vowel k' Verb "to do"
ko + vowel k' Verb "to say"

Command Line Interface

Basic Usage

# default mode is expand
bambara-normalize "B'a fɔ́"
# Output: bɛ a fɔ

# Contract mode
bambara-normalize --mode contract "bɛ a fɔ"
# Output: b'a fɔ

# Preserve mode
bambara-normalize --mode preserve "B'a fɔ"
# Output: b'a fɔ

With Presets

# WER preset (aggressive normalization)
bambara-normalize --preset wer "K'a fɔ́!"
# Output: ka a fɔ

# WER preset with contract mode
bambara-normalize --preset wer --mode contract "Ka a fɔ"
# Output: k'a fɔ

# CER preset
bambara-normalize --preset cer "B'a fɔ"

File Evaluation

# Evaluate reference vs hypothesis files
bambara-normalize --evaluate reference.txt hypothesis.txt

# With contract mode
bambara-normalize --evaluate --mode contract ref.txt hyp.txt

# Output detailed metrics
bambara-normalize --evaluate --detailed ref.txt hyp.txt

Batch Processing

# Process file line by line
bambara-normalize --input corpus.txt --output normalized.txt

# With specific mode
bambara-normalize --input corpus.txt --output normalized.txt --mode contract

Linguistic Decisions

Why Normalize?

Bambara orthography allows variation. For the same spoken utterance:

  • Annotator A writes: k'a ta
  • Annotator B writes: ka a ta

Both are correct. Without normalization, we penalize models for human writing inconsistencies, not recognition errors.

n' Disambiguation

Pattern Expansion Meaning
n' + pronoun + ma na Verb "to come"
n' + other ni Conjunction (default)

Examples:

  • n'a ma => na a ma (come to him)
  • n'a ta => ni a ta (if he takes)

k' Disambiguation Rules

Applied in priority order (derived from Daba grammar):

Priority Pattern Result Example
1 k' + pronoun + ma + X + ye k'a ma hɛrɛ ye => kɛ a ma hɛrɛ ye
2 k' + pronoun + ma + speech marker ko k'anw ma ko => ko anw ma ko
3 k' + pronoun + postposition k'a la => kɛ a la
4 k' + pronoun + clause marker ko k'an ka ta => ko an ka ta
5 Default ka k'a ta => ka a ta

Postpositions: la, na, ye, , kɔnɔ, , kɔrɔ, kan, kun, ɲɛ, bolo

Clause markers: ka, kana, , , bɛna, tɛna, tun, mana

Legacy Orthography Conversion

Legacy Modern Notes
è ɛ Pre-standard spelling
ò ɔ Pre-standard spelling
ny ɲ Digraph => single character
ng ŋ Digraph => single character
ñ ɲ Spanish/Senegalese variant

Known Limitations

Inherent Linguistic Ambiguity

[!CAUTION] Some Bambara constructions are genuinely ambiguous and cannot be resolved without broader context. This is not a bug it reflects real ambiguity in the language.

The ye Problem

The word ye has five grammatical functions:

Function Example Meaning
Postposition à fɔ́ ń yé say it to me
Perfective ù ye ɲɔ̀ gòsi they have beaten
Copula ò yé kɔ̀nɔ yé it is a bird
Verb "see" ka a ye to see it
Imperative á' yé nà! come! (plural)

This creates genuine ambiguity for k'a ye:

Interpretation Expansion Meaning
Postposition kɛ a ye do it for him
Verb "see" ka a ye to see it

Default behavior: The normalizer chooses kɛ a ye (postposition is more frequent).

Solution: Use mode="contract" for ASR evaluation to avoid disambiguation penalties:

evaluator = BambaraEvaluator(mode="contract")
# Both "kɛ a ye" and "ka a ye" => "k'a ye" 

Scope

The normalizer uses local context (1-3 word lookahead). It does not:

  • Parse full sentence structure
  • Use dictionary/lexicon for POS tagging
  • Consider discourse-level context

Utility Functions

from bambara_normalizer import (
    is_contraction,
    can_contract,
    find_contractions,
    find_contractable_sequences,
    compare_normalization_modes,
    analyze_text,
    is_bambara_vowel,
    get_tone,
    remove_tones,
    number_to_bambara,
    bambara_to_number,
    normalize_numbers_in_text,
    denormalize_numbers_in_text,
    is_number_word,
    
    bambara_to_date,
    bambara_to_day_of_week,
    bambara_to_month,
    date_to_bambara,
    day_of_week_to_bambara,
    denormalize_dates_in_text,
    format_date_bambara,
    is_bambara_day,
    is_bambara_month,
    month_to_bambara,
    normalize_dates_in_text,

    time_to_bambara,
    bambara_to_time,
    format_time_bambara,
    duration_to_bambara,
    bambara_to_duration,
    format_duration_bambara,
    normalize_times_in_text,
    is_time_word,
)


is_contraction("b'a")                   
is_contraction("bɛ")                     
can_contract("bɛ a")                      

# Find patterns in text
find_contractions("B'a fɔ k'a ta")       # ["b'", "k'"]
find_contractable_sequences("bɛ a fɔ")   # [('bɛ', 'a')]

# Compare modes side-by-side
compare_normalization_modes("b'a fɔ")
# {'original': "b'a fɔ", 'expand': 'bɛ a fɔ', 'contract': "b'a fɔ", 'preserve': "b'a fɔ"}

# Full text analysis
analyze_text("B'a fɔ k'a la")
# {'word_count': 4, 'contractions_found': ["b'", "k'"], 'has_tone_marks': False, ...}

# Tone handling
get_tone("fɔ́")                           # "high"
remove_tones("fɔ́ bɛ̀")                    # "fɔ bɛ"

# Number conversion: digits => Bambara words
number_to_bambara(5)                     # "duuru"
number_to_bambara(23)                    # "mugan ni saba"
number_to_bambara(100)                   # "kɛmɛ"
number_to_bambara(123)                   # "kɛmɛ ni mugan ni saba"
number_to_bambara(1000)                  # "waa kelen"
number_to_bambara(5.3)                   # "duuru tomi saba"

# Number conversion: Bambara words => digits
bambara_to_number("duuru")               # 5
bambara_to_number("mugan ni saba")       # 23
bambara_to_number("kɛmɛ")                # 100
bambara_to_number("waa kelen")           # 1000
bambara_to_number("duuru tomi saba")     # 5.3

# Number normalization in text
normalize_numbers_in_text("A ye 5 di")       # "A ye duuru  di"
normalize_numbers_in_text("Mɔgɔ 100 nana")        # "Mɔgɔ kɛmɛ nana"
normalize_numbers_in_text("A be san 25 bɔ")       # "A be san mugan ni duuru bɔ"

# Inverse: Bambara words => digits in text
denormalize_numbers_in_text("A ye duuru di")  # "A ye 5  di"
denormalize_numbers_in_text("Mɔgɔ kɛmɛ nana")      # "Mɔgɔ 100 nana"

# Check if word is a number word
is_number_word("duuru")                  # True
is_number_word("kɛmɛ")                   # True
is_number_word("fɔ")                     # False


# Date conversion: dates => Bambara
date_to_bambara(2024, 10, 13)            # "Oktɔburu tile tan ni saba san baa fila ni mugan ni naani"
format_date_bambara("13-10-2024")        # Same as above

# Date conversion: Bambara => dates
bambara_to_date("Oktɔburu tile tan ni saba san baa fila ni mugan ni naani")  # datetime.date(2024, 10, 13)

# Day/Month helpers
day_of_week_to_bambara(0)                # "Tɛnɛn" (Monday)
day_of_week_to_bambara(6)                # "Kari" (Sunday)
month_to_bambara(10)                     # "Oktɔburu"
bambara_to_month("Oktɔburu")             # 10

# Date normalization in text
normalize_dates_in_text("A bɛ na 13-10-2024 la")  # "A bɛ na Oktɔburu tile ... la"

# Check if word is date-related
is_bambara_month("Oktɔburu")             # True
is_bambara_day("Juma")                   # True


# Time conversion: clock times → Bambara
time_to_bambara(1, 0)                    # "Nɛgɛ kaɲɛ kelen"
time_to_bambara(7, 30)                   # "Nɛgɛ kaɲɛ wolonwula ni sanga bi saba"
format_time_bambara("13:50")             # "Nɛgɛ kaɲɛ tan ni saba ni sanga bi duuru"

# Time conversion: Bambara → clock times
bambara_to_time("Nɛgɛ kaɲɛ wolonwula ni sanga bi saba")  # datetime.time(7, 30)

# Duration conversion: durations → Bambara
duration_to_bambara(minutes=30)          # "miniti bi saba"
duration_to_bambara(hours=1, minutes=30) # "lɛrɛ kelen ni miniti bi saba"
format_duration_bambara("1h30min10s")    # "lɛrɛ kelen ni miniti bi saba ni segɔni tan"

# Duration conversion: Bambara → durations
bambara_to_duration("lɛrɛ kelen ni miniti bi saba")  # (1, 30, 0)

# Time normalization in text
normalize_times_in_text("A nana 7:30 la")  # "A nana Nɛgɛ kaɲɛ wolonwula ... la"

# Check if word is time-related
is_time_word("lɛrɛ")                      # True
is_time_word("miniti")                    # True
is_time_word("segɔni")                    # True

Evaluation Metrics

Metric Description Range
WER Word Error Rate 0.0 – ∞
CER Character Error Rate 0.0 – ∞
MER Match Error Rate 0.0 – 1.0
WIL Word Information Lost 0.0 – 1.0
WIP Word Information Preserved 0.0 – 1.0
DER Diacritic Error Rate (tone accuracy) 0.0 – ∞

References

Linguistic Resources

Standards

  • UNESCO Bamako Meeting (1966)
  • Niamey African Reference Alphabet (1978)

Tools

  • jiwer ASR evaluation metrics

Related Work


MALIBA-AI

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bambara_text_normalizer-1.0.0.tar.gz (47.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

bambara_text_normalizer-1.0.0-py3-none-any.whl (38.2 kB view details)

Uploaded Python 3

File details

Details for the file bambara_text_normalizer-1.0.0.tar.gz.

File metadata

  • Download URL: bambara_text_normalizer-1.0.0.tar.gz
  • Upload date:
  • Size: 47.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for bambara_text_normalizer-1.0.0.tar.gz
Algorithm Hash digest
SHA256 e98c00c4b6f22089627ef64ece084d4e3a47bf2fbb9f9b482da71f0af0283564
MD5 6db15d0c0cde41f2fa8ec6d90ec8ae1e
BLAKE2b-256 7d68fc352f09a7bbb78c89a51f9f41317e75e260060d80e11cd1002e15de647a

See more details on using hashes here.

File details

Details for the file bambara_text_normalizer-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for bambara_text_normalizer-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 dfe6ad7ddbfa9e16e548d9eeb40545c9eb61c03eef15643951139face9db19b6
MD5 4f237a8579197523927b8ee965229b8f
BLAKE2b-256 08c14b9a01e40f1eb2dab7e77d4a4a428c711820fd14c51138cb04ac1f0146eb

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page