Python library for Nufi (Fe'éfě'e) text: Clafrica keyboard mapping, Bana→Komako normalisation, low-tone stripping, and encoding repair

nuficlean

Python library for Nufi (Fe'éfě'e) text utilities:

  • Bana → Komako normalisation — converts Bana orthography to the standard Komako form, strips low-tone diacritics, and repairs encoding issues
  • Clafrica keyboard mapping — converts ASCII shortcut sequences (Clafrica input method) into the corresponding Nufi Unicode characters
  • Verb detection — identifies whether a Nufi word is a verb form based on prefix and tone rules

Install

pip install nuficlean

Bana normalisation

clean(text)

Applies the full normalisation pipeline to a string.

from nuficlean import clean

clean("kòlə̀'")        # → "kwele'"
clean("nàh")           # → "lah"
clean("mɛ̀ɛ̀")         # → "maa"
clean("tōh mēndɑ̀'")  # → "tōh mēndɑ'"

clean_lines(lines)

Cleans a list of strings.

from nuficlean import clean_lines

clean_lines(["kòlə̀'", "nàh", "mɛ̀ɛ̀"])
# → ["kwele'", "lah", "maa"]

Pipeline

  1. Mojibake repair: fixes UTF-8 text that was mis-decoded as Latin-1
  2. Apostrophe / quote unification: maps ', `, ʼ, ", «, » to plain ASCII quote characters
  3. Bana → Komako rewrite: longest-match substitution (kòlə̀' → kwele', ɛ̀ → a, …)
  4. Low-tone stripping: removes grave-accent tone marks (à → a, ɑ̀ → ɑ, …)
  5. NFC recomposition
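
The five steps can be sketched with the standard library alone. This is an illustrative approximation, not nuficlean's implementation: `sketch_clean`, `repair_mojibake`, `REWRITES`, and `QUOTES` are hypothetical names, the rewrite table holds only the two pairs named in the list, and the quote mapping (apostrophe-like marks to `'`, double-quote-like marks to `"`) is an assumption.

```python
import re
import unicodedata

# Hypothetical rewrite table holding only the two pairs named in the list.
# Keys are stored decomposed (NFD) so they match decomposed input.
REWRITES = {
    "ko\u0300l\u0259\u0300'": "kwele'",  # Bana form of "kwele'"
    "\u025b\u0300": "a",                 # open-e with grave -> a
}
# Longest keys first so the regex alternation prefers the longest match.
REWRITE_RE = re.compile(
    "|".join(re.escape(k) for k in sorted(REWRITES, key=len, reverse=True))
)

# Assumed mapping: apostrophe-like marks to ', double-quote-like marks to ".
QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'", "`": "'", "\u02bc": "'",
    "\u201c": '"', "\u201d": '"', "\u00ab": '"', "\u00bb": '"',
})


def repair_mojibake(text):
    """Undo UTF-8 bytes that were decoded as Latin-1; leave other text alone."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Not pure Latin-1, or not valid UTF-8 once re-encoded: keep as-is.
        return text


def sketch_clean(text):
    text = repair_mojibake(text)                    # 1. mojibake repair
    text = text.translate(QUOTES)                   # 2. quote unification
    text = unicodedata.normalize("NFD", text)       # decompose before matching
    text = REWRITE_RE.sub(lambda m: REWRITES[m.group(0)], text)  # 3. rewrite
    text = text.replace("\u0300", "")               # 4. strip combining grave
    return unicodedata.normalize("NFC", text)       # 5. NFC recomposition
```

Decomposing to NFD before steps 3 and 4 means a precomposed à and an a followed by a combining grave are handled identically. The round-trip in `repair_mojibake` is a heuristic: text that is not mojibake normally fails to encode or decode and is returned unchanged.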

CLI

nuficlean "kòlə̀'"
echo "mɛ̀ɛ̀" | nuficlean

Clafrica keyboard mapping

The Clafrica input method uses ASCII shortcut sequences to type Nufi characters. nuficlean ships the canonical mapping table and exposes it through two functions and a class.

apply_clafrica(text)

Converts all Clafrica shortcuts in text to Unicode, preserving whitespace.

from nuficlean import apply_clafrica

apply_clafrica("af1 e2 n*")   # → "ɑ̀ é ŋ"
apply_clafrica("eu3 af5")     # → "ə̄ ɑ̂"
apply_clafrica("uu1 o*2")     # → "ʉ̀ ɔ́"
apply_clafrica("N* O*")       # → "Ŋ Ɔ"

Live-typing mode — pass preserve_ambiguous_trailing=True to leave the last token untouched while the user may still extend it:

apply_clafrica("af", preserve_ambiguous_trailing=True)  # → "af"  (could become af1, af2…)
apply_clafrica("af1")                                   # → "ɑ̀"
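
The idea behind live-typing mode can be sketched as a prefix check: a trailing token is ambiguous when some longer shortcut starts with it. The code below is a simplified illustration, not the library's algorithm. `TABLE`, `is_ambiguous`, and `sketch_apply` are hypothetical names, the table holds a handful of entries from the shortcut reference, and each whitespace-separated token is assumed to be one whole shortcut.

```python
import re

# Tiny stand-in for the shipped mapping table (a few entries only).
TABLE = {
    "af": "\u0251",         # open-a
    "af1": "\u0251\u0300",  # open-a + combining grave
    "af2": "\u0251\u0301",  # open-a + combining acute
    "e2": "\u00e9",         # e with acute
    "n*": "\u014b",         # eng
}


def is_ambiguous(token, table):
    """True if some longer shortcut in the table starts with this token."""
    return any(key != token and key.startswith(token) for key in table)


def sketch_apply(text, preserve_ambiguous_trailing=False, table=TABLE):
    # The capturing group keeps whitespace runs, so spacing is preserved.
    parts = re.split(r"(\s+)", text)
    out = []
    for i, part in enumerate(parts):
        is_last = i == len(parts) - 1
        if not part or part.isspace():
            out.append(part)
        elif preserve_ambiguous_trailing and is_last and is_ambiguous(part, table):
            out.append(part)  # the user may still type a tone digit
        else:
            out.append(table.get(part, part))  # unknown tokens pass through
    return "".join(out)
```

In this sketch, `sketch_apply("e2 af", preserve_ambiguous_trailing=True)` converts `e2` but leaves the final `af` alone, while `sketch_apply("af")` converts it to the bare open-a.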

finalize_clafrica(text)

Like apply_clafrica but also resolves any trailing ambiguous shortcut — use this when the user confirms input (e.g. presses Enter or Space).

from nuficlean import finalize_clafrica

finalize_clafrica("eu3")   # → "ə̄"
finalize_clafrica("af1")   # → "ɑ̀"
finalize_clafrica("n*")    # → "ŋ"

ClafricaEngine — advanced use

Instantiate the engine directly when you need a custom mapping or extra entries.

from nuficlean import ClafricaEngine

# Add project-specific shortcuts on top of the default table
engine = ClafricaEngine(extra={"nkap": "ŋkɑ̄p"})
engine.apply_mapping("nkap e2")   # → "ŋkɑ̄p é"
engine.finalize_input("eu3")      # → "ə̄"
engine.lookup("af1")              # → "ɑ̀"
engine.lookup("xyz")              # → None

# Fully custom table (replaces the default)
engine = ClafricaEngine(mapping={"a1": "à", "e1": "è"})
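
The difference between `extra` and `mapping` can be pictured as two ways of building one lookup dictionary. The class below is a hypothetical stand-in (`SketchEngine` and `DEFAULT_TABLE` are not part of nuficlean) that assumes `extra` is layered over the default table and `mapping` replaces it outright, as the comments above describe.

```python
# Tiny stand-in for the default table that ships with the library.
DEFAULT_TABLE = {"e2": "\u00e9", "n*": "\u014b"}


class SketchEngine:
    def __init__(self, mapping=None, extra=None):
        # A custom mapping replaces the default; extra entries are layered on top.
        base = DEFAULT_TABLE if mapping is None else mapping
        self.table = {**base, **(extra or {})}

    def lookup(self, shortcut):
        """Return the mapped string, or None for an unknown shortcut."""
        return self.table.get(shortcut)


layered = SketchEngine(extra={"nkap": "\u014bk\u0251\u0304p"})
custom = SketchEngine(mapping={"a1": "\u00e0", "e1": "\u00e8"})
```

Here `layered.lookup("e2")` still resolves through the default entries, whereas `custom.lookup("e2")` returns `None` because the default table is no longer consulted.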

Shortcut reference

Shortcut      Output     Notes
af            ɑ          open-a
eu            ə          schwa
ai            ɛ          epsilon (open-e)
o*            ɔ          open-o
uu            ʉ          u-bar
n*            ŋ          eng
N*            Ŋ          Eng (uppercase)
a1 a2 a3      à á ā      low / high / mid tone
af1 af2 af3   ɑ̀ ɑ́ ɑ̄      open-a tones
eu1 eu2 eu3   ə̀ ə́ ə̄      schwa tones
o*1 o*2 o*3   ɔ̀ ɔ́ ɔ̄      open-o tones

Tone digits: 1 = low (grave), 2 = high (acute), 3 = mid (macron), 5 = falling (circumflex), 7 = rising (caron).
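
One way to read the tone digits is as combining diacritics appended to a base letter. The sketch below illustrates that reading and is not the library's table: `TONE_MARKS`, `BASES`, and `compose` are hypothetical names, and the mark assigned to digit 7 (combining caron) is an assumption, since the examples above only show outputs for digits 1, 2, 3, and 5.

```python
import unicodedata

TONE_MARKS = {
    "1": "\u0300",  # combining grave
    "2": "\u0301",  # combining acute
    "3": "\u0304",  # combining macron
    "5": "\u0302",  # combining circumflex
    "7": "\u030c",  # combining caron (assumed, not shown in the examples)
}

# Base shortcuts taken from the reference table, plus plain a and e.
BASES = {
    "a": "a", "e": "e",
    "af": "\u0251", "eu": "\u0259", "o*": "\u0254", "uu": "\u0289",
}


def compose(shortcut):
    """Split off a trailing tone digit and attach the matching combining mark."""
    if len(shortcut) < 2:
        return None
    base_key, digit = shortcut[:-1], shortcut[-1]
    if base_key not in BASES or digit not in TONE_MARKS:
        return None
    # NFC yields a precomposed character where Unicode defines one
    # and keeps base + combining mark otherwise.
    return unicodedata.normalize("NFC", BASES[base_key] + TONE_MARKS[digit])
```

Letters such as ɑ, ə, ɔ, and ʉ have no precomposed toned forms in Unicode, so their toned output stays a two-code-point sequence even after NFC. This matters when comparing strings or counting characters.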

Tip: The clafrica package on PyPI provides the same keyboard mapping as a standalone library if you don't need the Bana normalisation:

pip install clafrica


Verb detection

is_nufi_verb(text)

Returns True if text looks like a Nufi verb form — i.e. its first token starts with a recognised verb-trigger prefix (mb, nd, ng, nk, …) and the first syllable carries a high-tone or mid-tone vowel.

from nuficlean import is_nufi_verb

is_nufi_verb("ndɑ́")   # → True
is_nufi_verb("mbé")    # → True
is_nufi_verb("kolo")   # → False   (no verb prefix)
is_nufi_verb("ndàh")   # → False   (prefix present but vowel is low-tone)

classify_verb_candidate(text)

Returns a detail dict with the full analysis — useful for debugging or building richer NLP pipelines.

from nuficlean import classify_verb_candidate

result = classify_verb_candidate("ndɑ́")
# {
#   "input": "ndɑ́",
#   "normalized": "ndɑ́",
#   "classification": "verb",
#   "matched_prefixes": ["nd"],
#   "valid_tokens": ["ndɑ́"],
#   "invalid_tokens": [],
#   "reason": "first token starts with a target verb prefix and its first syllable satisfies the tone rule",
#   "token_checks": [...]
# }
print(result["classification"])  # "verb"
print(result["reason"])

Recognised verb-trigger prefixes: mb mf mm mv nc nd ng nŋ nh nj nk nn ns nt nz

Tone rule: the first syllable must contain at least one high-tone or mid-tone vowel; if it contains multiple vowels they must all share the same high or mid tone.
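
The prefix check and tone rule can be approximated in a few lines. This is a rough sketch under stated assumptions, not the library's logic: `sketch_is_verb` and `first_vowel_run_tones` are hypothetical names, the first syllable is approximated by the first contiguous run of vowels, acute is read as high and macron as mid, and an unmarked vowel is treated as failing the rule.

```python
import unicodedata

PREFIXES = (
    "mb", "mf", "mm", "mv", "nc", "nd", "ng", "n\u014b",
    "nh", "nj", "nk", "nn", "ns", "nt", "nz",
)
VOWELS = set("aeiou\u0251\u0259\u025b\u0254\u0289")
TONES = {"\u0301": "high", "\u0304": "mid", "\u0300": "low"}


def first_vowel_run_tones(token):
    """Tone label for each vowel in the first vowel run (None = unmarked)."""
    tones = []
    in_run = False
    for ch in unicodedata.normalize("NFD", token):
        if ch in VOWELS:
            tones.append(None)
            in_run = True
        elif unicodedata.combining(ch):
            if in_run:
                tones[-1] = TONES.get(ch, "other")
        elif in_run:
            break  # a consonant ends the first vowel run
    return tones


def sketch_is_verb(text):
    tokens = text.split()
    if not tokens:
        return False
    first = unicodedata.normalize("NFC", tokens[0]).lower()
    if not first.startswith(PREFIXES):
        return False
    tones = first_vowel_run_tones(first)
    # At least one vowel, high or mid, and every vowel in the run agrees.
    return bool(tones) and tones[0] in ("high", "mid") and all(
        t == tones[0] for t in tones
    )
```

Working on the NFD form lets precomposed letters such as é and base-plus-mark sequences such as ɑ́ go through the same code path.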


License

MIT
