Python library for Nufi (Fe'éfě'e) text: Clafrica keyboard mapping, Bana→Komako normalisation, low-tone stripping, and encoding repair
nuficlean
Python library for Nufi (Fe'éfě'e / Babanki-Tungo) text utilities:
- Bana → Komako normalisation — converts Bana orthography to the standard Komako form, strips low-tone diacritics, and repairs encoding issues
- Clafrica keyboard mapping — converts ASCII shortcut sequences (Clafrica input method) into the corresponding Nufi Unicode characters
- Verb detection — identifies whether a Nufi word is a verb form based on prefix and tone rules
Install
```bash
pip install nuficlean
```
Bana normalisation
clean(text)
Applies the full normalisation pipeline to a string.
```python
from nuficlean import clean

clean("kòlə̀'") # → "kwele'"
clean("nàh") # → "lah"
clean("mɛ̀ɛ̀") # → "maa"
clean("tōh mēndɑ̀'") # → "tōh mēndɑ'"
```
clean_lines(lines)
Cleans a list of strings.
```python
from nuficlean import clean_lines

clean_lines(["kòlə̀'", "nàh", "mɛ̀ɛ̀"])
# → ["kwele'", "lah", "maa"]
```
Pipeline
- Mojibake repair: fixes Latin-1 → UTF-8 misencoding
- Apostrophe / quote unification: maps `'`, `` ` ``, `ʼ`, `"`, `«`, `»` to ASCII
- Bana → Komako rewrite: longest-match substitution (`kòlə̀'` → `kwele'`, `ɛ̀` → `a`, …)
- Low-tone stripping: removes grave-accent tone marks (`à` → `a`, `ɑ̀` → `ɑ`, …)
- NFC recomposition
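The five steps can be approximated with the standard library alone. This is a sketch, not nuficlean's implementation: the helper names (`repair_mojibake`, `strip_low_tone`, `sketch_clean`) are hypothetical, the curly quotes in `QUOTES` are an assumed reading of the quote list, and the rewrite table holds only the two pairs shown above, so an input such as `nàh` becomes `nah` here rather than `lah`.

```python
import re
import unicodedata

# Hypothetical, abridged tables; the curly quotes are an assumption.
QUOTES = {"\u2018": "'", "\u2019": "'", "`": "'", "\u02bc": "'",
          "\u201c": '"', "\u201d": '"', "\u00ab": '"', "\u00bb": '"'}
BANA_TO_KOMAKO = {"kòlə̀'": "kwele'", "ɛ̀": "a"}


def _nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


# Keys are NFC-normalised and tried longest first, so the regex alternation
# takes the longest match at each position.
_REWRITE = {_nfc(k): v for k, v in BANA_TO_KOMAKO.items()}
_REWRITE_RE = re.compile(
    "|".join(re.escape(k) for k in sorted(_REWRITE, key=len, reverse=True)))


def repair_mojibake(text: str) -> str:
    """Heuristic: undo UTF-8 bytes that were wrongly decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not Latin-1 mojibake, leave it alone


def strip_low_tone(text: str) -> str:
    """Decompose, then drop U+0300 COMBINING GRAVE ACCENT (the low-tone mark)."""
    return unicodedata.normalize("NFD", text).replace("\u0300", "")


def sketch_clean(text: str) -> str:
    text = repair_mojibake(text)                   # 1. mojibake repair
    text = text.translate(str.maketrans(QUOTES))   # 2. quote unification
    text = _REWRITE_RE.sub(lambda m: _REWRITE[m.group(0)], _nfc(text))  # 3. rewrite
    text = strip_low_tone(text)                    # 4. low-tone stripping
    return _nfc(text)                              # 5. NFC recomposition
```

Running the rewrite before low-tone stripping matters: keys such as `ɛ̀` contain the grave accent, so stripping first would leave nothing for them to match.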
CLI
```bash
nuficlean "kòlə̀'"
echo "mɛ̀ɛ̀" | nuficlean
```
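The CLI takes text either as an argument or from stdin. How nuficlean wires this up is not shown here; the following hypothetical sketch illustrates that dispatch with `argparse`. `run_cli` and `_placeholder_clean` are illustrative names, and the placeholder only strips grave accents instead of running the full pipeline.

```python
import argparse
import sys
import unicodedata
from typing import Callable, Iterable, Optional, Sequence


def _placeholder_clean(text: str) -> str:
    """Hypothetical stand-in for the real pipeline: only strips grave accents."""
    decomposed = unicodedata.normalize("NFD", text).replace("\u0300", "")
    return unicodedata.normalize("NFC", decomposed)


def run_cli(argv: Optional[Sequence[str]] = None,
            stdin: Iterable[str] = sys.stdin,
            cleaner: Callable[[str], str] = _placeholder_clean) -> str:
    parser = argparse.ArgumentParser(prog="nuficlean-sketch")
    parser.add_argument("text", nargs="?", help="text to clean; stdin is read if omitted")
    args = parser.parse_args(argv)
    if args.text is not None:
        return cleaner(args.text)
    # No argument: clean each stdin line, keeping the line structure.
    return "\n".join(cleaner(line.rstrip("\n")) for line in stdin)
```

Passing `argv` and `stdin` explicitly keeps the function testable; a console entry point would simply call `print(run_cli())`.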
Clafrica keyboard mapping
The Clafrica input method uses ASCII shortcut sequences to type Nufi characters.
nuficlean ships the canonical mapping table and exposes it through two functions
and a class.
apply_clafrica(text)
Converts all Clafrica shortcuts in text to Unicode, preserving whitespace.
```python
from nuficlean import apply_clafrica

apply_clafrica("af1 e2 n*") # → "ɑ̀ é ŋ"
apply_clafrica("eu3 af5") # → "ə̄ ɑ̂"
apply_clafrica("uu1 o*2") # → "ʉ̀ ɔ́"
apply_clafrica("N* O*") # → "Ŋ Ɔ"
```
Live-typing mode — pass preserve_ambiguous_trailing=True to leave the last
token untouched while the user may still extend it:
```python
apply_clafrica("af", preserve_ambiguous_trailing=True) # → "af" (could become af1, af2…)
apply_clafrica("af1") # → "ɑ̀"
```
finalize_clafrica(text)
Like apply_clafrica but also resolves any trailing ambiguous shortcut —
use this when the user confirms input (e.g. presses Enter or Space).
```python
from nuficlean import finalize_clafrica

finalize_clafrica("eu3") # → "ə̄"
finalize_clafrica("af1") # → "ɑ̀"
finalize_clafrica("n*") # → "ŋ"
```
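The live-typing and finalize behaviour can be approximated with greedy longest-match conversion over whitespace-separated tokens. This is a sketch under stated assumptions, not the library's code: `sketch_apply` and `sketch_finalize` are hypothetical names, `MAPPING` is a small subset of the shortcut table, and the ambiguity rule (some suffix of the last token is a proper prefix of a longer shortcut) is assumed.

```python
import re

# Hypothetical subset of the shortcut table (not the full Clafrica mapping).
MAPPING = {
    "af": "ɑ", "eu": "ə", "o*": "ɔ", "uu": "ʉ",
    "n*": "ŋ", "N*": "Ŋ", "O*": "Ɔ",
    "a1": "à", "a2": "á", "a3": "ā", "e2": "é",
    "af1": "ɑ̀", "af2": "ɑ́", "af3": "ɑ̄", "af5": "ɑ̂",
    "eu1": "ə̀", "eu2": "ə́", "eu3": "ə̄",
    "o*1": "ɔ̀", "o*2": "ɔ́", "o*3": "ɔ̄", "uu1": "ʉ̀",
}
_MAX_KEY = max(map(len, MAPPING))


def _convert_token(token: str) -> str:
    """Greedy longest-match conversion of one whitespace-free token."""
    out, i = [], 0
    while i < len(token):
        for size in range(min(_MAX_KEY, len(token) - i), 0, -1):
            chunk = token[i:i + size]
            if chunk in MAPPING:
                out.append(MAPPING[chunk])
                i += size
                break
        else:  # no shortcut starts here: copy the character through
            out.append(token[i])
            i += 1
    return "".join(out)


def _could_extend(token: str) -> bool:
    """Assumed rule: some suffix of token is a proper prefix of a shortcut."""
    return any(len(key) > len(token) - j and key.startswith(token[j:])
               for j in range(len(token)) for key in MAPPING)


def sketch_apply(text: str, preserve_ambiguous_trailing: bool = False) -> str:
    # With a capturing group, re.split keeps whitespace runs at odd indices,
    # so tokens sit at even indices and the last element is the trailing token.
    parts = re.split(r"(\s+)", text)
    last = len(parts) - 1
    for idx in range(0, len(parts), 2):
        if preserve_ambiguous_trailing and idx == last and _could_extend(parts[idx]):
            continue  # the user may still be typing this shortcut
        parts[idx] = _convert_token(parts[idx])
    return "".join(parts)


def sketch_finalize(text: str) -> str:
    """Resolve everything, including a trailing ambiguous shortcut."""
    return sketch_apply(text, preserve_ambiguous_trailing=False)
```

Splitting with a capturing group is what preserves the original whitespace: the separators are carried through untouched and re-joined verbatim.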
ClafricaEngine — advanced use
Instantiate the engine directly when you need a custom mapping or extra entries.
```python
from nuficlean import ClafricaEngine

# Add project-specific shortcuts on top of the default table
engine = ClafricaEngine(extra={"nkap": "ŋkɑ̄p"})
engine.apply_mapping("nkap e2") # → "ŋkɑ̄p é"
engine.finalize_input("eu3") # → "ə̄"
engine.lookup("af1") # → "ɑ̀"
engine.lookup("xyz") # → None

# Fully custom table (replaces the default)
engine = ClafricaEngine(mapping={"a1": "à", "e1": "è"})
```
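One plausible reading of the two constructor arguments is that `mapping` replaces the default table and `extra` is layered on top of whichever table is in use. The hypothetical `SketchEngine` below illustrates that assumed behaviour with a three-entry stand-in for the default table; it is not the real `ClafricaEngine`.

```python
import re
from typing import Dict, Mapping, Optional

# Hypothetical three-entry stand-in for the default shortcut table.
DEFAULT_TABLE: Dict[str, str] = {"af1": "ɑ̀", "e2": "é", "eu3": "ə̄"}


class SketchEngine:
    """Illustrates one way `mapping` and `extra` could interact (assumed semantics)."""

    def __init__(self, mapping: Optional[Mapping[str, str]] = None,
                 extra: Optional[Mapping[str, str]] = None) -> None:
        base = DEFAULT_TABLE if mapping is None else mapping  # `mapping` replaces the default
        self.table: Dict[str, str] = {**base, **(extra or {})}  # `extra` wins on key clashes
        # Longest shortcuts first, so the alternation always prefers the longest match.
        keys = sorted(self.table, key=len, reverse=True)
        self._pattern = re.compile("|".join(map(re.escape, keys))) if keys else None

    def lookup(self, shortcut: str) -> Optional[str]:
        return self.table.get(shortcut)  # None for unknown shortcuts

    def apply_mapping(self, text: str) -> str:
        if self._pattern is None:  # empty table: nothing to substitute
            return text
        return self._pattern.sub(lambda m: self.table[m.group(0)], text)
```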
Shortcut reference
| Shortcut | Output | Notes |
|---|---|---|
| `af` | ɑ | open-a |
| `eu` | ə | schwa |
| `ai` | ε | epsilon |
| `o*` | ɔ | open-o |
| `uu` | ʉ | u-bar |
| `n*` | ŋ | eng |
| `N*` | Ŋ | Eng (uppercase) |
| `a1` `a2` `a3` | à á ā | low / high / mid tone |
| `af1` `af2` `af3` | ɑ̀ ɑ́ ɑ̄ | open-a tones |
| `eu1` `eu2` `eu3` | ə̀ ə́ ə̄ | schwa tones |
| `o*1` `o*2` `o*3` | ɔ̀ ɔ́ ɔ̄ | open-o tones |

Tone digits: 1 = low (grave), 2 = high (acute), 3 = mid (macron), 5 = falling (circumflex), 7 = rising.
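Each tone shortcut is a base letter followed by a single combining mark, which also explains why NFC recomposition matters. The sketch below assumes the digit-to-mark pairs visible in the examples above (`a1` → à, `e2` → é, `a3` → ā, `af5` → ɑ̂); digit 7 is left out because its mark is not shown. `compose`, `BASES` and `TONE_MARKS` are hypothetical names, not part of nuficlean.

```python
import unicodedata

# Combining marks for the digits whose output appears in the examples.
TONE_MARKS = {"1": "\u0300",  # combining grave      (à)
              "2": "\u0301",  # combining acute      (é)
              "3": "\u0304",  # combining macron     (ā)
              "5": "\u0302"}  # combining circumflex (ɑ̂)
# Hypothetical base-letter table drawn from the shortcut reference.
BASES = {"a": "a", "e": "e", "af": "\u0251", "eu": "\u0259",
         "o*": "\u0254", "uu": "\u0289"}


def compose(shortcut: str) -> str:
    base_key, digit = shortcut[:-1], shortcut[-1]
    if base_key not in BASES or digit not in TONE_MARKS:
        raise KeyError(shortcut)
    # NFC picks a precomposed code point where one exists (a + grave -> U+00E0);
    # letters such as ɑ or ə have none, so the combining mark stays separate.
    return unicodedata.normalize("NFC", BASES[base_key] + TONE_MARKS[digit])
```

As a result, `compose("a1")` is one code point while `compose("af1")` is two, which is why comparisons on Nufi text should normalise both sides first.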
Tip: The `clafrica` package on PyPI provides the same keyboard mapping as a standalone library if you don't need the Bana normalisation: `pip install clafrica`
Verb detection
is_nufi_verb(text)
Returns True if text looks like a Nufi verb form — i.e. its first token
starts with a recognised verb-trigger prefix (mb, nd, ng, nk, …) and
the first syllable carries a high-tone or mid-tone vowel.
```python
from nuficlean import is_nufi_verb

is_nufi_verb("ndɑ́") # → True
is_nufi_verb("mbé") # → True
is_nufi_verb("kolo") # → False (no verb prefix)
is_nufi_verb("ndàh") # → False (prefix present but vowel is low-tone)
```
classify_verb_candidate(text)
Returns a detail dict with the full analysis — useful for debugging or building richer NLP pipelines.
```python
from nuficlean import classify_verb_candidate

result = classify_verb_candidate("ndɑ́")
# {
# "input": "ndɑ́",
# "normalized": "ndɑ́",
# "classification": "verb",
# "matched_prefixes": ["nd"],
# "valid_tokens": ["ndɑ́"],
# "invalid_tokens": [],
# "reason": "first token starts with a target verb prefix and its first syllable satisfies the tone rule",
# "token_checks": [...]
# }
print(result["classification"]) # "verb"
print(result["reason"])
```
Recognised verb-trigger prefixes: mb mf mm mv nc nd ng nŋ nh nj nk nn ns nt nz
Tone rule: the first syllable must contain at least one high-tone or mid-tone vowel; if it contains multiple vowels they must all share the same high or mid tone.
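For readers who want to see the prefix and tone rules as code, here is a rough re-implementation. It rests on assumptions the description above does not spell out: the "first syllable" is modelled as the first run of vowels after the leading consonants, high and mid tone are taken to be the acute and macron marks, and a vowel with no tone mark fails the rule. `sketch_is_verb` is a hypothetical name, and its output is not guaranteed to match `is_nufi_verb` on every input.

```python
import unicodedata
from typing import Dict, List

VERB_PREFIXES = ("mb", "mf", "mm", "mv", "nc", "nd", "ng", "nŋ", "nh",
                 "nj", "nk", "nn", "ns", "nt", "nz")
VOWELS = set("aeiouɑəɛεɔʉ")
# Assumption: acute (U+0301) and macron (U+0304) are the high/mid tone marks.
HIGH_OR_MID = {"\u0301", "\u0304"}


def _first_syllable_tones(token: str) -> List[str]:
    """Tone mark of each vowel in the first vowel run ('' when unmarked).

    Assumed syllable model: skip leading consonants, then read consecutive vowels.
    """
    chars = unicodedata.normalize("NFD", token.lower())
    i = 0
    tones: List[str] = []
    while i < len(chars) and chars[i] not in VOWELS:
        i += 1
    while i < len(chars) and chars[i] in VOWELS:
        i += 1
        mark = ""
        while i < len(chars) and unicodedata.combining(chars[i]):
            mark = mark or chars[i]  # keep the first combining mark on this vowel
            i += 1
        tones.append(mark)
    return tones


def sketch_is_verb(text: str) -> Dict[str, object]:
    tokens = text.split()
    first = tokens[0] if tokens else ""
    key = unicodedata.normalize("NFC", first.lower())
    matched = [p for p in VERB_PREFIXES if key.startswith(p)]
    tones = _first_syllable_tones(first)
    # Every vowel in the first syllable must carry the same high or mid mark.
    tone_ok = bool(tones) and tones[0] in HIGH_OR_MID and len(set(tones)) == 1
    return {"matched_prefixes": matched, "tones_ok": tone_ok,
            "is_verb": bool(matched) and tone_ok}
```

Working in NFD makes the tone check a simple lookup on combining characters, independent of whether the input used precomposed letters.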
License
MIT
Download files
- Source Distribution: nuficlean-0.3.1.tar.gz
- Built Distribution: nuficlean-0.3.1-py3-none-any.whl
File details
Details for the file nuficlean-0.3.1.tar.gz.
File metadata
- Download URL: nuficlean-0.3.1.tar.gz
- Size: 17.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `d09de68f2df96f12b1ce998284b510219a141894bc1be9f52508bc15a8a6a48d` |
| MD5 | `a8967f8fcc7d2ae41316fc049efd50fe` |
| BLAKE2b-256 | `19fe8229556af2e5af8c839f9a1259307a8169e978db0af591abb15f3755db73` |
File details
Details for the file nuficlean-0.3.1-py3-none-any.whl.
File metadata
- Download URL: nuficlean-0.3.1-py3-none-any.whl
- Size: 16.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `285bc03d91fd2c4965f9ae8c550027c70af613ae1607700ef732bdb86ac11b9b` |
| MD5 | `f598c45b7c9937f7548974a6aeacdd44` |
| BLAKE2b-256 | `aee70067dcd89e9e437de3bf498c0f547c20763e5403b47cc544ce44b3f39c52` |
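To check a downloaded file against the SHA256 digests listed above, stream it through `hashlib`. This is a minimal sketch; `sha256_of`, `verify` and the file-name lookup in `EXPECTED` are illustrative helpers, not part of nuficlean.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above (version 0.3.1).
EXPECTED = {
    "nuficlean-0.3.1.tar.gz":
        "d09de68f2df96f12b1ce998284b510219a141894bc1be9f52508bc15a8a6a48d",
    "nuficlean-0.3.1-py3-none-any.whl":
        "285bc03d91fd2c4965f9ae8c550027c70af613ae1607700ef732bdb86ac11b9b",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is known and its digest matches."""
    expected = EXPECTED.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

pip's hash-checking mode (`pip install --require-hashes -r requirements.txt`, with `--hash=sha256:…` entries in the requirements file) performs the same check automatically at install time.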