
Nufi language text normalisation: Bana → Komako standard orthography, tone stripping, mojibake repair

Project description

nuficlean

Python library for normalising Nufi (Babanki-Tungo / Nʉ Fì) text: converts Bana orthography to Komako standard, strips low-tone diacritics, and repairs common encoding issues.

Install

pip install nuficlean

Or from source:

pip install -e /path/to/nuficlean

Usage

from nuficlean import clean

clean("kòlə̀'")        # → "kwele'"
clean("tōh mēndɑ̀'")  # → "tōh mēndɑ'"
clean("mbɑ̀ɑ̀")        # → "mbɑɑ"

Batch cleaning:

from nuficlean import clean_lines

clean_lines(["kòlə̀'", "mbɑ̀ɑ̀"])  # → ["kwele'", "mbɑɑ"]

CLI:

nuficlean "kòlə̀'"
echo "mbɑ̀ɑ̀" | nuficlean
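A command that accepts either an argument or piped input can be sketched as below. This is a minimal sketch, not nuficlean's actual entry point. `run_cli`, `strip_grave` and `main` are hypothetical names. Treating several arguments as one space-separated text is an assumption. The stand-in cleaner only strips grave accents; in practice the package's `clean` would be passed as `clean_fn`.

```python
import sys
import unicodedata
from typing import Callable, Iterable, List


def strip_grave(text: str) -> str:
    """Stand-in cleaner for this sketch: removes U+0300 (grave) only."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace("\u0300", ""))


def run_cli(
    argv: List[str],
    lines: Iterable[str],
    clean_fn: Callable[[str], str] = strip_grave,
) -> List[str]:
    """Clean the joined arguments if any were given, else each input line."""
    if argv:
        # Assumption: several arguments are treated as one space-separated text.
        return [clean_fn(" ".join(argv))]
    # No arguments: read line by line (e.g. from a pipe), dropping line endings.
    return [clean_fn(line.rstrip("\r\n")) for line in lines]


def main() -> None:
    """Hypothetical console entry point: print one cleaned line per result."""
    for result in run_cli(sys.argv[1:], sys.stdin):
        print(result)
```

Taking `lines` as any iterable of strings lets `sys.stdin` be passed in production and a plain list in tests. Both invocation styles shown above then go through the same code path.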

Pipeline

  1. Mojibake repair: fixes UTF-8 text that was mis-decoded as Latin-1 (e.g. Ã² → ò)
  2. Apostrophe / quote unification: maps typographic apostrophes and quotation marks (’, `, ʼ, ”, «, », etc.) to ASCII ' and "
  3. Bana → Komako rewrite: longest-match substitution table (e.g. kòlə̀' → kwèlè', ɛ̀ → a)
  4. Low-tone stripping: removes grave-accent tone marks (à → a, ɑ̀ → ɑ, …); mid-tone macrons such as ō are kept
  5. Unicode NFC recomposition
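The five steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the package's implementation. `clean_sketch` and its helpers are hypothetical names, not part of the nuficlean API. The quote map and the two-entry `BANA_TO_KOMAKO` table are also hypothetical and contain only the examples quoted in this README. Mojibake repair uses the common Latin-1 round-trip heuristic. Tone stripping works on NFD text because à has a precomposed code point but ɑ̀ does not; decomposing first lets one rule remove U+0300 from both.

```python
import re
import unicodedata

# Hypothetical quote map; the package's real table is not published here.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "`": "'",       # backtick used as an apostrophe
    "\u02bc": "'",  # modifier letter apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u00ab": '"',  # left guillemet
    "\u00bb": '"',  # right guillemet
})

# Hypothetical two-entry Bana -> Komako table built from the README examples.
BANA_TO_KOMAKO = {
    "k\u00f2l\u0259\u0300'": "kw\u00e8l\u00e8'",  # kòlə̀' -> kwèlè'
    "\u025b\u0300": "a",                           # ɛ̀ -> a
}

# Normalise keys to NFC and list longer keys first, so that at any position
# the regex alternation picks the longest matching entry.
_TABLE = {unicodedata.normalize("NFC", k): v for k, v in BANA_TO_KOMAKO.items()}
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(_TABLE, key=len, reverse=True))
)


def repair_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were decoded as Latin-1 (e.g. 'Ã²' -> 'ò')."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # round trip failed: assume the text was not mojibake


def bana_to_komako(text: str) -> str:
    """Apply the substitution table, longest match first."""
    text = unicodedata.normalize("NFC", text)
    return _PATTERN.sub(lambda m: _TABLE[m.group(0)], text)


def strip_low_tone(text: str) -> str:
    """Remove combining grave accents (U+0300); macrons (U+0304) survive."""
    return unicodedata.normalize("NFD", text).replace("\u0300", "")


def clean_sketch(text: str) -> str:
    """Run the five steps in the documented order."""
    text = repair_mojibake(text)
    text = text.translate(QUOTE_MAP)
    text = bana_to_komako(text)
    text = strip_low_tone(text)
    return unicodedata.normalize("NFC", text)
```

For the three inputs in the Usage section this sketch returns the documented outputs, but a usable table would need far more entries. The round trip is a heuristic: it can misfire on text that legitimately contains sequences like "Ã©", and it does not catch Windows-1252 mojibake such as "â€™", because € is not a Latin-1 character.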

License

MIT


Download files


Source Distribution

nuficlean-0.1.0.tar.gz (5.9 kB)

Uploaded Source

Built Distribution


nuficlean-0.1.0-py3-none-any.whl (7.1 kB)

Uploaded Python 3

File details

Details for the file nuficlean-0.1.0.tar.gz.

File metadata

  • Download URL: nuficlean-0.1.0.tar.gz
  • Upload date:
  • Size: 5.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for nuficlean-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        7587af8749c2666151039c870fbfc3799c9008f295954813580df697d62f1b79
MD5           b36e9ba7d6a30b2c62138b8c1090b67e
BLAKE2b-256   31fcfb402e2ec9ce7b1ecdf7401b22555c7717b330a2cfbf1729359e775243b3


File details

Details for the file nuficlean-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: nuficlean-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for nuficlean-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        7b386dbe96a189553aa78e6e2c581c71dbda6e35f9e6f02693c2d2a47a7a31ef
MD5           64e87aea6978269e4a8375fead96ec3a
BLAKE2b-256   3e944e8f73fce520f326a4e354259a8d9b402b55687b7bea1d8f91d725ce89e3

