A robust Python toolkit for low-resource African language pre-processing, emotion labels, evaluation, and routing.

Low-Resource NLP Toolkit

A public, research-facing Python toolkit for African language pre-processing, emotion-label mapping, evaluation, and language/dialect routing.

The project is designed as a safe, open-source toolkit for the NLP engineering problems that recur in low-resource and multilingual AI research: noisy text, code-switching, uneven label taxonomies, small datasets, and the need for transparent evaluation.

Status: 0.2.0 release. Local checks, CI, isolated wheel builds, metadata checks, and install tests pass.

Why This Exists

Low-resource NLP projects often spend too much time rebuilding the same foundations before modelling begins. This toolkit provides a dependable base layer:

Text normalisation for noisy social, conversational, and cultural text.
Lightweight African language routing for Yoruba, Igbo, Hausa, Nigerian Pidgin, Swahili, and English.
Evidence-first code-switch audits that expose token routes, spans, and abstentions.
Emotion label harmonisation across categorical and valence-arousal formats.
Evaluation utilities for classification and routing experiments.
A CLI and examples that run without downloading model weights.
Extension points for transformer or embedding backends when a project needs heavier models.

Architecture

flowchart LR
    A["Raw multilingual text"] --> B["Normaliser"]
    B --> C["Tokeniser"]
    C --> D["Language router"]
    C --> E["Emotion label mapper"]
    D --> F["Route decision + confidence"]
    E --> G["Canonical emotion / valence-arousal"]
    F --> H["Evaluation reports"]
    G --> H
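
As a rough illustration of the routing branch in the diagram (normaliser, tokeniser, router, decision with confidence), the sketch below wires stand-in components together in plain Python. Every name here (`TOY_LEXICON`, `toy_route`, `ToyRouteDecision`) and the tiny word lists are assumptions made for this example; they are not the toolkit's API or its lexicons.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Optional

# Hypothetical mini-lexicons, for illustration only.
TOY_LEXICON = {
    "pcm": {"abeg", "una", "wetin", "dey"},
    "sw": {"habari", "asante", "karibu"},
    "en": {"the", "and", "model"},
}


@dataclass
class ToyRouteDecision:
    language_code: Optional[str]  # None means the router abstained
    confidence: float


def toy_normalise(text):
    # NFC keeps composed diacritics stable; lower-casing simplifies lookup.
    return unicodedata.normalize("NFC", text).lower()


def toy_tokenise(text):
    # \w+ is Unicode-aware for str patterns in Python 3.
    return re.findall(r"\w+", text)


def toy_route(text):
    tokens = toy_tokenise(toy_normalise(text))
    # Count how many tokens each language's lexicon recognises.
    hits = {code: sum(tok in words for tok in tokens)
            for code, words in TOY_LEXICON.items()}
    total = sum(hits.values())
    if total == 0:
        return ToyRouteDecision(None, 0.0)  # no evidence: abstain
    best = max(hits, key=hits.get)
    # Confidence here is simply the winning share of lexicon hits.
    return ToyRouteDecision(best, hits[best] / total)


decision = toy_route("abeg make una help me check this model output")
print(decision.language_code, round(decision.confidence, 2))
```

With these toy word lists the sample sentence gets two Nigerian Pidgin hits and one English hit, so the sketch reports `pcm` with a confidence of about 0.67. A real router would need far larger lexicons and script-level evidence.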

Quick Start

python3 -m venv .venv
source .venv/bin/activate
python -m pip install low-resource-nlp-toolkit
low-resource-nlp --version

Route a text sample:

low-resource-nlp route "abeg make una help me check this model output"

Audit code-switched language evidence:

low-resource-nlp audit "abeg make una check this model output"
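
To make "token routes, spans, and abstentions" concrete, here is a minimal sketch of what a token-level audit could record. The function `toy_audit`, the dictionary layout, and the mini-lexicon are hypothetical and chosen for this example; the toolkit's actual output format may differ.

```python
import re

# Hypothetical mini-lexicon mapping a word to a language code.
TOY_AUDIT_LEXICON = {
    "abeg": "pcm",
    "una": "pcm",
    "check": "en",
    "this": "en",
    "model": "en",
    "output": "en",
}


def toy_audit(text):
    """Return per-token evidence, a language mix, and abstention warnings."""
    tokens = []
    for match in re.finditer(r"\w+", text):
        tokens.append({
            "token": match.group(),
            "span": match.span(),  # (start, end) character offsets
            # None means no lexicon evidence, so the audit abstains.
            "language": TOY_AUDIT_LEXICON.get(match.group().lower()),
        })
    routed = [t["language"] for t in tokens if t["language"] is not None]
    mix = {lang: routed.count(lang) / len(routed) for lang in set(routed)}
    warnings = []
    abstained = len(tokens) - len(routed)
    if abstained:
        warnings.append(
            f"{abstained} of {len(tokens)} tokens had no lexicon evidence"
        )
    return {"tokens": tokens, "language_mix": mix, "warnings": warnings}


report = toy_audit("abeg make una check this model output")
print(report["language_mix"], report["warnings"])
```

Recording an explicit abstention for "make", rather than forcing it into a language, is what keeps the evidence inspectable in this sketch.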

Normalise text:

low-resource-nlp normalise "Ẹ káàrọ̀!!! Visit https://example.com @user"
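
For readers who want a feel for what this kind of normalisation involves, the sketch below applies Unicode NFC composition, URL and user-mention masking, and repeated-character capping using only the standard library. The placeholder strings `<url>` and `<user>`, the cap of two repeated characters, and the name `toy_normalise_text` are assumptions for this example, not the toolkit's documented behaviour.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")
# A non-digit, non-space character repeated three or more times.
REPEAT_RE = re.compile(r"([^\d\s])\1{2,}")


def toy_normalise_text(text):
    # Compose base letters and diacritics so equal strings compare equal.
    text = unicodedata.normalize("NFC", text)
    text = URL_RE.sub("<url>", text)
    text = USER_RE.sub("<user>", text)
    # Cap runs such as "!!!" or "soooo" at two characters.
    text = REPEAT_RE.sub(r"\1\1", text)
    # Collapse any whitespace runs to single spaces.
    return " ".join(text.split())


print(toy_normalise_text("Soooo good!!!   see https://example.com @user"))
```

Digits are excluded from the repeat rule so that numbers such as 1000 are left intact.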

Map an emotion label:

low-resource-nlp label joy
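
Label harmonisation of this kind can be pictured as two lookups: an alias table that collapses dataset-specific names onto a canonical label, and a table from canonical labels to valence-arousal coordinates. The sketch below is illustrative only: the function names, the alias table, and especially the numeric coordinates are placeholder assumptions, not the values the toolkit ships.

```python
# Placeholder coordinates on an assumed [-1, 1] valence / arousal scale.
TOY_VALENCE_AROUSAL = {
    "joy": (0.8, 0.5),
    "anger": (-0.6, 0.8),
    "sadness": (-0.7, -0.4),
    "fear": (-0.6, 0.7),
    "neutral": (0.0, 0.0),
}

# Hypothetical aliases showing how uneven taxonomies collapse together.
TOY_ALIASES = {
    "happy": "joy",
    "happiness": "joy",
    "sad": "sadness",
    "angry": "anger",
}


def toy_canonical_label(label):
    key = label.strip().lower()
    return TOY_ALIASES.get(key, key)


def toy_label_to_valence_arousal(label):
    canonical = toy_canonical_label(label)
    if canonical not in TOY_VALENCE_AROUSAL:
        # Fail loudly rather than guess a coordinate for an unknown label.
        raise KeyError(f"unknown emotion label: {label!r}")
    return TOY_VALENCE_AROUSAL[canonical]


print(toy_canonical_label(" Happy "), toy_label_to_valence_arousal("joy"))
```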

Run the tests from a source checkout of the repository:

make check

Without make:

python3 scripts/quality_gate.py
PYTHONPATH=src python3 -m unittest discover -s tests

Python Usage

from low_resource_nlp import (
    LexicalLanguageRouter,
    audit_code_switching,
    label_to_valence_arousal,
    normalise_text,
)

text = normalise_text("Ẹ káàrọ̀, báwo ni?")
decision = LexicalLanguageRouter.default().route(text)
audit = audit_code_switching("abeg make una check this model output")
emotion = label_to_valence_arousal("joy")

print(decision.language_code, decision.confidence)
print(audit.language_mix, audit.warnings)
print(emotion)

Current Scope

The first public release deliberately avoids bundling private datasets or model weights. The core is deterministic, inspectable, and dependency-light. Optional embedding and transformer backends are outside the current core package.

Supported core modules:

normalisation: Unicode-aware text cleaning, URL and user-mention normalisation, tokenisation, repeated-character handling.
routing: script-aware and lexicon-assisted language routing.
audit: token-level code-switch audits with spans, evidence, and abstention warnings.
labels: canonical emotion labels and valence-arousal mapping.
evaluation: precision, recall, F1, macro/micro summaries, and confusion matrices.
datasets: simple CSV/JSONL readers for experiment scaffolding.
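
The metrics named for the evaluation module follow standard definitions, which the sketch below computes from scratch for a small routing experiment. It is a minimal stand-alone illustration; `toy_classification_report` and its return layout are hypothetical and do not describe the toolkit's own functions.

```python
from collections import Counter


def toy_classification_report(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    labels = sorted(set(y_true) | set(y_pred))
    confusion = Counter(zip(y_true, y_pred))  # (gold, predicted) -> count
    per_label = {}
    tp_sum = fp_sum = fn_sum = 0
    for label in labels:
        tp = confusion[(label, label)]
        fp = sum(confusion[(o, label)] for o in labels if o != label)
        fn = sum(confusion[(label, o)] for o in labels if o != label)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        per_label[label] = {"precision": p, "recall": r, "f1": f1}
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    # Macro: unweighted mean over labels. Micro: pooled counts.
    macro_f1 = sum(m["f1"] for m in per_label.values()) / len(labels)
    micro_p = tp_sum / (tp_sum + fp_sum) if tp_sum + fp_sum else 0.0
    micro_r = tp_sum / (tp_sum + fn_sum) if tp_sum + fn_sum else 0.0
    micro_f1 = (2 * micro_p * micro_r / (micro_p + micro_r)
                if micro_p + micro_r else 0.0)
    # Rows are gold labels, columns are predictions, both in `labels` order.
    matrix = [[confusion[(g, p)] for p in labels] for g in labels]
    return {"labels": labels, "per_label": per_label,
            "macro_f1": macro_f1, "micro_f1": micro_f1, "confusion": matrix}


gold = ["yo", "yo", "ig", "ha"]
pred = ["yo", "ig", "ig", "ha"]
report = toy_classification_report(gold, pred)
print(report["labels"], report["confusion"])
print(round(report["macro_f1"], 3), round(report["micro_f1"], 3))
```

Reporting macro and micro averages side by side matters for small, imbalanced datasets, since macro averaging weights every language equally while micro averaging is dominated by the most frequent ones.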

Public Project Materials

Responsible AI Notes

This toolkit is for research and prototyping. Language, dialect, and emotion labels are socially and culturally sensitive. Do not treat routing or emotion predictions as identity labels, clinical assessments, or ground truth. Always evaluate with speakers, domain experts, and context-specific data.

External Use

External use signals should be public and verifiable: issues from real users, pull requests, tutorial use, workshop demos, citations, package downloads, or adoption by a lab/community project. Self-generated activity should not be counted as impact.

Maintainers

oyinkanchekwas

Release history

0.2.0 (this version): Jul 1, 2026
0.1.0: Jul 1, 2026

Download files

Source Distribution

low_resource_nlp_toolkit-0.2.0.tar.gz (19.5 kB)

Uploaded Jul 1, 2026 Source

Built Distribution

low_resource_nlp_toolkit-0.2.0-py3-none-any.whl (17.8 kB)

Uploaded Jul 1, 2026 Python 3

File details

Details for the file low_resource_nlp_toolkit-0.2.0.tar.gz.

File metadata

Download URL: low_resource_nlp_toolkit-0.2.0.tar.gz
Upload date: Jul 1, 2026
Size: 19.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for low_resource_nlp_toolkit-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`4ef0273baad864de16ba97fb893f45e6cb1e0b3a9964549a7606b49c68c1de06`
MD5	`e93530b2f2d65725651391f1687f92b1`
BLAKE2b-256	`26d6ada623b27b31a5811504fddd8a1a29186dc6a616518742518c7a0acbe25a`

File details

Details for the file low_resource_nlp_toolkit-0.2.0-py3-none-any.whl.

File metadata

Download URL: low_resource_nlp_toolkit-0.2.0-py3-none-any.whl
Upload date: Jul 1, 2026
Size: 17.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for low_resource_nlp_toolkit-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`786f0f95f5001d52547700afb80c5e1a8813f971294fac0481ebf786277fa1e9`
MD5	`ba9d11079b8b4d67bbf208c7f0738061`
BLAKE2b-256	`23603744baefd7b60ec8157785d5192fb8d3a01a633d9fbf16045dc22a3fb547`
