Evaluation toolkit for AI systems in African language contexts — code-switching, dialectal robustness, and low-resource NLP.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

Uchebuzz

These details have not been verified by PyPI

Project links

Documentation

Project description

NaijaEval

Evaluation infrastructure for AI systems that mainstream benchmarks can't assess — built for African languages, code-switching, and dialectal robustness.

Why this exists

Standard NLP benchmarks such as GLUE, HELM, and XTREME were designed mainly around high-resource languages and standard dialects. If you are building a system for Nigerian English, Yoruba, Igbo, Hausa, Nigerian Pidgin, or Swahili, their scores say little about whether it works for your users.

The specific gaps NaijaEval addresses:

No standard metric for code-switch robustness. A model that scores 0.85 on clean English may collapse when a user switches from English to Yoruba mid-sentence.
No standard way to measure dialectal degradation. WER on standard British English tells you little about WER on Nigerian English.
Terminology preservation goes unmeasured. BLEU weights medical or legal terms no differently from "the", yet in practice mistranslating "hypertension" matters far more than slightly wrong word order.
Hallucination in low-resource translation goes undetected. Models undertrained on a language such as Swahili tend to hallucinate, and surface-overlap metrics like BLEU do not flag it explicitly.

NaijaEval provides composable, task-agnostic metrics for these African-language evaluation problems, ready to use out of the box.

Quickstart

pip install naijaeval

from naijaeval.metrics import (
    CodeSwitchRateMetric,
    TerminologyPreservationMetric,
    HallucinationRateMetric,
)

# Measure how mixed your test data is
csr = CodeSwitchRateMetric()
result = csr.compute(
    predictions=["I dey go market abeg, wetin be the price?"],
    references=[],
)
print(f"Code-switch rate: {result.score:.3f}")
# Code-switch rate: 0.444

# Check terminology preservation in medical translation
tpr = TerminologyPreservationMetric(domain="medical")
result = tpr.compute(
    predictions=["Alaisan naa ni malaria ati hypertension."],
    references=[],
)
print(f"Term preservation: {result.score:.3f}")
# Term preservation: 0.150  (most terms not preserved → low Yoruba coverage)

# Detect hallucination in summarisation
hal = HallucinationRateMetric()
result = hal.compute(
    predictions=["The Lagos General Hospital in Kano treated 500 patients."],
    references=["The hospital in Lagos treated patients."],  # the source text being summarised
)
print(f"Hallucination rate: {result.score:.3f}")
print(f"Hallucinated: {result.details['per_sample'][0]['hallucinated']}")
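
The last example relies on hallucination detection, which the metrics table describes as entity-based. The sketch below illustrates the general idea in plain Python under a stated assumption: capitalised words and numbers stand in for named entities, and an entity counts as hallucinated if it never appears in the source. It is not `HallucinationRateMetric`'s actual implementation, and the function names are hypothetical.

```python
import re


def hallucinated_entities(prediction: str, source: str) -> list[str]:
    """Return entity-like tokens in `prediction` that never occur in `source`.

    Assumption for illustration: capitalised words and numbers approximate
    named entities; matching against the source is case-insensitive.
    """
    source_tokens = {tok.lower() for tok in re.findall(r"\w+", source)}
    candidates = [
        tok for tok in re.findall(r"\w+", prediction)
        if tok[0].isupper() or tok.isdigit()
    ]
    # dict.fromkeys drops duplicates while keeping first-seen order
    return [tok for tok in dict.fromkeys(candidates) if tok.lower() not in source_tokens]


def hallucination_rate(predictions: list[str], sources: list[str]) -> float:
    """Fraction of predictions containing at least one unsupported entity."""
    if not predictions:
        return 0.0
    flagged = sum(bool(hallucinated_entities(p, s)) for p, s in zip(predictions, sources))
    return flagged / len(predictions)


pred = "The Lagos General Hospital in Kano treated 500 patients."
src = "The hospital in Lagos treated patients."
print(hallucinated_entities(pred, src))   # ['General', 'Kano', '500']
print(hallucination_rate([pred], [src]))  # 1.0
```

The sentence-initial "The" is spared only because it also occurs in the source; a sturdier version would use an NER model or at least handle sentence position.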

Supported tasks and benchmarks

Benchmark	Task	Languages	Dataset
`naija_mt_v1`	Machine translation	English → Yoruba	MENYO-20k
`coswitch_asr_v1`	ASR robustness	Nigerian English / Pidgin	Common Voice

Supported metrics

Metric	Category	Description
`code_switch_rate`	Robustness	Fraction of adjacent token pairs whose language differs
`dialectal_robustness_score`	Robustness	Relative performance drop on dialectal vs standard input
`terminology_preservation_rate`	Fidelity	Fraction of domain terms present in output
`bleu`	Fidelity	Corpus BLEU (sacrebleu)
`chrf`	Fidelity	Character n-gram F-score; often better suited than BLEU to morphologically rich languages
`wer`	ASR	Word Error Rate
`cer`	ASR	Character Error Rate
`wer_delta`	ASR	WER degradation from standard to dialectal input
`hallucination_rate`	Consistency	Entity-based hallucination detection
`consistency_score`	Consistency	N-gram faithfulness to source
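
`consistency_score` is described only as n-gram faithfulness to the source. One plausible reading, sketched here as an assumption rather than the package's actual formula, is clipped n-gram precision: the share of the output's bigrams that also occur in the source. The function name `ngram_faithfulness` and the whitespace tokenisation are illustrative choices.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_faithfulness(prediction: str, source: str, n: int = 2) -> float:
    """Share of prediction n-grams also found in the source (clipped counts)."""
    pred = ngrams(prediction.lower().split(), n)
    src = ngrams(source.lower().split(), n)
    total = sum(pred.values())
    if total == 0:
        return 0.0  # prediction shorter than n tokens: nothing to check
    # Counter & Counter keeps the minimum count per n-gram, i.e. clipping
    return sum((pred & src).values()) / total


score = ngram_faithfulness(
    "the hospital in lagos treated 500 patients",
    "the hospital in lagos treated patients",
)
print(round(score, 3))  # 0.667
```

The inserted "500" breaks two of the prediction's six bigrams, so the score falls to 4/6.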

Built-in domain term lists

medical · legal · financial · customer_support
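
`terminology_preservation_rate` is described as the fraction of domain terms present in the output. The sketch below assumes the terms to look for are the list entries that occur in the English source, which may differ from how the package decides (the Quickstart call passes no source at all). `MEDICAL_TERMS` is a tiny hypothetical stand-in for the built-in `medical` list, and only single-word terms are matched.

```python
import re

# Hypothetical four-word stand-in for the built-in medical term list.
MEDICAL_TERMS = frozenset({"malaria", "hypertension", "diabetes", "tuberculosis"})


def terms_in(text: str, terms: frozenset) -> set[str]:
    """Return the single-word terms that occur in `text` (case-insensitive)."""
    words = set(re.findall(r"\w+", text.lower()))
    return {term for term in terms if term in words}


def terminology_preservation_rate(output: str, source: str, terms=MEDICAL_TERMS) -> float:
    """Fraction of domain terms in `source` that survive into `output`."""
    expected = terms_in(source, terms)
    if not expected:
        return 1.0  # the source contains no domain terms, so none can be lost
    return len(expected & terms_in(output, terms)) / len(expected)


source = "The patient has malaria and hypertension."
print(terminology_preservation_rate("Alaisan naa ni iba ati hypertension.", source))      # 0.5
print(terminology_preservation_rate("Alaisan naa ni malaria ati hypertension.", source))  # 1.0
```

Rendering "malaria" as "iba" drops one of the two expected terms, which a plain BLEU score would barely register.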

Built-in language vocabularies

Yoruba (yo) · Igbo (ig) · Hausa (ha) · Nigerian Pidgin (pcm) · Swahili (sw) · Zulu (zu) · Amharic (am)
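
A rough sketch of how vocabularies like these could drive `code_switch_rate`: tag each token by dictionary lookup, then count language changes between neighbouring tokens. The word lists, the default-to-English rule, and the function names are all hypothetical; the built-in vocabularies and tagger are likely richer.

```python
import re

# Hypothetical toy vocabularies; unknown tokens default to English ("en").
VOCAB = {
    "pcm": {"abeg", "dey", "wetin", "una", "wahala"},
    "yo": {"oja", "jowo", "mo", "fe", "lo"},
}


def tag_tokens(text: str) -> list[str]:
    """Assign a language code to each word via dictionary lookup."""
    tags = []
    for token in re.findall(r"\w+", text.lower()):
        lang = next((code for code, words in VOCAB.items() if token in words), "en")
        tags.append(lang)
    return tags


def code_switch_rate(tags: list[str]) -> float:
    """Fraction of adjacent token pairs whose language tags differ."""
    if len(tags) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(tags, tags[1:]))
    return switches / (len(tags) - 1)


tags = tag_tokens("Abeg I want go oja now")
print(tags)                    # ['pcm', 'en', 'en', 'en', 'yo', 'en']
print(code_switch_rate(tags))  # 0.6
```

Words shared across languages ("go" is both English and Pidgin) are where a lookup tagger is weakest, so scores on real text should be read as approximate.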

CLI reference

# List everything available
naijaeval list metrics
naijaeval list datasets
naijaeval list benchmarks

# Run a benchmark
naijaeval run \
    --benchmark naija_mt_v1 \
    --predictions preds.txt \
    --references refs.txt \
    --model Helsinki-NLP/opus-mt-en-yo \
    --output results.json

# Compare two models
naijaeval compare model_a.json model_b.json

# Generate HTML report
naijaeval report --input results.json --output report.html

Python API

# Run a full task evaluation
from naijaeval.tasks.translation import TranslationTask

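# my_translations, reference_translations and english_sentences are parallel
# lists of strings (one entry per segment) that you provide.
task = TranslationTask(domain="medical")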
results = task.evaluate(
    predictions=my_translations,
    references=reference_translations,
    sources=english_sentences,
)
for name, result in results.items():
    print(f"{name}: {result.score:.4f}")

# Compare ASR performance on standard vs dialectal input
from naijaeval.tasks.asr import ASRTask

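# standard_preds/standard_refs and dialectal_preds/dialectal_refs are parallel
# lists of ASR outputs and reference transcripts that you provide.
task = ASRTask()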
results = task.evaluate(
    predictions=standard_preds,
    references=standard_refs,
    dialectal_predictions=dialectal_preds,
    dialectal_references=dialectal_refs,
    dialect_name="Nigerian English",
)
print(results["wer_delta"].details["interpretation"])
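
To show what `wer_delta` measures, here is a self-contained sketch of corpus-level WER (word-level Levenshtein distance divided by total reference words) and a delta taken as dialectal WER minus standard WER. The sign convention and function names are assumptions; `ASRTask` may normalise text or aggregate differently.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or a match at no cost
            )
        prev = cur
    return prev[-1]


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Total word errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


def wer_delta(std_refs, std_hyps, dial_refs, dial_hyps) -> float:
    """Positive values mean the system does worse on dialectal input."""
    return corpus_wer(dial_refs, dial_hyps) - corpus_wer(std_refs, std_hyps)


delta = wer_delta(
    ["the clinic opens at nine"], ["the clinic opens at nine"],
    ["the clinic dey open by nine"], ["the clinic they open by nine"],
)
print(f"{delta:.3f}")  # 0.167
```

One substitution ("dey" heard as "they") in six dialectal reference words gives a dialectal WER of 1/6 against a perfect standard transcript.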

Extending the toolkit

Register a custom metric:

from naijaeval import register_metric
from naijaeval.metrics.base import BaseMetric, MetricResult

@register_metric("my_custom_score")
class MyCustomScore(BaseMetric):
    name = "my_custom_score"
    description = "My domain-specific evaluation metric."
    higher_is_better = True

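    def compute(self, predictions, references, **kwargs):
        # Replace with your own logic; exact-match accuracy is only a placeholder.
        matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
        score = matches / len(predictions) if predictions else 0.0
        return MetricResult(name=self.name, score=score)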

Register a custom dataset:

from naijaeval import register_dataset

@register_dataset("my_corpus")
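def load_my_corpus(split="test", **kwargs):
    # Return an iterable of {"source": ..., "target": ...} dicts.
    # Example only: reads a hypothetical tab-separated file, one pair per line.
    with open(f"data/my_corpus/{split}.tsv", encoding="utf-8") as f:
        for line in f:
            source, target = line.rstrip("\n").split("\t", maxsplit=1)
            yield {"source": source, "target": target}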

See docs/contributing/adding_metrics.md for the full contribution guide.

Roadmap

v0.1 (current)

10 core metrics across 4 categories
2 benchmarks (naija_mt_v1, coswitch_asr_v1)
5 dataset loaders (MENYO-20k, FLEURS ×3, sample)
CLI and HTML reports
Plugin system

v0.2 (planned)

COMET and BERTScore integration
NLI-based hallucination detection (replacing the current entity-based heuristic)
Conversational AI task
Swahili and Igbo translation benchmarks
Interactive Colab notebook

v0.3 (planned)

Leaderboard integration
AfricaNLP workshop benchmark track

Citation

If you use NaijaEval in your research, please cite:

@software{buzugbe2026naijaeval,
  author    = {Buzugbe, Uche},
  title     = {{NaijaEval}: Evaluation toolkit for AI systems in African language contexts},
  year      = {2026},
  url       = {https://github.com/Uchebuzz/naijaeval},
  version   = {0.1.0},
}

Contributing

Contributions are welcome. See CONTRIBUTING.md for how to add metrics, datasets, and benchmarks.

The fastest way to make a meaningful contribution is to:

Add a new metric (see naijaeval/metrics/ for examples)
Add a dataset loader for an underrepresented African language
Run your own models against existing benchmarks and submit results

Community

GitHub Discussions — questions, ideas, benchmark results
AfricaNLP Workshop — the primary research community this toolkit serves
Masakhane — African NLP community

License

Apache 2.0 — see LICENSE.

Because good models deserve honest benchmarks.

Release history

This version

0.1.0

Apr 23, 2026

Download files

Source Distribution

naijaeval-0.1.0.tar.gz (50.3 kB)

Uploaded Apr 23, 2026 Source

Built Distribution

naijaeval-0.1.0-py3-none-any.whl (46.7 kB)

Uploaded Apr 23, 2026 Python 3

File details

Details for the file naijaeval-0.1.0.tar.gz.

File metadata

Download URL: naijaeval-0.1.0.tar.gz
Upload date: Apr 23, 2026
Size: 50.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for naijaeval-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`54dd9563740c130cc1da62a10910c5039b60f32a0d0cb9751b529c52037c1db5`
MD5	`5baaa62f23cc46adae1f6119b9f98456`
BLAKE2b-256	`a2d1d537605b74157bbf792d9a9fa6d976fce6df650644c281948618aa3897cb`

Provenance

The following attestation bundles were made for naijaeval-0.1.0.tar.gz:

Publisher: ci.yml on Uchebuzz/Naijaeval

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: naijaeval-0.1.0.tar.gz
- Subject digest: 54dd9563740c130cc1da62a10910c5039b60f32a0d0cb9751b529c52037c1db5
- Sigstore transparency entry: 1363499415
- Sigstore integration time: Apr 23, 2026
Source repository:
- Permalink: Uchebuzz/Naijaeval@930ad7b4e2b508796d396f915690819b029be74c
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Uchebuzz
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@930ad7b4e2b508796d396f915690819b029be74c
- Trigger Event: release

File details

Details for the file naijaeval-0.1.0-py3-none-any.whl.

File metadata

Download URL: naijaeval-0.1.0-py3-none-any.whl
Upload date: Apr 23, 2026
Size: 46.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for naijaeval-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3a87879c0bd394549343dec6a61351f693fbfcf6aae3b707fd15747feb66c804`
MD5	`40eae3c2bb4f4c05408894eab302486f`
BLAKE2b-256	`a1b5bc4733e97932a7cc08ac6edfc3486e98ecc252d98840bc2b3c0cbb3dea80`

Provenance

The following attestation bundles were made for naijaeval-0.1.0-py3-none-any.whl:

Publisher: ci.yml on Uchebuzz/Naijaeval

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: naijaeval-0.1.0-py3-none-any.whl
- Subject digest: 3a87879c0bd394549343dec6a61351f693fbfcf6aae3b707fd15747feb66c804
- Sigstore transparency entry: 1363499483
- Sigstore integration time: Apr 23, 2026
Source repository:
- Permalink: Uchebuzz/Naijaeval@930ad7b4e2b508796d396f915690819b029be74c
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Uchebuzz
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@930ad7b4e2b508796d396f915690819b029be74c
- Trigger Event: release
