Fast multi-language keyword extraction with tenant-aware stopwords and fuzzy dictionary snapping.

Nyansasua

Fast multi-language keyword extraction for Python, powered by the C++17 Cire core.

Nyansasua is installed from PyPI as nyansasua and imported as the cire Python module. It provides TF-IDF, YAKE, TextRank, RAKE, and ensemble keyword extraction with UTF-8 tokenization, stopword filtering, tenant-aware configuration, and fuzzy dictionary snapping.

Features

18 language profiles: English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, Indonesian, Twi/Akan, Ga, Ewe, Hausa, and Fante.
4 extraction algorithms: TF-IDF, YAKE, TextRank, RAKE, plus ensemble mode.
Tenant-aware stopwords: keep domain- or agent-specific stopwords isolated per tenant, for example for Banking, Health, Legal, and Education domains.
BK-tree fuzzy snapping: fast tenant-scoped correction to canonical terms like NHIS, GHS, or domain vocabulary.
Unicode-native: handles UTF-8 text, Ghanaian characters, CJK, Cyrillic, Arabic, Hangul, Hiragana, Katakana, Thai, and Devanagari scripts.
No Python runtime dependencies after installation.

Installation

pip install nyansasua

Quick Start

import cire

cfg = cire.ExtractConfig()
cfg.language = cire.Language.English
cfg.algorithm = cire.Algorithm.YAKE
cfg.top_k = 5

for kw in cire.extract_keywords("Machine learning is a branch of AI.", cfg):
    print(kw.text, kw.score)

High-Level Extractor

import cire

ext = cire.Extractor(language="auto", algorithm="ensemble", top_k=10)

keywords = ext.extract(
    "Natural language processing has seen rapid growth in education tools."
)

for kw in keywords:
    print(kw.text, kw.score)

Ghanaian Language Detection

import cire

print(cire.detect_language("ame ƒe nu"))        # Language.Ewe
print(cire.detect_language("ɗan makaranta"))    # Language.Hausa
print(cire.detect_language("ŋɔɔ kɛ sane"))      # Language.Ga
print(cire.detect_language("me dɛ hom nyina"))  # Language.Fante

Detection is heuristic: it is much more reliable on text that contains diagnostic Unicode characters such as ƒ, ʋ, ɗ, ɓ, ƙ, ŋ, ɛ, and ɔ than on plain ASCII text.
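Because reliability depends on those characters, it can help to check for them before trusting an automatic result. The helper below is a minimal sketch, not part of the cire API: the name has_diagnostic_chars is hypothetical, and the character set is only the one listed above.

```python
import unicodedata

# Diagnostic characters taken from the list above (assumption: this list is
# illustrative, not the full set the library inspects).
DIAGNOSTIC_CHARS = frozenset("ƒʋɗɓƙŋɛɔ")


def has_diagnostic_chars(text: str) -> bool:
    """Return True if text contains at least one diagnostic character.

    The text is NFC-normalized and lowercased first so that uppercase
    forms such as Ɛ and Ɔ are also recognized.
    """
    normalized = unicodedata.normalize("NFC", text).lower()
    return any(ch in DIAGNOSTIC_CHARS for ch in normalized)
```

When the check returns False, one option is to set the language explicitly instead of relying on language="auto".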

Tenant-Aware Stopwords

Use tenant IDs to keep domain-specific stopwords isolated across agents.

import cire

cire.load_tenant_stopwords(
    "banking",
    cire.Language.English,
    ["can", "get", "account", "fees"],
)

cfg = cire.ExtractConfig()
cfg.language = cire.Language.English
cfg.algorithm = cire.Algorithm.RAKE
cfg.tenant_id = "banking"
cfg.top_k = 5

keywords = cire.extract_keywords(
    "Can I get account fees for a mobile money loan?",
    cfg,
)

Tenant stopwords are additive: built-in language stopwords still apply, and each tenant gets its own isolated overlay.
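The additive overlay can be pictured with a small pure-Python model. This is only a sketch of the behavior described above, not the C++ implementation: BUILTIN, load_overlay, and is_stopword are hypothetical names, and the built-in word list is a tiny stand-in.

```python
from collections import defaultdict

# Hypothetical stand-in for the built-in language stopwords.
BUILTIN = {"english": {"a", "for", "i", "the"}}

# tenant_id -> language -> extra stopwords for that tenant only.
_overlays = defaultdict(lambda: defaultdict(set))


def load_overlay(tenant_id, language, words):
    """Add words to one tenant's overlay without touching other tenants."""
    _overlays[tenant_id][language].update(w.lower() for w in words)


def is_stopword(word, language, tenant_id=None):
    """Built-in stopwords always apply; the overlay applies per tenant."""
    w = word.lower()
    if w in BUILTIN.get(language, ()):
        return True
    if tenant_id is None:
        return False
    # .get avoids creating empty entries for unknown tenants on lookup.
    return w in _overlays.get(tenant_id, {}).get(language, ())
```

In this model, loading "account" for the "banking" tenant makes it a stopword only when tenant_id is "banking", while "the" remains a stopword for every tenant.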

Tenant Fuzzy Dictionary Snapping

Nyansasua can keep separate canonical dictionaries in memory for different tenants or domains.

import cire

cire.load_tenant_dictionary("health", ["NHIS", "GHS", "malaria treatment"])

print(cire.snap_term("health", "nhsi"))  # NHIS
print(cire.snap_term("legal", "nhsi"))   # nhsi, no cross-tenant leakage

The snapper keeps one BK-tree per tenant, so a lookup in a large dictionary can prune most candidates instead of running a full linear scan for every query.
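To illustrate why a BK-tree avoids a full scan, here is a minimal pure-Python sketch. It is not the library's implementation: BKTree, snap, the case-insensitive matching, and the max_dist=2 default are all assumptions made for the example, and plain Levenshtein distance is used as the metric.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class BKTree:
    """Each node is (lowercased key, canonical term, {distance: child})."""

    def __init__(self):
        self.root = None

    def add(self, term: str) -> None:
        key = term.lower()
        if self.root is None:
            self.root = (key, term, {})
            return
        node = self.root
        while True:
            d = levenshtein(key, node[0])
            if d == 0:
                return  # already present
            child = node[2].get(d)
            if child is None:
                node[2][d] = (key, term, {})
                return
            node = child

    def nearest(self, query: str, max_dist: int):
        """Return the closest canonical term within max_dist, else None."""
        if self.root is None:
            return None
        q = query.lower()
        best = None  # (distance, canonical term); first found wins ties
        stack = [self.root]
        while stack:
            key, term, children = stack.pop()
            d = levenshtein(q, key)
            if d <= max_dist and (best is None or d < best[0]):
                best = (d, term)
            # Triangle inequality: only subtrees whose edge distance lies in
            # [d - max_dist, d + max_dist] can contain a match.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return best[1] if best else None


def snap(trees: dict, tenant_id: str, term: str, max_dist: int = 2) -> str:
    """Snap term to a tenant's dictionary; unknown tenants return it as-is."""
    tree = trees.get(tenant_id)
    match = tree.nearest(term, max_dist) if tree else None
    return match if match is not None else term
```

Keeping one tree per tenant in a plain dict is what gives the isolation in this sketch: a query for a tenant with no tree never sees another tenant's terms.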

Batch Processing And Corpus TF-IDF

import cire

ext = cire.Extractor(language="english", algorithm="ensemble", top_k=5)

batch = ext.extract_many([
    "Python is widely used in data science.",
    "Climate change is a significant global challenge.",
])

corpus = [
    "Python is used in data science.",
    "Java is used in enterprise environments.",
    "Python is popular for AI.",
]

kws = ext.extract_corpus_tfidf(
    texts=corpus,
    target_text="Python is heavily used in AI and ML.",
    top_k=3,
)
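For readers unfamiliar with corpus-level TF-IDF, the idea is that terms frequent in the target text but rare across the corpus score highest. The sketch below is a generic illustration under stated assumptions, not the weighting that extract_corpus_tfidf uses: the function name corpus_tfidf, the regex tokenizer, the smoothed IDF formula, and the stopwords parameter are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (a simplification for this sketch)."""
    return re.findall(r"\w+", text.lower())


def corpus_tfidf(corpus, target_text, top_k=3, stopwords=frozenset()):
    """Score target_text terms by term frequency times smoothed IDF."""
    docs = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(docs)
    tf = Counter(t for t in tokenize(target_text) if t not in stopwords)
    total = sum(tf.values())
    scores = {}
    for term, count in tf.items():
        df = sum(term in doc for doc in docs)  # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF
        scores[term] = (count / total) * idf
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

With the three-document corpus above, terms that never appear in the corpus (such as "heavily" and "ml") outrank "ai", which appears in one document, and "python", which appears in two.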

Performance Snapshot

Recent C++ benchmark run on the development server:

Stopword lookups: about 0.16-0.63 microseconds per lookup.
YAKE short text extraction: about 16.6 microseconds per extraction.
BK-tree fuzzy snapping at 10,000 terms: about 243 microseconds per snap.
Concurrent tenant stopword isolation: 0 failures across 160,000 operations.

Exact timings depend on hardware, compiler, build type, and input shape.
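To get comparable per-call figures on your own hardware, a small timing helper is enough. This is a generic sketch using only the standard library; mean_microseconds is a hypothetical name, and numbers measured from Python include binding overhead that the C++ benchmark above does not.

```python
import time


def mean_microseconds(fn, repeats=10_000):
    """Return the mean wall-clock time of fn() in microseconds."""
    fn()  # warm-up call so one-time setup cost is not measured
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / repeats * 1e6
```

For example, after loading the "health" dictionary shown earlier, mean_microseconds(lambda: cire.snap_term("health", "nhsi")) gives a rough per-snap figure for your machine.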

License

MIT License.

Release history

0.2.4 (Jun 19, 2026)
0.2.3 (Jun 19, 2026, this version)
0.2.2 (Jun 19, 2026)
0.1.2 (Jun 15, 2026)
0.1.1 (Jun 15, 2026)
0.1.0 (Jun 15, 2026)

Download files

Source Distribution

nyansasua-0.2.3.tar.gz (763.5 kB)

Uploaded Jun 19, 2026 Source

Built Distribution

nyansasua-0.2.3-cp312-cp312-manylinux_2_34_x86_64.whl (252.8 kB)

Uploaded Jun 19, 2026 CPython 3.12, manylinux: glibc 2.34+ x86-64

File details

Details for the file nyansasua-0.2.3.tar.gz.

File metadata

Download URL: nyansasua-0.2.3.tar.gz
Upload date: Jun 19, 2026
Size: 763.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for nyansasua-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`fd6972fa706179d1be983ce1d5477b2b69b23541c288557dfe3f73fc138089d6`
MD5	`4c44384720711ae90ed1c6cdac55aeab`
BLAKE2b-256	`0add2887db72952a0383298bd3b903a24443763741f7386e2aa01d8cfb0eef55`

File details

Details for the file nyansasua-0.2.3-cp312-cp312-manylinux_2_34_x86_64.whl.

File metadata

Download URL: nyansasua-0.2.3-cp312-cp312-manylinux_2_34_x86_64.whl
Upload date: Jun 19, 2026
Size: 252.8 kB
Tags: CPython 3.12, manylinux: glibc 2.34+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for nyansasua-0.2.3-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm	Hash digest
SHA256	`66aab86c64ba59a3d2a497137764103782af6ee21decc4c5f8733af79b91edfa`
MD5	`7183586a8abfa05a94052aa0addbc90b`
BLAKE2b-256	`7fd07509ab0843001c608c1ea012812ee0c0b1c598849ddba21c7da213581a9f`
