Multi-language NLP keyword extraction (C++17 core, pybind11 bindings).

Nyansasua: Blazing-fast Multi-Language Keyword Extraction

Nyansasua (Twi) — learning / wisdom.

A self-contained, high-performance Python library for multi-language keyword extraction, backed by a low-level C++17 core.

Features

  • 14 Languages Supported: English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, Indonesian, and Twi.
  • 4 Extraction Algorithms: TF-IDF, YAKE, TextRank, and RAKE. Also supports a combined Ensemble mode.
  • Lightning Fast: Highly optimized C++ backend. Benchmarks show more than 1.5 million characters (~200,000 words) processed in under 0.5 seconds.
  • UTF-8 Native: Handles complex Unicode and mixed-script text without breaking token boundaries.

Installation

pip install nyansasua

Quick Start

import cire

# One-liner extraction
for k in cire.extract_keywords("Machine learning is a branch of AI.", top_k=5):
    print(k.text, k.score)

# Using the Extractor class
ext = cire.Extractor(language="auto", algorithm="ensemble", top_k=10)
for k in ext.extract("Natural language processing has seen rapid growth."):
    print(k.text, k.score)

Batch Processing & TF-IDF Extraction

import cire

ext = cire.Extractor(language="auto")

# Extract from a batch of documents
results = ext.extract_many([
    "Python is widely used in data science.",
    "Climate change is a significant global challenge."
])

# Corpus-driven TF-IDF extraction
corpus = [
    "Python is used in data science.", 
    "Java is used in enterprise environments.",
    "Python is popular for AI."
]
kws = ext.extract_corpus_tfidf(
    texts=corpus, 
    target_text="Python is heavily utilized in AI and ML.", 
    top_k=3
)
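To make the idea of corpus-driven TF-IDF concrete, here is a minimal pure-Python sketch: inverse document frequency is computed from the reference corpus, term frequency from the target text. It is not the library's implementation; the helper names (`tokenize`, `corpus_tfidf`), the tiny `STOPWORDS` set, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword set (an assumption, not the library's list).
STOPWORDS = {"is", "in", "and", "for"}


def tokenize(text):
    # Lowercase and split on runs of Unicode word characters.
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS]


def corpus_tfidf(texts, target_text, top_k=3):
    """Score target_text terms by TF (in the target) times IDF (from texts)."""
    docs = [set(tokenize(t)) for t in texts]
    n_docs = len(docs)
    tf = Counter(tokenize(target_text))
    total = sum(tf.values())
    scored = []
    for term, count in tf.items():
        df = sum(1 for doc in docs if term in doc)
        # Smoothed IDF so terms unseen in the corpus still get a finite weight.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scored.append((term, (count / total) * idf))
    # Highest score first; alphabetical order breaks ties deterministically.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


corpus = [
    "Python is used in data science.",
    "Java is used in enterprise environments.",
    "Python is popular for AI.",
]
kws = corpus_tfidf(corpus, "Python is heavily utilized in AI and ML.", top_k=3)
for term, score in kws:
    print(term, round(score, 3))
```

Terms that are rare or absent in the corpus receive the largest IDF, so they outrank terms such as "python" that appear in several corpus documents.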

Language Detection & Stopwords

import cire

# Detect the dominant language heuristically
lang = cire.detect_language("Bonjour tout le monde")
print(lang)  # Language.French

# Add custom stopwords dynamically
cire.add_stopword("french", "tout")
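How the library's detection heuristic works is not documented here. As a rough sketch of one plausible first step, the snippet below counts which Unicode script the letters of a text belong to, using only the standard library. The function name `dominant_script` is hypothetical. Script counting alone cannot separate languages that share a script (French and English are both Latin), so a real detector needs further signals on top of this.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script among the letters in text, or None."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        try:
            # Unicode names start with the script,
            # e.g. "CYRILLIC SMALL LETTER A" or "LATIN SMALL LETTER E WITH ACUTE".
            counts[unicodedata.name(ch).split()[0]] += 1
        except ValueError:
            continue  # code point without a name in this Unicode database
    return counts.most_common(1)[0][0] if counts else None


print(dominant_script("Bonjour tout le monde"))
print(dominant_script("Привет мир"))
```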

License

MIT License.

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

nyansasua-0.1.1-cp312-cp312-manylinux_2_34_x86_64.whl (228.1 kB view details)

Uploaded for CPython 3.12, manylinux: glibc 2.34+, x86-64

File details

Details for the file nyansasua-0.1.1-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for nyansasua-0.1.1-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 49a2f8a88d8a5d1a30fe0d62a94c0114efc856ba620ba2c8aa82b677aa815a52
MD5 bab0a2c1a7bdbaffe8b7531b0b0dddc3
BLAKE2b-256 e76dbad0ef8c4b4bcde3d040ef96ab70466dfc74a25b78fdd21be59dfe2fb57f
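A downloaded wheel can be checked against the SHA256 digest listed above with the standard library alone. This is a minimal sketch: the helper names `sha256_of_file` and `verify_wheel` are hypothetical, and the path passed to `verify_wheel` is wherever the wheel was saved locally.

```python
import hashlib

# SHA256 digest published above for the cp312 manylinux wheel.
EXPECTED_SHA256 = "49a2f8a88d8a5d1a30fe0d62a94c0114efc856ba620ba2c8aa82b677aa815a52"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """Return True if the file at path matches the expected digest."""
    return sha256_of_file(path) == expected
```

For example, `verify_wheel("nyansasua-0.1.1-cp312-cp312-manylinux_2_34_x86_64.whl")` returns True only if the local file is byte-for-byte identical to the published one.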
