Multi-language NLP keyword extraction (C++17 core, pybind11 bindings).

Nyansasua: Blazing-fast Multi-Language Keyword Extraction

Nyansasua (Twi) — learning / wisdom.

A self-contained, high-performance Python library for multi-language keyword extraction, backed by a low-level C++17 core.

Features

  • 14 Languages Supported: English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, Indonesian, and Twi.
  • 4 Extraction Algorithms: TF-IDF, YAKE, TextRank, and RAKE. Also supports a combined Ensemble mode.
  • Lightning Fast: A highly optimized C++ backend. Benchmarks show 1.5+ million characters (~200,000 words) processed in under 0.5 seconds.
  • UTF-8 Native: Natively handles complex Unicode and mixed-script texts without breaking token boundaries.
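
The throughput figure above is easiest to judge on your own hardware. The sketch below is a minimal timing harness, not part of the library: `measure_throughput` and `naive_extract` are hypothetical names, and `naive_extract` is a pure-Python stand-in so the block runs without the package installed. Pass the real extraction call in its place to time the C++ backend.

```python
import time
from collections import Counter


def naive_extract(text, top_k=5):
    """Stand-in extractor (assumption): most frequent words longer than 3 characters."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    counts = Counter(w for w in words if len(w) > 3)
    return counts.most_common(top_k)


def measure_throughput(extract, text, repeats=3):
    """Return the best-of-`repeats` rate in characters per second."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        extract(text)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very coarse timers.
    return len(text) / max(best, 1e-9)


sample = "Machine learning is a branch of artificial intelligence. " * 2000
rate = measure_throughput(naive_extract, sample)
print(f"{len(sample):,} characters at {rate:,.0f} chars/s")
```

Taking the best of several runs reduces noise from the first call, which often pays one-time import and cache costs.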

Installation

pip install nyansasua

Quick Start

import cire

# One-liner extraction
for k in cire.extract_keywords("Machine learning is a branch of AI.", top_k=5):
    print(k.text, k.score)

# Using the Extractor class
ext = cire.Extractor(language="auto", algorithm="ensemble", top_k=10)
for k in ext.extract("Natural language processing has seen rapid growth."):
    print(k.text, k.score)
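
How the ensemble mode combines the individual algorithms is not described here. As an illustration only, the sketch below merges several ranked keyword lists with reciprocal rank fusion, one common way to combine rankings whose raw scores are not comparable. The function name, the constant `k=60`, and the sample rankings are all assumptions and are not taken from the library.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_k=10):
    """Fuse ranked lists: each keyword earns 1 / (k + rank) per list it appears in."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, keyword in enumerate(ranking, start=1):
            scores[keyword] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for stable output.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_k]


# Made-up rankings standing in for the output of three separate algorithms.
tfidf_rank = ["language processing", "rapid growth", "natural language"]
yake_rank = ["natural language", "language processing", "growth"]
rake_rank = ["natural language processing", "rapid growth"]

for keyword, score in reciprocal_rank_fusion([tfidf_rank, yake_rank, rake_rank], top_k=3):
    print(f"{keyword}: {score:.4f}")
```

Rank-based fusion rewards keywords that several algorithms agree on, without needing to normalize scores that live on different scales.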

Batch Processing & TF-IDF Extraction

import cire

ext = cire.Extractor(language="auto")

# Extract from a batch of documents
results = ext.extract_many([
    "Python is widely used in data science.",
    "Climate change is a significant global challenge."
])

# Corpus-driven TF-IDF extraction
corpus = [
    "Python is used in data science.", 
    "Java is used in enterprise environments.",
    "Python is popular for AI."
]
kws = ext.extract_corpus_tfidf(
    texts=corpus, 
    target_text="Python is heavily utilized in AI and ML.", 
    top_k=3
)
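
Corpus-driven TF-IDF scores each term in the target text by how often it occurs there and how rare it is across the reference corpus. The pure-Python sketch below shows the textbook formulation with a smoothed IDF. The tokenizer, the smoothing, the `stopwords` parameter, and the function names are assumptions for illustration; the library's own weighting may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; `\\w` is Unicode-aware for str patterns in Python 3."""
    return re.findall(r"\w+", text.lower())


def corpus_tfidf(texts, target_text, top_k=3, stopwords=frozenset()):
    """Rank terms of `target_text` by TF-IDF against the reference `texts`."""
    n_docs = len(texts)
    doc_freq = Counter()
    for doc in texts:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document

    tokens = [t for t in tokenize(target_text) if t not in stopwords]
    if not tokens:
        return []

    scores = {}
    for term, count in Counter(tokens).items():
        # Smoothed IDF: terms unseen in the corpus get the highest weight.
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
        scores[term] = (count / len(tokens)) * idf
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


corpus = [
    "Python is used in data science.",
    "Java is used in enterprise environments.",
    "Python is popular for AI.",
]
target = "Python is heavily utilized in AI and ML."
for term, score in corpus_tfidf(corpus, target, top_k=5, stopwords={"is", "in", "and"}):
    print(f"{term}: {score:.3f}")
```

Terms that never appear in the corpus rank above terms that appear in one or more documents, which is why a small, unrepresentative corpus tends to promote rare filler words unless stopwords are removed first.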

Language Detection & Stopwords

import cire

# Detect the dominant language heuristically
lang = cire.detect_language("Bonjour tout le monde") 
print(lang)  # Language.French

# Add custom stopwords dynamically
cire.add_stopword("french", "tout")
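
A script-level check is a common first step in heuristic language detection. The sketch below uses only the standard library `unicodedata` module and takes the first word of each letter's Unicode name as its script. The function name `dominant_script` is hypothetical, and this is not a description of how the library detects languages.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script among the letters of `text`, or None."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # e.g. "CYRILLIC SMALL LETTER YA" -> "CYRILLIC"
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


for sample in ["Bonjour tout le monde", "Привет, мир", "مرحبا بالعالم"]:
    print(sample, "->", dominant_script(sample))
```

Script alone cannot separate languages that share an alphabet, such as French and English, so a detector needs a further signal (stopword overlap or character n-gram statistics, for example) to tell them apart.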

License

MIT License.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

nyansasua-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl (228.1 kB)

Uploaded: CPython 3.12, manylinux: glibc 2.34+ x86-64

File hashes

Hashes for nyansasua-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 d3b46a2b9f0a8333287727731fcbab5ece6aee295eebbcba6d5ce485bb66ce71
MD5 6ffa86b7eefb7561ed1dd6cea9007c1c
BLAKE2b-256 08d0ece30cf1f38fa1da71ef5fd67f0b4619fe9670e591c8c7e9dbae22e5b5fe
