Multi-language NLP keyword extraction (C++17 core, pybind11 bindings).
Nyansasua: Blazing-fast Multi-Language Keyword Extraction
Nyansasua (Twi) — learning / wisdom.
A self-contained, high-performance Python library for multi-language keyword extraction, backed by a low-level C++17 core.
Features
- 14 Languages Supported: English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, Indonesian, and Twi.
- 4 Extraction Algorithms: TF-IDF, YAKE, TextRank, and RAKE. Also supports a combined Ensemble mode.
- Lightning Fast: Highly optimized C++ backend. The project's benchmarks report processing 1.5+ million characters (~200,000 words) in under 0.5 seconds.
- UTF-8 Native: Natively handles complex Unicode and mixed-script texts without breaking token boundaries.
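To give a sense of what one of the listed algorithms computes, here is a minimal pure-Python sketch of RAKE-style scoring. It is not the library's C++ implementation: the `rake_keywords` function and the tiny `STOPWORDS` set are hypothetical names made up for this illustration. Candidate phrases are the runs of words between stopwords and punctuation, and each phrase is scored by summing degree/frequency over its words.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword set (an assumption, not the library's list).
STOPWORDS = {"is", "a", "of", "the", "and", "in", "for", "to", "has"}


def rake_keywords(text, stopwords=STOPWORDS, top_k=5):
    """Score candidate phrases with the RAKE degree/frequency ratio."""
    phrases = []
    # Punctuation splits the text into fragments; stopwords split
    # each fragment into candidate phrases.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    freq = defaultdict(int)    # how often each word occurs in candidates
    degree = defaultdict(int)  # co-occurrence degree, counting the word itself
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    # A phrase's score is the sum of its words' degree/frequency ratios.
    scored = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in phrases
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


for phrase, score in rake_keywords("Machine learning is a branch of AI."):
    print(phrase, score)
```

Longer phrases score higher under this scheme, which is why "machine learning" outranks the single words in the example sentence.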
Installation
pip install nyansasua
Quick Start
import cire
# One-liner extraction
for k in cire.extract_keywords("Machine learning is a branch of AI.", top_k=5):
    print(k.text, k.score)
# Using the Extractor class
ext = cire.Extractor(language="auto", algorithm="ensemble", top_k=10)
for k in ext.extract("Natural language processing has seen rapid growth."):
    print(k.text, k.score)
Batch Processing & TF-IDF Extraction
import cire
ext = cire.Extractor(language="auto")
# Extract from a batch of documents
results = ext.extract_many([
    "Python is widely used in data science.",
    "Climate change is a significant global challenge.",
])
# Corpus-driven TF-IDF extraction
corpus = [
    "Python is used in data science.",
    "Java is used in enterprise environments.",
    "Python is popular for AI.",
]
kws = ext.extract_corpus_tfidf(
    texts=corpus,
    target_text="Python is heavily utilized in AI and ML.",
    top_k=3,
)
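For readers unfamiliar with corpus-driven TF-IDF, the idea behind the call above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: `tokenize`, `corpus_tfidf`, the smoothed IDF formula, and the small stopword set are all choices made for this sketch. Document frequencies come from the corpus, term frequencies come from the target text, and terms that are rare in the corpus score highest.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"\w+", text.lower())


def corpus_tfidf(texts, target_text, top_k=3, stopwords=frozenset()):
    """Rank the target text's terms by TF-IDF against a reference corpus."""
    n_docs = len(texts)
    # Document frequency: number of corpus documents containing each term.
    doc_freq = Counter()
    for doc in texts:
        doc_freq.update(set(tokenize(doc)))

    # Term frequency in the target text, ignoring stopwords.
    tf = Counter(t for t in tokenize(target_text) if t not in stopwords)
    total = sum(tf.values())

    scores = {}
    for term, count in tf.items():
        # Smoothed IDF (an assumption of this sketch) so unseen terms
        # do not cause a division by zero.
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
        scores[term] = (count / total) * idf
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


corpus = [
    "Python is used in data science.",
    "Java is used in enterprise environments.",
    "Python is popular for AI.",
]
for term, score in corpus_tfidf(
    corpus,
    "Python is heavily utilized in AI and ML.",
    top_k=3,
    stopwords={"is", "in", "and"},
):
    print(term, round(score, 3))
```

In this sketch "python" ranks below "ai" because it appears in two of the three corpus documents, so its IDF is lower.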
Language Detection & Stopwords
import cire
# Detect the dominant language heuristically
lang = cire.detect_language("Bonjour tout le monde")
print(lang) # Language.French
# Add custom stopwords dynamically
cire.add_stopword("french", "tout")
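The source does not describe how the detection heuristic works. As one possible approach, the sketch below guesses a language by counting stopword hits per language. The `guess_language` function and the miniature stopword lists are hypothetical and unrelated to the library's internals.

```python
import re

# Miniature stopword lists, for illustration only (an assumption).
STOPWORDS = {
    "english": {"the", "is", "and", "of", "in", "to"},
    "french": {"le", "la", "et", "de", "tout", "les"},
    "spanish": {"el", "la", "y", "de", "los", "todo"},
}


def guess_language(text, stopwords=STOPWORDS):
    """Return the language whose stopwords match the most tokens, or None."""
    tokens = re.findall(r"\w+", text.lower())
    counts = {
        lang: sum(token in words for token in tokens)
        for lang, words in stopwords.items()
    }
    best = max(counts, key=counts.get)
    # With no stopword hits at all there is no evidence for any language.
    return best if counts[best] > 0 else None


print(guess_language("Bonjour tout le monde"))
```

A heuristic like this only separates languages that share a script. Telling Chinese from Russian, for example, would more naturally start from Unicode script ranges.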
License
MIT License.
Source Distributions
No source distribution files are available for this release.
Built Distribution
File details
Details for the file nyansasua-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: nyansasua-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 228.1 kB
- Tags: CPython 3.12, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d3b46a2b9f0a8333287727731fcbab5ece6aee295eebbcba6d5ce485bb66ce71 |
| MD5 | 6ffa86b7eefb7561ed1dd6cea9007c1c |
| BLAKE2b-256 | 08d0ece30cf1f38fa1da71ef5fd67f0b4619fe9670e591c8c7e9dbae22e5b5fe |