aize: lightweight NLP analysis toolkit (Zipf's law, Heaps' law, TF-IDF, sentiment, readability & more)
aize · NLP Analysis Toolkit
A lightweight, pip-installable Python library for deep text analysis — covering everything from Zipf's law to sentiment, readability, TF-IDF, and more. Comes with a Streamlit dashboard and a FastAPI backend out of the box.
Table of Contents
- Features
- Installation
- Quick Start
- Module Reference
- Streamlit Dashboard
- FastAPI Backend
- Dependencies
- Project Structure
- License
Features
| Category | Capability |
|---|---|
| 📊 Statistics | Word count, unique words, avg word length, sentence count |
| 📏 Word Grouping | Frequency distribution grouped by word length |
| 📉 Zipf's Law | Rank-frequency distribution, hapax & dis legomena percentages |
| 📈 Heaps' Law | Vocabulary growth curve as corpus size increases |
| 🚫 Stopwords | Stopword density analysis |
| 🔤 Vocabulary | Side-by-side vocabulary comparison across multiple texts |
| 🔍 TF-IDF | Top keyword extraction per document in a corpus |
| 🔗 N-grams | Most common bigrams and trigrams |
| 💬 Sentiment | VADER-based positive / negative / neutral / compound scoring |
| 📖 Readability | Flesch Reading Ease & Flesch-Kincaid Grade Level |
| 🏷️ POS Tagging | Part-of-speech frequency breakdown |
| ☁️ Word Cloud | Generates word cloud images from any text |
| 🖥️ Dashboard | Interactive Streamlit UI for all analyses |
| ⚡ API | FastAPI REST backend for programmatic access |
Installation
Core library
pip install aize
With the Streamlit dashboard
pip install "aize[dashboard]"
With the FastAPI backend
pip install "aize[api]"
Everything (dashboard + API)
pip install "aize[all]"
The quotes keep shells such as zsh from treating the square brackets as a glob pattern.
From source (development)
git clone https://github.com/eokoaze/aize.git
cd aize
pip install -e ".[all]"
Python 3.9+ is required.
Quick Start
import aize
text = """
Natural language processing is a subfield of linguistics and artificial intelligence.
It is primarily concerned with giving computers the ability to understand text and speech.
"""
# Basic stats
print(aize.compute_stats(text))
# Sentiment
print(aize.analyze_sentiment(text))
# Readability
print(aize.compute_readability(text))
# Zipf's Law
print(aize.analyze_zipf(text))
Module Reference
compute_stats
from aize import compute_stats
result = compute_stats(text)
Returns basic corpus statistics.
| Key | Type | Description |
|---|---|---|
| word_count | int | Total number of words |
| unique_words | int | Number of distinct words |
| avg_word_length | float | Average characters per word |
| sentence_count | int | Number of sentences |
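For intuition, the four values in the table can be approximated with the standard library alone. This is a minimal sketch, not aize's implementation: the `sketch_stats` name, the regex tokeniser, and the punctuation-based sentence split are all assumptions made for illustration, and aize may tokenise differently.

```python
import re


def sketch_stats(text: str) -> dict:
    """Hypothetical stand-in that mirrors the keys documented for compute_stats."""
    # Assumption: a word is a run of letters or apostrophes, compared case-insensitively.
    words = re.findall(r"[a-z']+", text.lower())
    # Assumption: a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "unique_words": len(set(words)),
        "avg_word_length": sum(map(len, words)) / len(words) if words else 0.0,
        "sentence_count": len(sentences),
    }


stats = sketch_stats("The cat sat. The dog ran!")
print(stats)
```

Comparing this output with `compute_stats` on the same text is a quick way to see where the library's tokenisation differs from a naive split.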
analyze_groupwords
from aize import analyze_groupwords
result = analyze_groupwords(text)
Groups words by their character length and returns frequency counts per length bucket.
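The grouping idea can be sketched in a few lines. The return shape of `analyze_groupwords` is not documented here, so the `{length: count}` dictionary below, the `sketch_group_by_length` name, and the regex tokeniser are assumptions for illustration only.

```python
import re
from collections import Counter


def sketch_group_by_length(text: str) -> dict:
    """Hypothetical helper: count how many words fall into each length bucket."""
    # Assumption: a word is a run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(len(word) for word in words)
    # Sort by word length so the buckets read from shortest to longest.
    return dict(sorted(counts.items()))


groups = sketch_group_by_length("a bb cc ddd")
print(groups)
```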
analyze_zipf
from aize import analyze_zipf
result = analyze_zipf(text)
Computes Zipf's Law statistics over the text.
| Key | Type | Description |
|---|---|---|
| frequency | dict | {word: count} sorted most → least frequent |
| rank_freq | list[tuple] | [(rank, count)] for rank-frequency plotting |
| hapax_pct | float | % of words appearing exactly once |
| dis_pct | float | % of words appearing exactly twice |
| freq_gt2_pct | float | % of words appearing more than twice |
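The keys in the table could be derived roughly as follows. This is a sketch under stated assumptions rather than aize's code: `sketch_zipf` is a hypothetical name, the tokeniser is a simple regex, and the percentages are taken over distinct words (the vocabulary), which may not match the denominator aize uses.

```python
import re
from collections import Counter


def sketch_zipf(text: str) -> dict:
    """Hypothetical helper mirroring the keys documented for analyze_zipf."""
    words = re.findall(r"[a-z']+", text.lower())
    ordered = Counter(words).most_common()  # most frequent first
    counts = [count for _, count in ordered]
    vocab_size = len(counts)

    def pct(n: int) -> float:
        # Assumption: percentages are relative to the number of distinct words.
        return 100.0 * n / vocab_size if vocab_size else 0.0

    return {
        "frequency": dict(ordered),
        "rank_freq": [(rank, c) for rank, c in enumerate(counts, start=1)],
        "hapax_pct": pct(sum(1 for c in counts if c == 1)),
        "dis_pct": pct(sum(1 for c in counts if c == 2)),
        "freq_gt2_pct": pct(sum(1 for c in counts if c > 2)),
    }


zipf = sketch_zipf("a a a b b c d")
print(zipf["rank_freq"])
```

Plotting `rank_freq` on log-log axes is the usual way to check for the roughly straight line that Zipf's law predicts.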
analyze_heaps
from aize import analyze_heaps
result = analyze_heaps(text)
Returns a vocabulary growth curve (Heaps' Law). Useful for visualising how the vocabulary expands as more text is read.
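Heaps' law models vocabulary size as V(N) ≈ K·N^β for a text of N tokens. A growth curve of that kind can be sketched as below; the `sketch_heaps_curve` name, the `step` parameter, and the list-of-tuples return shape are assumptions, since the structure returned by `analyze_heaps` is not documented here.

```python
import re


def sketch_heaps_curve(text: str, step: int = 1) -> list:
    """Hypothetical helper: (tokens_read, distinct_words_seen) sampled every `step` tokens."""
    words = re.findall(r"[a-z']+", text.lower())
    seen = set()
    curve = []
    for position, word in enumerate(words, start=1):
        seen.add(word)
        # Always record the final point so the curve ends at the full text length.
        if position % step == 0 or position == len(words):
            curve.append((position, len(seen)))
    return curve


curve = sketch_heaps_curve("the cat and the dog and the bird")
print(curve)
```

The curve flattens wherever already-seen words repeat, which is the sub-linear growth the law describes.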
calculate_density
from aize import calculate_density
result = calculate_density(text)
Calculates the proportion of stopwords in the text, returning a stopword density percentage and associated word lists.
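The density calculation itself is simple. The sketch below uses a tiny hard-coded stopword set purely for illustration; `SAMPLE_STOPWORDS`, `sketch_stopword_density`, and the returned key names are assumptions, and aize presumably relies on a much fuller stopword list.

```python
import re

# Assumption: a deliberately tiny stopword list, for demonstration only.
SAMPLE_STOPWORDS = frozenset({"the", "a", "an", "is", "of", "and", "to", "in", "it"})


def sketch_stopword_density(text: str, stopwords=SAMPLE_STOPWORDS) -> dict:
    """Hypothetical helper: share of tokens that are stopwords, plus both word lists."""
    words = re.findall(r"[a-z']+", text.lower())
    stops = [w for w in words if w in stopwords]
    content = [w for w in words if w not in stopwords]
    density = 100.0 * len(stops) / len(words) if words else 0.0
    return {"density_pct": density, "stopwords": stops, "content_words": content}


density = sketch_stopword_density("the cat is in the red hat now")
print(density["density_pct"])
```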
compare_vocab
from aize import compare_vocab
result = compare_vocab({"doc1": text1, "doc2": text2})
Compares vocabulary across multiple documents — unique words per document, shared vocabulary, and overlap statistics.
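Conceptually this comes down to set operations over each document's vocabulary. The following sketch shows one way to compute shared words, per-document unique words, and a single overlap ratio; the function name, the returned keys, and the choice of "shared divided by combined vocabulary" as the overlap statistic are assumptions, not the documented output of `compare_vocab`.

```python
import re


def sketch_compare_vocab(docs: dict) -> dict:
    """Hypothetical helper: shared words, words unique to each document, overlap ratio."""
    vocab = {name: set(re.findall(r"[a-z']+", text.lower())) for name, text in docs.items()}
    all_words = set().union(*vocab.values())
    shared = set.intersection(*vocab.values()) if vocab else set()
    unique = {}
    for name, words in vocab.items():
        # Words of every other document, so we can subtract them.
        others = set().union(*(v for other, v in vocab.items() if other != name))
        unique[name] = words - others
    overlap = len(shared) / len(all_words) if all_words else 0.0
    return {"shared": shared, "unique": unique, "overlap": overlap}


comparison = sketch_compare_vocab({"doc1": "red green blue", "doc2": "green blue yellow"})
print(comparison)
```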
compute_tfidf
from aize import compute_tfidf
result = compute_tfidf(
    texts=["text of doc1...", "text of doc2..."],
    labels=["doc1", "doc2"],
    top_n=15,
)
# Returns: {"doc1": [("word", score), ...], "doc2": [...]}
Extracts the top_n highest-scoring TF-IDF keywords for each document in a corpus. Uses scikit-learn under the hood with English stopword filtering.
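To show what the scores mean, here is a standard-library sketch of the textbook formula tf × log(N / df). It is not what aize runs: scikit-learn's `TfidfVectorizer` applies smoothed IDF and L2 normalisation by default, so its numbers will differ, and this sketch does no stopword filtering. The `sketch_tfidf` name is hypothetical.

```python
import math
import re
from collections import Counter


def sketch_tfidf(texts, labels, top_n=3):
    """Hypothetical helper: plain tf * log(N / df), top_n terms per document."""
    tokenised = [re.findall(r"[a-z']+", t.lower()) for t in texts]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each word occur?
    df = Counter(word for tokens in tokenised for word in set(tokens))
    result = {}
    for label, tokens in zip(labels, tokenised):
        tf = Counter(tokens)
        scores = {w: (c / len(tokens)) * math.log(n_docs / df[w]) for w, c in tf.items()}
        # Highest score first; ties broken alphabetically for a stable order.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[label] = ranked[:top_n]
    return result


keywords = sketch_tfidf(["apple banana apple", "banana cherry"], ["d1", "d2"], top_n=1)
print(keywords)
```

A word that appears in every document ("banana" here) scores zero under this formula, which is why TF-IDF surfaces terms that distinguish one document from the rest.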
compute_ngrams
from aize import compute_ngrams
bigrams = compute_ngrams(text, n=2, top_n=20)
trigrams = compute_ngrams(text, n=3, top_n=20)
# Returns: [("phrase here", count), ...]
Returns the most frequent n-grams (bigrams, trigrams, etc.) from the text.
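The counting step can be sketched with a sliding window over the token list. The tokeniser and the `sketch_ngrams` name are assumptions; the output shape follows the `[("phrase here", count), ...]` form shown above.

```python
import re
from collections import Counter


def sketch_ngrams(text: str, n: int = 2, top_n: int = 20) -> list:
    """Hypothetical helper: most common n-grams as ("joined phrase", count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    # zip over n staggered views of the list yields each window of n consecutive words.
    windows = zip(*(words[i:] for i in range(n)))
    counts = Counter(" ".join(window) for window in windows)
    return counts.most_common(top_n)


top_bigrams = sketch_ngrams("to be or not to be", n=2, top_n=1)
print(top_bigrams)
```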
analyze_sentiment
from aize import analyze_sentiment
result = analyze_sentiment(text)
Runs VADER sentiment analysis. NLTK's vader_lexicon is auto-downloaded on first use.
| Key | Type | Description |
|---|---|---|
| positive | float | Proportion of positive sentiment |
| negative | float | Proportion of negative sentiment |
| neutral | float | Proportion of neutral sentiment |
| compound | float | Overall score from -1.0 (most negative) to +1.0 (most positive) |
| label | str | "Positive", "Negative", or "Neutral" |
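A label is typically derived from the compound score. The sketch below uses the ±0.05 cut-offs that the VADER authors suggest; whether aize uses these exact thresholds is an assumption, and `sketch_label` is a hypothetical name.

```python
def sketch_label(compound: float, threshold: float = 0.05) -> str:
    """Hypothetical helper: map a VADER compound score to a coarse label."""
    # Assumption: the conventional VADER cut-offs of +/-0.05.
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


labels = [sketch_label(score) for score in (0.6, -0.3, 0.01)]
print(labels)
```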
compute_readability
from aize import compute_readability
result = compute_readability(text)
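Computes the Flesch Reading Ease score and the Flesch-Kincaid Grade Level, along with the sentence, word, and syllable counts they are derived from.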
| Key | Type | Description |
|---|---|---|
| flesch_reading_ease | float | 0–100 score; higher = easier to read |
| fk_grade_level | float | Approximate US school grade level |
| sentences | int | Sentence count |
| words | int | Word count |
| syllables | int | Total syllables |
| interpretation | str | "Very Easy" → "Very Confusing" |
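Both scores follow from the three counts in the table. The sketch below applies the standard published formulas; the function names are hypothetical, and how aize counts syllables or rounds the results is not shown here.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def fk_grade_level(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Illustrative counts: 100 words in 5 sentences with 150 syllables.
ease = flesch_reading_ease(words=100, sentences=5, syllables=150)
grade = fk_grade_level(words=100, sentences=5, syllables=150)
print(round(ease, 3), round(grade, 2))
```

Longer sentences and more syllables per word lower the reading-ease score and raise the grade level, so the two metrics move in opposite directions.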
analyze_pos
from aize import analyze_pos
result = analyze_pos(text)
Returns a part-of-speech frequency breakdown (nouns, verbs, adjectives, adverbs, etc.) using NLTK's POS tagger.
generate_wordcloud
from aize import generate_wordcloud
image = generate_wordcloud(text)
Generates a word cloud image from the input text. Returns a PIL Image object that can be displayed or saved.
image.save("wordcloud.png")
Streamlit Dashboard
An interactive, browser-based UI for all analyses is included.
streamlit run nlp_dashboard.py
The dashboard lets you upload one or more .txt files and interactively explore all analysis modules with charts and tables powered by Plotly.
FastAPI Backend
A REST API is included for programmatic or remote access to the toolkit.
uvicorn api:app --reload
The API will be available at http://127.0.0.1:8000. Interactive docs are auto-generated at:
- Swagger UI: http://127.0.0.1:8000/docs
- ReDoc: http://127.0.0.1:8000/redoc
Dependencies
| Package | Purpose |
|---|---|
| nltk >= 3.8 | Tokenisation, POS tagging, VADER sentiment |
| scikit-learn >= 1.2 | TF-IDF vectorisation |
| wordcloud >= 1.9 | Word cloud image generation |
| pandas >= 1.5 | Data manipulation |
| plotly >= 5.0 | Interactive charts in the dashboard |
| streamlit >= 1.28 | Web dashboard UI |
| fastapi >= 0.100 | REST API framework |
| uvicorn >= 0.23 | ASGI server for FastAPI |
| python-multipart >= 0.0.6 | File upload support for FastAPI |
Project Structure
aize/
├── aize/                      # Core library package
│   ├── __init__.py            # Public API surface
│   └── analysis/
│       ├── stats.py           # Basic text statistics
│       ├── groupwords.py      # Word length grouping
│       ├── zipf.py            # Zipf's law analysis
│       ├── heaps.py           # Heaps' law analysis
│       ├── stopwords.py       # Stopword density
│       ├── vocab.py           # Vocabulary comparison
│       ├── tfidf.py           # TF-IDF & n-grams
│       ├── sentiment.py       # VADER sentiment
│       ├── readability.py     # Flesch-Kincaid scores
│       ├── pos.py             # POS tagging
│       └── wordcloud_gen.py   # Word cloud generation
├── .github/workflows/
│   └── publish.yml            # Auto-publish to PyPI on version tags
├── nlp_dashboard.py           # Streamlit dashboard
├── api.py                     # FastAPI REST backend
├── pyproject.toml             # Package config & dependency extras
├── MANIFEST.in                # Source distribution file rules
├── requirements.txt           # All-inclusive dev requirements
└── README.md
License
This project is licensed under the MIT License. See LICENSE for details.
Built with ❤️ using Python, NLTK, scikit-learn, Streamlit & FastAPI
File details
Details for the file aize-0.1.0.tar.gz.
File metadata
- Download URL: aize-0.1.0.tar.gz
- Upload date:
- Size: 15.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 724eaca1de5f6681bd3ce2ab1cedfb6ba2c71881385f1193adaaa884e04c04ac |
| MD5 | 112c6729a16736d6ee5627c92b80909b |
| BLAKE2b-256 | 3cb2bf1adabc811313b47ba4fd25e94443f34baaecb2c50d45ab9dbc70b4edc3 |
File details
Details for the file aize-0.1.0-py3-none-any.whl.
File metadata
- Download URL: aize-0.1.0-py3-none-any.whl
- Upload date:
- Size: 14.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a5955f07499d0ad615a9fc036eab3b44fe1b4c6f996de8d4069e546966f9d48f |
| MD5 | 2a440213df9b9722c8efcb1c5d1862b6 |
| BLAKE2b-256 | 504bbd0a2db36a3b1bcda5680b429ab27b9ab27c8cddc1ed2b49596604237958 |