aize: lightweight NLP analysis toolkit (Zipf, Heaps' law, TF-IDF, sentiment, readability & more)

aize · NLP Analysis Toolkit

A lightweight, pip-installable Python library for text analysis, covering Zipf's law, Heaps' law, TF-IDF, n-grams, sentiment, readability, and more. A Streamlit dashboard and a FastAPI backend are included.

Features
Installation
Quick Start
Module Reference
Streamlit Dashboard
FastAPI Backend
Dependencies
Project Structure
License

Features

Category	Capability
📊 Statistics	Word count, unique words, avg word length, sentence count
📏 Word Grouping	Frequency distribution grouped by word length
📉 Zipf's Law	Rank-frequency distribution, hapax & dis legomena percentages
📈 Heaps' Law	Vocabulary growth curve as corpus size increases
🚫 Stopwords	Stopword density analysis
🔤 Vocabulary	Side-by-side vocabulary comparison across multiple texts
🔍 TF-IDF	Top keyword extraction per document in a corpus
🔗 N-grams	Most common bigrams and trigrams
💬 Sentiment	VADER-based positive / negative / neutral / compound scoring
📖 Readability	Flesch Reading Ease & Flesch-Kincaid Grade Level
🏷️ POS Tagging	Part-of-speech frequency breakdown
☁️ Word Cloud	Generates word cloud images from any text
🖥️ Dashboard	Interactive Streamlit UI for all analyses
⚡ API	FastAPI REST backend for programmatic access

Installation

Core library

pip install aize

With the Streamlit dashboard

pip install "aize[dashboard]"

With the FastAPI backend

pip install "aize[api]"

Everything (dashboard + API)

pip install "aize[all]"

From source (development)

git clone https://github.com/eokoaze/aize.git
cd aize
pip install -e ".[all]"

Python 3.9+ is required.

Quick Start

import aize

text = """
Natural language processing is a subfield of linguistics and artificial intelligence.
It is primarily concerned with giving computers the ability to understand text and speech.
"""

# Basic stats
print(aize.compute_stats(text))

# Sentiment
print(aize.analyze_sentiment(text))

# Readability
print(aize.compute_readability(text))

# Zipf's Law
print(aize.analyze_zipf(text))

Module Reference

`compute_stats`

from aize import compute_stats

result = compute_stats(text)

Returns basic corpus statistics.

Key	Type	Description
`word_count`	`int`	Total number of words
`unique_words`	`int`	Number of distinct words
`avg_word_length`	`float`	Average characters per word
`sentence_count`	`int`	Number of sentences
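
To make the four keys concrete, here is a minimal standard-library sketch of how such statistics can be computed. It is not aize's implementation: the function name `basic_stats`, the regex tokeniser, and the punctuation-based sentence split are all assumptions made for illustration.

```python
import re

def basic_stats(text: str) -> dict:
    """Illustrative only; aize's own tokenisation may differ."""
    # Assumption: a word is a run of letters or apostrophes, lower-cased.
    words = re.findall(r"[a-z']+", text.lower())
    # Assumption: sentences are the non-empty chunks between ., ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "unique_words": len(set(words)),
        "avg_word_length": sum(map(len, words)) / len(words) if words else 0.0,
        "sentence_count": len(sentences),
    }

stats = basic_stats("The cat sat. The dog ran!")
print(stats)
```

Results from `compute_stats` may differ from this sketch wherever the library tokenises or splits sentences differently.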

`analyze_groupwords`

from aize import analyze_groupwords

result = analyze_groupwords(text)

Groups words by their character length and returns frequency counts per length bucket.
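
The idea of length buckets can be sketched in a few lines. The helper below (`group_by_length`, a hypothetical name) only illustrates the technique; the actual return shape of `analyze_groupwords` is not documented here and may differ.

```python
import re
from collections import Counter

def group_by_length(text: str) -> dict:
    """Map word length -> number of words with that length (illustrative)."""
    # Assumption: simple regex tokenisation, lower-cased.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(len(word) for word in words)
    # Sort by length so the buckets read from short to long words.
    return dict(sorted(counts.items()))

groups = group_by_length("a bb cc ddd")
print(groups)
```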

`analyze_zipf`

from aize import analyze_zipf

result = analyze_zipf(text)

Computes Zipf's Law statistics over the text.

Key	Type	Description
`frequency`	`dict`	`{word: count}` sorted most → least frequent
`rank_freq`	`list[tuple]`	`[(rank, count)]` for rank-frequency plotting
`hapax_pct`	`float`	% of words appearing exactly once
`dis_pct`	`float`	% of words appearing exactly twice
`freq_gt2_pct`	`float`	% of words appearing more than twice
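
The keys above can be reproduced with `collections.Counter`. This is a sketch under stated assumptions, not the library's code: `zipf_summary` is a hypothetical name, and the percentages are taken relative to the number of distinct words, which may not match how aize defines them.

```python
import re
from collections import Counter

def zipf_summary(text: str) -> dict:
    """Illustrative rank-frequency summary; not aize's implementation."""
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common()  # [(word, count)], most frequent first
    vocab = len(ranked)

    def pct(n: int) -> float:
        # Assumption: percentages are relative to the distinct-word count.
        return 100.0 * n / vocab if vocab else 0.0

    hapax = sum(1 for _, count in ranked if count == 1)
    dis = sum(1 for _, count in ranked if count == 2)
    return {
        "frequency": dict(ranked),
        "rank_freq": [(rank, count) for rank, (_, count) in enumerate(ranked, start=1)],
        "hapax_pct": pct(hapax),
        "dis_pct": pct(dis),
        "freq_gt2_pct": pct(vocab - hapax - dis),
    }

summary = zipf_summary("a a a b b c d")
print(summary["rank_freq"])
```

Plotting `rank_freq` on log-log axes is the usual way to check for the roughly straight line that Zipf's law predicts.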

`analyze_heaps`

from aize import analyze_heaps

result = analyze_heaps(text)

Returns a vocabulary growth curve (Heaps' law), useful for visualising how the vocabulary expands as more text is read.
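
A vocabulary growth curve is a list of (tokens read, distinct words seen) points. The sketch below shows one way to build it; the name `vocabulary_growth` and the `step` parameter are hypothetical, and the output format of `analyze_heaps` may differ.

```python
import re

def vocabulary_growth(text: str, step: int = 1) -> list:
    """Return [(tokens_read, distinct_words_seen), ...] (illustrative)."""
    words = re.findall(r"[a-z']+", text.lower())
    seen = set()
    curve = []
    for position, word in enumerate(words, start=1):
        seen.add(word)
        # Record a point every `step` tokens, and always at the final token.
        if position % step == 0 or position == len(words):
            curve.append((position, len(seen)))
    return curve

curve = vocabulary_growth("a b a c b d")
print(curve)
```

Heaps' law models such a curve as V(n) ≈ K·n^β with 0 < β < 1, so the curve should flatten as more text is read.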

`calculate_density`

from aize import calculate_density

result = calculate_density(text)

Calculates the proportion of stopwords in the text, returning a stopword density percentage and associated word lists.
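
As a rough illustration of the calculation, the sketch below computes a density percentage against a tiny hand-picked stopword set. The set, the function name `stopword_density`, and the returned keys are assumptions; aize's stopword list and key names are not shown in this README.

```python
import re

# Assumption: a tiny sample list for illustration, not the list aize uses.
SAMPLE_STOPWORDS = frozenset({"the", "is", "a", "of", "and", "to", "it", "in"})

def stopword_density(text: str, stopwords=SAMPLE_STOPWORDS) -> dict:
    """Share of tokens that are stopwords, as a percentage (illustrative)."""
    words = re.findall(r"[a-z']+", text.lower())
    stop = [w for w in words if w in stopwords]
    content = [w for w in words if w not in stopwords]
    density = 100.0 * len(stop) / len(words) if words else 0.0
    return {"density_pct": density, "stopwords": stop, "content_words": content}

density = stopword_density("The cat is here")
print(density)
```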

`compare_vocab`

from aize import compare_vocab

text1 = "The cat sat on the mat."
text2 = "The dog sat on the log."

result = compare_vocab({"doc1": text1, "doc2": text2})

Compares vocabulary across multiple documents: unique words per document, shared vocabulary, and overlap statistics.
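
Shared and per-document vocabulary reduce to set operations. The following is a sketch only: `vocab_overlap` and its return keys are hypothetical and say nothing about the structure `compare_vocab` actually returns.

```python
import re

def vocab_overlap(docs: dict) -> dict:
    """Shared words and words unique to each document (illustrative)."""
    vocab = {
        name: set(re.findall(r"[a-z']+", text.lower()))
        for name, text in docs.items()
    }
    # Words that occur in every document.
    shared = set.intersection(*vocab.values()) if vocab else set()
    unique = {}
    for name, words in vocab.items():
        # Everything that appears in any of the other documents.
        others = set().union(*(v for other, v in vocab.items() if other != name))
        unique[name] = words - others
    return {"shared": shared, "unique": unique}

overlap = vocab_overlap({"doc1": "red green blue", "doc2": "green blue yellow"})
print(sorted(overlap["shared"]))
```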

`compute_tfidf`

from aize import compute_tfidf

result = compute_tfidf(
    texts=["text of doc1...", "text of doc2..."],
    labels=["doc1", "doc2"],
    top_n=15
)
# Returns: {"doc1": [("word", score), ...], "doc2": [...]}

Extracts the `top_n` highest-scoring TF-IDF keywords for each document in a corpus. Uses scikit-learn under the hood with English stopword filtering.
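
The scoring idea can be sketched without scikit-learn. The version below uses raw term counts, a smoothed IDF of ln((1 + N) / (1 + df)) + 1, and L2 normalisation, which match the documented defaults of scikit-learn's `TfidfVectorizer`. Whether aize keeps those defaults is an assumption, as are the function name `top_tfidf`, the regex tokeniser, and the empty default stopword set.

```python
import math
import re
from collections import Counter

def top_tfidf(texts, labels, top_n=3, stopwords=frozenset()):
    """Top-scoring terms per document; a sketch, not aize's implementation."""
    docs = [
        [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]
        for text in texts
    ]
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(word for doc in docs for word in set(doc))
    result = {}
    for label, doc in zip(labels, docs):
        weights = {
            word: count * (math.log((1 + n_docs) / (1 + df[word])) + 1)
            for word, count in Counter(doc).items()
        }
        # L2-normalise so scores are comparable across documents.
        norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
        ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
        result[label] = [(word, value / norm) for word, value in ranked[:top_n]]
    return result

keywords = top_tfidf(["apple banana apple", "banana cherry"], ["d1", "d2"])
print(keywords)
```

Terms that occur in every document ("banana" here) get the lowest IDF, so they rank below terms that are specific to one document.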

`compute_ngrams`

from aize import compute_ngrams

bigrams  = compute_ngrams(text, n=2, top_n=20)
trigrams = compute_ngrams(text, n=3, top_n=20)
# Returns: [("phrase here", count), ...]

Returns the most frequent n-grams (bigrams, trigrams, etc.) from the text.
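
N-gram counting amounts to sliding a window of n words over the token list. This sketch mirrors the documented `[("phrase here", count), ...]` shape, but `top_ngrams` is a hypothetical name and the tokenisation is an assumption. It also ignores sentence boundaries, which the library may or may not do.

```python
import re
from collections import Counter

def top_ngrams(text: str, n: int = 2, top_n: int = 20) -> list:
    """Most frequent n-word phrases (illustrative)."""
    words = re.findall(r"[a-z']+", text.lower())
    # zip over n shifted copies of the list yields every window of n words.
    windows = zip(*(words[i:] for i in range(n)))
    counts = Counter(" ".join(window) for window in windows)
    return counts.most_common(top_n)

bigram_counts = top_ngrams("to be or not to be", n=2)
print(bigram_counts)
```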

`analyze_sentiment`

from aize import analyze_sentiment

result = analyze_sentiment(text)

Runs VADER sentiment analysis. NLTK's vader_lexicon is auto-downloaded on first use.

Key	Type	Description
`positive`	`float`	Proportion of positive sentiment
`negative`	`float`	Proportion of negative sentiment
`neutral`	`float`	Proportion of neutral sentiment
`compound`	`float`	Overall score from `-1.0` (most negative) to `+1.0` (most positive)
`label`	`str`	`"Positive"`, `"Negative"`, or `"Neutral"`
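
The README does not say how `label` is derived from the scores. A common convention, recommended in the VADER documentation, is to threshold the compound score at ±0.05. The sketch below assumes that convention; `label_from_compound` and its `threshold` parameter are hypothetical and may not match aize's rule.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a label (assumed cut-offs)."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

labels = [label_from_compound(score) for score in (0.6, -0.3, 0.0)]
print(labels)
```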

`compute_readability`

from aize import compute_readability

result = compute_readability(text)

Computes the Flesch Reading Ease score and the Flesch-Kincaid Grade Level.

Key	Type	Description
`flesch_reading_ease`	`float`	Score, typically 0–100; higher = easier to read
`fk_grade_level`	`float`	Approximate US school grade level
`sentences`	`int`	Sentence count
`words`	`int`	Word count
`syllables`	`int`	Total syllables
`interpretation`	`str`	`"Very Easy"` → `"Very Confusing"`
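
Both scores are functions of the three counts in the table. The sketch below applies the standard published formulas; `flesch_scores` is a hypothetical helper, and how aize counts syllables or rounds the results is not covered here.

```python
def flesch_scores(sentences: int, words: int, syllables: int) -> dict:
    """Standard Flesch formulas from raw counts (illustrative helper)."""
    if sentences <= 0 or words <= 0:
        raise ValueError("sentences and words must be positive")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return {
        # Flesch Reading Ease: higher means easier to read.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Flesch-Kincaid Grade Level: approximate US school grade.
        "fk_grade_level": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }

scores = flesch_scores(sentences=2, words=20, syllables=30)
print(scores)
```

Very short words in short sentences can push the reading-ease value above 100, and dense text can push it below 0, unless an implementation clamps the result.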

`analyze_pos`

from aize import analyze_pos

result = analyze_pos(text)

Returns a part-of-speech frequency breakdown (nouns, verbs, adjectives, adverbs, etc.) using NLTK's POS tagger.
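
NLTK's `pos_tag` returns a list of `(word, tag)` pairs with Penn Treebank tags, so a frequency breakdown is a matter of counting tag prefixes. The sketch below works on hand-written pairs rather than real tagger output; `pos_breakdown` and the coarse category names are assumptions, not aize's output format.

```python
from collections import Counter

# Assumption: coarse groups keyed by the first two letters of the Penn tag.
COARSE = {"NN": "noun", "VB": "verb", "JJ": "adjective", "RB": "adverb"}

def pos_breakdown(tagged: list) -> dict:
    """Count coarse part-of-speech categories from (word, tag) pairs."""
    return dict(Counter(COARSE.get(tag[:2], "other") for _, tag in tagged))

# Hand-written tags for illustration, in the shape nltk.pos_tag returns.
sample = [("The", "DT"), ("quick", "JJ"), ("fox", "NN"),
          ("jumps", "VBZ"), ("quickly", "RB"), ("dogs", "NNS")]
breakdown = pos_breakdown(sample)
print(breakdown)
```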

`generate_wordcloud`

from aize import generate_wordcloud

image = generate_wordcloud(text)

Generates a word cloud image from the input text. Returns a PIL Image object that can be displayed or saved.

image.save("wordcloud.png")

Streamlit Dashboard

An interactive, browser-based UI for all analyses is included. Launch it from the repository root, where `nlp_dashboard.py` lives:

streamlit run nlp_dashboard.py

The dashboard lets you upload one or more .txt files and interactively explore all analysis modules with charts and tables powered by Plotly.

FastAPI Backend

A REST API is included for programmatic or remote access to the toolkit. Start it from the repository root, where `api.py` lives:

uvicorn api:app --reload

The API will be available at http://127.0.0.1:8000. Interactive docs are auto-generated at:

Swagger UI: http://127.0.0.1:8000/docs
ReDoc: http://127.0.0.1:8000/redoc

Dependencies

Package	Purpose
`nltk >= 3.8`	Tokenisation, POS tagging, VADER sentiment
`scikit-learn >= 1.2`	TF-IDF vectorisation
`wordcloud >= 1.9`	Word cloud image generation
`pandas >= 1.5`	Data manipulation
`plotly >= 5.0`	Interactive charts in the dashboard
`streamlit >= 1.28`	Web dashboard UI
`fastapi >= 0.100`	REST API framework
`uvicorn >= 0.23`	ASGI server for FastAPI
`python-multipart >= 0.0.6`	File upload support for FastAPI

Project Structure

aize/
├── aize/                        # Core library package
│   ├── __init__.py              # Public API surface
│   └── analysis/
│       ├── stats.py             # Basic text statistics
│       ├── groupwords.py        # Word length grouping
│       ├── zipf.py              # Zipf's law analysis
│       ├── heaps.py             # Heaps' law analysis
│       ├── stopwords.py         # Stopword density
│       ├── vocab.py             # Vocabulary comparison
│       ├── tfidf.py             # TF-IDF & n-grams
│       ├── sentiment.py         # VADER sentiment
│       ├── readability.py       # Flesch-Kincaid scores
│       ├── pos.py               # POS tagging
│       └── wordcloud_gen.py     # Word cloud generation
├── .github/workflows/
│   └── publish.yml              # Auto-publish to PyPI on version tags
├── nlp_dashboard.py             # Streamlit dashboard
├── api.py                       # FastAPI REST backend
├── pyproject.toml               # Package config & dependency extras
├── MANIFEST.in                  # Source distribution file rules
├── requirements.txt             # All-inclusive dev requirements
└── README.md

License

This project is licensed under the MIT License. See LICENSE for details.

Built with ❤️ using Python, NLTK, scikit-learn, Streamlit & FastAPI

Project details

Version 0.1.0 (Feb 20, 2026)

Download files

Source Distribution

aize-0.1.0.tar.gz (15.1 kB), uploaded Feb 20, 2026 (Source)

Built Distribution

aize-0.1.0-py3-none-any.whl (14.0 kB), uploaded Feb 20, 2026 (Python 3)

File details

Details for the file aize-0.1.0.tar.gz.

File metadata

Download URL: aize-0.1.0.tar.gz
Upload date: Feb 20, 2026
Size: 15.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for aize-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`724eaca1de5f6681bd3ce2ab1cedfb6ba2c71881385f1193adaaa884e04c04ac`
MD5	`112c6729a16736d6ee5627c92b80909b`
BLAKE2b-256	`3cb2bf1adabc811313b47ba4fd25e94443f34baaecb2c50d45ab9dbc70b4edc3`

File details

Details for the file aize-0.1.0-py3-none-any.whl.

File metadata

Download URL: aize-0.1.0-py3-none-any.whl
Upload date: Feb 20, 2026
Size: 14.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for aize-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a5955f07499d0ad615a9fc036eab3b44fe1b4c6f996de8d4069e546966f9d48f`
MD5	`2a440213df9b9722c8efcb1c5d1862b6`
BLAKE2b-256	`504bbd0a2db36a3b1bcda5680b429ab27b9ab27c8cddc1ed2b49596604237958`
