aize: lightweight NLP analysis toolkit (Zipf, Heaps' law, TF-IDF, sentiment, readability & more)

aize · NLP Analysis Toolkit

A lightweight, pip-installable Python library for deep text analysis — covering everything from Zipf's law to sentiment, readability, TF-IDF, and more. Comes with a Streamlit dashboard and a FastAPI backend out of the box.


Features

Category | Capability
📊 Statistics | Word count, unique words, avg word length, sentence count
📏 Word Grouping | Frequency distribution grouped by word length
📉 Zipf's Law | Rank-frequency distribution, hapax & dis legomena percentages
📈 Heaps' Law | Vocabulary growth curve as corpus size increases
🚫 Stopwords | Stopword density analysis
🔤 Vocabulary | Side-by-side vocabulary comparison across multiple texts
🔍 TF-IDF | Top keyword extraction per document in a corpus
🔗 N-grams | Most common bigrams and trigrams
💬 Sentiment | VADER-based positive / negative / neutral / compound scoring
📖 Readability | Flesch Reading Ease & Flesch-Kincaid Grade Level
🏷️ POS Tagging | Part-of-speech frequency breakdown
☁️ Word Cloud | Generates word cloud images from any text
🖥️ Dashboard | Interactive Streamlit UI for all analyses
API | FastAPI REST backend for programmatic access

Installation

Core library

pip install aize

With the Streamlit dashboard

pip install "aize[dashboard]"

With the FastAPI backend

pip install "aize[api]"

Everything (dashboard + API)

pip install "aize[all]"

From source (development)

git clone https://github.com/eokoaze/aize.git
cd aize
pip install -e ".[all]"

Python 3.9+ is required.


Quick Start

import aize

text = """
Natural language processing is a subfield of linguistics and artificial intelligence.
It is primarily concerned with giving computers the ability to understand text and speech.
"""

# Basic stats
print(aize.compute_stats(text))

# Sentiment
print(aize.analyze_sentiment(text))

# Readability
print(aize.compute_readability(text))

# Zipf's Law
print(aize.analyze_zipf(text))

Module Reference

compute_stats

from aize import compute_stats

result = compute_stats(text)

Returns basic corpus statistics.

Key | Type | Description
word_count | int | Total number of words
unique_words | int | Number of distinct words
avg_word_length | float | Average characters per word
sentence_count | int | Number of sentences
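
To make the four keys concrete, here is a minimal standard-library sketch of how such statistics could be computed. It is not aize's implementation: the function name basic_stats and the regex-based tokenisation are assumptions made for illustration, and aize's own tokeniser may count words and sentences differently.

```python
import re


def basic_stats(text: str) -> dict:
    """Illustrative only: regex tokenisation, not aize's own tokeniser."""
    words = re.findall(r"[a-z']+", text.lower())
    # Treat any run of '.', '!' or '?' as a sentence boundary.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "unique_words": len(set(words)),
        "avg_word_length": sum(map(len, words)) / len(words) if words else 0.0,
        "sentence_count": len(sentences),
    }


print(basic_stats("The cat sat. The dog ran!"))
```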

analyze_groupwords

from aize import analyze_groupwords

result = analyze_groupwords(text)

Groups words by their character length and returns frequency counts per length bucket.
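
The idea of a length bucket can be sketched with a Counter. This is a hypothetical stand-alone helper (group_by_length), not the library's code, and the exact shape of the value returned by analyze_groupwords is not documented here.

```python
import re
from collections import Counter


def group_by_length(text: str) -> dict:
    """Map each word length to the number of words of that length."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(len(word) for word in words)
    # Sort by length so the buckets read naturally when printed or plotted.
    return dict(sorted(counts.items()))


print(group_by_length("a bb cc ddd"))
```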


analyze_zipf

from aize import analyze_zipf

result = analyze_zipf(text)

Computes Zipf's Law statistics over the text.

Key | Type | Description
frequency | dict | {word: count} sorted most → least frequent
rank_freq | list[tuple] | [(rank, count)] for rank-frequency plotting
hapax_pct | float | % of words appearing exactly once
dis_pct | float | % of words appearing exactly twice
freq_gt2_pct | float | % of words appearing more than twice
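
The sketch below shows one way these fields could be derived from a plain word count. It is an illustration under stated assumptions, not aize's code: zipf_summary is a hypothetical name, and the percentages are taken over distinct words, which the table above does not specify.

```python
import re
from collections import Counter


def zipf_summary(text: str) -> dict:
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)
    ranked = freq.most_common()  # most frequent first
    n_types = len(freq)

    def pct(predicate) -> float:
        # Assumption: share of distinct words whose count satisfies the predicate.
        if not n_types:
            return 0.0
        return 100.0 * sum(1 for c in freq.values() if predicate(c)) / n_types

    return {
        "frequency": dict(ranked),
        "rank_freq": [(rank, count) for rank, (_, count) in enumerate(ranked, start=1)],
        "hapax_pct": pct(lambda c: c == 1),
        "dis_pct": pct(lambda c: c == 2),
        "freq_gt2_pct": pct(lambda c: c > 2),
    }


summary = zipf_summary("a a a b b c d")
print(summary["rank_freq"], summary["hapax_pct"])
```

Plotting rank against count on log-log axes should give a roughly straight line for text that follows Zipf's law.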

analyze_heaps

from aize import analyze_heaps

result = analyze_heaps(text)

Returns a vocabulary growth curve (Heaps' law). Useful for visualising how the vocabulary expands as more text is read.
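
Heaps' law says the number of distinct words V grows roughly as K·n^β with the number of tokens n read so far. The following is a minimal sketch of how such a curve can be built. The name vocab_growth and the step parameter are assumptions for illustration; the actual return format of analyze_heaps is not documented here.

```python
import re


def vocab_growth(text: str, step: int = 1) -> list:
    """Return (tokens_read, distinct_words_seen) points along the text."""
    words = re.findall(r"[a-z']+", text.lower())
    seen = set()
    curve = []
    for i, word in enumerate(words, start=1):
        seen.add(word)
        # Sample every `step` tokens, and always include the final point.
        if i % step == 0 or i == len(words):
            curve.append((i, len(seen)))
    return curve


print(vocab_growth("a b a c b d"))
```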


calculate_density

from aize import calculate_density

result = calculate_density(text)

Calculates the proportion of stopwords in the text, returning a stopword density percentage and associated word lists.
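
As a rough illustration of the calculation, the sketch below uses a tiny hard-coded stopword set. Everything in it is hypothetical: the STOPWORDS set, the function name and the result keys are placeholders, and aize's stopword list and return keys may differ.

```python
import re

# Hypothetical mini stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it"}


def stopword_density(text: str, stopwords=STOPWORDS) -> dict:
    words = re.findall(r"[a-z']+", text.lower())
    stops = [w for w in words if w in stopwords]
    content = [w for w in words if w not in stopwords]
    density = 100.0 * len(stops) / len(words) if words else 0.0
    return {"density_pct": density, "stopwords": stops, "content_words": content}


print(stopword_density("The cat is in the hat"))
```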


compare_vocab

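from aize import compare_vocab

# Placeholder documents; substitute your own texts.
text1 = "The quick brown fox jumps over the lazy dog."
text2 = "The lazy dog sleeps while the quick fox hunts."
result = compare_vocab({"doc1": text1, "doc2": text2})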

Compares vocabulary across multiple documents — unique words per document, shared vocabulary, and overlap statistics.
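
The comparison boils down to set operations on each document's vocabulary. Here is a minimal sketch assuming regex tokenisation. The helper name compare_vocabularies and its result keys (shared, unique, overlap_ratio) are invented for this example and are not aize's documented output.

```python
import re


def compare_vocabularies(docs: dict) -> dict:
    vocab = {name: set(re.findall(r"[a-z']+", text.lower())) for name, text in docs.items()}
    shared = set.intersection(*vocab.values()) if vocab else set()
    union = set().union(*vocab.values())
    unique = {}
    for name, words in vocab.items():
        # Words that appear in this document and in no other.
        others = set().union(*(v for other, v in vocab.items() if other != name))
        unique[name] = words - others
    return {
        "shared": shared,
        "unique": unique,
        # Share of the combined vocabulary that every document has in common.
        "overlap_ratio": len(shared) / len(union) if union else 0.0,
    }


print(compare_vocabularies({"doc1": "red green blue", "doc2": "green blue yellow"}))
```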


compute_tfidf

from aize import compute_tfidf

result = compute_tfidf(
    texts=["text of doc1...", "text of doc2..."],
    labels=["doc1", "doc2"],
    top_n=15
)
# Returns: {"doc1": [("word", score), ...], "doc2": [...]}

Extracts the top_n highest-scoring TF-IDF keywords for each document in a corpus. Uses scikit-learn under the hood with English stopword filtering.
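
For intuition about what the scores mean, here is a simplified pure-Python TF-IDF sketch. It is not what aize runs: it skips stopword filtering and vector normalisation, uses relative term frequency, and borrows only the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 that scikit-learn applies by default. The name top_tfidf is hypothetical, and the scores will not match compute_tfidf numerically.

```python
import math
import re
from collections import Counter


def top_tfidf(texts, labels, top_n=3):
    docs = [re.findall(r"[a-z']+", t.lower()) for t in texts]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    result = {}
    for label, doc in zip(labels, docs):
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        result[label] = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return result


print(top_tfidf(["apple apple banana", "banana cherry"], ["d1", "d2"], top_n=1))
```

Terms that occur in every document (here "banana") get the lowest IDF, so words specific to one document rise to the top.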


compute_ngrams

from aize import compute_ngrams

bigrams  = compute_ngrams(text, n=2, top_n=20)
trigrams = compute_ngrams(text, n=3, top_n=20)
# Returns: [("phrase here", count), ...]

Returns the most frequent n-grams (bigrams, trigrams, etc.) from the text.
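
N-gram counting itself is compact. The sketch below is an assumption-laden illustration that mirrors the documented output shape, not the library's code: top_ngrams is a hypothetical name, tokenisation is regex-based, and n-grams are allowed to span sentence boundaries.

```python
import re
from collections import Counter


def top_ngrams(text: str, n: int = 2, top_n: int = 20) -> list:
    words = re.findall(r"[a-z']+", text.lower())
    # zip over n staggered views of the word list yields consecutive n-word windows.
    windows = zip(*(words[i:] for i in range(n)))
    counts = Counter(" ".join(window) for window in windows)
    return counts.most_common(top_n)


print(top_ngrams("to be or not to be", n=2, top_n=3))
```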


analyze_sentiment

from aize import analyze_sentiment

result = analyze_sentiment(text)

Runs VADER sentiment analysis. NLTK's vader_lexicon is auto-downloaded on first use.

Key | Type | Description
positive | float | Proportion of positive sentiment
negative | float | Proportion of negative sentiment
neutral | float | Proportion of neutral sentiment
compound | float | Overall score from -1.0 (most negative) to +1.0 (most positive)
label | str | "Positive", "Negative", or "Neutral"
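
The label is presumably derived from the compound score. The cutoffs aize uses are not stated here, so the sketch below assumes the conventional VADER thresholds of ±0.05; both the function name and the threshold parameter are hypothetical.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a coarse label (assumed ±0.05 cutoffs)."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


for score in (0.62, -0.40, 0.01):
    print(score, label_from_compound(score))
```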

compute_readability

from aize import compute_readability

result = compute_readability(text)

Computes two readability metrics, Flesch Reading Ease and Flesch-Kincaid Grade Level, along with the counts they are based on.

Key | Type | Description
flesch_reading_ease | float | Typically 0–100; higher = easier to read
fk_grade_level | float | Approximate US school grade level
sentences | int | Sentence count
words | int | Word count
syllables | int | Total syllables
interpretation | str | Qualitative label, from "Very Easy" to "Very Confusing"
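
Both scores follow from the three counts in the table. The sketch below applies the standard published Flesch formulas to counts you supply. How aize counts syllables and sentences is not covered here, so the helper name flesch_scores and its signature are assumptions for illustration.

```python
def flesch_scores(words: int, sentences: int, syllables: int) -> dict:
    """Standard Flesch formulas applied to pre-computed counts."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return {
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        "fk_grade_level": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }


# 100 words in 5 sentences with 150 syllables: about 59.6 ease, grade 9.9.
print(flesch_scores(words=100, sentences=5, syllables=150))
```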

analyze_pos

from aize import analyze_pos

result = analyze_pos(text)

Returns a part-of-speech frequency breakdown (nouns, verbs, adjectives, adverbs, etc.) using NLTK's POS tagger.


generate_wordcloud

from aize import generate_wordcloud

image = generate_wordcloud(text)

Generates a word cloud image from the input text. Returns a PIL Image object that can be displayed or saved.

image.save("wordcloud.png")

Streamlit Dashboard

An interactive, browser-based UI for all analyses is included. Launch it from the repository root, where nlp_dashboard.py lives:

streamlit run nlp_dashboard.py

The dashboard lets you upload one or more .txt files and interactively explore all analysis modules with charts and tables powered by Plotly.


FastAPI Backend

A REST API is included for programmatic or remote access to the toolkit. Start it from the repository root, where api.py lives:

uvicorn api:app --reload

The API will be available at http://127.0.0.1:8000. Interactive docs are auto-generated at:

  • Swagger UI: http://127.0.0.1:8000/docs
  • ReDoc: http://127.0.0.1:8000/redoc

Dependencies

Package | Purpose
nltk >= 3.8 | Tokenisation, POS tagging, VADER sentiment
scikit-learn >= 1.2 | TF-IDF vectorisation
wordcloud >= 1.9 | Word cloud image generation
pandas >= 1.5 | Data manipulation
plotly >= 5.0 | Interactive charts in the dashboard
streamlit >= 1.28 | Web dashboard UI
fastapi >= 0.100 | REST API framework
uvicorn >= 0.23 | ASGI server for FastAPI
python-multipart >= 0.0.6 | File upload support for FastAPI

Project Structure

aize/
├── aize/                        # Core library package
│   ├── __init__.py              # Public API surface
│   └── analysis/
│       ├── stats.py             # Basic text statistics
│       ├── groupwords.py        # Word length grouping
│       ├── zipf.py              # Zipf's law analysis
│       ├── heaps.py             # Heaps' law analysis
│       ├── stopwords.py         # Stopword density
│       ├── vocab.py             # Vocabulary comparison
│       ├── tfidf.py             # TF-IDF & n-grams
│       ├── sentiment.py         # VADER sentiment
│       ├── readability.py       # Flesch-Kincaid scores
│       ├── pos.py               # POS tagging
│       └── wordcloud_gen.py     # Word cloud generation
├── .github/workflows/
│   └── publish.yml              # Auto-publish to PyPI on version tags
├── nlp_dashboard.py             # Streamlit dashboard
├── api.py                       # FastAPI REST backend
├── pyproject.toml               # Package config & dependency extras
├── MANIFEST.in                  # Source distribution file rules
├── requirements.txt             # All-inclusive dev requirements
└── README.md

License

This project is licensed under the MIT License. See LICENSE for details.


Built with ❤️ using Python, NLTK, scikit-learn, Streamlit & FastAPI

Download files

Source Distribution

aize-0.1.0.tar.gz (15.1 kB)

Uploaded Source

Built Distribution

aize-0.1.0-py3-none-any.whl (14.0 kB)

Uploaded Python 3

File details

Details for the file aize-0.1.0.tar.gz.

File metadata

  • Download URL: aize-0.1.0.tar.gz
  • Size: 15.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for aize-0.1.0.tar.gz
Algorithm Hash digest
SHA256 724eaca1de5f6681bd3ce2ab1cedfb6ba2c71881385f1193adaaa884e04c04ac
MD5 112c6729a16736d6ee5627c92b80909b
BLAKE2b-256 3cb2bf1adabc811313b47ba4fd25e94443f34baaecb2c50d45ab9dbc70b4edc3

File details

Details for the file aize-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: aize-0.1.0-py3-none-any.whl
  • Size: 14.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for aize-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a5955f07499d0ad615a9fc036eab3b44fe1b4c6f996de8d4069e546966f9d48f
MD5 2a440213df9b9722c8efcb1c5d1862b6
BLAKE2b-256 504bbd0a2db36a3b1bcda5680b429ab27b9ab27c8cddc1ed2b49596604237958
