ReportAnalysis

A powerful, batteries-included Python package for analyzing reports.

Pass in a report as a text string, PDF, Word document (.docx), or URL and get sentiment analysis, readability scores, keywords, an extractive summary, named entities, language detection, and text statistics. The package ships with a CLI and can export results to JSON, CSV, and HTML.

Features

Feature	Description
Sentiment Analysis	VADER + TextBlob ensemble with confidence scoring
Readability Scores	Flesch, Gunning Fog, SMOG, ARI — all computed offline
Keyword Extraction	TF-IDF keywords + RAKE multi-word keyphrases
Extractive Summary	Top N most informative sentences
Text Statistics	Word count, reading time, vocabulary richness, and more
Named Entity Recognition	People, Organizations, Locations via NLTK
Language Detection	Detects 50+ languages with ISO codes
Report Comparison	Cosine similarity score between two reports
Multi-format Loaders	Plain text, PDF, DOCX, and web URLs
CLI	`analyze` / `compare` / `summarize` subcommands
Export	JSON, CSV, and self-contained HTML reports

Installation

Minimal install (core only)

pip install ReportAnalysis

Full install (with PDF, DOCX, URL loaders and all analysis features)

pip install "ReportAnalysis[full]"

Quick Start

From a text string

from report_analysis import ReportAnalyzer

ra = ReportAnalyzer("The quarterly results exceeded all expectations. Revenue grew 30%.")
result = ra.analyze()
result.show()  # Prints a formatted report to the terminal

From a file

from report_analysis import ReportAnalyzer

# Supports .txt, .pdf, .docx (.pdf and .docx loading need the [full] extras)
ra = ReportAnalyzer("annual_report.pdf")
result = ra.analyze()

print(result.sentiment.label)          # "positive"
print(result.readability.grade_level)  # "College"
print(result.keywords.top_keywords[:5])

result.export("analysis.html")  # Export as a standalone HTML report

From a URL

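from report_analysis import ReportAnalyzer

# URL loading needs the [full] extras (requests, beautifulsoup4)
ra = ReportAnalyzer(url="https://example.com/annual-report")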
result = ra.analyze()
result.export("results.json")

From a DOCX file

from report_analysis import ReportAnalyzer

# DOCX loading needs the [full] extras (python-docx)
ra = ReportAnalyzer("report.docx")
result = ra.analyze()
print(result.summary.text)

Run only specific modules

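# ra is a ReportAnalyzer created as in the examples above
result = ra.analyze(
    include=["sentiment", "readability", "keywords", "summary"],
    top_keywords=15,       # number of keywords to extract
    summary_sentences=5,   # number of sentences in the summary
)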

Compare two reports

from report_analysis import ReportAnalyzer

ra1 = ReportAnalyzer("Q1 report text here...")
ra2 = ReportAnalyzer("Q2 report text here...")

comparison = ra1.compare_with(ra2)
print(f"Similarity: {comparison.similarity_score:.1%}")  # e.g. "72.3%"
print(comparison.similarity_label)                        # "Similar"
print("Common words:", comparison.common_words[:10])
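To make the similarity score concrete, here is a minimal pure-Python sketch of cosine similarity over raw word counts. It is an illustration only: the Dependencies section says the package uses scikit-learn for cosine similarity, so its tokenization and weighting will differ, and `tokenize` and `cosine_similarity` are hypothetical helper names, not part of the ReportAnalysis API.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only words present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has nothing to compare
    return dot / (norm_a * norm_b)


q1 = "Revenue grew in the first quarter."
q2 = "Revenue fell in the second quarter."
score = cosine_similarity(q1, q2)
print(f"Similarity: {score:.1%}")
```

The two sample sentences share four of their six words, so the score is 4/6. A score of 1.0 means the two texts have identical word distributions, and 0.0 means they share no words.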

Export results

result.export("analysis.json")  # Machine-readable JSON
result.export("analysis.csv")   # Spreadsheet-friendly CSV
result.export("analysis.html")  # Standalone HTML report
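For pipelines that post-process results, a similar extension-based export can be sketched with the standard library alone. This is an assumption-laden illustration, not the package's exporter: `export_dict` and `flatten` are hypothetical helpers, and the `sample` dictionary only guesses at the shape `result.to_dict()` might return.

```python
import csv
import json
import tempfile
from pathlib import Path


def flatten(data, prefix=""):
    """Flatten nested dicts into {"a.b": value} pairs so they fit CSV rows."""
    flat = {}
    for key, value in data.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def export_dict(data, path):
    """Write `data` as JSON or CSV, chosen from the file extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    elif suffix == ".csv":
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["field", "value"])
            writer.writerows(flatten(data).items())
    else:
        raise ValueError(f"Unsupported export format: {suffix!r}")
    return path


# Hypothetical stand-in for what result.to_dict() might return.
sample = {
    "sentiment": {"label": "positive", "compound": 0.62},
    "stats": {"word_count": 9},
}

out_dir = Path(tempfile.mkdtemp())
json_path = export_dict(sample, out_dir / "analysis.json")
csv_path = export_dict(sample, out_dir / "analysis.csv")
print(csv_path.read_text(encoding="utf-8"))
```

Flattening nested keys into dotted names keeps the CSV to two columns, which is easy to load into a spreadsheet.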

CLI Usage

# Analyze a file
report-analysis analyze report.pdf

# Analyze from a URL
report-analysis analyze --url https://example.com/annual-report

# Read from stdin
echo "Revenue increased by 30% this quarter." | report-analysis analyze -

# Run only specific modules
report-analysis analyze report.txt --include sentiment --include keywords

# Export results to HTML
report-analysis analyze report.pdf --export html --output results.html

# Compare two reports
report-analysis compare q1.pdf q2.pdf

# Summarize with 10 sentences
report-analysis summarize report.docx --sentences 10

# Show help
report-analysis --help
report-analysis analyze --help

API Reference

`ReportAnalyzer(source="", *, url="")`

Parameter	Type	Description
`source`	`str`	Raw text string, or a path to a `.txt`, `.pdf`, or `.docx` file
`url`	`str`	URL to fetch and analyze (keyword-only argument)

`.analyze(include=None, summary_sentences=5, top_keywords=20)`

Runs the analysis pipeline and returns an AnalysisResult object.

Parameter	Default	Description
`include`	`None` (all modules)	List of module names to run
`summary_sentences`	`5`	Number of sentences to include in the summary
`top_keywords`	`20`	Number of keywords to extract

Available modules: "stats", "language", "sentiment", "readability", "keywords", "summary", "entities"

`AnalysisResult` — Fields

Field	Type	Description
`.stats`	`StatsResult`	Word count, sentence count, reading time, vocabulary richness
`.sentiment`	`SentimentResult`	Label (positive/negative/neutral), compound score, confidence
`.readability`	`ReadabilityResult`	Flesch reading ease, Gunning Fog index, grade level
`.keywords`	`KeywordsResult`	TF-IDF scored keywords and RAKE keyphrases
`.summary`	`SummaryResult`	Extractive summary as sentence list
`.entities`	`EntitiesResult`	Named entities grouped by type
`.language`	`LanguageResult`	ISO language code and human-readable name
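As a rough idea of what the `.stats` quantities measure, the sketch below computes them with regular expressions. The definitions are assumptions: a reading speed of 200 words per minute and a type-token ratio for vocabulary richness are common conventions, not confirmed details of ReportAnalysis, and `basic_stats` is a hypothetical name.

```python
import re


def basic_stats(text, words_per_minute=200):
    """Approximate word count, sentence count, reading time, and richness."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Naive sentence split: any run of ., ! or ? ends a sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    unique_words = {w.lower() for w in words}
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "reading_time_minutes": len(words) / words_per_minute,
        # Type-token ratio: distinct words divided by total words.
        "vocabulary_richness": len(unique_words) / len(words) if words else 0.0,
    }


stats = basic_stats("Revenue grew. Revenue grew again! Costs fell.")
print(stats)
```

The naive sentence split also breaks on decimal points and abbreviations, which is one reason real tokenizers such as NLTK's are preferable for production use.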

`AnalysisResult` — Methods

Method	Description
`.show()`	Print a formatted report to the terminal (rendered with `rich`)
`.export(path)`	Export to `.json`, `.csv`, or `.html`
`.to_dict()`	Return the full result as a Python `dict`

Result Details

Sentiment

result.sentiment.label             # "positive" | "negative" | "neutral"
result.sentiment.compound          # -1.0 to 1.0
result.sentiment.positive          # 0.0 to 1.0
result.sentiment.confidence        # "high" | "medium" | "low"
result.sentiment.textblob_polarity      # TextBlob polarity score
result.sentiment.textblob_subjectivity  # TextBlob subjectivity score
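The relationship between `compound` and `label` can be sketched as a simple threshold rule. The cut-offs of ±0.05 are the ones conventionally suggested for VADER compound scores; whether ReportAnalysis uses the same values, or how it blends in the TextBlob signal, is not stated here, so treat `label_from_compound` as a hypothetical illustration.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1.0, 1.0] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for value in (0.62, 0.01, -0.40):
    print(value, label_from_compound(value))
```

Keeping a small neutral band around zero avoids labelling near-zero scores as positive or negative on noise alone.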

Readability

result.readability.flesch_reading_ease   # Typically 0-100 (higher = easier to read)
result.readability.flesch_kincaid_grade  # US school grade level
result.readability.gunning_fog           # Years of education needed
result.readability.smog_index            # SMOG grade level
result.readability.reading_ease_label    # "Very Easy", "Standard", "Difficult", etc.
result.readability.grade_level           # "High School", "College", etc.

Keywords

result.keywords.tfidf_keywords   # [(word, score), ...]
result.keywords.rake_phrases     # [(phrase, score), ...]
result.keywords.top_keywords     # [word, ...] — plain list
result.keywords.top_phrases      # [phrase, ...] — plain list
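To show where `(word, score)` pairs like these can come from, here is a small pure-Python TF-IDF sketch. The package itself relies on scikit-learn for TF-IDF (see Dependencies), so this is only an approximation of the idea: `tfidf_keywords` is a hypothetical helper, and the smoothed IDF formula is one common variant chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(document, corpus, top_n=5):
    """Rank the words of `document` by TF-IDF against `corpus`."""
    doc_tokens = tokenize(document)
    term_counts = Counter(doc_tokens)
    doc_sets = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(corpus)
    scored = []
    for word, count in term_counts.items():
        tf = count / len(doc_tokens)
        df = sum(1 for tokens in doc_sets if word in tokens)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF
        scored.append((word, tf * idf))
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


corpus = [
    "Revenue grew and margins grew.",
    "Costs fell and revenue held.",
    "Margins held steady.",
]
keywords = tfidf_keywords(corpus[0], corpus)
top_words = [word for word, _ in keywords]
print(top_words)
```

Here "grew" ranks first because it appears twice in the first document and in no other. A real pipeline would also remove stop words such as "and" before scoring.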

Summary

result.summary.sentences         # ["sentence 1", "sentence 2", ...]
result.summary.text              # Joined summary as a single string
result.summary.reduction_ratio   # 0.0-1.0 (proportion of text removed)
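A frequency-based extractive summarizer is one simple way to produce fields like these. The sketch below is an assumption about the general technique, not the package's algorithm: it scores each sentence by the average corpus frequency of its words, and it measures `reduction` in characters, whereas the source only says the ratio is the proportion of text removed. `split_sentences` and `summarize` are hypothetical names.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, n_sentences=2):
    """Return (sentences, joined text, reduction ratio) for a summary."""
    sentences = split_sentences(text)
    freqs = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        return sum(freqs[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Pick the top-scoring sentences, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    chosen = sorted(ranked[:n_sentences])
    summary = [sentences[i] for i in chosen]
    summary_text = " ".join(summary)
    reduction = 1 - len(summary_text) / len(text) if text else 0.0
    return summary, summary_text, reduction


report = "Revenue grew. Revenue grew again in Europe. The weather was mild."
summary, summary_text, reduction = summarize(report, n_sentences=2)
print(summary_text)
print(f"Reduction: {reduction:.0%}")
```

Averaging word frequencies rather than summing them keeps long sentences from winning on length alone, at the cost of slightly favoring short ones.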

Named Entities

result.entities.people           # ["Steve Jobs", ...]
result.entities.organizations    # ["Apple Inc.", ...]
result.entities.locations        # ["Cupertino", ...]
result.entities.entities         # {"PERSON": [...], "ORGANIZATION": [...], ...}

Dependencies

Installed with the core package:

nltk — tokenization, VADER sentiment, named entity recognition
click — CLI framework
rich — terminal output formatting

Installed with [full] extras:

textblob — secondary sentiment signal and subjectivity scoring
scikit-learn — TF-IDF keyword extraction and cosine similarity
rake-nltk — RAKE multi-word keyphrase extraction
langdetect — language detection
pdfplumber — PDF text extraction
python-docx — Word document (.docx) loading
requests and beautifulsoup4 — web page fetching and parsing

Running Tests

# Install development dependencies
pip install -e ".[dev]"

# Download required NLTK data
python -c "import nltk; nltk.download(['vader_lexicon', 'punkt', 'punkt_tab', 'averaged_perceptron_tagger', 'averaged_perceptron_tagger_eng', 'maxent_ne_chunker', 'words'])"

# Run the full test suite
pytest tests/ -v

Publishing to PyPI

pip install build twine
python -m build
twine upload dist/*

License

MIT License — see LICENSE for details.

Author

Al Mustafiz Bappy

Release history

Version	Released
1.0.0 (this version)	Mar 5, 2026
0.2	Apr 29, 2023
0.1	Apr 29, 2023

