A powerful Python package for analyzing reports: sentiment, readability, keywords, summaries, NER, and more.
ReportAnalysis
A powerful, batteries-included Python package for analyzing reports.
Pass in a report as a text string, PDF, Word document (.docx), or URL and get sentiment analysis, readability scores, keywords, summaries, named entities, language detection, and text statistics. The package ships with a CLI and can export results to JSON, CSV, and HTML.
Features
| Feature | Description |
|---|---|
| Sentiment Analysis | VADER + TextBlob ensemble with confidence scoring |
| Readability Scores | Flesch, Gunning Fog, SMOG, ARI — all computed offline |
| Keyword Extraction | TF-IDF keywords + RAKE multi-word keyphrases |
| Extractive Summary | Top N most informative sentences |
| Text Statistics | Word count, reading time, vocabulary richness, and more |
| Named Entity Recognition | People, Organizations, Locations via NLTK |
| Language Detection | Detects 50+ languages with ISO codes |
| Report Comparison | Cosine similarity score between two reports |
| Multi-format Loaders | Plain text, PDF, DOCX, and web URLs |
| CLI | analyze / compare / summarize subcommands |
| Export | JSON, CSV, and self-contained HTML reports |
Installation
Minimal install (core only)
pip install ReportAnalysis
Full install (with PDF, DOCX, URL loaders and all analysis features)
pip install "ReportAnalysis[full]"
Quick Start
From a text string
from report_analysis import ReportAnalyzer
ra = ReportAnalyzer("The quarterly results exceeded all expectations. Revenue grew 30%.")
result = ra.analyze()
result.show() # Prints a formatted report to the terminal
From a file
from report_analysis import ReportAnalyzer
# Supports .txt, .pdf, .docx
ra = ReportAnalyzer("annual_report.pdf")
result = ra.analyze()
print(result.sentiment.label) # e.g. "positive"
print(result.readability.grade_level) # e.g. "College"
print(result.keywords.top_keywords[:5])
result.export("analysis.html") # Export as a standalone HTML report
From a URL
ra = ReportAnalyzer(url="https://example.com/annual-report")
result = ra.analyze()
result.export("results.json")
From a DOCX file
ra = ReportAnalyzer("report.docx")
result = ra.analyze()
print(result.summary.text)
Run only specific modules
result = ra.analyze(
    include=["sentiment", "readability", "keywords"],
    top_keywords=15,
    summary_sentences=5,
)
Compare two reports
ra1 = ReportAnalyzer("Q1 report text here...")
ra2 = ReportAnalyzer("Q2 report text here...")
comparison = ra1.compare_with(ra2)
print(f"Similarity: {comparison.similarity_score:.1%}") # e.g. "72.3%"
print(comparison.similarity_label) # e.g. "Similar"
print("Common words:", comparison.common_words[:10])
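The feature table describes the comparison as a cosine similarity score, and the dependency list attributes it to scikit-learn. The package's own vectorization is not documented here, so the following is only a stdlib sketch of the measure itself, computed over raw word counts. The function name `cosine_similarity` and the regex tokenizer are assumptions made for this example.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(re.findall(r"[a-z0-9']+", text_a.lower()))
    counts_b = Counter(re.findall(r"[a-z0-9']+", text_b.lower()))

    # Dot product over the words the two texts share.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty text has no direction, so treat its similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


q1 = "Revenue grew in the first quarter."
q2 = "Revenue fell in the second quarter."
print(f"Similarity: {cosine_similarity(q1, q2):.1%}")
```

A score of 1.0 means the two texts use the same words in the same proportions; 0.0 means they share no words at all.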
Export results
result.export("analysis.json") # Machine-readable JSON
result.export("analysis.csv") # Spreadsheet-friendly CSV
result.export("analysis.html") # Standalone HTML report
CLI Usage
# Analyze a file
report-analysis analyze report.pdf
# Analyze from a URL
report-analysis analyze --url https://example.com/annual-report
# Read from stdin
echo "Revenue increased by 30% this quarter." | report-analysis analyze -
# Run only specific modules
report-analysis analyze report.txt --include sentiment --include keywords
# Export results to HTML
report-analysis analyze report.pdf --export html --output results.html
# Compare two reports
report-analysis compare q1.pdf q2.pdf
# Summarize with 10 sentences
report-analysis summarize report.docx --sentences 10
# Show help
report-analysis --help
report-analysis analyze --help
API Reference
ReportAnalyzer(source="", *, url="")
| Parameter | Type | Description |
|---|---|---|
| source | str | Raw text string, or a path to a .txt, .pdf, or .docx file |
| url | str | URL to fetch and analyze (keyword-only argument) |
.analyze(include=None, summary_sentences=5, top_keywords=20)
Runs the analysis pipeline and returns an AnalysisResult object.
| Parameter | Default | Description |
|---|---|---|
| include | None (all modules) | List of module names to run |
| summary_sentences | 5 | Number of sentences to include in the summary |
| top_keywords | 20 | Number of keywords to extract |
Available modules: "stats", "language", "sentiment", "readability", "keywords", "summary", "entities"
AnalysisResult — Fields
| Field | Type | Description |
|---|---|---|
| .stats | StatsResult | Word count, sentence count, reading time, vocabulary richness |
| .sentiment | SentimentResult | Label (positive/negative/neutral), compound score, confidence |
| .readability | ReadabilityResult | Flesch reading ease, Gunning Fog index, grade level |
| .keywords | KeywordsResult | TF-IDF scored keywords and RAKE keyphrases |
| .summary | SummaryResult | Extractive summary as sentence list |
| .entities | EntitiesResult | Named entities grouped by type |
| .language | LanguageResult | ISO language code and human-readable name |
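The statistics behind `.stats` are named but not defined in this README. As a rough guide to what such figures usually mean, here is a stdlib sketch. It assumes a reading speed of 200 words per minute and takes vocabulary richness to be the type-token ratio (unique words divided by total words). Both choices, along with the `text_stats` name and its dictionary keys, are assumptions for illustration and may not match the package.

```python
import re


def text_stats(text: str, words_per_minute: int = 200) -> dict:
    """Basic statistics using naive regex tokenization."""
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    # Naive sentence split: any run of ., ! or ? ends a sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "reading_time_minutes": len(words) / words_per_minute,
        # Type-token ratio: unique words divided by total words.
        "vocabulary_richness": len(set(words)) / len(words) if words else 0.0,
    }


print(text_stats("The cat sat. The dog ran!"))
```

The sentence splitter here also breaks on decimal points and abbreviations, which a real tokenizer such as NLTK's would handle more carefully.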
AnalysisResult — Methods
| Method | Description |
|---|---|
| .show() | Print a rich formatted report to the terminal |
| .export(path) | Export to .json, .csv, or .html |
| .to_dict() | Return the full result as a Python dict |
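Because `.to_dict()` returns a plain dict, the result can be passed to any standard tool. The sketch below shows one way a nested dict could be serialized to JSON and flattened into two-column CSV. The `sample` dict is a hypothetical stand-in for `result.to_dict()`; the real keys and the layout produced by `.export()` may differ.

```python
import csv
import io
import json
from typing import Any, Dict, Iterator, Tuple


def flatten(data: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted.key, value) pairs from a nested dict."""
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def to_csv(data: Dict[str, Any]) -> str:
    """Render a nested dict as 'field,value' CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["field", "value"])
    writer.writerows(flatten(data))
    return buffer.getvalue()


# Hypothetical stand-in for result.to_dict(); the real keys may differ.
sample = {"sentiment": {"label": "positive", "compound": 0.62}, "stats": {"word_count": 9}}
print(json.dumps(sample, indent=2))
print(to_csv(sample))
```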
Result Details
Sentiment
result.sentiment.label # "positive" | "negative" | "neutral"
result.sentiment.compound # -1.0 to 1.0
result.sentiment.positive # 0.0 to 1.0
result.sentiment.confidence # "high" | "medium" | "low"
result.sentiment.textblob_polarity # TextBlob polarity score
result.sentiment.textblob_subjectivity # TextBlob subjectivity score
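The README does not say how the compound score is turned into a label. VADER's own documentation suggests cutoffs of +0.05 and -0.05, so a mapping along those lines is a reasonable mental model. The helper below is a sketch under that assumption; the name `label_from_compound` is hypothetical, and the package may use different thresholds or also weigh the TextBlob polarity.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1.0, 1.0] to a sentiment label."""
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie between -1.0 and 1.0")
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for score in (0.62, 0.01, -0.40):
    print(score, label_from_compound(score))
```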
Readability
result.readability.flesch_reading_ease # 0-100 (higher = easier to read)
result.readability.flesch_kincaid_grade # US school grade level
result.readability.gunning_fog # Years of education needed
result.readability.smog_index # SMOG grade level
result.readability.reading_ease_label # "Very Easy", "Standard", "Difficult", etc.
result.readability.grade_level # "High School", "College", etc.
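For reference, the two Flesch scores follow published formulas based on words per sentence and syllables per word. The sketch below applies those formulas with a deliberately naive syllable counter (one syllable per run of vowels). The function names are assumptions for this example, and the package's tokenization and syllable counting are likely more careful, so its numbers will not match exactly.

```python
import re
from typing import Tuple


def count_syllables(word: str) -> int:
    """Naive heuristic: one syllable per run of vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_scores(text: str) -> Tuple[float, float]:
    """Return (Flesch reading ease, Flesch-Kincaid grade level)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)

    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade


ease, grade = flesch_scores("The cat sat. The dog ran.")
print(f"Reading ease: {ease:.2f}, grade: {grade:.2f}")
```

The raw reading-ease formula is not bounded, so very short, simple sentences like these can score above 100.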
Keywords
result.keywords.tfidf_keywords # [(word, score), ...]
result.keywords.rake_phrases # [(phrase, score), ...]
result.keywords.top_keywords # [word, ...] — plain list
result.keywords.top_phrases # [phrase, ...] — plain list
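To make the `(word, score)` pairs more concrete, here is a small stdlib sketch of TF-IDF scoring: a word scores highly when it is frequent in the target text but rare across the comparison documents. The smoothed IDF term follows the form scikit-learn uses by default. How the package splits a single report into documents is not stated, so the `tfidf_keywords` function and the toy corpus are assumptions for illustration only.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tfidf_keywords(target: str, corpus: List[str], top_n: int = 5) -> List[Tuple[str, float]]:
    """Rank the words of `target` by TF-IDF against the documents in `corpus`."""
    def tokenize(text: str) -> List[str]:
        return re.findall(r"[a-z]+", text.lower())

    docs = [tokenize(doc) for doc in corpus]
    # Document frequency: the number of documents each word appears in.
    df = Counter(word for doc in docs for word in set(doc))

    scores = {}
    for word, tf in Counter(tokenize(target)).items():
        # Smoothed IDF in the form scikit-learn uses by default.
        idf = math.log((1 + len(docs)) / (1 + df[word])) + 1
        scores[word] = tf * idf

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_n]


corpus = ["Revenue grew; revenue margins improved.", "Costs grew.", "Costs fell."]
for word, score in tfidf_keywords(corpus[0], corpus, top_n=3):
    print(f"{word}: {score:.3f}")
```

This sketch does no stop-word filtering, so on real text common words such as "and" would need to be removed before ranking.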
Summary
result.summary.sentences # ["sentence 1", "sentence 2", ...]
result.summary.text # Joined summary as a single string
result.summary.reduction_ratio # 0.0-1.0 (proportion of text removed)
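Extractive summarization selects existing sentences rather than writing new ones. The sketch below shows one simple frequency-based way to do that, plus a word-based reduction ratio. The scoring rule, the function names, and the choice to measure reduction in words rather than characters are all assumptions; the package's actual ranking method is not described in this README.

```python
import re
from collections import Counter
from typing import List


def summarize(text: str, n_sentences: int = 2) -> List[str]:
    """Pick the n highest-scoring sentences, returned in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z]+", sentence.lower())
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return [sentences[i] for i in keep]


def reduction_ratio(original: str, summary: str) -> float:
    """Proportion of words removed, between 0.0 and 1.0."""
    n_original = len(original.split())
    if n_original == 0:
        return 0.0
    return 1 - len(summary.split()) / n_original


report = "Revenue grew. Revenue grew fast. Cats sleep."
summary = " ".join(summarize(report, n_sentences=1))
print(summary, f"{reduction_ratio(report, summary):.0%} removed")
```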
Named Entities
result.entities.people # ["Steve Jobs", ...]
result.entities.organizations # ["Apple Inc.", ...]
result.entities.locations # ["Cupertino", ...]
result.entities.entities # {"PERSON": [...], "ORGANIZATION": [...], ...}
Dependencies
Installed with the core package:
- nltk: tokenization, VADER sentiment, named entity recognition
- click: CLI framework
- rich: terminal output formatting
Installed with [full] extras:
- textblob: secondary sentiment signal and subjectivity scoring
- scikit-learn: TF-IDF keyword extraction and cosine similarity
- rake-nltk: RAKE multi-word keyphrase extraction
- langdetect: language detection
- pdfplumber: PDF text extraction
- python-docx: Word document (.docx) loading
- requests and beautifulsoup4: web page fetching and parsing
Running Tests
# Install development dependencies
pip install -e ".[dev]"
# Download required NLTK data
python -c "import nltk; nltk.download(['vader_lexicon', 'punkt', 'punkt_tab', 'averaged_perceptron_tagger', 'averaged_perceptron_tagger_eng', 'maxent_ne_chunker', 'words'])"
# Run the full test suite
pytest tests/ -v
Publishing to PyPI
pip install build twine
python -m build
twine upload dist/*
License
MIT License — see LICENSE for details.
Author
Al Mustafiz Bappy
- Website: almustafizbappy.zerodevs.com
- GitHub: @bappy-3
- PyPI: pypi.org/project/ReportAnalysis