
A powerful Python package for analyzing reports: sentiment, readability, keywords, summaries, NER, and more.


ReportAnalysis

Python 3.8+ · License: MIT

A powerful, batteries-included Python package for analyzing reports.

Pass in a report as a text string, a path to a .txt, PDF, or Word (.docx) file, or a URL. In one call you get sentiment analysis, readability scores, keywords, an extractive summary, named entities, language detection, and text statistics. Ships with a full CLI and export to JSON, CSV, and HTML.


Features

| Feature | Description |
|---|---|
| Sentiment Analysis | VADER + TextBlob ensemble with confidence scoring |
| Readability Scores | Flesch, Gunning Fog, SMOG, ARI; all computed offline |
| Keyword Extraction | TF-IDF keywords + RAKE multi-word keyphrases |
| Extractive Summary | Top N most informative sentences |
| Text Statistics | Word count, reading time, vocabulary richness, and more |
| Named Entity Recognition | People, organizations, and locations via NLTK |
| Language Detection | Detects 50+ languages and reports ISO codes |
| Report Comparison | Cosine similarity score between two reports |
| Multi-format Loaders | Plain text, PDF, DOCX, and web URLs |
| CLI | analyze / compare / summarize subcommands |
| Export | JSON, CSV, and self-contained HTML reports |

Installation

Minimal install (core only)

pip install ReportAnalysis

Full install (with PDF, DOCX, URL loaders and all analysis features)

pip install "ReportAnalysis[full]"

Quick Start

From a text string

from report_analysis import ReportAnalyzer

ra = ReportAnalyzer("The quarterly results exceeded all expectations. Revenue grew 30%.")
result = ra.analyze()
result.show()  # Prints a formatted report to the terminal

From a file

from report_analysis import ReportAnalyzer

# Supports .txt, .pdf, and .docx (.pdf and .docx require the [full] extras)
ra = ReportAnalyzer("annual_report.pdf")
result = ra.analyze()

print(result.sentiment.label)          # "positive"
print(result.readability.grade_level)  # "College"
print(result.keywords.top_keywords[:5])

result.export("analysis.html")  # Export as a standalone HTML report

From a URL

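from report_analysis import ReportAnalyzer

# Fetching URLs requires the [full] extras (requests, beautifulsoup4)
ra = ReportAnalyzer(url="https://example.com/annual-report")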
result = ra.analyze()
result.export("results.json")

From a DOCX file

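from report_analysis import ReportAnalyzer

# .docx loading requires the [full] extras (python-docx)
ra = ReportAnalyzer("report.docx")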
result = ra.analyze()
print(result.summary.text)

Run only specific modules

result = ra.analyze(
    include=["sentiment", "readability", "keywords"],
    top_keywords=15,
    summary_sentences=5,
)

Compare two reports

from report_analysis import ReportAnalyzer

ra1 = ReportAnalyzer("Q1 report text here...")
ra2 = ReportAnalyzer("Q2 report text here...")

comparison = ra1.compare_with(ra2)
print(f"Similarity: {comparison.similarity_score:.1%}")  # e.g. "72.3%"
print(comparison.similarity_label)                        # "Similar"
print("Common words:", comparison.common_words[:10])
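The dependency list credits scikit-learn with the cosine similarity behind compare_with(), which suggests TF-IDF vectors. The stdlib-only sketch below illustrates what the score measures, but it makes a simplifying assumption: it uses raw word counts instead of TF-IDF weights, so its numbers will not match the package's output. tokenize and cosine_similarity are hypothetical helpers, not part of ReportAnalysis.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between two word-count vectors (0.0 to 1.0).

    Simplification: raw counts stand in for the TF-IDF weights the
    package presumably uses via scikit-learn.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Dot product only needs the words the two texts share
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with anything
    return dot / (norm_a * norm_b)


q1 = "Revenue grew this quarter and margins improved."
q2 = "Revenue fell this quarter but margins improved."
print(f"Similarity: {cosine_similarity(q1, q2):.1%}")  # Similarity: 71.4%
```

Here 5 of the 7 distinct words in each sentence are shared, so the score is 5/7. Weighting by TF-IDF would down-weight common words such as "this" and usually lowers scores driven by filler vocabulary.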

Export results

result.export("analysis.json")  # Machine-readable JSON
result.export("analysis.csv")   # Spreadsheet-friendly CSV
result.export("analysis.html")  # Standalone HTML report

CLI Usage

# Analyze a file
report-analysis analyze report.pdf

# Analyze from a URL
report-analysis analyze --url https://example.com/annual-report

# Read from stdin
echo "Revenue increased by 30% this quarter." | report-analysis analyze -

# Run only specific modules
report-analysis analyze report.txt --include sentiment --include keywords

# Export results to HTML
report-analysis analyze report.pdf --export html --output results.html

# Compare two reports
report-analysis compare q1.pdf q2.pdf

# Summarize with 10 sentences
report-analysis summarize report.docx --sentences 10

# Show help
report-analysis --help
report-analysis analyze --help

API Reference

ReportAnalyzer(source="", *, url="")

| Parameter | Type | Description |
|---|---|---|
| source | str | Raw text string, or a path to a .txt, .pdf, or .docx file |
| url | str | URL to fetch and analyze (keyword-only argument) |

.analyze(include=None, summary_sentences=5, top_keywords=20)

Runs the analysis pipeline and returns an AnalysisResult object.

| Parameter | Default | Description |
|---|---|---|
| include | None (all modules) | List of module names to run |
| summary_sentences | 5 | Number of sentences to include in the summary |
| top_keywords | 20 | Number of keywords to extract |

Available modules: "stats", "language", "sentiment", "readability", "keywords", "summary", "entities"

AnalysisResult — Fields

| Field | Type | Description |
|---|---|---|
| .stats | StatsResult | Word count, sentence count, reading time, vocabulary richness |
| .sentiment | SentimentResult | Label (positive/negative/neutral), compound score, confidence |
| .readability | ReadabilityResult | Flesch reading ease, Gunning Fog index, grade level |
| .keywords | KeywordsResult | TF-IDF scored keywords and RAKE keyphrases |
| .summary | SummaryResult | Extractive summary as sentence list |
| .entities | EntitiesResult | Named entities grouped by type |
| .language | LanguageResult | ISO language code and human-readable name |
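The .stats field reports word count, sentence count, reading time, and vocabulary richness, but the README does not document how these are computed. The sketch below shows one common set of definitions under two assumptions: a reading speed of 200 words per minute, and type-token ratio (distinct words divided by total words) as the richness measure. Both choices, and the text_stats helper itself, are illustrative rather than the package's implementation.

```python
import re
from typing import Dict

WORDS_PER_MINUTE = 200  # assumed average silent-reading speed


def text_stats(text: str) -> Dict[str, float]:
    """Compute simple statistics for a block of text (illustrative definitions)."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Naive sentence split: a run of . ! or ? followed by whitespace or end of text
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    unique_words = {w.lower() for w in words}
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "reading_time_minutes": len(words) / WORDS_PER_MINUTE,
        # Type-token ratio: 1.0 means no word is repeated
        "vocabulary_richness": len(unique_words) / len(words) if words else 0.0,
    }


print(text_stats("The quarterly results exceeded all expectations. Revenue grew 30%."))
```

Type-token ratio falls as documents get longer, so it is most meaningful when comparing reports of similar length.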

AnalysisResult — Methods

| Method | Description |
|---|---|
| .show() | Print a rich formatted report to the terminal |
| .export(path) | Export to .json, .csv, or .html |
| .to_dict() | Return the full result as a Python dict |

Result Details

Sentiment

result.sentiment.label             # "positive" | "negative" | "neutral"
result.sentiment.compound          # -1.0 to 1.0
result.sentiment.positive          # 0.0 to 1.0
result.sentiment.confidence        # "high" | "medium" | "low"
result.sentiment.textblob_polarity      # TextBlob polarity score
result.sentiment.textblob_subjectivity  # TextBlob subjectivity score
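The README does not state the cut-offs that turn scores into a label. VADER's authors recommend these thresholds:

- positive: compound score of 0.05 or above
- negative: compound score of -0.05 or below
- neutral: anything in between

The sketch below applies those conventional thresholds. Whether ReportAnalysis uses the same values, and how it blends in the TextBlob polarity, is an assumption, so treat label_from_compound as a hypothetical helper.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score (-1.0 to 1.0) to a sentiment label.

    Uses the +/-0.05 cut-offs recommended by VADER's authors; the threshold
    ReportAnalysis actually applies is an assumption here.
    """
    if not -1.0 <= compound <= 1.0:
        raise ValueError(f"compound score out of range: {compound}")
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for score in (0.62, 0.01, -0.30):
    print(f"{score:+.2f} -> {label_from_compound(score)}")
```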

Readability

result.readability.flesch_reading_ease   # Usually 0-100 (higher = easier to read)
result.readability.flesch_kincaid_grade  # US school grade level
result.readability.gunning_fog           # Years of education needed
result.readability.smog_index            # SMOG grade level
result.readability.reading_ease_label    # "Very Easy", "Standard", "Difficult", etc.
result.readability.grade_level           # "High School", "College", etc.
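The Flesch, Flesch-Kincaid, Gunning Fog, and SMOG scores follow standard published formulas. The sketch below implements them from pre-computed counts: words, sentences, syllables, and words of three or more syllables. Syllable counting is heuristic and varies between tools, so ReportAnalysis may report slightly different values for the same text. The function names are illustrative, not the package's API.

```python
import math


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Higher is easier; usually 0-100 but the formula is not clamped
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    # Approximate US school grade level
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    # complex_words: words with three or more syllables
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def smog_index(sentences: int, polysyllables: int) -> float:
    # Scales the polysyllable count to a 30-sentence sample
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


# Example: 100 words, 5 sentences, 150 syllables, 10 words of 3+ syllables
print(round(flesch_reading_ease(100, 5, 150), 1))   # 59.6
print(round(flesch_kincaid_grade(100, 5, 150), 1))  # 9.9
print(round(gunning_fog(100, 5, 10), 1))            # 12.0
print(round(smog_index(5, 10), 1))                  # 11.2
```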

Keywords

result.keywords.tfidf_keywords   # [(word, score), ...]
result.keywords.rake_phrases     # [(phrase, score), ...]
result.keywords.top_keywords     # [word, ...] — plain list
result.keywords.top_phrases      # [phrase, ...] — plain list
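The rake_phrases come from RAKE (Rapid Automatic Keyword Extraction), which ReportAnalysis gets via rake-nltk. RAKE works in three steps:

1. Split the text into candidate phrases at stopwords and punctuation.
2. Score each word as its degree (total length of the candidate phrases it appears in) divided by its frequency.
3. Score each phrase as the sum of its word scores.

The stdlib sketch below uses a deliberately tiny stopword list. rake-nltk uses NLTK's full English list, and its defaults may differ, so scores from the package will not match.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Deliberately tiny stopword list for illustration only
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "was", "for", "by", "with"}


def candidate_phrases(text: str) -> List[List[str]]:
    """Split text into runs of non-stopwords, breaking at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current: List[str] = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text: str) -> List[Tuple[str, float]]:
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq: Dict[str, int] = defaultdict(int)
    degree: Dict[str, int] = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences in this phrase, including itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


text = "Key risks: supply chain delays, rising interest rates, and currency volatility."
for phrase, score in rake(text):
    print(f"{score:5.2f}  {phrase}")
```

Because degree grows with phrase length, RAKE naturally favours longer multi-word phrases. This is why it complements the single-word TF-IDF keywords.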

Summary

result.summary.sentences         # ["sentence 1", "sentence 2", ...]
result.summary.text              # Joined summary as a single string
result.summary.reduction_ratio   # 0.0-1.0 (proportion of text removed)
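The README describes the summary as the top N most informative sentences but does not say how informativeness is scored. The sketch below uses a common frequency-based baseline as an illustrative stand-in:

- Score each sentence by the average document-wide frequency of its non-stopword words.
- Keep the N best sentences.
- Return them in their original order.

Here reduction_ratio is computed as the share of words removed. Whether the package measures words, characters, or sentences is an assumption, and summarize is a hypothetical helper.

```python
import re
from collections import Counter
from typing import List, Tuple

STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "was", "for", "by", "on", "it"}


def summarize(text: str, n: int = 5) -> Tuple[List[str], float]:
    """Return the n highest-scoring sentences (in document order) and the reduction ratio."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words_per_sentence = [re.findall(r"[a-z0-9']+", s.lower()) for s in sentences]
    freq = Counter(w for words in words_per_sentence for w in words if w not in STOPWORDS)

    def score(i: int) -> float:
        # Average frequency of the sentence's content words
        content = [w for w in words_per_sentence[i] if w not in STOPWORDS]
        return sum(freq[w] for w in content) / len(content) if content else 0.0

    # Pick the top-n sentence indices, then restore their original order
    top = sorted(sorted(range(len(sentences)), key=score, reverse=True)[:n])
    total_words = sum(len(words) for words in words_per_sentence)
    kept_words = sum(len(words_per_sentence[i]) for i in top)
    reduction_ratio = 1 - kept_words / total_words if total_words else 0.0
    return [sentences[i] for i in top], reduction_ratio


report = ("Revenue grew 30% this quarter. Revenue growth came from cloud services. "
          "The office moved to a new building. Cloud services revenue doubled.")
summary, ratio = summarize(report, n=2)
print(summary)
print(f"reduction ratio: {ratio:.2f}")  # reduction ratio: 0.55
```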

Named Entities

result.entities.people           # ["Steve Jobs", ...]
result.entities.organizations    # ["Apple Inc.", ...]
result.entities.locations        # ["Cupertino", ...]
result.entities.entities         # {"PERSON": [...], "ORGANIZATION": [...], ...}
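With NLTK, entities are typically found by running word_tokenize, then pos_tag, then ne_chunk. The chunker labels spans with tags such as PERSON, ORGANIZATION, GPE (geo-political entity), LOCATION, and FACILITY. The sketch below starts from example (text, label) pairs that have already been extracted. It groups them into a dict shaped like result.entities.entities, dropping duplicates while keeping first-seen order. Treating both GPE and LOCATION as locations is an assumption about how the package builds its convenience lists. group_entities and locations are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed mapping of NLTK labels onto the "locations" convenience list
LOCATION_LABELS = {"GPE", "LOCATION"}


def group_entities(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (text, label) pairs by label, de-duplicated, in first-seen order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


def locations(grouped: Dict[str, List[str]]) -> List[str]:
    """Merge every location-like label into one de-duplicated list."""
    merged: List[str] = []
    for label in sorted(LOCATION_LABELS):
        for text in grouped.get(label, []):
            if text not in merged:
                merged.append(text)
    return merged


pairs = [("Steve Jobs", "PERSON"), ("Apple Inc.", "ORGANIZATION"),
         ("Cupertino", "GPE"), ("Steve Jobs", "PERSON")]
grouped = group_entities(pairs)
print(grouped)
print(locations(grouped))
```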

Dependencies

Installed with the core package:

  • nltk — tokenization, VADER sentiment, named entity recognition
  • click — CLI framework
  • rich — terminal output formatting

Installed with [full] extras:

  • textblob — secondary sentiment signal and subjectivity scoring
  • scikit-learn — TF-IDF keyword extraction and cosine similarity
  • rake-nltk — RAKE multi-word keyphrase extraction
  • langdetect — language detection
  • pdfplumber — PDF text extraction
  • python-docx — Word document (.docx) loading
  • requests and beautifulsoup4 — web page fetching and parsing

Running Tests

# Install development dependencies
pip install -e ".[dev]"

# Download required NLTK data
python -c "import nltk; nltk.download(['vader_lexicon', 'punkt', 'punkt_tab', 'averaged_perceptron_tagger', 'averaged_perceptron_tagger_eng', 'maxent_ne_chunker', 'words'])"

# Run the full test suite
pytest tests/ -v

Publishing to PyPI

pip install build twine
python -m build
twine upload dist/*

License

MIT License — see LICENSE for details.

Author

Al Mustafiz Bappy


Download files


Source Distribution

reportanalysis-1.0.0.tar.gz (28.3 kB)

Built Distribution


reportanalysis-1.0.0-py3-none-any.whl (37.2 kB)

File details

Details for the file reportanalysis-1.0.0.tar.gz.

File metadata

  • Download URL: reportanalysis-1.0.0.tar.gz
  • Size: 28.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.9

File hashes

Hashes for reportanalysis-1.0.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0c4cf460a05675b5cf21e24bc352e53bdef0666b83d47b9b99815f56670e4430 |
| MD5 | 026aa41b2c770eebae2db2064644f34e |
| BLAKE2b-256 | b0abe595998f0cf80ba10a018c7cad80ce177355babba54fe9c96422d2a2a8ea |


File details

Details for the file reportanalysis-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: reportanalysis-1.0.0-py3-none-any.whl
  • Size: 37.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.9

File hashes

Hashes for reportanalysis-1.0.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 28ac24767b752903ece471d41eea71afe51df81ba7ca6010ef9bf13c68fe0940 |
| MD5 | 058719c288ac1f914d80b12b5207100b |
| BLAKE2b-256 | fcb084b4c6ae920a097d3632e356d92defb80a8cb7e35b1320f09e239ffb7c2f |

