Audience-aware document summarizer for PDF/DOCX/TXT — optimized for context retention, not token count.

# docsumm-ai

One-line, opinionated document summarizer for PDFs, Word, or text — optimized for context retention, not token count.

## Why docsumm-ai?

Summarizing long documents shouldn't mean losing meaning.
Most tools truncate the input to fit a model's token limit, which produces shallow, inaccurate summaries.

docsumm-ai was built to fix that.

We designed it for researchers, analysts, and AI developers who care about both fidelity and efficiency.
It adapts to document structure to retain key insights from text, Word, or PDF files, using a single line of code.


## What Makes It Different

- One-line `summarize()`: clean summaries with context retention
- Handles PDF, DOCX, and TXT: no format left behind
- Context-aware chunking: semantic segmentation, not blind splitting
- Adaptive compression: keeps the right level of detail per section
- CLI + Python API: works both in scripts and in the terminal
- Transparent JSON + Markdown output: reproducible and human-readable
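
This README does not describe how the context-aware chunking works internally. As a rough illustration of the general idea only (not docsumm-ai's actual implementation), the sketch below splits text on paragraph boundaries and packs whole paragraphs into chunks under a word budget, so no paragraph is cut mid-thought. The name `chunk_by_paragraph` and the `max_words` parameter are hypothetical, and paragraph boundaries are a much simpler stand-in for real semantic segmentation.

```python
import re


def chunk_by_paragraph(text: str, max_words: int = 200) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_words words.

    Hypothetical helper for illustration; not part of docsumm-ai's API.
    A single paragraph longer than max_words is kept intact as its own chunk.
    """
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks: list[str] = []
    current: list[str] = []
    current_words = 0

    for para in paragraphs:
        n_words = len(para.split())
        # Close the current chunk if this paragraph would exceed the budget.
        if current and current_words + n_words > max_words:
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(para)
        current_words += n_words

    # Flush whatever is left after the last paragraph.
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Blind splitting would instead cut every `max_words` words regardless of where a paragraph ends; keeping paragraphs whole trades slightly uneven chunk sizes for chunks that each read as a complete unit.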


## Installation

```bash
pip install docsumm-ai
```

## Quickstart
### 1. Summarize a text file

```python
from docsumm_ai import summarize

summary = summarize("annual_report.txt", mode="concise")
print(summary)
```

### 2. Summarize a PDF (CLI)

```bash
docsumm summarize my_report.pdf --mode detailed --out summary.md
```
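
To summarize more than one file, the Python call from the Quickstart can be wrapped in a small loop. The sketch below is an illustration under stated assumptions: `summarize_folder` is a hypothetical helper that is not part of docsumm-ai, it relies only on the `summarize(path, mode=...)` call shape shown above, and it assumes the return value is a string. The summarizer is passed in as an argument so the helper makes no further assumptions about the package.

```python
from pathlib import Path
from typing import Callable

# File types this README lists as supported.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt"}


def summarize_folder(
    folder: str,
    summarize_fn: Callable[..., str],
    mode: str = "concise",
) -> dict[str, str]:
    """Summarize every supported file directly inside `folder`.

    Hypothetical helper, not part of docsumm-ai. `summarize_fn` is assumed
    to accept a file path plus a `mode` keyword and to return a string.
    Returns a mapping of file name to summary.
    """
    results: dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subdirectories and unsupported file types.
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            results[path.name] = summarize_fn(str(path), mode=mode)
    return results
```

With docsumm-ai installed, this would be called as `summarize_folder("reports", summarize)` after `from docsumm_ai import summarize`, where `reports` is a placeholder folder name.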

## Output Example

Input:

“The study explores the correlation between urban growth and environmental impact across 32 global cities…”

Output:

“Analyzes 32 cities showing urban expansion drives higher emissions; highlights need for adaptive policies.”

---

## License

MIT License © 2025 Rohit Rajdev
Open for community collaboration and research integration.

## 🌐 Links

🔗 GitHub: https://github.com/RohitRajdev/docsumm-ai

✉️ Contact: rohitrajdev.com

🧠 Related project: dataprep-ai
