Audience-aware document summarizer for PDF/DOCX/TXT — optimized for context retention, not token count.
docsumm-ai
One-line, opinionated document summarizer for PDF, Word (DOCX), and plain-text files, optimized for context retention rather than token count.
## Why docsumm-ai?
Summarizing long documents shouldn’t mean losing meaning.
Many tools truncate the input just to fit a model's token limit, which produces shallow or inaccurate summaries.
docsumm-ai was built to fix that.
We designed it for researchers, analysts, and AI developers who care about both fidelity and efficiency.
It adapts to document structure and aims to retain key insights from text, Word, or PDF files with a single function call.
## What Makes It Different
✅ One-line summarize() — clean summaries with context retention
✅ Handles PDFs, DOCX, TXT — no format left behind
✅ Context-aware chunking — semantic segmentation, not blind splitting
✅ Adaptive compression — keeps the right level of detail per section
✅ CLI + Python API — works both in scripts and terminal
✅ Transparent JSON + Markdown output — reproducible and human-readable
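To make the "context-aware chunking" bullet concrete, here is a minimal sketch of structure-respecting chunking. It is not docsumm-ai's actual implementation, which this README does not describe; the function name `chunk_by_paragraph` and the `max_chars` budget are assumptions for illustration. The idea is to split on blank lines and pack whole paragraphs into chunks, so no paragraph is cut in the middle, in contrast to fixed-width splitting.

```python
import re


def chunk_by_paragraph(text: str, max_chars: int = 1000) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_chars characters.

    Hypothetical helper, not part of docsumm-ai. A paragraph longer than
    max_chars is kept intact as its own chunk rather than being cut.
    """
    # Paragraphs are runs of text separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in paragraphs:
        # +2 accounts for the blank-line separator added when joining.
        added = len(para) + (2 if current else 0)
        if current and current_len + added > max_chars:
            # The budget would be exceeded: close the current chunk first.
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
            added = len(para)
        current.append(para)
        current_len += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A real semantic segmenter would likely also use headings or sentence embeddings to decide boundaries; paragraph breaks are simply the cheapest structural signal available in plain text.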
## Installation
pip install docsumm-ai
## Quickstart
1. Summarize a text file
from docsumm_ai import summarize
summary = summarize("annual_report.txt", mode="concise")
print(summary)
2. Summarize a PDF (CLI)
docsumm summarize my_report.pdf --mode detailed --out summary.md
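For readers who want to wrap or mimic the command above, the following is a rough sketch of how a command line of that shape could be parsed with `argparse`. It is an illustration only, not the package's actual CLI code: `build_parser` is a hypothetical name, the `concise` default is an assumption, and only the two mode values that appear in this README are listed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring `docsumm summarize FILE --mode ... --out ...`."""
    parser = argparse.ArgumentParser(prog="docsumm")
    subcommands = parser.add_subparsers(dest="command", required=True)

    summarize = subcommands.add_parser("summarize", help="summarize one document")
    summarize.add_argument("path", help="input file (PDF, DOCX, or TXT)")
    # Only these two modes appear in the README; the real CLI may accept more.
    # The default value is an assumption made for this sketch.
    summarize.add_argument("--mode", choices=["concise", "detailed"], default="concise")
    summarize.add_argument("--out", default=None, help="write the summary to this file")
    return parser


# Parse the same arguments as the CLI example above.
args = build_parser().parse_args(
    ["summarize", "my_report.pdf", "--mode", "detailed", "--out", "summary.md"]
)
print(args.command, args.path, args.mode, args.out)
```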
## Output Example
Input:
“The study explores the correlation between urban growth and environmental impact across 32 global cities…”
Output:
“Analyzes 32 cities showing urban expansion drives higher emissions; highlights need for adaptive policies.”
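The feature list mentions JSON and Markdown output, but this README does not show the schema. As a sketch of what a reproducible, human-readable pair of renderings might look like, the example below serializes one summary record both ways. The field names `source`, `mode`, and `summary`, the file name `urban_growth_study.txt`, and both helper functions are hypothetical and are not taken from docsumm-ai.

```python
import json


def to_json(record: dict) -> str:
    """Render a summary record as JSON (hypothetical schema)."""
    # sort_keys makes the output identical for identical records.
    return json.dumps(record, ensure_ascii=False, indent=2, sort_keys=True)


def to_markdown(record: dict) -> str:
    """Render the same record as a small Markdown document."""
    return (
        f"# Summary of {record['source']}\n\n"
        f"- Mode: {record['mode']}\n\n"
        f"{record['summary']}\n"
    )


# Example record reusing the summary text shown above; the file name is made up.
record = {
    "source": "urban_growth_study.txt",
    "mode": "concise",
    "summary": (
        "Analyzes 32 cities showing urban expansion drives higher emissions; "
        "highlights need for adaptive policies."
    ),
}
print(to_json(record))
print(to_markdown(record))
```

Keeping both renderings derived from one record means the Markdown file can always be regenerated from the JSON, which is one way to get the reproducibility the feature list refers to.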
---
## License
MIT License © 2025 Rohit Rajdev
Open for community collaboration and research integration.
## 🌐 Links
🔗 GitHub: https://github.com/RohitRajdev/docsumm-ai
✉️ Contact: rohitrajdev.com
🧠 Related project: dataprep-ai
Download files
File details
Details for the file docsumm_ai-0.1.0.tar.gz.
File metadata
- Download URL: docsumm_ai-0.1.0.tar.gz
- Upload date:
- Size: 6.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f6c05439695b95057bfd77df176c2398be234d9ef396b7d2b805154e07557104 |
| MD5 | 57188e5ce8de5d05e3c7ea7345186c29 |
| BLAKE2b-256 | 8c07083b5528828e941f8a382d26860c329fb838f1f6a921485813f54cb82fcb |
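The digests listed above can be checked locally after downloading a file. The sketch below uses only Python's standard `hashlib` module; the helper names `digest_file` and `matches` are illustrative, and the `"blake2b-256"` label is a convention of this sketch that maps to `hashlib.blake2b` with a 32-byte digest.

```python
import hashlib
from pathlib import Path


def digest_file(path: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, reading it in 64 KiB blocks."""
    if algorithm == "blake2b-256":
        # BLAKE2b truncated to 256 bits, i.e. a 32-byte digest.
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            hasher.update(block)
    return hasher.hexdigest()


def matches(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return digest_file(path, algorithm) == expected_hex.strip().lower()
```

For an unmodified download, `matches("docsumm_ai-0.1.0.tar.gz", "<SHA256 digest from the table>")` should return `True`; a `False` result means the file differs from the one that was published.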
Provenance
The following attestation bundles were made for docsumm_ai-0.1.0.tar.gz:
Publisher: publish.yml on RohitRajdev/docsumm-ai
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: docsumm_ai-0.1.0.tar.gz
- Subject digest: f6c05439695b95057bfd77df176c2398be234d9ef396b7d2b805154e07557104
- Sigstore transparency entry: 585585737
- Sigstore integration time:
- Permalink: RohitRajdev/docsumm-ai@ae5bd4ae1c9f667d533e66275f94fc6d235f57ed
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/RohitRajdev
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ae5bd4ae1c9f667d533e66275f94fc6d235f57ed
- Trigger Event: release
File details
Details for the file docsumm_ai-0.1.0-py3-none-any.whl.
File metadata
- Download URL: docsumm_ai-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5469de08b76e652ec9b5ee26e05f8589bfd4379e0ef5d738ae39c59ff0feeaaa |
| MD5 | 74169186513f608c14722f462be282de |
| BLAKE2b-256 | ba281401f16c35a1d9a7cce7341a123a0e30eab7956b52f76e543c50cef32497 |
Provenance
The following attestation bundles were made for docsumm_ai-0.1.0-py3-none-any.whl:
Publisher: publish.yml on RohitRajdev/docsumm-ai
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: docsumm_ai-0.1.0-py3-none-any.whl
- Subject digest: 5469de08b76e652ec9b5ee26e05f8589bfd4379e0ef5d738ae39c59ff0feeaaa
- Sigstore transparency entry: 585585746
- Sigstore integration time:
- Permalink: RohitRajdev/docsumm-ai@ae5bd4ae1c9f667d533e66275f94fc6d235f57ed
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/RohitRajdev
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ae5bd4ae1c9f667d533e66275f94fc6d235f57ed
- Trigger Event: release