Lightweight text summarizer with token budgets and smart chunking
gi_text_summarizer
A lightweight, production-ready Python library for summarizing text with any OpenAI-compatible LLM.
Features
- ✅ Automatic text chunking for long documents
- ✅ Token budget guardrails (summaries always shorter than input)
- ✅ Support for multiple summary lengths, tones, and output formats
- ✅ Works with OpenAI, Azure OpenAI, or custom LLM endpoints
- ✅ Built-in token estimation (no external dependencies)
- ✅ Production-grade error handling and logging
Installation
pip install gi-text-summarizer
Quick Start
from gi_text_summarizer import TextSummarizer

summarizer = TextSummarizer(api_key="sk-...")

result = summarizer.summarize(
    text="Your long document here...",
    summary_type="medium",   # "short" | "medium" | "detailed"
    tone="neutral",          # "neutral" | "formal" | "casual"
    focus_area="general",
    output_format="text",    # "text" | "bullets" | "json"
)

print(result.summary)
print(result.compression)    # e.g., "68% shorter"
print(result)                # Pretty-printed with all details
API Reference
TextSummarizer
TextSummarizer(
    api_key=None,          # OpenAI/Azure API key (falls back to environment variables)
    azure_endpoint=None,   # Azure endpoint
    deployment_name=None,  # Azure deployment
    model="gpt-4o-mini",   # Model name
    provider="auto",       # "auto" | "openai" | "azure" | "custom"
)
summarize()
result = summarizer.summarize(
    text=text,                   # str: the text to summarize
    summary_type="medium",       # "short" (20%) | "medium" (40%) | "detailed" (60%)
    tone="neutral",              # "neutral" | "formal" | "casual"
    focus_area="general",        # Any string
    output_format="text",        # "text" | "bullets" | "json"
    chunk_strategy="character",  # "character" | "sentence"
    chunk_size=3000,             # Characters per chunk
)  # returns a SummaryResult
SummaryResult
result.summary # str: Generated summary
result.input_tokens # int: Tokens in original
result.output_tokens # int: Tokens in summary
result.compression # str: "X% shorter"
result.num_chunks # int: Number of chunks
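As an illustration of how the "X% shorter" string could relate to the two token counts, here is a minimal sketch. The helper name `compression_label` is hypothetical and the rounding rule is an assumption; the library may compute the value differently.

```python
def compression_label(input_tokens: int, output_tokens: int) -> str:
    """Format the token reduction as 'X% shorter' (hypothetical helper)."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    # Percentage of input tokens that the summary dropped, rounded to an integer.
    percent = round(100 * (input_tokens - output_tokens) / input_tokens)
    return f"{percent}% shorter"


label = compression_label(1000, 320)  # "68% shorter"
```

With this formula, a summary that came out longer than the input would yield a negative percentage, which is the case the token budget guardrails are meant to prevent.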
Token Budget Guardrails
The library automatically calculates an output token cap from the input length, so that each summary is shorter than its input:
| Type | Ratio | Use Case |
|---|---|---|
| short | 20% | Core idea (3-4 sentences) |
| medium | 40% | Executive summary |
| detailed | 60% | Comprehensive summary |
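The ratios above can be turned into an output token cap along these lines. This is a sketch under assumptions: the function name `output_token_cap`, the integer rounding, and the minimum of one token are illustrative choices, not the library's confirmed behavior.

```python
# Ratios from the table, stored as integer percentages to avoid float rounding.
RATIO_PERCENT = {"short": 20, "medium": 40, "detailed": 60}


def output_token_cap(input_tokens: int, summary_type: str = "medium") -> int:
    """Return a maximum summary length in tokens (hypothetical helper)."""
    if summary_type not in RATIO_PERCENT:
        raise ValueError(f"unknown summary_type: {summary_type!r}")
    cap = input_tokens * RATIO_PERCENT[summary_type] // 100
    # Assumed floor: always allow at least one token, even for tiny inputs.
    return max(1, cap)


caps = {kind: output_token_cap(1000, kind) for kind in RATIO_PERCENT}
```

A cap like this would typically be passed to the LLM as its maximum output length.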
Chunking & Multi-Document Summarization
For longer texts:
- Text is split into chunks
- Each chunk is summarized
- Summaries are combined and re-summarized
result = summarizer.summarize(
    text=long_document,
    chunk_strategy="sentence",  # Better quality
    chunk_size=3000,
)
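The split, summarize, and re-summarize steps can be sketched as follows. All names here (`split_sentences`, `chunk_by_sentence`, `summarize_long`, `summarize_fn`) are hypothetical, and the regex-based sentence splitter is a naive stand-in; the library's actual chunking logic is not shown in this README.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break on whitespace that follows ., !, or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_by_sentence(text: str, chunk_size: int = 3000) -> List[str]:
    """Pack whole sentences into chunks of at most chunk_size characters."""
    chunks: List[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            chunks.append(current)  # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def summarize_long(text: str, summarize_fn: Callable[[str], str],
                   chunk_size: int = 3000) -> str:
    """Summarize each chunk, then summarize the combined partial summaries."""
    chunks = chunk_by_sentence(text, chunk_size)
    if len(chunks) <= 1:
        return summarize_fn(text)
    partials = [summarize_fn(chunk) for chunk in chunks]
    return summarize_fn(" ".join(partials))
```

In this sketch a single sentence longer than `chunk_size` becomes its own oversized chunk rather than being cut mid-sentence. `summarize_fn` stands in for whatever call sends one piece of text to the LLM.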
Environment Variables
# OpenAI
export OPENAI_API_KEY="sk-..."
# Azure OpenAI
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o-mini"
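One way a `provider="auto"` setting could be resolved from these variables is sketched below. The function `resolve_provider` is hypothetical, and the precedence (Azure first, then OpenAI) is an assumption rather than documented behavior.

```python
import os
from typing import Mapping, Optional


def resolve_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Guess the provider from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumed precedence: Azure wins when both its key and endpoint are set.
    if env.get("AZURE_OPENAI_API_KEY") and env.get("AZURE_OPENAI_ENDPOINT"):
        return "azure"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    raise RuntimeError("No API credentials found in the environment")
```

Passing a plain dictionary as `env` makes the logic easy to check without touching the real environment.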
Standalone Token Counter
from gi_text_summarizer import count_tokens
tokens = count_tokens("Your text here...")
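For intuition about dependency-free token estimation, a common rule of thumb for English text is roughly four characters per token. The sketch below uses that heuristic; `estimate_tokens` is a hypothetical name, and `count_tokens` may use a different method.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    if not text:
        return 0
    # Round up so that any non-empty text counts as at least one token.
    return max(1, math.ceil(len(text) / chars_per_token))
```

Estimates like this drift for code, non-English text, and unusual formatting, so they are best treated as budgeting guidance rather than exact counts.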
Examples
Executive Summary (Formal)
result = summarizer.summarize(
    text=financial_report,
    summary_type="medium",
    tone="formal",
    focus_area="financial",
    output_format="bullets",
)
Technical Highlight (JSON)
result = summarizer.summarize(
    text=documentation,
    summary_type="detailed",
    tone="neutral",
    focus_area="technical",
    output_format="json",
)
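When `output_format="json"` is used, `result.summary` is still a string, so it may be worth parsing it defensively. The sketch below assumes nothing about the JSON schema; the helper `parse_json_summary` and the fallback key `"summary"` are hypothetical.

```python
import json
from typing import Any


def parse_json_summary(raw: str) -> Any:
    """Parse a JSON summary string, falling back to a wrapper dict."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # The model returned something that is not valid JSON; keep the text.
        return {"summary": raw}
```

For example, `parse_json_summary(result.summary)` would return the decoded object when the model produced valid JSON and a one-key dictionary otherwise.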
Ultra-Concise (20% of input)
result = summarizer.summarize(
    text=research_paper,
    summary_type="short",
)
Publishing to PyPI
pip install build twine
python -m build
twine upload dist/*
License
MIT License.
File details
Details for the file gi_text_summarizer-1.0.0.tar.gz.
File metadata
- Download URL: gi_text_summarizer-1.0.0.tar.gz
- Upload date:
- Size: 8.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a6268aee1f456e9df31c5d5edad3a2040e0650199b601f476695c2c3d0c215fe |
| MD5 | 36b1063ce0def0b18f1a2cdfdcadea83 |
| BLAKE2b-256 | ee86d089bcc2eef2c59d5cf6f5343d59f19b4b5f62f02f7bee177c2395db305f |
File details
Details for the file gi_text_summarizer-1.0.0-py3-none-any.whl.
File metadata
- Download URL: gi_text_summarizer-1.0.0-py3-none-any.whl
- Upload date:
- Size: 8.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4912b684ee1d3bef3dd70709fa5a057465352ae981f29aadc24d32965d8e131c |
| MD5 | b1136d800bde5b82a0b04d885c034243 |
| BLAKE2b-256 | f74c2c1aa980f0c5710dfa02ac735ff5a21d5432be7cafd25cf260f9a3cb1b82 |