Lightweight text summarizer with intelligent chunking and token counting for multiple LLM providers
Project description
Text Summarizer - Minimal & Focused
A simple, production-ready text summarization library.
Input: Long text
Output: Concise summary
Quick Start
from text_summarizer_gi import TextSummarizer
summarizer = TextSummarizer()
result = summarizer.summarize("Your long text here...")
print(result.summary)
Setup
# 1. Install dependencies
pip install openai
# 2. Set Azure credentials
export AZURE_OPENAI_API_KEY="your-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o-mini"
Features
- ✓ Simple, focused API
- ✓ Intelligent text chunking (sentence-based)
- ✓ Token counting
- ✓ Multiple summary types: short, medium, detailed
- ✓ Multiple tones: neutral, formal, casual
- ✓ Compression tracking
- ✓ Error handling with fallbacks
Basic Usage
Simple Summarization
from text_summarizer_gi import TextSummarizer
summarizer = TextSummarizer()
text = "Your long text..."
# Short summary (25% of original)
result = summarizer.summarize(text, summary_type="short")
print(result.summary)
# Medium summary (40% of original)
result = summarizer.summarize(text, summary_type="medium")
print(result.summary)
# Detailed summary (60% of original)
result = summarizer.summarize(text, summary_type="detailed")
print(result.summary)
With Different Tones
# Formal tone
result = summarizer.summarize(text, tone="formal")
# Casual tone
result = summarizer.summarize(text, tone="casual")
# Neutral tone (default)
result = summarizer.summarize(text, tone="neutral")
Token Counting
from text_summarizer_gi import count_tokens
tokens = count_tokens("Your text...")
print(f"Token count: {tokens}")
Text Chunking
from text_summarizer_gi import chunk_text, chunk_text_by_sentences
# Character-based chunking
chunks = chunk_text("Your text...", chunk_size=3000)
# Sentence-based chunking (better for summarization)
chunks = chunk_text_by_sentences("Your text...")
Result Object
result = summarizer.summarize(text)
result.summary # The summarized text
result.input_tokens # Input token count
result.output_tokens # Output token count
result.compression_ratio # Output reduction percentage
Testing
# Run basic test
python test_basic.py
# Check imports
python -c "from text_summarizer_gi import TextSummarizer; print('✓ OK')"
Project Structure
text_summarizer_gi/
├── __init__.py # Exports
├── summarizer.py # Main TextSummarizer class
├── prompts.py # Summarization prompts
├── chunking.py # Text chunking utilities
├── token_counter.py # Token counting
└── utils.py # Helper functions
test_basic.py # Basic test
pyproject.toml # Package config
README.md # This file
LICENSE # MIT License
How It Works
- Input Validation - Check text is not empty
- Token Counting - Count tokens in input
- Chunking - Split large texts into manageable chunks (sentence-based)
- Summarization - Send each chunk to Azure OpenAI with clear instructions
- Combination - If multiple chunks, combine and re-summarize
- Output - Return summary with compression statistics
Key Features
Smart Chunking
- Sentence-based chunking preserves context better than character-based
- Each chunk is processed independently for better quality
- Multiple summaries are combined and re-summarized
Clear Prompts
- Explicit instruction to create REAL summaries, not copies
- Target length guidance (short/medium/detailed)
- Tone customization
- Lower temperature (0.5) for more focused output
Compression Tracking
- Input and output token counts
- Compression ratio shows effectiveness
- Helps optimize summary type selection
Error Handling
- Empty responses → fallback to first sentences
- API errors → logged with fallback
- Invalid input → clear error messages
Dependencies
openai>=1.0.0- For Azure OpenAI API
License
MIT License - See LICENSE file
Version
1.0.0 - Clean, focused implementation
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file text_summarizer_gi-1.0.0.tar.gz.
File metadata
- Download URL: text_summarizer_gi-1.0.0.tar.gz
- Upload date:
- Size: 9.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cf4eba9e8e5f5aaeb7a17bdc3050ad4070c4422c844318fbd9a68eb769dd1b74
|
|
| MD5 |
1ae3bbc58a5bb1cd5dc541fb6bec8395
|
|
| BLAKE2b-256 |
c5fe8a688101885aaf94991c65ad66d37ee1280144d54001c8ec1dd5d67269eb
|
File details
Details for the file text_summarizer_gi-1.0.0-py3-none-any.whl.
File metadata
- Download URL: text_summarizer_gi-1.0.0-py3-none-any.whl
- Upload date:
- Size: 9.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d2fdfcd79074c60d84bbfe79f10624f4dc4af31021510c3586ce51bf1cb3dfd8
|
|
| MD5 |
53c7d1e6b8c4a2936ac2512ecb9f703d
|
|
| BLAKE2b-256 |
b1ff59e061170ff113e54566b74ab897d29da0ec8c3570c6b1bf08369c9c6d69
|