Skip to main content

Lightweight text summarizer with intelligent chunking and token counting for multiple LLM providers

Project description

Text Summarizer - Minimal & Focused

A simple, production-ready text summarization library.

Input: Long text
Output: Concise summary

Quick Start

from text_summarizer_gi import TextSummarizer

summarizer = TextSummarizer()
result = summarizer.summarize("Your long text here...")
print(result.summary)

Setup

# 1. Install dependencies
pip install openai

# 2. Set Azure credentials
export AZURE_OPENAI_API_KEY="your-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o-mini"

Features

  • ✓ Simple, focused API
  • ✓ Intelligent text chunking (sentence-based)
  • ✓ Token counting
  • ✓ Multiple summary types: short, medium, detailed
  • ✓ Multiple tones: neutral, formal, casual
  • ✓ Compression tracking
  • ✓ Error handling with fallbacks

Basic Usage

Simple Summarization

from text_summarizer_gi import TextSummarizer

summarizer = TextSummarizer()

text = "Your long text..."

# Short summary (25% of original)
result = summarizer.summarize(text, summary_type="short")
print(result.summary)

# Medium summary (40% of original)
result = summarizer.summarize(text, summary_type="medium")
print(result.summary)

# Detailed summary (60% of original)
result = summarizer.summarize(text, summary_type="detailed")
print(result.summary)

With Different Tones

# Formal tone
result = summarizer.summarize(text, tone="formal")

# Casual tone
result = summarizer.summarize(text, tone="casual")

# Neutral tone (default)
result = summarizer.summarize(text, tone="neutral")

Token Counting

from text_summarizer_gi import count_tokens

tokens = count_tokens("Your text...")
print(f"Token count: {tokens}")

Text Chunking

from text_summarizer_gi import chunk_text, chunk_text_by_sentences

# Character-based chunking
chunks = chunk_text("Your text...", chunk_size=3000)

# Sentence-based chunking (better for summarization)
chunks = chunk_text_by_sentences("Your text...")

Result Object

result = summarizer.summarize(text)

result.summary              # The summarized text
result.input_tokens         # Input token count
result.output_tokens        # Output token count
result.compression_ratio    # Output reduction percentage

Testing

# Run basic test
python test_basic.py

# Check imports
python -c "from text_summarizer_gi import TextSummarizer; print('✓ OK')"

Project Structure

text_summarizer_gi/
├── __init__.py           # Exports
├── summarizer.py         # Main TextSummarizer class
├── prompts.py            # Summarization prompts
├── chunking.py           # Text chunking utilities
├── token_counter.py      # Token counting
└── utils.py              # Helper functions

test_basic.py            # Basic test
pyproject.toml           # Package config
README.md                # This file
LICENSE                  # MIT License

How It Works

  1. Input Validation - Check text is not empty
  2. Token Counting - Count tokens in input
  3. Chunking - Split large texts into manageable chunks (sentence-based)
  4. Summarization - Send each chunk to Azure OpenAI with clear instructions
  5. Combination - If multiple chunks, combine and re-summarize
  6. Output - Return summary with compression statistics

Key Features

Smart Chunking

  • Sentence-based chunking preserves context better than character-based
  • Each chunk is processed independently for better quality
  • Multiple summaries are combined and re-summarized

Clear Prompts

  • Explicit instruction to create REAL summaries, not copies
  • Target length guidance (short/medium/detailed)
  • Tone customization
  • Lower temperature (0.5) for more focused output

Compression Tracking

  • Input and output token counts
  • Compression ratio shows effectiveness
  • Helps optimize summary type selection

Error Handling

  • Empty responses → fallback to first sentences
  • API errors → logged with fallback
  • Invalid input → clear error messages

Dependencies

  • openai>=1.0.0 - For Azure OpenAI API

License

MIT License - See LICENSE file

Version

1.0.0 - Clean, focused implementation

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

text_summarizer_gi-1.0.0.tar.gz (9.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

text_summarizer_gi-1.0.0-py3-none-any.whl (9.3 kB view details)

Uploaded Python 3

File details

Details for the file text_summarizer_gi-1.0.0.tar.gz.

File metadata

  • Download URL: text_summarizer_gi-1.0.0.tar.gz
  • Upload date:
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for text_summarizer_gi-1.0.0.tar.gz
Algorithm Hash digest
SHA256 cf4eba9e8e5f5aaeb7a17bdc3050ad4070c4422c844318fbd9a68eb769dd1b74
MD5 1ae3bbc58a5bb1cd5dc541fb6bec8395
BLAKE2b-256 c5fe8a688101885aaf94991c65ad66d37ee1280144d54001c8ec1dd5d67269eb

See more details on using hashes here.

File details

Details for the file text_summarizer_gi-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for text_summarizer_gi-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d2fdfcd79074c60d84bbfe79f10624f4dc4af31021510c3586ce51bf1cb3dfd8
MD5 53c7d1e6b8c4a2936ac2512ecb9f703d
BLAKE2b-256 b1ff59e061170ff113e54566b74ab897d29da0ec8c3570c6b1bf08369c9c6d69

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page