Lightweight text summarizer with token budgets and smart chunking

gi_text_summarizer

A lightweight, production-ready Python library for summarizing text with any OpenAI-compatible LLM.

Features

✅ Automatic text chunking for long documents
✅ Token budget guardrails (summaries always shorter than input)
✅ Support for multiple summary lengths, tones, output formats
✅ Works with OpenAI, Azure OpenAI, or custom LLM endpoints
✅ Built-in token estimation (no external dependencies)
✅ Production-grade error handling and logging

Installation

pip install gi-text-summarizer

Quick Start

from gi_text_summarizer import TextSummarizer

summarizer = TextSummarizer(api_key="sk-...")

result = summarizer.summarize(
    text="Your long document here...",
    summary_type="medium",    # "short" | "medium" | "detailed"
    tone="neutral",           # "neutral" | "formal" | "casual"
    focus_area="general",
    output_format="text",     # "text" | "bullets" | "json"
)

print(result.summary)
print(result.compression)  # e.g., "68% shorter"
print(result)              # Pretty-printed with all details

API Reference

TextSummarizer

TextSummarizer(
    api_key=None,              # OpenAI/Azure API key (or env vars)
    azure_endpoint=None,       # Azure endpoint
    deployment_name=None,      # Azure deployment
    model="gpt-4o-mini",       # Model name
    provider="auto",           # "auto" | "openai" | "azure" | "custom"
)

summarize()

summarize(
    text: str,                         # Text to summarize
    summary_type: str = "medium",      # "short" (20%) | "medium" (40%) | "detailed" (60%)
    tone: str = "neutral",             # "neutral" | "formal" | "casual"
    focus_area: str = "general",       # Any string
    output_format: str = "text",       # "text" | "bullets" | "json"
    chunk_strategy: str = "character", # "character" | "sentence"
    chunk_size: int = 3000,            # Characters per chunk
) -> SummaryResult

SummaryResult

result.summary          # str: Generated summary
result.input_tokens     # int: Tokens in original
result.output_tokens    # int: Tokens in summary
result.compression      # str: "X% shorter"
result.num_chunks       # int: Number of chunks
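
The "X% shorter" string can be derived from the two token counts. The sketch below is one plausible formatting, not the library's confirmed implementation; `compression_label` is a hypothetical helper name.

```python
def compression_label(input_tokens: int, output_tokens: int) -> str:
    """Format the reduction from input to output as 'X% shorter'.

    Hypothetical helper: the library may round or word this differently.
    """
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    # Share of the original token count that the summary dropped.
    percent = round(100 * (input_tokens - output_tokens) / input_tokens)
    return f"{percent}% shorter"


print(compression_label(1000, 320))  # 68% shorter
```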

Token Budget Guardrails

The library derives an output token cap from the input's token count, so the summary is always shorter than the input. The cap depends on the summary type:

Type      Ratio  Use Case
short     20%    Core idea (3-4 sentences)
medium    40%    Executive summary
detailed  60%    Comprehensive summary
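
As a rough illustration of how such a cap could be computed, the sketch below applies the ratios from the table above to an input token count. The function name `output_token_cap` and the one-token minimum are assumptions made for the example; the library's internal logic may differ.

```python
# Percentages come from the table above; everything else is illustrative.
SUMMARY_PERCENT = {"short": 20, "medium": 40, "detailed": 60}


def output_token_cap(input_tokens: int, summary_type: str = "medium") -> int:
    """Return the largest summary size, in tokens, allowed for this input."""
    if summary_type not in SUMMARY_PERCENT:
        raise ValueError(f"unknown summary_type: {summary_type!r}")
    # Integer arithmetic avoids floating-point surprises on exact ratios.
    cap = input_tokens * SUMMARY_PERCENT[summary_type] // 100
    # Assumed floor: allow at least one token so the model can respond.
    return max(1, cap)


for kind in ("short", "medium", "detailed"):
    print(kind, output_token_cap(1000, kind))
```

With a 1000-token input this prints caps of 200, 400, and 600 tokens.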

Chunking & Multi-Document Summarization

For longer texts:

  1. Text is split into chunks
  2. Each chunk is summarized
  3. Summaries are combined and re-summarized

result = summarizer.summarize(
    text=long_document,
    chunk_strategy="sentence",  # Better quality
    chunk_size=3000,
)
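
The three steps above follow a split, summarize, and combine pattern. The sketch below shows that pattern with two simple chunkers and a caller-supplied `summarize_fn` standing in for the LLM call. All function names here are hypothetical, and the library's own splitting rules are not documented in this README, so treat the details as assumptions.

```python
import re
from typing import Callable, List


def chunk_by_character(text: str, chunk_size: int = 3000) -> List[str]:
    """Cut the text every chunk_size characters, ignoring sentence boundaries."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def chunk_by_sentence(text: str, chunk_size: int = 3000) -> List[str]:
    """Pack whole sentences into chunks of at most chunk_size characters.

    A single sentence longer than chunk_size becomes its own oversized chunk.
    """
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > chunk_size:
            chunks.append(current)   # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def summarize_long(text: str, summarize_fn: Callable[[str], str],
                   chunk_size: int = 3000) -> str:
    """Summarize each chunk, then summarize the joined partial summaries."""
    chunks = chunk_by_sentence(text, chunk_size)
    if len(chunks) <= 1:
        return summarize_fn(text)    # short input: a single pass is enough
    partials = [summarize_fn(chunk) for chunk in chunks]
    return summarize_fn("\n".join(partials))
```

Sentence-based packing never cuts a sentence in half, which is the likely reason the "sentence" strategy tends to give better summaries than fixed character cuts.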

Environment Variables

# OpenAI
export OPENAI_API_KEY="sk-..."

# Azure OpenAI
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o-mini"
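
With `provider="auto"`, the provider is presumably inferred from whichever of these variables are set. The README does not state the precedence, so the sketch below is only one plausible rule (Azure first, then OpenAI); `resolve_provider` is a hypothetical name.

```python
import os
from typing import Mapping


def resolve_provider(env: Mapping[str, str] = os.environ) -> str:
    """Guess the provider from the environment variables listed above.

    Assumed precedence: Azure wins when both its key and endpoint are set.
    """
    if env.get("AZURE_OPENAI_API_KEY") and env.get("AZURE_OPENAI_ENDPOINT"):
        return "azure"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    raise RuntimeError("No API credentials found in the environment")


# Passing a plain dict keeps the example independent of the real environment.
print(resolve_provider({"OPENAI_API_KEY": "sk-..."}))  # openai
```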

Standalone Token Counter

from gi_text_summarizer import count_tokens

tokens = count_tokens("Your text here...")
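
A dependency-free estimate usually relies on a character heuristic. The sketch below uses the common rule of thumb of roughly four characters per token for English text. It is an assumption about how such an estimator might work, not the actual `count_tokens` implementation, and `estimate_tokens` is a hypothetical name.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate a token count from character length alone."""
    if not text:
        return 0
    # Round up so that any non-empty text counts as at least one token.
    return max(1, math.ceil(len(text) / chars_per_token))


print(estimate_tokens("Your text here..."))  # 17 characters -> 5
```

An estimate like this is adequate for budgeting, but it can drift from a real tokenizer's count on code, non-English text, or unusual punctuation.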

Examples

Executive Summary (Formal)

result = summarizer.summarize(
    text=financial_report,
    summary_type="medium",
    tone="formal",
    focus_area="financial",
    output_format="bullets",
)

Technical Highlight (JSON)

result = summarizer.summarize(
    text=documentation,
    summary_type="detailed",
    tone="neutral",
    focus_area="technical",
    output_format="json",
)

Ultra-Concise (20% of input)

result = summarizer.summarize(
    text=research_paper,
    summary_type="short",
)

Publishing to PyPI

pip install build twine
python -m build
twine upload dist/*

License

MIT License.

Download files

Source Distribution

gi_text_summarizer-1.0.0.tar.gz (8.3 kB)

Uploaded Source

Built Distribution

gi_text_summarizer-1.0.0-py3-none-any.whl (8.8 kB)

Uploaded Python 3

File details

Details for the file gi_text_summarizer-1.0.0.tar.gz.

File metadata

  • Download URL: gi_text_summarizer-1.0.0.tar.gz
  • Size: 8.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for gi_text_summarizer-1.0.0.tar.gz
Algorithm    Hash digest
SHA256       a6268aee1f456e9df31c5d5edad3a2040e0650199b601f476695c2c3d0c215fe
MD5          36b1063ce0def0b18f1a2cdfdcadea83
BLAKE2b-256  ee86d089bcc2eef2c59d5cf6f5343d59f19b4b5f62f02f7bee177c2395db305f

File details

Details for the file gi_text_summarizer-1.0.0-py3-none-any.whl.

File hashes

Hashes for gi_text_summarizer-1.0.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       4912b684ee1d3bef3dd70709fa5a057465352ae981f29aadc24d32965d8e131c
MD5          b1136d800bde5b82a0b04d885c034243
BLAKE2b-256  f74c2c1aa980f0c5710dfa02ac735ff5a21d5432be7cafd25cf260f9a3cb1b82
