A flexible text summarization library to summarize long documents supporting multiple LLM providers

These details have not been verified by PyPI

Project description

long2short

long2short is a flexible Python library for long document text summarization that supports multiple Language Model (LLM) providers. It allows you to summarize long documents with fine-grained control over the level of detail. With an extensible architecture, it’s easy to integrate with various LLMs and customize its behavior.

Features

Multi-LLM Support: Compatible with OpenAI, Anthropic, and custom LLM providers.
Detail Control: Adjust the level of detail in the summary with a simple parameter.
Smart Chunking: Automatically splits and processes large texts based on token limits.
Recursive Summarization: Uses previous summaries as context for summarizing subsequent sections.
Custom Instructions: Add domain-specific instructions for tailored summarization.
Progress Tracking: Visualize progress with tqdm.
Extensible Design: Add new LLM providers or customize existing ones with ease.

Installation

Install the library using pip:

pip install long2short

Quick Start

Here’s how to get started with long2short using OpenAI as the LLM provider:

from long2short import Long2Short, OpenAIProvider

# Initialize the provider
provider = OpenAIProvider(api_key="your-api-key")
summarizer = Long2Short(provider)

# Summarize text
text = "Your long text here..."
summary = summarizer.summarize(text, detail=0.5)
print(summary)

Using Different Providers

OpenAI

To use OpenAI’s GPT models:

from long2short import Long2Short, OpenAIProvider

provider = OpenAIProvider(
    api_key="your-openai-api-key",
    model="gpt-4-turbo"  # Specify your preferred model
)
summarizer = Long2Short(provider)

Anthropic (Claude)

To use Anthropic’s Claude models:

from long2short import Long2Short, AnthropicProvider

provider = AnthropicProvider(
    api_key="your-anthropic-api-key",
    model="claude-3-opus-20240229"  # Specify your preferred model
)
summarizer = Long2Short(provider)

Controlling Summary Detail

The detail parameter allows you to adjust how detailed the summary should be:

# Generate a brief, high-level summary
brief_summary = summarizer.summarize(text, detail=0)

# Generate a detailed, in-depth summary
detailed_summary = summarizer.summarize(text, detail=1)

Advanced Features

Recursive Summarization

Enable recursive summarization to use previous summaries as context for generating new ones:

summary = summarizer.summarize(
    text,
    detail=0.5,
    summarize_recursively=True
)

Custom Instructions

Tailor the summary with additional instructions:

summary = summarizer.summarize(
    text,
    detail=0.5,
    additional_instructions="Focus on numerical data and statistics."
)

Smart Text Chunking

Large texts are automatically split into manageable chunks based on token limits, ensuring efficient processing. You can control:

Minimum chunk size (minimum_chunk_size)
Chunk delimiters (chunk_delimiter)
Headers for each chunk (header)

Example:

summary = summarizer.summarize(
    text,
    detail=0.7,
    minimum_chunk_size=500,
    chunk_delimiter=".",
    header="Section Summary"
)

Verbose Output

Enable detailed logging to track the summarization process:

summary = summarizer.summarize(
    text,
    detail=0.5,
    verbose=True
)

Handling Dropped Chunks

The library ensures that excessively large chunks are skipped, and any dropped chunks are logged (if verbose mode is enabled). This prevents token overflow issues while maintaining efficient processing.

Creating Custom Providers

You can implement custom LLM providers by extending the LLMProvider abstract base class:

from long2short import LLMProvider

class CustomProvider(LLMProvider):
    def __init__(self, **kwargs):
        # Initialize your provider
        pass

    def generate_completion(self, messages: list, **kwargs) -> str:
        # Implement completion generation logic
        return "Custom completion response"

Integrate the custom provider into Long2Short:

custom_provider = CustomProvider()
summarizer = Long2Short(custom_provider)

Progress Tracking

The summarization process supports tqdm for real-time progress tracking:

summary = summarizer.summarize(
    text,
    detail=0.5,
    verbose=True
)

Extensibility

Adding New Features

Extend functionality by overriding or extending the Long2Short class.
Customize tokenization or chunking behavior by modifying Tokenizer or TextChunker classes.

Contributing

Contributions are welcome! Whether it’s reporting a bug, suggesting new features, or submitting a pull request, your help is appreciated.

To contribute:

Fork the repository.
Create a feature branch.
Submit a pull request.

Example Usage

from long2short import Long2Short, OpenAIProvider

# Initialize with OpenAI
provider = OpenAIProvider(api_key="your-api-key")
summarizer = Long2Short(provider)

# Summarize with custom instructions
text = "Your long document here..."
summary = summarizer.summarize(
    text,
    detail=0.8,
    additional_instructions="Focus on the key takeaways and technical details."
)

print("Summary:")
print(summary)

Attribution

This project heavily references code and ideas from the OpenAI Cookbook.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.1.4

Jan 23, 2025

0.1.3

Jan 23, 2025

0.1.2

Jan 21, 2025

0.1.1

Jan 21, 2025

This version

0.1.0

Jan 21, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

long2short-0.1.0.tar.gz (6.7 kB view details)

Uploaded Jan 21, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

long2short-0.1.0-py3-none-any.whl (7.5 kB view details)

Uploaded Jan 21, 2025 Python 3

File details

Details for the file long2short-0.1.0.tar.gz.

File metadata

Download URL: long2short-0.1.0.tar.gz
Upload date: Jan 21, 2025
Size: 6.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for long2short-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`36ebd96023c0ce6d1daca19c3c6a4176e401b84204bf6a5a8bf8cafb7bfe7a3b`
MD5	`91a319f05b19e7b3ca3e0c64af255c2c`
BLAKE2b-256	`d74adca5d56ac05b25baa513e0b1050f70803a4bdf279a15cfa246aad376e28b`

See more details on using hashes here.

File details

Details for the file long2short-0.1.0-py3-none-any.whl.

File metadata

Download URL: long2short-0.1.0-py3-none-any.whl
Upload date: Jan 21, 2025
Size: 7.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for long2short-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`273dd8abb90697f00d22f4cb6cccb510e5ed77983135d6d58c6e84dc83db2837`
MD5	`c415ca0c7a987cc99c2ea7c6ac0341c9`
BLAKE2b-256	`718ccc313ed919cb3dc6b6a56116162cb878f54c4bb1c784746fe0d89844ea1c`

See more details on using hashes here.

long2short 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

long2short

Features

Installation

Quick Start

Using Different Providers

OpenAI

Anthropic (Claude)

Controlling Summary Detail

Advanced Features

Recursive Summarization

Custom Instructions

Smart Text Chunking

Verbose Output

Handling Dropped Chunks

Creating Custom Providers

Progress Tracking

Extensibility

Adding New Features

Contributing

Example Usage

Attribution

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes