
Corte

A tool to cut PDFs into pieces and process them with LLMs in batches.

Features

  • Split PDFs into manageable chunks
  • Process chunks with various LLM providers (OpenAI, Anthropic, LangChain models)
  • Support for local models via Ollama and cloud providers via LangChain
  • Batch processing with configurable concurrency
  • Rate limiting for API calls
  • Command-line interface
  • Combine results from multiple chunks

Installation

# Basic installation
pip install corte

# With LangChain support for additional providers
# (extras are quoted so shells like zsh don't expand the brackets)
pip install "corte[langchain]"

# Full installation with all optional dependencies
pip install "corte[all]"

Usage

Command Line Interface

Split a PDF into chunks

corte split document.pdf --chunk-size 5 --output-dir ./chunks
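The --chunk-size flag controls how many pages go into each piece. As a sketch of the page arithmetic only (chunk_ranges is a hypothetical helper for illustration, not part of corte's API):

```python
def chunk_ranges(total_pages, chunk_size):
    """Return (start, end) page ranges, end-exclusive, covering total_pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]

# A 12-page document with --chunk-size 5 yields two full chunks and one remainder:
print(chunk_ranges(12, 5))  # [(0, 5), (5, 10), (10, 12)]
```

The last chunk simply holds whatever pages remain, so no page is dropped when the page count is not a multiple of the chunk size.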

Process a PDF with an LLM

# Using OpenAI directly
corte process document.pdf --provider openai --prompt "Summarize this content" --output results.json

# Using Anthropic via LangChain
corte process document.pdf --provider langchain_anthropic --model claude-3-opus-20240229 --prompt "Summarize this content" --output results.json

# Using local Ollama model
corte process document.pdf --provider ollama --model llama3 --prompt "Summarize this content" --output results.json

# Using Google Gemini via LangChain
corte process document.pdf --provider langchain_google --model gemini-1.5-pro --prompt "Summarize this content" --output results.json
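The schema of results.json is not documented above, so the reader below is a sketch that assumes a JSON array with one object per chunk carrying a "response" field; that field name is an assumption, not corte's confirmed format:

```python
import json

def load_chunk_responses(path):
    """Load per-chunk responses from a corte results file.

    Assumes a JSON array with one object per chunk, each carrying a
    "response" field (an assumption); falls back to the raw entry otherwise.
    """
    with open(path) as f:
        results = json.load(f)
    return [c.get("response", c) if isinstance(c, dict) else c for c in results]
```

Check the actual file produced by your corte version and adjust the field name if it differs.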

Get PDF information

corte info document.pdf

List available providers

corte providers

Check LangChain integration status

corte langchain

Python API

from corte import PDFProcessor, LLMClientFactory, BatchProcessor

# Create processors
pdf_processor = PDFProcessor(chunk_size=5)

# Use direct OpenAI integration
llm_client = LLMClientFactory.create_client("openai")

# Or use LangChain providers
llm_client = LLMClientFactory.create_client("langchain_anthropic", model="claude-3-opus-20240229")

# Or use local Ollama model
llm_client = LLMClientFactory.create_client("ollama", model="llama3")

batch_processor = BatchProcessor(llm_client, pdf_processor)

# Process PDF
results = batch_processor.process_pdf_batch(
    pdf_path="document.pdf",
    prompt="Summarize this content"
)

# Combine results
combined_text = batch_processor.combine_results(results)
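The features list mentions configurable concurrency and rate limiting for API calls. The general pattern behind that combination can be sketched as follows; every name here is hypothetical and stands in for corte's internals, not a copy of them:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class RateLimiter:
    """Allow at most `calls_per_second` calls across all threads."""

    def __init__(self, calls_per_second):
        self.interval = 1.0 / calls_per_second
        self.lock = threading.Lock()
        self.next_slot = 0.0

    def wait(self):
        # Reserve the next available time slot, then sleep until it arrives.
        with self.lock:
            now = time.monotonic()
            self.next_slot = max(self.next_slot, now) + self.interval
            delay = self.next_slot - self.interval - now
        if delay > 0:
            time.sleep(delay)

def process_chunks(chunks, call_llm, max_workers=4, calls_per_second=2.0):
    """Process chunks concurrently while respecting a global rate limit."""
    limiter = RateLimiter(calls_per_second)

    def worker(chunk):
        limiter.wait()
        return call_llm(chunk)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves chunk order, so results line up with the input.
        return list(pool.map(worker, chunks))
```

A thread pool bounded by max_workers caps concurrency, while the shared limiter spaces out the actual API calls regardless of how many workers are waiting.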

Configuration

Set your API keys as environment variables:

# For direct integrations
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"

# LangChain integrations reuse the same OpenAI/Anthropic keys;
# Google Gemini needs its own key
export GOOGLE_API_KEY="your-google-key"

# Ollama runs locally, no API key needed

Or use a .env file in your project directory.
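For the .env route, python-dotenv is the usual loader; as a minimal illustration of what such a loader does (a sketch of the pattern, not corte's actual mechanism):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: KEY=VALUE lines, '#' comments, simple quoting.

    python-dotenv handles full quoting and interpolation rules; this sketch
    only covers the simple KEY="value" case shown above.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: never overwrite variables already set in the shell
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

Real environment variables take precedence here, which matches the common convention of letting the shell override the file.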

Requirements

  • Python 3.8+
  • PyPDF2 for PDF processing
  • OpenAI Python client
  • Anthropic Python client
  • Click for the command-line interface
  • LangChain (optional, for additional provider support)

LangChain Providers

When you install corte[langchain], you get access to:

  • langchain_openai: OpenAI models via LangChain (an alternative to the direct integration)
  • langchain_anthropic: Anthropic Claude models via LangChain
  • langchain_google: Google Gemini models
  • ollama: Local models via Ollama (requires Ollama to be running)

License

MIT License
