readpdf

Convert PDFs to text for token-efficient AI agent reading.

Why

Reading a PDF through a vision model costs roughly 100–500 tokens per page, and the whole document is processed every time. Converting it to text first lets any AI agent read only the sections it needs, which can save up to 90% of tokens.

Install

pip install readpdf

Requires pdftotext, which ships with Poppler:

# macOS
brew install poppler

# Linux (Debian/Ubuntu)
sudo apt install poppler-utils

Usage

# Convert to file, then read selectively
readpdf paper.pdf -o paper.txt

# Extract a single page
readpdf paper.pdf -o paper.txt -p 3

# Extract a page range
readpdf paper.pdf -o paper.txt --pages 3-7

# Print to stdout
readpdf paper.pdf

Works with any AI agent that has shell access: Claude, GPT-4, Gemini, Cursor, etc.

How it works

Without readpdf:
  PDF → AI vision → ~200 tokens/page × 30 pages = 6,000 tokens (full doc, every time)

With readpdf:
  PDF → pdftotext → paper.txt (on disk, 0 tokens)
  AI reads only offset/limit chunks it needs → ~300 tokens total
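
The arithmetic in the comparison above can be checked with a short sketch. The per-page and per-read figures are the illustrative numbers from the diagram, not measurements, and the function names are hypothetical:

```python
def vision_cost(pages: int, tokens_per_page: int = 200) -> int:
    """Tokens spent when every page of the PDF goes through vision."""
    return pages * tokens_per_page


def savings(full_cost: int, selective_cost: int) -> float:
    """Fraction of tokens saved by reading only the needed text chunks."""
    return 1 - selective_cost / full_cost


full = vision_cost(30)   # 30 pages at ~200 tokens each
selective = 300          # illustrative cost of reading a few text chunks
print(full, selective, f"{savings(full, selective):.0%}")  # 6000 300 95%
```

Actual savings depend on how much of the document the agent ends up reading.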

Step 1. readpdf paper.pdf -o paper.txt runs pdftotext locally — no AI tokens consumed.
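
As a rough illustration of Step 1, a wrapper around pdftotext might look like the sketch below. This is not readpdf's actual source: the function names are hypothetical, and mapping the page options to pdftotext's -f (first page) and -l (last page) flags is an assumption about how such a tool could work.

```python
import subprocess
from typing import List, Optional


def build_pdftotext_cmd(
    pdf_path: str,
    out_path: str = "-",               # "-" tells pdftotext to write to stdout
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
) -> List[str]:
    """Build the pdftotext argument list (hypothetical helper)."""
    cmd = ["pdftotext"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to convert
    cmd += [pdf_path, out_path]
    return cmd


def convert(pdf_path: str, out_path: str, **pages: Optional[int]) -> None:
    """Run pdftotext locally; raises CalledProcessError on failure."""
    subprocess.run(build_pdftotext_cmd(pdf_path, out_path, **pages), check=True)


# Example (requires pdftotext on PATH and an existing paper.pdf):
# convert("paper.pdf", "paper.txt", first_page=3, last_page=7)
```

The conversion happens entirely in a local subprocess, which is why it consumes no AI tokens.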

Step 2. The AI uses a file-reading tool (e.g. Read with offset/limit) to load only the relevant lines. Because the text file already exists on disk, the AI never pays the cost of processing the entire PDF.
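
The offset/limit read in Step 2 can be sketched in a few lines. An agent's built-in file-reading tool will differ in detail; read_slice and find_lines are hypothetical helpers, and zero-based line offsets are an assumption made here for simplicity.

```python
from itertools import islice
from typing import List


def read_slice(path: str, offset: int = 0, limit: int = 50) -> List[str]:
    """Return up to `limit` lines starting at zero-based line `offset`."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        # islice skips to `offset` without building a list of the whole file
        return [line.rstrip("\n") for line in islice(fh, offset, offset + limit)]


def find_lines(path: str, needle: str) -> List[int]:
    """Zero-based numbers of lines containing `needle` (case-insensitive)."""
    needle = needle.lower()
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [i for i, line in enumerate(fh) if needle in line.lower()]


# Example: locate a heading, then load only the lines around it.
# hits = find_lines("paper.txt", "results")
# chunk = read_slice("paper.txt", offset=hits[0], limit=40) if hits else []
```

Searching first and then reading a small window keeps the amount of text entering the context proportional to what the agent actually needs.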

Why not MCP? An MCP tool result is returned in full into the AI's context window, so it costs the same as reading the content directly. A file on disk lets the AI pull exactly the slice it needs and nothing more.

License

MIT
