Agent for extracting structured content from PDFs using LangGraph

PDFAgent

An agent for extracting structured content from PDFs using LangGraph and OpenAI.

Features

  • Extract and format text content from PDFs
  • Convert tables to markdown format
  • Extract images with AI-generated descriptions
  • Use LangGraph for agent-based orchestration
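
The package does not document how its table conversion works internally. To show what "convert tables to markdown" means in practice, here is a minimal, dependency-free sketch. It assumes a table has already been extracted as a list of rows with the header row first. `rows_to_markdown` is a hypothetical helper, not part of `pdf_mind`.

```python
def rows_to_markdown(rows: list[list[str]]) -> str:
    """Render rows (first row = header) as a GitHub-flavored markdown table."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad short rows so every line has the same number of cells
    padded = [list(row) + [""] * (width - len(row)) for row in rows]

    def fmt(cells: list[str]) -> str:
        # Flatten newlines and escape pipes so cell text cannot break the layout
        cleaned = [str(c).replace("\n", " ").replace("|", "\\|").strip() for c in cells]
        return "| " + " | ".join(cleaned) + " |"

    header, *body = padded
    separator = "| " + " | ".join(["---"] * width) + " |"
    return "\n".join([fmt(header), separator] + [fmt(row) for row in body])
```

For example, `rows_to_markdown([["Name", "Qty"], ["Apple", "3"]])` returns the three lines `| Name | Qty |`, `| --- | --- |` and `| Apple | 3 |`.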

Setup

# Install Poetry if you don't have it
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies (from a source checkout)
poetry install
# Or install from PyPI
pip install pdf_mind

# Install system dependencies (macOS, via Homebrew)
brew install ghostscript
brew install poppler
# On Debian/Ubuntu: sudo apt install ghostscript poppler-utils
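
After installing the system packages, you can check that their command-line tools are on your PATH. This is a hedged sketch using only the standard library. It assumes the relevant executables are `gs` (installed by Ghostscript) and `pdftoppm` (installed by poppler) on macOS and Linux; which poppler tools `pdf_mind` actually calls is not documented here, and `check_system_dependencies` is a hypothetical helper, not part of the package.

```python
import shutil

# Assumed executable names: Ghostscript installs `gs`,
# poppler installs command-line tools such as `pdftoppm`.
REQUIRED_TOOLS = {"ghostscript": "gs", "poppler": "pdftoppm"}


def check_system_dependencies(tools: dict[str, str] = REQUIRED_TOOLS) -> dict[str, bool]:
    """Return {package name: True if its executable is found on PATH}."""
    return {package: shutil.which(exe) is not None for package, exe in tools.items()}


if __name__ == "__main__":
    missing = [pkg for pkg, found in check_system_dependencies().items() if not found]
    if missing:
        print("Missing system packages:", ", ".join(missing))
    else:
        print("All system dependencies found.")
```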

Usage

from pdf_mind import PDFExtractionAgent

agent = PDFExtractionAgent()
# Extract text, tables, and images from a single PDF
result = agent.process("path/to/document.pdf")
print(result)
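
The usage example processes a single file. To run a whole folder of PDFs, a small loop is enough. The sketch below relies only on the `process(path)` method shown above and makes no assumptions about what it returns; `process_folder` and `SupportsProcess` are hypothetical names, not part of `pdf_mind`.

```python
from pathlib import Path
from typing import Any, Protocol


class SupportsProcess(Protocol):
    """Anything with the process(path) method shown in the usage example."""

    def process(self, path: str) -> Any: ...


def process_folder(agent: SupportsProcess, folder: str) -> dict[str, Any]:
    """Call agent.process on each *.pdf directly inside folder, keyed by file name."""
    results: dict[str, Any] = {}
    # sorted() gives a deterministic processing order
    for pdf in sorted(Path(folder).glob("*.pdf")):
        results[pdf.name] = agent.process(str(pdf))
    return results


# Usage (requires pdf_mind to be installed):
# from pdf_mind import PDFExtractionAgent
# results = process_folder(PDFExtractionAgent(), "path/to/pdfs")
```

Accepting any object with a `process` method (rather than importing `PDFExtractionAgent` directly) keeps the helper easy to test with a stand-in agent.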

Alternatively, see example.py for an example that also outputs metadata about the extracted items and token usage.

Development

# Run tests
poetry run pytest

# Lint and format code
poetry run ruff check .
poetry run black .

Download files

Source Distribution

pdf_mind-0.1.1.tar.gz (10.1 kB)

Built Distribution

pdf_mind-0.1.1-py3-none-any.whl (14.0 kB)

File details

Details for the file pdf_mind-0.1.1.tar.gz.

File metadata

  • Download URL: pdf_mind-0.1.1.tar.gz
  • Size: 10.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.10.16 Linux/6.8.0-1021-azure

File hashes

Hashes for pdf_mind-0.1.1.tar.gz

  • SHA256: 6930ca6fb96470c3d66fd008afe9003b7332f9c2833c7a93b20aa30da69e23aa
  • MD5: 892593b2ae1985a96b416c6a2ca47959
  • BLAKE2b-256: ddf0c10b33a520d98187fbd7539ab6a7dac58d747bcd4fb8b0f51b272e2afbbe

File details

Details for the file pdf_mind-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pdf_mind-0.1.1-py3-none-any.whl
  • Size: 14.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.10.16 Linux/6.8.0-1021-azure

File hashes

Hashes for pdf_mind-0.1.1-py3-none-any.whl

  • SHA256: 9fd6e374d9df02d77dfc8a29e82d775673998edce04ece2c91360f44e74d5c85
  • MD5: 628c1d2f896a7da8fce398084c5c8042
  • BLAKE2b-256: 592d9df4d09fb1309c631a0e0c0f29b793126716e289822fa9be931269fc7d53
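
To check a downloaded file against the SHA256 digests listed above, Python's standard `hashlib` module is enough. This is a minimal sketch: `sha256_of` and `verify` are hypothetical helper names, and the expected digests are copied from the hash listings on this page. pip can also enforce hashes itself through its hash-checking mode (`--require-hashes`).

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 digests published on this page for pdf_mind 0.1.1
EXPECTED_SHA256 = {
    "pdf_mind-0.1.1.tar.gz": "6930ca6fb96470c3d66fd008afe9003b7332f9c2833c7a93b20aa30da69e23aa",
    "pdf_mind-0.1.1-py3-none-any.whl": "9fd6e374d9df02d77dfc8a29e82d775673998edce04ece2c91360f44e74d5c85",
}


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str | Path) -> bool:
    """Return True only if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

For example, `verify("pdf_mind-0.1.1-py3-none-any.whl")` returns `True` only for an untampered copy of that wheel; files with unknown names always return `False`.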
