Agent for extracting structured content from PDFs using LangGraph

PDFMind

An agent for extracting structured content from PDFs using LangGraph and OpenAI.

Features

  • Extract and format text content from PDFs
  • Convert tables to markdown format
  • Extract images with AI-generated descriptions
  • Use LangGraph for agent-based orchestration
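To make the table feature concrete, here is a minimal sketch of turning extracted rows into a markdown table. It is not PDFMind's implementation: the helper name `rows_to_markdown` and the list-of-rows input shape are assumptions made for illustration only.

```python
def rows_to_markdown(rows):
    """Render a list of rows (first row is the header) as a markdown table.

    Hypothetical helper for illustration; not part of PDFMind's API.
    """
    if not rows:
        return ""
    # Pad short rows so every row has the same number of cells.
    width = max(len(row) for row in rows)

    def fmt(row):
        # Escape pipes so cell text cannot break the table layout.
        cells = [str(c).replace("|", "\\|").strip() for c in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    separator = "| " + " | ".join(["---"] * width) + " |"
    return "\n".join([fmt(header), separator, *(fmt(r) for r in body)])


table = [["Item", "Qty"], ["Widget", 2], ["Gadget", 10]]
print(rows_to_markdown(table))
```

Escaping `|` inside cells matters because an unescaped pipe would be read as a column boundary.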

Setup

# Install Poetry if you don't have it
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies
poetry install

# Install system dependencies (macOS)
brew install ghostscript
brew install poppler
# On Debian/Ubuntu, use apt instead:
# sudo apt install ghostscript poppler-utils
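After installing, it can help to confirm that the system tools are actually on your `PATH`. The sketch below is an assumption-laden check, not something PDFMind ships: it assumes Ghostscript provides a `gs` executable and Poppler provides `pdftoppm` (true of the usual macOS and Linux packages), and the README does not say which executables PDFMind itself invokes.

```python
import shutil

# Assumed executable names (not confirmed by PDFMind): Ghostscript installs
# `gs` and Poppler installs `pdftoppm` on macOS and Linux.
REQUIRED_TOOLS = {"ghostscript": "gs", "poppler": "pdftoppm"}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the package names whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


missing = find_missing_tools()
if missing:
    print("Missing system dependencies:", ", ".join(missing))
else:
    print("All system dependencies found.")
```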

Usage

from pdf_agent import PDFExtractionAgent

agent = PDFExtractionAgent()
result = agent.process("path/to/document.pdf")
print(result)
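To process several documents, one option is to loop over a directory and keep going when a single file fails. This is a sketch under assumptions: `process_directory` is a hypothetical helper, not part of PDFMind, and it makes no assumption about the type `agent.process` returns. The processing function is passed in as a parameter so the helper can be tried without the package installed.

```python
from pathlib import Path


def process_directory(directory, process):
    """Apply `process` to every *.pdf in `directory`.

    Returns (results, errors), both keyed by file name. `process` is any
    callable that takes a path string, e.g. PDFExtractionAgent().process.
    """
    results, errors = {}, {}
    for pdf in sorted(Path(directory).glob("*.pdf")):
        try:
            results[pdf.name] = process(str(pdf))
        except Exception as exc:  # record the failure and continue
            errors[pdf.name] = exc
    return results, errors


# Usage with the agent shown above:
#   from pdf_agent import PDFExtractionAgent
#   agent = PDFExtractionAgent()
#   results, errors = process_directory("path/to", agent.process)
```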

Alternatively, see example.py for an example that also outputs metadata on the extracted items and token usage.

Development

# Run tests
poetry run pytest

# Lint and format code
poetry run ruff check .
poetry run black .
