Agent for extracting structured content from PDFs using LangGraph
PDFAgent
An agent for extracting structured content from PDFs using LangGraph and OpenAI.
Features
- Extract and format text content from PDFs
- Convert tables to markdown format
- Extract images with AI-generated descriptions
- Use LangGraph for agent-based orchestration
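To make the table feature concrete, here is a minimal sketch of what converting a table to Markdown can look like. It is an illustration only, not pdf_mind's implementation: the function name `rows_to_markdown` and the input shape (a list of rows with the header first) are assumptions made for this example.

```python
def rows_to_markdown(rows):
    """Render a list of rows (first row is the header) as a Markdown table.

    Hypothetical helper for illustration; not part of pdf_mind.
    """
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row):
        # Escape pipes so a cell cannot break the column layout,
        # and pad short rows so every line has the same number of cells.
        cells = [str(cell).replace("|", "\\|") for cell in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    separator = "| " + " | ".join(["---"] * width) + " |"
    return "\n".join([fmt(header), separator, *(fmt(row) for row in body)])
```

For example, `rows_to_markdown([["Name", "Pages"], ["report.pdf", 12]])` produces a header line, a `---` separator line, and one body line.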
Setup
# Install Poetry if you don't have it
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install
# Or install from PyPI
pip install pdf_mind
# Install system dependencies (macOS, via Homebrew)
brew install ghostscript
brew install poppler
# On Debian/Ubuntu, the equivalent is:
# apt install ghostscript poppler-utils
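If extraction fails right after installation, it may help to check that the system tools are actually on your PATH. The sketch below assumes the usual executable names: `gs` for Ghostscript and `pdftoppm` for Poppler. Whether pdf_mind calls these exact binaries is an assumption, and `missing_tools` is a hypothetical helper, not part of the package.

```python
import shutil

# Assumed executable names: Ghostscript usually installs `gs`,
# and Poppler ships command-line tools such as `pdftoppm`.
REQUIRED_TOOLS = ("gs", "pdftoppm")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools()` returns an empty list when both executables are found, and otherwise the names that still need to be installed.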
Usage
from pdf_mind import PDFExtractionAgent
agent = PDFExtractionAgent()
result = agent.process("path/to/document.pdf")
print(result)
Alternatively, see example.py for an example that outputs metadata on the extracted items and token usage.
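To keep the extracted content instead of printing it, the call shown above can be wrapped in a small helper. This is a sketch under a few assumptions: it relies only on the `process` method from the usage snippet, it treats the result as something `str()` can render (the snippet prints it, but its exact type is not documented here), and the names `extract_to_file` and `run` are hypothetical.

```python
from pathlib import Path


def extract_to_file(agent, pdf_path, out_path):
    """Run agent.process() on a PDF and write the result to out_path.

    `agent` is any object with a process(path) method, such as
    PDFExtractionAgent from the usage snippet.
    """
    pdf = Path(pdf_path)
    if not pdf.is_file():
        # Fail early with a clear message before invoking the agent.
        raise FileNotFoundError(f"No such PDF: {pdf}")
    result = agent.process(str(pdf))
    out = Path(out_path)
    # Assumption: the result has a useful string form, since the README prints it.
    out.write_text(str(result), encoding="utf-8")
    return out


def run(pdf_path, out_path="output.md"):
    """Convenience wrapper; imports pdf_mind only when actually called."""
    from pdf_mind import PDFExtractionAgent

    return extract_to_file(PDFExtractionAgent(), pdf_path, out_path)
```

With pdf_mind installed, `run("path/to/document.pdf")` would write the result to `output.md` in the current directory. Passing the agent in as a parameter keeps `extract_to_file` easy to exercise with a stand-in object, without calling the OpenAI API.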
Development
# Run tests
poetry run pytest
# Lint and format code
poetry run ruff check .
poetry run black .