Agent for extracting structured content from PDFs using LangGraph
PDFMind
An agent for extracting structured content from PDFs using LangGraph and OpenAI.
Features
- Extract and format text content from PDFs
- Convert tables to markdown format
- Extract images with AI-generated descriptions
- Use LangGraph for agent-based orchestration
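As an illustration of the table-to-markdown feature listed above, the following is a minimal sketch of how extracted rows could be rendered as a markdown table. It is not PDFMind's actual implementation: the function name `table_to_markdown` and the assumption that a table arrives as a list of rows, with the first row as the header, are hypothetical.

```python
from typing import Sequence


def table_to_markdown(rows: Sequence[Sequence[object]]) -> str:
    """Render rows (first row = header) as a markdown table.

    Hypothetical helper for illustration; not part of the PDFMind API.
    """
    if not rows:
        return ""
    # Use the widest row so ragged tables still produce aligned columns
    width = max(len(row) for row in rows)

    def fmt(row: Sequence[object]) -> str:
        # Turn None into empty cells and pad short rows
        cells = ["" if c is None else str(c) for c in row]
        cells += [""] * (width - len(cells))
        # Escape pipes and flatten newlines so cells cannot break the table
        cells = [c.replace("|", "\\|").replace("\n", " ").strip() for c in cells]
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    separator = "| " + " | ".join(["---"] * width) + " |"
    return "\n".join([fmt(header), separator, *(fmt(r) for r in body)])


if __name__ == "__main__":
    print(table_to_markdown([["Item", "Qty"], ["Widget", 2], ["Gadget", None]]))
```

Escaping `|` and flattening newlines inside cells keeps the output valid even when the extracted cell text is messy, which is common with tables pulled from PDFs.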
Setup
# Install Poetry if you don't have it
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install
# Install system dependencies (macOS)
brew install ghostscript
brew install poppler
# On Debian/Ubuntu, use apt instead (poppler is packaged as poppler-utils)
# sudo apt install ghostscript poppler-utils
Usage
from pdf_agent import PDFExtractionAgent
agent = PDFExtractionAgent()
result = agent.process("path/to/document.pdf")
print(result)
Alternatively, see example.py for an example that also outputs metadata on the extracted items and token usage.
Development
# Run tests
poetry run pytest
# Lint code
poetry run ruff check .
poetry run black .