Agent for extracting structured content from PDFs using LangGraph
PDFMind
An agent for extracting structured content from PDFs using LangGraph and OpenAI.
Features
- Extract and format text content from PDFs
- Convert tables to markdown format
- Extract images with AI-generated descriptions
- Use LangGraph for agent-based orchestration
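PDFMind's own conversion code is not shown in this description, so the following is only a minimal sketch of what "convert tables to markdown" can look like. It assumes a table arrives as a list of rows whose first row is the header; `table_to_markdown` is a hypothetical helper name, not part of the `pdf_mind` API.

```python
from typing import Sequence


def table_to_markdown(rows: Sequence[Sequence[object]]) -> str:
    """Render rows (first row is the header) as a markdown pipe table."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row: Sequence[object]) -> str:
        # Escape pipes and flatten newlines so a cell cannot break the layout,
        # then pad short rows to the full table width.
        cells = [str(c).replace("|", "\\|").replace("\n", " ") for c in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    separator = "|" + "|".join(["---"] * width) + "|"
    return "\n".join([fmt(header), separator, *(fmt(r) for r in body)])


print(table_to_markdown([["Name", "Qty"], ["Widget", 3], ["Gadget"]]))
```

Padding ragged rows matters in practice because tables extracted from PDFs often contain merged or missing cells.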
Setup
# Install Poetry if you don't have it
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install
# Or install from PyPI
pip install pdf_mind
# Install system dependencies (macOS, via Homebrew)
brew install ghostscript
brew install poppler
# On Debian/Ubuntu instead: sudo apt install ghostscript poppler-utils
N.B.: if you're on macOS, the Ghostscript library may not be found. You can fix that by running:
mkdir -p ~/lib
ln -s "$(brew --prefix gs)/lib/libgs.dylib" ~/lib
See the Camelot docs for more details on installing Ghostscript. It is optional: PDFMind still works without it.
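To see which of the system dependencies above are visible to Python, a small check along these lines may help. This is a sketch, not something shipped with PDFMind: `check_pdf_dependencies` is a hypothetical helper, and it assumes the usual macOS/Linux names (the `gs` executable, the `gs` shared library, and Poppler's `pdftoppm` tool).

```python
import shutil
from ctypes.util import find_library


def check_pdf_dependencies() -> dict:
    """Report which external PDF tools can be located on this machine."""
    return {
        # Ghostscript command-line executable on PATH
        "ghostscript_cli": shutil.which("gs") is not None,
        # Ghostscript shared library (libgs), looked up the way ctypes does
        "ghostscript_lib": find_library("gs") is not None,
        # pdftoppm is one of the command-line tools installed with Poppler
        "poppler_cli": shutil.which("pdftoppm") is not None,
    }


for name, found in check_pdf_dependencies().items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If `ghostscript_lib` is reported missing on macOS while `ghostscript_cli` is found, the symlink into `~/lib` described above is the likely remedy.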
Usage
from pdf_mind import PDFExtractionAgent
agent = PDFExtractionAgent()
result = agent.process("path/to/document.pdf")
print(result)
Alternatively, see example.py, which prints metadata for the extracted items along with token usage.
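Separately from example.py (which is not reproduced here), the single-file call shown above can be extended to a whole folder. The following is a minimal sketch under stated assumptions: `process_folder` is a hypothetical helper, and the only thing it relies on from PDFMind is that `agent.process(path)` takes a file path, as in the usage snippet. It accepts any callable, so it can be tried without the package installed.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Tuple


def process_folder(
    process: Callable[[str], Any], folder: str
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Run `process` on every *.pdf in `folder`; collect results and errors."""
    results: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        try:
            results[pdf.name] = process(str(pdf))
        except Exception as exc:  # keep going if one document fails
            errors[pdf.name] = f"{type(exc).__name__}: {exc}"
    return results, errors


# Intended use with PDFMind (assumes the package and its dependencies are set up):
# from pdf_mind import PDFExtractionAgent
# results, errors = process_folder(PDFExtractionAgent().process, "path/to/pdfs")
```

Collecting errors per file instead of raising on the first failure keeps one malformed PDF from aborting a long batch.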
Development
# Run tests
poetry run pytest
# Lint code
poetry run ruff check .
poetry run black .
Download files
File details
Details for the file pdf_mind-0.1.2.tar.gz.
File metadata
- Download URL: pdf_mind-0.1.2.tar.gz
- Upload date:
- Size: 10.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.10.16 Linux/6.8.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 165a2a47d8d23805c656c0a19c9ddfe0de3728f190b016f90818c5b6054da225 |
| MD5 | 174b81655d2f70ea571358c390d78465 |
| BLAKE2b-256 | 745f264ae87e121c287515175c01a5d56902cf78b9793751e04e9273a57a9ce5 |
File details
Details for the file pdf_mind-0.1.2-py3-none-any.whl.
File metadata
- Download URL: pdf_mind-0.1.2-py3-none-any.whl
- Upload date:
- Size: 14.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.10.16 Linux/6.8.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 436d891e7281d17fb3bb9aea37203066ae8fc7c3201f69081ca9b646768771cb |
| MD5 | 8fdc19472de6014a693c5f215d2b867b |
| BLAKE2b-256 | c27a8a273e97f6e5135c80ec7f9efd0fe3952ea86cfadb36473fddbe5b52f6b4 |
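The digests listed above can be used to verify a downloaded file before installing it. Below is a minimal sketch using only the standard library; `file_digests` and `matches_sha256` are hypothetical helper names, and BLAKE2b-256 is assumed to mean BLAKE2b with a 32-byte digest. MD5 is left out because it is not suitable for integrity checks against tampering.

```python
import hashlib
from pathlib import Path


def file_digests(path: str, chunk_size: int = 1 << 16) -> dict:
    """Compute SHA256 and BLAKE2b-256 hex digests of a file, reading in chunks."""
    hashers = {
        "sha256": hashlib.sha256(),
        "blake2b_256": hashlib.blake2b(digest_size=32),
    }
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_sha256(path: str, expected: str) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return file_digests(path)["sha256"] == expected.strip().lower()


# Example (assumes the sdist was downloaded to the current directory):
# matches_sha256(
#     "pdf_mind-0.1.2.tar.gz",
#     "165a2a47d8d23805c656c0a19c9ddfe0de3728f190b016f90818c5b6054da225",
# )
```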