High-performance PDF-to-structured-data extraction — Rust engine, Python interface
edgeparse
High-performance PDF-to-structured-data extraction for Python — powered by a Rust engine via PyO3.
Install
pip install edgeparse
Pre-built wheels are available for macOS, Linux (x86_64, arm64), and Windows (x64). No system dependencies or compilation required.
Quick start
import edgeparse
# Convert a PDF to Markdown
result = edgeparse.convert("document.pdf")
print(result.markdown)
# Convert with options
result = edgeparse.convert(
    "document.pdf",
    format="markdown",  # "markdown" | "json" | "html"
    extract_images=False,
    page_range=None,  # None = all pages, or [0, 5] for pages 1–6
)
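The `page_range` comment above implies zero-based, inclusive bounds (`[0, 5]` covers pages 1 to 6). A minimal sketch of a helper that maps human-facing page numbers to that form follows; `to_page_range` and `convert_pages` are hypothetical names introduced here, not part of edgeparse, and the inclusive-bounds reading is an assumption drawn only from that comment.

```python
def to_page_range(first_page: int, last_page: int) -> list[int]:
    """Map 1-based inclusive page numbers to a zero-based [start, end] pair.

    Assumes page_range is zero-based and inclusive, as the "[0, 5] for
    pages 1-6" comment in the quick start suggests.
    """
    if first_page < 1:
        raise ValueError("first_page must be 1 or greater")
    if last_page < first_page:
        raise ValueError("last_page must not be smaller than first_page")
    return [first_page - 1, last_page - 1]


def convert_pages(path: str, first_page: int, last_page: int):
    """Hypothetical wrapper: convert only the given 1-based page span."""
    import edgeparse  # imported lazily so the helper above works without it

    return edgeparse.convert(path, page_range=to_page_range(first_page, last_page))
```

With this helper, `convert_pages("document.pdf", 1, 6)` would pass `page_range=[0, 5]` to `edgeparse.convert`.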
CLI
edgeparse document.pdf # → Markdown on stdout
edgeparse document.pdf --format json # → JSON
edgeparse /path/to/dir/ --output-dir out/ # batch convert
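The batch mode can be approximated from Python with the documented `edgeparse.convert` call. This is a sketch, not the CLI's actual implementation: the non-recursive `*.pdf` match and the `.md` output naming are assumptions, and `convert_directory` and `edgeparse_markdown` are hypothetical helper names.

```python
from pathlib import Path
from typing import Callable


def convert_directory(
    src_dir: str, out_dir: str, convert: Callable[[str], str]
) -> list[Path]:
    """Convert every *.pdf directly inside src_dir and write one .md per file.

    `convert` takes a PDF path and returns Markdown text. It is passed in
    so the loop can be exercised without edgeparse installed.
    """
    src = Path(src_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for pdf in sorted(src.glob("*.pdf")):  # assumption: no recursion
        target = out / (pdf.stem + ".md")
        target.write_text(convert(str(pdf)), encoding="utf-8")
        written.append(target)
    return written


def edgeparse_markdown(path: str) -> str:
    """Adapter around the documented edgeparse.convert(...).markdown."""
    import edgeparse  # imported lazily; only needed for real conversions

    return edgeparse.convert(path).markdown
```

Calling `convert_directory("/path/to/dir", "out", edgeparse_markdown)` would then mirror the batch command above under those assumptions.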
Performance
edgeparse consistently leads open benchmarks for PDF-to-Markdown extraction quality across 200-document test suites.
Download files
Source Distribution
Built Distribution
File details
Details for the file edgeparse-0.1.0.tar.gz.
File metadata
- Download URL: edgeparse-0.1.0.tar.gz
- Upload date:
- Size: 730.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b6f6715486cad6004d4248be35c265f7c41f700b4e60333ecf61f7ca468a26d1 |
| MD5 | f3c9cfa32b79ca43d32cf30b00ffc66a |
| BLAKE2b-256 | 52a0e1c357770d1ebb7038b71ecf996db44cf2784c04cb2cf6e0e2984cbe47a8 |
File details
Details for the file edgeparse-0.1.0-cp312-cp312-macosx_11_0_arm64.whl.
File metadata
- Download URL: edgeparse-0.1.0-cp312-cp312-macosx_11_0_arm64.whl
- Upload date:
- Size: 1.6 MB
- Tags: CPython 3.12, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6547066869986d8f672adc01538842b36293b3a68f504abcb5274b3086ff4ec8 |
| MD5 | 9bc8f9047f0350de2d5152adcbaa0db5 |
| BLAKE2b-256 | 69ad938beffefe6a5ae8a135f3169e1675daf1a70a91a120f002c69e160c6c3c |
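The SHA256 digests listed above can be checked against a downloaded file with the standard library. A minimal sketch, assuming the file has already been saved locally; the local path in the usage note is a placeholder, and `sha256_of` and `matches_digest` are hypothetical helper names.

```python
import hashlib

# SHA256 digest listed above for edgeparse-0.1.0.tar.gz
SDIST_SHA256 = "b6f6715486cad6004d4248be35c265f7c41f700b4e60333ecf61f7ca468a26d1"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 with an expected hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_digest("edgeparse-0.1.0.tar.gz", SDIST_SHA256)` should return `True` for an intact download of the source distribution.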