High-performance PDF-to-structured-data extraction — Rust engine, Python interface

edgeparse

High-performance PDF-to-structured-data extraction for Python — powered by a Rust engine via PyO3.

Install

pip install edgeparse

Pre-built wheels are available for macOS, Linux (x86_64, arm64), and Windows (x64). No system dependencies or compilation required.

Quick start

import edgeparse

# Convert a PDF to Markdown
result = edgeparse.convert("document.pdf")
print(result.markdown)

# Convert with options
result = edgeparse.convert(
    "document.pdf",
    format="markdown",      # "markdown" | "json" | "html"
    extract_images=False,
    page_range=None,        # None = all pages, or [0, 5] for pages 1–6
)
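
The `page_range` comment above suggests zero-based, inclusive page indices (`[0, 5]` covering pages 1 to 6). As a minimal sketch under that assumption, the helper below converts 1-based page numbers into that form. `to_page_range` is a hypothetical name for illustration and is not part of edgeparse.

```python
def to_page_range(first_page: int, last_page: int) -> list[int]:
    """Convert 1-based, inclusive page numbers to a zero-based [start, end] pair.

    Assumes page_range is zero-based and inclusive, as the "[0, 5] for
    pages 1-6" comment in the quick start implies.
    """
    if first_page < 1:
        raise ValueError("first_page must be 1 or greater")
    if last_page < first_page:
        raise ValueError("last_page must not be smaller than first_page")
    return [first_page - 1, last_page - 1]
```

With edgeparse installed, this could be passed along as `edgeparse.convert("document.pdf", page_range=to_page_range(1, 6))`.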

CLI

edgeparse document.pdf                     # → Markdown on stdout
edgeparse document.pdf --format json       # → JSON
edgeparse /path/to/dir/ --output-dir out/  # batch convert
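
A rough Python counterpart to the batch command above might look like the sketch below. It relies only on what the quick start shows: a converter that takes a path and returns an object with a `.markdown` attribute. The `batch_convert` name, the `.md` output naming, and the non-recursive `*.pdf` match are assumptions made for illustration; the CLI's actual batch behaviour is not documented here.

```python
from pathlib import Path
from typing import Callable


def batch_convert(src_dir: str, out_dir: str, convert: Callable[[str], object]) -> list[Path]:
    """Convert every *.pdf directly inside src_dir and write one .md file each.

    `convert` is any callable that takes a file path and returns an object
    with a `.markdown` string, e.g. edgeparse.convert from the quick start.
    """
    source = Path(src_dir)
    output = Path(out_dir)
    output.mkdir(parents=True, exist_ok=True)

    written = []
    for pdf_path in sorted(source.glob("*.pdf")):
        result = convert(str(pdf_path))
        target = output / (pdf_path.stem + ".md")
        target.write_text(result.markdown, encoding="utf-8")
        written.append(target)
    return written
```

Passing the converter as an argument keeps the loop independent of the library; with edgeparse installed, the call would be `batch_convert("/path/to/dir", "out", edgeparse.convert)`.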

Performance

edgeparse consistently ranks first in open benchmarks of PDF-to-Markdown extraction quality, measured on 200-document test suites.

Download files

Source Distribution

edgeparse-0.1.0.tar.gz (730.5 kB)

Uploaded Source

Built Distribution

edgeparse-0.1.0-cp312-cp312-macosx_11_0_arm64.whl (1.6 MB)

Uploaded CPython 3.12, macOS 11.0+ ARM64
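
The wheel filename above follows the standard layout `{distribution}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl`. To make the individual tags explicit, here is a small sketch that splits such a name into its parts. `parse_wheel_filename` and `WheelTags` are hypothetical names for illustration, not part of edgeparse or pip.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No optional build tag present.
        name, version, py_tag, abi_tag, plat_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py_tag, abi_tag, plat_tag = parts
    else:
        raise ValueError("unexpected number of components in: " + filename)
    return WheelTags(name, version, build, py_tag, abi_tag, plat_tag)
```

For the file listed here, the Python and ABI tags are both `cp312` and the platform tag is `macosx_11_0_arm64`, which matches the "CPython 3.12, macOS 11.0+ ARM64" description.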

File details

Details for the file edgeparse-0.1.0.tar.gz.

File metadata

  • Download URL: edgeparse-0.1.0.tar.gz
  • Upload date:
  • Size: 730.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.12.3

File hashes

Hashes for edgeparse-0.1.0.tar.gz
Algorithm Hash digest
SHA256 b6f6715486cad6004d4248be35c265f7c41f700b4e60333ecf61f7ca468a26d1
MD5 f3c9cfa32b79ca43d32cf30b00ffc66a
BLAKE2b-256 52a0e1c357770d1ebb7038b71ecf996db44cf2784c04cb2cf6e0e2984cbe47a8
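
A downloaded file can be checked against the SHA256 digest listed above using only the Python standard library. The sketch below is one way to do that; the function names are illustrative.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, the call would be `matches_sha256("edgeparse-0.1.0.tar.gz", "b6f6715486cad6004d4248be35c265f7c41f700b4e60333ecf61f7ca468a26d1")`, assuming the archive has been downloaded to the current directory.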

File details

Details for the file edgeparse-0.1.0-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for edgeparse-0.1.0-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6547066869986d8f672adc01538842b36293b3a68f504abcb5274b3086ff4ec8
MD5 9bc8f9047f0350de2d5152adcbaa0db5
BLAKE2b-256 69ad938beffefe6a5ae8a135f3169e1675daf1a70a91a120f002c69e160c6c3c
