
Parxy document processing gateway


OneOffTech Parxy

Parxy is a document processing gateway that provides a single interface to multiple document parsing services and exposes a flexible document model suited to different levels of text extraction granularity.

  • Unified API to parse documents with different providers
  • Unified flexible hierarchical document model (page → block → line → span → character)
  • Supports both local libraries (e.g., PyMuPDF, Unstructured) and remote services (e.g., LlamaParse, LLMWhisperer, PdfAct)
  • Extensible: easily integrate new parsers in your own code
  • Trace the execution for debug purposes
  • Pair with evaluation utilities to compare extraction results (coming soon)
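To make the page → block → line → span → character hierarchy concrete, here is a minimal sketch built from plain dataclasses. The class names, the fields, and how text is joined at each level (spans concatenated, lines joined with newlines, blocks separated by a blank line) are assumptions for illustration. They are not Parxy's actual parxy_core.models definitions.

```python
from dataclasses import dataclass, field


# Hypothetical mirror of the page -> block -> line -> span -> character
# hierarchy; names and joining rules are illustrative, not parxy_core.models.
@dataclass
class Span:
    text: str

    @property
    def characters(self) -> list[str]:
        # Character level: the individual characters of the span's text.
        return list(self.text)


@dataclass
class Line:
    spans: list[Span] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "".join(span.text for span in self.spans)


@dataclass
class Block:
    lines: list[Line] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(line.text for line in self.lines)


@dataclass
class Page:
    blocks: list[Block] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Blocks are separated by a blank line.
        return "\n\n".join(block.text for block in self.blocks)


@dataclass
class Document:
    pages: list[Page] = field(default_factory=list)


doc = Document(pages=[Page(blocks=[
    Block(lines=[Line(spans=[Span("Hello, "), Span("world")])]),
    Block(lines=[Line(spans=[Span("Second")]), Line(spans=[Span("block")])]),
])])

print(doc.pages[0].text)
```

Each level derives its text from the level below, so a consumer can pick whichever granularity it needs without the parser producing separate outputs.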

Requirements

  • Python 3.12 or 3.13 (Python 3.14 is under testing).

Next steps

Getting started

Parxy is available both as a standalone command-line tool and as a library. The quickest way to try Parxy is from the command line with uvx.

Use with minimal footprint (fewer drivers supported):

uvx parxy --help

Use all supported drivers:

uvx --from 'parxy[all]' parxy --help

See Supported services for the list of included drivers and the extra to install for each one.

Use on the command line

You can install Parxy with pip or uv, or run it without installation using uvx.

# Install via pip
pip install parxy       # Basic installation
pip install 'parxy[all]'  # All drivers included (quotes keep shells like zsh from expanding the brackets)

# Install via uv
uv add parxy       # Basic installation
uv add parxy --extra all  # All drivers included

# Using uvx
uvx parxy       # Basic installation
uvx --from 'parxy[all]' parxy  # All drivers included

Once installed, you can use the parxy command to:

  • parxy tui: Interactive TUI for comparing multiple parsers side-by-side with diff visualization
  • parxy parse: Extract text content from documents with customizable granularity levels and output formats. Process individual files or entire folders, run multiple drivers at once, and follow progress with progress bars.
  • parxy preview: Interactive document viewer showing metadata, table of contents, and content preview in a scrollable interface
  • parxy markdown: Convert documents into Markdown format, with optional combining of multiple documents
  • parxy pdf:merge: Merge multiple PDF files into one, with support for selecting specific page ranges
  • parxy pdf:split: Split a PDF file into individual pages
  • parxy drivers: List available document processing drivers
  • parxy env: Create a configuration file with default settings
  • parxy docker: Generate a Docker Compose configuration for self-hosted services

Example usage:

# Launch interactive TUI for parser comparison
parxy tui ./documents

# Parse a PDF to markdown
parxy parse --mode markdown document.pdf

# Parse entire folder with JSON output
parxy parse /path/to/pdfs -m json -o output/

# Parse with multiple drivers for comparison
parxy parse document.pdf -d pymupdf -d llamaparse

# Preview document interactively
parxy preview document.pdf

# Convert multiple PDFs to markdown and combine them
parxy markdown --combine -o output/ doc1.pdf doc2.pdf

# Merge multiple PDFs with page ranges
parxy pdf:merge cover.pdf doc1.pdf[1:10] doc2.pdf -o merged.pdf

# Split a PDF into individual pages
parxy pdf:split document.pdf -o ./pages

# List available drivers
parxy drivers

See Using the Parxy Command Line Interface or run parxy --help for more information about available commands and options.
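When you run parxy parse with two drivers (for example -d pymupdf -d llamaparse), you may want to compare the results outside the TUI. A minimal sketch using Python's standard difflib module follows; it is not how parxy tui computes its diff, and the two sample strings are made-up outputs rather than real driver results.

```python
import difflib


def diff_extractions(text_a: str, text_b: str,
                     name_a: str = "driver_a", name_b: str = "driver_b") -> list[str]:
    """Return a line-by-line unified diff between two extracted texts."""
    return list(difflib.unified_diff(
        text_a.splitlines(), text_b.splitlines(),
        fromfile=name_a, tofile=name_b, lineterm="",
    ))


def similarity(text_a: str, text_b: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means the texts are identical."""
    return difflib.SequenceMatcher(None, text_a, text_b).ratio()


# Hypothetical outputs of two drivers for the same page.
pymupdf_text = "Annual Report\nRevenue grew 10%"
llamaparse_text = "# Annual Report\nRevenue grew 10%"

for line in diff_extractions(pymupdf_text, llamaparse_text, "pymupdf", "llamaparse"):
    print(line)
print(f"similarity: {similarity(pymupdf_text, llamaparse_text):.2f}")
```

A line-level diff highlights structural differences such as Markdown headings, while the ratio gives a single number that is handy when comparing many documents.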

Use as a library in your project

  1. Install Parxy with all drivers, or only the drivers you need
# Install all supported drivers via pip
pip install 'parxy[all]'

# Add to your project when using uv
uv add parxy --extra all

Alternatively, install only the extras for the parser backends you need (e.g. Unstructured or LlamaParse); see Supported services for the extra that matches each driver.

  2. Add the environment variables, if needed

Some services require an API key. Parxy reads these keys from environment variables, which you can define in a .env file in your project root.

# LlamaParse 
PARXY_LLAMAPARSE_API_KEY=

# Unstract LLMWhisperer
PARXY_LLMWHISPERER_API_KEY=
  3. Call the driver
from parxy_core.facade import Parxy

# Parse a document using the default driver
doc = Parxy.parse('path/to/document.pdf')

# Print basic information
print(f"Pages: {len(doc.pages)}")
print(f"Title: {doc.metadata.title}")

# Parse a document using a specific driver
doc = Parxy.driver(Parxy.LLAMAPARSE).parse('path/to/document.pdf')
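This README does not describe how Parxy loads the .env file, so the following is only a stdlib sketch of the idea: parse KEY=VALUE lines, fill in variables that are not already set, and report which API keys are still empty. The REQUIRED_KEYS list and the sample key value are assumptions for illustration, and the quoting and escaping rules of full .env loaders are not handled.

```python
import os

# Assumption: the keys this project needs, taken from the .env example above.
REQUIRED_KEYS = ["PARXY_LLAMAPARSE_API_KEY", "PARXY_LLMWHISPERER_API_KEY"]


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_env(text: str) -> list[str]:
    """Set variables that are not already set; return required keys still empty."""
    for key, value in parse_env(text).items():
        os.environ.setdefault(key, value)  # never overwrite an existing variable
    return [key for key in REQUIRED_KEYS if not os.environ.get(key)]


sample = """
# LlamaParse
PARXY_LLAMAPARSE_API_KEY=your-llamaparse-key

# Unstract LLMWhisperer
PARXY_LLMWHISPERER_API_KEY=
"""
print("missing keys:", load_env(sample))
```

Leaving already-set variables untouched lets values exported in the shell or CI take precedence over the file.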

For more information take a look at our Getting Started with Parxy tutorial.

Supported services

| Service or Library | Support status | Extra |
| --- | --- | --- |
| PyMuPDF | Live | - |
| PdfAct | Live | - |
| Unstructured library | Preview | unstructured_local |
| Landing AI Agentic Document Extraction | Preview | landingai |
| LlamaParse | Preview | llama |
| LLMWhisperer | Preview | llmwhisperer |
| Unstructured.io cloud service | Planned | |
| Chunkr | Planned | |
| Docling | Planned | |

...and more can be added via the live extension!
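The parxy drivers command lists the available drivers. To check independently from Python which optional backend packages are importable in the current environment, a small sketch with importlib.util.find_spec can help. The import names in BACKENDS are assumptions; an import name can differ from both the extra name and the distribution name, so check each package's documentation.

```python
import importlib.util

# Assumed import names for some optional backends (illustrative only).
BACKENDS = {
    "PyMuPDF": "pymupdf",
    "Unstructured library": "unstructured",
    "LlamaParse": "llama_parse",
}


def is_importable(module: str) -> bool:
    """True if `module` can be found in the current environment."""
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        # Raised e.g. when the parent package of a dotted name is missing.
        return False


def available_backends(backends: dict[str, str]) -> dict[str, bool]:
    return {name: is_importable(module) for name, module in backends.items()}


for name, ok in available_backends(BACKENDS).items():
    print(f"{name:<22} {'installed' if ok else 'missing'}")
```

find_spec locates a module without executing it, so the check stays cheap even for heavy backends.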

Live extension

Live extension lets you add new drivers, or custom configurations of existing drivers, directly in your application code.

  1. Create a class that inherits from Driver
from parxy_core.drivers import Driver
from parxy_core.models import Document

class CustomDriverExample(Driver):
    """Example custom driver for testing."""

    def _handle(self, file, level="page") -> Document:
        return Document(pages=[])
  2. Register it in Parxy using the extend method
from parxy_core.facade import Parxy

Parxy.extend(name='my_parser', callback=lambda: CustomDriverExample())
  3. Use it
Parxy.driver('my_parser').parse('path/to/document.pdf')

More on the live extension in our How to Add a New Parser to Parxy guide.

Contributing

Thank you for considering contributing to Parxy! You can find how to get started in our contribution guide.

Interested in adding a new parser to the supported list? Take a look at our How to Add a New Parser to Parxy guide.

Development

Parxy uses uv as its package and project manager.

  1. Clone the repository
  2. Sync all dependencies with uv sync --all-extras

All Parxy code is located in the src directory:

  • parxy_core contains the driver implementations, the models, and the facade and factory that give access to Parxy features
  • parxy_cli contains the module providing the command line interface

Optional Dependencies vs Dependency Groups

Parxy uses optional dependencies to track user-oriented dependencies that enhance functionality, while dependency groups are reserved for development purposes. When adding support for a new driver, consider declaring its dependencies as optional to keep Parxy's footprint small.

The question What’s the difference between optional-dependencies and dependency-groups in pyproject.toml? gives a good overview of the differences.

Testing

Parxy is tested using Pytest. The tests, located in the tests folder, run on every commit and pull request.

To execute the test suite run:

uv run pytest

You can run the linter via:

uv run ruff check

Security Vulnerabilities

Please review our security policy on how to report security vulnerabilities.

Supporters

The project is provided and supported by OneOff-Tech (UG) and Alessio Vertemati.

Licence and Copyright

Parxy is licensed under the GPL v3 licence.

  • Copyright (c) 2025-present Alessio Vertemati, @avvertix
  • Copyright (c) 2025-present Oneoff-tech UG, www.oneofftech.de
  • All contributors



Download files


Source Distribution

parxy-0.12.0.tar.gz (87.1 kB)

Uploaded Source

Built Distribution


parxy-0.12.0-py3-none-any.whl (125.7 kB)

Uploaded Python 3

File details

Details for the file parxy-0.12.0.tar.gz.

File metadata

  • Download URL: parxy-0.12.0.tar.gz
  • Upload date:
  • Size: 87.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv 0.10.9 (uv publish, CI, Ubuntu 24.04)

File hashes

Hashes for parxy-0.12.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | d22ebeb47e81a4349a143d798de1df665fcfea7914cfcbae9c8e0d8b0a938472 |
| MD5 | 4aaad1de08855de292f280831cef9cfe |
| BLAKE2b-256 | 1518f3d65b9c879005c87e991ac5fa158735ddb2b34da98bb46d05d571b09811 |
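To check a downloaded archive against the SHA256 digest listed above, a short stdlib sketch is enough. It assumes the file was saved as parxy-0.12.0.tar.gz in the current directory; adjust the path as needed.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest listed above for the source distribution.
EXPECTED_SHA256 = "d22ebeb47e81a4349a143d798de1df665fcfea7914cfcbae9c8e0d8b0a938472"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(sha256_of(path), expected.lower())


# Usage, assuming the archive was downloaded to the current directory.
archive = Path("parxy-0.12.0.tar.gz")
if archive.exists():
    print("OK" if verify(archive, EXPECTED_SHA256) else "MISMATCH")
```

pip can enforce the same check at install time: pin parxy==0.12.0 --hash=sha256:... in a requirements file and run pip install --require-hashes -r requirements.txt. In that mode pip requires hashes for every dependency, not only Parxy.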


File details

Details for the file parxy-0.12.0-py3-none-any.whl.

File metadata

  • Download URL: parxy-0.12.0-py3-none-any.whl
  • Upload date:
  • Size: 125.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv 0.10.9 (uv publish, CI, Ubuntu 24.04)

File hashes

Hashes for parxy-0.12.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 96f71f6aed5271f9220f3964e084c88273059d974045fadd3b165ad01012d2d4 |
| MD5 | 73e2208454dfbfbeac868ae5aa34beab |
| BLAKE2b-256 | 9c2ec7dc36fc5b649e51f2d6d60d8b38bc21d647c4376a2decfbf223ffd315fd |

