Parxy document processing gateway

OneOffTech Parxy

Parxy is a document processing gateway providing a unified interface to interact with multiple document parsing services, exposing a unified flexible document model suitable for different levels of text extraction granularity.

  • Unified API to parse documents with different providers
  • Unified flexible hierarchical document model (page → block → line → span → character)
  • Supports both local libraries (e.g., PyMuPDF, Unstructured) and remote services (e.g., LlamaParse, LLMWhisperer, PdfAct)
  • Extensible: easily integrate new parsers in your own code
  • Trace the execution for debugging purposes
  • Pair with evaluation utilities to compare extraction results (coming soon)
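The hierarchical model in the list above (page → block → line → span → character) can be pictured with a small sketch. The class and attribute names below are illustrative assumptions, not Parxy's actual models (those live in parxy_core.models); the point is only that each level aggregates the one beneath it, so text can be read at whichever granularity you need.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the five levels; not Parxy's real classes.
@dataclass
class Character:
    text: str


@dataclass
class Span:
    characters: list[Character] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "".join(c.text for c in self.characters)


@dataclass
class Line:
    spans: list[Span] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "".join(s.text for s in self.spans)


@dataclass
class Block:
    lines: list[Line] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(line.text for line in self.lines)


@dataclass
class Page:
    number: int
    blocks: list[Block] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Blocks are separated by a blank line, lines by a newline.
        return "\n\n".join(b.text for b in self.blocks)


def span_from_text(text: str) -> Span:
    """Build a span whose characters are the individual letters of `text`."""
    return Span(characters=[Character(ch) for ch in text])


page = Page(
    number=1,
    blocks=[
        Block(lines=[Line(spans=[span_from_text("Hello, "), span_from_text("world")])]),
        Block(lines=[Line(spans=[span_from_text("Second block")])]),
    ],
)
```

Reading page.text gives coarse, page-level text, while walking down to page.blocks[0].lines[0].spans[0].characters gives the finest granularity.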

[!NOTE]
Parxy is being rewritten from the ground up. Versions 0.6 and below are preserved in the legacy branch for historical purposes. The main branch contains the rewrite, which focuses on library and CLI usage. If you still need the HTTP API, continue using version 0.6.

Requirements

  • Python 3.12 or above (Python 3.10 and 3.11 are supported on a best-effort basis).

Getting started

Parxy is available as a standalone command-line tool and as a library. The quickest way to try it is from the command line using uvx.

Use with minimal footprint (fewer drivers supported):

uvx --from "git+https://github.com/oneofftech/parxy.git" parxy --help

Use all supported drivers:

uvx --from "git+https://github.com/oneofftech/parxy.git[all]" parxy --help

See Supported services for the list of available drivers and the extras needed to install them.

Use on the command line

to be documented

Use as a library in your project

to be documented

  1. Install Parxy, either with all extras or only the extra for the driver you need

  2. Set the environment variables required by the driver, when needed

  3. Call the driver

from parxy_core.facade import Parxy

# Using the default driver, usually pymupdf
Parxy.parse('path/to/document.pdf')

# Using a specific driver
Parxy.driver(Parxy.LLAMAPARSE).parse('path/to/document.pdf')
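Since remote services typically need credentials in the environment (step 2), it can help to check for them before picking a driver. The sketch below is an assumption-heavy illustration: the driver names and the variable EXAMPLE_LLAMAPARSE_API_KEY are placeholders, not Parxy's real configuration keys, and the code does not call Parxy itself.

```python
import os
from collections.abc import Mapping

# Hypothetical mapping from driver name to the environment variables it needs.
# These names are placeholders, not Parxy's actual configuration keys.
REQUIRED_ENV: dict[str, list[str]] = {
    "pymupdf": [],
    "llamaparse": ["EXAMPLE_LLAMAPARSE_API_KEY"],
}


def missing_env(driver: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the variables required by `driver` that are unset or empty."""
    return [name for name in REQUIRED_ENV.get(driver, []) if not env.get(name)]


def choose_driver(
    preferred: str,
    fallback: str = "pymupdf",
    env: Mapping[str, str] = os.environ,
) -> str:
    """Use `preferred` when its variables are set, otherwise fall back to a local driver."""
    return fallback if missing_env(preferred, env) else preferred
```

The returned name could then be handed to whatever driver selection your code uses; whether it matches the identifiers Parxy expects is something to verify against the library.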

Supported services

Service or Library            | Support status | Extra              | Local file | Remote file
PyMuPDF                       | Live           | -                  |            |
PdfAct                        | Live           | -                  |            |
Unstructured library          | Preview        | unstructured_local |            |
LlamaParse                    | Preview        | llama              |            |
LLMWhisperer                  | Preview        | llmwhisperer       |            |
Unstructured.io cloud service | Planned        |                    |            |
Chunkr                        | Planned        |                    |            |
Docling                       | Planned        |                    |            |

...and more can be added via the live extension!

Live extension

The live extension lets you add new drivers, or create custom configurations of the existing drivers, directly in your application code.

  1. Create a class that inherits from Driver

from parxy_core.drivers import Driver
from parxy_core.models import Document

class CustomDriverExample(Driver):
    """Example custom driver for testing."""

    def _handle(self, file, level="page") -> Document:
        return Document(pages=[])

  2. Register it in Parxy using the extend method

from parxy_core.facade import Parxy

Parxy.extend(name='my_parser', callback=lambda: CustomDriverExample())

  3. Use it

Parxy.driver('my_parser').parse('path/to/document.pdf')
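To make the register-then-resolve flow above more concrete, here is a minimal sketch of how a name-to-factory registry of this kind might work. It is not Parxy's implementation: DriverRegistry, the stand-in Driver base class, the dict result, and the instance caching are all assumptions made for illustration.

```python
from typing import Callable


class Driver:
    """Stand-in base class; Parxy's real Driver lives in parxy_core.drivers."""

    def parse(self, file: str, level: str = "page") -> dict:
        return self._handle(file, level)

    def _handle(self, file: str, level: str = "page") -> dict:
        raise NotImplementedError


class DriverRegistry:
    """Hypothetical registry mapping driver names to factory callbacks."""

    def __init__(self) -> None:
        self._factories: dict[str, Callable[[], Driver]] = {}
        self._instances: dict[str, Driver] = {}

    def extend(self, name: str, callback: Callable[[], Driver]) -> None:
        # Store the factory only; the driver is built lazily on first use.
        self._factories[name] = callback
        self._instances.pop(name, None)

    def driver(self, name: str) -> Driver:
        if name not in self._instances:
            if name not in self._factories:
                raise KeyError(f"Unknown driver: {name!r}")
            self._instances[name] = self._factories[name]()
        return self._instances[name]


class CustomDriverExample(Driver):
    def _handle(self, file: str, level: str = "page") -> dict:
        # A plain dict stands in for Parxy's Document model.
        return {"file": file, "level": level, "pages": []}


registry = DriverRegistry()
registry.extend(name="my_parser", callback=lambda: CustomDriverExample())
result = registry.driver("my_parser").parse("path/to/document.pdf")
```

Passing a callback rather than an instance means a driver with heavy or optional dependencies is only constructed when someone actually asks for it.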

Contributing

Thank you for considering contributing to Parxy! You can find out how to get started in our contribution guide.

Development

Parxy uses uv as its package and project manager.

  1. Clone the repository
  2. Sync all dependencies with uv sync --all-extras

All Parxy code is located in the src directory:

  • parxy_core contains the driver implementations, the models, and the facade and factory used to access Parxy features
  • parxy_cli contains the module providing the command line interface

Optional Dependencies vs Dependency Groups

Parxy uses optional dependencies to track user-oriented dependencies that enhance functionality. Dependency groups are reserved for development purposes. When adding support for a new driver, consider defining its dependencies as optional to reduce Parxy's footprint.

The question What’s the difference between optional-dependencies and dependency-groups in pyproject.toml? gives a good overview of the differences.
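When a driver's dependencies are optional, the driver code has to cope with them being absent. One common pattern, sketched here under assumptions, is to check for the module at call time and raise an error that names the extra to install. The helper require_optional is hypothetical and not part of Parxy, and the pip command in the message is only an example of how an extra is usually requested.

```python
import importlib
import importlib.util
from types import ModuleType


def require_optional(module: str, extra: str) -> ModuleType:
    """Import a top-level `module`, or raise an error naming the extra to install.

    Hypothetical helper for illustration; not part of Parxy.
    """
    if importlib.util.find_spec(module) is None:
        raise ImportError(
            f"The '{module}' package is required for this driver. "
            f"Install it with the '{extra}' extra, e.g. pip install 'parxy[{extra}]'."
        )
    return importlib.import_module(module)
```

A driver could call such a helper inside its constructor or _handle method, so that importing Parxy itself never fails just because one extra is missing.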

Testing

Parxy is tested using Pytest. Tests are located in the tests folder and run on each commit and pull request.

To execute the test suite run:

uv run pytest

You can run linting with Ruff via:

uv run ruff check

Security Vulnerabilities

Please review our security policy to learn how to report security vulnerabilities.

Supporters

The project is provided and supported by OneOff-Tech (UG) and Alessio Vertemati.

Licence and Copyright

Parxy is licensed under the GPL v3 licence.

  • Copyright (c) 2025-present Alessio Vertemati, @avvertix
  • Copyright (c) 2025-present Oneoff-tech UG, www.oneofftech.de
  • All contributors
