Parxy document processing gateway
OneOffTech Parxy
Parxy is a document processing gateway providing a unified interface to interact with multiple document parsing services, exposing a unified flexible document model suitable for different levels of text extraction granularity.
- Unified API to parse documents with different providers
- Unified flexible hierarchical document model (page → block → line → span → character)
- Supports both local libraries (e.g., PyMuPDF, Unstructured) and remote services (e.g., LlamaParse, LLMWhisperer, PdfAct)
- Extensible: easily integrate new parsers in your own code
- Trace the execution for debug purposes
- Pair with evaluation utilities to compare extraction results (coming soon)
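To make the hierarchy in the list above concrete, here is a minimal sketch of a page → block → line → span → character model built with plain dataclasses. The class and attribute names are hypothetical and chosen only for illustration; they are not the classes shipped in parxy_core.models, whose actual fields may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical span: a run of text within a line."""
    text: str

    @property
    def characters(self) -> list[str]:
        # The finest granularity: the individual characters of the span.
        return list(self.text)


@dataclass
class Line:
    """Hypothetical line: an ordered list of spans."""
    spans: list[Span] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "".join(span.text for span in self.spans)


@dataclass
class Block:
    """Hypothetical block: an ordered list of lines."""
    lines: list[Line] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(line.text for line in self.lines)


@dataclass
class Page:
    """Hypothetical page: an ordered list of blocks."""
    number: int
    blocks: list[Block] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n\n".join(block.text for block in self.blocks)


page = Page(
    number=1,
    blocks=[
        Block(lines=[Line(spans=[Span("Hello, "), Span("world")])]),
        Block(lines=[Line(spans=[Span("Second block")])]),
    ],
)

# Each level can be read at its own granularity.
print(page.text)
print(page.blocks[0].lines[0].spans[1].characters)
```

Because every level derives its text from its children, a caller can stop at whichever granularity a given parser is able to provide.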
Note: Parxy is being rewritten from the ground up. Versions 0.6 and below are preserved in the legacy branch for historical purposes. The main branch contains the rewrite, which focuses on library and CLI usage. If you still need the HTTP API, continue using version 0.6.
Requirements
- Python 3.12 or above (Python 3.10 and 3.11 are supported on a best-effort basis).
Next steps
Getting started
Parxy is available as a standalone command-line tool and as a library. The quickest way to try Parxy is from the command line using uvx.
Use with minimal footprint (fewer drivers supported):
uvx --from "git+https://github.com/oneofftech/parxy.git" parxy --help
Use all supported drivers:
uvx --from "git+https://github.com/oneofftech/parxy.git[all]" parxy --help
See Supported services for the list of included drivers and their extras for the installation.
Use on the command line
to be documented
Use as a library in your project
to be documented
- Install Parxy with all drivers, or only the driver you need
- Add the environment variables when needed
- Call the driver
from parxy_core.facade import Parxy
# Using the default driver, usually pymupdf
Parxy.parse('path/to/document.pdf')
# Using a specific driver
Parxy.driver(Parxy.LLAMAPARSE).parse('path/to/document.pdf')
Supported services
| Service or Library | Support status | Extra | Local file | Remote file |
|---|---|---|---|---|
| PyMuPDF | Live | - | ✅ | ✅ |
| PdfAct | Live | - | ✅ | ✅ |
| Unstructured library | Preview | unstructured_local | ✅ | ✅ |
| LlamaParse | Preview | llama | ✅ | ✅ |
| LLMWhisperer | Preview | llmwhisperer | ✅ | ✅ |
| Unstructured.io cloud service | Planned | | | |
| Chunkr | Planned | | | |
| Docling | Planned | | | |
...and more can be added via the live extension!
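As a rough illustration of how the Extra column relates to installation, the sketch below builds a uvx command for a chosen set of drivers. The dictionary keys and the `uvx_command` helper are hypothetical names introduced here. The extras come from the table above, and the assumption is that individual extras can replace `all` in the same position, joined with commas as in standard Python extras syntax.

```python
BASE_URL = "git+https://github.com/oneofftech/parxy.git"

# Extra names copied from the "Extra" column; None means no extra is needed.
# The dictionary keys are hypothetical labels used only in this sketch.
DRIVER_EXTRAS = {
    "pymupdf": None,
    "pdfact": None,
    "unstructured_local": "unstructured_local",
    "llamaparse": "llama",
    "llmwhisperer": "llmwhisperer",
}


def uvx_command(drivers: list[str]) -> str:
    """Build a uvx invocation that installs only the extras the drivers need."""
    extras = sorted({DRIVER_EXTRAS[d] for d in drivers if DRIVER_EXTRAS[d]})
    suffix = "[" + ",".join(extras) + "]" if extras else ""
    return f'uvx --from "{BASE_URL}{suffix}" parxy --help'


print(uvx_command(["pymupdf"]))
print(uvx_command(["llamaparse", "llmwhisperer"]))
```

Requesting only the extras you need keeps the installed footprint small, which is the reason drivers are split into extras in the first place.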
Live extension
The live extension lets you add new drivers, or create custom configurations of the existing drivers, directly in your application code.
- Create a class that inherits from Driver
from parxy_core.drivers import Driver
from parxy_core.models import Document


class CustomDriverExample(Driver):
    """Example custom driver for testing."""

    def _handle(self, file, level="page") -> Document:
        return Document(pages=[])
- Register it in Parxy using the extend method
Parxy.extend(name='my_parser', callback=lambda: CustomDriverExample())
- Use it
Parxy.driver('my_parser').parse('path/to/document.pdf')
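The register-then-resolve flow above follows a common name-to-factory registry pattern. The following self-contained sketch shows one way such a registry could work. It is not Parxy's implementation: `DriverRegistry` and `EchoDriver` are hypothetical stand-ins, and caching the created instance is an assumption made here, not documented Parxy behaviour.

```python
from typing import Callable


class DriverRegistry:
    """Hypothetical registry mapping driver names to factory callbacks."""

    def __init__(self) -> None:
        self._factories: dict[str, Callable[[], object]] = {}
        self._instances: dict[str, object] = {}

    def extend(self, name: str, callback: Callable[[], object]) -> None:
        # Store the factory only; the driver is created on first use.
        self._factories[name] = callback
        # Drop any cached instance so a re-registration takes effect.
        self._instances.pop(name, None)

    def driver(self, name: str) -> object:
        if name not in self._instances:
            if name not in self._factories:
                raise KeyError(f"Unknown driver: {name}")
            self._instances[name] = self._factories[name]()
        return self._instances[name]


class EchoDriver:
    """Stand-in driver that only reports which file it was given."""

    def parse(self, file: str) -> str:
        return f"parsed {file}"


registry = DriverRegistry()
registry.extend(name="my_parser", callback=lambda: EchoDriver())
print(registry.driver("my_parser").parse("path/to/document.pdf"))
```

Passing a callback rather than an instance defers construction until the driver is actually requested, which matters when a driver loads heavy dependencies or needs credentials.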
Contributing
Thank you for considering contributing to Parxy! You can find how to get started in our contribution guide.
Development
Parxy uses UV as package and project manager.
- Clone the repository
- Sync all dependencies with
uv sync --all-extras
All Parxy code is located in the src directory:
- parxy_core contains the driver implementations, the models, and the facade and factory to access Parxy features
- parxy_cli contains the module providing the command line interface
Optional Dependencies vs Dependency Groups
Parxy uses optional dependencies to track user-oriented dependencies that enhance functionality. Dependency groups are reserved for development purposes. When supporting a new driver, consider defining its dependencies as optional to reduce Parxy's footprint.
The question "What’s the difference between optional-dependencies and dependency-groups in pyproject.toml?" gives a nice overview of the differences.
Testing
Parxy is tested using Pytest. Tests, located under the tests folder, run for each commit and pull request.
To execute the test suite run:
uv run pytest
You can run linting via:
uv run ruff check
Security Vulnerabilities
Please review our security policy on how to report security vulnerabilities.
Supporters
The project is provided and supported by OneOff-Tech (UG) and Alessio Vertemati.
Licence and Copyright
Parxy is licensed under the GPL v3 licence.
- Copyright (c) 2025-present Alessio Vertemati, @avvertix
- Copyright (c) 2025-present Oneoff-tech UG, www.oneofftech.de
- All contributors