Python binding to Java Archery Framework

Project description

PyArchery

License: GPL v3 Servier Inspired

PyArchery is a Python binding for the Java Archery Framework, enabling powerful semi-structured document processing directly from Python. It leverages JPype to bridge Python and Java, providing seamless access to Archery's intelligent extraction, layout analysis, and tag classification capabilities.

Description

Semi-structured documents, whose layouts vary from one file to the next, are hard to process automatically. PyArchery brings the capabilities of the Archery framework to the Python ecosystem.

Archery combines extraction algorithms with machine learning while keeping you in control of the process: its settings can be tuned and reproduced from one run to the next. Automating extraction saves time and reduces manual errors, which makes Archery well suited to industries that handle large volumes of documents.

Key features include:

Intelligent Extraction: Automatically extract structured data from documents.
Layout Analysis: Understand the physical layout of document elements.
Tag Classification: Classify document tags using customizable naming styles (snake case, camel case, etc.).
Java Integration: Direct access to the underlying Java Archery API for advanced usage.

Getting Started

Prerequisites

Java Development Kit (JDK): Version 21 or higher is required.
Python: Version 3.11 or higher.
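Both prerequisites can be checked before installing. The sketch below is a hypothetical helper (not part of PyArchery) that compares the running interpreter and the output of `java -version` with the minimums listed above. It only inspects the `java` executable found on PATH; JPype may locate the JVM differently (for example through JAVA_HOME), so treat a passing result as a hint rather than a guarantee.

```python
from __future__ import annotations

import re
import subprocess
import sys

MIN_JAVA = 21          # JDK 21 or higher, per the prerequisites
MIN_PYTHON = (3, 11)   # Python 3.11 or higher, per the prerequisites


def parse_java_major(version_output: str) -> int | None:
    """Extract the major Java version from `java -version` output.

    Handles the modern scheme ("21.0.2") and the legacy scheme
    ("1.8.0_381"), where the major version is the second field.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def java_major_version() -> int | None:
    """Run `java -version` (which writes to stderr) and parse the result."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None  # no `java` executable on PATH
    return parse_java_major(result.stderr or result.stdout)


def check_prerequisites() -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        needed = ".".join(map(str, MIN_PYTHON))
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {needed}+ required, found {found}")
    major = java_major_version()
    if major is None:
        problems.append("Java not found on PATH (a JDK is required)")
    elif major < MIN_JAVA:
        problems.append(f"JDK {MIN_JAVA}+ required, found Java {major}")
    return problems
```

Calling `check_prerequisites()` returns an empty list when both requirements appear to be met, and one message per problem otherwise.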

Installation

Install PyArchery with pip. The package is published on PyPI as pyjarchery, but it is imported in Python as pyarchery:

pip install pyjarchery
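To confirm which release is installed, query the distribution name used with pip (`pyjarchery`), not the import name (`pyarchery`). This minimal sketch uses only the standard library; the helper name `installed_version` is hypothetical.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "pyjarchery") -> str | None:
    """Return the installed version of `distribution`, or None if it is absent.

    importlib.metadata looks packages up by distribution name ("pyjarchery"),
    not by the name used in `import` statements ("pyarchery").
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None
```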

Quick Start

Here's a simple example of how to use PyArchery to open a document and extract data from tables:

import pyarchery

# Path to your document
file_path = "path/to/your/document.pdf"

# Load the document with intelligent extraction hints
# This returns a DocumentWrapper
with pyarchery.load(
    file_path,
    hints=[pyarchery.INTELLI_EXTRACT, pyarchery.INTELLI_LAYOUT]
) as doc:
    # Access sheets using the pythonic wrapper property
    for sheet in doc.sheets:
        # Check if sheet has a table
        if sheet.table:
            table = sheet.table
            # Convert to python dictionary
            data = table.to_pydict()
            print(f"Extracted data from table: {list(data.keys())}")
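Building on the Quick Start, the following sketch writes every extracted table to its own CSV file. It relies only on the `sheets`, `table` and `to_pydict()` members used above, and it assumes (unverified) that `to_pydict()` returns a column-oriented mapping of column names to lists of cell values. The helpers `pydict_to_rows` and `export_tables_to_csv` are hypothetical, not part of PyArchery.

```python
import csv
from itertools import zip_longest
from pathlib import Path


def pydict_to_rows(data: dict) -> list[list]:
    """Turn a column-oriented dict {column: [values, ...]} into CSV rows.

    ASSUMPTION: table.to_pydict() maps column names to lists of cell values.
    """
    header = list(data.keys())
    # zip_longest pads ragged columns with "" instead of truncating them
    body = [list(row) for row in zip_longest(*data.values(), fillvalue="")]
    return [header] + body


def export_tables_to_csv(doc, out_dir: str) -> list[Path]:
    """Write each sheet's table to <out_dir>/sheet_<index>.csv."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, sheet in enumerate(doc.sheets):
        table = sheet.table
        if not table:  # same check as the Quick Start: skip sheets without a table
            continue
        path = out / f"sheet_{index}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(pydict_to_rows(table.to_pydict()))
        written.append(path)
    return written
```

For instance, call `export_tables_to_csv(doc, "out")` inside the `with pyarchery.load(...) as doc:` block; it returns the paths of the files it wrote.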

Documentation

For comprehensive documentation, tutorials, and API references, please visit:

PyArchery Documentation: https://romualdrousseau.github.io/PyArchery/
Java Archery Framework: https://github.com/RomualdRousseau/Archery

Configuration

You can tune runtime behavior via environment variables:

PYARCHERY_MAVEN_URL / PYARCHERY_MAVEN_SNAPSHOT_URL: Override Maven base URLs for downloading Java dependencies.
PYARCHERY_JARS_HOME: Directory where downloaded JARs are cached (by default, a directory inside the installed package). Useful for sharing one cache across virtual environments, or for reducing wheel size by keeping JARs out of the wheel.
PYARCHERY_SKIP_JVM_START: Set to 1 to skip JVM startup (for dry runs or environments where Java is managed externally).
PYARCHERY_REQUIRE_CHECKSUMS: Set to 1 to enforce checksum verification of downloaded JARs (fails if checksum file is missing or mismatched).
PYARCHERY_FETCH_ALL_NATIVE: Set to 1 to download all native classifiers instead of filtering by the current platform.
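Because these are ordinary environment variables, they can also be set from Python. The sketch below is a hypothetical helper, not a PyArchery API, that sets only the variables you opt into and leaves the rest to PyArchery's defaults. The README does not say exactly when the variables are read, so the sketch assumes they must be in place before `import pyarchery` runs, which is the safe order.

```python
from __future__ import annotations

import os
from pathlib import Path


def configure_pyarchery_env(
    jars_home: str | None = None,
    maven_url: str | None = None,
    maven_snapshot_url: str | None = None,
    require_checksums: bool = False,
    fetch_all_native: bool = False,
    skip_jvm_start: bool = False,
) -> dict[str, str]:
    """Export the documented PYARCHERY_* variables; call before importing pyarchery.

    Only options that are explicitly provided are written to os.environ.
    """
    settings: dict[str, str] = {}
    if jars_home is not None:
        settings["PYARCHERY_JARS_HOME"] = str(Path(jars_home).expanduser())
    if maven_url is not None:
        settings["PYARCHERY_MAVEN_URL"] = maven_url
    if maven_snapshot_url is not None:
        settings["PYARCHERY_MAVEN_SNAPSHOT_URL"] = maven_snapshot_url
    # The switches below are documented as "set to 1" to enable them
    if require_checksums:
        settings["PYARCHERY_REQUIRE_CHECKSUMS"] = "1"
    if fetch_all_native:
        settings["PYARCHERY_FETCH_ALL_NATIVE"] = "1"
    if skip_jvm_start:
        settings["PYARCHERY_SKIP_JVM_START"] = "1"
    os.environ.update(settings)
    return settings
```

For example, call `configure_pyarchery_env(jars_home="~/.cache/pyarchery-jars", require_checksums=True)` and then `import pyarchery`; the cache path is only an illustration.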

Wheel slimming

The default build excludes bundled JARs; on first use PyArchery downloads only the platform-matching artifacts. To avoid repeated downloads across projects or CI runs, set PYARCHERY_JARS_HOME to a shared cache directory. If you need a “fat” wheel that includes JARs, consider providing a separate distribution or optional extra that re-enables JAR bundling, while keeping the default wheel lightweight.
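In CI, the shared cache is usually a directory that the CI system saves and restores between runs. The hypothetical helper below points PYARCHERY_JARS_HOME at such a directory and lists the JARs already present, which shows whether a run started with a warm or a cold cache. It assumes the cache holds plain `*.jar` files, possibly in sub-directories; PyArchery's actual cache layout may differ.

```python
import os
from pathlib import Path


def use_shared_jar_cache(cache_dir: str) -> list[str]:
    """Point PyArchery at a shared JAR cache and report what is already cached.

    Creates the directory if needed and returns the sorted names of *.jar
    files already present (an empty list means a cold cache).
    """
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["PYARCHERY_JARS_HOME"] = str(path)
    # Recursive search in case JARs are stored in sub-directories (an assumption)
    return sorted(p.name for p in path.rglob("*.jar"))
```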

Contribute

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Authors

Romuald Rousseau, romualdrousseau@gmail.com

Project details

Release history

0.1.16 (this version): Dec 31, 2025
0.1.15: Dec 31, 2025
0.1.14: Dec 12, 2025
0.1.13: Dec 9, 2025
0.1.12: Nov 28, 2025
0.1.11: Nov 23, 2025
0.1.10: Nov 23, 2025
0.1.9: Nov 23, 2025
0.1.8: Nov 23, 2025
0.1.7: Nov 23, 2025
0.1.6: Nov 23, 2025
0.1.5: Nov 23, 2025
0.1.4: Nov 23, 2025

Download files


Source Distribution

pyjarchery-0.1.16.tar.gz (136.7 kB), uploaded Dec 31, 2025

Built Distribution

pyjarchery-0.1.16-py3-none-any.whl (138.1 kB), uploaded Dec 31, 2025; a pure-Python wheel for Python 3 on any platform

File details

Details for the file pyjarchery-0.1.16.tar.gz.

File metadata

Download URL: pyjarchery-0.1.16.tar.gz
Upload date: Dec 31, 2025
Size: 136.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.4

File hashes

Hashes for pyjarchery-0.1.16.tar.gz
Algorithm	Hash digest
SHA256	`e341ae84e27aea30b52902c54a86659865de94459098ea1026ad344a5ef3b7b2`
MD5	`aa812bf0683a67a2b3d5dff789f6faf2`
BLAKE2b-256	`57d3a74b622192ed390dfb5c6208331287c6b7bf968bef72b66b3585de2bda93`
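To check a manually downloaded source archive against the SHA256 digest published above, hash it locally and compare. This is a generic sketch using only hashlib; the helper names are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 published for pyjarchery-0.1.16.tar.gz
EXPECTED_SDIST_SHA256 = "e341ae84e27aea30b52902c54a86659865de94459098ea1026ad344a5ef3b7b2"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large archives are never fully loaded into memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """Compare the file's digest with the expected one (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

pip can enforce the same check at install time: pin the requirement in a requirements file and add one `--hash=sha256:<digest>` option per file pip may choose (sdist and wheel). Once any requirement carries a hash, pip switches to hash-checking mode, which also requires every dependency to be pinned and hashed.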


File details

Details for the file pyjarchery-0.1.16-py3-none-any.whl.

File metadata

Download URL: pyjarchery-0.1.16-py3-none-any.whl
Upload date: Dec 31, 2025
Size: 138.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.4

File hashes

Hashes for pyjarchery-0.1.16-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c07b345464a88f5c8403bff67ab8cee922d34296f1df86f4afe914cd4abc3ebc`
MD5	`dbfed89cb38167d189a40d1be078280b`
BLAKE2b-256	`6d416c05ec7eaee4988964bc0ccfd2b76057c0705447fc7a7d397b992059c774`

