The client library for Aryn services

aryn-sdk is a simple client library for interacting with Aryn cloud services.

Aryn Partitioning Service

Partition PDF files with the Aryn Partitioning Service (APS) through aryn-sdk:

from aryn_sdk.partition import partition_file

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
elements = data['elements']
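
Once the elements are in hand, a quick tally by type shows what the partitioner found. The sketch below assumes only what the examples on this page show: the result is a dict with an 'elements' list, and each element has a 'type' key. The helper name and the sample data are hypothetical stand-ins, not part of aryn-sdk.

```python
from collections import Counter


def count_element_types(data):
    """Tally how many elements of each type a partition result contains."""
    return Counter(element["type"] for element in data.get("elements", []))


# Hand-written stand-in for a partition_file() result (assumption: real
# results carry more fields per element than just 'type').
sample = {
    "elements": [
        {"type": "table"},
        {"type": "Image"},
        {"type": "table"},
    ]
}

type_counts = count_element_types(sample)
print(type_counts)
```

In real use, pass the dict returned by partition_file instead of the sample.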

Convert a partitioned table element to a pandas dataframe for easier use:

from aryn_sdk.partition import partition_file, table_elem_to_dataframe

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )

# Find the first table and convert it to a dataframe
df = None
for element in data['elements']:
    if element['type'] == 'table':
        df = table_elem_to_dataframe(element)
        break
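
The loop above leaves df as None when the document contains no table, so downstream code should check for that. As a minimal sketch, the same search can be written as a small helper that returns None when nothing matches. The helper name is hypothetical and not part of aryn-sdk. It assumes only that each element is a dict with a 'type' key.

```python
def first_element_of_type(elements, element_type):
    """Return the first element whose 'type' matches, or None if there is none."""
    return next((e for e in elements if e.get("type") == element_type), None)


# Hand-written stand-in for data['elements'] (assumption: real elements
# carry more fields than 'type').
sample_elements = [{"type": "Image"}, {"type": "table", "id": 1}, {"type": "table", "id": 2}]

first_table = first_element_of_type(sample_elements, "table")
missing = first_element_of_type(sample_elements, "no-such-type")
```

With a real result, the guard would read: look up the first 'table' element, and call table_elem_to_dataframe only if the lookup did not return None.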

Or convert all partitioned tables to pandas dataframes in one shot:

from aryn_sdk.partition import partition_file, tables_to_pandas

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
elements_and_tables = tables_to_pandas(data)
dataframes = [table for (element, table) in elements_and_tables if table is not None]
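
As the list comprehension above suggests, tables_to_pandas yields (element, table) pairs in which table is None for non-table elements. One possible follow-up is serializing every converted table to CSV text. The sketch below is an illustration under that assumption. The helper name is hypothetical, and it relies only on pandas' DataFrame.to_csv, which returns a string when no path is given.

```python
def tables_to_csv_text(elements_and_tables):
    """Return CSV text for every pair whose table was converted to a dataframe."""
    csv_texts = []
    for _element, table in elements_and_tables:
        if table is None:
            continue  # not a table element, nothing to serialize
        # DataFrame.to_csv() with no path returns the CSV as a string.
        csv_texts.append(table.to_csv(index=False))
    return csv_texts
```

Calling tables_to_csv_text(elements_and_tables) on the result above would give one CSV string per table, in document order.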

Visualize partitioned documents by drawing the bounding boxes on each page:

from aryn_sdk.partition import partition_file, draw_with_boxes

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
page_pics = draw_with_boxes("partition-me.pdf", data, draw_table_cells=True)

from IPython.display import display
display(page_pics[0])

Note: visualizing documents requires poppler, a PDF processing library, to be installed on your system. Follow the poppler installation instructions for your platform.
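
A quick way to check for poppler before calling draw_with_boxes is to look for one of its command-line tools on PATH. This is only a sketch under an assumption: that the poppler installation exposes its standard pdftoppm utility on PATH. How aryn-sdk itself locates poppler is not stated here, and the helper name is hypothetical.

```python
import shutil


def poppler_available():
    """Return True if poppler's pdftoppm command-line tool is found on PATH."""
    return shutil.which("pdftoppm") is not None


if not poppler_available():
    print("poppler (pdftoppm) not found on PATH; install it before drawing boxes")
```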

Convert image elements to more useful types, such as PIL images or byte strings encoded in a given image format:

from aryn_sdk.partition import partition_file, convert_image_element

with open("my-favorite-pdf.pdf", "rb") as f:
    data = partition_file(
        f,
        extract_images=True
    )
image_elts = [e for e in data['elements'] if e['type'] == 'Image']

# The indexing below assumes the document yielded at least three image elements
pil_img = convert_image_element(image_elts[0])
jpg_bytes = convert_image_element(image_elts[1], format='JPEG')
png_str = convert_image_element(image_elts[2], format="PNG", b64encode=True)
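
A base64 string like png_str usually has to be decoded before it can be written to disk. The following sketch assumes that the b64encode=True result is standard base64 of the encoded image bytes. The helper names are hypothetical, and the sample input is a stand-in built from the PNG file signature rather than a real image.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_b64_image(b64_text):
    """Decode a base64 string (or bytes) back into raw image bytes."""
    return base64.b64decode(b64_text)


def looks_like_png(raw):
    """Cheap sanity check: do the bytes begin with the PNG signature?"""
    return raw.startswith(PNG_SIGNATURE)


# Stand-in for png_str: the PNG signature plus filler, base64-encoded.
fake_png_str = base64.b64encode(PNG_SIGNATURE + b"filler").decode("ascii")
raw_bytes = decode_b64_image(fake_png_str)
print(looks_like_png(raw_bytes))
```

The decoded bytes can then be written out in binary mode, for example with pathlib's Path.write_bytes.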

Download files

Source Distribution

aryn_sdk-0.1.9.tar.gz (852.9 kB)

Uploaded Source

Built Distribution

aryn_sdk-0.1.9-py3-none-any.whl (870.2 kB)

Uploaded Python 3

File details

Details for the file aryn_sdk-0.1.9.tar.gz.

File metadata

  • Download URL: aryn_sdk-0.1.9.tar.gz
  • Size: 852.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for aryn_sdk-0.1.9.tar.gz
Algorithm    Hash digest
SHA256       02d2a84f96314f394c3adddbbfe09586b9492c22c57b3120ee9a0110252e1225
MD5          2cabe2378addd4402bf8822178fca78d
BLAKE2b-256  5b7e4d72e23c3525bdffc827c07226da1aee6dce77f562abba9680364e0757a5
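
To check a downloaded file against the SHA256 digest listed above, the file can be hashed locally and the two hex strings compared. The sketch below uses only Python's standard hashlib. The function names are hypothetical, and the self-check runs on in-memory bytes so nothing needs to be downloaded.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary stream in chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def file_matches(path, expected_hex):
    """Return True if the file at path has the expected SHA256 hex digest."""
    with open(path, "rb") as f:
        return sha256_of_stream(f) == expected_hex.lower()


# Self-check on in-memory bytes rather than a real download.
demo_digest = sha256_of_stream(io.BytesIO(b"abc"))
```

For the source distribution, the call would be file_matches("aryn_sdk-0.1.9.tar.gz", "02d2a84f96314f394c3adddbbfe09586b9492c22c57b3120ee9a0110252e1225"), assuming the file sits in the current directory.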

Provenance

The following attestation bundles were made for aryn_sdk-0.1.9.tar.gz:

Publisher: aryn-sdk_release.yml on aryn-ai/sycamore

File details

Details for the file aryn_sdk-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: aryn_sdk-0.1.9-py3-none-any.whl
  • Size: 870.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for aryn_sdk-0.1.9-py3-none-any.whl
Algorithm    Hash digest
SHA256       15145959f8fbbb89d6e8dc07bbbcc57cd102f2333ecff08708a14553b2127856
MD5          778f2194c9ca5654b2efb573a00b5d8f
BLAKE2b-256  5a108f0e702a9470a1cb7f44db1fe4f22fd162189c7cb77a9745acfffb8f3082

Provenance

The following attestation bundles were made for aryn_sdk-0.1.9-py3-none-any.whl:

Publisher: aryn-sdk_release.yml on aryn-ai/sycamore
