The client library for Aryn services

aryn-sdk is a simple client library for interacting with Aryn cloud services.

Aryn Partitioning Service

Partition PDF files with the Aryn Partitioning Service (APS) through aryn-sdk:

from aryn_sdk.partition import partition_file

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
elements = data['elements']
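
Once a document is partitioned, a quick tally of element types is a handy sanity check. The sketch below runs on a hand-written stand-in for `data` so that it works without calling the service. It assumes only that each element is a dictionary with a 'type' key, as in the examples in this section; the 'Title' and 'Text' values are illustrative assumptions, not a documented list of types.

```python
from collections import Counter

# Hypothetical stand-in for the dictionary returned by partition_file.
# Only the 'elements' list and the per-element 'type' key come from the
# examples in this README; the 'Title' and 'Text' values are assumed.
data = {
    "elements": [
        {"type": "Title"},
        {"type": "Text"},
        {"type": "table"},
        {"type": "Text"},
        {"type": "Image"},
    ]
}

# Count how many elements of each type were found.
type_counts = Counter(element["type"] for element in data["elements"])

# Print the types from most to least frequent.
for element_type, count in type_counts.most_common():
    print(f"{element_type}: {count}")
```

With a real response, the same two lines that build and print `type_counts` apply unchanged to `data` as returned by `partition_file`.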

Convert a partitioned table element to a pandas dataframe for easier use:

from aryn_sdk.partition import partition_file, table_elem_to_dataframe

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )

# Find the first table and convert it to a dataframe
df = None
for element in data['elements']:
    if element['type'] == 'table':
        df = table_elem_to_dataframe(element)
        break
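
The loop above leaves `df` as `None` when the document has no tables. The same "first match or nothing" lookup can be written as a small helper. This is a sketch on sample data, not part of aryn-sdk: `first_element_of_type` and the 'id' key are hypothetical names introduced here for illustration.

```python
def first_element_of_type(elements, element_type):
    """Return the first element whose 'type' matches, or None if there is none."""
    return next((e for e in elements if e["type"] == element_type), None)


# Hypothetical sample elements; the 'id' key exists only to tell them apart.
sample_elements = [
    {"type": "Text", "id": 0},
    {"type": "table", "id": 1},
    {"type": "table", "id": 2},
]

first_table = first_element_of_type(sample_elements, "table")
missing_image = first_element_of_type(sample_elements, "Image")

if first_table is not None:
    print("first table element:", first_table)
if missing_image is None:
    print("no image elements found")
```

Checking for `None` before passing the element to `table_elem_to_dataframe` avoids converting something that was never found.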

Or convert all partitioned tables to pandas dataframes in one shot:

from aryn_sdk.partition import partition_file, tables_to_pandas

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
elements_and_tables = tables_to_pandas(data)
dataframes = [table for (element, table) in elements_and_tables if table is not None]

Visualize partitioned documents by drawing on the bounding boxes:

from aryn_sdk.partition import partition_file, draw_with_boxes

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
page_pics = draw_with_boxes("partition-me.pdf", data, draw_table_cells=True)

from IPython.display import display
display(page_pics[0])

Note: visualizing documents requires poppler, a PDF rendering library, to be installed. Follow the poppler installation instructions for your platform.

Convert image elements to more useful types, such as PIL images or byte strings encoded in a specific image format:

from aryn_sdk.partition import partition_file, convert_image_element

with open("my-favorite-pdf.pdf", "rb") as f:
    data = partition_file(
        f,
        extract_images=True
    )
image_elts = [e for e in data['elements'] if e['type'] == 'Image']

# The indexing below assumes the document contains at least three images
pil_img = convert_image_element(image_elts[0])
jpg_bytes = convert_image_element(image_elts[1], format='JPEG')
png_str = convert_image_element(image_elts[2], format="PNG", b64encode=True)
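
A base64-encoded string is convenient for embedding in JSON or HTML, but it has to be decoded before it can be written to disk as an image file. The sketch below assumes that `b64encode=True` yields a standard base64 string of the encoded image bytes; that is an assumption about the return value, not something this README states. To stay runnable without the service, it builds a placeholder `png_str` from the PNG file signature instead of a real image.

```python
import base64
import tempfile
from pathlib import Path

# The 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Placeholder for the string returned with format="PNG", b64encode=True.
# It is NOT a complete image, just the signature plus filler bytes.
png_str = base64.b64encode(PNG_SIGNATURE + b"placeholder").decode("ascii")

# Decode the base64 text back into raw bytes.
png_bytes = base64.b64decode(png_str)

# Check the signature before trusting that the bytes are PNG data.
if not png_bytes.startswith(PNG_SIGNATURE):
    raise ValueError("decoded data does not look like a PNG")

# Write the bytes to a file in the system temporary directory.
out_path = Path(tempfile.gettempdir()) / "first-image.png"
out_path.write_bytes(png_bytes)
print(f"wrote {len(png_bytes)} bytes to {out_path}")
```

With a real `png_str`, the resulting file can be opened in any image viewer.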
