The client library for Aryn services

aryn-sdk is a simple client library for interacting with Aryn cloud services.

Aryn Partitioning Service

Partition PDF files with the Aryn Partitioning Service (APS) through aryn-sdk:

from aryn_sdk.partition import partition_file

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
elements = data['elements']
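As a quick sanity check on a partition result, it can help to tally how many elements of each type came back. The sketch below relies only on what the example above shows, namely that the result is a dictionary with an 'elements' list whose entries carry a 'type' key. The `count_element_types` helper and the `sample` data (including the "Title" and "Text" type names) are hypothetical stand-ins, not part of aryn-sdk.

```python
from collections import Counter


def count_element_types(data: dict) -> Counter:
    """Count how many elements of each type a partition result contains."""
    return Counter(element["type"] for element in data.get("elements", []))


# Hypothetical stand-in for the dictionary returned by partition_file
sample = {
    "elements": [
        {"type": "Title"},
        {"type": "Text"},
        {"type": "Text"},
        {"type": "table"},
    ]
}

type_counts = count_element_types(sample)
print(type_counts.most_common())
```

In real use, pass the `data` returned by `partition_file` instead of `sample`.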

Convert a partitioned table element to a pandas dataframe for easier use:

from aryn_sdk.partition import partition_file, table_elem_to_dataframe

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )

# Find the first table and convert it to a dataframe
df = None
for element in data['elements']:
    if element['type'] == 'table':
        df = table_elem_to_dataframe(element)
        break
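The search loop above can also be written with `next()` and a generator expression, which makes the "no table found" case explicit. This is only a sketch: `first_table_element` is a hypothetical helper, and the `sample` dictionary (including its 'id' field) is made-up data standing in for a real partition result.

```python
def first_table_element(data: dict):
    """Return the first element whose type is 'table', or None if there is none."""
    return next(
        (
            element
            for element in data.get("elements", [])
            if element.get("type") == "table"
        ),
        None,
    )


# Hypothetical stand-in for the dictionary returned by partition_file
sample = {
    "elements": [
        {"type": "Text"},
        {"type": "table", "id": "t1"},
        {"type": "table", "id": "t2"},
    ]
}

table = first_table_element(sample)
print(table)
```

If the returned element is not None, it can then be passed to `table_elem_to_dataframe` as in the example above.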

Or convert all partitioned tables to pandas dataframes in one shot:

from aryn_sdk.partition import partition_file, tables_to_pandas

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
elements_and_tables = tables_to_pandas(data)
dataframes = [table for (element, table) in elements_and_tables if table is not None]

Visualize partitioned documents by drawing each element's bounding box on the page images:

from aryn_sdk.partition import partition_file, draw_with_boxes

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
page_pics = draw_with_boxes("partition-me.pdf", data, draw_table_cells=True)

from IPython.display import display
display(page_pics[0])

Note: visualizing documents requires poppler, a PDF rendering library, to be installed. See the poppler installation instructions for your platform.
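Outside a notebook, the annotated pages can be written to disk instead of displayed. The sketch below assumes that each item in `page_pics` is a PIL-style image with a `save(path)` method; that is an assumption based on the `display` call above, not something this page states. The `save_pages` helper and the output directory name are hypothetical.

```python
from pathlib import Path


def save_pages(page_pics, out_dir="annotated-pages"):
    """Save each page image as a numbered PNG and return the written paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for page_number, pic in enumerate(page_pics, start=1):
        target = out_path / f"page-{page_number:03d}.png"
        pic.save(target)  # assumes a PIL-style save(path) method
        written.append(target)
    return written
```

Calling `save_pages(page_pics)` after `draw_with_boxes` would then produce one PNG per page, under the assumption stated above.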

Convert image elements to more useful types, such as PIL images or byte strings encoded in a given image format:

from aryn_sdk.partition import partition_file, convert_image_element

with open("my-favorite-pdf.pdf", "rb") as f:
    data = partition_file(
        f,
        extract_images=True
    )
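image_elts = [e for e in data['elements'] if e['type'] == 'Image']

# The indexing below assumes the document contains at least three images
pil_img = convert_image_element(image_elts[0])  # default conversion
jpg_bytes = convert_image_element(image_elts[1], format='JPEG')  # JPEG-encoded data
png_str = convert_image_element(image_elts[2], format="PNG", b64encode=True)  # base64-encoded PNG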
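A base64-encoded result such as `png_str` usually has to be decoded before it can be written to a file. The sketch below assumes the string uses standard base64, which this page does not spell out. `decode_b64_image` is a hypothetical helper, and `sample_str` is a stand-in built from the 8-byte PNG file signature rather than a real image.

```python
import base64


def decode_b64_image(encoded: str) -> bytes:
    """Decode a standard base64 string back into raw image bytes."""
    return base64.b64decode(encoded)


# Stand-in for png_str: the 8-byte PNG signature, base64-encoded
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
sample_str = base64.b64encode(PNG_SIGNATURE).decode("ascii")

raw = decode_b64_image(sample_str)
print(raw.startswith(PNG_SIGNATURE))
```

With a real `png_str`, the decoded bytes could be written out with `open("image.png", "wb")`.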
