The client library for Aryn services

aryn-sdk is a simple client library for interacting with Aryn cloud services.

Aryn Partitioning Service

Partition PDF files with the Aryn Partitioning Service (APS) through aryn-sdk:

from aryn_sdk.partition import partition_file

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
elements = data['elements']
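
As a quick sanity check on the result, the element types can be tallied. The following is a minimal sketch that runs on a hand-made stand-in for `data`. The sample elements, the type name "Text", and the helper name `count_element_types` are hypothetical; only the 'elements' list and the per-element 'type' key are taken from the examples in this document.

```python
from collections import Counter


def count_element_types(data):
    """Tally how many elements of each type a partition result contains."""
    return Counter(element["type"] for element in data["elements"])


# Hypothetical stand-in for the dict returned by partition_file
sample_data = {
    "elements": [
        {"type": "Text"},
        {"type": "table"},
        {"type": "Text"},
    ]
}

type_counts = count_element_types(sample_data)
```

A `Counter` returns 0 for types that never occur, so looking up an absent type does not raise.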

Convert a partitioned table element to a pandas dataframe for easier use:

from aryn_sdk.partition import partition_file, table_elem_to_dataframe

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )

# Find the first table and convert it to a dataframe
df = None
for element in data['elements']:
    if element['type'] == 'table':
        df = table_elem_to_dataframe(element)
        break
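
The loop above leaves `df` as `None` when the document contains no tables. The same lookup can be written with `next` and a default value. This is a sketch on hypothetical sample elements: `first_element_of_type` is an illustrative helper, not part of aryn-sdk, and the 'id' key exists only to tell the sample elements apart.

```python
def first_element_of_type(elements, element_type):
    """Return the first element whose 'type' matches, or None if there is none."""
    return next((e for e in elements if e["type"] == element_type), None)


# Hypothetical elements standing in for data['elements']
sample_elements = [
    {"type": "Text", "id": 0},
    {"type": "table", "id": 1},
    {"type": "table", "id": 2},
]

table_elem = first_element_of_type(sample_elements, "table")
missing = first_element_of_type(sample_elements, "Image")
```

If the result is not `None`, it can be passed to `table_elem_to_dataframe` as in the example above.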

Or convert all partitioned tables to pandas dataframes in one shot:

from aryn_sdk.partition import partition_file, tables_to_pandas

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
elements_and_tables = tables_to_pandas(data)
dataframes = [table for (element, table) in elements_and_tables if table is not None]

Visualize partitioned documents by drawing the detected bounding boxes on each page:

from aryn_sdk.partition import partition_file, draw_with_boxes

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
page_pics = draw_with_boxes("partition-me.pdf", data, draw_table_cells=True)

from IPython.display import display
display(page_pics[0])

Note: visualizing documents requires poppler, a PDF rendering library, to be installed. See the poppler installation instructions for your platform.
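
Before calling `draw_with_boxes`, it can help to check that poppler is present. The sketch below assumes that finding poppler's `pdftoppm` command-line tool on the PATH is a reasonable proxy for poppler being installed; how aryn-sdk itself locates poppler is not stated here, and `poppler_available` is a hypothetical helper.

```python
import shutil


def poppler_available():
    """Return True if poppler's pdftoppm command-line tool is on the PATH."""
    return shutil.which("pdftoppm") is not None


if not poppler_available():
    print("poppler not found on PATH; install it before calling draw_with_boxes")
```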

Convert image elements to more useful types, such as PIL images or byte strings in a specific image format:

from aryn_sdk.partition import partition_file, convert_image_element

with open("my-favorite-pdf.pdf", "rb") as f:
    data = partition_file(
        f,
        extract_images=True
    )
image_elts = [e for e in data['elements'] if e['type'] == 'Image']

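# The indexing below assumes the document contains at least three images
pil_img = convert_image_element(image_elts[0])
# Bytes in JPEG format
jpg_bytes = convert_image_element(image_elts[1], format='JPEG')
# PNG data, base64-encoded
png_str = convert_image_element(image_elts[2], format="PNG", b64encode=True)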
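
A base64-encoded result has to be decoded before it can be written to disk as an image file. The following is a minimal sketch, assuming the output of `b64encode=True` is standard base64 of the image bytes. It runs on a stand-in value rather than a real `png_str`, and `decode_b64_image` is a hypothetical helper.

```python
import base64


def decode_b64_image(b64_text):
    """Decode a base64 string (or bytes) back into raw image bytes."""
    return base64.b64decode(b64_text)


# Hypothetical stand-in for png_str: the 8-byte PNG signature, base64-encoded
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
fake_png_str = base64.b64encode(PNG_SIGNATURE).decode("ascii")

raw_bytes = decode_b64_image(fake_png_str)
is_png = raw_bytes.startswith(PNG_SIGNATURE)
```

With a real result, the decoded bytes could then be written to a file opened in binary mode ("wb").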
