The client library for Aryn services
Project description
aryn-sdk is a simple client library for interacting with Aryn cloud services.
Aryn Partitioning Service
Partition PDF files with the Aryn Partitioning Service (APS) through aryn-sdk:
from aryn_sdk.partition import partition_file

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
elements = data['elements']
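A quick way to see what the partitioner returned is to tally the elements by type. The sketch below runs on a hand-written sample rather than a live response. It assumes only that each element is a dict with a `type` key, as the examples on this page use. The helper name `count_element_types` and the sample type values are illustrative and not part of aryn-sdk.

```python
from collections import Counter


def count_element_types(elements):
    """Tally elements by their 'type' field (hypothetical helper, not in aryn-sdk)."""
    # .get() avoids a KeyError if an element unexpectedly lacks a 'type' key.
    return Counter(element.get("type", "unknown") for element in elements)


# Hand-written stand-in for data['elements']; real responses carry more fields.
sample_elements = [
    {"type": "Title"},
    {"type": "Text"},
    {"type": "Text"},
    {"type": "table"},
    {"type": "Image"},
]

type_counts = count_element_types(sample_elements)
for element_type, count in type_counts.most_common():
    print(f"{element_type}: {count}")
```

With a real response, `count_element_types(data['elements'])` would give the same kind of summary.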
Convert a partitioned table element to a pandas dataframe for easier use:
from aryn_sdk.partition import partition_file, table_elem_to_dataframe

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )

# Find the first table and convert it to a dataframe
df = None
for element in data['elements']:
    if element['type'] == 'table':
        df = table_elem_to_dataframe(element)
        break
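The loop above leaves `df` as `None` when the document contains no tables, so callers should check for that before using it. One compact way to write the lookup step, sketched here on sample data, is `next()` with a default. `first_element_of_type` is a hypothetical helper rather than an aryn-sdk function, and the `id` field in the sample exists only to make the output readable.

```python
def first_element_of_type(elements, wanted_type):
    """Return the first element whose 'type' matches, or None if there is none."""
    return next((e for e in elements if e.get("type") == wanted_type), None)


# Hand-written stand-in for data['elements']; 'id' is an illustrative field.
sample_elements = [
    {"type": "Text", "id": 1},
    {"type": "table", "id": 2},
    {"type": "table", "id": 3},
]

first_table = first_element_of_type(sample_elements, "table")
missing = first_element_of_type(sample_elements, "Image")

if first_table is None:
    print("No table found")
else:
    # With a real response, this element would be passed to table_elem_to_dataframe.
    print("First table element id:", first_table["id"])
```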
Or convert all partitioned tables to pandas dataframes in one shot:
from aryn_sdk.partition import partition_file, tables_to_pandas

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )

elements_and_tables = tables_to_pandas(data)
dataframes = [table for (element, table) in elements_and_tables if table is not None]
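The list comprehension above keeps the dataframes but drops the link to the element each one came from. If that link matters, a sketch like the following keeps each table's position in the pair list. It assumes, as the comprehension suggests, that `tables_to_pandas` yields `(element, table)` pairs with `None` for non-table elements. That the pairs follow document order is an assumption here. Strings stand in for dataframes so the example runs without pandas, and `index_tables` is a hypothetical helper.

```python
def index_tables(elements_and_tables):
    """Map each table's position in the pair list to its converted table."""
    return {
        position: table
        for position, (element, table) in enumerate(elements_and_tables)
        if table is not None
    }


# Hand-written stand-in for the output of tables_to_pandas(data);
# the strings play the role of pandas DataFrames.
sample_pairs = [
    ({"type": "Text"}, None),
    ({"type": "table"}, "<dataframe A>"),
    ({"type": "Image"}, None),
    ({"type": "table"}, "<dataframe B>"),
]

tables_by_position = index_tables(sample_pairs)
for position, table in tables_by_position.items():
    print(position, table)
```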
Visualize partitioned documents by drawing the bounding boxes on each page:
from aryn_sdk.partition import partition_file, draw_with_boxes

with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )

page_pics = draw_with_boxes("partition-me.pdf", data, draw_table_cells=True)

from IPython.display import display
display(page_pics[0])
Note: visualizing documents requires poppler, a PDF processing library, to be installed. Follow the poppler installation instructions for your platform.
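Outside a notebook, the annotated pages can be written to disk instead of displayed. The sketch below assumes `page_pics` is a list of PIL images, which the `display(page_pics[0])` call suggests but this page does not state outright. `save_pages` is a hypothetical helper, and a small stub class stands in for PIL images so the example runs without Pillow or poppler.

```python
import tempfile
from pathlib import Path


def save_pages(page_images, output_dir, stem="page"):
    """Save each image as <stem>-<n>.png in output_dir and return the paths."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for page_number, image in enumerate(page_images, start=1):
        path = output_dir / f"{stem}-{page_number}.png"
        image.save(path)  # PIL's Image.save accepts a path; format comes from the suffix
        written.append(path)
    return written


class FakeImage:
    """Stub with the one method used above; replace with real PIL images."""

    def save(self, path):
        Path(path).write_bytes(b"")


with tempfile.TemporaryDirectory() as tmp_dir:
    written_paths = save_pages([FakeImage(), FakeImage()], tmp_dir)
    written_names = [path.name for path in written_paths]
    all_exist = all(path.exists() for path in written_paths)

print(written_names)
```

With a real result, the call would be `save_pages(page_pics, "annotated-pages")`.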
Convert image elements to more useful types, such as PIL images or byte strings encoded in a given image format:
from aryn_sdk.partition import partition_file, convert_image_element

with open("my-favorite-pdf.pdf", "rb") as f:
    data = partition_file(
        f,
        extract_images=True
    )

image_elts = [e for e in data['elements'] if e['type'] == 'Image']

# The indexing below assumes the document contains at least three images
pil_img = convert_image_element(image_elts[0])
jpg_bytes = convert_image_element(image_elts[1], format='JPEG')
png_str = convert_image_element(image_elts[2], format="PNG", b64encode=True)
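A base64 string such as `png_str` has to be decoded before it can be written to a file or handed to an image library. The sketch below assumes that `b64encode=True` yields a standard base64 string of the encoded image bytes, which the parameter name suggests but this page does not spell out. It uses only the standard library, and the sample bytes are a stand-in (a PNG signature plus filler), not a real image. `decode_png_string` is a hypothetical helper.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png_string(png_str):
    """Decode a base64 string to bytes and check for the PNG signature."""
    raw = base64.b64decode(png_str)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return raw


# Stand-in for png_str: signature plus filler, not a complete image.
fake_png_bytes = PNG_SIGNATURE + b"placeholder-payload"
fake_png_str = base64.b64encode(fake_png_bytes).decode("ascii")

decoded = decode_png_string(fake_png_str)
print(len(decoded), "bytes decoded")
```

The decoded bytes can then be written with `open("image.png", "wb")`.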
File details
Details for the file aryn_sdk-0.1.6.tar.gz.
File metadata
- Download URL: aryn_sdk-0.1.6.tar.gz
- Upload date:
- Size: 851.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | ade87cee892db8c3f2b756d831be8f794e6e89e802ca2209913f5c29838714c3
MD5 | 27d158a3fcb6529d77050e7c2440ecaf
BLAKE2b-256 | 6fffa624139a8b06ed03c34b9bc9074d0933db689f5e90638ebae3951156f176
File details
Details for the file aryn_sdk-0.1.6-py3-none-any.whl.
File metadata
- Download URL: aryn_sdk-0.1.6-py3-none-any.whl
- Upload date:
- Size: 869.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | c7e0281414aed8b33c0044752e9f1c7259c127696c7011ca6783eb4b34a39fdb
MD5 | 99979cc2488844495acc1f753e3f7560
BLAKE2b-256 | ae84ea32ce5d5acc0ad8b6cecda40fe3c5afc98b36ace3cbd586f7614187be4b