The client library for Aryn services
Reason this release was yanked:
Hardcoded self-reported version number was 0.1.11
aryn-sdk is a simple client library for interacting with Aryn cloud services.
Aryn DocParse
Partition PDF files with Aryn DocParse through aryn-sdk:
from aryn_sdk.partition import partition_file
with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
elements = data['elements']
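Once `elements` is available, a quick tally of element types can show what the partitioner found. The sketch below is illustrative only: it runs on a small hand-made list standing in for `data['elements']`, and it assumes each element is a dict with a `type` key, as the examples in this README suggest. The helper name `count_element_types` and the `"Text"` type value are assumptions made for the example, not part of aryn-sdk.

```python
from collections import Counter


def count_element_types(elements):
    """Return a Counter mapping each element 'type' to how often it occurs."""
    return Counter(e.get("type", "unknown") for e in elements)


# Hand-made stand-in for data['elements']; real elements carry more fields,
# and the "Text" type name is an assumption for illustration.
sample_elements = [
    {"type": "Text"},
    {"type": "table"},
    {"type": "Text"},
    {"type": "Image"},
]

type_counts = count_element_types(sample_elements)
for elem_type, count in type_counts.most_common():
    print(f"{elem_type}: {count}")
```

With real output, `count_element_types(data['elements'])` would be the equivalent call.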
Convert a partitioned table element to a pandas dataframe for easier use:
from aryn_sdk.partition import partition_file, table_elem_to_dataframe
with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
# Find the first table and convert it to a dataframe
df = None
for element in data['elements']:
    if element['type'] == 'table':
        df = table_elem_to_dataframe(element)
        break
Or convert all partitioned tables to pandas dataframes in one shot:
from aryn_sdk.partition import partition_file, tables_to_pandas
with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
elements_and_tables = tables_to_pandas(data)
dataframes = [table for (element, table) in elements_and_tables if table is not None]
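The list comprehension above keeps only the dataframes and discards the elements they came from. If the element metadata is still needed, the pairs can be kept together instead. The following is a minimal sketch that assumes `tables_to_pandas` returns `(element, table)` pairs with `None` for non-table elements, as the example implies. Plain strings stand in for dataframes so the snippet runs without pandas, and `split_tables` is a hypothetical helper, not part of aryn-sdk.

```python
def split_tables(elements_and_tables):
    """Separate (element, table) pairs into table pairs and other elements."""
    with_tables = [(e, t) for e, t in elements_and_tables if t is not None]
    without_tables = [e for e, t in elements_and_tables if t is None]
    return with_tables, without_tables


# Stand-in data: strings take the place of pandas DataFrames here.
sample_pairs = [
    ({"type": "Text"}, None),
    ({"type": "table"}, "<dataframe 1>"),
    ({"type": "table"}, "<dataframe 2>"),
]

table_pairs, other_elements = split_tables(sample_pairs)
print(len(table_pairs), len(other_elements))
```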
Visualize partitioned documents by drawing their bounding boxes:
from aryn_sdk.partition import partition_file, draw_with_boxes
with open("partition-me.pdf", "rb") as f:
    data = partition_file(
        f,
        use_ocr=True,
        extract_table_structure=True,
        extract_images=True
    )
page_pics = draw_with_boxes("partition-me.pdf", data, draw_table_cells=True)
from IPython.display import display
display(page_pics[0])
Note: visualizing documents requires poppler, a PDF processing library, to be installed. See the poppler installation instructions for your platform.
Convert image elements to more useful types, such as PIL images or image-format byte strings:
from aryn_sdk.partition import partition_file, convert_image_element
with open("my-favorite-pdf.pdf", "rb") as f:
    data = partition_file(
        f,
        extract_images=True
    )
image_elts = [e for e in data['elements'] if e['type'] == 'Image']
pil_img = convert_image_element(image_elts[0])
jpg_bytes = convert_image_element(image_elts[1], format='JPEG')
png_str = convert_image_element(image_elts[2], format="PNG", b64encode=True)
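A base64 string such as `png_str` is convenient for embedding in JSON or HTML, but it has to be decoded before it can be written to disk as an image file. The sketch below assumes that `b64encode=True` yields standard base64 text of the encoded image bytes; that is an assumption based on the parameter name, not confirmed here. It uses a synthetic value rather than real SDK output, and `decode_b64_image` is a hypothetical helper.

```python
import base64


def decode_b64_image(b64_str):
    """Decode a base64-encoded image string back into raw bytes."""
    return base64.b64decode(b64_str)


# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Synthetic stand-in for png_str: the PNG signature plus filler bytes.
fake_png_bytes = PNG_SIGNATURE + b"not a real image"
fake_png_str = base64.b64encode(fake_png_bytes).decode("ascii")

raw = decode_b64_image(fake_png_str)
print(raw.startswith(PNG_SIGNATURE))
```

With real output, the decoded bytes could then be written with `open("image.png", "wb")`.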
Async Aryn DocParse
Single Job Example
import time
from aryn_sdk.partition import partition_file_async_submit, partition_file_async_result
with open("my-favorite-pdf.pdf", "rb") as f:
    response = partition_file_async_submit(
        f,
        use_ocr=True,
        extract_table_structure=True,
    )
job_id = response["job_id"]
# Poll for the results
while True:
    result = partition_file_async_result(job_id)
    if result["status"] != "pending":
        break
    time.sleep(5)
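The polling loop above runs until the job leaves the `pending` state, however long that takes. One possible variation adds a deadline so a stuck job does not block forever. This is a sketch, not part of aryn-sdk: `wait_for_result` is a hypothetical helper that takes the fetch function as an argument, and the demonstration uses a fake fetcher. The `"done"` status value is an assumption; the README only confirms `"pending"`.

```python
import time


def wait_for_result(fetch_result, job_id, poll_interval=5.0, timeout=600.0):
    """Poll fetch_result(job_id) until the status is no longer 'pending'.

    Raises TimeoutError if the job is still pending once `timeout` seconds
    have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_result(job_id)
        if result["status"] != "pending":
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still pending after {timeout} seconds")
        time.sleep(poll_interval)


# Fake fetcher for demonstration: reports 'pending' twice, then 'done'
# (the 'done' value is an assumption for illustration).
_responses = iter([{"status": "pending"}, {"status": "pending"}, {"status": "done"}])


def fake_fetch(job_id):
    return next(_responses)


final = wait_for_result(fake_fetch, "example-job", poll_interval=0.01, timeout=1.0)
print(final["status"])
```

In real use, `partition_file_async_result` would be passed in place of `fake_fetch`.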
Optionally, you can also set a webhook for Aryn to call when your job is completed:
partition_file_async_submit("path/to/my/file.docx", webhook_url="https://example.com/alert")
Aryn will send a POST request with a body like the following:
{"done": [{"job_id": "aryn:j-47gpd3604e5tz79z1jro5fc"}]}
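On the receiving side, a webhook handler needs to pull the job IDs out of that body before fetching results. The sketch below parses the example body shown above using only the standard library. `extract_done_job_ids` is a hypothetical helper, and the assumption that every entry under `done` carries a `job_id` key rests solely on that one example.

```python
import json


def extract_done_job_ids(body):
    """Return the job IDs listed under 'done' in a webhook request body."""
    payload = json.loads(body)
    return [entry["job_id"] for entry in payload.get("done", []) if "job_id" in entry]


# Body copied from the example above.
sample_body = '{"done": [{"job_id": "aryn:j-47gpd3604e5tz79z1jro5fc"}]}'

done_ids = extract_done_job_ids(sample_body)
print(done_ids)
```

Each returned ID could then be passed to `partition_file_async_result`.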
Multi-Job Example
import logging
import time
from aryn_sdk.partition import partition_file_async_submit, partition_file_async_result
paths = ["file1.pdf", "file2.docx"]
job_ids = [None] * len(paths)
for i, path in enumerate(paths):
    try:
        # Open each file only for the duration of its submission
        with open(path, "rb") as f:
            job_ids[i] = partition_file_async_submit(f)["job_id"]
    except Exception as e:
        logging.warning(f"Failed to submit {path}: {e}")
results = [None] * len(paths)
for i, job_id in enumerate(job_ids):
    if job_id is None:
        continue  # Submission failed; leave this result as None
    while True:
        result = partition_file_async_result(job_id)
        if result["status"] != "pending":
            break
        time.sleep(5)
    results[i] = result
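The loop above waits on each job in turn, so a slow first job delays collecting results from the others. An alternative is to poll every outstanding job once per round. The following is a minimal sketch under stated assumptions: `wait_for_all` is a hypothetical helper that takes the fetch function as an argument, the fake fetcher stands in for `partition_file_async_result`, and the `"done"` status value is assumed, since the README only confirms `"pending"`.

```python
import time


def wait_for_all(fetch_result, job_ids, poll_interval=5.0):
    """Poll all jobs in rounds; return {job_id: result} once none are pending."""
    results = {}
    # Skip entries for jobs whose submission failed.
    remaining = [j for j in job_ids if j is not None]
    while remaining:
        still_pending = []
        for job_id in remaining:
            result = fetch_result(job_id)
            if result["status"] == "pending":
                still_pending.append(job_id)
            else:
                results[job_id] = result
        remaining = still_pending
        if remaining:
            time.sleep(poll_interval)
    return results


# Fake fetcher: each job reports 'pending' a fixed number of times, then
# 'done' (an assumed status value for illustration).
_polls_left = {"job-a": 2, "job-b": 0}


def fake_fetch(job_id):
    if _polls_left[job_id] > 0:
        _polls_left[job_id] -= 1
        return {"status": "pending"}
    return {"status": "done", "job_id": job_id}


all_results = wait_for_all(fake_fetch, ["job-a", None, "job-b"], poll_interval=0.01)
print(sorted(all_results))
```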
Cancelling an async job
from aryn_sdk.partition import partition_file_async_submit, partition_file_async_cancel
job_id = partition_file_async_submit(
    "path/to/file.pdf",
    use_ocr=True,
    extract_table_structure=True,
    extract_images=True,
)["job_id"]
partition_file_async_cancel(job_id)
List pending jobs
from aryn_sdk.partition import partition_file_async_list
partition_file_async_list()