Python client for Chunkr: open source document intelligence
Chunkr Python Client
This package provides a simple interface to the Chunkr API.
Getting Started
You can get an API key from Chunkr or deploy your own Chunkr instance. For self-hosted deployment options, check out our deployment guide.
For more information about the API and its capabilities, visit the Chunkr API docs.
Installation
pip install chunkr-ai
Usage
The Chunkr client works seamlessly in both synchronous and asynchronous contexts.
Synchronous Usage
from chunkr_ai import Chunkr
# Initialize client
chunkr = Chunkr()
# Upload a file and wait for processing
task = chunkr.upload("document.pdf")
print(task.task_id)
# Create task without waiting
task = chunkr.create_task("document.pdf")
result = task.poll() # Check status when needed
# Clean up when done
chunkr.close()
Asynchronous Usage
from chunkr_ai import Chunkr
import asyncio
async def process_document():
    # Initialize client
    chunkr = Chunkr()
    try:
        # Upload a file and wait for processing
        task = await chunkr.upload("document.pdf")
        print(task.task_id)
        # Create task without waiting
        task = await chunkr.create_task("document.pdf")
        result = await task.poll()  # Check status when needed
    finally:
        await chunkr.close()
# Run the async function
asyncio.run(process_document())
Concurrent Processing
The client supports both async concurrency and multiprocessing:
# Async concurrency
async def process_multiple():
    chunkr = Chunkr()
    try:
        tasks = [
            chunkr.upload("doc1.pdf"),
            chunkr.upload("doc2.pdf"),
            chunkr.upload("doc3.pdf"),
        ]
        results = await asyncio.gather(*tasks)
    finally:
        await chunkr.close()
# Multiprocessing
from multiprocessing import Pool
def process_file(path):
    # Each worker process creates and closes its own client
    chunkr = Chunkr()
    try:
        return chunkr.upload(path)
    finally:
        chunkr.close()
# The __main__ guard is required on platforms that spawn worker processes
if __name__ == "__main__":
    with Pool(processes=3) as pool:
        results = pool.map(process_file, ["doc1.pdf", "doc2.pdf", "doc3.pdf"])
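When many documents are uploaded at once, it can help to cap how many uploads are in flight at the same time. The following is a minimal sketch of that idea using `asyncio.Semaphore`. `fake_upload` is a stand-in for `chunkr.upload` so the example runs without an API key, and `upload_all` and `max_concurrent` are hypothetical names, not part of the client.

```python
import asyncio


async def fake_upload(path, tracker):
    """Stand-in for `chunkr.upload(path)`; records how many calls overlap."""
    tracker["active"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["active"])
    await asyncio.sleep(0.01)  # simulate network and processing time
    tracker["active"] -= 1
    return f"task-for-{path}"


async def upload_all(paths, max_concurrent=2):
    """Run one upload per path, with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)
    tracker = {"active": 0, "peak": 0}

    async def bounded(path):
        # The semaphore blocks here once `max_concurrent` uploads are running
        async with semaphore:
            return await fake_upload(path, tracker)

    # gather() returns results in the same order as the input paths
    results = await asyncio.gather(*(bounded(p) for p in paths))
    return results, tracker["peak"]


results, peak = asyncio.run(upload_all(["doc1.pdf", "doc2.pdf", "doc3.pdf"]))
```

With a real client, `fake_upload(path, tracker)` would be replaced by the awaited upload call, and the limit would be chosen to suit the API's rate limits.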
Input Types
The client supports various input types:
# File path
chunkr.upload("document.pdf")
# Opened file
with open("document.pdf", "rb") as f:
    chunkr.upload(f)
# PIL Image
from PIL import Image
img = Image.open("photo.jpg")
chunkr.upload(img)
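The opened-file example above uses binary mode (`"rb"`). As an illustration of why paths and file objects can be treated interchangeably, the sketch below reduces both to the same bytes. `read_document_bytes` is a hypothetical helper and not the client's internal logic; PIL images are left out so the example needs only the standard library.

```python
import io
import tempfile
from pathlib import Path


def read_document_bytes(source):
    """Return the raw bytes for a path (str or Path) or a binary file object."""
    if isinstance(source, (str, Path)):
        return Path(source).read_bytes()
    if hasattr(source, "read"):
        data = source.read()
        if not isinstance(data, bytes):
            raise TypeError("file objects must be opened in binary mode ('rb')")
        return data
    raise TypeError(f"unsupported input type: {type(source).__name__}")


# Usage: the same bytes come back whichever way the document is supplied.
with tempfile.TemporaryDirectory() as tmp:
    pdf_path = Path(tmp) / "document.pdf"
    pdf_path.write_bytes(b"%PDF-1.4 example")

    from_path = read_document_bytes(str(pdf_path))
    with open(pdf_path, "rb") as f:
        from_file = read_document_bytes(f)
    from_memory = read_document_bytes(io.BytesIO(b"%PDF-1.4 example"))
```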
Configuration
You can customize the processing behavior by passing a Configuration object:
from chunkr_ai.models import (
    Configuration,
    OcrStrategy,
    SegmentationStrategy,
    GenerationStrategy
)
config = Configuration(
    ocr_strategy=OcrStrategy.AUTO,
    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS,
    high_resolution=True,
    expires_in=3600,  # seconds
)
# Works in both sync and async contexts
task = chunkr.upload("document.pdf", config) # sync
task = await chunkr.upload("document.pdf", config) # async
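Since `expires_in` is given in seconds, longer lifetimes are easier to read when derived from a `timedelta`. The sketch below shows the conversion; `to_seconds` is a hypothetical helper, and passing a whole number of seconds is an assumption based on the `3600` used above.

```python
from datetime import timedelta


def to_seconds(duration):
    """Convert a timedelta to a whole number of seconds."""
    return int(duration.total_seconds())


one_hour = to_seconds(timedelta(hours=1))  # 3600
one_week = to_seconds(timedelta(days=7))   # 604800
```

The result can then be passed as `Configuration(expires_in=one_hour)`.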
Available Configuration Examples
- Chunk Processing
    from chunkr_ai.models import ChunkProcessing
    config = Configuration(
        chunk_processing=ChunkProcessing(target_length=1024)
    )
- Expires In
    config = Configuration(expires_in=3600)
- High Resolution
    config = Configuration(high_resolution=True)
- JSON Schema
    # Assumes JsonSchema and Property are importable from chunkr_ai.models
    from chunkr_ai.models import JsonSchema, Property
    config = Configuration(json_schema=JsonSchema(
        title="Sales Data",
        properties=[
            Property(name="Person with highest sales", prop_type="string", description="The person with the highest sales"),
            Property(name="Person with lowest sales", prop_type="string", description="The person with the lowest sales"),
        ]
    ))
- OCR Strategy
    config = Configuration(ocr_strategy=OcrStrategy.AUTO)
- Segment Processing
    from chunkr_ai.models import SegmentProcessing, GenerationConfig, GenerationStrategy
    config = Configuration(
        segment_processing=SegmentProcessing(
            page=GenerationConfig(
                html=GenerationStrategy.LLM,
                markdown=GenerationStrategy.LLM
            )
        )
    )
- Segmentation Strategy
    config = Configuration(
        segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS  # or SegmentationStrategy.PAGE
    )
Environment Setup
You can provide your API key and URL in several ways:
- Environment variables: CHUNKR_API_KEY and CHUNKR_URL
- .env file
- Direct initialization:
chunkr = Chunkr(
api_key="your-api-key",
url="https://api.chunkr.ai"
)
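To make the interaction between these options concrete, here is a minimal sketch of how settings might be resolved. It assumes that explicitly passed values take precedence over environment variables; that ordering is an assumption, not something the client documents here. `resolve_settings` is a hypothetical helper and `.env` loading is omitted.

```python
import os


def resolve_settings(api_key=None, url=None, env=None):
    """Pick the API key and URL, preferring explicit arguments (assumed order)."""
    env = os.environ if env is None else env
    resolved_key = api_key or env.get("CHUNKR_API_KEY")
    resolved_url = url or env.get("CHUNKR_URL")
    if not resolved_key:
        raise ValueError("no API key: pass api_key or set CHUNKR_API_KEY")
    return resolved_key, resolved_url


# Usage with an explicit mapping so the example does not touch the real environment
example_env = {
    "CHUNKR_API_KEY": "key-from-env",
    "CHUNKR_URL": "https://api.chunkr.ai",
}
from_env = resolve_settings(env=example_env)
overridden = resolve_settings(api_key="explicit-key", env=example_env)
```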
Resource Management
It's recommended to properly close the client when you're done:
# Sync context
chunkr = Chunkr()
try:
    result = chunkr.upload("document.pdf")
finally:
    chunkr.close()
# Async context
async def process():
    chunkr = Chunkr()
    try:
        result = await chunkr.upload("document.pdf")
    finally:
        await chunkr.close()
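The try/finally pattern above can also be wrapped in context managers so that the close call is not forgotten. The sketch below uses stand-in classes (`FakeClient`, `FakeAsyncClient`) so that it runs without the package; whether the real client behaves the same way under these wrappers is an assumption. `closing_async` is a hypothetical helper: the standard library's `contextlib.aclosing` calls `aclose()` rather than `close()`, so the async case needs its own small wrapper.

```python
import asyncio
from contextlib import asynccontextmanager, closing


class FakeClient:
    """Stand-in for a synchronous client exposing upload() and close()."""

    def __init__(self):
        self.closed = False

    def upload(self, path):
        return f"task-for-{path}"

    def close(self):
        self.closed = True


class FakeAsyncClient:
    """Stand-in for an asynchronous client with awaitable upload() and close()."""

    def __init__(self):
        self.closed = False

    async def upload(self, path):
        return f"task-for-{path}"

    async def close(self):
        self.closed = True


@asynccontextmanager
async def closing_async(client):
    """Await client.close() on exit, even if the body raises."""
    try:
        yield client
    finally:
        await client.close()


# Sync: contextlib.closing calls close() when the block exits
sync_client = FakeClient()
with closing(sync_client) as client:
    sync_result = client.upload("document.pdf")


async def main():
    async_client = FakeAsyncClient()
    async with closing_async(async_client) as client:
        result = await client.upload("document.pdf")
    return result, async_client.closed


async_result, async_closed = asyncio.run(main())
```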
File details
Details for the file chunkr_ai-0.0.28.tar.gz.
File metadata
- Download URL: chunkr_ai-0.0.28.tar.gz
- Upload date:
- Size: 14.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.4.29
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 36c325ee4681be78588ad6e6df7bf9f4a30e1d12d6ee667032ff5afe95944e3c |
| MD5 | 93d93e25c392b157d0223675c3d7889c |
| BLAKE2b-256 | fccfc48535a9a6eb3a465dc5a3c36c525e043acb655ccc6c4100797def5d5e1c |
File details
Details for the file chunkr_ai-0.0.28-py3-none-any.whl.
File metadata
- Download URL: chunkr_ai-0.0.28-py3-none-any.whl
- Upload date:
- Size: 14.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.4.29
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 54402f3689a26adb3f9c5be88c950749636f248b1b5f19bd392e75916be3f8da |
| MD5 | a809ea195f6b2a7bab9069d7820f2e9a |
| BLAKE2b-256 | aa28a4572be58c49abc312dae8201e2c1d116e8c97c4d25d0e36bb18e63a5102 |