Python client for Chunkr: open source document intelligence

Chunkr Python Client

This package provides a simple Python interface to the Chunkr API.

Getting Started

You can get an API key from Chunkr or deploy your own Chunkr instance. For self-hosted deployment options, check out our deployment guide.

For more information about the API and its capabilities, visit the Chunkr API docs.

Installation

pip install chunkr-ai

Usage

The same Chunkr client works in both synchronous and asynchronous contexts: call its methods directly in synchronous code, or await them inside a coroutine.

Synchronous Usage

from chunkr_ai import Chunkr

# Initialize client
chunkr = Chunkr()

# Upload a file and wait for processing
task = chunkr.upload("document.pdf")
print(task.task_id)

# Create task without waiting
task = chunkr.create_task("document.pdf")
result = task.poll()  # Check status when needed

# Clean up when done
chunkr.close()

Asynchronous Usage

from chunkr_ai import Chunkr
import asyncio

async def process_document():
    # Initialize client
    chunkr = Chunkr()

    try:
        # Upload a file and wait for processing
        task = await chunkr.upload("document.pdf")
        print(task.task_id)

        # Create task without waiting
        task = await chunkr.create_task("document.pdf")
        result = await task.poll()  # Check status when needed
    finally:
        await chunkr.close()

# Run the async function
asyncio.run(process_document())
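
Processing time depends on the document, so in async code it can help to bound how long you wait on `task.poll()`. The sketch below relies only on the standard library's `asyncio.wait_for`. `poll_with_timeout` and `FakeTask` are hypothetical names used for illustration; `FakeTask` stands in for a Chunkr task so the snippet runs without an API key. How the real client reacts when the awaiting coroutine is cancelled is not covered here.

```python
import asyncio


class FakeTask:
    """Hypothetical stand-in for a Chunkr task; sleeps instead of calling the API."""

    def __init__(self, delay: float) -> None:
        self.delay = delay

    async def poll(self) -> str:
        await asyncio.sleep(self.delay)
        return "done"


async def poll_with_timeout(task, timeout: float):
    """Await task.poll(); return None if it takes longer than `timeout` seconds."""
    try:
        return await asyncio.wait_for(task.poll(), timeout=timeout)
    except asyncio.TimeoutError:
        return None


async def demo():
    # The first poll finishes in time; the second exceeds its timeout.
    fast = await poll_with_timeout(FakeTask(0.01), timeout=1.0)
    slow = await poll_with_timeout(FakeTask(1.0), timeout=0.05)
    return fast, slow
```

With a real client, the same helper would be called as `await poll_with_timeout(task, timeout=120)` after `create_task`.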

Concurrent Processing

The client supports both async concurrency and multiprocessing:

# Async concurrency
import asyncio
from chunkr_ai import Chunkr

async def process_multiple():
    chunkr = Chunkr()
    try:
        # The uploads start together and are awaited as a group
        tasks = [
            chunkr.upload("doc1.pdf"),
            chunkr.upload("doc2.pdf"),
            chunkr.upload("doc3.pdf")
        ]
        return await asyncio.gather(*tasks)
    finally:
        await chunkr.close()

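# Multiprocessing
from multiprocessing import Pool
from chunkr_ai import Chunkr

def process_file(path):
    # Each worker process creates and closes its own client
    chunkr = Chunkr()
    try:
        return chunkr.upload(path)
    finally:
        chunkr.close()

# The guard is needed on platforms that spawn worker processes (Windows, macOS)
if __name__ == "__main__":
    with Pool(processes=3) as pool:
        results = pool.map(process_file, ["doc1.pdf", "doc2.pdf", "doc3.pdf"])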
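
When the list of documents is long, starting every upload at once may not be desirable. One common way to cap the number of uploads in flight is an `asyncio.Semaphore`. The sketch below is a generic helper, not part of the Chunkr client: `gather_limited` and `fake_upload` are hypothetical names, and `fake_upload` stands in for `chunkr.upload` so the example runs offline.

```python
import asyncio


async def gather_limited(func, items, limit: int = 3):
    """Call the async `func` on every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item):
        async with semaphore:
            return await func(item)

    # asyncio.gather returns results in the same order as the inputs
    return await asyncio.gather(*(run_one(item) for item in items))


async def demo():
    in_flight = 0
    peak = 0

    async def fake_upload(path):
        # Hypothetical stand-in for `chunkr.upload`; tracks concurrent calls
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return f"task-for-{path}"

    paths = [f"doc{i}.pdf" for i in range(1, 7)]
    results = await gather_limited(fake_upload, paths, limit=2)
    return results, peak
```

With a real client this would read `await gather_limited(chunkr.upload, paths, limit=2)` inside a coroutine.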

Input Types

The client supports various input types:

# File path
chunkr.upload("document.pdf")

# Opened file
with open("document.pdf", "rb") as f:
    chunkr.upload(f)

# PIL Image
from PIL import Image
img = Image.open("photo.jpg")
chunkr.upload(img)

Configuration

You can customize the processing behavior by passing a Configuration object:

from chunkr_ai.models import (
    Configuration,
    OcrStrategy,
    SegmentationStrategy,
)

config = Configuration(
    ocr_strategy=OcrStrategy.AUTO,
    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS,
    high_resolution=True,
    expires_in=3600,  # seconds
)

# The same call works in both contexts; use one or the other
task = chunkr.upload("document.pdf", config)  # sync
task = await chunkr.upload("document.pdf", config)  # async, inside a coroutine

Available Configuration Examples

  • Chunk Processing

    from chunkr_ai.models import ChunkProcessing
    config = Configuration(
        chunk_processing=ChunkProcessing(target_length=1024)
    )
    
  • Expires In

    config = Configuration(expires_in=3600)
    
  • High Resolution

    config = Configuration(high_resolution=True)
    
  • JSON Schema

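    # Assumes JsonSchema and Property are exported from chunkr_ai.models like the other models
    from chunkr_ai.models import JsonSchema, Property
    config = Configuration(json_schema=JsonSchema(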
        title="Sales Data",
        properties=[
            Property(name="Person with highest sales", prop_type="string", description="The person with the highest sales"),
            Property(name="Person with lowest sales", prop_type="string", description="The person with the lowest sales"),
        ]
    ))
    
  • OCR Strategy

    config = Configuration(ocr_strategy=OcrStrategy.AUTO)
    
  • Segment Processing

    from chunkr_ai.models import SegmentProcessing, GenerationConfig, GenerationStrategy
    config = Configuration(
        segment_processing=SegmentProcessing(
            page=GenerationConfig(
                html=GenerationStrategy.LLM,
                markdown=GenerationStrategy.LLM
            )
        )
    )
    
  • Segmentation Strategy

    config = Configuration(
        segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS  # or SegmentationStrategy.PAGE
    )
    

Environment Setup

You can provide your API key and URL in several ways:

  1. Environment variables: CHUNKR_API_KEY and CHUNKR_URL
  2. .env file
  3. Direct initialization:

    chunkr = Chunkr(
        api_key="your-api-key",
        url="https://api.chunkr.ai"
    )
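
If you want a missing key to fail early with a clear message, you can read the environment variables yourself and pass them in explicitly. The following is a minimal sketch: `load_chunkr_settings` is a hypothetical helper, and using `https://api.chunkr.ai` as the fallback URL is an assumption based on the example above, not documented client behavior.

```python
import os


def load_chunkr_settings(env=None):
    """Read Chunkr settings from a mapping (defaults to os.environ).

    Assumption: the hosted URL from the README example is used as the
    fallback when CHUNKR_URL is unset.
    """
    env = os.environ if env is None else env
    api_key = env.get("CHUNKR_API_KEY")
    if not api_key:
        raise RuntimeError("CHUNKR_API_KEY is not set")
    url = env.get("CHUNKR_URL") or "https://api.chunkr.ai"
    return {"api_key": api_key, "url": url}
```

The returned dictionary uses the same keyword names as direct initialization, so it can be passed as `Chunkr(**load_chunkr_settings())`.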

Resource Management

Close the client when you are done with it, ideally in a finally block so it is closed even if processing fails:

# Sync context
chunkr = Chunkr()
try:
    result = chunkr.upload("document.pdf")
finally:
    chunkr.close()

# Async context
async def process():
    chunkr = Chunkr()
    try:
        result = await chunkr.upload("document.pdf")
    finally:
        await chunkr.close()
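
The try/finally pattern above can also be written with context managers. This is a sketch using only the standard library, with hypothetical stand-in classes (`FakeSyncClient`, `FakeAsyncClient`) so that it runs without the Chunkr package. `contextlib.closing` calls `close()` on exit, which covers the sync case. The standard `contextlib.aclosing` calls `aclose()` rather than `close()`, so the async case uses a small hypothetical helper, `aclosing_client`, instead.

```python
import asyncio
from contextlib import asynccontextmanager, closing


class FakeSyncClient:
    """Hypothetical stand-in for a client used synchronously."""

    def __init__(self) -> None:
        self.closed = False

    def upload(self, path: str) -> str:
        return f"task-for-{path}"

    def close(self) -> None:
        self.closed = True


class FakeAsyncClient:
    """Hypothetical stand-in for a client used asynchronously."""

    def __init__(self) -> None:
        self.closed = False

    async def upload(self, path: str) -> str:
        return f"task-for-{path}"

    async def close(self) -> None:
        self.closed = True


@asynccontextmanager
async def aclosing_client(client):
    """Yield `client` and await its close() on exit, even if the body raises."""
    try:
        yield client
    finally:
        await client.close()


def sync_demo():
    client = FakeSyncClient()
    with closing(client) as c:
        result = c.upload("document.pdf")
    return result, client.closed


async def async_demo():
    client = FakeAsyncClient()
    async with aclosing_client(client) as c:
        result = await c.upload("document.pdf")
    return result, client.closed
```

With the real client, the sync form would be `with closing(Chunkr()) as chunkr:` and the async form `async with aclosing_client(Chunkr()) as chunkr:`.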
