Python client for Chunkr: open source document intelligence

Chunkr Python Client

This package provides a simple Python interface to the Chunkr API.

Getting Started

You can get an API key from Chunkr or deploy your own Chunkr instance. For self-hosted deployment options, check out our deployment guide.

For more information about the API and its capabilities, visit the Chunkr API docs.

Installation

pip install chunkr-ai
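
To confirm which release ended up in the current environment, the standard library can report the installed version. This is a minimal sketch: the helper name installed_version is made up for this example, and it relies only on importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "chunkr-ai") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None
```

Calling installed_version() returns the version string of the installed chunkr-ai distribution, or None when the package is not installed.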

Usage

The same Chunkr client can be used from both synchronous and asynchronous code: call its methods directly in synchronous code, and await them inside a coroutine.

Synchronous Usage

from chunkr_ai import Chunkr

# Initialize client
chunkr = Chunkr()

# Upload a file and wait for processing
task = chunkr.upload("document.pdf")
print(task.task_id)

# Create task without waiting
task = chunkr.create_task("document.pdf")
result = task.poll()  # Check status when needed

# Clean up when done
chunkr.close()

Asynchronous Usage

from chunkr_ai import Chunkr
import asyncio

async def process_document():
    # Initialize client
    chunkr = Chunkr()

    try:
        # Upload a file and wait for processing
        task = await chunkr.upload("document.pdf")
        print(task.task_id)

        # Create task without waiting
        task = await chunkr.create_task("document.pdf")
        result = await task.poll()  # Check status when needed
    finally:
        await chunkr.close()

# Run the async function
asyncio.run(process_document())

Concurrent Processing

The client supports both async concurrency and multiprocessing:

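import asyncio
from multiprocessing import Pool

from chunkr_ai import Chunkr

# Async concurrency: start several uploads and wait for all of them
async def process_multiple():
    chunkr = Chunkr()
    try:
        tasks = [
            chunkr.upload("doc1.pdf"),
            chunkr.upload("doc2.pdf"),
            chunkr.upload("doc3.pdf"),
        ]
        results = await asyncio.gather(*tasks)
        return results
    finally:
        await chunkr.close()

# Multiprocessing: each worker process creates and closes its own client
def process_file(path):
    chunkr = Chunkr()
    try:
        return chunkr.upload(path)
    finally:
        chunkr.close()

# The guard is needed on platforms that start workers by spawning
# (Windows, and macOS by default)
if __name__ == "__main__":
    with Pool(processes=3) as pool:
        results = pool.map(process_file, ["doc1.pdf", "doc2.pdf", "doc3.pdf"])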
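
For larger batches it can help to cap how many uploads run at once and to collect failures instead of stopping at the first one. The following is a sketch of that idea, not part of the client: the helper name upload_all and the limit parameter are made up for illustration, and client stands for any object whose upload(path) can be awaited, as in the async examples above.

```python
import asyncio
from typing import Any, Iterable, List


async def upload_all(client: Any, paths: Iterable[str], limit: int = 3) -> List[Any]:
    """Upload every path with at most `limit` uploads in flight at once.

    Results come back in the same order as `paths`. An upload that fails
    contributes its exception object to the list instead of raising.
    """
    semaphore = asyncio.Semaphore(limit)

    async def upload_one(path: str) -> Any:
        # The semaphore blocks here once `limit` uploads are already running
        async with semaphore:
            return await client.upload(path)

    return await asyncio.gather(
        *(upload_one(path) for path in paths),
        return_exceptions=True,
    )
```

Inside a coroutine this would be called as results = await upload_all(chunkr, ["doc1.pdf", "doc2.pdf", "doc3.pdf"], limit=2). Each entry in results should then be checked with isinstance(entry, Exception) before use.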

Input Types

The client supports various input types:

# File path
chunkr.upload("document.pdf")

# Opened file
with open("document.pdf", "rb") as f:
    chunkr.upload(f)

# PIL Image
from PIL import Image
img = Image.open("photo.jpg")
chunkr.upload(img)
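
When a document is already in memory (for example, bytes fetched from storage), one option is to wrap it in a file-like object instead of writing a temporary file. This is a sketch under an assumption: that the client treats an io.BytesIO like an opened binary file, which the list above does not confirm. The helper name as_binary_file and its name parameter are illustrative.

```python
import io


def as_binary_file(data: bytes, name: str = "document.pdf") -> io.BytesIO:
    """Wrap raw bytes in a seekable binary file-like object.

    The `name` attribute mimics what open() sets on real files; whether the
    client reads it is an assumption.
    """
    buffer = io.BytesIO(data)
    buffer.name = name
    return buffer
```

The result could then be passed in the same way as an opened file, for example chunkr.upload(as_binary_file(pdf_bytes)).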

Configuration

You can customize the processing behavior by passing a Configuration object:

from chunkr_ai.models import (
    Configuration,
    OcrStrategy,
    SegmentationStrategy,
)

config = Configuration(
    ocr_strategy=OcrStrategy.AUTO,
    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS,
    high_resolution=True,
    expires_in=3600,  # seconds
)

# The same call works in sync and async code; use one form or the other
task = chunkr.upload("document.pdf", config)  # sync
task = await chunkr.upload("document.pdf", config)  # async, inside a coroutine

Available Configuration Examples

Chunk Processing

from chunkr_ai.models import ChunkProcessing, Configuration

config = Configuration(
    chunk_processing=ChunkProcessing(target_length=1024)
)

Expires In

config = Configuration(expires_in=3600)
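
Since expires_in is given in seconds, longer lifetimes are easier to read when they are derived from a duration. This small sketch uses only the standard library; the helper name expires_in_seconds is made up for this example.

```python
from datetime import timedelta


def expires_in_seconds(**duration: float) -> int:
    """Convert timedelta keyword arguments (hours=..., days=...) to whole seconds."""
    return int(timedelta(**duration).total_seconds())
```

For instance, expires_in_seconds(hours=1) evaluates to 3600, the value used above, so a one-day expiry could be written as Configuration(expires_in=expires_in_seconds(days=1)).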

High Resolution

config = Configuration(high_resolution=True)

JSON Schema

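# JsonSchema and Property are assumed to be exported from chunkr_ai.models
# alongside Configuration
from chunkr_ai.models import Configuration, JsonSchema, Property

config = Configuration(json_schema=JsonSchema(
    title="Sales Data",
    properties=[
        Property(name="Person with highest sales", prop_type="string", description="The person with the highest sales"),
        Property(name="Person with lowest sales", prop_type="string", description="The person with the lowest sales"),
    ]
))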
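
When a schema has many properties of the same type, the repetition can be reduced by generating the arguments from a mapping of names to descriptions. This is a sketch only: the helper name property_kwargs is hypothetical, and it returns plain dictionaries whose keys (name, prop_type, description) mirror the Property arguments shown above.

```python
from typing import Dict, List


def property_kwargs(fields: Dict[str, str], prop_type: str = "string") -> List[Dict[str, str]]:
    """Build one keyword-argument dict per schema property.

    `fields` maps each property name to its description; every property
    gets the same `prop_type`.
    """
    return [
        {"name": name, "prop_type": prop_type, "description": description}
        for name, description in fields.items()
    ]
```

The dictionaries can then be expanded into the model, for example properties=[Property(**kw) for kw in property_kwargs(fields)].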

OCR Strategy

config = Configuration(ocr_strategy=OcrStrategy.AUTO)

Segment Processing

from chunkr_ai.models import SegmentProcessing, GenerationConfig, GenerationStrategy
config = Configuration(
    segment_processing=SegmentProcessing(
        page=GenerationConfig(
            html=GenerationStrategy.LLM,
            markdown=GenerationStrategy.LLM
        )
    )
)

Segmentation Strategy

config = Configuration(
    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS  # or SegmentationStrategy.PAGE
)

Environment Setup

You can provide your API key and URL in any of the following ways:

1. Environment variables: CHUNKR_API_KEY and CHUNKR_URL
2. A .env file
3. Direct initialization:

chunkr = Chunkr(
    api_key="your-api-key",
    url="https://api.chunkr.ai"
)
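
If a script should fail early with a clear message when the key is missing, the environment lookup can also be done explicitly before the client is created. This sketch uses only the variable names and constructor arguments shown above; the helper name client_settings is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def client_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect keyword arguments for Chunkr(...) from environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("CHUNKR_API_KEY")
    if not api_key:
        raise RuntimeError("CHUNKR_API_KEY is not set")
    settings = {"api_key": api_key}
    url = env.get("CHUNKR_URL")
    if url:
        # Only pass a URL when one is configured, e.g. for a self-hosted instance
        settings["url"] = url
    return settings
```

The client would then be created with chunkr = Chunkr(**client_settings()).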

Resource Management

Close the client when you are done with it, using try/finally so that it is closed even if an upload raises:

# Sync context
chunkr = Chunkr()
try:
    result = chunkr.upload("document.pdf")
finally:
    chunkr.close()

# Async context
async def process():
    chunkr = Chunkr()
    try:
        result = await chunkr.upload("document.pdf")
    finally:
        await chunkr.close()
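
The try/finally pattern above can be wrapped once and reused. The following sketch builds on the standard contextlib module; the helper names open_client and open_async_client are made up for this example, and the text above does not say whether the client supports the with statement on its own.

```python
from contextlib import asynccontextmanager, contextmanager
from typing import Any, AsyncIterator, Callable, Iterator


@contextmanager
def open_client(factory: Callable[[], Any]) -> Iterator[Any]:
    """Create a client with `factory` and always call its close() on exit."""
    client = factory()
    try:
        yield client
    finally:
        client.close()


@asynccontextmanager
async def open_async_client(factory: Callable[[], Any]) -> AsyncIterator[Any]:
    """Async variant: awaits close(), as in the async example above."""
    client = factory()
    try:
        yield client
    finally:
        await client.close()
```

In synchronous code this reads as: with open_client(Chunkr) as chunkr: result = chunkr.upload("document.pdf"). In a coroutine the equivalent is async with open_async_client(Chunkr) as chunkr, followed by result = await chunkr.upload("document.pdf").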

Project details

This version

0.0.28

Jan 28, 2025

Download files

Source Distribution

chunkr_ai-0.0.28.tar.gz (14.8 kB view details)

Uploaded Jan 28, 2025 Source

Built Distribution

chunkr_ai-0.0.28-py3-none-any.whl (14.8 kB view details)

Uploaded Jan 28, 2025 Python 3

File details

Details for the file chunkr_ai-0.0.28.tar.gz.

File metadata

Download URL: chunkr_ai-0.0.28.tar.gz
Upload date: Jan 28, 2025
Size: 14.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.4.29

File hashes

Hashes for chunkr_ai-0.0.28.tar.gz
Algorithm	Hash digest
SHA256	`36c325ee4681be78588ad6e6df7bf9f4a30e1d12d6ee667032ff5afe95944e3c`
MD5	`93d93e25c392b157d0223675c3d7889c`
BLAKE2b-256	`fccfc48535a9a6eb3a465dc5a3c36c525e043acb655ccc6c4100797def5d5e1c`

File details

Details for the file chunkr_ai-0.0.28-py3-none-any.whl.

File metadata

Download URL: chunkr_ai-0.0.28-py3-none-any.whl
Upload date: Jan 28, 2025
Size: 14.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.4.29

File hashes

Hashes for chunkr_ai-0.0.28-py3-none-any.whl
Algorithm	Hash digest
SHA256	`54402f3689a26adb3f9c5be88c950749636f248b1b5f19bd392e75916be3f8da`
MD5	`a809ea195f6b2a7bab9069d7820f2e9a`
BLAKE2b-256	`aa28a4572be58c49abc312dae8201e2c1d116e8c97c4d25d0e36bb18e63a5102`

chunkr-ai 0.0.28
