A Python SDK for interacting with the Docling Serve API using Pydantic models

Docling Serve SDK

A Python SDK for the Docling Serve API, built on Pydantic models. It provides type-safe synchronous and asynchronous clients for document conversion, chunking, and processing.

Author: Alberto Ferrer
Email: albertof@barrahome.org
Repository: https://github.com/bet0x/docling-serve-sdk
PyPI: https://pypi.org/project/docling-serve-sdk/

Features

✅ Document Conversion: PDF, DOCX, PPTX, HTML, images, and more
✅ Multiple Source Types: Local files, HTTP URLs, S3 storage
✅ Flexible Output: In-body, ZIP, S3, PUT targets
✅ OCR Processing: Multiple engines (EasyOCR, Tesseract, etc.)
✅ Table Extraction: Structure analysis and cell matching
✅ Image Handling: Processing, scaling, and embedding
✅ Chunking: Hierarchical and hybrid document chunking
✅ Async/Sync Support: Both synchronous and asynchronous operations
✅ Type Safety: Full Pydantic model validation
✅ Error Handling: Comprehensive exception handling
✅ Production Ready: Connection pooling, retries, timeouts

Installation

From PyPI (Recommended)

pip install docling-serve-sdk

From Source

# Clone the repository
git clone https://github.com/bet0x/docling-serve-sdk.git
cd docling-serve-sdk

# Install with pip
pip install -e .

# Or with uv
uv pip install -e .

Requirements

Python 3.8+
httpx >= 0.24.0
pydantic >= 2.0.0

Quick Start

from docling_serve_sdk import DoclingClient

# Create client
client = DoclingClient(base_url="http://localhost:5001")

# Check health
health = client.health_check()
print(f"Status: {health.status}")

# Convert document
result = client.convert_file("document.pdf")
print(f"Content: {result.document['md_content']}")
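
When the server is started alongside the client (for example in a container), the health check may fail for the first few seconds. The following is a minimal sketch of a readiness loop, not part of the SDK: `wait_until_healthy` and `fake_probe` are hypothetical names, and the probe is a plain callable so the example runs without a server.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Exceptions raised by `check` are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # service not reachable yet; keep polling
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in probe for illustration only: fails twice, then succeeds.
attempts = {"count": 0}


def fake_probe() -> bool:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("server not up yet")
    return True


print(wait_until_healthy(fake_probe, timeout=5.0, interval=0.01))
```

With the real client, the probe would wrap `client.health_check()` from the Quick Start. Which `status` value indicates readiness is not stated here, so check it against your server before relying on it.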

Documentation

📖 Complete Usage Guide - Comprehensive examples and advanced usage patterns

Examples

Basic Conversion

from docling_serve_sdk import DoclingClient

client = DoclingClient(base_url="http://localhost:5001")
result = client.convert_file("document.pdf")

print(f"Status: {result.status}")
print(f"Processing time: {result.processing_time:.2f}s")
print(f"Content: {result.document['md_content']}")

Advanced Usage with Multiple Sources

import base64

from docling_serve_sdk import (
    DoclingClient, ConvertDocumentsRequest, ConvertDocumentsRequestOptions,
    FileSourceRequest, HttpSourceRequest, ZipTarget,
    InputFormat, OutputFormat
)

# Read the local file and base64-encode it for FileSourceRequest
with open("document.pdf", "rb") as f:
    content = base64.b64encode(f.read()).decode('utf-8')

file_source = FileSourceRequest(base64_string=content, filename="document.pdf")
http_source = HttpSourceRequest(url="https://example.com/doc.pdf")

# Create request with multiple sources
request = ConvertDocumentsRequest(
    sources=[file_source, http_source],
    options=ConvertDocumentsRequestOptions(
        from_formats=[InputFormat.PDF, InputFormat.DOCX],
        to_formats=[OutputFormat.MD, OutputFormat.HTML],
        do_ocr=True,
        include_images=True
    ),
    target=ZipTarget()
)
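
The file source above is built from a base64 string and a filename. When several local files are involved, that step can be factored out. The sketch below uses only the standard library; `encode_file_source` and `encode_directory` are hypothetical helpers, and the dictionary keys simply mirror the `base64_string` and `filename` arguments shown above.

```python
import base64
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def encode_file_source(path: Path) -> Dict[str, str]:
    """Return the two fields that FileSourceRequest is given above."""
    data = path.read_bytes()
    return {
        "base64_string": base64.b64encode(data).decode("ascii"),
        "filename": path.name,
    }


def encode_directory(
    directory: Path,
    suffixes: Iterable[str] = (".pdf", ".docx"),
) -> List[Dict[str, str]]:
    """Encode every file in `directory` whose suffix is in `suffixes`."""
    wanted = {s.lower() for s in suffixes}
    return [
        encode_file_source(p)
        for p in sorted(directory.iterdir())
        if p.is_file() and p.suffix.lower() in wanted
    ]


# Demo on a throwaway directory so the example has no external dependencies.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.pdf").write_bytes(b"%PDF-1.4 demo")
    (root / "notes.txt").write_bytes(b"skipped")
    payloads = encode_directory(root)

print([p["filename"] for p in payloads])
```

Each dictionary could then be unpacked into the model, for example `FileSourceRequest(**payload)`, since the keys match the keyword arguments used in the example above.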

Async Usage

import asyncio
from docling_serve_sdk import DoclingClient

async def convert_document():
    client = DoclingClient(base_url="http://localhost:5001")
    
    # Check health
    health = await client.health_check_async()
    print(f"Status: {health.status}")
    
    # Convert document
    result = await client.convert_file_async("document.pdf")
    print(f"Content: {result.document['md_content']}")

# Run async function
asyncio.run(convert_document())
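
The async methods become more useful when several documents are converted at once. Below is a minimal sketch of bounded concurrency with `asyncio.gather` and a semaphore. It is not part of the SDK: `convert_many` is a hypothetical helper, and `fake_convert` stands in for `client.convert_file_async` so the example runs without a server.

```python
import asyncio
from typing import Awaitable, Callable, List


async def convert_many(
    paths: List[str],
    convert: Callable[[str], Awaitable[str]],
    max_concurrency: int = 4,
) -> List[str]:
    """Run `convert` over `paths` with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(path: str) -> str:
        async with semaphore:
            return await convert(path)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(p) for p in paths))


async def fake_convert(path: str) -> str:
    """Stand-in for client.convert_file_async; no network involved."""
    await asyncio.sleep(0.01)
    return f"converted:{path}"


results = asyncio.run(
    convert_many(["a.pdf", "b.pdf", "c.pdf"], fake_convert, max_concurrency=2)
)
print(results)
```

Capping concurrency keeps a batch from flooding the server with simultaneous conversions; a suitable limit depends on how the Docling Serve instance is deployed.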

Error Handling

from docling_serve_sdk import DoclingClient, DoclingError, DoclingAPIError

client = DoclingClient(base_url="http://localhost:5001")

try:
    result = client.convert_file("document.pdf")
    print(f"Success: {result.status}")
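except DoclingAPIError as e:
    # Handle the more specific API error first; if it subclasses DoclingError,
    # a broader handler placed above it would catch it instead.
    print(f"API error: {e}")
    print(f"Status code: {e.status_code}")
except DoclingError as e:
    print(f"Docling error: {e}")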
except Exception as e:
    print(f"Unexpected error: {e}")
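
Some failures are transient (a busy server, a dropped connection) and are worth retrying before giving up. The following is a generic sketch of retrying with exponential backoff. It does not describe the SDK's built-in retry behaviour: `with_retries` and `TransientError` are hypothetical, and in real use `retry_on` would be set to whichever SDK exceptions you consider retryable.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for an error worth retrying."""


def with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TransientError,),
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Call `call`, retrying on `retry_on` with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # delays grow as base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Demo: a callable that fails twice and then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("temporary failure")
    return "success"


print(with_retries(flaky, base_delay=0.01))
```

A conversion would be wrapped as `with_retries(lambda: client.convert_file("document.pdf"), retry_on=(...))`, with the exception tuple chosen by the caller.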

Configuration Options

ConvertDocumentsRequestOptions

Option	Type	Default	Description
`from_formats`	List[InputFormat]	All formats	Input formats to accept
`to_formats`	List[OutputFormat]	`[MD]`	Output formats to generate
`image_export_mode`	ImageRefMode	`EMBEDDED`	How to handle images
`do_ocr`	bool	`True`	Enable OCR processing
`force_ocr`	bool	`False`	Force OCR over existing text
`ocr_engine`	OCREngine	`EASYOCR`	OCR engine to use
`pdf_backend`	PdfBackend	`DLPARSE_V4`	PDF processing backend
`table_mode`	TableMode	`ACCURATE`	Table processing mode
`include_images`	bool	`True`	Include images in output
`images_scale`	float	`2.0`	Image scale factor
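
To see what a request ends up with when only a few options are overridden, the defaults in the table can be written out as a plain dictionary. This is an illustrative sketch, not the SDK's Pydantic model: enum members are represented by name as strings, `None` stands in for "All formats", and `effective_options` is a hypothetical helper.

```python
from typing import Any, Dict

# Defaults copied from the table above; enum members are shown by name only.
DEFAULT_OPTIONS: Dict[str, Any] = {
    "from_formats": None,  # the table says "All formats"; None stands in for that
    "to_formats": ["MD"],
    "image_export_mode": "EMBEDDED",
    "do_ocr": True,
    "force_ocr": False,
    "ocr_engine": "EASYOCR",
    "pdf_backend": "DLPARSE_V4",
    "table_mode": "ACCURATE",
    "include_images": True,
    "images_scale": 2.0,
}


def effective_options(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides onto the defaults, rejecting unknown option names."""
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    return {**DEFAULT_OPTIONS, **overrides}


opts = effective_options(do_ocr=False, to_formats=["MD", "HTML"])
print(opts["do_ocr"], opts["to_formats"], opts["ocr_engine"])
```

In the SDK itself the same effect comes from passing only the fields you want to change to `ConvertDocumentsRequestOptions`, which validates names and types through Pydantic.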

Supported Formats

Input Formats:

PDF, DOCX, PPTX, HTML, MD, CSV, XLSX
Images (PNG, JPG, etc.)
XML (USPTO, JATS)
Audio files

Output Formats:

Markdown (MD)
HTML
JSON
Text
DocTags
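
Before submitting a batch, it can help to skip files the service is unlikely to accept. The sketch below guesses an input format from the file extension. The mapping is an assumption built from the list above (the `IMAGE` label and the choice of extensions are illustrative, and XML and audio are left out because their extensions are ambiguous); `guess_input_format` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

# Assumed extension-to-format mapping, based on the input formats listed above.
EXTENSION_TO_FORMAT = {
    ".pdf": "PDF",
    ".docx": "DOCX",
    ".pptx": "PPTX",
    ".html": "HTML",
    ".md": "MD",
    ".csv": "CSV",
    ".xlsx": "XLSX",
    ".png": "IMAGE",
    ".jpg": "IMAGE",
}


def guess_input_format(filename: str) -> Optional[str]:
    """Return a format name for `filename`, or None if it is not recognised."""
    return EXTENSION_TO_FORMAT.get(Path(filename).suffix.lower())


for name in ("report.PDF", "slides.pptx", "archive.tar.gz"):
    print(name, "->", guess_input_format(name))
```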

API Reference

Core Classes

DoclingClient: Main client for API interactions
ConvertDocumentsRequest: Request model for document conversion
ConvertDocumentsRequestOptions: Configuration options
ConvertDocumentResponse: Response model for conversions

Source Types

FileSourceRequest: Local files (base64 encoded)
HttpSourceRequest: HTTP/HTTPS URLs
S3SourceRequest: S3-compatible storage

Target Types

InBodyTarget: Return in response body (default)
ZipTarget: Return as ZIP file
S3Target: Upload to S3
PutTarget: Upload via PUT request

Chunking

HierarchicalChunkerOptions: Hierarchical document chunking
HybridChunkerOptions: Hybrid semantic chunking
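
For intuition about what hierarchical chunking produces, the sketch below splits Markdown text (such as `md_content` from a conversion) into chunks that each carry their heading trail. This is a simplified illustration only, not Docling's chunking algorithm and not what `HierarchicalChunkerOptions` configures; `chunk_markdown` is a hypothetical function, and it does not handle `#` lines inside fenced code.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_markdown(md: str) -> List[Tuple[List[str], str]]:
    """Split Markdown into (heading_trail, body_text) chunks."""
    chunks: List[Tuple[List[str], str]] = []
    trail: List[str] = []  # current heading trail, e.g. ["Guide", "Install"]
    body: List[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append((list(trail), text))
        body.clear()

    for line in md.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # drop headings at the same or deeper level, then add this one
            del trail[level - 1:]
            trail.append(match.group(2))
        else:
            body.append(line)
    flush()
    return chunks


sample = "# Guide\nIntro text.\n## Install\nRun pip.\n## Use\nCall the client.\n"
for trail, text in chunk_markdown(sample):
    print(" > ".join(trail), "|", text)
```

Keeping the heading trail with each chunk preserves context when chunks are later embedded or retrieved independently.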

Testing

# Run basic tests
uv run python test_sdk.py

# Run new features tests
uv run python test_new_features.py

# Run integration tests
uv run python test_client_integration.py

# Or with pytest
pytest test_*.py

License

MIT License

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

Support

For issues and questions:

📖 Documentation: Complete Usage Guide
🐛 Issues: GitHub Issues
📦 PyPI: docling-serve-sdk
🔗 Docling Serve: Official Documentation

Changelog

v1.1.0

✅ Added complete API model coverage
✅ Added multiple source types (File, HTTP, S3)
✅ Added multiple target types (InBody, ZIP, S3, PUT)
✅ Added chunking options (Hierarchical, Hybrid)
✅ Added comprehensive error handling
✅ Added async/sync support
✅ Added complete documentation

v1.0.0

✅ Initial release
✅ Basic document conversion
✅ Health check functionality
✅ Custom options support

Release history

1.3.0 (this version): Sep 30, 2025
1.2.1: Sep 28, 2025
1.2.0: Sep 27, 2025
1.1.0: Sep 27, 2025
1.0.0: Sep 27, 2025
