A Python SDK for PDF Craft API
PDF Craft SDK
A Python SDK for interacting with the PDF Craft API. It simplifies the process of converting PDFs to Markdown or EPUB by handling authentication, file upload, task submission, and result polling.
Features
- 🚀 Easy PDF Conversion: Convert PDFs to Markdown or EPUB format
- 📤 Local File Upload: Upload and convert local PDF files with progress tracking
- 🔄 Automatic Retry: Built-in retry mechanism for robust operations
- ⏱️ Flexible Polling: Configurable polling strategies for task completion
- 📊 Progress Tracking: Monitor upload progress with callbacks
- 🔧 Type Safe: Full type hints support
Installation
You can install the package from PyPI:
pip install pdf-craft-sdk
Quick Start
Converting Local PDF Files
The easiest way to convert a local PDF file:
from pdf_craft_sdk import PDFCraftClient
# Initialize the client
client = PDFCraftClient(api_key="YOUR_API_KEY")
# Upload and convert a local PDF file
download_url = client.convert_local_pdf("document.pdf")
print(f"Conversion successful! Download URL: {download_url}")
💡 See examples.py for 10 complete usage examples covering all features!
Converting Remote PDF Files
If you already have a PDF URL from the upload API:
from pdf_craft_sdk import PDFCraftClient, FormatType
client = PDFCraftClient(api_key="YOUR_API_KEY")
# Convert a PDF to Markdown and wait for the result
try:
    pdf_url = "https://oomol-file-cache.example.com/your-file.pdf"
    download_url = client.convert(pdf_url, format_type=FormatType.MARKDOWN)
    print(f"Conversion successful! Download URL: {download_url}")
except Exception as e:
    print(f"An error occurred: {e}")
Advanced Usage
Usage Examples
Upload with Progress Tracking
Monitor the upload progress of large files:
from pdf_craft_sdk import PDFCraftClient, UploadProgress
def on_progress(progress: UploadProgress):
    print(f"Upload progress: {progress.percentage:.2f}% "
          f"({progress.current_part}/{progress.total_parts} parts)")
client = PDFCraftClient(api_key="YOUR_API_KEY")
# Upload and convert with progress tracking
download_url = client.convert_local_pdf(
    "large_document.pdf",
    progress_callback=on_progress
)
Convert to EPUB Format
from pdf_craft_sdk import PDFCraftClient, FormatType
client = PDFCraftClient(api_key="YOUR_API_KEY")
# Convert to EPUB with footnotes
download_url = client.convert_local_pdf(
    "document.pdf",
    format_type=FormatType.EPUB,
    includes_footnotes=True
)
Manual Upload and Conversion
If you prefer to handle the steps manually or asynchronously:
from pdf_craft_sdk import PDFCraftClient, FormatType
client = PDFCraftClient(api_key="YOUR_API_KEY")
# Step 1: Upload local file
cache_url = client.upload_file("document.pdf")
print(f"Uploaded to: {cache_url}")
# Step 2: Submit conversion task
task_id = client.submit_conversion(cache_url, format_type=FormatType.MARKDOWN)
print(f"Task ID: {task_id}")
# Step 3: Wait for completion
download_url = client.wait_for_completion(task_id)
print(f"Download URL: {download_url}")
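Every workflow above ends with a download URL, so a common next step is saving the result to disk. The following is a minimal sketch using only the standard library. The helper name `save_result` is hypothetical and not part of the SDK, and it assumes the download URL can be fetched with a plain, unauthenticated GET request, which the documentation above does not confirm.

```python
import shutil
import urllib.request
from pathlib import Path


def save_result(download_url: str, destination: str) -> Path:
    """Stream the content at `download_url` into `destination`.

    Hypothetical helper, not part of pdf_craft_sdk. Assumes the URL
    is readable without extra headers or authentication.
    """
    target = Path(destination)
    with urllib.request.urlopen(download_url) as response, target.open("wb") as out:
        # Copy in chunks so large results are never held fully in memory.
        shutil.copyfileobj(response, out)
    return target
```

After `download_url = client.wait_for_completion(task_id)`, the call would look like `save_result(download_url, "output_path")`, with a destination path of your choice.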
Configuration
Polling Strategies
The convert and wait_for_completion methods accept optional configuration for polling behavior:
- max_wait_ms: Maximum time (in milliseconds) to wait for the conversion. Default is 7200000 (2 hours).
- check_interval_ms: Initial polling interval (in milliseconds). Default is 1000 (1 second).
- max_check_interval_ms: Maximum polling interval (in milliseconds). Default is 5000 (5 seconds).
- backoff_factor: Multiplier for increasing the interval after each check, or a PollingStrategy enum value. Default is PollingStrategy.EXPONENTIAL (1.5).
Available polling strategies:
- PollingStrategy.EXPONENTIAL (1.5): Default. Starts fast, slows down.
- PollingStrategy.FIXED (1.0): Polls at a fixed interval.
- PollingStrategy.AGGRESSIVE (2.0): Doubles the interval each time.
from pdf_craft_sdk import PDFCraftClient, PollingStrategy
client = PDFCraftClient(api_key="YOUR_API_KEY")
# Example: Stable Polling (Every 3 seconds)
download_url = client.convert(
    pdf_url="https://oomol-file-cache.example.com/your-file.pdf",
    check_interval_ms=3000,
    max_check_interval_ms=3000,
    backoff_factor=PollingStrategy.FIXED
)
# Example: Long Running Task (Start slow, check infrequently)
download_url = client.convert(
    pdf_url="https://oomol-file-cache.example.com/your-file.pdf",
    check_interval_ms=5000,
    max_check_interval_ms=60000,  # 1 minute
    backoff_factor=PollingStrategy.AGGRESSIVE
)
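To get a feel for how these parameters interact, the sketch below computes the sequence of waits a given configuration would produce. It does not call the SDK. The function `polling_schedule` is hypothetical, and the rule it applies (multiply the interval by the backoff factor after each check, capped at max_check_interval_ms) is an assumption inferred from the parameter descriptions above, not a confirmed description of the SDK's internals.

```python
from typing import List


def polling_schedule(
    check_interval_ms: int = 1000,
    max_check_interval_ms: int = 5000,
    backoff_factor: float = 1.5,
    max_wait_ms: int = 7_200_000,
    limit: int = 8,
) -> List[int]:
    """Return up to `limit` waits (ms) under an assumed capped-backoff rule."""
    waits: List[int] = []
    interval = float(check_interval_ms)
    elapsed = 0.0
    while len(waits) < limit and elapsed < max_wait_ms:
        # Never wait longer than the configured maximum interval.
        wait = min(interval, float(max_check_interval_ms))
        waits.append(int(wait))
        elapsed += wait
        # Assumed rule: grow the interval by the factor after every check.
        interval = wait * backoff_factor
    return waits


print(polling_schedule())                   # defaults, factor 1.5
print(polling_schedule(3000, 3000, 1.0))    # fixed, every 3 seconds
print(polling_schedule(5000, 60000, 2.0))   # doubling, capped at 1 minute
```

Under this assumed rule the defaults reach the 5-second cap by the fifth check, so most of a long conversion is polled at the maximum interval.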
API Reference
PDFCraftClient
Constructor
PDFCraftClient(api_key, base_url=None, upload_base_url=None)
Initialize the PDF Craft client.
Parameters:
- api_key (str): Your API key
- base_url (str, optional): Custom API base URL
- upload_base_url (str, optional): Custom upload API base URL
Methods
convert_local_pdf(file_path, **kwargs)
Upload and convert a local PDF file in one step.
Parameters:
- file_path (str): Path to the local PDF file
- format_type (str | FormatType): Output format, "markdown" or "epub" (default: "markdown")
- model (str): Model to use (default: "gundam")
- includes_footnotes (bool): Include footnotes (default: False)
- ignore_pdf_errors (bool): Ignore PDF parsing errors (default: True)
- ignore_ocr_errors (bool): Ignore OCR errors (default: True)
- wait (bool): Wait for completion (default: True)
- max_wait_ms (int): Max wait time in milliseconds (default: 7200000)
- check_interval_ms (int): Initial polling interval in milliseconds (default: 1000)
- max_check_interval_ms (int): Max polling interval in milliseconds (default: 5000)
- backoff_factor (float | PollingStrategy): Polling backoff factor (default: PollingStrategy.EXPONENTIAL)
- progress_callback (callable): Upload progress callback function
- upload_max_retries (int): Max upload retries per part (default: 3)
Returns: Download URL (str) if wait=True, else task ID (str)
upload_file(file_path, progress_callback=None, max_retries=3)
Upload a local PDF file to cloud cache.
Parameters:
- file_path (str): Path to the local PDF file
- progress_callback (callable): Progress callback function
- max_retries (int): Max retries per upload part (default: 3)
Returns: Cache URL (str)
convert(pdf_url, **kwargs)
Convert a PDF from URL.
Parameters:
- pdf_url (str): PDF URL to convert (HTTPS URL from upload API)
- format_type (str | FormatType): Output format (default: "markdown")
- Other parameters same as convert_local_pdf
Returns: Download URL (str)
submit_conversion(pdf_url, **kwargs)
Submit a conversion task without waiting.
Parameters:
- pdf_url (str): PDF URL to convert
- format_type (str | FormatType): Output format
- model (str): Model to use
- includes_footnotes (bool): Include footnotes
- ignore_pdf_errors (bool): Ignore PDF parsing errors
- ignore_ocr_errors (bool): Ignore OCR errors
Returns: Task ID (str)
wait_for_completion(task_id, **kwargs)
Wait for a conversion task to complete.
Parameters:
- task_id (str): Task ID from submit_conversion
- Polling parameters same as convert_local_pdf
Returns: Download URL (str)
UploadProgress
Progress information for file uploads.
Attributes:
- uploaded_bytes (int): Bytes uploaded so far
- total_bytes (int): Total bytes to upload
- current_part (int): Current part number being uploaded
- total_parts (int): Total number of parts
- percentage (float): Progress percentage (0-100)
Example:
def on_progress(progress):
    print(f"{progress.percentage:.1f}% - Part {progress.current_part}/{progress.total_parts}")
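A callback can also render these attributes as a text progress bar. The sketch below is illustrative: `FakeProgress` is a hypothetical stand-in that only mirrors the attributes listed above so the example runs without the SDK or a network connection, and `format_progress` is a helper name invented here. How the real UploadProgress computes percentage is not documented above; the ratio of uploaded to total bytes is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FakeProgress:
    """Hypothetical stand-in with the attributes documented for UploadProgress."""
    uploaded_bytes: int
    total_bytes: int
    current_part: int
    total_parts: int

    @property
    def percentage(self) -> float:
        # Assumption: percentage is uploaded bytes over total bytes.
        if self.total_bytes == 0:
            return 0.0
        return 100.0 * self.uploaded_bytes / self.total_bytes


def format_progress(progress, width: int = 20) -> str:
    """Render a fixed-width bar from any object with the documented attributes."""
    filled = int(width * progress.percentage / 100)
    bar = "#" * filled + "-" * (width - filled)
    return (f"[{bar}] {progress.percentage:5.1f}% "
            f"(part {progress.current_part}/{progress.total_parts})")


def on_progress(progress) -> None:
    print(format_progress(progress))


# 5 MiB of 20 MiB uploaded, first of four parts.
on_progress(FakeProgress(5_242_880, 20_971_520, current_part=1, total_parts=4))
```

Because `format_progress` only reads attributes, the same `on_progress` function could be passed as `progress_callback` to `convert_local_pdf` or `upload_file`.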
Error Handling
The SDK raises the following exceptions:
- FileNotFoundError: When the specified file doesn't exist
- APIError: When API requests fail
- TimeoutError: When conversion exceeds max wait time
Example:
from pdf_craft_sdk import PDFCraftClient
from pdf_craft_sdk.exceptions import APIError
client = PDFCraftClient(api_key="YOUR_API_KEY")
try:
    download_url = client.convert_local_pdf("document.pdf")
    print(f"Success: {download_url}")
except FileNotFoundError:
    print("File not found!")
except APIError as e:
    print(f"API error: {e}")
except TimeoutError:
    print("Conversion timed out")
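When a conversion times out, one option is to keep waiting on the same task rather than starting over. The sketch below shows a small retry wrapper for that. `retry_on_timeout` is a hypothetical helper, not part of the SDK, and `flaky_wait` is a stand-in that simulates two timeouts so the example runs offline. It relies on the timeout being the built-in TimeoutError, as the example above suggests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_timeout(operation: Callable[[], T], attempts: int = 3,
                     delay_s: float = 0.0) -> T:
    """Call `operation` until it returns or `attempts` TimeoutErrors have occurred."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the timeout
            time.sleep(delay_s)
    raise AssertionError("unreachable")


# Stand-in operation that times out twice before succeeding.
calls = {"count": 0}


def flaky_wait() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("still converting")
    return "https://example.com/result"


print(retry_on_timeout(flaky_wait))
```

With the real client, the operation could be `lambda: client.wait_for_completion(task_id)`, which waits again on an already submitted task instead of uploading the file a second time.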
Advanced Features
Custom Upload Endpoint
If you need to use a custom upload API endpoint:
client = PDFCraftClient(
    api_key="YOUR_API_KEY",
    upload_base_url="https://custom.example.com/upload"
)
Default upload endpoint: https://llm.oomol.com/api/tasks/files/remote-cache
Batch Processing
Process multiple files:
from pdf_craft_sdk import PDFCraftClient
client = PDFCraftClient(api_key="YOUR_API_KEY")
pdf_files = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
for pdf_file in pdf_files:
    try:
        print(f"Processing {pdf_file}...")
        # With wait=False the call returns a task ID, not a download URL
        task_id = client.convert_local_pdf(pdf_file, wait=False)
        print(f"Task submitted: {task_id}")
    except Exception as e:
        print(f"Error processing {pdf_file}: {e}")
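Since `convert_local_pdf(..., wait=False)` returns a task ID, a batch job usually needs a second pass to collect the download URLs. The sketch below shows one way to structure that using the documented `wait_for_completion` method. `convert_batch` is a hypothetical helper, and `FakeClient` is a stand-in that imitates the two documented methods so the example runs without an API key.

```python
from typing import Dict, Iterable


def convert_batch(client, pdf_files: Iterable[str]) -> Dict[str, str]:
    """Submit every file first, then wait for each task; return file -> download URL."""
    task_ids: Dict[str, str] = {}
    for pdf_file in pdf_files:
        try:
            # Phase 1: upload and submit without blocking on the conversion.
            task_ids[pdf_file] = client.convert_local_pdf(pdf_file, wait=False)
        except Exception as e:
            print(f"Error submitting {pdf_file}: {e}")

    results: Dict[str, str] = {}
    for pdf_file, task_id in task_ids.items():
        try:
            # Phase 2: collect results; conversions have been running in the meantime.
            results[pdf_file] = client.wait_for_completion(task_id)
        except Exception as e:
            print(f"Error converting {pdf_file}: {e}")
    return results


class FakeClient:
    """Hypothetical stand-in so the sketch runs without network access."""

    def convert_local_pdf(self, file_path: str, wait: bool = True) -> str:
        if file_path == "missing.pdf":
            raise FileNotFoundError(file_path)
        return f"task-{file_path}"

    def wait_for_completion(self, task_id: str) -> str:
        return f"https://example.com/{task_id}"


print(convert_batch(FakeClient(), ["doc1.pdf", "missing.pdf", "doc2.pdf"]))
```

Submitting everything before waiting lets the conversions overlap, while a failure on one file does not stop the rest of the batch.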
License
This project is licensed under the MIT License.
Support
For issues, questions, or contributions, please visit our GitHub repository.
Complete Examples
See examples.py for complete, runnable examples including:
- ✅ Basic local PDF conversion
- 📊 Upload with progress tracking
- 📖 EPUB format conversion
- 🔧 Manual step-by-step upload and conversion
- 🌐 Remote PDF conversion
- ⚙️ Custom polling strategies
- 🛡️ Proper error handling
- 📦 Batch processing multiple files
- 🔌 Custom upload endpoint
- ⏱️ Async workflow (submit now, check later)
Run examples:
# Get your API key from https://console.oomol.com/api-key
# Then edit examples.py and replace 'your_api_key_here' with your actual API key
# Run examples
python examples.py
# Choose a specific example (1-10) or 'all' to run all examples
Changelog
Version 0.4.0
- ✨ Added local file upload functionality
- ✨ Added convert_local_pdf() convenience method
- ✨ Added upload progress tracking with callbacks
- 🐛 Fixed null uploaded_parts handling in upload response
- 📝 Improved documentation and examples
Version 0.3.0
- Initial public release
- Basic PDF to Markdown/EPUB conversion
- Configurable polling strategies