pdftables-io
Official Python SDK for the pdftables.io API — extract tables from PDFs programmatically.
pip install pdftables-io
Quick Start
from pdftables import PDFTablesClient
client = PDFTablesClient(api_key="your-api-key")
# 1. Upload a PDF
upload = client.upload("invoice.pdf")
# 2. Start table extraction
job = client.create_job(upload.upload_id)
# 3. Wait for completion
job = client.wait_for_job(job.id)
# 4. Download results as CSV
csv_zip = client.download_job_csv(job.id)
with open("tables.zip", "wb") as f:
    f.write(csv_zip)
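Once the ZIP is downloaded, the tables can be read without unpacking to disk. The helper below is a minimal sketch, not part of the SDK: `read_csv_tables` is a hypothetical name, and it assumes the archive holds one UTF-8 encoded `.csv` member per table, which the documentation above does not state.

```python
import csv
import io
import zipfile


def read_csv_tables(zip_bytes: bytes) -> dict[str, list[list[str]]]:
    """Map each CSV member name in the archive to its parsed rows."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV file
            # Assumption: members are UTF-8 text ("utf-8-sig" also strips a BOM).
            text = archive.read(name).decode("utf-8-sig")
            tables[name] = list(csv.reader(io.StringIO(text, newline="")))
    return tables
```

With the Quick Start variables in scope, `read_csv_tables(csv_zip)` would return a dictionary keyed by member name, with each value a list of rows.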
Authentication
Pass your API key directly or set the PDFTABLES_API_KEY environment variable:
# Explicit
client = PDFTablesClient(api_key="sk_live_...")
# Via environment variable
# export PDFTABLES_API_KEY=sk_live_...
client = PDFTablesClient()
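The fallback from an explicit key to the environment variable can be pictured roughly as follows. This is an illustrative sketch only: `resolve_api_key` is a hypothetical helper, and both the precedence (explicit argument first) and the error raised when no key is found are assumptions rather than documented SDK behavior.

```python
import os


def resolve_api_key(api_key=None, env_var="PDFTABLES_API_KEY"):
    """Return the explicit key if given, otherwise the environment variable's value."""
    key = api_key or os.environ.get(env_var)
    if not key:
        # Assumption: a missing key is treated as a configuration error.
        raise ValueError(f"No API key found: pass api_key or set {env_var}")
    return key
```

Failing early like this surfaces a missing key at start-up instead of on the first request.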
Async Usage
import asyncio
from pdftables import AsyncPDFTablesClient
async def main():
    async with AsyncPDFTablesClient(api_key="your-api-key") as client:
        upload = await client.upload("invoice.pdf")
        job = await client.create_job(upload.upload_id)
        job = await client.wait_for_job(job.id)
        csv_zip = await client.download_job_csv(job.id)
asyncio.run(main())
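The async client is most useful when several PDFs are processed at once. The sketch below uses only the four methods shown above; `extract_one`, `extract_many`, and `max_concurrency` are hypothetical names, and it assumes a single client instance may be shared across concurrent tasks, which this README does not confirm.

```python
import asyncio


async def extract_one(client, path):
    """Run upload -> create job -> wait -> CSV download for a single PDF."""
    upload = await client.upload(path)
    job = await client.create_job(upload.upload_id)
    job = await client.wait_for_job(job.id)
    return await client.download_job_csv(job.id)


async def extract_many(client, paths, max_concurrency=3):
    """Process several PDFs concurrently, capped by a semaphore."""
    limit = asyncio.Semaphore(max_concurrency)

    async def guarded(path):
        async with limit:  # at most max_concurrency pipelines run at once
            return await extract_one(client, path)

    results = await asyncio.gather(*(guarded(p) for p in paths))
    return dict(zip(paths, results))
```

Inside an `async with AsyncPDFTablesClient(...) as client:` block, `await extract_many(client, ["a.pdf", "b.pdf"])` would return a mapping from each path to its ZIP bytes. The semaphore keeps the number of in-flight requests modest, which may help with rate limits.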
API Reference
Upload
| Method | Description |
|---|---|
| `upload(file)` | Upload a PDF file (path or file object) |
| `list_uploads()` | List all uploads |
Extraction Jobs
| Method | Description |
|---|---|
| `create_job(upload_id, *, pages, mode)` | Start extraction (mode: auto, stream, lattice) |
| `get_job(job_id)` | Get job status |
| `wait_for_job(job_id, *, poll_interval, timeout)` | Poll until complete |
| `list_jobs()` | List all jobs |
| `list_job_tables(job_id)` | List extracted tables |
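To make the `poll_interval` and `timeout` parameters of `wait_for_job` concrete, here is a generic polling loop of the kind such a method typically wraps. It is a sketch under assumptions, not the SDK's implementation: `poll_until`, its defaults, and the use of `TimeoutError` are all hypothetical.

```python
import time


def poll_until(fetch, is_done, *, poll_interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch() repeatedly until is_done(result) is true or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        # Give up if the next poll would land past the deadline.
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError(f"not finished after {timeout} seconds")
        sleep(poll_interval)
```

A manual equivalent of `wait_for_job` might pass `lambda: client.get_job(job_id)` as `fetch` and a predicate on the returned job as `is_done`. The field that indicates completion is not documented here, so the predicate has to be written against the real job object.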
Downloads
| Method | Description |
|---|---|
| `download_table(table_id, *, format, structure)` | Download single table (csv/json/xlsx) |
| `download_tables_zip(table_ids, *, format, structure)` | Download multiple tables as ZIP |
| `download_job_csv(job_id)` | Download all job tables as CSV ZIP |
| `download_job_xlsx(job_id)` | Download all job tables as XLSX ZIP |
| `download_job_json(job_id)` | Download all job tables as JSON ZIP |
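When the output format is chosen at runtime, the three job-level download methods can be wrapped in a small dispatcher. `download_job` below is a hypothetical convenience function, not part of the SDK; it assumes each method takes a job ID and returns the ZIP as bytes, as `download_job_csv` does in the Quick Start.

```python
def download_job(client, job_id, fmt="csv"):
    """Download all tables of a job as a ZIP in the requested format."""
    downloaders = {
        "csv": client.download_job_csv,
        "xlsx": client.download_job_xlsx,
        "json": client.download_job_json,
    }
    try:
        download = downloaders[fmt.lower()]
    except KeyError:
        supported = ", ".join(sorted(downloaders))
        raise ValueError(f"Unsupported format {fmt!r}; expected one of: {supported}") from None
    return download(job_id)
```

For example, `download_job(client, job.id, "xlsx")` would behave like `client.download_job_xlsx(job.id)`, while an unknown format fails before any request is made.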
Export Structures
| Method | Description |
|---|---|
| `list_structures()` | List all structures |
| `create_structure(*, name, slug, fields, ...)` | Create custom structure |
| `get_structure(structure_id)` | Get structure details |
| `update_structure(structure_id, *, name, slug, ...)` | Update structure |
| `delete_structure(structure_id)` | Delete structure |
DATEV
| Method | Description |
|---|---|
| `create_datev_export(job_id, *, table_id, fiscal_year)` | Trigger DATEV export |
| `download_datev_export(job_id, datev_id, *, format)` | Download DATEV file |
Error Handling
from pdftables import PDFTablesClient, AuthenticationError, RateLimitError
client = PDFTablesClient(api_key="your-key")
try:
    upload = client.upload("invoice.pdf")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit exceeded — try again later")
| Exception | HTTP Status |
|---|---|
| `AuthenticationError` | 401, 403 |
| `ValidationError` | 400 |
| `PaymentRequiredError` | 402 |
| `NotFoundError` | 404 |
| `RateLimitError` | 429 |
| `ConflictError` | 409 |
| `ServerError` | 5xx |
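Rate-limit errors (429) are usually transient, so a caller may prefer to retry instead of failing immediately. The helper below is a generic sketch: `call_with_retry` and its parameters are hypothetical, the backoff schedule is an arbitrary choice, and whether the SDK already retries internally is not stated in this README.

```python
import time


def call_with_retry(func, *, retry_on, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff when retry_on is raised."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
```

With the imports from the example above, `call_with_retry(lambda: client.upload("invoice.pdf"), retry_on=RateLimitError)` would retry the upload up to three more times. Errors such as `AuthenticationError` or `ValidationError` are deliberately not retried, since repeating the same request is unlikely to help.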
Advanced: Custom Base URL
client = PDFTablesClient(
    api_key="your-key",
    base_url="https://staging-api.pdftables.io",
    timeout=60.0,
)
Requirements
License
BSD 3-Clause — see LICENSE.