pdftables-io

Official Python SDK for the pdftables.io API — extract tables from PDFs programmatically.

pip install pdftables-io

Quick Start

from pdftables import PDFTablesClient

client = PDFTablesClient(api_key="your-api-key")

# 1. Upload a PDF
upload = client.upload("invoice.pdf")

# 2. Start table extraction
job = client.create_job(upload.upload_id)

# 3. Wait for completion
job = client.wait_for_job(job.id)

# 4. Download results as CSV
csv_zip = client.download_job_csv(job.id)
with open("tables.zip", "wb") as f:
    f.write(csv_zip)
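
The Quick Start writes the archive to disk. To work with the tables directly in memory, the standard library's zipfile and csv modules are enough. The following is a minimal sketch, assuming the bytes returned by download_job_csv form a ZIP archive whose members are UTF-8 CSV files with a .csv suffix. The helper name read_csv_tables is hypothetical and not part of the SDK.

```python
import csv
import io
import zipfile


def read_csv_tables(zip_bytes: bytes) -> dict[str, list[list[str]]]:
    """Parse every .csv member of a ZIP archive into a list of rows.

    Assumption: members are UTF-8 CSV files (with or without a BOM).
    The member names are chosen by the service and are not documented here.
    """
    tables: dict[str, list[list[str]]] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip directories and non-CSV members
            text = archive.read(name).decode("utf-8-sig")
            # newline="" lets the csv module handle line endings itself
            tables[name] = list(csv.reader(io.StringIO(text, newline="")))
    return tables
```

With the Quick Start variables in scope, read_csv_tables(csv_zip) would return a dict keyed by archive member name, one entry per extracted table.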

Authentication

Pass your API key directly or set the PDFTABLES_API_KEY environment variable:

# Explicit
client = PDFTablesClient(api_key="sk_live_...")

# Via environment variable
# export PDFTABLES_API_KEY=sk_live_...
client = PDFTablesClient()
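
How the client behaves when neither an explicit key nor the environment variable is present is not stated here. One option is to check the variable up front so a missing key fails with a clear message. A minimal sketch; the helper require_api_key is hypothetical and not part of the SDK:

```python
import os
from collections.abc import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the key stored in PDFTABLES_API_KEY or raise a clear error."""
    key = env.get("PDFTABLES_API_KEY", "").strip()
    if not key:
        raise RuntimeError("PDFTABLES_API_KEY is not set or is empty")
    return key


# Usage (requires the SDK to be installed):
# client = PDFTablesClient(api_key=require_api_key())
```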

Async Usage

import asyncio
from pdftables import AsyncPDFTablesClient

async def main():
    async with AsyncPDFTablesClient(api_key="your-api-key") as client:
        upload = await client.upload("invoice.pdf")
        job = await client.create_job(upload.upload_id)
        job = await client.wait_for_job(job.id)
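        csv_zip = await client.download_job_csv(job.id)

    # Save the CSV ZIP, as in the synchronous Quick Start
    with open("tables.zip", "wb") as f:
        f.write(csv_zip)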

asyncio.run(main())
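
The async client becomes useful when several PDFs are processed at once. The sketch below runs the same four steps per file and caps the number of files in flight with a semaphore. It assumes the service tolerates concurrent requests on one client, which is not stated here; the function extract_many and the default max_concurrency=3 are illustrative choices, not part of the SDK.

```python
import asyncio


async def extract_many(client, pdf_paths, *, max_concurrency=3):
    """Run upload -> create_job -> wait_for_job -> download_job_csv per file.

    `client` is expected to behave like AsyncPDFTablesClient.
    Returns a dict mapping each input path to its CSV ZIP bytes.
    """
    limit = asyncio.Semaphore(max_concurrency)

    async def extract_one(path):
        async with limit:  # at most max_concurrency files in flight
            upload = await client.upload(path)
            job = await client.create_job(upload.upload_id)
            job = await client.wait_for_job(job.id)
            return path, await client.download_job_csv(job.id)

    results = await asyncio.gather(*(extract_one(p) for p in pdf_paths))
    return dict(results)
```

Inside an `async with AsyncPDFTablesClient(...) as client:` block this could be called as `await extract_many(client, ["a.pdf", "b.pdf"])`. If any file fails, asyncio.gather propagates the first exception.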

API Reference

Upload

Method          Description
upload(file)    Upload a PDF file (path or file object)
list_uploads()  List all uploads

Extraction Jobs

Method                                           Description
create_job(upload_id, *, pages, mode)            Start extraction (mode: auto, stream, lattice)
get_job(job_id)                                  Get job status
wait_for_job(job_id, *, poll_interval, timeout)  Poll until complete
list_jobs()                                      List all jobs
list_job_tables(job_id)                          List extracted tables
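
The poll_interval and timeout parameters of wait_for_job suggest a polling loop around get_job. The sketch below shows that general pattern; it is not the SDK's implementation. The helper poll_until, its defaults, and the status attribute and value in the usage comment are all assumptions.

```python
import time


def poll_until(fetch, is_done, *, poll_interval=2.0, timeout=300.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until is_done(result) is true.

    Raises TimeoutError when another wait would exceed `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if clock() + poll_interval > deadline:
            raise TimeoutError(f"not finished after {timeout} seconds")
        sleep(poll_interval)


# Usage (attribute name and value are assumptions, check the job object):
# job = poll_until(lambda: client.get_job(job_id),
#                  lambda j: j.status == "completed")
```

Passing sleep and clock as parameters keeps the loop testable without real waiting.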

Downloads

Method                                                Description
download_table(table_id, *, format, structure)        Download single table (csv/json/xlsx)
download_tables_zip(table_ids, *, format, structure)  Download multiple tables as ZIP
download_job_csv(job_id)                              Download all job tables as CSV ZIP
download_job_xlsx(job_id)                             Download all job tables as XLSX ZIP
download_job_json(job_id)                             Download all job tables as JSON ZIP
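
When the output format is chosen at runtime, the three download_job_* methods can be wrapped in a single dispatcher. A minimal sketch that uses only the method names listed above; the wrapper download_job itself is hypothetical and not part of the SDK:

```python
def download_job(client, job_id, fmt="csv"):
    """Download all tables of a job as a ZIP in the requested format."""
    methods = {
        "csv": "download_job_csv",
        "xlsx": "download_job_xlsx",
        "json": "download_job_json",
    }
    try:
        method_name = methods[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    # Look the method up on the client and call it with the job ID
    return getattr(client, method_name)(job_id)
```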

Export Structures

Method                                              Description
list_structures()                                   List all structures
create_structure(*, name, slug, fields, ...)        Create custom structure
get_structure(structure_id)                         Get structure details
update_structure(structure_id, *, name, slug, ...)  Update structure
delete_structure(structure_id)                      Delete structure

DATEV

Method                                                 Description
create_datev_export(job_id, *, table_id, fiscal_year)  Trigger DATEV export
download_datev_export(job_id, datev_id, *, format)     Download DATEV file

Error Handling

from pdftables import PDFTablesClient, AuthenticationError, RateLimitError

client = PDFTablesClient(api_key="your-key")

try:
    upload = client.upload("invoice.pdf")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit exceeded — try again later")

Exception             HTTP Status
AuthenticationError   401, 403
ValidationError       400
PaymentRequiredError  402
NotFoundError         404
RateLimitError        429
ConflictError         409
ServerError           5xx
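
Rather than only printing a message on RateLimitError, a caller may prefer to retry after a pause. Whether the SDK retries internally or exposes a server-suggested delay is not stated here, so the sketch below uses plain exponential backoff. The helper call_with_retry and its defaults are hypothetical and not part of the SDK.

```python
import time


def call_with_retry(func, *, retry_on, attempts=4, base_delay=1.0,
                    sleep=time.sleep):
    """Call func(); on a `retry_on` exception, wait and try again.

    The delay doubles after each failed attempt: base_delay, 2x, 4x, ...
    The last failure is re-raised unchanged.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Usage (requires the SDK to be installed):
# upload = call_with_retry(lambda: client.upload("invoice.pdf"),
#                          retry_on=RateLimitError)
```

Only the exception types passed as retry_on are retried; errors such as AuthenticationError propagate immediately, since repeating the call would not help.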

Advanced: Custom Base URL

client = PDFTablesClient(
    api_key="your-key",
    base_url="https://staging-api.pdftables.io",
    timeout=60.0,
)

Requirements

License

BSD 3-Clause — see LICENSE.
