
AI-powered document intelligence platform. Turn your documents into structured data with a single line of code.


ByteIT Python SDK

Python client for ByteIT — AI-powered document parsing. Extract structured text from PDFs, Word files, images, and more with a single API call.


Installation

pip install byteit

Requires Python 3.8+ and an API key from byteit.ai.


Quick Start

from byteit import ByteITClient

client = ByteITClient(api_key="your_api_key")
result = client.parse("document.pdf")
print(result.decode())

parse() returns the result as raw bytes. Pass output="result.md" to save it directly to disk.


Usage

Parse and save

# Returns bytes
result = client.parse("invoice.pdf", result_format="json")

# Save to file
client.parse("invoice.pdf", result_format="md", output="invoice.md")

Output formats: md (default), txt, json, html
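Because parse() returns bytes, a JSON result has to be decoded before it can be used as a Python object. A minimal sketch, assuming the JSON payload is UTF-8 encoded (the README does not state the encoding). parse_to_dict is a hypothetical helper name, not part of the SDK:

```python
import json


def parse_to_dict(client, path):
    """Parse a document as JSON and return the decoded Python object.

    `client` is expected to be a ByteITClient. Assumption: the returned
    bytes are UTF-8 encoded JSON, which the README does not confirm.
    """
    raw = client.parse(path, result_format="json")
    return json.loads(raw.decode("utf-8"))
```

Usage would look like data = parse_to_dict(client, "invoice.pdf"). The structure of the decoded object depends on the service and is not described here.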

Async (non-blocking)

Submit a job and check back later — useful for large files or batch workflows.

# Submit without waiting
job = client.parse_async("document.pdf")

# Poll status
status = client.get_job_status(job.id)
# status.processing_status: "pending" | "processing" | "completed" | "failed"

# Download when ready
if status.is_completed:
    result = client.get_job_result(job.id)
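The status check above runs once. For a job that is still pending, it can be wrapped in a polling loop. The following is a sketch only: wait_for_result, poll_interval, and timeout are hypothetical names and values, and the loop relies solely on get_job_status, get_job_result, is_completed, and is_failed as listed in this README.

```python
import time


def wait_for_result(client, job_id, poll_interval=2.0, timeout=300.0):
    """Poll a job until it completes, fails, or the timeout expires.

    The interval and timeout defaults are illustrative, not values
    recommended by ByteIT.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = client.get_job_status(job_id)
        if status.is_completed:
            return client.get_job_result(job_id)
        if status.is_failed:
            raise RuntimeError(f"Job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} not finished after {timeout} s")
        time.sleep(poll_interval)
```

A call such as wait_for_result(client, job.id) then blocks until the result bytes are available. A fixed interval keeps the sketch short; a growing interval may be kinder to rate limits.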

Job management

for job in client.get_jobs():
    print(f"{job.id}  {job.processing_status}  {job.result_format}")
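For a longer job list, a per-status count is often more useful than one line per job. A small sketch that only reads the id, processing_status, and is_failed properties documented in this README. summarize_jobs and failed_job_ids are hypothetical helper names:

```python
from collections import Counter


def summarize_jobs(jobs):
    """Count jobs per processing_status, e.g. for a quick health check."""
    return Counter(job.processing_status for job in jobs)


def failed_job_ids(jobs):
    """Return the ids of all jobs whose is_failed flag is set."""
    return [job.id for job in jobs if job.is_failed]
```

Both helpers take the list returned by client.get_jobs(), for example summarize_jobs(client.get_jobs()).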

Processing options

from byteit import ProcessingOptions

result = client.parse(
    "document.pdf",
    processing_options=ProcessingOptions(languages=["de", "en"], page_range="1-5"),
)

Or pass a plain dict:

result = client.parse("doc.pdf", processing_options={"languages": ["de"]})

API key from environment

import os
client = ByteITClient(api_key=os.environ["BYTEIT_API_KEY"])
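os.environ["BYTEIT_API_KEY"] raises a bare KeyError when the variable is unset. If a clearer message is preferred, a small wrapper can help. This is a sketch; api_key_from_env is a hypothetical helper, not part of the SDK:

```python
import os


def api_key_from_env(var_name="BYTEIT_API_KEY"):
    """Return the API key from the environment or raise a clear error."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

The client would then be created with ByteITClient(api_key=api_key_from_env()).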

Context manager

with ByteITClient(api_key="your_key") as client:
    result = client.parse("doc.pdf")

Supported File Types

| Documents | Images |
| --- | --- |
| PDF .pdf | PNG .png |
| Word .docx | JPEG .jpg .jpeg |
| PowerPoint .pptx | TIFF .tiff |
| HTML .html | BMP .bmp |
| Markdown .md | |
| Plain text .txt | |
| JSON .json | |
| XML .xml | |
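When processing a folder of mixed files, it can be useful to skip unsupported files locally before uploading anything. A minimal sketch based on the extensions listed above. is_supported is a hypothetical helper, and matching extensions case-insensitively is an assumption, since the README does not say how the service treats ".PDF":

```python
from pathlib import Path

# Extensions copied from the "Supported File Types" list in this README.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".html", ".md", ".txt", ".json", ".xml",
    ".png", ".jpg", ".jpeg", ".tiff", ".bmp",
}


def is_supported(path):
    """Return True if the file extension is in the supported list.

    Assumption: extensions are compared case-insensitively.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

This checks only the file name, not the file contents, so the service may still reject a file that passes this test.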

Error Handling

All exceptions inherit from ByteITError.

from byteit.exceptions import (
    AuthenticationError,
    ValidationError,
    RateLimitError,
    JobProcessingError,
    ByteITError,
)

try:
    result = client.parse("document.pdf")
except AuthenticationError:
    print("Invalid API key")
except ValidationError as e:
    print("Bad request:", e.message)
except RateLimitError:
    print("Rate limit hit — retry later")
except JobProcessingError as e:
    print("Processing failed:", e.message)
except ByteITError as e:
    print("Unexpected error:", e.message)

| Exception | When raised |
| --- | --- |
| AuthenticationError | Invalid or missing API key |
| APIKeyError | API key rejected (403) |
| ValidationError | Bad request parameters |
| ResourceNotFoundError | Job not found |
| RateLimitError | Rate limit exceeded |
| JobProcessingError | Job failed during processing |
| ServerError | Server-side error (5xx) |
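For rate-limit errors, "retry later" can be automated with exponential backoff. The sketch below is generic and takes the exception class as an argument, so it does not assume anything about the SDK beyond the exception names above. call_with_retry, max_attempts, and base_delay are hypothetical names and values:

```python
import time


def call_with_retry(func, retry_on, max_attempts=4, base_delay=1.0):
    """Call func(); on a retry_on exception, wait and try again.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The last failure is re-raised unchanged.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

With the SDK this might be used as call_with_retry(lambda: client.parse("document.pdf"), RateLimitError). Whether the API communicates a preferred wait time is not covered in this README, so the delays here are a guess.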

API Reference

ByteITClient(api_key)

| Method | Description |
| --- | --- |
| parse(input, ...) | Parse a document, block until complete, return bytes |
| parse_async(input, ...) | Submit a job, return Job immediately |
| get_job_status(job_id) | Get current Job status |
| get_job_result(job_id) | Download result as bytes |
| get_jobs() | List all jobs as list[Job] |

parse(input, output=None, processing_options=None, result_format="md") → bytes

| Param | Type | Description |
| --- | --- | --- |
| input | str \| Path \| InputConnector | File to parse |
| output | str \| Path \| None | Save result to disk (optional) |
| processing_options | ProcessingOptions \| dict \| None | Languages, page range, etc. |
| result_format | str | "md", "txt", "json", "html" |

parse_async(input, processing_options=None, result_format="md") → Job

Same parameters as parse, minus output. Returns a Job without waiting.

Job properties

| Property | Type | Description |
| --- | --- | --- |
| id | str | Unique job identifier |
| processing_status | str | pending / processing / completed / failed |
| result_format | str | Output format |
| is_completed | bool | True when result is ready |
| is_failed | bool | True if job failed |
| metadata | DocumentMetadata | Filename, page count, language, etc. |

Notebook Integration

Results are automatically rendered when running in Jupyter:

  • md → rendered Markdown
  • html → rendered HTML
  • json → interactive tree
  • txt → code block

To disable auto-display, pass output="file.md".


Licensed under Apache 2.0. © 2026 ByteIT GmbH.
