
AI-powered document intelligence platform: turn your documents into structured data with a single line of code.


ByteIT Python SDK

Python client for ByteIT, an AI-powered document parsing service. Extract structured text from PDFs, Word files, images, and more with a single API call.


Installation

pip install byteit

Requires Python 3.8+ and an API key from byteit.ai.


Quick Start

from byteit import ByteITClient, OutputFormat

client = ByteITClient(api_key="your_api_key")
result = client.parse("document.pdf")
print(result.decode())

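parse() blocks until the job finishes and returns the result as raw bytes (Markdown by default), which is why the example calls decode(). Pass output="result.md" to save the result directly to disk.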


Usage

Parse and save

# Returns bytes
result = client.parse("invoice.pdf", result_format=OutputFormat.JSON)

# Save to file
client.parse(
    "invoice.pdf",
    result_format=OutputFormat.MD,
    output="invoice.md",
)

Output formats: OutputFormat.MD (default), OutputFormat.TXT, OutputFormat.JSON, OutputFormat.HTML, OutputFormat.EXCEL

Excel output note: OutputFormat.EXCEL extracts tables into one or more Excel files. Because a document can contain multiple tables, we return the Excel files bundled in a single .zip archive. If you pass the output parameter with result_format=OutputFormat.EXCEL, the output path should end with .zip instead of .xlsx.
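If you need the individual workbooks, you can unpack the archive with the standard library. A minimal sketch follows; extract_excel_files is a hypothetical helper (not part of the byteit SDK), and it assumes the .xlsx members of the archive are the extracted tables.

```python
import io
import zipfile
from pathlib import Path
from typing import List


def extract_excel_files(zip_bytes: bytes, dest_dir: str) -> List[Path]:
    """Write every .xlsx member of an Excel result archive into dest_dir.

    Hypothetical helper, not part of the byteit SDK. Assumes the bytes
    returned for OutputFormat.EXCEL are a .zip archive of .xlsx files.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xlsx"):
                continue  # skip directory entries and non-Excel members
            # Keep only the base name so no entry can escape dest_dir.
            target = dest / Path(name).name
            target.write_bytes(archive.read(name))
            written.append(target)
    return written
```

For example, extract_excel_files(client.parse("report.pdf", result_format=OutputFormat.EXCEL), "tables") would write each table workbook into a tables directory. Flattening member names to their base name keeps entries inside the destination folder, at the cost of overwriting files that share a name across subfolders.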

Async (non-blocking)

Submit a job and check back later. This is useful for large files or batch workflows.

# Submit without waiting
job = client.parse_async("document.pdf")

# Poll status
status = client.get_job_status(job.id)
# status.processing_status: "pending" | "processing" | "completed" | "failed"

# Fetch full job details when needed
details = client.get_job_details(job.id)

# Download when ready
if status.is_completed:
    result = client.get_job_result(job.id)
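The snippet above checks the status once. To wait for a job in a loop with a timeout, you can poll until it reaches a terminal state. This is a minimal sketch: wait_for_job is a hypothetical helper, and it relies only on the documented processing_status values ("pending", "processing", "completed", "failed").

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    timeout: float = 300.0,
    interval: float = 2.0,
) -> str:
    """Poll get_status() until it returns "completed" or "failed".

    Hypothetical helper. get_status is any zero-argument callable that
    returns a processing_status string. Raises TimeoutError if the job
    is still pending or processing once timeout seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        time.sleep(interval)  # wait before asking again
```

For example, final = wait_for_job(lambda: client.get_job_status(job.id).processing_status), followed by client.get_job_result(job.id) when final == "completed". Passing a callable instead of the client keeps the helper independent of the SDK and easy to test.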

Job management

job_list = client.get_jobs()

for job in job_list.jobs:
    print(f"{job.id}  {job.processing_status}  {job.result_format}")
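To triage a long job list, you can group job IDs by status. summarize_jobs is a hypothetical helper that works with any objects exposing id and processing_status, such as the entries in job_list.jobs.

```python
from typing import Any, Dict, Iterable, List


def summarize_jobs(jobs: Iterable[Any]) -> Dict[str, List[str]]:
    """Group job IDs by processing_status, keeping the input order.

    Hypothetical helper; expects objects with .id and .processing_status.
    """
    groups: Dict[str, List[str]] = {}
    for job in jobs:
        groups.setdefault(job.processing_status, []).append(job.id)
    return groups
```

For example, summarize_jobs(client.get_jobs().jobs).get("failed", []) returns the IDs of failed jobs, which you can then inspect with client.get_job_details().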

Processing options

from byteit import ProcessingOptions

result = client.parse(
    "document.pdf",
    processing_options=ProcessingOptions(languages=["de", "en"], page_range="1-5"),
)

Or pass a plain dict:

result = client.parse("doc.pdf", processing_options={"languages": ["de"]})

API key from environment

import os
client = ByteITClient(api_key=os.environ["BYTEIT_API_KEY"])
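os.environ["BYTEIT_API_KEY"] raises a bare KeyError when the variable is missing. If you prefer a clearer message, a small wrapper helps; api_key_from_env is a hypothetical helper, not part of the SDK.

```python
import os


def api_key_from_env(var: str = "BYTEIT_API_KEY") -> str:
    """Return the API key stored in the environment variable var.

    Hypothetical helper. Strips surrounding whitespace (e.g. a trailing
    newline from a pasted value) and fails with an explicit message if
    the variable is unset or empty.
    """
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var} environment variable to your ByteIT API key")
    return key
```

Usage: client = ByteITClient(api_key=api_key_from_env()).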

Context manager

with ByteITClient(api_key="your_key") as client:
    result = client.parse("doc.pdf")

Supported File Types

Documents: PDF (.pdf), Word (.docx), PowerPoint (.pptx), HTML (.html), Markdown (.md), Plain text (.txt), JSON (.json), XML (.xml)

Images: PNG (.png), JPEG (.jpg, .jpeg), TIFF (.tiff), BMP (.bmp)
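To skip unsupported files before uploading (for example when walking a folder), you can check extensions locally. This sketch hard-codes the extensions listed in this section; the service remains the authority on what it accepts, and is_supported is a hypothetical helper.

```python
from pathlib import Path
from typing import Union

# Extensions from the Supported File Types section (documents and images).
SUPPORTED_EXTENSIONS = frozenset({
    ".pdf", ".docx", ".pptx", ".html", ".md", ".txt", ".json", ".xml",
    ".png", ".jpg", ".jpeg", ".tiff", ".bmp",
})


def is_supported(path: Union[str, Path]) -> bool:
    """Return True if the file's extension (case-insensitive) is listed."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, [p for p in Path("inbox").iterdir() if is_supported(p)] keeps only the files ByteIT lists as parseable.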

Error Handling

All exceptions inherit from ByteITError.

from byteit.exceptions import (
    AuthenticationError,
    ValidationError,
    RateLimitError,
    JobProcessingError,
    ByteITError,
)

try:
    result = client.parse("document.pdf")
except AuthenticationError:
    print("Invalid API key")
except ValidationError as e:
    print("Bad request:", e.message)
except RateLimitError:
    print("Rate limit hit, retry later")
except JobProcessingError as e:
    print("Processing failed:", e.message)
except ByteITError as e:
    print("Unexpected error:", e.message)

Exception               When raised
AuthenticationError     Invalid or missing API key
APIKeyError             API key rejected (403)
ValidationError         Bad request parameters
ResourceNotFoundError   Job not found
RateLimitError          Rate limit exceeded
JobProcessingError      Job failed during processing
ServerError             Server-side error (5xx)
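RateLimitError signals a temporary condition, so retrying with exponential backoff is a common pattern. The sketch below is generic: with_retries is a hypothetical helper, and its attempt count and delays are illustrative defaults, not documented ByteIT limits.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
) -> T:
    """Run call(); on a retry_on exception, back off exponentially and retry.

    Hypothetical helper. Exceptions not in retry_on propagate immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 1, 2, 4, ... seconds plus up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            time.sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

For example, result = with_retries(lambda: client.parse("document.pdf"), (RateLimitError,)). You could add ServerError to the tuple too; errors such as ValidationError will not succeed on retry, so they are re-raised at once.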

API Reference

ByteITClient(api_key)

Method                    Description
parse(input, ...)         Parse a document, block until complete, return bytes
parse_async(input, ...)   Submit a job, return ParseJob immediately
get_job_details(job_id)   Get full ParseJob details
get_job_status(job_id)    Get current JobStatus
get_job_result(job_id)    Download result as bytes
get_jobs()                List all jobs as JobList

parse(input, output=None, processing_options=None, result_format=OutputFormat.MD) → bytes

Param                Type                              Description
input                str | Path | InputConnector       File to parse
output               str | Path | None                 Save result to disk (optional)
processing_options   ProcessingOptions | dict | None   Languages, page range, etc.
result_format        OutputFormat                      OutputFormat.MD, OutputFormat.TXT, OutputFormat.JSON, OutputFormat.HTML, OutputFormat.EXCEL

When result_format is OutputFormat.EXCEL, the returned bytes represent a .zip archive containing the generated Excel files.

parse_async(input, processing_options=None, result_format=OutputFormat.MD) → ParseJob

Same parameters as parse, minus output. Returns a ParseJob without waiting.

ParseJob properties

Property            Type               Description
id                  str                Unique job identifier
processing_status   str                pending / processing / completed / failed
result_format       str                Output format
is_completed        bool               True when result is ready
is_failed           bool               True if job failed
metadata            DocumentMetadata   Filename, page count, language, etc.
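For batch workflows, you can submit every file first and then collect results as jobs finish. parse_batch is a hypothetical helper built only from parse_async, get_job_status and get_job_result; the polling interval and timeout are illustrative values.

```python
import time
from pathlib import Path
from typing import Any, Dict, Iterable, List, Tuple, Union


def parse_batch(
    client: Any,
    paths: Iterable[Union[str, Path]],
    interval: float = 2.0,
    timeout: float = 1800.0,
    **parse_kwargs: Any,
) -> Tuple[Dict[str, bytes], List[str]]:
    """Submit all files with parse_async, then poll until every job finishes.

    Hypothetical helper. Returns ({path: result_bytes}, [failed_paths]).
    Extra keyword arguments (e.g. result_format) go to parse_async.
    """
    # Submit everything up front; map each input path to its job ID.
    pending = {str(p): client.parse_async(p, **parse_kwargs).id for p in paths}
    results: Dict[str, bytes] = {}
    failed: List[str] = []
    deadline = time.monotonic() + timeout
    while pending:
        for path, job_id in list(pending.items()):
            status = client.get_job_status(job_id).processing_status
            if status == "completed":
                results[path] = client.get_job_result(job_id)
                del pending[path]
            elif status == "failed":
                failed.append(path)  # details via client.get_job_details(job_id)
                del pending[path]
        if pending:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{len(pending)} job(s) still running")
            time.sleep(interval)
    return results, failed
```

For example, results, failed = parse_batch(client, ["a.pdf", "b.docx"], result_format=OutputFormat.JSON). Submitting everything before polling may let the service work on several jobs at once instead of waiting on each file in turn.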

Notebook Integration

Results are automatically rendered when running in Jupyter:

  • OutputFormat.MD → rendered Markdown
  • OutputFormat.HTML → rendered HTML
  • OutputFormat.JSON → interactive tree
  • OutputFormat.TXT → code block

To disable auto-display, pass output="file.md".




Licensed under Apache 2.0. © 2026 ByteIT GmbH.



Download files


Source Distribution

byteit-1.0.1.tar.gz (35.4 kB)

Built Distribution

byteit-1.0.1-py3-none-any.whl (28.6 kB)

File details

Details for the file byteit-1.0.1.tar.gz.

File metadata

  • Download URL: byteit-1.0.1.tar.gz
  • Size: 35.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.18

File hashes

Hashes for byteit-1.0.1.tar.gz
Algorithm Hash digest
SHA256 0f66f723e8d5629c413e7980adb9394a593f410e99e55ed4176b3a9720e5196e
MD5 3321f6e36916b7631f960b4e4f640e02
BLAKE2b-256 cff7abf071e04da5bb87f2c76cade8c1957883785cd1f2adfd8a2782b947894b


File details

Details for the file byteit-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: byteit-1.0.1-py3-none-any.whl
  • Size: 28.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.18

File hashes

Hashes for byteit-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 926031135e1d811185bab2c2b7f59edce779686926f5f69e4c9e2e90b7a219cd
MD5 52dc13d7aca540f44a3d9aeaaa8cc66f
BLAKE2b-256 a78a7e221ccb962d5a4d329a9ad379baebd8ce4549d0d6a204e888b00502bd50

