AI-powered document intelligence platform - Turn your documents into structured data with a single line of code.
ByteIT Python SDK
Python client for ByteIT — AI-powered document parsing. Extract structured text from PDFs, Word files, images, and more with a single API call.
Installation
pip install byteit
Requires Python 3.8+ and an API key from byteit.ai.
Quick Start
from byteit import ByteITClient
client = ByteITClient(api_key="your_api_key")
result = client.parse("document.pdf")
print(result.decode())
client.parse returns the result as raw bytes. Pass output="result.md" to save it directly to disk.
Usage
Parse and save
# Returns bytes
result = client.parse("invoice.pdf", result_format="json")
# Save to file
client.parse("invoice.pdf", result_format="md", output="invoice.md")
Output formats: md (default), txt, json, html
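Because every format comes back as bytes, a small helper can choose a matching output file name and decode JSON results. The sketch below is illustrative only: output_path_for and decode_json_result are hypothetical helpers, not part of the SDK, and the code assumes that results are UTF-8 encoded and that json output is a single valid JSON document.

```python
import json
from pathlib import Path

# Result formats listed in this README; md is the default.
RESULT_FORMATS = ("md", "txt", "json", "html")


def output_path_for(input_path, result_format="md"):
    """Return the input path with its suffix swapped for the result format."""
    if result_format not in RESULT_FORMATS:
        raise ValueError(f"unsupported result_format: {result_format!r}")
    return Path(input_path).with_suffix("." + result_format)


def decode_json_result(result: bytes):
    """Decode a bytes result into Python objects (assumes UTF-8 encoded JSON)."""
    return json.loads(result.decode("utf-8"))
```

The returned path could then be passed as output, for example client.parse("invoice.pdf", result_format="json", output=output_path_for("invoice.pdf", "json")).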
Async (non-blocking)
Submit a job and check back later — useful for large files or batch workflows.
# Submit without waiting
job = client.parse_async("document.pdf")
# Poll status
status = client.get_job_status(job.id)
# status.processing_status: "pending" | "processing" | "completed" | "failed"
# Download when ready
if status.is_completed:
    result = client.get_job_result(job.id)
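The status check above can be wrapped in a polling loop. This is a minimal sketch, not an SDK feature: wait_for_result is a hypothetical helper, the poll interval and timeout defaults are arbitrary, and it relies only on get_job_status, get_job_result, is_completed, and is_failed as described in this README.

```python
import time


def wait_for_result(client, job_id, poll_interval=2.0, timeout=300.0):
    """Poll a job until it completes, fails, or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = client.get_job_status(job_id)
        if status.is_completed:
            return client.get_job_result(job_id)
        if status.is_failed:
            raise RuntimeError(f"job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        time.sleep(poll_interval)
```

Typical use would be result = wait_for_result(client, job.id) after a parse_async call.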
Job management
for job in client.get_jobs():
    print(f"{job.id} {job.processing_status} {job.result_format}")
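For larger job lists, a summary can be easier to read than one line per job. The helpers below are a hypothetical sketch that uses only the Job properties documented in this README (id, processing_status, is_failed); they work on any iterable of such objects, including the list returned by get_jobs.

```python
from collections import Counter


def summarize_jobs(jobs):
    """Count jobs per processing_status value."""
    return Counter(job.processing_status for job in jobs)


def failed_job_ids(jobs):
    """Return the ids of jobs whose is_failed flag is set."""
    return [job.id for job in jobs if job.is_failed]
```

For example, summarize_jobs(client.get_jobs()) gives a status-to-count mapping that can be printed or logged.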
Processing options
from byteit import ProcessingOptions
result = client.parse(
    "document.pdf",
    processing_options=ProcessingOptions(languages=["de", "en"], page_range="1-5"),
)
Or pass a plain dict:
result = client.parse("doc.pdf", processing_options={"languages": ["de"]})
API key from environment
import os
client = ByteITClient(api_key=os.environ["BYTEIT_API_KEY"])
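Indexing os.environ directly raises a bare KeyError when the variable is missing. A small wrapper can turn that into a clearer message. This is a sketch: load_api_key is a hypothetical helper, and treating a blank value as missing is an assumption made here, not SDK behavior.

```python
import os


def load_api_key(env=None, var="BYTEIT_API_KEY"):
    """Return the API key from the environment or raise a descriptive error."""
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before creating the client")
    return key
```

The client would then be created with ByteITClient(api_key=load_api_key()).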
Context manager
with ByteITClient(api_key="your_key") as client:
    result = client.parse("doc.pdf")
Supported File Types
| Documents | Images |
|---|---|
| PDF .pdf | PNG .png |
| Word .docx | JPEG .jpg .jpeg |
| PowerPoint .pptx | TIFF .tiff |
| HTML .html | BMP .bmp |
| Markdown .md | |
| Plain text .txt | |
| JSON .json | |
| XML .xml | |
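A local check against the extensions in the table above can catch unsupported files before an upload is attempted. This is a sketch only: is_supported is a hypothetical helper, the case-insensitive match is an assumption, and the service itself remains the authority on what it accepts.

```python
from pathlib import Path

# Extensions copied from the supported file types table.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".html", ".md", ".txt", ".json", ".xml",
    ".png", ".jpg", ".jpeg", ".tiff", ".bmp",
}


def is_supported(path):
    """Return True if the file extension appears in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

Only the final suffix is examined, so a name such as archive.tar.gz is checked as .gz.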
Error Handling
All exceptions inherit from ByteITError.
from byteit.exceptions import (
    AuthenticationError,
    ValidationError,
    RateLimitError,
    JobProcessingError,
    ByteITError,
)
try:
    result = client.parse("document.pdf")
except AuthenticationError:
    print("Invalid API key")
except ValidationError as e:
    print("Bad request:", e.message)
except RateLimitError:
    print("Rate limit hit — retry later")
except JobProcessingError as e:
    print("Processing failed:", e.message)
except ByteITError as e:
    print("Unexpected error:", e.message)
| Exception | When raised |
|---|---|
| AuthenticationError | Invalid or missing API key |
| APIKeyError | API key rejected (403) |
| ValidationError | Bad request parameters |
| ResourceNotFoundError | Job not found |
| RateLimitError | Rate limit exceeded |
| JobProcessingError | Job failed during processing |
| ServerError | Server-side error (5xx) |
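Rate limit errors are usually transient, so one common response is to retry with exponential backoff. The sketch below is generic and makes no claim about the SDK: call_with_retry is a hypothetical helper, the attempt count and delays are arbitrary, and this README does not say whether the client already retries internally. The exception type is passed in, so the block runs without the SDK installed.

```python
import time


def call_with_retry(func, retry_on, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on retry_on exceptions."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
```

With the SDK, this might be used as call_with_retry(lambda: client.parse("document.pdf"), RateLimitError).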
API Reference
ByteITClient(api_key)
| Method | Description |
|---|---|
| parse(input, ...) | Parse a document, block until complete, return bytes |
| parse_async(input, ...) | Submit a job, return Job immediately |
| get_job_status(job_id) | Get current Job status |
| get_job_result(job_id) | Download result as bytes |
| get_jobs() | List all jobs as list[Job] |
parse(input, output=None, processing_options=None, result_format="md") → bytes
| Param | Type | Description |
|---|---|---|
| input | str \| Path \| InputConnector | File to parse |
| output | str \| Path \| None | Save result to disk (optional) |
| processing_options | ProcessingOptions \| dict \| None | Languages, page range, etc. |
| result_format | str | "md", "txt", "json", "html" |
parse_async(input, processing_options=None, result_format="md") → Job
Same parameters as parse, minus output. Returns a Job without waiting.
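Since parse_async returns immediately, several files can be submitted up front and collected later. The following is a sketch of such a batch workflow under stated assumptions: submit_batch and collect_finished are hypothetical helpers, not SDK functions, and they use only the methods and Job properties listed in this reference.

```python
def submit_batch(client, paths, result_format="md"):
    """Submit every path with parse_async and map each path to its job id."""
    return {
        str(path): client.parse_async(path, result_format=result_format).id
        for path in paths
    }


def collect_finished(client, job_ids):
    """Check each job once; return (results, still_pending, failed_paths)."""
    results, pending, failed = {}, {}, []
    for path, job_id in job_ids.items():
        status = client.get_job_status(job_id)
        if status.is_completed:
            results[path] = client.get_job_result(job_id)
        elif status.is_failed:
            failed.append(path)
        else:
            pending[path] = job_id
    return results, pending, failed
```

Calling collect_finished again with the returned pending mapping checks only the jobs that have not finished yet.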
Job properties
| Property | Type | Description |
|---|---|---|
| id | str | Unique job identifier |
| processing_status | str | pending / processing / completed / failed |
| result_format | str | Output format |
| is_completed | bool | True when result is ready |
| is_failed | bool | True if job failed |
| metadata | DocumentMetadata | Filename, page count, language, etc. |
Notebook Integration
Results are automatically rendered when running in Jupyter:
- md → rendered Markdown
- html → rendered HTML
- json → interactive tree
- txt → code block
To disable auto-display, pass output="file.md".
Resources
- Studio: studio.byteit.ai — Process and test with a graphical user interface.
- Colab notebook: Quick demo
- Pricing: byteit.ai/pricing — 1,000 free credits
- Support: byteit.ai/support
- LinkedIn: ByteIT on LinkedIn
Licensed under Apache 2.0. © 2026 ByteIT GmbH.