AI-powered document intelligence platform - Turn your documents into structured data with a single line of code.
ByteIT Python SDK
ByteIT's Python library for extracting structured data from documents. It is designed for backend services and ETL pipelines that require reliable, consistent document parsing at scale through a simple API.
Installation
Install from PyPI:
pip install byteit
Python 3.8 or newer is required.
Quick Start
from byteit import ByteITClient
client = ByteITClient(api_key="your_api_key")
result = client.parse("document.pdf")
print(result.decode())
The returned value is raw bytes containing the parsed document content.
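Because the result is bytes, downstream code usually decodes it before use. The following is a minimal sketch, assuming the output is UTF-8 encoded (the Quick Start's bare `result.decode()` call relies on Python's UTF-8 default). The helper name `to_text` is hypothetical and not part of the SDK.

```python
def to_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode parsed output, replacing undecodable bytes instead of raising."""
    # errors="replace" substitutes U+FFFD for invalid byte sequences,
    # so one bad byte does not abort a whole pipeline run.
    return raw.decode(encoding, errors="replace")
```

With the client from the Quick Start, this would be used as `text = to_text(client.parse("document.pdf"))`.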
Supported Input File Types
ByteIT supports the following file types as input:
- PDF (.pdf)
- Word (.docx)
- PowerPoint (.pptx)
- HTML (.html)
- Markdown (.md)
- Plain text (.txt)
- JSON (.json)
- XML (.xml)
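A pipeline may want to filter out unsupported files before sending them to the API. The sketch below checks a path against the extensions listed above. The names `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical helpers, not SDK API, and the case-insensitive match is an assumption about how the service treats extensions such as `.PDF`.

```python
from pathlib import Path

# Extensions copied from the list of supported input file types.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".html", ".md", ".txt", ".json", ".xml",
}


def is_supported(path) -> bool:
    """Return True if the file extension is one of the supported input types."""
    # Assumption: extensions are matched case-insensitively.
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```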
Basic Usage
Parse a Local File
result = client.parse("invoice.pdf")
By default, the output format is Markdown (md).
Output Formats
You can choose the output format depending on your pipeline needs:
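# Variable names avoid shadowing the standard-library json and html modules
txt_result = client.parse("doc.pdf", output_format="txt")
json_result = client.parse("doc.pdf", output_format="json")
md_result = client.parse("doc.pdf", output_format="md")
html_result = client.parse("doc.pdf", output_format="html")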
Supported output formats:
- Plain text (txt)
- JSON (json)
- Markdown (md) (default)
- HTML (html)
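When the JSON format is requested, the returned bytes can be loaded into Python objects with the standard library. This is a sketch only: `parse_to_json` is a hypothetical wrapper, `client` is any object with the `parse` method described in this README, and the structure of the JSON payload is not documented here, so no keys are assumed.

```python
import json


def parse_to_json(client, path):
    """Parse a document as JSON and return the decoded Python object."""
    raw = client.parse(path, output_format="json")
    # json.loads accepts bytes directly and detects UTF-8/16/32 encodings.
    return json.loads(raw)
```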
Save Output to File
client.parse(
    "doc.pdf",
    output_format="md",
    output="result.md",
)
When output is provided, the parsed result is written directly to disk.
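The `output` parameter makes batch conversion straightforward. The sketch below walks a directory and saves one result file per input. `convert_directory` and its `pattern` argument are hypothetical, and naming each result `<stem>.<output_format>` is an assumption made for illustration.

```python
from pathlib import Path


def convert_directory(client, src_dir, dst_dir, output_format="md", pattern="*.pdf"):
    """Parse every matching file in src_dir and save the results under dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(src.glob(pattern)):
        target = dst / f"{source.stem}.{output_format}"
        # Passing `output` asks parse() to write the result to disk.
        client.parse(str(source), output_format=output_format, output=str(target))
        written.append(target)
    return written
```

A call such as `convert_directory(client, "inbox", "parsed")` would return the list of paths it asked the client to write.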
Notebook Integration
When used in Jupyter notebooks, ByteIT automatically displays results in a readable format:
- JSON: Interactive, expandable/collapsible tree view
- Markdown: Rendered with formatting (headers, lists, etc.)
- HTML: Rendered as HTML
- Text: Code block with syntax highlighting
# In a Jupyter notebook - automatically displays formatted result
result = client.parse("document.pdf", output_format="json")
To disable auto-display, save to a file instead:
# Saves to file, no auto-display
result = client.parse("doc.pdf", output_format="json", output="output.json")
Typical Use Cases
- Extracting structured data from documents in ETL pipelines
- Preprocessing documents before indexing or downstream processing
- Automating ingestion of invoices, contracts, or reports
- Interactive document exploration in Jupyter notebooks
API Reference
ByteITClient
ByteITClient(api_key: str)
Creates a new ByteIT client.
Parameters
- api_key (str): Your ByteIT API key
parse(...)
parse(
    input,
    output_format: str = "md",
    output=None,
)
Parse a document and return the extracted content.
Parameters
- input (str | Path): Path to a local document
- output_format (str): Output format (txt, json, md, html)
- output (str | Path | None): Optional path to save the result
Returns
bytes: Parsed document content
Error Handling
The SDK exposes specific exceptions for common error cases:
from byteit.exceptions import (
    ByteITError,
    ValidationError,
    AuthenticationError,
    RateLimitError,
    ServerError,
)

try:
    result = client.parse("document.pdf")
except ValidationError as e:
    print("Invalid input:", e.message)
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit exceeded")
except ByteITError as e:
    # Catch-all for any other SDK error, including ServerError
    print("ByteIT error:", e.message)
All exceptions inherit from ByteITError.
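Rate-limit errors are often transient, so a caller may choose to retry them. The following is a generic sketch of retry with exponential backoff. `call_with_retry` is a hypothetical helper, not part of the SDK, and this README does not say whether the SDK already retries internally or how long a client should wait, so the delays are illustrative assumptions.

```python
import time


def call_with_retry(func, retry_on, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on `retry_on` exceptions, wait and retry with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller handle the error
            # Delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

With the imports shown above, usage might look like `call_with_retry(lambda: client.parse("document.pdf"), retry_on=RateLimitError)`. Other exceptions propagate immediately, since retrying an invalid API key or invalid input would not help.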
Configuration
Environment Variable
You can provide the API key via environment variable:
export BYTEIT_API_KEY="your_api_key"
import os
from byteit import ByteITClient
client = ByteITClient(api_key=os.getenv("BYTEIT_API_KEY"))
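Note that `os.getenv` returns `None` when the variable is unset, which would pass `None` as the API key. A small guard can fail earlier with a clearer message. This is a sketch; `load_api_key` is a hypothetical helper, and how the SDK itself reacts to a missing key is not documented here.

```python
import os


def load_api_key(env=os.environ, name="BYTEIT_API_KEY"):
    """Return the API key from the environment, or raise a clear error if unset."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return key
```

The client would then be created with `ByteITClient(api_key=load_api_key())`.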
Requirements
- Python 3.8+
- requests
About ByteIT
ByteIT provides document parsing and data extraction APIs designed for backend systems and automation workflows.
Website: https://byteit.ai
License
This project is licensed under the terms specified in the LICENSE file.
© 2026 ByteIT GmbH
File details
Details for the file byteit-0.1.2.tar.gz.
File metadata
- Download URL: byteit-0.1.2.tar.gz
- Upload date:
- Size: 29.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e4d64da7a8dcdc612a779ffc0824b75de4f657e36dc945721bccdf41bf8b93aa |
| MD5 | 53f54e78c9d8adf95521e99baaa38445 |
| BLAKE2b-256 | 0f80ee5a3c8cc320aaa955bf539046bdd4ff373bd36b55315f1660b7af8b526d |
File details
Details for the file byteit-0.1.2-py3-none-any.whl.
File metadata
- Download URL: byteit-0.1.2-py3-none-any.whl
- Upload date:
- Size: 24.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d1a774cf74a56db06239b75559a0506a0fb75571b53af22efa39f444459b3857 |
| MD5 | ab524b5653e46080913a7768c34542bc |
| BLAKE2b-256 | 9718efebf8ab3100ef7727b67fd769497d3dabfc801155141e1471531fbdda53 |