ByteIT API Library
Transform documents into structured data with a single line of code.
ByteIT is an AI-powered document intelligence platform that extracts clean, structured data from PDFs, Word, Excel, and many other file formats. This Python SDK provides a simple, developer-first interface to ByteIT's advanced document processing capabilities.
Why ByteIT?
- Lightning Fast - Process documents in under 2 seconds
- AI-Powered - Advanced ML models trained on millions of documents
- Simple API - Parse documents in one line: client.parse("document.pdf")
- Developer First - Clean code, full type hints, comprehensive SDKs
- Enterprise Security - End-to-end encryption and GDPR compliance
- Smart Extraction - Extract text, tables, forms, and structured data with AI precision
Quick Start
Installation
```shell
pip install byteit
```
Basic Usage
```python
from byteit import ByteITClient

# Initialize client
client = ByteITClient(api_key="your_api_key")

# Parse a document
result = client.parse("invoice.pdf")
print(result.decode())
```
That's it. Your document is now structured text.
Features
Parse Any Document
```python
# Local files
result = client.parse("contract.pdf")

# Different formats
txt_result = client.parse("doc.pdf", output_format="txt")
json_result = client.parse("doc.pdf", output_format="json")
md_result = client.parse("doc.pdf", output_format="md")
html_result = client.parse("doc.pdf", output_format="html")

# Save to file
client.parse("doc.pdf", output="result.txt")
```
S3 Integration
Process files directly from S3 without downloading - perfect for high-volume workflows:
```python
from byteit.connectors import S3InputConnector

# Parse from S3
result = client.parse(
    S3InputConnector(
        source_bucket="my-documents",
        source_path_inside_bucket="invoices/jan-2024.pdf"
    )
)
```
Job Management
Track and retrieve processing jobs:
```python
# List all jobs
jobs = client.get_all_jobs()
for job in jobs:
    print(f"{job.id}: {job.processing_status}")

# Get specific job
job = client.get_job_by_id("job_123")

# Download result later
if job.is_completed:
    result = client.get_result(job.id)
```
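The job API above lends itself to a simple poll-until-complete loop. The sketch below is illustrative only: `FakeJob` and `FakeClient` are stand-ins invented here (not SDK classes) so the pattern runs on its own; with the real SDK you would pass a `ByteITClient` instead.

```python
import time

class FakeJob:
    """Stand-in for the SDK's Job object; only the fields used below."""
    def __init__(self, job_id, is_completed):
        self.id = job_id
        self.is_completed = is_completed

class FakeClient:
    """Illustrative client: the job completes on the third status check."""
    def __init__(self):
        self._checks = 0

    def get_job_by_id(self, job_id):
        self._checks += 1
        return FakeJob(job_id, is_completed=self._checks >= 3)

    def get_result(self, job_id):
        return b"parsed content"

def wait_for_result(client, job_id, interval=0.01, timeout=5.0):
    """Poll a job until it completes, then download its result."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = client.get_job_by_id(job_id)
        if job.is_completed:
            return client.get_result(job.id)
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not complete within {timeout}s")

result = wait_for_result(FakeClient(), "job_123")
print(result)  # b'parsed content'
```

A production version would also check for a failed status and raise `JobProcessingError` instead of polling forever.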
Context Manager
Automatic resource cleanup:
```python
with ByteITClient(api_key="your_key") as client:
    result = client.parse("document.pdf")
# Session automatically closed
```
API Reference
ByteITClient
ByteITClient(api_key: str)
Initialize the ByteIT client.
Parameters:
- api_key (str): Your ByteIT API key
Methods:
parse(input, output_format="txt", output=None)
Parse a document and return the result.
Parameters:
- input (str | Path | InputConnector): File to parse
  - str or Path: local file path
  - S3InputConnector: for S3 files
- output_format (str): Output format - "txt", "json", "md", or "html" (default: "txt")
- output (str | Path | None): Optional file path to save the result
Returns: bytes - Parsed content
Example:
```python
result = client.parse("doc.pdf", output_format="json")
```
get_all_jobs()
Get all jobs for your account.
Returns: List[Job] - List of Job objects
get_job_by_id(job_id: str)
Get a specific job by ID.
Parameters:
- job_id (str): The job ID
Returns: Job - Job object
get_result(job_id: str)
Download result for a completed job.
Parameters:
- job_id (str): The job ID
Returns: bytes - Result content
Connectors
LocalFileInputConnector
Read files from local filesystem.
```python
from byteit.connectors import LocalFileInputConnector

connector = LocalFileInputConnector("path/to/file.pdf")
result = client.parse(connector)
```
S3InputConnector
Read files from Amazon S3 using IAM role authentication - files never pass through your machine.
Prerequisites:
- Contact ByteIT support to set up AWS connection
- Provide IAM role ARN for ByteIT to assume
- Grant role read access to your bucket
```python
from byteit.connectors import S3InputConnector

connector = S3InputConnector(
    source_bucket="my-bucket",
    source_path_inside_bucket="documents/file.pdf"
)
result = client.parse(connector)
```
Error Handling
ByteIT SDK provides specific exceptions for different error scenarios:
```python
from byteit.exceptions import (
    APIKeyError,           # Invalid API key
    AuthenticationError,   # Authentication failed
    ValidationError,       # Invalid parameters
    ResourceNotFoundError, # Job/resource not found
    RateLimitError,        # Rate limit exceeded
    ServerError,           # Server-side error (5xx)
    JobProcessingError,    # Job processing failed
)
```
```python
try:
    result = client.parse("document.pdf")
except ValidationError as e:
    print(f"Invalid input: {e.message}")
except RateLimitError:
    print("Rate limit exceeded - please wait")
except JobProcessingError as e:
    print(f"Processing failed: {e.message}")
```
All exceptions inherit from ByteITError:
```python
from byteit.exceptions import ByteITError

try:
    result = client.parse("document.pdf")
except ByteITError as e:
    print(f"ByteIT error: {e.message}")
    print(f"Status code: {e.status_code}")
    print(f"Response: {e.response}")
```
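A common way to handle rate limiting is to retry with exponential backoff. The sketch below is a generic pattern, not part of the SDK: `RateLimited` and `flaky_parse` are local stand-ins (the real code would catch `byteit.exceptions.RateLimitError` around `client.parse`).

```python
import time

class RateLimited(Exception):
    """Stand-in for byteit.exceptions.RateLimitError."""

def parse_with_backoff(parse_fn, *args, retries=4, base_delay=0.01):
    """Retry a parse call with exponential backoff on rate limiting."""
    for attempt in range(retries):
        try:
            return parse_fn(*args)
        except RateLimited:
            if attempt == retries - 1:
                raise  # out of retries: let the caller handle it
            time.sleep(base_delay * (2 ** attempt))  # 0.01, 0.02, 0.04, ...

# Illustrative flaky function: fails twice, then succeeds.
calls = {"n": 0}
def flaky_parse(path):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return b"ok"

print(parse_with_backoff(flaky_parse, "document.pdf"))  # b'ok'
```

In real use, `base_delay` should be on the order of seconds, and a `Retry-After` hint from the server (if the SDK exposes one) should take precedence over the computed delay.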
Advanced Usage
Batch Processing
Process multiple files efficiently:
```python
files = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
results = []
for file in files:
    result = client.parse(file, output_format="json")
    results.append(result)
```
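If the client can be shared across threads (the SDK docs above don't say either way, so treat this as an assumption to verify), the sequential loop can be parallelized with a thread pool. `parse_stub` below is a hypothetical stand-in for `client.parse` so the sketch runs on its own:

```python
from concurrent.futures import ThreadPoolExecutor

def parse_stub(path):
    """Stand-in for client.parse; returns fake parsed bytes."""
    return f"parsed:{path}".encode()

files = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]

# pool.map preserves input order, so results line up with files.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(parse_stub, files))

print(results)
```

If the client is not thread-safe, create one `ByteITClient` per worker instead of sharing a single instance.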
Custom Output Paths
Organize results systematically:
```python
from pathlib import Path

input_dir = Path("inputs")
output_dir = Path("outputs")
output_dir.mkdir(exist_ok=True)

for pdf_file in input_dir.glob("*.pdf"):
    output_file = output_dir / f"{pdf_file.stem}.txt"
    client.parse(pdf_file, output=output_file)
```
S3 Workflow
High-volume cloud processing:
```python
from byteit.connectors import S3InputConnector

# Process multiple S3 files
s3_files = [
    "invoices/2024-01.pdf",
    "invoices/2024-02.pdf",
    "invoices/2024-03.pdf",
]
for s3_path in s3_files:
    connector = S3InputConnector(
        source_bucket="my-documents",
        source_path_inside_bucket=s3_path
    )
    result = client.parse(connector, output_format="json")
    # Process result...
```
Configuration
Environment Variables
Set your API key via environment variable:
```shell
export BYTEIT_API_KEY="your_api_key_here"
```

```python
import os
from byteit import ByteITClient

client = ByteITClient(api_key=os.getenv("BYTEIT_API_KEY"))
```
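`os.getenv` returns None when the variable is unset, which would surface later as a confusing authentication error. A small guard (a generic pattern, not an SDK helper; `api_key_from_env` is a name invented here) fails fast instead:

```python
import os

def api_key_from_env(var="BYTEIT_API_KEY"):
    """Read the API key from the environment, failing fast if missing."""
    key = os.getenv(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running")
    return key

os.environ["BYTEIT_API_KEY"] = "demo_key"  # for illustration only
print(api_key_from_env())  # demo_key
```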
Custom Base URL
For testing or custom deployments:
```python
from byteit import ByteITClient

# Set custom URL (for development/testing)
ByteITClient.BASE_URL = "http://localhost:8000"
client = ByteITClient(api_key="test_key")
```
Testing
The SDK includes comprehensive unit and integration tests.
Run Unit Tests
```shell
pytest
```
Run Integration Tests
Integration tests require a running ByteIT API and valid API key:
```shell
export BYTEIT_API_KEY="your_api_key"
pytest -m integration
```
Run All Tests
```shell
pytest -m ""
```
Requirements
- Python 3.8+
- requests library
About ByteIT
ByteIT transforms unstructured documents into clean, structured data with AI-powered precision. Built for scale, designed for developers.
Get started today: Start Processing Free - 1,000 free pages/month
Support & Resources
- Website: https://byteit.ai
- Pricing: https://byteit.ai/pricing
- Support: https://byteit.ai/support
- Contact: https://byteit.ai/contact
- LinkedIn: ByteIT on LinkedIn
Legal
© 2026 ByteIT GmbH. All rights reserved.
- Privacy Policy: https://byteit.ai/privacy-policy
- Terms of Service: https://byteit.ai/terms
- Impressum: https://byteit.ai/impressum
This project is licensed under the terms specified in the LICENSE file.
Download files
File details
Details for the file byteit-0.1.0.tar.gz.
File metadata
- Download URL: byteit-0.1.0.tar.gz
- Upload date:
- Size: 27.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c20f3ee3899022099bf726efe3ab4ac84afd538b1e87d2d416293af3f3a11914 |
| MD5 | a0fe5cd186d74e545020572375a12ef9 |
| BLAKE2b-256 | 96b24d97542dc96ff0cb684f2e9e9379087c082646adf7756097b94adea68b55 |
File details
Details for the file byteit-0.1.0-py3-none-any.whl.
File metadata
- Download URL: byteit-0.1.0-py3-none-any.whl
- Upload date:
- Size: 23.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 98f825eee89a4f7ac664e9657527f072ca4d8f572b7e444999e13eaa40965745 |
| MD5 | cdb98346ee2c9413b9a60bb316c798f9 |
| BLAKE2b-256 | 7ee517f984d7fe2db2ea183f8fb3d87332e5f3cd323a0dd5bfb5ce332becd245 |