AI-powered document intelligence platform - Turn your documents into structured data with a single line of code.

ByteIT API Library

Turn your data into AI - Transform documents into structured data with a single line of code.

ByteIT is an AI-powered document intelligence platform that extracts clean, structured data from PDFs, Word, Excel, and many other file formats. This Python SDK provides a simple, developer-first interface to ByteIT's advanced document processing capabilities.


Why ByteIT?

  • Lightning Fast - Process documents in under 2 seconds
  • AI-Powered - Advanced ML models trained on millions of documents
  • Simple API - Parse documents in one line: client.parse("document.pdf")
  • Developer First - Clean code, full type hints, comprehensive SDKs
  • Enterprise Security - End-to-end encryption and GDPR compliance
  • Smart Extraction - Extract text, tables, forms, and structured data with AI precision

Quick Start

Installation

pip install byteit

Basic Usage

from byteit import ByteITClient

# Initialize client
client = ByteITClient(api_key="your_api_key")

# Parse a document
result = client.parse("invoice.pdf")
print(result.decode())

That's it. Your document is now structured text.


Features

Parse Any Document

# Local files
result = client.parse("contract.pdf")

# Different formats
txt_result = client.parse("doc.pdf", output_format="txt")
json_result = client.parse("doc.pdf", output_format="json")
md_result = client.parse("doc.pdf", output_format="md")
html_result = client.parse("doc.pdf", output_format="html")

# Save to file
client.parse("doc.pdf", output="result.txt")
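Since parse returns bytes (see the API Reference), JSON output has to be decoded before it can be used as Python data. A minimal sketch, assuming the payload is UTF-8 encoded JSON; the helper name load_json_result and the sample payload are illustrative and not part of the SDK, and the actual result schema is not specified here:

```python
import json
from typing import Any


def load_json_result(raw: bytes) -> Any:
    """Decode bytes such as those returned by client.parse(..., output_format="json").

    Assumes UTF-8 encoded JSON. The result schema is not documented here,
    so the return type is deliberately left open.
    """
    return json.loads(raw.decode("utf-8"))


# Stand-in payload for illustration only; real output will differ.
sample = b'{"pages": 1, "text": "Invoice 42"}'
data = load_json_result(sample)
print(data["text"])
```

With the SDK installed, the same helper would be applied to the return value of client.parse("doc.pdf", output_format="json").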

S3 Integration

Process files directly from S3 without downloading - perfect for high-volume workflows:

from byteit.connectors import S3InputConnector

# Parse from S3
result = client.parse(
    S3InputConnector(
        source_bucket="my-documents",
        source_path_inside_bucket="invoices/jan-2024.pdf"
    )
)

Job Management

Track and retrieve processing jobs:

# List all jobs
jobs = client.get_all_jobs()
for job in jobs:
    print(f"{job.id}: {job.processing_status}")

# Get specific job
job = client.get_job_by_id("job_123")

# Download result later
if job.is_completed:
    result = client.get_result(job.id)
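If a job is still running, the check above can be repeated until it completes. The following is a sketch of such a polling loop, not an SDK feature: wait_for_result is a hypothetical helper that relies only on get_job_by_id, is_completed, and get_result as shown above. How a failed job is reported on the Job object is not documented here, so this sketch only guards against waiting forever with a timeout.

```python
import time
from typing import Any, Callable


def wait_for_result(
    client: Any,
    job_id: str,
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Poll a job until it completes, then download its result.

    `client` is expected to offer get_job_by_id() and get_result();
    the interval and timeout defaults are arbitrary illustration values.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = client.get_job_by_id(job_id)
        if job.is_completed:
            return client.get_result(job_id)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} did not complete within {timeout}s")
        sleep(poll_interval)
```

Usage would look like result = wait_for_result(client, "job_123"). The sleep parameter exists so the loop can be tested without real delays.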

Context Manager

Automatic resource cleanup:

with ByteITClient(api_key="your_key") as client:
    result = client.parse("document.pdf")
# The session is closed automatically when the block exits

API Reference

ByteITClient

ByteITClient(api_key: str)

Initialize the ByteIT client.

Parameters:

  • api_key (str): Your ByteIT API key

Methods:

parse(input, output_format="txt", output=None)

Parse a document and return the result.

Parameters:

  • input (str | Path | InputConnector): File to parse
    • str or Path: Local file path
    • S3InputConnector: For S3 files
  • output_format (str): Output format - "txt", "json", "md", or "html" (default: "txt")
  • output (str | Path | None): Optional file path to save result

Returns: bytes - Parsed content

Example:

result = client.parse("doc.pdf", output_format="json")

get_all_jobs()

Get all jobs for your account.

Returns: List[Job] - List of Job objects

get_job_by_id(job_id: str)

Get a specific job by ID.

Parameters:

  • job_id (str): The job ID

Returns: Job - Job object

get_result(job_id: str)

Download result for a completed job.

Parameters:

  • job_id (str): The job ID

Returns: bytes - Result content


Connectors

LocalFileInputConnector

Read files from local filesystem.

from byteit.connectors import LocalFileInputConnector

connector = LocalFileInputConnector("path/to/file.pdf")
result = client.parse(connector)

S3InputConnector

Read files from Amazon S3 using IAM role authentication - files never pass through your machine.

Prerequisites:

  1. Contact ByteIT support to set up AWS connection
  2. Provide IAM role ARN for ByteIT to assume
  3. Grant role read access to your bucket

from byteit.connectors import S3InputConnector

connector = S3InputConnector(
    source_bucket="my-bucket",
    source_path_inside_bucket="documents/file.pdf"
)
result = client.parse(connector)
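Object locations are often stored as s3:// URIs rather than as separate bucket and path values. The sketch below splits such a URI into the two keyword arguments used above. The helper s3_uri_to_connector_kwargs is hypothetical and not part of the SDK; it only assumes the source_bucket and source_path_inside_bucket parameter names shown in this section.

```python
from typing import Dict


def s3_uri_to_connector_kwargs(uri: str) -> Dict[str, str]:
    """Split 's3://bucket/key' into keyword arguments for S3InputConnector."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"Expected an s3:// URI, got {uri!r}")
    # Everything up to the first slash is the bucket; the rest is the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"Expected s3://bucket/key, got {uri!r}")
    return {"source_bucket": bucket, "source_path_inside_bucket": key}


kwargs = s3_uri_to_connector_kwargs("s3://my-bucket/documents/file.pdf")
print(kwargs)
```

With the SDK installed, the connector would then be built as S3InputConnector(**kwargs).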

Error Handling

ByteIT SDK provides specific exceptions for different error scenarios:

from byteit.exceptions import (
    APIKeyError,           # Invalid API key
    AuthenticationError,   # Authentication failed
    ValidationError,       # Invalid parameters
    ResourceNotFoundError, # Job/resource not found
    RateLimitError,        # Rate limit exceeded
    ServerError,           # Server-side error (5xx)
    JobProcessingError,    # Job processing failed
)

try:
    result = client.parse("document.pdf")
except ValidationError as e:
    print(f"Invalid input: {e.message}")
except RateLimitError:
    print("Rate limit exceeded - please wait")
except JobProcessingError as e:
    print(f"Processing failed: {e.message}")

All exceptions inherit from ByteITError:

from byteit.exceptions import ByteITError

try:
    result = client.parse("document.pdf")
except ByteITError as e:
    print(f"ByteIT error: {e.message}")
    print(f"Status code: {e.status_code}")
    print(f"Response: {e.response}")
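Transient failures such as RateLimitError or ServerError are usually worth retrying after a pause. The following is a generic sketch of exponential backoff, not something the SDK is documented to provide: call_with_retry is a hypothetical helper, and the attempt count and delays are arbitrary illustration values. The exception types to retry on are passed in by the caller, so the block itself does not depend on byteit.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying with exponential backoff on the given exceptions."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

With the SDK installed, a call might look like call_with_retry(lambda: client.parse("document.pdf"), retry_on=(RateLimitError, ServerError)). Errors such as ValidationError are left out on purpose, since retrying an invalid request would not help.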

Advanced Usage

Batch Processing

Process multiple files in a loop, reusing one client:

files = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
results = []

for file in files:
    result = client.parse(file, output_format="json")
    results.append(result)
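In the loop above, a single failing file stops the whole batch. A minimal sketch of a variant that records failures per file and keeps going; parse_batch is a hypothetical helper, and it catches Exception only so the block stays independent of the SDK (with byteit installed, ByteITError would be the narrower choice):

```python
from typing import Any, Dict, Iterable, Tuple


def parse_batch(
    client: Any,
    files: Iterable[str],
    output_format: str = "json",
) -> Tuple[Dict[str, bytes], Dict[str, Exception]]:
    """Parse each file, collecting results and errors keyed by file path."""
    results: Dict[str, bytes] = {}
    errors: Dict[str, Exception] = {}
    for path in files:
        try:
            results[path] = client.parse(path, output_format=output_format)
        except Exception as exc:  # narrow to ByteITError when the SDK is available
            errors[path] = exc
    return results, errors
```

Calling results, errors = parse_batch(client, ["doc1.pdf", "doc2.pdf"]) leaves the failed paths in errors, ready to be logged or retried.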

Custom Output Paths

Organize results systematically:

from pathlib import Path

input_dir = Path("inputs")
output_dir = Path("outputs")
output_dir.mkdir(exist_ok=True)

for pdf_file in input_dir.glob("*.pdf"):
    output_file = output_dir / f"{pdf_file.stem}.txt"
    client.parse(pdf_file, output=output_file)
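When several output formats are in use, the output file extension can be derived from the format instead of being hard-coded. A small sketch, assuming the extension should simply equal the format name ("txt", "json", "md", "html"); output_path_for is a hypothetical helper, not an SDK function:

```python
from pathlib import Path

# Formats listed in the parse() reference.
SUPPORTED_FORMATS = ("txt", "json", "md", "html")


def output_path_for(input_file: Path, output_dir: Path, output_format: str = "txt") -> Path:
    """Return output_dir/<input stem>.<output_format> for a supported format."""
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported output format: {output_format!r}")
    return output_dir / f"{input_file.stem}.{output_format}"


print(output_path_for(Path("inputs/report.pdf"), Path("outputs"), "json"))
```

The returned path can be passed as output= together with the matching output_format= in the loop above.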

S3 Workflow

High-volume cloud processing:

from byteit.connectors import S3InputConnector

# Process multiple S3 files
s3_files = [
    "invoices/2024-01.pdf",
    "invoices/2024-02.pdf",
    "invoices/2024-03.pdf",
]

for s3_path in s3_files:
    connector = S3InputConnector(
        source_bucket="my-documents",
        source_path_inside_bucket=s3_path
    )
    result = client.parse(connector, output_format="json")
    # Process result...

Configuration

Environment Variables

Set your API key via environment variable:

export BYTEIT_API_KEY="your_api_key_here"

Then read it in Python:

import os
from byteit import ByteITClient

client = ByteITClient(api_key=os.getenv("BYTEIT_API_KEY"))
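Note that os.getenv returns None when the variable is unset, so a missing key would only surface later, when the client is created or used. A sketch of a stricter lookup that fails early with a clear message; load_api_key is a hypothetical helper, not part of the SDK:

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return BYTEIT_API_KEY from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get("BYTEIT_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "BYTEIT_API_KEY is not set; export it before creating the client"
        )
    return key
```

The client would then be created with ByteITClient(api_key=load_api_key()). The optional env argument only exists to make the helper easy to test.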

Custom Base URL

For testing or custom deployments:

from byteit import ByteITClient

# Set custom URL (for development/testing)
ByteITClient.BASE_URL = "http://localhost:8000"
client = ByteITClient(api_key="test_key")
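Because BASE_URL is set on the class, the assignment above stays in effect for the rest of the process. In tests it can be useful to restore the original value afterwards. A minimal sketch using a context manager; temporary_base_url is a hypothetical helper, and whether a client created inside the block keeps using the overridden URL afterwards depends on when the SDK reads BASE_URL, which is not specified here:

```python
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_base_url(client_cls: type, url: str) -> Iterator[None]:
    """Temporarily override client_cls.BASE_URL and restore it on exit."""
    original = client_cls.BASE_URL
    client_cls.BASE_URL = url
    try:
        yield
    finally:
        # Restore even if the body raised.
        client_cls.BASE_URL = original
```

With the SDK installed, usage would be: with temporary_base_url(ByteITClient, "http://localhost:8000"): followed by the client calls under test.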

Testing

The SDK includes comprehensive unit and integration tests.

Run Unit Tests

pytest

Run Integration Tests

Integration tests require a running ByteIT API and valid API key:

export BYTEIT_API_KEY="your_api_key"
pytest -m integration

Run All Tests

pytest -m ""

Requirements

  • Python 3.8+
  • requests library

About ByteIT

ByteIT transforms unstructured documents into clean, structured data with AI-powered precision. Built for scale, designed for developers.

Get started today: Start Processing Free - 1,000 free pages/month


Legal

© 2026 ByteIT GmbH. All rights reserved.

This project is licensed under the terms specified in the LICENSE file.
