AI-powered document intelligence platform - Turn your data into structured data with a single line of code.

ByteIT API Library

Make your documents AI-ready: transform them into structured data with a single line of code.

ByteIT is an AI-powered document intelligence platform that extracts clean, structured data from PDFs, Word, Excel, and many other file formats. This Python SDK provides a simple, developer-first interface to ByteIT's advanced document processing capabilities.


Why ByteIT?

  • Lightning Fast - Process documents in under 2 seconds
  • AI-Powered - Advanced ML models trained on millions of documents
  • Simple API - Parse documents in one line: client.parse("document.pdf")
  • Developer First - Clean code, full type hints, comprehensive SDKs
  • Enterprise Security - End-to-end encryption and GDPR compliance
  • Smart Extraction - Extract text, tables, forms, and structured data with AI precision

Quick Start

Installation

pip install byteit

Basic Usage

from byteit import ByteITClient

# Initialize client
client = ByteITClient(api_key="your_api_key")

# Parse a document
result = client.parse("invoice.pdf")
print(result.decode())

That's it. Your document is now structured text.


Features

Parse Any Document

# Local files
result = client.parse("contract.pdf")

# Different formats
txt_result = client.parse("doc.pdf", output_format="txt")
json_result = client.parse("doc.pdf", output_format="json")
md_result = client.parse("doc.pdf", output_format="md")
html_result = client.parse("doc.pdf", output_format="html")

# Save to file
client.parse("doc.pdf", output="result.txt")

S3 Integration

Process files directly from S3 without downloading - perfect for high-volume workflows:

from byteit.connectors import S3InputConnector

# Parse from S3
result = client.parse(
    S3InputConnector(
        source_bucket="my-documents",
        source_path_inside_bucket="invoices/jan-2024.pdf"
    )
)

Job Management

Track and retrieve processing jobs:

# List all jobs
jobs = client.get_all_jobs()
for job in jobs:
    print(f"{job.id}: {job.processing_status}")

# Get specific job
job = client.get_job_by_id("job_123")

# Download result later
if job.is_completed:
    result = client.get_result(job.id)
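
If a job is still running when you look it up, you can poll until it completes. The helper below is a minimal sketch rather than part of the SDK: `wait_for_result`, its `poll_interval` and `timeout` defaults, and the injectable `sleep`/`clock` arguments are illustrative assumptions. It relies only on the `get_job_by_id()`, `job.is_completed`, and `get_result()` calls shown above.

```python
import time


def wait_for_result(client, job_id, poll_interval=2.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll a job until it completes, then return its result bytes.

    Hypothetical helper: the interval and timeout defaults are arbitrary,
    not values recommended by ByteIT.
    """
    deadline = clock() + timeout
    while True:
        job = client.get_job_by_id(job_id)
        if job.is_completed:
            return client.get_result(job_id)
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} not completed after {timeout}s")
        sleep(poll_interval)
```

Usage would look like `result = wait_for_result(client, "job_123")`. Note that this sketch does not detect failed jobs; a job that never completes simply runs into the timeout, so you may also want to inspect `job.processing_status` inside the loop.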

Context Manager

Automatic resource cleanup:

with ByteITClient(api_key="your_key") as client:
    result = client.parse("document.pdf")
    # The session is closed automatically when the with block exits

API Reference

ByteITClient

ByteITClient(api_key: str)

Initialize the ByteIT client.

Parameters:

  • api_key (str): Your ByteIT API key

Methods:

parse(input, output_format="txt", output=None)

Parse a document and return the result.

Parameters:

  • input (str | Path | InputConnector): File to parse
    • str or Path: Local file path
    • S3InputConnector: For S3 files
  • output_format (str): Output format - "txt", "json", "md", or "html" (default: "txt")
  • output (str | Path | None): Optional file path to save result

Returns: bytes - Parsed content

Example:

result = client.parse("doc.pdf", output_format="json")
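
Because `parse()` returns `bytes`, JSON output has to be decoded before you can work with it as Python objects. A minimal sketch, assuming the `"json"` output is a UTF-8 encoded JSON document (the encoding is not stated here); `parse_json` is a hypothetical helper name, not an SDK method:

```python
import json


def parse_json(client, path):
    """Parse a document with output_format="json" and decode the bytes.

    Assumption: the returned bytes are UTF-8 encoded JSON.
    """
    raw = client.parse(path, output_format="json")
    return json.loads(raw.decode("utf-8"))
```

The shape of the decoded data (keys, nesting) depends on ByteIT's output schema, which is not described here, so inspect the result before relying on specific fields.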

get_all_jobs()

Get all jobs for your account.

Returns: List[Job] - List of Job objects

get_job_by_id(job_id: str)

Get a specific job by ID.

Parameters:

  • job_id (str): The job ID

Returns: Job - Job object

get_result(job_id: str)

Download result for a completed job.

Parameters:

  • job_id (str): The job ID

Returns: bytes - Result content


Connectors

LocalFileInputConnector

Read files from local filesystem.

from byteit.connectors import LocalFileInputConnector

connector = LocalFileInputConnector("path/to/file.pdf")
result = client.parse(connector)

S3InputConnector

Read files from Amazon S3 using IAM role authentication - files never pass through your machine.

Prerequisites:

  1. Contact ByteIT support to set up AWS connection
  2. Provide IAM role ARN for ByteIT to assume
  3. Grant role read access to your bucket

from byteit.connectors import S3InputConnector

connector = S3InputConnector(
    source_bucket="my-bucket",
    source_path_inside_bucket="documents/file.pdf"
)
result = client.parse(connector)

Error Handling

ByteIT SDK provides specific exceptions for different error scenarios:

from byteit.exceptions import (
    APIKeyError,           # Invalid API key
    AuthenticationError,   # Authentication failed
    ValidationError,       # Invalid parameters
    ResourceNotFoundError, # Job/resource not found
    RateLimitError,        # Rate limit exceeded
    ServerError,           # Server-side error (5xx)
    JobProcessingError,    # Job processing failed
)

try:
    result = client.parse("document.pdf")
except ValidationError as e:
    print(f"Invalid input: {e.message}")
except RateLimitError:
    print("Rate limit exceeded - please wait")
except JobProcessingError as e:
    print(f"Processing failed: {e.message}")

All exceptions inherit from ByteITError:

from byteit.exceptions import ByteITError

try:
    result = client.parse("document.pdf")
except ByteITError as e:
    print(f"ByteIT error: {e.message}")
    print(f"Status code: {e.status_code}")
    print(f"Response: {e.response}")
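
Transient failures such as `RateLimitError` or `ServerError` are often worth retrying. The following is a sketch of retry with exponential backoff, not something the SDK is documented to provide: `call_with_retry`, `attempts`, and `base_delay` are hypothetical names and values chosen for illustration.

```python
import time


def call_with_retry(func, retry_on, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); if it raises one of retry_on, wait and try again.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The exception from the final attempt is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the exceptions imported above, a call might look like `call_with_retry(lambda: client.parse("document.pdf"), retry_on=(RateLimitError, ServerError))`. Errors such as `ValidationError` are deliberately left out of `retry_on`, since repeating an invalid request is unlikely to help.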

Advanced Usage

Batch Processing

Process multiple files with a simple loop:

files = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
results = []

for file in files:
    result = client.parse(file, output_format="json")
    results.append(result)
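
In the loop above, one failing document stops the whole batch. A minimal sketch of a variant that records failures and keeps going; `parse_many` is a hypothetical helper, and it takes the parse call as a plain function so it does not depend on any SDK internals:

```python
def parse_many(parse, paths, error_type=Exception):
    """Parse each path, continuing when an individual document fails.

    Returns (results, failures): results maps path -> parsed bytes,
    failures maps path -> the exception raised for that path.
    """
    results, failures = {}, {}
    for path in paths:
        try:
            results[path] = parse(path)
        except error_type as exc:
            failures[path] = exc
    return results, failures
```

For example, `results, failures = parse_many(lambda p: client.parse(p, output_format="json"), files, error_type=ByteITError)` would collect SDK errors per file while letting unrelated exceptions propagate.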

Custom Output Paths

Organize results systematically:

from pathlib import Path

input_dir = Path("inputs")
output_dir = Path("outputs")
output_dir.mkdir(exist_ok=True)

for pdf_file in input_dir.glob("*.pdf"):
    output_file = output_dir / f"{pdf_file.stem}.txt"
    client.parse(pdf_file, output=output_file)

S3 Workflow

High-volume cloud processing:

from byteit.connectors import S3InputConnector

# Process multiple S3 files
s3_files = [
    "invoices/2024-01.pdf",
    "invoices/2024-02.pdf",
    "invoices/2024-03.pdf",
]

for s3_path in s3_files:
    connector = S3InputConnector(
        source_bucket="my-documents",
        source_path_inside_bucket=s3_path
    )
    result = client.parse(connector, output_format="json")
    # Process result...

Configuration

Environment Variables

Set your API key via environment variable:

export BYTEIT_API_KEY="your_api_key_here"

import os
from byteit import ByteITClient

client = ByteITClient(api_key=os.getenv("BYTEIT_API_KEY"))
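
`os.getenv` returns `None` when the variable is unset, and how `ByteITClient` reacts to a missing key is not described here. A small sketch that fails early with a clear message instead; `load_api_key` is a hypothetical helper, not part of the SDK:

```python
import os


def load_api_key(env=os.environ, name="BYTEIT_API_KEY"):
    """Return the API key from the environment, or raise if it is missing.

    Surrounding whitespace is stripped; an empty value counts as missing.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return key
```

The client would then be created with `ByteITClient(api_key=load_api_key())`.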

Custom Base URL

For testing or custom deployments:

from byteit import ByteITClient

# Set custom URL (for development/testing)
ByteITClient.BASE_URL = "http://localhost:8000"
client = ByteITClient(api_key="test_key")

Testing

The SDK includes comprehensive unit and integration tests.

Run Unit Tests

pytest

Run Integration Tests

Integration tests require a running ByteIT API and a valid API key:

export BYTEIT_API_KEY="your_api_key"
pytest -m integration

Run All Tests

pytest -m ""

Requirements

  • Python 3.8+
  • requests library

About ByteIT

ByteIT transforms unstructured documents into clean, structured data with AI-powered precision. Built for scale, designed for developers.

Get started today: start processing for free, with 1,000 free pages/month.


Legal

© 2026 ByteIT GmbH. All rights reserved.

This project is licensed under the terms specified in the LICENSE file.
