AI-powered document intelligence platform - Turn your documents into structured data with a single line of code.

ByteIT Python SDK

ByteIT is a Python client for document parsing and structured extraction. Use it to submit files, retrieve parsed content, and extract schema-based data from completed parse jobs.

Installation

pip install byteit

Requires Python 3.8+ and a ByteIT API key.

Quick Start

from byteit import ByteITClient

client = ByteITClient(api_key="your_api_key")
result = client.parse("document.pdf")
print(result.decode("utf-8"))

parse() returns raw bytes. Pass output="result.json" to write the result directly to disk.
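
Because parse() returns raw bytes, a small helper can turn them into Python data. The following is a minimal sketch that assumes the bytes are UTF-8 encoded JSON, as the Quick Start suggests. The helper name load_parse_result and the sample payload are hypothetical and not part of the SDK.

```python
import json
from typing import Any


def load_parse_result(raw: bytes) -> Any:
    """Decode the bytes returned by parse() into Python data.

    Assumes the payload is UTF-8 encoded JSON.
    """
    return json.loads(raw.decode("utf-8"))


# Hypothetical payload for illustration only; the real response shape
# is defined by the ByteIT service, not by this sketch.
sample = b'{"pages": 2}'
data = load_parse_result(sample)
print(data["pages"])
```

With a real client, the same helper could be applied to the return value of client.parse("document.pdf").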

Parse Documents

from byteit import ByteITClient, ProcessingOptions

client = ByteITClient(api_key="your_api_key")

result = client.parse(
    "invoice.pdf",
    processing_options=ProcessingOptions(languages=["en"], page_range="1-2"),
)

The public parse methods always request JSON output internally. To get another format, request it when downloading the result of an async job, for example with result_format="txt".

Async Workflow

job = client.parse_async("document.pdf")

status = client.get_job_status(job.id)
details = client.get_parse_job_details(job.id)

if status.is_completed:
    result_json = client.get_parse_job_result(job.id)
    result_txt = client.get_parse_job_result(job.id, result_format="txt")

Available parse-job methods:

| Method | Purpose |
| --- | --- |
| get_parse_jobs() | List parse jobs |
| get_parse_job_details(job_id) | Get full parse-job details |
| get_job_status(job_id) | Check lightweight processing status |
| get_parse_job_result(job_id, result_format=None) | Download parse result |
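
The async workflow checks the status once; in practice a job is usually polled until it completes. The sketch below shows one way to do that with a timeout. The helper wait_until_completed and its parameters are hypothetical; it relies only on the is_completed attribute shown above and takes the status function as an argument, so it does not depend on the SDK directly.

```python
import time
from typing import Any, Callable


def wait_until_completed(
    get_status: Callable[[str], Any],
    job_id: str,
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll get_status(job_id) until status.is_completed is true.

    Raises TimeoutError if the job is not completed within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status.is_completed:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not completed after {timeout} s")
        sleep(poll_interval)
```

With a real client this could be called as wait_until_completed(client.get_job_status, job.id) before downloading the result. The sketch only looks at is_completed; how a failed job is reported is not covered here, so the timeout is the only exit for a job that never completes.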

Structured Extraction

Extraction runs on a completed parse job and returns a dictionary matching your schema.

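from typing import Optional

from byteit import ByteITClient, ExtractionSchema
from pydantic import Field


class InvoiceSchema(ExtractionSchema):
    # Optional[...] instead of "str | None" so the example also runs on Python 3.8 and 3.9.
    invoice_number: Optional[str] = Field(description="Invoice number")
    total_amount: Optional[str] = Field(description="Total amount")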


client = ByteITClient(api_key="your_api_key")
# Extraction needs a completed parse job; see Async Workflow for checking status.
parse_job = client.parse_async("invoice.pdf")

result = client.extract(
    parse_job.id,
    InvoiceSchema,
    extraction_complexity="medium",
)

Async extraction is also available:

extract_job = client.extract_async(parse_job.id, InvoiceSchema)

status = client.get_job_status(extract_job.id)
if status.is_completed:
    extracted = client.get_extract_job_result(extract_job.id)

Available extraction methods:

| Method | Purpose |
| --- | --- |
| extract(parse_job_id, schema, output=None, extraction_complexity="medium") | Run extraction and wait for the result |
| extract_async(parse_job_id, schema, extraction_complexity="medium") | Submit extraction without waiting |
| get_extract_jobs() | List extraction jobs |
| get_extract_job_details(job_id) | Get full extraction job details |
| get_extract_job_result(job_id) | Download extraction result |
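
Since the schema fields in the example are nullable, it can be useful to check which fields came back empty. The following is a minimal sketch under the assumption that the extraction result is a plain dictionary keyed by field name; missing_fields and the sample result are hypothetical and not part of the SDK.

```python
from typing import Any, Dict, Iterable, List


def missing_fields(extracted: Dict[str, Any], expected: Iterable[str]) -> List[str]:
    """Return expected field names that are absent or None in `extracted`."""
    return [name for name in expected if extracted.get(name) is None]


# Hypothetical extraction result for illustration only.
extracted = {"invoice_number": "INV-001", "total_amount": None}
print(missing_fields(extracted, ["invoice_number", "total_amount"]))
```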

Processing Options

You can pass either a ProcessingOptions instance or a plain dictionary.

result = client.parse(
    "document.pdf",
    processing_options={
        "languages": ["de", "en"],
        "page_range": "1-5",
        "extraction_type": "complex",
    },
)
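
When options come from user input or configuration, a helper can assemble the dictionary and leave out anything that was not set. This is a sketch only: build_processing_options is a hypothetical helper, it uses just the three keys shown above, and it assumes the SDK accepts a dictionary with some of those keys omitted.

```python
from typing import Any, Dict, List, Optional


def build_processing_options(
    languages: Optional[List[str]] = None,
    page_range: Optional[str] = None,
    extraction_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a processing_options dict, leaving out unset keys."""
    candidates = {
        "languages": languages,
        "page_range": page_range,
        "extraction_type": extraction_type,
    }
    return {key: value for key, value in candidates.items() if value is not None}


options = build_processing_options(languages=["de", "en"], page_range="1-5")
print(options)
```

The resulting dictionary could then be passed as processing_options=options to client.parse().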

Error Handling

All SDK exceptions inherit from ByteITError.

from byteit.exceptions import (
    AuthenticationError,
    ByteITError,
    JobProcessingError,
    RateLimitError,
    ValidationError,
)

try:
    result = client.parse("document.pdf")
except AuthenticationError:
    print("Invalid API key")
except ValidationError as exc:
    print("Invalid request:", exc.message)
except RateLimitError:
    print("Rate limit exceeded")
except JobProcessingError as exc:
    print("Processing failed:", exc.message)
except ByteITError as exc:
    print("ByteIT error:", exc.message)
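
Rate-limit errors are often transient, so one common pattern is to retry with exponential backoff. The sketch below is generic and does not import the SDK: call_with_retry and its parameters are hypothetical, and whether the client already retries internally is not stated here.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

With the SDK this might look like call_with_retry(lambda: client.parse("document.pdf"), (RateLimitError,)). Other exceptions, such as AuthenticationError, are not retried and propagate immediately.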

Supported Inputs

Supported inputs include PDF, Word, PowerPoint, HTML, Markdown, plain text, JSON, XML, and common image formats such as PNG, JPEG, TIFF, and BMP.

Notebook Behavior

When running in Jupyter, parse results are automatically displayed as JSON when possible. Pass output=... if you want to suppress inline display and save the response directly.

Resources

Licensed under Apache 2.0. © 2026 ByteIT GmbH.
