AI-powered document intelligence platform - Turn your documents into structured data with a single line of code.

ByteIT Python SDK

ByteIT's Python library extracts structured data from documents. It is designed for backend services and ETL pipelines that need reliable, consistent document parsing at scale through a simple API.


Installation

Install from PyPI:

pip install byteit

Python 3.8 or newer is required.


Quick Start

from byteit import ByteITClient

client = ByteITClient(api_key="your_api_key")

result = client.parse("document.pdf")
print(result.decode())

The returned value is raw bytes containing the parsed document content.
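Because the result is bytes, most callers decode it before further processing. The following is a minimal sketch, assuming the output is UTF-8 encoded (the encoding is not stated here). The helpers `to_text` and `to_json` are hypothetical and not part of the SDK, and the sample byte strings, including the JSON structure, are made-up stand-ins for real `client.parse(...)` results.

```python
import json


def to_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode parser output to str; the UTF-8 default is an assumption."""
    return raw.decode(encoding)


def to_json(raw: bytes):
    """Load bytes produced with output_format="json" into Python objects."""
    return json.loads(to_text(raw))


# Stand-in values; in real use these would come from client.parse(...)
markdown_raw = b"# Invoice\n\nTotal: 42 EUR\n"
json_raw = b'{"title": "Invoice", "total": 42}'

print(to_text(markdown_raw).splitlines()[0])
print(to_json(json_raw)["total"])
```

If the service ever returns a different encoding, pass it explicitly through the `encoding` argument instead of relying on the default.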


Supported Input File Types

ByteIT supports the following file types as input:

  • PDF (.pdf)
  • Word (.docx)
  • PowerPoint (.pptx)
  • HTML (.html)
  • Markdown (.md)
  • Plain text (.txt)
  • JSON (.json)
  • XML (.xml)
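
In a pipeline it can be useful to filter out unsupported files before making any API call. The sketch below checks the file extension against the list above. `is_supported` is a hypothetical helper, not an SDK function, and matching extensions case-insensitively is an assumption; the SDK may apply its own validation.

```python
from pathlib import Path

# Extensions copied from the supported input list above
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".html", ".md", ".txt", ".json", ".xml",
}


def is_supported(path) -> bool:
    """Return True if the file extension is in the supported list."""
    # Lowercasing the suffix is an assumption about how extensions are matched
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported("report.PDF"))
print(is_supported("scan.png"))
```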

Basic Usage

Parse a Local File

result = client.parse("invoice.pdf")

By default, the output format is Markdown (md).


Output Formats

You can choose the output format depending on your pipeline needs:

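# Variable names avoid shadowing the standard-library json and html modules
txt_result = client.parse("doc.pdf", output_format="txt")
json_result = client.parse("doc.pdf", output_format="json")
md_result = client.parse("doc.pdf", output_format="md")
html_result = client.parse("doc.pdf", output_format="html")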

Supported output formats:

  • Plain text (txt)
  • JSON (json)
  • Markdown (md) (default)
  • HTML (html)

Save Output to File

client.parse(
    "doc.pdf",
    output_format="md",
    output="result.md"
)

When output is provided, the parsed result is written directly to disk.
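
For ETL-style jobs, the same call can be applied to a whole directory. The sketch below is one possible approach, not an SDK feature: `parse_directory` is a hypothetical helper that takes the parse function as an argument (for example `client.parse`) and calls it with the `output_format` and `output` keywords shown above.

```python
from pathlib import Path
from typing import Callable, List


def parse_directory(parse: Callable[..., object], src_dir, dst_dir,
                    pattern: str = "*.pdf",
                    output_format: str = "md") -> List[Path]:
    """Call parse(...) for every matching file, one output file per input."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for doc in sorted(src.glob(pattern)):
        target = dst / f"{doc.stem}.{output_format}"
        # Same keyword arguments as the client.parse call shown above
        parse(str(doc), output_format=output_format, output=str(target))
        written.append(target)
    return written
```

Typical usage would be `parse_directory(client.parse, "inbox", "parsed")`, which is expected to leave one `.md` file in `parsed` for each PDF in `inbox`.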


Notebook Integration

When used in Jupyter notebooks, ByteIT automatically displays results in a readable format:

  • JSON: Interactive, expandable/collapsible tree view
  • Markdown: Rendered with formatting (headers, lists, etc.)
  • HTML: Rendered as HTML
  • Text: Code block with syntax highlighting

# In a Jupyter notebook, the formatted result is displayed automatically
result = client.parse("document.pdf", output_format="json")

To disable auto-display, save to a file instead:

# Saves to file, no auto-display
result = client.parse("doc.pdf", output_format="json", output="output.json")

Typical Use Cases

  • Extracting structured data from documents in ETL pipelines
  • Preprocessing documents before indexing or downstream processing
  • Automating ingestion of invoices, contracts, or reports
  • Interactive document exploration in Jupyter notebooks

API Reference

ByteITClient

ByteITClient(api_key: str)

Creates a new ByteIT client.

Parameters

  • api_key (str): Your ByteIT API key

parse(...)

parse(
    input,
    output_format: str = "md",
    output = None
)

Parse a document and return the extracted content.

Parameters

  • input (str | Path): Path to a local document
  • output_format (str): Output format (txt, json, md, html)
  • output (str | Path | None): Optional path to save the result

Returns

  • bytes: Parsed document content

Error Handling

The SDK exposes specific exceptions for common error cases:

from byteit.exceptions import (
    ByteITError,
    ValidationError,
    AuthenticationError,
    RateLimitError,
    ServerError,
)

try:
    result = client.parse("document.pdf")
except ValidationError as e:
    print("Invalid input:", e.message)
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit exceeded")
except ByteITError as e:
    print("ByteIT error:", e.message)

All exceptions inherit from ByteITError.
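
Rate-limit errors are usually transient, so a caller may want to retry with exponential backoff. The sketch below is a generic pattern, not SDK behavior: `with_retry` is a hypothetical helper, the delays are arbitrary, and the local `RateLimitError` class is only a stand-in so the snippet runs on its own. In real code, import `RateLimitError` from `byteit.exceptions` instead. Whether the SDK already retries internally is not stated here.

```python
import time


class RateLimitError(Exception):
    """Stand-in for byteit.exceptions.RateLimitError (assumption for this sketch)."""


def with_retry(call, retry_on=RateLimitError, attempts=4,
               base_delay=1.0, sleep=time.sleep):
    """Run call(); on retry_on, wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

Usage would look like `result = with_retry(lambda: client.parse("document.pdf"))`. Other exceptions, such as `AuthenticationError`, are deliberately not retried because repeating the call would not help.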


Configuration

Environment Variable

You can provide the API key via environment variable:

export BYTEIT_API_KEY="your_api_key"

import os
from byteit import ByteITClient

client = ByteITClient(api_key=os.getenv("BYTEIT_API_KEY"))
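
Note that `os.getenv` returns `None` when the variable is unset, so a missing key would only surface later. A small guard can fail fast with a clearer message. This is a sketch; `load_api_key` is a hypothetical helper and not part of the SDK.

```python
import os


def load_api_key(env_var: str = "BYTEIT_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error."""
    key = os.getenv(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before creating the client"
        )
    return key
```

The client can then be created with `ByteITClient(api_key=load_api_key())`.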

Requirements

  • Python 3.8+
  • requests

About ByteIT

ByteIT provides document parsing and data extraction APIs designed for backend systems and automation workflows.

Website: https://byteit.ai


License

This project is licensed under the terms specified in the LICENSE file.

© 2026 ByteIT GmbH

