Knowhere Python SDK

Official Python SDK for the Knowhere document parsing API.

Installation

pip install knowhere-python-sdk

Or with uv:

uv add knowhere-python-sdk

Quick Start

import knowhere

client = knowhere.Knowhere(api_key="sk_...")

# Parse a document from URL
result = client.parse(url="https://example.com/report.pdf")

print(result.statistics.total_chunks)  # 152
print(result.full_markdown[:200])      # First 200 chars of full markdown

for chunk in result.text_chunks:
    print(chunk.content[:80])

Parse a Local File

from pathlib import Path

result = client.parse(
    file=Path("report.pdf"),
    parsing_params={"model": "advanced", "ocr_enabled": True},
)

print(result.manifest.source_file_name)  # "report.pdf"
print(len(result.chunks))                # 152

Access Different Chunk Types

result = client.parse(url="https://example.com/report.pdf")

# Text chunks
for chunk in result.text_chunks:
    print(chunk.keywords)
    print(chunk.summary)

# Image chunks (raw bytes loaded from ZIP)
for chunk in result.image_chunks:
    print(chunk.file_path)
    print(len(chunk.data))       # bytes
    chunk.save("./output/")      # writes image to disk

# Table chunks (HTML loaded from ZIP)
for chunk in result.table_chunks:
    print(chunk.file_path)
    print(chunk.html[:100])

Save All Results to Disk

result = client.parse(file=Path("report.pdf"))
result.save("./output/report/")

Async Usage

import asyncio
import knowhere

async def main():
    async with knowhere.AsyncKnowhere(api_key="sk_...") as client:
        result = await client.parse(url="https://example.com/report.pdf")
        print(result.statistics.total_chunks)

        for chunk in result.text_chunks:
            print(chunk.summary)

asyncio.run(main())
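
When many documents need parsing, the async client can be combined with a concurrency limit. The following is a minimal sketch, not part of the SDK: `parse_many` and `max_concurrency` are hypothetical names, and `parse` is any coroutine function that takes a URL.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def parse_many(
    parse: Callable[[str], Awaitable[T]],
    urls: List[str],
    max_concurrency: int = 4,
) -> List[T]:
    """Run parse(url) for every URL, at most max_concurrency at a time.

    Results come back in the same order as the input URLs.
    Hypothetical helper; not provided by the SDK.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> T:
        # The semaphore caps how many parse() calls are in flight at once.
        async with semaphore:
            return await parse(url)

    return list(await asyncio.gather(*(bounded(url) for url in urls)))
```

Inside `main()` above, this could be called as `await parse_many(lambda u: client.parse(url=u), urls)`.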

Step-by-Step Control

For granular control over the parsing workflow, use the jobs resource directly:

from pathlib import Path

# Step 1: Create a parsing job
job = client.jobs.create(
    source_type="file",
    file_name="report.pdf",
    parsing_params={"model": "advanced", "ocr_enabled": True},
)

# Step 2: Upload file to presigned URL
client.jobs.upload(job, file=Path("report.pdf"))

# Step 3: Poll until done (adaptive backoff)
job_result = client.jobs.wait(job.job_id, poll_interval=10.0, poll_timeout=1800.0)

# Step 4: Download and parse results
result = client.jobs.load(job_result)
print(result.statistics)
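
To illustrate what a polling loop with backoff does in Step 3, here is a generic sketch. It is not the SDK's implementation: `poll_until`, `PollTimeout`, the `backoff` multiplier, and `max_interval` are assumptions for illustration, and the SDK's actual schedule is not documented here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class PollTimeout(Exception):
    """Raised when the deadline passes before check() returns a result."""


def poll_until(
    check: Callable[[], Optional[T]],
    poll_interval: float = 10.0,
    poll_timeout: float = 1800.0,
    backoff: float = 1.5,        # assumed growth factor between polls
    max_interval: float = 60.0,  # assumed cap on the wait between polls
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call check() until it returns a non-None value or the timeout expires."""
    deadline = clock() + poll_timeout
    interval = poll_interval
    while True:
        result = check()
        if result is not None:
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            raise PollTimeout(f"no result after {poll_timeout} seconds")
        # Never sleep past the deadline; then lengthen the next wait.
        sleep(min(interval, remaining))
        interval = min(interval * backoff, max_interval)
```

The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.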

Configuration

The SDK reads configuration from constructor arguments, environment variables, or defaults (in that priority order):

Variable             Description    Default
KNOWHERE_API_KEY     API key        (required)
KNOWHERE_BASE_URL    API base URL   https://api.knowhereto.ai
KNOWHERE_LOG_LEVEL   Log level      WARNING

# Uses environment variables automatically
client = knowhere.Knowhere()

# Or configure explicitly
client = knowhere.Knowhere(
    api_key="sk_...",
    base_url="https://api.knowhereto.ai",
    timeout=30.0,           # HTTP request timeout (default: 60s)
    upload_timeout=300.0,   # File upload timeout (default: 600s)
    max_retries=3,          # Max retry attempts (default: 5)
)
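
The priority order described above (constructor argument, then environment variable, then default) can be sketched as follows. `resolve_setting` is a hypothetical helper for illustration, not an SDK function, and treating an empty environment variable as unset is an assumption.

```python
import os
from typing import Optional

DEFAULT_BASE_URL = "https://api.knowhereto.ai"  # default listed in the table above


def resolve_setting(
    explicit: Optional[str],
    env_var: str,
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the explicit value, else the environment variable, else the default."""
    if explicit is not None:
        return explicit
    env_value = os.environ.get(env_var)
    if env_value:  # assumption: an empty variable counts as unset
        return env_value
    return default
```

For example, `resolve_setting(None, "KNOWHERE_BASE_URL", DEFAULT_BASE_URL)` returns the environment value when the variable is set and the default otherwise.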

Context Manager

# Sync: ensures httpx.Client is properly closed
with knowhere.Knowhere(api_key="sk_...") as client:
    result = client.parse(url="https://example.com/report.pdf")

# Async: ensures httpx.AsyncClient is properly closed
# (must run inside a coroutine, as in the Async Usage example)
async with knowhere.AsyncKnowhere(api_key="sk_...") as client:
    result = await client.parse(url="https://example.com/report.pdf")

Error Handling

from knowhere import (
    Knowhere,
    AuthenticationError,
    NotFoundError,
    RateLimitError,
    BadRequestError,
    APIStatusError,
    PollingTimeoutError,
)

client = Knowhere()  # reads KNOWHERE_API_KEY from the environment

try:
    result = client.parse(url="https://example.com/report.pdf")
except BadRequestError as e:
    print(e.status_code)   # 400
    print(e.code)          # "INVALID_ARGUMENT"
    print(e.message)       # "Unsupported file format"
    print(e.request_id)    # "req_abc123"
except NotFoundError as e:
    print(e.message)       # "Job not found"
except RateLimitError as e:
    print(e.retry_after)   # seconds to wait
except AuthenticationError:
    print("Invalid API key")
except PollingTimeoutError:
    print("Job did not complete within timeout")
except APIStatusError as e:
    print(f"API error {e.status_code}: {e.message}")
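
Since `RateLimitError` exposes `retry_after`, a caller can wait that long before trying again. The sketch below shows the pattern with a stand-in exception so it runs without the SDK. `RateLimited`, `call_with_rate_limit_retry`, and `fallback_delay` are hypothetical names. The client also has its own `max_retries` setting; whether that already covers rate limits is not stated here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for an exception that carries retry_after (in seconds)."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_rate_limit_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    fallback_delay: float = 1.0,  # used when the error carries no retry_after
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), waiting and retrying when it raises RateLimited."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return func()
        except RateLimited as exc:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            delay = exc.retry_after if exc.retry_after is not None else fallback_delay
            sleep(delay)
            attempt += 1
```

With the SDK, the same shape would catch `RateLimitError` and wrap a call such as `lambda: client.parse(url=...)`.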

Requirements

Building from Source

Prerequisites

  • Python 3.9 or later
  • uv (recommended) or pip

Build

git clone https://github.com/Ontos-AI/knowhere-python-sdk.git
cd knowhere-python-sdk

# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Build sdist + wheel
uv build

# Install the built wheel
pip install dist/knowhere_python_sdk-*.whl

Development

Setup

git clone https://github.com/Ontos-AI/knowhere-python-sdk.git
cd knowhere-python-sdk

# Create venv and install all dependencies (including dev)
uv sync --all-extras

Running Tests

# Run all unit tests
uv run pytest tests/ -v

# Run with coverage
uv run coverage run -m pytest tests/ -v
uv run coverage report -m

Linting and Type Checking

# Lint
uv run ruff check src/

# Type check
uv run mypy src/knowhere/

Project Structure

knowhere-python-sdk/
├── src/knowhere/
│   ├── __init__.py              # Public API surface
│   ├── _client.py               # Knowhere + AsyncKnowhere clients
│   ├── _base_client.py          # HTTP logic, retry, error parsing
│   ├── _exceptions.py           # Exception hierarchy
│   ├── _constants.py            # Default URLs, timeouts, env var names
│   ├── _types.py                # Sentinel types, callback type aliases
│   ├── _logging.py              # Logger setup, header redaction
│   ├── _response.py             # APIResponse wrapper
│   ├── _version.py              # __version__
│   ├── py.typed                 # PEP 561 marker
│   ├── types/
│   │   ├── job.py               # Job, JobResult, JobError
│   │   ├── result.py            # ParseResult, Manifest, Chunk types
│   │   └── params.py            # ParsingParams, WebhookConfig
│   ├── resources/
│   │   └── jobs.py              # Jobs + AsyncJobs resource
│   └── lib/
│       ├── polling.py           # Adaptive polling loop
│       ├── upload.py            # Streaming file upload
│       └── result_parser.py     # ZIP parsing, checksum verification
├── tests/                       # Unit tests (respx-mocked HTTP)
├── examples/                    # Usage examples
└── pyproject.toml

License

MIT
