Official Python SDK for the Knowhere document parsing API

Knowhere Python SDK

Installation

pip install knowhere-python-sdk

Or with uv:

uv add knowhere-python-sdk

Usage

import knowhere

client = knowhere.Knowhere(api_key="sk_...")

result = client.parse(url="https://example.com/report.pdf")

print(result.statistics.total_chunks)
print(result.full_markdown[:200])

for chunk in result.text_chunks:
    print(chunk.content[:80])

While you can pass api_key as a keyword argument, we recommend setting KNOWHERE_API_KEY="sk_..." in a .env file and loading it with python-dotenv, so that your API key is not stored in source control.
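A minimal sketch of that setup, assuming python-dotenv is installed (its load_dotenv() function reads a .env file into the process environment). The read_api_key helper is hypothetical and not part of the SDK; it only makes a missing key fail early with a clear message.

```python
import os

try:
    # python-dotenv is optional here; without it, only real environment variables are used.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def read_api_key(environ=None):
    """Return KNOWHERE_API_KEY from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not part of the Knowhere SDK.
    """
    if environ is None:
        if load_dotenv is not None:
            load_dotenv()  # copies entries from .env into os.environ (existing variables win)
        environ = os.environ
    api_key = environ.get("KNOWHERE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("KNOWHERE_API_KEY is not set; add it to your .env file")
    return api_key
```

Once the key is in the environment, knowhere.Knowhere() can be constructed without arguments, as described under Configuration.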

Parse a local file

from pathlib import Path

result = client.parse(
    file=Path("report.pdf"),
    parsing_params={"model": "advanced", "ocr_enabled": True},
)

print(result.manifest.source_file_name)  # "report.pdf"
print(len(result.chunks))                # 152

Access different chunk types

result = client.parse(url="https://example.com/report.pdf")

# Text chunks
for chunk in result.text_chunks:
    print(chunk.keywords)
    print(chunk.summary)

# Image chunks (raw bytes loaded from ZIP)
for chunk in result.image_chunks:
    print(chunk.file_path)
    print(len(chunk.data))       # bytes
    chunk.save("./output/")      # writes image to disk

# Table chunks (HTML loaded from ZIP)
for chunk in result.table_chunks:
    print(chunk.file_path)
    print(chunk.html[:100])

Save all results to disk

from pathlib import Path

result = client.parse(file=Path("report.pdf"))
result.save("./output/report/")

Async usage

import asyncio
import knowhere

async def main():
    async with knowhere.AsyncKnowhere(api_key="sk_...") as client:
        result = await client.parse(url="https://example.com/report.pdf")
        print(result.statistics.total_chunks)

        for chunk in result.text_chunks:
            print(chunk.summary)

asyncio.run(main())
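The async client is most useful when several documents are parsed at once. The sketch below is one way to do that with a bounded number of requests in flight. parse_many is a hypothetical helper, not part of the SDK, and it assumes that a single AsyncKnowhere client can serve several concurrent parse calls.

```python
import asyncio


async def parse_many(parse, urls, max_concurrency=4):
    """Run parse(url=...) for every URL, at most max_concurrency at a time.

    `parse` is any coroutine function accepting a `url` keyword, for example the
    `client.parse` method of an AsyncKnowhere client. Results keep the input order.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def parse_one(url):
        # The semaphore caps how many parse calls are in flight at once.
        async with semaphore:
            return await parse(url=url)

    return await asyncio.gather(*(parse_one(url) for url in urls))
```

Inside the async with block shown above, this would be called as results = await parse_many(client.parse, urls).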

Step-by-step control

For granular control over the parsing workflow, use the jobs resource directly:

from pathlib import Path

# Step 1: Create a parsing job
job = client.jobs.create(
    source_type="file",
    file_name="report.pdf",
    parsing_params={"model": "advanced", "ocr_enabled": True},
)

# Step 2: Upload file to presigned URL
client.jobs.upload(job, file=Path("report.pdf"))

# Step 3: Poll until done (adaptive backoff)
job_result = client.jobs.wait(job.job_id, poll_interval=10.0, poll_timeout=1800.0)

# Step 4: Download and parse results
result = client.jobs.load(job_result)
print(result.statistics)
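To make the polling step concrete, here is a generic sketch of polling with a growing delay and an overall timeout. It is not the SDK's implementation of jobs.wait: the status strings ("done", "failed"), the backoff factor, and the max_interval cap are all assumptions chosen for illustration.

```python
import time


def wait_until_done(fetch_status, poll_interval=10.0, poll_timeout=1800.0,
                    backoff=1.5, max_interval=60.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns "done"; the delay grows after each attempt.

    The status strings, `backoff`, and `max_interval` are illustrative assumptions.
    """
    deadline = clock() + poll_timeout
    delay = poll_interval
    while True:
        status = fetch_status()
        if status == "done":
            return status
        if status == "failed":
            raise RuntimeError("job failed")
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"job not finished after {poll_timeout} seconds")
        # Never sleep past the deadline, then lengthen the next wait up to the cap.
        sleep(min(delay, remaining))
        delay = min(delay * backoff, max_interval)
```

Passing sleep and clock as parameters keeps the loop testable without real waiting.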

Handling errors

All errors inherit from knowhere.KnowhereError.

import knowhere

try:
    result = client.parse(url="https://example.com/report.pdf")
except knowhere.AuthenticationError:
    print("Invalid API key")
except knowhere.APIStatusError as e:
    print(f"{e.status_code}: {e.message}")
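Because every SDK error derives from knowhere.KnowhereError, a final except clause for the base class can act as a catch-all. The sketch below uses local stand-in classes so that it runs without the SDK or network access. The stand-ins only mirror the class names and attributes used above; how the real classes relate to each other, beyond sharing the base class, is an assumption.

```python
class KnowhereError(Exception):
    """Stand-in for knowhere.KnowhereError (the documented base class)."""


class AuthenticationError(KnowhereError):
    """Stand-in for knowhere.AuthenticationError."""


class APIStatusError(KnowhereError):
    """Stand-in for knowhere.APIStatusError, carrying status_code and message."""

    def __init__(self, status_code, message):
        super().__init__(message)
        self.status_code = status_code
        self.message = message


def describe_failure(action):
    """Run action() and return a short description of the outcome."""
    try:
        action()
    except AuthenticationError:
        return "invalid API key"
    except APIStatusError as e:
        return f"API error {e.status_code}: {e.message}"
    except KnowhereError as e:
        # Catch-all for any other SDK error; must come after the specific clauses.
        return f"SDK error: {e}"
    return "ok"
```

With the real SDK, the same clause order applies, using knowhere.AuthenticationError, knowhere.APIStatusError, and knowhere.KnowhereError.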

Configuration

The SDK reads configuration from constructor arguments, environment variables, or defaults (in that priority order):

Variable            Description   Default
KNOWHERE_API_KEY    API key       (required)
KNOWHERE_BASE_URL   API base URL  https://api.knowhereto.ai
KNOWHERE_LOG_LEVEL  Log level     WARNING

# Uses environment variables automatically
client = knowhere.Knowhere()

# Or configure explicitly
client = knowhere.Knowhere(
    api_key="sk_...",
    base_url="https://api.knowhereto.ai",
    timeout=30.0,           # HTTP request timeout (default: 60s)
    upload_timeout=300.0,   # File upload timeout (default: 600s)
    max_retries=3,          # Max retry attempts (default: 5)
)
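The priority order described above (constructor argument, then environment variable, then default) can be sketched as a small lookup. resolve_setting is a hypothetical helper written for illustration and does not reflect how the SDK is implemented internally.

```python
import os

_UNSET = object()  # sentinel meaning "this setting has no default"


def resolve_setting(explicit, env_name, default=_UNSET, environ=None):
    """Pick a value: explicit argument first, then environment variable, then default."""
    environ = os.environ if environ is None else environ
    if explicit is not None:
        return explicit
    if env_name in environ:
        return environ[env_name]
    if default is _UNSET:
        raise ValueError(f"{env_name} is required but was not provided")
    return default
```

For example, resolve_setting(None, "KNOWHERE_LOG_LEVEL", "WARNING") returns the value of the environment variable if it is set and "WARNING" otherwise.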

Retries

Connection errors, 429 (rate limit) responses, and responses with status 500 or higher are retried automatically with exponential backoff.

client = knowhere.Knowhere(
    api_key="sk_...",
    max_retries=3,  # default is 5
)
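For readers unfamiliar with exponential backoff, the sketch below shows the general idea: which statuses count as retryable, and how the wait doubles between attempts. The base, cap, and jitter values are illustrative assumptions; the SDK's actual delays are not documented here.

```python
import random


def should_retry(status_code):
    """Return True for the retryable HTTP statuses listed above: 429 and 500 or higher."""
    return status_code == 429 or status_code >= 500


def backoff_delays(max_retries=5, base=0.5, cap=8.0, jitter=0.0, rng=random.random):
    """Return the wait in seconds before each retry: base * 2**attempt, capped at `cap`.

    `base`, `cap`, and `jitter` are illustrative values, not the SDK's documented defaults.
    """
    delays = []
    for attempt in range(max_retries):
        delay = min(base * (2 ** attempt), cap)
        # Optional jitter spreads retries out so that many clients do not retry in lockstep.
        delay += delay * jitter * rng()
        delays.append(delay)
    return delays
```

With these example values and no jitter, five retries would wait 0.5, 1, 2, 4, and 8 seconds.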

Determining the installed version

import knowhere
print(knowhere.__version__)
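The version can also be read from the installed package metadata without importing the SDK, using the standard library's importlib.metadata. The installed_version helper below is a hypothetical convenience wrapper; the distribution name is the one used in the pip install command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="knowhere-python-sdk"):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None
```

This is handy in scripts that should report a missing dependency instead of failing on import.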

Versioning

This package follows Semantic Versioning.

We publish stable releases to PyPI. To get the latest unreleased changes, install directly from the repository: https://github.com/Ontos-AI/knowhere-python-sdk

Requirements

License

MIT
