Knowhere Python SDK

Official Python SDK for the Knowhere document parsing API.

Installation

pip install knowhere-python-sdk

Or with uv:

uv add knowhere-python-sdk

Usage

import knowhere

client = knowhere.Knowhere(api_key="sk_...")

result = client.parse(url="https://example.com/report.pdf")

print(result.statistics.total_chunks)
print(result.full_markdown[:200])

for chunk in result.text_chunks:
    print(chunk.content[:80])

Retrieval and document lifecycle

New documents are published into a retrieval namespace. The server returns a stable document_id only after the job is published, so client.jobs.create(...) does not return a usable document_id. Persist job_result.document_id if you need to update or archive the same document later.

job = client.jobs.create(
    source_type="url",
    source_url="https://example.com/manual.pdf",
    namespace="support-center",
)

job_result = client.jobs.wait(job.job_id)
document_id = job_result.document_id

if document_id is None:
    raise RuntimeError("Expected document_id after successful publication.")
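
One way to persist the identifier is a small JSON file keyed by source URL. This is a minimal sketch using only the standard library; the helper names `remember_document_id` and `lookup_document_id` and the file layout are illustrative assumptions, not part of the SDK.

```python
import json
from pathlib import Path
from typing import Optional


def remember_document_id(store: Path, source_url: str, document_id: str) -> None:
    """Record source_url -> document_id in a JSON file (hypothetical helper)."""
    mapping = json.loads(store.read_text()) if store.exists() else {}
    mapping[source_url] = document_id
    store.write_text(json.dumps(mapping, indent=2, sort_keys=True))


def lookup_document_id(store: Path, source_url: str) -> Optional[str]:
    """Return the stored document_id for source_url, or None if unknown."""
    if not store.exists():
        return None
    return json.loads(store.read_text()).get(source_url)
```

After a successful job, `remember_document_id(Path("documents.json"), "https://example.com/manual.pdf", document_id)` would let a later run look the id up before creating an update job.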

After the job is done and published, query the canonical document content:

response = client.retrieval.query(
    namespace="support-center",
    query="How do I reset Bluetooth pairing?",
    top_k=5,
    channels=["path", "term"],
    filter_mode="keep",
    signal_paths=["Bluetooth", "Pairing"],
)

print(response.router_used)

for result in response.results:
    print(result.content)
    print(result.score)
    print(result.source.source_file_name, result.source.section_path)

Use document_id to update or archive a document:

update_job = client.jobs.create(
    source_type="url",
    source_url="https://example.com/manual-v2.pdf",
    document_id=document_id,
)

document = client.documents.get(document_id)
print(document.status)

client.documents.archive(document_id)

You can also list documents in a namespace:

documents = client.documents.list(namespace="support-center")
for document in documents.documents:
    print(document.document_id, document.status)

Retrieval supports exclusions when clients want follow-up results that avoid previously used documents or sections:

response = client.retrieval.query(
    namespace="support-center",
    query="battery charging",
    exclude_document_ids=["doc_old"],
    exclude_sections=[
        {"document_id": "doc_123", "section_path": "Appendix / Legal"}
    ],
)
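
If you track which sections have already been shown, the exclude_sections list can be built from that record. The sketch below is an assumption-laden helper, not SDK code: it takes plain (document_id, section_path) pairs, because this README does not show how to read a document id off a retrieval result, and `build_exclude_sections` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple


def build_exclude_sections(
    seen: Iterable[Tuple[str, str]],
) -> List[Dict[str, str]]:
    """Turn (document_id, section_path) pairs into exclude_sections entries.

    Duplicates are dropped while the first-seen order is preserved.
    """
    unique = dict.fromkeys(seen)  # dicts keep insertion order
    return [
        {"document_id": doc_id, "section_path": path}
        for doc_id, path in unique
    ]
```

The returned list has the same shape as the exclude_sections argument in the query above.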

While you can pass api_key as a keyword argument, we recommend setting KNOWHERE_API_KEY="sk_..." in a .env file (loaded with python-dotenv) so that your API key is not stored in source control.
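
To fail early with a clear message when the key is missing, you could read it yourself before constructing the client. This is a sketch using only the standard library; `load_api_key` is a hypothetical helper, and the SDK may already perform an equivalent check. If you use python-dotenv, calling its `load_dotenv()` first copies the .env entries into `os.environ`.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return KNOWHERE_API_KEY from the environment or raise a clear error."""
    env = os.environ if env is None else env
    api_key = env.get("KNOWHERE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "KNOWHERE_API_KEY is not set; add it to your environment or .env file."
        )
    return api_key
```

The result can then be passed as `knowhere.Knowhere(api_key=load_api_key())`.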

Parse a local file

from pathlib import Path

result = client.parse(
    file=Path("report.pdf"),
    parsing_params={"model": "advanced", "ocr_enabled": True},
)

print(result.manifest.source_file_name)  # "report.pdf"
print(len(result.chunks))                # 152
print(result.namespace)                  # "default" or your explicit namespace
print(result.document_id)                # Published canonical document id

Access different chunk types

result = client.parse(url="https://example.com/report.pdf")

# Text chunks
for chunk in result.text_chunks:
    print(chunk.keywords)
    print(chunk.summary)

# Image chunks (raw bytes loaded from ZIP)
for chunk in result.image_chunks:
    print(chunk.file_path)
    print(len(chunk.data))       # bytes
    chunk.save("./output/")      # writes image to disk

# Table chunks (HTML loaded from ZIP)
for chunk in result.table_chunks:
    print(chunk.file_path)
    print(chunk.html[:100])

Save all results to disk

from pathlib import Path

result = client.parse(file=Path("report.pdf"))
result.save("./output/report/")

Async usage

import asyncio
import knowhere

async def main():
    async with knowhere.AsyncKnowhere(api_key="sk_...") as client:
        result = await client.parse(url="https://example.com/report.pdf")
        print(result.statistics.total_chunks)

        for chunk in result.text_chunks:
            print(chunk.summary)

asyncio.run(main())
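
When several documents need parsing, the async client can in principle be driven concurrently. The sketch below caps the number of in-flight calls with a semaphore. It assumes that one client may serve concurrent `parse` calls, which this README does not state, and `parse_many` is a hypothetical helper that accepts any awaitable-returning function so it can run without the SDK.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def parse_many(
    parse_one: Callable[[str], Awaitable[T]],
    urls: Sequence[str],
    max_concurrency: int = 4,
) -> List[T]:
    """Run parse_one over urls with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str) -> T:
        async with semaphore:
            return await parse_one(url)

    # gather returns results in the same order as its inputs
    return list(await asyncio.gather(*(guarded(url) for url in urls)))
```

Inside `main()` above, this might be called as `await parse_many(lambda url: client.parse(url=url), urls)`.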

Step-by-step control

For granular control over the parsing workflow, use the jobs resource directly:

from pathlib import Path

# Step 1: Create a parsing job
job = client.jobs.create(
    source_type="file",
    file_name="report.pdf",
    namespace="support-center",
    parsing_params={"model": "advanced", "ocr_enabled": True},
)

# Step 2: Upload file to presigned URL
client.jobs.upload(job, file=Path("report.pdf"))

# Step 3: Poll until done (adaptive backoff)
job_result = client.jobs.wait(job.job_id, poll_interval=10.0, poll_timeout=1800.0)

print(job_result.document_id)  # Persist this to update/archive the document later.

# Step 4: Download and parse results
result = client.jobs.load(job_result)
print(result.statistics)
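
To make the roles of poll_interval and poll_timeout concrete, here is a generic polling loop. It is a sketch, not the SDK's implementation: it sleeps a fixed interval rather than using adaptive backoff, and `poll_until`, `check`, `sleep`, and `clock` are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    check: Callable[[], Optional[T]],
    poll_interval: float = 10.0,
    poll_timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call check() until it returns a non-None value or poll_timeout elapses."""
    deadline = clock() + poll_timeout
    while True:
        outcome = check()
        if outcome is not None:
            return outcome
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"gave up after {poll_timeout} seconds")
        # Never sleep past the deadline
        sleep(min(poll_interval, remaining))
```

The injectable `sleep` and `clock` arguments exist only so the loop can be exercised in tests without waiting.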

Handling errors

All errors inherit from knowhere.KnowhereError.

import knowhere

try:
    result = client.parse(url="https://example.com/report.pdf")
except knowhere.AuthenticationError:
    print("Invalid API key")
except knowhere.APIStatusError as e:
    print(f"{e.status_code}: {e.message}")
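
Because every SDK error shares one base class, a final handler for knowhere.KnowhereError can act as a catch-all after the more specific ones. The sketch below shows that ordering with local stand-in classes so it runs without the SDK installed. Only the class names and the status_code and message attributes come from the example above; the constructors and the `describe_failure` helper are assumptions.

```python
class KnowhereError(Exception):
    """Stand-in for knowhere.KnowhereError (illustration only)."""


class APIStatusError(KnowhereError):
    """Stand-in carrying the two attributes used in the README example."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


def describe_failure(exc: Exception) -> str:
    """Map an exception to a log line, checking the most specific type first."""
    if isinstance(exc, APIStatusError):
        return f"API returned {exc.status_code}: {exc.message}"
    if isinstance(exc, KnowhereError):
        return f"SDK error: {exc}"
    return f"Unexpected error: {exc!r}"
```

In real code the same ordering applies to `except` clauses: list specific SDK exceptions first and the base class last.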

Configuration

The SDK reads configuration from constructor arguments, environment variables, or defaults (in that priority order):

Variable	Description	Default
`KNOWHERE_API_KEY`	API key (required)	—
`KNOWHERE_BASE_URL`	API base URL	`https://api.knowhereto.ai`
`KNOWHERE_LOG_LEVEL`	Log level	`WARNING`

# Uses environment variables automatically
client = knowhere.Knowhere()

# Or configure explicitly
client = knowhere.Knowhere(
    api_key="sk_...",
    base_url="https://api.knowhereto.ai",
    timeout=30.0,           # HTTP request timeout (default: 60s)
    upload_timeout=300.0,   # File upload timeout (default: 600s)
    max_retries=3,          # Max retry attempts (default: 5)
)
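
The priority order described above (constructor argument, then environment variable, then default) can be sketched as a small resolver. This illustrates the rule only and is not the SDK's code; `resolve_setting` is a hypothetical helper and the `env` parameter exists so the function can be tested without touching the real environment.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://api.knowhereto.ai"  # default listed in the table above


def resolve_setting(
    explicit: Optional[str],
    env_name: str,
    default: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Pick the explicit argument, then the environment variable, then the default."""
    env = os.environ if env is None else env
    if explicit is not None:
        return explicit
    if env_name in env:
        return env[env_name]
    return default
```

For example, `resolve_setting(None, "KNOWHERE_BASE_URL", DEFAULT_BASE_URL)` yields the environment value when the variable is set and the default otherwise.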

Retries

Connection errors, HTTP 429 (rate limit) responses, and HTTP 5xx (server error) responses are retried automatically with exponential backoff.

client = knowhere.Knowhere(
    api_key="sk_...",
    max_retries=3,  # default is 5
)
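
For intuition, the retry rule and an exponential backoff schedule can be sketched as follows. The base delay of 0.5 seconds and the 8 second cap are assumptions chosen for illustration; this README does not document the SDK's actual delays, and production implementations usually add random jitter as well.

```python
from typing import List


def is_retryable(status_code: int) -> bool:
    """Treat 429 and any status >= 500 as retryable, per the rule above."""
    return status_code == 429 or status_code >= 500


def backoff_delays(
    max_retries: int = 5, base: float = 0.5, cap: float = 8.0
) -> List[float]:
    """Delay before each retry: base * 2**attempt, capped at cap seconds."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]
```

With these assumed values, lowering max_retries from 5 to 3 shortens the worst-case total wait from 15.5 seconds to 3.5 seconds.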

Determining the installed version

import knowhere
print(knowhere.__version__)
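
If you want the version without importing the package (for example in a diagnostics script that must also work when the SDK is missing), the standard library's importlib.metadata can read it from the installed distribution. This is a sketch; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "knowhere-python-sdk") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None
```

Note that the lookup uses the distribution name from pip (knowhere-python-sdk), not the import name (knowhere).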

Versioning

This package follows Semantic Versioning.

We publish stable releases to PyPI. To try the latest unreleased changes, install directly from the repository: https://github.com/Ontos-AI/knowhere-python-sdk

Requirements

Python 3.9+
httpx >=0.25.0,<1.0
pydantic >=2.0.0,<3.0
typing-extensions >=4.7.0

Community

Contributing guide: CONTRIBUTING.md
Security policy: SECURITY.md
Code of conduct: CODE_OF_CONDUCT.md

License

MIT

Release history

0.4.0 (Apr 27, 2026)
0.3.2 (Apr 23, 2026), this version
0.3.1 (Apr 22, 2026)
0.3.0 (Apr 21, 2026)
0.2.1 (Apr 9, 2026)
0.2.0 (Mar 18, 2026)
0.1.0 (Feb 11, 2026)

Download files

Source distribution

knowhere_python_sdk-0.3.2.tar.gz (525.5 kB)
Upload date: Apr 23, 2026
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.7

Algorithm	Hash digest
SHA256	`042d3a136ddd2c99b053e699ad7341d042e78ae38f256275ed202139c3f4d4b6`
MD5	`9cd0a2f0088622dff0559066c7e1d9d8`
BLAKE2b-256	`93467af40b5183d6f0095b8a9484d8728008a26805b6cebb6e12b3f34c709c1a`

Built distribution

knowhere_python_sdk-0.3.2-py3-none-any.whl (35.4 kB)
Upload date: Apr 23, 2026
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.7

Algorithm	Hash digest
SHA256	`74ef207e15a47770b96d910307f8b7d5b105d201decf868a86e1f6040dfb84e4`
MD5	`5f113de96ac415282edc0442354b846b`
BLAKE2b-256	`8cbc815ed320e9c125f78925ff61029f707af4a420a1111eeb75502645a5b09f`
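
To check a downloaded file against the SHA256 digest listed above, you can compute the digest locally with the standard library's hashlib. This is a minimal sketch; `sha256_of_file` and `matches_published_hash` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size blocks so large files are not loaded into memory at once
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare the computed digest with a published one, ignoring case and padding."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, the call would be `matches_published_hash(Path("knowhere_python_sdk-0.3.2.tar.gz"), "<SHA256 from the table>")`.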
