Knowhere Python SDK
Official Python SDK for the Knowhere document parsing API.
Installation
pip install knowhere-python-sdk
Or with uv:
uv add knowhere-python-sdk
Quick Start
import knowhere
client = knowhere.Knowhere(api_key="sk_...")
# Parse a document from URL
result = client.parse(url="https://example.com/report.pdf")
print(result.statistics.total_chunks) # 152
print(result.full_markdown[:200]) # First 200 chars of full markdown
for chunk in result.text_chunks:
    print(chunk.content[:80])
Parse a Local File
from pathlib import Path
result = client.parse(
    file=Path("report.pdf"),
    parsing_params={"model": "advanced", "ocr_enabled": True},
)
print(result.manifest.source_file_name) # "report.pdf"
print(len(result.chunks)) # 152
Access Different Chunk Types
result = client.parse(url="https://example.com/report.pdf")
# Text chunks
for chunk in result.text_chunks:
    print(chunk.keywords)
    print(chunk.summary)
# Image chunks (raw bytes loaded from ZIP)
for chunk in result.image_chunks:
    print(chunk.file_path)
    print(len(chunk.data))  # bytes
    chunk.save("./output/")  # writes image to disk
# Table chunks (HTML loaded from ZIP)
for chunk in result.table_chunks:
    print(chunk.file_path)
    print(chunk.html[:100])
Save All Results to Disk
result = client.parse(file=Path("report.pdf"))
result.save("./output/report/")
Async Usage
import asyncio
import knowhere
async def main():
    async with knowhere.AsyncKnowhere(api_key="sk_...") as client:
        result = await client.parse(url="https://example.com/report.pdf")
        print(result.statistics.total_chunks)
        for chunk in result.text_chunks:
            print(chunk.summary)
asyncio.run(main())
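When several documents need parsing, the async client can in principle be driven concurrently. The following is a minimal, stdlib-only sketch of capping the number of in-flight calls with a semaphore; `map_concurrently` and its `limit` parameter are hypothetical helpers for illustration, not part of the SDK.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def map_concurrently(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    limit: int = 4,  # hypothetical cap on concurrent calls, not an SDK setting
) -> List[R]:
    """Run worker(item) for every item, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        # The semaphore blocks here once `limit` workers are already running.
        async with semaphore:
            return await worker(item)

    # gather() returns results in the same order as the input items.
    return list(await asyncio.gather(*(run_one(item) for item in items)))
```

Inside `main()`, a worker such as `lambda url: client.parse(url=url)` could be passed in, assuming the API's rate limits allow that level of concurrency.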
Step-by-Step Control
For granular control over the parsing workflow, use the jobs resource directly:
from pathlib import Path
# Step 1: Create a parsing job
job = client.jobs.create(
    source_type="file",
    file_name="report.pdf",
    parsing_params={"model": "advanced", "ocr_enabled": True},
)
# Step 2: Upload file to presigned URL
client.jobs.upload(job, file=Path("report.pdf"))
# Step 3: Poll until done (adaptive backoff)
job_result = client.jobs.wait(job.job_id, poll_interval=10.0, poll_timeout=1800.0)
# Step 4: Download and parse results
result = client.jobs.load(job_result)
print(result.statistics)
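Step 3 mentions adaptive backoff, but the SDK's actual schedule is not described here. As a rough illustration only, a polling loop with a growing interval and a hard deadline might look like the sketch below. The names `poll_until_done`, `backoff_factor`, and `max_interval` are assumptions made for this example, not SDK API; only `poll_interval` and `poll_timeout` mirror the arguments shown above.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_done(
    fetch: Callable[[], Optional[T]],
    poll_interval: float = 10.0,
    poll_timeout: float = 1800.0,
    backoff_factor: float = 1.5,  # assumption: growth rate is not documented
    max_interval: float = 60.0,   # assumption: cap is not documented
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call fetch() until it returns a non-None value or poll_timeout elapses."""
    deadline = clock() + poll_timeout
    interval = poll_interval
    while True:
        result = fetch()
        if result is not None:
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("job did not complete within poll_timeout")
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
        # Wait longer before the next poll, up to max_interval.
        interval = min(interval * backoff_factor, max_interval)
```

The `sleep` and `clock` parameters are injectable so the loop can be exercised in tests without real waiting.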
Configuration
The SDK reads configuration from constructor arguments, environment variables, or defaults (in that priority order):
| Variable | Description | Default |
|---|---|---|
| KNOWHERE_API_KEY | API key (required) | (none) |
| KNOWHERE_BASE_URL | API base URL | https://api.knowhereto.ai |
| KNOWHERE_LOG_LEVEL | Log level | WARNING |
# Uses environment variables automatically
client = knowhere.Knowhere()
# Or configure explicitly
client = knowhere.Knowhere(
    api_key="sk_...",
    base_url="https://api.knowhereto.ai",
    timeout=30.0,  # HTTP request timeout (default: 60s)
    upload_timeout=300.0,  # File upload timeout (default: 600s)
    max_retries=3,  # Max retry attempts (default: 5)
)
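The priority order described in this section (constructor argument, then environment variable, then default) can be sketched in a few lines. This is an illustration of the lookup order only, not the SDK's internal code; `resolve_setting` is a hypothetical helper, and treating an empty environment variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://api.knowhereto.ai"  # default listed in the Configuration table


def resolve_setting(
    explicit: Optional[str],
    env_name: str,
    default: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the explicit argument, else the environment variable, else the default."""
    if explicit is not None:
        return explicit
    value = environ.get(env_name)
    if value:  # assumption: an empty variable counts as unset
        return value
    return default


# An explicit argument wins over the environment, which wins over the default.
env = {"KNOWHERE_BASE_URL": "https://staging.example.com"}
print(resolve_setting(None, "KNOWHERE_BASE_URL", DEFAULT_BASE_URL, env))
```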
Context Manager
# Sync: ensures httpx.Client is properly closed
with knowhere.Knowhere(api_key="sk_...") as client:
    result = client.parse(url="https://example.com/report.pdf")
# Async: ensures httpx.AsyncClient is properly closed
async with knowhere.AsyncKnowhere(api_key="sk_...") as client:
    result = await client.parse(url="https://example.com/report.pdf")
Error Handling
from knowhere import (
    Knowhere,
    AuthenticationError,
    NotFoundError,
    RateLimitError,
    BadRequestError,
    APIStatusError,
    PollingTimeoutError,
)
client = Knowhere(api_key="sk_...")
try:
    result = client.parse(url="https://example.com/report.pdf")
except BadRequestError as e:
    print(e.status_code)  # 400
    print(e.code)  # "INVALID_ARGUMENT"
    print(e.message)  # "Unsupported file format"
    print(e.request_id)  # "req_abc123"
except NotFoundError as e:
    print(e.message)  # "Job not found"
except RateLimitError as e:
    print(e.retry_after)  # seconds to wait
except AuthenticationError:
    print("Invalid API key")
except PollingTimeoutError:
    print("Job did not complete within timeout")
except APIStatusError as e:
    # Catch-all for any other API status error; keep this handler last
    print(f"API error {e.status_code}: {e.message}")
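The `retry_after` value on a rate-limit error can drive a simple wait-and-retry wrapper. The sketch below is hedged: the client already retries internally (`max_retries`), and whether that covers rate limits is not stated here. `RateLimited` is a stand-in class so the example runs without the SDK, and `call_with_rate_limit_retry`, `max_attempts`, and `fallback_delay` are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for the SDK's RateLimitError; only retry_after is modeled."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_rate_limit_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    fallback_delay: float = 1.0,  # used when the error carries no retry_after
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), waiting retry_after seconds between rate-limited attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RateLimited as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            delay = exc.retry_after if exc.retry_after is not None else fallback_delay
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

With the real SDK, the `except` clause would name `RateLimitError` instead, and `func` could be something like `lambda: client.parse(url=...)`.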
Requirements
- Python 3.9+
- httpx >=0.25.0,<1.0
- pydantic >=2.0.0,<3.0
- typing-extensions >=4.7.0
Building from Source
Prerequisites
- Python 3.9 or later
- uv (recommended) or pip
Build
git clone https://github.com/Ontos-AI/knowhere-python-sdk.git
cd knowhere-python-sdk
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh
# Build sdist + wheel
uv build
# Install the built wheel
pip install dist/knowhere_python_sdk-*.whl
Development
Setup
git clone https://github.com/Ontos-AI/knowhere-python-sdk.git
cd knowhere-python-sdk
# Create venv and install all dependencies (including dev)
uv sync --all-extras
Running Tests
# Run all unit tests
uv run pytest tests/ -v
# Run with coverage
uv run coverage run -m pytest tests/ -v
uv run coverage report -m
Linting and Type Checking
# Lint
uv run ruff check src/
# Type check
uv run mypy src/knowhere/
Project Structure
knowhere-python-sdk/
├── src/knowhere/
│   ├── __init__.py          # Public API surface
│   ├── _client.py           # Knowhere + AsyncKnowhere clients
│   ├── _base_client.py      # HTTP logic, retry, error parsing
│   ├── _exceptions.py       # Exception hierarchy
│   ├── _constants.py        # Default URLs, timeouts, env var names
│   ├── _types.py            # Sentinel types, callback type aliases
│   ├── _logging.py          # Logger setup, header redaction
│   ├── _response.py         # APIResponse wrapper
│   ├── _version.py          # __version__
│   ├── py.typed             # PEP 561 marker
│   ├── types/
│   │   ├── job.py           # Job, JobResult, JobError
│   │   ├── result.py        # ParseResult, Manifest, Chunk types
│   │   └── params.py        # ParsingParams, WebhookConfig
│   ├── resources/
│   │   └── jobs.py          # Jobs + AsyncJobs resource
│   └── lib/
│       ├── polling.py       # Adaptive polling loop
│       ├── upload.py        # Streaming file upload
│       └── result_parser.py # ZIP parsing, checksum verification
├── tests/                   # Unit tests (respx-mocked HTTP)
├── examples/                # Usage examples
└── pyproject.toml
License
MIT