
Python SDK for the Knowledge2 retrieval platform


Knowledge2 Python SDK

Requires Python 3.11+. License: MIT.

Official Python client for the Knowledge2 retrieval platform. Build, search, and tune hybrid retrieval corpora with a simple API.

Installation

pip install knowledge2

From source:

pip install -e .

Framework extras:

pip install -e ".[langchain]"
pip install -e ".[llamaindex]"
pip install -e ".[integrations]"  # both

Quick Start

from sdk import Knowledge2

client = Knowledge2(api_key="k2_...")

# Create project and corpus
project = client.create_project("My Project")
corpus = client.create_corpus(project["id"], "My Corpus")

# Upload a document
client.upload_document(
    corpus["id"],
    raw_text="Knowledge2 is a retrieval platform for building hybrid search systems.",
)

# Build indexes and search
client.build_indexes(corpus["id"])
results = client.search(corpus["id"], "retrieval platform", top_k=5)

for chunk in results["results"]:
    print(chunk["score"], chunk.get("text", "")[:80])

Framework integrations

LangChain

from sdk.integrations.langchain import K2LangChainRetriever

retriever = K2LangChainRetriever(
    api_key="YOUR_API_KEY",
    api_host="https://api.knowledge2.ai",
    corpus_id="YOUR_CORPUS_ID",
    top_k=5,
    filters={"topic": "search"},
    hybrid={"enabled": True, "fusion_mode": "rrf", "dense_weight": 0.7, "sparse_weight": 0.3},
)

docs = retriever.invoke("How does hybrid retrieval work?")
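The hybrid options above select "rrf" as the fusion mode and set dense and sparse weights. This README does not document how the Knowledge2 server applies those weights. As a rough illustration only, the sketch below shows one common weighted form of reciprocal rank fusion (RRF), in which each ranking contributes weight / (k + rank) to a document's score. The constant k=60 and the weighting scheme are assumptions, not the platform's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_rrf(
    dense_ids: Sequence[str],
    sparse_ids: Sequence[str],
    dense_weight: float = 0.7,
    sparse_weight: float = 0.3,
    k: int = 60,  # assumed smoothing constant; a common choice for RRF
) -> List[str]:
    """Fuse two ranked lists of document IDs with weighted reciprocal rank fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for weight, ranking in ((dense_weight, dense_ids), (sparse_weight, sparse_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher-ranked documents (small rank) contribute more to the fused score.
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense = ["a", "b", "c"]   # hypothetical dense (vector) ranking
sparse = ["c", "a", "d"]  # hypothetical sparse (keyword) ranking
fused = weighted_rrf(dense, sparse)
```

Because RRF only uses ranks, it avoids normalizing dense and sparse scores that live on different scales. Documents found by both retrievers ("a" and "c" here) rise above documents found by only one.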

LlamaIndex

from sdk.integrations.llamaindex import K2LlamaIndexRetriever

retriever = K2LlamaIndexRetriever(
    api_key="YOUR_API_KEY",
    api_host="https://api.knowledge2.ai",
    corpus_id="YOUR_CORPUS_ID",
    top_k=5,
)

nodes = retriever.retrieve("How does hybrid retrieval work?")

Authentication

Authenticate with one of the following credentials. An API key is the primary method for programmatic access.

| Method | Header | Typical use |
|---|---|---|
| API key | X-API-Key | Primary: programmatic access from apps and scripts |
| Bearer token | Authorization: Bearer <token> | Console / Auth0 session |
| Admin token | X-Admin-Token | Internal admin operations |

import os

from sdk import Knowledge2

# API key (recommended)
client = Knowledge2(api_key="k2_...")

# API key from the environment
client = Knowledge2(api_key=os.environ["K2_API_KEY"])

# Bearer token (console)
client = Knowledge2(bearer_token=os.environ["K2_BEARER_TOKEN"])

# Admin operations
client = Knowledge2(admin_token=os.environ["K2_ADMIN_TOKEN"])

# Dynamic auth with a token factory (OAuth/OIDC).
# fetch_oauth_token is your own function that returns a bearer token string.
client = Knowledge2(
    bearer_token_factory=fetch_oauth_token,
    token_cache_ttl=300,  # cache the token for 5 minutes (default)
)

# Check whether credentials are configured
if client.is_authenticated():
    print("Credentials are configured")
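The bearer_token_factory and token_cache_ttl parameters imply that the SDK calls your factory and reuses its token for up to token_cache_ttl seconds. The SDK's internal caching code is not shown in this README. The sketch below is a minimal, stand-alone illustration of that TTL-caching pattern; CachedTokenProvider and its clock parameter are hypothetical names.

```python
import time
from typing import Callable, List, Optional


class CachedTokenProvider:
    """Hypothetical helper: call a token factory at most once per TTL window."""

    def __init__(
        self,
        factory: Callable[[], str],
        ttl: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._factory = factory
        self._ttl = ttl
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            # Nothing cached yet, or the cached token is too old: fetch a new one.
            self._token = self._factory()
            self._expires_at = now + self._ttl
        return self._token


# Demo with a fake clock and a fake factory that numbers each token it issues.
issued: List[int] = []


def fake_factory() -> str:
    issued.append(len(issued) + 1)
    return f"token-{len(issued)}"


fake_now = [0.0]
provider = CachedTokenProvider(fake_factory, ttl=300.0, clock=lambda: fake_now[0])
first = provider.get()   # fetches token-1
fake_now[0] = 299.0
second = provider.get()  # still within the TTL, reuses token-1
fake_now[0] = 300.0
third = provider.get()   # TTL elapsed, fetches token-2
```

A version shared across threads would also need a lock around the check-and-refresh in get(), or two threads could both call the factory at expiry.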

Configuration

| Parameter | Default | Description |
|---|---|---|
| api_host | https://api.knowledge2.ai | Base URL of the API |
| api_key | None | API key for X-API-Key auth |
| org_id | Auto-detected from API key | Organization ID |
| bearer_token | None | Bearer token for console auth |
| admin_token | None | Admin token for X-Admin-Token auth |
| timeout | None (httpx default) | Request timeout in seconds, an httpx.Timeout, or a ClientTimeouts |
| max_retries | 2 | Maximum retries for transient errors (0 disables retries) |
| limits | None | ClientLimits for connection pool tuning |
| bearer_token_factory | None | Callable that returns a bearer token string |
| token_cache_ttl | 300.0 | Seconds to cache a factory-produced token |
| validate_responses | False | Enable Pydantic response validation |
| http_client | None | Pre-configured httpx.Client for custom transport |

from sdk import Knowledge2, ClientLimits

# Custom host and timeout
client = Knowledge2(
    api_host="https://api.example.com",
    api_key="k2_...",
    timeout=30.0,
)

# Connection pool limits
limits = ClientLimits(
    max_connections=50,
    max_keepalive_connections=20,
    keepalive_expiry=60.0,
)
client = Knowledge2(api_key="k2_...", limits=limits)

# Disable retries
client = Knowledge2(api_key="k2_...", max_retries=0)

from sdk import Knowledge2, ClientTimeouts

# Per-phase timeouts
client = Knowledge2(
    api_key="k2_...",
    timeout=ClientTimeouts(connect=5, read=120, write=30, pool=10),
)

Config Object

Use K2Config for environment-based or file-based configuration:

from sdk import Knowledge2

# From K2_* environment variables
client = Knowledge2.from_env()

# From a config file
client = Knowledge2.from_file("~/.k2/config.yaml")

# From a named profile
client = Knowledge2.from_profile("staging")
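This README does not list the exact K2_* variables that from_env() reads. K2_API_KEY, K2_BEARER_TOKEN, and K2_ADMIN_TOKEN appear in the authentication examples above; K2_API_HOST and K2_ORG_ID are assumed names. The hypothetical helper below shows how such variables could be mapped onto the constructor parameters from the configuration table, for example when you want to combine environment values with explicit overrides.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical mapping of K2_* variables to Knowledge2 constructor keyword arguments.
# K2_API_HOST and K2_ORG_ID are assumed variable names.
ENV_TO_KWARG = {
    "K2_API_KEY": "api_key",
    "K2_API_HOST": "api_host",
    "K2_ORG_ID": "org_id",
    "K2_BEARER_TOKEN": "bearer_token",
    "K2_ADMIN_TOKEN": "admin_token",
}


def kwargs_from_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect constructor kwargs from K2_* variables, skipping unset or empty ones."""
    env = os.environ if environ is None else environ
    return {kwarg: env[var] for var, kwarg in ENV_TO_KWARG.items() if env.get(var)}


# Unrelated variables (HOME) and empty values (K2_ORG_ID) are ignored.
example = kwargs_from_env({"K2_API_KEY": "k2_test", "K2_ORG_ID": "", "HOME": "/root"})
```

With a dict like this you could write Knowledge2(**kwargs_from_env(), timeout=30.0) to layer an explicit setting on top of the environment.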

Constructor Behavior

When org_id is omitted and api_key is provided, the sync client (Knowledge2(...)) calls GET /v1/auth/whoami during __init__ to auto-detect the organization ID. The async client performs the same lookup in AsyncKnowledge2.create(...).

To avoid this network call, pass org_id explicitly:

client = Knowledge2(api_key="k2_...", org_id="org_xxx")

Alternatively, pass lazy=True to skip auto-detection altogether:

client = Knowledge2(api_key="k2_...", lazy=True)
# org_id will be None: pass it explicitly or call get_whoami() later

Auth Requirements by Endpoint

Most endpoints work with an API key. Some require a bearer token or an admin token:

| Endpoint | Required auth |
|---|---|
| Most resources | API key (X-API-Key) |
| usage_summary, usage_by_corpus, usage_by_key | Bearer token |
| list_api_keys | Admin token |

Error Handling

All SDK exceptions inherit from Knowledge2Error. Use except Knowledge2Error as a catch-all.

Knowledge2Error (base)
├── APIError (HTTP 4xx/5xx)
│   ├── BadRequestError (400)
│   ├── AuthenticationError (401)
│   ├── PermissionDeniedError (403)
│   ├── NotFoundError (404)
│   ├── ConflictError (409)
│   ├── ValidationError (422)
│   ├── RateLimitError (429)
│   └── ServerError (500, 502, 503, 504)
├── APIConnectionError (network failures)
├── APITimeoutError (request timeout)
└── ConfirmationRequiredError (client-side deletion guard)

from sdk import Knowledge2
from sdk.errors import (
    Knowledge2Error,
    NotFoundError,
    ValidationError,
    RateLimitError,
)

client = Knowledge2(api_key="k2_...")

try:
    corpus = client.get_corpus("nonexistent")
except NotFoundError as e:
    print(f"Corpus not found: {e.message}")
except ValidationError as e:
    print(f"Validation failed: {e.details}")
except RateLimitError as e:
    if e.retryable:
        print(f"Rate limited; retry after {e.retry_after}s")
except Knowledge2Error as e:
    print(f"API error: {e.message}")

The retryable property indicates whether an operation can be retried. It is True for RateLimitError, ServerError, APIConnectionError, and APITimeoutError, and False for authentication, validation, and not-found errors.
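When you use with_raw_response, HTTP errors come back as status codes rather than exceptions, so it can help to map a status onto the hierarchy above yourself. The sketch below transcribes that table. How the SDK treats statuses the table does not list (for example 418 or 501) is not documented here; this sketch assumes a generic APIError marked non-retryable for them.

```python
from typing import Tuple

# Transcribed from the exception hierarchy above.
STATUS_TO_ERROR = {
    400: "BadRequestError",
    401: "AuthenticationError",
    403: "PermissionDeniedError",
    404: "NotFoundError",
    409: "ConflictError",
    422: "ValidationError",
    429: "RateLimitError",
    500: "ServerError",
    502: "ServerError",
    503: "ServerError",
    504: "ServerError",
}

# Classes documented as retryable (the last two have no HTTP status at all).
RETRYABLE = {"RateLimitError", "ServerError", "APIConnectionError", "APITimeoutError"}


def classify_status(status: int) -> Tuple[str, bool]:
    """Return (exception class name, retryable) for an HTTP error status code."""
    # Assumption: statuses missing from the table map to the generic APIError
    # and are treated as non-retryable.
    name = STATUS_TO_ERROR.get(status, "APIError")
    return name, name in RETRYABLE
```

For example, classify_status(raw.status_code) on a wrapped error response tells you which exception the regular call path would correspond to.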

Automatic Retries

The SDK retries transient failures automatically:

  • Retried: 5xx, 429, connection errors, timeouts
  • Not retried: 4xx (except 429)

Configure via max_retries (default 2). Backoff is exponential with jitter; for RateLimitError, the Retry-After header is respected when present.

# Default: 2 retries
client = Knowledge2(api_key="k2_...")

# Aggressive retries
client = Knowledge2(api_key="k2_...", max_retries=5)

# No retries
client = Knowledge2(api_key="k2_...", max_retries=0)
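The SDK's exact base delay, cap, and jitter formula are not documented here. The sketch below shows one common way to combine capped exponential backoff, "full jitter", and a Retry-After override, which matches the behavior described above in outline. The base=0.5 and cap=8.0 values and the backoff_delay name are assumptions.

```python
import random
from typing import List, Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,
    cap: float = 8.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to sleep before retry number `attempt` (0 for the first retry)."""
    if retry_after is not None:
        # Honour a server-provided Retry-After value (e.g. on HTTP 429).
        return max(0.0, retry_after)
    # Exponential growth of the upper bound: base, 2*base, 4*base, ... capped at `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: sleep a uniformly random time in [0, ceiling] so that many
    # clients retrying at once do not hit the server in lockstep.
    source = rng if rng is not None else random
    return source.uniform(0.0, ceiling)


# With max_retries=2, a request that keeps failing is retried twice.
rng = random.Random(42)
delays: List[float] = [backoff_delay(attempt, rng=rng) for attempt in range(2)]
```

The cap keeps late retries from waiting unboundedly, and the server's Retry-After hint takes precedence over the computed delay because the server knows when capacity will return.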

Per-Call Overrides

Use RequestOptions to override timeout and retry settings for a single call:

from sdk import Knowledge2, RequestOptions, ClientTimeouts

client = Knowledge2(api_key="k2_...")
corpus_id = "YOUR_CORPUS_ID"  # placeholder

# Longer timeout for a known-slow operation
opts = RequestOptions(timeout=ClientTimeouts(read=300))
results = client.search(corpus_id, "complex query", request_options=opts)

# Zero retries for a health check
opts = RequestOptions(max_retries=0)
client.get_corpus(corpus_id, request_options=opts)

# Passthrough tracing headers
opts = RequestOptions(passthrough_headers={"X-Request-ID": "abc-123"})
client.search(corpus_id, "query", request_options=opts)

Raw Response Access

Use with_raw_response to inspect the HTTP status code and headers alongside the parsed body. HTTP error responses are returned as a RawResponse instead of being raised, so you can inspect them without try/except:

raw = client.with_raw_response.list_corpora()
raw.status_code   # 200
raw.headers       # {"content-type": "application/json", ...}
raw.parsed        # Page[dict] — same as client.list_corpora()

# Error responses are also wrapped (not raised)
raw = client.with_raw_response.get_corpus("nonexistent-id")
if raw.status_code >= 400:
    print(f"Error {raw.status_code}: {raw.parsed}")

# Async (inside an async function)
raw = await async_client.with_raw_response.search(corpus_id, "query")

Note: Non-HTTP errors (APIConnectionError, APITimeoutError) are still raised since there is no HTTP response to wrap.

Custom HTTP Client

Inject a pre-configured httpx.Client for custom TLS, proxies, or instrumentation:

import httpx
from sdk import AsyncKnowledge2, Knowledge2

# Custom TLS / proxy
http_client = httpx.Client(
    verify="/path/to/custom-ca.pem",
    proxy="http://corporate-proxy:8080",
)
client = Knowledge2(api_key="k2_...", http_client=http_client)

# Async variant
async_http = httpx.AsyncClient(verify="/path/to/custom-ca.pem")
async_client = AsyncKnowledge2(api_key="k2_...", http_client=async_http)

Ownership: when you supply http_client, the SDK does not close it; close it yourself when you are done. The timeout and limits parameters are ignored in this case, so configure them on your httpx client directly.

Async Client

AsyncKnowledge2 provides native async support using httpx.AsyncClient:

import asyncio

from sdk import AsyncKnowledge2


async def main(corpus_id: str) -> None:
    # Use as an async context manager
    async with AsyncKnowledge2(api_key="k2_...") as client:
        results = await client.search(corpus_id, "query")

    # With auto-detected org_id
    client = await AsyncKnowledge2.create(api_key="k2_...")

    # From environment
    client = await AsyncKnowledge2.from_env()


asyncio.run(main("YOUR_CORPUS_ID"))

When to use sync vs async

| Use case | Recommendation |
|---|---|
| Scripts, CLIs, Jupyter notebooks | Knowledge2 (sync) |
| FastAPI, aiohttp, async frameworks | AsyncKnowledge2 (async) |
| Agentic workloads (LangChain, LlamaIndex) | AsyncKnowledge2 for concurrency |
| Thread-based parallelism | Knowledge2 (sync) with threads |
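The concurrency benefit of the async client comes from keeping several requests in flight at once while bounding how many run together. A minimal sketch of that pattern, using only asyncio, is shown below. gather_limited is a hypothetical helper, and fake_search is a stub standing in for a real async search call, so no network is involved.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence


async def gather_limited(
    queries: Sequence[str],
    search: Callable[[str], Awaitable[Any]],
    limit: int = 4,
) -> List[Any]:
    """Run search(query) for every query, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(query: str) -> Any:
        async with semaphore:  # wait for a free slot before calling search
            return await search(query)

    # asyncio.gather returns results in the same order as the input queries.
    return list(await asyncio.gather(*(run_one(q) for q in queries)))


# Hypothetical stub standing in for a real async search call.
async def fake_search(query: str) -> dict:
    await asyncio.sleep(0.01)
    return {"results": [{"text": query.upper(), "score": 1.0}]}


responses = asyncio.run(gather_limited(["a", "b", "c"], fake_search, limit=2))
```

With a real client you might pass lambda q: client.search(corpus_id, q) as search inside an async with AsyncKnowledge2(...) block. For many queries against a single corpus, search_batch may be the simpler option.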

Pagination

Iterator methods (iter_*) yield items lazily across pages:

for corpus in client.iter_corpora(limit=50):
    print(corpus["name"])

for doc in client.iter_documents(corpus_id, limit=100):
    process(doc)

Manual pagination with list_* returns Page[T]:

page = client.list_corpora(limit=50)
print(f"Total: {page.total}, got {len(page)} items")
for corpus in page:
    print(corpus["name"])
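This README does not state whether list_* pages by offset or by cursor. Assuming offset-based pages that report a total count (Page[T] exposes total and len() above), an iter_* helper reduces to a generator loop like the sketch below. iter_pages, fetch_page, and the items/total keys are hypothetical.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_pages(
    fetch_page: Callable[[int, int], Dict[str, Any]], limit: int = 50
) -> Iterator[Any]:
    """Lazily yield items from an offset-paginated listing until `total` is reached."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)  # one request per page
        items = page["items"]
        yield from items
        offset += len(items)
        # Stop on an empty page or once every item the server reported has been seen.
        if not items or offset >= page["total"]:
            return


# Hypothetical in-memory listing with 7 items, standing in for a list_* call.
DATA = [f"corpus-{i}" for i in range(7)]
calls: List[int] = []


def fake_list(limit: int, offset: int) -> Dict[str, Any]:
    calls.append(offset)
    return {"items": DATA[offset:offset + limit], "total": len(DATA)}


names = list(iter_pages(fake_list, limit=3))
```

Because the helper is a generator, no request is made until you start iterating, and breaking out of the loop early skips the remaining pages.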

Sub-Client Namespaces

Access methods through typed namespaces for better IDE autocomplete:

# Equivalent calls:
client.upload_document(corpus_id, raw_text="...")
client.documents.upload(corpus_id, raw_text="...")

# Available namespaces:
client.documents.*     # Document operations
client.corpora.*       # Corpus operations
client.search_ns.*     # Search operations
client.models_ns.*     # Model operations
client.jobs.*          # Job operations
client.training_ns.*   # Training operations
client.deployments.*   # Deployment operations
client.agents.*        # Agent operations
client.feeds.*         # Feed operations
client.pipelines.*     # Pipeline operations
client.auth.*          # Auth operations

Resource Overview

| Resource | Methods |
|---|---|
| Organisations | create_org |
| Projects | create_project, list_projects, iter_projects |
| Corpora | create_corpus, list_corpora, iter_corpora, get_corpus, get_corpus_status, update_corpus, delete_corpus, list_corpus_models, iter_corpus_models |
| Documents | upload_document, upload_documents_batch, upload_files_batch, ingest_urls, ingest_manifest, list_documents, iter_documents, get_document, delete_document, list_chunks, iter_chunks |
| Indexes | build_indexes |
| Search | search, search_batch, search_generate, embeddings, create_feedback |
| Training | build_training_data, list_training_data, iter_training_data, list_tuning_runs, iter_tuning_runs, create_tuning_run, build_and_start_tuning_run, get_tuning_run, get_tuning_run_logs, get_eval_run, promote_tuning_run |
| Deployments | create_deployment, list_deployments, iter_deployments |
| Jobs | get_job, list_jobs, iter_jobs |
| Models | list_models, iter_models, delete_model |
| Auth | create_api_key, list_api_keys, get_whoami |
| Agents | create_agent, get_agent, list_agents, update_agent, delete_agent, activate_agent, archive_agent, chat_with_agent, run_agent, list_agent_runs, list_agent_models, list_task_types, create_subscription, list_subscriptions, delete_subscription |
| Feeds | create_feed, list_feeds, get_feed, update_feed, delete_feed, run_feed |
| Pipelines | create_pipeline_spec, get_pipeline_spec, list_pipeline_specs, iter_pipeline_specs, update_pipeline_spec, delete_pipeline_spec, get_pipeline_spec_schema, dry_run_pipeline_spec, apply_pipeline_spec, archive_pipeline_spec, unarchive_pipeline_spec, create_pipeline_spec_draft, get_pipeline_spec_draft, activate_pipeline_spec_draft, discard_pipeline_spec_draft, get_pipeline_spec_graph, diff_pipeline_spec, refresh_pipeline_spec |
| Audit | list_audit_logs, iter_audit_logs |
| Usage | usage_summary, usage_by_corpus, usage_by_key |
| Console | console_me, console_bootstrap, console_summary, console_projects, console_get_project, console_update_project, console_get_org, console_update_org, console_list_team, console_list_invites, console_create_invite, console_accept_invite, console_update_member_role, console_remove_member, console_list_api_keys, console_create_api_key, console_revoke_api_key |
| Onboarding | get_onboarding_status, get_analysis, upload_gold_labels, upload_gold_labels_file, list_gold_labels, iter_gold_labels, list_synthetic_batches, get_synthetic_batch, list_evaluations, get_evaluation, get_evaluation_report, get_summarization_status, get_document_summary |

Search Response Shapes

search() and search_batch() return different response structures:

# Single search — results are top-level
resp = client.search(corpus_id, "query")
resp["results"]   # list[SearchResult]
resp["meta"]      # SearchMeta (cold_start, warnings)

# Batch search — each query gets its own SearchResponse
resp = client.search_batch(corpus_id, queries=["q1", "q2"])
resp["responses"]           # list[SearchResponse]
resp["responses"][0]["results"]  # list[SearchResult] for q1

responses is a list of SearchResponse objects, one per query, each with its own results and meta. The nesting is intentional: it mirrors the one-to-one mapping between queries and responses.
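Because each query has its own SearchResponse, a common next step is to pair queries with their results. The sketch below works on plain dicts in the documented shape. It assumes the responses are returned in the same order as the submitted queries, which the one-to-one mapping suggests but this README does not state outright. results_by_query is a hypothetical helper, and the batch dict is hand-written sample data.

```python
from typing import Any, Dict, List, Sequence, Tuple


def results_by_query(
    queries: Sequence[str], batch_resp: Dict[str, Any]
) -> List[Tuple[str, List[Dict[str, Any]]]]:
    """Pair each query with the results list from its own SearchResponse."""
    responses = batch_resp["responses"]
    if len(responses) != len(queries):
        raise ValueError(f"expected {len(queries)} responses, got {len(responses)}")
    # Assumes responses come back in the same order as the submitted queries.
    return [(query, resp["results"]) for query, resp in zip(queries, responses)]


# Hand-written dict in the documented shape (not real API output).
batch = {
    "responses": [
        {"results": [{"score": 0.9, "text": "hybrid retrieval"}], "meta": {}},
        {"results": [], "meta": {"warnings": []}},
    ]
}
pairs = results_by_query(["q1", "q2"], batch)
```

Returning a list of pairs rather than a dict keeps duplicate queries in a batch from silently overwriting each other.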

Preview Resources

The following resources require feature flags to be enabled on the server. Calling their methods against an environment where the flag is off raises NotFoundError (404). A RuntimeWarning is emitted on first use.

| Resource | Feature flag |
|---|---|
| Agents | knowledge_agents_enabled |
| Feeds | knowledge_agents_enabled |
| Pipelines | pipelines_enabled |
| A2A | a2a_enabled |

To suppress the warning:

import warnings
warnings.filterwarnings("ignore", message=".*preview feature.*", category=RuntimeWarning)
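warnings.filterwarnings treats message as a regular expression that must match the start of the warning text, so the leading .* in the pattern above is what lets it match a message that names the feature first. The sketch below checks the filter against a stand-in warning. The message text in emit_preview_warning is an assumption, not the SDK's actual wording. warnings.catch_warnings scopes the filter so it does not leak into the rest of the process, which is handy in tests.

```python
import warnings


def emit_preview_warning() -> None:
    # Hypothetical stand-in for the SDK's first-use warning; the wording is assumed.
    warnings.warn("Agents is a preview feature and may change", RuntimeWarning, stacklevel=2)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning by default...
    # ...but ignore preview notices (filterwarnings inserts this rule ahead of "always").
    warnings.filterwarnings("ignore", message=".*preview feature.*", category=RuntimeWarning)
    emit_preview_warning()
    warnings.warn("something else", RuntimeWarning)

recorded = [str(w.message) for w in caught]
```

Only the unrelated warning is recorded; the preview notice is dropped by the more specific filter.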

Debug Logging

from sdk import Knowledge2

Knowledge2.set_debug(True)  # Log requests, responses, retries to stderr
client = Knowledge2(api_key="k2_...")
# ... use client ...

Alternatively, configure the knowledge2 logger directly:

import logging
logging.getLogger("knowledge2").setLevel(logging.DEBUG)
logging.getLogger("knowledge2").addHandler(logging.StreamHandler())

Auth headers are redacted in logs.
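The SDK redacts auth headers in its own debug output. If you also log request metadata yourself (for example, headers you assemble for passthrough_headers), a small helper like the one below keeps credentials out of your logs. The header names come from the authentication table; the redact_headers name and the [REDACTED] placeholder are illustrative, not part of the SDK.

```python
from typing import Dict, Mapping

# Credential-bearing headers named in the Authentication table.
SENSITIVE_HEADERS = frozenset({"x-api-key", "authorization", "x-admin-token"})


def redact_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` that is safe to log: credential values are masked."""
    return {
        # HTTP header names are case-insensitive, so compare lowercased names.
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


safe = redact_headers({
    "X-API-Key": "k2_secret",
    "Authorization": "Bearer abc",
    "X-Request-ID": "abc-123",
})
```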

Examples

Runnable examples are in the examples/ directory:

# End-to-end lifecycle (ingest, index, tune, search)
python -m sdk.examples.e2e_lifecycle

Version

from sdk import __version__
print(__version__)  # e.g. "0.4.0"


Download files

Source Distribution

knowledge2-0.4.0.tar.gz (125.3 kB)

Uploaded Source

Built Distribution

knowledge2-0.4.0-py3-none-any.whl (173.9 kB)

Uploaded Python 3

File details

Details for the file knowledge2-0.4.0.tar.gz.

File metadata

  • Download URL: knowledge2-0.4.0.tar.gz
  • Upload date:
  • Size: 125.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for knowledge2-0.4.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 01492b049805b764e0c80db77052918133ac9420397abcc7f8bfbd4a30cde2e9 |
| MD5 | 13d1280faeeafdf5169ccd2ba94112be |
| BLAKE2b-256 | b71fc94b0856501c45c1565f1810214674d0ab65b07087840cfb14c24bfa3b4f |

Provenance

The following attestation bundles were made for knowledge2-0.4.0.tar.gz:

Publisher: pypi-release.yml on knowledge2-ai/knowledge2-python-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file knowledge2-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: knowledge2-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 173.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for knowledge2-0.4.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6787e1f16f17f9aa9b890b2e0f2a98e3004510b3979106c88b1ed0427dafeebe |
| MD5 | cf57157922b403f5125d01ddc6216c65 |
| BLAKE2b-256 | aa650c6d15abc46beac0a97065d087f1d288731e52e9526e3973bd8a3be84de9 |

Provenance

The following attestation bundles were made for knowledge2-0.4.0-py3-none-any.whl:

Publisher: pypi-release.yml on knowledge2-ai/knowledge2-python-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.