KnowledgeSDK Python SDK — Extract, classify and search web knowledge
KnowledgeSDK Python SDK
Official Python client for the KnowledgeSDK API — extract, classify, scrape, screenshot, and search web knowledge programmatically.
Installation
pip install knowledgesdk
Quick Start
from knowledgesdk import KnowledgeSDK
ks = KnowledgeSDK("sk_ks_your_key_here")
Usage
Extract
Run a full knowledge extraction on a website (synchronous):
result = ks.extract.run("https://stripe.com")
print(result.business.business_name)
print(result.business.industry_sector)
print(result.pages_scraped)
for item in result.knowledge_items:
    print(item.title, item.content)
Run an asynchronous extraction with a callback:
job = ks.extract.run_async(
    "https://stripe.com",
    max_pages=20,
    callback_url="https://myapp.com/webhook"
)
print(job.job_id) # e.g. "job_abc123"
print(job.status) # e.g. "PENDING"
Scrape
Scrape a single web page and get its Markdown content:
page = ks.scrape.run("https://docs.stripe.com/get-started")
print(page.title)
print(page.markdown)
print(page.links)
Classify
Classify a business from its website:
biz = ks.classify.run("https://stripe.com")
print(biz.business_name)
print(biz.business_type)
print(biz.industry_sector)
print(biz.target_audience)
print(biz.confidence_score)
Screenshot
Capture a screenshot of a web page:
import base64
shot = ks.screenshot.run("https://stripe.com")
# shot.screenshot is a base64-encoded PNG string
image_bytes = base64.b64decode(shot.screenshot)
with open("screenshot.png", "wb") as f:
    f.write(image_bytes)
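Before writing the decoded bytes to disk, it can be worth checking that they really are a PNG. The sketch below is independent of the SDK: `decode_png` is a hypothetical helper name, and it relies only on what the comment above states, namely that the payload is a base64-encoded PNG string. It uses strict base64 validation, so a payload containing line breaks would be rejected.

```python
import base64
import binascii

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png(b64_data: str) -> bytes:
    """Decode a base64 string and verify that it looks like a PNG image."""
    try:
        raw = base64.b64decode(b64_data, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return raw
```

With this helper, `decode_png(shot.screenshot)` could stand in for the bare `base64.b64decode` call and fail early on a malformed payload.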
Sitemap
Fetch the sitemap for a website:
site_map = ks.sitemap.run("https://stripe.com")
print(site_map.count)
for url in site_map.urls:
    print(url)
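A sitemap can be large, so a common next step is to narrow it down to one section of the site before scraping. The following is a minimal sketch that assumes the URLs are plain strings; `filter_by_path` is a hypothetical helper, not part of the SDK, and the example URLs are placeholders.

```python
from urllib.parse import urlparse


def filter_by_path(urls, prefix):
    """Return the URLs whose path component starts with the given prefix."""
    return [u for u in urls if urlparse(u).path.startswith(prefix)]


# Placeholder data standing in for site_map.urls.
urls = [
    "https://example.com/",
    "https://example.com/docs/get-started",
    "https://example.com/docs/api",
    "https://example.com/blog/launch",
]
docs_urls = filter_by_path(urls, "/docs/")
print(docs_urls)
```

Matching on the parsed path rather than on the raw string avoids false positives when the prefix happens to appear in a query string or host name.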
Search
Search the extracted knowledge base:
results = ks.search.run("pricing plans", limit=5)
print(f"Found {results.total} results")
for hit in results.hits:
    print(hit.title, hit.score)
    print(hit.content)
Webhooks
# Create a webhook
wh = ks.webhooks.create(
    url="https://myapp.com/hook",
    events=["EXTRACTION_COMPLETED", "JOB_FAILED"],
    display_name="My App Webhook"
)
print(wh.id) # e.g. "weh_xxx"
print(wh.token) # signing token
# List all webhooks
all_webhooks = ks.webhooks.list()
for w in all_webhooks:
    print(w.id, w.url, w.status)
# Send a test event to a webhook
ks.webhooks.test("weh_xxx")
# Delete a webhook
ks.webhooks.delete("weh_xxx")
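This README does not describe how the signing token is applied to webhook deliveries. Purely as an illustration, the sketch below assumes a common scheme: the sender computes an HMAC-SHA256 of the raw request body with the token as the key and sends the hex digest alongside the request. The scheme, the function names, and the way the signature reaches your handler are all assumptions; check the API documentation for the actual mechanism.

```python
import hashlib
import hmac


def sign_payload(token: str, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 digest of the raw request body (assumed scheme)."""
    return hmac.new(token.encode("utf-8"), body, hashlib.sha256).hexdigest()


def is_valid_signature(token: str, body: bytes, received_signature: str) -> bool:
    """Compare the expected and received signatures in constant time."""
    expected = sign_payload(token, body)
    return hmac.compare_digest(
        expected.encode("utf-8"), received_signature.encode("utf-8")
    )
```

Whatever the real scheme turns out to be, comparing with `hmac.compare_digest` instead of `==` is the part worth keeping, since it avoids leaking information through timing differences.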
Jobs
Retrieve a job by ID:
job = ks.jobs.get("job_xxx")
print(job.status) # PENDING | RUNNING | COMPLETED | FAILED
print(job.progress) # 0–100
print(job.result)
Poll until a job completes (blocking):
completed = ks.jobs.poll("job_xxx", interval_sec=5, timeout_sec=300)
print(completed.result)
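To make the `interval_sec` and `timeout_sec` semantics concrete, here is a generic polling loop. It is a sketch of the general technique, not the SDK's implementation: `poll_until`, `fetch`, and `is_done` are hypothetical names, and it raises Python's built-in `TimeoutError` rather than the SDK's exception of the same name.

```python
import time


def poll_until(fetch, is_done, interval_sec=5.0, timeout_sec=300.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until is_done(result) is true or time runs out."""
    deadline = clock() + timeout_sec
    while True:
        result = fetch()
        if is_done(result):
            return result
        # Give up if the next wait would carry us past the deadline.
        if clock() + interval_sec > deadline:
            raise TimeoutError(f"no terminal state within {timeout_sec} seconds")
        sleep(interval_sec)


# Simulated job that reaches a terminal state on the third check.
states = iter(["PENDING", "RUNNING", "COMPLETED"])
final = poll_until(
    fetch=lambda: next(states),
    is_done=lambda status: status in ("COMPLETED", "FAILED"),
    interval_sec=0,
    timeout_sec=1,
)
print(final)
```

Using a monotonic clock for the deadline keeps the timeout stable even if the system clock is adjusted while the loop is running.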
Configuration
| Parameter | Default | Description |
|---|---|---|
| api_key | required | API key starting with sk_ks_ |
| base_url | https://api.knowledgesdk.com | Override via KNOWLEDGESDK_BASE_URL env var |
| timeout | 30000 | Request timeout in milliseconds |
| max_retries | 5 | Max retries with exponential backoff |
| debug | False | Enable request/response logging |
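For readers unfamiliar with exponential backoff, the sketch below shows how the wait before each retry typically grows. The base delay, growth factor, and cap are illustrative assumptions; this README does not state the values the SDK uses, and `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(max_retries=5, base_delay=0.5, factor=2.0, max_delay=30.0):
    """Return the wait time in seconds before each retry attempt."""
    return [min(max_delay, base_delay * factor ** attempt)
            for attempt in range(max_retries)]


print(backoff_delays())  # [0.5, 1.0, 2.0, 4.0, 8.0]
```

Real clients usually add a small random jitter to each delay so that many clients recovering from the same outage do not retry in lockstep.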
Environment Variables
export KNOWLEDGESDK_BASE_URL="https://api.knowledgesdk.com"
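One plausible way such an override can be resolved is shown below. The precedence order (explicit argument, then environment variable, then default) is an assumption made for illustration, as is the helper name `resolve_base_url`; the SDK's actual behaviour may differ.

```python
import os

DEFAULT_BASE_URL = "https://api.knowledgesdk.com"


def resolve_base_url(explicit=None, environ=os.environ):
    """Pick the base URL: explicit argument, then env var, then the default."""
    url = explicit or environ.get("KNOWLEDGESDK_BASE_URL") or DEFAULT_BASE_URL
    # Strip a trailing slash so paths can be appended consistently.
    return url.rstrip("/")
```

Passing the environment as a parameter keeps the function easy to test without mutating `os.environ`.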
Debug Mode
ks = KnowledgeSDK("sk_ks_your_key", debug=True)
# Or toggle at runtime
ks.set_debug_mode(True)
Custom Headers
ks.set_header("X-Custom-Header", "value")
ks.set_headers({"X-Header-A": "a", "X-Header-B": "b"})
Error Handling
from knowledgesdk import (
    KnowledgeSDK,
    AuthenticationError,
    APIError,
    RateLimitError,
    NetworkError,
    TimeoutError,
)
ks = KnowledgeSDK("sk_ks_your_key")
try:
    result = ks.extract.run("https://stripe.com")
except AuthenticationError as e:
    print(f"Auth error: {e.message}")
except RateLimitError as e:
    print(f"Rate limited: {e.message}")
except APIError as e:
    print(f"API error {e.status_code}: {e.message}")
except NetworkError as e:
    print(f"Network error: {e.message}")
except TimeoutError as e:
    print(f"Request timed out: {e.message}")
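The order of the `except` clauses above matters if the more specific errors derive from `APIError`. This README does not document the exception hierarchy, so the sketch below uses stand-in classes, with `RateLimitError` assumed to subclass `APIError`, to show why the specific handler should come first.

```python
class APIError(Exception):
    """Stand-in base class; the real SDK hierarchy may differ."""

    def __init__(self, message, status_code=None):
        super().__init__(message)
        self.message = message
        self.status_code = status_code


class RateLimitError(APIError):
    """Stand-in subclass, assumed here to derive from APIError."""


def describe(exc):
    """Route an exception to a label, most specific handler first."""
    try:
        raise exc
    except RateLimitError as e:
        return f"rate limited: {e.message}"
    except APIError as e:
        return f"api error {e.status_code}: {e.message}"


print(describe(RateLimitError("slow down", 429)))  # rate limited: slow down
print(describe(APIError("server error", 500)))     # api error 500: server error
```

If the `APIError` clause were listed first, it would also catch `RateLimitError` under this assumed hierarchy and the rate-limit branch would never run.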
Type Reference
All response objects are Pydantic models and are fully typed.
| Type | Description |
|---|---|
| ExtractResult | Full extraction with business and knowledge items |
| BusinessClassification | Business name, type, industry, audience, etc. |
| KnowledgeItem | A single knowledge article extracted from a page |
| ScrapeResult | Markdown content, title, description, links |
| ScreenshotResult | Base64 PNG screenshot |
| SitemapResult | List of URLs from the site's sitemap |
| SearchResult | Search hits, total count, query |
| SearchHit | Individual search result with score |
| AsyncJobRef | Job ID and initial status for async operations |
| JobResult | Full job status, progress, result, and error |
| WebhookFull | Webhook ID, URL, events, status, token |
Requirements
- Python >= 3.8
- requests >= 2.31.0
- pydantic >= 2.0.0
License
MIT