KnowledgeSDK Python SDK — Extract, classify and search web knowledge

KnowledgeSDK Python SDK

Official Python client for the KnowledgeSDK API — extract, classify, scrape, screenshot, and search web knowledge programmatically.

Installation

pip install knowledgesdk

Quick Start

from knowledgesdk import KnowledgeSDK

ks = KnowledgeSDK("sk_ks_your_key_here")
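Hard-coding the key is fine for a first test, but in practice it is usually read from the environment. The helper below is a minimal sketch of that pattern: the variable name `KNOWLEDGESDK_API_KEY` and the function `load_api_key` are hypothetical and not part of the SDK, and the `sk_ks_` prefix check is based only on the Configuration table.

```python
import os


def load_api_key(env_var: str = "KNOWLEDGESDK_API_KEY") -> str:
    """Return the API key from the environment, or raise if it looks wrong.

    The env var name is a hypothetical convention, not something the SDK reads.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    if not key.startswith("sk_ks_"):
        raise RuntimeError(f"{env_var} does not look like a KnowledgeSDK key")
    return key
```

With this in place, the client could be created as `KnowledgeSDK(load_api_key())`.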

Usage

Extract

Run a full knowledge extraction on a website (synchronous):

result = ks.extract.run("https://stripe.com")

print(result.business.business_name)
print(result.business.industry_sector)
print(result.pages_scraped)

for item in result.knowledge_items:
    print(item.title, item.content)

Run an asynchronous extraction with a callback:

job = ks.extract.run_async(
    "https://stripe.com",
    max_pages=20,
    callback_url="https://myapp.com/webhook"
)

print(job.job_id)   # e.g. "job_abc123"
print(job.status)   # e.g. "PENDING"

Scrape

Scrape a single web page and get its Markdown content:

page = ks.scrape.run("https://docs.stripe.com/get-started")

print(page.title)
print(page.markdown)
print(page.links)

Classify

Classify a business from its website:

biz = ks.classify.run("https://stripe.com")

print(biz.business_name)
print(biz.business_type)
print(biz.industry_sector)
print(biz.target_audience)
print(biz.confidence_score)

Screenshot

Capture a screenshot of a web page:

import base64

shot = ks.screenshot.run("https://stripe.com")

# shot.screenshot is a base64-encoded PNG string
image_bytes = base64.b64decode(shot.screenshot)
with open("screenshot.png", "wb") as f:
    f.write(image_bytes)
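Since the screenshot arrives as a base64 string, it can be worth checking that the decoded bytes really are a PNG before writing them. The sketch below does that with the standard 8-byte PNG signature; `save_png` is a hypothetical helper name, not an SDK function.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def save_png(b64_data: str, path: str) -> int:
    """Decode base64 PNG data, verify the signature, and write it to disk.

    Returns the number of bytes written.
    """
    image_bytes = base64.b64decode(b64_data)
    if not image_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return Path(path).write_bytes(image_bytes)
```

Usage would look like `save_png(shot.screenshot, "screenshot.png")`.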

Sitemap

Fetch the sitemap for a website:

site_map = ks.sitemap.run("https://stripe.com")

print(site_map.count)
for url in site_map.urls:
    print(url)
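A sitemap often contains far more URLs than are needed. The following sketch narrows a URL list to one section of a site, assuming the entries in `site_map.urls` are plain strings; `filter_by_path` is a hypothetical helper, not part of the SDK.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def filter_by_path(urls: Iterable[str], prefix: str) -> List[str]:
    """Keep URLs whose path starts with `prefix`, dropping duplicates in order."""
    seen = set()
    kept = []
    for url in urls:
        if url in seen:
            continue
        seen.add(url)
        if urlparse(url).path.startswith(prefix):
            kept.append(url)
    return kept
```

For example, `filter_by_path(site_map.urls, "/docs/")` would keep only documentation pages.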

Search

Search the extracted knowledge base:

results = ks.search.run("pricing plans", limit=5)

print(f"Found {results.total} results")
for hit in results.hits:
    print(hit.title, hit.score)
    print(hit.content)
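Search hits can be post-filtered on the client side. The sketch below keeps only hits above a score threshold, assuming `score` is numeric and that a higher score means a more relevant hit (neither is stated above). `best_hits` is a hypothetical helper, and the sample objects are stand-ins that only mimic the `title` and `score` attributes.

```python
from types import SimpleNamespace
from typing import Iterable, List


def best_hits(hits: Iterable, min_score: float) -> List:
    """Return hits with score >= min_score, highest score first."""
    kept = [h for h in hits if h.score >= min_score]
    return sorted(kept, key=lambda h: h.score, reverse=True)


# Stand-in objects with the same attribute names as the hits above.
sample = [
    SimpleNamespace(title="Pricing", score=0.91),
    SimpleNamespace(title="Blog", score=0.35),
    SimpleNamespace(title="Plans", score=0.78),
]
print([h.title for h in best_hits(sample, 0.5)])  # ['Pricing', 'Plans']
```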

Webhooks

# Create a webhook
wh = ks.webhooks.create(
    url="https://myapp.com/hook",
    events=["EXTRACTION_COMPLETED", "JOB_FAILED"],
    display_name="My App Webhook"
)
print(wh.id)    # e.g. "weh_xxx"
print(wh.token) # signing token

# List all webhooks
all_webhooks = ks.webhooks.list()
for w in all_webhooks:
    print(w.id, w.url, w.status)

# Send a test event to a webhook
ks.webhooks.test("weh_xxx")

# Delete a webhook
ks.webhooks.delete("weh_xxx")
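On the receiving side, incoming events usually need to be routed by type. The delivery payload format is not documented here, so the sketch below leaves extracting the event name to the caller and only shows the routing step. The event names are the two subscribed to above; the handler functions and `dispatch` are hypothetical.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], str]


def on_completed(payload: Dict[str, Any]) -> str:
    # Placeholder: a real handler might store the extraction result.
    return "completed"


def on_failed(payload: Dict[str, Any]) -> str:
    # Placeholder: a real handler might log the failure or alert someone.
    return "failed"


# Event names are the ones subscribed to above; the handlers are hypothetical.
HANDLERS: Dict[str, Handler] = {
    "EXTRACTION_COMPLETED": on_completed,
    "JOB_FAILED": on_failed,
}


def dispatch(event_name: str, payload: Dict[str, Any]) -> str:
    """Route an event to its handler; unknown events are ignored."""
    handler = HANDLERS.get(event_name)
    if handler is None:
        return "ignored"
    return handler(payload)
```

Returning "ignored" for unknown events keeps the receiver tolerant of event types added later.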

Jobs

Retrieve a job by ID:

job = ks.jobs.get("job_xxx")
print(job.status)   # PENDING | RUNNING | COMPLETED | FAILED
print(job.progress) # 0–100
print(job.result)

Poll until a job completes (blocking):

completed = ks.jobs.poll("job_xxx", interval_sec=5, timeout_sec=300)
print(completed.result)
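For readers who want custom behaviour between polls (logging progress, for instance), an equivalent loop can be built on top of `ks.jobs.get`. This is a sketch of the general pattern, not the SDK's own implementation of `poll`; `poll_until_done` is a hypothetical helper, and treating `COMPLETED` and `FAILED` as the only terminal statuses is an assumption based on the status list above.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def poll_until_done(
    get_job: Callable[[str], Any],
    job_id: str,
    interval_sec: float = 5,
    timeout_sec: float = 300,
) -> Any:
    """Call get_job(job_id) until the job reaches a terminal status.

    Raises the built-in TimeoutError if the deadline would pass first.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        job = get_job(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        # Stop early if the next sleep would run past the deadline.
        if time.monotonic() + interval_sec > deadline:
            raise TimeoutError(f"job {job_id} still {job.status} after {timeout_sec}s")
        time.sleep(interval_sec)
```

It would be called as `poll_until_done(ks.jobs.get, "job_xxx")`. Note that a `FAILED` job is returned rather than raised, so the caller should still check `status`.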

Configuration

Parameter     Default                        Description
api_key       required                       API key starting with sk_ks_
base_url      https://api.knowledgesdk.com   Override via KNOWLEDGESDK_BASE_URL env var
timeout       30000                          Request timeout in milliseconds
max_retries   5                              Max retries with exponential backoff
debug         False                          Enable request/response logging
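To give a feel for what "exponential backoff" means for `max_retries`, the sketch below computes one possible delay schedule. The base delay, growth factor, and cap are assumptions chosen for illustration; the SDK's actual schedule is not documented here.

```python
from typing import List


def backoff_delays(
    max_retries: int = 5,
    base_sec: float = 0.5,
    factor: float = 2.0,
    cap_sec: float = 30.0,
) -> List[float]:
    """Delay before each retry: base * factor**attempt, capped at cap_sec."""
    return [min(cap_sec, base_sec * factor ** attempt) for attempt in range(max_retries)]


print(backoff_delays())  # [0.5, 1.0, 2.0, 4.0, 8.0]
```

Under these assumed values, five retries add at most 15.5 seconds of waiting on top of the request timeouts themselves.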

Environment Variables

export KNOWLEDGESDK_BASE_URL="https://api.knowledgesdk.com"
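The same lookup can be mirrored in application code, for example to log which endpoint is in use. This is a sketch only: the precedence shown (explicit argument, then environment variable, then default) is an assumption, and `resolve_base_url` is a hypothetical helper rather than an SDK function.

```python
import os
from typing import Optional

DEFAULT_BASE_URL = "https://api.knowledgesdk.com"


def resolve_base_url(explicit: Optional[str] = None) -> str:
    """Pick the base URL: explicit argument, then env var, then the default."""
    url = explicit or os.environ.get("KNOWLEDGESDK_BASE_URL") or DEFAULT_BASE_URL
    # Strip trailing slashes so paths can be appended consistently.
    return url.rstrip("/")
```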

Debug Mode

ks = KnowledgeSDK("sk_ks_your_key", debug=True)

# Or toggle at runtime
ks.set_debug_mode(True)

Custom Headers

ks.set_header("X-Custom-Header", "value")
ks.set_headers({"X-Header-A": "a", "X-Header-B": "b"})

Error Handling

from knowledgesdk import (
    KnowledgeSDK,
    AuthenticationError,
    APIError,
    RateLimitError,
    NetworkError,
    TimeoutError as KSTimeoutError,  # alias so the built-in TimeoutError is not shadowed
)

ks = KnowledgeSDK("sk_ks_your_key")

try:
    result = ks.extract.run("https://stripe.com")
except AuthenticationError as e:
    print(f"Auth error: {e.message}")
except RateLimitError as e:
    print(f"Rate limited: {e.message}")
except APIError as e:
    print(f"API error {e.status_code}: {e.message}")
except NetworkError as e:
    print(f"Network error: {e.message}")
except KSTimeoutError as e:
    print(f"Request timed out: {e.message}")

Type Reference

All response objects are Pydantic models and are fully typed.

Type                     Description
ExtractResult            Full extraction with business and knowledge items
BusinessClassification   Business name, type, industry, audience, etc.
KnowledgeItem            A single knowledge article extracted from a page
ScrapeResult             Markdown content, title, description, links
ScreenshotResult         Base64 PNG screenshot
SitemapResult            List of URLs from the site's sitemap
SearchResult             Search hits, total count, query
SearchHit                Individual search result with score
AsyncJobRef              Job ID and initial status for async operations
JobResult                Full job status, progress, result, and error
WebhookFull              Webhook ID, URL, events, status, token

Requirements

  • Python >= 3.8
  • requests >= 2.31.0
  • pydantic >= 2.0.0

License

MIT
