Official Python SDK for the GhostCrawl local orchestration API.

ghostcrawl — Python SDK

The official Python client for the GhostCrawl API. Collect web data at scale — scrape, crawl, search, extract structured data, manage browser sessions, and automate the full data-collection pipeline.

Install

pip install ghostcrawl

Requires Python 3.10+. Runtime dependencies: httpx>=0.28.1.

Quickstart

from ghostcrawl import GhostcrawlClient

# Reads GHOSTCRAWL_API_KEY from environment, or pass token= explicitly
client = GhostcrawlClient(token="gck_live_YOUR_KEY")

# Scrape a URL
result = client.scrape(url="https://example.com", format="markdown")
print(result["content"])

# Start a crawl
run = client.crawl_runs.start(url="https://example.com", max_depth=2, max_pages=50)
print(run["run_id"])

# Web search
results = client.search(query="latest AI research", engine="google", limit=10)
for r in results["results"]:
    print(r["title"], r["url"])

Authentication

import os
from ghostcrawl import GhostcrawlClient

# Option 1: pass token directly
client = GhostcrawlClient(token="gck_live_YOUR_KEY")

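# Option 2: set the GHOSTCRAWL_API_KEY environment variable (recommended for production).
# It is assigned in code here only for illustration; in practice, export it in
# your shell or deployment environment so the key never appears in source.
os.environ["GHOSTCRAWL_API_KEY"] = "gck_live_YOUR_KEY"
client = GhostcrawlClient()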

Every request sends Authorization: Bearer <token>. This is the only auth scheme the API accepts.
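
The token lookup and header described above can be sketched with the standard library alone. This is an illustration, not the SDK's internal code: the helper names resolve_token and auth_headers are hypothetical, and the assumption that an explicit token takes precedence over the environment variable is ours.

```python
import os


def resolve_token(explicit=None, env=None):
    """Return the explicit token if given, else GHOSTCRAWL_API_KEY from env.

    Hypothetical helper: the precedence order is an assumption.
    """
    env = os.environ if env is None else env
    token = explicit or env.get("GHOSTCRAWL_API_KEY")
    if not token:
        raise ValueError("No token: pass token= or set GHOSTCRAWL_API_KEY")
    return token


def auth_headers(token):
    """Build the Bearer authorization header sent with each request."""
    return {"Authorization": f"Bearer {token}"}


print(auth_headers(resolve_token("gck_live_YOUR_KEY")))
```

Failing early with a clear message when no token is available tends to be easier to debug than an authentication error returned by the first request.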

Extract structured data

from ghostcrawl import GhostcrawlClient

client = GhostcrawlClient(token="gck_live_YOUR_KEY")

# Define a schema and extract matching data
data = client.extract(
    url="https://example.com/product",
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "description": {"type": "string"},
        },
    },
)
print(data["name"], data["price"])
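
Extracted fields may be missing or come back with an unexpected type, so it can help to check the result before using it. The sketch below is a shallow, standard-library check of top-level properties only; it is not a full JSON Schema validator, and the helper names matches and mismatched_fields are hypothetical rather than part of the SDK.

```python
# Map JSON Schema type names to the Python types they usually decode to.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def matches(value, type_name):
    """Return True if value is compatible with the given JSON type name."""
    expected = JSON_TYPES.get(type_name)
    if expected is None:
        return True  # unknown or absent type: do not judge
    if isinstance(value, bool) and type_name in ("number", "integer"):
        return False  # bool subclasses int in Python but is not a JSON number
    return isinstance(value, expected)


def mismatched_fields(data, schema):
    """List top-level properties that are missing or have the wrong type."""
    problems = []
    for name, spec in schema.get("properties", {}).items():
        if name not in data or not matches(data[name], spec.get("type")):
            problems.append(name)
    return problems


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "description": {"type": "string"},
    },
}
# Sample data for illustration: price is a string and description is absent.
print(mismatched_fields({"name": "Widget", "price": "19.99"}, schema))
```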

Browser sessions

from ghostcrawl import GhostcrawlClient

client = GhostcrawlClient(token="gck_live_YOUR_KEY")

# Create a session
session = client.sessions.create(profile_name="my-profile")
session_id = session["session_id"]

# Extend and release
client.sessions.extend(session_id, duration_seconds=600)
client.sessions.release(session_id)
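
If code between create and release raises, the session would stay open until it expires. One way to guard against that is a small context manager that always releases the session. This is a sketch: managed_session is a hypothetical helper, not part of the SDK, and it relies only on the sessions.create and sessions.release calls shown above.

```python
from contextlib import contextmanager


@contextmanager
def managed_session(client, **create_kwargs):
    """Create a session, yield its id, and always release it afterwards."""
    session = client.sessions.create(**create_kwargs)
    session_id = session["session_id"]
    try:
        yield session_id
    finally:
        # Runs on normal exit and when the body raises.
        client.sessions.release(session_id)


# Usage with a real client (not executed here):
# with managed_session(client, profile_name="my-profile") as session_id:
#     client.sessions.extend(session_id, duration_seconds=600)
```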

Error handling

from ghostcrawl import GhostcrawlClient, AuthenticationError, RateLimitError, APIError

client = GhostcrawlClient(token="gck_live_YOUR_KEY")

try:
    result = client.scrape(url="https://example.com")
except AuthenticationError:
    print("Invalid API key — check your token")
except RateLimitError:
    print("Rate limit reached — retry after a short delay")
except APIError as e:
    print(f"API error: {e.status_code}")
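
Rather than only printing a message on a rate limit, a caller may want to retry with exponential backoff. The helper below is a generic sketch, not an SDK feature: call_with_retry and its parameters are hypothetical, and the SDK may already retry internally. Pass RateLimitError as retry_on to use it with the client.

```python
import time


def call_with_retry(fn, retry_on, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retry_on exception, back off exponentially and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)


# Usage with a real client (not executed here):
# result = call_with_retry(
#     lambda: client.scrape(url="https://example.com"),
#     retry_on=RateLimitError,
# )
```

Injecting sleep as a parameter keeps the helper testable without real delays.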

Context manager

from ghostcrawl import GhostcrawlClient

with GhostcrawlClient(token="gck_live_YOUR_KEY") as client:
    result = client.scrape(url="https://example.com")
    print(result)
# HTTP connection is closed automatically

All resources

Resource         | Client attribute                  | Key operations
Scraping         | client.scrape(url=…)              | Render and return page content
Web search       | client.search(query=…)            | Search Google, Bing, DuckDuckGo
Data extraction  | client.extract(url=…, schema=…)   | Structured JSON from any page
Deep crawl       | client.crawl(url=…)               | Crawl a site depth-first
URL map          | client.map(url=…)                 | Discover all reachable URLs
Crawl runs       | client.crawl_runs                 | start, list, get, cancel
Sessions         | client.sessions                   | create, extend, release
Profiles         | client.profiles                   | list, get, create, update, delete
Webhooks         | client.webhooks                   | list, get, create, delete, rotate-secret
Schedules        | client.schedules                  | list, get, create, delete
Datasets         | client.datasets                   | list, get, create, delete, append rows
Recordings       | client.recordings                 | list, get, delete
Key-Value Store  | client.kv                         | get, set, delete
Account          | client.me()                       | Get account info and usage

LangChain integration

pip install ghostcrawl-langchain

from ghostcrawl_langchain import GhostcrawlScrape, GhostcrawlSearch

scrape_tool = GhostcrawlScrape()
search_tool = GhostcrawlSearch()

Self-hosted

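from ghostcrawl import GhostcrawlClient

# Point the client at your own deployment instead of the hosted API
client = GhostcrawlClient(
    token="gck_live_YOUR_KEY",
    base_url="http://localhost:8080",  # your self-hosted instance
)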

License

Proprietary — GhostCrawl Software License. See LICENSE.
