Official Python SDK for the GhostCrawl local orchestration API.

ghostcrawl — Python SDK

The official Python client for the GhostCrawl API. Collect web data at scale — scrape, crawl, search, extract structured data, manage browser sessions, and automate the full data-collection pipeline.

Install

pip install ghostcrawl

Requires Python 3.10+. Runtime dependencies: httpx>=0.28.1.

Quickstart

from ghostcrawl import GhostcrawlClient

# Pass token= explicitly, or omit it to read GHOSTCRAWL_API_KEY from the environment
client = GhostcrawlClient(token="gck_live_YOUR_KEY")

# Scrape a URL
result = client.scrape(url="https://example.com", format="markdown")
print(result["content"])

# Start a crawl
run = client.crawl_runs.start(url="https://example.com", max_depth=2, max_pages=50)
print(run["run_id"])

# Web search
results = client.search(query="latest AI research", engine="google", limit=10)
for r in results["results"]:
    print(r["title"], r["url"])
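
The quickstart starts a crawl and prints its run ID, but a crawl runs asynchronously, so a caller usually has to poll until it finishes. The following is a minimal polling sketch, not part of the SDK. The helper name `wait_until` is hypothetical, and the `get(run_id)` signature and the `status` field shown in the comment are assumptions; only the existence of a `get` operation on `client.crawl_runs` comes from this page. The demo uses canned states so it runs without network access.

```python
import time


def wait_until(fetch, is_done, interval=2.0, timeout=300.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until is_done(state) is true or timeout expires."""
    deadline = clock() + timeout
    while True:
        state = fetch()
        if is_done(state):
            return state
        if clock() >= deadline:
            raise TimeoutError("run did not finish before the timeout")
        sleep(interval)


# With the SDK this might be used as follows (signature and field are assumed):
#   wait_until(lambda: client.crawl_runs.get(run["run_id"]),
#              lambda r: r.get("status") in {"completed", "failed"})

# Stand-in for successive API responses, so the sketch runs offline.
states = iter([{"status": "running"}, {"status": "running"}, {"status": "completed"}])
sleeps = []
final = wait_until(lambda: next(states),
                   lambda s: s["status"] == "completed",
                   sleep=sleeps.append)
print(final["status"], len(sleeps))
```

Passing `sleep` and `clock` as parameters keeps the helper easy to test without real delays.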

Authentication

import os
from ghostcrawl import GhostcrawlClient

# Option 1: pass token directly
client = GhostcrawlClient(token="gck_live_YOUR_KEY")

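# Option 2: read the key from the GHOSTCRAWL_API_KEY environment variable
# (recommended for production). Set it in your shell or deployment config;
# it is assigned in-process here only to keep the example self-contained.
os.environ["GHOSTCRAWL_API_KEY"] = "gck_live_YOUR_KEY"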
client = GhostcrawlClient()

Every request sends Authorization: Bearer <token>. This is the only auth scheme the API accepts.
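
To make the header concrete, here is a small sketch that uses only the standard library to resolve a token and attach it to a request. It illustrates the bearer scheme described above and is not the SDK's internal code. The helper names `resolve_token` and `build_request`, the `/v1/scrape` path, and the rule that an explicit token wins over the environment variable are all assumptions for illustration.

```python
import os
import urllib.request
from typing import Optional


def resolve_token(explicit: Optional[str] = None) -> str:
    """Return the explicit token, else fall back to GHOSTCRAWL_API_KEY."""
    # Assumption: an explicitly passed token takes precedence over the env var.
    token = explicit or os.environ.get("GHOSTCRAWL_API_KEY")
    if not token:
        raise RuntimeError("No API key: pass token= or set GHOSTCRAWL_API_KEY")
    return token


def build_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a request carrying the bearer token."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {resolve_token(token)}"},
    )


# Hypothetical endpoint path, used only to have a URL to attach the header to.
req = build_request("http://localhost:8080/v1/scrape", token="gck_live_YOUR_KEY")
print(req.get_header("Authorization"))
```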

Extract structured data

from ghostcrawl import GhostcrawlClient

client = GhostcrawlClient(token="gck_live_YOUR_KEY")

# Define a schema and extract matching data
data = client.extract(
    url="https://example.com/product",
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "description": {"type": "string"},
        },
    },
)
print(data["name"], data["price"])
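
Extraction from arbitrary pages can return missing or mistyped fields, so it may be worth checking the result before indexing into it. The sketch below is a deliberately small checker for flat object schemas like the one above. The helper `check_fields` is hypothetical and not part of the SDK. It handles only four primitive types plus `required`, and is not a full JSON Schema validator. The `required` list in the demo is added for illustration and does not appear in the original example.

```python
# Map JSON Schema primitive type names to Python types.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
}


def check_fields(data, schema):
    """Return a list of problems found in a flat object; empty means OK."""
    problems = []
    for name in schema.get("required", []):
        if name not in data:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            continue  # optional field that is absent
        type_name = spec.get("type")
        expected = _JSON_TYPES.get(type_name)
        if expected is None:
            continue  # unknown or absent type: nothing to check
        value = data[name]
        # bool is a subclass of int in Python, so exclude it from numeric matches.
        bool_mismatch = isinstance(value, bool) and type_name != "boolean"
        if bool_mismatch or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {type_name}")
    return problems


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "description": {"type": "string"},
    },
    "required": ["name", "price"],  # added for this demo only
}

# A price returned as text is flagged; the optional description is not.
print(check_fields({"name": "Widget", "price": "19.99"}, schema))
```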

Browser sessions

from ghostcrawl import GhostcrawlClient

client = GhostcrawlClient(token="gck_live_YOUR_KEY")

# Create a session
session = client.sessions.create(profile_name="my-profile")
session_id = session["session_id"]

# Extend and release
client.sessions.extend(session_id, duration_seconds=600)
client.sessions.release(session_id)

Error handling

from ghostcrawl import GhostcrawlClient, AuthenticationError, RateLimitError, APIError

client = GhostcrawlClient(token="gck_live_YOUR_KEY")

try:
    result = client.scrape(url="https://example.com")
except AuthenticationError:
    print("Invalid API key — check your token")
except RateLimitError:
    print("Rate limit reached — retry after a short delay")
except APIError as e:
    # Any other API failure; status_code says whether it was a client or server error
    print(f"API error: {e.status_code}")
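
The rate-limit branch above only prints a message. In practice a caller often retries with a growing delay. Below is a minimal exponential-backoff sketch. The helper `call_with_retry` is hypothetical and not part of the SDK, and this page does not say whether the API returns a retry hint such as a Retry-After header, so the delays are a plain assumption. The demo uses a stand-in exception so it runs without the package installed.

```python
import time


def call_with_retry(fn, retry_on, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on retry_on, wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# With the SDK this might be called as:
#   call_with_retry(lambda: client.scrape(url="https://example.com"), RateLimitError)

# Stand-in exception and a function that fails twice, then succeeds.
class DemoRateLimit(Exception):
    pass


calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DemoRateLimit()
    return "ok"


delays = []
result = call_with_retry(flaky, DemoRateLimit, sleep=delays.append)
print(result, delays)
```

Only the exception type passed as `retry_on` is retried. An authentication failure will not fix itself, so it should propagate immediately.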

Context manager

from ghostcrawl import GhostcrawlClient

with GhostcrawlClient(token="gck_live_YOUR_KEY") as client:
    result = client.scrape(url="https://example.com")
    print(result)
# HTTP connection is closed automatically

All resources

| Resource | Client attribute | Key operations |
| --- | --- | --- |
| Scraping | client.scrape(url=…) | Render and return page content |
| Web search | client.search(query=…) | Search Google, Bing, DuckDuckGo |
| Data extraction | client.extract(url=…, schema=…) | Structured JSON from any page |
| Deep crawl | client.crawl(url=…) | Crawl a site depth-first |
| URL map | client.map(url=…) | Discover all reachable URLs |
| Crawl runs | client.crawl_runs | start, list, get, cancel |
| Sessions | client.sessions | create, extend, release |
| Profiles | client.profiles | list, get, create, update, delete |
| Webhooks | client.webhooks | list, get, create, delete, rotate-secret |
| Schedules | client.schedules | list, get, create, delete |
| Datasets | client.datasets | list, get, create, delete, append rows |
| Recordings | client.recordings | list, get, delete |
| Key-Value Store | client.kv | get, set, delete |
| Account | client.me() | Get account info and usage |

LangChain integration

pip install ghostcrawl-langchain

from ghostcrawl_langchain import GhostcrawlScrape, GhostcrawlSearch

scrape_tool = GhostcrawlScrape()
search_tool = GhostcrawlSearch()

Self-hosted

from ghostcrawl import GhostcrawlClient

client = GhostcrawlClient(
    token="gck_live_YOUR_KEY",
    base_url="http://localhost:8080",  # your self-hosted instance
)

License

Proprietary — GhostCrawl Software License. See LICENSE.
