Official Python SDK for the GhostCrawl local orchestration API.

ghostcrawl — Python SDK

The official Python client for the GhostCrawl API. Use it to collect web data at scale: scrape, crawl, search, extract structured data, manage browser sessions, and automate the full data-collection pipeline.

Install

pip install ghostcrawl

Requires Python 3.10+. Runtime dependencies: httpx>=0.28.1.

Quickstart

from ghostcrawl import GhostcrawlClient

# Pass token= explicitly, or omit it to read GHOSTCRAWL_API_KEY from the environment
client = GhostcrawlClient(token="gck_live_YOUR_KEY")

# Scrape a URL
result = client.scrape(url="https://example.com", format="markdown")
print(result["content"])

# Start a crawl
run = client.crawl_runs.start(url="https://example.com", max_depth=2, max_pages=50)
print(run["run_id"])

# Web search
results = client.search(query="latest AI research", engine="google", limit=10)
for r in results["results"]:
    print(r["title"], r["url"])
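The quickstart starts a crawl but does not show how to wait for it to finish. The following is a minimal polling sketch, not the SDK's own mechanism. It assumes the run record carries a "status" field and that "completed", "failed", and "cancelled" are terminal values. The README confirms neither the field name, the values, nor the exact call shape of client.crawl_runs.get (the resource table only lists get as an operation), so adjust these to the actual API. The helper name wait_for_run is hypothetical.

```python
import time
from typing import Any, Callable, Mapping

# Assumed terminal states; the README does not document status values.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})


def wait_for_run(
    get_run: Callable[[str], Mapping[str, Any]],
    run_id: str,
    poll_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Poll get_run(run_id) until the run reaches an assumed terminal status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        run = get_run(run_id)
        # "status" is an assumed field name, not confirmed by the README.
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} did not finish in {timeout_seconds}s")
        sleep(poll_seconds)


# Usage (hypothetical call shape for crawl_runs.get):
# run = client.crawl_runs.start(url="https://example.com", max_depth=2, max_pages=50)
# final = wait_for_run(client.crawl_runs.get, run["run_id"])
```

Passing the getter as a callable keeps the helper independent of the SDK, so it can be exercised with a stub before pointing it at a real client.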

Authentication

import os
from ghostcrawl import GhostcrawlClient

# Option 1: pass token directly
client = GhostcrawlClient(token="gck_live_YOUR_KEY")

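# Option 2: read GHOSTCRAWL_API_KEY from the environment (recommended for production).
# Set the variable in your shell or deployment config; it is assigned here only for illustration.
os.environ["GHOSTCRAWL_API_KEY"] = "gck_live_YOUR_KEY"
client = GhostcrawlClient()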

Every request sends Authorization: Bearer <token>. This is the only auth scheme the API accepts.
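In production the key is usually injected into the process environment by the shell, container, or secret manager rather than assigned in code. The sketch below shows a fail-fast lookup and the header format described above. Both load_api_key and bearer_headers are illustrative helpers, not part of the SDK.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GHOSTCRAWL_API_KEY from the environment, or raise if it is unset."""
    env = os.environ if env is None else env
    key = env.get("GHOSTCRAWL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GHOSTCRAWL_API_KEY is not set")
    return key


def bearer_headers(token: str) -> dict[str, str]:
    """Build the Authorization header in the Bearer scheme the README describes."""
    return {"Authorization": f"Bearer {token}"}


# Usage:
# client = GhostcrawlClient(token=load_api_key())
```

Failing at startup when the variable is missing tends to be easier to diagnose than an AuthenticationError on the first request.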

Extract structured data

from ghostcrawl import GhostcrawlClient

client = GhostcrawlClient(token="gck_live_YOUR_KEY")

# Define a schema and extract matching data
data = client.extract(
    url="https://example.com/product",
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "description": {"type": "string"},
        },
    },
)
print(data["name"], data["price"])
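The final print indexes data["name"] and data["price"] directly. The README does not say how fields missing from the page are returned, so a defensive check before indexing may be worthwhile. The sketch below compares a result against the top-level properties of the same schema. check_fields is a hypothetical helper and handles only flat string, number, integer, and boolean types, not full JSON Schema.

```python
from typing import Any, Mapping

# JSON Schema primitive names mapped to Python types (flat schemas only).
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
}


def check_fields(data: Mapping[str, Any], schema: Mapping[str, Any]) -> list[str]:
    """Return a list of problems found in data; an empty list means it looks OK."""
    problems = []
    for name, spec in schema.get("properties", {}).items():
        if name not in data or data[name] is None:
            problems.append(f"missing: {name}")
            continue
        expected = _JSON_TYPES.get(spec.get("type"))
        value = data[name]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        is_bool_mismatch = isinstance(value, bool) and spec.get("type") != "boolean"
        if expected and (is_bool_mismatch or not isinstance(value, expected)):
            problems.append(f"wrong type: {name}")
    return problems


# Usage:
# problems = check_fields(data, schema)
# if problems:
#     print("Extraction incomplete:", problems)
```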

Browser sessions

from ghostcrawl import GhostcrawlClient

client = GhostcrawlClient(token="gck_live_YOUR_KEY")

# Create a session
session = client.sessions.create(profile_name="my-profile")
session_id = session["session_id"]

# Extend and release
client.sessions.extend(session_id, duration_seconds=600)
client.sessions.release(session_id)
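If the code between create and release raises, the session above is never released. One way to guarantee cleanup is a small context manager. managed_session is a hypothetical wrapper; it relies only on the sessions.create and sessions.release calls shown above.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def managed_session(client: Any, profile_name: str) -> Iterator[str]:
    """Create a browser session, yield its ID, and always release it."""
    session = client.sessions.create(profile_name=profile_name)
    session_id = session["session_id"]
    try:
        yield session_id
    finally:
        # Runs on normal exit and on exceptions alike.
        client.sessions.release(session_id)


# Usage:
# with managed_session(client, "my-profile") as session_id:
#     client.sessions.extend(session_id, duration_seconds=600)
```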

Error handling

from ghostcrawl import GhostcrawlClient, AuthenticationError, RateLimitError, APIError

client = GhostcrawlClient(token="gck_live_YOUR_KEY")

try:
    result = client.scrape(url="https://example.com")
except AuthenticationError:
    print("Invalid API key — check your token")
except RateLimitError:
    print("Rate limit reached — retry after a short delay")
except APIError as e:
    print(f"API error: {e.status_code}")
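The handler above only prints a message when the rate limit is hit. A generic retry sketch with exponential backoff follows. call_with_retries is an illustrative helper, not part of the SDK, and the delays are arbitrary defaults: the README does not say whether RateLimitError exposes a retry-after value.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


# Usage:
# result = call_with_retries(
#     lambda: client.scrape(url="https://example.com"),
#     retry_on=(RateLimitError,),
# )
```

Only RateLimitError is retried in the usage example; retrying AuthenticationError would just repeat the same failure.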

Context manager

from ghostcrawl import GhostcrawlClient

with GhostcrawlClient(token="gck_live_YOUR_KEY") as client:
    result = client.scrape(url="https://example.com")
    print(result)
# HTTP connection is closed automatically

All resources

| Resource | Client attribute | Key operations |
| --- | --- | --- |
| Scraping | client.scrape(url=…) | Render and return page content |
| Web search | client.search(query=…) | Search Google, Bing, DuckDuckGo |
| Data extraction | client.extract(url=…, schema=…) | Structured JSON from any page |
| Deep crawl | client.crawl(url=…) | Crawl a site depth-first |
| URL map | client.map(url=…) | Discover all reachable URLs |
| Crawl runs | client.crawl_runs | start, list, get, cancel |
| Sessions | client.sessions | create, extend, release |
| Profiles | client.profiles | list, get, create, update, delete |
| Webhooks | client.webhooks | list, get, create, delete, rotate-secret |
| Schedules | client.schedules | list, get, create, delete |
| Datasets | client.datasets | list, get, create, delete, append rows |
| Recordings | client.recordings | list, get, delete |
| Key-Value Store | client.kv | get, set, delete |
| Account | client.me() | Get account info and usage |

LangChain integration

pip install ghostcrawl-langchain

from ghostcrawl_langchain import GhostcrawlScrape, GhostcrawlSearch

scrape_tool = GhostcrawlScrape()
search_tool = GhostcrawlSearch()

Self-hosted

from ghostcrawl import GhostcrawlClient

client = GhostcrawlClient(
    token="gck_live_YOUR_KEY",
    base_url="http://localhost:8080",  # your self-hosted instance
)

License

Proprietary — GhostCrawl Software License. See LICENSE.
