Official Python SDK for the Ghostcrawl local orchestration API.

ghostcrawl — Python SDK

The official Python client for the GhostCrawl API. Collect web data at scale — scrape, crawl, search, extract structured data, manage browser sessions, and automate the full data-collection pipeline.

Install

pip install ghostcrawl

Requires Python 3.10+. Runtime dependencies: httpx>=0.28.1.

Quickstart

from ghostcrawl import GhostcrawlClient

# Pass token= explicitly, or omit it to read GHOSTCRAWL_API_KEY from the environment
client = GhostcrawlClient(token="gck_live_YOUR_KEY")

# Scrape a URL
result = client.scrape(url="https://example.com", format="markdown")
print(result["content"])

# Start a crawl
run = client.crawl_runs.start(url="https://example.com", max_depth=2, max_pages=50)
print(run["run_id"])

# Web search
results = client.search(query="latest AI research", engine="google", limit=10)
for r in results["results"]:
    print(r["title"], r["url"])
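`crawl_runs.start` returns a `run_id`, and the resource table lists `get` among the `client.crawl_runs` operations, so a caller can poll a run until it finishes. This is a minimal sketch under assumptions: it supposes `client.crawl_runs.get(run_id)` returns a dict with a `status` field whose terminal values are `completed`, `failed` or `cancelled`. Those field and status names are hypothetical, not documented here. `wait_until` and `crawl_finished` are hypothetical helpers. The `sleep` and `clock` parameters are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Any, Callable, Dict


def wait_until(
    fetch: Callable[[], Dict[str, Any]],
    is_done: Callable[[Dict[str, Any]], bool],
    timeout: float = 300.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch() until is_done(result) is true or timeout seconds elapse."""
    deadline = clock() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"not finished after {timeout} seconds")
        sleep(interval)


# Hypothetical terminal status values; check the API's actual run schema.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def crawl_finished(run: Dict[str, Any]) -> bool:
    # Assumes the run dict carries a "status" field (not documented here).
    return run.get("status") in TERMINAL_STATUSES
```

With the client from the quickstart, and assuming `get` takes the run id positionally like the session methods do: `final = wait_until(lambda: client.crawl_runs.get(run["run_id"]), crawl_finished)`.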

Authentication

import os
from ghostcrawl import GhostcrawlClient

# Option 1: pass token directly
client = GhostcrawlClient(token="gck_live_YOUR_KEY")

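# Option 2: set GHOSTCRAWL_API_KEY and omit token= (recommended for production).
# In production, set the variable in your shell or deployment environment
# rather than in code; it is assigned here only for illustration.
os.environ["GHOSTCRAWL_API_KEY"] = "gck_live_YOUR_KEY"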
client = GhostcrawlClient()

Every request sends Authorization: Bearer <token>. This is the only auth scheme the API accepts.
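The SDK sets this header for you. When debugging, or when calling the API from a tool that does not use the SDK, the same scheme can be reproduced with the standard library. This is a minimal sketch: `resolve_token` and `build_request` are hypothetical helpers. The sketch assumes an explicit token takes precedence over `GHOSTCRAWL_API_KEY`. Any URL passed in is a placeholder, not a documented endpoint.

```python
import os
import urllib.request
from typing import Optional


def resolve_token(token: Optional[str] = None) -> str:
    """Prefer an explicit token, else fall back to GHOSTCRAWL_API_KEY."""
    resolved = token or os.environ.get("GHOSTCRAWL_API_KEY")
    if not resolved:
        raise RuntimeError("Pass token= or set GHOSTCRAWL_API_KEY")
    return resolved


def build_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the bearer token."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {resolve_token(token)}"},
        method="GET",
    )
```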

Extract structured data

from ghostcrawl import GhostcrawlClient

client = GhostcrawlClient(token="gck_live_YOUR_KEY")

# Define a schema and extract matching data
data = client.extract(
    url="https://example.com/product",
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "description": {"type": "string"},
        },
    },
)
print(data["name"], data["price"])
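Pages do not always contain every requested field, so it can help to check the result before using it. This is a minimal sketch that assumes `client.extract` returns a flat dict keyed by the schema's property names, as the `print` line above suggests. `schema_errors` is a hypothetical helper, not part of the SDK. It handles only top-level primitive types and treats a missing or null field as an error. That is stricter than JSON Schema itself, which only requires fields listed under `required`. For complete validation, a dedicated JSON Schema library is a better fit.

```python
from typing import Any, Dict, List

# Python types accepted for each JSON Schema primitive this sketch handles.
_TYPE_MAP = {
    "string": (str,),
    "number": (int, float),
    "boolean": (bool,),
}


def schema_errors(data: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the fields look valid."""
    errors = []
    for name, spec in schema.get("properties", {}).items():
        expected = _TYPE_MAP.get(spec.get("type"))
        value = data.get(name)
        if value is None:
            errors.append(f"{name}: missing")
        elif expected is None:
            continue  # type not handled by this sketch
        elif isinstance(value, bool) and spec["type"] != "boolean":
            # bool is a subclass of int in Python, so reject it explicitly.
            errors.append(f"{name}: expected {spec['type']}, got bool")
        elif not isinstance(value, expected):
            errors.append(f"{name}: expected {spec['type']}, got {type(value).__name__}")
    return errors
```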

Browser sessions

from ghostcrawl import GhostcrawlClient

client = GhostcrawlClient(token="gck_live_YOUR_KEY")

# Create a session
session = client.sessions.create(profile_name="my-profile")
session_id = session["session_id"]

# Extend and release
client.sessions.extend(session_id, duration_seconds=600)
client.sessions.release(session_id)
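If the code between `create` and `release` raises, the release call above is skipped. Pairing the two in a `try`/`finally` ensures the session is released either way. This is a minimal sketch built on `contextlib.contextmanager`. `leased_session` is a hypothetical helper, not part of the SDK. It relies only on the `sessions.create` and `sessions.release` calls and the `session_id` key shown above.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def leased_session(client: Any, **create_kwargs: Any) -> Iterator[str]:
    """Create a session, yield its id, and always release it afterwards."""
    session = client.sessions.create(**create_kwargs)
    session_id = session["session_id"]
    try:
        yield session_id
    finally:
        # Runs on normal exit and when the body of the with-block raises.
        client.sessions.release(session_id)
```

Usage: `with leased_session(client, profile_name="my-profile") as session_id:` followed by the work that needs the session, such as `client.sessions.extend(session_id, duration_seconds=600)`.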

Error handling

from ghostcrawl import GhostcrawlClient, AuthenticationError, RateLimitError, APIError

client = GhostcrawlClient(token="gck_live_YOUR_KEY")

try:
    result = client.scrape(url="https://example.com")
except AuthenticationError:
    print("Invalid API key — check your token")
except RateLimitError:
    print("Rate limit reached — retry after a short delay")
except APIError as e:
    print(f"API error: HTTP {e.status_code}")
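The example above only reports a rate limit. To retry automatically, the call can be wrapped in exponential backoff. This is a minimal sketch: `call_with_backoff` is a hypothetical helper, not part of the SDK. Its delays are illustrative defaults, not documented GhostCrawl limits. The exceptions to retry are passed in as a parameter, so the helper does not import the SDK.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff.

    Delays are base_delay, 2*base_delay, 4*base_delay, ... (capped at
    max_delay), each plus up to 10% random jitter. After max_attempts
    failed calls the last exception is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

With the SDK, this becomes `call_with_backoff(lambda: client.scrape(url="https://example.com"), retry_on=(RateLimitError,))`. `AuthenticationError` is deliberately left out of `retry_on`, because retrying with an invalid key cannot succeed.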

Context manager

from ghostcrawl import GhostcrawlClient

with GhostcrawlClient(token="gck_live_YOUR_KEY") as client:
    result = client.scrape(url="https://example.com")
    print(result)
# HTTP connection is closed automatically

All resources

Resource	Client attribute	Key operations
Scraping	`client.scrape(url=…)`	Render and return page content
Web search	`client.search(query=…)`	Search Google, Bing, DuckDuckGo
Data extraction	`client.extract(url=…, schema=…)`	Structured JSON from any page
Deep crawl	`client.crawl(url=…)`	Crawl a site depth-first
URL map	`client.map(url=…)`	Discover all reachable URLs
Crawl runs	`client.crawl_runs`	start, list, get, cancel
Sessions	`client.sessions`	create, extend, release
Profiles	`client.profiles`	list, get, create, update, delete
Webhooks	`client.webhooks`	list, get, create, delete, rotate-secret
Schedules	`client.schedules`	list, get, create, delete
Datasets	`client.datasets`	list, get, create, delete, append rows
Recordings	`client.recordings`	list, get, delete
Key-Value Store	`client.kv`	get, set, delete
Account	`client.me()`	Get account info and usage

LangChain integration

pip install ghostcrawl-langchain

from ghostcrawl_langchain import GhostcrawlScrape, GhostcrawlSearch

scrape_tool = GhostcrawlScrape()
search_tool = GhostcrawlSearch()

Self-hosted

from ghostcrawl import GhostcrawlClient

client = GhostcrawlClient(
    token="gck_live_YOUR_KEY",
    base_url="http://localhost:8080",  # URL of your self-hosted instance
)
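To switch between a self-hosted instance and the default endpoint without editing code, `base_url` can come from configuration. This is a minimal sketch: `client_kwargs` and the `GHOSTCRAWL_BASE_URL` variable are hypothetical conventions for your own application. The SDK is not documented to read that variable. The token is left out because the client already reads `GHOSTCRAWL_API_KEY` from the environment.

```python
import os
from typing import Any, Dict, Mapping, Optional


def client_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Return extra GhostcrawlClient kwargs derived from the environment.

    GHOSTCRAWL_BASE_URL is a hypothetical variable for this sketch. When it
    is unset or empty, base_url is omitted so the client keeps its default.
    """
    env = os.environ if env is None else env
    base_url = env.get("GHOSTCRAWL_BASE_URL")
    return {"base_url": base_url} if base_url else {}
```

Usage: `client = GhostcrawlClient(**client_kwargs())`.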

License

Proprietary — GhostCrawl Software License. See LICENSE.

Release history

2.2.3 (Jul 1, 2026)
2.2.2 (Jul 1, 2026)
2.2.1 (Jul 1, 2026), this version
2.2.0 (Jul 1, 2026)

Download files

Source Distribution

ghostcrawl-2.2.1.tar.gz (131.3 kB)

Uploaded Jul 1, 2026 Source

Built Distribution

ghostcrawl-2.2.1-py3-none-any.whl (438.5 kB)

Uploaded Jul 1, 2026 Python 3

File details

Details for the file ghostcrawl-2.2.1.tar.gz.

File metadata

Download URL: ghostcrawl-2.2.1.tar.gz
Upload date: Jul 1, 2026
Size: 131.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ghostcrawl-2.2.1.tar.gz
Algorithm	Hash digest
SHA256	`26ea1203ccdf66ce69a5a982e01ab1425c0cbdd0e23b46d873f8510e4d63638f`
MD5	`45c83502f93f5a7cb8e6d24422fecb53`
BLAKE2b-256	`9094edc67cb5272a67c514d302f311064423353d2fcf7aaafc4c1ac2a845f98f`

File details

Details for the file ghostcrawl-2.2.1-py3-none-any.whl.

File metadata

Download URL: ghostcrawl-2.2.1-py3-none-any.whl
Upload date: Jul 1, 2026
Size: 438.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ghostcrawl-2.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`79e48f8797e4192ff21c119717a5a1c587d846650c622539b39b5747f28e8f59`
MD5	`e16026ff14ac84f63e96ea7989bfcb9e`
BLAKE2b-256	`b5cf5ffed0dd3d458b1ecda6e93d11b14e156dfdeb81bbee6f39ca3178d172ca`
