Official Python SDK for the Ghostcrawl local orchestration API.
ghostcrawl — Python SDK
The official Python client for the GhostCrawl API. Collect web data at scale — scrape, crawl, search, extract structured data, manage browser sessions, and automate the full data-collection pipeline.
Install
pip install ghostcrawl
Requires Python 3.10+. Runtime dependencies: httpx>=0.28.1.
Quickstart
from ghostcrawl import GhostcrawlClient
# Reads GHOSTCRAWL_API_KEY from environment, or pass token= explicitly
client = GhostcrawlClient(token="gck_live_YOUR_KEY")
# Scrape a URL
result = client.scrape(url="https://example.com", format="markdown")
print(result["content"])
# Start a crawl
run = client.crawl_runs.start(url="https://example.com", max_depth=2, max_pages=50)
print(run["run_id"])
# Web search
results = client.search(query="latest AI research", engine="google", limit=10)
for r in results["results"]:
    print(r["title"], r["url"])
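A crawl started with crawl_runs.start returns a run_id right away, so a caller usually has to poll until the run finishes. The sketch below is a generic polling helper and not part of the SDK: wait_until, fetch, and is_done are hypothetical names, and the commented usage assumes that crawl_runs.get accepts a run id and returns a dict with a "status" key, which this README does not confirm.

```python
import time


def wait_until(fetch, is_done, interval=5.0, timeout=600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() every `interval` seconds until is_done(state) is true.

    Raises TimeoutError if `timeout` seconds pass without completion.
    """
    deadline = clock() + timeout
    while True:
        state = fetch()
        if is_done(state):
            return state
        if clock() >= deadline:
            raise TimeoutError("run did not finish before the timeout")
        sleep(interval)


# Possible usage with the SDK (the get() signature, the "status" key, and the
# status values are assumptions, not documented here):
#
# final = wait_until(
#     fetch=lambda: client.crawl_runs.get(run["run_id"]),
#     is_done=lambda state: state["status"] in ("completed", "failed"),
# )
```

Passing sleep and clock as parameters keeps the helper easy to test without real delays.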
Authentication
import os
from ghostcrawl import GhostcrawlClient
# Option 1: pass token directly
client = GhostcrawlClient(token="gck_live_YOUR_KEY")
# Option 2: set environment variable (recommended for production)
os.environ["GHOSTCRAWL_API_KEY"] = "gck_live_YOUR_KEY"
client = GhostcrawlClient()
Every request sends Authorization: Bearer <token>. This is the only auth scheme the API accepts.
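To make the token handling concrete, here is a minimal sketch of how a token could be resolved and turned into the Authorization header described above. resolve_token and auth_headers are hypothetical helpers, not SDK functions, and the precedence shown (explicit token first, then the environment variable) is an assumption about the client's behavior.

```python
import os


def resolve_token(explicit=None, env=None):
    """Return the explicit token if given, else GHOSTCRAWL_API_KEY from env."""
    env = os.environ if env is None else env
    token = explicit or env.get("GHOSTCRAWL_API_KEY")
    if not token:
        raise ValueError("No API key: pass token= or set GHOSTCRAWL_API_KEY")
    return token


def auth_headers(token):
    """Build the bearer header that every request carries."""
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    # Uses a placeholder environment mapping instead of the real os.environ.
    print(auth_headers(resolve_token(env={"GHOSTCRAWL_API_KEY": "gck_live_YOUR_KEY"})))
```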
Extract structured data
from ghostcrawl import GhostcrawlClient
client = GhostcrawlClient(token="gck_live_YOUR_KEY")
# Define a schema and extract matching data
data = client.extract(
    url="https://example.com/product",
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "description": {"type": "string"},
        },
    },
)
print(data["name"], data["price"])
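Extraction from arbitrary pages can leave fields missing or mistyped, so it may be worth checking the result before indexing into it. The sketch below is a deliberately small checker, not a full JSON Schema validator and not part of the SDK: find_problems is a hypothetical helper, it only looks at top-level properties, and it treats every listed property as required, which the schema above does not actually state.

```python
# Maps JSON Schema type names to Python types (only a subset is covered).
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def find_problems(data, schema):
    """Return a list of human-readable problems; empty means the data looks fine."""
    problems = []
    for field, spec in schema.get("properties", {}).items():
        if field not in data:
            problems.append(f"{field}: missing")
            continue
        type_name = spec.get("type")
        expected = JSON_TYPES.get(type_name)
        value = data[field]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if expected and (not isinstance(value, expected) or bool_as_number):
            problems.append(
                f"{field}: expected {type_name}, got {type(value).__name__}"
            )
    return problems


if __name__ == "__main__":
    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "description": {"type": "string"},
        },
    }
    # A made-up result in which the price came back as text.
    print(find_problems({"name": "Widget", "price": "19.99"}, schema))
```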
Browser sessions
from ghostcrawl import GhostcrawlClient
client = GhostcrawlClient(token="gck_live_YOUR_KEY")
# Create a session
session = client.sessions.create(profile_name="my-profile")
session_id = session["session_id"]
# Extend and release
client.sessions.extend(session_id, duration_seconds=600)
client.sessions.release(session_id)
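If the code between create and release raises, the session is never released. One way to guard against that is a small context manager built on the two calls shown above. managed_session is a hypothetical wrapper, not an SDK feature; it assumes only that sessions.create returns a mapping with a "session_id" key and that sessions.release takes that id, as in the example.

```python
from contextlib import contextmanager


@contextmanager
def managed_session(client, profile_name):
    """Create a browser session and release it on exit, even after an error."""
    session = client.sessions.create(profile_name=profile_name)
    session_id = session["session_id"]
    try:
        yield session_id
    finally:
        client.sessions.release(session_id)


# Possible usage:
#
# with managed_session(client, "my-profile") as session_id:
#     client.sessions.extend(session_id, duration_seconds=600)
```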
Error handling
from ghostcrawl import GhostcrawlClient, AuthenticationError, RateLimitError, APIError
client = GhostcrawlClient(token="gck_live_YOUR_KEY")
try:
    result = client.scrape(url="https://example.com")
except AuthenticationError:
    print("Invalid API key — check your token")
except RateLimitError:
    print("Rate limit reached — retry after a short delay")
except APIError as e:
    print(f"Server error: {e.status_code}")
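The "retry after a short delay" advice can be automated with exponential backoff. The helper below is a generic sketch rather than SDK functionality: call_with_backoff and its parameters are hypothetical, and the delays are arbitrary defaults. Whether RateLimitError exposes a server-suggested wait time is not stated in this README, so the sketch does not rely on one.

```python
import random
import time


def call_with_backoff(fn, retry_on, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on `retry_on`, wait base_delay * 2**n plus jitter, then retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out retries from clients that failed together.
            sleep(delay + random.uniform(0, delay / 2))


# Possible usage with the SDK:
#
# result = call_with_backoff(
#     lambda: client.scrape(url="https://example.com"),
#     retry_on=RateLimitError,
# )
```

Only the exception type passed as retry_on is retried; AuthenticationError and other errors propagate immediately, since retrying them would not help.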
Context manager
from ghostcrawl import GhostcrawlClient
with GhostcrawlClient(token="gck_live_YOUR_KEY") as client:
    result = client.scrape(url="https://example.com")
    print(result)
# HTTP connection is closed automatically
All resources
| Resource | Client attribute | Key operations |
|---|---|---|
| Scraping | client.scrape(url=…) | Render and return page content |
| Web search | client.search(query=…) | Search Google, Bing, DuckDuckGo |
| Data extraction | client.extract(url=…, schema=…) | Structured JSON from any page |
| Deep crawl | client.crawl(url=…) | Crawl a site depth-first |
| URL map | client.map(url=…) | Discover all reachable URLs |
| Crawl runs | client.crawl_runs | start, list, get, cancel |
| Sessions | client.sessions | create, extend, release |
| Profiles | client.profiles | list, get, create, update, delete |
| Webhooks | client.webhooks | list, get, create, delete, rotate-secret |
| Schedules | client.schedules | list, get, create, delete |
| Datasets | client.datasets | list, get, create, delete, append rows |
| Recordings | client.recordings | list, get, delete |
| Key-Value Store | client.kv | get, set, delete |
| Account | client.me() | Get account info and usage |
LangChain integration
pip install ghostcrawl-langchain
from ghostcrawl_langchain import GhostcrawlScrape, GhostcrawlSearch
scrape_tool = GhostcrawlScrape()
search_tool = GhostcrawlSearch()
Self-hosted
from ghostcrawl import GhostcrawlClient
client = GhostcrawlClient(
    token="gck_live_YOUR_KEY",
    base_url="http://localhost:8080",  # your self-hosted instance
)
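When the same code runs against both hosted and self-hosted deployments, the base URL can come from configuration instead of being hard-coded. This is a sketch under assumptions: GHOSTCRAWL_BASE_URL is a hypothetical variable name chosen for illustration (the README only documents GHOSTCRAWL_API_KEY), and base_url_kwargs is not an SDK function.

```python
import os


def base_url_kwargs(env=None):
    """Return {"base_url": ...} if GHOSTCRAWL_BASE_URL is set, else {}.

    GHOSTCRAWL_BASE_URL is a hypothetical name; the SDK is not known to read it.
    """
    env = os.environ if env is None else env
    base_url = env.get("GHOSTCRAWL_BASE_URL")
    if not base_url:
        return {}  # fall back to the client's default endpoint
    # Dropping a trailing slash is cosmetic normalization only.
    return {"base_url": base_url.rstrip("/")}


# Possible usage:
#
# client = GhostcrawlClient(token="gck_live_YOUR_KEY", **base_url_kwargs())
```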
License
Proprietary — GhostCrawl Software License. See LICENSE.
File details
Details for the file ghostcrawl-2.2.1.tar.gz.
File metadata
- Download URL: ghostcrawl-2.2.1.tar.gz
- Upload date:
- Size: 131.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 26ea1203ccdf66ce69a5a982e01ab1425c0cbdd0e23b46d873f8510e4d63638f |
| MD5 | 45c83502f93f5a7cb8e6d24422fecb53 |
| BLAKE2b-256 | 9094edc67cb5272a67c514d302f311064423353d2fcf7aaafc4c1ac2a845f98f |
File details
Details for the file ghostcrawl-2.2.1-py3-none-any.whl.
File metadata
- Download URL: ghostcrawl-2.2.1-py3-none-any.whl
- Upload date:
- Size: 438.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 79e48f8797e4192ff21c119717a5a1c587d846650c622539b39b5747f28e8f59 |
| MD5 | e16026ff14ac84f63e96ea7989bfcb9e |
| BLAKE2b-256 | b5cf5ffed0dd3d458b1ecda6e93d11b14e156dfdeb81bbee6f39ca3178d172ca |