Python SDK for FireScraper — web scraping for AI pipelines.
FireScraper Python SDK
Official Python SDK for FireScraper — web scraping built for AI pipelines.
Turn websites into clean, structured text for RAG, fine-tuning, and AI agent workflows.
Installation
pip install firescraper
With LangChain integration:
pip install firescraper langchain-firescraper
Quick Start
from firescraper import FireScraper

client = FireScraper("fsk_your_api_key")

# Start a crawl
session = client.scrape(
    name="Docs crawl",
    urls=["https://docs.example.com/"],
    max_depth=2,
    scraper="article",
)

# Wait for completion
result = client.wait_for_completion(session.id)
print(f"Scraped {result.counts.success} pages")

# Download results
download = client.get_results(session.id, format="json")
with open("results.json", "wb") as f:
    f.write(download.data)
Async Usage
import asyncio
from firescraper import AsyncFireScraper

async def main():
    async with AsyncFireScraper("fsk_your_api_key") as client:
        session = await client.scrape(
            name="Async crawl",
            urls=["https://example.com/"],
            max_depth=1,
        )
        result = await client.wait_for_completion(session.id)
        download = await client.get_results(session.id, format="markdown")

asyncio.run(main())
LangChain Integration
from langchain_firescraper import FireScraperLoader

loader = FireScraperLoader(
    api_key="fsk_your_api_key",
    urls=["https://docs.example.com/"],
    max_depth=2,
)

# Load all documents
docs = loader.load()
for doc in docs:
    print(doc.metadata["url"], len(doc.page_content))

# Or stream with lazy_load
for doc in loader.lazy_load():
    process(doc)  # process() stands in for your own handler
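When streaming with lazy_load, it is often convenient to hand documents to a downstream step in fixed-size groups rather than one at a time. The following is a minimal sketch of such a helper; it is plain Python and not part of the SDK, and the name `batched` is an assumption (Python 3.12 also ships `itertools.batched`, which yields tuples instead of lists).

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items, consuming the iterable lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    # islice pulls at most `size` items per pass; an empty list ends the loop.
    while batch := list(islice(iterator, size)):
        yield batch


# Stand-in for loader.lazy_load(): any iterable works the same way.
groups = list(batched(range(5), 2))
print(groups)
```

With a real loader this would look like `for batch in batched(loader.lazy_load(), 50): ...`, where the batch size of 50 is only an illustrative value.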
API Reference
FireScraper(api_key, *, base_url, timeout)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | required | API key (starts with `fsk_`) |
| `base_url` | `str` | `https://firescraper.com` | API base URL |
| `timeout` | `float` | `30.0` | HTTP request timeout in seconds |
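To keep the API key out of source code, the constructor arguments can be read from the environment. The sketch below is one way to do that; the environment variable names (`FIRESCRAPER_API_KEY`, `FIRESCRAPER_BASE_URL`, `FIRESCRAPER_TIMEOUT`) and the helper itself are assumptions made for illustration, not something the SDK defines.

```python
import os


def client_options_from_env(env=None):
    """Return (api_key, options) for the FireScraper constructor.

    The variable names used here are hypothetical, chosen for this example.
    """
    env = os.environ if env is None else env
    api_key = env.get("FIRESCRAPER_API_KEY", "")
    if not api_key.startswith("fsk_"):
        raise ValueError("expected an API key starting with 'fsk_'")
    options = {}
    # Only pass values that are set, so the constructor defaults stay in effect.
    if "FIRESCRAPER_BASE_URL" in env:
        options["base_url"] = env["FIRESCRAPER_BASE_URL"]
    if "FIRESCRAPER_TIMEOUT" in env:
        options["timeout"] = float(env["FIRESCRAPER_TIMEOUT"])
    return api_key, options


api_key, options = client_options_from_env(
    {"FIRESCRAPER_API_KEY": "fsk_example", "FIRESCRAPER_TIMEOUT": "10"}
)
print(api_key, options)
```

The result can then be passed on as `FireScraper(api_key, **options)`, which matches the keyword-only `base_url` and `timeout` in the signature above.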
Methods
scrape(name, urls, max_depth=1, scraper="article", **kwargs)
Start a new crawl session.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | required | Human-readable session name |
| `urls` | `list[str]` | required | Seed URLs |
| `max_depth` | `int` | `1` | Link-hop depth (0 = seeds only) |
| `scraper` | `str` | `"article"` | `"article"` or `"full"` |
| `ignore_urls` | `list[str]` | `None` | URLs to exclude |
| `webhook_url` | `str` | `None` | Callback URL on completion |
| `extraction_schema` | `dict` | `None` | JSON Schema for structured extraction |
| `respect_robots_txt` | `bool` | `None` | Respect robots.txt |
| `content_selector` | `str` | `None` | CSS selector for extraction |
Returns a ScrapeResponse with .id, .status, .message.
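Because most scrape options are optional, it can help to assemble the keyword arguments in one place and validate them before calling the API. The following is a rough sketch under that assumption; `build_scrape_kwargs` and the product schema are hypothetical and exist only for this example, and the schema's field names are not taken from the FireScraper documentation.

```python
def build_scrape_kwargs(name, urls, *, max_depth=1, scraper="article", **optional):
    """Assemble keyword arguments for client.scrape(), dropping unset options."""
    if scraper not in ("article", "full"):
        raise ValueError('scraper must be "article" or "full"')
    if max_depth < 0:
        raise ValueError("max_depth must be 0 or greater")
    kwargs = {
        "name": name,
        "urls": list(urls),
        "max_depth": max_depth,
        "scraper": scraper,
    }
    # Leave out options that are None so the API defaults apply.
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


# Hypothetical JSON Schema describing the fields to extract from each page.
product_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["title"],
}

kwargs = build_scrape_kwargs(
    "Catalog crawl",
    ["https://shop.example.com/"],
    max_depth=0,  # seeds only
    extraction_schema=product_schema,
    webhook_url=None,  # unset, so it is dropped
)
print(sorted(kwargs))
```

The resulting dictionary can be passed as `client.scrape(**kwargs)`.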
get_session(session_id)
Get current session status, including page counts and processing state.
wait_for_completion(session_id, poll_interval=5, timeout=300, on_progress=None)
Poll until the session reaches a terminal status (done, error, etc.).
def progress(status):
    print(f"{status.counts.success}/{status.counts.total} pages")

result = client.wait_for_completion(session.id, on_progress=progress)
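For readers who want to see what polling with a timeout involves, here is a generic sketch of the pattern. It is not the SDK's implementation: `poll_until_terminal` is a hypothetical helper, it works on plain status strings rather than the SDK's status objects, and the default terminal set contains only the two statuses named above.

```python
import time


def poll_until_terminal(
    get_status,
    *,
    poll_interval=5.0,
    timeout=300.0,
    terminal=frozenset({"done", "error"}),  # assumption: only the statuses named in the text
    on_progress=None,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Call get_status() repeatedly until it returns a terminal status."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if on_progress is not None:
            on_progress(status)
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"no terminal status within {timeout} seconds")
        sleep(poll_interval)


# Demo with a scripted sequence of statuses and no real waiting.
scripted = iter(["running", "running", "done"])
seen = []
final = poll_until_terminal(
    lambda: next(scripted),
    on_progress=seen.append,
    sleep=lambda seconds: None,
)
print(final, seen)
```

Injecting `sleep` and `clock` keeps the loop testable without waiting in real time.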
list_results(session_id)
List available result files for a completed session.
get_results(session_id, format="json")
Download results. Supported formats: zip, csv, json, markdown, structured, manifest, documents, chunks, extracted. Use documents for page-level JSONL output.
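The documents format is described as page-level JSONL, that is, one JSON object per line. A small sketch of parsing such a payload follows; it assumes the download exposes raw bytes (as `download.data` does in the Quick Start), and the `url` and `text` fields in the sample are made up for illustration rather than taken from the actual output schema.

```python
import json


def parse_jsonl(data):
    """Parse JSONL (one JSON object per line) from bytes or str into a list."""
    text = data.decode("utf-8") if isinstance(data, bytes) else data
    # Split on "\n" only; blank lines (such as a trailing newline) are skipped.
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Sample payload with hypothetical field names.
sample = (
    b'{"url": "https://example.com/a", "text": "Hello"}\n'
    b'{"url": "https://example.com/b", "text": "World"}\n'
)
pages = parse_jsonl(sample)
print(len(pages), pages[0]["url"])
```

With a real session this would be called as `parse_jsonl(client.get_results(session.id, format="documents").data)`.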
get_partial_results(session_id, format="csv")
Download mid-crawl results while the session is still running.
Error Handling
from firescraper import FireScraperError, AuthenticationError, RateLimitError

try:
    session = client.scrape(name="Test", urls=["https://example.com"])
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limited, try again later")
except FireScraperError as e:
    print(f"API error: {e.message} (code={e.code}, status={e.status})")
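Rather than only printing a message when rate limited, a caller may prefer to retry with increasing delays. The sketch below shows one generic way to do that with exponential backoff; `call_with_backoff` is a hypothetical helper, not part of the SDK, and the attempt count and delays are illustrative values. A stand-in exception is used so the example runs without the SDK installed.

```python
import time


def call_with_backoff(fn, *, retry_on, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on `retry_on` with delays of base_delay * 1, 2, 4, ..."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))


class FakeRateLimit(Exception):
    """Stand-in for RateLimitError, used only in this demo."""


calls = {"count": 0}
delays = []


def flaky():
    # Fails twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeRateLimit()
    return "ok"


outcome = call_with_backoff(flaky, retry_on=FakeRateLimit, sleep=delays.append)
print(outcome, delays)
```

With the SDK, the call would be along the lines of `call_with_backoff(lambda: client.scrape(name="Test", urls=["https://example.com"]), retry_on=RateLimitError)`.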
Advanced: Progress Tracking
session = client.scrape(name="Large crawl", urls=urls, max_depth=5)

result = client.wait_for_completion(
    session.id,
    poll_interval=3,
    timeout=600,
    on_progress=lambda s: print(
        f"[{s.session.status}] {s.counts.success} pages, "
        f"queue: {s.processing.queue_length}"
    ),
)
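On long crawls, a callback that prints on every poll can produce many identical lines. One possible refinement, sketched here, is a callable that reports only when something has changed. `ChangeLogger` and `fake_status` are hypothetical names for this example; the attributes read from the status object are the ones used in the lambda above, and the stand-in objects merely mimic that layout.

```python
from types import SimpleNamespace


class ChangeLogger:
    """Callable for on_progress that emits a line only when the state changes."""

    def __init__(self, emit=print):
        self.emit = emit
        self._last = None

    def __call__(self, s):
        snapshot = (s.session.status, s.counts.success, s.processing.queue_length)
        if snapshot != self._last:
            self._last = snapshot
            status, success, queue_length = snapshot
            self.emit(f"[{status}] {success} pages, queue: {queue_length}")


def fake_status(status, success, queue_length):
    """Build a stand-in with the same attribute layout (demo only)."""
    return SimpleNamespace(
        session=SimpleNamespace(status=status),
        counts=SimpleNamespace(success=success),
        processing=SimpleNamespace(queue_length=queue_length),
    )


lines = []
log = ChangeLogger(emit=lines.append)
for snapshot in [
    fake_status("running", 3, 7),
    fake_status("running", 3, 7),  # unchanged, so nothing is emitted
    fake_status("done", 10, 0),
]:
    log(snapshot)
print(lines)
```

An instance can be passed directly as `on_progress=ChangeLogger()`.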
License
MIT
File details
Details for the file firescraper-1.0.1.tar.gz.
File metadata
- Download URL: firescraper-1.0.1.tar.gz
- Upload date:
- Size: 16.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 70e937698d36dd0b111e8c1077c751675ce9168a558ece7d0f8b15d89d451e3e |
| MD5 | fda860f1b61fd26015f858027cc3ec8b |
| BLAKE2b-256 | b9b4bbdc9a50df818dc13da9100ccfe258533111d5a30235014e2c7f2ef61377 |
File details
Details for the file firescraper-1.0.1-py3-none-any.whl.
File metadata
- Download URL: firescraper-1.0.1-py3-none-any.whl
- Upload date:
- Size: 13.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 08d343c853d31f509efdcd6b3fd89894a5d4d31769584c5f1883e1a4493b0b41 |
| MD5 | f19f84c264489f379f553f75616d9743 |
| BLAKE2b-256 | bac5cf4bc624aadb40767ea5ea3461e03060154802b8995f329b8c9fcca72ea2 |