ScrapingBee CLI

Command-line client for the ScrapingBee API: scrape URLs (single or batch), crawl sites, check usage and credits, and use Google Search, Fast Search, Amazon, Walmart, YouTube, and ChatGPT from the terminal.
Requirements
- Python 3.10+
Setup: install the CLI (see Installation), then authenticate (see Configuration). A ScrapingBee API key is required before any command will work.
Installation
Recommended — install with uv (no virtual environment needed):
curl -LsSf https://astral.sh/uv/install.sh | sh
uv tool install scrapingbee-cli
Alternative — install with pip in a virtual environment:
pip install scrapingbee-cli
From source: clone the repo and run pip install -e . in the project root.
Configuration
You need a ScrapingBee API key:
- scrapingbee auth – Validate and save the key to config (use --api-key KEY for non-interactive; --show to print the config path).
- Environment variable – export SCRAPINGBEE_API_KEY=your_key
- .env file – In the current directory or ~/.config/scrapingbee-cli/.env
Remove the stored key with scrapingbee logout. Get your API key from the ScrapingBee dashboard.
Usage
scrapingbee [command] [arguments] [options]
- scrapingbee --help – List all commands.
- scrapingbee [command] --help – Options and parameters for that command.
Options are per-command. Each command has its own set of options — run scrapingbee [command] --help to see them. Common options across batch-capable commands include --output-file, --output-dir, --input-file, --input-column, --concurrency, --output-format, --retries, --backoff, --resume, --update-csv, --no-progress, --extract-field, --fields, --deduplicate, --sample, --post-process, --on-complete, and --verbose. For details, see the documentation.
Commands
| Command | Description |
|---|---|
| usage | Check credits and max concurrency |
| auth / logout | Save or remove API key |
| docs | Print docs URL; --open to open in browser |
| scrape [url] | Scrape a URL (HTML, JS, screenshot, extract) |
| crawl | Crawl sites following links, with AI extraction and save-pattern filtering |
| google / fast-search | Search SERP APIs |
| amazon-product / amazon-search | Amazon product and search |
| walmart-search / walmart-product | Walmart search and product |
| youtube-search / youtube-metadata | YouTube search and video metadata |
| chatgpt | ChatGPT API (--search true for web-enhanced responses) |
| export | Merge batch/crawl output to ndjson, txt, or csv (with --flatten, --columns) |
| schedule | Schedule commands via cron (--name, --list, --stop) |
Batch mode: Commands that take a single input support --input-file (one line per input, or .csv with --input-column) and --output-dir. Use --output-format to choose between files (default), csv, or ndjson streaming. Add --deduplicate to remove duplicate URLs, --sample N to test on a subset, or --post-process 'jq .title' to transform each result. Use --resume to skip already-completed items after interruption.
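With --output-format ndjson, each result arrives as one JSON object per line, so downstream tools can consume output incrementally instead of waiting for the whole batch. A minimal consumer sketch (the url and title fields here are hypothetical sample data; actual result fields depend on the command):

```python
import io
import json

def iter_ndjson(stream):
    """Yield one parsed result per NDJSON line, skipping blank lines."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

# In-memory stream standing in for a CLI output file.
sample = io.StringIO(
    '{"url": "https://example.com", "title": "Example"}\n'
    '{"url": "https://example.org", "title": "Example Org"}\n'
)
titles = [record["title"] for record in iter_ndjson(sample)]
```

The same loop works on an open file handle, which is why NDJSON pairs well with --resume: completed lines are already on disk if a run is interrupted.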
Parameters and options: Use space-separated values (e.g. --render-js false), not --option=value. For full parameter lists, response formats, and credit costs, see scrapingbee [command] --help and the ScrapingBee API documentation.
Key features
- AI extraction: --ai-extract-rules '{"price": "product price", "title": "product name"}' pulls structured data from any page using natural-language descriptions; no CSS selectors needed. Works with scrape, crawl, and batch mode.
- CSS/XPath extraction: --extract-rules '{"title": "h1", "price": ".price"}' for consistent, cheaper production scraping. Find selectors in browser DevTools.
- Pipelines: Chain commands with --extract-field, e.g. google QUERY --extract-field organic_results.url > urls.txt then scrape --input-file urls.txt.
- Update CSV: --update-csv fetches fresh data and updates the input CSV in place. Ideal for daily price tracking, inventory monitoring, or any dataset that needs periodic refresh.
- Crawl with filtering: --include-pattern and --exclude-pattern control which links to follow. --save-pattern only saves pages matching a regex (others are visited for link discovery but not saved).
- Output formats: --output-format ndjson streams results as JSON lines; --output-format csv writes a single CSV. The default, files, writes individual files.
- CSV input: --input-file products.csv --input-column url reads URLs from a CSV column.
- Export: scrapingbee export --input-dir batch/ --format csv --flatten --columns "title,price" merges batch output with nested JSON flattening and column selection.
- Scheduling: scrapingbee schedule --every 1d --name prices scrape --input-file products.csv --update-csv registers a cron job. Use --list, --stop NAME, or --stop all.
- Deduplication & sampling: --deduplicate removes duplicate URLs; --sample 100 processes only 100 random items.
- RAG chunking: scrape --chunk-size 500 --chunk-overlap 50 --return-page-markdown true outputs NDJSON chunks ready for vector DB ingestion.
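The --chunk-size / --chunk-overlap flags split page text into overlapping windows so that sentences near a boundary appear in two adjacent chunks. A rough character-based sketch of that windowing (the CLI's actual units and tokenization are not specified in this README, so treat this as an approximation):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `chunk_size` characters, each starting
    `chunk_size - overlap` characters after the previous window."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

With the defaults, each chunk repeats the last 50 characters of its predecessor, which helps a vector search retrieve context that straddles a chunk boundary.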
Examples
scrapingbee usage
scrapingbee scrape "https://example.com" --output-file page.html
scrapingbee scrape "https://example.com/product" --ai-extract-rules '{"title": "product name", "price": "price"}'
scrapingbee google "pizza new york" --extract-field organic_results.url > urls.txt
scrapingbee scrape --input-file urls.txt --output-dir pages --deduplicate
scrapingbee crawl "https://store.com" --output-dir products --save-pattern "/product/" --ai-extract-rules '{"name": "name", "price": "price"}' --max-pages 200 --concurrency 200
scrapingbee export --input-dir products --format csv --flatten --columns "name,price" --output-file products.csv
scrapingbee scrape --input-file products.csv --input-column url --update-csv --ai-extract-rules '{"price": "current price"}'
scrapingbee schedule --every 1d --name price-tracker scrape --input-file products.csv --input-column url --update-csv --ai-extract-rules '{"price": "price"}'
scrapingbee schedule --list
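The --flatten step in the export example above turns nested JSON into flat CSV columns. Conceptually it works like the sketch below, where nested keys are joined into dotted column names; the dot separator and exact behavior are assumptions for illustration, not the CLI's documented convention:

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single level with dotted column names."""
    out = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, prefix=f"{name}."))
        else:
            out[name] = value
    return out

row = flatten({"name": "Widget", "price": {"amount": 9.99, "currency": "USD"}})
# row now maps flat column names (name, price.amount, price.currency) to values
```

Flat rows like this are what make --columns "name,price" style selection possible on otherwise nested extraction results.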
More information
- CLI Documentation – Full CLI reference with pipelines, parameters, and examples.
- Advanced usage examples – Shell piping, command chaining, batch workflows, monitoring scripts, NDJSON streaming, screenshots, Google search patterns, LLM chunking, and more.
- ScrapingBee API documentation – Parameters, response formats, credit costs, and best practices.
- Claude / AI agents: This repo includes a Claude Skill and Claude Plugin for agent use with file-based output and security rules.
Testing
Pytest is configured in pyproject.toml ([tool.pytest.ini_options]). From the project root:
1. Install the package with dev dependencies
pip install -e ".[dev]"
2. Run tests
| Command | What runs |
|---|---|
| pytest tests/unit | Unit tests only (no API key needed) |
| pytest -m "not integration" | All except integration (no API key needed) |
| pytest | Full suite (integration tests require SCRAPINGBEE_API_KEY) |
| python tests/run_e2e_tests.py | E2E tests (182 tests, requires SCRAPINGBEE_API_KEY) |
| python tests/run_e2e_tests.py --filter GG | E2E tests filtered by prefix |
Integration tests call the live ScrapingBee API and are marked with @pytest.mark.integration.