ScrapingBee CLI

Command-line client for the ScrapingBee API: scrape URLs (single or batch), crawl sites, check usage and credits, and use Google, Fast Search, Amazon, Walmart, YouTube, and ChatGPT from the terminal.

Requirements

  • Python 3.10+

Setup: Install (below), then authenticate (Configuration). You need a ScrapingBee API key before any command will work.

Installation

pip install scrapingbee-cli
# or (isolated): pipx install scrapingbee-cli

From source: clone the repo and run pip install -e . in the project root.

Configuration

You need a ScrapingBee API key:

  1. scrapingbee auth – Validate and save the key to config (use --api-key KEY for non-interactive; --show to print config path).
  2. Environment – export SCRAPINGBEE_API_KEY=your_key
  3. .env file – In the current directory or ~/.config/scrapingbee-cli/.env

Remove the stored key with scrapingbee logout. Get your API key from the ScrapingBee dashboard.

Usage

scrapingbee [command] [arguments] [options]
  • scrapingbee --help – List all commands.
  • scrapingbee [command] --help – Options and parameters for that command.

Options are per-command: run scrapingbee [command] --help to see the options a given command accepts. Common options across batch-capable commands include --output-file, --output-dir, --input-file, --input-column, --concurrency, --output-format, --retries, --backoff, --resume, --update-csv, --no-progress, --extract-field, --fields, --deduplicate, --sample, --post-process, --on-complete, and --verbose. For details, see the documentation.

Commands

| Command | Description |
| --- | --- |
| usage | Check credits and max concurrency |
| auth / logout | Save or remove API key |
| docs | Print docs URL; --open to open in browser |
| scrape [url] | Scrape a URL (HTML, JS, screenshot, extract) |
| crawl | Crawl sites following links, with AI extraction and save-pattern filtering |
| google / fast-search | Search SERP APIs |
| amazon-product / amazon-search | Amazon product and search |
| walmart-search / walmart-product | Walmart search and product |
| youtube-search / youtube-metadata | YouTube search and video metadata |
| chatgpt | ChatGPT API |
| export | Merge batch/crawl output to ndjson, txt, or csv (with --flatten, --columns) |
| schedule | Schedule commands via cron (--name, --list, --stop) |

Batch mode: Commands that take a single input support --input-file (one line per input, or .csv with --input-column) and --output-dir. Use --output-format to choose between files (default), csv, or ndjson streaming. Add --deduplicate to remove duplicate URLs, --sample N to test on a subset, or --post-process 'jq .title' to transform each result. Use --resume to skip already-completed items after interruption.
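To make the effect of --deduplicate and --sample N concrete, here is a rough sketch of the same ideas applied to an input list before it is handed to a batch command. It illustrates the concepts only and is not the CLI's implementation; the helper names and the seed parameter are assumptions.

```python
import random


def read_inputs(text):
    """Split file contents into one input per line, ignoring blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def deduplicate(items):
    """Drop repeated entries while keeping first-seen order."""
    return list(dict.fromkeys(items))


def sample(items, n, seed=None):
    """Pick up to n random items; return everything when n >= len(items)."""
    items = list(items)
    if n >= len(items):
        return items
    return random.Random(seed).sample(items, n)


urls = read_inputs("https://example.com/a\n\nhttps://example.com/b\nhttps://example.com/a\n")
unique = deduplicate(urls)      # two distinct URLs remain
subset = sample(unique, 1)      # one of them, chosen at random
```

Sampling after deduplication keeps a repeated URL from being more likely to land in the test subset.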

Parameters and options: Use space-separated values (e.g. --render-js false), not --option=value. For full parameter lists, response formats, and credit costs, see scrapingbee [command] --help and the ScrapingBee API documentation.
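When the CLI is driven from a script, the space-separated rule is easy to enforce by building the argument list programmatically. The sketch below assumes a hypothetical helper, build_command, that turns keyword arguments into --option value pairs; passing None emits a bare switch such as --deduplicate. It only builds the command and does not run it.

```python
import shlex


def build_command(command, *positional, **options):
    """Build an argv list using `--option value` pairs (never `--option=value`)."""
    argv = ["scrapingbee", command, *positional]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is None:
            # None marks a switch that takes no value, e.g. --deduplicate.
            argv.append(flag)
            continue
        if isinstance(value, bool):
            # Boolean parameters are written out as the words true/false.
            value = "true" if value else "false"
        argv += [flag, str(value)]
    return argv


cmd = build_command("scrape", "https://example.com", render_js=False, output_file="page.html")
print(shlex.join(cmd))
# prints: scrapingbee scrape https://example.com --render-js false --output-file page.html
```

With the CLI installed and authenticated, the list can be passed to subprocess.run(cmd, check=True) without any shell quoting.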

Key features

  • AI extraction: --ai-extract-rules '{"price": "product price", "title": "product name"}' pulls structured data from any page using natural language — no CSS selectors needed. Works with scrape, crawl, and batch mode.
  • CSS/XPath extraction: --extract-rules '{"title": "h1", "price": ".price"}' for consistent, cheaper production scraping. Find selectors in browser DevTools.
  • Pipelines: Chain commands with --extract-field — e.g. google QUERY --extract-field organic_results.url > urls.txt then scrape --input-file urls.txt.
  • Update CSV: --update-csv fetches fresh data and updates the input CSV in-place. Ideal for daily price tracking, inventory monitoring, or any dataset that needs periodic refresh.
  • Crawl with filtering: --include-pattern, --exclude-pattern control which links to follow. --save-pattern only saves pages matching a regex (others are visited for link discovery but not saved).
  • Output formats: --output-format ndjson streams results as JSON lines; --output-format csv writes a single CSV. The default, files, writes individual files.
  • CSV input: --input-file products.csv --input-column url reads URLs from a CSV column.
  • Export: scrapingbee export --input-dir batch/ --format csv --flatten --columns "title,price" merges batch output with nested JSON flattening and column selection.
  • Scheduling: scrapingbee schedule --every 1d --name prices scrape --input-file products.csv --update-csv registers a cron job. Use --list, --stop NAME, or --stop all.
  • Deduplication & sampling: --deduplicate removes duplicate URLs; --sample 100 processes only 100 random items.
  • RAG chunking: scrape --chunk-size 500 --chunk-overlap 50 --return-page-markdown true outputs NDJSON chunks ready for vector DB ingestion.
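The RAG chunking option above combines a chunk size with an overlap. As an illustration of that windowing idea, the sketch below splits text into overlapping chunks and serializes them as NDJSON. It assumes sizes are measured in characters and uses hypothetical field names (index, text); the CLI's actual units and output schema may differ.

```python
import json


def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def to_ndjson(chunks):
    """Serialize chunks as one JSON object per line (field names are illustrative)."""
    return "\n".join(json.dumps({"index": i, "text": c}) for i, c in enumerate(chunks))
```

The overlap means the tail of each chunk is repeated at the head of the next, which helps retrieval when a relevant passage straddles a chunk boundary.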

Examples

scrapingbee usage
scrapingbee scrape "https://example.com" --output-file page.html
scrapingbee scrape "https://example.com/product" --ai-extract-rules '{"title": "product name", "price": "price"}'
scrapingbee google "pizza new york" --extract-field organic_results.url > urls.txt
scrapingbee scrape --input-file urls.txt --output-dir pages --deduplicate
scrapingbee crawl "https://store.com" --output-dir products --save-pattern "/product/" --ai-extract-rules '{"name": "name", "price": "price"}' --max-pages 200 --concurrency 200
scrapingbee export --input-dir products --format csv --flatten --columns "name,price" --output-file products.csv
scrapingbee scrape --input-file products.csv --input-column url --update-csv --ai-extract-rules '{"price": "current price"}'
scrapingbee schedule --every 1d --name price-tracker scrape --input-file products.csv --input-column url --update-csv --ai-extract-rules '{"price": "price"}'
scrapingbee schedule --list
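The google example above uses --extract-field organic_results.url to turn a search response into a list of URLs. A rough sketch of how such a dotted path can be resolved against JSON is shown below. The extract_field helper and the sample response are hypothetical and only illustrate the idea of fanning out over lists; they are not the CLI's implementation or the API's exact schema.

```python
def extract_field(data, path):
    """Collect every value reached by a dotted path, fanning out over lists."""
    current = [data]
    for key in path.split("."):
        next_level = []
        for node in current:
            # A list contributes each of its elements; anything else stands alone.
            candidates = node if isinstance(node, list) else [node]
            for item in candidates:
                if isinstance(item, dict) and key in item:
                    next_level.append(item[key])
        current = next_level
    # If the final values are themselves lists, flatten them one level.
    result = []
    for value in current:
        if isinstance(value, list):
            result.extend(value)
        else:
            result.append(value)
    return result


# Hypothetical response shape, for illustration only.
response = {
    "organic_results": [
        {"url": "https://a.example", "title": "A"},
        {"url": "https://b.example", "title": "B"},
        {"title": "entry without a url"},
    ]
}
urls = extract_field(response, "organic_results.url")
```

Writing one extracted value per line gives a file that scrape --input-file can consume directly.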

Testing

Pytest is configured in pyproject.toml ([tool.pytest.ini_options]). From the project root:

1. Install the package with dev dependencies

pip install -e ".[dev]"

2. Run tests

| Command | What runs |
| --- | --- |
| pytest tests/unit | Unit tests only (no API key needed) |
| pytest -m "not integration" | All except integration (no API key needed) |
| pytest | Full suite (integration tests require SCRAPINGBEE_API_KEY) |
| python tests/run_e2e_tests.py | E2E tests (182 tests, requires SCRAPINGBEE_API_KEY) |
| python tests/run_e2e_tests.py --filter GG | E2E tests filtered by prefix |

Integration tests call the live ScrapingBee API and are marked with @pytest.mark.integration.
