Command-line client for the ScrapingBee API: scrape pages (single or batch), crawl sites, check usage/credits, and use Google Search, Fast Search, Amazon, Walmart, YouTube, ChatGPT, and Gemini from the terminal.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

sahilsunnysb wfn

These details have not been verified by PyPI

Project links

Project description

ScrapingBee CLI

Command-line client for the ScrapingBee API: scrape URLs (single or batch), crawl sites, check usage and credits, and use Google, Fast Search, Amazon, Walmart, YouTube, ChatGPT, and Gemini from the terminal.

Requirements

Python 3.10+
Linux: clipboard copy (e.g. selecting text in the interactive REPL's :view pager) needs one of wl-copy, xclip, or xsel installed. macOS (pbcopy) and Windows (clip) work out of the box.

Setup: Install (below), then authenticate (Configuration). You need a ScrapingBee API key before any command will work.

Installation

Recommended — install with uv (no virtual environment needed):

curl -LsSf https://astral.sh/uv/install.sh | sh
uv tool install scrapingbee-cli

Alternative — install with pip in a virtual environment:

pip install scrapingbee-cli

From source: clone the repo and run pip install -e . in the project root.

Configuration

You need a ScrapingBee API key:

scrapingbee auth – Validate and save the key to config (use --api-key KEY for non-interactive; --show to print config path).
Environment – export SCRAPINGBEE_API_KEY=your_key
.env file – In the current directory or ~/.config/scrapingbee-cli/.env

Remove the stored key with scrapingbee logout. Get your API key from the ScrapingBee dashboard.

Usage

scrapingbee [command] [arguments] [options]

scrapingbee --help – List all commands.
scrapingbee [command] --help – Options and parameters for that command.

Options are per-command. Each command has its own set of options — run scrapingbee [command] --help to see them. Common options across batch-capable commands include --output-file, --output-dir, --input-file, --input-column, --concurrency, --output-format, --overwrite, --retries, --backoff, --resume, --update-csv, --no-progress, --extract-field, --fields, --smart-extract, --deduplicate, --sample, --post-process, --on-complete, --scraping-config, and --verbose. For details, see the documentation.

Parameter values: Choice parameters accept both hyphens and underscores interchangeably (e.g. --sort-by price-low and --sort-by price_low both work).

Commands

Command	Description
`usage`	Check credits and max concurrency
`auth` / `logout`	Save or remove API key
`docs`	Print docs URL; `--open` to open in browser
`scrape [url]`	Scrape a URL (HTML, JS, screenshot, extract)
`crawl`	Crawl sites following links, with AI extraction and save-pattern filtering
`google` / `fast-search`	Search SERP APIs
`amazon-product` / `amazon-pricing` / `amazon-search`	Amazon product, pricing and search
`walmart-search` / `walmart-product`	Walmart search and product
`youtube-search` / `youtube-metadata`	YouTube search and video metadata
`chatgpt`	ChatGPT API (`--search true` for web-enhanced responses)
`gemini`	Gemini API (returns text, markdown, and citations)
`export`	Merge batch/crawl output to ndjson, txt, or csv (with --flatten, --columns)
`schedule`	Schedule commands via cron (--name, --list, --stop)
`tutorial`	Interactive step-by-step guide to CLI features (`--chapter N`, `--reset`, `--list`, `--output-dir`)

Batch mode: Commands that take a single input support --input-file (one line per input, or .csv with --input-column) and --output-dir. Use --output-format csv or --output-format ndjson to stream all results to a single file (or stdout) instead of individual files. Add --deduplicate to remove duplicate URLs, --sample N to test on a subset, or --post-process 'jq .title' to transform each result. Use --resume to skip already-completed items after interruption. Run bare scrapingbee --resume to discover incomplete batches in the current directory.

Parameters and options: Use space-separated values (e.g. --render-js false), not --option=value. For full parameter lists, response formats, and credit costs, see scrapingbee [command] --help and the ScrapingBee API documentation.

Key features

Interactive REPL: run scrapingbee with no command to open an interactive shell — a live toolbar (credits, concurrency, elapsed), :-commands (:help, :set for session defaults, :view to page the last output, :q to quit), and any command run inline with completion and history. Drag across scrollback output to copy it to the clipboard; pass --no-drag-copy to use the Scroll/Select toggle (Shift+Tab) instead.
AI extraction: --ai-extract-rules '{"price": "product price", "title": "product name"}' pulls structured data from any page using natural language — no CSS selectors needed. Works with scrape, crawl, and batch mode.
CSS/XPath extraction: --extract-rules '{"title": "h1", "price": ".price"}' for consistent, cheaper production scraping. Find selectors in browser DevTools.
Pipelines: Chain commands with --extract-field — e.g. google QUERY --extract-field organic_results.url > urls.txt then scrape --input-file urls.txt. Use --fields to filter JSON output keys; supports dot notation (e.g. --fields product.title,product.price).
Smart Extract: --smart-extract extracts data from any format (JSON, HTML, XML, CSV, Markdown) using a path expression. Auto-detects format. Supports slicing, regex filtering, and JSON schema output.
Update CSV: --update-csv fetches fresh data and updates the input CSV in-place. Ideal for daily price tracking, inventory monitoring, or any dataset that needs periodic refresh.
Crawl with filtering: --include-pattern, --exclude-pattern control which links to follow. --save-pattern only saves pages matching a regex (others are still fetched for link discovery — which costs credits — but not saved). Note: --max-pages caps saved pages, so with --save-pattern (or AI/screenshot/extract modes) the crawl fetches additional pages to find matches and total credits can exceed --max-pages.
Output formats: --output-format accepts ndjson (streams results as JSON lines) or csv (writes a single CSV) — these are the only valid values. Default (no flag) writes individual files per item into --output-dir.
CSV input: --input-file products.csv --input-column url reads URLs from a CSV column.
Export: scrapingbee export --input-dir batch/ --format csv --flatten --columns "title,price" merges batch output with nested JSON flattening and column selection.
Scheduling: scrapingbee schedule --every 1d --name prices scrape --input-file products.csv --update-csv registers a cron job. Use --list, --stop NAME, or --stop all.
Deduplication & sampling: --deduplicate removes duplicate URLs; --sample 100 processes only 100 random items.
RAG chunking: scrape --chunk-size 500 --chunk-overlap 50 --return-page-markdown true outputs NDJSON chunks ready for vector DB ingestion.
Scraping configurations: --scraping-config "My-Config" applies a pre-saved configuration from your ScrapingBee dashboard. Inline options override config settings. Create configurations in the request builder. Running scrapingbee --scraping-config NAME (without a subcommand) auto-routes to scrape.

Examples

scrapingbee usage
scrapingbee scrape "https://example.com" --output-file page.html
scrapingbee scrape "https://example.com/product" --ai-extract-rules '{"title": "product name", "price": "price"}'
scrapingbee google "pizza new york" --extract-field organic_results.url > urls.txt
scrapingbee scrape --input-file urls.txt --output-dir pages --deduplicate
scrapingbee crawl "https://store.com" --output-dir products --save-pattern "/product/" --ai-extract-rules '{"name": "name", "price": "price"}' --max-pages 200 --concurrency 200
scrapingbee export --input-dir products --format csv --flatten --columns "name,price" --output-file products.csv
scrapingbee scrape --input-file products.csv --input-column url --update-csv --ai-extract-rules '{"price": "current price"}'
scrapingbee schedule --every 1d --name price-tracker scrape --input-file products.csv --input-column url --update-csv --ai-extract-rules '{"price": "price"}'
scrapingbee schedule --list

# Smart Extract — pull fields from any format with a path expression
scrapingbee google "pizza new york" --smart-extract 'organic_results[0:3].title'
scrapingbee scrape "https://example.com" --smart-extract '...a[href=/mailto/].text'
scrapingbee scrape "https://example.com" --smart-extract '{"titles": "...h1", "links": "...href[0:5]"}'

Security

The --post-process, --on-complete, and schedule commands execute arbitrary shell commands on your machine. These features are disabled by default and require explicit human setup to enable.

For advanced features setup, see the Security section in our CLI documentation.

Do not enable these features in AI agent environments where commands may be constructed from scraped web content. ScrapingBee is not responsible for any damages caused by shell execution features. Use at your own discretion.

More information

CLI Documentation – Full CLI reference with pipelines, parameters, and examples.
Advanced usage examples – Shell piping, command chaining, batch workflows, monitoring scripts, NDJSON streaming, screenshots, Google search patterns, LLM chunking, and more.
ScrapingBee API documentation – Parameters, response formats, credit costs, and best practices.
Claude / AI agents: This repo includes a Claude Skill and Claude Plugin for agent use with file-based output and security rules.

Testing

Pytest is configured in pyproject.toml ([tool.pytest.ini_options]). From the project root:

1. Install the package with dev dependencies

pip install -e ".[dev]"

2. Run tests

Command	What runs
`pytest tests/unit`	Unit tests only (no API key needed)
`pytest -m "not integration"`	All except integration (no API key needed)
`pytest`	Full suite (integration tests require `SCRAPINGBEE_API_KEY`)
`python tests/run_e2e_tests.py`	E2E tests (182 tests, requires `SCRAPINGBEE_API_KEY`)
`python tests/run_e2e_tests.py --filter GG`	E2E tests filtered by prefix

Integration tests call the live ScrapingBee API and are marked with @pytest.mark.integration.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

sahilsunnysb wfn

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

1.5.0

Jul 8, 2026

1.4.4

Jun 29, 2026

1.4.3

Jun 9, 2026

1.4.2

May 26, 2026

1.4.1

Apr 17, 2026

1.4.0

Apr 2, 2026

1.3.1

Mar 30, 2026

1.3.0

Mar 27, 2026

1.2.3 yanked

Mar 25, 2026

1.2.2 yanked

Mar 16, 2026

1.2.1 yanked

Mar 16, 2026

1.2.0 yanked

Mar 16, 2026

1.1.0 yanked

Mar 3, 2026

1.0.1 yanked

Feb 24, 2026

1.0.0 yanked

Feb 24, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scrapingbee_cli-1.5.0.tar.gz (210.4 kB view details)

Uploaded Jul 8, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

scrapingbee_cli-1.5.0-py3-none-any.whl (225.9 kB view details)

Uploaded Jul 8, 2026 Python 3

File details

Details for the file scrapingbee_cli-1.5.0.tar.gz.

File metadata

Download URL: scrapingbee_cli-1.5.0.tar.gz
Upload date: Jul 8, 2026
Size: 210.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for scrapingbee_cli-1.5.0.tar.gz
Algorithm	Hash digest
SHA256	`3b1ddbe596703c3be5fdb19f52f642739c8e8fa63d48b7e11de90de9724dafcd`
MD5	`8c8ed597d37423039fe072329187603f`
BLAKE2b-256	`26b060f350958a64682c64ba01c3414c07c635cd32574750244f0533cf5372c5`

See more details on using hashes here.

Provenance

The following attestation bundles were made for scrapingbee_cli-1.5.0.tar.gz:

Publisher: publish.yml on ScrapingBee/scrapingbee-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: scrapingbee_cli-1.5.0.tar.gz
- Subject digest: 3b1ddbe596703c3be5fdb19f52f642739c8e8fa63d48b7e11de90de9724dafcd
- Sigstore transparency entry: 2113830828
- Sigstore integration time: Jul 8, 2026
Source repository:
- Permalink: ScrapingBee/scrapingbee-cli@2adb3ca33504d1da616bd7a973d66f9791369f75
- Branch / Tag: refs/tags/v1.5.0
- Owner: https://github.com/ScrapingBee
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2adb3ca33504d1da616bd7a973d66f9791369f75
- Trigger Event: release

File details

Details for the file scrapingbee_cli-1.5.0-py3-none-any.whl.

File metadata

Download URL: scrapingbee_cli-1.5.0-py3-none-any.whl
Upload date: Jul 8, 2026
Size: 225.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for scrapingbee_cli-1.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8ad7e61687ff6f52b6346845335e687ccf012174136584085b500fa2b9f70d31`
MD5	`81cd71c72eb6db99fea95d0f406b235a`
BLAKE2b-256	`e15077c1927da48e9931dbab5c01f89a29adbcaebecafee39b85ea38015bfba0`

See more details on using hashes here.

Provenance

The following attestation bundles were made for scrapingbee_cli-1.5.0-py3-none-any.whl:

Publisher: publish.yml on ScrapingBee/scrapingbee-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: scrapingbee_cli-1.5.0-py3-none-any.whl
- Subject digest: 8ad7e61687ff6f52b6346845335e687ccf012174136584085b500fa2b9f70d31
- Sigstore transparency entry: 2113830898
- Sigstore integration time: Jul 8, 2026
Source repository:
- Permalink: ScrapingBee/scrapingbee-cli@2adb3ca33504d1da616bd7a973d66f9791369f75
- Branch / Tag: refs/tags/v1.5.0
- Owner: https://github.com/ScrapingBee
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2adb3ca33504d1da616bd7a973d66f9791369f75
- Trigger Event: release

scrapingbee-cli 1.5.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

ScrapingBee CLI

Requirements

Installation

Configuration

Usage

Commands

Key features

Examples

Security

More information

Testing

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance