Simple CLI to scrape product data, images, and collections from Shopify and WooCommerce stores

Shopify Spy

Shopify Spy is a command-line tool for scraping product and collection data from ecommerce stores. Built on Scrapy, it supports Shopify and WooCommerce stores out of the box.

To find Shopify stores to scrape, try searching Google with site:myshopify.com.

Installation

pipx and uv (via uv tool) each install CLI tools into isolated environments, so they won't conflict with other Python projects:

# pipx
pipx install shopify-spy

# uv
uv tool install shopify-spy

Or install with pip if you want it in a specific virtual environment:

pip install shopify-spy

Requires Python 3.10+.

Quick Start

# Scrape a Shopify store (default)
shopify-spy scrape https://www.example.com

# Scrape a WooCommerce store
shopify-spy scrape --platform woocommerce https://www.example.com

# Scrape multiple stores
shopify-spy scrape https://store1.com https://store2.com https://store3.com

# Download product images
shopify-spy scrape https://www.example.com --images

# Include collections (Shopify only)
shopify-spy scrape https://www.example.com --collections

# Scrape multiple stores from a file
shopify-spy scrape --url-file stores.txt

# Specify output directory
shopify-spy scrape https://www.example.com --output ./my-data

Results are saved as JSONL in the output directory (default: ./output). Use --format to choose JSON, CSV, XML, SQLite, or Parquet.

Supported Platforms

Platform	Mechanism	Notes
Shopify	`/sitemap.xml` + `.json` endpoints	Products and collections
WooCommerce	`/wp-json/wc/store/v1/products`	No authentication required

Commands

`scrape`

Scrape products and collections from Shopify and WooCommerce stores.

shopify-spy scrape [URL]... [OPTIONS]

Arguments:

URL... - One or more store URLs (optional if using --url-file)

Options:

--platform, -p PLATFORM - Ecommerce platform: shopify, woocommerce (default: shopify)
--limit, -n INT - Stop after scraping N items (useful for sampling or testing)
--url-file, -f FILE - File containing URLs (one per line)
--products / --no-products - Scrape products (default: yes; Shopify only)
--collections / --no-collections - Scrape collections (default: no; Shopify only)
--images / --no-images - Download images (default: no)
--headless / --no-headless - Use Playwright for headless/Hydrogen stores (default: no)
--install-browser / --no-install-browser - Auto-install Chromium if missing, headless mode only (default: yes)
--output, -o PATH - Output directory (default: ./output)
--format, -F FORMAT - Output format: json, jsonl, csv, xml, sqlite, parquet (default: jsonl)
--config, -c FILE - Path to YAML config file
--concurrent INT - Concurrent requests per domain (default: 16)
--throttle / --no-throttle - Auto-throttle requests (default: yes)
--user-agent, -A TEXT - Custom User-Agent header
--verbose, -v - Show debug output
--quiet, -q - Show only warnings and errors
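
When several stores are sampled from a script, the options above can be assembled programmatically. The following is a minimal sketch that uses only flags documented in this section; the helper name build_scrape_command is hypothetical and not part of Shopify Spy.

```python
import shlex


def build_scrape_command(urls, platform="shopify", limit=None, output="./output", images=False):
    """Assemble a `shopify-spy scrape` argument list from documented flags.

    build_scrape_command is a hypothetical helper, not part of Shopify Spy.
    """
    if not urls:
        raise ValueError("at least one store URL is required")
    cmd = ["shopify-spy", "scrape", *urls, "--platform", platform, "--output", output]
    if limit is not None:
        cmd += ["--limit", str(limit)]  # stop after N items, useful for sampling
    if images:
        cmd.append("--images")
    return cmd


# Print the command line that would sample 10 items from one store.
print(shlex.join(build_scrape_command(["https://www.example.com"], limit=10)))
```

The returned list can be passed to subprocess.run(cmd, check=True) once shopify-spy is installed and on PATH.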

`init`

Create a default configuration file.

shopify-spy init [PATH]

Arguments:

PATH - Where to create the config file (default: ./shopify-spy.yaml)

Options:

--force, -f - Overwrite existing file

Configuration

Shopify Spy can be configured via a YAML file. Create one with shopify-spy init:

# shopify-spy.yaml
scrape:
  platform: shopify   # Platform: shopify, woocommerce
  products: true      # Scrape product data (Shopify only)
  collections: false  # Scrape collection data (Shopify only)
  images: false       # Download product images
  headless: false     # Use Playwright for headless Shopify stores

output:
  dir: ./output       # Output directory for results
  format: jsonl       # Output format: json, jsonl, csv, xml, sqlite, parquet
  images_subdir: images  # Subdirectory for downloaded images

network:
  concurrent_requests: 16  # Concurrent requests per domain
  timeout: 180             # Download timeout (seconds)
  retries: 2               # Retry failed requests
  # user_agent: MyBot/1.0 (+https://example.com)  # Custom user agent
  respect_robots_txt: true

throttle:
  enabled: true            # Auto-throttle based on server response
  start_delay: 1           # Initial download delay (seconds)
  max_delay: 60            # Maximum download delay (seconds)
  target_concurrency: 1.0  # Target concurrent requests (higher = faster)

Config file search order:

Path specified with --config
./shopify-spy.yaml
~/.config/shopify-spy/config.yaml

CLI options override config file settings.
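
The search order and the override rule can be illustrated with a short sketch. This is not Shopify Spy's actual implementation: the function names find_config and merge_options are hypothetical, and treating None as "option not given on the command line" is an assumption made for the example.

```python
from pathlib import Path


def find_config(explicit=None, cwd=None, home=None):
    """Return the config path to use, following the documented search order."""
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()
    if explicit is not None:
        # Assumption: a path given with --config always wins and is used as-is.
        return Path(explicit)
    candidates = (
        cwd / "shopify-spy.yaml",
        home / ".config" / "shopify-spy" / "config.yaml",
    )
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # no config file found; defaults apply


def merge_options(config, cli):
    """Overlay CLI options on config values; None means 'not given on the CLI'."""
    merged = dict(config)
    merged.update({key: value for key, value in cli.items() if value is not None})
    return merged
```

For example, merge_options({"format": "jsonl"}, {"format": "csv"}) yields a dict whose format is "csv", matching the rule that CLI options take precedence.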

Output

Results are saved in the output directory (JSONL by default, configurable via --format):

output/
  shopify_spider_2024-01-15T10-30-00.jsonl
  images/
    full/
      <image files>

Shopify output

Each line contains the full product or collection JSON from Shopify's API, plus three added fields (url, store, and image_urls):

{
  "product": { "title": "...", "variants": [...], "images": [...], ... },
  "url": "https://store.com/products/item.json",
  "store": "store.com",
  "image_urls": ["https://cdn.shopify.com/.../product.jpg"]
}
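
As a rough illustration of how a record with this shape might be consumed, the sketch below summarizes one JSONL line using only the fields shown in the example above. The function name summarize_shopify_item is hypothetical, and the handling of records without a "product" key (such as collections) is an assumption.

```python
import json


def summarize_shopify_item(line):
    """Summarize one JSONL line using only the fields shown in the example."""
    item = json.loads(line)
    # Assumption: records without a "product" key (e.g. collections) get empty values.
    product = item.get("product", {})
    return {
        "store": item.get("store"),
        "title": product.get("title"),
        "variant_count": len(product.get("variants", [])),
        "image_count": len(item.get("image_urls", [])),
    }
```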

WooCommerce output

Each line contains the full product JSON from the WooCommerce Store API, plus two added fields:

{
  "id": 123,
  "name": "Product Name",
  "slug": "product-name",
  "permalink": "https://store.com/product/product-name/",
  "sku": "SKU-001",
  "prices": { "price": "5200", "currency_code": "USD", "currency_minor_unit": 2 },
  "images": [{ "id": 1, "src": "https://..." }],
  "store": "store.com",
  "image_urls": ["https://..."]
}

Note: WooCommerce prices are strings in minor currency units (divide by 10^currency_minor_unit to get the decimal value).
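
The conversion described in the note can be sketched as follows. Decimal is used instead of float to avoid rounding surprises with currency amounts; the function name woocommerce_price is hypothetical.

```python
from decimal import Decimal


def woocommerce_price(prices):
    """Convert a Store API minor-unit price string to a Decimal amount."""
    minor_unit = int(prices["currency_minor_unit"])
    # Shifting the exponent is equivalent to dividing by 10 ** minor_unit.
    return Decimal(prices["price"]).scaleb(-minor_unit)


example = {"price": "5200", "currency_code": "USD", "currency_minor_unit": 2}
print(woocommerce_price(example), example["currency_code"])
```

With the values from the sample record above ("5200" and a minor unit of 2), the result is 52.00.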

Image Metadata

When using --images, each item includes a scraped_images field with download info:

{
  "image_urls": ["https://cdn.shopify.com/.../product.jpg"],
  "scraped_images": [
    {
      "url": "https://cdn.shopify.com/.../product.jpg",
      "path": "full/abc123def.jpg",
      "checksum": "d41d8cd98f00b204e9800998ecf8427e",
      "status": "downloaded"
    }
  ]
}

The path is relative to the images directory (output/images/ by default).
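
To locate the downloaded files on disk, the relative path can be joined with the images directory. A minimal sketch, assuming the default output/images/ location; the function name local_image_paths is hypothetical, and skipping entries whose status is not "downloaded" is an assumption, since only that status value appears in the example.

```python
from pathlib import Path


def local_image_paths(item, images_dir="output/images"):
    """Map each downloaded image URL to its path on disk."""
    return {
        entry["url"]: Path(images_dir) / entry["path"]
        for entry in item.get("scraped_images", [])
        # Assumption: entries with any other status have no usable local file.
        if entry.get("status") == "downloaded"
    }
```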

Parsing Output

With jq:

# Shopify: extract product titles
cat output/*.jsonl | jq '.product.title'

# WooCommerce: extract product names and prices
cat output/*.jsonl | jq '{name: .name, price: .prices.price, currency: .prices.currency_code}'

With Python:

import json

with open("output/shopify_spider_2024-01-15.jsonl", encoding="utf-8") as f:
    for line in f:
        item = json.loads(line)
        print(item["product"]["title"])  # Shopify
        # print(item["name"])            # WooCommerce

With pandas:

import pandas as pd

df = pd.read_json("output/shopify_spider_2024-01-15.jsonl", lines=True)
products = pd.json_normalize(df["product"])  # Shopify

With polars:

import polars as pl

df = pl.read_ndjson("output/shopify_spider_2024-01-15.jsonl")
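
When a run produces several files, the Python equivalent of the cat output/*.jsonl pattern above might look like the sketch below. The helper names iter_items and display_name are hypothetical; display_name relies only on the product.title and name fields shown in the Output section.

```python
import json
from pathlib import Path


def iter_items(output_dir="output"):
    """Yield every record from every .jsonl file in the output directory."""
    for path in sorted(Path(output_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():  # skip blank lines
                    yield json.loads(line)


def display_name(item):
    """Return the Shopify title or the WooCommerce name, whichever is present."""
    return item.get("product", {}).get("title") or item.get("name")
```

A typical use would be: for item in iter_items("output"): print(display_name(item)).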

Browser-Based Scraping

Some stores require a real browser to scrape -- for example, stores built on Hydrogen or those that block automated HTTP requests. Use the --headless flag to enable Playwright-based scraping:

# Install with browser-based scraping support
pip install "shopify-spy[headless]"

# Scrape a store using browser rendering (Chromium is installed automatically on first use, ~300MB)
shopify-spy scrape https://example.com --headless

# Skip the auto-install (e.g. in CI where Chromium is pre-installed)
shopify-spy scrape https://example.com --headless --no-install-browser

Browser mode tries fast JSON endpoints first and only falls back to full page rendering when needed.
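
The "fast path first, browser as fallback" behaviour can be pictured with a generic sketch. This is not Shopify Spy's implementation: fetch_json and render_page are hypothetical callables supplied by the caller, and treating any exception or a None result as "endpoint unavailable" is an assumption made for brevity.

```python
def fetch_with_fallback(url, fetch_json, render_page):
    """Try the cheap JSON fetch first; render the page only if that fails.

    fetch_json and render_page are hypothetical callables supplied by the caller.
    Returns a (data, source) tuple where source is "json" or "rendered".
    """
    try:
        data = fetch_json(url)
    except Exception:  # broad on purpose: any failure triggers the fallback
        data = None
    if data is not None:
        return data, "json"
    return render_page(url), "rendered"
```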

Limitations

WooCommerce Store API required. The WooCommerce spider uses the public Store API (/wp-json/wc/store/v1/products), available in WooCommerce 3.x and later. Stores that have disabled the REST API via security plugins, or that broadly block crawlers in robots.txt, will not be scrapeable.

Rate limiting. Scraping very large stores may result in temporary bans. Auto-throttling is enabled by default, but you can adjust the settings or disable it for faster scraping:

# Disable throttling (faster but riskier)
shopify-spy scrape https://example.com --no-throttle

Advanced Usage

For advanced Scrapy configuration or custom pipelines, you can use Shopify Spy as a library:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from shopify_spy.spiders.shopify import ShopifySpider
from shopify_spy.spiders.woocommerce import WooCommerceSpider

process = CrawlerProcess(get_project_settings())

# Shopify
process.crawl(ShopifySpider, url="https://example.com", products=True)

# WooCommerce
process.crawl(WooCommerceSpider, url="https://example.com")

process.start()
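
For custom Scrapy configuration, one option is to build a dict of setting overrides before creating the crawler process. The sketch below maps the network and throttle keys from the Configuration section onto standard Scrapy setting names. The mapping itself is an assumption (it is not taken from Shopify Spy's source), and the function name scrapy_overrides is hypothetical.

```python
def scrapy_overrides(network, throttle):
    """Translate config-style dicts into standard Scrapy setting names.

    Assumption: this mapping mirrors the YAML keys shown in Configuration;
    it is not taken from Shopify Spy's source.
    """
    settings = {
        "CONCURRENT_REQUESTS_PER_DOMAIN": network.get("concurrent_requests", 16),
        "DOWNLOAD_TIMEOUT": network.get("timeout", 180),
        "RETRY_TIMES": network.get("retries", 2),
        "ROBOTSTXT_OBEY": network.get("respect_robots_txt", True),
        "AUTOTHROTTLE_ENABLED": throttle.get("enabled", True),
        "AUTOTHROTTLE_START_DELAY": throttle.get("start_delay", 1),
        "AUTOTHROTTLE_MAX_DELAY": throttle.get("max_delay", 60),
        "AUTOTHROTTLE_TARGET_CONCURRENCY": throttle.get("target_concurrency", 1.0),
    }
    if network.get("user_agent"):
        settings["USER_AGENT"] = network["user_agent"]
    return settings


# Example: a gentler crawl with fewer concurrent requests and a lower delay cap.
overrides = scrapy_overrides({"concurrent_requests": 4}, {"max_delay": 30})
```

The resulting dict could be applied by calling update(overrides) on the object returned by get_project_settings() and passing that object to CrawlerProcess.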

Feedback

Found a bug or have a suggestion? Open an issue.

License

MIT

Credits

Icon by Bartama Graphic.

Release history

0.2.1 - Apr 3, 2026 (this version)
0.2.0 - Mar 31, 2026 (yanked: Rolling back half-baked output format feats)
0.1.1 - Feb 14, 2026
0.1.0 - Jan 26, 2026

