Simple CLI to scrape product data, images, and collections from Shopify stores
Shopify Spy
Shopify Spy is a command-line tool for scraping product and collection data from any Shopify store. Built on Scrapy, it extracts detailed data including high-value information like vendor names and inventory levels.
To find Shopify stores to scrape, try searching Google with site:myshopify.com.
Installation
Both pipx and uv (via uv tool) install CLI tools in isolated environments, so they won't conflict with other Python projects:
# pipx
pipx install shopify-spy
# uv
uv tool install shopify-spy
Or install with pip if you want it in a specific virtual environment:
pip install shopify-spy
Requires Python 3.10+.
Quick Start
# Scrape a single store
shopify-spy scrape https://www.example.com
# Scrape multiple stores
shopify-spy scrape https://store1.com https://store2.com https://store3.com
# Download product images
shopify-spy scrape https://www.example.com --images
# Include collections
shopify-spy scrape https://www.example.com --collections
# Scrape multiple stores from a file
shopify-spy scrape --url-file stores.txt
# Specify output directory
shopify-spy scrape https://www.example.com --output ./my-data
Results are saved as JSON lines in the output directory (default: ./output).
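When a store list comes from another source (a spreadsheet export, search results), it can help to clean it before passing it to --url-file. The following is a minimal sketch, not part of Shopify Spy: normalize_store_url and write_url_file are hypothetical helper names, and reducing each URL to scheme and host is an assumption about what the scraper wants as input.

```python
from pathlib import Path
from urllib.parse import urlsplit


def normalize_store_url(raw: str) -> str:
    """Return scheme://host for a store URL, defaulting to https."""
    raw = raw.strip()
    if "://" not in raw:
        raw = "https://" + raw
    parts = urlsplit(raw)
    return f"{parts.scheme}://{parts.netloc.lower()}"


def write_url_file(urls, path="stores.txt"):
    """Write unique, normalized URLs to `path`, one per line (hypothetical helper)."""
    seen = []
    for url in urls:
        if not url.strip():
            continue  # skip blank entries
        normalized = normalize_store_url(url)
        if normalized not in seen:
            seen.append(normalized)
    Path(path).write_text("\n".join(seen) + "\n", encoding="utf-8")
    return seen
```

The resulting file can then be passed as shopify-spy scrape --url-file stores.txt.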
Commands
scrape
Scrape products and collections from Shopify stores.
shopify-spy scrape [URL] [OPTIONS]
Arguments:
- URL... - One or more Shopify store URLs (optional if using --url-file)
Options:
- --url-file, -f FILE - File containing URLs (one per line)
- --products / --no-products - Scrape products (default: yes)
- --collections / --no-collections - Scrape collections (default: no)
- --images / --no-images - Download images (default: no)
- --output, -o PATH - Output directory (default: ./output)
- --config, -c FILE - Path to YAML config file
- --concurrent INT - Concurrent requests per domain (default: 16)
- --throttle / --no-throttle - Auto-throttle requests (default: yes)
- --user-agent, -A TEXT - Custom User-Agent header
- --verbose, -v - Show debug output
- --quiet, -q - Show only warnings and errors
init
Create a default configuration file.
shopify-spy init [PATH]
Arguments:
- PATH - Where to create the config file (default: ./shopify-spy.yaml)
Options:
- --force, -f - Overwrite existing file
Configuration
Shopify Spy can be configured with a YAML file. Create one with shopify-spy init:
# shopify-spy.yaml
scrape:
  products: true            # Scrape product data
  collections: false        # Scrape collection data
  images: false             # Download product images
output:
  dir: ./output             # Output directory for results
  images_subdir: images     # Subdirectory for downloaded images
network:
  concurrent_requests: 16   # Concurrent requests per domain
  timeout: 180              # Download timeout (seconds)
  retries: 2                # Retry failed requests
  # user_agent: MyBot/1.0 (+https://example.com)  # Custom user agent
  respect_robots_txt: true
throttle:
  enabled: true             # Auto-throttle based on server response
  start_delay: 1            # Initial download delay (seconds)
  max_delay: 60             # Maximum download delay (seconds)
  target_concurrency: 1.0   # Target concurrent requests (higher = faster)
Config file search order:
- Path specified with --config
- ./shopify-spy.yaml
- ~/.config/shopify-spy/config.yaml
CLI options override config file settings.
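The lookup and override rules can be sketched roughly as follows. This is an illustration under stated assumptions, not Shopify Spy's actual code: find_config and merge_settings are hypothetical names, the search order is read as explicit --config path first, then ./shopify-spy.yaml, then ~/.config/shopify-spy/config.yaml, and an option left unset on the command line is represented here as None.

```python
from pathlib import Path


def find_config(explicit=None, cwd=None, home=None):
    """Return the config path to use, following the documented search order."""
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()
    if explicit is not None:
        return Path(explicit)  # an explicit --config path always wins
    candidates = [
        cwd / "shopify-spy.yaml",
        home / ".config" / "shopify-spy" / "config.yaml",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # no file found: fall back to built-in defaults


def merge_settings(file_settings, cli_overrides):
    """Overlay CLI options on file settings; None means 'not given on the CLI'."""
    merged = {section: dict(values) for section, values in file_settings.items()}
    for section, values in cli_overrides.items():
        target = merged.setdefault(section, {})
        for key, value in values.items():
            if value is not None:
                target[key] = value
    return merged
```

For example, with images: false in the file and --images on the command line, the merged result has images set to true while every other file value is kept.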
Output
Results are saved as JSON lines files in the output directory:
output/
shopify_spider_2024-01-15T10-30-00.jsonl
images/
full/
<image files>
Each line of a JSON lines file is one JSON object describing a product or collection, with full metadata from Shopify's JSON API.
Parsing Output
With jq:
# Extract product titles
cat output/*.jsonl | jq '.product.title'
# Get prices
cat output/*.jsonl | jq '{title: .product.title, price: .product.variants[0].price}'
With Python:
import json

# Read the file line by line; each line is one JSON object
with open("output/shopify_spider_2024-01-15T10-30-00.jsonl") as f:
    for line in f:
        item = json.loads(line)
        print(item["product"]["title"])
With pandas:
import pandas as pd

df = pd.read_json("output/shopify_spider_2024-01-15T10-30-00.jsonl", lines=True)
# Expand the nested product objects into flat columns
products = pd.json_normalize(df["product"])
With polars:
import polars as pl

df = pl.read_ndjson("output/shopify_spider_2024-01-15T10-30-00.jsonl")
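Products usually carry several variants, so a per-variant table is often more useful than a per-product one. The sketch below flattens the output using only the standard library. variant_rows is a hypothetical helper; the "product", "title", "variants", and "price" keys come from the examples above, while "vendor", the variant-level "title", and the idea that collection records lack a "product" key are assumptions about the output shape.

```python
import json


def variant_rows(lines):
    """Yield one flat dict per product variant from JSON lines text."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        item = json.loads(line)
        product = item.get("product")
        if product is None:
            continue  # assumed: non-product records (e.g. collections)
        for variant in product.get("variants", []):
            yield {
                "title": product.get("title"),
                "vendor": product.get("vendor"),
                "variant_title": variant.get("title"),
                "price": variant.get("price"),
            }
```

Because an open file iterates line by line, the generator can be fed directly, for example list(variant_rows(open(path))), and the resulting dicts can be passed to csv.DictWriter or a DataFrame constructor.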
Limitations
Standard Shopify stores only. This tool works with standard Shopify stores using Liquid themes, which represent nearly all Shopify sites. The small number of headless stores built on Hydrogen or other custom storefronts are not supported, as they use the Storefront GraphQL API instead of the JSON endpoints this tool relies on.
Rate limiting. Auto-throttling is enabled by default, but scraping very large stores may still result in temporary bans. You can adjust the throttle settings in the config file, or disable throttling for faster scraping:
# Disable throttling (faster but riskier)
shopify-spy scrape https://example.com --no-throttle
Advanced Usage
For advanced Scrapy configuration or custom pipelines, you can use Shopify Spy as a library:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from shopify_spy.spiders.shopify import ShopifySpider
process = CrawlerProcess(get_project_settings())
process.crawl(ShopifySpider, url="https://example.com", products=True)
process.start()
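When driving the spider from Python, the network and throttle options from the YAML config have natural counterparts among Scrapy's standard settings. The helper below is a sketch of one plausible mapping. The setting names are Scrapy's own, but scrapy_overrides is a hypothetical function and the pairing with the YAML keys is an assumption, not necessarily how Shopify Spy translates its config internally.

```python
def scrapy_overrides(network=None, throttle=None):
    """Build a dict of Scrapy settings from config-style dicts (assumed mapping)."""
    network = network or {}
    throttle = throttle or {}
    mapping = {
        "CONCURRENT_REQUESTS_PER_DOMAIN": network.get("concurrent_requests"),
        "DOWNLOAD_TIMEOUT": network.get("timeout"),
        "RETRY_TIMES": network.get("retries"),
        "USER_AGENT": network.get("user_agent"),
        "ROBOTSTXT_OBEY": network.get("respect_robots_txt"),
        "AUTOTHROTTLE_ENABLED": throttle.get("enabled"),
        "AUTOTHROTTLE_START_DELAY": throttle.get("start_delay"),
        "AUTOTHROTTLE_MAX_DELAY": throttle.get("max_delay"),
        "AUTOTHROTTLE_TARGET_CONCURRENCY": throttle.get("target_concurrency"),
    }
    # Drop unset keys so Scrapy's existing values stay in effect
    return {name: value for name, value in mapping.items() if value is not None}
```

The returned dict could be applied to the object from get_project_settings() with its update() method before constructing the CrawlerProcess.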
Feedback
Found a bug or have a suggestion? Open an issue.
License
Credits
Icon by Bartama Graphic.