ShopScraper
ShopScraper is a thin Python wrapper for Shopify webshop product APIs, used to scrape product information from online stores. Every Shopify webshop has a "hidden" API that exposes all of the store's products. This package uses the requests library to fetch the product information and returns objects representing each product. There are also convenience functions for saving to and reading from a JSON file.
>>> import shopscraper
>>>
>>> products = shopscraper.scrape("bjjfanatics.com", include_html=False, items_per_page=2, max_pages=1)
>>> type(products)
<class 'generator'>
>>> for product in products:
...     print(product)
id=6706690981986,
title='New Wave Jiu Jitsu: Side Attacks - Building a Devastating Side Control System by John Danaher',
handle='new-wave-jiu-jitsu-side-attacks-building-a-devastating-side-control-system-by-john-danaher',
body_html='',
published_at=datetime.datetime(2022, 5, 18, 8, 2, 9, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))),
created_at=datetime.datetime(2022, 5, 4, 23, 14, 55, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))),
updated_at=datetime.datetime(2022, 5, 18, 11, 55, 59, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))),
vendor='John Danaher',
product_type='COMBO',
tags=['Facebook', 'Fighter_John Danaher', 'MC_Side_Control_Attacks', 'New', 'new_and_popular', 'Show_More_App'],
variants=[
    Variant(
        id=39769726582882,
        title='Default Title',
        option1='Default Title',
        option2=None,
        option3=None,
        sku='JDNWJJSA-01',
        requires_shipping=False,
        taxable=True,
        featured_image=None,
        available=True,
        price=19700,
        grams=0,
        compare_at_price=None,
        position=1,
        product_id=6706690981986,
        created_at=datetime.datetime(2022, 5, 4, 23, 14, 55, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))),
        updated_at=datetime.datetime(2022, 5, 18, 11, 55, 15, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000)))
    )
],
images=[
    Image(
        id=28542700257378,
        created_at=datetime.datetime(2022, 5, 4, 23, 14, 55, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))),
        position=1,
        updated_at=datetime.datetime(2022, 5, 4, 23, 14, 57, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))),
        product_id=6706690981986,
        variant_ids=[],
        src='https://cdn.shopify.com/s/files/1/1800/2299/products/JohnDanaher_NewWaveJiu-Jitsu-SideAttacks_CoverFRONT.jpg?v=1651720497',
        width=1631,
        height=2194
    ),
],
options=[
    Options(
        name='Title',
        position=1,
        values=['Default Title']
    )
]
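The "hidden" API mentioned in the introduction is, on Shopify storefronts, typically a public `products.json` endpoint that accepts `limit` and `page` query parameters. The sketch below shows how paging through such an endpoint with a generator could look. It is not ShopScraper's actual implementation: `products_url`, `fetch_page`, `iter_products`, and `fake_fetch` are hypothetical names, and the endpoint path is an assumption about what the package requests.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def products_url(store: str) -> str:
    """Build the (assumed) public product endpoint for a Shopify store."""
    return f"https://{store}/products.json"


def fetch_page(store: str, page: int, limit: int) -> List[Dict[str, Any]]:
    """Fetch one page of raw product dicts over HTTP."""
    # Imported here so the paging logic below can run without the dependency.
    import requests

    response = requests.get(
        products_url(store),
        params={"limit": limit, "page": page},
        timeout=10,
    )
    response.raise_for_status()
    return response.json().get("products", [])


def iter_products(
    store: str,
    items_per_page: int = 250,
    max_pages: Optional[int] = None,
    fetch: Callable[[str, int, int], List[Dict[str, Any]]] = fetch_page,
) -> Iterator[Dict[str, Any]]:
    """Yield raw product dicts page by page until a page comes back empty."""
    page = 1
    while max_pages is None or page <= max_pages:
        batch = fetch(store, page, items_per_page)
        if not batch:
            return
        yield from batch
        page += 1


# Offline demonstration with a stand-in fetcher (no network access needed).
_FAKE_PAGES = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}


def fake_fetch(store: str, page: int, limit: int) -> List[Dict[str, Any]]:
    return _FAKE_PAGES.get(page, [])


ids = [p["id"] for p in iter_products("example.com", items_per_page=2, fetch=fake_fetch)]
print(ids)  # [1, 2, 3]
```

Passing the fetcher as a parameter keeps the paging loop testable without touching a real store, which is why the demonstration swaps in `fake_fetch`.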
Installation
ShopScraper is available on PyPI:
$ python -m pip install shopscraper
ShopScraper officially supports Python 3.7+.
Usage
The 'scrape' and 'read_json' functions yield Product objects, each holding lists of Variant, Image, and Options objects.
class Image:
    """
    Attributes:
        id (int): image id
        created_at (datetime.datetime): datetime object of when image was created
        position (int): position of image in product
        updated_at (datetime.datetime): datetime object of when image was last updated
        product_id (int): product id associated with the image
        variant_ids (list[int]): list of variant ids associated with the image
        src (str): url to image
        width (int): width of image in pixels
        height (int): height of image in pixels
    """
class Options:
    """
    Attributes:
        name (str): name of option
        position (int): position of option in product
        values (list[Any]): list of values for option
    """
class Variant:
    """
    Attributes:
        id (int): variant id
        title (str): title of variant
        option1 (Optional[str]): first option of variant
        option2 (Optional[str]): second option of variant
        option3 (Optional[str]): third option of variant
        sku (Optional[str]): sku of variant
        requires_shipping (bool): whether variant requires shipping
        taxable (bool): whether variant is taxable
        featured_image (Optional[Image]): featured image of variant
        available (bool): whether variant is available
        price (float): price of variant
        grams (int): weight of variant in grams
        compare_at_price (Optional[float]): compare at price of variant
        position (int): position of variant in product
        product_id (int): product id associated with the variant
        created_at (datetime.datetime): datetime object of when variant was created
        updated_at (datetime.datetime): datetime object of when variant was last updated
    """
class Product:
    """
    Attributes:
        id (int): product id
        title (str): name of product
        handle (str): url safe name of product
        body_html (str): description of product (html)
        published_at (datetime.datetime): date product was published
        created_at (datetime.datetime): date product was created
        updated_at (datetime.datetime): date product was last updated
        vendor (str): name of product vendor
        product_type (str): type of product
        tags (list[str]): tags associated with product
        variants (list[Variant]): list of variants for product
        images (list[Image]): list of images for product
        options (list[Options]): list of options for product
    """
The 'scrape' function yields Product objects:
>>> import shopscraper
>>>
>>> products = shopscraper.scrape("bjjfanatics.com")
>>> type(products)
<class 'generator'>
The 'scrape_to_json' function saves the scraped data to the specified file path:
>>> import shopscraper
>>>
>>> save_path = shopscraper.scrape_to_json("bjjfanatics.com", "C:\\scraped_data.json")
>>> type(save_path)
<class 'pathlib.Path'>
The 'read_json' function reads the saved JSON file and yields Product objects:
>>> import shopscraper
>>>
>>> products = shopscraper.read_json("C:\\scraped_data.json")
>>> type(products)
<class 'generator'>
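Because the product objects contain `datetime` values, which the standard `json` module cannot serialize directly, a save/read round trip needs some conversion step. The sketch below shows one way this could work, using ISO 8601 strings. It is only an illustration under stated assumptions: `save_records`, `load_records`, and `DATE_KEYS` are hypothetical names, plain dicts stand in for Product objects, and nothing here describes the file format ShopScraper actually writes.

```python
import json
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator, Union

DATE_KEYS = ("published_at", "created_at", "updated_at")


def _encode(value: Any) -> str:
    """json.dump fallback: store datetimes as ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def save_records(records: Iterable[Dict[str, Any]], path: Union[str, Path]) -> Path:
    """Write all records to one JSON file and return the path used."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as handle:
        json.dump(list(records), handle, default=_encode)
    return path


def load_records(path: Union[str, Path]) -> Iterator[Dict[str, Any]]:
    """Yield records from the file, restoring the datetime fields."""
    with Path(path).open(encoding="utf-8") as handle:
        for record in json.load(handle):
            for key in DATE_KEYS:
                if record.get(key):
                    record[key] = datetime.fromisoformat(record[key])
            yield record


# Round trip in a temporary directory so nothing is left on disk.
original = [
    {
        "id": 6706690981986,
        "created_at": datetime(2022, 5, 4, 23, 14, 55, tzinfo=timezone(timedelta(hours=-4))),
    }
]
with tempfile.TemporaryDirectory() as tmp:
    saved = save_records(original, Path(tmp) / "scraped_data.json")
    restored = list(load_records(saved))

print(restored == original)  # True
```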
Note that both functions that provide product objects are generators, which are more memory efficient than lists but can only be iterated over one time. If you want to use the product objects in more than one operation, accumulate them into a list:
>>> import shopscraper
>>>
>>> products = shopscraper.read_json("C:\\scraped_data.json")
>>> product_list = list(products)
>>> len(product_list)
200
>>> combo_products = [i for i in product_list if i.product_type == "COMBO"]
>>> len(combo_products)
10
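The single-pass behaviour described above can be seen without scraping anything. In this small sketch, `fake_products` is a hypothetical stand-in for `scrape` or `read_json` that yields plain dicts, and the "DVD" product type is made up for the demonstration. It also shows `itertools.islice` as a way to preview a few items without building the full list.

```python
from itertools import islice
from typing import Dict, Iterator


def fake_products() -> Iterator[Dict[str, str]]:
    """Stand-in generator: yields three made-up product records lazily."""
    for product_type in ("COMBO", "DVD", "COMBO"):
        yield {"product_type": product_type}


products = fake_products()
first_pass = [p for p in products if p["product_type"] == "COMBO"]
second_pass = list(products)  # the generator is already exhausted here

print(len(first_pass))   # 2
print(len(second_pass))  # 0

# islice pulls only the first two items from a fresh generator.
preview = list(islice(fake_products(), 2))
print(len(preview))  # 2
```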