
Google Shopping Scraper can scrape and parse various Google Shopping page types to collect structured e-commerce data.


Google Shopping Scraper


As part of the E-Commerce Scraper API, Google Shopping Scraper extracts timely e-commerce data in raw HTML or structured JSON format. The scraper offers a maintenance-free data collection infrastructure that automates the bulk of the underlying processes, from sending HTTP requests to parsing data.

Built-in measures, such as proxies, result in considerably fewer CAPTCHAs and IP blocks. The scraper supports localized results from almost any location worldwide (195 countries), with country-level and postal code targeting.

Additionally, the scraper can automate recurring scraping and parsing jobs through Scheduler, load dynamic websites that use JavaScript to render content, and deliver results via the API or directly to a Google Cloud Storage or Amazon S3 bucket.

How it works

There are various page types we can scrape and parse on Google Shopping. You can either provide us with a full URL or a few input parameters via specifically built data sources (e.g. Search, Product, Product Pricing), so we can form the URL on our end.

Overview

Below is a quick overview of all the available data source values we support with Google Shopping.

Source | Description | Structured data
--- | --- | ---
google | Submit any Google Shopping URL you like. | Depends on the URL.
google_shopping_search | Search results for a search term of your choice. | Yes.
google_shopping_product | Product page of a product ID of your choice. | Yes.
google_shopping_pricing | List of offers available for a product ID of your choice. | Yes.
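As a rough illustration of the table above, the sketch below picks the input field for each source: the `google` source takes a full `url`, while the three dedicated sources take a `query` (a search term or a product ID). The payload shape follows the Python examples in this document; `INPUT_FIELD` and `make_payload` are hypothetical names for this sketch, not part of the API.

```python
from __future__ import annotations

from typing import Any

# Input field each data source expects, per the overview table (hypothetical mapping).
INPUT_FIELD = {
    "google": "url",
    "google_shopping_search": "query",
    "google_shopping_product": "query",
    "google_shopping_pricing": "query",
}


def make_payload(source: str, value: str, **options: Any) -> dict[str, Any]:
    """Build a request payload for one of the Google Shopping sources.

    Hypothetical helper: extra keyword arguments (e.g. domain, parse)
    are copied into the payload unchanged.
    """
    if source not in INPUT_FIELD:
        raise ValueError(f"unsupported source: {source!r}")
    return {"source": source, INPUT_FIELD[source]: value, **options}


if __name__ == "__main__":
    print(make_payload("google", "https://www.google.com/search?tbm=shop&q=adidas&hl=en"))
    print(make_payload("google_shopping_search", "adidas", domain="com", parse=True))
```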

URL

The google source is designed to retrieve content from various Google Shopping URLs. Instead of sending multiple parameters and letting us form and scrape Google Shopping URLs, you can provide us with a URL to the required Google Shopping page. We do not strip any parameters or alter your URLs in any other way.

This data source also supports parsed data (structured data in JSON format), as long as the URL submitted links to a page that we can parse.
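Because the `google` source submits your URL exactly as given, it is worth encoding the query string carefully before sending it. Below is a minimal sketch, assuming the `tbm=shop`, `q`, and `hl` query parameters from the example URL in the Python code example below are the ones you need; `build_shopping_url` is a hypothetical helper, and the page you target may call for other parameters.

```python
from urllib.parse import urlencode


def build_shopping_url(query: str, language: str = "en", domain: str = "com") -> str:
    """Return a Google Shopping search URL with a properly encoded query.

    Hypothetical helper: mirrors the tbm=shop / q / hl structure of the
    example URL; add or change parameters to match the page you need.
    """
    params = {"tbm": "shop", "q": query, "hl": language}
    # urlencode escapes spaces as "+" and reserved characters such as "&".
    return f"https://www.google.{domain}/search?{urlencode(params)}"


if __name__ == "__main__":
    print(build_shopping_url("adidas"))
    print(build_shopping_url("running shoes & socks"))
```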

Query parameters

Parameter | Description | Default Value
--- | --- | ---
source | Data source. More info. | google
url | Direct URL (link) to the Google page. | -
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enables JavaScript rendering. More info. | -
callback_url | URL to your callback endpoint. More info. | -
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. | -
parse | true will return parsed data, as long as the URL submitted is for Google Search. | -

- required parameter

Python code example

In this example, we make a request to retrieve a Google Shopping search result for the keyword adidas, as seen in New York, USA.

import requests
from pprint import pprint

# Structure payload.
payload = {
    'source': 'google',
    'url': 'https://www.google.com/search?tbm=shop&q=adidas&hl=en',
    'geo_location': 'New York,New York,United States'
}

# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),
    json=payload,
)

# Instead of a response with the job status and results URL, this returns
# the JSON response with the results.
pprint(response.json())

Code examples for other languages can be found here.
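The example above prints the whole response. To work with the results programmatically, a small helper can pull out each result's content. This sketch assumes the response body has a top-level `results` list whose items carry a `content` field (presumably raw HTML when parsing is off and a JSON object when it is on); check this against an actual response before relying on it. `extract_contents` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any


def extract_contents(body: dict[str, Any]) -> list[Any]:
    """Return the `content` of every item in the response's `results` list.

    Assumes the response shape {"results": [{"content": ...}, ...]};
    a missing `results` key yields an empty list.
    """
    return [result.get("content") for result in body.get("results", [])]


if __name__ == "__main__":
    # Hypothetical response body, for illustration only.
    sample = {"results": [{"content": "<html>...</html>"}]}
    print(extract_contents(sample))
```

With the `response` object from the example above, call `response.raise_for_status()` first, so that HTTP errors surface as `requests.HTTPError` instead of being parsed, and then `extract_contents(response.json())`.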

Shopping Search

The google_shopping_search source is designed to retrieve Google Shopping search results.

Query parameters

Parameter | Description | Default Value
--- | --- | ---
source | Data source. More info. | google_shopping_search
domain | Domain localization. | com
query | UTF-encoded keyword. | -
start_page | Starting page number. | 1
pages | Number of pages to retrieve. | 1
locale | Accept-Language header value, which changes the web interface language of your Google Shopping page. More info. | -
results_language | Results language. The list of supported Google languages can be found here. | -
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. | -
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enables JavaScript rendering. More info. | -
callback_url | URL to your callback endpoint. More info. | -
parse | true will return parsed data. | -
context: nfpr | true will turn off spelling auto-correction. | false
context: sort_by | Sort the product list by the given criterion: r applies default Google sorting, rv sorts by review score, p by price (ascending), pd by price (descending). | r
context: min_price | Minimum product price to filter by. | -
context: max_price | Maximum product price to filter by. | -

- required parameter
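The `context` parameters are passed as a list of `key`/`value` objects, as in the Python code example below. A small helper can build that list from keyword arguments and catch obvious mistakes before a request is sent, such as an unknown `sort_by` code or a `min_price` above `max_price`. `build_context` is a hypothetical helper, not part of the API; the validation rules are assumptions drawn from the table above.

```python
from __future__ import annotations

from typing import Any

# Sorting codes listed in the parameter table.
SORT_CODES = {"r", "rv", "p", "pd"}


def build_context(
    nfpr: bool | None = None,
    sort_by: str | None = None,
    min_price: float | None = None,
    max_price: float | None = None,
) -> list[dict[str, Any]]:
    """Return context entries for google_shopping_search, skipping unset ones."""
    if sort_by is not None and sort_by not in SORT_CODES:
        raise ValueError(f"sort_by must be one of {sorted(SORT_CODES)}")
    if min_price is not None and max_price is not None and min_price > max_price:
        raise ValueError("min_price cannot exceed max_price")
    values = {"nfpr": nfpr, "sort_by": sort_by, "min_price": min_price, "max_price": max_price}
    # Keep only the parameters that were explicitly set, in table order.
    return [{"key": key, "value": value} for key, value in values.items() if value is not None]
```

For instance, `build_context(sort_by="pd", min_price=20)` produces the same `context` list as the example below.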

Python code example

In this example, we make a request to retrieve the first 4 pages of Google Shopping search results for the search term adidas, sorted by descending price and filtered to a minimum price of $20.

import requests
from pprint import pprint


# Structure payload.
payload = {
    'source': 'google_shopping_search',
    'domain': 'com',
    'query': 'adidas',
    'pages': 4,
    'context': [
        {'key': 'sort_by', 'value': 'pd'},
        {'key': 'min_price', 'value': 20},
    ],
}

# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),
    json=payload,
)

# Print prettified response to stdout.
pprint(response.json())

Code examples for other languages can be found here.
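`start_page` and `pages` together select a contiguous range of result pages: a request with `start_page` 3 and `pages` 2 covers pages 3 and 4. If you prefer to split a long range across several smaller requests (a design choice on the client side, not something the API requires), a sketch like the one below can compute the payloads. `split_page_range` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any


def split_page_range(
    base_payload: dict[str, Any], first_page: int, last_page: int, batch_size: int
) -> list[dict[str, Any]]:
    """Split pages first_page..last_page (inclusive) into batched payloads."""
    if first_page < 1 or last_page < first_page or batch_size < 1:
        raise ValueError("invalid page range or batch size")
    payloads = []
    for start in range(first_page, last_page + 1, batch_size):
        # The last batch may be shorter than batch_size.
        pages = min(batch_size, last_page - start + 1)
        # Copy the base payload so the caller's dict is never mutated.
        payloads.append({**base_payload, "start_page": start, "pages": pages})
    return payloads
```

The same approach should apply to google_shopping_pricing, which accepts the same start_page and pages parameters.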

Shopping Product

The google_shopping_product source is designed to retrieve the Google Shopping product page for a specified product.

Query parameters

Parameter | Description | Default Value
--- | --- | ---
source | Data source. More info. | google_shopping_product
domain | Domain localization. | com
query | UTF-encoded product code. | -
locale | Accept-Language header value, which changes the web interface language of your Google Shopping page. More info. | -
results_language | Results language. The list of supported Google languages can be found here. | -
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. | -
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enables JavaScript rendering. More info. | -
callback_url | URL to your callback endpoint. More info. | -
parse | true will return parsed data. | -

- required parameter

Python code example

In the code example below, we make a request to retrieve the product page for product ID 5007040952399054528 from Google Shopping on the com domain.

import requests
from pprint import pprint


# Structure payload.
payload = {
    'source': 'google_shopping_product',
    'domain': 'com',
    'query': '5007040952399054528',
}

# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),
    json=payload,
)

# Print prettified response to stdout.
pprint(response.json())

Code examples for other languages can be found here.
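The example above passes the product ID as a string. That is a sensible habit: an ID such as 5007040952399054528 is larger than 2^53, so JSON consumers that store numbers as double-precision floats (JavaScript, for instance) cannot represent it exactly. The hypothetical `normalize_product_id` helper below accepts an int or a string, checks that it contains only digits, and always returns a string. The digits-only rule is an assumption based on the example ID.

```python
from __future__ import annotations

import re


def normalize_product_id(product_id: int | str) -> str:
    """Return a product ID as a string of ASCII digits.

    Assumption: product IDs are purely numeric, like 5007040952399054528.
    """
    text = str(product_id).strip()
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f"expected a numeric product ID, got {product_id!r}")
    return text


if __name__ == "__main__":
    payload = {
        "source": "google_shopping_product",
        "domain": "com",
        "query": normalize_product_id(5007040952399054528),
    }
    print(payload)
```

The same normalized ID can be used as the query for google_shopping_pricing.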

Product Pricing

The google_shopping_pricing source is designed to retrieve pages containing lists of offers available for a product ID of your choice.

Query parameters

Parameter | Description | Default Value
--- | --- | ---
source | Data source. More info. | google_shopping_pricing
domain | Domain localization. | com
query | UTF-encoded product code. | -
start_page | Starting page number. | 1
pages | Number of pages to retrieve. | 1
locale | Accept-Language header value, which changes the web interface language of your Google Shopping page. More info. | -
results_language | Results language. The list of supported Google languages can be found here. | -
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. | -
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enables JavaScript rendering. More info. | -
callback_url | URL to your callback endpoint. More info. | -
parse | true will return parsed data. | -

- required parameter

Python code example

In the code example below, we make a request to retrieve the product pricing page for product ID 5007040952399054528 from Google Shopping on google.com.

import requests
from pprint import pprint


# Structure payload.
payload = {
    'source': 'google_shopping_pricing',
    'domain': 'com',
    'query': '5007040952399054528',
}

# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),
    json=payload,
)

# Print prettified response to stdout.
pprint(response.json())

Code examples for other languages can be found here.

If you have questions or concerns about Google Shopping Scraper or associated features, get in touch at support@oxylabs.io or through the live chat on our website.

Download files

Source Distribution

google-shopping-scraper-api-0.1.3.tar.gz (6.9 kB)

Built Distribution

google_shopping_scraper_api-0.1.3-py3-none-any.whl

File details

Hashes for google-shopping-scraper-api-0.1.3.tar.gz

Algorithm | Hash digest
--- | ---
SHA256 | b969d3a86d9834380b5f8d1eb8b1fecce488d7a852e29bd5c256459ee3af387e
MD5 | 2de7ca8402017735cb860fcf7e674a61
BLAKE2b-256 | 7bf412cc5d2fb34f0c218ca2f3dd9abe227a198a4732aeba282efe0c5b38c5bb

Hashes for google_shopping_scraper_api-0.1.3-py3-none-any.whl

Algorithm | Hash digest
--- | ---
SHA256 | f376be90cfcf875f22c1e53a7ac17ea7d55fe6d9d5f38ad80d35ad84555ad9d3
MD5 | 7d27587a930f28b15801abd9e5088e5c
BLAKE2b-256 | 54f570b2f7ea8208b605e6b27b91ce343fccc7e5182bb082f019ede3727ebcc1
