ScraperAPI Python SDK

Install

pip install scraperapi-sdk

Usage

from scraperapi_sdk import ScraperAPIClient

client = ScraperAPIClient("<API-KEY>")

# regular get request
content = client.get('https://amazon.com/')
# get request with premium
content = client.get('https://amazon.com/', params={'premium': True})

# post request
content = client.post('https://webhook.site/403e44ce-5835-4ce9-a648-188a51d9395d', headers={'Content-Type': 'application/x-www-form-urlencoded'}, data={'field1': 'data1'})

# put request
content = client.put('https://webhook.site/403e44ce-5835-4ce9-a648-188a51d9395d', headers={'Content-Type': 'application/json'}, data={'field1': 'data1'})

The content variable will contain the scraped page.
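
If you want to parse the returned HTML, you can feed the content into any HTML parser. A minimal sketch, assuming the optional third-party beautifulsoup4 package is installed (it is not part of this SDK):

from bs4 import BeautifulSoup

# content comes from client.get() as shown above
soup = BeautifulSoup(content, 'html.parser')
print(soup.title.get_text() if soup.title else 'no <title> found')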

If you want the Response object instead of the content, you can use make_request.

response = client.make_request(url='https://webhook.site/403e44ce-5835-4ce9-a648-188a51d9395d', headers={'Content-Type': 'application/json'}, data={'field1': 'data1'})
# response will be <Response [200]>
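
The <Response [200]> repr suggests a standard requests.Response object; if that assumption holds (it is not something this SDK documents explicitly), you can inspect it as usual:

print(response.status_code)                  # HTTP status code
print(response.headers.get('Content-Type'))  # response headers
html = response.text                         # scraped page body as text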

Exceptions

from scraperapi_sdk import ScraperAPIClient
from scraperapi_sdk.exceptions import ScraperAPIException

client = ScraperAPIClient(api_key="<API-KEY>")
try:
    result = client.post('https://webhook.site/403e44ce-5835-4ce9-a648-188a51d9395d', headers={'Content-Type': 'application/x-www-form-urlencoded'}, data={'field1': 'data1'})
    _ = result
except ScraperAPIException as e:
    print(e.original_exception)  # you can access the original exception via `.original_exception` property.
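
If you want simple retries on top of this, you can wrap the call yourself. A minimal sketch; the scrape_with_retries helper below is illustrative and not part of the SDK:

import time

from scraperapi_sdk import ScraperAPIClient
from scraperapi_sdk.exceptions import ScraperAPIException

client = ScraperAPIClient(api_key="<API-KEY>")

def scrape_with_retries(url, attempts=3, delay=2):
    # retry a failed ScraperAPI request a few times before giving up
    for attempt in range(1, attempts + 1):
        try:
            return client.get(url)
        except ScraperAPIException as e:
            if attempt == attempts:
                raise
            print(f"attempt {attempt} failed: {e.original_exception}")
            time.sleep(delay)

content = scrape_with_retries('https://example.com/')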

scrapyGet

To prepare a URL for Scrapy, you can use the client.scrapyGet method.

client.scrapyGet(url,
          headers={"header1": "value1"},
          country_code="us",
          premium=False,
          render=True,
          session_number=2772728518147,
          autoparse=None,
          )

All of the parameters except url are optional.

Full example:

import scrapy
import os
from pathlib import Path
from scraperapi_sdk import ScraperAPIClient

client = ScraperAPIClient(
    api_key=os.getenv("SCRAPERAPI_API_KEY"),
)
class ExampleSpider(scrapy.Spider):
    name = ""

    async def start(self):
        urls = [
            "https://example.com/",
        ]
        for url in urls:
            yield scrapy.Request(url=client.scrapyGet(url, render=True), callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = f"example-{page}.html"
        Path(filename).write_bytes(response.body)
        self.log(f"Saved file {filename}")

Structured Data Collection Methods

Amazon Endpoints

Amazon Product Page API

This method will retrieve product data from an Amazon product page and transform it into usable JSON.

result = client.amazon.product("<ASIN>")
result = client.amazon.product("<ASIN>", country="us", tld="com")

Read more in docs: Amazon Product Page API

Amazon Search API

This method will retrieve products for a specified search term from the Amazon search page and transform them into usable JSON.

result = client.amazon.search("<QUERY>")
result = client.amazon.search("<QUERY>", country="us", tld="com")

Read more in docs: Amazon Search API

Amazon Offers API

This method will retrieve offers for a specified product from an Amazon offers page and transform it into usable JSON.

result = client.amazon.offers("<ASIN>")
result = client.amazon.offers("<ASIN>", country="us", tld="com")

Read more in docs: Amazon Offers API

Amazon Prices API

This method will retrieve product prices for the given ASINs and transform them into usable JSON.

result = client.amazon.prices(['<ASIN1>'])
result = client.amazon.prices(['<ASIN1>', '<ASIN2>'])
result = client.amazon.prices(['<ASIN1>', '<ASIN2>'], country="us", tld="com")

Read more in docs: Amazon Prices API

Ebay API

Ebay Search API

This endpoint will retrieve products for a specified search term from the Ebay search page and transform them into usable JSON.

result = client.ebay.search("<QUERY>")
result = client.ebay.search2("<QUERY>", country="us", tld="com")  # newest version with additional data

Read more in docs: Ebay Search API

Ebay Product API

This endpoint will retrieve product data from an Ebay product page (/itm/) and transform it into usable JSON.

result = client.ebay.product("<PRODUCT_ID>")
result = client.ebay.product("<PRODUCT_ID>", country="us", tld="com")

Read more in docs: Ebay Product API

Google API

Google SERP API

This method will retrieve search result data from a Google search page and transform it into usable JSON.

result = client.google.search('free hosting')
result = client.google.search('free hosting', country="us", tld="com")

Read more in docs: Google SERP API

Google News API

This method will retrieve news data from a Google News results page and transform it into usable JSON.

result = client.google.news('tornado')
result = client.google.news('tornado', country="us", tld="com")

Read more in docs: Google News API

Google Jobs API

This method will retrieve jobs data from a Google Jobs results page and transform it into usable JSON.

result = client.google.jobs('Senior Software Developer')
result = client.google.jobs('Senior Software Developer', country="us", tld="com")

Read more in docs: Google Jobs API

Google Shopping API

This method will retrieve shopping data from a Google Shopping results page and transform it into usable JSON.

result = client.google.shopping('macbook air')
result = client.google.shopping('macbook air', country="us", tld="com")

Read more in docs: Google Shopping API

Redfin API

Redfin Agent Details API

This endpoint retrieves information and details from a Redfin Agent's page or a Redfin Partner Agent's page and transforms it into usable JSON.

result = client.redfin.agent("<URL>")
result = client.redfin.agent("<URL>", country="us", tld="com")

Read more in docs: Redfin Agent Details API

Redfin 'For Rent' Listings API

This endpoint will retrieve listing information from a single 'For Rent' property listing page and transform it into usable JSON.

result = client.redfin.forrent("<URL>")
result = client.redfin.forrent("<URL>", country="us", tld="com")

Read more in docs: Redfin For Rent Listings API

Redfin 'For Sale' Listings API

This endpoint will retrieve listing information from a single 'For Sale' property listing page and transform it into usable JSON.

result = client.redfin.forsale("<URL>")
result = client.redfin.forsale("<URL>", country="us", tld="com")

Read more in docs: Redfin For Sale Listings API

Redfin Listing Search API

This endpoint will return the search results from a listing search page and transform it into usable JSON.

result = client.redfin.search("<URL>")
result = client.redfin.search("<URL>", country="us", tld="com")

Read more in docs: Redfin Listing Search API

Walmart API

Walmart Search API

This method will retrieve product list data from a Walmart search results page.

result = client.walmart.search('hoodie')
result = client.walmart.search('hoodie', page=2)
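
Because the method accepts a page parameter, you can paginate by looping over page numbers. A minimal sketch; the number of pages fetched here is an arbitrary choice for illustration:

all_pages = []
for page in range(1, 4):  # first three result pages
    all_pages.append(client.walmart.search('hoodie', page=page))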

Read more in docs: Walmart Search API

Walmart Category API

This method will retrieve the Walmart product list for a specified product category.

result = client.walmart.category('5438_7712430_8775031_5315201_3279226')
result = client.walmart.category('5438_7712430_8775031_5315201_3279226', page=2)

Read more in docs: Walmart Category API

Walmart Product API

This method will retrieve Walmart product details for one product.

result = client.walmart.product('5053452213')

Read more in docs: Walmart Product API

Async Scraping

Basic scraping:

from scraperapi_sdk import ScraperAPIAsyncClient, ScraperAPIException

client = ScraperAPIAsyncClient('<api_key>')
request_id = None
# request async scraping
try:
    job = client.create('https://example.com')
    request_id = job.get('id')
except ScraperAPIException as e:
    print(e.original_exception)

# if the job was submitted successfully, we can request the scraping result

if request_id:
    result = client.get(request_id)

Read more in docs: How to use Async Scraping

Webhook Callback

from scraperapi_sdk import ScraperAPIAsyncClient, ScraperAPIException

client = ScraperAPIAsyncClient('<api_key>')
request_id = None
# request async scraping
try:
    job = client.create('https://example.com', webhook_url="https://webhook.site/#!/view/c4facc6e-c028-4d9c-9f58-b14c92a381fe")
    request_id = job.get('id')
except ScraperAPIException as e:
    print(e.original_exception)

# if the job was submitted successfully, we can request the scraping result

if request_id:
    result = client.get(request_id)
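
When a webhook_url is supplied, ScraperAPI will deliver the result to it when the job finishes, so you do not have to poll. A minimal sketch of a receiver using the third-party Flask package; the endpoint path and the exact payload shape are assumptions for illustration, not something this SDK specifies:

from flask import Flask, request

app = Flask(__name__)

@app.route("/scraperapi-callback", methods=["POST"])
def scraperapi_callback():
    # log whatever ScraperAPI delivers; inspect it to learn the payload shape
    print(request.get_json(silent=True) or request.data)
    return "", 200

if __name__ == "__main__":
    app.run(port=8000)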

Wait for results

You can use the wait method, which will poll ScraperAPI for the result until it is ready.

Use client.wait

Arguments:

  • request_id (required): ID returned from the client.create call
  • cooldown (optional, default=5): number of seconds between retries
  • max_retries (optional, default=10): maximum number of retries
  • raise_for_exceeding_max_retries (optional, default=False): if True, raises an exception when max_retries is reached; otherwise returns the response from the API

from scraperapi_sdk import ScraperAPIAsyncClient, ScraperAPIException

client = ScraperAPIAsyncClient('<api_key>')
request_id = None
# request async scraping
try:
    job = client.create('https://example.com')
    request_id = job.get('id')
except ScraperAPIException as e:
    print(e.original_exception)

# if the job was submitted successfully, we can request the scraping result

if request_id:
    result = client.wait(
        request_id,
        cooldown=5,
        max_retries=10,
        raise_for_exceeding_max_retries=False,
    )

Amazon Async Scraping

Amazon Product

Scrape a single Amazon Product asynchronously:

from scraperapi_sdk import ScraperAPIAsyncClient, ScraperAPIException

client = ScraperAPIAsyncClient('<api_key>')
request_id = None
try:
    job = client.amazon.product('B0CHVR5K7C')
    request_id = job.get('id')
except ScraperAPIException as e:
    print(e.original_exception)
result = client.get(request_id)
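
Structured async jobs can be polled like any other async job, so instead of calling get immediately you can combine them with the wait method described above:

if request_id:
    result = client.wait(request_id, cooldown=5, max_retries=10)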

Single Product with params:

from scraperapi_sdk import ScraperAPIAsyncClient, ScraperAPIException

client = ScraperAPIAsyncClient('<api_key>')
request_id = None
try:
    job = client.amazon.product('B0B5PLT7FZ', api_params=dict(country_code='uk', tld='co.uk'))
    request_id = job.get('id')
except ScraperAPIException as e:
    print(e.original_exception)
result = client.get(request_id)

Scrape multiple Amazon Products asynchronously with params:

from scraperapi_sdk import ScraperAPIAsyncClient, ScraperAPIException

client = ScraperAPIAsyncClient('<api_key>')
request_id = None
try:
    job = client.amazon.products(['B0B5PLT7FZ', 'B00CL6353A'], api_params=dict(country_code='uk', tld='co.uk'))
    request_id = job.get('id')
except ScraperAPIException as e:
    print(e.original_exception)
result = client.get(request_id)

Read more in docs: Async Amazon Product Scraping

Amazon Search

Search Amazon asynchronously

from scraperapi_sdk import ScraperAPIAsyncClient, ScraperAPIException

client = ScraperAPIAsyncClient('<api_key>')
request_id = None
try:
    job = client.amazon.search('usb c microphone')
    request_id = job.get('id')
except ScraperAPIException as e:
    print(e.original_exception)
result = client.get(request_id)

Search Amazon asynchronously with api_params

from scraperapi_sdk import ScraperAPIAsyncClient, ScraperAPIException

client = ScraperAPIAsyncClient('<api_key>')
request_id = None
try:
    job = client.amazon.search('usb c microphone', api_params=dict(country_code='uk', tld='co.uk'))
    request_id = job.get('id')
except ScraperAPIException as e:
    print(e.original_exception)
result = client.get(request_id)

Read more in docs: Amazon Review Scraping Async

Amazon Offers for a Product

Scrape Amazon offers for a single product

from scraperapi_sdk import ScraperAPIAsyncClient, ScraperAPIException

client = ScraperAPIAsyncClient('<api_key>')
request_id = None
try:
    job = client.amazon.offers('B0CHVR5K7C')
    request_id = job.get('id')
except ScraperAPIException as e:
    print(e.original_exception)
result = client.get(request_id)

Scrape Amazon offers for multiple products

from scraperapi_sdk import ScraperAPIAsyncClient, ScraperAPIException

client = ScraperAPIAsyncClient('<api_key>')
request_id = None
try:
    jobs = client.amazon.offers(['B0CHVR5K7C', '<ASIN2>'])  # assuming offers() accepts a list of ASINs when scraping multiple products
except ScraperAPIException as e:
    print(e.original_exception)
for job in jobs:
    result = client.get(job.get('id'))

Amazon Reviews

Scrape Reviews for a single product asynchronously:

from scraperapi_sdk import ScraperAPIAsyncClient, ScraperAPIException

client = ScraperAPIAsyncClient('<api_key>')
request_id = None
try:
    job = client.amazon.product('B0B5PLT7FZ', api_params=dict(country_code='uk', tld='co.uk'))
    request_id = job.get('id')
except ScraperAPIException as e:
    print(e.original_exception)
result = client.get(request_id)

Scrape reviews for multiple products asynchronously:

from scraperapi_sdk import ScraperAPIAsyncClient, ScraperAPIException

client = ScraperAPIAsyncClient('<api_key>')
try:
    jobs = client.amazon.products(['B0B5PLT7FZ', 'B00CL6353A'], api_params=dict(country_code='uk', tld='co.uk'))
except ScraperAPIException as e:
    print(e.original_exception)
for job in jobs:
    result = client.get(job.get('id'))
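
When a call returns several jobs, you can collect the results keyed by job id. A minimal sketch, relying on the 'id' field used throughout these examples:

results = {}
for job in jobs:
    job_id = job.get('id')
    if job_id:
        results[job_id] = client.get(job_id)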

Read more in docs: Amazon Review Scraping Async

Google Async Scraping

Google Async Search Scraping

from scraperapi_sdk import ScraperAPIAsyncClient, ScraperAPIException

client = ScraperAPIAsyncClient('<api_key>')
try:
    jobs = client.google.search('solar eclipse')
except ScraperAPIException as e:
    print(e.original_exception)
for job in jobs:
    result = client.get(job.get('id'))

Read more in docs: Google Search API (Async)

Google Async News Scraping

from scraperapi_sdk import ScraperAPIAsyncClient, ScraperAPIException

client = ScraperAPIAsyncClient('<api_key>')
try:
    jobs = client.google.news('solar eclipse')
except ScraperAPIException as e:
    print(e.original_exception)
for job in jobs:
    result = client.get(job.get('id'))

Read more in docs: Google News API (Async)

Google Async Jobs Scraping

from scraperapi_sdk import ScraperAPIAsyncClient, ScraperAPIException

client = ScraperAPIAsyncClient('<api_key>')
try:
    jobs = client.google.jobs('senior software developer')
except ScraperAPIException as e:
    print(e.original_exception)
for job in jobs:
    result = client.get(job.get('id'))

Read more in docs: Google Jobs API (Async)

Google Async Shopping Scraping

from scraperapi_sdk import ScraperAPIAsyncClient, ScraperAPIException

client = ScraperAPIAsyncClient('<api_key>')
try:
    jobs = client.google.shopping('usb c microphone')
except ScraperAPIException as e:
    print(e.original_exception)
for job in jobs:
    result = client.get(job.get('id'))

Read more in docs: Google Shopping API (Async)

