Fundamental Scrapy support for the Scrape.do API

Project description

Scrapydo

A Scrapy wrapper for running Scrapy spiders through the Scrape.do API.

Install

# From GitHub
pip3 install git+https://github.com/scrape-do/scrapy-scrapedo

# or from PyPI
pip3 install scrapy-scrapedo

Usage

from scrapydo import scrapy, scrapedo


class ScrapedoSampleCrawler(scrapy.Spider):
    name = "Scrape-do Sample Crawler"

    def __init__(self):
        super().__init__(scrapedo.RequestParameters(
            token="TOKEN",  # Get your Scrape.do token from: dashboard.scrape.do
            params={
                "geoCode": "us",
                "super": False,
                "render": True,
                "playWithBrowser": [
                    {"Action": "Click", "Selector": "#manpage > div.mp > ul > li:nth-child(3) > a"},
                    {"Action": "Wait", "Timeout": 2000},
                    {"Action": "Execute", "Execute": "document.URL"},
                ],
            },
        ))

    def start_requests(self):
        urls = [
            'https://httpbin.co/',
        ]

        for url in urls:
            yield self.Request(url=url, callback=self.parse)

    def parse(self, response):
        print(response.body)
        print("target:", self.target_url(response))
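
The sample spider can be started like any other Scrapy spider. The sketch below is one way to run it from a script, using Scrapy's standard CrawlerProcess rather than anything specific to scrapy-scrapedo; saving the spider to a file and launching it with scrapy runspider should also work.

# Run the sample spider from a script (assumes ScrapedoSampleCrawler is defined as above).
from scrapy.crawler import CrawlerProcess

process = CrawlerProcess()
process.crawl(ScrapedoSampleCrawler)
process.start()  # blocks until the crawl finishes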

You can also use proxy mode to route requests through the Scrape.do proxy service.

from scrapydo import scrapy, scrapedo


class ScrapedoSampleCrawler(scrapy.Spider):
    name = "Scrape-do Sample Crawler"

    def __init__(self):
        super().__init__(scrapedo.RequestParameters(
            token="TOKEN",  # Get your Scrape.do token from: dashboard.scrape.do
            params={
                "geoCode": "uk",
                "super": True,
            },
            proxy_mode=True,
        ))

    def start_requests(self):
        urls = [
            'https://httpbin.co/headers',
        ]

        for url in urls:
            yield self.Request(url=url, callback=self.parse)

    def parse(self, response):
        print(response.body)
        print("target:", self.target_url(response))
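
Because https://httpbin.co/headers returns JSON, the parse callback could decode the body instead of printing it raw. The following is only a sketch, assuming the endpoint returns the same {"headers": {...}} shape as httpbin.org:

import json  # at the top of the spider module

# Drop-in replacement for the parse method above.
def parse(self, response):
    # Decode the JSON body and show the headers the target received.
    data = json.loads(response.body)
    print("headers seen by the target:", data.get("headers"))
    print("target:", self.target_url(response))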

Build

You may prefer to build the package from source.

pip3 install setuptools wheel
python3 setup.py sdist bdist_wheel

Finally, you can install the package from the generated wheel file.

pip3 install dist/scrapy_scrapedo-0.1.4-py3-none-any.whl
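
Whichever install route you take, a quick smoke test is to import the package and print the installed version. This sketch assumes only the distribution name used above (scrapy-scrapedo) and the scrapydo module shown in the usage examples:

# Smoke test: the import should succeed and the version should match the installed wheel.
from importlib.metadata import version

from scrapydo import scrapy, scrapedo  # the wrapper's public entry points

print("scrapy-scrapedo", version("scrapy-scrapedo"))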

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scrapy_scrapedo-0.1.4.tar.gz (4.0 kB, Source)

Built Distribution

scrapy_scrapedo-0.1.4-py3-none-any.whl (4.7 kB, Python 3)

File details

Details for the file scrapy_scrapedo-0.1.4.tar.gz.

File metadata

  • Download URL: scrapy_scrapedo-0.1.4.tar.gz
  • Upload date:
  • Size: 4.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.5

File hashes

Hashes for scrapy_scrapedo-0.1.4.tar.gz
Algorithm Hash digest
SHA256 f2894998186f34d762a131c381ef332bc3fa7213ebdada1f6ed7c8911c998c8c
MD5 e2675daac2e2153a73c720d775447b08
BLAKE2b-256 8f06673d7c7adcad09e96fe0d2dfd8a43bba5a81a005250c393fdd02b210e3c7

See more details on using hashes here.
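
If you download a file manually, you can check it against the digests listed above. A minimal sketch for the source distribution, using only the standard library and the SHA256 value from the table:

# Verify the downloaded sdist against the published SHA256 digest.
import hashlib

expected = "f2894998186f34d762a131c381ef332bc3fa7213ebdada1f6ed7c8911c998c8c"

with open("scrapy_scrapedo-0.1.4.tar.gz", "rb") as f:
    actual = hashlib.sha256(f.read()).hexdigest()

print("hash matches" if actual == expected else "hash mismatch")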

File details

Details for the file scrapy_scrapedo-0.1.4-py3-none-any.whl.

File metadata

File hashes

Hashes for scrapy_scrapedo-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 5b871edfc7b8c685d376d163238f9650fb19bb8547250f493b5b561b8f886002
MD5 9d38edd522672990c02d777a96bc27ed
BLAKE2b-256 fea3bc82a319f070a78967f55d9a9ed384199f2ded0fe5e66be025c448260423

See more details on using hashes here.
