Scrapy middleware for Scraping.link

Scrapy ScrapingLink Middleware

Acknowledgements

Thanks to arimbr and ScrapingBee: this project is an adaptation of their work.

Installation

pip install scrapy-scraping-link

Configuration

Add your SCRAPINGLINK_API_KEY and the ScrapingLinkMiddleware to your project's settings.py. Set CONCURRENT_REQUESTS to match the concurrency limit of your ScrapingLink plan.

SCRAPINGLINK_API_KEY = 'REPLACE-WITH-YOUR-API-KEY'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_scraping_link.ScrapingLinkMiddleware': 700,
}

CONCURRENT_REQUESTS = 1
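
If you would rather not commit the key to version control, one option is to read it from the environment. The following is a minimal sketch of such a settings.py, assuming an environment variable named SCRAPINGLINK_API_KEY (that variable name is a choice made for this example, not something the middleware requires):

```python
import os

# Assumption: the key is exported as an environment variable of the same name.
# Falls back to the placeholder so a missing variable is easy to spot.
SCRAPINGLINK_API_KEY = os.environ.get(
    'SCRAPINGLINK_API_KEY', 'REPLACE-WITH-YOUR-API-KEY'
)

DOWNLOADER_MIDDLEWARES = {
    'scrapy_scraping_link.ScrapingLinkMiddleware': 700,
}

# Adjust to the number of concurrent requests your plan allows.
CONCURRENT_REQUESTS = 1
```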

Usage

Inherit your spiders from ScrapingLinkSpider and yield a ScrapingLinkRequest.

Below you can see an example from the spider in parascrapear.py.

from scrapy_scraping_link import ScrapingLinkSpider, ScrapingLinkRequest


# Inherit from ScrapingLinkSpider, as described above.
class ParascrapearSpider(ScrapingLinkSpider):
    name = 'parascrapear'
    allowed_domains = ['parascrapear.com']
    start_urls = ['http://parascrapear.com/']

    def parse(self, response):
        print('Parsing ' + response.url)
        
        next_urls = response.css('a::attr(href)').getall()
        for next_url in next_urls:
            if next_url is not None:
                yield ScrapingLinkRequest(response.urljoin(next_url))
        
        sentences = response.css('q::text').getall()
        for sentence in sentences:
            print(sentence)

You can pass ScrapingLink parameters in the params argument of a ScrapingLinkRequest. Headers and cookies are passed as in a normal Scrapy Request. ScrapingLinkRequest converts all parameters, headers and cookies to the format expected by the API.
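
To give a rough idea of what this kind of formatting can involve, here is a standalone sketch using only the standard library. It is not the package's actual implementation: the endpoint URL, the build_api_call helper, the render_js parameter name and the cookie encoding are all hypothetical and chosen purely for illustration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real ScrapingLink API URL is not stated here.
API_URL = 'https://api.example.invalid/v1/'


def build_api_call(url, api_key, params=None, headers=None, cookies=None):
    """Fold a target URL and its options into one API call (illustrative only)."""
    query = {'api_key': api_key, 'url': url}
    # Extra API parameters go into the query string; booleans become lowercase text.
    for name, value in (params or {}).items():
        query[name] = str(value).lower() if isinstance(value, bool) else value
    # Cookies are flattened into a single "name=value;name=value" string.
    if cookies:
        query['cookies'] = ';'.join(f'{k}={v}' for k, v in cookies.items())
    # Headers intended for the target site are returned separately.
    forwarded = dict(headers or {})
    return f'{API_URL}?{urlencode(query)}', forwarded


api_url, forwarded = build_api_call(
    'http://parascrapear.com/',
    'REPLACE-WITH-YOUR-API-KEY',
    params={'render_js': False},  # hypothetical parameter name
    headers={'Accept-Language': 'en'},
    cookies={'a': '1', 'b': '2'},
)
print(api_url)
```

In a spider you would not do any of this by hand; you would pass params, headers and cookies to ScrapingLinkRequest and let it handle the conversion.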

Examples

Add your API key to settings.py.

To run the examples you need to clone this repository. In your terminal, go to examples/parascrapear/parascrapear and run the example spider with:

scrapy runspider parascrapear.py

Customer Support

Reach out to us via our Telegram group or write us an email.

Sign up for our free plan to get a free API key loaded with 100 free credits. No credit card required!
