Skip to main content

A collection of helpers for running Scrapy in ScraperWiki

Project description

A collection of helpers for running scrapers built with Scrapy in ScraperWiki

Launch scraper without scrapy CLI

Example:

from scrapy.conf import settings
from scrapyrwiki import run_spider

def main():
    run_spider(MySpider(), settings)

if __name__ == '__main__':
    main()

Save produced data to ScraperWiki

Just add “scrapyrwiki.pipelines.ScraperWikiPipeline” to ITEM_PIPELINES

Example:

from scrapy.conf import settings
from scrapyrwiki import run_spider

def scraperwiki():
    options = {
        'SW_SAVE_BUFFER': 5,
        'SW_UNIQUE_KEYS': {"MyItem": ['url']},
        'ITEM_PIPELINES': ['scrapyrwiki.pipelines.ScraperWikiPipeline'],
    }
    settings.overrides.update(options)
    run_spider(MySpider(), settings)


if __name__ == 'scraper':
    scraperwiki()

Check spider contracts in CI

Just launch spider with run_tests

Example:

from scrapyrwiki import run_tests
from scrapy.conf import settings

run_tests(MySpider(), "output.xml", settings)

Note: For testing the HTTP cache is used. In the directory where the script is launched there must be a scrapy.cfg (needed by Scrapy to identify that’s a scraper directory) and a .scrapy directory with the HTTP cache db.

The output is in XUnit format, tested on Jenkins

Log scraper errors to Sentry

Install scrapy-sentry and set the environment variable SENTRY_DSN with the Sentry key. Scrapyrwiki will handle everything for you.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scrapyrwiki-0.2.tar.gz (3.5 kB view details)

Uploaded Source

File details

Details for the file scrapyrwiki-0.2.tar.gz.

File metadata

  • Download URL: scrapyrwiki-0.2.tar.gz
  • Upload date:
  • Size: 3.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for scrapyrwiki-0.2.tar.gz
Algorithm Hash digest
SHA256 0afe100bdbc403955228309d9942066a13278d99843630a89a0e25ec72f4dcd8
MD5 edcec4d73d677c3f89507aebde4d5edd
BLAKE2b-256 5669f6486c5083066040f0461e24ae39cc371c43728230cfb9fd91618207b9b7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page