scrapyrwiki

A collection of helpers for running Scrapy in ScraperWiki

Development Status
- 3 - Alpha
Environment
- Console
Intended Audience
- Developers
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
- Python
Topic
- Utilities

Project description

A collection of helpers for running scrapers built with Scrapy in ScraperWiki

Launch scraper without scrapy CLI

Example:

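from scrapy.conf import settings
from scrapyrwiki import run_spider

# MySpider stands for your own Scrapy spider class; import it from your project.

def main():
    # Run the spider directly, without going through the scrapy command line.
    run_spider(MySpider(), settings)

if __name__ == '__main__':
    main()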

Save produced data to ScraperWiki

Add scrapyrwiki.pipelines.ScraperWikiPipeline to the ITEM_PIPELINES setting.

Example:

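from scrapy.conf import settings
from scrapyrwiki import run_spider

# MySpider stands for your own Scrapy spider class; import it from your project.

def scraperwiki():
    options = {
        'SW_SAVE_BUFFER': 5,
        'SW_UNIQUE_KEYS': {"MyItem": ['url']},
        'ITEM_PIPELINES': ['scrapyrwiki.pipelines.ScraperWikiPipeline'],
    }
    # Layer these options on top of the project settings before running.
    settings.overrides.update(options)
    run_spider(MySpider(), settings)


# Not a typo: this example checks for 'scraper' rather than '__main__',
# presumably the module name the script gets when run on ScraperWiki.
if __name__ == 'scraper':
    scraperwiki()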
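
The setting names above suggest that items are saved in batches and deduplicated per item type. The following is only a rough sketch of that idea, not scrapyrwiki's implementation: it assumes SW_SAVE_BUFFER is a batch size and SW_UNIQUE_KEYS maps an item class name to its unique key fields. BufferedSaver and the save callable are hypothetical names introduced for illustration.

```python
from collections import defaultdict


class BufferedSaver:
    """Hypothetical helper (not part of scrapyrwiki): batch rows per item type."""

    def __init__(self, save, buffer_size, unique_keys):
        self.save = save                # callable(unique_key_fields, rows)
        self.buffer_size = buffer_size  # assumed meaning of SW_SAVE_BUFFER
        self.unique_keys = unique_keys  # assumed meaning of SW_UNIQUE_KEYS
        self.buffers = defaultdict(list)

    def add(self, item_type, row):
        """Queue a row and flush once the buffer for its type is full."""
        rows = self.buffers[item_type]
        rows.append(row)
        if len(rows) >= self.buffer_size:
            self.flush(item_type)

    def flush(self, item_type):
        """Hand all queued rows of one type to the save callable."""
        rows = self.buffers.pop(item_type, [])
        if rows:
            self.save(self.unique_keys.get(item_type, []), rows)

    def close(self):
        """Flush whatever is left, e.g. when the spider finishes."""
        for item_type in list(self.buffers):
            self.flush(item_type)
```

With a buffer size of 5, the save callable would be invoked once per five rows of a given type, plus once more on close() for any remainder.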

Check spider contracts in CI

Launch the spider with run_tests.

Example:

from scrapyrwiki import run_tests
from scrapy.conf import settings

# MySpider stands for your own Scrapy spider class; import it from your project.
run_tests(MySpider(), "output.xml", settings)

Note: The tests use the HTTP cache. The directory from which the script is launched must contain a scrapy.cfg file (Scrapy needs it to recognize the directory as a scraper project) and a .scrapy directory holding the HTTP cache database.
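
A CI job can check these two prerequisites before calling run_tests. The sketch below uses only the standard library; missing_test_prerequisites is a hypothetical helper name, not something scrapyrwiki provides.

```python
import os


def missing_test_prerequisites(directory="."):
    """Return the required entries that are absent from `directory`."""
    missing = []
    # scrapy.cfg marks the directory as a Scrapy project.
    if not os.path.isfile(os.path.join(directory, "scrapy.cfg")):
        missing.append("scrapy.cfg")
    # .scrapy is expected to hold the HTTP cache used by the tests.
    if not os.path.isdir(os.path.join(directory, ".scrapy")):
        missing.append(".scrapy")
    return missing
```

Calling missing_test_prerequisites() at the start of the test script and aborting when the returned list is non-empty gives a clearer failure than a run against an empty cache.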

The output is written in XUnit format and has been tested with Jenkins.
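
Outside Jenkins, a report of this kind can be summarized with a few lines of standard-library code. The exact layout of the file produced by run_tests is not shown here, so this sketch assumes the common XUnit/JUnit shape: testcase elements that contain a failure or error child when they did not pass. summarize_xunit is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def summarize_xunit(xml_text):
    """Count test cases and list the names of those that failed or errored.

    Assumes the common layout: <testcase name="..."> elements, with a
    <failure> or <error> child on cases that did not pass.
    """
    root = ET.fromstring(xml_text)
    cases = list(root.iter("testcase"))
    failed = [
        case.get("name")
        for case in cases
        if case.find("failure") is not None or case.find("error") is not None
    ]
    return {"total": len(cases), "failed": failed}
```

Passing the contents of the report file to summarize_xunit yields a dictionary with the total number of cases and the names of the failing ones.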

Log scraper errors to Sentry

Install scrapy-sentry and set the SENTRY_DSN environment variable to your Sentry DSN. scrapyrwiki handles the rest.
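
Because the integration depends on an environment variable, a scraper can confirm it is set before starting. This is a minimal standard-library sketch; sentry_dsn is a hypothetical helper, and how scrapyrwiki itself reads the variable is not shown here.

```python
import os


def sentry_dsn(environ=None):
    """Return the configured SENTRY_DSN, or None when it is unset or blank."""
    environ = os.environ if environ is None else environ
    dsn = environ.get("SENTRY_DSN", "").strip()
    return dsn or None
```

A startup script could log a warning when sentry_dsn() returns None, so that a missing variable does not go unnoticed.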

Release history

- 0.2 (this version): Feb 27, 2013
- 0.1: Jan 28, 2013

Download files

Source Distribution

scrapyrwiki-0.2.tar.gz
- Upload date: Feb 27, 2013
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing: No

File hashes

Hashes for scrapyrwiki-0.2.tar.gz
Algorithm	Hash digest
SHA256	`0afe100bdbc403955228309d9942066a13278d99843630a89a0e25ec72f4dcd8`
MD5	`edcec4d73d677c3f89507aebde4d5edd`
BLAKE2b-256	`5669f6486c5083066040f0461e24ae39cc371c43728230cfb9fd91618207b9b7`
