scrapy output testing framework
scrapy-test
`scrapy-test` is a framework for validating Scrapy results. It can test both the items produced by a crawl and the crawl's stats output.
See the example project, a Hacker News crawler with a full test suite.
Philosophy and Architecture
`scrapy-test` tries to replicate the `scrapy.Item` definition, but instead of defining fields it defines a test for every field.
Tests are callables that return a failure message if a check fails, or an empty string if it passes.
Example item specification:

    from scrapy import Field, Item

    class MyItem(Item):
        name = Field()
        url = Field()

    class TestMyItem(ItemSpec):
        item_cls = MyItem
        # define tests
        name_test = Match('some-regex-pattern')
        url_test = lambda v: 'bad url' if 'cat' in v else ''
        # define coverage
        url_cov = 100  # 100% - every item should have url field

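To make the idea concrete, here is a minimal, library-independent sketch of how field tests and coverage could be evaluated. The helpers `match` and `check_items` are hypothetical names invented for illustration, not part of the scrapy-test API, and plain dicts stand in for scrapy items.

```python
import re
from typing import Callable, Dict, List


def match(pattern: str) -> Callable[[object], str]:
    """Hypothetical stand-in for a regex test: returns '' on success."""
    compiled = re.compile(pattern)

    def test(value: object) -> str:
        if compiled.search(str(value)):
            return ''
        return f'{value!r} does not match {pattern!r}'

    return test


def check_items(items: List[dict],
                tests: Dict[str, Callable[[object], str]],
                coverage: Dict[str, float]) -> List[str]:
    """Run every field test and coverage check, collecting failure messages."""
    messages = []
    for field, test in tests.items():
        for item in items:
            if field in item:
                message = test(item[field])
                if message:
                    messages.append(f'{field}: {message}')
    for field, required_pct in coverage.items():
        # Coverage is the percentage of items that carry the field at all.
        present = sum(1 for item in items if item.get(field) is not None)
        actual_pct = 100 * present / len(items) if items else 0
        if actual_pct < required_pct:
            messages.append(
                f'{field}: coverage {actual_pct:.0f}% is below {required_pct}%')
    return messages


items = [
    {'name': 'first post', 'url': 'https://example.com/dog'},
    {'name': 'second post'},  # url is missing, so url coverage is 50%
]
tests = {
    'name': match(r'\w+ post'),
    'url': lambda v: 'bad url' if 'cat' in v else '',
}
messages = check_items(items, tests, coverage={'url': 100})
print(messages)
```

Both field tests pass here, so the only message comes from the coverage check on `url`.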
`scrapy-test` also supports stats output validation. When Scrapy finishes crawling, it outputs various stats, such as error counts. A `StatsSpec` can be defined to validate these stats:

    class MyStats(StatsSpec):
        # one spider or multiple spiders
        spider_cls = MySpider1, MySpider2
        validate = {  # stat_name_pattern: tests
            'item_scraped_count': MoreThan(1),
            r'downloader/response_status_count/50\d': LessThan(1),
        }
        # required stat keys
        required = ['stat_pattern.+']

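As an illustration of the pattern-to-test mapping, the sketch below validates a stats dictionary against regex keys. It is an assumption-laden approximation: `validate_stats`, `more_than` and `less_than` are hypothetical helpers, and whether scrapy-test itself uses `re.search` or a stricter match is not stated in this document.

```python
import re
from typing import Callable, Dict, List


def more_than(limit: float) -> Callable[[float], str]:
    """Hypothetical test: fails unless the value is greater than `limit`."""
    return lambda v: '' if v > limit else f'{v} is not more than {limit}'


def less_than(limit: float) -> Callable[[float], str]:
    """Hypothetical test: fails unless the value is less than `limit`."""
    return lambda v: '' if v < limit else f'{v} is not less than {limit}'


def validate_stats(stats: Dict[str, float],
                   validate: Dict[str, Callable[[float], str]],
                   required: List[str]) -> List[str]:
    """Apply each test to every stat whose key matches its pattern."""
    messages = []
    for pattern, test in validate.items():
        for key, value in stats.items():
            if re.search(pattern, key):  # assumption: search, not full match
                message = test(value)
                if message:
                    messages.append(f'{key}: {message}')
    for pattern in required:
        if not any(re.search(pattern, key) for key in stats):
            messages.append(f'missing required stat matching {pattern!r}')
    return messages


stats = {
    'item_scraped_count': 10,
    'downloader/response_status_count/200': 10,
    'downloader/response_status_count/503': 2,
}
messages = validate_stats(
    stats,
    validate={
        'item_scraped_count': more_than(1),
        r'downloader/response_status_count/50\d': less_than(1),
    },
    required=['finish_reason'],
)
print(messages)
```

The `50\d` pattern catches the two 503 responses, and the absent `finish_reason` key triggers the required-stat message.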
Finally, `scrapy-test` determines failure by checking whether the stat or item specifications generated any messages: the exit code is 1 if there are messages and 0 if there are none.
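The exit-code rule can be sketched in a few lines. The function name `exit_code` and the sample message are made up for illustration; this is not the framework's actual implementation.

```python
from typing import Iterable, List


def exit_code(*message_groups: Iterable[str]) -> int:
    """Return 1 if any specification produced a message, otherwise 0."""
    all_messages: List[str] = [m for group in message_groups for m in group if m]
    return 1 if all_messages else 0


item_messages: List[str] = []
stat_messages = ['log_count/ERROR: 3 is not less than 1']
code = exit_code(item_messages, stat_messages)
print(code)  # a real CLI would pass this value to sys.exit()
```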
Usage
Setup
- A `test.py` module should be created in the spider directory. For example:

      scrapy-test-example/
      ├── example
      │   ├── __init__.py
      │   └── test.py
      └── scrapy.cfg

- Add the test file config to `scrapy.cfg`:

      [settings]
      default = example.settings

      [test]
      root = example.test

- Define an `ItemSpec` for item field validation:

      from scrapytest.tests import Match, Equal, Type, MoreThan, Map, Len, Required

      class TestPost(ItemSpec):
          # defining item that is being covered
          item_cls = PostItem
          # defining field tests
          title_test = Match('.{5,}')
          points_test = Type(int), MoreThan(0)
          author_test = Type(str), Match('.{3}')
          comments_test = Type(list), Required()

          # also supports methods!
          def url_test(self, value: str):
              if not value.startswith('http'):
                  return f'Invalid url: {value}'
              return ''

  An `ItemSpec` class should contain attributes whose names end in `_test`. These attributes have to be callables (functions, methods etc.) that return message(s) if a failure is encountered. See the `url_test` example above.

- Define a `StatsSpec` for crawl stats validation:

      class TestStats(StatsSpec):
          # stat pattern: test functions
          validate = {  # this is default
              'log_count/ERROR$': LessThan(1),
              'item_scraped_count': MoreThan(1),
              'finish_reason': Match('finished'),
          }
          # these stats should be required
          required = ['some_cool_stat']

  A `StatsSpec` should contain a `validate` attribute with a `pattern: tests` dictionary.

- Define `Spider` classes:

      from scrapy import Request
      from project.spiders import HackernewsSpider

      class TestHackernewsSpider(HackernewsSpider):
          test_urls = [
              "https://news.ycombinator.com/item?id=19187417",
          ]

          def start_requests(self):
              for url in self.test_urls:
                  yield Request(url, self.parse_submission)

  This spider should extend your production spider and simply crawl the listed URLs without doing discovery. Alternatively, for live testing, you can skip the subclass and not extend anything.

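The `[test]` section added to `scrapy.cfg` in the setup steps points at the module holding the specifications. How scrapy-test loads it internally is not described here; the sketch below only shows, under that caveat, how such a setting can be read with the standard library. `read_test_root` is a hypothetical helper.

```python
import configparser

# Same content as the scrapy.cfg fragment from the setup steps.
CFG_TEXT = """
[settings]
default = example.settings

[test]
root = example.test
"""


def read_test_root(cfg_text: str) -> str:
    """Return the dotted module path stored under [test] root."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get('test', 'root')


root = read_test_root(CFG_TEXT)
print(root)
# importlib.import_module(root) could then load the spec module, provided
# the project package is importable from the current working directory.
```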
Running

    $ scrapy-test --help
    Usage: scrapy-test [OPTIONS] [SPIDER_NAME]

      run scrapy-test tests and output messages and appropriate exit code (1 for
      failed, 0 for passed)

    Options:
      --cache  enable HTTPCACHE_ENABLED setting for this run
      --help   Show this message and exit.

To run the tests, use the CLI command:

    $ scrapy-test <spider_name>

The spider name can be omitted to run all spiders.
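Because the command reports its result through the exit code, it is straightforward to wrap in a script, for instance in a CI job. The following sketch uses only the `--cache` flag and the optional spider name shown in the help output; the function names and the spider name `hackernews` are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_command(spider_name: Optional[str] = None,
                  cache: bool = False) -> List[str]:
    """Assemble the scrapy-test command line as a list of arguments."""
    command = ['scrapy-test']
    if cache:
        command.append('--cache')
    if spider_name:
        command.append(spider_name)
    return command


def run_tests(spider_name: Optional[str] = None, cache: bool = False) -> bool:
    """Run scrapy-test and report success (exit code 0) as True."""
    result = subprocess.run(build_command(spider_name, cache))
    return result.returncode == 0


command = build_command('hackernews', cache=True)
print(command)
```

Calling `run_tests()` with no arguments would run the specifications for all spiders, assuming `scrapy-test` is installed and on the `PATH`.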
Notifications
`scrapy-test` supports notification hooks on either test failure or success:

    --notify-on-error TEXT    send notification on failure, choice from:
                              ['slack']
    --notify-on-all TEXT      send notification on failure or success, choice
                              from: ['slack']
    --notify-on-success TEXT  send notification on success, choice from:
                              ['slack']

Right now `scrapy-test` offers these notifiers:
* Slack - to configure Slack notifications, set up the Slack [incoming webhooks](https://slack.com/apps/A0F7XDUAZ-incoming-webhooks) app and supply these settings in `scrapy.cfg`:

      slack_url = https://hooks.slack.com/services/AAA/BBB/CCC
      # where the message goes to
      slack_channel = #cats
      # bot's name
      slack_username = bender
      # bot's avatar
      slack_icon_emoji = :bender:
      # maintainer will be mentioned on error
      slack_maintainer = @bernard

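For orientation, here is a rough sketch of how settings like these could map onto a Slack incoming-webhook request. The message wording and the helpers `build_payload` and `send` are assumptions made for illustration; the actual notifier in scrapy-test may format its messages differently.

```python
import json
import urllib.request
from typing import Dict, List

settings = {
    'slack_url': 'https://hooks.slack.com/services/AAA/BBB/CCC',
    'slack_channel': '#cats',
    'slack_username': 'bender',
    'slack_icon_emoji': ':bender:',
    'slack_maintainer': '@bernard',
}


def build_payload(settings: Dict[str, str], passed: bool,
                  messages: List[str]) -> dict:
    """Build the JSON body; the maintainer is mentioned only on failure."""
    if passed:
        text = 'scrapy-test passed'
    else:
        text = (f"{settings['slack_maintainer']} scrapy-test failed:\n"
                + '\n'.join(messages))
    return {
        'text': text,
        'channel': settings['slack_channel'],
        'username': settings['slack_username'],
        'icon_emoji': settings['slack_icon_emoji'],
    }


def send(settings: Dict[str, str], payload: dict) -> None:
    """POST the payload to the webhook URL (performs a network call)."""
    request = urllib.request.Request(
        settings['slack_url'],
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
    )
    urllib.request.urlopen(request)


payload = build_payload(settings, passed=False, messages=['url: bad url'])
print(json.dumps(payload, indent=2))
```

`send` is defined but not called here, since the webhook URL above is a placeholder.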