
Helpers and examples to build Scrapy Crawlers in a test driven way.


scrapy_tdd


Motivation / Why should I develop Scrapy Crawlers using TDD?

  1. The develop-test cycle shrinks to a few seconds, so you get a properly working scraper up much faster

  2. When bugs are discovered "in the wild" with real data, new example files, a test, and a fix can be created and verified much faster

  3. It allows for fast refactoring without breaking anything, which results in much cleaner scraper code

  4. It just feels right when you are used to doing TDD

How does this differ from Scrapy's Spiders Contracts?

Scrapy has its own built-in testing feature named Spiders Contracts.

I tried to use them for some time, but decided to build real unit tests in a unit test framework like py.test because of these shortcomings:

  • Its philosophy is geared towards testing against contracts (hence the name), which are by nature broad rather than specific. Testing for exact field contents in items can be done, but it is difficult and fragile

  • Its documentation and basic feature set are a bit thin

  • It mixes implementation code with contract descriptions, which is only workable when the contracts are few and simple
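To make the contrast concrete, here is a minimal sketch. The docstring uses Scrapy's built-in contract annotations (`@url`, `@returns`, `@scrapes`), which check item counts and field presence. The class name, URL, and field values are hypothetical, and the class is deliberately a plain Python class so the snippet runs without Scrapy; in a real project it would subclass `scrapy.Spider`.

```python
class ContractStyleSpider:
    # Assumption: a real spider would subclass scrapy.Spider instead.
    name = "contract_style"

    def parse(self, response):
        """Callback with contracts embedded in its docstring.

        @url http://www.example.com/widgets
        @returns items 1 16
        @scrapes lorem iterem
        """
        # Illustrative item; real code would extract values from the response.
        yield {"lorem": "ipsum", "iterem": "ipsem"}


def test_exact_field_contents():
    # A plain unit test can pin exact values, not just counts or field names.
    items = list(ContractStyleSpider().parse(response=None))
    assert items == [{"lorem": "ipsum", "iterem": "ipsem"}]
```

The contracts sit next to the implementation and grow with every new expectation, while the unit test lives in its own module and can assert whatever level of detail is needed.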

Installation

pip install scrapy_tdd

Quick Start Examples

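# MySpider, get_crawler and response_from are assumed to be imported or
# defined elsewhere in the test module.
def describe_fancy_spider():

    # from_crawler is a classmethod, so no throwaway instance is needed
    to_test = MySpider.from_crawler(get_crawler())

    def describe_parse_suggested_terms():

        resp = response_from("Result_JSON_Widget.txt")
        # parse may be a generator, so materialize it before indexing
        results = list(to_test.parse(resp))

        def should_get_item():
            item = results
            assert item[0]["lorem"] == "ipsum"
            assert item[0]["iterem"] == "ipsem"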
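The idea behind the example above is to feed a saved sample file to the parse logic without touching the network. The following is a rough, Scrapy-free sketch of that loop, not the helpers shipped with this package: `sample_response`, `parse_widget`, `run_example`, and the JSON layout with a `results` key are all hypothetical, and the stand-in response only carries `url` and `text`. In a real project a Scrapy response object such as `scrapy.http.TextResponse` would take its place.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace


def sample_response(path, url="http://www.example.com"):
    """Build a stand-in response from a saved sample file (hypothetical helper)."""
    text = Path(path).read_text(encoding="utf-8")
    # Only the attributes the parse logic reads are provided here.
    return SimpleNamespace(url=url, text=text)


def parse_widget(response):
    """Yield one item per entry in the JSON payload (illustrative parse logic)."""
    for entry in json.loads(response.text)["results"]:
        yield {"lorem": entry["lorem"], "iterem": entry["iterem"]}


def run_example():
    """Write a tiny sample file, parse it offline, and return the items."""
    payload = {"results": [{"lorem": "ipsum", "iterem": "ipsem"}]}
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "Result_JSON_Widget.txt"
        sample.write_text(json.dumps(payload), encoding="utf-8")
        return list(parse_widget(sample_response(sample)))


def test_should_get_item():
    items = run_example()
    assert items[0]["lorem"] == "ipsum"
    assert items[0]["iterem"] == "ipsem"
```

Because nothing here performs a request, a test like this finishes in milliseconds, which is what keeps the develop-test cycle short.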

Full Documentation

… coming soon …

Missing / next steps

  • Mocking Request-Response pairs

How to contribute

… coming soon …
