Project Description

How can I scrape items off a site from the last five days?

—Scrapy User

That question started the development of scrapy-mosquitera, a tool to help you restrict crawling and scraping scope using matchers.

Matchers are simple Python functions that report whether an element is valid under the given restrictions.

The first goal in the project was date matching, but you can create your own matcher for your own crawling and scraping needs.
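As a rough illustration of what a custom matcher could look like, here is a minimal sketch. The name `price_matches` and its `below`/`above` parameters are hypothetical and not part of scrapy-mosquitera; the only idea taken from the text above is that a matcher is a plain function returning whether an element is valid.

```python
def price_matches(data, below=None, above=None):
    """Hypothetical matcher: True when `data` lies inside the optional bounds."""
    if data is None:
        return False  # nothing was scraped, so there is nothing to accept
    if below is not None and data >= below:
        return False  # price is not strictly below the upper bound
    if above is not None and data <= above:
        return False  # price is not strictly above the lower bound
    return True


# Keep only the prices under 15 from a list of scraped values.
prices = [4.99, 12.50, 30.00]
cheap = [p for p in prices if price_matches(p, below=15)]
```

A matcher written this way can be called wherever you decide whether to follow a link or keep an item.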

How it works

When the dates are available in the URLs, you can call the matcher function directly in your code:

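from scrapy import Request
from scrapy_mosquitera.matchers import date_matches

# Inside a spider callback; scrape_date_from_url is your own helper.
date = scrape_date_from_url(url)

# Follow the URL only if its date falls within the last five days.
if date_matches(data=date, after='5 days ago'):
    yield Request(url=url, callback=self.parse_item)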
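The `scrape_date_from_url` helper is not provided by the library; you write it for the site you are crawling. One possible sketch, assuming the site embeds dates in its paths as `/YYYY/MM/DD/` (an assumption made only for this example), is shown below. It returns a `datetime.date`; check the library documentation for which input types `date_matches` accepts, and convert if needed.

```python
import re
from datetime import date

# Assumed URL layout: .../YYYY/MM/DD/... (adapt the pattern to your site).
_DATE_IN_URL = re.compile(r"/(\d{4})/(\d{1,2})/(\d{1,2})(?:/|$)")


def scrape_date_from_url(url):
    """Return the date embedded in `url`, or None if no valid date is found."""
    match = _DATE_IN_URL.search(url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # The digits matched the pattern but are not a real calendar date.
        return None
```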

When the date only becomes available once you scrape the items themselves, scrapy-mosquitera provides a PaginationMixin to control the crawl according to the scraped dates.

See the rest of the documentation for more details.
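The idea of steering pagination by scraped dates can be sketched without Scrapy at all. The code below is not the PaginationMixin API; it is a standalone illustration that assumes listings are ordered newest first and that each item is a dict with a `date` key. The function name `crawl_recent` and its parameters are hypothetical.

```python
from datetime import date, timedelta


def crawl_recent(pages, days, today):
    """Yield items no older than `days` days; stop at the first page with none.

    `pages` is any iterable of pages, each page being a list of item dicts.
    Because it is consumed lazily, later pages are never requested once a
    page contains no recent items.
    """
    cutoff = today - timedelta(days=days)
    for page in pages:
        recent = [item for item in page if item["date"] >= cutoff]
        for item in recent:
            yield item
        if not recent:
            break  # everything on this page is too old, so stop paginating
```

In a real spider the equivalent decision is whether to yield the request for the next page after the current page's items have been scraped.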

Installation

The quick way:

pip install scrapy-mosquitera

Release History

0.1.1 (this version)

0.1.0

Download Files

File Name                                     Size     Version  File Type  Upload Date
scrapy_mosquitera-0.1.1-py2.py3-none-any.whl  8.7 kB   py2.py3  Wheel      May 19, 2016
scrapy-mosquitera-0.1.1.tar.gz                18.4 kB  -        Source     May 19, 2016
