32 projects
Scrapy
A high-level Web Crawling and Web Scraping framework
web-poet
Zyte's Page Object pattern for web scraping
scrapyd-client
A client for Scrapyd
scrapy-poet
Page Object pattern for Scrapy
scrapyd
A service for running Scrapy spiders, with an HTTP API
itemloaders
Base library for Scrapy's ItemLoader
scrapy-zyte-smartproxy
Scrapy middleware for Zyte Smart Proxy Manager
form2request
Build HTTP requests out of HTML forms
w3lib
Library of web-related functions
itemadapter
Common interface for data container classes
queuelib
Collection of persistent (disk-based) and non-persistent (memory-based) queues
parsel
Parsel is a library to extract data from HTML and XML using XPath and CSS selectors
Protego
Pure-Python robots.txt parser with support for modern conventions
xtractmime
Implementation of the MIME Sniffing standard (https://mimesniff.spec.whatwg.org/)
andi
Library for annotation-based dependency injection
scrapy-splash
JavaScript support for Scrapy using Splash
cssselect
cssselect parses CSS3 Selectors and translates them to XPath 1.0
scrapy-deltafetch
Scrapy middleware to ignore previously crawled pages
splash
A JavaScript rendering service with an HTTP API
scrapely
A pure-Python HTML screen-scraping library
scrapy-po
Page Object pattern for Scrapy
webstruct
A library for creating statistical NER systems that work on HTML data
PyPyDispatcher
Multi-producer-multi-consumer signal dispatching mechanism
adblockparser
Parser for Adblock Plus rules
loginform
Fill HTML login forms automatically
scrapy-splitvariants
Scrapy spider middleware to split an item into multiple items on a multi-valued key
scrapy-hcf
Scrapy spider middleware to use Scrapinghub's Hub Crawl Frontier as a backend for URLs
scrapy-querycleaner
Scrapy spider middleware to clean up query parameters in request URLs
scrapy-magicfields
Scrapy middleware to add extra "magic" fields to items
scrapy-djangoitem
Scrapy extension to write scraped items using Django models
scrapyjs
JavaScript support for Scrapy using Splash
scrapy-jsonrpc
Scrapy extension to control spiders using JSON-RPC