36 projects
itemloaders
Base library for Scrapy's ItemLoader
queuelib
Collection of persistent (disk-based) and non-persistent (memory-based) queues
Protego
Pure-Python robots.txt parser with support for modern conventions
parsel
Parsel is a library to extract data from HTML and XML using XPath and CSS selectors
w3lib
Library of web-related functions
cssselect
cssselect parses CSS3 Selectors and translates them to XPath 1.0
scrapy-lint
A linter for Scrapy projects
web-poet
Zyte's Page Object pattern for web scraping
scrapy-poet
Page Object pattern for Scrapy
Scrapy
A high-level Web Crawling and Web Scraping framework
itemadapter
Common interface for data container classes
andi
Library for annotation-based dependency injection
sphinx-scrapy
Sphinx extension for documentation in the Scrapy ecosystem
scrapyd
A service for running Scrapy spiders, with an HTTP API
scrapyd-client
A client for Scrapyd
scrapy-zyte-smartproxy
Scrapy middleware for Zyte Smart Proxy Manager
scrapy-deltafetch
Scrapy middleware to ignore previously crawled pages
scrapy-feedexporter-sftp
Scrapy feed export storage backend for exporting items to an SFTP server
scrapy-splash
JavaScript support for Scrapy using Splash
form2request
Build HTTP requests out of HTML forms
xtractmime
Implementation of the MIME Sniffing standard (https://mimesniff.spec.whatwg.org/)
splash
A JavaScript rendering service with an HTTP API
scrapely
A pure-Python HTML screen-scraping library
scrapy-po
Page Object pattern for Scrapy
flake8-scrapy
webstruct
A library for creating statistical NER systems that work on HTML data
PyPyDispatcher
Multi-producer-multi-consumer signal dispatching mechanism
adblockparser
Parser for Adblock Plus rules
loginform
Fill HTML login forms automatically
scrapy-splitvariants
Scrapy spider middleware to split an item into multiple items on a multi-valued key
scrapy-hcf
Scrapy spider middleware to use Scrapinghub's Hub Crawl Frontier as a backend for URLs
scrapy-querycleaner
Scrapy spider middleware to clean up query parameters in request URLs
scrapy-magicfields
Scrapy middleware to add extra "magic" fields to items
scrapy-djangoitem
Scrapy extension to write scraped items using Django models
scrapyjs
JavaScript support for Scrapy using Splash
scrapy-jsonrpc
Scrapy extension to control spiders using JSON-RPC