48 projects
extruct
Extract embedded metadata from HTML markup
spidermon
Spidermon is a framework to build monitors for Scrapy spiders.
web-poet
Zyte's Page Object pattern for web scraping
scrapy-poet
Page Object pattern for Scrapy
itemloaders
Base library for Scrapy's ItemLoader
scrapinghub-entrypoint-scrapy
Scrapy entrypoint for Scrapinghub job runner
itemadapter
Common interface for data container classes
scrapyrt
Put Scrapy spiders behind an HTTP API
shub
Scrapinghub Command Line Client
andi
Library for annotation-based dependency injection
dateparser
Date parsing library designed to parse dates from HTML pages
number-parser
Parse numbers written in natural language
js2xml
Convert JavaScript code to an XML document
scrapinghub
Client interface for Scrapinghub API
autoextract-poet
web-poet definitions for AutoExtract API
scrapy-deltafetch
Scrapy middleware to ignore previously crawled pages
scrapy-autoextract
Zyte Automatic Extraction API integration for Scrapy
scrapy-jsonschema
Scrapy schema validation pipeline and Item builder using JSON Schema
scrapinghub-autoextract
Python interface to Scrapinghub Automatic Extraction API
scrapy-crawlera
Crawlera middleware for Scrapy
price-parser
Extract price and currency from a raw string
splash
A JavaScript renderer with an HTTP API
scrapy-headless
Download Handler for using Scrapy with headless browsers
scrapy-po
Page Object pattern for Scrapy
arche
Analyze Scrapy Cloud data
slybot
Slybot crawler
frontera
A scalable frontier for web crawlers
portia2code
Convert Portia spider definitions to Python Scrapy spiders
webstruct
A library for creating statistical NER systems that work on HTML data
PyPyDispatcher
Multi-producer-multi-consumer signal dispatching mechanism
exporters
Exporters is an extensible export pipeline library that supports filtering, transformation, and multiple sources and destinations.
page_finder
hubstorage
Client interface for Scrapinghub HubStorage
scrapylib
Scrapy helper functions and processors
shub-image
Scrapinghub release tool
adblockparser
Parser for Adblock Plus rules
scrapy-splitvariants
Scrapy spider middleware to split an item into multiple items on a multi-valued key
scrapy-hcf
Scrapy spider middleware to use Scrapinghub's Hub Crawl Frontier as a backend for URLs
scrapy-querycleaner
Scrapy spider middleware to clean up query parameters in request URLs
scrapy-magicfields
Scrapy middleware to add extra "magic" fields to items
page_clustering
Online k-means clustering of web pages
scrapy-mosquitera
Restrict crawl and scraping scope using matchers.
skinfer
Simple tool to merge JSON schemas
flatson
Tool to flatten a stream of JSON-like objects, configured via a schema
crawl-frontier
A flexible frontier for web crawlers
aduana
Bindings for Aduana library
wappalyzer-python
Python wrapper for Wappalyzer (a utility that uncovers the technologies used on websites)
scrapy-streamitem
Scrapy support for working with streamcorpus Stream Items