Username: scrapinghub

46 projects

itemadapter
Common interface for data container classes

itemloaders
Base library for Scrapy's ItemLoader

scrapy-jsonschema
Scrapy schema validation pipeline and Item builder using JSON Schema

scrapy-poet
Page Object pattern for Scrapy

web-poet
Scrapinghub's Page Object pattern for web scraping

andi
Library for annotation-based dependency injection

extruct
Extract embedded metadata from HTML markup

scrapinghub-autoextract
Python interface to Scrapinghub Automatic Extraction API

scrapy-crawlera
Crawlera middleware for Scrapy

scrapinghub
Client interface for Scrapinghub API

dateparser
Date parsing library designed to parse dates from HTML pages

scrapy-headless
Download Handler for using Scrapy with headless browsers

price-parser
Extract price and currency from a raw string

scrapy-autoextract
Scrapinghub AutoExtract API integration for Scrapy

splash
A JavaScript rendering service with an HTTP API

spidermon
Spidermon is a framework to build monitors for Scrapy spiders.

shub
Scrapinghub Command Line Client

scrapyrt
Put Scrapy spiders behind an HTTP API

scrapy-po
Page Object pattern for Scrapy

scrapinghub-entrypoint-scrapy
Scrapy entrypoint for Scrapinghub job runner

arche
Analyze Scrapy Cloud data

slybot
Slybot crawler

frontera
A scalable frontier for web crawlers

portia2code
Convert Portia spider definitions to Python Scrapy spiders

webstruct
A library for creating statistical NER systems that work on HTML data

js2xml
Convert JavaScript code to an XML document

PyPyDispatcher
Multi-producer-multi-consumer signal dispatching mechanism

exporters
Exporters is an extensible export pipeline library that supports filtering, transformation, and several sources and destinations.

scrapy-deltafetch
Scrapy middleware to ignore previously crawled pages

page_finder

hubstorage
Client interface for Scrapinghub HubStorage

scrapylib
Scrapy helper functions and processors

shub-image
Scrapinghub release tool

adblockparser
Parser for Adblock Plus rules

scrapy-splitvariants
Scrapy spider middleware to split an item into multiple items on a multi-valued key

scrapy-hcf
Scrapy spider middleware to use Scrapinghub's Hub Crawl Frontier as a backend for URLs

scrapy-querycleaner
Scrapy spider middleware to clean up query parameters in request URLs

scrapy-magicfields
Scrapy middleware to add extra "magic" fields to items

page_clustering
Online k-means clustering of web pages

scrapy-mosquitera
Restrict crawl and scraping scope using matchers.

skinfer
Simple tool to merge JSON schemas

flatson
Tool to flatten a stream of JSON-like objects, configured via a schema

crawl-frontier
A flexible frontier for web crawlers

aduana
Bindings for the Aduana library

wappalyzer-python
Python wrapper for Wappalyzer (utility that uncovers the technologies used on websites)

scrapy-streamitem
Scrapy support for working with streamcorpus Stream Items
