  scrapinghub

  Joined on Nov 25, 2013

34 projects

scrapy-crawlera

Last released on Sep 20, 2018

Crawlera middleware for Scrapy

extruct

Last released on Aug 23, 2018

Extract embedded metadata from HTML markup

shub

Last released on Aug 20, 2018

Scrapinghub Command Line Client

frontera

Last released on Jul 30, 2018

A scalable frontier for web crawlers

portia2code

Last released on Mar 27, 2018

Convert Portia spider definitions to Python Scrapy spiders

splash

Last released on Feb 15, 2018

A JavaScript rendering service with an HTTP API

dateparser

Last released on Feb 8, 2018

Date parsing library designed to parse dates from HTML pages

webstruct

Last released on Dec 29, 2017

A library for creating statistical NER systems that work on HTML data

scrapinghub

Last released on Dec 8, 2017

Client interface for Scrapinghub API

js2xml

Last released on Aug 3, 2017

Convert JavaScript code to an XML document

scrapinghub-entrypoint-scrapy

Last released on Jul 28, 2017

Scrapy entrypoint for Scrapinghub job runner

PyPyDispatcher

Last released on Jul 3, 2017

Multi-producer-multi-consumer signal dispatching mechanism

slybot

Last released on Jun 28, 2017

Slybot crawler

exporters

Last released on Aug 22, 2016

An extensible export pipeline library that supports filtering, transformation, and several sources and destinations

scrapyrt

Last released on Apr 18, 2017

Put Scrapy spiders behind an HTTP API

scrapy-deltafetch

Last released on Feb 9, 2017

Scrapy middleware to ignore previously crawled pages

scrapy-jsonschema

Last released on Jan 20, 2017

Scrapy schema validation pipeline and Item builder using JSON Schema

page_finder

Last released on Jan 12, 2017

hubstorage

Last released on Dec 5, 2016

Client interface for Scrapinghub HubStorage

scrapylib

Last released on Nov 14, 2016

Scrapy helper functions and processors

shub-image

Last released on Oct 28, 2016

Scrapinghub release tool

adblockparser

Last released on Oct 17, 2016

Parser for Adblock Plus rules

scrapy-splitvariants

Last released on Jul 18, 2016

Scrapy spider middleware to split an item into multiple items on a multi-valued key

scrapy-hcf

Last released on Jul 18, 2016

Scrapy spider middleware to use Scrapinghub's Hub Crawl Frontier as a backend for URLs

scrapy-querycleaner

Last released on Jun 30, 2016

Scrapy spider middleware to clean up query parameters in request URLs

scrapy-magicfields

Last released on Jun 30, 2016

Scrapy middleware to add extra "magic" fields to items

page_clustering

Last released on May 31, 2016

Online k-means clustering of web pages

scrapy-mosquitera

Last released on May 19, 2016

Restrict crawl and scraping scope using matchers.

skinfer

Last released on Oct 8, 2015

Simple tool to merge JSON schemas

flatson

Last released on Sep 25, 2015

Tool to flatten a stream of JSON-like objects, configured via a schema

crawl-frontier

Last released on Jun 26, 2015

A flexible frontier for web crawlers

aduana

Last released on Jun 22, 2015

Bindings for Aduana library

wappalyzer-python

Last released on Feb 26, 2015

Python wrapper for Wappalyzer (utility that uncovers the technologies used on websites)

scrapy-streamitem

Last released on Dec 23, 2014

Scrapy support for working with streamcorpus Stream Items
