scrapinghub

Joined on Nov 25, 2013

Projects

adblockparser

Last released on Oct 17, 2016

Parser for Adblock Plus rules

aduana

Last released on Jun 22, 2015

Bindings for Aduana library

crawl-frontier

Last released on Jun 26, 2015

A flexible frontier for web crawlers

dateparser

Last released on Sep 26, 2016

Date parsing library designed to parse dates from HTML pages

distributed-frontera

Last released on Nov 12, 2015

[deprecated] Distributed version of Frontera, flexible frontier for web crawlers

exporters

Last released on Aug 22, 2016

Exporters is an extensible export pipeline library that supports filtering, transformation, and several sources and destinations.

extruct

Last released on Dec 2, 2016

Extract embedded metadata from HTML markup

flatson

Last released on Sep 25, 2015

Tool to flatten a stream of JSON-like objects, configured via a schema

frontera

Last released on Nov 29, 2016

A scalable frontier for web crawlers

hubstorage

Last released on Dec 5, 2016

Client interface for Scrapinghub HubStorage

js2xml

Last released on Dec 1, 2016

Convert JavaScript code to an XML document

page_clustering

Last released on May 31, 2016

Online k-means clustering of web pages

page_finder

Last released on Nov 16, 2016

portia2code

Last released on Jun 8, 2016

Convert Portia spider definitions to Python Scrapy spiders

scrapinghub

Last released on Nov 2, 2016

Client interface for Scrapinghub API

scrapinghub-entrypoint-scrapy

Last released on Nov 30, 2016

Scrapy entrypoint for Scrapinghub job runner

scrapy-crawlera

Last released on Oct 17, 2016

Crawlera middleware for Scrapy

scrapy-deltafetch

Last released on Dec 7, 2016

Scrapy middleware to ignore previously crawled pages

scrapy-hcf

Last released on Jul 18, 2016

Scrapy spider middleware to use Scrapinghub's Hub Crawl Frontier as a backend for URLs

scrapylib

Last released on Nov 14, 2016

Scrapy helper functions and processors

scrapy-magicfields

Last released on Jun 30, 2016

Scrapy middleware to add extra "magic" fields to items

scrapy-mosquitera

Last released on May 19, 2016

Restrict crawl and scraping scope using matchers.

scrapy-querycleaner

Last released on Jun 30, 2016

Scrapy spider middleware to clean up query parameters in request URLs

scrapyrt

Last released on May 11, 2016

Put Scrapy spiders behind an HTTP API

scrapy-splitvariants

Last released on Jul 18, 2016

Scrapy spider middleware to split an item into multiple items on a multi-valued key

scrapy-streamitem

Last released on Dec 23, 2014

Scrapy support for working with streamcorpus Stream Items

shub

Last released on Sep 20, 2016

Scrapinghub Command Line Client

shub-image

Last released on Oct 28, 2016

Scrapinghub release tool

skinfer

Last released on Oct 8, 2015

Simple tool to merge JSON schemas

slybot

Last released on Nov 11, 2016

Slybot crawler

splash

Last released on Nov 30, 2016

A JavaScript rendering service with an HTTP API

wappalyzer-python

Last released on Feb 26, 2015

Python wrapper for Wappalyzer (utility that uncovers the technologies used on websites)

webstruct

Last released on Nov 28, 2016

A library for creating statistical NER systems that work on HTML data
