Username    scrapinghub

48 projects

spidermon

Spidermon is a framework to build monitors for Scrapy spiders.

scrapy-jsonschema

Scrapy schema validation pipeline and Item builder using JSON Schema

scrapyrt

Put Scrapy spiders behind an HTTP API

scrapy-poet

Page Object pattern for Scrapy

andi

Library for annotation-based dependency injection

scrapinghub-entrypoint-scrapy

Scrapy entrypoint for Scrapinghub job runner

scrapy-autoextract

Scrapinghub AutoExtract API integration for Scrapy

autoextract-poet

web-poet definitions for AutoExtract API

scrapinghub-autoextract

Python interface to Scrapinghub Automatic Extraction API

shub

Scrapinghub Command Line Client

extruct

Extract embedded metadata from HTML markup

scrapy-crawlera

Crawlera middleware for Scrapy

price-parser

Extract price and currency from a raw string

itemloaders

Base library for Scrapy's ItemLoader

itemadapter

Common interface for data container classes

dateparser

Date parsing library designed to parse dates from HTML pages

number-parser

Parse numbers written in natural language

js2xml

Convert JavaScript code to an XML document

web-poet

Scrapinghub's Page Object pattern for web scraping

splash

A JavaScript rendering service with an HTTP API

scrapinghub

Client interface for Scrapinghub API

scrapy-headless

Download Handler for using Scrapy with headless browsers

scrapy-po

Page Object pattern for Scrapy

arche

Analyze Scrapy Cloud data

slybot

Slybot crawler

frontera

A scalable frontier for web crawlers

portia2code

Convert Portia spider definitions to Python Scrapy spiders

webstruct

A library for creating statistical NER systems that work on HTML data

PyPyDispatcher

Multi-producer-multi-consumer signal dispatching mechanism

exporters

Exporters is an extensible export pipeline library that supports filtering, transformation, and several sources and destinations.

scrapy-deltafetch

Scrapy middleware to ignore previously crawled pages

page_finder

hubstorage

Client interface for Scrapinghub HubStorage

scrapylib

Scrapy helper functions and processors

shub-image

Scrapinghub release tool

adblockparser

Parser for Adblock Plus rules

scrapy-splitvariants

Scrapy spider middleware to split an item into multiple items on a multi-valued key

scrapy-hcf

Scrapy spider middleware to use Scrapinghub's Hub Crawl Frontier as a backend for URLs

scrapy-querycleaner

Scrapy spider middleware to clean up query parameters in request URLs

scrapy-magicfields

Scrapy middleware to add extra "magic" fields to items

page_clustering

Online k-means clustering of web pages

scrapy-mosquitera

Restrict crawl and scraping scope using matchers.

skinfer

Simple tool to merge JSON schemas

flatson

Tool to flatten a stream of JSON-like objects, configured via a schema

crawl-frontier

A flexible frontier for web crawlers

aduana

Bindings for Aduana library

wappalyzer-python

Python wrapper for Wappalyzer (utility that uncovers the technologies used on websites)

scrapy-streamitem

Scrapy support for working with streamcorpus Stream Items
