  pablohoffman

18 projects

queuelib

Last released on Mar 12, 2018

Collection of persistent (disk-based) queues

splash

Last released on Feb 15, 2018

A JavaScript rendering service with an HTTP API

dateparser

Last released on Feb 8, 2018

Date parsing library designed to parse dates from HTML pages

parsel

Last released on Feb 8, 2018

Parsel is a library to extract data from HTML and XML using XPath and CSS selectors

w3lib

Last released on Jan 25, 2018

Library of web-related functions

scrapy-crawlera

Last released on Jan 11, 2018

Crawlera middleware for Scrapy

Scrapy

Last released on Dec 29, 2017

A high-level Web Crawling and Web Scraping framework

webstruct

Last released on Dec 29, 2017

A library for creating statistical NER systems that work on HTML data

slybot

Last released on Jun 28, 2017

Slybot crawler

shub

Last released on Jun 26, 2017

Scrapinghub Command Line Client

scrapely

Last released on May 26, 2017

A pure-Python HTML screen-scraping library

scrapyd

Last released on Apr 12, 2017

A service for running Scrapy spiders, with an HTTP API

frontera

Last released on Feb 9, 2017

A scalable frontier for web crawlers

hubstorage

Last released on Dec 5, 2016

Client interface for Scrapinghub HubStorage

scrapylib

Last released on Nov 14, 2016

Scrapy helper functions and processors

adblockparser

Last released on Oct 17, 2016

Parser for Adblock Plus rules

scrapy-dotpersistence

Last released on Aug 4, 2016

Scrapy extension to sync `.scrapy` folder to an S3 bucket

scrapyjs

Last released on Mar 25, 2016

JavaScript support for Scrapy using Splash
