Username    lopuhin

29 projects

scrapy-splash

JavaScript support for Scrapy using Splash

eli5

Debug machine learning classifiers and explain their predictions

autopager

Detect and classify pagination links on web pages

html-text

Extract text from HTML

python-crfsuite

Python binding for CRFsuite

scrapy-rotating-proxies

Rotating proxies for Scrapy

json-log-plots

json-lines

Read JSON lines (jl) files and recover broken files

scurl

formasaurus

Formasaurus tells you the types of HTML forms and their fields using machine learning

tensorboard_logger

Log TensorBoard events without Tensorflow

MaybeDont

A component that tries to avoid downloading duplicate content

webstruct

A library for creating statistical NER systems that work on HTML data

vmprofit

vmprof helpers

scrapy-cdr

rl_wsd_labeled

Labeled contexts of Russian polysemous words

scrapy-kafka-export

Export Scrapy items to Kafka

PyPyDispatcher

Multi-producer-multi-consumer signal dispatching mechanism

sklearn-crfsuite

CRFsuite (python-crfsuite) wrapper which provides an interface similar to scikit-learn

proxy-middleware

Scrapy HTTP proxy middleware that gets proxy parameters from settings

autologin

A utility for finding login links and forms and automatically logging into websites with a set of valid credentials.

soft404

A classifier for detecting soft 404 pages

url-summary

Display a summary of URLs in a notebook

autologin-middleware

A Scrapy middleware to use with autologin

scrapy-crawl-once

Scrapy middleware that allows crawling only new content

extract-html-diff

Extract the difference between two HTML pages

adblockparser

Parser for Adblock Plus rules

rlwsd

Word sense disambiguation library

arachnado

Scrapy-based web crawler with a UI
