29 projects
formasaurus
Formasaurus tells you the types of HTML forms and their fields using machine learning
python-crfsuite
Python binding for CRFsuite
sklearn-crfsuite
CRFsuite (python-crfsuite) wrapper that provides an interface similar to scikit-learn
html-text
Extract text from HTML
scrapy-splash
JavaScript support for Scrapy using Splash
eli5
Debug machine learning classifiers and explain their predictions
autopager
Detect and classify pagination links on web pages
scrapy-rotating-proxies
Rotating proxies for Scrapy
json-log-plots
json-lines
Read JSON lines (jl) files and recover broken files
scurl
tensorboard_logger
Log TensorBoard events without TensorFlow
MaybeDont
A component that tries to avoid downloading duplicate content
webstruct
A library for creating statistical NER systems that work on HTML data
vmprofit
vmprof helpers
scrapy-cdr
rl_wsd_labeled
Labeled contexts of Russian polysemous words
scrapy-kafka-export
Export Scrapy items to Kafka
PyPyDispatcher
Multi-producer-multi-consumer signal dispatching mechanism
proxy-middleware
Scrapy HTTP proxy middleware that gets proxy parameters from settings
autologin
A utility for finding login links and forms and automatically logging into websites with a set of valid credentials.
soft404
A classifier for detecting soft 404 pages
url-summary
Display a summary of URLs in a notebook
autologin-middleware
A Scrapy middleware to use with autologin
scrapy-crawl-once
Scrapy middleware that allows crawling only new content
extract-html-diff
Extract the difference between two HTML pages
adblockparser
Parser for Adblock Plus rules
rlwsd
Word sense disambiguation library
arachnado
Scrapy-based web crawler with a UI