A utility for finding login links and forms, and for automatically logging into websites with a set of valid credentials.
A Scrapy middleware to use with autologin
Extract text from HTML
Reading JSON lines (jl) files
A component that tries to avoid downloading duplicate content
Word sense disambiguation library
A classifier for detecting soft 404 pages
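The JSON lines (jl) entry refers to files that hold one JSON value per line. Below is a minimal standard-library sketch of reading such a file. It assumes UTF-8 text and treats a `.gz` suffix as gzip compression. `read_jl` is a hypothetical helper for illustration and is not the API of the listed library.

```python
import gzip
import json
from typing import Any, Iterator


def read_jl(path: str) -> Iterator[Any]:
    """Yield one decoded JSON value per non-blank line of a .jl file.

    Hypothetical helper: paths ending in ".gz" are opened with gzip
    (an assumption for illustration, not the listed library's behaviour).
    """
    opener = gzip.open if path.endswith(".gz") else open
    # "rt" opens in text mode so each line arrives as a decoded str.
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing empty line
                yield json.loads(line)
```

Because it is a generator, records are decoded lazily, so large crawl dumps can be processed without loading the whole file into memory.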