Web scraping library and command-line tool for text discovery and retrieval. Downloads web pages, scrapes main text and comments while preserving some structure, and converts the result to TXT, CSV, JSON, and XML.
Clean, filter, normalize, and sample URLs.
Fast and robust extraction of original and updated publication dates from web pages.
A simple multilingual lemmatizer for Python.
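
To give a rough idea of what cleaning, filtering, and normalizing URLs can involve, here is a minimal sketch using only the Python standard library. It does not reproduce the API of any of the tools listed above: the function names `normalize_url` and `clean_urls` and the `TRACKING_PARAMS` set are hypothetical and chosen for illustration only, and sampling is left out.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters to strip (an assumption for this sketch).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}


def normalize_url(url):
    """Return a normalized http(s) URL, or None if the input is filtered out."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # Filter: keep only absolute web URLs.
    if scheme not in ("http", "https") or not parts.netloc:
        return None
    # Normalize: lowercase the host and give empty paths a single slash.
    netloc = parts.netloc.lower()
    path = parts.path or "/"
    # Clean: drop tracking parameters, keep the remaining ones in order.
    query = urlencode(
        [
            (key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in TRACKING_PARAMS
        ]
    )
    # Drop the fragment, which does not identify a separate document.
    return urlunsplit((scheme, netloc, path, query, ""))


def clean_urls(urls):
    """Normalize a list of URLs, removing rejected entries and duplicates."""
    seen = set()
    result = []
    for url in urls:
        normalized = normalize_url(url)
        if normalized is not None and normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result
```

Under these assumptions, `clean_urls` keeps the first occurrence of each normalized URL, so input order is preserved.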