Quickly build a news or web corpus on specific topics or terms, either automatically from Google News or from article links listed in a file. The module extracts the body and title of each article and saves the result to flat files or a SQLite database.
News Corpus Builder
A simple module for quickly building a corpus from news articles. The generated corpus can be stored in a SQLite database or as flat files.
See http://skillachie.github.io/news-corpus-builder/ for installation and usage instructions.
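To make the two storage options concrete, here is a minimal sketch of how extracted articles might be written to a SQLite database or to flat files. It uses only the Python standard library and is not this module's API: the function names `save_to_sqlite` and `save_to_flat_files`, the `articles` table layout, and the `category`/`title`/`body` dictionary keys are all assumptions made for illustration. The module's real interface is described in the documentation linked above.

```python
import sqlite3
from pathlib import Path


def save_to_sqlite(articles, db_path):
    """Store article dicts in a SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    try:
        # "with conn" commits on success and rolls back on error.
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS articles ("
                "id INTEGER PRIMARY KEY, category TEXT, title TEXT, body TEXT)"
            )
            conn.executemany(
                "INSERT INTO articles (category, title, body) VALUES (?, ?, ?)",
                [(a["category"], a["title"], a["body"]) for a in articles],
            )
    finally:
        conn.close()


def save_to_flat_files(articles, out_dir):
    """Write each article to <out_dir>/<category>/<n>.txt (hypothetical layout)."""
    paths = []
    for n, a in enumerate(articles):
        folder = Path(out_dir) / a["category"]
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{n}.txt"
        # Title on the first line, then a blank line, then the body.
        path.write_text(a["title"] + "\n\n" + a["body"], encoding="utf-8")
        paths.append(path)
    return paths
```

A database keeps the whole corpus in one queryable file, while one text file per article, grouped by category, is easier to feed directly into tools that expect a directory of documents.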
news-corpus-builder-0.1.5.zip (6.2 kB)