Tools for building streamcorpus objects, such as those used in TREC.
Project description
streamcorpus_pipeline is a document processing pipeline that assembles streamcorpus objects from raw data sets.
The streamcorpus_pipeline Python module contains tools for processing streamcorpus.StreamItem objects stored in Chunks. It includes transform functions that generate clean_html and clean_visible and create labels from hyperlinks to particular sites (e.g. Wikipedia), as well as taggers such as LingPipe, Serif, and Factorie, which produce Tokens and Sentences.
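To make the idea of a transform concrete, here is a minimal sketch in plain Python. It does not use the streamcorpus_pipeline API: `ToyItem`, `strip_tags`, and `run_transforms` are hypothetical names invented for illustration, and the regex-based tag stripper is a toy, not the library's clean_visible implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class ToyItem:
    """Hypothetical stand-in for a stream item (not streamcorpus.StreamItem)."""
    raw_html: str
    visible_text: Optional[str] = None


_TAG = re.compile(r"<[^>]*>")


def strip_tags(item: ToyItem) -> ToyItem:
    """Toy transform: remove anything that looks like an HTML tag."""
    item.visible_text = _TAG.sub("", item.raw_html).strip()
    return item


def run_transforms(
    items: Iterable[ToyItem],
    transforms: Iterable[Callable[[ToyItem], ToyItem]],
) -> Iterator[ToyItem]:
    """Apply each transform, in order, to every item in a chunk-like iterable."""
    transforms = list(transforms)
    for item in items:
        for transform in transforms:
            item = transform(item)
        yield item


# A plain list stands in for a Chunk here (an assumption for illustration).
chunk = [ToyItem("<p>Hello <b>world</b></p>")]
results = list(run_transforms(chunk, [strip_tags]))
print(results[0].visible_text)
```

The point of the sketch is the shape of the pipeline: each transform takes one item and returns it with more fields filled in, so transforms can be chained in a configurable order.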
Read more at [streamcorpus.org](http://streamcorpus.org/)
Hashes for streamcorpus_pipeline-0.5.32.dev12.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 53b72e5ddab528dbaee37edd995c426acf021d452b7ac45d1f86444405552226
MD5 | 11c4fe164139ea7be510e97f564d3960
BLAKE2b-256 | 1b13c87abb6d266dab48b29264677483b602e2afd279e2c5aec5e5a87929790a
Hashes for streamcorpus_pipeline-0.5.32.dev12-py2.7.egg
Algorithm | Hash digest
---|---
SHA256 | bcfd5fe90c99ca9f8eceabfca72b48407b6752457f91e4713a00fd84bc740379
MD5 | 55c84bb4120d81c18314837602ff9aa8
BLAKE2b-256 | b0fc25b9cbf3f473a374b9f7c55d654273c2bfc631618962edb287a66a3b6895
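The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path is whatever location you saved the download to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, calling `matches_published_hash("streamcorpus_pipeline-0.5.32.dev12.tar.gz", "53b72e5ddab528dbaee37edd995c426acf021d452b7ac45d1f86444405552226")` should return `True` for an intact copy of the source distribution saved under that name in the current directory.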