Tools for building streamcorpus objects, such as those used in TREC.
Project description
streamcorpus_pipeline is a document processing pipeline that assembles streamcorpus objects from raw data sets.
The streamcorpus_pipeline Python module contains tools for processing streamcorpus.StreamItem objects stored in Chunks. It includes transform functions for producing clean_html and clean_visible, for creating labels from hyperlinks to particular sites (e.g. Wikipedia), and for running taggers such as LingPipe, Serif, and Factorie, which produce Tokens and Sentences.
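The idea of chaining transforms over the items in a chunk can be illustrated with a minimal, self-contained sketch. It does not use the streamcorpus_pipeline API: the `Item` class, `to_clean_visible`, `toy_tagger`, and `run_pipeline` are hypothetical stand-ins invented for illustration, and the tag-stripping and sentence-splitting rules are deliberately naive assumptions rather than what the real transforms or taggers do.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List


@dataclass
class Item:
    """Hypothetical stand-in for a stream item; not the real StreamItem class."""
    raw_html: str
    clean_visible: str = ""
    sentences: List[List[str]] = field(default_factory=list)


Transform = Callable[[Item], Item]

_TAG = re.compile(r"<[^>]*>")


def to_clean_visible(item: Item) -> Item:
    # Assumption for this sketch: blank out each tag with spaces of equal
    # length, so the visible text has the same length as the raw HTML.
    item.clean_visible = _TAG.sub(lambda m: " " * len(m.group()), item.raw_html)
    return item


def toy_tagger(item: Item) -> Item:
    # Naive stand-in for a tagger: split sentences on "." and tokens on whitespace.
    item.sentences = [s.split() for s in item.clean_visible.split(".") if s.strip()]
    return item


def run_pipeline(chunk: Iterable[Item], transforms: List[Transform]) -> Iterator[Item]:
    # Apply every transform, in order, to each item in the chunk.
    for item in chunk:
        for transform in transforms:
            item = transform(item)
        yield item
```

In this sketch a chunk is just an iterable of items, so `list(run_pipeline([Item("<p>Hello world.</p>")], [to_clean_visible, toy_tagger]))` runs both stages in order on the single item.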
Read more at [streamcorpus.org](http://streamcorpus.org/)
Hashes for streamcorpus_pipeline-0.5.49.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 47510ad31301f4c25d531833f21a5ddfae50b9f436f2516885aadee6ea275f61
MD5 | 4f3e8e2b383ca88190e234fb74d6b0a1
BLAKE2b-256 | 21fbbeb28dfd99ef0017f0581dd72a35eb6165c96b8625c7cd58729b00f79d5d
Hashes for streamcorpus_pipeline-0.5.49-py2.7.egg
Algorithm | Hash digest
---|---
SHA256 | a9e2d8d3b89ecef73c65c478697243ccbf8afe4dea350720ff6ccef1c932290c
MD5 | 53a7f6b4cab0be34451c11b5879b8723
BLAKE2b-256 | e9a1520d0e5a818f8fb04515e2940bc44804a205eadb7b550cb932217a77b113
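
A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the local path passed in is assumed to be wherever the file was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of the file at path, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size blocks until read() returns an empty bytes object.
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare the file's digest with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, calling `matches_published_digest` with the path to a local copy of streamcorpus_pipeline-0.5.49.tar.gz and the SHA256 value from the first table should return `True` for an intact download.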