Tools for building streamcorpus objects, such as those used in TREC.
Project description
streamcorpus_pipeline is a document processing pipeline that assembles streamcorpus objects from raw data sets.
The streamcorpus_pipeline Python module contains tools for processing streamcorpus.StreamItem objects stored in Chunks. It includes transform functions for producing clean_html and clean_visible, for creating labels from hyperlinks to particular sites (e.g., Wikipedia), and for running taggers such as LingPipe, Serif, and Factorie, which produce Tokens and Sentences.
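To make the idea of a chain of transforms concrete, here is a minimal, self-contained Python sketch. It does not use the streamcorpus_pipeline API: `Item`, `make_clean_html`, `make_clean_visible`, and `run_pipeline` are hypothetical names invented for illustration. The sketch assumes that clean_visible is clean_html with every tag (and all script/style content) blanked out byte-for-byte, so that offsets into one string line up with the other.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Item:
    """Hypothetical stand-in for a StreamItem; NOT the real streamcorpus class."""
    raw: bytes
    clean_html: Optional[bytes] = None
    clean_visible: Optional[bytes] = None


# Match a whole <script>/<style> element (contents included) or any single tag.
_TAG_RE = re.compile(
    rb"<script\b.*?</script\s*>|<style\b.*?</style\s*>|<[^>]*>",
    re.IGNORECASE | re.DOTALL,
)


def _blank(match: "re.Match[bytes]") -> bytes:
    # Same number of bytes as the match, all spaces, so offsets do not shift.
    return b" " * len(match.group(0))


def make_clean_html(item: Item) -> Item:
    # Hypothetical placeholder: only normalizes to valid UTF-8; a real
    # cleaner would also repair broken markup.
    item.clean_html = item.raw.decode("utf-8", errors="replace").encode("utf-8")
    return item


def make_clean_visible(item: Item) -> Item:
    # Assumed definition: clean_html with markup blanked out byte-for-byte.
    if item.clean_html is None:
        raise ValueError("make_clean_html must run first")
    item.clean_visible = _TAG_RE.sub(_blank, item.clean_html)
    return item


Transform = Callable[[Item], Item]


def run_pipeline(items: Iterable[Item], transforms: List[Transform]) -> List[Item]:
    # Apply every transform, in order, to each item of a chunk.
    results = []
    for item in items:
        for transform in transforms:
            item = transform(item)
        results.append(item)
    return results


chunk = [Item(raw=b"<p>Hello <b>world</b></p>")]
(result,) = run_pipeline(chunk, [make_clean_html, make_clean_visible])
print(result.clean_visible)
```

Because the sketch keeps clean_visible the same length as clean_html, any byte offset a downstream tagger computes on the visible text can be applied directly to the HTML without a separate offset map.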
Read more at [streamcorpus.org](http://streamcorpus.org/)
Download files
Source Distribution: streamcorpus_pipeline-0.6.7.dev3.tar.gz
Built Distribution: streamcorpus_pipeline-0.6.7.dev3-py2.7.egg
Hashes for streamcorpus_pipeline-0.6.7.dev3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | dde701867176e6696786ce1642c37d9e9d150979bb3c5d7c5c082812c010993f
MD5 | d5c8f5ed3d5bf79069ed1a7eeae1c707
BLAKE2b-256 | cd69791230c01e2263a68923d8e513049d14cf642ba57aa6be0f615ccac6d050
Hashes for streamcorpus_pipeline-0.6.7.dev3-py2.7.egg
Algorithm | Hash digest
---|---
SHA256 | 318ce50fc0f28722d102abef729b894be370a46e9913279a8a1e391c3be8eb2e
MD5 | d9a67fb89034d2bd13e17cb7131860ec
BLAKE2b-256 | 66867164e99286a094c5be18593f49c80279fa7607539c47babe3820ef589a22
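To check a downloaded archive against the digests listed above, one option is a short script built on Python's standard `hashlib` module. This is a sketch, not an official tool: `EXPECTED`, `file_digests`, and `verify` are hypothetical names, and `EXPECTED` holds the source-distribution digests (swap in the egg's digests to check that file instead).

```python
import hashlib

# Digests copied from the hash table for the source distribution (.tar.gz).
EXPECTED = {
    "sha256": "dde701867176e6696786ce1642c37d9e9d150979bb3c5d7c5c082812c010993f",
    "md5": "d5c8f5ed3d5bf79069ed1a7eeae1c707",
    "blake2b-256": "cd69791230c01e2263a68923d8e513049d14cf642ba57aa6be0f615ccac6d050",
}


def _new_hasher(name: str):
    if name == "blake2b-256":
        # BLAKE2b parameterized for a 32-byte (256-bit) digest; this is not
        # the same as truncating a 64-byte BLAKE2b-512 digest.
        return hashlib.blake2b(digest_size=32)
    return hashlib.new(name)


def file_digests(path, chunk_size=1 << 16):
    # Read the file once in fixed-size blocks, feeding every hasher.
    hashers = {name: _new_hasher(name) for name in EXPECTED}
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(block)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify(path) -> bool:
    # True only if every listed digest matches.
    return file_digests(path) == EXPECTED
```

For example, `verify("streamcorpus_pipeline-0.6.7.dev3.tar.gz")` should return `True` for an intact download saved under that name. Comparing SHA256 alone is sufficient in practice; MD5 is included only because the table lists it.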