Tools for building streamcorpus objects, such as those used in TREC.
streamcorpus_pipeline is a document processing pipeline that assembles streamcorpus objects from raw data sets.
The streamcorpus_pipeline Python module contains tools for processing streamcorpus.StreamItem objects stored in Chunks. It includes transforms that produce clean_html and clean_visible text, transforms that create labels from hyperlinks to particular sites (e.g. Wikipedia), and taggers such as LingPipe, Serif, and Factorie, which generate Tokens and Sentences.
Read more at [streamcorpus.org](http://streamcorpus.org/)
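The staged design described above (a sequence of transforms applied to each item in a chunk) can be illustrated with a minimal, self-contained sketch. This is not the streamcorpus_pipeline API: `SketchItem`, `make_clean_html`, `make_clean_visible`, `label_wikipedia_links`, and `run_pipeline` are hypothetical names invented for illustration, and the regex-based cleaning is a deliberate simplification of what a real HTML cleaner would do.

```python
import html
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class SketchItem:
    """Hypothetical stand-in for a stream item (NOT streamcorpus.StreamItem)."""
    raw: str
    clean_html: Optional[str] = None
    clean_visible: Optional[str] = None
    labels: List[str] = field(default_factory=list)


# A transform takes an item, fills in one more field, and returns the item.
Transform = Callable[[SketchItem], SketchItem]


def make_clean_html(item: SketchItem) -> SketchItem:
    # Simplified cleaning: drop <script> and <style> elements only.
    item.clean_html = re.sub(
        r"(?is)<(script|style)\b.*?</\1\s*>", "", item.raw
    )
    return item


def make_clean_visible(item: SketchItem) -> SketchItem:
    # Replace every tag with a space, decode entities, collapse whitespace.
    text = re.sub(r"(?s)<[^>]*>", " ", item.clean_html or "")
    item.clean_visible = " ".join(html.unescape(text).split())
    return item


def label_wikipedia_links(item: SketchItem) -> SketchItem:
    # Collect hyperlink targets that point at Wikipedia article pages.
    item.labels = re.findall(
        r'href="(https?://[a-z]+\.wikipedia\.org/wiki/[^"#]+)',
        item.clean_html or "",
    )
    return item


def run_pipeline(
    items: Iterable[SketchItem], transforms: List[Transform]
) -> Iterator[SketchItem]:
    # Apply each transform, in order, to every item in the input "chunk".
    for item in items:
        for transform in transforms:
            item = transform(item)
        yield item


if __name__ == "__main__":
    chunk = [SketchItem(raw='<p>About <a href="https://en.wikipedia.org/wiki/TREC">TREC</a></p>')]
    stages = [make_clean_html, make_clean_visible, label_wikipedia_links]
    for result in run_pipeline(chunk, stages):
        print(result.clean_visible, result.labels)
```

Order matters in this sketch: the later stages read `clean_html`, so `make_clean_html` has to run first. A tagger stage would follow the same shape, reading `clean_visible` and attaching sentence and token structures.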
Hashes for streamcorpus_pipeline-0.6.4.dev10.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 16787bc481eccef9af6ec4e24056ed29ad793d40d3f86593b3dbde445b187262
MD5 | 9180b2640658cd99506aea3a618be14c
BLAKE2b-256 | 0ab7dbfdc5a80c7982c269b8750418bd079670ae67466b68140f1585674b9cd6
Hashes for streamcorpus_pipeline-0.6.4.dev10-py2.7.egg
Algorithm | Hash digest
---|---
SHA256 | 7d44ed121194c8733e0691f53cba172b5adf835feebfc93a5f609a91019ae471
MD5 | 2811a52bbf697a0acda91df07979e1a3
BLAKE2b-256 | 58a3bc07d027fcdac52aa8be6b4e3d0526840c883423049e828f134b32cd6451
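
A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. The following is a minimal sketch: the helper names `sha256_of_file` and `matches_published_digest` are made up for this example, and the local file path passed in is assumed to be wherever the download was saved.

```python
import hashlib

# SHA256 digest published above for streamcorpus_pipeline-0.6.4.dev10.tar.gz.
PUBLISHED_SHA256 = "16787bc481eccef9af6ec4e24056ed29ad793d40d3f86593b3dbde445b187262"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("streamcorpus_pipeline-0.6.4.dev10.tar.gz", PUBLISHED_SHA256)` should return `True` for an intact copy of the source distribution saved under that name in the current directory. SHA256 is the better choice of the listed algorithms for this check, since MD5 is no longer considered collision resistant.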