Tools for building streamcorpus objects, such as those used in TREC.

Project description

streamcorpus_pipeline is a document processing pipeline that assembles streamcorpus objects from raw data sets.

The streamcorpus_pipeline Python module contains tools for processing streamcorpus.StreamItem objects stored in Chunks. It includes transform functions that produce clean_html and clean_visible text and that create labels from hyperlinks to particular sites (e.g. Wikipedia), as well as taggers such as LingPipe, Serif, and Factorie, which make Tokens and Sentences.
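
The idea of chaining transforms over the items in a chunk can be sketched in plain Python. This is an illustration only and does not use the streamcorpus_pipeline API: `Item`, `strip_tags`, `collapse_whitespace`, and `run_pipeline` are hypothetical names invented here, and the regex-based tag stripping is a naive stand-in, not the package's clean_html or clean_visible logic.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List


@dataclass
class Item:
    """Hypothetical stand-in for a stream item (not streamcorpus.StreamItem)."""
    raw: str
    text: str = ""
    notes: List[str] = field(default_factory=list)


Transform = Callable[[Item], Item]


def strip_tags(item: Item) -> Item:
    # Naive illustration: replace anything that looks like a tag with a space.
    item.text = re.sub(r"<[^>]*>", " ", item.raw)
    item.notes.append("strip_tags")
    return item


def collapse_whitespace(item: Item) -> Item:
    # Squeeze runs of whitespace into single spaces and trim the ends.
    item.text = " ".join(item.text.split())
    item.notes.append("collapse_whitespace")
    return item


def run_pipeline(items: Iterable[Item], transforms: List[Transform]) -> Iterator[Item]:
    # Apply every transform, in order, to each item and yield the result.
    for item in items:
        for transform in transforms:
            item = transform(item)
        yield item


chunk = [Item(raw="<p>Hello  <b>world</b></p>")]
result = list(run_pipeline(chunk, [strip_tags, collapse_whitespace]))
print(result[0].text)  # prints: Hello world
```

Writing each stage as a function from item to item keeps the stages independent, so they can be reordered or swapped without touching the loop that drives them.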

Read more at [streamcorpus.org](http://streamcorpus.org/)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

streamcorpus_pipeline-0.6.0.tar.gz (9.3 MB)

Built Distribution

streamcorpus_pipeline-0.6.0-py2.7.egg (9.7 MB)

File details

Details for the file streamcorpus_pipeline-0.6.0.tar.gz.

File hashes

Hashes for streamcorpus_pipeline-0.6.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 1eb717096073e7a37a4c32af3197395f0363b8a234bce690c02b348657e8e8e3 |
| MD5 | c87c210b01f99bae97ad5fb87dc4ac21 |
| BLAKE2b-256 | 69fd13fd1d2bfe0ebfbb3007f7bfe16820e5526aa7a86864064ea36f2b2b8cd2 |

See more details on using hashes here.
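
One way to check a downloaded file against the published digests is with Python's standard `hashlib` module. The sketch below assumes the source distribution has been saved locally under its original file name; `file_digests` and `verify` are illustrative helper names, not part of any package. BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest.

```python
import hashlib

# Published digests for streamcorpus_pipeline-0.6.0.tar.gz, copied from the listing above.
EXPECTED = {
    "sha256": "1eb717096073e7a37a4c32af3197395f0363b8a234bce690c02b348657e8e8e3",
    "md5": "c87c210b01f99bae97ad5fb87dc4ac21",
    "blake2b-256": "69fd13fd1d2bfe0ebfbb3007f7bfe16820e5526aa7a86864064ea36f2b2b8cd2",
}


def file_digests(path, chunk_size=1 << 20):
    """Return hex digests of the file at `path`, reading it in blocks."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        "blake2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        # Read fixed-size blocks so large archives are not loaded into memory at once.
        for block in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(block)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def verify(path, expected=EXPECTED):
    """Map each algorithm name to True if the file's digest matches the expected one."""
    actual = file_digests(path)
    return {name: actual[name] == digest.lower() for name, digest in expected.items()}
```

For example, `verify("streamcorpus_pipeline-0.6.0.tar.gz")` returns a dictionary with one boolean per algorithm; any `False` value means the file does not match the corresponding published digest.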

File details

Details for the file streamcorpus_pipeline-0.6.0-py2.7.egg.

File hashes

Hashes for streamcorpus_pipeline-0.6.0-py2.7.egg

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 91d62198f05ac9428fae7257c747998b222838c8b4f5314647f39a5c3bcae388 |
| MD5 | 11d6824c5dee6355caf8d2154b63a20d |
| BLAKE2b-256 | e15c44a1963cc70ca7d1c879f9d515fc56f5e7c2999ac762069c3fe4f0c23848 |
