Tools for building streamcorpus objects, such as those used in TREC.

Project description

streamcorpus_pipeline is a document processing pipeline that assembles streamcorpus objects from raw data sets.

The streamcorpus_pipeline Python module contains tools for processing streamcorpus.StreamItem objects stored in Chunks. It includes transforms that generate clean_html and clean_visible, transforms that create labels from hyperlinks to particular sites (e.g. Wikipedia), and taggers such as LingPipe, Serif, and Factorie, which produce Tokens and Sentences.
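
The idea of running a sequence of transforms over the items in a chunk can be sketched as follows. This is a minimal illustration in modern Python, not the package's code: it does not import streamcorpus or streamcorpus_pipeline, and `Item`, `Body`, `run_pipeline`, and the three transform functions are hypothetical stand-ins named after the concepts above. The real transforms may behave quite differently.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class Body:
    """Hypothetical stand-in for the content body of a stream item."""
    raw: str
    clean_html: Optional[str] = None
    clean_visible: Optional[str] = None


@dataclass
class Item:
    """Hypothetical stand-in for a streamcorpus.StreamItem."""
    url: str
    body: Body
    labels: List[str] = field(default_factory=list)


class _TextAndLinks(HTMLParser):
    """Collects text nodes and the href of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.text: List[str] = []
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)

    def handle_data(self, data):
        self.text.append(data)


def _parse(html: str) -> _TextAndLinks:
    parser = _TextAndLinks()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser


def clean_html(item: Item) -> Item:
    # Assumption: "cleaning" here is only whitespace normalisation.
    item.body.clean_html = " ".join(item.body.raw.split())
    return item


def clean_visible(item: Item) -> Item:
    # Keep only the text a reader would see, with tags dropped.
    text = "".join(_parse(item.body.clean_html).text)
    item.body.clean_visible = " ".join(text.split())
    return item


def hyperlink_labels(item: Item, domain: str = "wikipedia.org") -> Item:
    # Record links that point at the chosen site as labels.
    item.labels = [h for h in _parse(item.body.clean_html).hrefs if domain in h]
    return item


def run_pipeline(chunk: Iterable[Item],
                 transforms: List[Callable[[Item], Item]]) -> Iterator[Item]:
    """Apply each transform, in order, to every item in the chunk."""
    for item in chunk:
        for transform in transforms:
            item = transform(item)
        yield item
```

Order matters in this sketch: `clean_visible` and `hyperlink_labels` both read `body.clean_html`, so `clean_html` has to run first.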

Read more at [streamcorpus.org](http://streamcorpus.org/)

Download files

Download the file for your platform.

Source Distribution

streamcorpus_pipeline-0.7.19.tar.gz (9.8 MB)

Uploaded Source

Built Distribution

streamcorpus_pipeline-0.7.19-py2.7.egg (10.3 MB)

Uploaded Egg

File details

Details for the file streamcorpus_pipeline-0.7.19.tar.gz.

File hashes

Hashes for streamcorpus_pipeline-0.7.19.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | bec6f6e69774dbb34b46af40d7166950b30035d9b093780f716f95993c7e2d32 |
| MD5 | e925a6ce07caa5061927f2eb8aed23dd |
| BLAKE2b-256 | 2df0a6fa2afac328147359ff9ab1a0cf6bff0bf15ee7c7e92697051c35f7a54a |
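
A downloaded copy of the source distribution can be checked against the SHA256 digest listed above. The following is a minimal sketch using only Python's standard `hashlib`; the digest constant is copied from the listing, while the helper names `sha256_of` and `verify` and the local file path in the usage comment are assumptions made for illustration.

```python
import hashlib

# SHA256 digest copied from the listing for streamcorpus_pipeline-0.7.19.tar.gz
EXPECTED_SHA256 = "bec6f6e69774dbb34b46af40d7166950b30035d9b093780f716f95993c7e2d32"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (the path is an assumption about where the file was saved):
#   verify("streamcorpus_pipeline-0.7.19.tar.gz", EXPECTED_SHA256)
```

Reading in blocks keeps memory use flat for the roughly 10 MB archives listed here. The same check applies to the egg file using its own digest.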

File details

Details for the file streamcorpus_pipeline-0.7.19-py2.7.egg.

File hashes

Hashes for streamcorpus_pipeline-0.7.19-py2.7.egg

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | f2589aac2f109b82d8d41c0a66c9daa290b36f3c63f33870b0f1910e08cd07b0 |
| MD5 | 2afb0cbf40bd532c122882a2d5d9bcf1 |
| BLAKE2b-256 | cfc0f783f20aac0beccfec2168539b01c0232d4c4a0a07fb03c5fe5cee1ec9d3 |
