
Tools for building streamcorpus objects, such as those used in TREC.

Project description

streamcorpus_pipeline is a document processing pipeline that assembles streamcorpus objects from raw data sets.

The streamcorpus_pipeline Python module contains tools for processing streamcorpus.StreamItem objects stored in Chunks. It includes transform functions that produce clean_html and clean_visible, transforms that create labels from hyperlinks to particular sites (e.g. Wikipedia), and taggers such as LingPipe, Serif, and Factorie, which produce Tokens and Sentences.
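As a rough illustration of the idea of running a sequence of transforms over items, here is a minimal sketch in Python 3. It does not use the streamcorpus_pipeline or streamcorpus APIs: the `Item` class, `strip_tags`, `label_wikipedia_links`, and `run_pipeline` are hypothetical names invented for this example, and the regex-based tag stripping is a crude stand-in, not the package's clean_visible logic.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a stream item (NOT streamcorpus.StreamItem)."""
    raw_html: str
    clean_visible: str = ""
    labels: list = field(default_factory=list)


def strip_tags(item):
    """Hypothetical transform: replace tags with spaces, then collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", item.raw_html)
    item.clean_visible = " ".join(text.split())
    return item


def label_wikipedia_links(item):
    """Hypothetical transform: record hyperlinks that point at Wikipedia."""
    pattern = r'href="(https?://[a-z]+\.wikipedia\.org/[^"]+)"'
    item.labels = re.findall(pattern, item.raw_html)
    return item


def run_pipeline(items, transforms):
    """Apply each transform, in order, to every item and yield the result."""
    for item in items:
        for transform in transforms:
            item = transform(item)
        yield item


if __name__ == "__main__":
    doc = Item('<p>See <a href="http://en.wikipedia.org/wiki/TREC">TREC</a>.</p>')
    for out in run_pipeline([doc], [strip_tags, label_wikipedia_links]):
        print(out.clean_visible)
        print(out.labels)
```

Keeping each transform as a plain function that takes and returns an item makes the order of stages explicit and lets stages be added or dropped without touching the others.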

Read more at [streamcorpus.org](http://streamcorpus.org/)


Download files


Source Distribution

streamcorpus_pipeline-0.5.53.dev5.tar.gz (9.3 MB)

Built Distribution


streamcorpus_pipeline-0.5.53.dev5-py2.7.egg (9.8 MB)

File details

Details for the file streamcorpus_pipeline-0.5.53.dev5.tar.gz.


File hashes

Hashes for streamcorpus_pipeline-0.5.53.dev5.tar.gz
Algorithm Hash digest
SHA256 120277662e734f5dde968abf51149a6469046aab4b0a147e94e9953a2c8642dc
MD5 ec66620293140555738b693f55e1cb0d
BLAKE2b-256 30adaf733fcd19cb2ccee52f0d034c723304848c20f1973889d167307a00bb4d


File details

Details for the file streamcorpus_pipeline-0.5.53.dev5-py2.7.egg.


File hashes

Hashes for streamcorpus_pipeline-0.5.53.dev5-py2.7.egg
Algorithm Hash digest
SHA256 1441499a5b6dd8a014e8c6533f1a473d6f7fa24a03279e6cc2115a40a72ce26a
MD5 bbb273c616f5e3da5002be21485df804
BLAKE2b-256 8ab32bccc3433d8db53cc7a79dcbb1e89fe84976a4020d7df0ecb35627f73a5a
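The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of_file` and `verify_download` are illustrative, and the local file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size blocks so large archives need not fit in memory.
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# SHA256 digest listed above for the source distribution.
EXPECTED_SDIST_SHA256 = (
    "120277662e734f5dde968abf51149a6469046aab4b0a147e94e9953a2c8642dc"
)

# Usage (assumes the archive was saved to the current directory):
# ok = verify_download("streamcorpus_pipeline-0.5.53.dev5.tar.gz",
#                      EXPECTED_SDIST_SHA256)
```

A mismatch means the file differs from the one that was uploaded, for example because of a truncated or corrupted download, and it should not be installed.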

