Tools for building streamcorpus objects, such as those used in TREC.

Project description

streamcorpus_pipeline is a document processing pipeline that assembles streamcorpus objects from raw data sets.

The streamcorpus_pipeline Python module contains tools for processing streamcorpus.StreamItem objects stored in Chunks. It includes transform functions for generating clean_html and clean_visible and for creating labels from hyperlinks to particular sites (e.g., Wikipedia). It also includes taggers such as LingPipe, Serif, and Factorie, which produce Tokens and Sentences.
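As we understand the streamcorpus data model, clean_visible is derived from clean_html by replacing markup with whitespace. The two texts therefore have identical byte lengths, and a byte offset into one points at the same text in the other. The sketch below illustrates that idea using only the Python standard library. It is not the streamcorpus_pipeline implementation: `make_clean_visible` is a hypothetical name, and the regular expression is a simplification that assumes well-formed tags (no `>` inside attribute values or comments).

```python
import re

# Hypothetical sketch, not the streamcorpus_pipeline API.
# First alternative: a whole <script>...</script> or <style>...</style> block,
# whose contents are not visible text. Second alternative: any other single tag.
_INVISIBLE = re.compile(
    rb"<(script|style)\b.*?</\1\s*>|<[^>]*>",
    re.IGNORECASE | re.DOTALL,
)


def _blank(match):
    # Replace every byte of the matched markup with a space, but keep newlines
    # so that line structure survives. The length stays exactly the same.
    return re.sub(rb"[^\n]", b" ", match.group(0))


def make_clean_visible(clean_html: bytes) -> bytes:
    """Blank out markup in UTF-8 HTML bytes while preserving every byte offset."""
    return _INVISIBLE.sub(_blank, clean_html)


if __name__ == "__main__":
    html = (b'<p>Hello <a href="https://en.wikipedia.org/wiki/Python">Python</a></p>\n'
            b"<script>x=1</script>")
    visible = make_clean_visible(html)
    print(visible.split())
```

Every markup byte becomes a space, and newlines are kept. An offset that a downstream step computes against the visible text can therefore be used directly to slice the original clean_html.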

Read more at [streamcorpus.org](http://streamcorpus.org/)

Download files

Source Distribution

streamcorpus_pipeline-0.7.16.tar.gz (9.8 MB)

Built Distribution

streamcorpus_pipeline-0.7.16-py2.7.egg (10.3 MB)

File details

Details for the file streamcorpus_pipeline-0.7.16.tar.gz.

File hashes

Hashes for streamcorpus_pipeline-0.7.16.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 1ab5b268a785caf6140c0eb895cf828841a86b5fbc1777cd713183c8f4082dee |
| MD5 | c1dfb9aaacd5a09f3ca40cd38ea871f4 |
| BLAKE2b-256 | f7848fd19afcc3dbdb82e6a9d6dcab8b6438cc8a3bafe16f9552f2bef5b81cf8 |

File details

Details for the file streamcorpus_pipeline-0.7.16-py2.7.egg.

File hashes

Hashes for streamcorpus_pipeline-0.7.16-py2.7.egg
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | d5dc2b109e39ad276f28377cd9c3b4af6a59d75a16570b418bd8dabf8f7f1f05 |
| MD5 | c568a81241dbae5806fb048058f8e982 |
| BLAKE2b-256 | 6255f8c4d50cf21cf93a159ac0959d75c4aa70a11a290622d34c3be1bc116b81 |
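The SHA256 digests listed above can be used to check a download before installing it. Below is a minimal sketch that uses only Python's standard `hashlib` module. The helper names `sha256_of` and `verify` are hypothetical, and the expected values are copied from the SHA256 rows on this page. pip can also enforce a digest at install time through `--hash=sha256:...` entries in a requirements file, combined with `--require-hashes`.

```python
import hashlib
import os
import sys

# SHA256 digests copied from the hash listings on this page.
EXPECTED_SHA256 = {
    "streamcorpus_pipeline-0.7.16.tar.gz":
        "1ab5b268a785caf6140c0eb895cf828841a86b5fbc1777cd713183c8f4082dee",
    "streamcorpus_pipeline-0.7.16-py2.7.egg":
        "d5dc2b109e39ad276f28377cd9c3b4af6a59d75a16570b418bd8dabf8f7f1f05",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hypothetical helper: hex SHA256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Streaming avoids loading a multi-megabyte archive into memory at once.
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path):
    """Hypothetical helper: True if the file matches its published SHA256."""
    name = os.path.basename(path)
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no published SHA256 for {name!r}")
    return sha256_of(path) == EXPECTED_SHA256[name]


if __name__ == "__main__":
    # Usage: python verify_download.py streamcorpus_pipeline-0.7.16.tar.gz
    for path in sys.argv[1:]:
        print(path, "OK" if verify(path) else "MISMATCH")
```

The check matches on the file's base name. A renamed download is rejected with a `KeyError` rather than being compared against the wrong digest.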
