Tools for building streamcorpus objects, such as those used in TREC.

Project description

streamcorpus_pipeline is a document processing pipeline that assembles streamcorpus objects from raw data sets.

The streamcorpus_pipeline Python module contains tools for processing streamcorpus.StreamItem objects stored in Chunks. It includes transform functions that produce clean_html and clean_visible, transforms that create labels from hyperlinks to particular sites (e.g. Wikipedia), and taggers such as LingPipe, Serif, and Factorie, which produce Tokens and Sentences.
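
The idea of running a sequence of transforms over the items in a chunk can be illustrated with a toy sketch. Everything below is hypothetical and is not the streamcorpus_pipeline API: `ToyItem`, `toy_clean_html`, `toy_clean_visible`, and `run_pipeline` are stand-in names invented for this example, and the regex-based cleaning is far cruder than what a real pipeline would do.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class ToyItem:
    """Hypothetical stand-in for a stream item; NOT the real StreamItem schema."""
    raw: str
    clean_html: Optional[str] = None
    clean_visible: Optional[str] = None


# A transform takes an item and returns it with more fields filled in.
Transform = Callable[[ToyItem], ToyItem]


def toy_clean_html(item: ToyItem) -> ToyItem:
    # Toy "cleaning": collapse whitespace runs in the raw markup.
    item.clean_html = re.sub(r"\s+", " ", item.raw).strip()
    return item


def toy_clean_visible(item: ToyItem) -> ToyItem:
    # Toy "visible text": drop anything that looks like a tag.
    item.clean_visible = re.sub(r"<[^>]+>", "", item.clean_html or "")
    return item


def run_pipeline(items: Iterable[ToyItem],
                 transforms: List[Transform]) -> Iterator[ToyItem]:
    # Apply each transform in order to every item, yielding the result.
    for item in items:
        for transform in transforms:
            item = transform(item)
        yield item


# A "chunk" here is just a list of items (an assumption for illustration).
chunk = [ToyItem(raw="<p>Hello,\n  <b>world</b>!</p>")]
results = list(run_pipeline(chunk, [toy_clean_html, toy_clean_visible]))
print(results[0].clean_visible)
```

The ordering matters in this sketch: `toy_clean_visible` reads the field that `toy_clean_html` fills in, which mirrors the description above of later stages building on earlier ones.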

Read more at [streamcorpus.org](http://streamcorpus.org/)

Download files

Download the file for your platform.

Source Distribution

streamcorpus_pipeline-0.7.17.tar.gz (9.8 MB)

Built Distribution

streamcorpus_pipeline-0.7.17-py2.7.egg (10.3 MB)

File details

Details for the file streamcorpus_pipeline-0.7.17.tar.gz.

File hashes

Hashes for streamcorpus_pipeline-0.7.17.tar.gz
Algorithm Hash digest
SHA256 060f1495a86569d2ae888ec925657836831b2660b74bec03549040dea5eae086
MD5 d606705e4de912277432b746885b59e2
BLAKE2b-256 58b09a41016044e4ede62a291a6dbc38d12b1393939faf4e044199b351e2cffc
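
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_expected` are made up for this example, and the local path in the usage note is an assumption.

```python
import hashlib

# SHA256 digest listed above for streamcorpus_pipeline-0.7.17.tar.gz
EXPECTED_SHA256 = "060f1495a86569d2ae888ec925657836831b2660b74bec03549040dea5eae086"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

Assuming the archive was saved to the current directory, `matches_expected("streamcorpus_pipeline-0.7.17.tar.gz")` should return `True` for an intact download.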

File details

Details for the file streamcorpus_pipeline-0.7.17-py2.7.egg.

File hashes

Hashes for streamcorpus_pipeline-0.7.17-py2.7.egg
Algorithm Hash digest
SHA256 8429c149653d3c63a2b00cdde93d4f92e9eeb60026bad5344ddf1510bfc9a979
MD5 d043b4ec4a19bf200830b7d5684a2db2
BLAKE2b-256 9d4e39a392305650eccac55cbf47859ba5b1995a01d1b6befc786cb14efe12f4
