Tools for building streamcorpus objects, such as those used in TREC.

Project description

streamcorpus_pipeline is a document processing pipeline that assembles streamcorpus objects from raw data sets.

The streamcorpus_pipeline Python module contains tools for processing streamcorpus.StreamItem objects stored in Chunks. It includes transforms that generate clean_html and clean_visible, transforms that create labels from hyperlinks to particular sites (e.g. Wikipedia), and taggers such as LingPipe, Serif, and Factorie, which produce Tokens and Sentences.
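
To make the idea of chaining transforms over items concrete, here is a minimal sketch in plain Python. It does not use the streamcorpus_pipeline API: the `Item` class, the `visible_text` transform, and `run_pipeline` are all hypothetical stand-ins, and the tag-blanking approach is only one plausible way to derive visible text from HTML.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class Item:
    """Hypothetical stand-in for a stream item (not streamcorpus.StreamItem)."""
    raw_html: str
    clean_visible: Optional[str] = None


# A transform takes an item and returns the (possibly modified) item.
Transform = Callable[[Item], Item]

_TAG = re.compile(r"<[^>]*>")


def visible_text(item: Item) -> Item:
    """Replace each tag with spaces of the same length, keeping offsets aligned."""
    item.clean_visible = _TAG.sub(lambda m: " " * len(m.group(0)), item.raw_html)
    return item


def run_pipeline(items: Iterable[Item], transforms: Iterable[Transform]) -> Iterator[Item]:
    """Apply every transform, in order, to each item and yield the result."""
    transforms = list(transforms)
    for item in items:
        for transform in transforms:
            item = transform(item)
        yield item
```

Under these assumptions, `list(run_pipeline([Item("<b>hi</b>")], [visible_text]))` yields one item whose `clean_visible` has the same length as `raw_html`, with only the text `hi` left visible.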

Read more at [streamcorpus.org](http://streamcorpus.org/)

Download files

Source Distribution

streamcorpus_pipeline-0.7.14.tar.gz (9.8 MB)

Built Distribution

streamcorpus_pipeline-0.7.14-py2.7.egg (10.3 MB)

File details

Details for the file streamcorpus_pipeline-0.7.14.tar.gz.

File hashes

Hashes for streamcorpus_pipeline-0.7.14.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 0b73a2930ad566fb1218a9677e42d3ec662688188ba53e269cb094661ea0c85f |
| MD5 | b7e4be1da79d0dea2f7cff9a11ee57a0 |
| BLAKE2b-256 | a84339e8682a4adb63ff9b009ebb9687ede6f2f0f076904c9e37254a9cf8c873 |
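
A downloaded file can be checked against the SHA256 digest listed above. The sketch below uses only the Python standard library; the helper names `sha256_of_file` and `matches_expected` are illustrative, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib

# SHA256 digest listed above for streamcorpus_pipeline-0.7.14.tar.gz
EXPECTED_SHA256 = "0b73a2930ad566fb1218a9677e42d3ec662688188ba53e269cb094661ea0c85f"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA256 digest equals `expected_hex` (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("streamcorpus_pipeline-0.7.14.tar.gz", EXPECTED_SHA256)` should return `True` for an intact download saved under that name in the current directory. The same helper works for the egg file when given its own SHA256 digest.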

File details

Details for the file streamcorpus_pipeline-0.7.14-py2.7.egg.

File hashes

Hashes for streamcorpus_pipeline-0.7.14-py2.7.egg

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 2fdf5131d8f1f876f59ce2863e87736ac23672c8c7ff94192ca111534071d7d2 |
| MD5 | d05206db94ca1cfecb8928b115c69ad5 |
| BLAKE2b-256 | d472c99b5c34cc47ac16168eb59ae5fa9e41f92777427ae822d6f1d4ddeb2573 |
