21 projects
many-stop-words
stop word lists in many languages
streamcorpus_pipeline
Tools for building streamcorpus objects, such as those used in TREC.
streamcorpus
Tools for organizing a collection of text for entity-centric stream processing.
dossier.label
Label (ground truth) storage for DossierStack
nilsimsa
Locality-sensitive hashing
dossier.models
Active learning models
streamcorpus_opensextant
Transforms for converting opensextant output into Token objects in streamcorpus
dossier.store
Feature collection storage for DossierStack
dossier.fc
Feature collections for DossierStack
trec_dd
TREC Dynamic Domain (DD) evaluation test harness for simulating user interaction with a search engine
dossier.web
DossierStack web services
coordinate
redis-based python client library and command line tools for managing tasks executed by a group of configurable workers
kvlayer
table-oriented abstraction layer over key-value stores
kvlayer_mysql
table-oriented abstraction layer over key-value stores
rejester
redis-based python client library and command line tools for managing tasks executed by a group of configurable workers
yakonfig
load a configuration dictionary for a large application
dblogger
DB-backed Python logging.Handler subclass that uses kvlayer and provides command-line tools.
sortedcollection
SortedCollection class that abstracts bisect, extended from the recipe at http://code.activestate.com/recipes/577197-sortedcollection/
streamcorpus_elasticsearch
Tool for loading streamcorpus.StreamItems into Elasticsearch
streamcorpus_factorie
Tools for building streamcorpus objects for particular collections of text used in TREC KBA.
pyconnectedcomponent
simple connected component tool from http://breakingcode.wordpress.com/2013/04/08/finding-connected-components-in-a-graph/