11 projects
codebooks
Automatic generation of codebooks from dataframes.
jobstruct
Extract structured information from job postings.
sockit
Sockit is a natural-language processing toolkit for modeling structured occupation information and Standard Occupational Classification (SOC) codes in unstructured text from job titles, job postings, and resumes.
scons-remote
Extends SCons to build targets in remote environments.
wordtrie
WordTrie: a simple trie (prefix tree) for word and phrase matching.
sirad
Secure Infrastructure for Research with Administrative Data.
censuscoding
Censuscoding: determine the Census block group for a street address.
hivmmer
An alignment and variant-calling pipeline for Illumina deep sequencing of HIV-1, based on the probabilistic aligner HMMER.
unitable
A data analysis environment that unites the best features of pandas, R, Stata, and others.
agalma
An automated phylogenomics pipeline.
biolite
A lightweight bioinformatics framework with automated tracking of diagnostics and provenance.