29 projects
vectors2vrt
Generate a VRT file from GIS vector sources
DoubleMetaphone
Python wrapper for C++ Double Metaphone
python-crfsuite
Python binding for CRFsuite
dedupe
A Python library for accurate and scalable data deduplication and entity resolution
dedupe-variable-address
Address variable type for dedupe
dedupe-variable-datetime
DateTime variable type for dedupe
dedupe-variable-name
Name variable type for dedupe
parseratorvariable
Structured variable type for dedupe
pyhacrf-datamade
Hidden alignment conditional random field, a discriminative string edit distance
PyLBFGS
LBFGS and OWL-QN optimization algorithms
kubra
Command-line tool for downloading utility outage data
chicagorequests
Command-line tool for downloading Chicago Open311 data
dedupe-Levenshtein-search
Search through documents for approximately matching strings. A fork of Matt Anderson's library for MIT licensing
affinegap
A Cython implementation of the affine gap string distance
rlr
Case weighted L2 regularized logistic regression
dedupe-hcluster
Hierarchical Clustering Algorithms (Information Theory)
django-proxy-overrides
Overridable foreign key fields for Proxy models
dedupe-variable-ilcs
Dedupe variable for Illinois Compiled Statute (ILCS) codes
csvdedupe
Command-line tools for deduplicating and merging CSV files
dedupe-variable-number
Number variable type for dedupe
datetime-distance
Compare string distances between dates, timestamps, or datetime objects.
simplecosine
Simple cosine distance
highered
Learnable Edit Distance Using PyHacrf
categorical-distance
Compare two categorical variables
dedupe-variable-person
Variable type for American Person Names
dedupe-variable-employer
Employer variable type for dedupe
dedupe-variable-fuzzycategory
Fuzzy Category variable type for dedupe
fuzzycategory
A context comparison
canonicalize
Canonicalize a cluster of records