Fast Python phonetic algorithms
Fuzzy is a Python library that provides fast implementations of common phonetic algorithms. They are typically used for string-similarity tasks, but they’re pretty versatile.
It uses C extensions (via Cython) for speed.
The algorithms are:
- Soundex
- NYSIIS
- Double Metaphone
The functions are quite easy to use!
>>> import fuzzy
>>> soundex = fuzzy.Soundex(4)  # Soundex codes of length 4
>>> soundex('fuzzy')
'F200'
>>> dmeta = fuzzy.DMetaphone()
>>> dmeta('fuzzy')  # [primary, secondary] Double Metaphone codes
['FS', None]
>>> fuzzy.nysiis('fuzzy')
'FASY'
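To show where a code like 'F200' comes from, here is a minimal pure-Python sketch of the commonly described American Soundex rules: keep the first letter, map the remaining consonants to digits, drop vowels, collapse repeated digits, and pad or truncate to the requested length. This is an illustration only, not fuzzy's C implementation, and `soundex_sketch` is a hypothetical name. It assumes ASCII input, and fuzzy's output may differ on edge cases.

```python
# Digit classes used by American Soundex; letters not listed (vowels, Y, H, W)
# carry no digit of their own.
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex_sketch(word: str, length: int = 4) -> str:
    """Hypothetical pure-Python Soundex for illustration (ASCII letters assumed)."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # The first letter's own digit suppresses an identical digit right after it
    # (e.g. "Pfister" -> P236, not P123).
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a digit
        code = CODES.get(ch, "")  # vowels and Y map to "" and break a run
        if code and code != prev:
            digits.append(code)
        prev = code
    # Pad with zeros, then cut to the requested length.
    return (first + "".join(digits) + "0" * length)[:length]


if __name__ == "__main__":
    for name in ("fuzzy", "Robert", "Rupert", "Ashcraft"):
        print(name, soundex_sketch(name))
```

`soundex_sketch('fuzzy')` returns 'F200', matching `fuzzy.Soundex(4)` in the doctest: F is kept, U and Y are dropped, and the two Zs collapse into a single 2 before zero-padding.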
Timings measured with IPython's timeit:
In : timeit soundex('fuzzy')
1000000 loops, best of 3: 326 ns per loop
In : timeit dmeta('fuzzy')
100000 loops, best of 3: 2.18 us per loop
In : timeit fuzzy.nysiis('fuzzy')
100000 loops, best of 3: 13.7 us per loop
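Outside IPython, comparable "best of 3" figures can be gathered with the standard library's `timeit.repeat`, which runs a callable `number` times per trial and returns one total per trial. The best per-call time is the smallest total divided by `number`. This is a minimal sketch; `best_per_call` is a hypothetical helper name, and wrapping the call in a lambda adds a little overhead, so expect numbers somewhat above IPython's.

```python
import timeit
from typing import Callable


def best_per_call(func: Callable[[], object], number: int = 100_000,
                  repeat: int = 3) -> float:
    """Hypothetical helper: best seconds per call over `repeat` trials."""
    totals = timeit.repeat(func, number=number, repeat=repeat)  # one total per trial
    # The fastest trial is the one least disturbed by other system activity.
    return min(totals) / number


if __name__ == "__main__":
    # Stand-in workload; with fuzzy installed you could time
    # lambda: soundex('fuzzy') instead, where soundex = fuzzy.Soundex(4).
    seconds = best_per_call(lambda: "fuzzy".upper())
    print(f"{seconds * 1e9:.0f} ns per loop")
```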
We recommend the Python-Levenshtein module for fast, C-based string distance and similarity metrics. Among other functions, it includes:
- Levenshtein distance
- Jaro
- Jaro-Winkler
- Hamming distance
In our testing, it has been several times faster than comparable pure-Python implementations of those algorithms.
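For reference, a pure-Python implementation of the kind being compared against might look like this minimal sketch of Levenshtein (edit) distance, using the standard dynamic-programming recurrence with two rows. `levenshtein` is a hypothetical name and is not part of Python-Levenshtein; with that module installed, `Levenshtein.distance(a, b)` should return the same value.

```python
def levenshtein(a: str, b: str) -> int:
    """Hypothetical pure-Python edit distance (insert, delete, substitute; cost 1 each)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the row over the shorter string
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

`levenshtein('kitten', 'sitting')` is 3 (two substitutions and one insertion). Keeping only two rows makes memory proportional to the shorter string, while time stays proportional to `len(a) * len(b)`, which is the inner loop a C implementation speeds up.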