Directed Acyclic Word Graph (DAWG) allows to store huge strings set in compacted form
pyDAWG implements DAWG structure. Testing if word is present in a set has complexity O(n), where n is length of tested string.
Algorithm used to build DAWG is incremental, memory overhead during graph construction is small.
Module supports also minimal perfect hashing — it is possible to get unique number for any word from a set, or find which word has assigned given number. This makes possible to use DAWG as a dictionary.
The module has been written in C. There is also a pure python version providing most of C version functionality.