
Pure-python reader for DAWGs created by dawgdic C++ library or DAWG Python extension.



This pure-Python package provides read-only access to files created by the dawgdic C++ library and the DAWG Python package.

This package is not capable of creating DAWGs. It works with DAWGs built by the dawgdic C++ library or the DAWG Python extension module. The main purpose of DAWG-Python is to provide access to DAWGs without requiring compiled extensions. It is also quite fast under PyPy (see the benchmarks).

Installation

pip install DAWG-Python

Usage

The aim of DAWG-Python is to be API- and binary-compatible with DAWG where possible.

First, create a DAWG using the DAWG module:

import dawg
data = [u'foo', u'bar', u'foobar']  # hypothetical example keys
d = dawg.DAWG(data)
d.save('words.dawg')  # write the DAWG to a file

This DAWG can then be loaded without requiring C extensions:

import dawg_python
d = dawg_python.DAWG().load('words.dawg')

Please consult the DAWG documentation for detailed usage. Some features (such as constructor parameters and the save method) are intentionally unsupported.
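Because DAWG-Python aims to mirror the DAWG API for reading, code that only loads and queries DAWGs can prefer the compiled extension when it is installed and fall back to the pure-Python reader otherwise. The following is a minimal sketch of that pattern, assuming read-only use; `choose_backend` and `load_dawg` are hypothetical helper names, not part of either package.

```python
import importlib


def choose_backend():
    """Return the first importable DAWG implementation, or None.

    Hypothetical helper: prefers the Cython-based ``dawg`` extension and
    falls back to the pure-Python ``dawg_python`` reader.
    """
    for name in ("dawg", "dawg_python"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


def load_dawg(path):
    """Load a DAWG file read-only with whichever backend is available."""
    backend = choose_backend()
    if backend is None:
        raise ImportError("neither 'dawg' nor 'dawg_python' is installed")
    d = backend.DAWG()
    d.load(path)  # read-only loading is shared by both packages
    return d
```

With this in place, `load_dawg('words.dawg')` returns a DAWG from whichever package imported first; building and saving still require the DAWG package.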

Benchmarks

Benchmark results (100k unicode words, integer values (lengths of the words), PyPy 1.9, MacBook Air i5 1.8 GHz):

dict __getitem__ (hits):          10.978M ops/sec
DAWG __getitem__ (hits):          not supported
BytesDAWG __getitem__ (hits):     0.423M ops/sec
RecordDAWG __getitem__ (hits):    0.348M ops/sec

dict get() (hits):                10.127M ops/sec
DAWG get() (hits):                not supported
BytesDAWG get() (hits):           0.438M ops/sec
RecordDAWG get() (hits):          0.363M ops/sec

dict get() (misses):              14.885M ops/sec
DAWG get() (misses):              not supported
BytesDAWG get() (misses):         1.228M ops/sec
RecordDAWG get() (misses):        1.239M ops/sec

dict __contains__ (hits):         10.341M ops/sec
DAWG __contains__ (hits):         1.086M ops/sec
BytesDAWG __contains__ (hits):    0.904M ops/sec
RecordDAWG __contains__ (hits):   0.886M ops/sec

dict __contains__ (misses):       9.823M ops/sec
DAWG __contains__ (misses):       1.491M ops/sec
BytesDAWG __contains__ (misses):  1.451M ops/sec
RecordDAWG __contains__ (misses): 1.437M ops/sec

dict items():                     44.401 ops/sec
DAWG items():                     not supported
BytesDAWG items():                3.437 ops/sec
RecordDAWG items():               3.210 ops/sec

dict keys():                      426.250 ops/sec
DAWG keys():                      not supported
BytesDAWG keys():                 6.347 ops/sec
RecordDAWG keys():                6.428 ops/sec

DAWG.prefixes (hits):             0.729M ops/sec
DAWG.prefixes (mixed):            1.770M ops/sec
DAWG.prefixes (misses):           1.420M ops/sec

RecordDAWG.keys(prefix="xxx"), avg_len(res)==415:       1.531K ops/sec
RecordDAWG.keys(prefix="xxxxx"), avg_len(res)==17:      39.823K ops/sec
RecordDAWG.keys(prefix="xxxxxxxx"), avg_len(res)==3:    165.236K ops/sec
RecordDAWG.keys(prefix="xxxxx..xx"), avg_len(res)==1.4: 237.831K ops/sec
RecordDAWG.keys(prefix="xxx"), NON_EXISTING:            4183.149K ops/sec
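The `prefixes` and `keys(prefix=...)` rows above measure two different queries. To make their semantics concrete, here is a naive pure-Python reference model over a sorted list. It is not a DAWG and not how DAWG-Python implements these queries; it only mirrors what they return: `prefixes(word)` gives the stored keys that are prefixes of `word`, while `keys(prefix)` gives the stored keys that start with `prefix`. The class name `NaiveKeySet` is hypothetical.

```python
import bisect


class NaiveKeySet(object):
    """Reference model of prefix queries over a sorted list of keys.

    Hypothetical illustration only: it reproduces what the queries
    return, not how a DAWG stores or traverses keys.
    """

    def __init__(self, keys):
        self._keys = sorted(set(keys))
        self._lookup = frozenset(self._keys)

    def __contains__(self, key):
        return key in self._lookup

    def prefixes(self, word):
        # Every stored key that is a prefix of `word`, shortest first.
        return [word[:i] for i in range(1, len(word) + 1)
                if word[:i] in self._lookup]

    def keys(self, prefix=""):
        # Binary-search to the first key >= prefix, then scan while keys
        # still start with it (sorted order keeps such keys contiguous).
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append(key)
        return result


words = NaiveKeySet(["foo", "bar", "foobar", "foobaz"])
print(words.prefixes("foobarz"))  # ['foo', 'foobar']
print(words.keys("foo"))          # ['foo', 'foobar', 'foobaz']
```

A prefix with no matching keys (the NON_EXISTING row) stops after a single binary search here, which hints at why that case is so much faster than prefixes with hundreds of results.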

Under CPython, expect it to be about 50x slower.
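The figures above are operations per second, and they depend heavily on the interpreter. A minimal way to take a comparable measurement for the plain `dict` baseline, while recording which interpreter is running, is sketched below with the standard `timeit` and `platform` modules. The word list is synthetic and `ops_per_sec` is a hypothetical helper; this is not the project's own benchmark script.

```python
import platform
import timeit


def ops_per_sec(func, items, repeat=3):
    """Call `func` on every item; return ops/sec for the best of `repeat` runs."""
    def run():
        for item in items:
            func(item)
    best = min(timeit.repeat(run, number=1, repeat=repeat))
    return len(items) / best


# Synthetic stand-in for the 100k-word benchmark data: word -> length.
words = ["word%d" % i for i in range(100000)]
data = {w: len(w) for w in words}

rate = ops_per_sec(data.__getitem__, words)
print("%s: dict __getitem__ (hits): %.3fM ops/sec"
      % (platform.python_implementation(), rate / 1e6))
```

Running the same script under PyPy and CPython shows the interpreter gap directly; the DAWG rows would need the corresponding DAWG object in place of `data`.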

I think these results are quite good for a pure-Python package. For example, under PyPy it has faster lookups and uses 2.5x less memory than marisa-trie under Python 3.2 (marisa-trie is much slower or doesn't work at all under PyPy).

It is still several times slower under PyPy than the Cython-based DAWG is under CPython, so DAWG + CPython outperforms DAWG-Python + PyPy.

Memory consumption of DAWG-Python should be the same as that of DAWG.

Current limitations

  • This package is not capable of creating DAWGs;
  • IntDAWG is not implemented;
  • all the limitations of DAWG apply.
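Since IntDAWG is not available, integer values can instead be stored as fixed-size binary records. In the DAWG package, RecordDAWG packs and unpacks its values with the standard `struct` module according to a format string, so one unsigned 32-bit integer per key corresponds to a format such as `'>I'`. The sketch below shows only that struct round trip using the standard library; the format choice and the helper names `pack_int` and `unpack_int` are assumptions, and the exact RecordDAWG constructor should be checked in the DAWG documentation.

```python
import struct

# Assumed record layout: one big-endian unsigned 32-bit integer per key.
INT_FORMAT = ">I"


def pack_int(value):
    """Encode a non-negative int (< 2**32) as a 4-byte record."""
    return struct.pack(INT_FORMAT, value)


def unpack_int(record):
    """Decode a 4-byte record back into an int."""
    (value,) = struct.unpack(INT_FORMAT, record)
    return value


# Word lengths as values, like the benchmark data above.
lengths = {word: len(word) for word in ["foo", "bar", "foobar"]}
records = {word: pack_int(n) for word, n in lengths.items()}
print(unpack_int(records["foobar"]))  # 6
```

A fixed-width format keeps every value the same size, which is what a struct-based record layout expects.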

Contributions are welcome!

Contributing

Development happens on GitHub and Bitbucket; the main issue tracker is on GitHub.

Feel free to submit ideas, bugs, pull requests (git or hg) or regular patches.

Running tests and benchmarks

Make sure tox is installed, then run

$ tox

from the source checkout. Tests should pass under Python 2.6, 2.7, 3.2, 3.3 and PyPy >= 1.9.

To run the benchmarks, type

$ tox -c bench.ini -e pypy

This runs benchmarks under PyPy (they are about 50x slower under CPython).

Authors & Contributors

The algorithms are from the dawgdic C++ library by Susumu Yata & contributors.

License

This package is licensed under the MIT License.

Changes

0.3 (2012-09-26)

  • iterkeys and iteritems methods.

0.2 (2012-09-24)

prefixes support.

0.1 (2012-09-20)

Initial release.


Download files


DAWG-Python-0.3.tar.gz (7.5 kB), source distribution, uploaded Sep 25, 2012.
