Static memory-efficient and fast Trie-like structures for Python.
Static memory-efficient Trie-like structures for Python (2.x and 3.x), based on the marisa-trie C++ library.
String data in a MARISA-trie may take 50x-100x less memory than in a standard Python dict, while the raw lookup speed is comparable; the trie also provides fast advanced methods such as prefix search.
The C++ library distribution includes official SWIG-based Python bindings; this package provides alternative Cython-based, pip-installable Python bindings.
pip install marisa-trie
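A minimal usage sketch, assuming the package has been installed as above. The sample keys are made up for illustration, and the results are sorted only to make them easy to compare, since the trie does not promise any particular key order here.

```python
import marisa_trie

# Hypothetical sample keys, chosen only for illustration.
trie = marisa_trie.Trie([u"key1", u"key2", u"key12"])

# Membership test, similar to a dict or set lookup.
has_key1 = u"key1" in trie
has_key3 = u"key3" in trie

# Prefix search: all keys in the trie that start with u"key1".
starting_with_key1 = sorted(trie.keys(u"key1"))

# The reverse question: all keys in the trie that are prefixes of u"key12".
prefixes_of_key12 = sorted(trie.prefixes(u"key12"))

# Each key maps to an integer ID, and the ID maps back to the key.
key_id = trie.key_id(u"key2")
restored = trie.restore_key(key_id)
```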
- The library is not tested with the mingw32 compiler;
- the .prefixes() method of BytesTrie and RecordTrie is quite slow and has no iterator counterpart;
- read() and write() methods don’t work with file-like objects (they work only with real files; pickling works fine for file-like objects);
- there are keys() and items() methods but no values() method.
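Two of the limitations above have simple workarounds, sketched here. The helper names (trie_values, dump_to_fileobj, load_from_fileobj) are hypothetical and not part of the package; the sketch relies only on iteritems() and on the pickling support mentioned above.

```python
import io
import pickle


def trie_values(trie):
    """Return a list of values, built from iteritems() since values() is missing."""
    return [value for _key, value in trie.iteritems()]


def dump_to_fileobj(trie, fileobj):
    """Serialize a trie into any binary file-like object via pickle."""
    pickle.dump(trie, fileobj, protocol=pickle.HIGHEST_PROTOCOL)


def load_from_fileobj(fileobj):
    """Load a trie previously written by dump_to_fileobj()."""
    return pickle.load(fileobj)
```

For example, dump_to_fileobj(trie, io.BytesIO()) writes to an in-memory buffer, which read() and write() cannot do. As with any pickle data, only load files you trust.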
Wrapper code is licensed under MIT License.
The bundled marisa-trie C++ library is dual-licensed under the LGPL and BSD 2-clause licenses.
- Fixed a packaging issue: MANIFEST.in was not updated after libmarisa-trie became a submodule.
- Added BinaryTrie for storing arbitrary sequences of bytes, e.g. IP addresses (thanks Tomasz Melcer);
- Deprecated Trie.has_keys_with_prefix which can be trivially implemented in terms of Trie.iterkeys;
- Deprecated Trie.read and Trie.write, which only work for “real” files and duplicate the functionality of load and save. See issue #31 on GitHub;
- Updated libmarisa-trie to the latest version. Yay, 64-bit Windows support.
- Rebuilt Cython wrapper with Cython 0.25.2.
- packaging issue is fixed.
- setup.py is switched to setuptools;
- a tiny speedup;
- wrapper is rebuilt with Cython 0.22.
- trie1 == trie2 and trie1 != trie2 now work (thanks Sergei Lebedev);
- for key in trie: is fixed (thanks Sergei Lebedev);
- wrapper is rebuilt with Cython 0.21.1 (thanks Sergei Lebedev);
- https://bitbucket.org/kmike/marisa-trie repo is no longer supported.
- New Trie methods: __getitem__, get, items, iteritems. trie[u'key'] is now the same as trie.key_id(u'key').
- small optimization for BytesTrie.get.
- wrapper is rebuilt with Cython 0.20.1.
- small Trie.restore_key optimization (it should work 5-15% faster)
- fix Trie.restore_key method - it was reading past declared string length;
- rebuild wrapper with Cython 0.20.
- has_keys_with_prefix(prefix) method (thanks Matt Hickford)
- BytesTrie.iterkeys, BytesTrie.iteritems, RecordTrie.iterkeys and RecordTrie.iteritems methods;
- wrapper is rebuilt with Cython 0.19;
- value_separator parameter for BytesTrie and RecordTrie.
- improved trie building: weights optional parameter;
- improved trie building: unnecessary input sorting is removed;
- wrapper is rebuilt with Cython 0.18;
- bundled marisa-trie C++ library is updated to svn r133.
- Rebuild wrapper with Cython pre-0.18;
- update benchmarks.
- Update bundled marisa-trie C++ library (this may fix more mingw issues);
- Python 3.3 support is back.
- much faster (3x-7x) .items() and .keys() methods for all tries; faster (up to 3x) .prefixes() method for Trie.
- Pickling of RecordTrie is fixed (thanks lazarou for the report);
- error messages should become more useful.
- Issues with mingw32 should be resolved (thanks Susumu Yata).
- .get(key, default=None) method for BytesTrie and RecordTrie;
- small README improvements.
- Small code cleanup;
- load, read and mmap methods return ‘self’;
- I can’t run tests (via tox) under Python 3.3 so it is removed from supported versions for now.
- .prefixes() support for RecordTrie and BytesTrie.
- RecordTrie and BytesTrie are introduced;
- IntTrie class is removed (probably temporarily);
- dumps/loads methods are renamed to tobytes/frombytes;
- benchmark & tests improvements;
- support for MARISA-trie config options is added.
- Pickling/unpickling support;
- dumps/loads methods;
- Python 3.3 workaround;
- improved tests.
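The changelog above notes that the deprecated Trie.has_keys_with_prefix can be implemented trivially in terms of Trie.iterkeys. One possible replacement is sketched below as a standalone helper; it assumes only that the trie object exposes iterkeys(prefix), and it is not the library's own implementation.

```python
def has_keys_with_prefix(trie, prefix=u""):
    """Return True if at least one key in the trie starts with prefix."""
    # iterkeys() is lazy, so iteration stops at the first matching key.
    for _key in trie.iterkeys(prefix):
        return True
    return False
```

Because the loop returns on the first key, the cost does not grow with the number of matching keys.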