Skip to main content

pypocketmap - a memory-efficient hashtable for CPython

Project description

pypocketmap

NOTE: this package is in beta. The current repo only contains implementations for string-keyed maps.

A high performance python hash table library that consumes significantly less memory than Python Dictionaries. It currently supports Python 3.6+. It is forked from Microdict to remove the limitation on the length of string keys, and add utility methods like get and setdefault. It also has a slightly different table design, making use of code ported to C from abseil-cpp.

Benchmarks

The latest charts are at https://observablehq.com/@dylan-burati-ws/pypocketmap-benchmarks

Installation and Building

You can install using pip: pip install pypocketmap.

pypocketmap is tested to work on Linux, Mac OSX, and Windows systems. You will need GCC 7+ on linux/mac osx systems and Visual C++ 14+ compiler on Windows systems to build the package. For the best performance use on a 64 bit system.

Usage

The following code snippet shows common uses of the library.

>>> import pypocketmap as pkm

# Generates a dictionary with string keys and signed 64 bit integer values.
>>> d = pkm.create(str, int)
>>> d = pkm.create(pkm.string, pkm.i64)  # or explicitly

# Works just like a python dictionary, although insertion order != iteration order
>>> d["a"] = 2
>>> "a" in d, "b" in d
(True, False)

# `get` default value can be any type
>>> d.get("a"), d.get("b"), d.get("b", False)
(2, None, False)

# `setdefault` default value must be an int
>>> d.setdefault("a", -10)
2
>>> d.setdefault("b", -10)
-10
>>> d.update({"okc": 1997, "yhf": 2002})
>>> d
<pypocketmap[str, int64]: {'b': -10, 'okc': 1997, 'a': 2, 'yhf': 2002}>
>>> [k for k in d], list(d.keys())
(['b', 'okc', 'a', 'yhf'], ['b', 'okc', 'a', 'yhf'])
>>> list(d.values()), list(d.items())
([-10, 1997, 2, 2002], [('b', -10), ('okc', 1997), ('a', 2), ('yhf', 2002)])
>>> len(d)
4
>>> d.clear()
>>> len(d)
0

How it works

The C parts

The map part of pypocketmap is just a C port of abseil-cpp's SwissTable, which is explained nicely in this blog post series: https://www.kylematsuda.com/blog/writing_a_hashmap_part_2.

The memory savings (pocket part) are partially due to an optimization for small strings - when stored in the key or value array of the map, a string is a 16-byte packed_str_t. This is a union:

  • contained: the data fits in 15 bytes, and the length and sentinel bit 1 fit in 1 byte.
  • spilled: the struct holds a char * pointing to the data, and the length and sentinel bit 0 take up the remaining 8 bytes.
    • 4 bytes are padding on 32-bit platforms; 8 bytes end up unused because the longer lengths would fail to allocate.

The C++ standard library and Rust crate byteyarn do something similar - I learned about this from the crate author's blog post: https://mcyoung.xyz/2023/08/09/yarns/.

The Python C API parts

The file str_int64_Py.c is the only Python module definition which should be edited. That file and abstract.h use C macros and typedef to fake generics. Many Python-specific blocks in the former file are also annotated with custom template! comments, which I came up with for this library's Java equivalent, pocketmap.

The template! comments are the reason for the nearly equivalent *_Py.c files in the repo. Essentially, the macros and comments implement the following traits for the key and value types. Since the "implementations" are just inlined into the files, this is prone to errors. It is probably doable in C++, but for now I'm choosing to add tests rather than convert the C code.

trait Value {
  type Packed;

  // the `&` operator indicates something borrowed.
  // physically, both Self and &Self can be a struct { char* data, uint64_t len }
  // semantically, &Self means that the data field is readonly, and its not safe
  // to use the pointer once the source of the borrow is possibly freed/dropped.
  // ===

  /// Converts the possibly-borrowed value to an owned value independent of any
  /// PyObjects and stores it.
  fn p_set(arr: *mut Self::Packed, index: usize, elem: &Self);

  /// Frees anything alloc'd in a previous `p_set(arr, index)`.
  /// No-op for everything except spilled strings.
  fn p_unset(arr: *mut Self::Packed, index: usize);

  /// Borrows the value - the caller is responsible for making sure `arr[index]` is
  /// set, and stays set while the returned &Self is in use.
  fn p_get(arr: *Self::Packed, index: usize) -> &Self;

  /// Converts the Python object to a C value.
  fn from_py(obj: &PyObject) -> Result<&Self, ()>;

  /// Convert the C value to a new Python object which owns its data. The caller
  /// owns the returned Python object, and should eventually either decrement its
  /// reference count, or return it as the result of a pymethodfunc.
  fn as_py(&self) -> PyObject;

  /// For `repr`.
  fn fmt(&self, f: &mut _PyUnicodeWriter) -> Result<(), ()>;

  /// For `setdefault` when the value arg is left out.
  fn default() -> &'static Self;

  /// For `__eq__` when acting as the value type. For pretty much all methods
  /// when acting as the key type.
  fn eq(&self, other: &Self) -> bool;
}

trait Key: Value + Hash {}

Not yet supported

  • Automatic or manual shrink_to_fit
  • Concurrent modification exceptions if an iterator is used after the table has been rehashed. The implementation can skip elements or yield them twice if used incorrectly.
  • update should work on any arg when PyDict_Check returns true or PyMapping_Keys returns non-null
    • __or__ and __ior__ operators can be implemented with this
  • The METH_FASTCALL convention is stable since Python 3.10, and it should be possible to alter the *Py.c files to use it in place of METH_VARARGS when compiling for 3.10+.
  • Additional overflow checking when LLONG_MAX != INT64_MAX or (more likely) LONG_MAX != INT32_MAX

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

pypocketmap-0.0.1b0-cp312-cp312-win_amd64.whl (90.1 kB view hashes)

Uploaded CPython 3.12 Windows x86-64

pypocketmap-0.0.1b0-cp312-cp312-win32.whl (87.1 kB view hashes)

Uploaded CPython 3.12 Windows x86

pypocketmap-0.0.1b0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (406.4 kB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.17+ x86-64

pypocketmap-0.0.1b0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (394.7 kB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.17+ ARM64

pypocketmap-0.0.1b0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (341.8 kB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

pypocketmap-0.0.1b0-cp312-cp312-macosx_11_0_arm64.whl (77.4 kB view hashes)

Uploaded CPython 3.12 macOS 11.0+ ARM64

pypocketmap-0.0.1b0-cp312-cp312-macosx_10_15_x86_64.whl (77.6 kB view hashes)

Uploaded CPython 3.12 macOS 10.15+ x86-64

pypocketmap-0.0.1b0-cp312-cp312-macosx_10_15_universal2.whl (129.1 kB view hashes)

Uploaded CPython 3.12 macOS 10.15+ universal2 (ARM64, x86-64)

pypocketmap-0.0.1b0-cp311-cp311-win_amd64.whl (89.8 kB view hashes)

Uploaded CPython 3.11 Windows x86-64

pypocketmap-0.0.1b0-cp311-cp311-win32.whl (86.9 kB view hashes)

Uploaded CPython 3.11 Windows x86

pypocketmap-0.0.1b0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (401.5 kB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

pypocketmap-0.0.1b0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (391.2 kB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ ARM64

pypocketmap-0.0.1b0-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (338.6 kB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

pypocketmap-0.0.1b0-cp311-cp311-macosx_11_0_arm64.whl (77.1 kB view hashes)

Uploaded CPython 3.11 macOS 11.0+ ARM64

pypocketmap-0.0.1b0-cp311-cp311-macosx_10_15_x86_64.whl (77.0 kB view hashes)

Uploaded CPython 3.11 macOS 10.15+ x86-64

pypocketmap-0.0.1b0-cp311-cp311-macosx_10_15_universal2.whl (128.2 kB view hashes)

Uploaded CPython 3.11 macOS 10.15+ universal2 (ARM64, x86-64)

pypocketmap-0.0.1b0-cp310-cp310-win_amd64.whl (89.8 kB view hashes)

Uploaded CPython 3.10 Windows x86-64

pypocketmap-0.0.1b0-cp310-cp310-win32.whl (86.8 kB view hashes)

Uploaded CPython 3.10 Windows x86

pypocketmap-0.0.1b0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (401.2 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

pypocketmap-0.0.1b0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (390.8 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ ARM64

pypocketmap-0.0.1b0-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (338.1 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

pypocketmap-0.0.1b0-cp310-cp310-macosx_11_0_arm64.whl (77.1 kB view hashes)

Uploaded CPython 3.10 macOS 11.0+ ARM64

pypocketmap-0.0.1b0-cp310-cp310-macosx_10_15_x86_64.whl (77.0 kB view hashes)

Uploaded CPython 3.10 macOS 10.15+ x86-64

pypocketmap-0.0.1b0-cp310-cp310-macosx_10_15_universal2.whl (128.2 kB view hashes)

Uploaded CPython 3.10 macOS 10.15+ universal2 (ARM64, x86-64)

pypocketmap-0.0.1b0-cp39-cp39-win_amd64.whl (89.7 kB view hashes)

Uploaded CPython 3.9 Windows x86-64

pypocketmap-0.0.1b0-cp39-cp39-win32.whl (86.8 kB view hashes)

Uploaded CPython 3.9 Windows x86

pypocketmap-0.0.1b0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (400.5 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

pypocketmap-0.0.1b0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (389.9 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ ARM64

pypocketmap-0.0.1b0-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (337.5 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

pypocketmap-0.0.1b0-cp39-cp39-macosx_11_0_arm64.whl (77.1 kB view hashes)

Uploaded CPython 3.9 macOS 11.0+ ARM64

pypocketmap-0.0.1b0-cp39-cp39-macosx_10_15_x86_64.whl (77.0 kB view hashes)

Uploaded CPython 3.9 macOS 10.15+ x86-64

pypocketmap-0.0.1b0-cp39-cp39-macosx_10_15_universal2.whl (128.2 kB view hashes)

Uploaded CPython 3.9 macOS 10.15+ universal2 (ARM64, x86-64)

pypocketmap-0.0.1b0-cp38-cp38-win_amd64.whl (89.8 kB view hashes)

Uploaded CPython 3.8 Windows x86-64

pypocketmap-0.0.1b0-cp38-cp38-win32.whl (86.8 kB view hashes)

Uploaded CPython 3.8 Windows x86

pypocketmap-0.0.1b0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (405.6 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

pypocketmap-0.0.1b0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (394.7 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ ARM64

pypocketmap-0.0.1b0-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (342.3 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

pypocketmap-0.0.1b0-cp38-cp38-macosx_11_0_arm64.whl (77.1 kB view hashes)

Uploaded CPython 3.8 macOS 11.0+ ARM64

pypocketmap-0.0.1b0-cp38-cp38-macosx_10_15_x86_64.whl (77.0 kB view hashes)

Uploaded CPython 3.8 macOS 10.15+ x86-64

pypocketmap-0.0.1b0-cp38-cp38-macosx_10_15_universal2.whl (128.2 kB view hashes)

Uploaded CPython 3.8 macOS 10.15+ universal2 (ARM64, x86-64)

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page