Skip to main content

Injectively, deterministically maps arbitrary objects to hashable values

Project description

PyPI Package PyPI Downloads License Python Versions libraries.io sourcerank GitHub stars CI status GitHub last commit Checked with Mypy Code style: black

Injectively, deterministically maps arbitrary objects to hashable, immutable values

Quickstart

If you don’t have pip installed, see the pip install guide.

$ pip install charmonium.freeze

For a related project, charmonium.cache, I needed a function that deterministically, injectively maps objects to hashable objects.

  • “Injectively” means freeze(a) == freeze(b) implies a == b (with the precondition that a and b are of the same type).

  • “Deterministically” means it should return the same value across subsequent process invocations (with the same interpreter major and minor version), unlike Python’s hash function, which is not deterministic between processes.

  • “Hashable” means one can call hash(...) on it. All hashable values are immutable.

Have you ever felt like you wanted to “freeze” a list of arbitrary data into a hashable value? Now you can.

>>> obj = [1, 2, 3, {4, 5, 6}, object()]
>>> hash(obj)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> from charmonium.freeze import freeze
>>> from pprint import pprint
>>> freeze(obj)
(1, 2, 3, frozenset({4, 5, 6}), (('args', ('object',)),))

If you want to actually boil this down into a single integer, see charmonium.determ_hash. This library’s job is just to freeze the state.

It even works on custom types.

>>> # Make a custom type
>>> class Struct:
...     def frobnicate(self):
...         print(123)
>>> s = Struct()
>>> s.attr = 4
>>> freeze(s)
(('args', ('Struct',)), ('state', (('attr', 4),)))

And methods, functions, lambdas, etc.

>>> pprint(freeze(lambda x: x + 123))
(('code',
  (('name', '<lambda>'),
   ('varnames', ('x',)),
   ('constants', (None, 123)),
   ('bytecode', b'|\x00d\x01\x17\x00S\x00'))),)
>>> import functools
>>> pprint(freeze(functools.partial(print, 123)))
(('constructor', 'partial'),
 ('args', ('print',)),
 ('state', ('print', (123,), (), None)))
>>> pprint(freeze(Struct.frobnicate))
(('code',
  (('name', 'frobnicate'),
   ('varnames', ('self',)),
   ('constants', (None, 123)),
   ('bytecode', b't\x00d\x01\x83\x01\x01\x00d\x00S\x00'))),)
>>> i = 0
>>> def square_plus_i(x):
...     # Value of global variable will be included in the function's frozen state.
...     return x**2 + i
...
>>> pprint(freeze(square_plus_i))
(('code',
  (('name', 'square_plus_i'),
   ('varnames', ('x',)),
   ('constants', (None, 2)),
   ('bytecode', b'|\x00d\x01\x13\x00t\x00\x17\x00S\x00'))),
 ('closure globals', (('i', 0),)))

If the source code of square_plus_i changes between successive invocations, then the freeze value will change. This is useful for caching unchanged functions.

Special cases

  • freeze on functions returns their bytecode, constants, and closure-vars. The remarkable thing is that this is true across subsequent invocations of the same process. If the user edits the script and changes the function, then it’s freeze will change too.

    (freeze(f) == freeze(g)) implies (for all x, f(x) == g(x))
  • freeze on an object returns the data that used in the pickle protocol. This makes freeze work correctly on most user-defined types. However, there can still be special cases: pickle may incorporate non-deterministic values. In this case, there are two remedies:

    • If you can tweak the definition of the class, add a method called __getfrozenstate__ which returns a deterministic snapshot of the state. This takes precedence over the Pickle protocol, if it is defined.

      >>> class Struct:
      ...     pass
      >>> s = Struct()
      >>> s.attr = 4
      >>> pprint(freeze(s))
      (('args', ('Struct',)), ('state', (('attr', 4),)))
      >>> # which is based on the Pickle protocol's definition of `__reduce__`:
      >>> pprint(s.__reduce__())
      (<function _reconstructor at 0x...>,
       (<class '__main__.Struct'>, <class 'object'>, None),
       {'attr': 4})
      
    • If you cannot tweak the definition of the class, you can still register single dispatch handler for that type:

      >>> from typing import Set, Hashable
      >>> from charmonium.freeze import freeze, _freeze_dispatch, _freeze
      >>> class Test:
      ...     deterministic_val = 3
      ...     nondeterministic_val = 4
      ...
      >>> @_freeze_dispatch.register(Test)
      ... def _(obj: Test, tabu: Set[int], level: int) -> Hashable:
      ...     # Type annotations are optional.
      ...     # I have included them here for clarity.
      ...
      ...     # `tabu` is for object cycle detection.
      ...     tabu = tabu | {id(obj)}
      ...
      ...     # `level` is for logging and infinite recursion detection.
      ...     level = level + 1
      ...
      ...     # Freeze should depend only on deterministic values.
      ...     if isinstance(obj.deterministic_val, int):
      ...         return obj.deterministic_val
      ...     else:
      ...         # If the underlying instance variable is not hashable, we can use recursion to help.
      ...         # Call `_freeze` instead of `freeze` to recurse with `tabu` and `level`.
      ...         return _freeze(obj.deterministic_val, tabu, level)
      ...
      >>> freeze(Test())
      3
      
  • Note that as of Python 3.7, dictionaries “remember” their insertion order. As such,

    >>> freeze({"a": 1, "b": 2})
    (('a', 1), ('b', 2))
    >>> freeze({"b": 2, "a": 1})
    (('b', 2), ('a', 1))
    

Developing

See CONTRIBUTING.md for instructions on setting up a development environment.

Debugging

Use the following lines to see how freeze decomposes an object into primitive values.

import logging, os
logger = logging.getLogger("charmonium.freeze")
logger.setLevel(logging.DEBUG)
fh = logging.FileHandler("freeze.log")
fh.setLevel(logging.DEBUG)
fh.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(fh)
logger.debug("Program %d", os.getpid())

i = 0
def square_plus_i(x):
    # Value of global variable will be included in the function's frozen state.
    return x**2 + i

from charmonium.freeze import freeze
freeze(square_plus_i)

This produces a log such as in freeze.log:

freeze begin <function square_plus_i at 0x7f9228bff550>
 function <function square_plus_i at 0x7f9228bff550>
  tuple (('code', <code object square_plus_i at 0x7f9228c6cf50, file "/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py", line 2>), 'closure globals', {'i': 0})
   tuple ('code', <code object square_plus_i at 0x7f9228c6cf50, file "/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py", line 2>)
    'code'
    code <code object square_plus_i at 0x7f9228c6cf50, file "/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py", line 2>
     tuple (None, 2)
      None
      2
     b'|\x00d\x01\x13\x00t\x00\x17\x00S\x00'
   'closure globals'
   dict {'i': 0}
    'i'
    0
freeze end

I do this to find the differences between subsequent runs:

$ python code.py
$ mv freeze.log freeze.0.log

$ python code.py
$ mv freeze.log freeze.1.log

$ sed -i 's/at 0x[0-9a-f]*//g' freeze.*.log
# This removes pointer values that appear in the `repr(...)`.

$ meld freeze.0.log freeze.1.log
# Alternatively, use `icdiff` or `diff -u1`.

TODO

  • ☑ Bring hash into separate package.

  • ☐ Correctness

    • ☑ Test hashing sets with different orders. Assert tests fail.

    • ☑ Test hashing dicts with different orders. Assert tests fail.

    • ☑ Don’t include properties in hash.

    • ☐ Support closures which include import x and from x import y

    • ☐ Test for unique representation between types.

    • ☐ Test functions with minor changes.

    • ☐ Test set/dict with diff hash.

    • ☐ Test obj with slots.

    • ☐ Test hash for objects and classes more carefully.

    • ☑ Improve test coverage.

    • ☐ Investigate when modules are assumed constant.

  • ☑ API

    • ☑ Use user-customizable multidispatch.

    • ☑ Make it easier to register a freeze method for a type.

    • ☑ Encapsulate global config freeze into object.

    • ☐ Move “get call graph” into its own package.

  • ☐ Make freeze handle more types:

    • ☑ Module: freeze by name.

    • ☑ Objects: include the source-code of methods.

    • ☑ C extensions. freeze by name, like module

    • ☑ Methods

    • ☑ fastpath for numpy arrays

    • tqdm

    • numpy.int64(1234)

    • ☑ Pandas dataframe

    • ☑ Catch Pickle TypeError

    • ☑ Catch Pickle ImportError

  • ☐ Performance

    • ☐ Make performance benchmarks.

    • ☐ Memoize the hash of immutable data.

      • Note: this might not work, because immutable data can still contain pointers to mutable data, e.g.: in a = ([]), a is immutable, but a[0] is not.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

charmonium.freeze-0.5.6.tar.gz (16.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

charmonium.freeze-0.5.6-py3-none-any.whl (11.9 kB view details)

Uploaded Python 3

File details

Details for the file charmonium.freeze-0.5.6.tar.gz.

File metadata

  • Download URL: charmonium.freeze-0.5.6.tar.gz
  • Upload date:
  • Size: 16.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.9.10 Linux/5.13.0-37-generic

File hashes

Hashes for charmonium.freeze-0.5.6.tar.gz
Algorithm Hash digest
SHA256 9631da8ff031c4e701485132e10f9c697545c8a5714897e184617c3f3c0d390d
MD5 621efb70b970f0fb27e31a23d009e89d
BLAKE2b-256 2215365d4c9da28cbc11d04b1d4beb987c733e87e90cada2550c36fafe3a19dd

See more details on using hashes here.

File details

Details for the file charmonium.freeze-0.5.6-py3-none-any.whl.

File metadata

  • Download URL: charmonium.freeze-0.5.6-py3-none-any.whl
  • Upload date:
  • Size: 11.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.9.10 Linux/5.13.0-37-generic

File hashes

Hashes for charmonium.freeze-0.5.6-py3-none-any.whl
Algorithm Hash digest
SHA256 087c85658d3236b23f09647a40cac48f6e20df4201df457e47477eb2e7a37e26
MD5 e06f943e37baba75f6cd565b617b9dc3
BLAKE2b-256 473239bd7a3072bbdd58938a94c67b19ca65192f4001267ef269fa193dde23b7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page