Skip to main content

Injectively, deterministically maps arbitrary objects to hashable values

Project description

PyPI Package PyPI Downloads License Python Versions libraries.io sourcerank GitHub stars CI status GitHub last commit Checked with Mypy Code style: black

Injectively, deterministically maps arbitrary objects to hashable, immutable values

Quickstart

If you don’t have pip installed, see the pip install guide.

$ pip install charmonium.freeze

For a related project, charmonium.cache, I needed a function that deterministically, injectively maps objects to hashable objects.

  • “Injectively” means freeze(a) == freeze(b) implies a == b (with the precondition that a and b are of the same type).

  • “Deterministically” means it should return the same value across subsequent process invocations (with the same interpreter major and minor version), unlike Python’s hash function, which is not deterministic between processes.

  • “Hashable” means one can call hash(...) on it. All hashable values are immutable.

Have you ever felt like you wanted to “freeze” a list of arbitrary data into a hashable value? Now you can.

>>> obj = [1, 2, 3, {4, 5, 6}, object()]
>>> hash(obj)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> from charmonium.freeze import freeze
>>> from pprint import pprint
>>> freeze(obj)
(1, 2, 3, frozenset({4, 5, 6}), (('args', ('object',)),))

If you want to actually boil this down into a single integer, see charmonium.determ_hash. This library’s job is just to freeze the state.

It even works on custom types.

>>> # Make a custom type
>>> class Struct:
...     def frobnicate(self):
...         print(123)
>>> s = Struct()
>>> s.attr = 4
>>> freeze(s)
(('args', ('Struct',)), ('state', (('attr', 4),)))

And methods, functions, lambdas, etc.

>>> pprint(freeze(lambda x: x + 123))
(('code',
  (('name', '<lambda>'),
   ('varnames', ('x',)),
   ('constants', (None, 123)),
   ('bytecode', b'|\x00d\x01\x17\x00S\x00'))),)
>>> import functools
>>> pprint(freeze(functools.partial(print, 123)))
(('constructor', 'partial'),
 ('args', ('print',)),
 ('state', ('print', (123,), (), None)))
>>> pprint(freeze(Struct.frobnicate))
(('code',
  (('name', 'frobnicate'),
   ('varnames', ('self',)),
   ('constants', (None, 123)),
   ('bytecode', b't\x00d\x01\x83\x01\x01\x00d\x00S\x00'))),)
>>> i = 0
>>> def square_plus_i(x):
...     # Value of global variable will be included in the function's frozen state.
...     return x**2 + i
...
>>> pprint(freeze(square_plus_i))
(('code',
  (('name', 'square_plus_i'),
   ('varnames', ('x',)),
   ('constants', (None, 2)),
   ('bytecode', b'|\x00d\x01\x13\x00t\x00\x17\x00S\x00'))),
 ('closure globals', (('i', 0),)))

If the source code of square_plus_i changes between successive invocations, then the freeze value will change. This is useful for caching unchanged functions.

Special cases

  • freeze on functions returns their bytecode, constants, and closure-vars. The remarkable thing is that this is true across subsequent invocations of the same process. If the user edits the script and changes the function, then it’s freeze will change too.

    (freeze(f) == freeze(g)) implies (for all x, f(x) == g(x))
  • freeze on an object returns the data that used in the pickle protocol. This makes freeze work correctly on most user-defined types. However, there can still be special cases: pickle may incorporate non-deterministic values. In this case, there are two remedies:

    • If you can tweak the definition of the class, add a method called __getfrozenstate__ which returns a deterministic snapshot of the state. This takes precedence over the Pickle protocol, if it is defined.

      >>> class Struct:
      ...     pass
      >>> s = Struct()
      >>> s.attr = 4
      >>> pprint(freeze(s))
      (('args', ('Struct',)), ('state', (('attr', 4),)))
      >>> # which is based on the Pickle protocol's definition of `__reduce__`:
      >>> pprint(s.__reduce__())
      (<function _reconstructor at 0x...>,
       (<class '__main__.Struct'>, <class 'object'>, None),
       {'attr': 4})
      
    • If you cannot tweak the definition of the class, you can still register single dispatch handler for that type:

      >>> from typing import Set, Hashable
      >>> from charmonium.freeze import freeze, _freeze_dispatch, _freeze
      >>> class Test:
      ...     deterministic_val = 3
      ...     nondeterministic_val = 4
      ...
      >>> @_freeze_dispatch.register(Test)
      ... def _(obj: Test, tabu: Set[int], level: int) -> Hashable:
      ...     # Type annotations are optional.
      ...     # I have included them here for clarity.
      ...
      ...     # `tabu` is for object cycle detection.
      ...     tabu = tabu | {id(obj)}
      ...
      ...     # `level` is for logging and infinite recursion detection.
      ...     level = level + 1
      ...
      ...     # Freeze should depend only on deterministic values.
      ...     if isinstance(obj.deterministic_val, int):
      ...         return obj.deterministic_val
      ...     else:
      ...         # If the underlying instance variable is not hashable, we can use recursion to help.
      ...         # Call `_freeze` instead of `freeze` to recurse with `tabu` and `level`.
      ...         return _freeze(obj.deterministic_val, tabu, level)
      ...
      >>> freeze(Test())
      3
      
  • Note that as of Python 3.7, dictionaries “remember” their insertion order. As such,

    >>> freeze({"a": 1, "b": 2})
    (('a', 1), ('b', 2))
    >>> freeze({"b": 2, "a": 1})
    (('b', 2), ('a', 1))
    

Developing

See CONTRIBUTING.md for instructions on setting up a development environment.

Debugging

Use the following lines to see how freeze decomposes an object into primitive values.

import logging, os
logger = logging.getLogger("charmonium.freeze")
logger.setLevel(logging.DEBUG)
fh = logging.FileHandler("freeze.log")
fh.setLevel(logging.DEBUG)
fh.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(fh)
logger.debug("Program %d", os.getpid())

i = 0
def square_plus_i(x):
    # Value of global variable will be included in the function's frozen state.
    return x**2 + i

from charmonium.freeze import freeze
freeze(square_plus_i)

This produces a log such as in freeze.log:

freeze begin <function square_plus_i at 0x7f9228bff550>
 function <function square_plus_i at 0x7f9228bff550>
  tuple (('code', <code object square_plus_i at 0x7f9228c6cf50, file "/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py", line 2>), 'closure globals', {'i': 0})
   tuple ('code', <code object square_plus_i at 0x7f9228c6cf50, file "/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py", line 2>)
    'code'
    code <code object square_plus_i at 0x7f9228c6cf50, file "/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py", line 2>
     tuple (None, 2)
      None
      2
     b'|\x00d\x01\x13\x00t\x00\x17\x00S\x00'
   'closure globals'
   dict {'i': 0}
    'i'
    0
freeze end

I do this to find the differences between subsequent runs:

$ python code.py
$ mv freeze.log freeze.0.log

$ python code.py
$ mv freeze.log freeze.1.log

$ sed -i 's/at 0x[0-9a-f]*//g' freeze.*.log
# This removes pointer values that appear in the `repr(...)`.

$ meld freeze.0.log freeze.1.log
# Alternatively, use `icdiff` or `diff -u1`.

TODO

  • ☑ Bring hash into separate package.

  • ☐ Correctness

    • ☑ Test hashing sets with different orders. Assert tests fail.

    • ☑ Test hashing dicts with different orders. Assert tests fail.

    • ☑ Don’t include properties in hash.

    • ☐ Support closures which include import x and from x import y

    • ☐ Test for unique representation between types.

    • ☐ Test functions with minor changes.

    • ☐ Test set/dict with diff hash.

    • ☐ Test obj with slots.

    • ☐ Test hash for objects and classes more carefully.

    • ☑ Improve test coverage.

    • ☐ Investigate when modules are assumed constant.

  • ☑ API

    • ☑ Use user-customizable multidispatch.

    • ☑ Make it easier to register a freeze method for a type.

    • ☑ Encapsulate global config freeze into object.

    • ☐ Move “get call graph” into its own package.

  • ☐ Make freeze handle more types:

    • ☑ Module: freeze by name.

    • ☑ Objects: include the source-code of methods.

    • ☑ C extensions. freeze by name, like module

    • ☑ Methods

    • ☑ fastpath for numpy arrays

    • tqdm

    • numpy.int64(1234)

    • ☑ Pandas dataframe

    • ☑ Catch Pickle TypeError

    • ☑ Catch Pickle ImportError

  • ☐ Performance

    • ☐ Make performance benchmarks.

    • ☐ Memoize the hash of immutable data.

      • Note: this might not work, because immutable data can still contain pointers to mutable data, e.g.: in a = ([]), a is immutable, but a[0] is not.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

charmonium.freeze-0.5.6.tar.gz (16.5 kB view hashes)

Uploaded Source

Built Distribution

charmonium.freeze-0.5.6-py3-none-any.whl (11.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page