Package with helpful object recursion utils

spelunk

spelunk is a module containing tools for recursively exploring Python objects. Here are a few examples.

1. Printing an object's tree

Ex:

from spelunk import print_obj_tree

obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
print_obj_tree(root_obj=obj)

# ROOT -> {'key': [1, ...]}
# ROOT['key'] -> [1, ...]
# ROOT['key'][0] -> 1
# ROOT['key'][1] -> (2.0,)
# ROOT['key'][1][0] -> 2.0
# ROOT['key'][2] -> {3}
# ROOT['key'][2]{id=4431022448} -> 3
# ROOT['key'][3] -> frozenset({4})
# ROOT['key'][3]{id=4431022480} -> 4
# ROOT['key'][4] -> {'subkey': [(1,)]}
# ROOT['key'][4]['subkey'] -> [(1,)]
# ROOT['key'][4]['subkey'][0] -> (1,)
# ROOT['key'][4]['subkey'][0][0] -> 1

  • The root object is referred to as ROOT.
  • Attributes are denoted with ROOT.attr.
  • Keys from mappings are denoted with ROOT['key'].
  • Indices from sequences are denoted with ROOT[idx].
  • Elements of sets and frozensets are indicated by their id in memory with ROOT{id=10012}.
  • Elements of a ValuesView are indicated by their id in memory with ROOT{ValuesView_id=10012}. (These are not common.)

The previous notations will be recursively chained together. For example, the path ROOT['key'][2] indicates that in order to access the corresponding object {3}, we would use root_obj['key'][2]. For sets it is a bit more difficult due to the need to inspect by id. To access 4 via ROOT['key'][3]{id=4431022480} we would iterate through root_obj['key'][3] until we found a matching id:

for elem in root_obj['key'][3]:
    if id(elem) == 4431022480:
        break

print(elem)
# 4
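
The lookup above could be wrapped in a small helper. This is only a minimal sketch: find_by_id is a hypothetical name, not part of spelunk, and ids are only meaningful within the running process that produced them.

```python
def find_by_id(collection, target_id):
    """Return the element of `collection` whose id() equals `target_id`.

    Hypothetical helper for illustration; not part of spelunk.
    """
    for elem in collection:
        if id(elem) == target_id:
            return elem
    raise LookupError(f"no element with id {target_id}")


members = frozenset((4,))
# In practice the id would be read off a printed path such as ROOT{id=...}.
target_id = id(next(iter(members)))
found = find_by_id(members, target_id)
print(found)
```

Raising LookupError when nothing matches avoids the pitfall of the bare loop, which silently leaves elem bound to the last element when no id matches.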

Fortunately, for getting references and manipulating elements of root_obj, there are additional tools that avoid needing to tediously address and iterate (see below).

Before moving on, note that you can also filter by element and/or by path by supplying the callables element_test and path_test, which decide whether an element or path is of interest (by default both always return True). element_test receives the element itself and returns a bool. path_test receives the most recent component of the current path, either a string (for attributes and mapping keys) or an integer (for sequence indices and the memory ids of set elements), and returns a bool. For example, at root_obj['key'] with path ROOT['key'], the string 'key' is passed to path_test and [1, (2.0,), ...] is passed to element_test.

obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
print_obj_tree(root_obj=obj, element_test=lambda x: isinstance(x, float))

# ROOT['key'][1][0] -> 2.0

obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
print_obj_tree(root_obj=obj, path_test=lambda x: x=='subkey')  

# ROOT['key'][4]['subkey'] -> [(1,)]
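
To make the roles of the two callables concrete, here is a minimal, self-contained sketch of a filtered traversal. It is not spelunk's implementation: walk and filter_tree are hypothetical names, only mappings and sequences are descended into (sets and object attributes are treated as leaves), and passing None to path_test for the root is an assumption made for this sketch.

```python
from collections.abc import Mapping, Sequence


def walk(obj, path="ROOT", last=None):
    """Yield (path, last_path_component, element) for obj and its contents."""
    yield path, last, obj
    if isinstance(obj, (str, bytes, bytearray)):
        return  # string-like objects are not broken down by character
    if isinstance(obj, Mapping):
        for key, value in obj.items():
            yield from walk(value, f"{path}[{key!r}]", key)
    elif isinstance(obj, Sequence):
        for index, value in enumerate(obj):
            yield from walk(value, f"{path}[{index}]", index)


def filter_tree(root_obj, element_test=lambda x: True, path_test=lambda x: True):
    """Return {full_path: element} for entries passing both tests."""
    return {
        path: elem
        for path, last, elem in walk(root_obj)
        if element_test(elem) and path_test(last)
    }


obj = {'key': [1, (2.0,), {'subkey': [(1,)]}]}
floats = filter_tree(obj, element_test=lambda x: isinstance(x, float))
subkeys = filter_tree(obj, path_test=lambda x: x == 'subkey')
print(floats)
print(subkeys)
```

Note that element_test sees the object at a path while path_test sees only the last component of that path, so the two tests can be combined independently.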

2. Getting the values and paths of objects

To get a dictionary of objects filtered by element/path and keyed by full path string, use get_elements:

from spelunk import get_elements
  
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
get_elements(root_obj=obj, element_test=lambda x: isinstance(x, frozenset))

# {"ROOT['key'][3]": frozenset({4})}

get_elements(root_obj=obj, element_test=lambda x: isinstance(x, dict))
# {
#   'ROOT':           {'key': [1, (2.0,), {3}, frozenset({4}), {'subkey': [(1,)]}]}, 
#   "ROOT['key'][4]": {'subkey': [(1,)]}
# }

3. Overwriting elements

To overwrite elements use overwrite_elements:

from spelunk import overwrite_elements

obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
overwrite_elements(
    root_obj=obj, 
    overwrite_value=None, 
    element_test=lambda x: isinstance(x, tuple)
)
print(obj)

# {'key': [1, None, {3}, frozenset({4}), {'subkey': [None]}]}

Overwriting fails when a matching element sits inside an immutable container, such as a tuple or frozenset.

Ex:

obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
overwrite_elements(
    root_obj=obj, 
    overwrite_value=None, 
    element_test=lambda x: isinstance(x, int)
)
print(obj)

# Failed to overwrite [(<Address.MUTABLE_MAPPING_KEY: 'MutableMappingKey'>, ...
# Exception: Cannot overwrite immutable collections.
# Traceback (most recent call last):
# ...
# TypeError: Cannot overwrite immutable collections.

Error messages can be silenced with silent=True, and exceptions can be suppressed with raise_on_exception=False.

obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
overwrite_elements(
    root_obj=obj, 
    overwrite_value=None, 
    element_test=lambda x: isinstance(x, int),
    silent=True,
    raise_on_exception=False
)
print(obj)

# {'key': [None, (2.0,), {None}, frozenset({4}), {'subkey': [(1,)]}]}
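
The behaviour above can be approximated with a short recursive sketch. This is an illustration under simplifying assumptions, not spelunk's code: overwrite is a hypothetical function, it only handles dicts, lists, and tuples, and it relies on Python raising TypeError for item assignment on a tuple.

```python
def overwrite(container, overwrite_value, element_test, raise_on_exception=True):
    """Replace matching elements in place, recursing into non-matching ones."""
    if isinstance(container, dict):
        items = list(container.items())
    elif isinstance(container, (list, tuple)):
        items = list(enumerate(container))
    else:
        return  # leaves (numbers, strings, ...) have nothing to descend into
    for key, elem in items:
        if element_test(elem):
            try:
                container[key] = overwrite_value  # TypeError for a tuple
            except TypeError:
                if raise_on_exception:
                    raise
        else:
            overwrite(elem, overwrite_value, element_test, raise_on_exception)


tuples_gone = {'key': [1, (2.0,), {'subkey': [(1,)]}]}
overwrite(tuples_gone, None, lambda x: isinstance(x, tuple))

ints_gone = {'key': [1, (2.0,), {'subkey': [(1,)]}]}
overwrite(ints_gone, None, lambda x: isinstance(x, int), raise_on_exception=False)
print(tuples_gone)
print(ints_gone)
```

In the second call the 1 inside the tuple (1,) cannot be replaced, so it is left untouched while the 1 in the list is overwritten.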

4. Hot swapping

If you need to temporarily overwrite an object's contents with replacement values and then restore the original values, the context manager hot_swap achieves this. As an example, say you have an object that contains threading locks and you want to make a deepcopy to manipulate while preserving the original. The deepcopy fails on the original object because thread locks cannot be pickled. With hot_swap, you can safely overwrite the non-serializable elements with a placeholder, perform the deepcopy, and then restore the original elements.

from spelunk import hot_swap
from _thread import LockType
from threading import Lock
from copy import deepcopy

lock_0 = Lock()
lock_1 = Lock()
obj = {'key': [1, lock_0, {3}, frozenset((4,)), {'subkey': [(1,)]}], 'other_lock': lock_1}

print(obj)
# {
#   'key': [1, <unlocked _thread.lock object at 0x104a7b870>, {3}, frozenset({4}), {'subkey': [(1,)]}], 
#   'other_lock': <unlocked _thread.lock object at 0x104a7b840>
# }

obj_deepcopy = deepcopy(obj)
# Traceback (most recent call last):
# ...
# TypeError: cannot pickle '_thread.lock' object

with hot_swap(root_obj=obj, overwrite_value='lock', element_test=lambda x: isinstance(x, LockType)):
    obj_deepcopy = deepcopy(obj)

print(obj_deepcopy)
# {'key': [1, 'lock', {3}, frozenset({4}), {'subkey': [(1,)]}], 'other_lock': 'lock'}

print(obj)
# {
#   'key': [1, <unlocked _thread.lock object at 0x104a7b870>, {3}, frozenset({4}), {'subkey': [(1,)]}], 
#   'other_lock': <unlocked _thread.lock object at 0x104a7b840>
# }
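
The swap-then-restore pattern can be sketched with contextlib for the simple case of a flat dict. This is a hedged illustration only: swap_values is a hypothetical helper, it handles top-level values rather than nested structures, and it is not how spelunk implements hot_swap.

```python
from contextlib import contextmanager
from copy import deepcopy
from threading import Lock


@contextmanager
def swap_values(mapping, overwrite_value, element_test):
    """Temporarily replace matching top-level values of a dict."""
    originals = {k: v for k, v in mapping.items() if element_test(v)}
    for key in originals:
        mapping[key] = overwrite_value
    try:
        yield mapping
    finally:
        # Restore the originals even if the body raised.
        mapping.update(originals)


lock = Lock()
lock_type = type(lock)
state = {'count': 1, 'lock': lock}

with swap_values(state, 'lock', lambda v: isinstance(v, lock_type)):
    state_copy = deepcopy(state)

print(state_copy)
```

Restoring inside a finally block is what guarantees that the original object comes back intact even when the work inside the with block fails.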

If performing a hot_swap on a root_obj would require mutating an immutable collection, an exception is raised before any modifications occur (even legal ones), so root_obj is left unchanged. Additionally, by default, an exception is raised before any attempt to hot swap an element of a mutable set, because the swap cannot be reversed reliably. Consider swapping every int for None in {1, 2, 3, None}: the result collapses to {None}, and it is then ambiguous which elements of the new set should be restored. If you know the swap can be performed safely, you can enable it with the flag allow_mutable_set_mutations. For example, the set {1} can be safely hot swapped to {None} and restored because its cardinality is unchanged.
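
The cardinality argument can be checked directly. The sketch below is an illustration of the reasoning only: swap_preserves_cardinality is a hypothetical helper, and whether spelunk performs any such check internally is not stated in this document.

```python
def swap_preserves_cardinality(members, overwrite_value, element_test):
    """Return True if swapping matching elements keeps the set the same size."""
    swapped = {overwrite_value if element_test(e) else e for e in members}
    return len(swapped) == len(members)


def is_int(element):
    return isinstance(element, int)


ambiguous = swap_preserves_cardinality({1, 2, 3, None}, None, is_int)
safe = swap_preserves_cardinality({1}, None, is_int)
print(ambiguous, safe)
```

When the sizes differ, several original elements have collapsed into one replacement, so there is no way to tell which of them the replacement stands for.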

Details

__slots__

spelunk fully supports objects that define __slots__ (including those that define __dict__ as well). For each object that is neither an ignored type nor an instance of a Collection, the object's MRO is looked up and each class in it is queried for the contents of its __slots__, so that slots from inherited classes are captured. These attributes are collected together with the contents of the instance's obj.__dict__. Note that although we search __slots__ for attribute names, we do not include the __slots__ attribute itself in the exploration, because it is a class attribute, not an instance attribute. This changes if we pass a class cls as root_obj. In that case, cls.__dict__ contains all of the attached methods and class attributes (including __slots__ and its contents). We also never inherit __slots__ contents from parent classes here, because for any class cls (absent a custom metaclass), cls.__class__ is type and type.__mro__ is (<class 'type'>, <class 'object'>), and neither type nor object defines __slots__.

Ex:

from spelunk import print_obj_tree

class A:
    important = "important"
    __slots__ = '__dict__', 'val'
    def __init__(self, val):
        self.val = val
        self.other = 'other'

print_obj_tree(A(1))
# ROOT -> <__main__.A object at 0x10a3dcdc0>
# ROOT.other -> 'other'
# ROOT.__dict__ -> {'other': 'other'}
# ROOT.__dict__['other'] -> 'other'
# ROOT.val -> 1
# ...

We can see that both the contents of __slots__ (which contains __dict__) and the __dict__ attributes are captured, but the class attribute important is not. However, the class itself can be inspected:

print_obj_tree(A)
# ROOT -> <class '__main__.A'>
# ROOT.__module__ -> '__main__'
# ROOT.important -> 'important'
# ROOT.__slots__ -> ('__dict__', ...)
# ROOT.__slots__[0] -> '__dict__'
# ROOT.__slots__[1] -> 'val'
# ...
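
The MRO lookup described in this section can be sketched as follows. This is a minimal approximation rather than spelunk's code: collect_instance_attrs is a hypothetical name, the subclass B is added here purely to show inherited slots, and unset slots are simply skipped.

```python
def collect_instance_attrs(obj):
    """Gather slot values from every class in the MRO, plus obj.__dict__."""
    attrs = {}
    for cls in type(obj).__mro__:
        # Read the class's own __slots__, not one inherited from a parent.
        slots = cls.__dict__.get("__slots__", ())
        if isinstance(slots, str):
            slots = (slots,)  # a bare string declares a single slot
        for name in slots:
            if hasattr(obj, name):  # a declared slot may still be unset
                attrs[name] = getattr(obj, name)
    attrs.update(getattr(obj, "__dict__", {}))
    return attrs


class A:
    important = "important"
    __slots__ = ("__dict__", "val")

    def __init__(self, val):
        self.val = val
        self.other = "other"


class B(A):
    __slots__ = ("extra",)

    def __init__(self, val, extra):
        super().__init__(val)
        self.extra = extra


attrs = collect_instance_attrs(B(1, 2))
print(sorted(attrs))
```

The class attribute important does not appear in the result because it lives on the class, not in any slot or in the instance's __dict__.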

Memoization

spelunk memoizes by caching previously seen objects in a dictionary during searches. It will not print new paths for objects that refer to the same place in memory. This matters not only for speed but also for preventing infinite recursion. There is one important class of exceptions. In CPython, certain objects always share the same memory location (e.g. small integers and interned strings) regardless of how they are created. As a conservative approach, instances of (Number, str, ByteString) are never cached, so that each such object's path is recorded.
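
A minimal sketch of id-based memoization with this exception might look like the following. It is an illustration under assumptions, not spelunk's implementation: walk_once is a hypothetical name, only dicts, lists, and tuples are descended into, and (bytes, bytearray) stands in for ByteString.

```python
from numbers import Number

# Types that are never cached, so every occurrence gets its own path.
UNCACHED = (Number, str, bytes, bytearray)


def walk_once(obj, path="ROOT", seen=None):
    """Return the paths visited, skipping objects already seen by id."""
    seen = {} if seen is None else seen
    if not isinstance(obj, UNCACHED):
        if id(obj) in seen:
            return []  # already explored; also breaks reference cycles
        seen[id(obj)] = path
    paths = [path]
    if isinstance(obj, dict):
        for key, value in obj.items():
            paths += walk_once(value, f"{path}[{key!r}]", seen)
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            paths += walk_once(value, f"{path}[{index}]", seen)
    return paths


cyclic = [1, 1]
cyclic.append(cyclic)  # the list now contains itself
paths = walk_once(cyclic)
print(paths)
```

Both occurrences of 1 are reported even though CPython stores them at the same address, while the self-reference at index 2 is skipped instead of recursing forever.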

Ignored Collections

spelunk intentionally ignores Collections that are instances of (str, ByteString). This prevents string-like objects from being broken down character by character, which is usually not the desired behavior.
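
The check could be expressed as below. This is a sketch with a hypothetical helper name (is_explorable), and (bytes, bytearray) is used in place of ByteString.

```python
from collections.abc import Collection

# String-like types that are Collections but should be treated as leaves.
IGNORED = (str, bytes, bytearray)


def is_explorable(obj):
    """Return True for collections whose elements should be visited."""
    return isinstance(obj, Collection) and not isinstance(obj, IGNORED)


print(is_explorable([1, 2]), is_explorable("abc"), is_explorable(b"abc"))
```

Beyond readability, skipping strings also avoids a trap: a one-character string contains itself as its only element ("a"[0] == "a"), so naive recursion into strings would never terminate.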

Installation

If you prefer using pyenv and Poetry (or have no preference), the Makefile provides installation support. Make sure conda is fully deactivated (not even the base environment is active) and that pyenv is not running a shell.

  1. Run make install-python to install pyenv (if not present) and then use pyenv to install the specific version of python.
  2. Run make install-poetry to install Poetry if not already present.
  3. Run make install-repo to create a virtual environment spelunk stored in spelunk/.venv and use Poetry to install all dependencies.
  4. To use the environment simply run source .venv/bin/activate.
  5. To deactivate simply run deactivate.

If you have a different package management system:

  1. Create a virtual environment.
  2. Either install using Poetry or use external tools to convert poetry.lock to a requirements.txt and pip install.

Developing

For contributors, kindly use the Makefile to perform formatting, linting, and unit testing locally.

  1. Run make style-check to dry-run black formatting changes.
  2. Run make format to format with black.
  3. Run make lint to lint with flake8.
  4. Run make unit-test to run pytest and check the coverage report.
