Package with helpful object recursion utils
spelunk
spelunk is a module containing tools for recursively exploring Python objects.
Installation
spelunk can be installed with pip install spelunk. See below for details on how to install the project for development.
Quick Use Guide
1. Printing an object's tree
Ex:
from spelunk import print_obj_tree
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
print_obj_tree(root_obj=obj)
# ROOT -> {'key': [1, ...]}
# ROOT['key'] -> [1, ...]
# ROOT['key'][0] -> 1
# ROOT['key'][1] -> (2.0,)
# ROOT['key'][1][0] -> 2.0
# ROOT['key'][2] -> {3}
# ROOT['key'][2]{id=4431022448} -> 3
# ROOT['key'][3] -> frozenset({4})
# ROOT['key'][3]{id=4431022480} -> 4
# ROOT['key'][4] -> {'subkey': [(1,)]}
# ROOT['key'][4]['subkey'] -> [(1,)]
# ROOT['key'][4]['subkey'][0] -> (1,)
# ROOT['key'][4]['subkey'][0][0] -> 1
- The root object is referred to as ROOT.
- Attributes are denoted with ROOT.attr.
- Keys from mappings are denoted with ROOT['key'].
- Indices from sequences are denoted with ROOT[idx].
- Elements of sets and frozensets are indicated by their id in memory with ROOT{id=10012}.
- Elements of a ValuesView are indicated by their id in memory with ROOT{ValuesView_id=10012}. (These are not common.)
The previous notations will be recursively chained together. For example, the path
ROOT['key'][2] indicates that in order to access the corresponding object {3}, we would
use root_obj['key'][2]. For sets it is a bit more difficult due to the need to inspect by id. To
access 4 via ROOT['key'][3]{id=4431022480} we would iterate through root_obj['key'][3] until we found a
matching id:
for elem in root_obj['key'][3]:
    if id(elem) == 4431022480:
        break
print(elem)
# 4
Fortunately, for getting references and manipulating elements of root_obj, there are additional tools that
avoid needing to tediously address and iterate (see below).
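If you do need to resolve an id-based path component by hand, the loop above can be wrapped in a small helper. This is a minimal sketch in plain Python; find_by_id is a hypothetical name used for illustration and is not part of spelunk.

```python
def find_by_id(collection, target_id):
    """Return the element of `collection` whose id() equals `target_id`."""
    for elem in collection:
        if id(elem) == target_id:
            return elem
    raise LookupError(f"no element with id {target_id}")


root_obj = {'key': [1, (2.0,), {3}, frozenset((4,))]}
members = root_obj['key'][3]

# Ids differ between runs, so take one from the live object
# instead of hard-coding a number copied from earlier output.
target_id = id(next(iter(members)))
found = find_by_id(members, target_id)
print(found)
# 4
```

Because ids are only meaningful within a single interpreter session, an id copied from a previous run will not match anything.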
Before moving on, it's worth pointing out that you can also filter by element and/or by path by supplying
callables element_test and path_test that determine whether an element or path is interesting
(by default they always return True). element_test operates on the element itself and returns a bool.
path_test operates on the most recent component of the current path, either a string (for attributes and mapping keys)
or an integer (for sequence indices and memory ids of set elements), and returns a bool.
For example, at root_obj['key'] with path ROOT['key'], the string 'key' is passed to path_test
and the list [1, (2.0,), ...] is passed to element_test.
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
print_obj_tree(root_obj=obj, element_test=lambda x: isinstance(x, float))
# ROOT['key'][1][0] -> 2.0
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
print_obj_tree(root_obj=obj, path_test=lambda x: x=='subkey')
# ROOT['key'][4]['subkey'] -> [(1,)]
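To make the roles of the two callables concrete, here is a minimal sketch of a recursive walker that applies them. It is an illustration only, not spelunk's implementation: walk and filtered are hypothetical names, attributes are not explored, and there is no memoization, so self-referential objects are not handled.

```python
from collections.abc import Mapping, Sequence, Set


def walk(obj, path="ROOT", key=None):
    """Yield (path, last_path_component, element) for obj and its contents."""
    yield path, key, obj
    if isinstance(obj, (str, bytes, bytearray)):
        return  # do not break strings down into characters
    if isinstance(obj, Mapping):
        for k, v in obj.items():
            yield from walk(v, f"{path}[{k!r}]", k)
    elif isinstance(obj, Sequence):
        for i, v in enumerate(obj):
            yield from walk(v, f"{path}[{i}]", i)
    elif isinstance(obj, Set):
        for v in obj:
            yield from walk(v, f"{path}{{id={id(v)}}}", id(v))


def filtered(obj, element_test=lambda e: True, path_test=lambda k: True):
    """Keep entries whose element and last path component both pass."""
    return {
        p: e for p, k, e in walk(obj)
        if element_test(e) and path_test(k)
    }


obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
print(filtered(obj, element_test=lambda x: isinstance(x, float)))
# {"ROOT['key'][1][0]": 2.0}
print(filtered(obj, path_test=lambda x: x == 'subkey'))
# {"ROOT['key'][4]['subkey']": [(1,)]}
```

In this sketch the root has no path component, so None is passed to path_test for it; how spelunk treats the root in that case is not specified here.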
2. Getting the values and paths of objects
To get a dictionary of objects filtered by element/path and keyed by full path string, use get_elements:
from spelunk import get_elements
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
get_elements(root_obj=obj, element_test=lambda x: isinstance(x, frozenset))
# {"ROOT['key'][3]": frozenset({4})}
get_elements(root_obj=obj, element_test=lambda x: isinstance(x, dict))
# {
# 'ROOT': {'key': [1, (2.0,), {3}, frozenset({4}), {'subkey': [(1,)]}]},
# "ROOT['key'][4]": {'subkey': [(1,)]}
# }
3. Overwriting elements
To overwrite elements use overwrite_elements:
from spelunk import overwrite_elements
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
overwrite_elements(
    root_obj=obj,
    overwrite_value=None,
    element_test=lambda x: isinstance(x, tuple)
)
print(obj)
# {'key': [1, None, {3}, frozenset({4}), {'subkey': [None]}]}
Overwriting fails if a matching element is contained in an immutable container such as a tuple or frozenset.
Ex:
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
overwrite_elements(
    root_obj=obj,
    overwrite_value=None,
    element_test=lambda x: isinstance(x, int)
)
print(obj)
# Failed to overwrite [(<Address.MUTABLE_MAPPING_KEY: 'MutableMappingKey'>, ...
# Exception: Cannot overwrite immutable collections.
# Traceback (most recent call last):
# ...
# TypeError: Cannot overwrite immutable collections.
Error messages can be silenced with silent=True, and exceptions can be suppressed with
raise_on_exception=False.
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
overwrite_elements(
    root_obj=obj,
    overwrite_value=None,
    element_test=lambda x: isinstance(x, int),
    silent=True,
    raise_on_exception=False
)
print(obj)
# {'key': [None, (2.0,), {None}, frozenset({4}), {'subkey': [(1,)]}]}
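The reason the tuple is left untouched in the output above comes from Python itself: immutable sequences reject item assignment. The following sketch shows that behavior in plain Python; try_overwrite is a hypothetical helper for illustration and says nothing about how spelunk performs the overwrite internally.

```python
def try_overwrite(container, key, value):
    """Attempt container[key] = value and report whether it succeeded."""
    try:
        container[key] = value
    except TypeError:
        # Tuples and other immutable sequences do not support item assignment.
        return False
    return True


mutable = [1, 2]
immutable = (1, 2)

print(try_overwrite(mutable, 0, None), mutable)
# True [None, 2]
print(try_overwrite(immutable, 0, None), immutable)
# False (1, 2)
```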
4. Hot swapping
If you need to temporarily overwrite an object's contents with replacement
values and then restore the original values, the context manager hot_swap achieves this.
As an example, say you had an object that contained threading locks and you wanted to make a deepcopy to
manipulate while preserving the original. The deepcopy will fail on the original object because
thread locks cannot be pickled. With hot_swap, you can safely overwrite the non-serializable elements
with something safe, perform the deepcopy, and then restore the original elements.
from spelunk import hot_swap
from _thread import LockType
from threading import Lock
from copy import deepcopy
lock_0 = Lock()
lock_1 = Lock()
obj = {'key': [1, lock_0, {3}, frozenset((4,)), {'subkey': [(1,)]}], 'other_lock': lock_1}
print(obj)
# {
# 'key': [1, <unlocked _thread.lock object at 0x104a7b870>, {3}, frozenset({4}), {'subkey': [(1,)]}],
# 'other_lock': <unlocked _thread.lock object at 0x104a7b840>
# }
obj_deepcopy = deepcopy(obj)
# Traceback (most recent call last):
# ...
# TypeError: cannot pickle '_thread.lock' object
with hot_swap(root_obj=obj, overwrite_value='lock', element_test=lambda x: isinstance(x, LockType)):
    obj_deepcopy = deepcopy(obj)
print(obj_deepcopy)
# {'key': [1, 'lock', {3}, frozenset({4}), {'subkey': [(1,)]}], 'other_lock': 'lock'}
print(obj)
# {
# 'key': [1, <unlocked _thread.lock object at 0x104a7b870>, {3}, frozenset({4}), {'subkey': [(1,)]}],
# 'other_lock': <unlocked _thread.lock object at 0x104a7b840>
# }
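The swap-then-restore pattern can be sketched for a flat dictionary with contextlib. This is a simplified illustration under stated assumptions, not spelunk's implementation: swap_values is a hypothetical name, and it only handles top-level values of a single mutable mapping, whereas hot_swap works recursively.

```python
from contextlib import contextmanager
from copy import deepcopy
from threading import Lock


@contextmanager
def swap_values(mapping, replacement, element_test):
    """Temporarily replace matching values, restoring them on exit."""
    originals = {k: v for k, v in mapping.items() if element_test(v)}
    for k in originals:
        mapping[k] = replacement
    try:
        yield mapping
    finally:
        # Runs even if the body raises, so the originals always come back.
        mapping.update(originals)


lock = Lock()
LockType = type(lock)
config = {"name": "worker", "lock": lock}

with swap_values(config, "lock", lambda v: isinstance(v, LockType)):
    snapshot = deepcopy(config)

print(snapshot)
# {'name': 'worker', 'lock': 'lock'}
print(config["lock"] is lock)
# True
```

Restoring inside a finally clause is the important design choice: the original object is put back even when the wrapped operation fails.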
More Details
__slots__
spelunk fully supports objects that define __slots__ (including those that also define __dict__). For each
object that isn't an ignored type or an instance of a Collection, the object's MRO is looked up and
each parent class is queried for possible contents of __slots__ in order to capture those from inherited classes.
These attributes are collected together (along with the contents of the instance's obj.__dict__). Note that
although we search __slots__ (a class attribute) for attribute names, we do not include the __slots__ object itself in our exploration
because it is a class attribute, not an instance attribute. This changes if we pass a class cls as root_obj. Here,
cls.__dict__ contains all of the attached methods and class attributes (including __slots__ and the content within).
In this case, we never inherit __slots__ contents from parent classes because for any class cls with the default metaclass, cls.__class__ is type
and type.__mro__ is (<class 'type'>, <class 'object'>). Neither type nor object defines __slots__.
Ex:
from spelunk import print_obj_tree
class A:
    important = "important"
    __slots__ = '__dict__', 'val'
    def __init__(self, val):
        self.val = val
        self.other = 'other'
print_obj_tree(A(1))
# ROOT -> <__main__.A object at 0x10a3dcdc0>
# ROOT.other -> 'other'
# ROOT.__dict__ -> {'other': 'other'}
# ROOT.__dict__['other'] -> 'other'
# ROOT.val -> 1
# ...
We can see that both the contents of __slots__ (which contains __dict__) and the attributes stored in __dict__ are captured, but the
class attribute important is not. However, the class itself can be inspected:
print_obj_tree(A)
# ROOT -> <class '__main__.A'>
# ROOT.__module__ -> '__main__'
# ROOT.important -> 'important'
# ROOT.__slots__ -> ('__dict__', ...)
# ROOT.__slots__[0] -> '__dict__'
# ROOT.__slots__[1] -> 'val'
# ...
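The MRO lookup described in this section can be sketched as follows. This is an illustration of the idea rather than spelunk's code: instance_attribute_names is a hypothetical name, the sketch skips the __dict__ and __weakref__ slot entries themselves, and it ignores name mangling of private slot names.

```python
def instance_attribute_names(obj):
    """Collect instance attribute names from __slots__ along the MRO and from __dict__."""
    names = []
    for klass in type(obj).__mro__:
        # Look in the class's own namespace so inherited __slots__ are not double counted.
        slots = klass.__dict__.get("__slots__", ())
        if isinstance(slots, str):
            slots = (slots,)
        for name in slots:
            if name in ("__dict__", "__weakref__"):
                continue
            # A declared slot may still be unset on this particular instance.
            if hasattr(obj, name) and name not in names:
                names.append(name)
    for name in getattr(obj, "__dict__", {}):
        if name not in names:
            names.append(name)
    return names


class Base:
    __slots__ = ("a",)


class Child(Base):
    __slots__ = ("__dict__", "b")

    def __init__(self):
        self.a, self.b, self.c = 1, 2, 3


print(instance_attribute_names(Child()))
# ['b', 'a', 'c']
```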
Memoization
spelunk memoizes previously seen objects in a dictionary during searches. It will not print new paths for
objects which refer to the same place in memory. This is important not only for speed but also to prevent infinite recursion on self-referential objects. There
is one important class of exceptions. In CPython, certain objects (e.g. small integers and some strings) share the same memory location
regardless of how they're created. To be conservative, instances of (Number, str, ByteString) are never cached,
so that every path to such an object is recorded.
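A minimal sketch of id-based memoization with that exception is shown below. It is an assumption-laden illustration, not spelunk's implementation: collect_paths is a hypothetical name, only dicts, lists and tuples are explored, and bytes and bytearray stand in for ByteString.

```python
from numbers import Number


def collect_paths(obj, path="ROOT", seen=None):
    """Return paths to every object, visiting each container only once."""
    seen = set() if seen is None else seen
    uncached = isinstance(obj, (Number, str, bytes, bytearray))
    if not uncached:
        if id(obj) in seen:
            return []  # already explored; also stops cycles
        seen.add(id(obj))
    paths = [path]
    if isinstance(obj, dict):
        for key, value in obj.items():
            paths += collect_paths(value, f"{path}[{key!r}]", seen)
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            paths += collect_paths(value, f"{path}[{index}]", seen)
    return paths


shared = [7]
cyclic = [7, shared, shared]
cyclic.append(cyclic)  # the list now contains itself

paths = collect_paths(cyclic)
print(paths)
# ['ROOT', 'ROOT[0]', 'ROOT[1]', 'ROOT[1][0]']
```

The integer 7 is reported at both ROOT[0] and ROOT[1][0] even though it is the same object, while the second reference to shared and the self-reference are skipped.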
Ignored Collections
spelunk intentionally ignores Collections that are instances of (str, ByteString). This prevents string-like objects from being broken down into individual characters, which is usually not the desired behavior.
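The check is needed because strings are themselves Collections. A small sketch of such a guard follows; is_explorable is a hypothetical name, and bytes and bytearray are used here as the concrete byte-string types.

```python
from collections.abc import Collection


def is_explorable(obj):
    """True for collections that should be descended into."""
    if isinstance(obj, (str, bytes, bytearray)):
        return False  # string-like: treat as a single leaf value
    return isinstance(obj, Collection)


print(isinstance("abc", Collection))
# True
print(is_explorable("abc"), is_explorable(["abc"]))
# False True
```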
Hot Swapping Safety
If performing a hot_swap on a root_obj would involve attempting to mutate an immutable collection, an exception
is raised before any modifications occur (even legal mutations), leaving root_obj unchanged.
Additionally, by default, an exception is raised before any attempt to hot swap an element of a mutable set because
this cannot be performed reliably. Imagine swapping every int for None in {1, 2, 3, None}: the result is {None}, and it is then ambiguous which
elements of the new set should be restored. If you know the swap can be performed safely, you can use the flag
allow_mutable_set_mutations. For example, the set {1} could be safely hot swapped to {None} and restored because the cardinality is unchanged.
Developing
Project Installation
If you prefer using pyenv and Poetry (or have no preference), the Makefile provides installation support. Make sure conda is fully deactivated (not even base active) and pyenv is not running a shell.
- Run make install-python to install pyenv (if not present) and then use pyenv to install the specific version of python.
- Run make install-poetry to install Poetry if not already present.
- Run make install-repo to create a virtual environment spelunk stored in spelunk/.venv and use Poetry to install all dependencies.
- To use the environment simply run source .venv/bin/activate.
- To deactivate simply run deactivate.
If you have a different package management system:
- Create a virtual environment.
- Either install using Poetry or use external tools to convert poetry.lock to a requirements.txt and pip install.
Tests
For contributors, kindly use the Makefile to perform formatting, linting, and unit testing locally.
- Run make style-check to dry-run black formatting changes.
- Run make format to format with black.
- Run make lint to lint with flake8.
- Run make unit-test to run pytest and check the coverage report.