Package with helpful object recursion utils
Project description
Spelunk
Spelunk is a module containing tools for recursively exploring and manipulating python objects.
Installation
To install spelunk, simply install with pip install spelunk. See below for details on how to install
the project for development.
Overview
This section will review the major utilities of spelunk.
1. Printing an object's tree
from spelunk import print_obj_tree
class A:
def __init__(self):
self.val = 'val'
def __repr__(self):
return f'A(val={self.val})'
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,), A()]}]}
print_obj_tree(root_obj=obj)
# ROOT -> {'key': [1, (2.0,), ...]}
# ROOT['key'] -> [1, (2.0,), ...]
# ROOT['key'][0] -> 1
# ROOT['key'][1] -> (2.0,)
# ROOT['key'][1][0] -> 2.0
# ROOT['key'][2] -> {3}
# ROOT['key'][2]{id=4315240816} -> 3
# ROOT['key'][3] -> frozenset({4})
# ROOT['key'][3]{id=4315240848} -> 4
# ROOT['key'][4] -> {'subkey': [(1,), A(val=val)]}
# ROOT['key'][4]['subkey'] -> [(1,), A(val=val)]
# ROOT['key'][4]['subkey'][0] -> (1,)
# ROOT['key'][4]['subkey'][0][0] -> 1
# ROOT['key'][4]['subkey'][1] -> A(val=val)
# ROOT['key'][4]['subkey'][1].val -> 'val'
- The root object is referred to as
ROOT. - Attributes are denoted with
ROOT.attr. - Keys from mappings are denoted with
ROOT['key']. - Indices from sequences are denoted with
ROOT[idx]. - Elements of sets and frozensets are indicated by their id in memory with
ROOT{id=10012}. - Elements of a
ValuesVieware indicated by their id in memory withROOT{ValuesView_id=10012}.
Note that ValuesView may seem odd here. This is included separately because this is a unique subclass
of Collection not captured by Mapping, Sequence, or Set. In contrast, both KeysView and
ItemsView are subclasses of Set.
The previous notations will be recursively chained together. For example, the path
ROOT['key'][2] indicates that in order to access the corresponding object {3}, we would
use root_obj['key'][2]. For sets it is also possible by iterating and inspecting by id. To
access 4 via ROOT['key'][3]{id=4315240848} we would iterate through root_obj['key'][3] until
we found a matching id:
for elem in root_obj['key'][3]:
if id(elem) == 4315240848:
break
print(elem)
# 4
Fortunately, for accessing and manipulating elements of root_obj, there are additional
tools that avoid needing to tediously address and iterate (see below).
Before moving on, it's worth pointing out you can also select by element and/or by "path name" by
supplying callables element_test and path_test that determine whether an element or path is
interesting (by default they always return True). element_test operates on the element itself and
returns a bool. path_test operates on the most recent branch of the current path and returns a bool. For
example, if you're at root_obj['key'] with path ROOT['key'], it would pass key to the input of
path_test and [1, (2,), ...] to element_test.
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,), A()]}]}
print_obj_tree(root_obj=obj, element_test=lambda x: isinstance(x, float))
# ROOT['key'][1][0] -> 2.0
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,), A()]}]}
print_obj_tree(root_obj=obj, path_test=lambda x: x=='subkey')
# ROOT['key'][4]['subkey'] -> [(1,), A(val=val)]
2. Getting the values and paths of objects
To get a dictionary of objects filtered by element/path and keyed by full path string,
use get_elements:
from spelunk import get_elements
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,), A()]}]}
get_elements(root_obj=obj, element_test=lambda x: isinstance(x, frozenset))
# {"ROOT['key'][3]": frozenset({4})}
get_elements(root_obj=obj, element_test=lambda x: isinstance(x, dict))
# {
# 'ROOT': {'key': [1, (2.0,), {3}, frozenset({4}), {'subkey': [(1,), A(val=val)]}]},
# "ROOT['key'][4]": {'subkey': [(1,), A(val=val)]}
# }
3. Overwriting elements
To overwrite elements use overwrite_elements:
from spelunk import overwrite_elements
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,), A()]}]}
overwrite_elements(
root_obj=obj,
overwrite_value=None,
element_test=lambda x: isinstance(x, tuple)
)
print(obj)
# {'key': [1, None, {3}, frozenset({4}), {'subkey': [None, A(val=val)]}]}
Objects can also be overwritten using a callable overwrite_func. If overwrite_func is not None and
callable, overwrite_value will be ignored.
from spelunk import overwrite_elements
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,), A()]}]}
overwrite_elements(
root_obj=obj,
overwrite_func=str,
element_test=lambda x: isinstance(x, tuple)
)
print(obj)
# {'key': [1, '(2.0,)', {3}, frozenset({4}), {'subkey': ['(1,)', A(val=val)]}]}
Overwriting will fail if attempting to overwrite an immutable container.
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,), A()]}]}
overwrite_elements(
root_obj=obj,
overwrite_value=None,
element_test=lambda x: isinstance(x, int)
)
print(obj)
# Failed to overwrite 4 at ROOT['key'][3]{id=4315240848}.
# Traceback (most recent call last):
# ...
# TypeError: Cannot overwrite immutable collections.
Error messages can be silenced with silent=True and exceptions can be dismissed with
raise_on_exception=False kwargs. Be aware that it may be difficult to determine which objects
failed with these options.
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,), A()]}]}
overwrite_elements(
root_obj=obj,
overwrite_value=None,
element_test=lambda x: isinstance(x, int),
silent=True,
raise_on_exception=False
)
print(obj)
# {'key': [None, (2.0,), {None}, frozenset({4}), {'subkey': [(1,), A(val=val)]}]}
4. Hot swapping
One helpful utility is the ability to safely and reversibly "hot swap" certain elements of an object.
One use-case is writing a non-serializable object to JSON. Say we have some root object root_obj
that needs to be serialized to JSON but some of its constituent elements are not
serializable. It may be tedious to go through and null/convert the
non-serializable content. Furthermore, we may not want to permanently overwrite the non-serializable
content. One tool in spelunk is a context manager hot_swap that can find elements to arbitrary
specification and at any depth in the root object, overwrite their values, and then restore the originals.
from spelunk import hot_swap
import json
from datetime import datetime
from threading import Lock
from _thread import LockType
from typing import Any, Optional, Union
root_obj = {
'date': datetime.now(),
'thread_lock': Lock(),
'data': [1, 2, 3, 4],
'other_locks': [Lock(), Lock()]
}
print(root_obj)
# {
# 'date': datetime.datetime(2022, 11, 9, 13, 48, 19, 969856),
# 'thread_lock': <unlocked _thread.lock object at 0x105ff4600>,
# 'data': [1, 2, 3, 4],
# 'other_locks': [
# <unlocked _thread.lock object at 0x105ff4630>,
# <unlocked _thread.lock object at 0x105ff4690>
# ]
# }
Neither datetime nor _thread.lock objects are serializable.
json.dumps(root_obj)
# Traceback (most recent call last):
# ...
# TypeError: Object of type datetime is not JSON serializable
We can define callables to both capture the bad elements (get_datetime_and_locks) as well as
overwrite intelligently (overwrite_func).
def get_datetime_and_locks(obj: Any) -> bool:
"""Get the non-serializable content"""
return isinstance(obj, (datetime, LockType))
def overwrite_func(obj: Union[datetime, LockType]) -> Optional[str]:
"""Overwrite the non-serializable content"""
if isinstance(obj, datetime):
return str(obj)
else:
return None
Note that overwrite_func will only get called on objects for which get_datetime_and_locks returns
True. Now, we can use our context manager hot_swap to temporarily overwrite the non-serializable
content and then restore on exit.
with hot_swap(root_obj, element_test=get_datetime_and_locks, overwrite_func=overwrite_func):
serialized_obj = json.dumps(root_obj)
print(serialized_obj)
# {
# "date": "2022-11-09 13:48:19.969856",
# "thread_lock": null,
# "data": [1, 2, 3, 4],
# "other_locks": [null, null]
# }
root_obj is restored to its original form, allowing the datetime and thread lock objects to
continue to provide utility with further use.
print(root_obj)
# {
# 'date': datetime.datetime(2022, 11, 9, 13, 48, 19, 969856),
# 'thread_lock': <unlocked _thread.lock object at 0x105ff4600>,
# 'data': [1, 2, 3, 4],
# 'other_locks': [
# <unlocked _thread.lock object at 0x105ff4630>,
# <unlocked _thread.lock object at 0x105ff4690>
# ]
# }
If performing a hot_swap on a root_obj throws an exception, an attempt to restoreroot_obj to
its original form is made. Additionally, by default, it will throw an exception before any attempt
to hot swap an element of a mutable set because this cannot be performed reliably. Imagine swapping
all int for None in {1, 2, 3, None} -> {None}. It is then ambiguous to determine which
elements of the new set should be restored. It would be possible to copy the set {1, 2, 3, None}
and restore this to the parent object, however, this copy would not share the same location in
memory as the original and it may break internal references as a result. By default, hot swapping is
not allowed with mutable sets, however, if you know it can be performed safely you can use the kwarg
allow_mutable_set_mutations=True. For example, the set {1} could be safely hot swapped to
{None} and restored due to the fact that the cardinality is unchanged.
More Details
__slots__ and other class attributes
Spelunk fully support objects that define __slots__, __dict__, as well as __slots__ and __dict__
simultaneously). In order to deal with instance attributes derived from __slots__ defined on the
class itself (or which may be inherited from parent classes), the MRO is used. For each class
in the object's MRO, the contents of cls.__slots__ is collected (along with the contents of obj.__dict__ if
__dict__ is defined). For a given object, its attributes are collected as follows:
attrs = []- If
obj.__dict__exists, add all elements toattrs. - For
clsinobj.__class__.__mro__:- If
cls.__slots__exists, add all elements toattrs.
- If
Note in the special case that both __slots__ and __dict__ are defined (such that __dict__ is
a member of __slots__), __dict__ itself will be independently added as an attribute to attrs
in addition to the contents of __dict__.
Note that any attributes accessible to obj outside of __dict__ (such as attributes of the class),
are not included by spelunk. However, if one wants to inspect class attribute, the class itself can be
passed in as the root_obj. Here, __slots__ as well as all methods and other attributes of the class
will be collected and explored since these are direct attributes of the root object.
Ex:
from spelunk import print_obj_tree
class A:
important = "important"
__slots__ = '__dict__', 'val'
def __init__(self, val):
self.val = val
self.other = 'other'
def __repr__(self):
return f"A(val={self.val})"
print_obj_tree(A(1))
# ROOT -> A(val=1)
# ROOT.other -> 'other'
# ROOT.__dict__ -> {'other': 'other'}
# ROOT.__dict__['other'] -> 'other'
# ROOT.val -> 1
We can see that both the contents of __slots__ (which contains __dict__) and __dict__
are captured but the class attribute important is not. However, the class itself can be
inspected:
print_obj_tree(A)
# ROOT -> <class '__main__.A'>
# ROOT.__module__ -> '__main__'
# ROOT.important -> 'important'
# ROOT.__slots__ -> ('__dict__', 'val')
# ROOT.__slots__[0] -> '__dict__'
# ROOT.__slots__[1] -> 'val'
# ...
Memoization
Spelunk optionally utilizes memoization to increase performance and to prevent reporting multiple
paths which point to the same object in memory. By default, memoization is not used in order to
retrieve and output the full hierarchy of the object. Memoization can be turned on and off with the
kwarg memoization=True. Note that some objects cannot be memoized regardless of whether
memoization is turned on. Namely, any subclass of Number, str, or ByteString (along with
None) will not be memoized due to the fact that members of these classes may be interned and
all instances will always refer to the same singleton in memory in CPython.
String unraveling
Spelunk by default assumes that all subclasses of str or ByteString refer to an atomic
collection that should not be recursed into character by character. If you do want to recurse
into a str or ByteString instance, use the kwarg unravel_strings=True.
Developing
Project Installation
- Install an appropriate version of python and create a virtual environment. Pyenv is recommended.
- Set the shell variable
VENV_LOCaccording to the location of the virtural environment. For example, if the virtual environment is inside the repo home directory in a directory named.venvyou would runexport VENV_LOC=.venv(this is used by default and isn't needed unless the virtual environment is located elsewhere).
- Set the shell variable
- Install Poetry.
- Run
make install-repoto activate the virtual environment and install the dependencies with Poetry.
If you have a different package management system (e.g. conda):
- Create and source/activate a virtual environment.
- Either install using
Poetryor use external tools to convert thepoetry.lockfile to arequirements.txtand install withpip install -r requirements.txt.
Tests
For contributors, kindly use the Makefile to perform formatting, linting, and unit testing
locally.
- Run
make style-checkto dry-runblackformatting changes. - Run
make reformatto format withblack. - Run
make lintto lint withflake8. - Run
make unit-testto runpytestand check the coverage report.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file spelunk-0.1.6.tar.gz.
File metadata
- Download URL: spelunk-0.1.6.tar.gz
- Upload date:
- Size: 15.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.15 CPython/3.7.13 Darwin/21.6.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b488d0b2b1ddc3bd4348207615377a0ac1a4bc0e246c1014e08e6b0c2890a6e4
|
|
| MD5 |
60ae43a35e0863631b842429fa633e16
|
|
| BLAKE2b-256 |
828e0b7fae79b7a9d900e31e558f897064259cc367334b5f047a0b2f78968d15
|
File details
Details for the file spelunk-0.1.6-py3-none-any.whl.
File metadata
- Download URL: spelunk-0.1.6-py3-none-any.whl
- Upload date:
- Size: 10.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.15 CPython/3.7.13 Darwin/21.6.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8013dfd84e819a02ae71763b279bcfeb9e7f12a68e8f2d7e8207cac5f91f5ae8
|
|
| MD5 |
02bc18735bae23456c21167e5810613c
|
|
| BLAKE2b-256 |
27a0239b2525388805d09e56dddd119db39473f0ea8c53e2872e0cd4cbd25c0b
|