Traverse a Python object's subtree and calculate the total size of the subtree in bytes (deep size).

objsize

The objsize Python package allows for the exploration and measurement of an object’s complete memory usage in bytes, including its child objects. This process, often referred to as deep size calculation, is achieved through Python’s internal Garbage Collection (GC) mechanism.

The objsize package is designed to ignore shared objects, such as None, types, modules, classes, functions, and lambdas, because they are shared across many instances. One of the key performance features of objsize is that it avoids recursive calls, ensuring a faster and safer execution.
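
To make the non-recursive idea concrete, here is a minimal standard-library sketch of a deep-size walk. It is not objsize's implementation: `deep_size_sketch` and `SHARED_TYPES` are hypothetical names, and the list of "shared" types is only a rough assumption.

```python
import gc
import sys
import types
from collections import deque

# Assumption: a rough stand-in for "shared" objects, not objsize's actual list.
SHARED_TYPES = (type, types.ModuleType, types.FunctionType, types.BuiltinFunctionType)


def deep_size_sketch(*roots):
    """Sum sys.getsizeof over everything reachable from roots, counting each object once."""
    seen = set()
    queue = deque()
    for root in roots:
        if id(root) not in seen:
            seen.add(id(root))
            queue.append(root)

    total = 0
    while queue:  # explicit queue instead of recursive calls
        obj = queue.popleft()
        total += sys.getsizeof(obj)
        # The GC reports the objects that obj directly refers to.
        for child in gc.get_referents(obj):
            if id(child) in seen or child is None or isinstance(child, SHARED_TYPES):
                continue
            seen.add(id(child))
            queue.append(child)
    return total


loop = [1, 2]
loop.append(loop)  # a self reference does not cause an endless walk
loop_size = deep_size_sketch(loop)
```

The `seen` set keyed by `id()` is what keeps cycles and repeated references from being counted twice.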

Key Features

  • Traverse an object’s subtree
  • Calculate the size of objects, including nested objects (deep size), in bytes
  • Exclude non-exclusive objects
  • Exclude a specified object’s subtree
  • Define custom handlers for:
    • Object’s size calculation
    • Object’s referents (i.e., its children)
    • Object filter (skip specific objects)

Documentation

objsize: Traverse a Python object's subtree and calculate the total size of the subtree (deep size).

Install

pip install objsize==0.7.0

Basic Usage

Calculate the size of the object including all its members in bytes.

>>> import objsize
>>> objsize.get_deep_size(dict(arg1='hello', arg2='world'))
340

It is possible to calculate the deep size of multiple objects by passing multiple arguments:

>>> objsize.get_deep_size(['hello', 'world'], dict(arg1='hello', arg2='world'), {'hello', 'world'})
628

Complex Data

objsize can calculate the size of an object’s entire subtree in bytes, regardless of the types of the objects in it or its depth.

Here is a complex data structure, for example, that includes a self reference:

my_data = list(range(3)), list(range(3, 6))

class MyClass:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.d = {'x': x, 'y': y, 'self': self}

    def __repr__(self):
        return f"{self.__class__.__name__}()"

my_obj = MyClass(*my_data)

We can calculate the deep size of my_obj, including its stored data.

>>> objsize.get_deep_size(my_obj)
724

We might want to ignore non-exclusive objects such as the ones stored in my_data.

>>> objsize.get_deep_size(my_obj, exclude=[my_data])
384

Or simply let objsize detect that automatically:

>>> objsize.get_exclusive_deep_size(my_obj)
384

Non Shared Functions or Classes

objsize filters functions, lambdas, and classes by default since they are usually shared among many objects. For example:

>>> method_dict = {"identity": lambda x: x, "double": lambda x: x*2}
>>> objsize.get_deep_size(method_dict)
232

Some objects, however, such as method_dict above, hold functions that no other object shares, so it can be useful to count their sizes. You can achieve this by providing an alternative filter function.

>>> objsize.get_deep_size(method_dict, filter_func=objsize.shared_object_filter)
986

Notes:

Special Cases

Some objects handle their data in a way that prevents Python’s GC from detecting it. The user can supply a special way to calculate the actual size of these objects.
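
A minimal standard-library illustration of this effect, assuming CPython and using an anonymous `mmap` as a stand-in for data held outside the object itself. `get_size_with_mmap` is a hypothetical handler; a function of this shape could be passed through the `get_size_func` parameter.

```python
import mmap
import sys


def get_size_with_mmap(o):
    """Size handler sketch: add the mapped length for mmap objects."""
    if isinstance(o, mmap.mmap):
        return sys.getsizeof(o) + len(o)
    return sys.getsizeof(o)


buf = mmap.mmap(-1, 1024 * 1024)  # anonymous 1 MiB mapping
shallow = sys.getsizeof(buf)      # only the small wrapper object is reported
adjusted = get_size_with_mmap(buf)
buf.close()
```

The handler falls back to `sys.getsizeof` for every other type, so it can be applied to each object in a mixed subtree.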

Case 1: torch

Using a simple calculation of the object size won’t work for torch.Tensor.

>>> import torch
>>> objsize.get_deep_size(torch.rand(200))
72

So users can define their own size calculation handler for such cases:

import objsize
import sys
import torch

def get_size_of_torch(o):
    # `objsize.safe_is_instance` catches `ReferenceError` caused by `weakref` objects
    if objsize.safe_is_instance(o, torch.Tensor):
        return sys.getsizeof(o) + (o.element_size() * o.nelement())
    else:
        return sys.getsizeof(o)

Then use it as follows:

>>> objsize.get_deep_size(
...   torch.rand(200),
...   get_size_func=get_size_of_torch
... )
872

The above approach may neglect the object’s internal structure. Users can help objsize find the object’s hidden storage by supplying their own referent and filter functions:

import objsize
import gc
import torch

def get_referents_torch(*objs):
    # Yield all native referents
    yield from gc.get_referents(*objs)
    for o in objs:
        # If the object is a torch tensor, then also yield its storage
        if type(o) is torch.Tensor:
            yield o.untyped_storage()

# `torch.dtype` is a common object like Python's types.
MySharedObjects = (*objsize.SharedObjectOrFunctionType, torch.dtype)

def filter_func(o):
    return not objsize.safe_is_instance(o, MySharedObjects)

Then use these as follows:

>>> objsize.get_deep_size(
...   torch.rand(200),
...   get_referents_func=get_referents_torch,
...   filter_func=filter_func
... )
928

Case 2: weakref

Using a simple calculation of the object size won’t work for weakref.proxy.

>>> from collections import UserList
>>> o = UserList([0]*100)
>>> objsize.get_deep_size(o)
1032
>>> import weakref
>>> o_ref = weakref.proxy(o)
>>> objsize.get_deep_size(o_ref)
72
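
The small result appears to come from the proxy not exposing its target as a GC referent. The following standard-library check, which assumes CPython behaviour, shows that `gc.get_referents` does not return the target, while attribute access through the proxy can still reach it. `proxy_target` is a hypothetical helper.

```python
import gc
import weakref
from collections import UserList


def proxy_target(p):
    """Return the object behind a weakref proxy, or None if there is none."""
    if type(p) not in weakref.ProxyTypes:
        return None
    try:
        # Attribute access is forwarded to the referent, so the bound
        # method's __self__ is the real object.
        return p.__repr__.__self__
    except ReferenceError:
        # The referent has already been collected.
        return None


target = UserList([0] * 100)
proxy = weakref.proxy(target)

seen_by_gc = any(r is target for r in gc.get_referents(proxy))
recovered = proxy_target(proxy)
```
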

To mitigate this, you can provide a method that attempts to fetch the proxy’s referents:

import weakref
import gc

def get_weakref_referents(*objs):
    yield from gc.get_referents(*objs)
    for o in objs:
        if type(o) in weakref.ProxyTypes:
            try:
                yield o.__repr__.__self__
            except ReferenceError:
                pass

Then use it as follows:

>>> objsize.get_deep_size(o_ref, get_referents_func=get_weakref_referents)
1104

Once the referenced object is collected, the size reported for the proxy object drops.

>>> del o
>>> gc.collect()
>>> # Wait for the object to be collected
>>> objsize.get_deep_size(o_ref, get_referents_func=get_weakref_referents)
72

Object Size Settings

To avoid repeating the input settings when handling the special cases above, you can use the ObjSizeSettings class.

>>> torch_objsize = objsize.ObjSizeSettings(
...   get_referents_func=get_referents_torch,
...   filter_func=filter_func,
... )
>>> torch_objsize.get_deep_size(torch.rand(200))
928
>>> torch_objsize.get_deep_size(torch.rand(300))
1328

See ObjSizeSettings for the list of configurable parameters.

Traversal

Users can implement their own function over the entire subtree using the traversal method, which traverses all the objects in the subtree.

>>> for o in objsize.traverse_bfs(my_obj):
...     print(o)
...
MyClass()
{'x': [0, 1, 2], 'y': [3, 4, 5], 'd': {'x': [0, 1, 2], 'y': [3, 4, 5], 'self': MyClass()}}
[0, 1, 2]
[3, 4, 5]
{'x': [0, 1, 2], 'y': [3, 4, 5], 'self': MyClass()}
2
1
0
5
4
3

Similar to before, non-exclusive objects can be ignored.

>>> for o in objsize.traverse_exclusive_bfs(my_obj):
...     print(o)
...
MyClass()
{'x': [0, 1, 2], 'y': [3, 4, 5], 'd': {'x': [0, 1, 2], 'y': [3, 4, 5], 'self': MyClass()}}
{'x': [0, 1, 2], 'y': [3, 4, 5], 'self': MyClass()}
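
As an example of a user-defined function over a traversal, the sketch below totals shallow sizes per type name. `size_by_type` is a hypothetical helper that accepts any iterable of objects, so the result of `objsize.traverse_bfs(my_obj)` could be passed in directly. A plain list is used here to keep the block standard-library only.

```python
import sys
from collections import Counter


def size_by_type(objects):
    """Total shallow size per type name over any iterable of objects."""
    totals = Counter()
    for o in objects:
        totals[type(o).__name__] += sys.getsizeof(o)
    return totals


# With objsize installed, the iterable could be objsize.traverse_bfs(my_obj).
sample = [[0, 1, 2], [3, 4, 5], {"k": "v"}, 7]
report = size_by_type(sample)
```
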

Alternative

Pympler also supports determining an object’s deep size via pympler.asizeof(). There are two main differences between objsize and Pympler.

  1. objsize has additional features:
    • Traversing the object subtree: iterating all the object’s descendants one by one.
    • Excluding non-exclusive objects. That is, objects that are also referenced from somewhere else in the program. This is true for calculating the object’s deep size and for traversing its descendants.
  2. objsize has a simple and robust implementation with significantly fewer lines of code than Pympler. The Pympler implementation uses recursion, and thus has to use a maximum-depth argument to avoid reaching Python’s recursion limit. objsize, however, uses BFS, which is more efficient and simpler to follow. Moreover, the Pympler implementation carefully handles every object type, while objsize achieves the same goal with a simple and generic implementation.
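
The recursion concern can be reproduced with a small standard-library experiment. This is only a sketch of the general trade-off, not code from either library: `count_recursive` and `count_iterative` are hypothetical helpers, and the default recursion limit (usually 1000) is assumed.

```python
import gc
from collections import deque


def count_recursive(obj):
    """Count reachable objects using one Python call per nesting level."""
    total = 1
    for child in gc.get_referents(obj):
        total += count_recursive(child)
    return total


def count_iterative(obj):
    """Same count with an explicit queue, so depth does not grow the call stack."""
    seen = {id(obj)}
    queue = deque([obj])
    count = 0
    while queue:
        current = queue.popleft()
        count += 1
        for child in gc.get_referents(current):
            if id(child) not in seen:
                seen.add(id(child))
                queue.append(child)
    return count


# Build a list nested 10,000 levels deep.
deep = []
for _ in range(10_000):
    deep = [deep]

try:
    count_recursive(deep)
    recursion_failed = False
except RecursionError:
    recursion_failed = True

nodes = count_iterative(deep)  # 10,000 wrappers plus the innermost list
```

The recursive version needs a depth cap or a raised recursion limit for such input, while the queue-based version is bounded only by available memory.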

License: BSD-3

Copyright (c) 2006-2023, Liran Funaro. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
