Transform nested data structures into Python objects

data2objects

Transform self-documenting config files and data structures into Python objects.

Installation

Run pip install data2objects, or just copy data2objects.py into your project.

Examples

The best way to explain the use of data2objects is via an example. Consider the following config.yaml file:

backbone:
    activation: +torch.nn.SiLU()
    hidden_size: 1024
readout:
    +torch.nn.Linear:
        in_features: =/backbone/hidden_size
        out_features: 1

Parsing this file using data2objects.from_yaml returns the following:

>>> import data2objects
>>> config = data2objects.from_yaml("config.yaml")
>>> print(config)
{'backbone': {'activation': SiLU(), 'hidden_size': 1024}, 
 'readout': Linear(in_features=1024, out_features=1, bias=True)}

Under the hood, data2objects has done the following:

  • identified any "reference strings" prefixed by "=" and replaced them with the corresponding values in the nested data structure
    • hence =/backbone/hidden_size was replaced with 1024.
  • identified any "object instantiation strings" prefixed by "+", imported the corresponding objects from the provided modules and:
    • called the object if the instantiation string ends with "()", e.g. "+torch.nn.SiLU()" created a SiLU object.
    • called the object with keyword arguments if the instantiation string is the single key of a mapping, e.g. "+torch.nn.Linear: {in_features: =/backbone/hidden_size, out_features: 1}" created a Linear object with in_features=1024 and out_features=1.
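
The two passes described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's implementation: the helper names import_dotted, resolve_refs and instantiate are hypothetical, only absolute "=/..." references and fully qualified "+module.name" strings are handled, keys of multi-key mappings are left untouched, and standard-library classes stand in for the torch ones so the example runs without extra dependencies.

```python
import importlib


def import_dotted(name):
    """Import "package.module.attr" and return the attribute (hypothetical helper)."""
    module_name, _, attr = name.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def resolve_refs(node, root):
    """Pass 1: replace "=/a/b" strings with the value stored at root["a"]["b"]."""
    if isinstance(node, dict):
        return {k: resolve_refs(v, root) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_refs(v, root) for v in node]
    if isinstance(node, str) and node.startswith("=/"):
        target = root
        for part in node[2:].split("/"):
            target = target[part]
        return target
    return node


def instantiate(node):
    """Pass 2: turn "+..." strings and single-key "+..." mappings into objects."""
    if isinstance(node, list):
        return [instantiate(v) for v in node]
    if isinstance(node, str) and node.startswith("+"):
        if node.endswith("()"):
            return import_dotted(node[1:-2])()  # trailing "()": call with no arguments
        return import_dotted(node[1:])  # no parentheses: return the object uncalled
    if isinstance(node, dict):
        if len(node) == 1:
            key, value = next(iter(node.items()))
            if isinstance(key, str) and key.startswith("+"):
                target = import_dotted(key[1:])
                arg = instantiate(value)
                # mapping -> keyword arguments, anything else -> one positional argument
                return target(**arg) if isinstance(arg, dict) else target(arg)
        return {k: instantiate(v) for k, v in node.items()}
    return node


# Same shape as config.yaml above, with stdlib classes in place of torch.nn ones.
config = {
    "backbone": {"counter": "+collections.Counter()", "hidden_size": 1024},
    "readout": {
        "+fractions.Fraction": {
            "numerator": "=/backbone/hidden_size",
            "denominator": 4,
        }
    },
}

result = instantiate(resolve_refs(config, config))
print(result)
# {'backbone': {'counter': Counter(), 'hidden_size': 1024}, 'readout': Fraction(256, 1)}
```

Running the reference pass first means the Fraction call already sees 1024 rather than the string "=/backbone/hidden_size".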

Documentation

data2objects exposes two functions, from_dict and from_yaml, which can be used to transform a nested data structure into a set of instantiated Python objects.

from_yaml

def from_yaml(thing: str | Path, modules: list[object] | None = None) -> dict:

Load a nested dictionary from a YAML file or string, and parse it using data2objects.from_dict.

If thing points to an existing file, the data in that file is loaded. Otherwise, thing is treated as a string containing the raw YAML data.

Parameters

thing: str | Path The yaml file or string to load.

modules: list[object] | None A list of modules to look up non-fully qualified names in.

Returns

dict The transformed data.
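
The file-or-string behaviour described above could look roughly like the following. This is a sketch only: read_yaml_source is a hypothetical helper, not part of data2objects, and it stops at returning the raw text, which would then be handed to a YAML parser.

```python
import tempfile
from pathlib import Path


def read_yaml_source(thing):
    """Return file contents if `thing` names an existing file, else `thing` as text."""
    try:
        path = Path(thing)
        if path.is_file():
            return path.read_text()
    except (OSError, ValueError):
        # Raw YAML can be an invalid file name (too long, odd characters);
        # in that case fall through and treat it as a literal string.
        pass
    return str(thing)


# Case 1: an existing file is read from disk.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.yaml"
    config_path.write_text("hidden_size: 1024\n")
    from_file = read_yaml_source(config_path)

# Case 2: anything else is treated as the YAML text itself.
from_string = read_yaml_source("hidden_size: 1024\n")

print(from_file == from_string)
```

One consequence of this kind of dispatch is that a mistyped file name is silently parsed as a YAML string, so it can be worth checking that the path exists before calling the loader.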


from_dict

def from_dict(
    data: dict[K, V], modules: list[object] | None = None
) -> dict[K, V | Any]:

Transform a nested data structure into instantiated Python objects. This function recursively processes the input data, and applies the following special handling to any str objects:

Reference handling:

Any leaf nodes within data that are strings and start with "=" are interpreted as references to other parts of data. The syntax for these references follows the same rules as Unix paths, with the mapping that contains the reference playing the role of the current working directory:

  • "=/path": resolve path relative to the root of the data structure.
  • "=./path": resolve path relative to the current working directory.
  • "=../path": resolve path relative to the parent of the current working directory.
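
Because the rules mirror Unix paths, the three forms above can be illustrated with the standard posixpath module. This is a sketch, not the library's code: absolute_reference and lookup are hypothetical helpers, and the "current working directory" is assumed to be the mapping that holds the reference string.

```python
import posixpath


def absolute_reference(ref, current_dir):
    """Turn a "=..." reference found inside `current_dir` into an absolute path."""
    if not ref.startswith("="):
        raise ValueError(f"not a reference string: {ref!r}")
    # posixpath.join drops `current_dir` when the reference is already absolute;
    # normpath then collapses "." and ".." components.
    return posixpath.normpath(posixpath.join(current_dir, ref[1:]))


def lookup(data, path):
    """Follow an absolute path such as "/backbone/hidden_size" through nested dicts."""
    node = data
    for part in (p for p in path.split("/") if p):
        node = node[part]
    return node


data = {"backbone": {"hidden_size": 1024}, "readout": {}}

# All three forms point at the same value.
from_root = absolute_reference("=/backbone/hidden_size", "/readout")
from_here = absolute_reference("=./hidden_size", "/backbone")
from_parent = absolute_reference("=../backbone/hidden_size", "/readout")

print(from_root, from_here, from_parent)
print(lookup(data, from_parent))
```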

Object instantiation:

The following handling is applied to any str objects found within data (either as a key or value) that start with "+":

  1. attempt to import the Python object specified by the string: e.g. the string "+torch.nn.Tanh" will be converted to the Tanh class (not an instance) from the torch.nn module. If the string is not fully qualified (i.e. does not contain any dots), we attempt to import it from the Python standard library, or any of the provided modules:
    • "+Path" with modules=[pathlib] will be converted to the Path class from the pathlib module.
    • "+tuple" will be converted to the tuple type.
  2. if the string ends with "()", the resulting object is called with no arguments, e.g. "+my_module.MyClass()" will be converted to an instance of MyClass from my_module. This is equivalent to +my_module.MyClass: {} (see below).
  3. if the string is found as a key in a mapping with exactly one key-value pair, then:
    • if the value is itself a mapping, the single-item mapping is replaced with the result of calling the imported object with the recursively instantiated values as keyword arguments
    • otherwise, the single-item mapping is replaced with the result of calling the imported object with the instantiated value as a single positional argument
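
Steps 1 and 2 above amount to a name lookup followed by an optional call. The sketch below shows one way that lookup might work. The helpers import_name and parse_instantiation_string are hypothetical, the builtins module stands in for the standard-library lookup, and the search order (provided modules first, then builtins) is an assumption rather than documented behaviour.

```python
import builtins
import importlib
import pathlib


def import_name(name, modules=()):
    """Resolve "pkg.mod.attr" by import, or a bare name via `modules`, then builtins."""
    if "." in name:
        module_name, _, attr = name.rpartition(".")
        return getattr(importlib.import_module(module_name), attr)
    for module in [*modules, builtins]:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"could not find {name!r} in the provided modules")


def parse_instantiation_string(text, modules=()):
    """Step 1: import the named object. Step 2: call it if the string ends in "()"."""
    call = text.endswith("()")
    name = text[1:-2] if call else text[1:]
    obj = import_name(name, modules)
    return obj() if call else obj


path_class = parse_instantiation_string("+Path", modules=[pathlib])
tuple_type = parse_instantiation_string("+tuple")
ordered = parse_instantiation_string("+collections.OrderedDict()")

print(path_class is pathlib.Path)  # the class itself, not an instance
print(tuple_type is tuple)
print(type(ordered).__name__)      # an instance, because of the trailing "()"
```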

Parameters

data: dict[K, V] The data to transform.

modules: list[object] | None A list of modules to look up non-fully qualified names in.

Returns

dict The transformed data.

Examples

A basic example:

>>> from_dict({"activation": "+torch.nn.Tanh()"})
{'activation': Tanh()}

Note the importance of trailing parentheses:

>>> from_dict({"activation": "+torch.nn.Tanh"})
{'activation': <class 'torch.nn.modules.activation.Tanh'>}

Alternatively, pass modules=[torch.nn] so that from_dict looks up unqualified names in torch.nn:

>>> from_dict({"activation": "+Tanh()"}, modules=[torch.nn])
{'activation': Tanh()}

Use single-item mappings to instantiate classes or call functions with arguments. The following syntax imports MyClass from my_module and calls it with explicit keyword arguments, as MyClass(x=1, y=2):

>>> from_dict({
...     "activation": "+torch.nn.ReLU()",
...     "model": {
...         "+my_module.MyClass": {"x": 1, "y": 2}
...     }
... })
{'activation': ReLU(), 'model': MyClass(x=1, y=2)}

In contrast, the following syntax calls the imported object with a single positional argument:

>>> from_dict({"+len": [1, 2, 3]})
3  # i.e. len([1, 2, 3])

Mappings with multiple keys are still processed, but are never used to instantiate classes or call functions:

>>> from_dict({"+len": [1, 2, 3], "+print": "hello"})
{<built-in function len>: [1, 2, 3], <built-in function print>: 'hello'}

from_dict also works with arbitrary nesting:

>>> from_dict({"model": {"activation": "+torch.nn.Tanh()"}})
{'model': {'activation': Tanh()}}

Caution: from_dict can lead to side effects!

>>> from_dict({"+print": "hello"})
hello

References are resolved before object instantiation, so both of the following resolve the "length" field to 3:

>>> from_dict({"args": [1, 2, 3], "length": {"+len": "=../args"}})
{'args': [1, 2, 3], 'length': 3}
>>> from_dict({"args": [1, 2, 3], "length": {"+len": "=/args"}})
{'args': [1, 2, 3], 'length': 3}
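
Since a "+" string can name any importable callable, it may be worth listing every such string before loading a config from a source you do not fully trust. The helper below is a hypothetical pre-check that is not part of data2objects. It only walks plain dicts, lists and tuples and reports where "+" strings occur, leaving the decision about what is acceptable to the caller.

```python
def find_instantiation_strings(data, path=""):
    """Yield (path, string) for every key or value that starts with "+"."""
    if isinstance(data, dict):
        for key, value in data.items():
            child = f"{path}/{key}"
            if isinstance(key, str) and key.startswith("+"):
                yield child, key
            yield from find_instantiation_strings(value, child)
    elif isinstance(data, (list, tuple)):
        for index, value in enumerate(data):
            yield from find_instantiation_strings(value, f"{path}/{index}")
    elif isinstance(data, str) and data.startswith("+"):
        yield path, data


config = {
    "model": {"activation": "+torch.nn.Tanh()"},
    "debug": {"+print": "hello"},
}

found = list(find_instantiation_strings(config))
for location, text in found:
    print(location, text)
# /model/activation +torch.nn.Tanh()
# /debug/+print +print
```

The scan never imports or calls anything, so it is safe to run on a config whose contents are unknown.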
