Lazy dict with universally unique identifier for values

License: GPL v3

ldict

A lazy dict with universally unique deterministic identifiers.

Overview

We consider that every value is generated by a process, starting from an empty ldict. The process is a sequence of transformation steps done through the operator >>, which symbolizes a data flow. There are two types of steps:

  • value insertion - represented by dict-like objects
  • function application - represented by ordinary python functions

Each function, ldict, and value has a deterministic UUID (called a hosh, for operable hash). The identifiers (hoshes) of ldicts and values are therefore predictable. An ldict is completely defined by its key-value pairs, so it can be converted from and to a built-in dict.

Creating an ldict is no different from creating an ordinary dict. Optionally, it can be created through the >> operator used after empty or Ø (uppercase; usually AltGr+Shift+o on most keyboards).

Function application is done in the same way. The parameter names define the input fields, while the keys in the returned dict define the output fields.

The same holds for anonymous functions.

Finally, the result is evaluated only on request.
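
The lazy behavior described above can be pictured with a toy stand-in written in plain Python. This is only a sketch and not how ldict is implemented: the class LazyDict is hypothetical, it evaluates the whole recorded pipeline at once rather than field by field, and it computes no identifiers.

```python
import inspect

calls = []  # records when a function really runs


class LazyDict:
    """Toy stand-in (hypothetical): records steps, runs them on first access."""

    def __init__(self, steps=()):
        self.steps = list(steps)  # recorded, not yet executed
        self._result = None

    def __rshift__(self, step):
        # '>>' only records the step (a dict to insert or a function to apply).
        return LazyDict(self.steps + [step])

    def _evaluate(self):
        if self._result is None:
            data = {}
            for step in self.steps:
                if callable(step):
                    # Parameter names select the input fields.
                    names = inspect.signature(step).parameters
                    data.update(step(**{name: data[name] for name in names}))
                else:
                    data.update(step)  # value insertion
            self._result = data
        return self._result

    def __getitem__(self, key):
        return self._evaluate()[key]


def power(x, y, z):
    calls.append("power")
    return {"r": x ** y // z}


d = LazyDict() >> {"x": 3, "y": 5} >> {"z": 7} >> power
print(calls)   # [] : nothing has been calculated yet
print(d["r"])  # 34
print(calls)   # ['power']
```

The keys of the dict returned by power become the output fields, mirroring the convention stated above.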

Installation

...as a standalone lib

# Set up a virtualenv. 
python3 -m venv venv
source venv/bin/activate

# Install from PyPI...
pip install --upgrade pip
pip install -U ldict

# ...or, install from updated source code.
pip install git+https://github.com/davips/ldict

...from source

git clone https://github.com/davips/ldict
cd ldict
poetry install

Examples

Merging two ldicts

from ldict import ldict

a = ldict(x=3)
print(a)
"""
{
    "id": "kr_4aee5c3bcac2c478be9901d57fd1ef8a9d002",
    "ids": "kr_4aee5c3bcac2c478be9901d57fd1ef8a9d002",
    "x": 3
}
"""
b = ldict(y=5)
print(b)
"""
{
    "id": "Uz_0af6d78f77734fad67e6de7cdba3ea368aae4",
    "ids": "Uz_0af6d78f77734fad67e6de7cdba3ea368aae4",
    "y": 5
}
"""
print(a >> b)
"""
{
    "id": "c._2b0434ca422114262680df425b85cac028be6",
    "ids": "kr_4aee5c3bcac2c478be9901d57fd1ef8a9d002 Uz_0af6d78f77734fad67e6de7cdba3ea368aae4",
    "x": 3,
    "y": 5
}
"""

Lazily applying functions to ldict

from ldict import ldict

a = ldict(x=3)
print(a)
"""
{
    "id": "kr_4aee5c3bcac2c478be9901d57fd1ef8a9d002",
    "ids": "kr_4aee5c3bcac2c478be9901d57fd1ef8a9d002",
    "x": 3
}
"""
a = a >> ldict(y=5) >> {"z": 7} >> (lambda x, y, z: {"r": x ** y // z})
print(a)
"""
{
    "id": "8jopGVdtSEyCk1NSKcrEF-Lfv8up9MQBdvkLxU2o",
    "ids": "J3tsy4vUXPELySBicaAy-h-UK7Dp9MQBdvkLxU2o... +2 ...Ss_7dff0a161ba7462725cac7dcee71b67669f69",
    "r": "→(x y z)",
    "x": 3,
    "y": 5,
    "z": 7
}
"""
print(a.r)
"""
34
"""
print(a)
"""
{
    "id": "8jopGVdtSEyCk1NSKcrEF-Lfv8up9MQBdvkLxU2o",
    "ids": "J3tsy4vUXPELySBicaAy-h-UK7Dp9MQBdvkLxU2o... +2 ...Ss_7dff0a161ba7462725cac7dcee71b67669f69",
    "r": 34,
    "x": 3,
    "y": 5,
    "z": 7
}
"""

Parameterized functions and sampling

from random import Random

from ldict import Ø
from ldict.cfg import cfg


# A function provides input fields and, optionally, parameters.
# For instance:
# 'a' is sampled from an arithmetic progression
# 'b' is sampled from a geometric progression
# Here, the syntax for default parameter values is borrowed with a new meaning.
def fun(x, y, a=[-100, -99, -98, ..., 100], b=[0.0001, 0.001, 0.01, ..., 100000000]):
    return {"z": a * x + b * y}


# Creating an empty ldict. Alternatively: d = ldict().
d = Ø >> {}
d.show(colored=False)
"""
{
    "id": "0000000000000000000000000000000000000000",
    "ids": {}
}
"""
# Putting some values. Alternatively: d = ldict(x=5, y=7).
d["x"] = 5
d["y"] = 7
d.show(colored=False)
"""
{
    "id": "I0_39c94b4dfbc7a8579ca1304eba25917204a5e",
    "ids": {
        "x": "Tz_d158c49297834fad67e6de7cdba3ea368aae4",
        "y": "Rs_92162dea64a7462725cac7dcee71b67669f69"
    },
    "x": 5,
    "y": 7
}
"""
# Parameter values are uniformly sampled.
d1 = d >> fun
d1.show(colored=False)
print(d1.z)
"""
{
    "id": "514.w8gcsQ8.TnVN3aSVAAoXfymqsurKYCKMzp.h",
    "ids": {
        "z": "ik8Av.-CexDONSEUn.wP-zQA1nmqsurKYCKMzp.h",
        "x": "Tz_d158c49297834fad67e6de7cdba3ea368aae4",
        "y": "Rs_92162dea64a7462725cac7dcee71b67669f69"
    },
    "z": "→(a b x y)",
    "x": 5,
    "y": 7
}
-170.0
"""
d2 = d >> fun
d2.show(colored=False)
print(d2.z)
"""
{
    "id": "3S7qBvyyYd0D-J6CxZ.lIwrzWY91CzsYaW6RKYbo",
    "ids": {
        "z": "52L8np10eE7dcfSIROGf6wTcIN91CzsYaW6RKYbo",
        "x": "Tz_d158c49297834fad67e6de7cdba3ea368aae4",
        "y": "Rs_92162dea64a7462725cac7dcee71b67669f69"
    },
    "z": "→(a b x y)",
    "x": 5,
    "y": 7
}
467.0
"""
# Parameter values can also be manually set.
e = d >> cfg(a=5, b=10) >> fun
print(e.z)
"""
95
"""
# Not all parameters need to be set.
e = d >> cfg(a=5) >> fun
print(e.z)
"""
95.0
"""
# Each run will be a different sample for the missing parameters.
e = e >> cfg(a=5) >> fun
print(e.z)
"""
700000025.0
"""
# We can define the initial state of the random sampler.
# It takes effect from its position onwards in the expression.
e = d >> cfg(a=5) >> Random(0) >> fun
print(e.z)
"""
699999990.0
"""
# All runs will yield the same result,
# if starting from the same random number generator seed.
e = e >> cfg(a=5) >> Random(0) >> fun
print(e.z)
"""
699999990.0
"""
# Reproducible different runs are achievable by using a single random number generator.
rnd = Random(0)
e = d >> cfg(a=5) >> rnd >> fun
print(e.z)
e = d >> cfg(a=5) >> rnd >> fun  # Same expression; the generator state has advanced.
print(e.z)
"""
699999990.0
35.0007
"""

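The default parameter values in the example above use a compact notation: a few leading terms, an ellipsis, and the final term. One way such a list could be expanded and sampled is sketched below. The helper expand is hypothetical and is not part of ldict; it assumes exactly three leading terms followed by the ellipsis and the last term.

```python
import math
import random


def expand(spec):
    """Expand [a, b, c, ..., last] into a full arithmetic or geometric progression."""
    if len(spec) < 5 or spec[3] is not Ellipsis:
        return list(spec)  # nothing to expand
    a, b, c = spec[:3]
    last = spec[-1]
    if math.isclose(b - a, c - b):  # constant difference: arithmetic
        step = b - a
        n = round((last - a) / step) + 1
        return [a + i * step for i in range(n)]
    if a != 0 and b != 0 and math.isclose(b / a, c / b):  # constant ratio: geometric
        ratio = b / a
        n = round(math.log(last / a, ratio)) + 1
        return [a * ratio ** i for i in range(n)]
    raise ValueError("not a recognizable progression")


print(expand([1, 2, 3, ..., 10]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Uniform sampling of one value per parameter, seeded for reproducibility.
rnd = random.Random(0)
a = rnd.choice(expand([-100, -99, -98, ..., 100]))
b = rnd.choice(expand([0.0001, 0.001, 0.01, ..., 100000000]))
print(a * 5 + b * 7)  # same form as fun(x=5, y=7) above
```
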
Composition of sets of functions

from random import Random

from ldict import Ø


# A multistep process can be defined without applying its functions.


def g(x, y, a=[1, 2, 3, ..., 10], b=[0.00001, 0.0001, 0.001, ..., 100000]):
    return {"z": a * x + b * y}


def h(z, c=[1, 2, 3]):
    return {"z": c * z}


# In the ldict framework 'data is function',
# so the alias Ø represents the 'empty data object' and the 'reflexive function' at the same time.
# In other words: 'inserting nothing' has the same effect as 'doing nothing'.
# The operator '*' is an alias for '>>', used just to make the context clearer.
fun = Ø * g * h  # Ø enables the Cartesian product of the subsequent sets of functions within the expression.
print(fun)
"""
«g × h»
"""
# The difference between 'Ø * g * h' and 'ldict(x=3) >> g >> h' is that the functions in the latter are already applied
# (resulting in an ldict object). The former still has its free parameters unsampled,
# and results in an ordered set of composite functions.
# It is a set because the parameter values of the functions are still undefined.
d = {"x": 5, "y": 7} >> fun
print(d)
"""
{
    "id": "2aZUpZTEi6VYR1Tp6itTWn-gJVXjJYRkIly.KUS4",
    "ids": "lUC4HBIu6Va4mzCwq78NknqWuKXjJYRkIly.KUS4... +1 ...Rs_92162dea64a7462725cac7dcee71b67669f69",
    "z": "→(c z→(a b x y))",
    "x": 5,
    "y": 7
}
"""
print(d.z)
"""
2205.0
"""
d = {"x": 5, "y": 7} >> fun
print(d.z)
"""
190.0
"""
# Reproducible different runs by passing a stateful random number generator.
rnd = Random(0)
e = d >> rnd >> fun
print(e.z)
"""
105.0
"""
e = d >> rnd >> fun
print(e.z)
"""
14050.0
"""
# Repeating the same results.
rnd = Random(0)
e = d >> rnd >> fun
print(e.z)
"""
105.0
"""
e = d >> rnd >> fun
print(e.z)
"""
14050.0
"""

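The idea of a set of composite functions, one per combination of parameter values, can be illustrated with itertools.product. This is a simplified sketch and not ldict's mechanism: compose and candidates are hypothetical names, and b is fixed to 1.0 only to keep the set small.

```python
from itertools import product
from random import Random

A_VALUES = list(range(1, 11))  # stands in for a=[1, 2, 3, ..., 10]
C_VALUES = [1, 2, 3]           # c=[1, 2, 3]


def g(x, y, a, b=1.0):  # b is fixed here only to keep the example small
    return {"z": a * x + b * y}


def h(z, c):
    return {"z": c * z}


def compose(a, c):
    # One concrete composite function for one choice of parameter values.
    def composite(x, y):
        return h(g(x, y, a)["z"], c)
    return composite


# The "set of composite functions": one member per (a, c) combination.
candidates = [compose(a, c) for a, c in product(A_VALUES, C_VALUES)]
print(len(candidates))  # 30

# Applying the set to data means picking one member, then calling it.
rnd = Random(0)
chosen = rnd.choice(candidates)
print(chosen(x=5, y=7))
```
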
Transparent persistence

import shelve
from collections import namedtuple
from pprint import pprint

from ldict import ldict, Ø
from ldict.cfg import cfg
from ldict.config import setup

# The cache can be set globally.
# It is as simple as a dict, or any dict-like implementation mapping str to serializable content.
# Implementations can, e.g., store data on disk or in a remote computer.
setup(cache={})


def fun(x, y):
    print("Calculated!")  # Watch whether the value had to be calculated.
    return {"z": x ** y}


# The operator '^' indicates a relevant point during the process, i.e., a point where data should be stored.
# It is mostly intended to avoid costly recalculations or to log results.
# The symbol points upwards, meaning data can momentarily come from or go outside of the process.
# When the same process is repeated, only the first request will trigger calculation.
# Local caching objects (dicts or dict-like database servers) can also be used.
# They should be wrapped in square brackets to avoid ambiguity.
# The list may contain many different caches, e.g.: [RAM, local, remote].
mycache = {}
remote = {}
d = Ø >> {"x": 3, "y": 2} >> fun >> [mycache, remote]
print(d)
print(d.z, d.id)
"""
{
    "id": "dpWeC4tFX.7oD1PMWLoyNAaH6gtNSvzvAw2XMZVi",
    "ids": "GsDJe8CjPiVCEoJEoNzyfKAyyirNSvzvAw2XMZVi... +1 ...yI_a331070d4bcdde465f28ba37ba1310e928122",
    "z": "→(^ x y)",
    "x": 3,
    "y": 2
}
Calculated!
9 dpWeC4tFX.7oD1PMWLoyNAaH6gtNSvzvAw2XMZVi
"""
# The second request just retrieves the cached value.
d = ldict(y=2, x=3) >> fun >> [remote]
print(d.z, d.id)
"""
9 dpWeC4tFX.7oD1PMWLoyNAaH6gtNSvzvAw2XMZVi
"""
# The caching operator can appear in multiple places in the expression, if intermediate values are of interest.
# Ø is used as an ldict-inducer when needed.
d = ldict(y=2, x=3) >> fun ^ Ø >> (lambda x: {"x": x ** 2}) >> Ø >> {"w": 5, "k": 5} >> Ø >> [mycache]
print(d.z, d.id)
"""
9 QaRWaaqyTLRqBDzvIff.HdTGQVDeSMDamXXwaYMA
"""
# Persisting to disk is easily done via Python shelve.
P = namedtuple("P", "x y")
a = [3, 2]
b = [1, 4]


def measure_distance(a, b):
    from math import sqrt
    return {"distance": sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)}


with shelve.open("/tmp/my-cache-file.db") as db:
    d = ldict(a=a, b=b) >> measure_distance >> [db]
    pprint(dict(db))  # Empty on a first run; entries from earlier runs persist in the file.
    print(d.distance)
    pprint(dict(db))
    #  ...

    # '^' syntax is also possible.
    a = [7, 1]
    b = [4, 3]
    copy = lambda source=None, target=None, **kwargs: {target: kwargs[source]}
    mean = lambda distance, other_distance: {"m": (distance + other_distance) / 2}
    e = (
            ldict(a=a, b=b)
            >> measure_distance
            >> {"other_distance": d.distance}
            >> mean
            ^ Ø
            ^ cfg(source="m", target="m0")
            >> copy
            >> (lambda m: {"m": m ** 2})
    )
    print(e.m0, e.m)

"""
{'3Q_85403c3464883af128dc24eef54294173d8ef': [1, 4],
 'E0_45bf7de0dcdfc012da8a0f556492e8880b09d': [3, 2],
 'KBQMiN2gHLwCewlu6HC67I1R2-m8hIbZ8IXI2c0c': 2.8284271247461903}
2.8284271247461903
{'3Q_85403c3464883af128dc24eef54294173d8ef': [1, 4],
 'E0_45bf7de0dcdfc012da8a0f556492e8880b09d': [3, 2],
 'KBQMiN2gHLwCewlu6HC67I1R2-m8hIbZ8IXI2c0c': 2.8284271247461903}
3.2169892001050897 10.349019513592784
"""

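The caching behavior shown above, where a value is calculated once and later looked up by identifier, can be pictured with plain dicts. The sketch below is an illustration under assumptions, not ldict's implementation: field_id and fetch are hypothetical helpers, and the identifier is simply a blake2b digest of the function name and its inputs, whereas real hoshes are computed differently.

```python
import hashlib
import json

calls = []  # records every real computation


def fun(x, y):
    calls.append("fun")
    return x ** y


def field_id(fn, inputs):
    # Hypothetical identifier: digest of the function name plus its sorted inputs.
    payload = json.dumps([fn.__name__, inputs], sort_keys=True).encode()
    return hashlib.blake2b(payload, digest_size=20).hexdigest()


def fetch(fn, inputs, caches):
    key = field_id(fn, inputs)
    for cache in caches:
        if key in cache:
            return cache[key]  # hit: no calculation needed
    value = fn(**inputs)
    for cache in caches:
        cache[key] = value  # store in every cache of the list
    return value


ram, remote = {}, {}
print(fetch(fun, {"x": 3, "y": 2}, [ram, remote]))  # 9, calculated
print(fetch(fun, {"y": 2, "x": 3}, [remote]))       # 9, read from 'remote'
print(calls)                                        # ['fun']
```

Because the key depends only on the function and its inputs, the order in which the inputs are given does not matter, and any dict-like object (such as a shelve database) could take the place of the plain dicts.
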
Concept

An ldict is like an ordinary Python dict, but lazy and with extra functionality. It is a mapping between string keys, called fields, and any serializable object. The ldict id (identifier) and the field ids are also part of the mapping.

The user can provide a unique identifier (hosh) for each function or value object. Otherwise, identifiers are calculated by blake3 hashing of the content of the data or the bytecode of the function. For this reason, such functions should be simple, i.e., have minimal external dependencies. This avoids the unfortunate situation where two functions with identical local code actually perform different calculations, because they call external code that implements different algorithms under the same name.
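
The caveat about external dependencies can be demonstrated directly. In the sketch below, bytecode_id is a hypothetical helper that hashes only a function's own bytecode, and blake2b from the standard library stands in for blake3; ldict's actual hashing may differ in detail. Two functions with identical local code receive the same identifier even though they call different helpers.

```python
import hashlib


def bytecode_id(fn):
    # Hash only the function's own compiled bytecode (blake2b stands in for blake3).
    return hashlib.blake2b(fn.__code__.co_code, digest_size=20).hexdigest()


def build(helper):
    # Define the same source text in a namespace with a different 'helper'.
    namespace = {"helper": helper}
    exec("def f(x):\n    return helper(x)", namespace)
    return namespace["f"]


f1 = build(lambda v: v + 1)
f2 = build(lambda v: v * 2)

print(bytecode_id(f1) == bytecode_id(f2))  # True: identical local code
print(f1(3), f2(3))                        # 4 6: different calculations
```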
