
Kdict: dict with multi-dimensional, sliceable keys


kdict is like a dict, but for multi-dimensional keys. With kdict, you can easily filter and slice your dictionary along any key dimension.

Example: machine learning model evaluation. Suppose you're evaluating several models on three cross-validation folds, each with a training set and a test set.

Before kdict, you might store evaluation scores in a nested dictionary, but that's cumbersome and error-prone. Here's what it takes to get the mean accuracy of a particular model across all folds (training and test sets combined):

# Without kdict, scores live in a nested dict:
# data[fold_id][fold_label][model_name] -> accuracy.
# Reaching the inner values takes nested comprehensions like this:
import numpy as np
np.mean(
    [
        data[fold_id][fold_label]["lasso"]
        for fold_id in data.keys()
        for fold_label in data[fold_id].keys()
    ]
)

kdict makes storing and accessing this type of data a breeze. No more nesting:

# Store data in a three-dimensional kdict.
# Dimensions: fold ID, fold label, model name
data = kdict(...)

# Slice the kdict to get lasso model's mean accuracy across all folds:
# data[:, :, 'lasso'] is a subset of the full dictionary
np.mean(list(data[:, :, 'lasso'].values()))

In this example, data is a three-dimensional kdict that you can slice along any dimension. So how did we make this kdict?

from kdict import kdict

data = kdict()  # make a blank kdict
for fold_id in range(3):
    for fold_label in ['train', 'test']:
        for model_name in ['lasso', 'randomforest']:
            # add an entry for each fold ID, fold label, and model name;
            # get_model_score stands in for your own scoring function
            data[fold_id, fold_label, model_name] = get_model_score(
                fold_id,
                fold_label,
                model_name
            )

The syntax, in a nutshell:

  • Read or write a single element by indexing with the full key: [key_dimension_1, key_dimension_2, ...].
  • Or get a subset of the dictionary by slicing, e.g. [:, key_dimension_2], which keeps every item whose second key dimension matches.
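To make the two access modes concrete, here is a minimal sketch of tuple-key slicing on top of a plain dict. `SliceDict` is a hypothetical class written for illustration, not kdict's implementation; it assumes that only a bare `:` acts as a wildcard and that every other key position must match exactly. The scores in the usage part are made-up values.

```python
class SliceDict(dict):
    """Hypothetical dict with tuple keys where a bare ':' matches any value.

    Illustration only; this is not how kdict itself is implemented.
    """

    def __getitem__(self, key):
        is_slice_query = isinstance(key, tuple) and any(
            isinstance(part, slice) for part in key
        )
        if not is_slice_query:
            # Full key: ordinary single-element lookup.
            return super().__getitem__(key)
        if any(isinstance(p, slice) and p != slice(None) for p in key):
            raise ValueError("this sketch only supports bare ':' slices")
        # Keep items whose key has the same length and matches every
        # non-wildcard position; the subset is again a SliceDict.
        return SliceDict(
            (k, v)
            for k, v in self.items()
            if isinstance(k, tuple)
            and len(k) == len(key)
            and all(isinstance(p, slice) or k_part == p for k_part, p in zip(k, key))
        )


# Usage with hypothetical scores; writes use the default dict behavior.
scores = SliceDict()
scores[0, "train", "lasso"] = 0.91
scores[0, "test", "lasso"] = 0.85
scores[0, "test", "randomforest"] = 0.88
scores[1, "test", "lasso"] = 0.83

lasso = scores[:, :, "lasso"]      # every lasso entry
lasso_test = lasso[:, "test", :]   # slicing a slice narrows it further
single = scores[0, "test", "randomforest"]
```

Returning the same class from a slice is what allows chained slicing, matching the README's note that a slice of a kdict is also a kdict.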

Installation

pip install kdict

Usage

Create a kdict

Import: from kdict import kdict

Create a blank kdict with data = kdict(), or initialize one from an existing dict with data = kdict(existing_dict). You can also pass a dict comprehension with tuple keys, such as:

data = kdict({
    (fold_id, fold_label, model_name): get_model_score(fold_id, fold_label, model_name)
    for model_name in ['lasso', 'randomforest']
    for fold_label in ['train', 'test']
    for fold_id in range(3)
})
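Because the keys are ordinary tuples, the dict you pass to kdict(existing_dict) can be built and inspected with plain Python. The sketch below assumes that input shape; its get_model_score is a hypothetical stand-in that returns made-up deterministic numbers, not a real evaluation, and it uses statistics.mean so it runs without NumPy.

```python
from statistics import mean


def get_model_score(fold_id, fold_label, model_name):
    # Hypothetical stand-in: returns a made-up, deterministic number.
    return 0.5 + 0.1 * fold_id + (0.05 if fold_label == "train" else 0.0)


# Plain dict keyed by (fold_id, fold_label, model_name) tuples.
scores = {
    (fold_id, fold_label, model_name): get_model_score(fold_id, fold_label, model_name)
    for model_name in ["lasso", "randomforest"]
    for fold_label in ["train", "test"]
    for fold_id in range(3)
}

# 2 models x 2 fold labels x 3 folds = 12 keys, each with three dimensions.
key_lengths = {len(key) for key in scores}

# Without slicing, selecting one model means filtering by hand.
lasso_mean = mean(
    value for (_, _, model_name), value in scores.items() if model_name == "lasso"
)
```

The manual filter on the last line is the step that data[:, :, 'lasso'] replaces.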

Slice a kdict

Access an individual item with data[0, 'train', 'lasso'].

Or get a subset of the dictionary with slices: data[0, :, :] contains every item whose first key dimension is 0. The slice is itself a kdict, so you can keep slicing and filtering it.

You can also iterate over specific key dimensions:

# get the values of the final key dimension (here, the model names)
available_models = data.keys(dimensions=2)

# or get all pairs of first two dimensions
for fold_id, fold_label in data.keys(dimensions=[0, 1]):
    ... # now do something with data[fold_id, fold_label, :]
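As a plain-Python illustration of selecting key dimensions, the hypothetical helper `key_values` below collects the distinct values at one dimension (an int) or at several dimensions (a list of ints). It is a sketch that assumes duplicates are dropped in first-seen order; kdict's `keys(dimensions=...)` may order or deduplicate differently. The scores are made-up values.

```python
def key_values(data, dimensions):
    """Hypothetical helper: distinct values of the chosen key dimension(s).

    `dimensions` is an int (one dimension) or a list of ints (tuples of
    several dimensions). Results keep first-seen order.
    """
    if isinstance(dimensions, int):
        picked = (key[dimensions] for key in data)
    else:
        picked = (tuple(key[d] for d in dimensions) for key in data)
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(picked))


# Usage with hypothetical scores.
scores = {
    (0, "train", "lasso"): 0.91,
    (0, "test", "lasso"): 0.85,
    (0, "test", "randomforest"): 0.88,
    (1, "test", "lasso"): 0.83,
}
models = key_values(scores, 2)        # ['lasso', 'randomforest']
fold_pairs = key_values(scores, [0, 1])  # [(0, 'train'), (0, 'test'), (1, 'test')]
```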

Eject

A kdict behaves just like a dict, except all keys must have the same number of dimensions.

To get a raw dict back, call data.eject().
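The one rule kdict adds on top of dict, equal-length keys, can be sketched as a small dict subclass. `UniformKeyDict` and its `eject` method are hypothetical names modeled on the description above, not kdict's code. For brevity the sketch validates only construction and item assignment, so `update()` and `setdefault()` bypass the check.

```python
class UniformKeyDict(dict):
    """Hypothetical sketch: every key must be a tuple of the same length.

    Not kdict's code. Only construction and item assignment are checked;
    update() and setdefault() bypass the check in this sketch.
    """

    def __init__(self, *args, **kwargs):
        super().__init__()
        # Route initial items through __setitem__ so they are validated too.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    def __setitem__(self, key, value):
        if not isinstance(key, tuple):
            raise TypeError(f"key must be a tuple, got {type(key).__name__}")
        if self:
            expected = len(next(iter(self)))
            if len(key) != expected:
                raise ValueError(f"expected {expected} key dimensions, got {len(key)}")
        super().__setitem__(key, value)

    def eject(self):
        # Return an ordinary dict holding the same items.
        return dict(self)


# Usage with hypothetical scores.
d = UniformKeyDict({(0, "train", "lasso"): 0.91})
d[0, "test", "lasso"] = 0.85  # accepted: three dimensions
try:
    d[1, "test"] = 0.80  # rejected: only two dimensions
    error = None
except ValueError as err:
    error = str(err)
raw = d.eject()
```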

Development

Submit PRs against the develop branch, then open a release pull request to master.

# Install requirements
pip install --upgrade pip wheel
pip install -r requirements_dev.txt

# Install local package
pip install -e .

# Install pre-commit
pre-commit install

# Run tests
make test

# Run lint
make lint

# bump the version before submitting a PR against master (every commit to master is deployed)
bump2version patch # possible: major / minor / patch

# also make sure CHANGELOG.md is updated

Changelog

0.0.1

  • First release on PyPI.

