
Project description


The kwarray module implements a small set of pure-python extensions to numpy and torch.

Read the docs here: https://kwarray.readthedocs.io/en/master/

The top-level API is:

from .arrayapi import ArrayAPI
from . import distributions
from .algo_assignment import (maxvalue_assignment, mincost_assignment,
                              mindist_assignment,)
from .dataframe_light import (DataFrameArray, DataFrameLight, LocLight,)
from .fast_rand import (standard_normal, standard_normal32, standard_normal64,
                        uniform, uniform32,)
from .util_averages import (stats_dict,)
from .util_groups import (apply_grouping, group_consecutive,
                          group_consecutive_indices, group_indices,
                          group_items,)
from .util_numpy import (arglexmax, argmaxima, argminima, atleast_nd, boolmask,
                         isect_flags, iter_reduce_ufunc,)
from .util_random import (ensure_rng, random_combinations, random_product,
                          seed_global, shuffle,)
from .util_torch import (one_hot_embedding,)

The ArrayAPI

One of the most useful features in kwarray is kwarray.ArrayAPI, a class that helps bridge between numpy and torch. This class consists of static methods that implement part of the numpy API and operate equivalently on either torch.Tensor or numpy.ndarray objects.

This works because every function call checks if the input is a torch tensor or a numpy array and then takes the appropriate action.
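Conceptually, each static method dispatches on the input type along these lines (an illustrative sketch of the pattern, not kwarray's actual implementation):

import numpy as np
import torch

def _sum(data, axis=None):
    # Check the input type once per call, then forward to the matching backend.
    if isinstance(data, torch.Tensor):
        return data.sum() if axis is None else data.sum(dim=axis)
    return np.sum(data, axis=axis)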

As you can imagine, it can be slow to validate your inputs on each function call. Therefore the recommended way of using the array API is via the kwarray.ArrayAPI.impl function. This function does the check once and then returns another object that directly performs the correct operations on subsequent data items of the same type.

The following example demonstrates both modes of usage.

import torch
import numpy as np
from kwarray import ArrayAPI
data1 = torch.rand(10, 10)
data2 = data1.numpy()
# Method 1: grab the appropriate sub-impl
impl1 = ArrayAPI.impl(data1)
impl2 = ArrayAPI.impl(data2)
result1 = impl1.sum(data1, axis=0)
result2 = impl2.sum(data2, axis=0)
assert np.all(impl1.numpy(result1) == impl2.numpy(result2))
# Method 2: choose the impl on the fly
result1 = ArrayAPI.sum(data1, axis=0)
result2 = ArrayAPI.sum(data2, axis=0)
assert np.all(ArrayAPI.numpy(result1) == ArrayAPI.numpy(result2))

Other Notes:

The kwarray.ensure_rng function helps you properly maintain and control local seeded random number generation. This means that you won't clobber the random state of another library, and other libraries won't clobber yours.
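A minimal sketch of the intended pattern: a function accepts None, an integer seed, or an existing np.random.RandomState and coerces it with ensure_rng (the coercion rules here are my reading of the documented behavior).

import numpy as np
import kwarray

def make_noise(n, rng=None):
    # Coerce None / int seed / RandomState into a usable generator
    # without touching numpy's global random state.
    rng = kwarray.ensure_rng(rng)
    return rng.rand(n)

a = make_noise(3, rng=42)
b = make_noise(3, rng=42)
assert np.allclose(a, b)  # an explicit seed gives a reproducible stream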

DataFrameArray and DataFrameLight implement a subset of the pandas API. They are less powerful, but orders of magnitude faster. The main drawback is that you lose loc, but iloc is available.
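A minimal sketch, assuming these classes accept a columnar dict and expose positional iloc indexing (both assumptions based on the pandas-like API described above):

import kwarray

# Columns are kept as flat arrays rather than pandas-indexed series,
# which is where the speed advantage comes from.
df = kwarray.DataFrameArray({
    'id': [1, 2, 3, 4],
    'score': [0.9, 0.1, 0.7, 0.3],
})
row = df.iloc[2]  # positional indexing; label-based loc is not available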

uniform32 and standard_normal32 are faster 32-bit random number generators (compared to their 64-bit numpy counterparts).
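For example (the parameter names below mirror the numpy equivalents and are assumed rather than verified):

import kwarray

samples = kwarray.standard_normal32(size=10000)
values = kwarray.uniform32(low=0.0, high=1.0, size=10000)
# Both return float32 arrays, unlike the float64 defaults in numpy.
assert samples.dtype.name == 'float32'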

mincost_assignment implements the Munkres / Hungarian algorithm. It solves the assignment problem: given a cost matrix, it finds the pairing of rows to columns that minimizes the total cost.
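A hedged sketch of usage; the return format shown here (index pairs plus the total cost) is my assumption:

import numpy as np
import kwarray

# cost[i, j] is the cost of assigning worker i to job j.
cost = np.array([
    [9, 2, 7],
    [6, 4, 3],
    [5, 8, 1],
])
assignment, total_cost = kwarray.mincost_assignment(cost)
# assignment is assumed to be a list of (row, col) pairs covering each row once.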

one_hot_embedding is a fast numpy / torch way to perform the often-needed one-hot encoding (OHE) trick in deep learning.
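A small sketch, assuming a (labels, num_classes) call signature and that numpy integer labels are accepted:

import numpy as np
import kwarray

labels = np.array([0, 2, 1, 2])
onehot = kwarray.one_hot_embedding(labels, 3)
# Expected: a (4, 3) array with a single 1 in each row at the label's position.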

group_items is a fast way to group a numpy array by another numpy array. For fine-grained control we also expose group_indices, which groups the indices of a numpy array, and apply_grouping, which partitions a numpy array by those indices.
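For example (the dict return of group_items and the (keys, groupxs) return of group_indices are my reading of the API):

import numpy as np
import kwarray

items = np.array([10, 20, 30, 40, 50])
groups = np.array(['a', 'b', 'a', 'b', 'a'])

# One-shot grouping: maps each group id to the items that share it.
grouped = kwarray.group_items(items, groups)
# e.g. {'a': array([10, 30, 50]), 'b': array([20, 40])}

# Fine-grained control: compute the per-group indices, then partition.
keys, groupxs = kwarray.group_indices(groups)
parts = kwarray.apply_grouping(items, groupxs)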

boolmask effectively inverts np.where.
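In other words, it turns a list of indices back into a boolean flag array (the explicit length argument shown here is an assumption):

import numpy as np
import kwarray

indices = np.array([1, 3, 4])
flags = kwarray.boolmask(indices, 6)
# flags -> array([False,  True, False,  True,  True, False])
assert np.all(np.where(flags)[0] == indices)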

Usefulness:

This is how frequently I've used various components of this library in my projects:

{
    'ensure_rng': 85,
    'ArrayAPI': 79,
    'DataFrameArray': 21,
    'boolmask': 17,
    'shuffle': 16,
    'argmaxima': 13,
    'group_indices': 12,
    'stats_dict': 9,
    'maxvalue_assignment': 7,
    'seed_global': 7,
    'iter_reduce_ufunc': 5,
    'isect_flags': 5,
    'group_items': 4,
    'one_hot_embedding': 4,
    'atleast_nd': 4,
    'mincost_assignment': 3,
    'standard_normal': 3,
    'arglexmax': 2,
    'DataFrameLight': 1,
    'uniform': 1,
}

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kwarray-0.5.5.tar.gz (44.3 kB)

Uploaded Source

Built Distribution

kwarray-0.5.5-py2.py3-none-any.whl (46.2 kB)

Uploaded Python 2 Python 3

File details

Details for the file kwarray-0.5.5.tar.gz.

File metadata

  • Download URL: kwarray-0.5.5.tar.gz
  • Upload date:
  • Size: 44.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.7

File hashes

Hashes for kwarray-0.5.5.tar.gz

  • SHA256: 80a8fcf135e5b83ea0fabf0a708dfec3584f7b805d8bfa1c8ee4522ec15144fe
  • MD5: ee473b9587574d5724479c9745eb2e53
  • BLAKE2b-256: 64df400aaf1063657fc2ac9a09c514603489bc7b6809606fb43b5415c23005cc

See more details on using hashes here.

File details

Details for the file kwarray-0.5.5-py2.py3-none-any.whl.

File metadata

  • Download URL: kwarray-0.5.5-py2.py3-none-any.whl
  • Upload date:
  • Size: 46.2 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.7

File hashes

Hashes for kwarray-0.5.5-py2.py3-none-any.whl

  • SHA256: 9019412a92223c2c508856e8231c52c93641a0d3bfb6ac95f2746091df0c608c
  • MD5: e79d36b2e3af0151a63fe602ce8d6675
  • BLAKE2b-256: 1d79c400a73607adf6e73fa96f7e50e846e851519480e51182a9f50f04b0b3bc

See more details on using hashes here.
