Numpy extensions for set operations on nd-arrays, group_by operations, and related functionality
This package contains functionality for indexed operations on numpy ndarrays, providing efficient vectorized functionality such as grouping and set operations.
A few brief examples give an impression of what it can do:

    import numpy as np
    from numpy_indexed import exclusive, contains, indices, union, group_by

    # three sets of graph edges (doublet of ints)
    edges = np.random.randint(0, 9, (3, 100, 2))
    # find graph edges exclusive to one of three sets
    ex = exclusive(*edges)
    print(ex)
    # which edges are exclusive to the first set?
    print(contains(edges[0], ex))
    # where are the exclusive edges relative to the totality of them?
    print(indices(union(*edges), ex))
    # group and reduce values by identical keys
    values = np.random.rand(100, 20)
    # and so on...
    print(group_by(edges[0]).median(values))
> conda install numpy-indexed -c conda-forge
> pip install numpy-indexed
This package builds upon a generalization of the design pattern found in numpy.unique: once an ndarray has been argsorted, many subsequent operations can be implemented efficiently and in a vectorized manner.
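To make the argsort pattern concrete, here is a minimal sketch in plain NumPy. It is not the package's own implementation; the variable names and the choice of sum as the reduction are assumptions made for illustration. A single stable argsort brings equal keys together, after which the unique keys, group sizes, and per-group sums all come from cheap vectorized operations.

```python
import numpy as np

keys = np.array([3, 1, 3, 2, 1, 3])
values = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

# A stable argsort places equal keys next to each other.
order = np.argsort(keys, kind="stable")
sorted_keys = keys[order]
sorted_values = values[order]

# A new group starts wherever the sorted key differs from its predecessor.
is_start = np.concatenate(([True], sorted_keys[1:] != sorted_keys[:-1]))
starts = np.flatnonzero(is_start)

# Everything below reuses the same sort result.
unique_keys = sorted_keys[starts]
group_sums = np.add.reduceat(sorted_values, starts)
group_counts = np.diff(np.append(starts, len(keys)))

print(unique_keys, group_sums, group_counts)
```

The sort is the only O(n log n) step; each derived quantity is a linear pass over the sorted data, which is why caching the sort result pays off when several properties are needed.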
The sorting and related low-level operations are encapsulated in a hierarchy of Index classes, which allows efficient lookup of many properties for a variety of key types. The public API of this package is a fairly thin wrapper around these Index objects.
Beyond standard sequences of sortable primitive types, two complex key types are currently supported: ndarray keys (i.e., finding unique rows or columns of an array) and composite keys (zipped sequences). For the exact casting rules that map valid sequences of key objects to index objects, see as_index().
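The two key types can be illustrated with plain NumPy, as a rough sketch of the idea rather than of the package's Index classes or the casting rules in as_index(). The arrays and names below are hypothetical. With ndarray keys, each row acts as a single key; with composite keys, position i of several zipped sequences forms one key.

```python
import numpy as np

# ndarray keys: every row of a 2-D array is treated as one key.
rows = np.array([[0, 1], [2, 3], [0, 1], [4, 5], [2, 3]])
unique_rows, row_counts = np.unique(rows, axis=0, return_counts=True)

# Composite keys: (names[i], years[i]) together form the i-th key.
names = np.array(["a", "b", "a", "b", "a"])
years = np.array([2020, 2020, 2021, 2020, 2020])

# np.lexsort treats the LAST sequence as the primary sort key.
order = np.lexsort((years, names))
s_names, s_years = names[order], years[order]

# A new composite key starts where either component changes.
changed = (s_names[1:] != s_names[:-1]) | (s_years[1:] != s_years[:-1])
starts = np.flatnonzero(np.concatenate(([True], changed)))
unique_pairs = list(zip(s_names[starts].tolist(), s_years[starts].tolist()))

print(unique_rows, row_counts)
print(unique_pairs)
```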