Numpy extensions for set operations on nd-arrays, group_by operations, and related functionality
Numpy indexed operations
This package contains functionality for indexed operations on numpy ndarrays, providing efficient vectorized functionality such as grouping and set operations.
- Rich and efficient grouping functionality:
  - splitting of values by key-group
  - reductions of values by key-group
- Generalization of existing array set operations to nd-arrays, such as:
  - exclusive (xor)
  - contains / in (in1d)
- Some new functions:
  - indices: numpy equivalent of list.index
  - count: numpy equivalent of collections.Counter
  - mode: find the most frequently occurring items in a set
  - multiplicity: number of occurrences of each key in a sequence
  - count_table: like R’s table or pandas crosstab, or an ndim version of np.bincount
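To make the descriptions of the new functions concrete, the following sketch computes the same kinds of quantities for 1-D integer keys using plain NumPy only. It is an illustration of what is being computed, not this package's implementation; all variable names are made up for the example, and the lookup at the end assumes the haystack holds unique items that contain every query.

```python
import numpy as np

keys = np.array([3, 1, 3, 2, 1, 3])

# One sorted pass yields the unique keys, each element's group number,
# and the size of every group.
unique_keys, inverse, counts = np.unique(
    keys, return_inverse=True, return_counts=True
)

# "count": unique keys and how often each occurs (like collections.Counter)
print(unique_keys, counts)

# "multiplicity": for every element, how often its key occurs in the sequence
multiplicity = counts[inverse]
print(multiplicity)

# "mode": the most frequently occurring key
mode = unique_keys[np.argmax(counts)]
print(mode)

# "indices": position of each query in a haystack (like list.index).
# Assumption: haystack items are unique and every query is present.
haystack = np.array([10, 30, 20])
queries = np.array([20, 10])
order = np.argsort(haystack)
positions = order[np.searchsorted(haystack, queries, sorter=order)]
print(positions)
```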
A few brief examples to give an impression:
    import numpy as np
    from numpy_indexed import contains, exclusive, group_by, indices, union

    # three sets of graph edges (doublet of ints)
    edges = np.random.randint(0, 9, (3, 100, 2))
    # find graph edges exclusive to one of three sets
    ex = exclusive(*edges)
    print(ex)
    # which edges are exclusive to the first set?
    print(contains(edges[0], ex))
    # where are the exclusive edges relative to the totality of them?
    print(indices(union(*edges), ex))
    # group and reduce values by identical keys
    values = np.random.rand(100, 20)
    # and so on...
    print(group_by(edges[0]).median(values))
> conda install numpy-indexed -c conda-forge
> pip install numpy-indexed
This package builds upon a generalization of the design pattern found in numpy.unique: once an ndarray has been argsorted, many subsequent operations can be implemented efficiently and in a vectorized manner.
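The argsort pattern can be sketched in a few lines of plain NumPy. The function below, group_sum, is a hypothetical helper written for this illustration and not part of the package; it assumes non-empty 1-D keys. After one argsort, equal keys sit next to each other, so group boundaries and per-group reductions need no Python loop.

```python
import numpy as np

def group_sum(keys, values):
    """Sum `values` per unique key in 1-D `keys`, using a single argsort."""
    keys = np.asarray(keys)
    values = np.asarray(values)
    sorter = np.argsort(keys, kind="stable")  # permutation that sorts the keys
    sorted_keys = keys[sorter]
    # A group starts wherever a sorted key differs from its predecessor.
    is_start = np.concatenate(([True], sorted_keys[1:] != sorted_keys[:-1]))
    starts = np.flatnonzero(is_start)
    # Reduce each contiguous run of sorted values in one vectorized call.
    sums = np.add.reduceat(values[sorter], starts, axis=0)
    return sorted_keys[starts], sums

keys = np.array(["b", "a", "b", "c", "a"])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
unique_keys, sums = group_sum(keys, values)
print(unique_keys)
print(sums)
```

The same sorter and start positions could be reused for other reductions (np.minimum.reduceat, np.maximum.reduceat) or for splitting, which is why the sort is the one step worth caching.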
The sorting and related low-level operations are encapsulated in a hierarchy of Index classes, which allows for efficient lookup of many properties for a variety of different key types. The public API of this package is a fairly thin wrapper around these Index objects.
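As a rough picture of this layering, the sketch below defines a toy index object and a thin wrapper function on top of it. SketchIndex and split are hypothetical names invented for this example; the package's actual Index classes are richer and may be organized differently. The sketch assumes non-empty 1-D keys of a sortable primitive type.

```python
import numpy as np

class SketchIndex:
    """Hypothetical stand-in for an index object over 1-D sortable keys."""

    def __init__(self, keys):
        self.keys = np.asarray(keys)
        # The one expensive step; every property below reuses its result.
        self.sorter = np.argsort(self.keys, kind="stable")
        self._sorted = self.keys[self.sorter]
        flag = np.concatenate(([True], self._sorted[1:] != self._sorted[:-1]))
        self.start = np.flatnonzero(flag)  # first sorted position of each group
        self.stop = np.append(self.start[1:], self.keys.size)

    @property
    def unique(self):
        return self._sorted[self.start]

    @property
    def count(self):
        return self.stop - self.start

    @property
    def inverse(self):
        # Group number of every element, in the original order.
        group_of_sorted = np.repeat(np.arange(self.start.size), self.count)
        inv = np.empty(self.keys.size, dtype=np.intp)
        inv[self.sorter] = group_of_sorted
        return inv


def split(index, values):
    """Thin wrapper: split values into one array per key group."""
    return np.split(np.asarray(values)[index.sorter], index.start[1:])


index = SketchIndex([3, 1, 3, 2, 1])
print(index.unique, index.count, index.inverse)
print(split(index, np.array([10, 20, 30, 40, 50])))
```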
The two complex key types currently supported, beyond standard sequences of sortable primitive types, are ndarray keys (e.g., finding unique rows/columns of an array) and composite keys (zipped sequences). For the exact casting rules that map valid sequences of key objects to index objects, see as_index().
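The two key types can be illustrated with plain NumPy. This sketch shows the idea behind each kind of key and does not reproduce how as_index() handles them; the arrays and variable names are made up for the example.

```python
import numpy as np

# ndarray keys: each row of a 2-D array acts as one key.
rows = np.array([[1, 2], [0, 5], [1, 2], [0, 5], [3, 3]])
unique_rows, row_counts = np.unique(rows, axis=0, return_counts=True)
print(unique_rows)
print(row_counts)

# Composite keys: two zipped sequences, here a string and an integer column.
names = np.array(["x", "y", "x", "y", "x"])
ids = np.array([1, 1, 2, 1, 1])
# np.lexsort treats the LAST key as the primary sort key.
sorter = np.lexsort((ids, names))
sorted_names, sorted_ids = names[sorter], ids[sorter]
# A new (name, id) pair starts wherever either component changes.
changed = (sorted_names[1:] != sorted_names[:-1]) | (sorted_ids[1:] != sorted_ids[:-1])
is_start = np.concatenate(([True], changed))
unique_pairs = list(zip(sorted_names[is_start].tolist(), sorted_ids[is_start].tolist()))
print(unique_pairs)
```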
Todo and open questions:
- There may be further generalizations that could be built on top of these abstractions. merge/join functionality perhaps?