Sample in hypercubes, select diverse subsets, and measure diversity
diversipy is a collection of algorithms dealing with three different but related topics. The first topic is super-uniform sampling of the unit hypercube. ‘Super-uniform’ in this context means that the obtained point sample should be more uniform than a random uniform sample, which is a desirable property in many applications. One such application is the design of computer experiments, where typically space-filling experimental designs are used. After creation, the samples can be transformed from the unit hypercube to arbitrary cuboids.
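To make the idea concrete, here is a minimal NumPy sketch of a Latin hypercube sample in the unit hypercube, followed by a rescaling to a cuboid. It is not diversipy's implementation: the helper names `latin_hypercube_sketch` and `scale_to_cuboid`, as well as the bounds, are assumptions made only for this illustration.

```python
import numpy as np


def latin_hypercube_sketch(num_points, dimension, rng):
    """Hypothetical helper: one point per stratum in every dimension."""
    # Each column is an independent random permutation of the strata 0..num_points-1.
    strata = np.column_stack(
        [rng.permutation(num_points) for _ in range(dimension)]
    )
    # Place each point at a random position inside its stratum.
    jitter = rng.random((num_points, dimension))
    return (strata + jitter) / num_points


def scale_to_cuboid(unit_points, lower, upper):
    """Hypothetical helper: map points from [0, 1]^n to the cuboid [lower, upper]."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return lower + np.asarray(unit_points, dtype=float) * (upper - lower)


rng = np.random.default_rng(0)
sample = latin_hypercube_sketch(8, 2, rng)  # points are stored row-wise
scaled = scale_to_cuboid(sample, lower=[-1.0, 10.0], upper=[1.0, 20.0])
print(scaled)
```

Each of the 8 equal-width intervals along every axis contains exactly one point, which is what makes the sample more evenly spread than 8 independent uniform draws.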
The second topic is subset selection, which is defined as follows: suppose you have a set of points in R^n and want to select a subset of them that is distributed as uniformly as possible. This may be necessary because the original set is too large to be processed entirely. The selection problem is related to clustering, with the difference that clustering usually aims to retain the structure of the original point set, whereas subset selection aims for a uniform spread.
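One simple way to approach this task is a greedy farthest-point heuristic, sketched below with plain NumPy. This is a swapped-in illustration, not the algorithm used by diversipy; the function name `greedy_maximin_select` and the choice of starting point are assumptions.

```python
import numpy as np


def greedy_maximin_select(points, subset_size):
    """Hypothetical helper: greedily pick points that are far from those already chosen."""
    points = np.asarray(points, dtype=float)
    if not 1 <= subset_size <= len(points):
        raise ValueError("subset_size must be between 1 and the number of points")
    # Arbitrary starting choice: the point closest to the centroid.
    centroid = points.mean(axis=0)
    first = int(np.argmin(np.linalg.norm(points - centroid, axis=1)))
    selected = [first]
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(points - points[first], axis=1)
    while len(selected) < subset_size:
        candidate = int(np.argmax(min_dist))  # farthest from the current subset
        selected.append(candidate)
        dist_to_candidate = np.linalg.norm(points - points[candidate], axis=1)
        min_dist = np.minimum(min_dist, dist_to_candidate)
    return points[selected]


rng = np.random.default_rng(1)
cloud = rng.random((200, 2))  # 200 points in the unit square, one per row
subset = greedy_maximin_select(cloud, 10)
print(subset)
```

Unlike a clustering method, this heuristic ignores where the original points are dense and only tries to keep the chosen points far apart from each other.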
The third topic is the assessment of point sets. Once you have created a point set (or obtained one from somewhere), you may want to assess its properties. Therefore, diversipy contains several functions to measure diversity and a few related concepts. Several different indicators are offered because they have different advantages and disadvantages (in terms of run time and what they measure).
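As a rough illustration of why more than one indicator is useful, the following NumPy sketch computes two simple distance-based measures. They are not taken from diversipy's API; the function names are assumptions, and both helpers cost O(N^2) time and memory for N points.

```python
import numpy as np


def pairwise_distances(points):
    """Hypothetical helper: Euclidean distance matrix for row-wise points."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def min_pairwise_distance(points):
    """Smallest distance between any two distinct points (larger means more spread out)."""
    dist = pairwise_distances(points)
    np.fill_diagonal(dist, np.inf)  # ignore the distance of a point to itself
    return dist.min()


def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its nearest neighbor."""
    dist = pairwise_distances(points)
    np.fill_diagonal(dist, np.inf)
    return dist.min(axis=1).mean()


corners = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
one_close_pair = np.array([[0.0, 0.0], [0.0, 0.5], [1.0, 0.0], [1.0, 1.0]])

for name, pts in [("corners", corners), ("one close pair", one_close_pair)]:
    print(name, min_pairwise_distance(pts), mean_nearest_neighbor_distance(pts))
```

The minimum distance reacts only to the single closest pair, while the mean nearest-neighbor distance averages over all points, so the two measures can rank point sets differently.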
>>> from diversipy import *
>>> design = transform_spread_out(lhd_matrix(50, 2))  # create Latin hypercube design
>>> subset = psa_select(design, 10)  # select subset, for whatever reason
>>> unanchored_L2_discrepancy(subset)  # calculate discrepancy
Note that points are stored row-wise, in accordance with numpy convention.
The documentation is located at https://www.simonwessing.de/diversipy/doc/