Cluster sets of histograms/curves, in particular kinematic distributions in high energy physics.
Clustering of Kinematic Graphs
Description
This package provides a flexible yet easy-to-use framework to cluster sets of histograms (or other higher-dimensional data) and to select benchmark points representing each cluster. The package focuses particularly on use cases in high energy physics.
Physics Case
While most of this package is very general and can be applied to a broad variety of use cases, we have been focusing on applications in high energy physics (particle physics) so far and provide additional convenience methods for this use case. In particular, most of the current tutorials are in this context.
Though very successful, the Standard Model of Particle Physics is believed to be incomplete, prompting the search for New Physics (NP). The phenomenology of NP models typically depends on a number of free parameters, which sometimes strongly influence the shape of distributions of kinematic variables. Besides being an obvious challenge when presenting exclusion limits on such models, this is also an issue for experimental analyses that need to make assumptions about kinematic distributions in order to extract features of interest, but still want to publish their results in a very general way.
By clustering the NP parameter space based on a metric that quantifies the similarity of the resulting kinematic distributions, a small number of NP benchmark points can be chosen in such a way that they can together represent the whole parameter space. Experiments (and theorists) can then report exclusion limits and measurements for these benchmark points without sacrificing generality.
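To make the idea of a metric on distribution shapes concrete, here is a minimal sketch in plain Python. It assumes a Euclidean distance between normalized histograms; the helper names `normalize` and `euclidean_distance` are illustrative and are not part of the package.

```python
import math


def normalize(hist):
    """Scale bin contents so that they sum to one (keeps only the shape)."""
    total = sum(hist)
    return [x / total for x in hist]


def euclidean_distance(h1, h2):
    """Euclidean distance between two histograms with identical binning."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


# Three toy histograms: b has the same shape as a but a different
# overall rate, while c has a genuinely different shape.
a = [1.0, 2.0, 3.0, 2.0]
b = [10.0, 20.0, 30.0, 20.0]
c = [4.0, 2.0, 1.0, 1.0]

d_ab = euclidean_distance(normalize(a), normalize(b))
d_ac = euclidean_distance(normalize(a), normalize(c))
print(d_ab, d_ac)
```

After normalization, `a` and `b` are indistinguishable while `c` stays clearly separated, which is the behavior one wants when only the shape of a distribution matters.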
Installation
clusterking can be installed with the python package installer:
pip3 install clusterking
For a local installation, you might want to use the --user switch of pip. You can also update your current installation with pip3 install --upgrade clusterking.
To install the latest development version, run:
git clone https://github.com/clusterking/clusterking/
cd clusterking
pip3 install --user .
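To check from within Python whether the installation succeeded, something like the following sketch can be used. It relies only on the standard library module `importlib.metadata` (Python 3.8 or newer); the helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(package="clusterking"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None
print(installed_version())
```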
Usage and Documentation
A good starting point is the set of Jupyter notebooks in the examples/jupyter_notebook directory, which can also be run online using Binder.
For documentation of the classes and functions in this package, see the docs on readthedocs.io.
Example
Sample and cluster
The following code, a condensed version of the basic tutorial, is all that is needed to cluster the shape of the q^2 distribution of B -> D tau nu in the space of Wilson coefficients:
import flavio
import numpy as np
import clusterking as ck
s = ck.scan.WilsonScanner(scale=5, eft='WET', basis='flavio')
# Set up kinematic function
def dBrdq2(w, q):
    return flavio.np_prediction("dBR/dq2(B+->Dtaunu)", w, q)
s.set_dfunction(
    dBrdq2,
    binning=np.linspace(3.2, 11.6, 10),
    normalize=True
)
# Set sampling points in Wilson space
s.set_spoints_equidist({
    "CVL_bctaunutau": (-1, 1, 10),
    "CSL_bctaunutau": (-1, 1, 10),
    "CT_bctaunutau": (-1, 1, 10)
})
# Create data object to write to and run
d = ck.DataWithErrors()
s.run(d)
# Use hierarchical clustering
c = ck.cluster.HierarchyCluster(d)
c.set_metric() # Use default metric (Euclidean)
c.build_hierarchy() # Build up clustering hierarchy
c.cluster(max_d=0.15) # "Cut off" hierarchy
c.write() # Write results to d
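For readers who want to see the scan-then-cluster idea without any dependencies, the following sketch mimics it in plain Python. Everything in it is an assumption made for illustration: `toy_histogram` is a made-up stand-in for a physics prediction, and complete linkage is chosen only because it is simple to write down. The package's own implementation and defaults may differ.

```python
import itertools
import math


def equidistant(start, stop, n):
    """Return n equally spaced points from start to stop (both included)."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def toy_histogram(x, y, nbins=5):
    """Hypothetical stand-in for a prediction: a normalized histogram
    whose shape depends on the two parameters x and y."""
    raw = [1.0 + (x * i) ** 2 + abs(y) * (nbins - i) for i in range(nbins)]
    total = sum(raw)
    return [v / total for v in raw]


def distance(h1, h2):
    """Euclidean distance between two equally binned histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def cluster_complete_linkage(hists, max_d):
    """Merge clusters bottom-up until the closest pair of clusters is
    farther apart than max_d. The distance between two clusters is the
    largest distance between any two of their members."""
    clusters = [[i] for i in range(len(hists))]

    def linkage(c1, c2):
        return max(distance(hists[i], hists[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance
        (a, b), d = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in itertools.combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > max_d:
            break  # "cut off" the hierarchy here
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


# Equidistant grid in a two-dimensional toy parameter space
points = list(itertools.product(equidistant(-1, 1, 5), repeat=2))
hists = [toy_histogram(x, y) for x, y in points]
clusters = cluster_complete_linkage(hists, max_d=0.15)
print(len(points), "points in", len(clusters), "clusters")
```

With complete linkage, no two histograms within the same cluster are farther apart than `max_d`, which gives the cut-off value a direct interpretation.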
Benchmark points
b = ck.Benchmark(d) # Initialize benchmarker for data d
b.set_metric() # Use default metric (Euclidean)
b.select_bpoints() # Select benchmark points based on metric
b.write() # Write results back to d
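One natural way to pick a representative point, assumed here for illustration, is to take the member of each cluster whose summed distance to all other members is smallest (the medoid). The sketch below shows this in plain Python; `select_benchmarks` is a hypothetical helper and not the package's API.

```python
import math


def distance(h1, h2):
    """Euclidean distance between two equally binned histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def select_benchmarks(hists, labels):
    """For every cluster label, return the index of the member histogram
    with the smallest summed distance to all members of that cluster."""
    members = {}
    for idx, label in enumerate(labels):
        members.setdefault(label, []).append(idx)
    benchmarks = {}
    for label, idxs in members.items():
        benchmarks[label] = min(
            idxs,
            key=lambda i: sum(distance(hists[i], hists[j]) for j in idxs),
        )
    return benchmarks


hists = [
    [0.10, 0.40, 0.50],  # cluster 0
    [0.20, 0.40, 0.40],  # cluster 0, lies between its two neighbors
    [0.30, 0.40, 0.30],  # cluster 0
    [0.70, 0.20, 0.10],  # cluster 1
    [0.80, 0.10, 0.10],  # cluster 1
]
labels = [0, 0, 0, 1, 1]
benchmarks = select_benchmarks(hists, labels)
print(benchmarks)
```

In cluster 0 the middle histogram is selected, since it is closest on average to the other two. In a two-member cluster both candidates are equally good, so the choice is arbitrary.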
Plotting
d.plot_clusters_scatter(
    ['CVL_bctaunutau', 'CSL_bctaunutau', 'CT_bctaunutau'],
    clusters=[1, 2]  # Only plot 2 clusters for better visibility
)
d.plot_clusters_fill(['CVL_bctaunutau', 'CSL_bctaunutau'])
Plotting all benchmark points:
d.plot_dist()
Plotting minima and maxima of bin contents for all histograms in a cluster (+benchmark histogram):
d.plot_dist_minmax(clusters=[0, 2])
Similarly with box plots:
d.plot_dist_box()
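The minima and maxima of bin contents mentioned above amount to a per-bin envelope over all histograms of a cluster. A minimal sketch of that computation, with the illustrative helper `bin_envelope` and made-up numbers, could look like this:

```python
def bin_envelope(hists):
    """Return per-bin (minima, maxima) over equally binned histograms."""
    minima = [min(column) for column in zip(*hists)]
    maxima = [max(column) for column in zip(*hists)]
    return minima, maxima


# Toy histograms belonging to one cluster (three bins each)
cluster_hists = [
    [0.10, 0.40, 0.50],
    [0.20, 0.35, 0.45],
    [0.15, 0.45, 0.40],
]
minima, maxima = bin_envelope(cluster_hists)
print(minima)
print(maxima)
```

The band between `minima` and `maxima` indicates how much the shapes vary within a cluster, and thus how well a single benchmark histogram can represent it.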
License & Contributing
This project is ongoing work, and questions, comments, bug reports, or pull requests are most welcome. You can also use the chat room on gitter or contact us via email. We are also working on a paper, so please make sure to cite us once we publish.
This software is licensed under the MIT license.