Skip to main content

Cluster sets of histograms/curves, in particular kinematic distributions in high energy physics.

Project description

https://raw.githubusercontent.com/clusterking/clusterking/master/readme_assets/logo/logo.png

Clustering of Kinematic Graphs

CI Coveralls Documentation Pypi Binder Gitter License Black

Description

This package provides a flexible yet easy to use framework to cluster sets of histograms (or other higher dimensional data) and to select benchmark points representing each cluster. The package particularly focuses on use cases in high energy physics.

A physics use case has been demonstrated in https://arxiv.org/abs/1909.11088.

Physics Case

While most of this package is very general and can be applied to a broad variety of use cases, we have been focusing on applications in high energy physics (particle physics) so far and provide additional convenience methods for this use case. In particular, most of the current tutorials are in this context.

Though very successful, the Standard Model of Particle Physics is believed to be uncomplete, prompting the search for New Physics (NP). The phenomenology of NP models typically depends on a number of free parameters, sometimes strongly influencing the shape of distributions of kinematic variables. Besides being an obvious challenge when presenting exclusion limits on such models, this also is an issue for experimental analyses that need to make assumptions on kinematic distributions in order to extract features of interest, but still want to publish their results in a very general way.

By clustering the NP parameter space based on a metric that quantifies the similarity of the resulting kinematic distributions, a small number of NP benchmark points can be chosen in such a way that they can together represent the whole parameter space. Experiments (and theorists) can then report exclusion limits and measurements for these benchmark points without sacrificing generality.

Installation

clusterking can be installed/upgraded with the python package installer:

pip3 install --user --upgrade "clusterking[plotting]"

If you do not require plotting, you can remove [plotting].

More options and troubleshooting advice is given in the documentation.

Caveats

  • Version 1.0.0 contained several mistakes in the chi2 metric. Please make sure that you are at least using versoin 1.1.0. These mistakes were also found in the paper and will be fixed soon.

Usage and Documentation

Good starting point: Jupyter notebooks in the examples/jupyter_notebook directory. You can also try running them online right now (without any installation required) using binder (just note that this is somewhat unstable, slow and takes some time to start up).

For a documentation of the classes and functions in this package, read the docs on readthedocs.io.

For additional examples, presentations and more, you can also head to our other repositories.

Example

Sample

The following code (taken from examples/jupyter_notebook/010_basic_tutorial.ipynb) is all that is needed to cluster the shape of the q^2 distribution of B -> D tau nu in the space of Wilson coefficients:

import flavio
import numpy as np
import clusterking as ck

s = ck.scan.WilsonScanner(scale=5, eft='WET', basis='flavio')

# Set up kinematic function

def dBrdq2(w, q):
    return flavio.np_prediction("dBR/dq2(B+->Dtaunu)", w, q)

s.set_dfunction(
    dBrdq2,
    binning=np.linspace(3.2, 11.6, 10),
    normalize=True
)

# Set sampling points in Wilson space

s.set_spoints_equidist({
    "CVL_bctaunutau": (-1, 1, 10),
    "CSL_bctaunutau": (-1, 1, 10),
    "CT_bctaunutau": (-1, 1, 10)
})

# Create data object to write to and run

d = ck.DataWithErrors()
r = s.run(d)
r.write()  # Write results back to data object

Cluster

Using hierarchical clustering:

c = ck.cluster.HierarchyCluster()  # Initialize worker class
c.set_metric("euclidean")
c.set_max_d(0.15)      # "Cut off" value for hierarchy
r = c.run(d)           # Run clustering on d
r.write()              # Write results to d

Benchmark points

b = ck.Benchmark() # Initialize worker class
b.set_metric("euclidean")
r = b.run(d)        # Select benchmark points based on metric
r.write()           # Write results back to d

Plotting

d.plot_clusters_scatter(
    ['CVL_bctaunutau', 'CSL_bctaunutau', 'CT_bctaunutau'],
    clusters=[1,2]  # Only plot 2 clusters for better visibility
)
https://raw.githubusercontent.com/clusterking/clusterking/master/readme_assets/plots/scatter_3d_02.png
d.plot_clusters_fill(['CVL_bctaunutau', 'CSL_bctaunutau'])
https://raw.githubusercontent.com/clusterking/clusterking/master/readme_assets/plots/fill_2d.png

Plotting all benchmark points:

d.plot_dist()
https://raw.githubusercontent.com/clusterking/clusterking/master/readme_assets/plots/all_bcurves.png

Plotting minima and maxima of bin contents for all histograms in a cluster (+benchmark histogram):

d.plot_dist_minmax(clusters=[0, 2])
https://raw.githubusercontent.com/clusterking/clusterking/master/readme_assets/plots/minmax_02.png

Similarly with box plots:

d.plot_dist_box()
https://raw.githubusercontent.com/clusterking/clusterking/master/readme_assets/plots/box_plot.png

License & Contributing

This project is ongoing work and questions, comments, bug reports or pull requests are most welcome. You can also use the chat room on gitter or contact us via email. We are also working on a paper, so please make sure to cite us once we publish.

This software is licenced under the MIT license.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

clusterking-1.1.0.tar.gz (66.6 kB view details)

Uploaded Source

Built Distribution

clusterking-1.1.0-py3-none-any.whl (87.1 kB view details)

Uploaded Python 3

File details

Details for the file clusterking-1.1.0.tar.gz.

File metadata

  • Download URL: clusterking-1.1.0.tar.gz
  • Upload date:
  • Size: 66.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.1.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.5

File hashes

Hashes for clusterking-1.1.0.tar.gz
Algorithm Hash digest
SHA256 4fadd8854b6de48e9b7ba34e880bf570e97e8e0c54efa27ffe551506ee9f20ec
MD5 82d7d5b760acf1f5bcc11c2537b0f33f
BLAKE2b-256 41e2c44e27bf80f17fdd793bad82de4163e5424e9460e7178aec11b8b45e5014

See more details on using hashes here.

File details

Details for the file clusterking-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: clusterking-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 87.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.1.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.5

File hashes

Hashes for clusterking-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 638131fe74a83e716ee46d9d755eb926deea56eb5b32801ea809735235a57ac2
MD5 b742cd3ceda9a1162672172f5ac3d08a
BLAKE2b-256 ab94e0aa12cdbc6aedefb81ed180466f705d9049fea88c7bbb3af3831c30b20b

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page