
scalable pythonic model fitting for high energy physics


[zfit logo]

zfit: scalable pythonic fitting


zfit is a highly scalable and customizable model manipulation and fitting library. It uses TensorFlow as its computational backend and is optimised for simple and direct manipulation of probability density functions. The project is affiliated with and well integrated into Scikit-HEP, the HEP Python ecosystem.
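
As a taste of this direct manipulation, PDFs can be composed freely. The following sketch builds a Gaussian-plus-exponential mixture with zfit.pdf.SumPDF; it is illustrative only, and details such as the exponential's parameter name (here lam) may differ between zfit versions:

import zfit

obs = zfit.Space('x', limits=(-10, 10))

mu = zfit.Parameter("mu", 0.0, -5, 5)
sigma = zfit.Parameter("sigma", 1.0, 0, 5)
lam = zfit.Parameter("lam", -0.1, -1, 0)
frac = zfit.Parameter("frac", 0.5, 0, 1)  # fraction of the Gaussian component

gauss = zfit.pdf.Gauss(obs=obs, mu=mu, sigma=sigma)
expo = zfit.pdf.Exponential(lam=lam, obs=obs)

# the sum is itself a PDF, normalised over obs, and can be fitted like any other
model = zfit.pdf.SumPDF([gauss, expo], fracs=frac)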

If you use zfit in research, please consider citing it.

N.B.: zfit is currently in beta stage: while most core parts are established, some features may still be missing and bugs may be encountered. It is, however, mostly ready for production and is being used in analysis projects. If you want to use it for your project and are not sure whether all the needed functionality is there, feel free to contact us.

Installation

zfit is available on PyPI. To install it (recommended: inside a virtual/conda environment) with all the dependencies (minimizers, uproot, …), use

pip install -U zfit[all]

(the -U flag upgrades zfit in case it is already installed) or, for minimal dependencies,

pip install zfit

Why?

The basic idea behind zfit is to offer a Python-oriented alternative to the very successful RooFit library from the ROOT data analysis package, one that can integrate with the other packages that are part of the scientific Python ecosystem. Contrary to the monolithic approach of ROOT/RooFit, the aim of zfit is to be light and flexible enough to integrate with any state-of-the-art tools and to allow scalability when going to larger datasets.

These core ideas are supported by two basic pillars:

  • The skeleton and extension of the code is minimalist, simple and finite: the zfit library is designed exclusively for model fitting and sampling, with no attempt to extend its functionality to features such as statistical methods or plotting.

  • zfit is designed for optimal parallelisation and scalability by making use of TensorFlow as its backend. TensorFlow provides crucial features in the context of model fitting, such as parallelisation and automatic, exact derivatives (illustrated in the sketch below).
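
To see what the backend buys us, here is a minimal sketch in plain TensorFlow (not zfit's API) of the automatic differentiation that zfit relies on: the exact gradient of a toy negative log-likelihood is obtained without writing any derivative by hand.

import tensorflow as tf

# toy data and a single floating parameter
data = tf.random.normal(shape=(10000,))
mu = tf.Variable(0.5)

with tf.GradientTape() as tape:
    # negative log-likelihood of a unit-width Gaussian, up to constants
    nll = tf.reduce_sum(0.5 * (data - mu) ** 2)

grad = tape.gradient(nll, mu)  # exact derivative d(nll)/d(mu)
print(grad)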

How to use

While the zfit library provides a model fitting and sampling framework for a broad list of applications, we will illustrate its main features with a simple example by fitting a Gaussian distribution with an unbinned likelihood fit and a parameter uncertainty estimation.

Example in short

import numpy as np
import zfit

obs = zfit.Space('x', limits=(-10, 10))

# create the model
mu    = zfit.Parameter("mu"   , 2.4, -1, 5)
sigma = zfit.Parameter("sigma", 1.3,  0, 5)
gauss = zfit.pdf.Gauss(obs=obs, mu=mu, sigma=sigma)

# load the data
data_np = np.random.normal(size=10000)
data = zfit.Data.from_numpy(obs=obs, array=data_np)

# build the loss
nll = zfit.loss.UnbinnedNLL(model=gauss, data=data)

# minimize
minimizer = zfit.minimize.Minuit()
result = minimizer.minimize(nll)

# calculate errors
param_errors = result.hesse()

This follows the zfit workflow:

[zfit workflow diagram]

Full explanation

The default space (e.g. normalization range) of a PDF is defined by an observable space, which is created using the zfit.Space class:

obs = zfit.Space('x', limits=(-10, 10))

To create a simple Gaussian PDF, we define its parameters and their limits using the zfit.Parameter class.

# syntax: zfit.Parameter("any_name", value, lower, upper)
mu    = zfit.Parameter("mu"   , 2.4, -1, 5)
sigma = zfit.Parameter("sigma", 1.3,  0, 5)
gauss = zfit.pdf.Gauss(obs=obs, mu=mu, sigma=sigma)

For simplicity, we create the dataset to be fitted from a NumPy array, but zfit also accepts other sources, such as ROOT files or pandas DataFrames (see the sketch after the code):

mu_true = 0
sigma_true = 1
data_np = np.random.normal(mu_true, sigma_true, size=10000)
data = zfit.Data.from_numpy(obs=obs, array=data_np)
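
Other input formats follow the same pattern. As a sketch, a pandas DataFrame can be loaded through zfit.Data.from_pandas (check the documentation for the exact signature of your zfit version):

import pandas as pd

df = pd.DataFrame({'x': data_np})
data_df = zfit.Data.from_pandas(df, obs=obs)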

Fits are performed in three steps:

  1. Creation of a loss function, in our case a negative log-likelihood.

  2. Instantiation of our minimiser of choice, in this example Minuit.

  3. Minimisation of the loss function.

# Stage 1: create an unbinned likelihood with the given PDF and dataset
nll = zfit.loss.UnbinnedNLL(model=gauss, data=data)

# Stage 2: instantiate a minimiser (in this case a basic minuit)
minimizer = zfit.minimize.Minuit()

# Stage 3: minimise the given negative log-likelihood
result = minimizer.minimize(nll)

Errors are calculated with a further function call to avoid running potentially expensive operations if not needed:

param_errors = result.hesse()
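
For asymmetric, profile-likelihood-based uncertainties, the fit result offers a separate method; a sketch, assuming the errors() method of recent zfit versions (it can be considerably slower than hesse):

# minos-like asymmetric errors; returns the errors and possibly a new result
asym_errors, new_result = result.errors()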

Once we’ve performed the fit and obtained the corresponding uncertainties, we can examine the fit results:

print("Function minimum:", result.fmin)
print("Converged:", result.converged)
print("Full minimizer information:", result)

# Information on all the parameters in the fit
params = result.params
print(params)

# Printing information on specific parameters, e.g. mu
print("mu={}".format(params[mu]['value']))

And that’s it! For more details and information on what you can do with zfit, check out the latest documentation.

Prerequisites

zfit works with Python versions 3.7, 3.8 and 3.9. Several packages are required, most notably TensorFlow as the computational backend; for the full list, check the requirements.

Installing

zfit is currently only available via pip. The conda version is highly outdated and should not be used.

If possible, use a conda or virtual environment and do:

$ pip install zfit

For the newest development version, you can install the version from git with

$ pip install git+https://github.com/zfit/zfit

Contributing

Any idea of how to improve the library? Or interested in writing some code? Contributions are always welcome, please have a look at the Contributing guide.

Contact

You can contact us directly:

Original Authors

Jonas Eschle <jonas.eschle@cern.ch>
Albert Puig <albert.puig@cern.ch>
Rafael Silva Coutinho <rsilvaco@cern.ch>

See here for all authors and contributors

Acknowledgements

zfit has been developed with support from the University of Zurich and the Swiss National Science Foundation (SNSF) under contracts 168169 and 174182.

The idea of zfit is inspired by the TensorFlowAnalysis framework developed by Anton Poluektov and by TensorProb, developed by Chris Burr and Igor Babuschkin, both built on the TensorFlow open source library, among others.
