Fitting histograms
Fitting is the method for modeling the expected distribution of events in a physics data analysis. ROOT offers various options to perform the fitting of the data:
- Fit() method: You can fit histograms and graphs programmatically with the Fit() method.
- Minimization packages: ROOT provides several minimization packages.
- Using the ROOT::Fit classes
- Fit Panel: After a histogram is drawn, the Fit Panel GUI is best used for prototyping the fit.
- RooFit: The RooFit library is a toolkit for modeling the expected distribution of events in a physics analysis.
Using the Fit() method
The Fit() method is implemented for:
- the histogram classes TH1
- the sparse histogram classes THnSparse
- the graph classes TGraph, TGraph2D and TMultiGraph (for fitting a collection of graphs with the same function)
Using TH1::Fit() and TGraph::Fit()
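You can fit a TF1 function f1 to a histogram with the TH1::Fit() method, which has the following signature: TFitResultPtr TH1::Fit(TF1 *f1, Option_t *option = "", Option_t *goption = "", Double_t xmin = 0, Double_t xmax = 0).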
The function returns a TFitResultPtr, which is explained later in this manual.
By default, the fitted function object is added to the histogram and is drawn on the current pad.
For a detailed explanation of the option and goption strings, see TH1::Fit().
The xmin and xmax parameters optionally specify the fit range.
The signature of TGraph::Fit() is the same, but the supported options are slightly different: L, WL, and I are exclusive to TH1::Fit(), while EX0 and ROB only apply to TGraph::Fit() (these options are explained in the linked function documentation).
Fitting 1-D histograms with pre-defined functions
Use the TH1::Fit() method to fit a 1-D histogram with a pre-defined function. The name of the pre-defined function is the first parameter. For pre-defined functions, you do not need to set initial values for the parameters.
See the TH1::Fit() documentation for the full list of pre-defined functions.
Example
A histogram object hist is fit with a Gaussian:
Fitting 1-D histograms with user-defined functions
You can also fit any TF1 function that you defined yourself, in one of the ways listed in the class documentation, to a 1-D histogram.
Example
In this example, we create a TF1 func from a general C++ function with parameters:
Now the fitf function is fitted to the histogram.
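The following sketch shows what such a general C++ function can look like. The Gaussian shape, the three parameters, and the range [-3, 3] are assumptions made for illustration. The function body is plain C++, and the ROOT calls that would wrap and fit it appear only as comments.

```cpp
#include <cassert>
#include <cmath>

// Gaussian-shaped model with the (x, par) signature that TF1 accepts for
// general C++ functions: x[0] is the coordinate, par[0], par[1] and par[2]
// are the amplitude, mean and width.
double fitf(double *x, double *par)
{
   assert(x != nullptr && par != nullptr);
   double arg = 0.;
   if (par[2] != 0.) arg = (x[0] - par[1]) / par[2];   // guard against zero width
   return par[0] * std::exp(-0.5 * arg * arg);
}

// With ROOT available, the function could be wrapped and fitted like this
// (3 parameters, range [-3, 3]; "hist" is an existing TH1 pointer):
//    TF1 *func = new TF1("fit", fitf, -3, 3, 3);
//    func->SetParameters(100., 0., 1.);   // initial values are needed here
//    hist->Fit("fit");
```

Unlike the pre-defined functions, a function defined this way has no automatic starting values, so the parameters are initialized before the fit.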
Configuring the fit
The following configuration actions are available when fitting a histogram or graph using the Fit() method (relevant tutorials linked in parentheses):
- Fixing and setting parameter bounds
- Fitting subranges and multiple subranges (multifit.C / multifit.py). The tutorial shows how to fit several Gaussian functions with different parameters to separate subranges of the same histogram.
- Fitting the convolution of two functions (fitConvolution.C / fitConvolution.py)
- Fitting the normalized sum of functions (fitNormSum.C / fitNormSum.py)
- Adding functions to the list
Fixing and setting parameter bounds
For pre-defined functions like poln, exp, gaus, and landau, the parameter initial values are set automatically.
For functions that are not pre-defined, the fit parameters must be initialized before invoking the Fit() method.
- Use the TF1::SetParLimits() method to set the bounds for one parameter.
When the lower and upper limits are equal, the parameter is fixed.
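Example
Parameter 4 is fixed at 10 by setting both its lower and its upper limit to 10.
- Use the TF1::FixParameter() method to fix a parameter to 0. Equal limits of 0 are interpreted as "no limits", so SetParLimits() cannot be used for this case.
Example
Calling FixParameter(4, 0) fixes parameter 4 at 0.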
You do not need to set the limits for all parameters.
Example
Consider a function with 6 parameters. A setup like the following is possible: parameters 0 to 2 can vary freely, parameter 3 has boundaries [-10, 4] with the initial value -1.5, and parameter 4 is fixed to 0.
Adding functions to the list
The example $ROOTSYS/tutorials/fit/multifit.C illustrates how to fit several functions on the same histogram.
By default a fit command deletes the previously fitted function in the histogram object. You can specify the option + in the second parameter to add the newly fitted function to the existing list of functions for the histogram.
Note that the fitted function(s) are saved with the histogram when it is written to a ROOT file.
Accessing fit results
You can obtain the following results of a fit:
- associated function
- parameter values
- errors
- covariance and correlation matrix (via the fit result object explained below)
Associated function
One or more objects (typically a TF1*) can be added to the list of functions associated with each histogram.
TH1::Fit() adds the fitted function to this list.
Given a histogram h, you can retrieve the associated function by name with h->GetFunction().
Accessing the fit parameters and results
If the histogram or graph is made persistent, the list of associated functions is also persistent.
Retrieve a pointer to the function with the TH1::GetFunction() method. Then you can retrieve the fit parameters from the function.
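Example
For a function pointer fit returned by GetFunction(), fit->GetChisquare() returns the chi-square of the fit, fit->GetParameter(0) the value of the first parameter, and fit->GetParError(0) the error of the first parameter.
With the fit option S, you can access the full result of the fit including the covariance and correlation matrix.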
Associated errors
By default, for each bin, the sum of weights is computed at fill time. You can also call TH1::Sumw2() to force the storage and computation of the sum of the square of weights per bin. If Sumw2() has been called, the error per bin is computed as sqrt(sum of squares of weights). Otherwise, the error is set equal to sqrt(bin content).
To return the error for a given bin number, use TH1::GetBinError().
Empty bins are excluded from the fit when using the Chi-square fit method. When fitting a histogram representing counts (that is, with Poisson statistics), it is recommended to use the Log-Likelihood method (option L or WL), particularly in case of low statistics.
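The two error definitions and the treatment of empty bins can be illustrated without ROOT. The sketch below is plain C++ with hypothetical names (Bin, ErrorWithSumw2, ErrorFromContent, ChiSquare). It mirrors the rules stated above and is not ROOT's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One histogram bin: the sum of weights (the bin content) and the sum of
// squared weights (the extra quantity kept when Sumw2() has been called).
struct Bin {
   double sumw  = 0.;
   double sumw2 = 0.;
   void Fill(double w = 1.) { sumw += w; sumw2 += w * w; }
};

// Error when the squared weights are stored: sqrt(sum of squares of weights).
double ErrorWithSumw2(const Bin &b) { return std::sqrt(b.sumw2); }

// Error otherwise: sqrt(bin content).
double ErrorFromContent(const Bin &b) { return std::sqrt(std::fabs(b.sumw)); }

// Chi-square of a model against bin contents with sqrt(content) errors.
// Empty bins would have zero error, so they are skipped.
double ChiSquare(const std::vector<double> &content, const std::vector<double> &model)
{
   assert(content.size() == model.size());
   double chi2 = 0.;
   for (std::size_t i = 0; i < content.size(); ++i) {
      if (content[i] <= 0.) continue;             // empty bin: excluded
      const double diff = content[i] - model[i];
      chi2 += diff * diff / content[i];           // error squared equals the content
   }
   return chi2;
}
```

For unit weights both error definitions agree. They differ once weights other than 1 are filled. The skipped empty bins are one reason why a likelihood fit is preferable at low statistics: an empty bin still carries information that the chi-square sum above discards.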
Fit statistics box for plots
You can change the statistics box to display the fit parameters with the TStyle::SetOptFit() method. This parameter has four digits: mode = pcev (default = 0111)
- p = 1: Print probability.
- c = 1: Print Chi-square/number of degrees of freedom.
- e = 1: Print errors (if e=1, v must be 1).
- v = 1: Print name/values of parameters.
Example
To print the fit probability, parameter names, values, and errors, use gStyle->SetOptFit(1011).
The fit result object
When fitting a histogram (a TH1 object) or a graph (a TGraph object), it is possible to return a TFitResult via the TFitResultPtr object, which behaves as a smart pointer to a TFitResult. TFitResultPtr is the return object of TH1::Fit or TGraph::Fit.
By default TFitResultPtr contains only the status of the fit, which can be obtained by an automatic conversion of TFitResultPtr to an integer. If the fit option S is used instead, TFitResultPtr contains TFitResult and behaves as a smart pointer to it.
The TFitResult class inherits from ROOT::Fit::FitResult.
In addition to the base FitResult class, it provides some methods to return a covariance or correlation matrix as a TMatrixDSym object.
Furthermore, TFitResult can be stored in ROOT files.
All fit result objects support printing with FitResult::Print().
Example
Using ROOT::Fit classes
ROOT::Fit is the namespace for fitting classes (regression analysis). The fitting classes are part of the MathCore library.
The defined classes can be classified in the following groups:
- Fit method classes: Classes describing fit method functions like:
- ROOT::Fit::Chi2FCN: Class for binned fits using the least square methods.
- ROOT::Fit::PoissonLikelihoodFCN: Class for evaluating the log likelihood for binned Poisson likelihood fits.
- ROOT::Fit::LogLikelihoodFCN: Class for un-binned likelihood fits.
- Fit data classes: Classes for describing the input data for fitting, including:
- Binned datasets (ROOT::Fit::BinData): data points containing both coordinates and a corresponding value or weight, optionally with an error on the value or the coordinate. They are used for least-square (chi-square) fits of histograms or TGraph objects.
- Un-binned datasets (ROOT::Fit::UnBinData): They are used for fitting vectors of data points, for example from a TTree.
- User fitting classes: Classes for fitting a given dataset:
- ROOT::Fit::Fitter for executing the fit.
- ROOT::Fit::FitConfig for configuring the fit.
- ROOT::Fit::ParameterSettings to define the properties of the fit parameters (initial values, bounds, etc.).
- ROOT::Fit::FitResult for storing the result of the fit.
The fitter classes use the generic interfaces for parametric function evaluations, ROOT::Math::IParametricFunctionMultiDim, to define the fitting model function, and the ROOT::Math::Minimizer interface to perform the minimization of the target function.
Creating the data object
Example: filling a binned dataset from a histogram
Given a histogram, represented as a TH1 type object, a ROOT::Fit::BinData object is created and filled.
ROOT::Fit::DataOptions controls some fitting options.
In this example, the fIntegral option is set to integrate the fit function over each bin instead of using the value at the bin centers.
The call to ROOT::Fit::DataRange sets the fit range to the interval between xmin and xmax.
Using un-binned data
Use the ROOT::Fit::UnBinData class for un-binned data.
For creating un-binned datasets, there are two possibilities:
- Copy the data inside a ROOT::Fit::UnBinData object. Create an empty ROOT::Fit::UnBinData object, iterate on the data and add the data points one by one. An input ROOT::Fit::DataRange object is passed in order to copy the data according to the given range.
- Use ROOT::Fit::UnBinData as a wrapper to an external data storage. In this case the ROOT::Fit::UnBinData object is created from an iterator or pointers to the data and the data are not copied inside. The data cannot be selected according to a specified range. All the data points will be included in the fit.
ROOT::Fit::UnBinData also supports weighted data. In addition to the data points (coordinates), which can be of arbitrary dimension k, the class can be constructed from a vector of weights.
Example
Data are taken from a standard vector.
Example
In this example a two-dimensional UnBinData object is created with the contents from a tree.
Creating a fit model
To fit a dataset, a model is needed to describe the data, such as a probability density function (PDF) describing the observed data or a hypothetical function describing the relationship between the independent variables X and the single dependent variable Y. The model can have any number k of independent variables. For example, in fitting a k-dimensional histogram, the independent variables X are the coordinates of the bin centers and Y is the bin weight.
The model function needs to be expressed as a function of some unknown parameters. The fit finds the parameter values that best describe the observed data.
You can, for example, use the TF1 class, the parametric function class, to describe the model function.
But the ROOT::Fit::Fitter class takes as input a more general parametric function object, the abstract interface class ROOT::Math::IParametricFunctionMultiDim. It describes a generic one-dimensional or multi-dimensional function with parameters.
This interface extends the abstract ROOT::Math::IBaseFunctionMultiDim class with methods to set or retrieve parameter values and to evaluate the function given the vector of independent values X and the vector of parameters P.
You can convert a TF1 object into a ROOT::Math::IParametricFunctionMultiDim using the wrapper class ROOT::Math::WrappedMultiTF1.
Example
When creating a wrapper, the parameter values stored in TF1 are copied to the ROOT::Math::WrappedMultiTF1 object. The function object representing the model function is given to the ROOT::Fit::Fitter class using the Fitter::SetFunction method.
You can also provide a function object that implements the derivatives of the function with respect to the parameters. In this case you must provide the function object as a class deriving from the ROOT::Math::IParametricGradFunctionMultiDim interface.
Note that the ROOT::Math::WrappedMultiTF1 wrapper class also implements the gradient interface, internally using TF1::GradientPar, which is based on numerical differentiation, except for linear functions (that is, when TF1::IsLinear() is true). The parameter derivatives of the model function can be useful to some minimization algorithms, such as FUMILI (see → FUMILI). However, in general it is better to let the minimization algorithm (for example TMinuit, see → TMinuit) compute the needed derivatives using its own customised numerical differentiation algorithm. To avoid providing the parameter derivatives to the fitter, explicitly pass false as the second argument of Fitter::SetFunction.
Configuring the fit
Use the ROOT::Fit::FitConfig class (which holds a ROOT::Fit::ParameterSettings object for each fit parameter) for configuring the fit.
There are the following fit configurations:
- Setting the initial values of the parameters.
- Setting the parameter step sizes.
- Setting parameter bounds, if any.
- Setting the minimizer library and the particular algorithm to use.
- Setting different minimization options (print level, tolerance, max iterations, etc.).
- Setting the type of parameter errors to compute (parabolic error, Minos errors, re-normalize errors using fitted chi2 values).
Example
Setting the lower and upper bounds for the first parameter and a lower bound for the second parameter:
Note that a ROOT::Fit::ParameterSettings object exists for each fit parameter and is created by the ROOT::Fit::FitConfig class after the model function has been set in the fitter. Only when the function is set is the number of parameters known, and FitConfig then automatically creates the corresponding ParameterSettings objects.
Various minimizers can be used in the fitting process. They can be implemented in different libraries and loaded at run time. Each different minimizer (for example Minuit, Minuit2, FUMILI, etc.) consists of a different implementation of the ROOT::Math::Minimizer interface. Within the same minimizer, thus within the same class implementing the Minimizer interface, different algorithms exist.
If the requested minimizer is not available in ROOT, the default one is used. The default minimizer type and algorithm can be specified by using the static function ROOT::Math::MinimizerOptions::SetDefaultMinimizer("minimizerName").
Performing the fit
Depending on the available input data and the selected function for fitting, you can use one of the methods of the ROOT::Fit::Fitter class to perform the fit.
Pre-defined fitting methods
The following pre-defined fitting methods are available:
- Least-square fit: Fitter::LeastSquareFit(const BinData &) or Fitter::Fit(const BinData &). Both methods should be used when the binned data values follow a Gaussian distribution. These fit methods are implemented using the ROOT::Fit::Chi2FCN class.
- Binned likelihood fit: Fitter::LikelihoodFit(const BinData &). This method should be used when the binned data values follow a Poisson or a multinomial distribution. The Poisson case (extended fit) is the default and in this case the function normalization is also fit to the data. This method is implemented by the ROOT::Fit::PoissonLikelihoodFCN class.
- Un-binned likelihood fit: Fitter::LikelihoodFit(const UnBinData &). By default the fit is not extended, that is, the normalization is not fitted to the data. This method is implemented using the ROOT::Fit::LogLikelihoodFCN class.
- Linear fit: A linear fit can be chosen if the model function is linear in the parameters.
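To illustrate why a model that is linear in its parameters is a special case, the sketch below solves an unweighted least-square fit of a straight line in closed form, without any iterative minimization. It is plain C++ with hypothetical names (Line, FitLine) and is not ROOT's linear fitter.

```cpp
#include <cassert>
#include <vector>

struct Line {
   double intercept;
   double slope;
};

// Unweighted least-square fit of y = intercept + slope * x.
// Because the model is linear in both parameters, the minimum of the
// sum of squared residuals follows directly from the normal equations.
Line FitLine(const std::vector<double> &x, const std::vector<double> &y)
{
   assert(x.size() == y.size() && x.size() >= 2);
   const double n = static_cast<double>(x.size());
   double sx = 0., sy = 0., sxx = 0., sxy = 0.;
   for (std::size_t i = 0; i < x.size(); ++i) {
      sx  += x[i];
      sy  += y[i];
      sxx += x[i] * x[i];
      sxy += x[i] * y[i];
   }
   const double det = n * sxx - sx * sx;   // zero only if all x values are equal
   assert(det != 0.);
   const double slope = (n * sxy - sx * sy) / det;
   const double intercept = (sy - slope * sx) / n;
   return {intercept, slope};
}
```

For the points (0, 1), (1, 3), (2, 5), (3, 7), the sums give det = 20 and the fit returns slope 2 and intercept 1. A non-linear model such as a Gaussian has no such closed form and needs one of the iterative methods listed above.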
User-defined fitting methods
You can also implement your own fitting methods. You can implement your own version of the method function, using your own dataset objects and functions.
Use ROOT::Fit::Fitter::SetFCN to set the method function and ROOT::Fit::Fitter::FitFCN for fitting.
You can also pass the method function directly to ROOT::Fit::Fitter::FitFCN, but in this case a previously defined fitting configuration is used.
The possible types of method functions that can be passed to ROOT::Fit::Fitter::SetFCN are:
- A generic functor object implementing operator()(const double * p), where p is the parameter vector. In this case you need to pass the number of parameters, the function object and optionally a vector of initial parameter values. Other optional parameters include the size of the dataset and a flag specifying if it is a chi2 (least-square fit). If the last two parameters are given, the chi2/ndf can be computed after fitting the data.
- A function object implementing the ROOT::Math::IBaseFunctionMultiDim interface.
- A function object implementing the ROOT::Math::FitMethodFunction interface. This is an interface class that extends ROOT::Math::IBaseFunctionMultiDim with some additional functions which can be used when fitting is done. The extra functionality is required by some fitting algorithms like FUMILI or GSLMultiFit.
- An old-Minuit like FCN interface (this is a free function with the signature fcn(int &npar, double *gin, double &f, double *u, int flag)).
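The first and the last kind of method function in the list above can be sketched in plain C++. The class name LineChi2, the straight-line model, and the global pointer gChi2 are hypothetical. The way such objects are then handed to SetFCN or FitFCN is described in the text and not shown as code here.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Generic functor: operator()(const double *p) returns the quantity to be
// minimized, here the chi-square of the line p[0] + p[1]*x against
// (x, y, error) points owned by the functor.
class LineChi2 {
public:
   LineChi2(std::vector<double> x, std::vector<double> y, std::vector<double> err)
      : fX(std::move(x)), fY(std::move(y)), fErr(std::move(err))
   {
      assert(fX.size() == fY.size() && fX.size() == fErr.size());
   }

   double operator()(const double *p) const
   {
      double chi2 = 0.;
      for (std::size_t i = 0; i < fX.size(); ++i) {
         const double pull = (fY[i] - (p[0] + p[1] * fX[i])) / fErr[i];
         chi2 += pull * pull;
      }
      return chi2;
   }

   // Size of the dataset, needed together with the chi2 flag for chi2/ndf.
   std::size_t NPoints() const { return fX.size(); }

private:
   std::vector<double> fX, fY, fErr;
};

// The same quantity through an old-Minuit like free function. The signature
// has no user-data argument, so the data are reached through a global pointer.
static const LineChi2 *gChi2 = nullptr;

void fcn(int & /*npar*/, double * /*gin*/, double &f, double *u, int /*flag*/)
{
   assert(gChi2 != nullptr);
   f = (*gChi2)(u);   // u holds the current parameter values
}
```

The functor form keeps the data inside the object, which avoids the global state that the Minuit-like signature forces on the caller.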
Example: simultaneous fit of two histograms
One good example that covers most of the ROOT::Fit features is the simultaneous fit of two histograms (see the combinedFit.C / combinedFit.py tutorial).
Computing confidence intervals
With the fit result object returned by Fitter::Result(), you can compute the confidence intervals after the fit (see ROOT::Fit::FitResult::GetConfidenceIntervals).
Given an input dataset (for example a BinData object) and a confidence level value (for example 68%), it computes the lower and upper band values of the model function at the given data points.
You can take a look at the ConfidenceIntervals.C tutorial for an example.
Using the Fit Panel
After you have drawn a histogram (see → Drawing a histogram), you can use the Fit Panel for fitting the data. The Fit Panel is best suited for prototyping.
The following section describes how to use the Fit Panel using an example.
Given is a histogram following a Gaussian distribution.
- Right-click on the object and then click FitPanel.
You can also select Tools and then click Fit Panel.
Figure: FitPanel in the context menu.
The Fit Panel is displayed.
Figure: Fit Panel.
In the Fit Function section you can select the function that should be used for fitting.
The following types of functions are listed here:
- Pre-defined functions that will depend on the dimensionality of the data.
- Functions present in gDirectory. These functions were already created by the user through a ROOT macro.
- Previously used functions. Functions that fitted the current data previously, if the data is able to store such functions.
Select a fitting function.
- Click Set Parameters... to set the parameters of the selected function.
The Set Parameters of... dialog window is displayed.
Figure: Set Parameters of… dialog window.
- Set the parameters for the fit function.
- In the General tab, select the general options for fitting. This includes the method that will be used, as well as what fit options will be used with it and the draw options. You can also constrain the range of the function used for the fitting.
- In the Minimization tab, select the minimization algorithm for fitting.
- Click Fit.
Figure: A fitted histogram.