Histograms are approximate representations of the distribution of numerical data, and they play a fundamental role in any kind of physics analysis. Histograms can be used to visualize your data, since they approximate the underlying data density distribution, and they are also a powerful form of data reduction. For a detailed description of what a histogram is, see the corresponding page in Wikipedia.
ROOT provides a rich set of functionality to work with histograms. They can be used for continuous data in one or more dimensions, they can represent integer data, and they can also be used to display categorical data, such as bar charts. Furthermore, ROOT supports building histograms from weighted data sets, which are very common in HEP, and it can compute summary statistics from the histogram input data, such as the sample mean, the standard deviation and higher moments.
ROOT also provides operations on histograms, such as addition, division and multiplication, and transformations such as rebinning, scaling (including normalisation) and projections from multi-dimensional histograms to ones with fewer dimensions. The ROOT histogram library can also produce profile plots from multi-dimensional data, see Profile Histograms.
The first step in constructing a histogram is to define a range for the input data and then divide this range of values into intervals: the histogram bins. The histogram counts how many values fall into each interval, building a frequency distribution of the input data. ROOT supports histograms with bins of equal size or variable size.
See the histogram tutorials for all the possible types of histograms that can be built.
The ROOT histogram classes derive from the base TH1 class, which is a common interface to interact with the ROOT histograms.
Derived classes exist depending on the dimension, 1-D, 2-D and 3-D, and the type used to represent the bin contents:
- one byte per channel: TH1C, TH2C or TH3C. Maximum bin content = 127.
- one short per channel: TH1S, TH2S or TH3S. Maximum bin content = 32767.
- one int per channel: TH1I, TH2I or TH3I. Maximum bin content = 2147483647.
- one float per channel: TH1F, TH2F or TH3F. Maximum precision 7 digits, i.e. a maximum bin content of around 1E7 if a precision of one count is required.
- one double per channel: TH1D, TH2D or TH3D. Maximum precision 15 digits, corresponding to a maximum bin content of around 5E15.
If there is no particular need to limit the memory used by the histograms, it is recommended to use the double precision version: TH1D for the 1-D case, TH2D for the 2-D case and TH3D for the 3-D case.
Histograms for larger dimensions
For dimensions larger than 3, ROOT provides a generic base class for multi-dimensional histograms, THn, and derived classes such as THnC, which are different instantiations of a generic template.
THn classes should be used when a large fraction of all bins are filled.
Given the large amount of memory used by THn, sparse multi-dimensional histogram classes exist for the use case of many dimensions and a large number of bins.
The base class for sparse histograms is THnSparse, which has its own derived instantiations.
Note that THn and THnSparse do not inherit from TH1 and therefore have a slightly different interface.
In addition to the standard histograms, ROOT also provides classes for producing profile plots, i.e. plots obtained from multi-dimensional input data (e.g. X and Y) where one of the dimensions (Y) is not grouped in bins; instead, its sample mean value and the corresponding error are displayed. A profile plot can show dependence relations in multi-dimensional data more clearly than standard multi-dimensional histogram plots such as scatter plots.
TProfile is a profile histogram for (X,Y) data to display the mean value of Y and its error for each bin in X.
TProfile2D is a profile histogram for (X,Y,Z) data to display the mean value of Z and its error for each cell in X,Y.
TProfile3D is a profile histogram for (X,Y,Z,T) data to display the mean value of T and its error for each cell in X,Y,Z.
Profile plots have the option to display the sample standard deviation (the spread of the data) instead of the default error on the sample mean. A plot similar to a profile, which shows the quantiles of the Y input data, is the candle plot, also called box plot (see tutorial), which can be obtained directly from the 2-D histogram using the candle drawing option.
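To make the idea of a profile concrete, here is a minimal pure-Python sketch that does not use ROOT. The function name `profile` and the sample values are hypothetical. It computes the mean of Y and the error on that mean for each equal-width bin in X, which is the quantity a profile plot displays by default.

```python
import math

def profile(xs, ys, nbins, xlow, xup):
    """Return (mean of y, error on the mean) for each equal-width bin in x."""
    sums = [[0, 0.0, 0.0] for _ in range(nbins)]  # count, sum of y, sum of y^2
    for x, y in zip(xs, ys):
        if xlow <= x < xup:  # values outside the range are ignored in this sketch
            i = min(int(nbins * (x - xlow) / (xup - xlow)), nbins - 1)
            sums[i][0] += 1
            sums[i][1] += y
            sums[i][2] += y * y
    result = []
    for n, sy, sy2 in sums:
        if n == 0:
            result.append((0.0, 0.0))
            continue
        mean = sy / n
        spread = math.sqrt(max(sy2 / n - mean * mean, 0.0))  # standard deviation of y
        result.append((mean, spread / math.sqrt(n)))         # error on the mean
    return result

points_x = [0.1, 0.2, 0.3, 1.5, 1.6]
points_y = [1.0, 2.0, 3.0, 10.0, 12.0]
prof = profile(points_x, points_y, nbins=2, xlow=0.0, xup=2.0)
```

Returning `spread` instead of `spread / math.sqrt(n)` would correspond to the option of displaying the standard deviation rather than the error on the mean.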
All histogram types support fixed or variable bin sizes. 2-D histograms may have fixed size bins along X and variable size bins along Y or vice-versa.
The type of binning of the histogram is managed by the TAxis class, which also defines the minimum and maximum of the range of input data that will be collected in the bins.
The TH1 class orders the bins using a global bin number to deal with the multi-dimensional cases.
For all histogram types:
bin# 0 contains the underflow.
bin# 1 is the first bin, with its low edge included.
bin# nbins (the second to last bin) is the last regular bin, with its upper edge excluded.
bin# nbins+1 (the last bin) contains the overflow.
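The numbering convention above can be sketched in a few lines of pure Python for an axis with equal-width bins. The function name `find_bin` is hypothetical. It illustrates the convention and is not ROOT code.

```python
def find_bin(x, nbins, xlow, xup):
    """Bin number for x on an axis with nbins equal-width bins in [xlow, xup)."""
    if x < xlow:
        return 0              # underflow
    if x >= xup:
        return nbins + 1      # overflow: the upper edge is excluded from the last bin
    return 1 + int(nbins * (x - xlow) / (xup - xlow))

bins = [find_bin(v, 10, 0.0, 10.0) for v in (-1.0, 0.0, 2.5, 9.99, 10.0)]
```

With 10 bins between 0 and 10, a value exactly at 0 lands in bin 1 while a value exactly at 10 lands in the overflow bin 11.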
A global bin number is defined to access the histogram bin information independently of the dimension.
Assuming a 3-D histogram with bin numbers binx, biny and binz, the function TH1::GetBin(binx,biny,binz) returns a global bin number and, given a global bin number bin, the function TH1::GetBinXYZ(bin,binx,biny,binz) computes the corresponding binx, biny and binz.
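As an illustration of how a global bin number can encode three per-axis bin numbers, the following pure-Python sketch assumes a row-major layout in which each axis has nbins+2 slots (the regular bins plus underflow and overflow). The function names are hypothetical and the layout is stated as an assumption; check the reference guide before relying on it.

```python
def get_bin(binx, biny, binz, nbinsx, nbinsy):
    """Combine per-axis bin numbers into one global bin number (assumed layout)."""
    nx = nbinsx + 2  # regular bins plus underflow and overflow
    ny = nbinsy + 2
    return binx + nx * (biny + ny * binz)

def get_bin_xyz(global_bin, nbinsx, nbinsy):
    """Invert get_bin: recover (binx, biny, binz) from a global bin number."""
    nx = nbinsx + 2
    ny = nbinsy + 2
    binx = global_bin % nx
    biny = (global_bin // nx) % ny
    binz = global_bin // (nx * ny)
    return binx, biny, binz

g = get_bin(1, 2, 3, nbinsx=10, nbinsy=5)
xyz = get_bin_xyz(g, nbinsx=10, nbinsy=5)
```

The two functions are inverses of each other, so converting to a global bin number and back returns the original triple.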
You can re-bin a histogram via the TH1::Rebin() method. When a new name is passed, it returns a new histogram with the re-binned contents; otherwise the histogram itself is modified. If bin errors were stored, they are recomputed during the re-binning. See the re-binning tutorial for an example.
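The arithmetic behind re-binning can be sketched in pure Python as follows. The function name `rebin` is hypothetical, and the sketch simply drops trailing bins that do not fill a complete group, which may differ from what ROOT does with them. Contents of merged bins are summed and their errors are combined in quadrature.

```python
import math

def rebin(contents, errors, ngroup):
    """Merge every ngroup adjacent bins; incomplete trailing groups are dropped."""
    nnew = len(contents) // ngroup
    new_contents, new_errors = [], []
    for i in range(nnew):
        lo, hi = i * ngroup, (i + 1) * ngroup
        new_contents.append(sum(contents[lo:hi]))
        # errors of independent bins add in quadrature
        new_errors.append(math.sqrt(sum(e * e for e in errors[lo:hi])))
    return new_contents, new_errors

merged, merged_err = rebin([4.0, 9.0, 16.0, 25.0], [2.0, 3.0, 4.0, 5.0], ngroup=2)
```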
Stack of histograms
Working with histograms
Creating and copying a histogram
- Use a constructor of one of the classes derived from TH1 to create a histogram object, passing, in addition to the name and title strings, the number of bins and the minimum and maximum of the range.
To create a histogram with variable bin sizes, pass an array of bin edges. Note that the array of bin edges must be of size nbins+1, since it contains both the lower and the upper edge of the axis range.
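The relation between nbins and the size of the edge array can be illustrated with a small pure-Python sketch. The function name `find_bin_variable` and the edge values are hypothetical. It looks up the bin of a value on an axis with variable-size bins using a binary search over the edges.

```python
from bisect import bisect_right

def find_bin_variable(x, edges):
    """Bin number for x given a sorted list of nbins+1 bin edges."""
    nbins = len(edges) - 1
    if x < edges[0]:
        return 0              # underflow
    if x >= edges[-1]:
        return nbins + 1      # overflow
    # number of edges <= x gives the 1-based bin (low edge included)
    return bisect_right(edges, x)

edges = [0.0, 1.0, 3.0, 10.0]   # 4 edges define 3 bins
var_bins = [find_bin_variable(v, edges) for v in (-0.5, 0.0, 1.0, 2.9, 9.9, 10.0)]
```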
When creating a profile histogram, passing the range of the profiled variable (e.g. Y for a TProfile) is optional.
To clone or copy an existing histogram, you can use the Clone() method or the copy constructor.
Clone() returns a pointer to a TObject and requires a cast to TH1, while the copy constructor can be used only with the leaf histogram classes (e.g. TH1D for a double type histogram).
If the type of h1 is TH1D, you can also use the copy constructor.
Getting the bin width
- Use the GetBinWidth() method to get the bin width of a histogram.
Filling a histogram
Fill a histogram with the TH1::Fill() method.
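- For 1-D histograms: Fill(x) or Fill(x, w).
- For 2-D histograms and TProfile: Fill(x, y) or Fill(x, y, w).
- For 3-D histograms and TProfile2D: Fill(x, y, z) or Fill(x, y, z, w).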
The Fill() method computes the bin number corresponding to the given x, y or z argument and increments this bin by the given weight.
The Fill() method returns the bin number for 1-D histograms or the global bin number for 2-D and 3-D histograms.
Note that when a histogram is filled with a weight different from one, the histogram assumes you are dealing with a weighted data set and internally stores an additional array with the sum of the squared weights, which is used to compute the bin error. By default, a weighted histogram is always displayed with the bin error of each bin instead of the standard histogram bar. See Filling histograms for more details about filling histograms, such as the computation of bin errors or automatic axis extension.
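The bookkeeping described above can be sketched in pure Python. The class name `WeightedHist` is hypothetical and this is not ROOT's implementation. Each bin keeps the sum of the weights and the sum of the squared weights, and the bin error is the square root of the latter. For unit weights this reduces to the square root of the bin content.

```python
import math

class WeightedHist:
    """Toy 1-D histogram with equal-width bins, underflow and overflow."""

    def __init__(self, nbins, xlow, xup):
        self.nbins, self.xlow, self.xup = nbins, xlow, xup
        self.sumw = [0.0] * (nbins + 2)    # bin contents
        self.sumw2 = [0.0] * (nbins + 2)   # sum of squared weights per bin

    def fill(self, x, w=1.0):
        if x < self.xlow:
            b = 0
        elif x >= self.xup:
            b = self.nbins + 1
        else:
            b = 1 + int(self.nbins * (x - self.xlow) / (self.xup - self.xlow))
        self.sumw[b] += w
        self.sumw2[b] += w * w
        return b  # bin number that was incremented

    def bin_error(self, b):
        return math.sqrt(self.sumw2[b])

h = WeightedHist(4, 0.0, 4.0)
first = h.fill(0.5, 2.0)
h.fill(0.7, 3.0)
h.fill(5.0)          # lands in the overflow bin
```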
Filling a histogram with vector input data
A histogram can also be filled directly from an array of values of type double and an array of weights.
This is useful when working in Python with numpy arrays, since you can fill a histogram directly from them. An example is provided in the next section.
Filling a histogram with random numbers
Fill a histogram with random numbers with the TH1::FillRandom() method.
The FillRandom() method uses the contents of an existing TF1 function or another TH1 histogram (for all dimensions) and the default random number generator defined in gRandom. See the TRandom class for the available generators in ROOT.
A histogram is randomly filled 10 000 times with a default Gaussian distribution of mean 0 and sigma 1.
In Python you can use random numbers generated with the numpy.random library and FillN to fill the histogram:

```python
import numpy as np
import ROOT

# Generate 1000 normally distributed values with mean=5 and stddev=2
x = np.random.normal(5, 2, 1000)
w = np.ones(1000)  # unit weights
h1 = ROOT.TH1D("h1", "h1 title", 50, 0., 10.)
h1.FillN(1000, x, w)
```
Use the TH1::GetRandom() method to get a random number distributed according to the contents of a histogram.
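One common way to draw a random number according to binned contents is inverse transform sampling on the cumulative bin contents. The pure-Python sketch below shows that technique. The function name `sample_from_histogram` is hypothetical, and no claim is made that ROOT uses exactly this procedure.

```python
import bisect
import random

def sample_from_histogram(contents, xlow, xup, rng):
    """Draw one value distributed like the (non-negative) bin contents."""
    nbins = len(contents)
    width = (xup - xlow) / nbins
    cumulative, total = [], 0.0
    for c in contents:
        total += c
        cumulative.append(total)
    if total <= 0.0:
        raise ValueError("histogram is empty")
    r = rng.random() * total
    # first bin whose cumulative content exceeds r; empty bins are skipped
    i = min(bisect.bisect_right(cumulative, r), nbins - 1)
    previous = cumulative[i - 1] if i > 0 else 0.0
    fraction = (r - previous) / contents[i]   # uniform position inside the bin
    return xlow + (i + fraction) * width

rng = random.Random(1)
samples = [sample_from_histogram([0.0, 5.0, 0.0, 5.0], 0.0, 4.0, rng)
           for _ in range(1000)]
```

In this example the first and third bins are empty, so every sample falls in [1, 2) or [3, 4).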
Adding, multiplying and dividing histograms
The following operations are supported on histograms or between histograms:
Addition of a histogram to the current histogram.
Addition of two histograms with coefficients, with the result stored in the current histogram.
Multiplications and divisions are supported in the same way as additions.
You can use the operators (+, *, /) or the TH1::Add(), TH1::Multiply() and TH1::Divide() methods.
Multiplying a histogram object by a constant:
Creating a new histogram without changing the original one:
Multiplying two histograms and storing the result in a third one:
When performing operations, the resulting bin errors and histogram statistics are computed assuming that the two histograms are independent and using a normal approximation for the bin errors.
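Under the independence and normal approximation assumptions, the bin errors follow standard error propagation. The pure-Python sketch below applies the textbook formulas to a weighted sum and to a product of two histograms. The function names `add` and `multiply` are hypothetical and the code is illustrative, not ROOT's implementation.

```python
import math

def add(c1, h1, e1, c2, h2, e2):
    """Bin-by-bin c1*h1 + c2*h2 with errors combined in quadrature."""
    contents = [c1 * a + c2 * b for a, b in zip(h1, h2)]
    errors = [math.sqrt((c1 * ea) ** 2 + (c2 * eb) ** 2) for ea, eb in zip(e1, e2)]
    return contents, errors

def multiply(h1, e1, h2, e2):
    """Bin-by-bin h1*h2 with first-order error propagation."""
    contents = [a * b for a, b in zip(h1, h2)]
    errors = [math.sqrt((ea * b) ** 2 + (eb * a) ** 2)
              for a, ea, b, eb in zip(h1, e1, h2, e2)]
    return contents, errors

sum_c, sum_e = add(1.0, [10.0], [3.0], 1.0, [20.0], [4.0])
prod_c, prod_e = multiply([3.0], [1.0], [4.0], [2.0])
```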
A special case is when histograms are divided to compute efficiency and in this case the numerator histogram is a subset of the denominator histogram. The correct handling of errors is done by using
You can use TH1::Scale(Double_t c1 = 1, Option_t* option = "") in combination with TH1::Integral(Option_t* option = "") to normalize histograms.
The following example shows how to normalize a histogram so that it represents a probability density distribution.
The following histogram is given:
Figure: A trial histogram for normalizing.
To normalize, first clone the histogram to keep the original one, then call TH1::Scale, passing the inverse of the histogram integral as the scale factor. In addition, use the option width to also divide by the bin width, in order to display the probability density in each bin.
If you want to show just the frequency probability of each bin, you don't need to use the width option.
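The arithmetic of this normalization can be sketched in pure Python. The function name `normalize` and the example values are hypothetical. Each bin content is divided by the sum of all bin contents and, optionally, by the bin width.

```python
def normalize(contents, edges, divide_by_width=True):
    """Scale bin contents by 1/integral, optionally dividing by each bin width."""
    integral = sum(contents)
    if integral == 0:
        raise ValueError("cannot normalize an empty histogram")
    result = []
    for i, c in enumerate(contents):
        value = c / integral
        if divide_by_width:
            value /= edges[i + 1] - edges[i]
        result.append(value)
    return result

contents = [2.0, 6.0, 2.0]
edges = [0.0, 1.0, 3.0, 4.0]            # the middle bin is twice as wide
density = normalize(contents, edges)    # probability density per bin
frequency = normalize(contents, edges, divide_by_width=False)
area = sum(d * (edges[i + 1] - edges[i]) for i, d in enumerate(density))
```

With the width division the area under the histogram is one; without it the bin contents themselves sum to one.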
After applying the normalization, redraw the histogram with your preferred drawing option.
Drawing a histogram
- Use the TH1::Draw() method to draw a histogram using the provided drawing option.
The drawing is delegated to the THistPainter class, which specializes in drawing histograms. The THistPainter class is separated from the histogram, so that the histogram class does not carry the graphics overhead.
The "drawing option" is the only parameter of the TH1::Draw() method. It specifies how the histogram will be graphically rendered. For detailed information on the drawing options for all histogram classes, refer to THistPainter. For all available options, see the Histogram plotting options.
The drawing options are NOT case sensitive!
Drawing a histogram copy
By default, when a histogram is drawn in a ROOT canvas it is not copied, so that later changes to the histogram object are automatically reflected in the drawing. This means that if the histogram object is created on the stack inside a C++ scope (or inside a Python function when using PyROOT), it is automatically deleted when exiting the scope and the drawn object consequently disappears. To avoid this, you can use TH1::DrawCopy().
Figure: Histogram drawn with Draw().
- draw a histogram with error bars:
- draw a 2-D histogram as a LEGO plot:
Note that DrawCopy is used here, since in this case the histogram is created within a C++ scope and is deleted at its end.
Figure: Histogram drawn with Draw(“LEGO1”).
Using the histogram Editor
The following example shows how to use the GUI Editor to modify the histogram drawing.
A canvas with the histogram is displayed.
Click View, and then click Editor.
Click on the histogram.
In the Style tab, you can select and change some of the drawing options and the drawing style.
Projections of histograms
One can project multi-dimensional histograms (such as TH3) onto lower-dimensional histograms and onto profile histograms (such as TProfile). See the reference guide for the available projection functions.
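Conceptually, projecting a 2-D histogram onto one axis sums the bin contents along the other axis. The pure-Python sketch below shows this for a small grid of contents indexed as `cells[ix][iy]`. The function names are hypothetical, and under- and overflow bins are left out for brevity.

```python
def project_x(cells):
    """Sum over the Y index for every X bin; cells is indexed as cells[ix][iy]."""
    return [sum(row) for row in cells]

def project_y(cells):
    """Sum over the X index for every Y bin."""
    return [sum(column) for column in zip(*cells)]

cells = [[1, 2, 3],
         [4, 5, 6]]      # 2 bins in X, 3 bins in Y
proj_x = project_x(cells)
proj_y = project_y(cells)
```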
Fast Fourier transforms for histograms
ROOT histograms provide functions to compute statistics on the input data, such as the mean (TH1::GetMean), the standard deviation (TH1::GetStdDev), the kurtosis (TH1::GetKurtosis) and the skewness (TH1::GetSkewness), as well as the covariance and correlation for multi-dimensional histograms, see for example TH2::GetCorrelationFactor.
TH1::GetRMS is equivalent to TH1::GetStdDev, since historically the RMS has been identified with the sample standard deviation.
The histogram statistics can be displayed in the histogram statistics box.
Note that by default the histogram statistics are computed from the whole raw input data sample, but when a histogram range is selected, the statistics are computed in the user-defined range using only the bin center information.
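When only the binned information is available, the mean and standard deviation can be estimated by treating every entry as if it sat at its bin center. The pure-Python sketch below illustrates that estimate. The function name `binned_mean_stddev` is hypothetical.

```python
import math

def binned_mean_stddev(contents, xlow, xup):
    """Mean and standard deviation using bin centers weighted by bin contents."""
    nbins = len(contents)
    width = (xup - xlow) / nbins
    centers = [xlow + (i + 0.5) * width for i in range(nbins)]
    total = sum(contents)
    if total == 0:
        raise ValueError("histogram is empty")
    mean = sum(c * x for c, x in zip(contents, centers)) / total
    variance = sum(c * (x - mean) ** 2 for c, x in zip(contents, centers)) / total
    return mean, math.sqrt(variance)

mean, stddev = binned_mean_stddev([1.0, 2.0, 1.0], 0.0, 3.0)
```

Because the position of each entry inside its bin is lost, this estimate generally differs slightly from the value computed from the raw data.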
The function TH1::GetQuantiles computes quantiles, such as the median and quartiles, from the binned data of the histogram. For example, to compute the quartiles (including the median), you provide as input the probability values for which you want to compute the corresponding quantiles.
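A simple way to estimate quantiles from binned data is to build the cumulative distribution at the bin edges and interpolate linearly inside the bin where the requested probability is reached. The pure-Python sketch below shows that approach. The function name `binned_quantiles` is hypothetical and the interpolation scheme is an assumption made for illustration.

```python
def binned_quantiles(contents, edges, probabilities):
    """Estimate quantiles by linear interpolation of the binned cumulative distribution."""
    total = sum(contents)
    if total <= 0:
        raise ValueError("histogram is empty")
    cumulative = [0.0]                  # cumulative probability at each bin edge
    for c in contents:
        cumulative.append(cumulative[-1] + c / total)
    results = []
    for p in probabilities:
        i = 0
        # advance to the first bin whose upper-edge cumulative reaches p
        while i < len(contents) - 1 and cumulative[i + 1] < p:
            i += 1
        span = cumulative[i + 1] - cumulative[i]
        fraction = (p - cumulative[i]) / span if span > 0 else 0.0
        results.append(edges[i] + fraction * (edges[i + 1] - edges[i]))
    return results

quartiles = binned_quantiles([1.0, 1.0, 1.0, 1.0],
                             [0.0, 1.0, 2.0, 3.0, 4.0],
                             [0.25, 0.5, 0.75])
```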
The ROOT histogram class also provides functions to perform statistical comparison tests, such as goodness of fit tests, for testing the compatibility of two histograms (2 sample tests) or the compatibility of a histogram with a theoretical distribution, i.e. a function (1 sample tests).
For tests of histogram-histogram compatibility:
- TH1::Chi2Test for performing a chi2 test between two histograms. This test also works for multi-dimensional histograms, but it requires non-empty bins.
- TH1::KolmogorovTest to perform the Kolmogorov-Smirnov test on the two histograms. Note that this test works only for 1-D histograms; it is biased for binned data and should be used only if the bin size is sufficiently small.
- TH1::AndersonDarlingTest, which works only for 1-D histograms.
For histogram-function comparison tests:
- TH1::Chisquare using the chi2 test.
- TH1::Chisquare(function,"L") (note the option L) to use the Poisson likelihood ratio based method suggested by Baker and Cousins (see corresponding paper).
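The two statistics can be compared with a pure-Python sketch of their textbook forms: a chi2 that uses the observed counts as variances and skips empty bins, and the Baker-Cousins Poisson likelihood ratio. The function names are hypothetical, `expected` stands for the function value in each bin, and the exact conventions ROOT applies should be checked in the reference guide.

```python
import math

def neyman_chi2(observed, expected):
    """Chi2 with the observed count as variance; empty bins are skipped."""
    return sum((n - mu) ** 2 / n for n, mu in zip(observed, expected) if n > 0)

def baker_cousins_chi2(observed, expected):
    """Poisson likelihood ratio: 2 * sum(mu - n + n * ln(n / mu))."""
    total = 0.0
    for n, mu in zip(observed, expected):
        total += mu - n
        if n > 0:                      # the n * ln(n / mu) term vanishes for n = 0
            total += n * math.log(n / mu)
    return 2.0 * total

chi2_n = neyman_chi2([10.0], [8.0])
chi2_bc = baker_cousins_chi2([10.0], [8.0])
chi2_zero = baker_cousins_chi2([5.0, 7.0], [5.0, 7.0])
```

Unlike the first form, the likelihood ratio keeps the contribution of empty bins, which matters for histograms with low statistics.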
Histogram bin Errors
By default, the bin errors of a histogram are computed as follows:
- unweighted histogram: square root of the bin content.
- weighted histogram: square root of the sum of the squared weights in the bin.
For unweighted histograms there is also the option to compute the Poisson standard confidence intervals for each bin, by calling TH1::SetBinErrorOption(TH1::kPoisson). After this, one can retrieve the corresponding lower and upper bin errors by using TH1::GetBinErrorLow and TH1::GetBinErrorUp.