Histograms

Histograms are approximate representations of the distribution of numerical data, and they play a fundamental role in any kind of physics analysis. Histograms can be used to visualize your data, since they approximate the underlying data density distribution, and they are also a powerful form of data reduction. For a detailed description of what a histogram is, see the corresponding page in Wikipedia.

ROOT provides a rich set of functionality to work with histograms. They can be used for continuous data in one or more dimensions, they can represent integer data, and they can also be used to display categorical data, such as bar charts. Furthermore, ROOT supports building histograms from weighted data sets, which are very common in HEP, and it provides the functionality to compute summary statistics from the histogram input data, such as the sample mean, standard deviation and higher moments.

ROOT also provides the functionality to perform operations on histograms, such as addition, division and multiplication, and transformations such as rebinning, scaling (including normalisation) and projections from multi-dimensional histograms to ones with lower dimensions. The ROOT histogram library also provides the capability of producing profile plots from multi-dimensional data, see Profile Histograms.

The first step in constructing a histogram is to define a range for the input data and then divide this range of values into intervals: the histogram bins. The histogram will count how many values fall into each interval, building a frequency distribution of the input data. ROOT supports histograms with bins of equal size or variable size.
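The binning step can be illustrated without ROOT. The following is a minimal sketch in plain C++, not ROOT's implementation: the helper names findBin and countEntries are hypothetical, and the bin numbering follows the convention described in the Conventions section (0 for underflow, nbins+1 for overflow). Because the bins are given by their edges, the same lookup covers equal-size and variable-size bins.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: return the bin number of x for sorted bin edges
// (edges.size() == nbins + 1).
//   0         : x is below the first edge (underflow)
//   1..nbins  : regular bins, low edge included, upper edge excluded
//   nbins + 1 : x is at or above the last edge (overflow)
int findBin(const std::vector<double>& edges, double x) {
   assert(edges.size() >= 2);
   // upper_bound finds the first edge strictly greater than x; its
   // position is exactly the bin number in the convention above.
   auto it = std::upper_bound(edges.begin(), edges.end(), x);
   return static_cast<int>(it - edges.begin());
}

// Hypothetical helper: count how many values fall into each bin.
// The result has nbins + 2 slots (underflow, regular bins, overflow).
std::vector<long> countEntries(const std::vector<double>& edges,
                               const std::vector<double>& data) {
   std::vector<long> counts(edges.size() + 1, 0);
   for (double x : data) {
      ++counts[static_cast<std::size_t>(findBin(edges, x))];
   }
   return counts;
}
```

For example, with the edges { 0.0, 0.2, 0.5, 1., 2., 4. } the value 0.2 lands in bin 2, because the low edge of a bin is included, while the value 4.0 lands in the overflow bin.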

See the histogram tutorials for all the possible types of histograms that can be built.

→ Histogram tutorials

Histogram classes

The ROOT histogram classes derive from the base TH1 class, which is a common interface to interact with the ROOT histograms. Derived classes exist depending on the dimension, 1-D, 2-D and 3-D, and the type used to represent the bin contents:

  • one byte per channel: TH1C, TH2C or TH3C. Maximum bin content = 127.
  • one short per channel: TH1S, TH2S or TH3S. Maximum bin content = 32767.
  • one int per channel: TH1I, TH2I or TH3I. Maximum bin content = 2147483647.
  • one float per channel: TH1F, TH2F or TH3F. Maximum precision 7 digits, i.e. a maximum bin content of around 1E7 while keeping a precision of one count.
  • one double per channel: TH1D, TH2D or TH3D. Maximum precision 15 digits, corresponding to a maximum bin content of around 5E15.

If there are no particular needs for limiting the memory used by the histograms, it is recommended to use the double precision version: TH1D for the 1-D case, TH2D for the 2-D and TH3D for 3-D.

Histograms for larger dimensions

For dimensions larger than 3, ROOT provides a generic base class for multi-dimensional histograms, THn, and the derived classes THnD, THnF, THnL, THnI, THnS and THnC, which are different instantiations of the generic template THnT<Type>. The THn classes should be used when a large fraction of all bins are filled. Because THn uses a large amount of memory, sparse multi-dimensional histogram classes exist for the use case of many dimensions and a large number of bins. The base class for sparse histograms is THnSparse, with its derived instantiations THnSparseT<type>.

Note that both THn and THnSparse do not inherit from TH1 and have therefore a slightly different interface.

Profile histograms

In addition to the standard histograms, ROOT also provides classes for producing profile plots, i.e. plots obtained from multi-dimensional input data (e.g. X and Y), where one of the dimensions (Y) is not grouped in bins; instead, its sample mean value and the corresponding error are displayed. A profile plot can show dependence relations in multi-dimensional data more clearly than standard multi-dimensional histogram plots such as scatter plots.

  • TProfile is a profile histogram for (X,Y) data to display the mean value of Y and its error for each bin in X.

  • TProfile2D is a profile histogram for (X,Y,Z) data to display the mean value of Z and its error for each cell in X,Y.

  • TProfile3D is a profile histogram for (X,Y,Z,T) data to display the mean value of T and its error for each cell in X,Y,Z.

Profile plots have the option to display the sample standard deviation (the spread of the data) instead of the default error on the sample mean. A plot similar to a profile, which shows the quantiles of the Y input data, is the candle plot, also called box plot (see tutorial); it can be obtained directly from a 2-D histogram using the candle draw option.

Bin numbering

All histogram types support fixed or variable bin sizes. 2-D histograms may have fixed size bins along X and variable size bins along Y or vice-versa. The type of binning of the histogram is managed by the TAxis class, which defines also the minimum and maximum range of the input data that will be collected in the bins. The TH1 class orders the bins using a global bin number for dealing with the multi-dimensional cases.

Conventions

For all histogram types: nbins, xlow, xup:

  • bin# 0 contains the underflow.

  • bin# 1 contains the first bin with low-edge (xlow INCLUDED).

  • The second to last bin (bin# nbins) contains the upper-edge (xup EXCLUDED).

  • The last bin (bin# nbins+1) contains the overflow.

  • A global bin number is defined to access the histogram bin information independently of the dimension.

Assuming a 3-D histogram h with binx, biny, binz, the function TH1::GetBin(binx,biny,binz) returns a global bin number and given a global bin number bin, the function TH1::GetBinXYZ(bin,binx,biny,binz) computes the corresponding binx, biny and binz.
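The relation between per-axis bin numbers and the global bin number can be sketched in plain C++. This is an illustration only, under the assumption that the x index varies fastest and that every axis carries nbins+2 slots (underflow and overflow included); the struct Shape3D and the functions globalBin and binXYZ are hypothetical names, not ROOT API.

```cpp
#include <cassert>

// Hypothetical description of a 3-D binning: number of regular bins per axis.
struct Shape3D {
   int nbinsx;
   int nbinsy;
   int nbinsz;
};

// Map (binx, biny, binz) to a single index. Each axis has nbins + 2 slots
// because bin 0 is the underflow and bin nbins + 1 is the overflow.
int globalBin(const Shape3D& s, int binx, int biny, int binz) {
   const int nx = s.nbinsx + 2;
   const int ny = s.nbinsy + 2;
   const int nz = s.nbinsz + 2;
   assert(binx >= 0 && binx < nx);
   assert(biny >= 0 && biny < ny);
   assert(binz >= 0 && binz < nz);
   return binx + nx * (biny + ny * binz);
}

// Inverse mapping: recover the per-axis bin numbers from the global one.
void binXYZ(const Shape3D& s, int bin, int& binx, int& biny, int& binz) {
   const int nx = s.nbinsx + 2;
   const int ny = s.nbinsy + 2;
   assert(bin >= 0 && bin < nx * ny * (s.nbinsz + 2));
   binx = bin % nx;
   biny = (bin / nx) % ny;
   binz = bin / (nx * ny);
}
```

With 4, 3 and 2 regular bins along X, Y and Z, the cell (1,1,1) maps to the global bin 1 + 6*(1 + 5*1) = 37, and the inverse mapping recovers (1,1,1) from 37.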

More details on the histogram binning are available in the TH1 reference documentation. ROOT supports also automatic binning, see Histograms with automatic bins.

Re-binning

You can re-bin a histogram via the TH1::Rebin() method. By default it merges the bins of the histogram in place; if a new name is passed, the original is left unchanged and a new histogram with the re-binned contents is returned. If bin errors were stored, they are recomputed during the re-binning. See this tutorial for a re-binning example.

Stack of histograms

THStack is a collection of TH1 or TH2 histograms. The tutorial hstack.C is a good example of how to use the THStack class.

Working with histograms

Creating and copying a histogram

  • Use a constructor of the derived classes (TH1D instead of TH1) to create a histogram object, passing, in addition to the name and title strings, the number of bins and the minimum and maximum of the range.

Examples

In the following examples, histograms are created for the classes TH1I, TH2F and TH3D:

   TH1* h1 = new TH1I("h1", "h1 title", 100, 0.0, 4.0);
   TH2* h2 = new TH2F("h2", "h2 title", 40, 0.0, 2.0, 30, -1.5, 3.5);
   TH3* h3 = new TH3D("h3", "h3 title", 80, 0.0, 1.0, 100, -2.0, 2.0, 50, 0.0, 3.0);

For creating variable bins histograms:

   double binEdges[] = { 0.0, 0.2, 0.5, 1., 2., 4. };
   TH1* h1 = new TH1D("h1", "h1 title", 5, binEdges);  // 6 edges define 5 bins
   TH2* h2 = new TH2D("h2", "h2 title", 5, binEdges, 30, -1.5, 3.5);

Note that the array of bin edges must have size nbins+1, since it contains the lower edge of each bin plus the upper edge of the last bin.

When creating a profile histogram, passing the range of the profiled variable (e.g. Y for a TProfile) is optional:

   TProfile* p1 = new TProfile("p1", "profile title", 40, 0.0, 2.0 );
   TProfile* p2 = new TProfile("p2", "profile title", 40, 0.0, 2.0, -1.5, 3.5 );

To clone or copy an existing histogram, you can use the Clone() method or the copy constructor. Note that Clone() returns a pointer to a TObject, which must be cast to TH1, while the copy constructor can be used only with the leaf histogram classes (e.g. TH1D for a double type histogram).

Example

TH1* hc = (TH1*)h1->Clone();

If h1 is declared as a TH1D*, you can also use the copy constructor:

TH1* hc = new TH1D(*h1);

Getting the bin width

  • Use the GetBinWidth() method to get the bin width of a histogram.
   TH1D h1("h1","Histogram from a Gaussian",100,-3,3);
   h1.GetBinWidth(1)

   (double) 0.060000000

Filling a histogram

Fill a histogram with the TH1::Fill() method.

Examples

  • For 1-D histograms
   h1->Fill(x);
   h1->Fill(x,w); // with weight
  • For 2-D histograms and TProfile:
   h2->Fill(x,y);
   p2->Fill(x,y);
   h2->Fill(x,y,w);  // with weights
   p2->Fill(x,y,w);
  • For 3-D histograms and TProfile2D:
   h3->Fill(x,y,z);
   h3->Fill(x,y,z,w); // with weights

The Fill() method computes the bin number corresponding to the given x, y or z argument and increments this bin by the given weight.
The Fill() method returns the bin number for 1-D histograms or global bin number for 2-D and 3-D histograms.

Note that when filling a histogram with a weight different from one, the histogram assumes you are dealing with a weighted data set and internally stores an additional array with the sum of the squared weights, which is used to compute the bin errors. By default, a weighted histogram is always displayed showing the error of each bin instead of the standard histogram bar. See Filling histograms for more details about filling histograms, such as computation of bin errors or automatic axis extension.

Filling a histogram with vector input data

A histogram can also be filled directly from an array of values of type double and an array of weights.

   std::vector<double> x = {0,1,2,3,4,5,6,7,8,9};
   std::vector<double> w(x.size(),1); // weights vector
   auto h1 = new TH1D("h1","h1 title",10,0.,10.);
   h1->FillN(10,x.data(),w.data());

This is useful when working in Python with numpy arrays, since you can fill a histogram directly from them. An example is provided in the next section.

Filling a histogram with random numbers

Fill a histogram with random numbers with the TH1::FillRandom() method.

The FillRandom() method uses the contents of an existing TF1 function or another TH1 histogram (for all dimensions) and the default random number generator defined in gRandom. See the TRandom class for the available generators in ROOT.

Example

A histogram is randomly filled 10 000 times with a default Gaussian distribution of mean 0 and sigma 1.

   TH1F h1("h1","Histogram from a Gaussian",100,-3,3);
   h1.FillRandom("gaus",10000);

In Python you can use random numbers generated using the numpy.random library and FillN for filling the histogram:

   import numpy as np
   import ROOT
   # generate an array of normally distributed data with mean=5 and stddev=2 containing 1000 values
   x = np.random.normal(5,2,1000)
   w = np.ones(1000)
   h1 = ROOT.TH1D("h1","h1 title",50,0.,10.)
   h1.FillN(1000, x, w)

Use the TH1::GetRandom() method to get a random number distributed according to the contents of a histogram.

Adding, multiplying and dividing histograms

The following operations are supported on histograms or between histograms:

  • Addition of a histogram to the current histogram.

  • Addition of two histograms with coefficients and storage into the current histogram.

  • Multiplications and divisions are supported in the same way as additions.

You can use the operators (+, *, /) or the TH1 methods Add(), Multiply() and Divide().

Examples

Multiplying a histogram object by a constant:

   h1.Scale(c);  // c is a numeric scale factor

Creating a new histogram without changing the original one:

   TH1F h3 = 8*h1;

Multiplying two histograms and putting the result in a third one:

TH1F h3 = h1*h2;

When performing operations, the resulting bin errors and histogram statistics are computed assuming the two histograms are independent, and a normal approximation is used for the bin errors. A special case is the division of histograms to compute an efficiency, where the numerator histogram is a subset of the denominator histogram. In this case, the errors are handled correctly by the TEfficiency class.
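What the independence assumption implies for a single bin can be sketched in plain C++ with standard first-order error propagation. This is an illustration and not ROOT code: the struct Bin and the functions addBins, multiplyBins and divideBins are hypothetical names, and the formulas assume uncorrelated bins with Gaussian errors, so they do not apply to the efficiency case.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical per-bin record: content and its error.
struct Bin {
   double content;
   double error;
};

// c1*a + c2*b: the scaled errors add in quadrature.
Bin addBins(const Bin& a, const Bin& b, double c1 = 1.0, double c2 = 1.0) {
   const double e1 = c1 * a.error;
   const double e2 = c2 * b.error;
   return {c1 * a.content + c2 * b.content, std::sqrt(e1 * e1 + e2 * e2)};
}

// a*b: each error is weighted by the content of the other factor.
Bin multiplyBins(const Bin& a, const Bin& b) {
   const double t1 = a.error * b.content;
   const double t2 = b.error * a.content;
   return {a.content * b.content, std::sqrt(t1 * t1 + t2 * t2)};
}

// a/b: same numerator as the product, divided by the denominator squared.
Bin divideBins(const Bin& a, const Bin& b) {
   assert(b.content != 0.0);
   const double t1 = a.error * b.content;
   const double t2 = b.error * a.content;
   const double b2 = b.content * b.content;
   return {a.content / b.content, std::sqrt(t1 * t1 + t2 * t2) / b2};
}
```

For two unweighted bins with contents 100 and 400 (errors 10 and 20), the ratio is 0.25 with an error of about 0.028.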

Normalizing histograms

You can use TH1::Scale(Double_t c1 = 1, Option_t* option = "") in combination with TH1::Integral(Option_t* option = "") to normalize histograms.

The following example shows how to normalize a histogram so that it represents a probability density distribution.

Example

The following histogram is given:

   TH1D *h = new TH1D("h","a trial histogram", 100, -1.5, 1.5);
   for (Int_t i = 0; i < 10000; i++) h->Fill(gRandom->Gaus(0, 1));
   h->Draw();

Figure: A trial histogram for normalizing.

To normalize, you can first clone the histogram to keep the original one, and then call TH1::Scale passing the inverse of the histogram integral as the scale factor. In addition, use the option width to also divide by the bin width, so that each bin displays the probability density. If you just want to show the probability (relative frequency) of each bin, you don’t need the width option.

   TH1* h1 = (TH1*)(h->Clone("h1"));
   h1->Scale(1./h1->Integral(), "width");
   h1->Draw();

After applying the normalization method, redraw the histogram with your preferred drawing option:

   h1->Draw("HIST");
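The arithmetic behind normalizing with the width option can be sketched in plain C++. This is a minimal illustration, not ROOT's implementation: the function toDensity is a hypothetical name, and the sketch assumes that only the regular bins (no underflow or overflow) enter the integral.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: convert bin contents into a probability density.
// contents holds the regular bins only; edges has contents.size() + 1 entries.
std::vector<double> toDensity(const std::vector<double>& contents,
                              const std::vector<double>& edges) {
   assert(edges.size() == contents.size() + 1);
   // Sum of the bin contents, playing the role of the histogram integral.
   const double integral = std::accumulate(contents.begin(), contents.end(), 0.0);
   assert(integral != 0.0);
   std::vector<double> density(contents.size());
   for (std::size_t i = 0; i < contents.size(); ++i) {
      const double width = edges[i + 1] - edges[i];
      // Divide by the integral and by the bin width.
      density[i] = contents[i] / (integral * width);
   }
   return density;
}
```

After this transformation the sum of density times bin width over all bins equals one, which is the defining property of a probability density. Dropping the division by the width gives the probability of each bin instead.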

Drawing a histogram

  • Use the TH1::Draw() method to draw a histogram using the provided drawing option. The drawing is delegated to the THistPainter class that specializes the drawing of the histogram. The THistPainter class is separated from the histogram, so that the histogram class does not contain the graphics overhead.

Drawing options

The “drawing option” is the only parameter of the TH1::Draw() method. It specifies how the histogram will be graphically rendered. For detailed information on the drawing options for all histogram classes, refer to THistPainter and to the Histogram plotting options.

Note

The drawing options are NOT case sensitive!

Drawing a histogram copy

By default, when a histogram is drawn in the ROOT canvas it is not copied, so that updates to the histogram object are automatically reflected in the drawing. This means that if the histogram object is created on the stack inside a C++ scope (or inside a Python function when using PyROOT), it is automatically deleted when exiting the scope, and as a consequence the drawn object disappears. To avoid this, you can use TH1::DrawCopy(), as in the LEGO plot example below.

Examples

   TH1D h1("h1","Histogram from a Gaussian",100,-3,3);
   h1.FillRandom("gaus",10000);
   h1.Draw();

Figure: Histogram drawn with Draw().

  • draw a histogram with error bars:
   h1.Draw("E");
  • draw a 2-D histogram as a LEGO plot:
{
   TH2D h2("h2","Histogram filled with random numbers",40,-4,4,40,-20,20);
   float px, py;
   for (int i = 0; i < 25000; i++) {
      gRandom->Rannor(px,py);
      h2.Fill(px,5*py);
   }
   h2.DrawCopy("LEGO1");
}

Note that DrawCopy is used here because the histogram is created within a C++ scope and is deleted at the end of that scope.

Figure: Histogram drawn with Draw(“LEGO1”).

THistPainter implements drawing options for 1-D, 2-D, and 3-D histogram classes. It also implements specific drawing options for THStack.

Using the histogram Editor

The following example shows how to use the GUI Editor to modify the histogram drawing.

Example

   TH1F h1("h1","Histogram from a Gaussian",100,-3,3);
   h1.FillRandom("gaus",10000);
   h1.Draw();

A canvas with the histogram is displayed.

  • Click View, and then click Editor.

  • Click on the histogram.

In the Style tab, you can select and change some of the drawing options and the drawing style.

Fitting Histograms

Histograms in ROOT can be fitted with user-defined functions built with the ROOT TF1 function classes. For fitting histograms, see the Fitting section.

Miscellaneous Operations

Projections of histograms

One can perform projections from multi-dimensional histograms (TH2 and TH3) to lower dimensional histograms and to profile histograms (TProfile). See the reference guide for the available projection functions.

Fast Fourier transforms for histograms

ROOT provides with TVirtualFFT an interface class for fast Fourier transforms (FFT) (see → FFTW). With TH1::FFT() you can perform an FFT for a histogram.

Histogram statistics

ROOT histograms provide functions to compute statistics on the input data, such as the mean (TH1::GetMean), standard deviation (TH1::GetStdDev), kurtosis (TH1::GetKurtosis) and skewness (TH1::GetSkewness). For multi-dimensional histograms, covariance and correlation are also available, see for example TH2::GetCorrelationFactor.

The function TH1::GetRMS is equivalent to TH1::GetStdDev, since historically the RMS has been identified as the sample standard deviation.

In addition, ROOT provides functions to compute estimations of the error of the sample mean and standard deviations. See TH1::GetMeanError and TH1::GetStdDevError.

The histogram statistics can be displayed in the histogram statistics box.

Note that by default the histogram statistics are computed from the full raw input data sample. When a histogram axis range is selected, however, the statistics are computed within the user-defined range, using only the bin center information.
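How statistics can be derived from bin centers alone is sketched below in plain C++. This is an illustration under simplifying assumptions, not ROOT's code: the names BinnedStats and statsFromBins are hypothetical, the bin indices are zero-based positions in the vectors (not ROOT bin numbers), and each bin content is treated as a weight located at the bin center.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type.
struct BinnedStats {
   double mean;
   double stdDev;
};

// Mean and standard deviation of the bins first..last (inclusive),
// treating each bin content as a weight placed at the bin center.
BinnedStats statsFromBins(const std::vector<double>& contents,
                          const std::vector<double>& edges,
                          std::size_t first, std::size_t last) {
   assert(edges.size() == contents.size() + 1);
   assert(first <= last && last < contents.size());
   double sumw = 0.0, sumwx = 0.0, sumwx2 = 0.0;
   for (std::size_t i = first; i <= last; ++i) {
      const double center = 0.5 * (edges[i] + edges[i + 1]);
      sumw += contents[i];
      sumwx += contents[i] * center;
      sumwx2 += contents[i] * center * center;
   }
   assert(sumw > 0.0);
   const double mean = sumwx / sumw;
   // Guard against tiny negative values caused by rounding.
   const double variance = std::max(sumwx2 / sumw - mean * mean, 0.0);
   return {mean, std::sqrt(variance)};
}
```

Since all entries of a bin are placed at its center, the result generally differs slightly from the statistics computed on the raw input data, and the difference shrinks as the bins become narrower.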

The function TH1::GetQuantiles computes the quantiles, such as the median and quartiles, from the binned data of the given histogram. For example, to compute the quartiles (including the median), you provide as input the probability values for which you want to compute the corresponding quantiles:

   double p[3] = { 0.25, 0.50, 0.75};
   double q[3];
   h1.GetQuantiles(3,q,p);
   std::cout << "first quartile (25th percentile) = " << q[0] << std::endl;
   std::cout << "median (50th percentile) = " << q[1] << std::endl;
   std::cout << "third quartile (75th percentile) = " << q[2] << std::endl;
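One way to estimate a quantile from binned data is to accumulate the bin contents and interpolate linearly inside the bin where the target fraction is reached. The plain C++ sketch below illustrates this idea; it is not claimed to reproduce the exact algorithm of TH1::GetQuantiles, and the function name binnedQuantile is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: estimate the quantile for probability p (0 <= p <= 1)
// from regular bin contents and their edges (contents.size() + 1 entries),
// assuming the entries are spread uniformly inside each bin.
double binnedQuantile(const std::vector<double>& contents,
                      const std::vector<double>& edges, double p) {
   assert(edges.size() == contents.size() + 1);
   assert(p >= 0.0 && p <= 1.0);
   const double total = std::accumulate(contents.begin(), contents.end(), 0.0);
   assert(total > 0.0);
   const double target = p * total;
   double cumulative = 0.0;
   for (std::size_t i = 0; i < contents.size(); ++i) {
      if (contents[i] > 0.0 && cumulative + contents[i] >= target) {
         // Fraction of this bin needed to reach the target.
         const double fraction = (target - cumulative) / contents[i];
         return edges[i] + fraction * (edges[i + 1] - edges[i]);
      }
      cumulative += contents[i];
   }
   return edges.back();
}
```

For bin contents { 10, 20, 10 } on the edges { 0, 1, 2, 3 }, this sketch gives 1.5 for the median, which is the center of the symmetric distribution.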

Statistical tests

The ROOT histogram class also provides functions to perform statistical comparison tests, such as goodness of fit tests, for testing the compatibility of two histograms (2 sample tests) or the compatibility of a histogram with a theoretical distribution, i.e. a function (1 sample tests).

For tests of histogram-histogram compatibility:

  • TH1::Chi2Test for performing a chi2 test between two histograms. This test also works for multi-dimensional histograms, but it requires non-empty bins.
  • TH1::KolmogorovTest to perform the Kolmogorov-Smirnov test on the two histograms. Note that this test works only for 1-D histograms; it has a bias for binned data and should be used only if the bin size is sufficiently small.
  • TH1::AndersonDarlingTest, which works only for 1-D histograms.

For histogram-function comparison tests:

Histogram bin Errors

The bin errors of the histograms are computed by default as follows:

  • unweighted histogram: square root of the bin content
  • weighted histogram: square root of the sum of the squared weights in the bin.
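Both rules follow from the same bookkeeping, which can be sketched in plain C++. The struct WeightedBin is a hypothetical illustration, not a ROOT class: it accumulates the sum of weights and the sum of squared weights, and for unit weights the two sums coincide, so the error reduces to the square root of the bin content.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical single-bin accumulator.
struct WeightedBin {
   double sumw = 0.0;   // bin content: sum of the weights
   double sumw2 = 0.0;  // sum of the squared weights

   // Add one entry with weight w (w = 1 for an unweighted fill).
   void fill(double w = 1.0) {
      assert(std::isfinite(w));
      sumw += w;
      sumw2 += w * w;
   }

   // Default bin error: square root of the sum of squared weights.
   double error() const { return std::sqrt(sumw2); }
};
```

Filling such a bin four times with unit weight gives a content of 4 and an error of 2, while filling it with the weights 0.5 and 1.5 gives a content of 2 and an error of sqrt(2.5), about 1.58.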

For unweighted histograms there is also the option to compute the Poisson standard confidence intervals for each bin, by calling TH1::SetBinErrorOption(TH1::kPoisson). After this, one can retrieve the corresponding lower and upper bin errors by using TH1::GetBinErrorLow() and TH1::GetBinErrorUp().