rf614_binned_fit_problems.py
## \file
## \ingroup tutorial_roofit
## \notebook -js
## A tutorial that explains how to solve problems with binning effects and
## numerical stability in binned fits.
##
## ### Introduction
##
## In this tutorial, you will learn three new things:
##
## 1. How to reduce the bias in binned fits by changing the definition of the
##    normalization integral
##
## 2. How to completely get rid of binning effects by integrating the pdf over each bin
##
## 3. How to improve the numeric stability of fits with a greatly different
##    number of events per bin, using a constant per-bin counterterm
##
## \macro_code
## \macro_output
##
## \date January 2023
## \author Jonas Rembser

import ROOT


def generateBinnedAsimov(pdf, x, n_events):
    """
    Generate a binned Asimov dataset for a continuous pdf.
    One should in principle be able to use
    pdf.generateBinned(x, n_events, RooFit::ExpectedData()).
    Unfortunately, it has a problem: it also suffers from the bin bias that
    this tutorial demonstrates, so if we used it, the biases would cancel out.
    """
    data_h = ROOT.RooDataHist("dataH", "dataH", {x})
    x_binning = x.getBinning()

    for i_bin in range(x.numBins()):
        x.setRange("bin", x_binning.binLow(i_bin), x_binning.binHigh(i_bin))
        integ = pdf.createIntegral(x, NormSet=x, Range="bin")
        ROOT.SetOwnership(integ, True)
        integ.getVal()
        data_h.set(i_bin, n_events * integ.getVal(), -1)

    return data_h


def enableBinIntegrator(func, num_bins):
    """
    Force numeric integration and do this numeric integration with the
    RooBinIntegrator, which sums the function values at the bin centers.
    """
    custom_config = ROOT.RooNumIntConfig(func.getIntegratorConfig())
    custom_config.method1D().setLabel("RooBinIntegrator")
    custom_config.getConfigSection("RooBinIntegrator").setRealValue("numBins", num_bins)
    func.setIntegratorConfig(custom_config)
    func.forceNumInt(True)


def disableBinIntegrator(func):
    """
    Reset the integrator config to disable the RooBinIntegrator.
    """
    func.setIntegratorConfig()
    func.forceNumInt(False)


# Silence info output for this tutorial
ROOT.RooMsgService.instance().getStream(1).removeTopic(ROOT.RooFit.Minimization)
ROOT.RooMsgService.instance().getStream(1).removeTopic(ROOT.RooFit.Fitting)
ROOT.RooMsgService.instance().getStream(1).removeTopic(ROOT.RooFit.Generation)

# Exponential example
# -------------------

# Set up the observable
x = ROOT.RooRealVar("x", "x", 0.1, 5.1)
x.setBins(10)  # fewer bins so we have larger binning effects for this demo

# Let's first look at the example of an exponential function
c = ROOT.RooRealVar("c", "c", -1.8, -5, 5)
expo = ROOT.RooExponential("expo", "expo", x, c)

# Generate an Asimov dataset such that the only difference between the fit
# result and the true parameters comes from binning effects.
expo_data = generateBinnedAsimov(expo, x, 10000)

# If you do the fit the usual way in RooFit, you will get a bias in the
# result. This is because the continuous, normalized pdf is evaluated only
# at the bin centers.
fit1 = expo.fitTo(expo_data, Save=True, PrintLevel=-1, SumW2Error=False)
fit1.Print()

# In the case of an exponential function, the bias that you get by
# evaluating the pdf only at the bin centers is a constant scale factor in
# each bin. Here, we can use a trick to get rid of the bias: we also
# evaluate the normalization integral for the pdf in the same way, i.e., by
# summing the values of the unnormalized pdf at the bin centers. This way,
# the bias cancels out. You can achieve this by customizing the way the
# pdf is integrated (see also the rf901_numintconfig tutorial).
enableBinIntegrator(expo, x.numBins())
fit2 = expo.fitTo(expo_data, Save=True, PrintLevel=-1, SumW2Error=False)
fit2.Print()
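
# As an illustrative check (an addition to the original tutorial, using only
# the RooRealVar accessors already used above), compare the fitted slope with
# the value c = -1.8 that was used to generate the Asimov dataset. With the
# bin-center integration trick, the residual bias should be negligible.
print("c (bin-center trick): {:.4f} +/- {:.4f} (true value: -1.8)".format(c.getVal(), c.getError()))
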
disableBinIntegrator(expo)

# Power law example
# -----------------

# Let's now look at another example: a power law \f[x^a\f].
a = ROOT.RooRealVar("a", "a", -0.3, -5.0, 5.0)
powerlaw = ROOT.RooPower("powerlaw", "powerlaw", x, ROOT.RooFit.RooConst(1.0), a)
powerlaw_data = generateBinnedAsimov(powerlaw, x, 10000)

# Again, if you do a vanilla fit, you'll get a bias
fit3 = powerlaw.fitTo(powerlaw_data, Save=True, PrintLevel=-1, SumW2Error=False)
fit3.Print()

# This time, the bias is not the same factor in each bin! This means our
# trick of sampling the integral in the same way doesn't cancel out the
# bias completely. The average bias is canceled, but per-bin biases
# remain. Still, this method has some value: it is cheaper than
# rigorously correcting the bias by integrating the pdf in each bin. So if
# you know your per-bin bias variations are small or performance is an
# issue, this approach can be sufficient.
enableBinIntegrator(powerlaw, x.numBins())
fit4 = powerlaw.fitTo(powerlaw_data, Save=True, PrintLevel=-1, SumW2Error=False)
fit4.Print()
disableBinIntegrator(powerlaw)

# To get rid of the binning effects in the general case, one can use the
# IntegrateBins() command argument. Now, the pdf is not evaluated at the
# bin centers, but numerically integrated over each bin and divided by the
# bin width. The parameter for IntegrateBins() is the required precision
# for the numeric integrals. This is computationally more expensive, but the
# bias is no longer a problem.
fit5 = powerlaw.fitTo(powerlaw_data, IntegrateBins=1e-3, Save=True, PrintLevel=-1, SumW2Error=False)
fit5.Print()
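
# As an illustrative summary (an addition to the original tutorial, using the
# standard RooFitResult accessors), compare how far the fitted exponent ended
# up from the true value a = -0.3 in the three power-law fits: plain bin-center
# evaluation, the bin-center integral trick, and IntegrateBins.
for label, res in [("bin centers", fit3), ("bin-center integral", fit4), ("IntegrateBins", fit5)]:
    a_fit = res.floatParsFinal().find("a")
    print("{:>20s}: a = {:.4f} (bias: {:+.4f})".format(label, a_fit.getVal(), a_fit.getVal() + 0.3))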

# Improving numerical stability
# -----------------------------

# There is one more problem with binned fits that is related to binning
# effects, because a binned fit is often affected by both problems.
#
# The issue is numerical stability for fits with a greatly different number
# of events in each bin. For each bin, you have a term \f[n\log(p)\f] in
# the NLL, where \f[n\f] is the number of observations in the bin, and
# \f[p\f] is the predicted probability for an event to fall in that bin. The
# difference in the logarithms between bins is small, but the difference in
# \f[n\f] can be orders of magnitude! Therefore, when summing these terms,
# a lot of numerical precision is lost for the bins with fewer events.

# We can study this with the example of an exponential plus a Gaussian. The
# Gaussian is only a faint signal in the tail of the exponential, where
# there are not many events. We can't afford any precision loss for these
# bins; otherwise, we can't fit the Gaussian.

x.setBins(100)  # It's not about binning effects anymore, so reset the number of bins.

mu = ROOT.RooRealVar("mu", "mu", 3.0, 0.1, 5.1)
sigma = ROOT.RooRealVar("sigma", "sigma", 0.5, 0.01, 5.0)
gauss = ROOT.RooGaussian("gauss", "gauss", x, mu, sigma)

nsig = ROOT.RooRealVar("nsig", "nsig", 10000, 0, 1e9)
nbkg = ROOT.RooRealVar("nbkg", "nbkg", 10000000, 0, 1e9)
frac = ROOT.RooRealVar("frac", "frac", nsig.getVal() / (nsig.getVal() + nbkg.getVal()), 0.0, 1.0)

model = ROOT.RooAddPdf("model", "model", [gauss, expo], [nsig, nbkg])

model_data = model.generateBinned(x)

# Set the starting values for the Gaussian parameters away from the true
# values such that the fit is not trivial.
mu.setVal(2.0)
sigma.setVal(1.0)

fit6 = model.fitTo(model_data, Save=True, PrintLevel=-1, SumW2Error=False)
fit6.Print()

# You should see in the previous fit result that the fit did not converge:
# the `MINIMIZE` return code should be -1 (a successful fit has status code zero).
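
# As a sanity check (an addition to this tutorial, using the standard
# RooFitResult accessors), the minimizer status and the covariance matrix
# quality can also be queried programmatically instead of reading the printout:
print("fit6 status: {}, covariance quality: {}".format(fit6.status(), fit6.covQual()))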

# To improve the situation, we can apply a numeric trick: if we subtract in
# each bin a constant counterterm \f[n\log(n/N)\f], where \f[N\f] is the total
# number of events, we get terms for each bin that are closer to each other in
# order of magnitude as long as the initial model is not extremely off.
# Proving this mathematically is left as an exercise to the reader.
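
# A short sketch of why this helps (added for illustration, not a rigorous
# proof): subtracting the constant n*log(n/N) from each bin's term n*log(p)
# leaves
#
#     n*log(p) - n*log(n/N) = n*log(p*N/n),
#
# which is close to zero whenever the predicted bin probability p is close to
# the observed fraction n/N. The per-bin terms then all have a comparable,
# small magnitude, so much less floating-point precision is lost in the sum.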

# These counterterms can be enabled by passing the Offset("bin") option to
# RooAbsPdf::fitTo() or RooAbsPdf::createNLL().

fit7 = model.fitTo(model_data, Offset="bin", Save=True, PrintLevel=-1, SumW2Error=False)
fit7.Print()

# You should now see in the last fit result that the fit has converged.
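
# As a final illustrative check (an addition to this tutorial), the minimizer
# status code should now be zero:
print("fit7 status:", fit7.status())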