Logo ROOT  
Reference Guide
 
All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Properties Friends Macros Modules Pages
Loading...
Searching...
No Matches
rf614_binned_fit_problems.C
Go to the documentation of this file.
1/// \file
2/// \ingroup tutorial_roofit
3/// \notebook -js
4/// A tutorial that explains you how to solve problems with binning effects and
5/// numerical stability in binned fits.
6///
7/// ### Introduction
8///
9/// In this tutorial, you will learn three new things:
10///
11/// 1. How to reduce the bias in binned fits by changing the definition of the
12/// normalization integral
13///
14/// 2. How to completely get rid of binning effects by integrating the pdf over each bin
15///
16/// 3. How to improve the numeric stability of fits with a greatly different
17/// number of events per bin, using a constant per-bin counterterm
18///
19/// \macro_code
20/// \macro_output
21///
22/// \date January 2023
23/// \author Jonas Rembser
24
25// Generate binned Asimov dataset for a continuous pdf.
26// One should in principle be able to use
27// pdf.generateBinned(x, nEvents, RooFit::ExpectedData()).
28// Unfortunately it has a problem: it also has the bin bias that this tutorial
29// demonstrates, to if we would use it, the biases would cancel out.
30std::unique_ptr<RooDataHist> generateBinnedAsimov(RooAbsPdf const &pdf, RooRealVar &x, int nEvents)
31{
32 auto dataH = std::make_unique<RooDataHist>("dataH", "dataH", RooArgSet{x});
33 RooAbsBinning &xBinning = x.getBinning();
34 for (int iBin = 0; iBin < x.numBins(); ++iBin) {
35 x.setRange("bin", xBinning.binLow(iBin), xBinning.binHigh(iBin));
36 std::unique_ptr<RooAbsReal> integ{pdf.createIntegral(x, RooFit::NormSet(x), RooFit::Range("bin"))};
37 integ->getVal();
38 dataH->set(iBin, nEvents * integ->getVal(), -1);
39 }
40 return dataH;
41}
42
43// Force numeric integration and do this numeric integration with the
44// RooBinIntegrator, which sums the function values at the bin centers.
45void enableBinIntegrator(RooAbsReal &func, int numBins)
46{
48 customConfig.method1D().setLabel("RooBinIntegrator");
49 customConfig.getConfigSection("RooBinIntegrator").setRealValue("numBins", numBins);
51 func.forceNumInt(true);
52}
53
54// Reset the integrator config to disable the RooBinIntegrator.
56{
58 func.forceNumInt(false);
59}
60
62{
63 using namespace RooFit;
64
65 // Silence info output for this tutorial
66 RooMsgService::instance().getStream(1).removeTopic(Minimization);
67 RooMsgService::instance().getStream(1).removeTopic(Fitting);
68 RooMsgService::instance().getStream(1).removeTopic(Generation);
69
70 // Exponential example
71 // -------------------
72
73 // Set up the observable
74 RooRealVar x{"x", "x", 0.1, 5.1};
75 x.setBins(10); // fewer bins so we have larger binning effects for this demo
76
77 // Let's first look at the example of an exponential function
78 RooRealVar c{"c", "c", -1.8, -5, 5};
79 RooExponential expo{"expo", "expo", x, c};
80
81 // Generate an Asimov dataset such that the only difference between the fit
82 // result and the true parameters comes from binning effects.
83 std::unique_ptr<RooAbsData> expoData{generateBinnedAsimov(expo, x, 10000)};
84
85 // If you do the fit the usual was in RooFit, you will get a bias in the
86 // result. This is because the continuous, normalized pdf is evaluated only
87 // at the bin centers.
88 std::unique_ptr<RooFitResult> fit1{expo.fitTo(*expoData, Save(), PrintLevel(-1), SumW2Error(false))};
89 fit1->Print();
90
91 // In the case of an exponential function, the bias that you get by
92 // evaluating the pdf only at the bin centers is a constant scale factor in
93 // each bin. Here, we can do a trick to get rid of the bias: we also
94 // evaluate the normalization integral for the pdf the same way, i.e.,
95 // summing the values of the unnormalized pdf at the bin centers. Like this
96 // the bias cancels out. You can achieve this by customizing the way how the
97 // pdf is integrated (see also the rf901_numintconfig tutorial).
98 enableBinIntegrator(expo, x.numBins());
99 std::unique_ptr<RooFitResult> fit2{expo.fitTo(*expoData, Save(), PrintLevel(-1), SumW2Error(false))};
100 fit2->Print();
102
103 // Power law example
104 // -----------------
105
106 // Let's not look at another example: a power law \f[x^a\f].
107 RooRealVar a{"a", "a", -0.3, -5.0, 5.0};
108 RooPowerSum powerlaw{"powerlaw", "powerlaw", x, RooConst(1.0), a};
109 std::unique_ptr<RooAbsData> powerlawData{generateBinnedAsimov(powerlaw, x, 10000)};
110
111 // Again, if you do a vanilla fit, you'll get a bias
112 std::unique_ptr<RooFitResult> fit3{powerlaw.fitTo(*powerlawData, Save(), PrintLevel(-1), SumW2Error(false))};
113 fit3->Print();
114
115 // This time, the bias is not the same factor in each bin! This means our
116 // trick by sampling the integral in the same way doesn't cancel out the
117 // bias completely. The average bias is canceled, but there are per-bin
118 // biases that remain. Still, this method has some value: it is cheaper than
119 // rigurously correcting the bias by integrating the pdf in each bin. So if
120 // you know your per-bin bias variations are small or performance is an
121 // issue, this approach can be sufficient.
122 enableBinIntegrator(powerlaw, x.numBins());
123 std::unique_ptr<RooFitResult> fit4{powerlaw.fitTo(*powerlawData, Save(), PrintLevel(-1), SumW2Error(false))};
124 fit4->Print();
126
127 // To get rid of the binning effects in the general case, one can use the
128 // IntegrateBins() command argument. Now, the pdf is not evaluated at the
129 // bin centers, but numerically integrated over each bin and divided by the
130 // bin width. The parameter for IntegrateBins() is the required precision
131 // for the numeric integrals. This is computationally expensive, but the
132 // bias is now not a problem anymore.
133 std::unique_ptr<RooFitResult> fit5{
134 powerlaw.fitTo(*powerlawData, IntegrateBins(1e-3), Save(), PrintLevel(-1), SumW2Error(false))};
135 fit5->Print();
136
137 // Improving numerical stability
138 // -----------------------------
139
140 // There is one more problem with binned fits that is related to the binning
141 // effects because often, a binned fit is affected by both problems.
142 //
143 // The issue is numerical stability for fits with a greatly different number
144 // of events in each bin. For each bin, you have a term \f[n\log(p)\f] in
145 // the NLL, where \f[n\f] is the number of observations in the bin, and
146 // \f[p\f] the predicted probability to have an event in that bin. The
147 // difference in the logarithms for each bin is small, but the difference in
148 // \f[n\f] can be orders of magnitudes! Therefore, when summing these terms,
149 // lots of numerical precision is lost for the bins with less events.
150
151 // We can study this with the example of an exponential plus a Gaussian. The
152 // Gaussian is only a faint signal in the tail of the exponential where
153 // there are not so many events. And we can't afford any precision loss for
154 // these bins, otherwise we can't fit the Gaussian.
155
156 x.setBins(100); // It's not about binning effects anymore, so reset the number of bins.
157
158 RooRealVar mu{"mu", "mu", 3.0, 0.1, 5.1};
159 RooRealVar sigma{"sigma", "sigma", 0.5, 0.01, 5.0};
160 RooGaussian gauss{"gauss", "gauss", x, mu, sigma};
161
162 RooRealVar nsig{"nsig", "nsig", 10000, 0, 1e9};
163 RooRealVar nbkg{"nbkg", "nbkg", 10000000, 0, 1e9};
164 RooRealVar frac{"frac", "frac", nsig.getVal() / (nsig.getVal() + nbkg.getVal()), 0.0, 1.0};
165
166 RooAddPdf model{"model", "model", {gauss, expo}, {nsig, nbkg}};
167
168 std::unique_ptr<RooAbsData> modelData{model.generateBinned(x)};
169
170 // Set the starting values for the Gaussian parameters away from the true
171 // value such that the fit is not trivial.
172 mu.setVal(2.0);
173 sigma.setVal(1.0);
174
175 std::unique_ptr<RooFitResult> fit6{model.fitTo(*modelData, Save(), PrintLevel(-1), SumW2Error(false))};
176 fit6->Print();
177
178 // You should see in the previous fit result that the fit did not converge:
179 // the `MINIMIZE` return code should be -1 (a successful fit has status code zero).
180
181 // To improve the situation, we can apply a numeric trick: if we subtract in
182 // each bin a constant counterterm \f[n\log(n/N)\f], we get terms for each
183 // bin that are closer to each other in order of magnitude as long as the
184 // initial model is not extremely off. Proving this mathematically is left
185 // as an exercise to the reader.
186
187 // This counterterms can be enabled by passing the Offset("bin") option to
188 // RooAbsPdf::fitTo() or RooAbsPdf::createNLL().
189
190 std::unique_ptr<RooFitResult> fit7{
191 model.fitTo(*modelData, Offset("bin"), Save(), PrintLevel(-1), SumW2Error(false))};
192 fit7->Print();
193
194 // You should now see in the last fit result that the fit has converged.
195}
#define c(i)
Definition RSha256.hxx:101
#define a(i)
Definition RSha256.hxx:99
#define e(i)
Definition RSha256.hxx:103
ROOT::Detail::TRangeCast< T, true > TRangeDynCast
TRangeDynCast is an adapter class that allows the typed iteration through a TCollection.
Abstract base class for RooRealVar binning definitions.
Abstract interface for all probability density functions.
Definition RooAbsPdf.h:40
Abstract base class for objects that represent a real value and implements functionality common to al...
Definition RooAbsReal.h:59
RooFit::OwningPtr< RooAbsReal > createIntegral(const RooArgSet &iset, const RooCmdArg &arg1, const RooCmdArg &arg2={}, const RooCmdArg &arg3={}, const RooCmdArg &arg4={}, const RooCmdArg &arg5={}, const RooCmdArg &arg6={}, const RooCmdArg &arg7={}, const RooCmdArg &arg8={}) const
Create an object that represents the integral of the function over one or more observables listed in ...
void setIntegratorConfig()
Remove the specialized numeric integration configuration associated with this object.
virtual void forceNumInt(bool flag=true)
Definition RooAbsReal.h:169
const RooNumIntConfig * getIntegratorConfig() const
Return the numeric integration configuration used for this object.
Efficient implementation of a sum of PDFs of the form.
Definition RooAddPdf.h:33
RooArgSet is a container object that can hold multiple RooAbsArg objects.
Definition RooArgSet.h:55
Exponential PDF.
Plain Gaussian p.d.f.
Definition RooGaussian.h:24
static RooMsgService & instance()
Return reference to singleton instance.
Holds the configuration parameters of the various numeric integrators used by RooRealIntegral.
RooPowerSum implements a power law PDF of the form.
Definition RooPowerSum.h:20
Variable that can be changed from the outside.
Definition RooRealVar.h:37
void setBins(Int_t nBins, const char *name=nullptr)
Create a uniform binning under name 'name' for this variable.
RooConstVar & RooConst(double val)
RooCmdArg NormSet(Args_t &&... argsOrArgSet)
RooCmdArg IntegrateBins(double precision)
Integrate the PDF over bins.
RooCmdArg Offset(std::string const &mode)
RooCmdArg Save(bool flag=true)
RooCmdArg SumW2Error(bool flag)
RooCmdArg PrintLevel(Int_t code)
RooCmdArg Range(const char *rangeName, bool adjustNorm=true)
const Double_t sigma
Double_t x[n]
Definition legend1.C:17
The namespace RooFit contains mostly switches that change the behaviour of functions of PDFs (or othe...
Definition JSONIO.h:26
Definition fit1.py:1