Logo ROOT  
Reference Guide
 
Loading...
Searching...
No Matches
rf614_binned_fit_problems.C
Go to the documentation of this file.
1/// \file
2/// \ingroup tutorial_roofit
3/// \notebook -js
4/// A tutorial that explains you how to solve problems with binning effects and
5/// numerical stability in binned fits.
6///
7/// ### Introduction
8///
9/// In this tutorial, you will learn three new things:
10///
11/// 1. How to reduce the bias in binned fits by changing the definition of the
12/// normalization integral
13///
14/// 2. How to completely get rid of binning effects by integrating the pdf over each bin
15///
16/// 3. How to improve the numeric stability of fits with a greatly different
17/// number of events per bin, using a constant per-bin counterterm
18///
19/// \macro_code
20/// \macro_output
21///
22/// \date January 2023
23/// \author Jonas Rembser
24
25// Generate binned Asimov dataset for a continuous pdf.
26// One should in principle be able to use
27// pdf.generateBinned(x, nEvents, RooFit::ExpectedData()).
28// Unfortunately it has a problem: it also has the bin bias that this tutorial
29// demonstrates, to if we would use it, the biases would cancel out.
30std::unique_ptr<RooDataHist> generateBinnedAsimov(RooAbsPdf const &pdf, RooRealVar &x, int nEvents)
31{
32 auto dataH = std::make_unique<RooDataHist>("dataH", "dataH", RooArgSet{x});
33 RooAbsBinning &xBinning = x.getBinning();
34 for (int iBin = 0; iBin < x.numBins(); ++iBin) {
35 x.setRange("bin", xBinning.binLow(iBin), xBinning.binHigh(iBin));
36 std::unique_ptr<RooAbsReal> integ{pdf.createIntegral(x, RooFit::NormSet(x), RooFit::Range("bin"))};
37 integ->getVal();
38 dataH->set(iBin, nEvents * integ->getVal(), -1);
39 }
40 return dataH;
41}
42
43// Force numeric integration and do this numeric integration with the
44// RooBinIntegrator, which sums the function values at the bin centers.
45void enableBinIntegrator(RooAbsReal &func, int numBins)
46{
47 RooNumIntConfig customConfig(*func.getIntegratorConfig());
48 customConfig.method1D().setLabel("RooBinIntegrator");
49 customConfig.getConfigSection("RooBinIntegrator").setRealValue("numBins", numBins);
50 func.setIntegratorConfig(customConfig);
51 func.forceNumInt(true);
52}
53
54// Reset the integrator config to disable the RooBinIntegrator.
55void disableBinIntegrator(RooAbsReal &func)
56{
58 func.forceNumInt(false);
59}
60
62{
63 using namespace RooFit;
64
65 // Silence info output for this tutorial
69
70 // Exponential example
71 // -------------------
72
73 // Set up the observable
74 RooRealVar x{"x", "x", 0.1, 5.1};
75 x.setBins(10); // fewer bins so we have larger binning effects for this demo
76
77 // Let's first look at the example of an exponential function
78 RooRealVar c{"c", "c", -1.8, -5, 5};
79 RooExponential expo{"expo", "expo", x, c};
80
81 // Generate an Asimov dataset such that the only difference between the fit
82 // result and the true parameters comes from binning effects.
83 std::unique_ptr<RooAbsData> expoData{generateBinnedAsimov(expo, x, 10000)};
84
85 // If you do the fit the usual was in RooFit, you will get a bias in the
86 // result. This is because the continuous, normalized pdf is evaluated only
87 // at the bin centers.
88 std::unique_ptr<RooFitResult> fit1{expo.fitTo(*expoData, Save(), PrintLevel(-1), SumW2Error(false))};
89 fit1->Print();
90
91 // In the case of an exponential function, the bias that you get by
92 // evaluating the pdf only at the bin centers is a constant scale factor in
93 // each bin. Here, we can do a trick to get rid of the bias: we also
94 // evaluate the normalization integral for the pdf the same way, i.e.,
95 // summing the values of the unnormalized pdf at the bin centers. Like this
96 // the bias cancels out. You can achieve this by customizing the way how the
97 // pdf is integrated (see also the rf901_numintconfig tutorial).
98 enableBinIntegrator(expo, x.numBins());
99 std::unique_ptr<RooFitResult> fit2{expo.fitTo(*expoData, Save(), PrintLevel(-1), SumW2Error(false))};
100 fit2->Print();
101 disableBinIntegrator(expo);
102
103 // Power law example
104 // -----------------
105
106 // Let's not look at another example: a power law \f[x^a\f].
107 RooRealVar a{"a", "a", -0.3, -5.0, 5.0};
108 RooGenericPdf powerlaw{"powerlaw", "std::pow(x, a)", {x, a}};
109 std::unique_ptr<RooAbsData> powerlawData{generateBinnedAsimov(powerlaw, x, 10000)};
110
111 // Again, if you do a vanilla fit, you'll get a bias
112 std::unique_ptr<RooFitResult> fit3{powerlaw.fitTo(*powerlawData, Save(), PrintLevel(-1), SumW2Error(false))};
113 fit3->Print();
114
115 // This time, the bias is not the same factor in each bin! This means our
116 // trick by sampling the integral in the same way doesn't cancel out the
117 // bias completely. The average bias is canceled, but there are per-bin
118 // biases that remain. Still, this method has some value: it is cheaper than
119 // rigurously correcting the bias by integrating the pdf in each bin. So if
120 // you know your per-bin bias variations are small or performance is an
121 // issue, this approach can be sufficient.
122 enableBinIntegrator(powerlaw, x.numBins());
123 std::unique_ptr<RooFitResult> fit4{powerlaw.fitTo(*powerlawData, Save(), PrintLevel(-1), SumW2Error(false))};
124 fit4->Print();
125 disableBinIntegrator(powerlaw);
126
127 // To get rid of the binning effects in the general case, one can use the
128 // IntegrateBins() command argument. Now, the pdf is not evaluated at the
129 // bin centers, but numerically integrated over each bin and divided by the
130 // bin width. The parameter for IntegrateBins() is the required precision
131 // for the numeric integrals. This is computationally expensive, but the
132 // bias is now not a problem anymore.
133 std::unique_ptr<RooFitResult> fit5{
134 powerlaw.fitTo(*powerlawData, IntegrateBins(1e-3), Save(), PrintLevel(-1), SumW2Error(false))};
135 fit5->Print();
136
137 // Improving numerical stability
138 // -----------------------------
139
140 // There is one more problem with binned fits that is related to the binning
141 // effects because often, a binned fit is affected by both problems.
142 //
143 // The issue is numerical stability for fits with a greatly different number
144 // of events in each bin. For each bin, you have a term \f[n\log(p)\f] in
145 // the NLL, where \f[n\f] is the number of observations in the bin, and
146 // \f[p\f] the predicted probability to have an event in that bin. The
147 // difference in the logarithms for each bin is small, but the difference in
148 // \f[n\f] can be orders of magnitudes! Therefore, when summing these terms,
149 // lots of numerical precision is lost for the bins with less events.
150
151 // We can study this with the example of an exponential plus a Gaussian. The
152 // Gaussian is only a faint signal in the tail of the exponential where
153 // there are not so many events. And we can't afford any precision loss for
154 // these bins, otherwise we can't fit the Gaussian.
155
156 x.setBins(100); // It's not about binning effects anymore, so reset the number of bins.
157
158 RooRealVar mu{"mu", "mu", 3.0, 0.1, 5.1};
159 RooRealVar sigma{"sigma", "sigma", 0.5, 0.01, 5.0};
160 RooGaussian gauss{"gauss", "gauss", x, mu, sigma};
161
162 RooRealVar nsig{"nsig", "nsig", 10000, 0, 1e9};
163 RooRealVar nbkg{"nbkg", "nbkg", 10000000, 0, 1e9};
164 RooRealVar frac{"frac", "frac", nsig.getVal() / (nsig.getVal() + nbkg.getVal()), 0.0, 1.0};
165
166 RooAddPdf model{"model", "model", {gauss, expo}, {nsig, nbkg}};
167
168 std::unique_ptr<RooAbsData> modelData{model.generateBinned(x)};
169
170 // Set the starting values for the Gaussian parameters away from the true
171 // value such that the fit is not trivial.
172 mu.setVal(2.0);
173 sigma.setVal(1.0);
174
175 std::unique_ptr<RooFitResult> fit6{model.fitTo(*modelData, Save(), PrintLevel(-1), SumW2Error(false))};
176 fit6->Print();
177
178 // You should see in the previous fit result that the fit did not converge:
179 // the `MINIMIZE` return code should be -1 (a successful fit has status code zero).
180
181 // To improve the situation, we can apply a numeric trick: if we subtract in
182 // each bin a constant counterterm \f[n\log(n/N)\f], we get terms for each
183 // bin that are closer to each other in order of magnitude as long as the
184 // initial model is not extremely off. Proving this mathematically is left
185 // as an exercise to the reader.
186
187 // This counterterms can be enabled by passing the Offset("bin") option to
188 // RooAbsPdf::fitTo() or RooAbsPdf::createNLL().
189
190 std::unique_ptr<RooFitResult> fit7{
191 model.fitTo(*modelData, Offset("bin"), Save(), PrintLevel(-1), SumW2Error(false))};
192 fit7->Print();
193
194 // You should now see in the last fit result that the fit has converged.
195}
#define c(i)
Definition RSha256.hxx:101
#define a(i)
Definition RSha256.hxx:99
#define e(i)
Definition RSha256.hxx:103
RooAbsBinning is the abstract base class for RooRealVar binning definitions.
virtual void setRange(double xlo, double xhi)=0
Abstract interface for all probability density functions.
Definition RooAbsPdf.h:40
RooFit::OwningPtr< RooFitResult > fitTo(RooAbsData &data, CmdArgs_t const &... cmdArgs)
Fit PDF to given dataset.
Definition RooAbsPdf.h:156
virtual RooFit::OwningPtr< RooDataHist > generateBinned(const RooArgSet &whatVars, double nEvents, const RooCmdArg &arg1, const RooCmdArg &arg2={}, const RooCmdArg &arg3={}, const RooCmdArg &arg4={}, const RooCmdArg &arg5={}) const
As RooAbsPdf::generateBinned(const RooArgSet&, const RooCmdArg&,const RooCmdArg&, const RooCmdArg&,...
Definition RooAbsPdf.h:109
Abstract base class for objects that represent a real value and implements functionality common to al...
Definition RooAbsReal.h:59
RooFit::OwningPtr< RooAbsReal > createIntegral(const RooArgSet &iset, const RooCmdArg &arg1, const RooCmdArg &arg2={}, const RooCmdArg &arg3={}, const RooCmdArg &arg4={}, const RooCmdArg &arg5={}, const RooCmdArg &arg6={}, const RooCmdArg &arg7={}, const RooCmdArg &arg8={}) const
Create an object that represents the integral of the function over one or more observables listed in ...
double getVal(const RooArgSet *normalisationSet=nullptr) const
Evaluate object.
Definition RooAbsReal.h:103
void setIntegratorConfig()
Remove the specialized numeric integration configuration associated with this object.
virtual void forceNumInt(bool flag=true)
Definition RooAbsReal.h:169
const RooNumIntConfig * getIntegratorConfig() const
Return the numeric integration configuration used for this object.
Efficient implementation of a sum of PDFs of the form.
Definition RooAddPdf.h:33
RooArgSet is a container object that can hold multiple RooAbsArg objects.
Definition RooArgSet.h:55
Exponential PDF.
Plain Gaussian p.d.f.
Definition RooGaussian.h:24
RooGenericPdf is a concrete implementation of a probability density function, which takes a RooArgLis...
static RooMsgService & instance()
Return reference to singleton instance.
StreamConfig & getStream(Int_t id)
RooNumIntConfig holds the configuration parameters of the various numeric integrators used by RooReal...
RooRealVar represents a variable that can be changed from the outside.
Definition RooRealVar.h:37
void setBins(Int_t nBins, const char *name=nullptr)
Create a uniform binning under name 'name' for this variable.
RooCmdArg NormSet(Args_t &&... argsOrArgSet)
RooCmdArg IntegrateBins(double precision)
Integrate the PDF over bins.
RooCmdArg Offset(std::string const &mode)
RooCmdArg Save(bool flag=true)
RooCmdArg SumW2Error(bool flag)
RooCmdArg PrintLevel(Int_t code)
RooCmdArg Range(const char *rangeName, bool adjustNorm=true)
const Double_t sigma
Double_t x[n]
Definition legend1.C:17
The namespace RooFit contains mostly switches that change the behaviour of functions of PDFs (or othe...
Definition JSONIO.h:26
Definition fit1.py:1
void removeTopic(RooFit::MsgTopic oldTopic)