Logo ROOT  
Reference Guide
 
Loading...
Searching...
No Matches
RooBinSamplingPdf.cxx
Go to the documentation of this file.
1// Authors: Stephan Hageboeck, CERN; Andrea Sciandra, SCIPP-UCSC/Atlas; Nov 2020
2
3/*****************************************************************************
4 * RooFit
5 * Authors: *
6 * WV, Wouter Verkerke, UC Santa Barbara, verkerke@slac.stanford.edu *
7 * DK, David Kirkby, UC Irvine, dkirkby@uci.edu *
8 * *
9 * Copyright (c) 2000-2020, Regents of the University of California *
10 * and Stanford University. All rights reserved. *
11 * *
12 * Redistribution and use in source and binary forms, *
13 * with or without modification, are permitted according to the terms *
14 * listed in LICENSE (http://roofit.sourceforge.net/license.txt) *
15 *****************************************************************************/
16
17/**
18 * \class RooBinSamplingPdf
19 * The RooBinSamplingPdf is supposed to be used as an adapter between a continuous PDF
20 * and a binned distribution.
21 * When RooFit is used to fit binned data, and the PDF is continuous, it takes the probability density
22 * at the bin centre as a proxy for the probability averaged (integrated) over the entire bin. This is
23 * correct only if the second derivative of the function vanishes, though. This is shown in the plots
24 * below.
25 *
26 * For PDFs that have larger curvatures, the RooBinSamplingPdf can be used. It integrates the PDF in each
27 * bin using an adaptive integrator. This usually requires 21 times more function evaluations, but significantly
28 * reduces biases due to better sampling of the PDF. The integrator can be accessed from the outside
29 * using integrator(). This can be used to change the integration rules, so less/more function evaluations are
30 * performed. The target precision of the integrator can be set in the constructor.
31 *
32 *
33 * ### How to use it
34 * There are two ways to use this class:
35 * - Manually wrap a PDF:
36 * ```
37 * RooBinSamplingPdf binSampler("<name>", "title", <binned observable of PDF>, <original PDF> [, <precision for integrator>]);
38 * binSampler.fitTo(data);
39 * ```
40 * When a PDF is wrapped with a RooBinSamplingPDF, just use the bin sampling PDF instead of the original one for fits
41 * or plotting etc.
42 * \note The binning will be taken from the observable. Make sure that this binning is the same as the one of the dataset that should be fit.
43 * Use RooRealVar::setBinning() to adapt it.
44 * - Instruct test statistics to carry out this wrapping automatically:
45 * ```
46 * pdf.fitTo(data, IntegrateBins(<precision>));
47 * ```
48 * This method is especially useful when used with a simultaneous PDF, since each component will automatically be wrapped,
49 * depending on the value of `precision`:
50 * - `precision < 0.`: None of the PDFs are touched, bin sampling is off.
51 * - `precision = 0.`: Continuous PDFs that are fit to a RooDataHist are wrapped into a RooBinSamplingPdf. The target precision
52 * forwarded to the integrator is 1.E-4 (the default argument of the constructor).
53 * - `precision > 0.`: All continuous PDFs are automatically wrapped into a RooBinSamplingPdf, regardless of what data they are
54 * fit to (see next paragraph). The same `'precision'` is used for all integrators.
55 *
56 * ### Simulating a binned fit using RooDataSet
57 * Some frameworks use unbinned data (RooDataSet) to simulate binned datasets. By adding one entry for each bin centre with the
58 * appropriate weight, one can achieve the same result as fitting with RooDataHist. In this case, however, RooFit cannot
59 * auto-detect that a binned fit is running, and that an integration over the bin is desired (note that there are no bins to
60 * integrate over in this kind of dataset).
61 *
62 * In this case, `IntegrateBins(>0.)` needs to be used, and the desired binning needs to be assigned to the observable
63 * of the dataset:
64 * ```
65 * RooRealVar x("x", "x", 0., 5.);
66 * x.setBins(10);
67 *
68 * // <create dataset and model>
69 *
70 * model.fitTo(data, IntegrateBins(>0.));
71 * ```
72 *
73 * \see RooAbsPdf::fitTo()
74 * \see IntegrateBins()
75 *
76 * \note This feature is currently limited to one-dimensional PDFs.
77 *
78 *
79 * \htmlonly <style>div.image img[src="RooBinSamplingPdf_OFF.png"]{width:12cm;}</style> \endhtmlonly
80 * \htmlonly <style>div.image img[src="RooBinSamplingPdf_ON.png" ]{width:12cm;}</style> \endhtmlonly
81 * <table>
82 * <tr><th> Binned fit without %RooBinSamplingPdf <th> Binned fit with %RooBinSamplingPdf </td></tr>
83 * <tr><td> \image html RooBinSamplingPdf_OFF.png ""
84 * </td>
85 * <td> \image html RooBinSamplingPdf_ON.png ""
86 * </td></tr>
87 * </table>
88 *
89 */
90
91
92#include "RooBinSamplingPdf.h"
93
94#include "RooHelpers.h"
95#include "RooRealBinding.h"
96#include "RooRealVar.h"
97#include "RooGlobalFunc.h"
98#include "RooDataHist.h"
99
100#include "Math/Integrator.h"
101
102#include <algorithm>
103
104////////////////////////////////////////////////////////////////////////////////
105/// Construct a new RooBinSamplingPdf.
106/// \param[in] name A name to identify this object.
107/// \param[in] title Title (for e.g. plotting)
108/// \param[in] observable Observable to integrate over (the one that is binned).
109/// \param[in] inputPdf A PDF whose bins should be sampled with higher precision.
110/// \param[in] epsilon Relative precision for the integrator, which is used to sample the bins.
111/// Note that ROOT's default is to use an adaptive integrator, which in its first iteration usually reaches
112/// relative precision of 1.E-4 or better. Therefore, asking for lower precision rarely has an effect.
113RooBinSamplingPdf::RooBinSamplingPdf(const char *name, const char *title, RooAbsRealLValue& observable,
114 RooAbsPdf& inputPdf, double epsilon) :
115 RooAbsPdf(name, title),
116 _pdf("inputPdf", "Function to be converted into a PDF", this, inputPdf),
117 _observable("observable", "Observable to integrate over", this, observable, true, true),
118 _relEpsilon(epsilon) {
119 if (!_pdf->dependsOn(*_observable)) {
120 throw std::invalid_argument(std::string("RooBinSamplingPDF(") + GetName()
121 + "): The PDF " + _pdf->GetName() + " needs to depend on the observable "
122 + _observable->GetName());
123 }
124}
125
126
127 ////////////////////////////////////////////////////////////////////////////////
128 /// Copy a RooBinSamplingPdf.
129 /// \param[in] other PDF to copy.
130 /// \param[in] name Optionally rename the copy.
132 RooAbsPdf(other, name),
133 _pdf("inputPdf", this, other._pdf),
134 _observable("observable", this, other._observable),
135 _relEpsilon(other._relEpsilon) { }
136
137
138////////////////////////////////////////////////////////////////////////////////
139/// Integrate the PDF over the current bin of the observable.
141 const unsigned int bin = _observable->getBin();
142 const double low = _observable->getBinning().binLow(bin);
143 const double high = _observable->getBinning().binHigh(bin);
144
145 const double oldX = _observable->getVal();
146 double result;
147 {
148 // Important: When the integrator samples x, caching of sub-tree values needs to be off.
150 result = integrate(_normSet, low, high) / (high-low);
151 }
152
153 _observable->setVal(oldX);
154
155 return result;
156}
157
158
159////////////////////////////////////////////////////////////////////////////////
160/// Integrate the PDF over all its bins, and return a batch with those values.
161/// \param[in,out] evalData Struct with evaluation data.
162/// \param[in] normSet Normalisation set that's used to evaluate the PDF.
163void RooBinSamplingPdf::computeBatch(cudaStream_t*, double* output, size_t /*size*/, RooFit::Detail::DataMap const& dataMap) const
164{
165 // Retrieve binning, which we need to compute the probabilities
166 auto boundaries = binBoundaries();
167 auto xValues = dataMap.at(_observable);
168
169 // Important: When the integrator samples x, caching of sub-tree values needs to be off.
171
172 // Now integrate PDF in each bin:
173 for (unsigned int i=0; i < xValues.size(); ++i) {
174 const double x = xValues[i];
175 const auto upperIt = std::upper_bound(boundaries.begin(), boundaries.end(), x);
176 const unsigned int bin = std::distance(boundaries.begin(), upperIt) - 1;
177 assert(bin < boundaries.size());
178
179 output[i] = integrate(nullptr, boundaries[bin], boundaries[bin+1]) / (boundaries[bin+1]-boundaries[bin]);
180 }
181}
182
183
184////////////////////////////////////////////////////////////////////////////////
185/// Get the bin boundaries for the observable.
186/// These will be recomputed whenever the shape of this object is dirty.
188 if (isShapeDirty() || _binBoundaries.empty()) {
189 _binBoundaries.clear();
190 const RooAbsBinning& binning = _observable->getBinning(nullptr);
191 const double* boundaries = binning.array();
192
193 for (int i=0; i < binning.numBoundaries(); ++i) {
194 _binBoundaries.push_back(boundaries[i]);
195 }
196
197 assert(std::is_sorted(_binBoundaries.begin(), _binBoundaries.end()));
198
200 }
201
202 return {_binBoundaries};
203}
204
205
206////////////////////////////////////////////////////////////////////////////////
207/// Return a list of all bin boundaries, so the PDF is plotted correctly.
208/// \param[in] obs Observable to generate the boundaries for.
209/// \param[in] xlo Beginning of range to create list of boundaries for.
210/// \param[in] xhi End of range to create to create list of boundaries for.
211/// \return Pointer to a list to be deleted by caller.
212std::list<double>* RooBinSamplingPdf::binBoundaries(RooAbsRealLValue& obs, double xlo, double xhi) const {
213 if (obs.namePtr() != _observable->namePtr()) {
214 coutE(Plotting) << "RooBinSamplingPdf::binBoundaries(" << GetName() << "): observable '" << obs.GetName()
215 << "' is not the observable of this PDF ('" << _observable->GetName() << "')." << std::endl;
216 return nullptr;
217 }
218
219 auto list = new std::list<double>;
220 for (double val : binBoundaries()) {
221 if (xlo <= val && val < xhi)
222 list->push_back(val);
223 }
224
225 return list;
226}
227
228
229////////////////////////////////////////////////////////////////////////////////
230/// Return a list of all bin edges, so the PDF is plotted as a step function.
231/// \param[in] obs Observable to generate the sampling hint for.
232/// \param[in] xlo Beginning of range to create sampling hint for.
233/// \param[in] xhi End of range to create sampling hint for.
234/// \return Pointer to a list to be deleted by caller.
235std::list<double>* RooBinSamplingPdf::plotSamplingHint(RooAbsRealLValue& obs, double xlo, double xhi) const {
236 if (obs.namePtr() != _observable->namePtr()) {
237 coutE(Plotting) << "RooBinSamplingPdf::plotSamplingHint(" << GetName() << "): observable '" << obs.GetName()
238 << "' is not the observable of this PDF ('" << _observable->GetName() << "')." << std::endl;
239 return nullptr;
240 }
241
242 auto binEdges = new std::list<double>;
243 const auto& binning = obs.getBinning();
244
245 for (unsigned int bin=0, numBins = static_cast<unsigned int>(binning.numBins()); bin < numBins; ++bin) {
246 const double low = std::max(binning.binLow(bin), xlo);
247 const double high = std::min(binning.binHigh(bin), xhi);
248 const double width = high - low;
249
250 // Check if this bin is in plotting range at all
251 if (low >= high)
252 continue;
253
254 // Move support points slightly inside the bin, so step function is plotted correctly.
255 binEdges->push_back(low + 0.001 * width);
256 binEdges->push_back(high - 0.001 * width);
257 }
258
259 return binEdges;
260}
261
262
263////////////////////////////////////////////////////////////////////////////////
264/// Direct access to the unique_ptr holding the integrator that's used to sample the bins.
265/// This can be used to change options such as sampling accuracy or to entirely exchange the integrator.
266///
267/// #### Example: Use the 61-point Gauss-Kronrod integration rule
268/// ```{.cpp}
269/// ROOT::Math::IntegratorOneDimOptions intOptions = pdf.integrator()->Options();
270/// intOptions.SetNPoints(6); // 61-point integration rule
271/// intOptions.SetRelTolerance(1.E-9); // Smaller tolerance -> more subdivisions
272/// pdf.integrator()->SetOptions(intOptions);
273/// ```
274/// \see ROOT::Math::IntegratorOneDim::SetOptions for more details on integration options.
275/// \note When RooBinSamplingPdf is loaded from files, integrator options will fall back to the default values.
276std::unique_ptr<ROOT::Math::IntegratorOneDim>& RooBinSamplingPdf::integrator() const {
277 if (!_integrator) {
278 _integrator = std::make_unique<ROOT::Math::IntegratorOneDim>(*this,
279 ROOT::Math::IntegrationOneDim::kADAPTIVE, // GSL Integrator. Will really get it only if MathMore enabled.
280 -1., _relEpsilon, // Abs epsilon = default, rel epsilon set by us.
281 0, // We don't limit the sub-intervals. Steer run time via _relEpsilon.
282 2 // This should read ROOT::Math::Integration::kGAUSS21, but this is in MathMore, so we cannot include it here.
283 );
284 }
285
286 return _integrator;
287}
288
289
290////////////////////////////////////////////////////////////////////////////////
291/// Binding used by the integrator to evaluate the PDF.
292double RooBinSamplingPdf::operator()(double x) const {
294 return _pdf;
295}
296
297
298////////////////////////////////////////////////////////////////////////////////
299/// Integrate the wrapped PDF using our current integrator, with given norm set and limits.
300double RooBinSamplingPdf::integrate(const RooArgSet* /*normSet*/, double low, double high) const {
301 return integrator()->Integral(low, high);
302}
303
304
305/// Creates a wrapping RooBinSamplingPdf if appropriate.
306/// \param[in] pdf The input pdf.
307/// \param[in] data The dataset to be used in the fit, used to figure out the
308/// observables and whether the dataset is binned.
309/// \param[in] precision Precision argument for all created RooBinSamplingPdfs.
310std::unique_ptr<RooAbsPdf> RooBinSamplingPdf::create(RooAbsPdf& pdf, RooAbsData const &data, double precision) {
311 if (precision < 0.)
312 return nullptr;
313
314 std::unique_ptr<RooArgSet> funcObservables( pdf.getObservables(data) );
315 const bool oneDimAndBinned = (1 == std::count_if(funcObservables->begin(), funcObservables->end(), [](const RooAbsArg* arg) {
316 auto var = dynamic_cast<const RooRealVar*>(arg);
317 return var && var->numBins() > 1;
318 }));
319
320 if (!oneDimAndBinned) {
321 if (precision > 0.) {
322 oocoutE(&pdf, Fitting)
323 << "Integration over bins was requested, but this is currently only implemented for 1-D fits." << std::endl;
324 }
325 return nullptr;
326 }
327
328 // Find the real-valued observable. We don't care about categories.
329 auto theObs = std::find_if(funcObservables->begin(), funcObservables->end(), [](const RooAbsArg* arg){
330 return dynamic_cast<const RooAbsRealLValue*>(arg);
331 });
332 assert(theObs != funcObservables->end());
333
334 std::unique_ptr<RooAbsPdf> newPdf;
335
336 if (precision > 0.) {
337 // User forced integration. Let just apply it.
338 newPdf = std::make_unique<RooBinSamplingPdf>(
339 (std::string(pdf.GetName()) + "_binSampling").c_str(), pdf.GetTitle(),
340 *static_cast<RooAbsRealLValue *>(*theObs), pdf, precision);
341 } else if (dynamic_cast<RooDataHist const *>(&data) != nullptr &&
342 precision == 0. && !pdf.isBinnedDistribution(*data.get())) {
343 // User didn't forbid integration, and it seems appropriate with a
344 // RooDataHist.
345 newPdf = std::make_unique<RooBinSamplingPdf>(
346 (std::string(pdf.GetName()) + "_binSampling").c_str(), pdf.GetTitle(),
347 *static_cast<RooAbsRealLValue *>(*theObs), pdf);
348 }
349
350 return newPdf;
351}
#define oocoutE(o, a)
#define coutE(a)
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void data
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void char Point_t Rectangle_t WindowAttributes_t Float_t Float_t Float_t Int_t Int_t UInt_t UInt_t Rectangle_t result
Option_t Option_t width
char name[80]
Definition TGX11.cxx:110
RooAbsArg is the common abstract base class for objects that represent a value and a "shape" in RooFi...
Definition RooAbsArg.h:74
bool dependsOn(const RooAbsCollection &serverList, const RooAbsArg *ignoreArg=nullptr, bool valueOnly=false) const
Test whether we depend on (ie, are served by) any object in the specified collection.
const TNamed * namePtr() const
De-duplicated pointer to this object's name.
Definition RooAbsArg.h:560
bool isShapeDirty() const
Definition RooAbsArg.h:413
RooFit::OwningPtr< RooArgSet > getObservables(const RooArgSet &set, bool valueOnly=true) const
Given a set of possible observables, return the observables that this PDF depends on.
bool inhibitDirty() const
Delete watch flag.
void clearShapeDirty() const
Definition RooAbsArg.h:602
RooAbsBinning is the abstract base class for RooRealVar binning definitions.
virtual double binLow(Int_t bin) const =0
virtual double binHigh(Int_t bin) const =0
virtual Int_t numBoundaries() const =0
virtual double * array() const =0
RooAbsData is the common abstract base class for binned and unbinned datasets.
Definition RooAbsData.h:59
RooArgSet const * _normSet
Normalization integral (owned by _normMgr)
Definition RooAbsPdf.h:377
RooAbsRealLValue is the common abstract base class for objects that represent a real value that may a...
virtual const RooAbsBinning & getBinning(const char *name=nullptr, bool verbose=true, bool createOnTheFly=false) const =0
Retrive binning configuration with given name or default binning.
Int_t getBin(const char *rangeName=nullptr) const override
virtual void setVal(double value)=0
Set the current value of the object. Needs to be overridden by implementations.
double getVal(const RooArgSet *normalisationSet=nullptr) const
Evaluate object.
Definition RooAbsReal.h:91
virtual bool isBinnedDistribution(const RooArgSet &) const
Tests if the distribution is binned. Unless overridden by derived classes, this always returns false.
Definition RooAbsReal.h:358
RooArgSet is a container object that can hold multiple RooAbsArg objects.
Definition RooArgSet.h:55
The RooBinSamplingPdf is supposed to be used as an adapter between a continuous PDF and a binned dist...
double integrate(const RooArgSet *normSet, double low, double high) const
Integrate the wrapped PDF using our current integrator, with given norm set and limits.
RooTemplateProxy< RooAbsPdf > _pdf
std::vector< double > _binBoundaries
! Workspace to store data for bin sampling
std::unique_ptr< ROOT::Math::IntegratorOneDim > & integrator() const
Direct access to the unique_ptr holding the integrator that's used to sample the bins.
RooTemplateProxy< RooAbsRealLValue > _observable
void computeBatch(cudaStream_t *, double *output, size_t size, RooFit::Detail::DataMap const &) const override
Integrate the PDF over all its bins, and return a batch with those values.
RooSpan< const double > binBoundaries() const
Get the bin boundaries for the observable.
std::unique_ptr< ROOT::Math::IntegratorOneDim > _integrator
! Integrator used to sample bins.
double _relEpsilon
Default integrator precision.
static std::unique_ptr< RooAbsPdf > create(RooAbsPdf &pdf, RooAbsData const &data, double precision)
Creates a wrapping RooBinSamplingPdf if appropriate.
double operator()(double x) const
Binding used by the integrator to evaluate the PDF.
double evaluate() const override
Integrate the PDF over the current bin of the observable.
const RooAbsPdf & pdf() const
std::list< double > * plotSamplingHint(RooAbsRealLValue &obs, double xlo, double xhi) const override
Return a list of all bin edges, so the PDF is plotted as a step function.
The RooDataHist is a container class to hold N-dimensional binned data.
Definition RooDataHist.h:39
RooSpan< const double > at(RooAbsArg const *arg, RooAbsArg const *caller=nullptr)
Definition DataMap.cxx:21
A simple container to hold a batch of data values.
Definition RooSpan.h:34
const char * GetName() const override
Returns name of object.
Definition TNamed.h:47
const char * GetTitle() const override
Returns title of object.
Definition TNamed.h:48
@ kADAPTIVE
to be used for general functions without singularities
Double_t x[n]
Definition legend1.C:17
Disable all caches for sub-branches in an expression tree.
Definition RooHelpers.h:102
double epsilon
Definition triangle.c:618
static void output()