Logo ROOT  
Reference Guide
 
Loading...
Searching...
No Matches
buildLikelihood.cxx
Go to the documentation of this file.
1/*
2 * Project: RooFit
3 * Authors:
4 * PB, Patrick Bos, Netherlands eScience Center, p.bos@esciencecenter.nl
5 *
6 * Copyright (c) 2021, CERN
7 *
8 * Redistribution and use in source and binary forms,
9 * with or without modification, are permitted according to the terms
10 * listed in LICENSE (http://roofit.sourceforge.net/license.txt)
11 */
12
14
15#include <RooSimultaneous.h>
20#include <RooAbsPdf.h>
21#include <RooAbsData.h>
22#include <RooRealSumPdf.h>
23#include <RooProdPdf.h>
24#include <TClass.h>
25
26#include <memory>
27
28namespace RooFit {
29/**
30 * \brief Namespace for new RooFit test statistic calculation.
31 *
32 * RooFit::TestStatistics contains a major refactoring of the RooAbsTestStatistic-RooAbsOptTestStatistic-RooNLLVar
33 * inheritance tree into:
34 * 1. statistics-based classes on the one hand;
35 * 2. calculation/evaluation/optimization based classes on the other hand.
36 *
37 * The likelihood is the central unit on the statistics side. The RooAbsL class is implemented for four kinds of
38 * likelihoods: binned, unbinned, "subsidiary" (an optimization for numerical stability that gathers components like
39 * global observables) and "sum" (over multiple components of the other types). These classes provide ways to compute
40 * their components in parallelizable chunks that can be used by the calculator classes as they see fit.
41 *
42 * On top of the likelihood classes, we also provide for convenience a likelihood builder `NLLFactory`. This factory
43 * analyzes the pdf and automatically constructs the proper likelihood, built up from the available RooAbsL subclasses.
44 * Options, like specifying constraint terms or global observables, can be passed using method chaining. The
45 * `NLLFactory::build` method finally returns the constructed likelihood as a RooRealL object that can be fit to using
46 * RooMinimizer.
47 *
48 * The calculator "Wrapper" classes are abstract interfaces. These can be implemented for different kinds of algorithms,
49 * or with different kinds of optimization "back-ends" in mind. Two fork-based multi-processing implementations based
50 * on RooFit::MultiProcess are available, one to calculate the gradient of the likelihood in parallel and one for the
51 * likelihood itself. The likelihood can also be calculated serially.
52 *
53 * The coupling of all these classes to RooMinimizer is made via the MinuitFcnGrad class, which owns the Wrappers that
54 * calculate the likelihood components.
55 *
56 * More extensive documentation is available at
57 * https://github.com/root-project/root/blob/master/roofit/doc/developers/test_statistics.md
58 */
59namespace TestStatistics {
60
61namespace { // private implementation details
62
63RooArgSet getConstraintsSet(RooAbsPdf *pdf, RooAbsData *data, RooArgSet constrained_parameters,
64 RooArgSet const &external_constraints, RooArgSet global_observables,
65 std::string const &global_observables_tag)
66{
67 // BEGIN CONSTRAINT COLLECTION; copied from RooAbsPdf::createNLL
68
69 bool doStripDisconnected = false;
70 // If no explicit list of parameters to be constrained is specified apply default algorithm
71 // All terms of RooProdPdfs that do not contain observables and share parameters with one or more
72 // terms that do contain observables are added as constraints.
73#ifndef NDEBUG
74 bool did_default_constraint_algo = false;
75 std::size_t N_default_constraints = 0;
76#endif
77 if (constrained_parameters.empty()) {
78 std::unique_ptr<RooArgSet> default_constraints{pdf->getParameters(*data, false)};
79 constrained_parameters.add(*default_constraints);
80 doStripDisconnected = true;
81#ifndef NDEBUG
82 did_default_constraint_algo = true;
83 N_default_constraints = default_constraints->size();
84#endif
85 }
86#ifndef NDEBUG
87 if (did_default_constraint_algo) {
88 assert(N_default_constraints == static_cast<std::size_t>(constrained_parameters.size()));
89 }
90#endif
91
92 // Collect internal and external constraint specifications
93 RooArgSet allConstraints;
94
95 if (!global_observables_tag.empty()) {
96 if (!global_observables.empty()) {
97 global_observables.removeAll();
98 }
99 std::unique_ptr<RooArgSet> allVars{pdf->getVariables()};
100 global_observables.add(*dynamic_cast<RooArgSet *>(allVars->selectByAttrib(global_observables_tag.c_str(), true)));
101 oocoutI(nullptr, Minimization) << "User-defined specification of global observables definition with tag named '"
102 << global_observables_tag << "'" << std::endl;
103 } else if (global_observables.empty()) {
104 // neither global_observables nor global_observables_tag was given - try if a default tag is defined in the head
105 // node
106 const char *defGlobObsTag = pdf->getStringAttribute("DefaultGlobalObservablesTag");
107 if (defGlobObsTag) {
108 oocoutI(nullptr, Minimization)
109 << "p.d.f. provides built-in specification of global observables definition with tag named '"
110 << defGlobObsTag << "'" << std::endl;
111 std::unique_ptr<RooArgSet> allVars{pdf->getVariables()};
112 global_observables.add(*dynamic_cast<RooArgSet *>(allVars->selectByAttrib(defGlobObsTag, true)));
113 }
114 }
115
116 // EGP: removed workspace (RooAbsPdf::_myws) based stuff for now; TODO: reconnect this class to workspaces
117
118 if (!constrained_parameters.empty()) {
119 std::unique_ptr<RooArgSet> constraints{
120 pdf->getAllConstraints(*data->get(), constrained_parameters, doStripDisconnected)};
121 allConstraints.add(*constraints);
122 }
123 if (!external_constraints.empty()) {
124 allConstraints.add(external_constraints);
125 }
126
127 return allConstraints;
128}
129
130/*
131 * \brief Extract a collection of subsidiary likelihoods from a pdf
132 *
133 * \param[in] pdf Raw pointer to the pdf
134 * \param[in] data Raw pointer to the dataset
135 * \param[in] constrained_parameters Set of parameters that are constrained. Pdf components dependent on these alone are
136 * added to the subsidiary likelihood.
137 * \param[in] external_constraints Set of external constraint pdfs, i.e. constraints
138 * not necessarily in the pdf itself. These are always added to the subsidiary likelihood.
139 * \param[in] global_observables
140 * Observables that have a constant value, independent of the dataset events. Pdf components dependent on these alone
141 * are added to the subsidiary likelihood. \note Overrides all other likelihood parameters (like those in \p
142 * constrained_parameters) if present.
143 * \param[in] global_observables_tag String that can be set as attribute in pdf
144 * components to indicate that it is a global observable. Can be used instead of or in addition to \p
145 * global_observables.
146 * \return A unique pointer to a RooSubsidiaryL that contains all terms in the pdf that can be
147 * calculated separately from the other components in the full likelihood.
148 */
149std::unique_ptr<RooSubsidiaryL> buildSubsidiaryL(RooAbsPdf *pdf, RooAbsData *data, RooArgSet constrained_parameters,
150 RooArgSet const &external_constraints, RooArgSet global_observables,
151 std::string const &global_observables_tag)
152{
153 auto allConstraints = getConstraintsSet(pdf, data, constrained_parameters, external_constraints, global_observables,
154 global_observables_tag);
155
156 std::unique_ptr<RooSubsidiaryL> subsidiary_likelihood;
157 // Include constraints, if any, in likelihood
158 if (!allConstraints.empty()) {
159
160 oocoutI(nullptr, Minimization) << " Including the following constraint terms in minimization: " << allConstraints
161 << std::endl;
162 if (!global_observables.empty()) {
163 oocoutI(nullptr, Minimization) << "The following global observables have been defined: " << global_observables
164 << std::endl;
165 }
166 std::string name("likelihood for pdf ");
167 name += pdf->GetName();
168 subsidiary_likelihood = std::make_unique<RooSubsidiaryL>(
169 name, allConstraints, (!global_observables.empty()) ? global_observables : constrained_parameters);
170 }
171
172 return subsidiary_likelihood;
173}
174
175/// Get the binned part of a pdf
176///
177/// \param pdf Raw pointer to the pdf
178/// \return A pdf is binned if it has attribute "BinnedLikelihood" and it is a RooRealSumPdf (or derived class).
179/// If \p pdf itself is binned, it will be returned. If the pdf is a RooProdPdf (or derived), the product terms
180/// will be searched for a binned component and the first such term that is found will be returned. Note that
181/// the simultaneous pdf setup is such that it is assumed that only one component is binned, so this should
182/// always return the correct binned component. If no binned component is found, nullptr is returned.
183RooAbsPdf *getBinnedPdf(RooAbsPdf *pdf)
184{
185 RooAbsPdf *binnedPdf = nullptr;
186 if (pdf->getAttribute("BinnedLikelihood") && pdf->IsA()->InheritsFrom(RooRealSumPdf::Class())) {
187 // Simplest case: top-level of pdf is a RRSP
188 binnedPdf = pdf;
189 } else if (pdf->IsA()->InheritsFrom(RooProdPdf::Class())) {
190 // Default case: top-level pdf is a product of RRSP and other pdfs
191 for (const auto component : (static_cast<RooProdPdf *>(pdf))->pdfList()) {
192 if (component->getAttribute("BinnedLikelihood") && component->IsA()->InheritsFrom(RooRealSumPdf::Class())) {
193 binnedPdf = static_cast<RooAbsPdf *>(component);
194 break;
195 }
196 }
197 }
198 return binnedPdf;
199}
200
201} // namespace
202
203/*
204 * \brief Build a set of likelihood components to build a likelihood from a simultaneous pdf.
205 *
206 * \return A vector to RooAbsL unique_ptrs that contain all component binned and/or
207 * unbinned likelihoods. Note: subsidiary components are not included; use getConstraintsSet and/or
208 * buildSubsidiaryLikelihood to add those.
209 */
210std::vector<std::unique_ptr<RooAbsL>> NLLFactory::getSimultaneousComponents()
211{
212 auto sim_pdf = dynamic_cast<RooSimultaneous *>(&_pdf);
213
214 // the rest of this function is an adaptation of RooAbsTestStatistic::initSimMode:
215
216 auto &simCat = const_cast<RooAbsCategoryLValue &>(sim_pdf->indexCat());
217
218 // note: this is valid for simultaneous likelihoods, not for other test statistic types (e.g. chi2) for which this
219 // should return true.
220 bool process_empty_data_sets = RooAbsL::isExtendedHelper(&_pdf, _extended);
221
222 TString simCatName(simCat.GetName());
223 // Note: important not to use cloned dataset here (possible when this code is run in Roo[...]L ctor), use the
224 // original one (which is data_ in Roo[...]L ctors, but data here)
225 std::unique_ptr<TList> dsetList{_data.split(*sim_pdf, process_empty_data_sets)};
226 if (!dsetList) {
227 throw std::logic_error(
228 "getSimultaneousComponents ERROR, index category of simultaneous pdf is missing in dataset, aborting");
229 }
230
231 // Count number of used states
232 std::size_t N_components = 0;
233
234 for (const auto &catState : simCat) {
235 // Retrieve the PDF for this simCat state
236 RooAbsPdf *component_pdf = sim_pdf->getPdf(catState.first.c_str());
237 auto *dset = static_cast<RooAbsData *>(dsetList->FindObject(catState.first.c_str()));
238
239 if (component_pdf && dset && (0. != dset->sumEntries() || process_empty_data_sets)) {
240 ++N_components;
241 }
242 }
243
244 // Allocate arrays
245 std::vector<std::unique_ptr<RooAbsL>> components;
246 components.reserve(N_components);
247 // _gofSplitMode.resize(N_components); // not used, Hybrid mode only, see below
248
249 // Create array of regular fit contexts, containing subset of data and single fitCat PDF
250 std::size_t n = 0;
251 for (const auto &catState : simCat) {
252 const std::string &catName = catState.first;
253 // Retrieve the PDF for this simCat state
254 RooAbsPdf *component_pdf = sim_pdf->getPdf(catName.c_str());
255 auto *dset = static_cast<RooAbsData *>(dsetList->FindObject(catName.c_str()));
256
257 if (component_pdf && dset && (0. != dset->sumEntries() || process_empty_data_sets)) {
258 ooccoutI(nullptr, Fitting) << "getSimultaneousComponents: creating slave calculator #" << n << " for state "
259 << catName << " (" << dset->numEntries() << " dataset entries)" << std::endl;
260
261 RooAbsPdf *binnedPdf = getBinnedPdf(component_pdf);
262 bool binnedL = (binnedPdf != nullptr);
263 if (binnedPdf == nullptr && component_pdf->IsA()->InheritsFrom(RooProdPdf::Class())) {
264 // Default case: top-level pdf is a product of RRSP and other pdfs
265 for (const auto component : (static_cast<RooProdPdf *>(component_pdf))->pdfList()) {
266 if (component->getAttribute("MAIN_MEASUREMENT")) {
267 // not really a binned pdf, but this prevents a (potentially) long list of subsidiary measurements to
268 // be passed to the slave calculator
269 binnedPdf = static_cast<RooAbsPdf *>(component);
270 break;
271 }
272 }
273 }
274 // Below here directly pass binnedPdf instead of PROD(binnedPdf,constraints) as constraints are evaluated
275 // elsewhere anyway and omitting them reduces model complexity and associated handling/cloning times
276 if (binnedL) {
277 components.push_back(std::make_unique<RooBinnedL>((binnedPdf ? binnedPdf : component_pdf), dset));
278 } else {
279 components.push_back(
280 std::make_unique<RooUnbinnedL>((binnedPdf ? binnedPdf : component_pdf), dset, _extended, _evalBackend));
281 }
282 // }
283 components.back()->setSimCount(N_components);
284
285 // Servers may have been redirected between instantiation and (deferred) initialization
286
287 std::unique_ptr<RooArgSet> actualParams{binnedPdf ? binnedPdf->getParameters(dset)
288 : component_pdf->getParameters(dset)};
289 RooArgSet params;
290 _pdf.getParameters(_data.get(), params);
291 RooArgSet selTargetParams;
292 params.selectCommon(*actualParams, selTargetParams);
293
294 assert(selTargetParams.equals(*components.back()->getParameters()));
295
296 ++n;
297 } else {
298 if ((!dset || (0. != dset->sumEntries() && !process_empty_data_sets)) && component_pdf) {
299 ooccoutD(nullptr, Fitting) << "getSimultaneousComponents: state " << catName
300 << " has no data entries, no slave calculator created" << std::endl;
301 }
302 }
303 }
304 oocoutI(nullptr, Fitting) << "getSimultaneousComponents: created " << n << " slave calculators." << std::endl;
305
306 return components;
307}
308
309/// Create a likelihood builder for a given pdf and dataset.
310/// \param[in] pdf Raw pointer to the pdf
311/// \param[in] data Raw pointer to the dataset
313
314/*
315 * \brief Build a likelihood from a pdf + dataset, optionally with a subsidiary likelihood component.
316 *
317 * This function analyzes the pdf and automatically constructs the proper likelihood, built up from the available
318 * RooAbsL subclasses. In essence, this can give 8 conceptually different combinations, based on three questions:
319 * 1. Is it a simultaneous pdf?
320 * 2. Is the pdf binned?
321 * 3. Does the pdf have subsidiary terms?
322 * If questions 1 and 3 are answered negatively, this function will either return a RooBinnedL or RooUnbinnedL. In all
323 * other cases it returns a RooSumL, which will contain RooBinnedL and/or RooUnbinnedL component(s) and possibly a
324 * RooSubsidiaryL component with constraint terms.
325 *
326 * \return A unique pointer to a RooSubsidiaryL that contains all terms in
327 * the pdf that can be calculated separately from the other components in the full likelihood.
328 */
329std::unique_ptr<RooAbsL> NLLFactory::build()
330{
331 std::unique_ptr<RooAbsL> likelihood;
332 std::vector<std::unique_ptr<RooAbsL>> components;
333
334 if (dynamic_cast<RooSimultaneous const *>(&_pdf)) {
335 components = getSimultaneousComponents();
336 } else if (auto binnedPdf = getBinnedPdf(&_pdf)) {
337 likelihood = std::make_unique<RooBinnedL>(binnedPdf, &_data);
338 } else { // unbinned
339 likelihood = std::make_unique<RooUnbinnedL>(&_pdf, &_data, _extended, _evalBackend);
340 }
341
344 if (subsidiary) {
345 if (likelihood) {
346 components.push_back(std::move(likelihood));
347 }
348 components.push_back(std::move(subsidiary));
349 }
350 if (!components.empty()) {
351 likelihood = std::make_unique<RooSumL>(&_pdf, &_data, std::move(components), _extended);
352 }
353 return likelihood;
354}
355
356/// \param[in] extended Set extended term calculation on, off or use
357/// RooAbsL::Extended::Auto to determine automatically based on the
358/// pdf whether to activate or not.
360{
361 _extended = extended;
362 return *this;
363}
364
365/// \param[in] constrainedParameters Set of parameters that are constrained.
366/// Pdf components dependent on these alone are added to the
367/// subsidiary likelihood.
369{
370 _constrainedParameters.add(constrainedParameters);
371 return *this;
372}
373
374/// \param[in] externalConstraints Set of external constraint pdfs, i.e.
375/// constraints not necessarily in the pdf itself. These are always
376/// added to the subsidiary likelihood.
378{
379 _externalConstraints.add(externalConstraints);
380 return *this;
381}
382
383/// \param[in] globalObservables Observables that have a constant value,
384/// independent of the dataset events. Pdf components dependent on
385/// these alone are added to the subsidiary likelihood.
386/// \note Overrides all other likelihood parameters (like those in
387/// NLLFactory::ConstrainedParameters()) if present.
389{
390 _globalObservables.add(globalObservables);
391 return *this;
392}
393
394/// \param[in] globalObservablesTag String that can be set as attribute in
395/// pdf components to indicate that it is a global observable. Can
396/// be used instead of or in addition to
397/// NLLFactory::GlobalObservables().
398NLLFactory &NLLFactory::GlobalObservablesTag(const char *globalObservablesTag)
399{
400 _globalObservablesTag = globalObservablesTag;
401 return *this;
402}
403
405{
406 _evalBackend = evalBackend;
407 return *this;
408}
409
410} // namespace TestStatistics
411} // namespace RooFit
#define oocoutI(o, a)
#define ooccoutI(o, a)
#define ooccoutD(o, a)
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void data
char name[80]
Definition TGX11.cxx:110
RooFit::OwningPtr< RooArgSet > getParameters(const RooAbsData *data, bool stripDisconnected=true) const
Create a list of leaf nodes in the arg tree starting with ourself as top node that don't match any of...
const Text_t * getStringAttribute(const Text_t *key) const
Get string attribute mapped under key 'key'.
bool getAttribute(const Text_t *name) const
Check if a named attribute is set. By default, all attributes are unset.
RooFit::OwningPtr< RooArgSet > getVariables(bool stripDisconnected=true) const
Return RooArgSet with all variables (tree leaf nodes of expression tree)
Abstract base class for objects that represent a discrete value that can be set from the outside,...
bool equals(const RooAbsCollection &otherColl) const
Check if this and other collection have identically-named contents.
virtual void removeAll()
Remove all arguments from our set, deleting them if we own them.
virtual bool add(const RooAbsArg &var, bool silent=false)
Add the specified argument to list.
Storage_t::size_type size() const
Abstract base class for binned and unbinned datasets.
Definition RooAbsData.h:57
virtual const RooArgSet * get() const
Definition RooAbsData.h:101
virtual RooFit::OwningPtr< TList > split(const RooAbsCategory &splitCat, bool createEmptyDataSets=false) const
Split the dataset into subsets based on states of a categorical variable in this dataset.
Abstract interface for all probability density functions.
Definition RooAbsPdf.h:40
TClass * IsA() const override
Definition RooAbsPdf.h:351
RooArgSet * getAllConstraints(const RooArgSet &observables, RooArgSet &constrainedParams, bool stripDisconnected=true) const
This helper function finds and collects all constraints terms of all component p.d....
RooArgSet is a container object that can hold multiple RooAbsArg objects.
Definition RooArgSet.h:24
RooArgSet * selectCommon(const RooAbsCollection &refColl) const
Use RooAbsCollection::selecCommon(), but return as RooArgSet.
Definition RooArgSet.h:149
std::unique_ptr< RooAbsL > build()
std::vector< std::unique_ptr< RooAbsL > > getSimultaneousComponents()
NLLFactory & EvalBackend(RooFit::EvalBackend evalBackend)
NLLFactory(RooAbsPdf &pdf, RooAbsData &data)
Create a likelihood builder for a given pdf and dataset.
NLLFactory & ExternalConstraints(const RooArgSet &externalconstraints)
NLLFactory & Extended(RooAbsL::Extended extended)
NLLFactory & GlobalObservables(const RooArgSet &globalObservables)
NLLFactory & ConstrainedParameters(const RooArgSet &constrainedParameters)
NLLFactory & GlobalObservablesTag(const char *globalObservablesTag)
static bool isExtendedHelper(RooAbsPdf *pdf, Extended extended)
Definition RooAbsL.cxx:35
Efficient implementation of a product of PDFs of the form.
Definition RooProdPdf.h:39
static TClass * Class()
static TClass * Class()
Facilitates simultaneous fitting of multiple PDFs to subsets of a given dataset.
Bool_t InheritsFrom(const char *cl) const override
Return kTRUE if this class inherits from a class with name "classname".
Definition TClass.cxx:4943
const char * GetName() const override
Returns name of object.
Definition TNamed.h:47
Basic string class.
Definition TString.h:139
const Int_t n
Definition legend1.C:16
The namespace RooFit contains mostly switches that change the behaviour of functions of PDFs (or othe...
Definition CodegenImpl.h:64