Logo ROOT  
Reference Guide
 
Loading...
Searching...
No Matches
buildLikelihood.cxx
Go to the documentation of this file.
1/*
2 * Project: RooFit
3 * Authors:
4 * PB, Patrick Bos, Netherlands eScience Center, p.bos@esciencecenter.nl
5 *
6 * Copyright (c) 2021, CERN
7 *
8 * Redistribution and use in source and binary forms,
9 * with or without modification, are permitted according to the terms
10 * listed in LICENSE (http://roofit.sourceforge.net/license.txt)
11 */
12
14
15#include <RooSimultaneous.h>
20#include <RooAbsPdf.h>
21#include <RooAbsData.h>
22#include <RooRealSumPdf.h>
23#include <RooProdPdf.h>
24#include <TClass.h>
25
26#include <memory>
27
28namespace RooFit {
29/**
30 * \brief Namespace for new RooFit test statistic calculation.
31 *
32 * RooFit::TestStatistics contains a major refactoring of the RooAbsTestStatistic-RooAbsOptTestStatistic-RooNLLVar
33 * inheritance tree into:
34 * 1. statistics-based classes on the one hand;
35 * 2. calculation/evaluation/optimization based classes on the other hand.
36 *
37 * The likelihood is the central unit on the statistics side. The RooAbsL class is implemented for four kinds of
38 * likelihoods: binned, unbinned, "subsidiary" (an optimization for numerical stability that gathers components like
39 * global observables) and "sum" (over multiple components of the other types). These classes provide ways to compute
40 * their components in parallelizable chunks that can be used by the calculator classes as they see fit.
41 *
42 * On top of the likelihood classes, we also provide for convenience a likelihood builder buildLikelihood, as a free
43 * function in the namespace. This function analyzes the pdf and automatically constructs the proper likelihood, built
44 * up from the available RooAbsL subclasses.
45 *
46 * The calculator "Wrapper" classes are abstract interfaces. These can be implemented for different kinds of algorithms,
47 * or with different kinds of optimization "back-ends" in mind. In an upcoming PR, we will introduce the fork-based
48 * multi-processing implementation based on RooFit::MultiProcess. Other possible implementations could use the GPU or
49 * external tools like TensorFlow.
50 *
51 * The coupling of all these classes to RooMinimizer is made via the MinuitFcnGrad class, which owns the Wrappers that
52 * calculate the likelihood components.
53 */
54namespace TestStatistics {
55
56namespace { // private implementation details
57
58RooArgSet getConstraintsSet(RooAbsPdf *pdf, RooAbsData *data, RooArgSet constrained_parameters,
59 RooArgSet const &external_constraints, RooArgSet global_observables,
60 std::string const &global_observables_tag)
61{
62 // BEGIN CONSTRAINT COLLECTION; copied from RooAbsPdf::createNLL
63
64 bool doStripDisconnected = false;
65 // If no explicit list of parameters to be constrained is specified apply default algorithm
66 // All terms of RooProdPdfs that do not contain observables and share parameters with one or more
67 // terms that do contain observables are added as constraints.
68#ifndef NDEBUG
69 bool did_default_constraint_algo = false;
70 std::size_t N_default_constraints = 0;
71#endif
72 if (constrained_parameters.empty()) {
73 std::unique_ptr<RooArgSet> default_constraints{pdf->getParameters(*data, false)};
74 constrained_parameters.add(*default_constraints);
75 doStripDisconnected = true;
76#ifndef NDEBUG
77 did_default_constraint_algo = true;
78 N_default_constraints = default_constraints->getSize();
79#endif
80 }
81#ifndef NDEBUG
82 if (did_default_constraint_algo) {
83 assert(N_default_constraints == static_cast<std::size_t>(constrained_parameters.getSize()));
84 }
85#endif
86
87 // Collect internal and external constraint specifications
88 RooArgSet allConstraints;
89
90 if (!global_observables_tag.empty()) {
91 if (!global_observables.empty()) {
92 global_observables.removeAll();
93 }
94 std::unique_ptr<RooArgSet> allVars{pdf->getVariables()};
95 global_observables.add(*dynamic_cast<RooArgSet *>(allVars->selectByAttrib(global_observables_tag.c_str(), true)));
96 oocoutI(nullptr, Minimization) << "User-defined specification of global observables definition with tag named '"
97 << global_observables_tag << "'" << std::endl;
98 } else if (global_observables.empty()) {
99 // neither global_observables nor global_observables_tag was given - try if a default tag is defined in the head
100 // node
101 const char *defGlobObsTag = pdf->getStringAttribute("DefaultGlobalObservablesTag");
102 if (defGlobObsTag) {
103 oocoutI(nullptr, Minimization)
104 << "p.d.f. provides built-in specification of global observables definition with tag named '"
105 << defGlobObsTag << "'" << std::endl;
106 std::unique_ptr<RooArgSet> allVars{pdf->getVariables()};
107 global_observables.add(*dynamic_cast<RooArgSet *>(allVars->selectByAttrib(defGlobObsTag, true)));
108 }
109 }
110
111 // EGP: removed workspace (RooAbsPdf::_myws) based stuff for now; TODO: reconnect this class to workspaces
112
113 if (!constrained_parameters.empty()) {
114 std::unique_ptr<RooArgSet> constraints{
115 pdf->getAllConstraints(*data->get(), constrained_parameters, doStripDisconnected)};
116 allConstraints.add(*constraints);
117 }
118 if (!external_constraints.empty()) {
119 allConstraints.add(external_constraints);
120 }
121
122 return allConstraints;
123}
124
125/*
126 * \brief Extract a collection of subsidiary likelihoods from a pdf
127 *
128 * \param[in] pdf Raw pointer to the pdf
129 * \param[in] data Raw pointer to the dataset
130 * \param[in] constrained_parameters Set of parameters that are constrained. Pdf components dependent on these alone are
131 * added to the subsidiary likelihood.
132 * \param[in] external_constraints Set of external constraint pdfs, i.e. constraints
133 * not necessarily in the pdf itself. These are always added to the subsidiary likelihood.
134 * \param[in] global_observables
135 * Observables that have a constant value, independent of the dataset events. Pdf components dependent on these alone
136 * are added to the subsidiary likelihood. \note Overrides all other likelihood parameters (like those in \p
137 * constrained_parameters) if present.
138 * \param[in] global_observables_tag String that can be set as attribute in pdf
139 * components to indicate that it is a global observable. Can be used instead of or in addition to \p
140 * global_observables.
141 * \return A unique pointer to a RooSubsidiaryL that contains all terms in the pdf that can be
142 * calculated separately from the other components in the full likelihood.
143 */
144std::unique_ptr<RooSubsidiaryL> buildSubsidiaryL(RooAbsPdf *pdf, RooAbsData *data, RooArgSet constrained_parameters,
145 RooArgSet const &external_constraints, RooArgSet global_observables,
146 std::string const &global_observables_tag)
147{
148 auto allConstraints = getConstraintsSet(pdf, data, constrained_parameters, external_constraints, global_observables,
149 global_observables_tag);
150
151 std::unique_ptr<RooSubsidiaryL> subsidiary_likelihood;
152 // Include constraints, if any, in likelihood
153 if (!allConstraints.empty()) {
154
155 oocoutI(nullptr, Minimization) << " Including the following constraint terms in minimization: " << allConstraints
156 << std::endl;
157 if (!global_observables.empty()) {
158 oocoutI(nullptr, Minimization) << "The following global observables have been defined: " << global_observables
159 << std::endl;
160 }
161 std::string name("likelihood for pdf ");
162 name += pdf->GetName();
163 subsidiary_likelihood = std::make_unique<RooSubsidiaryL>(
164 name, allConstraints, (!global_observables.empty()) ? global_observables : constrained_parameters);
165 }
166
167 return subsidiary_likelihood;
168}
169
170/// Get the binned part of a pdf
171///
172/// \param pdf Raw pointer to the pdf
173/// \return A pdf is binned if it has attribute "BinnedLikelihood" and it is a RooRealSumPdf (or derived class).
174/// If \p pdf itself is binned, it will be returned. If the pdf is a RooProdPdf (or derived), the product terms
175/// will be searched for a binned component and the first such term that is found will be returned. Note that
176/// the simultaneous pdf setup is such that it is assumed that only one component is binned, so this should
177/// always return the correct binned component. If no binned component is found, nullptr is returned.
178RooAbsPdf *getBinnedPdf(RooAbsPdf *pdf)
179{
180 RooAbsPdf *binnedPdf = nullptr;
181 if (pdf->getAttribute("BinnedLikelihood") && pdf->IsA()->InheritsFrom(RooRealSumPdf::Class())) {
182 // Simplest case: top-level of pdf is a RRSP
183 binnedPdf = pdf;
184 } else if (pdf->IsA()->InheritsFrom(RooProdPdf::Class())) {
185 // Default case: top-level pdf is a product of RRSP and other pdfs
186 for (const auto component : ((RooProdPdf *)pdf)->pdfList()) {
187 if (component->getAttribute("BinnedLikelihood") && component->IsA()->InheritsFrom(RooRealSumPdf::Class())) {
188 binnedPdf = (RooAbsPdf *)component;
189 break;
190 }
191 }
192 }
193 return binnedPdf;
194}
195
196} // namespace
197
198/*
199 * \brief Build a set of likelihood components to build a likelihood from a simultaneous pdf.
200 *
201 * \return A vector to RooAbsL unique_ptrs that contain all component binned and/or
202 * unbinned likelihoods. Note: subsidiary components are not included; use getConstraintsSet and/or
203 * buildSubsidiaryLikelihood to add those.
204 */
205std::vector<std::unique_ptr<RooAbsL>> NLLFactory::getSimultaneousComponents()
206{
207 auto sim_pdf = dynamic_cast<RooSimultaneous *>(&_pdf);
208
209 // the rest of this function is an adaptation of RooAbsTestStatistic::initSimMode:
210
211 auto &simCat = const_cast<RooAbsCategoryLValue &>(sim_pdf->indexCat());
212
213 // note: this is valid for simultaneous likelihoods, not for other test statistic types (e.g. chi2) for which this
214 // should return true.
215 bool process_empty_data_sets = RooAbsL::isExtendedHelper(&_pdf, _extended);
216
217 TString simCatName(simCat.GetName());
218 // Note: important not to use cloned dataset here (possible when this code is run in Roo[...]L ctor), use the
219 // original one (which is data_ in Roo[...]L ctors, but data here)
220 std::unique_ptr<TList> dsetList{_data.split(*sim_pdf, process_empty_data_sets)};
221 if (!dsetList) {
222 throw std::logic_error(
223 "getSimultaneousComponents ERROR, index category of simultaneous pdf is missing in dataset, aborting");
224 }
225
226 // Count number of used states
227 std::size_t N_components = 0;
228
229 for (const auto &catState : simCat) {
230 // Retrieve the PDF for this simCat state
231 RooAbsPdf *component_pdf = sim_pdf->getPdf(catState.first.c_str());
232 auto *dset = static_cast<RooAbsData *>(dsetList->FindObject(catState.first.c_str()));
233
234 if (component_pdf && dset && (0. != dset->sumEntries() || process_empty_data_sets)) {
235 ++N_components;
236 }
237 }
238
239 // Allocate arrays
240 std::vector<std::unique_ptr<RooAbsL>> components;
241 components.reserve(N_components);
242 // _gofSplitMode.resize(N_components); // not used, Hybrid mode only, see below
243
244 // Create array of regular fit contexts, containing subset of data and single fitCat PDF
245 std::size_t n = 0;
246 for (const auto &catState : simCat) {
247 const std::string &catName = catState.first;
248 // Retrieve the PDF for this simCat state
249 RooAbsPdf *component_pdf = sim_pdf->getPdf(catName.c_str());
250 auto *dset = static_cast<RooAbsData *>(dsetList->FindObject(catName.c_str()));
251
252 if (component_pdf && dset && (0. != dset->sumEntries() || process_empty_data_sets)) {
253 ooccoutI(nullptr, Fitting) << "getSimultaneousComponents: creating slave calculator #" << n << " for state "
254 << catName << " (" << dset->numEntries() << " dataset entries)" << std::endl;
255
256 RooAbsPdf *binnedPdf = getBinnedPdf(component_pdf);
257 bool binnedL = (binnedPdf != nullptr);
258 if (binnedPdf == nullptr && component_pdf->IsA()->InheritsFrom(RooProdPdf::Class())) {
259 // Default case: top-level pdf is a product of RRSP and other pdfs
260 for (const auto component : ((RooProdPdf *)component_pdf)->pdfList()) {
261 if (component->getAttribute("MAIN_MEASUREMENT")) {
262 // not really a binned pdf, but this prevents a (potentially) long list of subsidiary measurements to
263 // be passed to the slave calculator
264 binnedPdf = (RooAbsPdf *)component;
265 break;
266 }
267 }
268 }
269 // Below here directly pass binnedPdf instead of PROD(binnedPdf,constraints) as constraints are evaluated
270 // elsewhere anyway and omitting them reduces model complexity and associated handling/cloning times
271 if (binnedL) {
272 components.push_back(std::make_unique<RooBinnedL>((binnedPdf ? binnedPdf : component_pdf), dset));
273 } else {
274 components.push_back(
275 std::make_unique<RooUnbinnedL>((binnedPdf ? binnedPdf : component_pdf), dset, _extended, _evalBackend));
276 }
277 // }
278 components.back()->setSimCount(N_components);
279
280 // Servers may have been redirected between instantiation and (deferred) initialization
281
282 std::unique_ptr<RooArgSet> actualParams{binnedPdf ? binnedPdf->getParameters(dset)
283 : component_pdf->getParameters(dset)};
284 RooArgSet params;
285 _pdf.getParameters(_data.get(), params);
286 RooArgSet selTargetParams;
287 params.selectCommon(*actualParams, selTargetParams);
288
289 assert(selTargetParams.equals(*components.back()->getParameters()));
290
291 ++n;
292 } else {
293 if ((!dset || (0. != dset->sumEntries() && !process_empty_data_sets)) && component_pdf) {
294 ooccoutD(nullptr, Fitting) << "getSimultaneousComponents: state " << catName
295 << " has no data entries, no slave calculator created" << std::endl;
296 }
297 }
298 }
299 oocoutI(nullptr, Fitting) << "getSimultaneousComponents: created " << n << " slave calculators." << std::endl;
300
301 return components;
302}
303
304/// Create a likelihood builder for a given pdf and dataset.
305/// \param[in] pdf Raw pointer to the pdf
306/// \param[in] data Raw pointer to the dataset
308
309/*
310 * \brief Build a likelihood from a pdf + dataset, optionally with a subsidiary likelihood component.
311 *
312 * This function analyzes the pdf and automatically constructs the proper likelihood, built up from the available
313 * RooAbsL subclasses. In essence, this can give 8 conceptually different combinations, based on three questions:
314 * 1. Is it a simultaneous pdf?
315 * 2. Is the pdf binned?
316 * 3. Does the pdf have subsidiary terms?
317 * If questions 1 and 3 are answered negatively, this function will either return a RooBinnedL or RooUnbinnedL. In all
318 * other cases it returns a RooSumL, which will contain RooBinnedL and/or RooUnbinnedL component(s) and possibly a
319 * RooSubsidiaryL component with constraint terms.
320 *
321 * \return A unique pointer to a RooSubsidiaryL that contains all terms in
322 * the pdf that can be calculated separately from the other components in the full likelihood.
323 */
324std::unique_ptr<RooAbsL> NLLFactory::build()
325{
326 std::unique_ptr<RooAbsL> likelihood;
327 std::vector<std::unique_ptr<RooAbsL>> components;
328
329 if (dynamic_cast<RooSimultaneous const *>(&_pdf)) {
330 components = getSimultaneousComponents();
331 } else if (auto binnedPdf = getBinnedPdf(&_pdf)) {
332 likelihood = std::make_unique<RooBinnedL>(binnedPdf, &_data);
333 } else { // unbinned
334 likelihood = std::make_unique<RooUnbinnedL>(&_pdf, &_data, _extended, _evalBackend);
335 }
336
339 if (subsidiary) {
340 if (likelihood) {
341 components.push_back(std::move(likelihood));
342 }
343 components.push_back(std::move(subsidiary));
344 }
345 if (!components.empty()) {
346 likelihood = std::make_unique<RooSumL>(&_pdf, &_data, std::move(components), _extended);
347 }
348 return likelihood;
349}
350
351/// \param[in] extended Set extended term calculation on, off or use
352/// RooAbsL::Extended::Auto to determine automatically based on the
353/// pdf whether to activate or not.
355{
356 _extended = extended;
357 return *this;
358}
359
360/// \param[in] constrainedParameters Set of parameters that are constrained.
361/// Pdf components dependent on these alone are added to the
362/// subsidiary likelihood.
364{
365 _constrainedParameters.add(constrainedParameters);
366 return *this;
367}
368
369/// \param[in] externalConstraints Set of external constraint pdfs, i.e.
370/// constraints not necessarily in the pdf itself. These are always
371/// added to the subsidiary likelihood.
373{
374 _externalConstraints.add(externalConstraints);
375 return *this;
376}
377
378/// \param[in] globalObservables Observables that have a constant value,
379/// independent of the dataset events. Pdf components dependent on
380/// these alone are added to the subsidiary likelihood.
381/// \note Overrides all other likelihood parameters (like those in
382/// NLLFactory::ConstrainedParameters()) if present.
384{
385 _globalObservables.add(globalObservables);
386 return *this;
387}
388
389/// \param[in] globalObservablesTag String that can be set as attribute in
390/// pdf components to indicate that it is a global observable. Can
391/// be used instead of or in addition to
392/// NLLFactory::GlobalObservables().
393NLLFactory &NLLFactory::GlobalObservablesTag(const char *globalObservablesTag)
394{
395 _globalObservablesTag = globalObservablesTag;
396 return *this;
397}
398
400{
401 _evalBackend = evalBackend;
402 return *this;
403}
404
405} // namespace TestStatistics
406} // namespace RooFit
#define oocoutI(o, a)
#define ooccoutI(o, a)
#define ooccoutD(o, a)
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void data
char name[80]
Definition TGX11.cxx:110
RooFit::OwningPtr< RooArgSet > getParameters(const RooAbsData *data, bool stripDisconnected=true) const
Create a list of leaf nodes in the arg tree starting with ourself as top node that don't match any of...
const Text_t * getStringAttribute(const Text_t *key) const
Get string attribute mapped under key 'key'.
bool getAttribute(const Text_t *name) const
Check if a named attribute is set. By default, all attributes are unset.
RooFit::OwningPtr< RooArgSet > getVariables(bool stripDisconnected=true) const
Return RooArgSet with all variables (tree leaf nodes of expression tree)
Abstract base class for objects that represent a discrete value that can be set from the outside,...
bool equals(const RooAbsCollection &otherColl) const
Check if this and other collection have identically-named contents.
virtual void removeAll()
Remove all arguments from our set, deleting them if we own them.
Int_t getSize() const
Return the number of elements in the collection.
virtual bool add(const RooAbsArg &var, bool silent=false)
Add the specified argument to list.
bool selectCommon(const RooAbsCollection &refColl, RooAbsCollection &outColl) const
Create a subset of the current collection, consisting only of those elements that are contained as we...
Abstract base class for binned and unbinned datasets.
Definition RooAbsData.h:57
virtual const RooArgSet * get() const
Definition RooAbsData.h:101
virtual RooFit::OwningPtr< TList > split(const RooAbsCategory &splitCat, bool createEmptyDataSets=false) const
Split the dataset into subsets based on states of a categorical variable in this dataset.
Abstract interface for all probability density functions.
Definition RooAbsPdf.h:40
TClass * IsA() const override
Definition RooAbsPdf.h:353
RooArgSet * getAllConstraints(const RooArgSet &observables, RooArgSet &constrainedParams, bool stripDisconnected=true, bool removeConstraintsFromPdf=false) const
This helper function finds and collects all constraints terms of all component p.d....
RooArgSet is a container object that can hold multiple RooAbsArg objects.
Definition RooArgSet.h:55
std::unique_ptr< RooAbsL > build()
std::vector< std::unique_ptr< RooAbsL > > getSimultaneousComponents()
NLLFactory & EvalBackend(RooFit::EvalBackend evalBackend)
NLLFactory(RooAbsPdf &pdf, RooAbsData &data)
Create a likelihood builder for a given pdf and dataset.
NLLFactory & ExternalConstraints(const RooArgSet &externalconstraints)
NLLFactory & Extended(RooAbsL::Extended extended)
NLLFactory & GlobalObservables(const RooArgSet &globalObservables)
NLLFactory & ConstrainedParameters(const RooArgSet &constrainedParameters)
NLLFactory & GlobalObservablesTag(const char *globalObservablesTag)
static bool isExtendedHelper(RooAbsPdf *pdf, Extended extended)
Definition RooAbsL.cxx:34
Efficient implementation of a product of PDFs of the form.
Definition RooProdPdf.h:33
static TClass * Class()
static TClass * Class()
Facilitates simultaneous fitting of multiple PDFs to subsets of a given dataset.
Bool_t InheritsFrom(const char *cl) const override
Return kTRUE if this class inherits from a class with name "classname".
Definition TClass.cxx:4874
const char * GetName() const override
Returns name of object.
Definition TNamed.h:47
Basic string class.
Definition TString.h:139
const Int_t n
Definition legend1.C:16
The namespace RooFit contains mostly switches that change the behaviour of functions of PDFs (or othe...
Definition JSONIO.h:26