Logo ROOT  
Reference Guide
 
Loading...
Searching...
No Matches
HybridStandardForm.C
Go to the documentation of this file.
1/// \file
2/// \ingroup tutorial_roostats
3/// \notebook -js
4/// A hypothesis testing example based on number counting with background uncertainty.
5///
6/// A hypothesis testing example based on number counting
7/// with background uncertainty.
8///
9/// NOTE: This example is like HybridInstructional, but the model is more clearly
10/// generalizable to an analysis with shapes. There is a lot of flexibility
11/// for how one models a problem in RooFit/RooStats. Models come in a few
12/// common forms:
13/// - standard form: extended PDF of some discriminating variable m:
14/// eg: P(m) ~ S*fs(m) + B*fb(m), with S+B events expected
15/// in this case the dataset has N rows corresponding to N events
16/// and the extended term is Pois(N|S+B)
17///
18/// - fractional form: non-extended PDF of some discriminating variable m:
19/// eg: P(m) ~ s*fs(m) + (1-s)*fb(m), where s is a signal fraction
20/// in this case the dataset has N rows corresponding to N events
21/// and there is no extended term
22///
23/// - number counting form: in which there is no discriminating variable
24/// and the counts are modeled directly (see HybridInstructional)
25/// eg: P(N) = Pois(N|S+B)
26/// in this case the dataset has 1 row corresponding to N events
27/// and the extended term is the PDF itself.
28///
29/// Here we convert the number counting form into the standard form by
30/// introducing a dummy discriminating variable m with a uniform distribution.
31///
32/// This example:
33/// - demonstrates the usage of the HybridCalcultor (Part 4-6)
34/// - demonstrates the numerical integration of RooFit (Part 2)
35/// - validates the RooStats against an example with a known analytic answer
36/// - demonstrates usage of different test statistics
37/// - explains subtle choices in the prior used for hybrid methods
38/// - demonstrates usage of different priors for the nuisance parameters
39/// - demonstrates usage of PROOF
40///
41/// The basic setup here is that a main measurement has observed x events with an
42/// expectation of s+b. One can choose an ad hoc prior for the uncertainty on b,
43/// or try to base it on an auxiliary measurement. In this case, the auxiliary
44/// measurement (aka control measurement, sideband) is another counting experiment
45/// with measurement y and expectation tau*b. With an 'original prior' on b,
46/// called \f$ \eta(b) \f$ then one can obtain a posterior from the auxiliary measurement
47/// \f$ \pi(b) = \eta(b) * Pois(y|tau*b) \f$. This is a principled choice for a prior
48/// on b in the main measurement of x, which can then be treated in a hybrid
49/// Bayesian/Frequentist way. Additionally, one can try to treat the two
50/// measurements simultaneously, which is detailed in Part 6 of the tutorial.
51///
52/// This tutorial is related to the FourBin.C tutorial in the modeling, but
53/// focuses on hypothesis testing instead of interval estimation.
54///
55/// More background on this 'prototype problem' can be found in the
56/// following papers:
57///
58/// - Evaluation of three methods for calculating statistical significance
59/// when incorporating a systematic uncertainty into a test of the
60/// background-only hypothesis for a Poisson process
61/// Authors: Robert D. Cousins, James T. Linnemann, Jordan Tucker
62/// http://arxiv.org/abs/physics/0702156
63/// NIM A 595 (2008) 480--501
64///
65/// - Statistical Challenges for Searches for New Physics at the LHC
66/// Author: Kyle Cranmer
67/// http://arxiv.org/abs/physics/0511028
68///
69/// - Measures of Significance in HEP and Astrophysics
70/// Author: J. T. Linnemann
71/// http://arxiv.org/abs/physics/0312059
72///
73/// \macro_image
74/// \macro_output
75/// \macro_code
76///
77/// \authors Kyle Cranmer, Wouter Verkerke, and Sven Kreiss
78
79#include "RooGlobalFunc.h"
80#include "RooRealVar.h"
81#include "RooProdPdf.h"
82#include "RooWorkspace.h"
83#include "RooDataSet.h"
84#include "RooDataHist.h"
85#include "TCanvas.h"
86#include "TStopwatch.h"
87#include "TH1.h"
88#include "RooPlot.h"
89#include "RooMsgService.h"
90
92
96
102
103using namespace RooFit;
104using namespace RooStats;
105
106//-------------------------------------------------------
107// A New Test Statistic Class for this example.
108// It simply returns the sum of the values in a particular
109// column of a dataset.
110// You can ignore this class and focus on the macro below
111
112class BinCountTestStat : public TestStatistic {
113public:
114 BinCountTestStat(void) : fColumnName("tmp") {}
115 BinCountTestStat(string columnName) : fColumnName(columnName) {}
116
117 virtual Double_t Evaluate(RooAbsData &data, RooArgSet & /*nullPOI*/)
118 {
119 // This is the main method in the interface
120 Double_t value = 0.0;
121 for (int i = 0; i < data.numEntries(); i++) {
122 value += data.get(i)->getRealValue(fColumnName.c_str());
123 }
124 return value;
125 }
126 virtual const TString GetVarName() const { return fColumnName; }
127
128private:
129 string fColumnName;
130
131protected:
132 ClassDef(BinCountTestStat, 1)
133};
134
135ClassImp(BinCountTestStat)
136
137//-----------------------------
138// The Actual Tutorial Macro
139//-----------------------------
140
141void HybridStandardForm(int ntoys = 6000)
142{
143 double nToysRatio = 20; // ratio Ntoys Null/ntoys ALT
144
145 // This tutorial has 6 parts
146 // Table of Contents
147 // Setup
148 // 1. Make the model for the 'prototype problem'
149 // Special cases
150 // 2. NOT RELEVANT HERE
151 // 3. Use RooStats analytic solution for this problem
152 // RooStats HybridCalculator -- can be generalized
153 // 4. RooStats ToyMC version of 2. & 3.
154 // 5. RooStats ToyMC with an equivalent test statistic
155 // 6. RooStats ToyMC with simultaneous control & main measurement
156
157 // Part 4 takes ~4 min without PROOF.
158 // Part 5 takes about ~2 min with PROOF on 4 cores.
159 // Of course, everything looks nicer with more toys, which takes longer.
160
161 TStopwatch t;
162 t.Start();
163 TCanvas *c = new TCanvas;
164 c->Divide(2, 2);
165
166 //-----------------------------------------------------
167 // P A R T 1 : D I R E C T I N T E G R A T I O N
168 // ====================================================
169 // Make model for prototype on/off problem
170 // Pois(x | s+b) * Pois(y | tau b )
171 // for Z_Gamma, use uniform prior on b.
172 RooWorkspace *w = new RooWorkspace("w");
173
174 // replace the pdf in 'number counting form'
175 // w->factory("Poisson::px(x[150,0,500],sum::splusb(s[0,0,100],b[100,0,300]))");
176 // with one in standard form. Now x is encoded in event count
177 w->factory("Uniform::f(m[0,1])"); // m is a dummy discriminating variable
178 w->factory("ExtendPdf::px(f,sum::splusb(s[0,0,100],b[100,0.1,300]))");
179 w->factory("Poisson::py(y[100,0.1,500],prod::taub(tau[1.],b))");
180 w->factory("PROD::model(px,py)");
181 w->factory("Uniform::prior_b(b)");
182
183 // We will control the output level in a few places to avoid
184 // verbose progress messages. We start by keeping track
185 // of the current threshold on messages.
187
188 // Use PROOF-lite on multi-core machines
189 ProofConfig *pc = NULL;
190 // uncomment below if you want to use PROOF
191 pc = new ProofConfig(*w, 4, "workers=4", kFALSE); // machine with 4 cores
192 // pc = new ProofConfig(*w, 2, "workers=2", kFALSE); // machine with 2 cores
193
194 //-----------------------------------------------
195 // P A R T 3 : A N A L Y T I C R E S U L T
196 // ==============================================
197 // In this special case, the integrals are known analytically
198 // and they are implemented in RooStats::NumberCountingUtils
199
200 // analytic Z_Bi
201 double p_Bi = NumberCountingUtils::BinomialWithTauObsP(150, 100, 1);
202 double Z_Bi = NumberCountingUtils::BinomialWithTauObsZ(150, 100, 1);
203 cout << "-----------------------------------------" << endl;
204 cout << "Part 3" << endl;
205 std::cout << "Z_Bi p-value (analytic): " << p_Bi << std::endl;
206 std::cout << "Z_Bi significance (analytic): " << Z_Bi << std::endl;
207 t.Stop();
208 t.Print();
209 t.Reset();
210 t.Start();
211
212 //--------------------------------------------------------------
213 // P A R T 4 : U S I N G H Y B R I D C A L C U L A T O R
214 // ==============================================================
215 // Now we demonstrate the RooStats HybridCalculator.
216 //
217 // Like all RooStats calculators it needs the data and a ModelConfig
218 // for the relevant hypotheses. Since we are doing hypothesis testing
219 // we need a ModelConfig for the null (background only) and the alternate
220 // (signal+background) hypotheses. We also need to specify the PDF,
221 // the parameters of interest, and the observables. Furthermore, since
222 // the parameter of interest is floating, we need to specify which values
223 // of the parameter corresponds to the null and alternate (eg. s=0 and s=50)
224 //
225 // define some sets of variables obs={x} and poi={s}
226 // note here, x is the only observable in the main measurement
227 // and y is treated as a separate measurement, which is used
228 // to produce the prior that will be used in this calculation
229 // to randomize the nuisance parameters.
230 w->defineSet("obs", "m");
231 w->defineSet("poi", "s");
232
233 // create a toy dataset with the x=150
234 // RooDataSet *data = new RooDataSet("d", "d", *w->set("obs"));
235 // data->add(*w->set("obs"));
236 std::unique_ptr<RooDataSet> data{w->pdf("px")->generate(*w->set("obs"), 150)};
237
238 // Part 3a : Setup ModelConfigs
239 //-------------------------------------------------------
240 // create the null (background-only) ModelConfig with s=0
241 ModelConfig b_model("B_model", w);
242 b_model.SetPdf(*w->pdf("px"));
243 b_model.SetObservables(*w->set("obs"));
244 b_model.SetParametersOfInterest(*w->set("poi"));
245 w->var("s")->setVal(0.0); // important!
246 b_model.SetSnapshot(*w->set("poi"));
247
248 // create the alternate (signal+background) ModelConfig with s=50
249 ModelConfig sb_model("S+B_model", w);
250 sb_model.SetPdf(*w->pdf("px"));
251 sb_model.SetObservables(*w->set("obs"));
252 sb_model.SetParametersOfInterest(*w->set("poi"));
253 w->var("s")->setVal(50.0); // important!
254 sb_model.SetSnapshot(*w->set("poi"));
255
256 // Part 3b : Choose Test Statistic
257 //--------------------------------------------------------------
258 // To make an equivalent calculation we need to use x as the test
259 // statistic. This is not a built-in test statistic in RooStats
260 // so we define it above. The new class inherits from the
261 // RooStats::TestStatistic interface, and simply returns the value
262 // of x in the dataset.
263
264 NumEventsTestStat eventCount(*w->pdf("px"));
265
266 // Part 3c : Define Prior used to randomize nuisance parameters
267 //-------------------------------------------------------------
268 //
269 // The prior used for the hybrid calculator is the posterior
270 // from the auxiliary measurement y. The model for the aux.
271 // measurement is Pois(y|tau*b), thus the likelihood function
272 // is proportional to (has the form of) a Gamma distribution.
273 // if the 'original prior' $\eta(b)$ is uniform, then from
274 // Bayes's theorem we have the posterior:
275 // $\pi(b) = Pois(y|tau*b) * \eta(b)$
276 // If $\eta(b)$ is flat, then we arrive at a Gamma distribution.
277 // Since RooFit will normalize the PDF we can actually supply
278 // py=Pois(y,tau*b) that will be equivalent to multiplying by a uniform.
279 //
280 // Alternatively, we could explicitly use a gamma distribution:
281 //
282 // `w->factory("Gamma::gamma(b,sum::temp(y,1),1,0)");`
283 //
284 // or we can use some other ad hoc prior that do not naturally
285 // follow from the known form of the auxiliary measurement.
286 // The common choice is the equivalent Gaussian:
287 w->factory("Gaussian::gauss_prior(b,y, expr::sqrty('sqrt(y)',y))");
288 // this corresponds to the "Z_N" calculation.
289 //
290 // or one could use the analogous log-normal prior
291 w->factory("Lognormal::lognorm_prior(b,y, expr::kappa('1+1./sqrt(y)',y))");
292 //
293 // Ideally, the HybridCalculator would be able to inspect the full
294 // model Pois(x | s+b) * Pois(y | tau b ) and be given the original
295 // prior $\eta(b)$ to form $\pi(b) = Pois(y|tau*b) * \eta(b)$.
296 // This is not yet implemented because in the general case
297 // it is not easy to identify the terms in the PDF that correspond
298 // to the auxiliary measurement. So for now, it must be set
299 // explicitly with:
300 // - ForcePriorNuisanceNull()
301 // - ForcePriorNuisanceAlt()
302 // the name "ForcePriorNuisance" was chosen because we anticipate
303 // this to be auto-detected, but will leave the option open
304 // to force to a different prior for the nuisance parameters.
305
306 // Part 3d : Construct and configure the HybridCalculator
307 //-------------------------------------------------------
308
309 HybridCalculator hc1(*data, sb_model, b_model);
310 ToyMCSampler *toymcs1 = (ToyMCSampler *)hc1.GetTestStatSampler();
311 // toymcs1->SetNEventsPerToy(1); // because the model is in number counting form
312 toymcs1->SetTestStatistic(&eventCount); // set the test statistic
313 // toymcs1->SetGenerateBinned();
314 hc1.SetToys(ntoys, ntoys / nToysRatio);
315 hc1.ForcePriorNuisanceAlt(*w->pdf("py"));
316 hc1.ForcePriorNuisanceNull(*w->pdf("py"));
317 // if you wanted to use the ad hoc Gaussian prior instead
318 // ~~~
319 // hc1.ForcePriorNuisanceAlt(*w->pdf("gauss_prior"));
320 // hc1.ForcePriorNuisanceNull(*w->pdf("gauss_prior"));
321 // ~~~
322 // if you wanted to use the ad hoc log-normal prior instead
323 // ~~~
324 // hc1.ForcePriorNuisanceAlt(*w->pdf("lognorm_prior"));
325 // hc1.ForcePriorNuisanceNull(*w->pdf("lognorm_prior"));
326 // ~~~
327
328 // enable proof
329 // proof not enabled for this test statistic
330 // if(pc) toymcs1->SetProofConfig(pc);
331
332 // these lines save current msg level and then kill any messages below ERROR
334 // Get the result
335 HypoTestResult *r1 = hc1.GetHypoTest();
336 RooMsgService::instance().setGlobalKillBelow(msglevel); // set it back
337 cout << "-----------------------------------------" << endl;
338 cout << "Part 4" << endl;
339 r1->Print();
340 t.Stop();
341 t.Print();
342 t.Reset();
343 t.Start();
344
345 c->cd(2);
346 HypoTestPlot *p1 = new HypoTestPlot(*r1, 30); // 30 bins, TS is discrete
347 p1->Draw();
348
349 return; // keep the running time sort by default
350 //-------------------------------------------------------------------------
351 // # P A R T 5 : U S I N G H Y B R I D C A L C U L A T O R W I T H
352 // # A N A L T E R N A T I V E T E S T S T A T I S T I C
353 //
354 // A likelihood ratio test statistics should be 1-to-1 with the count x
355 // when the value of b is fixed in the likelihood. This is implemented
356 // by the SimpleLikelihoodRatioTestStat
357
358 SimpleLikelihoodRatioTestStat slrts(*b_model.GetPdf(), *sb_model.GetPdf());
359 slrts.SetNullParameters(*b_model.GetSnapshot());
360 slrts.SetAltParameters(*sb_model.GetSnapshot());
361
362 // HYBRID CALCULATOR
363 HybridCalculator hc2(*data, sb_model, b_model);
364 ToyMCSampler *toymcs2 = (ToyMCSampler *)hc2.GetTestStatSampler();
365 // toymcs2->SetNEventsPerToy(1);
366 toymcs2->SetTestStatistic(&slrts);
367 // toymcs2->SetGenerateBinned();
368 hc2.SetToys(ntoys, ntoys / nToysRatio);
369 hc2.ForcePriorNuisanceAlt(*w->pdf("py"));
370 hc2.ForcePriorNuisanceNull(*w->pdf("py"));
371 // if you wanted to use the ad hoc Gaussian prior instead
372 // ~~~
373 // hc2.ForcePriorNuisanceAlt(*w->pdf("gauss_prior"));
374 // hc2.ForcePriorNuisanceNull(*w->pdf("gauss_prior"));
375 // ~~~
376 // if you wanted to use the ad hoc log-normal prior instead
377 // ~~~
378 // hc2.ForcePriorNuisanceAlt(*w->pdf("lognorm_prior"));
379 // hc2.ForcePriorNuisanceNull(*w->pdf("lognorm_prior"));
380 // ~~~
381
382 // enable proof
383 if (pc)
384 toymcs2->SetProofConfig(pc);
385
386 // these lines save current msg level and then kill any messages below ERROR
388 // Get the result
389 HypoTestResult *r2 = hc2.GetHypoTest();
390 cout << "-----------------------------------------" << endl;
391 cout << "Part 5" << endl;
392 r2->Print();
393 t.Stop();
394 t.Print();
395 t.Reset();
396 t.Start();
398
399 c->cd(3);
400 HypoTestPlot *p2 = new HypoTestPlot(*r2, 30); // 30 bins
401 p2->Draw();
402
403 return; // so standard tutorial runs faster
404
405 //---------------------------------------------
406 // OUTPUT W/O PROOF (2.66 GHz Intel Core i7)
407 // ============================================
408
409 // -----------------------------------------
410 // Part 3
411 // Z_Bi p-value (analytic): 0.00094165
412 // Z_Bi significance (analytic): 3.10804
413 // Real time 0:00:00, CP time 0.610
414
415 // Results HybridCalculator_result:
416 // - Null p-value = 0.00103333 +/- 0.000179406
417 // - Significance = 3.08048 sigma
418 // - Number of S+B toys: 1000
419 // - Number of B toys: 30000
420 // - Test statistic evaluated on data: 150
421 // - CL_b: 0.998967 +/- 0.000185496
422 // - CL_s+b: 0.495 +/- 0.0158106
423 // - CL_s: 0.495512 +/- 0.0158272
424 // Real time 0:04:43, CP time 283.780
425
426 // With PROOF
427 // -----------------------------------------
428 // Part 5
429
430 // Results HybridCalculator_result:
431 // - Null p-value = 0.00105 +/- 0.000206022
432 // - Significance = 3.07571 sigma
433 // - Number of S+B toys: 1000
434 // - Number of B toys: 20000
435 // - Test statistic evaluated on data: 10.8198
436 // - CL_b: 0.99895 +/- 0.000229008
437 // - CL_s+b: 0.491 +/- 0.0158088
438 // - CL_s: 0.491516 +/- 0.0158258
439 // Real time 0:02:22, CP time 0.990
440
441 //-------------------------------------------------------
442 // Comparison
443 //-------------------------------------------------------
444 // LEPStatToolsForLHC
445 // https://plone4.fnal.gov:4430/P0/phystat/packages/0703002
446 // Uses Gaussian prior
447 // CL_b = 6.218476e-04, Significance = 3.228665 sigma
448 //
449 //-------------------------------------------------------
450 // Comparison
451 //-------------------------------------------------------
452 // Asymptotic
453 // From the value of the profile likelihood ratio (5.0338)
454 // The significance can be estimated using Wilks's theorem
455 // significance = sqrt(2*profileLR) = 3.1729 sigma
456}
#define c(i)
Definition RSha256.hxx:101
constexpr Bool_t kFALSE
Definition RtypesCore.h:101
double Double_t
Definition RtypesCore.h:59
#define ClassDef(name, id)
Definition Rtypes.h:337
#define ClassImp(name)
Definition Rtypes.h:377
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void w
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void data
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void value
Int_t i
static RooMsgService & instance()
Return reference to singleton instance.
void setGlobalKillBelow(RooFit::MsgLevel level)
RooFit::MsgLevel globalKillBelow() const
Same purpose as HybridCalculatorOriginal, but different implementation.
This class provides the plots for the result of a study performed with any of the HypoTestCalculatorG...
HypoTestResult is a base class for results from hypothesis tests.
void Print(const Option_t *="") const override
Print out some information about the results Note: use Alt/Null labels for the hypotheses here as the...
NumEventsTestStat is a simple implementation of the TestStatistic interface used for simple number co...
Holds configuration options for proof and proof-lite.
Definition ProofConfig.h:45
void Draw(Option_t *options=nullptr) override
Draw this plot and all of the elements it contains.
TestStatistic class that returns -log(L[null] / L[alt]) where L is the likelihood.
TestStatistic is an interface class to provide a facility for construction test statistics distributi...
ToyMCSampler is an implementation of the TestStatSampler interface.
void SetProofConfig(ProofConfig *pc=nullptr)
calling with argument or nullptr deactivates proof
virtual void SetTestStatistic(TestStatistic *testStatistic, unsigned int i)
Set the TestStatistic (want the argument to be a function of the data & parameter points.
Persistable container for RooFit projects.
The Canvas class.
Definition TCanvas.h:23
Stopwatch class.
Definition TStopwatch.h:28
void Start(Bool_t reset=kTRUE)
Start the stopwatch.
void Stop()
Stop the stopwatch.
void Reset()
Definition TStopwatch.h:52
void Print(Option_t *option="") const override
Print the real and cpu time passed between the start and stop events.
The namespace RooFit contains mostly switches that change the behaviour of functions of PDFs (or othe...
Definition JSONIO.h:26
MsgLevel
Verbosity level for RooMsgService::StreamConfig in RooMsgService.
RooStats::ModelConfig ModelConfig
double BinomialWithTauObsZ(double nObs, double bExp, double tau)
See BinomialWithTauObsP.
double BinomialWithTauObsP(double nObs, double bExp, double tau)
P-value for s=nullptr in a ratio of Poisson means.
Namespace for the RooStats classes.
Definition Asimov.h:19