Logo ROOT  
Reference Guide
Loading...
Searching...
No Matches
HybridStandardForm.C
Go to the documentation of this file.
1/// \file
2/// \ingroup tutorial_roostats
3/// \notebook -js
4/// A hypothesis testing example based on number counting with background uncertainty.
5///
6/// A hypothesis testing example based on number counting
7/// with background uncertainty.
8///
9/// NOTE: This example is like HybridInstructional, but the model is more clearly
10/// generalizable to an analysis with shapes. There is a lot of flexibility
11/// for how one models a problem in RooFit/RooStats. Models come in a few
12/// common forms:
13/// - standard form: extended PDF of some discriminating variable m:
14/// eg: P(m) ~ S*fs(m) + B*fb(m), with S+B events expected
15/// in this case the dataset has N rows corresponding to N events
16/// and the extended term is Pois(N|S+B)
17///
18/// - fractional form: non-extended PDF of some discriminating variable m:
19/// eg: P(m) ~ s*fs(m) + (1-s)*fb(m), where s is a signal fraction
20/// in this case the dataset has N rows corresponding to N events
21/// and there is no extended term
22///
23/// - number counting form: in which there is no discriminating variable
24/// and the counts are modeled directly (see HybridInstructional)
25/// eg: P(N) = Pois(N|S+B)
26/// in this case the dataset has 1 row corresponding to N events
27/// and the extended term is the PDF itself.
28///
29/// Here we convert the number counting form into the standard form by
30/// introducing a dummy discriminating variable m with a uniform distribution.
31///
32/// This example:
33/// - demonstrates the usage of the HybridCalcultor (Part 4-6)
34/// - demonstrates the numerical integration of RooFit (Part 2)
35/// - validates the RooStats against an example with a known analytic answer
36/// - demonstrates usage of different test statistics
37/// - explains subtle choices in the prior used for hybrid methods
38/// - demonstrates usage of different priors for the nuisance parameters
39///
40/// The basic setup here is that a main measurement has observed x events with an
41/// expectation of s+b. One can choose an ad hoc prior for the uncertainty on b,
42/// or try to base it on an auxiliary measurement. In this case, the auxiliary
43/// measurement (aka control measurement, sideband) is another counting experiment
44/// with measurement y and expectation tau*b. With an 'original prior' on b,
45/// called \f$ \eta(b) \f$ then one can obtain a posterior from the auxiliary measurement
46/// \f$ \pi(b) = \eta(b) * Pois(y|tau*b) \f$. This is a principled choice for a prior
47/// on b in the main measurement of x, which can then be treated in a hybrid
48/// Bayesian/Frequentist way. Additionally, one can try to treat the two
49/// measurements simultaneously, which is detailed in Part 6 of the tutorial.
50///
51/// This tutorial is related to the FourBin.C tutorial in the modeling, but
52/// focuses on hypothesis testing instead of interval estimation.
53///
54/// More background on this 'prototype problem' can be found in the
55/// following papers:
56///
57/// - Evaluation of three methods for calculating statistical significance
58/// when incorporating a systematic uncertainty into a test of the
59/// background-only hypothesis for a Poisson process
60/// Authors: Robert D. Cousins, James T. Linnemann, Jordan Tucker
61/// http://arxiv.org/abs/physics/0702156
62/// NIM A 595 (2008) 480--501
63///
64/// - Statistical Challenges for Searches for New Physics at the LHC
65/// Author: Kyle Cranmer
66/// http://arxiv.org/abs/physics/0511028
67///
68/// - Measures of Significance in HEP and Astrophysics
69/// Author: J. T. Linnemann
70/// http://arxiv.org/abs/physics/0312059
71///
72/// \macro_image
73/// \macro_output
74/// \macro_code
75///
76/// \authors Kyle Cranmer, Wouter Verkerke, and Sven Kreiss
77
78#include "RooGlobalFunc.h"
79#include "RooRealVar.h"
80#include "RooProdPdf.h"
81#include "RooWorkspace.h"
82#include "RooDataSet.h"
83#include "RooDataHist.h"
84#include "TCanvas.h"
85#include "TStopwatch.h"
86#include "TH1.h"
87#include "RooPlot.h"
88#include "RooMsgService.h"
89
91
95
101
102using namespace RooFit;
103using namespace RooStats;
104
105//-------------------------------------------------------
106// A New Test Statistic Class for this example.
107// It simply returns the sum of the values in a particular
108// column of a dataset.
109// You can ignore this class and focus on the macro below
110
111class BinCountTestStat : public TestStatistic {
112public:
113 BinCountTestStat(void) : fColumnName("tmp") {}
114 BinCountTestStat(string columnName) : fColumnName(columnName) {}
115
116 virtual Double_t Evaluate(RooAbsData &data, RooArgSet & /*nullPOI*/)
117 {
118 // This is the main method in the interface
119 Double_t value = 0.0;
120 for (int i = 0; i < data.numEntries(); i++) {
121 value += data.get(i)->getRealValue(fColumnName.c_str());
122 }
123 return value;
124 }
125 virtual const TString GetVarName() const { return fColumnName; }
126
127private:
128 string fColumnName;
129
130protected:
131 ClassDef(BinCountTestStat, 1)
132};
133
134
135//-----------------------------
136// The Actual Tutorial Macro
137//-----------------------------
138
139void HybridStandardForm(int ntoys = 6000)
140{
141 double nToysRatio = 20; // ratio Ntoys Null/ntoys ALT
142
143 // This tutorial has 6 parts
144 // Table of Contents
145 // Setup
146 // 1. Make the model for the 'prototype problem'
147 // Special cases
148 // 2. NOT RELEVANT HERE
149 // 3. Use RooStats analytic solution for this problem
150 // RooStats HybridCalculator -- can be generalized
151 // 4. RooStats ToyMC version of 2. & 3.
152 // 5. RooStats ToyMC with an equivalent test statistic
153 // 6. RooStats ToyMC with simultaneous control & main measurement
154
155 // Part 4 takes ~4 min.
156 // Of course, everything looks nicer with more toys, which takes longer.
157
158 TStopwatch t;
159 t.Start();
160 TCanvas *c = new TCanvas;
161 c->Divide(2, 2);
162
163 //-----------------------------------------------------
164 // P A R T 1 : D I R E C T I N T E G R A T I O N
165 // ====================================================
166 // Make model for prototype on/off problem
167 // Pois(x | s+b) * Pois(y | tau b )
168 // for Z_Gamma, use uniform prior on b.
170
171 // replace the pdf in 'number counting form'
172 // w->factory("Poisson::px(x[150,0,500],sum::splusb(s[0,0,100],b[100,0,300]))");
173 // with one in standard form. Now x is encoded in event count
174 w->factory("Uniform::f(m[0,1])"); // m is a dummy discriminating variable
175 w->factory("ExtendPdf::px(f,sum::splusb(s[0,0,100],b[100,0.1,300]))");
176 w->factory("Poisson::py(y[100,0.1,500],prod::taub(tau[1.],b))");
177 w->factory("PROD::model(px,py)");
178 w->factory("Uniform::prior_b(b)");
179
180 // We will control the output level in a few places to avoid
181 // verbose progress messages. We start by keeping track
182 // of the current threshold on messages.
184
185 //-----------------------------------------------
186 // P A R T 3 : A N A L Y T I C R E S U L T
187 // ==============================================
188 // In this special case, the integrals are known analytically
189 // and they are implemented in RooStats::NumberCountingUtils
190
191 // analytic Z_Bi
192 double p_Bi = NumberCountingUtils::BinomialWithTauObsP(150, 100, 1);
193 double Z_Bi = NumberCountingUtils::BinomialWithTauObsZ(150, 100, 1);
194 cout << "-----------------------------------------" << endl;
195 cout << "Part 3" << endl;
196 std::cout << "Z_Bi p-value (analytic): " << p_Bi << std::endl;
197 std::cout << "Z_Bi significance (analytic): " << Z_Bi << std::endl;
198 t.Stop();
199 t.Print();
200 t.Reset();
201 t.Start();
202
203 //--------------------------------------------------------------
204 // P A R T 4 : U S I N G H Y B R I D C A L C U L A T O R
205 // ==============================================================
206 // Now we demonstrate the RooStats HybridCalculator.
207 //
208 // Like all RooStats calculators it needs the data and a ModelConfig
209 // for the relevant hypotheses. Since we are doing hypothesis testing
210 // we need a ModelConfig for the null (background only) and the alternate
211 // (signal+background) hypotheses. We also need to specify the PDF,
212 // the parameters of interest, and the observables. Furthermore, since
213 // the parameter of interest is floating, we need to specify which values
214 // of the parameter corresponds to the null and alternate (eg. s=0 and s=50)
215 //
216 // define some sets of variables obs={x} and poi={s}
217 // note here, x is the only observable in the main measurement
218 // and y is treated as a separate measurement, which is used
219 // to produce the prior that will be used in this calculation
220 // to randomize the nuisance parameters.
221 w->defineSet("obs", "m");
222 w->defineSet("poi", "s");
223
224 // create a toy dataset with the x=150
225 // RooDataSet *data = new RooDataSet("d", "d", *w->set("obs"));
226 // data->add(*w->set("obs"));
227 std::unique_ptr<RooDataSet> data{w->pdf("px")->generate(*w->set("obs"), 150)};
228
229 // Part 3a : Setup ModelConfigs
230 //-------------------------------------------------------
231 // create the null (background-only) ModelConfig with s=0
232 ModelConfig b_model("B_model", w);
233 b_model.SetPdf(*w->pdf("px"));
234 b_model.SetObservables(*w->set("obs"));
235 b_model.SetParametersOfInterest(*w->set("poi"));
236 w->var("s")->setVal(0.0); // important!
237 b_model.SetSnapshot(*w->set("poi"));
238
239 // create the alternate (signal+background) ModelConfig with s=50
240 ModelConfig sb_model("S+B_model", w);
241 sb_model.SetPdf(*w->pdf("px"));
242 sb_model.SetObservables(*w->set("obs"));
243 sb_model.SetParametersOfInterest(*w->set("poi"));
244 w->var("s")->setVal(50.0); // important!
245 sb_model.SetSnapshot(*w->set("poi"));
246
247 // Part 3b : Choose Test Statistic
248 //--------------------------------------------------------------
249 // To make an equivalent calculation we need to use x as the test
250 // statistic. This is not a built-in test statistic in RooStats
251 // so we define it above. The new class inherits from the
252 // RooStats::TestStatistic interface, and simply returns the value
253 // of x in the dataset.
254
255 NumEventsTestStat eventCount(*w->pdf("px"));
256
257 // Part 3c : Define Prior used to randomize nuisance parameters
258 //-------------------------------------------------------------
259 //
260 // The prior used for the hybrid calculator is the posterior
261 // from the auxiliary measurement y. The model for the aux.
262 // measurement is Pois(y|tau*b), thus the likelihood function
263 // is proportional to (has the form of) a Gamma distribution.
264 // if the 'original prior' $\eta(b)$ is uniform, then from
265 // Bayes's theorem we have the posterior:
266 // $\pi(b) = Pois(y|tau*b) * \eta(b)$
267 // If $\eta(b)$ is flat, then we arrive at a Gamma distribution.
268 // Since RooFit will normalize the PDF we can actually supply
269 // py=Pois(y,tau*b) that will be equivalent to multiplying by a uniform.
270 //
271 // Alternatively, we could explicitly use a gamma distribution:
272 //
273 // `w->factory("Gamma::gamma(b,sum::temp(y,1),1,0)");`
274 //
275 // or we can use some other ad hoc prior that do not naturally
276 // follow from the known form of the auxiliary measurement.
277 // The common choice is the equivalent Gaussian:
278 w->factory("Gaussian::gauss_prior(b,y, expr::sqrty('sqrt(y)',y))");
279 // this corresponds to the "Z_N" calculation.
280 //
281 // or one could use the analogous log-normal prior
282 w->factory("Lognormal::lognorm_prior(b,y, expr::kappa('1+1./sqrt(y)',y))");
283 //
284 // Ideally, the HybridCalculator would be able to inspect the full
285 // model Pois(x | s+b) * Pois(y | tau b ) and be given the original
286 // prior $\eta(b)$ to form $\pi(b) = Pois(y|tau*b) * \eta(b)$.
287 // This is not yet implemented because in the general case
288 // it is not easy to identify the terms in the PDF that correspond
289 // to the auxiliary measurement. So for now, it must be set
290 // explicitly with:
291 // - ForcePriorNuisanceNull()
292 // - ForcePriorNuisanceAlt()
293 // the name "ForcePriorNuisance" was chosen because we anticipate
294 // this to be auto-detected, but will leave the option open
295 // to force to a different prior for the nuisance parameters.
296
297 // Part 3d : Construct and configure the HybridCalculator
298 //-------------------------------------------------------
299
300 HybridCalculator hc1(*data, sb_model, b_model);
301 ToyMCSampler *toymcs1 = (ToyMCSampler *)hc1.GetTestStatSampler();
302 // toymcs1->SetNEventsPerToy(1); // because the model is in number counting form
303 toymcs1->SetTestStatistic(&eventCount); // set the test statistic
304 // toymcs1->SetGenerateBinned();
305 hc1.SetToys(ntoys, ntoys / nToysRatio);
306 hc1.ForcePriorNuisanceAlt(*w->pdf("py"));
307 hc1.ForcePriorNuisanceNull(*w->pdf("py"));
308 // if you wanted to use the ad hoc Gaussian prior instead
309 // ~~~
310 // hc1.ForcePriorNuisanceAlt(*w->pdf("gauss_prior"));
311 // hc1.ForcePriorNuisanceNull(*w->pdf("gauss_prior"));
312 // ~~~
313 // if you wanted to use the ad hoc log-normal prior instead
314 // ~~~
315 // hc1.ForcePriorNuisanceAlt(*w->pdf("lognorm_prior"));
316 // hc1.ForcePriorNuisanceNull(*w->pdf("lognorm_prior"));
317 // ~~~
318
319 // these lines save current msg level and then kill any messages below ERROR
321 // Get the result
322 HypoTestResult *r1 = hc1.GetHypoTest();
323 RooMsgService::instance().setGlobalKillBelow(msglevel); // set it back
324 cout << "-----------------------------------------" << endl;
325 cout << "Part 4" << endl;
326 r1->Print();
327 t.Stop();
328 t.Print();
329 t.Reset();
330 t.Start();
331
332 c->cd(2);
333 HypoTestPlot *p1 = new HypoTestPlot(*r1, 30); // 30 bins, TS is discrete
334 p1->Draw();
335
336 return; // keep the running time sort by default
337 //-------------------------------------------------------------------------
338 // # P A R T 5 : U S I N G H Y B R I D C A L C U L A T O R W I T H
339 // # A N A L T E R N A T I V E T E S T S T A T I S T I C
340 //
341 // A likelihood ratio test statistics should be 1-to-1 with the count x
342 // when the value of b is fixed in the likelihood. This is implemented
343 // by the SimpleLikelihoodRatioTestStat
344
345 SimpleLikelihoodRatioTestStat slrts(*b_model.GetPdf(), *sb_model.GetPdf());
346 slrts.SetNullParameters(*b_model.GetSnapshot());
347 slrts.SetAltParameters(*sb_model.GetSnapshot());
348
349 // HYBRID CALCULATOR
350 HybridCalculator hc2(*data, sb_model, b_model);
351 ToyMCSampler *toymcs2 = (ToyMCSampler *)hc2.GetTestStatSampler();
352 // toymcs2->SetNEventsPerToy(1);
353 toymcs2->SetTestStatistic(&slrts);
354 // toymcs2->SetGenerateBinned();
355 hc2.SetToys(ntoys, ntoys / nToysRatio);
356 hc2.ForcePriorNuisanceAlt(*w->pdf("py"));
357 hc2.ForcePriorNuisanceNull(*w->pdf("py"));
358 // if you wanted to use the ad hoc Gaussian prior instead
359 // ~~~
360 // hc2.ForcePriorNuisanceAlt(*w->pdf("gauss_prior"));
361 // hc2.ForcePriorNuisanceNull(*w->pdf("gauss_prior"));
362 // ~~~
363 // if you wanted to use the ad hoc log-normal prior instead
364 // ~~~
365 // hc2.ForcePriorNuisanceAlt(*w->pdf("lognorm_prior"));
366 // hc2.ForcePriorNuisanceNull(*w->pdf("lognorm_prior"));
367 // ~~~
368
369 // these lines save current msg level and then kill any messages below ERROR
371 // Get the result
372 HypoTestResult *r2 = hc2.GetHypoTest();
373 cout << "-----------------------------------------" << endl;
374 cout << "Part 5" << endl;
375 r2->Print();
376 t.Stop();
377 t.Print();
378 t.Reset();
379 t.Start();
381
382 c->cd(3);
383 HypoTestPlot *p2 = new HypoTestPlot(*r2, 30); // 30 bins
384 p2->Draw();
385
386 return; // so standard tutorial runs faster
387
388 //---------------------------------------------
389 // OUTPUT (2.66 GHz Intel Core i7)
390 // ============================================
391
392 // -----------------------------------------
393 // Part 3
394 // Z_Bi p-value (analytic): 0.00094165
395 // Z_Bi significance (analytic): 3.10804
396 // Real time 0:00:00, CP time 0.610
397
398 // Results HybridCalculator_result:
399 // - Null p-value = 0.00103333 +/- 0.000179406
400 // - Significance = 3.08048 sigma
401 // - Number of S+B toys: 1000
402 // - Number of B toys: 30000
403 // - Test statistic evaluated on data: 150
404 // - CL_b: 0.998967 +/- 0.000185496
405 // - CL_s+b: 0.495 +/- 0.0158106
406 // - CL_s: 0.495512 +/- 0.0158272
407 // Real time 0:04:43, CP time 283.780
408
409 //-------------------------------------------------------
410 // Comparison
411 //-------------------------------------------------------
412 // LEPStatToolsForLHC
413 // https://plone4.fnal.gov:4430/P0/phystat/packages/0703002
414 // Uses Gaussian prior
415 // CL_b = 6.218476e-04, Significance = 3.228665 sigma
416 //
417 //-------------------------------------------------------
418 // Comparison
419 //-------------------------------------------------------
420 // Asymptotic
421 // From the value of the profile likelihood ratio (5.0338)
422 // The significance can be estimated using Wilks's theorem
423 // significance = sqrt(2*profileLR) = 3.1729 sigma
424}
#define c(i)
Definition RSha256.hxx:101
double Double_t
Double 8 bytes.
Definition RtypesCore.h:73
#define ClassDef(name, id)
Definition Rtypes.h:344
static RooMsgService & instance()
Return reference to singleton instance.
void setGlobalKillBelow(RooFit::MsgLevel level)
RooFit::MsgLevel globalKillBelow() const
Same purpose as HybridCalculatorOriginal, but different implementation.
This class provides the plots for the result of a study performed with any of the HypoTestCalculatorG...
HypoTestResult is a base class for results from hypothesis tests.
void Print(const Option_t *="") const override
Print out some information about the results Note: use Alt/Null labels for the hypotheses here as the...
NumEventsTestStat is a simple implementation of the TestStatistic interface used for simple number co...
void Draw(Option_t *options=nullptr) override
Draw this plot and all of the elements it contains.
TestStatistic class that returns -log(L[null] / L[alt]) where L is the likelihood.
TestStatistic is an interface class to provide a facility for construction test statistics distributi...
ToyMCSampler is an implementation of the TestStatSampler interface.
virtual void SetTestStatistic(TestStatistic *testStatistic, unsigned int i)
Set the TestStatistic (want the argument to be a function of the data & parameter points.
Persistable container for RooFit projects.
The Canvas class.
Definition TCanvas.h:23
Stopwatch class.
Definition TStopwatch.h:28
void Start(Bool_t reset=kTRUE)
Start the stopwatch.
void Stop()
Stop the stopwatch.
void Reset()
Definition TStopwatch.h:52
void Print(Option_t *option="") const override
Print the real and cpu time passed between the start and stop events.
The namespace RooFit contains mostly switches that change the behaviour of functions of PDFs (or othe...
Definition CodegenImpl.h:72
MsgLevel
Verbosity level for RooMsgService::StreamConfig in RooMsgService.
RooStats::ModelConfig ModelConfig
double BinomialWithTauObsZ(double nObs, double bExp, double tau)
See BinomialWithTauObsP.
double BinomialWithTauObsP(double nObs, double bExp, double tau)
P-value for s=nullptr in a ratio of Poisson means.
Namespace for the RooStats classes.
Definition CodegenImpl.h:66