Logo ROOT   6.16/01
Reference Guide
FourBinInstructional.C
Go to the documentation of this file.
1/// \file
2/// \ingroup tutorial_roostats
3/// \notebook
4/// This example is a generalization of the on/off problem.
5///
6/// This example is a generalization of the on/off problem.
7/// It's a common setup for SUSY searches. Imagine that one has two
8/// variables "x" and "y" (eg. missing ET and SumET), see figure.
9/// The signal region has high values of both of these variables (top right).
10/// One can see low values of "x" or "y" acting as side-bands. If we
11/// just used "y" as a sideband, we would have the on/off problem.
12/// - In the signal region we observe non events and expect s+b events.
13/// - In the region with low values of "y" (bottom right)
14/// we observe noff events and expect tau*b events.
15/// Note the significance of tau. In the background only case:
16///
17/// ~~~{.cpp}
18/// tau ~ <expectation off> / <expectation on>
19/// ~~~
20///
21/// If tau is known, this model is sufficient, but often tau is not known exactly.
22/// So one can use low values of "x" as an additional constraint for tau.
23/// Note that this technique critically depends on the notion that the
24/// joint distribution for "x" and "y" can be factorized.
25/// Generally, these regions have many events, so it the ratio can be
26/// measured very precisely there. So we extend the model to describe the
27/// left two boxes... denoted with "bar".
28/// - In the upper left we observe nonbar events and expect bbar events
29/// - In the bottom left we observe noffbar events and expect tau bbar events
30/// Note again we have:
31///
32/// ~~~{.cpp}
33/// tau ~ <expectation off bar> / <expectation on bar>
34/// ~~~
35///
36/// One can further expand the model to account for the systematic associated
37/// to assuming the distribution of "x" and "y" factorizes (eg. that
38/// tau is the same for off/on and offbar/onbar). This can be done in several
39/// ways, but here we introduce an additional parameter rho, which so that
40/// one set of models will use tau and the other tau*rho. The choice is arbitrary,
41/// but it has consequences on the numerical stability of the algorithms.
42/// The "bar" measurements typically have more events (& smaller relative errors).
43/// If we choose
44///
45/// ~~~{.cpp}
46/// <expectation noffbar> = tau * rho * <expectation noonbar>
47/// ~~~
48///
49/// the product tau*rho will be known very precisely (~1/sqrt(bbar)) and the contour
50/// in those parameters will be narrow and have a non-trivial tau~1/rho shape.
51/// However, if we choose to put rho on the non/noff measurements (where the
52/// product will have an error `~1/sqrt(b))`, the contours will be more amenable
53/// to numerical techniques. Thus, here we choose to define:
54///
55/// ~~~{.cpp}
56/// tau := <expectation off bar> / (<expectation on bar>)
57/// rho := <expectation off> / (<expectation on> * tau)
58///
59/// ^ y
60/// |
61/// |---------------------------+
62/// | | |
63/// | nonbar | non |
64/// | bbar | s+b |
65/// | | |
66/// |---------------+-----------|
67/// | | |
68/// | noffbar | noff |
69/// | tau bbar | tau b rho |
70/// | | |
71/// +-----------------------------> x
72/// ~~~
73///
74/// Left in this way, the problem is under-constrained. However, one may
75/// have some auxiliary measurement (usually based on Monte Carlo) to
76/// constrain rho. Let us call this auxiliary measurement that gives
77/// the nominal value of rho "rhonom". Thus, there is a 'constraint' term in
78/// the full model: P(rhonom | rho). In this case, we consider a Gaussian
79/// constraint with standard deviation sigma.
80///
81/// In the example, the initial values of the parameters are:
82///
83/// ~~~{.cpp}
84/// - s = 40
85/// - b = 100
86/// - tau = 5
87/// - bbar = 1000
88/// - rho = 1
89/// (sigma for rho = 20%)
90/// ~~~
91///
92/// and in the toy dataset:
93///
94/// ~~~{.cpp}
95/// - non = 139
96/// - noff = 528
97/// - nonbar = 993
98/// - noffbar = 4906
99/// - rhonom = 1.27824
100/// ~~~
101///
102/// Note, the covariance matrix of the parameters has large off-diagonal terms.
103/// Clearly s,b are anti-correlated. Similarly, since noffbar >> nonbar, one would
104/// expect bbar,tau to be anti-correlated.
105///
106/// This can be seen below.
107///
108/// ~~~{.cpp}
109/// GLOBAL b bbar rho s tau
110/// b 0.96820 1.000 0.191 -0.942 -0.762 -0.209
111/// bbar 0.91191 0.191 1.000 0.000 -0.146 -0.912
112/// rho 0.96348 -0.942 0.000 1.000 0.718 -0.000
113/// s 0.76250 -0.762 -0.146 0.718 1.000 0.160
114/// tau 0.92084 -0.209 -0.912 -0.000 0.160 1.000
115/// ~~~
116///
117/// Similarly, since tau*rho appears as a product, we expect rho,tau
118/// to be anti-correlated. When the error on rho is significantly
119/// larger than 1/sqrt(bbar), tau is essentially known and the
120/// correlation is minimal (tau mainly cares about bbar, and rho about b,s).
121/// In the alternate parametrization (bbar* tau * rho) the correlation coefficient
122/// for rho,tau is large (and negative).
123///
124/// The code below uses best-practices for RooFit & RooStats as of June 2010.
125///
126/// It proceeds as follows:
127/// - create a workspace to hold the model
128/// - use workspace factory to quickly create the terms of the model
129/// - use workspace factory to define total model (a prod pdf)
130/// - create a RooStats ModelConfig to specify observables, parameters of interest
131/// - add to the ModelConfig a prior on the parameters for Bayesian techniques
132/// note, the pdf it is factorized for parameters of interest & nuisance params
133/// - visualize the model
134/// - write the workspace to a file
135/// - use several of RooStats IntervalCalculators & compare results
136///
137/// \macro_image
138/// \macro_output
139/// \macro_code
140///
141/// \authors authors: Kyle Cranmer, Tanja Rommerskirchen
142
143#include "TStopwatch.h"
144#include "TCanvas.h"
145#include "TROOT.h"
146#include "RooPlot.h"
147#include "RooAbsPdf.h"
148#include "RooWorkspace.h"
149#include "RooDataSet.h"
150#include "RooGlobalFunc.h"
151#include "RooFitResult.h"
152#include "RooRandom.h"
164
165using namespace RooFit;
166using namespace RooStats;
167
168void FourBinInstructional(bool doBayesian=false, bool doFeldmanCousins=false, bool doMCMC=false){
169
170 // let's time this challenging example
171 TStopwatch t;
172 t.Start();
173
174 // set RooFit random seed for reproducible results
176
177 // make model
178 RooWorkspace* wspace = new RooWorkspace("wspace");
179 wspace->factory("Poisson::on(non[0,1000], sum::splusb(s[40,0,100],b[100,0,300]))");
180 wspace->factory("Poisson::off(noff[0,5000], prod::taub(b,tau[5,3,7],rho[1,0,2]))");
181 wspace->factory("Poisson::onbar(nonbar[0,10000], bbar[1000,500,2000])");
182 wspace->factory("Poisson::offbar(noffbar[0,1000000], prod::lambdaoffbar(bbar, tau))");
183 wspace->factory("Gaussian::mcCons(rhonom[1.,0,2], rho, sigma[.2])");
184 wspace->factory("PROD::model(on,off,onbar,offbar,mcCons)");
185 wspace->defineSet("obs","non,noff,nonbar,noffbar,rhonom");
186
187 wspace->factory("Uniform::prior_poi({s})");
188 wspace->factory("Uniform::prior_nuis({b,bbar,tau, rho})");
189 wspace->factory("PROD::prior(prior_poi,prior_nuis)");
190
191 // ----------------------------------
192 // Control some interesting variations
193 // define parameers of interest
194 // for 1-d plots
195 wspace->defineSet("poi","s");
196 wspace->defineSet("nuis","b,tau,rho,bbar");
197 // for 2-d plots to inspect correlations:
198 // wspace->defineSet("poi","s,rho");
199
200 // test simpler cases where parameters are known.
201 // wspace->var("tau")->setConstant();
202 // wspace->var("rho")->setConstant();
203 // wspace->var("b")->setConstant();
204 // wspace->var("bbar")->setConstant();
205
206 // inspect workspace
207 // wspace->Print();
208
209 // ----------------------------------------------------------
210 // Generate toy data
211 // generate toy data assuming current value of the parameters
212 // import into workspace.
213 // add Verbose() to see how it's being generated
214 RooDataSet* data = wspace->pdf("model")->generate(*wspace->set("obs"),1);
215 // data->Print("v");
216 wspace->import(*data);
217
218 // ----------------------------------
219 // Now the statistical tests
220 // model config
221 ModelConfig* modelConfig = new ModelConfig("FourBins");
222 modelConfig->SetWorkspace(*wspace);
223 modelConfig->SetPdf(*wspace->pdf("model"));
224 modelConfig->SetPriorPdf(*wspace->pdf("prior"));
225 modelConfig->SetParametersOfInterest(*wspace->set("poi"));
226 modelConfig->SetNuisanceParameters(*wspace->set("nuis"));
227 wspace->import(*modelConfig);
228 wspace->writeToFile("FourBin.root");
229
230 // -------------------------------------------------
231 // If you want to see the covariance matrix uncomment
232 // wspace->pdf("model")->fitTo(*data);
233
234 // use ProfileLikelihood
235 ProfileLikelihoodCalculator plc(*data, *modelConfig);
236 plc.SetConfidenceLevel(0.95);
237 LikelihoodInterval* plInt = plc.GetInterval();
240 plInt->LowerLimit( *wspace->var("s") ); // get ugly print out of the way. Fix.
242
243 // use FeldmaCousins (takes ~20 min)
244 FeldmanCousins fc(*data, *modelConfig);
245 fc.SetConfidenceLevel(0.95);
246 //number counting: dataset always has 1 entry with N events observed
247 fc.FluctuateNumDataEntries(false);
248 fc.UseAdaptiveSampling(true);
249 fc.SetNBins(40);
250 PointSetInterval* fcInt = NULL;
251 if(doFeldmanCousins){ // takes 7 minutes
252 fcInt = (PointSetInterval*) fc.GetInterval(); // fix cast
253 }
254
255
256 // use BayesianCalculator (only 1-d parameter of interest, slow for this problem)
257 BayesianCalculator bc(*data, *modelConfig);
258 bc.SetConfidenceLevel(0.95);
259 SimpleInterval* bInt = NULL;
260 if(doBayesian && wspace->set("poi")->getSize() == 1) {
261 bInt = bc.GetInterval();
262 } else{
263 cout << "Bayesian Calc. only supports on parameter of interest" << endl;
264 }
265
266
267 // use MCMCCalculator (takes about 1 min)
268 // Want an efficient proposal function, so derive it from covariance
269 // matrix of fit
270 RooFitResult* fit = wspace->pdf("model")->fitTo(*data,Save());
274 ph.SetUpdateProposalParameters(kTRUE); // auto-create mean vars and add mappings
275 ph.SetCacheSize(100);
277
278 MCMCCalculator mc(*data, *modelConfig);
279 mc.SetConfidenceLevel(0.95);
280 mc.SetProposalFunction(*pf);
281 mc.SetNumBurnInSteps(500); // first N steps to be ignored as burn-in
282 mc.SetNumIters(50000);
283 mc.SetLeftSideTailFraction(0.5); // make a central interval
284 MCMCInterval* mcInt = NULL;
285 if(doMCMC)
286 mcInt = mc.GetInterval();
287
288 // ----------------------------------
289 // Make some plots
290 TCanvas* c1 = (TCanvas*) gROOT->Get("c1");
291 if(!c1)
292 c1 = new TCanvas("c1");
293
294 if(doBayesian && doMCMC){
295 c1->Divide(3);
296 c1->cd(1);
297 }
298 else if (doBayesian || doMCMC){
299 c1->Divide(2);
300 c1->cd(1);
301 }
302
304 lrplot->Draw();
305
306 if(doBayesian && wspace->set("poi")->getSize() == 1) {
307 c1->cd(2);
308 // the plot takes a long time and print lots of error
309 // using a scan it is better
310 bc.SetScanOfPosterior(20);
311 RooPlot* bplot = bc.GetPosteriorPlot();
312 bplot->Draw();
313 }
314
315 if(doMCMC){
316 if(doBayesian && wspace->set("poi")->getSize() == 1)
317 c1->cd(3);
318 else
319 c1->cd(2);
320 MCMCIntervalPlot mcPlot(*mcInt);
321 mcPlot.Draw();
322 }
323
324 // ----------------------------------
325 // querry intervals
326 cout << "Profile Likelihood interval on s = [" <<
327 plInt->LowerLimit( *wspace->var("s") ) << ", " <<
328 plInt->UpperLimit( *wspace->var("s") ) << "]" << endl;
329 //Profile Likelihood interval on s = [12.1902, 88.6871]
330
331
332 if(doBayesian && wspace->set("poi")->getSize() == 1) {
333 cout << "Bayesian interval on s = [" <<
334 bInt->LowerLimit( ) << ", " <<
335 bInt->UpperLimit( ) << "]" << endl;
336 }
337
338 if(doFeldmanCousins){
339 cout << "Feldman Cousins interval on s = [" <<
340 fcInt->LowerLimit( *wspace->var("s") ) << ", " <<
341 fcInt->UpperLimit( *wspace->var("s") ) << "]" << endl;
342 //Feldman Cousins interval on s = [18.75 +/- 2.45, 83.75 +/- 2.45]
343 }
344
345 if(doMCMC){
346 cout << "MCMC interval on s = [" <<
347 mcInt->LowerLimit(*wspace->var("s") ) << ", " <<
348 mcInt->UpperLimit(*wspace->var("s") ) << "]" << endl;
349 //MCMC interval on s = [15.7628, 84.7266]
350
351 }
352
353 t.Print();
354}
const Bool_t kTRUE
Definition: RtypesCore.h:87
#define gROOT
Definition: TROOT.h:410
static struct mg_connection * fc(struct mg_context *ctx)
Definition: civetweb.c:3728
Int_t getSize() const
RooDataSet * generate(const RooArgSet &whatVars, Int_t nEvents, const RooCmdArg &arg1, const RooCmdArg &arg2=RooCmdArg::none(), const RooCmdArg &arg3=RooCmdArg::none(), const RooCmdArg &arg4=RooCmdArg::none(), const RooCmdArg &arg5=RooCmdArg::none())
See RooAbsPdf::generate(const RooArgSet&,const RooCmdArg&,const RooCmdArg&,const RooCmdArg&,...
Definition: RooAbsPdf.h:56
virtual RooFitResult * fitTo(RooAbsData &data, const RooCmdArg &arg1=RooCmdArg::none(), const RooCmdArg &arg2=RooCmdArg::none(), const RooCmdArg &arg3=RooCmdArg::none(), const RooCmdArg &arg4=RooCmdArg::none(), const RooCmdArg &arg5=RooCmdArg::none(), const RooCmdArg &arg6=RooCmdArg::none(), const RooCmdArg &arg7=RooCmdArg::none(), const RooCmdArg &arg8=RooCmdArg::none())
Fit PDF to given dataset.
Definition: RooAbsPdf.cxx:1081
RooArgSet is a container object that can hold multiple RooAbsArg objects.
Definition: RooArgSet.h:28
RooDataSet is a container class to hold unbinned data.
Definition: RooDataSet.h:31
RooFitResult is a container class to hold the input and output of a PDF fit to a dataset.
Definition: RooFitResult.h:40
const TMatrixDSym & covarianceMatrix() const
Return covariance matrix.
const RooArgList & floatParsFinal() const
Definition: RooFitResult.h:110
static RooMsgService & instance()
Return reference to singleton instance.
void setGlobalKillBelow(RooFit::MsgLevel level)
RooFit::MsgLevel globalKillBelow() const
A RooPlot is a plot frame and a container for graphics objects within that frame.
Definition: RooPlot.h:41
virtual void Draw(Option_t *options=0)
Draw this plot and all of the elements it contains.
Definition: RooPlot.cxx:558
static TRandom * randomGenerator()
Return a pointer to a singleton random-number generator implementation.
Definition: RooRandom.cxx:54
BayesianCalculator is a concrete implementation of IntervalCalculator, providing the computation of a...
The FeldmanCousins class (like the Feldman-Cousins technique) is essentially a specific configuration...
This class provides simple and straightforward utilities to plot a LikelihoodInterval object.
void Draw(const Option_t *options=0)
draw the likelihood interval or contour for the 1D case a RooPlot is drawn by default of the profiled...
LikelihoodInterval is a concrete implementation of the RooStats::ConfInterval interface.
Double_t LowerLimit(const RooRealVar &param)
return the lower bound of the interval on a given parameter
Double_t UpperLimit(const RooRealVar &param)
return the upper bound of the interval on a given parameter
Bayesian Calculator estimating an interval or a credible region using the Markov-Chain Monte Carlo me...
This class provides simple and straightforward utilities to plot a MCMCInterval object.
MCMCInterval is a concrete implementation of the RooStats::ConfInterval interface.
Definition: MCMCInterval.h:30
virtual Double_t UpperLimit(RooRealVar &param)
get the highest value of param that is within the confidence interval
virtual Double_t LowerLimit(RooRealVar &param)
get the lowest value of param that is within the confidence interval
ModelConfig is a simple class that holds configuration information specifying how a model should be u...
Definition: ModelConfig.h:30
virtual void SetPriorPdf(const RooAbsPdf &pdf)
Set the Prior Pdf, add to the the workspace if not already there.
Definition: ModelConfig.h:87
virtual void SetWorkspace(RooWorkspace &ws)
Definition: ModelConfig.h:66
virtual void SetParametersOfInterest(const RooArgSet &set)
Definition: ModelConfig.h:99
virtual void SetNuisanceParameters(const RooArgSet &set)
specify the nuisance parameters (e.g. the rest of the parameters)
Definition: ModelConfig.h:114
virtual void SetPdf(const RooAbsPdf &pdf)
Set the Pdf, add to the the workspace if not already there.
Definition: ModelConfig.h:81
PointSetInterval is a concrete implementation of the ConfInterval interface.
Double_t UpperLimit(RooRealVar &param)
return upper limit on a given parameter
Double_t LowerLimit(RooRealVar &param)
return lower limit on a given parameter
ProfileLikelihoodCalculator is a concrete implementation of CombinedCalculator (the interface class f...
ProposalFunction is an interface for all proposal functions that would be used with a Markov Chain Mo...
virtual void SetCovMatrix(const TMatrixDSym &covMatrix)
virtual void SetUpdateProposalParameters(Bool_t updateParams)
virtual ProposalFunction * GetProposalFunction()
virtual void SetVariables(RooArgList &vars)
virtual void SetCacheSize(Int_t size)
SimpleInterval is a concrete implementation of the ConfInterval interface.
virtual Double_t UpperLimit()
virtual Double_t LowerLimit()
The RooWorkspace is a persistable container for RooFit projects.
Definition: RooWorkspace.h:43
Bool_t writeToFile(const char *fileName, Bool_t recreate=kTRUE)
Save this current workspace into given file.
Bool_t defineSet(const char *name, const RooArgSet &aset, Bool_t importMissing=kFALSE)
Define a named RooArgSet with given constituents.
RooRealVar * var(const char *name) const
Retrieve real-valued variable (RooRealVar) with given name. A null pointer is returned if not found.
Bool_t import(const RooAbsArg &arg, const RooCmdArg &arg1=RooCmdArg(), const RooCmdArg &arg2=RooCmdArg(), const RooCmdArg &arg3=RooCmdArg(), const RooCmdArg &arg4=RooCmdArg(), const RooCmdArg &arg5=RooCmdArg(), const RooCmdArg &arg6=RooCmdArg(), const RooCmdArg &arg7=RooCmdArg(), const RooCmdArg &arg8=RooCmdArg(), const RooCmdArg &arg9=RooCmdArg())
Import a RooAbsArg object, e.g.
RooFactoryWSTool & factory()
Return instance to factory tool.
const RooArgSet * set(const char *name)
Return pointer to previously defined named set with given nmame If no such set is found a null pointe...
RooAbsPdf * pdf(const char *name) const
Retrieve p.d.f (RooAbsPdf) with given name. A null pointer is returned if not found.
The Canvas class.
Definition: TCanvas.h:31
virtual void SetSeed(ULong_t seed=0)
Set the random generator seed.
Definition: TRandom.cxx:589
Stopwatch class.
Definition: TStopwatch.h:28
void Start(Bool_t reset=kTRUE)
Start the stopwatch.
Definition: TStopwatch.cxx:58
void Print(Option_t *option="") const
Print the real and cpu time passed between the start and stop events.
Definition: TStopwatch.cxx:219
return c1
Definition: legend1.C:41
RooCmdArg Save(Bool_t flag=kTRUE)
@(#)root/roostats:$Id$ Author: George Lewis, Kyle Cranmer
Definition: Asimov.h:20