Logo ROOT   6.18/05
Reference Guide
FourBinInstructional.C
Go to the documentation of this file.
1/// \file
2/// \ingroup tutorial_roostats
3/// \notebook
4/// This example is a generalization of the on/off problem.
5///
6/// This example is a generalization of the on/off problem.
7/// It's a common setup for SUSY searches. Imagine that one has two
8/// variables "x" and "y" (eg. missing ET and SumET), see figure.
9/// The signal region has high values of both of these variables (top right).
10/// One can see low values of "x" or "y" acting as side-bands. If we
11/// just used "y" as a sideband, we would have the on/off problem.
12/// - In the signal region we observe non events and expect s+b events.
13/// - In the region with low values of "y" (bottom right)
14/// we observe noff events and expect tau*b events.
15/// Note the significance of tau. In the background only case:
16///
17/// ~~~{.cpp}
18/// tau ~ <expectation off> / <expectation on>
19/// ~~~
20///
21/// If tau is known, this model is sufficient, but often tau is not known exactly.
22/// So one can use low values of "x" as an additional constraint for tau.
23/// Note that this technique critically depends on the notion that the
24/// joint distribution for "x" and "y" can be factorized.
25/// Generally, these regions have many events, so it the ratio can be
26/// measured very precisely there. So we extend the model to describe the
27/// left two boxes... denoted with "bar".
28/// - In the upper left we observe nonbar events and expect bbar events
29/// - In the bottom left we observe noffbar events and expect tau bbar events
30/// Note again we have:
31///
32/// ~~~{.cpp}
33/// tau ~ <expectation off bar> / <expectation on bar>
34/// ~~~
35///
36/// One can further expand the model to account for the systematic associated
37/// to assuming the distribution of "x" and "y" factorizes (eg. that
38/// tau is the same for off/on and offbar/onbar). This can be done in several
39/// ways, but here we introduce an additional parameter rho, which so that
40/// one set of models will use tau and the other tau*rho. The choice is arbitrary,
41/// but it has consequences on the numerical stability of the algorithms.
42/// The "bar" measurements typically have more events (& smaller relative errors).
43/// If we choose
44///
45/// ~~~{.cpp}
46/// <expectation noffbar> = tau * rho * <expectation noonbar>
47/// ~~~
48///
49/// the product tau*rho will be known very precisely (~1/sqrt(bbar)) and the contour
50/// in those parameters will be narrow and have a non-trivial tau~1/rho shape.
51/// However, if we choose to put rho on the non/noff measurements (where the
52/// product will have an error `~1/sqrt(b))`, the contours will be more amenable
53/// to numerical techniques. Thus, here we choose to define:
54///
55/// ~~~{.cpp}
56/// tau := <expectation off bar> / (<expectation on bar>)
57/// rho := <expectation off> / (<expectation on> * tau)
58///
59/// ^ y
60/// |
61/// |---------------------------+
62/// | | |
63/// | nonbar | non |
64/// | bbar | s+b |
65/// | | |
66/// |---------------+-----------|
67/// | | |
68/// | noffbar | noff |
69/// | tau bbar | tau b rho |
70/// | | |
71/// +-----------------------------> x
72/// ~~~
73///
74/// Left in this way, the problem is under-constrained. However, one may
75/// have some auxiliary measurement (usually based on Monte Carlo) to
76/// constrain rho. Let us call this auxiliary measurement that gives
77/// the nominal value of rho "rhonom". Thus, there is a 'constraint' term in
78/// the full model: P(rhonom | rho). In this case, we consider a Gaussian
79/// constraint with standard deviation sigma.
80///
81/// In the example, the initial values of the parameters are:
82///
83/// ~~~{.cpp}
84/// - s = 40
85/// - b = 100
86/// - tau = 5
87/// - bbar = 1000
88/// - rho = 1
89/// (sigma for rho = 20%)
90/// ~~~
91///
92/// and in the toy dataset:
93///
94/// ~~~{.cpp}
95/// - non = 139
96/// - noff = 528
97/// - nonbar = 993
98/// - noffbar = 4906
99/// - rhonom = 1.27824
100/// ~~~
101///
102/// Note, the covariance matrix of the parameters has large off-diagonal terms.
103/// Clearly s,b are anti-correlated. Similarly, since noffbar >> nonbar, one would
104/// expect bbar,tau to be anti-correlated.
105///
106/// This can be seen below.
107///
108/// ~~~{.cpp}
109/// GLOBAL b bbar rho s tau
110/// b 0.96820 1.000 0.191 -0.942 -0.762 -0.209
111/// bbar 0.91191 0.191 1.000 0.000 -0.146 -0.912
112/// rho 0.96348 -0.942 0.000 1.000 0.718 -0.000
113/// s 0.76250 -0.762 -0.146 0.718 1.000 0.160
114/// tau 0.92084 -0.209 -0.912 -0.000 0.160 1.000
115/// ~~~
116///
117/// Similarly, since tau*rho appears as a product, we expect rho,tau
118/// to be anti-correlated. When the error on rho is significantly
119/// larger than 1/sqrt(bbar), tau is essentially known and the
120/// correlation is minimal (tau mainly cares about bbar, and rho about b,s).
121/// In the alternate parametrization (bbar* tau * rho) the correlation coefficient
122/// for rho,tau is large (and negative).
123///
124/// The code below uses best-practices for RooFit & RooStats as of June 2010.
125///
126/// It proceeds as follows:
127/// - create a workspace to hold the model
128/// - use workspace factory to quickly create the terms of the model
129/// - use workspace factory to define total model (a prod pdf)
130/// - create a RooStats ModelConfig to specify observables, parameters of interest
131/// - add to the ModelConfig a prior on the parameters for Bayesian techniques
132/// note, the pdf it is factorized for parameters of interest & nuisance params
133/// - visualize the model
134/// - write the workspace to a file
135/// - use several of RooStats IntervalCalculators & compare results
136///
137/// \macro_image
138/// \macro_output
139/// \macro_code
140///
141/// \authors authors: Kyle Cranmer, Tanja Rommerskirchen
142
143#include "TStopwatch.h"
144#include "TCanvas.h"
145#include "TROOT.h"
146#include "RooPlot.h"
147#include "RooAbsPdf.h"
148#include "RooWorkspace.h"
149#include "RooDataSet.h"
150#include "RooGlobalFunc.h"
151#include "RooFitResult.h"
152#include "RooRandom.h"
164
165using namespace RooFit;
166using namespace RooStats;
167
168void FourBinInstructional(bool doBayesian = false, bool doFeldmanCousins = false, bool doMCMC = false)
169{
170
171 // let's time this challenging example
172 TStopwatch t;
173 t.Start();
174
175 // set RooFit random seed for reproducible results
177
178 // make model
179 RooWorkspace *wspace = new RooWorkspace("wspace");
180 wspace->factory("Poisson::on(non[0,1000], sum::splusb(s[40,0,100],b[100,0,300]))");
181 wspace->factory("Poisson::off(noff[0,5000], prod::taub(b,tau[5,3,7],rho[1,0,2]))");
182 wspace->factory("Poisson::onbar(nonbar[0,10000], bbar[1000,500,2000])");
183 wspace->factory("Poisson::offbar(noffbar[0,1000000], prod::lambdaoffbar(bbar, tau))");
184 wspace->factory("Gaussian::mcCons(rhonom[1.,0,2], rho, sigma[.2])");
185 wspace->factory("PROD::model(on,off,onbar,offbar,mcCons)");
186 wspace->defineSet("obs", "non,noff,nonbar,noffbar,rhonom");
187
188 wspace->factory("Uniform::prior_poi({s})");
189 wspace->factory("Uniform::prior_nuis({b,bbar,tau, rho})");
190 wspace->factory("PROD::prior(prior_poi,prior_nuis)");
191
192 // ----------------------------------
193 // Control some interesting variations
194 // define parameers of interest
195 // for 1-d plots
196 wspace->defineSet("poi", "s");
197 wspace->defineSet("nuis", "b,tau,rho,bbar");
198 // for 2-d plots to inspect correlations:
199 // wspace->defineSet("poi","s,rho");
200
201 // test simpler cases where parameters are known.
202 // wspace->var("tau")->setConstant();
203 // wspace->var("rho")->setConstant();
204 // wspace->var("b")->setConstant();
205 // wspace->var("bbar")->setConstant();
206
207 // inspect workspace
208 // wspace->Print();
209
210 // ----------------------------------------------------------
211 // Generate toy data
212 // generate toy data assuming current value of the parameters
213 // import into workspace.
214 // add Verbose() to see how it's being generated
215 RooDataSet *data = wspace->pdf("model")->generate(*wspace->set("obs"), 1);
216 // data->Print("v");
217 wspace->import(*data);
218
219 // ----------------------------------
220 // Now the statistical tests
221 // model config
222 ModelConfig *modelConfig = new ModelConfig("FourBins");
223 modelConfig->SetWorkspace(*wspace);
224 modelConfig->SetPdf(*wspace->pdf("model"));
225 modelConfig->SetPriorPdf(*wspace->pdf("prior"));
226 modelConfig->SetParametersOfInterest(*wspace->set("poi"));
227 modelConfig->SetNuisanceParameters(*wspace->set("nuis"));
228 wspace->import(*modelConfig);
229 wspace->writeToFile("FourBin.root");
230
231 // -------------------------------------------------
232 // If you want to see the covariance matrix uncomment
233 // wspace->pdf("model")->fitTo(*data);
234
235 // use ProfileLikelihood
236 ProfileLikelihoodCalculator plc(*data, *modelConfig);
237 plc.SetConfidenceLevel(0.95);
238 LikelihoodInterval *plInt = plc.GetInterval();
241 plInt->LowerLimit(*wspace->var("s")); // get ugly print out of the way. Fix.
243
244 // use FeldmaCousins (takes ~20 min)
245 FeldmanCousins fc(*data, *modelConfig);
246 fc.SetConfidenceLevel(0.95);
247 // number counting: dataset always has 1 entry with N events observed
248 fc.FluctuateNumDataEntries(false);
249 fc.UseAdaptiveSampling(true);
250 fc.SetNBins(40);
251 PointSetInterval *fcInt = NULL;
252 if (doFeldmanCousins) { // takes 7 minutes
253 fcInt = (PointSetInterval *)fc.GetInterval(); // fix cast
254 }
255
256 // use BayesianCalculator (only 1-d parameter of interest, slow for this problem)
257 BayesianCalculator bc(*data, *modelConfig);
258 bc.SetConfidenceLevel(0.95);
259 SimpleInterval *bInt = NULL;
260 if (doBayesian && wspace->set("poi")->getSize() == 1) {
261 bInt = bc.GetInterval();
262 } else {
263 cout << "Bayesian Calc. only supports on parameter of interest" << endl;
264 }
265
266 // use MCMCCalculator (takes about 1 min)
267 // Want an efficient proposal function, so derive it from covariance
268 // matrix of fit
269 RooFitResult *fit = wspace->pdf("model")->fitTo(*data, Save());
273 ph.SetUpdateProposalParameters(kTRUE); // auto-create mean vars and add mappings
274 ph.SetCacheSize(100);
276
277 MCMCCalculator mc(*data, *modelConfig);
278 mc.SetConfidenceLevel(0.95);
279 mc.SetProposalFunction(*pf);
280 mc.SetNumBurnInSteps(500); // first N steps to be ignored as burn-in
281 mc.SetNumIters(50000);
282 mc.SetLeftSideTailFraction(0.5); // make a central interval
283 MCMCInterval *mcInt = NULL;
284 if (doMCMC)
285 mcInt = mc.GetInterval();
286
287 // ----------------------------------
288 // Make some plots
289 TCanvas *c1 = (TCanvas *)gROOT->Get("c1");
290 if (!c1)
291 c1 = new TCanvas("c1");
292
293 if (doBayesian && doMCMC) {
294 c1->Divide(3);
295 c1->cd(1);
296 } else if (doBayesian || doMCMC) {
297 c1->Divide(2);
298 c1->cd(1);
299 }
300
302 lrplot->Draw();
303
304 if (doBayesian && wspace->set("poi")->getSize() == 1) {
305 c1->cd(2);
306 // the plot takes a long time and print lots of error
307 // using a scan it is better
308 bc.SetScanOfPosterior(20);
309 RooPlot *bplot = bc.GetPosteriorPlot();
310 bplot->Draw();
311 }
312
313 if (doMCMC) {
314 if (doBayesian && wspace->set("poi")->getSize() == 1)
315 c1->cd(3);
316 else
317 c1->cd(2);
318 MCMCIntervalPlot mcPlot(*mcInt);
319 mcPlot.Draw();
320 }
321
322 // ----------------------------------
323 // querry intervals
324 cout << "Profile Likelihood interval on s = [" << plInt->LowerLimit(*wspace->var("s")) << ", "
325 << plInt->UpperLimit(*wspace->var("s")) << "]" << endl;
326 // Profile Likelihood interval on s = [12.1902, 88.6871]
327
328 if (doBayesian && wspace->set("poi")->getSize() == 1) {
329 cout << "Bayesian interval on s = [" << bInt->LowerLimit() << ", " << bInt->UpperLimit() << "]" << endl;
330 }
331
332 if (doFeldmanCousins) {
333 cout << "Feldman Cousins interval on s = [" << fcInt->LowerLimit(*wspace->var("s")) << ", "
334 << fcInt->UpperLimit(*wspace->var("s")) << "]" << endl;
335 // Feldman Cousins interval on s = [18.75 +/- 2.45, 83.75 +/- 2.45]
336 }
337
338 if (doMCMC) {
339 cout << "MCMC interval on s = [" << mcInt->LowerLimit(*wspace->var("s")) << ", "
340 << mcInt->UpperLimit(*wspace->var("s")) << "]" << endl;
341 // MCMC interval on s = [15.7628, 84.7266]
342 }
343
344 t.Print();
345}
const Bool_t kTRUE
Definition: RtypesCore.h:87
#define gROOT
Definition: TROOT.h:414
static struct mg_connection * fc(struct mg_context *ctx)
Definition: civetweb.c:3728
Int_t getSize() const
RooDataSet * generate(const RooArgSet &whatVars, Int_t nEvents, const RooCmdArg &arg1, const RooCmdArg &arg2=RooCmdArg::none(), const RooCmdArg &arg3=RooCmdArg::none(), const RooCmdArg &arg4=RooCmdArg::none(), const RooCmdArg &arg5=RooCmdArg::none())
See RooAbsPdf::generate(const RooArgSet&,const RooCmdArg&,const RooCmdArg&,const RooCmdArg&,...
Definition: RooAbsPdf.h:56
virtual RooFitResult * fitTo(RooAbsData &data, const RooCmdArg &arg1=RooCmdArg::none(), const RooCmdArg &arg2=RooCmdArg::none(), const RooCmdArg &arg3=RooCmdArg::none(), const RooCmdArg &arg4=RooCmdArg::none(), const RooCmdArg &arg5=RooCmdArg::none(), const RooCmdArg &arg6=RooCmdArg::none(), const RooCmdArg &arg7=RooCmdArg::none(), const RooCmdArg &arg8=RooCmdArg::none())
Fit PDF to given dataset.
Definition: RooAbsPdf.cxx:1119
RooArgSet is a container object that can hold multiple RooAbsArg objects.
Definition: RooArgSet.h:28
RooDataSet is a container class to hold unbinned data.
Definition: RooDataSet.h:31
RooFitResult is a container class to hold the input and output of a PDF fit to a dataset.
Definition: RooFitResult.h:40
const TMatrixDSym & covarianceMatrix() const
Return covariance matrix.
const RooArgList & floatParsFinal() const
Definition: RooFitResult.h:110
static RooMsgService & instance()
Return reference to singleton instance.
void setGlobalKillBelow(RooFit::MsgLevel level)
RooFit::MsgLevel globalKillBelow() const
A RooPlot is a plot frame and a container for graphics objects within that frame.
Definition: RooPlot.h:41
virtual void Draw(Option_t *options=0)
Draw this plot and all of the elements it contains.
Definition: RooPlot.cxx:558
static TRandom * randomGenerator()
Return a pointer to a singleton random-number generator implementation.
Definition: RooRandom.cxx:54
BayesianCalculator is a concrete implementation of IntervalCalculator, providing the computation of a...
The FeldmanCousins class (like the Feldman-Cousins technique) is essentially a specific configuration...
This class provides simple and straightforward utilities to plot a LikelihoodInterval object.
void Draw(const Option_t *options=0)
draw the likelihood interval or contour for the 1D case a RooPlot is drawn by default of the profiled...
LikelihoodInterval is a concrete implementation of the RooStats::ConfInterval interface.
Double_t LowerLimit(const RooRealVar &param)
return the lower bound of the interval on a given parameter
Double_t UpperLimit(const RooRealVar &param)
return the upper bound of the interval on a given parameter
Bayesian Calculator estimating an interval or a credible region using the Markov-Chain Monte Carlo me...
This class provides simple and straightforward utilities to plot a MCMCInterval object.
MCMCInterval is a concrete implementation of the RooStats::ConfInterval interface.
Definition: MCMCInterval.h:30
virtual Double_t UpperLimit(RooRealVar &param)
get the highest value of param that is within the confidence interval
virtual Double_t LowerLimit(RooRealVar &param)
get the lowest value of param that is within the confidence interval
ModelConfig is a simple class that holds configuration information specifying how a model should be u...
Definition: ModelConfig.h:30
virtual void SetPriorPdf(const RooAbsPdf &pdf)
Set the Prior Pdf, add to the the workspace if not already there.
Definition: ModelConfig.h:87
virtual void SetWorkspace(RooWorkspace &ws)
Definition: ModelConfig.h:66
virtual void SetParametersOfInterest(const RooArgSet &set)
Definition: ModelConfig.h:99
virtual void SetNuisanceParameters(const RooArgSet &set)
specify the nuisance parameters (e.g. the rest of the parameters)
Definition: ModelConfig.h:114
virtual void SetPdf(const RooAbsPdf &pdf)
Set the Pdf, add to the the workspace if not already there.
Definition: ModelConfig.h:81
PointSetInterval is a concrete implementation of the ConfInterval interface.
Double_t UpperLimit(RooRealVar &param)
return upper limit on a given parameter
Double_t LowerLimit(RooRealVar &param)
return lower limit on a given parameter
The ProfileLikelihoodCalculator is a concrete implementation of CombinedCalculator (the interface cla...
ProposalFunction is an interface for all proposal functions that would be used with a Markov Chain Mo...
virtual void SetCovMatrix(const TMatrixDSym &covMatrix)
virtual void SetUpdateProposalParameters(Bool_t updateParams)
virtual ProposalFunction * GetProposalFunction()
virtual void SetVariables(RooArgList &vars)
virtual void SetCacheSize(Int_t size)
SimpleInterval is a concrete implementation of the ConfInterval interface.
virtual Double_t UpperLimit()
virtual Double_t LowerLimit()
The RooWorkspace is a persistable container for RooFit projects.
Definition: RooWorkspace.h:43
Bool_t writeToFile(const char *fileName, Bool_t recreate=kTRUE)
Save this current workspace into given file.
Bool_t defineSet(const char *name, const RooArgSet &aset, Bool_t importMissing=kFALSE)
Define a named RooArgSet with given constituents.
RooRealVar * var(const char *name) const
Retrieve real-valued variable (RooRealVar) with given name. A null pointer is returned if not found.
Bool_t import(const RooAbsArg &arg, const RooCmdArg &arg1=RooCmdArg(), const RooCmdArg &arg2=RooCmdArg(), const RooCmdArg &arg3=RooCmdArg(), const RooCmdArg &arg4=RooCmdArg(), const RooCmdArg &arg5=RooCmdArg(), const RooCmdArg &arg6=RooCmdArg(), const RooCmdArg &arg7=RooCmdArg(), const RooCmdArg &arg8=RooCmdArg(), const RooCmdArg &arg9=RooCmdArg())
Import a RooAbsArg object, e.g.
RooFactoryWSTool & factory()
Return instance to factory tool.
const RooArgSet * set(const char *name)
Return pointer to previously defined named set with given nmame If no such set is found a null pointe...
RooAbsPdf * pdf(const char *name) const
Retrieve p.d.f (RooAbsPdf) with given name. A null pointer is returned if not found.
The Canvas class.
Definition: TCanvas.h:31
virtual void SetSeed(ULong_t seed=0)
Set the random generator seed.
Definition: TRandom.cxx:597
Stopwatch class.
Definition: TStopwatch.h:28
void Start(Bool_t reset=kTRUE)
Start the stopwatch.
Definition: TStopwatch.cxx:58
void Print(Option_t *option="") const
Print the real and cpu time passed between the start and stop events.
Definition: TStopwatch.cxx:219
return c1
Definition: legend1.C:41
Template specialisation used in RooAbsArg:
RooCmdArg Save(Bool_t flag=kTRUE)
Namespace for the RooStats classes.
Definition: Asimov.h:20