Logo ROOT   6.16/01
Reference Guide
FourBinInstructional.C File Reference

Detailed Description

View in nbviewer Open in SWAN This example is a generalization of the on/off problem.

This example is a generalization of the on/off problem. It's a common setup for SUSY searches. Imagine that one has two variables "x" and "y" (eg. missing ET and SumET), see figure. The signal region has high values of both of these variables (top right). One can see low values of "x" or "y" acting as side-bands. If we just used "y" as a sideband, we would have the on/off problem.

tau ~ <expectation off> / <expectation on>

If tau is known, this model is sufficient, but often tau is not known exactly. So one can use low values of "x" as an additional constraint for tau. Note that this technique critically depends on the notion that the joint distribution for "x" and "y" can be factorized. Generally, these regions have many events, so it the ratio can be measured very precisely there. So we extend the model to describe the left two boxes... denoted with "bar".

tau ~ <expectation off bar> / <expectation on bar>
static constexpr double bar

One can further expand the model to account for the systematic associated to assuming the distribution of "x" and "y" factorizes (eg. that tau is the same for off/on and offbar/onbar). This can be done in several ways, but here we introduce an additional parameter rho, which so that one set of models will use tau and the other tau*rho. The choice is arbitrary, but it has consequences on the numerical stability of the algorithms. The "bar" measurements typically have more events (& smaller relative errors). If we choose

<expectation noffbar> = tau * rho * <expectation noonbar>

the product tau*rho will be known very precisely (~1/sqrt(bbar)) and the contour in those parameters will be narrow and have a non-trivial tau~1/rho shape. However, if we choose to put rho on the non/noff measurements (where the product will have an error ~1/sqrt(b)), the contours will be more amenable to numerical techniques. Thus, here we choose to define:

tau := <expectation off bar> / (<expectation on bar>)
rho := <expectation off> / (<expectation on> * tau)
^ y
|
|---------------------------+
| | |
| nonbar | non |
| bbar | s+b |
| | |
|---------------+-----------|
| | |
| noffbar | noff |
| tau bbar | tau b rho |
| | |
+-----------------------------> x
#define b(i)
Definition: RSha256.hxx:100
Double_t y[n]
Definition: legend1.C:17
Double_t x[n]
Definition: legend1.C:17
static constexpr double s

Left in this way, the problem is under-constrained. However, one may have some auxiliary measurement (usually based on Monte Carlo) to constrain rho. Let us call this auxiliary measurement that gives the nominal value of rho "rhonom". Thus, there is a 'constraint' term in the full model: P(rhonom | rho). In this case, we consider a Gaussian constraint with standard deviation sigma.

In the example, the initial values of the parameters are:

- s = 40
- b = 100
- tau = 5
- bbar = 1000
- rho = 1
(sigma for rho = 20%)
const Double_t sigma

and in the toy dataset:

- non = 139
- noff = 528
- nonbar = 993
- noffbar = 4906
- rhonom = 1.27824

Note, the covariance matrix of the parameters has large off-diagonal terms. Clearly s,b are anti-correlated. Similarly, since noffbar >> nonbar, one would expect bbar,tau to be anti-correlated.

This can be seen below.

GLOBAL b bbar rho s tau
b 0.96820 1.000 0.191 -0.942 -0.762 -0.209
bbar 0.91191 0.191 1.000 0.000 -0.146 -0.912
rho 0.96348 -0.942 0.000 1.000 0.718 -0.000
s 0.76250 -0.762 -0.146 0.718 1.000 0.160
tau 0.92084 -0.209 -0.912 -0.000 0.160 1.000

Similarly, since tau*rho appears as a product, we expect rho,tau to be anti-correlated. When the error on rho is significantly larger than 1/sqrt(bbar), tau is essentially known and the correlation is minimal (tau mainly cares about bbar, and rho about b,s). In the alternate parametrization (bbar* tau * rho) the correlation coefficient for rho,tau is large (and negative).

The code below uses best-practices for RooFit & RooStats as of June 2010.

It proceeds as follows:

␛[1mRooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby␛[0m
Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
All rights reserved, please read http://roofit.sourceforge.net/license.txt
[#1] INFO:ObjectHandling -- RooWorkspace::import(wspace) importing dataset modelData
[#1] INFO:Minization -- createNLL: caching constraint set under name CONSTR_OF_PDF_model_FOR_OBS_noff:noffbar:non:nonbar:rhonom with 0 entries
[#0] PROGRESS:Minization -- ProfileLikelihoodCalcultor::DoGLobalFit - find MLE
[#0] PROGRESS:Minization -- ProfileLikelihoodCalcultor::DoMinimizeNLL - using Minuit / Migrad with strategy 1
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Minization -- The following expressions will be evaluated in cache-and-track mode: (on,off,onbar,offbar,mcCons)
[#1] INFO:Minization --
RooFitResult: minimized FCN value: 16.2872, estimated distance to minimum: 1.21263e-07
covariance matrix quality: Full, accurate covariance matrix
Status : MINIMIZE=0
Floating Parameter FinalValue +/- Error
-------------------- --------------------------
b 8.3602e+01 +/- 1.39e+01
bbar 9.9301e+02 +/- 3.15e+01
rho 1.2783e+00 +/- 1.99e-01
s 5.5397e+01 +/- 1.78e+01
tau 4.9405e+00 +/- 1.72e-01
Bayesian Calc. only supports on parameter of interest
[#1] INFO:Minization -- createNLL picked up cached consraints from workspace with 0 entries
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Minization -- The following expressions will be evaluated in cache-and-track mode: (on,off,onbar,offbar,mcCons)
**********
** 1 **SET PRINT 1
**********
**********
** 2 **SET NOGRAD
**********
PARAMETER DEFINITIONS:
NO. NAME VALUE STEP SIZE LIMITS
1 b 1.17958e+02 1.38657e+01 0.00000e+00 3.00000e+02
2 bbar 1.00111e+03 3.15028e+01 5.00000e+02 2.00000e+03
3 rho 9.28979e-01 1.98664e-01 0.00000e+00 2.00000e+00
4 s 1.20959e+01 1.78108e+01 0.00000e+00 1.00000e+02
MINUIT WARNING IN PARAMETR
============== VARIABLE4 BROUGHT BACK INSIDE LIMITS.
5 tau 4.89226e+00 1.71714e-01 3.00000e+00 7.00000e+00
**********
** 3 **SET ERR 0.5
**********
**********
** 4 **SET PRINT 1
**********
**********
** 5 **SET STR 1
**********
NOW USING STRATEGY 1: TRY TO BALANCE SPEED AGAINST RELIABILITY
**********
** 6 **MIGRAD 2500 1
**********
FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
START MIGRAD MINIMIZATION. STRATEGY 1. CONVERGENCE WHEN EDM .LT. 1.00e-03
FCN=18.2144 FROM MIGRAD STATUS=INITIATE 20 CALLS 21 TOTAL
EDM= unknown STRATEGY= 1 NO ERROR MATRIX
EXT PARAMETER CURRENT GUESS STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 b 1.17958e+02 1.38657e+01 9.47846e-02 -1.99668e-02
2 bbar 1.00111e+03 3.15028e+01 4.45472e-02 -1.41126e-01
3 rho 9.28979e-01 1.98664e-01 2.00529e-01 -1.44934e-02
4 s 1.20959e+01 1.78108e+01 5.77645e-01 -2.24297e+00
5 tau 4.89226e+00 1.71714e-01 8.60889e-02 -8.62116e-02
ERR DEF= 0.5
MIGRAD MINIMIZATION HAS CONVERGED.
MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=16.2872 FROM MIGRAD STATUS=CONVERGED 190 CALLS 191 TOTAL
EDM=3.60952e-06 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 b 8.35905e+01 1.38674e+01 7.30675e-05 2.40054e-02
2 bbar 9.93029e+02 3.15029e+01 5.19045e-05 -3.02691e-02
3 rho 1.27859e+00 1.98756e-01 1.58308e-04 1.62267e-02
4 s 5.54105e+01 1.78121e+01 6.70504e-04 3.43694e-04
5 tau 4.94037e+00 1.71705e-01 9.49080e-05 -2.33170e-02
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 5 ERR DEF=0.5
1.930e+02 8.359e+01 -2.620e+00 -1.929e+02 -5.000e-01
8.359e+01 9.931e+02 1.215e-04 -8.356e+01 -4.941e+00
-2.620e+00 1.215e-04 4.008e-02 2.619e+00 -7.270e-07
-1.929e+02 -8.356e+01 2.619e+00 3.319e+02 4.999e-01
-5.000e-01 -4.941e+00 -7.270e-07 4.999e-01 2.956e-02
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2 3 4 5
1 0.96819 1.000 0.191 -0.942 -0.762 -0.209
2 0.91196 0.191 1.000 0.000 -0.146 -0.912
3 0.96348 -0.942 0.000 1.000 0.718 -0.000
4 0.76233 -0.762 -0.146 0.718 1.000 0.160
5 0.92088 -0.209 -0.912 -0.000 0.160 1.000
**********
** 7 **SET ERR 0.5
**********
**********
** 8 **SET PRINT 1
**********
**********
** 9 **HESSE 2500
**********
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=16.2872 FROM HESSE STATUS=OK 31 CALLS 222 TOTAL
EDM=3.61105e-06 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 b 8.35905e+01 1.38730e+01 1.46135e-05 -4.58641e-01
2 bbar 9.93029e+02 3.15087e+01 1.03809e-05 -3.49712e-01
3 rho 1.27859e+00 1.98830e-01 3.16616e-05 2.82321e-01
4 s 5.54105e+01 1.78187e+01 1.34101e-04 1.08422e-01
5 tau 4.94037e+00 1.71739e-01 1.89816e-05 -2.98215e-02
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 5 ERR DEF=0.5
1.931e+02 8.364e+01 -2.622e+00 -1.931e+02 -5.003e-01
8.364e+01 9.935e+02 -1.760e-04 -8.364e+01 -4.943e+00
-2.622e+00 -1.760e-04 4.011e-02 2.622e+00 1.053e-06
-1.931e+02 -8.364e+01 2.622e+00 3.321e+02 5.003e-01
-5.003e-01 -4.943e+00 1.053e-06 5.003e-01 2.957e-02
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2 3 4 5
1 0.96821 1.000 0.191 -0.942 -0.763 -0.209
2 0.91199 0.191 1.000 -0.000 -0.146 -0.912
3 0.96351 -0.942 -0.000 1.000 0.718 0.000
4 0.76255 -0.763 -0.146 0.718 1.000 0.160
5 0.92091 -0.209 -0.912 0.000 0.160 1.000
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization
[#1] INFO:Minization -- RooProfileLL::evaluate(nll_model_modelData_Profile[s]) Creating instance of MINUIT
[#1] INFO:Minization -- RooProfileLL::evaluate(nll_model_modelData_Profile[s]) determining minimum likelihood for current configurations w.r.t all observable
[#1] INFO:Minization -- RooProfileLL::evaluate(nll_model_modelData_Profile[s]) minimum found at (s=55.4077)
.
[#1] INFO:Minization -- RooProfileLL::evaluate(nll_model_modelData_Profile[s]) Creating instance of MINUIT
[#1] INFO:Minization -- RooProfileLL::evaluate(nll_model_modelData_Profile[s]) determining minimum likelihood for current configurations w.r.t all observable
[#0] ERROR:InputArguments -- RooArgSet::checkForDup: ERROR argument with name s is already in this set
[#1] INFO:Minization -- RooProfileLL::evaluate(nll_model_modelData_Profile[s]) minimum found at (s=55.4105)
..........................................................................................................................................................................................................Profile Likelihood interval on s = [12.1902, 88.6871]
Real time 0:00:00, CP time 0.280
#include "TStopwatch.h"
#include "TCanvas.h"
#include "TROOT.h"
#include "RooPlot.h"
#include "RooAbsPdf.h"
#include "RooWorkspace.h"
#include "RooDataSet.h"
#include "RooGlobalFunc.h"
#include "RooFitResult.h"
#include "RooRandom.h"
using namespace RooFit;
using namespace RooStats;
void FourBinInstructional(bool doBayesian=false, bool doFeldmanCousins=false, bool doMCMC=false){
// let's time this challenging example
t.Start();
// set RooFit random seed for reproducible results
// make model
RooWorkspace* wspace = new RooWorkspace("wspace");
wspace->factory("Poisson::on(non[0,1000], sum::splusb(s[40,0,100],b[100,0,300]))");
wspace->factory("Poisson::off(noff[0,5000], prod::taub(b,tau[5,3,7],rho[1,0,2]))");
wspace->factory("Poisson::onbar(nonbar[0,10000], bbar[1000,500,2000])");
wspace->factory("Poisson::offbar(noffbar[0,1000000], prod::lambdaoffbar(bbar, tau))");
wspace->factory("Gaussian::mcCons(rhonom[1.,0,2], rho, sigma[.2])");
wspace->factory("PROD::model(on,off,onbar,offbar,mcCons)");
wspace->defineSet("obs","non,noff,nonbar,noffbar,rhonom");
wspace->factory("Uniform::prior_poi({s})");
wspace->factory("Uniform::prior_nuis({b,bbar,tau, rho})");
wspace->factory("PROD::prior(prior_poi,prior_nuis)");
// ----------------------------------
// Control some interesting variations
// define parameers of interest
// for 1-d plots
wspace->defineSet("poi","s");
wspace->defineSet("nuis","b,tau,rho,bbar");
// for 2-d plots to inspect correlations:
// wspace->defineSet("poi","s,rho");
// test simpler cases where parameters are known.
// wspace->var("tau")->setConstant();
// wspace->var("rho")->setConstant();
// wspace->var("b")->setConstant();
// wspace->var("bbar")->setConstant();
// inspect workspace
// wspace->Print();
// ----------------------------------------------------------
// Generate toy data
// generate toy data assuming current value of the parameters
// import into workspace.
// add Verbose() to see how it's being generated
RooDataSet* data = wspace->pdf("model")->generate(*wspace->set("obs"),1);
// data->Print("v");
wspace->import(*data);
// ----------------------------------
// Now the statistical tests
// model config
ModelConfig* modelConfig = new ModelConfig("FourBins");
modelConfig->SetWorkspace(*wspace);
modelConfig->SetPdf(*wspace->pdf("model"));
modelConfig->SetPriorPdf(*wspace->pdf("prior"));
modelConfig->SetParametersOfInterest(*wspace->set("poi"));
modelConfig->SetNuisanceParameters(*wspace->set("nuis"));
wspace->import(*modelConfig);
wspace->writeToFile("FourBin.root");
// -------------------------------------------------
// If you want to see the covariance matrix uncomment
// wspace->pdf("model")->fitTo(*data);
// use ProfileLikelihood
ProfileLikelihoodCalculator plc(*data, *modelConfig);
plc.SetConfidenceLevel(0.95);
LikelihoodInterval* plInt = plc.GetInterval();
plInt->LowerLimit( *wspace->var("s") ); // get ugly print out of the way. Fix.
// use FeldmaCousins (takes ~20 min)
FeldmanCousins fc(*data, *modelConfig);
fc.SetConfidenceLevel(0.95);
//number counting: dataset always has 1 entry with N events observed
fc.FluctuateNumDataEntries(false);
fc.UseAdaptiveSampling(true);
fc.SetNBins(40);
PointSetInterval* fcInt = NULL;
if(doFeldmanCousins){ // takes 7 minutes
fcInt = (PointSetInterval*) fc.GetInterval(); // fix cast
}
// use BayesianCalculator (only 1-d parameter of interest, slow for this problem)
BayesianCalculator bc(*data, *modelConfig);
bc.SetConfidenceLevel(0.95);
SimpleInterval* bInt = NULL;
if(doBayesian && wspace->set("poi")->getSize() == 1) {
bInt = bc.GetInterval();
} else{
cout << "Bayesian Calc. only supports on parameter of interest" << endl;
}
// use MCMCCalculator (takes about 1 min)
// Want an efficient proposal function, so derive it from covariance
// matrix of fit
RooFitResult* fit = wspace->pdf("model")->fitTo(*data,Save());
ProposalHelper ph;
ph.SetVariables((RooArgSet&)fit->floatParsFinal());
ph.SetCovMatrix(fit->covarianceMatrix());
ph.SetUpdateProposalParameters(kTRUE); // auto-create mean vars and add mappings
ph.SetCacheSize(100);
ProposalFunction* pf = ph.GetProposalFunction();
MCMCCalculator mc(*data, *modelConfig);
mc.SetConfidenceLevel(0.95);
mc.SetProposalFunction(*pf);
mc.SetNumBurnInSteps(500); // first N steps to be ignored as burn-in
mc.SetNumIters(50000);
mc.SetLeftSideTailFraction(0.5); // make a central interval
MCMCInterval* mcInt = NULL;
if(doMCMC)
mcInt = mc.GetInterval();
// ----------------------------------
// Make some plots
TCanvas* c1 = (TCanvas*) gROOT->Get("c1");
if(!c1)
c1 = new TCanvas("c1");
if(doBayesian && doMCMC){
c1->Divide(3);
c1->cd(1);
}
else if (doBayesian || doMCMC){
c1->Divide(2);
c1->cd(1);
}
LikelihoodIntervalPlot* lrplot = new LikelihoodIntervalPlot(plInt);
lrplot->Draw();
if(doBayesian && wspace->set("poi")->getSize() == 1) {
c1->cd(2);
// the plot takes a long time and print lots of error
// using a scan it is better
bc.SetScanOfPosterior(20);
RooPlot* bplot = bc.GetPosteriorPlot();
bplot->Draw();
}
if(doMCMC){
if(doBayesian && wspace->set("poi")->getSize() == 1)
c1->cd(3);
else
c1->cd(2);
MCMCIntervalPlot mcPlot(*mcInt);
mcPlot.Draw();
}
// ----------------------------------
// querry intervals
cout << "Profile Likelihood interval on s = [" <<
plInt->LowerLimit( *wspace->var("s") ) << ", " <<
plInt->UpperLimit( *wspace->var("s") ) << "]" << endl;
//Profile Likelihood interval on s = [12.1902, 88.6871]
if(doBayesian && wspace->set("poi")->getSize() == 1) {
cout << "Bayesian interval on s = [" <<
bInt->LowerLimit( ) << ", " <<
bInt->UpperLimit( ) << "]" << endl;
}
if(doFeldmanCousins){
cout << "Feldman Cousins interval on s = [" <<
fcInt->LowerLimit( *wspace->var("s") ) << ", " <<
fcInt->UpperLimit( *wspace->var("s") ) << "]" << endl;
//Feldman Cousins interval on s = [18.75 +/- 2.45, 83.75 +/- 2.45]
}
if(doMCMC){
cout << "MCMC interval on s = [" <<
mcInt->LowerLimit(*wspace->var("s") ) << ", " <<
mcInt->UpperLimit(*wspace->var("s") ) << "]" << endl;
//MCMC interval on s = [15.7628, 84.7266]
}
t.Print();
}
const Bool_t kTRUE
Definition: RtypesCore.h:87
#define gROOT
Definition: TROOT.h:410
static struct mg_connection * fc(struct mg_context *ctx)
Definition: civetweb.c:3728
Int_t getSize() const
RooDataSet * generate(const RooArgSet &whatVars, Int_t nEvents, const RooCmdArg &arg1, const RooCmdArg &arg2=RooCmdArg::none(), const RooCmdArg &arg3=RooCmdArg::none(), const RooCmdArg &arg4=RooCmdArg::none(), const RooCmdArg &arg5=RooCmdArg::none())
See RooAbsPdf::generate(const RooArgSet&,const RooCmdArg&,const RooCmdArg&,const RooCmdArg&,...
Definition: RooAbsPdf.h:56
virtual RooFitResult * fitTo(RooAbsData &data, const RooCmdArg &arg1=RooCmdArg::none(), const RooCmdArg &arg2=RooCmdArg::none(), const RooCmdArg &arg3=RooCmdArg::none(), const RooCmdArg &arg4=RooCmdArg::none(), const RooCmdArg &arg5=RooCmdArg::none(), const RooCmdArg &arg6=RooCmdArg::none(), const RooCmdArg &arg7=RooCmdArg::none(), const RooCmdArg &arg8=RooCmdArg::none())
Fit PDF to given dataset.
Definition: RooAbsPdf.cxx:1081
RooArgSet is a container object that can hold multiple RooAbsArg objects.
Definition: RooArgSet.h:28
RooDataSet is a container class to hold unbinned data.
Definition: RooDataSet.h:31
RooFitResult is a container class to hold the input and output of a PDF fit to a dataset.
Definition: RooFitResult.h:40
const TMatrixDSym & covarianceMatrix() const
Return covariance matrix.
const RooArgList & floatParsFinal() const
Definition: RooFitResult.h:110
static RooMsgService & instance()
Return reference to singleton instance.
void setGlobalKillBelow(RooFit::MsgLevel level)
RooFit::MsgLevel globalKillBelow() const
A RooPlot is a plot frame and a container for graphics objects within that frame.
Definition: RooPlot.h:41
virtual void Draw(Option_t *options=0)
Draw this plot and all of the elements it contains.
Definition: RooPlot.cxx:558
static TRandom * randomGenerator()
Return a pointer to a singleton random-number generator implementation.
Definition: RooRandom.cxx:54
The RooWorkspace is a persistable container for RooFit projects.
Definition: RooWorkspace.h:43
Bool_t writeToFile(const char *fileName, Bool_t recreate=kTRUE)
Save this current workspace into given file.
Bool_t defineSet(const char *name, const RooArgSet &aset, Bool_t importMissing=kFALSE)
Define a named RooArgSet with given constituents.
RooRealVar * var(const char *name) const
Retrieve real-valued variable (RooRealVar) with given name. A null pointer is returned if not found.
Bool_t import(const RooAbsArg &arg, const RooCmdArg &arg1=RooCmdArg(), const RooCmdArg &arg2=RooCmdArg(), const RooCmdArg &arg3=RooCmdArg(), const RooCmdArg &arg4=RooCmdArg(), const RooCmdArg &arg5=RooCmdArg(), const RooCmdArg &arg6=RooCmdArg(), const RooCmdArg &arg7=RooCmdArg(), const RooCmdArg &arg8=RooCmdArg(), const RooCmdArg &arg9=RooCmdArg())
Import a RooAbsArg object, e.g.
RooFactoryWSTool & factory()
Return instance to factory tool.
const RooArgSet * set(const char *name)
Return pointer to previously defined named set with given nmame If no such set is found a null pointe...
RooAbsPdf * pdf(const char *name) const
Retrieve p.d.f (RooAbsPdf) with given name. A null pointer is returned if not found.
The Canvas class.
Definition: TCanvas.h:31
virtual void SetSeed(ULong_t seed=0)
Set the random generator seed.
Definition: TRandom.cxx:589
Stopwatch class.
Definition: TStopwatch.h:28
void Start(Bool_t reset=kTRUE)
Start the stopwatch.
Definition: TStopwatch.cxx:58
void Print(Option_t *option="") const
Print the real and cpu time passed between the start and stop events.
Definition: TStopwatch.cxx:219
return c1
Definition: legend1.C:41
RooCmdArg Save(Bool_t flag=kTRUE)
@(#)root/roostats:$Id$ Author: George Lewis, Kyle Cranmer
Definition: Asimov.h:20
Authors
authors: Kyle Cranmer, Tanja Rommerskirchen

Definition in file FourBinInstructional.C.