ROOT Reference Guide
FourBinInstructional.C File Reference

Detailed Description


This example is a generalization of the on/off problem and a common setup for SUSY searches. Imagine that one has two variables "x" and "y" (e.g. missing ET and SumET); see the figure below. The signal region has high values of both variables (top right). Low values of "x" or "y" act as side-bands. If we used only "y" as a side-band, we would have the on/off problem.

  • In the signal region we observe non events and expect s+b events.
  • In the region with low values of "y" (bottom right) we observe noff events and expect tau*b events. Note the significance of tau. In the background-only case:
tau ~ <expectation off> / <expectation on>
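When tau is known exactly, the on/off problem has closed-form maximum-likelihood estimates. From <non> = s + b and <noff> = tau*b one gets b = noff/tau and s = non - noff/tau. The C++ sketch below computes them. The error on s uses naive Gaussian propagation of the Poisson errors sqrt(n). That approximation is added here for illustration and is not what the RooStats calculators do. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: estimates for the simple on/off problem with tau known exactly.
// Model: <non> = s + b, <noff> = tau * b.
struct OnOffEstimate {
   double s;      // maximum-likelihood signal estimate (not clipped at 0)
   double b;      // maximum-likelihood background estimate
   double sigmaS; // approximate Gaussian error on s (assumption: Var(n) ~ n)
};

OnOffEstimate estimateOnOff(double non, double noff, double tau)
{
   assert(tau > 0.0);
   OnOffEstimate e;
   e.b = noff / tau;  // the off region fixes b
   e.s = non - e.b;   // the on region then fixes s
   // Var(s) ~ Var(non) + Var(noff) / tau^2
   e.sigmaS = std::sqrt(non + noff / (tau * tau));
   return e;
}
```

For example, estimateOnOff(139, 528, 5) gives b = 105.6 and s = 33.4, with an approximate error of about 12.7 on s. This estimate treats tau as exact and has no rho parameter, which is the limitation the extended model addresses.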

If tau is known, this model is sufficient, but often tau is not known exactly. One can therefore use low values of "x" as an additional constraint on tau. Note that this technique critically depends on the assumption that the joint distribution of "x" and "y" factorizes. These regions generally have many events, so the ratio can be measured very precisely there. We therefore extend the model to describe the two left boxes, denoted with "bar".

  • In the upper left we observe nonbar events and expect bbar events
  • In the bottom left we observe noffbar events and expect tau bbar events. Note again we have:
tau ~ <expectation off bar> / <expectation on bar>
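The "bar" boxes contain no signal, so they alone determine tau. From <nonbar> = bbar and <noffbar> = tau*bbar, one gets tau = noffbar/nonbar. The sketch below shows why this ratio is measured precisely. It uses hypothetical helper names and assumes Gaussian error propagation. The relative error is roughly sqrt(1/noffbar + 1/nonbar), which is small when both boxes are well populated.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: tau from the signal-free side-bands.
// <nonbar> = bbar, <noffbar> = tau * bbar  =>  tau_hat = noffbar / nonbar
struct TauEstimate {
   double tau;
   double sigmaTau; // Gaussian error from propagating Poisson errors (assumption)
};

TauEstimate estimateTau(double nonbar, double noffbar)
{
   assert(nonbar > 0.0 && noffbar > 0.0);
   TauEstimate e;
   e.tau = noffbar / nonbar;
   // relative error of a ratio of two independent Poisson counts
   double relErr = std::sqrt(1.0 / noffbar + 1.0 / nonbar);
   e.sigmaTau = e.tau * relErr;
   return e;
}
```

With the toy counts nonbar = 993 and noffbar = 4906, this gives tau ≈ 4.941 ± 0.172. That agrees with the tau value and error from the fit in the output below.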

One can further expand the model to account for the systematic uncertainty associated with assuming that the distributions of "x" and "y" factorize (e.g. that tau is the same for off/on and offbar/onbar). This can be done in several ways. Here we introduce an additional parameter rho, so that one set of measurements uses tau and the other uses tau*rho. The choice is arbitrary, but it has consequences for the numerical stability of the algorithms. The "bar" measurements typically have more events (and smaller relative errors). If we choose

<expectation noffbar> = tau * rho * <expectation nonbar>

then the product tau*rho will be known very precisely (relative error ~1/sqrt(bbar)). The contour in those parameters will be narrow, with a non-trivial tau ~ 1/rho shape. If instead we put rho on the non/noff measurements (where the product has a relative error ~1/sqrt(b)), the contours are more amenable to numerical techniques. Thus, here we choose to define:

tau := <expectation off bar> / (<expectation on bar>)
rho := <expectation off> / (<expectation on> * tau)
    ^ y
    |
    |---------------+-----------+
    |               |           |
    |    nonbar     |    non    |
    |     bbar      |    s+b    |
    |               |           |
    |---------------+-----------|
    |               |           |
    |    noffbar    |   noff    |
    |   tau bbar    | tau b rho |
    |               |           |
    +---------------+-----------+---> x
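The expected counts in the four boxes can be written as a small function. This plain C++ sketch uses hypothetical names and mirrors the workspace factory expressions in the macro (splusb, taub, lambdaoffbar). It also checks that the definitions of tau and rho recover their values from background-only expectations.

```cpp
#include <cassert>
#include <cmath>

// Model parameters (names follow the workspace variables in the macro)
struct Params {
   double s, b, bbar, tau, rho;
};

// Expected event counts in the four boxes of the diagram
struct Expected {
   double on, off, onbar, offbar;
};

Expected expectedCounts(const Params &p)
{
   Expected e;
   e.on = p.s + p.b;            // top right: s + b
   e.off = p.tau * p.rho * p.b; // bottom right: tau * b * rho
   e.onbar = p.bbar;            // top left: bbar
   e.offbar = p.tau * p.bbar;   // bottom left: tau * bbar
   return e;
}

// Recover tau and rho from background-only expectations (s = 0), following
// tau := <off bar> / <on bar> and rho := <off> / (<on> * tau)
void recoverTauRho(const Expected &bkgOnly, double &tau, double &rho)
{
   assert(bkgOnly.onbar > 0.0 && bkgOnly.on > 0.0);
   tau = bkgOnly.offbar / bkgOnly.onbar;
   rho = bkgOnly.off / (bkgOnly.on * tau);
}
```

With the initial parameter values (s = 40, b = 100, bbar = 1000, tau = 5, rho = 1), the expected counts are 140, 500, 1000 and 5000.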

Left in this way, the problem is under-constrained. However, one may have some auxiliary measurement (usually based on Monte Carlo) to constrain rho. Let us call the nominal value of rho given by this auxiliary measurement "rhonom". The full model then contains a 'constraint' term P(rhonom | rho). Here we use a Gaussian constraint with standard deviation sigma.
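The full model, PROD::model in the macro, is the product of the four Poisson terms and the Gaussian constraint mcCons. A plain C++ sketch of its negative log-likelihood follows, with hypothetical function names. It drops parameter-independent constants (log n! and the Gaussian normalization) and ignores the finite observable ranges declared in the workspace. Its absolute value therefore differs from the FCN value reported by RooFit.

```cpp
#include <cassert>
#include <cmath>

struct Observed {
   double non, noff, nonbar, noffbar, rhonom;
};

// -log Poisson(n | mu), dropping the parameter-independent log(n!) term
double poissonNLL(double n, double mu)
{
   assert(mu > 0.0);
   return mu - n * std::log(mu);
}

// Negative log-likelihood of the four-bin model plus the Gaussian constraint on rho
double nll(double s, double b, double bbar, double tau, double rho,
           const Observed &d, double sigma = 0.2)
{
   double val = 0.0;
   val += poissonNLL(d.non, s + b);          // on:     s + b
   val += poissonNLL(d.noff, tau * rho * b); // off:    tau * rho * b
   val += poissonNLL(d.nonbar, bbar);        // onbar:  bbar
   val += poissonNLL(d.noffbar, tau * bbar); // offbar: tau * bbar
   double pull = (d.rhonom - rho) / sigma;   // mcCons: P(rhonom | rho)
   val += 0.5 * pull * pull;
   return val;
}
```

Each term is minimized when its expectation equals its observation, which is why this model has a closed-form minimum.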

In the example, the initial values of the parameters are:

- s = 40
- b = 100
- tau = 5
- bbar = 1000
- rho = 1
(sigma for rho = 20%)

and in the toy dataset:

- non = 139
- noff = 528
- nonbar = 993
- noffbar = 4906
- rhonom = 1.27824
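The model has five measurements (non, noff, nonbar, noffbar, rhonom) and five free parameters. Its maximum-likelihood point can therefore be found in closed form by setting each expectation equal to its observation, provided the solution lies inside the allowed ranges. The sketch below (hypothetical names) does this for the toy dataset.

```cpp
#include <cassert>
#include <cmath>

struct MLE {
   double s, b, bbar, tau, rho;
};

// Closed-form maximum-likelihood estimates for the four-bin model.
// Only valid if the solution lies inside the parameter ranges used in the macro
// (s in [0,100], b in [0,300], bbar in [500,2000], tau in [3,7], rho in [0,2]).
MLE closedFormMLE(double non, double noff, double nonbar, double noffbar, double rhonom)
{
   assert(nonbar > 0.0 && rhonom > 0.0);
   MLE m;
   m.bbar = nonbar;              // <nonbar>  = bbar
   m.tau = noffbar / nonbar;     // <noffbar> = tau * bbar
   m.rho = rhonom;               // the Gaussian constraint peaks at rho = rhonom
   m.b = noff / (m.tau * m.rho); // <noff>    = tau * rho * b
   m.s = non - m.b;              // <non>     = s + b
   return m;
}
```

The function returns bbar = 993, tau ≈ 4.941, rho = 1.27824, b ≈ 83.6 and s ≈ 55.4. These agree with the Minuit fit values in the output below, well within their quoted errors, which makes this a useful sanity check on the numerical minimization.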

Note that the covariance matrix of the parameters has large off-diagonal terms. Clearly s and b are anti-correlated. Similarly, since noffbar >> nonbar, one would expect bbar and tau to be anti-correlated.

This can be seen in the correlation coefficients below.

              GLOBAL       b      bbar     rho       s      tau
    b        0.96820    1.000    0.191  -0.942  -0.762  -0.209
    bbar     0.91191    0.191    1.000   0.000  -0.146  -0.912
    rho      0.96348   -0.942    0.000   1.000   0.718  -0.000
    s        0.76250   -0.762   -0.146   0.718   1.000   0.160
    tau      0.92084   -0.209   -0.912  -0.000   0.160   1.000

Similarly, since tau*rho appears as a product, we would expect rho and tau to be anti-correlated. However, when the error on rho is significantly larger than 1/sqrt(bbar), tau is essentially known and the correlation is minimal: tau is mainly determined by bbar, and rho by b and s. In the alternate parametrization (bbar*tau*rho), the correlation coefficient for rho,tau is large and negative.
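In the table, the GLOBAL column is Minuit's global correlation coefficient and the remaining columns are the correlation matrix. Both follow from the covariance matrix V. The correlation is r_ij = V_ij / sqrt(V_ii V_jj). The global coefficient is rho_k = sqrt(1 - 1/(V_kk (V^-1)_kk)). The sketch below evaluates both for an illustrative 2x2 covariance matrix, not the fit's actual matrix. In that case the global coefficient reduces to |r_12|. The general case needs a full matrix inverse. Function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Matrix2 = std::array<std::array<double, 2>, 2>;

// Linear correlation coefficient r_ij = V_ij / sqrt(V_ii * V_jj)
double correlation(const Matrix2 &V, int i, int j)
{
   return V[i][j] / std::sqrt(V[i][i] * V[j][j]);
}

// Global correlation coefficient rho_k = sqrt(1 - 1 / (V_kk * (V^-1)_kk)),
// using the explicit inverse of a 2x2 matrix
double globalCorrelation(const Matrix2 &V, int k)
{
   double det = V[0][0] * V[1][1] - V[0][1] * V[1][0];
   assert(det > 0.0); // a covariance matrix must be positive definite
   // diagonal of the inverse: (V^-1)_00 = V_11 / det, (V^-1)_11 = V_00 / det
   double invKK = (k == 0 ? V[1][1] : V[0][0]) / det;
   return std::sqrt(1.0 - 1.0 / (V[k][k] * invKK));
}
```

A GLOBAL value close to 1, such as 0.968 for b, means that the parameter is very strongly correlated with some linear combination of the other parameters.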

The code below uses best practices for RooFit & RooStats as of June 2010.

It proceeds as follows:

  • create a workspace to hold the model
  • use workspace factory to quickly create the terms of the model
  • use workspace factory to define total model (a prod pdf)
  • create a RooStats ModelConfig to specify observables, parameters of interest
  • add to the ModelConfig a prior on the parameters for Bayesian techniques (note that the prior pdf is factorized into a term for the parameters of interest and a term for the nuisance parameters)
  • visualize the model
  • write the workspace to a file
  • use several of RooStats IntervalCalculators & compare results
[#0] WARNING:InputArguments -- The parameter 'sigma' with range [-inf, inf] of the RooGaussian 'mcCons' exceeds the safe range of (0, inf). Advise to limit its range.
[#1] INFO:ObjectHandling -- RooWorkspace::import(wspace) importing dataset modelData
[#1] INFO:InputArguments -- The deprecated RooFit::CloneData(1) option passed to createNLL() is ignored.
[#0] PROGRESS:Minimization -- ProfileLikelihoodCalcultor::DoGLobalFit - find MLE
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: activating const optimization
[#1] INFO:Minimization -- The following expressions will be evaluated in cache-and-track mode: (on,off,onbar,offbar,mcCons)
[#0] PROGRESS:Minimization -- ProfileLikelihoodCalcultor::DoMinimizeNLL - using Minuit2 / with strategy 1
[#1] INFO:Minimization --
RooFitResult: minimized FCN value: 16.2872, estimated distance to minimum: 3.94509e-11
covariance matrix quality: Full, accurate covariance matrix
Status : MINIMIZE=0
Floating Parameter FinalValue +/- Error
-------------------- --------------------------
b 8.3599e+01 +/- 1.39e+01
bbar 9.9300e+02 +/- 3.15e+01
rho 1.2784e+00 +/- 1.99e-01
s 5.5401e+01 +/- 1.78e+01
tau 4.9406e+00 +/- 1.72e-01
Bayesian Calc. only supports on parameter of interest
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: activating const optimization
[#1] INFO:Minimization -- The following expressions will be evaluated in cache-and-track mode: (on,off,onbar,offbar,mcCons)
Minuit2Minimizer: Minimize with max-calls 2500 convergence for edm < 1 strategy 1
Minuit2Minimizer : Valid minimum - status = 0
FVAL = 16.2877171875810056
Edm = 0.000500784837874430856
Nfcn = 161
b = 83.6241 +/- 13.8852 (limited)
bbar = 992.065 +/- 31.4894 (limited)
rho = 1.27662 +/- 0.19863 (limited)
s = 55.4214 +/- 17.8339 (limited)
tau = 4.94562 +/- 0.171976 (limited)
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: deactivating const optimization
[#1] INFO:Minimization -- RooProfileLL::evaluate(nll_model_modelData_Profile[s]) Creating instance of MINUIT
[#1] INFO:Minimization -- RooProfileLL::evaluate(nll_model_modelData_Profile[s]) determining minimum likelihood for current configurations w.r.t all observable
[#1] INFO:Minimization -- RooProfileLL::evaluate(nll_model_modelData_Profile[s]) minimum found at (s=55.2986)
.
[#1] INFO:Minimization -- RooProfileLL::evaluate(nll_model_modelData_Profile[s]) Creating instance of MINUIT
[#1] INFO:Minimization -- RooProfileLL::evaluate(nll_model_modelData_Profile[s]) determining minimum likelihood for current configurations w.r.t all observable
[#0] ERROR:InputArguments -- RooArgSet::checkForDup: ERROR argument with name s is already in this set
[#1] INFO:Minimization -- RooProfileLL::evaluate(nll_model_modelData_Profile[s]) minimum found at (s=55.3393)
..........................................................................................................................................................................................................Profile Likelihood interval on s = [12.1902, 88.6871]
Real time 0:00:00, CP time 0.860
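#include "TStopwatch.h"
#include "TCanvas.h"
#include "TROOT.h"
#include "RooPlot.h"
#include "RooAbsPdf.h"
#include "RooWorkspace.h"
#include "RooDataSet.h"
#include "RooGlobalFunc.h"
#include "RooFitResult.h"
#include "RooRandom.h"
#include "RooMsgService.h"
#include "RooStats/ModelConfig.h"
#include "RooStats/ProfileLikelihoodCalculator.h"
#include "RooStats/LikelihoodInterval.h"
#include "RooStats/LikelihoodIntervalPlot.h"
#include "RooStats/FeldmanCousins.h"
#include "RooStats/PointSetInterval.h"
#include "RooStats/BayesianCalculator.h"
#include "RooStats/SimpleInterval.h"
#include "RooStats/MCMCCalculator.h"
#include "RooStats/MCMCInterval.h"
#include "RooStats/MCMCIntervalPlot.h"
#include "RooStats/ProposalHelper.h"
#include "RooStats/ProposalFunction.h"

#include <iostream>
#include <memory>

using namespace RooFit;
using namespace RooStats;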
void FourBinInstructional(bool doBayesian = false, bool doFeldmanCousins = false, bool doMCMC = false)
{
   // let's time this challenging example
   TStopwatch t;
   t.Start();

   // set RooFit random seed for reproducible results
   // (the value 4357 is assumed; any fixed seed makes the toy reproducible)
   RooRandom::randomGenerator()->SetSeed(4357);

   // make model
   RooWorkspace *wspace = new RooWorkspace("wspace");
   wspace->factory("Poisson::on(non[0,1000], sum::splusb(s[40,0,100],b[100,0,300]))");
   wspace->factory("Poisson::off(noff[0,5000], prod::taub(b,tau[5,3,7],rho[1,0,2]))");
   wspace->factory("Poisson::onbar(nonbar[0,10000], bbar[1000,500,2000])");
   wspace->factory("Poisson::offbar(noffbar[0,1000000], prod::lambdaoffbar(bbar, tau))");
   wspace->factory("Gaussian::mcCons(rhonom[1.,0,2], rho, sigma[.2])");
   wspace->factory("PROD::model(on,off,onbar,offbar,mcCons)");
   wspace->defineSet("obs", "non,noff,nonbar,noffbar,rhonom");

   // flat priors, used by the Bayesian calculators
   wspace->factory("Uniform::prior_poi({s})");
   wspace->factory("Uniform::prior_nuis({b,bbar,tau, rho})");
   wspace->factory("PROD::prior(prior_poi,prior_nuis)");
   // ----------------------------------
   // Control some interesting variations
   // define parameters of interest
   // for 1-d plots
   wspace->defineSet("poi", "s");
   wspace->defineSet("nuis", "b,tau,rho,bbar");

   // for 2-d plots to inspect correlations:
   // wspace->defineSet("poi","s,rho");

   // test simpler cases where parameters are known.
   // wspace->var("tau")->setConstant();
   // wspace->var("rho")->setConstant();
   // wspace->var("b")->setConstant();
   // wspace->var("bbar")->setConstant();

   // inspect workspace
   // wspace->Print();

   // ----------------------------------------------------------
   // Generate toy data
   // generate toy data assuming the current values of the parameters
   // and import it into the workspace.
   // add Verbose() to see how it's being generated
   std::unique_ptr<RooDataSet> data{wspace->pdf("model")->generate(*wspace->set("obs"), 1)};
   // data->Print("v");
   wspace->import(*data);

   // ----------------------------------
   // Now the statistical tests
   // model config
   ModelConfig *modelConfig = new ModelConfig("FourBins");
   modelConfig->SetWorkspace(*wspace);
   modelConfig->SetPdf(*wspace->pdf("model"));
   modelConfig->SetPriorPdf(*wspace->pdf("prior"));
   modelConfig->SetParametersOfInterest(*wspace->set("poi"));
   modelConfig->SetNuisanceParameters(*wspace->set("nuis"));
   wspace->import(*modelConfig);
   wspace->writeToFile("FourBin.root");
   // -------------------------------------------------
   // If you want to see the covariance matrix uncomment
   // wspace->pdf("model")->fitTo(*data);

   // use ProfileLikelihood
   ProfileLikelihoodCalculator plc(*data, *modelConfig);
   plc.SetConfidenceLevel(0.95);
   LikelihoodInterval *plInt = plc.GetInterval();
   // silence RooFit messages while the interval is evaluated for the first time
   RooFit::MsgLevel msglevel = RooMsgService::instance().globalKillBelow();
   RooMsgService::instance().setGlobalKillBelow(RooFit::FATAL);
   plInt->LowerLimit(*wspace->var("s")); // get ugly print out of the way. Fix.
   RooMsgService::instance().setGlobalKillBelow(msglevel);

   // use FeldmanCousins (takes ~20 min)
   FeldmanCousins fc(*data, *modelConfig);
   fc.SetConfidenceLevel(0.95);
   // number counting: dataset always has 1 entry with N events observed
   fc.FluctuateNumDataEntries(false);
   fc.UseAdaptiveSampling(true);
   fc.SetNBins(40);
   PointSetInterval *fcInt = nullptr;
   if (doFeldmanCousins) { // takes 7 minutes
      fcInt = (PointSetInterval *)fc.GetInterval(); // fix cast
   }

   // use BayesianCalculator (only 1-d parameter of interest, slow for this problem)
   BayesianCalculator bc(*data, *modelConfig);
   bc.SetConfidenceLevel(0.95);
   SimpleInterval *bInt = nullptr;
   if (doBayesian && wspace->set("poi")->getSize() == 1) {
      bInt = bc.GetInterval();
   } else {
      std::cout << "Bayesian Calc. only supports one parameter of interest" << std::endl;
   }

   // use MCMCCalculator (takes about 1 min)
   // Want an efficient proposal function, so derive it from covariance
   // matrix of fit
   std::unique_ptr<RooFitResult> fit{wspace->pdf("model")->fitTo(*data, Save())};
   ProposalHelper ph;
   ph.SetVariables((RooArgSet &)fit->floatParsFinal());
   ph.SetCovMatrix(fit->covarianceMatrix());
   ph.SetUpdateProposalParameters(true); // auto-create mean vars and add mappings
   ph.SetCacheSize(100);
   ProposalFunction *pf = ph.GetProposalFunction();

   MCMCCalculator mc(*data, *modelConfig);
   mc.SetConfidenceLevel(0.95);
   mc.SetProposalFunction(*pf);
   mc.SetNumBurnInSteps(500); // first N steps to be ignored as burn-in
   mc.SetNumIters(50000);
   mc.SetLeftSideTailFraction(0.5); // make a central interval
   MCMCInterval *mcInt = nullptr;
   if (doMCMC)
      mcInt = mc.GetInterval();
   // ----------------------------------
   // Make some plots
   TCanvas *c1 = (TCanvas *)gROOT->Get("c1");
   if (!c1)
      c1 = new TCanvas("c1");

   if (doBayesian && doMCMC) {
      c1->Divide(3);
      c1->cd(1);
   } else if (doBayesian || doMCMC) {
      c1->Divide(2);
      c1->cd(1);
   }

   LikelihoodIntervalPlot *lrplot = new LikelihoodIntervalPlot(plInt);
   lrplot->Draw();

   if (doBayesian && wspace->set("poi")->getSize() == 1) {
      c1->cd(2);
      // the full plot takes a long time and prints lots of errors;
      // using a scan is better
      bc.SetScanOfPosterior(20);
      RooPlot *bplot = bc.GetPosteriorPlot();
      bplot->Draw();
   }

   if (doMCMC) {
      if (doBayesian && wspace->set("poi")->getSize() == 1)
         c1->cd(3);
      else
         c1->cd(2);
      MCMCIntervalPlot mcPlot(*mcInt);
      mcPlot.Draw();
   }

   // ----------------------------------
   // query intervals
   std::cout << "Profile Likelihood interval on s = [" << plInt->LowerLimit(*wspace->var("s")) << ", "
             << plInt->UpperLimit(*wspace->var("s")) << "]" << std::endl;
   // Profile Likelihood interval on s = [12.1902, 88.6871]

   if (doBayesian && wspace->set("poi")->getSize() == 1) {
      std::cout << "Bayesian interval on s = [" << bInt->LowerLimit() << ", " << bInt->UpperLimit() << "]"
                << std::endl;
   }

   if (doFeldmanCousins) {
      std::cout << "Feldman Cousins interval on s = [" << fcInt->LowerLimit(*wspace->var("s")) << ", "
                << fcInt->UpperLimit(*wspace->var("s")) << "]" << std::endl;
      // Feldman Cousins interval on s = [18.75 +/- 2.45, 83.75 +/- 2.45]
   }

   if (doMCMC) {
      std::cout << "MCMC interval on s = [" << mcInt->LowerLimit(*wspace->var("s")) << ", "
                << mcInt->UpperLimit(*wspace->var("s")) << "]" << std::endl;
      // MCMC interval on s = [15.7628, 84.7266]
   }

   t.Print();
}
Authors: Kyle Cranmer, Tanja Rommerskirchen

Definition in file FourBinInstructional.C.