A hypothesis testing example based on number counting with background uncertainty.
NOTE: This example is like HybridInstructional, but the model generalizes more clearly to an analysis with shapes. RooFit/RooStats offers a lot of flexibility in how one models a problem. Models come in a few common forms:
- standard form: an extended PDF of some discriminating variable m, e.g. P(m) ~ S*fs(m) + B*fb(m), with S+B events expected. In this case the dataset has N rows corresponding to N events, and the extended term is Pois(N|S+B).
- fractional form: a non-extended PDF of some discriminating variable m, e.g. P(m) ~ s*fs(m) + (1-s)*fb(m), where s is a signal fraction. In this case the dataset has N rows corresponding to N events, and there is no extended term.
- number counting form: there is no discriminating variable and the counts are modeled directly (see HybridInstructional), e.g. P(N) = Pois(N|S+B). In this case the dataset has 1 row corresponding to N events, and the extended term is the PDF itself.
Here we convert the number counting form into the standard form by introducing a dummy discriminating variable m with a uniform distribution.
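As a short worked equation (a sketch using the symbols S, B, fs, fb and N from the list above, with m_i denoting the value of m for event i, a label introduced here), the standard-form likelihood reduces to the number counting form once both shapes are replaced by the same uniform distribution on [0,1]:

```latex
% Standard form: extended likelihood for N events m_1, ..., m_N
L_{\mathrm{standard}}(S,B) = \mathrm{Pois}(N \mid S+B)\,
    \prod_{i=1}^{N} \frac{S\,f_s(m_i) + B\,f_b(m_i)}{S+B}

% Dummy variable: f_s(m) = f_b(m) = 1 on [0,1], so every factor in the product is 1
L_{\mathrm{standard}}(S,B) = \mathrm{Pois}(N \mid S+B)\,\prod_{i=1}^{N} 1
                           = \mathrm{Pois}(N \mid S+B)
```

The dummy variable therefore carries no information about S or B; only the number of rows in the dataset matters.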
This example:
- demonstrates the usage of the HybridCalculator (Parts 4-6)
- demonstrates the numerical integration of RooFit (Part 2)
- validates the RooStats against an example with a known analytic answer
- demonstrates usage of different test statistics
- explains subtle choices in the prior used for hybrid methods
- demonstrates usage of different priors for the nuisance parameters
- demonstrates usage of PROOF
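The "known analytic answer" mentioned in the list can be sketched without ROOT. The functions below are an illustrative stand-alone reimplementation, not the RooStats code: they assume the Z_Bi p-value is the binomial tail probability of seeing at least x main-region events out of x + y total, with per-event probability 1/(1+tau), where x is the main count, y the sideband count and tau the sideband-to-main background ratio. The names `zBiPValue` and `pValueToZ` are hypothetical.

```cpp
#include <cmath>

// Binomial tail P(K >= x) for K ~ Binomial(n = x + y, rho = 1/(1+tau)).
// Terms are built in log space to avoid overflow in the binomial coefficients.
double zBiPValue(int x, int y, double tau)
{
   const int n = x + y;
   const double rho = 1.0 / (1.0 + tau);
   double p = 0.0;
   for (int k = x; k <= n; ++k) {
      const double logTerm = std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0) +
                             k * std::log(rho) + (n - k) * std::log1p(-rho);
      p += std::exp(logTerm);
   }
   return p;
}

// One-sided Gaussian significance Z with P(N(0,1) > Z) = p, found by bisection
// on the normal tail probability 0.5 * erfc(z / sqrt(2)).
double pValueToZ(double p)
{
   double lo = -10.0, hi = 10.0;
   for (int i = 0; i < 200; ++i) {
      const double mid = 0.5 * (lo + hi);
      const double tail = 0.5 * std::erfc(mid / std::sqrt(2.0));
      if (tail > p)
         lo = mid; // tail too large: Z must be larger
      else
         hi = mid;
   }
   return 0.5 * (lo + hi);
}
```

Calling `zBiPValue(150, 100, 1.0)` and passing the result to `pValueToZ` corresponds to the x = 150, y = 100, tau = 1 configuration used in this tutorial, and can be compared with the Part 3 values in the output.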
The basic setup here is that a main measurement has observed x events with an expectation of s+b. One can choose an ad hoc prior for the uncertainty on b, or try to base it on an auxiliary measurement. In this case, the auxiliary measurement (aka control measurement, sideband) is another counting experiment with measurement y and expectation tau*b. With an 'original prior' on b, called \( \eta(b) \), one can obtain a posterior from the auxiliary measurement: \( \pi(b) \propto \eta(b) \cdot \mathrm{Pois}(y \mid \tau b) \). This is a principled choice for a prior on b in the main measurement of x, which can then be treated in a hybrid Bayesian/Frequentist way. Alternatively, one can treat the two measurements simultaneously, which is detailed in Part 6 of the tutorial.
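The hybrid treatment described above can be illustrated with a small toy Monte Carlo in plain C++. This is a sketch under stated assumptions, not the HybridCalculator implementation: it assumes a flat original prior \( \eta(b) \), so that \( \pi(b) \) is a Gamma distribution with shape y+1 and scale 1/tau, and it uses the observed main count itself as the test statistic under the background-only hypothesis (s = 0). The function name `hybridNullPValue` and the seed are hypothetical.

```cpp
#include <cstdint>
#include <random>

// Fraction of background-only toys whose main count is >= xObs.
// Each toy first draws b from pi(b), taken here as Gamma(shape = yObs + 1, scale = 1/tau)
// (the posterior from the sideband for a flat eta(b)), then draws x ~ Pois(b).
double hybridNullPValue(int xObs, int yObs, double tau, std::uint64_t nToys, unsigned seed = 12345)
{
   std::mt19937_64 rng(seed);
   std::gamma_distribution<double> priorB(yObs + 1.0, 1.0 / tau);
   std::uint64_t nExtreme = 0;
   for (std::uint64_t i = 0; i < nToys; ++i) {
      const double b = priorB(rng);                // Bayesian step: sample the nuisance parameter
      std::poisson_distribution<int> mainCount(b); // frequentist step: sample the main measurement
      if (mainCount(rng) >= xObs)
         ++nExtreme;
   }
   return static_cast<double>(nExtreme) / static_cast<double>(nToys);
}
```

For example, `hybridNullPValue(150, 100, 1.0, 1000000)` gives an estimate that can be compared with the analytic and HybridCalculator p-values reported in the output; the statistical precision scales as the inverse square root of the number of toys.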
This tutorial is related to the FourBin.C tutorial in the modeling, but focuses on hypothesis testing instead of interval estimation.
More background on this 'prototype problem' can be found in the following papers:
RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby
Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
All rights reserved, please read http://roofit.sourceforge.net/license.txt
-----------------------------------------
Part 3
Z_Bi p-value (analytic): 0.00094165
Z_Bi significance (analytic): 3.10804
Real time 0:00:00, CP time 0.040
[#0] WARNING:InputArguments -- The parameter 'b' with range [0, 300] of the RooLognormal 'lognorm_prior' exceeds the safe range of (0, inf). Advise to limit its range.
[#0] WARNING:InputArguments -- The parameter 'y' with range [0, 500] of the RooLognormal 'lognorm_prior' exceeds the safe range of (0, inf). Advise to limit its range.
[#0] WARNING:Eval -- RooStatsUtils::MakeNuisancePdf - no constraints found on nuisance parameters in the input model
[#0] WARNING:Eval -- RooStatsUtils::MakeNuisancePdf - no constraints found on nuisance parameters in the input model
-----------------------------------------
Part 4
Results HypoTestCalculator_result:
- Null p-value = 0.0009 +/- 0.000173127
- Significance = 3.12139 +/- 0.0566416 sigma
- Number of Alt toys: 1000
- Number of Null toys: 30000
- Test statistic evaluated on data: 150
- CL_b: 0.0009 +/- 0.000173127
- CL_s+b: 0.533 +/- 0.0157769
- CL_s: 592.222 +/- 115.263
Real time 0:00:59, CP time 59.060
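The uncertainties printed in the Part 4 output appear consistent with simple counting errors. The sketch below is an interpretation of those numbers rather than the RooStats source: it assumes a binomial standard error on each toy-based p-value and relative errors added in quadrature for the ratio CL_s+b / CL_b. The function names are hypothetical.

```cpp
#include <cmath>

// Binomial standard error on a p-value estimated from nToys pseudo-experiments.
double pValueError(double p, double nToys)
{
   return std::sqrt(p * (1.0 - p) / nToys);
}

// Ratio CL_s+b / CL_b as printed in the "CL_s" line.
double clsRatio(double clsb, double clb)
{
   return clsb / clb;
}

// Uncertainty on the ratio, adding the two relative errors in quadrature.
double clsRatioError(double clsb, double clsbErr, double clb, double clbErr)
{
   const double relSb = clsbErr / clsb;
   const double relB = clbErr / clb;
   return clsRatio(clsb, clb) * std::sqrt(relSb * relSb + relB * relB);
}
```

With the printed inputs (CL_b = 0.0009 from 30000 null toys, CL_s+b = 0.533 from 1000 alt toys), `pValueError(0.0009, 30000)`, `pValueError(0.533, 1000)`, `clsRatio(0.533, 0.0009)` and `clsRatioError(0.533, 0.0157769, 0.0009, 0.000173127)` can be checked against the corresponding lines of the output.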
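using namespace RooFit;
using namespace RooStats;

// Test statistic that sums the values of one column over all rows of a dataset
class BinCountTestStat : public TestStatistic {
public:
   BinCountTestStat(void) : fColumnName("tmp") {}
   BinCountTestStat(string columnName) : fColumnName(columnName) {}

   virtual Double_t Evaluate(RooAbsData &data, RooArgSet & /*nullPOI*/)
   {
      // Main method of the TestStatistic interface
      Double_t value = 0.0;
      for (int i = 0; i < data.numEntries(); i++) {
         value += data.get(i)->getRealValue(fColumnName.c_str());
      }
      return value;
   }
   virtual const TString GetVarName() const { return fColumnName; }

private:
   string fColumnName;

protected:
   ClassDef(BinCountTestStat, 1)
};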
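void HybridStandardForm()
{
   TStopwatch t;
   t.Start();

   // Part 1: build the model in standard form
   RooWorkspace *w = new RooWorkspace("w");
   // m is a dummy discriminating variable with a uniform distribution
   w->factory("Uniform::f(m[0,1])");
   // Main measurement: the observed count x is encoded in the number of events
   w->factory("ExtendPdf::px(f,sum::splusb(s[0,0,100],b[100,0,300]))");
   // Auxiliary (sideband) measurement y with expectation tau*b
   w->factory("Poisson::py(y[100,0,500],prod::taub(tau[1.],b))");
   w->defineSet("poi", "s");
   w->defineSet("obs", "m");

   // Dataset with 150 rows, i.e. x = 150 observed events
   RooDataSet *data = w->pdf("px")->generate(*w->set("obs"), 150);

   // PROOF configuration with 4 workers, used by the Part 5 calculator
   ProofConfig *pc = new ProofConfig(*w, 4, "workers=4", kFALSE);

   // Part 3: analytic result for x = 150, b = 100, tau = 1
   double p_Bi = NumberCountingUtils::BinomialWithTauObsP(150, 100, 1);
   double Z_Bi = NumberCountingUtils::BinomialWithTauObsZ(150, 100, 1);
   std::cout << "-----------------------------------------" << std::endl;
   std::cout << "Part 3" << std::endl;
   std::cout << "Z_Bi p-value (analytic): " << p_Bi << std::endl;
   std::cout << "Z_Bi significance (analytic): " << Z_Bi << std::endl;
   t.Stop();
   t.Print();
   t.Start();

   // Background-only (null) model, with the snapshot taken at s = 0
   ModelConfig b_model("B_model", w);
   b_model.SetPdf(*w->pdf("px"));
   b_model.SetObservables(*w->set("obs"));
   b_model.SetParametersOfInterest(*w->set("poi"));
   w->var("s")->setVal(0.0);
   b_model.SetSnapshot(*w->set("poi"));

   // Signal-plus-background (alternate) model, with the snapshot taken at s = 50
   ModelConfig sb_model("S+B_model", w);
   sb_model.SetPdf(*w->pdf("px"));
   sb_model.SetObservables(*w->set("obs"));
   sb_model.SetParametersOfInterest(*w->set("poi"));
   w->var("s")->setVal(50.0);
   sb_model.SetSnapshot(*w->set("poi"));

   // Test statistic: the number of events in the dataset
   NumEventsTestStat eventCount(*w->pdf("px"));

   // Alternative priors for b, centered on y (the calculators below use py instead)
   w->factory("Gaussian::gauss_prior(b,y, expr::sqrty('sqrt(y)',y))");
   w->factory("Lognormal::lognorm_prior(b,y, expr::kappa('1+1./sqrt(y)',y))");

   // Part 4: hybrid calculator with the event-count test statistic
   HybridCalculator hc1(*data, sb_model, b_model);
   ToyMCSampler *toymcs1 = (ToyMCSampler *)hc1.GetTestStatSampler();
   toymcs1->SetTestStatistic(&eventCount);
   hc1.SetToys(30000, 1000); // 30000 null toys, 1000 alt toys
   hc1.ForcePriorNuisanceAlt(*w->pdf("py"));
   hc1.ForcePriorNuisanceNull(*w->pdf("py"));
   HypoTestResult *r1 = hc1.GetHypoTest();
   std::cout << "-----------------------------------------" << std::endl;
   std::cout << "Part 4" << std::endl;
   r1->Print();
   t.Stop();
   t.Print();
   t.Start();
   HypoTestPlot *p1 = new HypoTestPlot(*r1, 30); // 30 bins
   p1->Draw();

   return; // Part 5 below runs only if this return is removed

   // Part 5: hybrid calculator with the simple likelihood ratio test statistic
   SimpleLikelihoodRatioTestStat slrts(*b_model.GetPdf(), *sb_model.GetPdf());
   slrts.SetNullParameters(*b_model.GetSnapshot());
   slrts.SetAltParameters(*sb_model.GetSnapshot());
   HybridCalculator hc2(*data, sb_model, b_model);
   ToyMCSampler *toymcs2 = (ToyMCSampler *)hc2.GetTestStatSampler();
   toymcs2->SetTestStatistic(&slrts);
   hc2.SetToys(20000, 1000); // 20000 null toys, 1000 alt toys
   hc2.ForcePriorNuisanceAlt(*w->pdf("py"));
   hc2.ForcePriorNuisanceNull(*w->pdf("py"));
   if (pc)
      toymcs2->SetProofConfig(pc);
   HypoTestResult *r2 = hc2.GetHypoTest();
   std::cout << "-----------------------------------------" << std::endl;
   std::cout << "Part 5" << std::endl;
   r2->Print();
   HypoTestPlot *p2 = new HypoTestPlot(*r2, 30); // 30 bins
   p2->Draw();
}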
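The two alternative priors declared in the listing, `gauss_prior` and `lognorm_prior`, are both centered on the sideband count y. The stand-alone sketch below evaluates their densities outside RooFit to show what the factory expressions `sqrt(y)` and `1+1./sqrt(y)` imply. It assumes the lognormal is parameterized by its median y and a multiplicative shape parameter kappa, which is how RooLognormal is understood here; the function names are hypothetical.

```cpp
#include <cmath>

const double kTwoPi = 2.0 * std::acos(-1.0);

// Width of the Gaussian prior, matching expr::sqrty('sqrt(y)',y)
double gaussPriorSigma(double y)
{
   return std::sqrt(y);
}

// Shape parameter of the lognormal prior, matching expr::kappa('1+1./sqrt(y)',y)
double lognormalKappa(double y)
{
   return 1.0 + 1.0 / std::sqrt(y);
}

// Gaussian density in b with mean y and width sqrt(y)
double gaussPriorDensity(double b, double y)
{
   const double sigma = gaussPriorSigma(y);
   const double z = (b - y) / sigma;
   return std::exp(-0.5 * z * z) / (sigma * std::sqrt(kTwoPi));
}

// Lognormal density in b with median y and shape kappa (assumed parameterization)
double lognormalPriorDensity(double b, double y)
{
   const double lnK = std::log(lognormalKappa(y));
   const double z = std::log(b / y) / lnK;
   return std::exp(-0.5 * z * z) / (b * lnK * std::sqrt(kTwoPi));
}
```

For y = 100 this gives a Gaussian width of 10 and kappa = 1.1, i.e. roughly a 10% relative uncertainty on b in both cases. The lognormal density is defined only for b > 0, which is relevant to the range warnings in the output for parameters whose lower limit is 0.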
Authors: Kyle Cranmer, Wouter Verkerke, and Sven Kreiss
Definition in file HybridStandardForm.C.