NeymanConstruction.cxx
// @(#)root/roostats:$Id$
// Author: Kyle Cranmer   January 2009

/*************************************************************************
 * Copyright (C) 1995-2008, Rene Brun and Fons Rademakers.               *
 * All rights reserved.                                                  *
 *                                                                       *
 * For the licensing terms see $ROOTSYS/LICENSE.                         *
 * For the list of contributors see $ROOTSYS/README/CREDITS.             *
 *************************************************************************/

/** \class RooStats::NeymanConstruction
    \ingroup Roostats

NeymanConstruction is a concrete implementation of the IntervalCalculator
interface that, as the name suggests, performs a Neyman construction. It produces
a RooStats::PointSetInterval, which is a concrete implementation of the
ConfInterval interface.

The Neyman construction is not a uniquely defined statistical technique: it
requires that one specify an ordering rule or ordering principle, which is
usually encoded by choosing a specific test statistic and limits of integration
(corresponding to upper/lower/central limits). As a result, this class must be
configured with the corresponding information before it can produce an interval.
Common configurations, such as the Feldman-Cousins approach, can be enforced by
other lightweight classes.
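
As a minimal sketch of how the size and the limits of integration fix the two
thresholds, the standalone helper below splits the size between the lower and
upper tails of the test-statistic distribution. It is not part of RooStats:
`ThresholdQuantiles` is an illustrative name, and its `leftSideFraction`
argument mirrors the fLeftSideFraction member used in GetInterval(). A fraction
of 0 or 1 gives a one-sided region and 0.5 a central one.

```cpp
#include <utility>

// Illustrative helper (hypothetical, not a RooStats function): return the
// cumulative probabilities at which the lower and upper thresholds on the
// test statistic are placed for a given test size.
std::pair<double, double> ThresholdQuantiles(double size, double leftSideFraction)
{
   // Part of the size assigned to the lower tail.
   const double lower = leftSideFraction * size;
   // The remainder of the size is assigned to the upper tail.
   const double upper = 1. - (1. - leftSideFraction) * size;
   return {lower, upper};
}
```

For a size of 0.05 and a left-side fraction of 0.5, the thresholds sit at the
2.5% and 97.5% quantiles of the sampling distribution.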

The Neyman construction considers every point in the parameter space
independently; no assumption is made that the interval is connected or of a
particular shape. As a result, the PointSetInterval class is used to represent
the result. The user indicates the points in the parameter space at which to
perform the construction by providing a dataset (RooAbsData) that contains the
desired points.

This class is fairly lightweight, because the choice of parameter points to be
considered is factorized and so is the creation of the sampling distribution of
the test statistic (which is done by a concrete class implementing the
TestStatSampler interface). As a result, this class basically just drives the
construction by:

  - using a TestStatSampler to create the SamplingDistribution of a
    user-defined test statistic for each parameter point of interest,
  - defining the acceptance region in the data by finding the thresholds on the
    test statistic such that the integral of the sampling distribution is of the
    appropriate size and consistent with the limits of integration
    (e.g. upper/lower/central limits),
  - and finally updating the PointSetInterval based on whether the value of the
    test statistic evaluated on the data is in the acceptance region.
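
The three steps above can be sketched without RooStats at all. The following
standalone example is an illustration under simplifying assumptions, not the
implementation used by this class: the test statistic is taken to be the mean
of `nObs` unit-width Gaussian observations, the thresholds come from a simple
nearest-rank quantile rather than SamplingDistribution::InverseCDF, and all
function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Step 1 (stand-in for a test-statistic sampler): generate toy values of the
// test statistic, here the mean of nObs Gaussian observations centred on mu.
std::vector<double> SampleTestStat(double mu, int nObs, int nToys, std::mt19937 &rng)
{
   std::normal_distribution<double> gaus(mu, 1.);
   std::vector<double> toys(nToys);
   for (double &t : toys) {
      double sum = 0.;
      for (int i = 0; i < nObs; ++i) sum += gaus(rng);
      t = sum / nObs;
   }
   std::sort(toys.begin(), toys.end());
   return toys;
}

// Step 2 helper: nearest-rank empirical quantile of a sorted, non-empty sample.
double Quantile(const std::vector<double> &sorted, double p)
{
   const double pos = p * (sorted.size() - 1);
   return sorted[static_cast<std::size_t>(std::lround(pos))];
}

// Step 3: scan the candidate points and keep those whose acceptance region
// contains the observed value of the test statistic.
std::vector<double> AcceptedPoints(const std::vector<double> &points, double observed,
                                   double size, double leftSideFraction,
                                   int nObs, int nToys, unsigned seed)
{
   std::mt19937 rng(seed);
   std::vector<double> accepted;
   for (double mu : points) {
      const std::vector<double> toys = SampleTestStat(mu, nObs, nToys, rng);
      const double lower = Quantile(toys, leftSideFraction * size);
      const double upper = Quantile(toys, 1. - (1. - leftSideFraction) * size);
      if (observed >= lower && observed <= upper) accepted.push_back(mu);
   }
   return accepted;
}
```

For instance, scanning the points -3, 0 and 3 against an observed mean of 0.1,
with four observations per toy and a central 95% region, should keep only the
point 0, since 0.1 lies many standard deviations away from the other two.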
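
*/

#include "RooStats/NeymanConstruction.h"

#include "RooStats/PointSetInterval.h"

#include "RooStats/SamplingDistribution.h"
#include "RooStats/ToyMCSampler.h"
#include "RooStats/ModelConfig.h"
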
#include "RooMsgService.h"
#include "RooGlobalFunc.h"

#include "RooDataSet.h"
#include "TFile.h"
#include "TMath.h"
#include "TH1F.h"

ClassImp(RooStats::NeymanConstruction);

using namespace RooFit;
using namespace RooStats;
using std::endl, std::string;


////////////////////////////////////////////////////////////////////////////////
/// Constructor taking the dataset and the model configuration.

NeymanConstruction::NeymanConstruction(RooAbsData& data, ModelConfig& model):
  fSize(0.05),
  fData(data),
  fModel(model),
  fTestStatSampler(nullptr),
  fPointsToTest(nullptr),
  fLeftSideFraction(0),
  fConfBelt(nullptr), // constructed with tree data
  fAdaptiveSampling(false),
  fAdditionalNToysFactor(1.),
  fSaveBeltToFile(false),
  fCreateBelt(false)

{
//   fWS = new RooWorkspace();
//   fOwnsWorkspace = true;
//   fDataName = "";
//   fPdfName = "";
}

////////////////////////////////////////////////////////////////////////////////
/// Destructor. The cleanup below is currently disabled:
///   if(fOwnsWorkspace && fWS) delete fWS;
///   if(fConfBelt) delete fConfBelt;

NeymanConstruction::~NeymanConstruction() {
}

////////////////////////////////////////////////////////////////////////////////
/// Main interface to get a RooStats::ConfInterval.
/// It constructs a RooStats::PointSetInterval.

PointSetInterval* NeymanConstruction::GetInterval() const {

  TFile* f=nullptr;
  if(fSaveBeltToFile){
    //coverity[FORWARD_NULL]
    oocoutI(f,Contents) << "NeymanConstruction saving ConfidenceBelt to file SamplingDistributions.root" << endl;
    f = new TFile("SamplingDistributions.root","recreate");
  }

  Int_t npass = 0;
  RooArgSet* point;

  // strange problems when using snapshots.
  //  RooArgSet* fPOI = (RooArgSet*) fModel.GetParametersOfInterest()->snapshot();
  RooArgSet* fPOI = new RooArgSet(*fModel.GetParametersOfInterest());

  RooDataSet* pointsInInterval = new RooDataSet("pointsInInterval",
                                                 "points in interval",
                                                 *(fPointsToTest->get(0)) );

  // loop over points to test
  for(Int_t i=0; i<fPointsToTest->numEntries(); ++i){
    // get a parameter point from the list of points to test.
    point = const_cast<RooArgSet*>(fPointsToTest->get(i));//->clone("temp");

    // set parameters of interest to current point
    fPOI->assign(*point);

    // set test stat sampler to use this point
    fTestStatSampler->SetParametersForTestStat(*fPOI);

    // get the value of the test statistic for this data set
    double thisTestStatistic = fTestStatSampler->EvaluateTestStatistic(fData, *fPOI );
    /*
    cout << "NC CHECK: " << i << endl;
    point->Print();
    fPOI->Print("v");
    fData.Print();
    cout <<"thisTestStatistic = " << thisTestStatistic << endl;
    */

    // find the lower & upper thresholds on the test statistic that
    // define the acceptance region in the data

    SamplingDistribution* samplingDist=nullptr;
    double sigma;
    double upperEdgeOfAcceptance;
    double upperEdgeMinusSigma;
    double upperEdgePlusSigma;
    double lowerEdgeOfAcceptance;
    double lowerEdgeMinusSigma;
    double lowerEdgePlusSigma;
    Int_t additionalMC=0;

    // the adaptive sampling algorithm wants at least one toy event to be outside
    // of the requested pvalue including the sampling variation.  That leads to an equation
    // N-1 = (1-alpha)N + Z sqrt(N - (1-alpha)N) // for upper limit and
    // 1   = alpha N - Z sqrt(alpha N)  // for lower limit
    //
    // both reduce to alpha N - Z sqrt(alpha N) - 1 = 0; solving for N gives:
    // N = 1/alpha * [3 + sqrt(5)]/2 for Z = 1 (which is used currently)
    // i.e. N ~ 2.62/alpha; the first iteration below uses the more conservative N = 4/alpha.
    // For two-sided regions alpha is replaced by the smaller tail probability:
    // alpha*Min(leftsideFrac, 1.-leftsideFrac)
    // totalMC is doubled before the first call, so initialize it at half that value
    Int_t totalMC;
    if(fLeftSideFraction==0. || fLeftSideFraction ==1.){
      // one-sided region: avoid dividing by a zero tail fraction
      totalMC = (Int_t) (2./fSize);
    } else {
      totalMC = (Int_t) (2./fSize/std::min(fLeftSideFraction,1.-fLeftSideFraction));
    }
    // let the user scale the number of toys
    double tmc = double(totalMC)*fAdditionalNToysFactor;
    totalMC = (Int_t) tmc;

    ToyMCSampler* toyMCSampler = dynamic_cast<ToyMCSampler*>(fTestStatSampler);
    if(fAdaptiveSampling && toyMCSampler) {
      do{
        // this will be executed first, then the while condition is checked
        // as an exit condition for the loop.

        // the next line is where most of the time will be spent
        // generating the sampling dist of the test statistic.
        additionalMC = 2*totalMC; // append twice the current number of toys
        samplingDist =
          toyMCSampler->AppendSamplingDistribution(*point,
                                                   samplingDist,
                                                   additionalMC);
        if (!samplingDist) {
          oocoutE(nullptr,Eval) << "Neyman Construction: error generating sampling distribution" << endl;
          return nullptr;
        }
        totalMC=samplingDist->GetSize();

        //cout << "without sigma upper = " <<
        //samplingDist->InverseCDF( 1. - ((1.-fLeftSideFraction) * fSize) ) << endl;

        sigma = 1;
        upperEdgeOfAcceptance =
          samplingDist->InverseCDF( 1. - ((1.-fLeftSideFraction) * fSize) ,
                                    sigma, upperEdgePlusSigma);
        sigma = -1;
        samplingDist->InverseCDF( 1. - ((1.-fLeftSideFraction) * fSize) ,
                                  sigma, upperEdgeMinusSigma);

        sigma = 1;
        lowerEdgeOfAcceptance =
          samplingDist->InverseCDF( fLeftSideFraction * fSize ,
                                    sigma, lowerEdgePlusSigma);
        sigma = -1;
        samplingDist->InverseCDF( fLeftSideFraction * fSize ,
                                  sigma, lowerEdgeMinusSigma);

        ooccoutD(samplingDist,Eval) << "NeymanConstruction: "
             << "total MC = " << totalMC
             << " this test stat = " << thisTestStatistic << endl
             << " upper edge -1sigma = " << upperEdgeMinusSigma
             << ", upperEdge = "<<upperEdgeOfAcceptance
             << ", upper edge +1sigma = " << upperEdgePlusSigma << endl
             << " lower edge -1sigma = " << lowerEdgeMinusSigma
             << ", lowerEdge = "<<lowerEdgeOfAcceptance
             << ", lower edge +1sigma = " << lowerEdgePlusSigma << endl;
      } while((
                (thisTestStatistic <= upperEdgeOfAcceptance &&
                 thisTestStatistic > upperEdgeMinusSigma)
                || (thisTestStatistic >= upperEdgeOfAcceptance &&
                    thisTestStatistic < upperEdgePlusSigma)
                || (thisTestStatistic <= lowerEdgeOfAcceptance &&
                    thisTestStatistic > lowerEdgeMinusSigma)
                || (thisTestStatistic >= lowerEdgeOfAcceptance &&
                    thisTestStatistic < lowerEdgePlusSigma)
              ) && (totalMC < 100./fSize)
             ) ; // need ; here
    } else {
      // the next line is where most of the time will be spent
      // generating the sampling dist of the test statistic.
      samplingDist = fTestStatSampler->GetSamplingDistribution(*point);
      if (!samplingDist) {
        oocoutE(nullptr,Eval) << "Neyman Construction: error generating sampling distribution" << endl;
        return nullptr;
      }

      lowerEdgeOfAcceptance =
        samplingDist->InverseCDF( fLeftSideFraction * fSize );
      upperEdgeOfAcceptance =
        samplingDist->InverseCDF( 1. - ((1.-fLeftSideFraction) * fSize) );
    }

    // add acceptance region to ConfidenceBelt
    if(fConfBelt && fCreateBelt){
      //      cout << "conf belt set " << fConfBelt << endl;
      fConfBelt->AddAcceptanceRegion(*point, i,
                                     lowerEdgeOfAcceptance,
                                     upperEdgeOfAcceptance);
    }

    // print out some debug info
    ooccoutP(samplingDist,Eval) << "NeymanConstruction: Prog: "<< i+1<<"/"<<fPointsToTest->numEntries()
                                << " total MC = " << samplingDist->GetSize()
                                << " this test stat = " << thisTestStatistic << endl;
    ooccoutP(samplingDist,Eval) << " ";
    for (auto const *myarg : static_range_cast<RooRealVar *> (*point)){
      ooccoutP(samplingDist,Eval) << myarg->GetName() << "=" << myarg->getVal() << " ";
    }
    ooccoutP(samplingDist,Eval) << "[" << lowerEdgeOfAcceptance << ", "
                                << upperEdgeOfAcceptance << "] " << " in interval = " <<
      (thisTestStatistic >= lowerEdgeOfAcceptance && thisTestStatistic <= upperEdgeOfAcceptance)
                                << endl << endl;

    // Check if this data is in the acceptance region
    if(thisTestStatistic >= lowerEdgeOfAcceptance && thisTestStatistic <= upperEdgeOfAcceptance) {
      // if so, add this point to the interval
      //      fPointsToTest->add(*point, 1.); // this behaves differently for Hist and DataSet
      pointsInInterval->add(*point);
      ++npass;
    }

    if(fSaveBeltToFile){
      // write the sampling distribution and a histogram of it to file
      samplingDist->Write();
      string tmpName = "hist_";
      tmpName+=samplingDist->GetName();
      TH1F h{tmpName.c_str(),"",500,0.,5.};
      for(int ii=0; ii<samplingDist->GetSize(); ++ii){
        h.Fill(samplingDist->GetSamplingDistribution().at(ii) );
      }
      h.Write();
    }

    delete samplingDist;
    //    delete point; // from dataset
  }
  oocoutI(pointsInInterval,Eval) << npass << " points in interval" << endl;

  // create an interval based on pointsInInterval
  PointSetInterval* interval
    = new PointSetInterval("ClassicalConfidenceInterval", *pointsInInterval);


  if(fSaveBeltToFile){
    // write belt to file
    fConfBelt->Write();

    f->Close();
  }

  delete f;
  //delete data;
  return interval;
}