Logo ROOT  
Reference Guide
Loading...
Searching...
No Matches
NeymanConstruction.cxx
Go to the documentation of this file.
1// @(#)root/roostats:$Id$
2// Author: Kyle Cranmer January 2009
3
4/*************************************************************************
5 * Copyright (C) 1995-2008, Rene Brun and Fons Rademakers. *
6 * All rights reserved. *
7 * *
8 * For the licensing terms see $ROOTSYS/LICENSE. *
9 * For the list of contributors see $ROOTSYS/README/CREDITS. *
10 *************************************************************************/
11
12/** \class RooStats::NeymanConstruction
13 \ingroup Roostats
14
15NeymanConstruction is a concrete implementation of the NeymanConstruction
16interface that, as the name suggests, performs a NeymanConstruction. It produces
17a RooStats::PointSetInterval, which is a concrete implementation of the
18ConfInterval interface.
19
20The Neyman Construction is not a uniquely defined statistical technique, it
21requires that one specify an ordering rule or ordering principle, which is
22usually incoded by choosing a specific test statistic and limits of integration
23(corresponding to upper/lower/central limits). As a result, this class must be
24configured with the corresponding information before it can produce an interval.
25Common configurations, such as the Feldman-Cousins approach, can be enforced by
26other light weight classes.
27
28The Neyman Construction considers every point in the parameter space
29independently, no assumptions are made that the interval is connected or of a
30particular shape. As a result, the PointSetInterval class is used to represent
31the result. The user indicate which points in the parameter space to perform
32the construction by providing a PointSetInterval instance with the desired points.
33
34This class is fairly light weight, because the choice of parameter points to be
35considered is factorized and so is the creation of the sampling distribution of
36the test statistic (which is done by a concrete class implementing the
37DistributionCreator interface). As a result, this class basically just drives the
38construction by:
39
40 - using a DistributionCreator to create the SamplingDistribution of a user-
41 defined test statistic for each parameter point of interest,
42 - defining the acceptance region in the data by finding the thresholds on the
43 test statistic such that the integral of the sampling distribution is of the
44 appropriate size and consistent with the limits of integration
45 (eg. upper/lower/central limits),
46 - and finally updating the PointSetInterval based on whether the value of the
47 test statistic evaluated on the data are in the acceptance region.
48
49*/
50
52
54
56
60
61#include "RooMsgService.h"
62#include "RooGlobalFunc.h"
63
64#include "RooDataSet.h"
65#include "TFile.h"
66#include "TMath.h"
67#include "TH1F.h"
68
69
70using namespace RooFit;
71using namespace RooStats;
72using std::endl, std::string;
73
74
75////////////////////////////////////////////////////////////////////////////////
76/// default constructor
77
79 fSize(0.05),
80 fData(data),
81 fModel(model),
82 fTestStatSampler(nullptr),
83 fPointsToTest(nullptr),
85 fConfBelt(nullptr), // constructed with tree data
86 fAdaptiveSampling(false),
88 fSaveBeltToFile(false),
89 fCreateBelt(false)
90
91{
92// fWS = new RooWorkspace();
93// fOwnsWorkspace = true;
94// fDataName = "";
95// fPdfName = "";
96}
97
98////////////////////////////////////////////////////////////////////////////////
99/// default constructor
100/// if(fOwnsWorkspace && fWS) delete fWS;
101/// if(fConfBelt) delete fConfBelt;
102
105
106////////////////////////////////////////////////////////////////////////////////
107/// Main interface to get a RooStats::ConfInterval.
108/// It constructs a RooStats::SetInterval.
109
111
112 TFile* f=nullptr;
113 if(fSaveBeltToFile){
114 //coverity[FORWARD_NULL]
115 oocoutI(f,Contents) << "NeymanConstruction saving ConfidenceBelt to file SamplingDistributions.root" << std::endl;
116 f = new TFile("SamplingDistributions.root","recreate");
117 }
118
119 Int_t npass = 0;
120 RooArgSet* point;
121
122 // strange problems when using snapshots.
123 // RooArgSet* fPOI = (RooArgSet*) fModel.GetParametersOfInterest()->snapshot();
124 RooArgSet* fPOI = new RooArgSet(*fModel.GetParametersOfInterest());
125
126 RooDataSet* pointsInInterval = new RooDataSet("pointsInInterval",
127 "points in interval",
128 *(fPointsToTest->get(0)) );
129
130 // loop over points to test
131 for(Int_t i=0; i<fPointsToTest->numEntries(); ++i){
132 // get a parameter point from the list of points to test.
133 point = const_cast<RooArgSet*>(fPointsToTest->get(i));//->clone("temp");
134
135 // set parameters of interest to current point
136 fPOI->assign(*point);
137
138 // set test stat sampler to use this point
139 fTestStatSampler->SetParametersForTestStat(*fPOI);
140
141 // get the value of the test statistic for this data set
142 double thisTestStatistic = fTestStatSampler->EvaluateTestStatistic(fData, *fPOI );
143 /*
144 std::cout << "NC CHECK: " << i << std::endl;
145 point->Print();
146 fPOI->Print("v");
147 fData.Print();
148 std::cout <<"thisTestStatistic = " << thisTestStatistic << std::endl;
149 */
150
151 // find the lower & upper thresholds on the test statistic that
152 // define the acceptance region in the data
153
154 SamplingDistribution* samplingDist=nullptr;
155 double sigma;
156 double upperEdgeOfAcceptance;
157 double upperEdgeMinusSigma;
158 double upperEdgePlusSigma;
159 double lowerEdgeOfAcceptance;
160 double lowerEdgeMinusSigma;
161 double lowerEdgePlusSigma;
162 Int_t additionalMC=0;
163
164 // the adaptive sampling algorithm wants at least one toy event to be outside
165 // of the requested pvalue including the sampling variation. That leads to an equation
166 // N-1 = (1-alpha)N + Z sqrt(N - (1-alpha)N) // for upper limit and
167 // 1 = alpha N - Z sqrt(alpha N) // for lower limit
168 //
169 // solving for N gives:
170 // N = 1/alpha * [3/2 + sqrt(5)] for Z = 1 (which is used currently)
171 // thus, a good guess for the first iteration of events is N=3.73/alpha~4/alpha
172 // should replace alpha here by smaller tail probability: eg. alpha*Min(leftsideFrac, 1.-leftsideFrac)
173 // totalMC will be incremented by 2 before first call, so initiated it at half the value
174 Int_t totalMC = (Int_t) (2./fSize/std::min(fLeftSideFraction,1.-fLeftSideFraction));
175 if(fLeftSideFraction==0. || fLeftSideFraction ==1.){
176 totalMC = (Int_t) (2./fSize);
177 }
178 // use control
179 double tmc = double(totalMC)*fAdditionalNToysFactor;
180 totalMC = (Int_t) tmc;
181
182 ToyMCSampler* toyMCSampler = dynamic_cast<ToyMCSampler*>(fTestStatSampler);
183 if(fAdaptiveSampling && toyMCSampler) {
184 do{
185 // this will be executed first, then while conditioned checked
186 // as an exit condition for the loop.
187
188 // the next line is where most of the time will be spent
189 // generating the sampling dist of the test statistic.
190 additionalMC = 2*totalMC; // grow by a factor of two
191 samplingDist =
192 toyMCSampler->AppendSamplingDistribution(*point,
193 samplingDist,
194 additionalMC);
195 if (!samplingDist) {
196 oocoutE(nullptr,Eval) << "Neyman Construction: error generating sampling distribution" << std::endl;
197 return nullptr;
198 }
199 totalMC=samplingDist->GetSize();
200
201 //cout << "without sigma upper = " <<
202 //samplingDist->InverseCDF( 1. - ((1.-fLeftSideFraction) * fSize) ) << std::endl;
203
204 sigma = 1;
205 upperEdgeOfAcceptance =
206 samplingDist->InverseCDF( 1. - ((1.-fLeftSideFraction) * fSize) ,
207 sigma, upperEdgePlusSigma);
208 sigma = -1;
209 samplingDist->InverseCDF( 1. - ((1.-fLeftSideFraction) * fSize) ,
210 sigma, upperEdgeMinusSigma);
211
212 sigma = 1;
213 lowerEdgeOfAcceptance =
214 samplingDist->InverseCDF( fLeftSideFraction * fSize ,
215 sigma, lowerEdgePlusSigma);
216 sigma = -1;
217 samplingDist->InverseCDF( fLeftSideFraction * fSize ,
218 sigma, lowerEdgeMinusSigma);
219
220 ooccoutD(samplingDist,Eval) << "NeymanConstruction: "
221 << "total MC = " << totalMC
222 << " this test stat = " << thisTestStatistic << std::endl
223 << " upper edge -1sigma = " << upperEdgeMinusSigma
224 << ", upperEdge = "<<upperEdgeOfAcceptance
225 << ", upper edge +1sigma = " << upperEdgePlusSigma << std::endl
226 << " lower edge -1sigma = " << lowerEdgeMinusSigma
227 << ", lowerEdge = "<<lowerEdgeOfAcceptance
228 << ", lower edge +1sigma = " << lowerEdgePlusSigma << std::endl;
229 } while((
230 (thisTestStatistic <= upperEdgeOfAcceptance &&
231 thisTestStatistic > upperEdgeMinusSigma)
232 || (thisTestStatistic >= upperEdgeOfAcceptance &&
233 thisTestStatistic < upperEdgePlusSigma)
234 || (thisTestStatistic <= lowerEdgeOfAcceptance &&
235 thisTestStatistic > lowerEdgeMinusSigma)
236 || (thisTestStatistic >= lowerEdgeOfAcceptance &&
237 thisTestStatistic < lowerEdgePlusSigma)
238 ) && (totalMC < 100./fSize)
239 ) ; // need ; here
240 } else {
241 // the next line is where most of the time will be spent
242 // generating the sampling dist of the test statistic.
243 samplingDist = fTestStatSampler->GetSamplingDistribution(*point);
244 if (!samplingDist) {
245 oocoutE(nullptr,Eval) << "Neyman Construction: error generating sampling distribution" << std::endl;
246 return nullptr;
247 }
248
249 lowerEdgeOfAcceptance =
250 samplingDist->InverseCDF( fLeftSideFraction * fSize );
251 upperEdgeOfAcceptance =
252 samplingDist->InverseCDF( 1. - ((1.-fLeftSideFraction) * fSize) );
253 }
254
255 // add acceptance region to ConfidenceBelt
256 if(fConfBelt && fCreateBelt){
257 // std::cout << "conf belt set " << fConfBelt << std::endl;
258 fConfBelt->AddAcceptanceRegion(*point, i,
259 lowerEdgeOfAcceptance,
260 upperEdgeOfAcceptance);
261 }
262
263 // printout some debug info
264 ooccoutP(samplingDist,Eval) << "NeymanConstruction: Prog: "<< i+1<<"/"<<fPointsToTest->numEntries()
265 << " total MC = " << samplingDist->GetSize()
266 << " this test stat = " << thisTestStatistic << std::endl;
267 ooccoutP(samplingDist,Eval) << " ";
268 for (auto const *myarg : static_range_cast<RooRealVar *> (*point)){
269 ooccoutP(samplingDist,Eval) << myarg->GetName() << "=" << myarg->getVal() << " ";
270 }
271 ooccoutP(samplingDist,Eval) << "[" << lowerEdgeOfAcceptance << ", "
272 << upperEdgeOfAcceptance << "] " << " in interval = " <<
273 (thisTestStatistic >= lowerEdgeOfAcceptance && thisTestStatistic <= upperEdgeOfAcceptance)
274 << std::endl << std::endl;
275
276 // Check if this data is in the acceptance region
277 if(thisTestStatistic >= lowerEdgeOfAcceptance && thisTestStatistic <= upperEdgeOfAcceptance) {
278 // if so, set this point to true
279 // fPointsToTest->add(*point, 1.); // this behaves differently for Hist and DataSet
280 pointsInInterval->add(*point);
281 ++npass;
282 }
283
284 if(fSaveBeltToFile){
285 //write to file
286 samplingDist->Write();
287 string tmpName = "hist_";
288 tmpName+=samplingDist->GetName();
289 TH1F h{tmpName.c_str(),"",500,0.,5.};
290 for(int ii=0; ii<samplingDist->GetSize(); ++ii){
291 h.Fill(samplingDist->GetSamplingDistribution().at(ii) );
292 }
293 h.Write();
294 }
295
296 delete samplingDist;
297 // delete point; // from dataset
298 }
299 oocoutI(pointsInInterval,Eval) << npass << " points in interval" << std::endl;
300
301 // create an interval based pointsInInterval
302 PointSetInterval* interval
303 = new PointSetInterval("ClassicalConfidenceInterval", *pointsInInterval);
304
305
306 if(fSaveBeltToFile){
307 // write belt to file
308 fConfBelt->Write();
309
310 f->Close();
311 }
312
313 delete f;
314 //delete data;
315 return interval;
316}
#define f(i)
Definition RSha256.hxx:104
#define h(i)
Definition RSha256.hxx:106
ROOT::RRangeCast< T, false, Range_t > static_range_cast(Range_t &&coll)
#define oocoutE(o, a)
#define oocoutI(o, a)
#define ooccoutP(o, a)
#define ooccoutD(o, a)
int Int_t
Signed integer 4 bytes (int).
Definition RtypesCore.h:59
void assign(const RooAbsCollection &other) const
Sets the value, cache and constant attribute of any argument in our set that also appears in the othe...
Abstract base class for binned and unbinned datasets.
Definition RooAbsData.h:56
RooArgSet is a container object that can hold multiple RooAbsArg objects.
Definition RooArgSet.h:24
Container class to hold unbinned data.
Definition RooDataSet.h:32
void add(const RooArgSet &row, double weight, double weightError)
Add one ore more rows of data.
< A class that holds configuration information for a model using a workspace as a store
Definition ModelConfig.h:34
bool fAdaptiveSampling
controls use of adaptive sampling algorithm
double fSize
size of the test (eg. specified rate of Type I error)
PointSetInterval * GetInterval() const override
Main interface to get a ConfInterval (will be a PointSetInterval).
~NeymanConstruction() override
default constructor if(fOwnsWorkspace && fWS) delete fWS; if(fConfBelt) delete fConfBelt;
bool fSaveBeltToFile
controls use if ConfidenceBelt should be saved to a TFile
NeymanConstruction(RooAbsData &data, ModelConfig &model)
default constructor
double fAdditionalNToysFactor
give user ability to ask for more toys
bool fCreateBelt
controls use if ConfidenceBelt should be saved to a TFile
PointSetInterval is a concrete implementation of the ConfInterval interface.
This class simply holds a sampling distribution of some test statistic.
Int_t GetSize() const
size of samples
double InverseCDF(double pvalue)
get the inverse of the Cumulative distribution function
const std::vector< double > & GetSamplingDistribution() const
Get test statistics values.
ToyMCSampler is an implementation of the TestStatSampler interface.
virtual SamplingDistribution * AppendSamplingDistribution(RooArgSet &allParameters, SamplingDistribution *last, Int_t additionalMC)
Extended interface to append to sampling distribution more samples.
A file, usually with extension .root, that stores data and code in the form of serialized objects in ...
Definition TFile.h:130
1-D histogram with a float per channel (see TH1 documentation)
Definition TH1.h:878
const char * GetName() const override
Returns name of object.
Definition TNamed.h:49
virtual Int_t Write(const char *name=nullptr, Int_t option=0, Int_t bufsize=0)
Write this object to the current directory.
Definition TObject.cxx:989
const Double_t sigma
The namespace RooFit contains mostly switches that change the behaviour of functions of PDFs (or othe...
Definition CodegenImpl.h:72
Namespace for the RooStats classes.
Definition CodegenImpl.h:66