ROOT 6.18/05 Reference Guide
MethodLikelihood.cxx
1// @(#)root/tmva $Id$
2// Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss, Eckhard von Toerne, Jan Therhaag
3
4/**********************************************************************************
5 * Project: TMVA - a Root-integrated toolkit for multivariate Data analysis *
6 * Package: TMVA *
7 * Class : TMVA::MethodLikelihood *
8 * Web : http://tmva.sourceforge.net *
9 * *
10 * Description: *
11 * Implementation (see header for description) *
12 * *
13 * Authors (alphabetical): *
14 * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
15 * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
16 * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
17 * Jan Therhaag <Jan.Therhaag@cern.ch> - U of Bonn, Germany *
18 * Eckhard v. Toerne <evt@uni-bonn.de> - U. of Bonn, Germany *
19 * *
20 * Copyright (c) 2005-2011: *
21 * CERN, Switzerland *
22 * U. of Victoria, Canada *
23 * MPI-K Heidelberg, Germany *
24 * U. of Bonn, Germany *
25 * *
26 * Redistribution and use in source and binary forms, with or without *
27 * modification, are permitted according to the terms listed in LICENSE *
28 * (http://tmva.sourceforge.net/LICENSE) *
29 **********************************************************************************/
30
31/*! \class TMVA::MethodLikelihood
32\ingroup TMVA
33
34Likelihood analysis ("non-parametric approach")
35
36
37Also implemented is a "diagonalized likelihood approach",
38which improves over the uncorrelated likelihood approach by
39transforming linearly the input variables into a diagonal space,
40using the square-root of the covariance matrix.
41
42
43The method of maximum likelihood is the most straightforward, and
44certainly among the most elegant multivariate analyser approaches.
45We define the likelihood ratio, \f$ R_L \f$, for event
46\f$ i \f$, by:
47
48\f[
49R_L(i) = \frac{L_S(i)}{L_S(i) + L_B(i)}
50\f]
51
52Here the signal and background likelihoods, \f$ L_S \f$,
53\f$ L_B \f$, are products of the corresponding probability
54densities, \f$ p_S \f$, \f$ p_B \f$, of the
55\f$ N_{var} \f$ discriminating variables used in the MVA:
56
57\f[
58L_S(i) = \prod_{j=1}^{N_{var}} p_{Sj} (i)
59\f]
60
61and accordingly for \f$ L_B \f$.
62In practice, TMVA uses polynomial splines to estimate the probability
63density functions (PDF) obtained from the distributions of the
64training variables.
65
66
67Note that in TMVA the output of the likelihood ratio is transformed by:
68
69\f[
70R_L(i) \to R'_L(i) = -\frac{1}{\tau} \ln(R_L^{-1}(i) - 1)
71\f]
72
73to avoid the occurrence of heavy peaks at \f$ R_L = 0, 1 \f$ .
74
75#### Decorrelated (or "diagonalized") Likelihood
76
77The biggest drawback of the Likelihood approach is that it assumes
78that the discriminant variables are uncorrelated. If it were the case,
79it can be proven that the discrimination obtained by the above likelihood
80ratio is optimal, i.e., no other method can beat it. However, in most
81practical applications of MVAs correlations are present.
82
83
84Linear correlations, measured from the training sample, can be taken
85into account in a straightforward manner through the square-root
86of the covariance matrix. The square-root of a matrix
87\f$ C \f$ is the matrix \f$ C' \f$ that multiplied with itself
88yields \f$ C \f$: \f$ C = C'C' \f$. We compute the
89square-root matrix (SQM) by means of diagonalising (\f$ D \f$) the
90covariance matrix:
91
92\f[
93D = S^T C S \Rightarrow C' = S \sqrt{D} S^T
94\f]
95
96and the linear transformation from the correlated into the
97uncorrelated variable space is then given by multiplying the measured
98variable tuple by the inverse of the SQM. Note that these transformations
99are performed for both signal and background separately, since the
100correlation pattern is not the same in the two samples.
101
102
103The above diagonalisation is complete for linearly correlated,
104Gaussian distributed variables only. In real-world examples this
105is often not the case, so that only little additional information
106may be recovered by the diagonalisation procedure. In these cases,
107non-linear methods must be applied.
108*/
109
110#include "TMVA/MethodLikelihood.h"
111
112#include "TMVA/Configurable.h"
113#include "TMVA/ClassifierFactory.h"
114#include "TMVA/DataSet.h"
115#include "TMVA/DataSetInfo.h"
116#include "TMVA/IMethod.h"
117#include "TMVA/MethodBase.h"
118#include "TMVA/MsgLogger.h"
119#include "TMVA/PDF.h"
120#include "TMVA/Ranking.h"
121#include "TMVA/Tools.h"
122#include "TMVA/Types.h"
123#include "TMVA/VariableInfo.h"
124
125#include "TMatrixD.h"
126#include "TVector.h"
127#include "TMath.h"
128#include "TObjString.h"
129#include "TFile.h"
130#include "TKey.h"
131#include "TH1.h"
132#include "TClass.h"
133#include "Riostream.h"
134
135#include <iomanip>
136#include <vector>
137#include <cstdlib>
138
139REGISTER_METHOD(Likelihood)
140
141ClassImp(TMVA::MethodLikelihood);
142
143////////////////////////////////////////////////////////////////////////////////
144/// standard constructor
145
146TMVA::MethodLikelihood::MethodLikelihood( const TString& jobName,
147 const TString& methodTitle,
148 DataSetInfo& theData,
149 const TString& theOption ) :
150 TMVA::MethodBase( jobName, Types::kLikelihood, methodTitle, theData, theOption),
151 fEpsilon ( 1.e3 * DBL_MIN ),
152 fTransformLikelihoodOutput( kFALSE ),
153 fDropVariable ( 0 ),
154 fHistSig ( 0 ),
155 fHistBgd ( 0 ),
156 fHistSig_smooth( 0 ),
157 fHistBgd_smooth( 0 ),
158 fDefaultPDFLik ( 0 ),
159 fPDFSig ( 0 ),
160 fPDFBgd ( 0 ),
161 fNsmooth ( 2 ),
162 fNsmoothVarS ( 0 ),
163 fNsmoothVarB ( 0 ),
164 fAverageEvtPerBin( 0 ),
165 fAverageEvtPerBinVarS (0),
166 fAverageEvtPerBinVarB (0),
167 fKDEfineFactor ( 0 ),
168 fInterpolateString(0)
169{
170}
171
172////////////////////////////////////////////////////////////////////////////////
173/// construct likelihood references from file
174
175TMVA::MethodLikelihood::MethodLikelihood( DataSetInfo& theData,
176 const TString& theWeightFile) :
177 TMVA::MethodBase( Types::kLikelihood, theData, theWeightFile),
178 fEpsilon ( 1.e3 * DBL_MIN ),
179 fTransformLikelihoodOutput( kFALSE ),
180 fDropVariable ( 0 ),
181 fHistSig ( 0 ),
182 fHistBgd ( 0 ),
183 fHistSig_smooth( 0 ),
184 fHistBgd_smooth( 0 ),
185 fDefaultPDFLik ( 0 ),
186 fPDFSig ( 0 ),
187 fPDFBgd ( 0 ),
188 fNsmooth ( 2 ),
189 fNsmoothVarS ( 0 ),
190 fNsmoothVarB ( 0 ),
191 fAverageEvtPerBin( 0 ),
192 fAverageEvtPerBinVarS (0),
193 fAverageEvtPerBinVarB (0),
194 fKDEfineFactor ( 0 ),
195 fInterpolateString(0)
196{
197}
198
199////////////////////////////////////////////////////////////////////////////////
200/// destructor
201
202TMVA::MethodLikelihood::~MethodLikelihood( void )
203{
204 if (NULL != fDefaultPDFLik) delete fDefaultPDFLik;
205 if (NULL != fHistSig) delete fHistSig;
206 if (NULL != fHistBgd) delete fHistBgd;
207 if (NULL != fHistSig_smooth) delete fHistSig_smooth;
208 if (NULL != fHistBgd_smooth) delete fHistBgd_smooth;
209 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
210 if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
211 if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
212 }
213 if (NULL != fPDFSig) delete fPDFSig;
214 if (NULL != fPDFBgd) delete fPDFBgd;
215}
216
217////////////////////////////////////////////////////////////////////////////////
218/// Likelihood can handle classification with 2 classes
219
220Bool_t TMVA::MethodLikelihood::HasAnalysisType( Types::EAnalysisType type,
221 UInt_t numberClasses, UInt_t /*numberTargets*/ )
222{
223 if (type == Types::kClassification && numberClasses == 2) return kTRUE;
224 return kFALSE;
225}
226
227////////////////////////////////////////////////////////////////////////////////
228/// default initialisation called by all constructors
229
230void TMVA::MethodLikelihood::Init( void )
231{
232 // no ranking test
233 fDropVariable = -1;
234 fHistSig = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
235 fHistBgd = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
236 fHistSig_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
237 fHistBgd_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
238 fPDFSig = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
239 fPDFBgd = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
240}
241
242////////////////////////////////////////////////////////////////////////////////
243/// define the options (their key words) that can be set in the option string
244///
245/// TransformOutput <bool> transform (often strongly peaked) likelihood output through sigmoid inversion
246
247void TMVA::MethodLikelihood::DeclareOptions()
248{
249 DeclareOptionRef( fTransformLikelihoodOutput = kFALSE, "TransformOutput",
250 "Transform likelihood output by inverse sigmoid function" );
251
252 // initialize
253
254 // reading every PDF's definition and passing the option string to the next one to be read and marked
255 TString updatedOptions = GetOptions();
256 fDefaultPDFLik = new PDF( TString(GetName()) + " PDF", updatedOptions );
257 fDefaultPDFLik->DeclareOptions();
258 fDefaultPDFLik->ParseOptions();
259 updatedOptions = fDefaultPDFLik->GetOptions();
260 for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
261 (*fPDFSig)[ivar] = new PDF( Form("%s PDF Sig[%d]", GetName(), ivar), updatedOptions,
262 Form("Sig[%d]",ivar), fDefaultPDFLik );
263 (*fPDFSig)[ivar]->DeclareOptions();
264 (*fPDFSig)[ivar]->ParseOptions();
265 updatedOptions = (*fPDFSig)[ivar]->GetOptions();
266 (*fPDFBgd)[ivar] = new PDF( Form("%s PDF Bkg[%d]", GetName(), ivar), updatedOptions,
267 Form("Bkg[%d]",ivar), fDefaultPDFLik );
268 (*fPDFBgd)[ivar]->DeclareOptions();
269 (*fPDFBgd)[ivar]->ParseOptions();
270 updatedOptions = (*fPDFBgd)[ivar]->GetOptions();
271 }
272
273 // the final marked option string is written back to the original likelihood
274 SetOptions( updatedOptions );
275}
276
277
278void TMVA::MethodLikelihood::DeclareCompatibilityOptions()
279{
280 // options that are used ONLY for the READER to ensure backward compatibility
281
282 MethodBase::DeclareCompatibilityOptions();
283 DeclareOptionRef( fNsmooth = 1, "NSmooth",
284 "Number of smoothing iterations for the input histograms");
285 DeclareOptionRef( fAverageEvtPerBin = 50, "NAvEvtPerBin",
286 "Average number of events per PDF bin");
287 DeclareOptionRef( fKDEfineFactor =1. , "KDEFineFactor",
288 "Fine tuning factor for Adaptive KDE: Factor to multiply the width of the kernel");
289 DeclareOptionRef( fBorderMethodString = "None", "KDEborder",
290 "Border effects treatment (1=no treatment , 2=kernel renormalization, 3=sample mirroring)" );
291 DeclareOptionRef( fKDEiterString = "Nonadaptive", "KDEiter",
292 "Number of iterations (1=non-adaptive, 2=adaptive)" );
293 DeclareOptionRef( fKDEtypeString = "Gauss", "KDEtype",
294 "KDE kernel type (1=Gauss)" );
295 fAverageEvtPerBinVarS = new Int_t[GetNvar()];
296 fAverageEvtPerBinVarB = new Int_t[GetNvar()];
297 fNsmoothVarS = new Int_t[GetNvar()];
298 fNsmoothVarB = new Int_t[GetNvar()];
299 fInterpolateString = new TString[GetNvar()];
300 for(UInt_t i=0; i<GetNvar(); ++i) {
301 fAverageEvtPerBinVarS[i] = fAverageEvtPerBinVarB[i] = 0;
302 fNsmoothVarS[i] = fNsmoothVarB[i] = 0;
303 fInterpolateString[i] = "";
304 }
305 DeclareOptionRef( fAverageEvtPerBinVarS, GetNvar(), "NAvEvtPerBinSig",
306 "Average num of events per PDF bin and variable (signal)");
307 DeclareOptionRef( fAverageEvtPerBinVarB, GetNvar(), "NAvEvtPerBinBkg",
308 "Average num of events per PDF bin and variable (background)");
309 DeclareOptionRef(fNsmoothVarS, GetNvar(), "NSmoothSig",
310 "Number of smoothing iterations for the input histograms");
311 DeclareOptionRef(fNsmoothVarB, GetNvar(), "NSmoothBkg",
312 "Number of smoothing iterations for the input histograms");
313 DeclareOptionRef(fInterpolateString, GetNvar(), "PDFInterpol", "Method of interpolating reference histograms (e.g. Spline2 or KDE)");
314}
315
316////////////////////////////////////////////////////////////////////////////////
317/// process user options
318/// reference cut value to distinguish signal-like from background-like events
319
320void TMVA::MethodLikelihood::ProcessOptions()
321{
322 SetSignalReferenceCut( TransformLikelihoodOutput( 0.5, 0.5 ) );
323
324 fDefaultPDFLik->ProcessOptions();
325 for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
326 (*fPDFBgd)[ivar]->ProcessOptions();
327 (*fPDFSig)[ivar]->ProcessOptions();
328 }
329}
330
331////////////////////////////////////////////////////////////////////////////////
332/// create reference distributions (PDFs) from signal and background events:
333/// fill histograms and smooth them; if decorrelation is required, compute
334/// corresponding square-root matrices
335/// the reference histograms require the correct boundaries. Since in Likelihood classification
336/// the transformations are applied using both classes, also the corresponding boundaries
337/// need to take this into account
338
339void TMVA::MethodLikelihood::Train( void )
340{
341 UInt_t nvar=GetNvar();
342 std::vector<Double_t> xmin(nvar), xmax(nvar);
343 for (UInt_t ivar=0; ivar<nvar; ivar++) {xmin[ivar]=1e30; xmax[ivar]=-1e30;}
344
345 UInt_t nevents=Data()->GetNEvents();
346 for (UInt_t ievt=0; ievt<nevents; ievt++) {
347 // use the true-event-type's transformation
348 // set the event true event types transformation
349 const Event* origEv = Data()->GetEvent(ievt);
350 if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
351 // loop over classes
352 for (int cls=0;cls<2;cls++){
353 GetTransformationHandler().SetTransformationReferenceClass(cls);
354 const Event* ev = GetTransformationHandler().Transform( origEv );
355 for (UInt_t ivar=0; ivar<nvar; ivar++) {
356 Float_t value = ev->GetValue(ivar);
357 if (value < xmin[ivar]) xmin[ivar] = value;
358 if (value > xmax[ivar]) xmax[ivar] = value;
359 }
360 }
361 }
362
363 // create reference histograms
364 // (KDE smoothing requires very finely binned reference histograms)
365 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
366 TString var = (*fInputVars)[ivar];
367
368 // the reference histograms require the correct boundaries. Since in Likelihood classification
369 // the transformations are applied using both classes, also the corresponding boundaries
370 // need to take this into account
371
372 // special treatment for discrete variables
373 if (DataInfo().GetVariableInfo(ivar).GetVarType() == 'I') {
374 // special treatment for integer variables
375 Int_t ixmin = TMath::Nint( xmin[ivar] );
376 xmax[ivar]=xmax[ivar]+1; // make sure that all entries are included in histogram
377 Int_t ixmax = TMath::Nint( xmax[ivar] );
378 Int_t nbins = ixmax - ixmin;
379 (*fHistSig)[ivar] = new TH1F(GetMethodName()+"_"+var + "_sig", var + " signal training", nbins, ixmin, ixmax );
380 (*fHistBgd)[ivar] = new TH1F(GetMethodName()+"_"+var + "_bgd", var + " background training", nbins, ixmin, ixmax );
381 } else {
382
383 UInt_t minNEvt = TMath::Min(Data()->GetNEvtSigTrain(),Data()->GetNEvtBkgdTrain());
384 Int_t nbinsS = (*fPDFSig)[ivar]->GetHistNBins( minNEvt );
385 Int_t nbinsB = (*fPDFBgd)[ivar]->GetHistNBins( minNEvt );
386
387 (*fHistSig)[ivar] = new TH1F( Form("%s_%s_%s_sig",DataInfo().GetName(),GetMethodName().Data(),var.Data()),
388 Form("%s_%s_%s signal training",DataInfo().GetName(),GetMethodName().Data(),var.Data()), nbinsS, xmin[ivar], xmax[ivar] );
389 (*fHistBgd)[ivar] = new TH1F( Form("%s_%s_%s_bgd",DataInfo().GetName(),GetMethodName().Data(),var.Data()),
390 Form("%s_%s_%s background training",DataInfo().GetName(),GetMethodName().Data(),var.Data()), nbinsB, xmin[ivar], xmax[ivar] );
391 }
392 }
393
394 // ----- fill the reference histograms
395 Log() << kINFO << "Filling reference histograms" << Endl;
396
397 // event loop
398 for (Int_t ievt=0; ievt<Data()->GetNEvents(); ievt++) {
399
400 // use the true-event-type's transformation
401 // set the event true event types transformation
402 const Event* origEv = Data()->GetEvent(ievt);
403 if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
404 GetTransformationHandler().SetTransformationReferenceClass( origEv->GetClass() );
405 const Event* ev = GetTransformationHandler().Transform( origEv );
406
407 // the event weight
408 Float_t weight = ev->GetWeight();
409
410 // fill variable vector
411 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
412 Double_t value = ev->GetValue(ivar);
413 // verify limits
414 if (value >= xmax[ivar]) value = xmax[ivar] - 1.0e-10;
415 else if (value < xmin[ivar]) value = xmin[ivar] + 1.0e-10;
416 // inserting check if there are events in overflow or underflow
417 if (value >=(*fHistSig)[ivar]->GetXaxis()->GetXmax() ||
418 value <(*fHistSig)[ivar]->GetXaxis()->GetXmin()){
419 Log()<<kWARNING
420 <<"error in filling likelihood reference histograms var="
421 <<(*fInputVars)[ivar]
422 << ", xmin="<<(*fHistSig)[ivar]->GetXaxis()->GetXmin()
423 << ", value="<<value
424 << ", xmax="<<(*fHistSig)[ivar]->GetXaxis()->GetXmax()
425 << Endl;
426 }
427 if (DataInfo().IsSignal(ev)) (*fHistSig)[ivar]->Fill( value, weight );
428 else (*fHistBgd)[ivar]->Fill( value, weight );
429 }
430 }
431
432 // building the pdfs
433 Log() << kINFO << "Building PDF out of reference histograms" << Endl;
434 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
435
436 // the PDF is built from (binned) reference histograms
437 // in case of KDE, this has a large number of bins, which makes it quasi-unbinned
438 (*fPDFSig)[ivar]->BuildPDF( (*fHistSig)[ivar] );
439 (*fPDFBgd)[ivar]->BuildPDF( (*fHistBgd)[ivar] );
440
441 (*fPDFSig)[ivar]->ValidatePDF( (*fHistSig)[ivar] );
442 (*fPDFBgd)[ivar]->ValidatePDF( (*fHistBgd)[ivar] );
443
444 // saving the smoothed histograms
445 if ((*fPDFSig)[ivar]->GetSmoothedHist() != 0) (*fHistSig_smooth)[ivar] = (*fPDFSig)[ivar]->GetSmoothedHist();
446 if ((*fPDFBgd)[ivar]->GetSmoothedHist() != 0) (*fHistBgd_smooth)[ivar] = (*fPDFBgd)[ivar]->GetSmoothedHist();
447 }
448 ExitFromTraining();
449}
450
451////////////////////////////////////////////////////////////////////////////////
452/// returns the likelihood estimator for signal
453/// fill a new Likelihood branch into the testTree
454
455Double_t TMVA::MethodLikelihood::GetMvaValue( Double_t* err, Double_t* errUpper )
456{
457 UInt_t ivar;
458
459 // cannot determine error
460 NoErrorCalc(err, errUpper);
461
462 // retrieve variables, and transform, if required
463 TVector vs( GetNvar() );
464 TVector vb( GetNvar() );
465
466 // need to distinguish signal and background in case of variable transformation
467 // signal first
468
469 GetTransformationHandler().SetTransformationReferenceClass( fSignalClass );
470 // temporary: JS --> FIX
471 //GetTransformationHandler().SetTransformationReferenceClass( 0 );
472 const Event* ev = GetEvent();
473 for (ivar=0; ivar<GetNvar(); ivar++) vs(ivar) = ev->GetValue(ivar);
474
475 GetTransformationHandler().SetTransformationReferenceClass( fBackgroundClass );
476 // temporary: JS --> FIX
477 //GetTransformationHandler().SetTransformationReferenceClass( 1 );
478 ev = GetEvent();
479 for (ivar=0; ivar<GetNvar(); ivar++) vb(ivar) = ev->GetValue(ivar);
480
481 // compute the likelihood (signal)
482 Double_t ps(1), pb(1), p(0);
483 for (ivar=0; ivar<GetNvar(); ivar++) {
484
485 // drop one variable (this is ONLY used for internal variable ranking !)
486 if ((Int_t)ivar == fDropVariable) continue;
487
488 Double_t x[2] = { vs(ivar), vb(ivar) };
489
490 for (UInt_t itype=0; itype < 2; itype++) {
491
492 // verify limits
493 if (x[itype] >= (*fPDFSig)[ivar]->GetXmax()) x[itype] = (*fPDFSig)[ivar]->GetXmax() - 1.0e-10;
494 else if (x[itype] < (*fPDFSig)[ivar]->GetXmin()) x[itype] = (*fPDFSig)[ivar]->GetXmin();
495
496 // find corresponding histogram from cached indices
497 PDF* pdf = (itype == 0) ? (*fPDFSig)[ivar] : (*fPDFBgd)[ivar];
498 if (pdf == 0) Log() << kFATAL << "<GetMvaValue> Reference histograms don't exist" << Endl;
499 TH1* hist = pdf->GetPDFHist();
500
501 // interpolate linearly between adjacent bins
502 // this is not useful for discrete variables
503 Int_t bin = hist->FindBin(x[itype]);
504
505 // **** POTENTIAL BUG: PERFORMANCE IS WORSE WHEN USING TRUE TYPE ***
506 // ==> commented out at present
507 if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0 ||
508 DataInfo().GetVariableInfo(ivar).GetVarType() == 'N') {
509 p = TMath::Max( hist->GetBinContent(bin), fEpsilon );
510 } else { // splined PDF
511 Int_t nextbin = bin;
512 if ((x[itype] > hist->GetBinCenter(bin) && bin != hist->GetNbinsX()) || bin == 1)
513 nextbin++;
514 else
515 nextbin--;
516
517
518 Double_t dx = hist->GetBinCenter(bin) - hist->GetBinCenter(nextbin);
519 Double_t dy = hist->GetBinContent(bin) - hist->GetBinContent(nextbin);
520 Double_t like = hist->GetBinContent(bin) + (x[itype] - hist->GetBinCenter(bin)) * dy/dx;
521
522 p = TMath::Max( like, fEpsilon );
523 }
524
525 if (itype == 0) ps *= p;
526 else pb *= p;
527 }
528 }
529
530 // the likelihood ratio (transform it ?)
531 return TransformLikelihoodOutput( ps, pb );
532}
533
534////////////////////////////////////////////////////////////////////////////////
535/// returns transformed or non-transformed output
536
537Double_t TMVA::MethodLikelihood::TransformLikelihoodOutput( Double_t ps, Double_t pb ) const
538{
539 if (ps < fEpsilon) ps = fEpsilon;
540 if (pb < fEpsilon) pb = fEpsilon;
541 Double_t r = ps/(ps + pb);
542 if (r >= 1.0) r = 1. - 1.e-15;
543
544 if (fTransformLikelihoodOutput) {
545 // inverse Fermi function
546
547 // sanity check
548 if (r <= 0.0) r = fEpsilon;
549 else if (r >= 1.0) r = 1. - 1.e-15;
550
551 Double_t tau = 15.0;
552 r = - TMath::Log(1.0/r - 1.0)/tau;
553 }
554
555 return r;
556}
557
558////////////////////////////////////////////////////////////////////////////////
559/// write options to stream
560
561void TMVA::MethodLikelihood::WriteOptionsToStream( std::ostream& o, const TString& prefix ) const
562{
563 Configurable::WriteOptionsToStream( o, prefix);
564
565 // writing the options defined for the different pdfs
566 if (fDefaultPDFLik != 0) {
567 o << prefix << std::endl << prefix << "#Default Likelihood PDF Options:" << std::endl << prefix << std::endl;
568 fDefaultPDFLik->WriteOptionsToStream( o, prefix );
569 }
570 for (UInt_t ivar = 0; ivar < fPDFSig->size(); ivar++) {
571 if ((*fPDFSig)[ivar] != 0) {
572 o << prefix << std::endl << prefix << Form("#Signal[%d] Likelihood PDF Options:",ivar) << std::endl << prefix << std::endl;
573 (*fPDFSig)[ivar]->WriteOptionsToStream( o, prefix );
574 }
575 if ((*fPDFBgd)[ivar] != 0) {
576 o << prefix << std::endl << prefix << Form("#Background[%d] Likelihood PDF Options:",ivar) << std::endl << prefix << std::endl;
577 (*fPDFBgd)[ivar]->WriteOptionsToStream( o, prefix );
578 }
579 }
580}
581
582////////////////////////////////////////////////////////////////////////////////
583/// write weights to XML
584
585void TMVA::MethodLikelihood::AddWeightsXMLTo( void* parent ) const
586{
587 void* wght = gTools().AddChild(parent, "Weights");
588 gTools().AddAttr(wght, "NVariables", GetNvar());
589 gTools().AddAttr(wght, "NClasses", 2);
590 void* pdfwrap;
591 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
592 if ( (*fPDFSig)[ivar]==0 || (*fPDFBgd)[ivar]==0 )
593 Log() << kFATAL << "Reference histograms for variable " << ivar
594 << " don't exist, can't write it to weight file" << Endl;
595 pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
596 gTools().AddAttr(pdfwrap, "VarIndex", ivar);
597 gTools().AddAttr(pdfwrap, "ClassIndex", 0);
598 (*fPDFSig)[ivar]->AddXMLTo(pdfwrap);
599 pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
600 gTools().AddAttr(pdfwrap, "VarIndex", ivar);
601 gTools().AddAttr(pdfwrap, "ClassIndex", 1);
602 (*fPDFBgd)[ivar]->AddXMLTo(pdfwrap);
603 }
604}
605
606////////////////////////////////////////////////////////////////////////////////
607/// computes ranking of input variables
608
609const TMVA::Ranking* TMVA::MethodLikelihood::CreateRanking()
610{
611 // create the ranking object
612 if (fRanking) delete fRanking;
613 fRanking = new Ranking( GetName(), "Delta Separation" );
614
615 Double_t sepRef = -1, sep = -1;
616 for (Int_t ivar=-1; ivar<(Int_t)GetNvar(); ivar++) {
617
618 // this variable should not be used
619 fDropVariable = ivar;
620
621 TString nameS = Form( "rS_%i", ivar+1 );
622 TString nameB = Form( "rB_%i", ivar+1 );
623 TH1* rS = new TH1F( nameS, nameS, 80, 0, 1 );
624 TH1* rB = new TH1F( nameB, nameB, 80, 0, 1 );
625
626 // the event loop
627 for (Int_t ievt=0; ievt<Data()->GetNTrainingEvents(); ievt++) {
628
629 const Event* origEv = Data()->GetEvent(ievt);
630 GetTransformationHandler().SetTransformationReferenceClass( origEv->GetClass() );
631 const Event* ev = GetTransformationHandler().Transform(Data()->GetEvent(ievt));
632
633 Double_t lk = this->GetMvaValue();
634 Double_t w = ev->GetWeight();
635 if (DataInfo().IsSignal(ev)) rS->Fill( lk, w );
636 else rB->Fill( lk, w );
637 }
638
639 // compute separation
640 sep = TMVA::gTools().GetSeparation( rS, rB );
641 if (ivar == -1) sepRef = sep;
642 sep = sepRef - sep;
643
644 // don't need these histograms anymore
645 delete rS;
646 delete rB;
647
648 if (ivar >= 0) fRanking->AddRank( Rank( DataInfo().GetVariableInfo(ivar).GetInternalName(), sep ) );
649 }
650
651 fDropVariable = -1;
652
653 return fRanking;
654}
655
656////////////////////////////////////////////////////////////////////////////////
657/// write reference PDFs to ROOT file
658
659void TMVA::MethodLikelihood::WriteWeightsToStream( TFile& ) const
660{
661 TString pname = "PDF_";
662 for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
663 (*fPDFSig)[ivar]->Write( pname + GetInputVar( ivar ) + "_S" );
664 (*fPDFBgd)[ivar]->Write( pname + GetInputVar( ivar ) + "_B" );
665 }
666}
667
668////////////////////////////////////////////////////////////////////////////////
669/// read weights from XML
670
671void TMVA::MethodLikelihood::ReadWeightsFromXML( void* wghtnode )
672{
673 TString pname = "PDF_";
674 Bool_t addDirStatus = TH1::AddDirectoryStatus();
675 TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
676 UInt_t nvars=0;
677 gTools().ReadAttr(wghtnode, "NVariables",nvars);
678 void* descnode = gTools().GetChild(wghtnode);
679 for (UInt_t ivar=0; ivar<nvars; ivar++){
680 void* pdfnode = gTools().GetChild(descnode);
681 Log() << kDEBUG << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
682 if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
683 if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
684 (*fPDFSig)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Sig" );
685 (*fPDFBgd)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Bkg" );
686 (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
687 (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
688 (*(*fPDFSig)[ivar]).ReadXML(pdfnode);
689 descnode = gTools().GetNextChild(descnode);
690 pdfnode = gTools().GetChild(descnode);
691 (*(*fPDFBgd)[ivar]).ReadXML(pdfnode);
692 descnode = gTools().GetNextChild(descnode);
693 }
694 TH1::AddDirectory(addDirStatus);
695}
696
697////////////////////////////////////////////////////////////////////////////////
698/// read weight info from file
699/// nothing to do for this method
700
701void TMVA::MethodLikelihood::ReadWeightsFromStream( std::istream& istr )
702{
703 TString pname = "PDF_";
704 Bool_t addDirStatus = TH1::AddDirectoryStatus();
705 TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
706 for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
707 Log() << kDEBUG << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
708 if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
709 if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
710 (*fPDFSig)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Sig" );
711 (*fPDFBgd)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Bkg");
712 (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
713 (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
714 istr >> *(*fPDFSig)[ivar];
715 istr >> *(*fPDFBgd)[ivar];
716 }
717 TH1::AddDirectory(addDirStatus);
718}
719
720////////////////////////////////////////////////////////////////////////////////
721/// read reference PDF from ROOT file
722
723void TMVA::MethodLikelihood::ReadWeightsFromStream( TFile& rf )
724{
725 TString pname = "PDF_";
726 Bool_t addDirStatus = TH1::AddDirectoryStatus();
727 TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
728 for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
729 (*fPDFSig)[ivar] = (TMVA::PDF*)rf.Get( Form( "PDF_%s_S", GetInputVar( ivar ).Data() ) );
730 (*fPDFBgd)[ivar] = (TMVA::PDF*)rf.Get( Form( "PDF_%s_B", GetInputVar( ivar ).Data() ) );
731 }
732 TH1::AddDirectory(addDirStatus);
733}
734
735////////////////////////////////////////////////////////////////////////////////
736/// write histograms and PDFs to file for monitoring purposes
737
738void TMVA::MethodLikelihood::WriteMonitoringHistosToFile( void ) const
739{
740 Log() << kINFO << "Write monitoring histograms to file: " << BaseDir()->GetPath() << Endl;
741 BaseDir()->cd();
742
743 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
744 (*fHistSig)[ivar]->Write();
745 (*fHistBgd)[ivar]->Write();
746 if ((*fHistSig_smooth)[ivar] != 0) (*fHistSig_smooth)[ivar]->Write();
747 if ((*fHistBgd_smooth)[ivar] != 0) (*fHistBgd_smooth)[ivar]->Write();
748 (*fPDFSig)[ivar]->GetPDFHist()->Write();
749 (*fPDFBgd)[ivar]->GetPDFHist()->Write();
750
751 if ((*fPDFSig)[ivar]->GetNSmoothHist() != 0) (*fPDFSig)[ivar]->GetNSmoothHist()->Write();
752 if ((*fPDFBgd)[ivar]->GetNSmoothHist() != 0) (*fPDFBgd)[ivar]->GetNSmoothHist()->Write();
753
754 // add special plots to check the smoothing in the GetVal method
755 Float_t xmin=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmin();
756 Float_t xmax=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmax();
757 TH1F* mm = new TH1F( (*fInputVars)[ivar]+"_additional_check",
758 (*fInputVars)[ivar]+"_additional_check", 15000, xmin, xmax );
759 Double_t intBin = (xmax-xmin)/15000;
760 for (Int_t bin=0; bin < 15000; bin++) {
761 Double_t x = (bin + 0.5)*intBin + xmin;
762 mm->SetBinContent(bin+1 ,(*fPDFSig)[ivar]->GetVal(x));
763 }
764 mm->Write();
765
766 // ---------- create cloned low-binned histogram for comparison in macros (mainly necessary for KDE)
767 TH1* h[2] = { (*fHistSig)[ivar], (*fHistBgd)[ivar] };
768 for (UInt_t i=0; i<2; i++) {
769 TH1* hclone = (TH1F*)h[i]->Clone( TString(h[i]->GetName()) + "_nice" );
770 hclone->SetName ( TString(h[i]->GetName()) + "_nice" );
771 hclone->SetTitle( TString(h[i]->GetTitle()) + "" );
772 if (hclone->GetNbinsX() > 100) {
773 Int_t resFactor = 5;
774 hclone->Rebin( resFactor );
775 hclone->Scale( 1.0/resFactor );
776 }
777 hclone->Write();
778 }
779 // ----------
780 }
781}
782
783////////////////////////////////////////////////////////////////////////////////
784/// write specific header of the classifier (mostly include files)
785
786void TMVA::MethodLikelihood::MakeClassSpecificHeader( std::ostream& fout, const TString& ) const
787{
788 fout << "#include <math.h>" << std::endl;
789 fout << "#include <cstdlib>" << std::endl;
790}
791
792////////////////////////////////////////////////////////////////////////////////
793/// write specific classifier response
794
795void TMVA::MethodLikelihood::MakeClassSpecific( std::ostream& fout, const TString& className ) const
796{
797 Int_t dp = fout.precision();
798 fout << " double fEpsilon;" << std::endl;
799
800 Int_t * nbin = new Int_t[GetNvar()];
801
802 Int_t nbinMax=-1;
803 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
804 nbin[ivar]=(*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX();
805 if (nbin[ivar] > nbinMax) nbinMax=nbin[ivar];
806 }
807
808 fout << " static float fRefS[][" << nbinMax << "]; "
809 << "// signal reference vector [nvars][max_nbins]" << std::endl;
810 fout << " static float fRefB[][" << nbinMax << "]; "
811 << "// backgr reference vector [nvars][max_nbins]" << std::endl << std::endl;
812 fout << "// if a variable has its PDF encoded as a spline0 --> treat it like an Integer valued one" <<std::endl;
813 fout << " bool fHasDiscretPDF[" << GetNvar() <<"]; "<< std::endl;
814 fout << " int fNbin[" << GetNvar() << "]; "
815 << "// number of bins (discrete variables may have less bins)" << std::endl;
816 fout << " double fHistMin[" << GetNvar() << "]; " << std::endl;
817 fout << " double fHistMax[" << GetNvar() << "]; " << std::endl;
818
819 fout << " double TransformLikelihoodOutput( double, double ) const;" << std::endl;
820 fout << "};" << std::endl;
821 fout << "" << std::endl;
822 fout << "inline void " << className << "::Initialize() " << std::endl;
823 fout << "{" << std::endl;
824 fout << " fEpsilon = " << fEpsilon << ";" << std::endl;
825 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
826 fout << " fNbin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ";" << std::endl;
827 fout << " fHistMin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmin() << ";" << std::endl;
828 fout << " fHistMax[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmax() << ";" << std::endl;
829 // sanity check (for previous code lines)
830 if ((((*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar] ||
831 (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar])
832 ) ||
833 (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()) {
834 Log() << kFATAL << "<MakeClassSpecific> Mismatch in binning of variable "
835 << "\"" << GetOriginalVarName(ivar) << "\" of type: \'" << DataInfo().GetVariableInfo(ivar).GetVarType()
836 << "\' : "
837 << "nxS = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ", "
838 << "nxB = " << (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()
839 << " while we expect " << nbin[ivar]
840 << Endl;
841 }
842 }
843 for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
844 if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0)
845 fout << " fHasDiscretPDF[" << ivar <<"] = true; " << std::endl;
846 else
847 fout << " fHasDiscretPDF[" << ivar <<"] = false; " << std::endl;
848 }
849
850 fout << "}" << std::endl << std::endl;
851
852 fout << "inline double " << className
853 << "::GetMvaValue__( const std::vector<double>& inputValues ) const" << std::endl;
854 fout << "{" << std::endl;
855 fout << " double ps(1), pb(1);" << std::endl;
856 fout << " std::vector<double> inputValuesSig = inputValues;" << std::endl;
857 fout << " std::vector<double> inputValuesBgd = inputValues;" << std::endl;
858 if (GetTransformationHandler().GetTransformationList().GetSize() != 0) {
859 fout << " Transform(inputValuesSig,0);" << std::endl;
860 fout << " Transform(inputValuesBgd,1);" << std::endl;
861 }
862 fout << " for (size_t ivar = 0; ivar < GetNvar(); ivar++) {" << std::endl;
863 fout << std::endl;
864 fout << " // dummy at present... will be used for variable transforms" << std::endl;
865 fout << " double x[2] = { inputValuesSig[ivar], inputValuesBgd[ivar] };" << std::endl;
866 fout << std::endl;
867 fout << " for (int itype=0; itype < 2; itype++) {" << std::endl;
868 fout << std::endl;
869 fout << " // interpolate linearly between adjacent bins" << std::endl;
870 fout << " // this is not useful for discrete variables (or forced Spline0)" << std::endl;
871 fout << " int bin = int((x[itype] - fHistMin[ivar])/(fHistMax[ivar] - fHistMin[ivar])*fNbin[ivar]) + 0;" << std::endl;
872 fout << std::endl;
873 fout << " // since the test data sample is in general different from the training sample" << std::endl;
874 fout << " // it can happen that the min/max of the training sample are exceeded --> correct this" << std::endl;
875 fout << " if (bin < 0) {" << std::endl;
876 fout << " bin = 0;" << std::endl;
877 fout << " x[itype] = fHistMin[ivar];" << std::endl;
878 fout << " }" << std::endl;
879 fout << " else if (bin >= fNbin[ivar]) {" << std::endl;
880 fout << " bin = fNbin[ivar]-1;" << std::endl;
881 fout << " x[itype] = fHistMax[ivar];" << std::endl;
882 fout << " }" << std::endl;
883 fout << std::endl;
884 fout << " // find corresponding histogram from cached indices" << std::endl;
885 fout << " float ref = (itype == 0) ? fRefS[ivar][bin] : fRefB[ivar][bin];" << std::endl;
886 fout << std::endl;
887 fout << " // sanity check" << std::endl;
888 fout << " if (ref < 0) {" << std::endl;
889 fout << " std::cout << \"Fatal error in " << className
890 << ": bin entry < 0 ==> abort\" << std::endl;" << std::endl;
891 fout << " std::exit(1);" << std::endl;
892 fout << " }" << std::endl;
893 fout << std::endl;
894 fout << " double p = ref;" << std::endl;
895 fout << std::endl;
896 fout << " if (GetType(ivar) != 'I' && !fHasDiscretPDF[ivar]) {" << std::endl;
897 fout << " float bincenter = (bin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
898 fout << " int nextbin = bin;" << std::endl;
899 fout << " if ((x[itype] > bincenter && bin != fNbin[ivar]-1) || bin == 0) " << std::endl;
900 fout << " nextbin++;" << std::endl;
901 fout << " else" << std::endl;
902 fout << " nextbin--; " << std::endl;
903 fout << std::endl;
904 fout << " double refnext = (itype == 0) ? fRefS[ivar][nextbin] : fRefB[ivar][nextbin];" << std::endl;
905 fout << " float nextbincenter = (nextbin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
906 fout << std::endl;
907 fout << " double dx = bincenter - nextbincenter;" << std::endl;
908 fout << " double dy = ref - refnext;" << std::endl;
909 fout << " p += (x[itype] - bincenter) * dy/dx;" << std::endl;
910 fout << " }" << std::endl;
911 fout << std::endl;
912 fout << " if (p < fEpsilon) p = fEpsilon; // avoid zero response" << std::endl;
913 fout << std::endl;
914 fout << " if (itype == 0) ps *= p;" << std::endl;
915 fout << " else pb *= p;" << std::endl;
916 fout << " } " << std::endl;
917 fout << " } " << std::endl;
918 fout << std::endl;
919 fout << " // the likelihood ratio (transform it ?)" << std::endl;
920 fout << " return TransformLikelihoodOutput( ps, pb ); " << std::endl;
921 fout << "}" << std::endl << std::endl;
922
923 fout << "inline double " << className << "::TransformLikelihoodOutput( double ps, double pb ) const" << std::endl;
924 fout << "{" << std::endl;
925 fout << " // returns transformed or non-transformed output" << std::endl;
926 fout << " if (ps < fEpsilon) ps = fEpsilon;" << std::endl;
927 fout << " if (pb < fEpsilon) pb = fEpsilon;" << std::endl;
928 fout << " double r = ps/(ps + pb);" << std::endl;
929 fout << " if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
930 fout << std::endl;
931 fout << " if (" << (fTransformLikelihoodOutput ? "true" : "false") << ") {" << std::endl;
932 fout << " // inverse Fermi function" << std::endl;
933 fout << std::endl;
934 fout << " // sanity check" << std::endl;
935 fout << " if (r <= 0.0) r = fEpsilon;" << std::endl;
936 fout << " else if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
937 fout << std::endl;
938 fout << " double tau = 15.0;" << std::endl;
939 fout << " r = - log(1.0/r - 1.0)/tau;" << std::endl;
940 fout << " }" << std::endl;
941 fout << std::endl;
942 fout << " return r;" << std::endl;
943 fout << "}" << std::endl;
944 fout << std::endl;
945
946 fout << "// Clean up" << std::endl;
947 fout << "inline void " << className << "::Clear() " << std::endl;
948 fout << "{" << std::endl;
949 fout << " // nothing to clear" << std::endl;
950 fout << "}" << std::endl << std::endl;
951
952 fout << "// signal map" << std::endl;
953 fout << "float " << className << "::fRefS[][" << nbinMax << "] = " << std::endl;
954 fout << "{ " << std::endl;
955 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
956 fout << " { ";
957 for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
958 if (ibin-1 < nbin[ivar])
959 fout << (*fPDFSig)[ivar]->GetPDFHist()->GetBinContent(ibin);
960 else
961 fout << -1;
962
963 if (ibin < nbinMax) fout << ", ";
964 }
965 fout << " }, " << std::endl;
966 }
967 fout << "}; " << std::endl;
968 fout << std::endl;
969
970 fout << "// background map" << std::endl;
971 fout << "float " << className << "::fRefB[][" << nbinMax << "] = " << std::endl;
972 fout << "{ " << std::endl;
973 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
974 fout << " { ";
975 fout << std::setprecision(8);
976 for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
977 if (ibin-1 < nbin[ivar])
978 fout << (*fPDFBgd)[ivar]->GetPDFHist()->GetBinContent(ibin);
979 else
980 fout << -1;
981
982 if (ibin < nbinMax) fout << ", ";
983 }
984 fout << " }, " << std::endl;
985 }
986 fout << "}; " << std::endl;
987 fout << std::endl;
988 fout << std::setprecision(dp);
989
990 delete[] nbin;
991}
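The standalone classifier emitted by `MakeClassSpecific` above inlines, per variable, a histogram bin lookup followed by linear interpolation towards the neighbouring bin centre. The following is a minimal sketch of that logic as a free function; the name `interpolatePDF` and the `epsilon` default are illustrative (the generated code inlines this with `fRefS`/`fRefB` and `fEpsilon`), but the bin clamping and interpolation steps mirror the strings written out above.

```cpp
#include <vector>
#include <algorithm>

// Sketch of the per-variable PDF lookup performed by the generated code:
// clamp the value into the training histogram range, read the reference
// bin content, then interpolate linearly towards the adjacent bin centre.
double interpolatePDF(double x, const std::vector<double>& ref,
                      double hmin, double hmax, double epsilon = 1e-12)
{
   const int nbin = static_cast<int>(ref.size());
   int bin = static_cast<int>((x - hmin) / (hmax - hmin) * nbin);

   // test values outside the training range are mapped onto the edge bins
   if (bin < 0)          { bin = 0;        x = hmin; }
   else if (bin >= nbin) { bin = nbin - 1; x = hmax; }

   double p = ref[bin];

   // interpolate linearly between adjacent bin centres, exactly as in the
   // generated GetMvaValue__ (not useful for discrete/spline0 variables)
   const double binwidth      = (hmax - hmin) / nbin;
   const double bincenter     = hmin + (bin + 0.5) * binwidth;
   const int    nextbin       = ((x > bincenter && bin != nbin - 1) || bin == 0)
                                   ? bin + 1 : bin - 1;
   const double nextbincenter = hmin + (nextbin + 0.5) * binwidth;

   const double dx = bincenter - nextbincenter;
   const double dy = ref[bin] - ref[nextbin];
   p += (x - bincenter) * dy / dx;

   return std::max(p, epsilon); // avoid a zero response
}
```

At a bin centre the interpolation term vanishes and the raw bin content is returned; off-centre values are shifted proportionally to the slope between the two nearest bin centres.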
992
993////////////////////////////////////////////////////////////////////////////////
994/// get help message text
995///
996/// typical length of text line:
997/// "|--------------------------------------------------------------|"
998
999void TMVA::MethodLikelihood::GetHelpMessage() const
1000{
1001 Log() << Endl;
1002 Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
1003 Log() << Endl;
1004 Log() << "The maximum-likelihood classifier models the data with probability " << Endl;
1005 Log() << "density functions (PDF) reproducing the signal and background" << Endl;
1006 Log() << "distributions of the input variables. Correlations among the " << Endl;
1007 Log() << "variables are ignored." << Endl;
1008 Log() << Endl;
1009 Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
1010 Log() << Endl;
1011 Log() << "Required for good performance are decorrelated input variables" << Endl;
1012 Log() << "(decorrelation via the option \"VarTransform=Decorrelate\"" << Endl;
1013 Log() << "may be tried). Irreducible non-linear correlations may be reduced" << Endl;
1014 Log() << "by precombining strongly correlated input variables, or by simply" << Endl;
1015 Log() << "removing one of the variables." << Endl;
1016 Log() << Endl;
1017 Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
1018 Log() << Endl;
1019 Log() << "High fidelity PDF estimates are mandatory, i.e., sufficient training " << Endl;
1020 Log() << "statistics are required to populate the tails of the distributions." << Endl;
1021 Log() << "It would be a surprise if the default Spline or KDE kernel parameters" << Endl;
1022 Log() << "provided a satisfactory fit to the data. The user is advised to properly" << Endl;
1023 Log() << "tune the events per bin and smooth options in the spline cases" << Endl;
1024 Log() << "individually per variable. If the KDE kernel is used, the adaptive" << Endl;
1025 Log() << "Gaussian kernel may lead to artefacts, so please always also try" << Endl;
1026 Log() << "the non-adaptive one." << Endl;
1027 Log() << "" << Endl;
1028 Log() << "All tuning parameters must be adjusted individually for each input" << Endl;
1029 Log() << "variable!" << Endl;
1030}
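The output transformation written into the standalone class earlier in this file can be sketched as follows. The raw estimator is the likelihood ratio ps/(ps+pb); when the transform is enabled, it is mapped through an inverse Fermi function to stretch the peaks near 0 and 1. The function name and the `epsilon` default are illustrative; `tau = 15` follows the generated code.

```cpp
#include <cmath>
#include <algorithm>

// Sketch of TransformLikelihoodOutput as emitted into the standalone
// classifier: clamp the inputs, form the likelihood ratio, and optionally
// apply the inverse Fermi transform r -> -log(1/r - 1)/tau.
double transformLikelihoodOutput(double ps, double pb,
                                 bool applyInverseFermi,
                                 double epsilon = 1e-12)
{
   ps = std::max(ps, epsilon);
   pb = std::max(pb, epsilon);
   double r = ps / (ps + pb);
   if (r >= 1.0) r = 1.0 - 1e-15;

   if (applyInverseFermi) {
      // sanity check before taking the logarithm
      if (r <= 0.0) r = epsilon;
      const double tau = 15.0;
      r = -std::log(1.0 / r - 1.0) / tau;
   }
   return r;
}
```

For equal signal and background likelihoods the ratio is 0.5, which the inverse Fermi transform maps to 0, centring the transformed response around zero.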
1031