Logo ROOT   6.10/09
Reference Guide
MethodLikelihood.cxx
Go to the documentation of this file.
1 // @(#)root/tmva $Id$
2 // Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss, Eckhard von Toerne, Jan Therhaag
3 
4 /**********************************************************************************
5  * Project: TMVA - a Root-integrated toolkit for multivariate Data analysis *
6  * Package: TMVA *
7  * Class : TMVA::MethodLikelihood *
8  * Web : http://tmva.sourceforge.net *
9  * *
10  * Description: *
11  * Implementation (see header for description) *
12  * *
13  * Authors (alphabetical): *
14  * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
15  * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
16  * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
17  * Jan Therhaag <Jan.Therhaag@cern.ch> - U of Bonn, Germany *
18  * Eckhard v. Toerne <evt@uni-bonn.de> - U. of Bonn, Germany *
19  * *
20  * Copyright (c) 2005-2011: *
21  * CERN, Switzerland *
22  * U. of Victoria, Canada *
23  * MPI-K Heidelberg, Germany *
24  * U. of Bonn, Germany *
25  * *
26  * Redistribution and use in source and binary forms, with or without *
27  * modification, are permitted according to the terms listed in LICENSE *
28  * (http://tmva.sourceforge.net/LICENSE) *
29  **********************************************************************************/
30 
31 /*! \class TMVA::MethodLikelihood
32 \ingroup TMVA
33 
34 Likelihood analysis ("non-parametric approach")
35 
36 
37 Also implemented is a "diagonalized likelihood approach",
38 which improves over the uncorrelated likelihood approach by
39 transforming linearly the input variables into a diagonal space,
40 using the square-root of the covariance matrix
41 
42 
43 The method of maximum likelihood is the most straightforward, and
44 certainly among the most elegant multivariate analyser approaches.
45 We define the likelihood ratio, \f$ R_L \f$, for event
46 \f$ i \f$, by:
47 
48 \f[
49 R_L(i) = \frac{L_S(i)}{L_B(i) + L_B(i)}
50 \f]
51 
52 Here the signal and background likelihoods, \f$ L_S \f$,
53 \f$ L_B \f$, are products of the corresponding probability
54 densities, \f$ p_S \f$, \f$ p_B \f$, of the
55 \f$ N_{var} \f$ discriminating variables used in the MVA:
56 
57 \f[
58 L_S(i) \ \prod_{j=1}^{N_{var}} p_{Sj} (i)
59 \f]
60 
61 and accordingly for \f$ L_B \f$.
62 In practise, TMVA uses polynomial splines to estimate the probability
63 density functions (PDF) obtained from the distributions of the
64 training variables.
65 
66 
67 Note that in TMVA the output of the likelihood ratio is transformed by:
68 
69 \f[
70 R_L(i) \to R'_L(i) = -\frac{1}{\tau} ln(R_L^{-1}(i) -1)
71 \f]
72 
73 to avoid the occurrence of heavy peaks at \f$ R_L = 0.1 \f$ .
74 
75 #### Decorrelated (or "diagonalized") Likelihood
76 
77 The biggest drawback of the Likelihood approach is that it assumes
78 that the discriminant variables are uncorrelated. If it were the case,
79 it can be proven that the discrimination obtained by the above likelihood
80 ratio is optimal, ie, no other method can beat it. However, in most
81 practical applications of MVAs correlations are present. </p>
82 
83 
84 Linear correlations, measured from the training sample, can be taken
85 into account in a straightforward manner through the square-root
86 of the covariance matrix. The square-root of a matrix
87 \f$ C \f$ is the matrix \f$ C&prime; \f$ that multiplied with itself
88 yields \f$ C \f$: \f$ C \f$=\f$ C&prime;C&prime; \f$. We compute the
89 square-root matrix (SQM) by means of diagonalising (\f$ D \f$) the
90 covariance matrix:
91 
92 \f[
93 D = S^TCS \Rightarrow C' = S \sqrt{DS^T}
94 \f]
95 
96 and the linear transformation of the linearly correlated into the
97 uncorrelated variables space is then given by multiplying the measured
98 variable tuple by the inverse of the SQM. Note that these transformations
99 are performed for both signal and background separately, since the
100 correlation pattern is not the same in the two samples.
101 
102 
103 The above diagonalisation is complete for linearly correlated,
104 Gaussian distributed variables only. In real-world examples this
105 is not often the case, so that only little additional information
106 may be recovered by the diagonalisation procedure. In these cases,
107 non-linear methods must be applied.
108 */
109 
110 #include "TMVA/MethodLikelihood.h"
111 
112 #include "TMVA/Configurable.h"
113 #include "TMVA/ClassifierFactory.h"
114 #include "TMVA/DataSet.h"
115 #include "TMVA/DataSetInfo.h"
116 #include "TMVA/IMethod.h"
117 #include "TMVA/MethodBase.h"
118 #include "TMVA/MsgLogger.h"
119 #include "TMVA/PDF.h"
120 #include "TMVA/Ranking.h"
121 #include "TMVA/Tools.h"
122 #include "TMVA/Types.h"
123 #include "TMVA/VariableInfo.h"
124 
125 #include "TMatrixD.h"
126 #include "TVector.h"
127 #include "TMath.h"
128 #include "TObjString.h"
129 #include "TFile.h"
130 #include "TKey.h"
131 #include "TH1.h"
132 #include "TClass.h"
133 #include "Riostream.h"
134 
135 #include <iomanip>
136 #include <vector>
137 #include <cstdlib>
138 
139 REGISTER_METHOD(Likelihood)
140 
142 
143 ////////////////////////////////////////////////////////////////////////////////
144 /// standard constructor
145 
147  const TString& methodTitle,
148  DataSetInfo& theData,
149  const TString& theOption ) :
150  TMVA::MethodBase( jobName, Types::kLikelihood, methodTitle, theData, theOption),
151  fEpsilon ( 1.e3 * DBL_MIN ),
152  fTransformLikelihoodOutput( kFALSE ),
153  fDropVariable ( 0 ),
154  fHistSig ( 0 ),
155  fHistBgd ( 0 ),
156  fHistSig_smooth( 0 ),
157  fHistBgd_smooth( 0 ),
158  fDefaultPDFLik ( 0 ),
159  fPDFSig ( 0 ),
160  fPDFBgd ( 0 ),
161  fNsmooth ( 2 ),
162  fNsmoothVarS ( 0 ),
163  fNsmoothVarB ( 0 ),
164  fAverageEvtPerBin( 0 ),
165  fAverageEvtPerBinVarS (0),
166  fAverageEvtPerBinVarB (0),
167  fKDEfineFactor ( 0 ),
168  fInterpolateString(0)
169 {
170 }
171 
172 ////////////////////////////////////////////////////////////////////////////////
173 /// construct likelihood references from file
174 
176  const TString& theWeightFile) :
177  TMVA::MethodBase( Types::kLikelihood, theData, theWeightFile),
178  fEpsilon ( 1.e3 * DBL_MIN ),
179  fTransformLikelihoodOutput( kFALSE ),
180  fDropVariable ( 0 ),
181  fHistSig ( 0 ),
182  fHistBgd ( 0 ),
183  fHistSig_smooth( 0 ),
184  fHistBgd_smooth( 0 ),
185  fDefaultPDFLik ( 0 ),
186  fPDFSig ( 0 ),
187  fPDFBgd ( 0 ),
188  fNsmooth ( 2 ),
189  fNsmoothVarS ( 0 ),
190  fNsmoothVarB ( 0 ),
191  fAverageEvtPerBin( 0 ),
192  fAverageEvtPerBinVarS (0),
193  fAverageEvtPerBinVarB (0),
194  fKDEfineFactor ( 0 ),
195  fInterpolateString(0)
196 {
197 }
198 
199 ////////////////////////////////////////////////////////////////////////////////
200 /// destructor
201 
203 {
204  if (NULL != fDefaultPDFLik) delete fDefaultPDFLik;
205  if (NULL != fHistSig) delete fHistSig;
206  if (NULL != fHistBgd) delete fHistBgd;
207  if (NULL != fHistSig_smooth) delete fHistSig_smooth;
208  if (NULL != fHistBgd_smooth) delete fHistBgd_smooth;
209  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
210  if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
211  if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
212  }
213  if (NULL != fPDFSig) delete fPDFSig;
214  if (NULL != fPDFBgd) delete fPDFBgd;
215 }
216 
217 ////////////////////////////////////////////////////////////////////////////////
218 /// FDA can handle classification with 2 classes
219 
221  UInt_t numberClasses, UInt_t /*numberTargets*/ )
222 {
223  if (type == Types::kClassification && numberClasses == 2) return kTRUE;
224  return kFALSE;
225 }
226 
227 ////////////////////////////////////////////////////////////////////////////////
228 /// default initialisation called by all constructors
229 
231 {
232  // no ranking test
233  fDropVariable = -1;
234  fHistSig = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
235  fHistBgd = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
236  fHistSig_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
237  fHistBgd_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
238  fPDFSig = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
239  fPDFBgd = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
240 }
241 
242 ////////////////////////////////////////////////////////////////////////////////
243 /// define the options (their key words) that can be set in the option string
244 ///
245 /// TransformOutput <bool> transform (often strongly peaked) likelihood output through sigmoid inversion
246 
248 {
249  DeclareOptionRef( fTransformLikelihoodOutput = kFALSE, "TransformOutput",
250  "Transform likelihood output by inverse sigmoid function" );
251 
252  // initialize
253 
254  // reading every PDF's definition and passing the option string to the next one to be read and marked
255  TString updatedOptions = GetOptions();
256  fDefaultPDFLik = new PDF( TString(GetName()) + " PDF", updatedOptions );
259  updatedOptions = fDefaultPDFLik->GetOptions();
260  for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
261  (*fPDFSig)[ivar] = new PDF( Form("%s PDF Sig[%d]", GetName(), ivar), updatedOptions,
262  Form("Sig[%d]",ivar), fDefaultPDFLik );
263  (*fPDFSig)[ivar]->DeclareOptions();
264  (*fPDFSig)[ivar]->ParseOptions();
265  updatedOptions = (*fPDFSig)[ivar]->GetOptions();
266  (*fPDFBgd)[ivar] = new PDF( Form("%s PDF Bkg[%d]", GetName(), ivar), updatedOptions,
267  Form("Bkg[%d]",ivar), fDefaultPDFLik );
268  (*fPDFBgd)[ivar]->DeclareOptions();
269  (*fPDFBgd)[ivar]->ParseOptions();
270  updatedOptions = (*fPDFBgd)[ivar]->GetOptions();
271  }
272 
273  // the final marked option string is written back to the original likelihood
274  SetOptions( updatedOptions );
275 }
276 
277 
279 {
280  // options that are used ONLY for the READER to ensure backward compatibility
281 
283  DeclareOptionRef( fNsmooth = 1, "NSmooth",
284  "Number of smoothing iterations for the input histograms");
285  DeclareOptionRef( fAverageEvtPerBin = 50, "NAvEvtPerBin",
286  "Average number of events per PDF bin");
287  DeclareOptionRef( fKDEfineFactor =1. , "KDEFineFactor",
288  "Fine tuning factor for Adaptive KDE: Factor to multiply the width of the kernel");
289  DeclareOptionRef( fBorderMethodString = "None", "KDEborder",
290  "Border effects treatment (1=no treatment , 2=kernel renormalization, 3=sample mirroring)" );
291  DeclareOptionRef( fKDEiterString = "Nonadaptive", "KDEiter",
292  "Number of iterations (1=non-adaptive, 2=adaptive)" );
293  DeclareOptionRef( fKDEtypeString = "Gauss", "KDEtype",
294  "KDE kernel type (1=Gauss)" );
297  fNsmoothVarS = new Int_t[GetNvar()];
298  fNsmoothVarB = new Int_t[GetNvar()];
300  for(UInt_t i=0; i<GetNvar(); ++i) {
302  fNsmoothVarS[i] = fNsmoothVarB[i] = 0;
303  fInterpolateString[i] = "";
304  }
305  DeclareOptionRef( fAverageEvtPerBinVarS, GetNvar(), "NAvEvtPerBinSig",
306  "Average num of events per PDF bin and variable (signal)");
307  DeclareOptionRef( fAverageEvtPerBinVarB, GetNvar(), "NAvEvtPerBinBkg",
308  "Average num of events per PDF bin and variable (background)");
309  DeclareOptionRef(fNsmoothVarS, GetNvar(), "NSmoothSig",
310  "Number of smoothing iterations for the input histograms");
311  DeclareOptionRef(fNsmoothVarB, GetNvar(), "NSmoothBkg",
312  "Number of smoothing iterations for the input histograms");
313  DeclareOptionRef(fInterpolateString, GetNvar(), "PDFInterpol", "Method of interpolating reference histograms (e.g. Spline2 or KDE)");
314 }
315 
316 ////////////////////////////////////////////////////////////////////////////////
317 /// process user options
318 /// reference cut value to distinguish signal-like from background-like events
319 
321 {
323 
325  for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
326  (*fPDFBgd)[ivar]->ProcessOptions();
327  (*fPDFSig)[ivar]->ProcessOptions();
328  }
329 }
330 
331 ////////////////////////////////////////////////////////////////////////////////
332 /// create reference distributions (PDFs) from signal and background events:
333 /// fill histograms and smooth them; if decorrelation is required, compute
334 /// corresponding square-root matrices
335 /// the reference histograms require the correct boundaries. Since in Likelihood classification
336 /// the transformations are applied using both classes, also the corresponding boundaries
337 /// need to take this into account
338 
340 {
341  UInt_t nvar=GetNvar();
342  std::vector<Double_t> xmin(nvar), xmax(nvar);
343  for (UInt_t ivar=0; ivar<nvar; ivar++) {xmin[ivar]=1e30; xmax[ivar]=-1e30;}
344 
345  UInt_t nevents=Data()->GetNEvents();
346  for (UInt_t ievt=0; ievt<nevents; ievt++) {
347  // use the true-event-type's transformation
348  // set the event true event types transformation
349  const Event* origEv = Data()->GetEvent(ievt);
350  if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
351  // loop over classes
352  for (int cls=0;cls<2;cls++){
354  const Event* ev = GetTransformationHandler().Transform( origEv );
355  for (UInt_t ivar=0; ivar<nvar; ivar++) {
356  Float_t value = ev->GetValue(ivar);
357  if (value < xmin[ivar]) xmin[ivar] = value;
358  if (value > xmax[ivar]) xmax[ivar] = value;
359  }
360  }
361  }
362 
363  // create reference histograms
364  // (KDE smoothing requires very finely binned reference histograms)
365  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
366  TString var = (*fInputVars)[ivar];
367 
368  // the reference histograms require the correct boundaries. Since in Likelihood classification
369  // the transformations are applied using both classes, also the corresponding boundaries
370  // need to take this into account
371 
372  // special treatment for discrete variables
373  if (DataInfo().GetVariableInfo(ivar).GetVarType() == 'I') {
374  // special treatment for integer variables
375  Int_t ixmin = TMath::Nint( xmin[ivar] );
376  xmax[ivar]=xmax[ivar]+1; // make sure that all entries are included in histogram
377  Int_t ixmax = TMath::Nint( xmax[ivar] );
378  Int_t nbins = ixmax - ixmin;
379  (*fHistSig)[ivar] = new TH1F(GetMethodName()+"_"+var + "_sig", var + " signal training", nbins, ixmin, ixmax );
380  (*fHistBgd)[ivar] = new TH1F(GetMethodName()+"_"+var + "_bgd", var + " background training", nbins, ixmin, ixmax );
381  } else {
382 
383  UInt_t minNEvt = TMath::Min(Data()->GetNEvtSigTrain(),Data()->GetNEvtBkgdTrain());
384  Int_t nbinsS = (*fPDFSig)[ivar]->GetHistNBins( minNEvt );
385  Int_t nbinsB = (*fPDFBgd)[ivar]->GetHistNBins( minNEvt );
386 
387  (*fHistSig)[ivar] = new TH1F( Form("%s_%s_%s_sig",DataInfo().GetName(),GetMethodName().Data(),var.Data()),
388  Form("%s_%s_%s signal training",DataInfo().GetName(),GetMethodName().Data(),var.Data()), nbinsS, xmin[ivar], xmax[ivar] );
389  (*fHistBgd)[ivar] = new TH1F( Form("%s_%s_%s_bgd",DataInfo().GetName(),GetMethodName().Data(),var.Data()),
390  Form("%s_%s_%s background training",DataInfo().GetName(),GetMethodName().Data(),var.Data()), nbinsB, xmin[ivar], xmax[ivar] );
391  }
392  }
393 
394  // ----- fill the reference histograms
395  Log() << kINFO << "Filling reference histograms" << Endl;
396 
397  // event loop
398  for (Int_t ievt=0; ievt<Data()->GetNEvents(); ievt++) {
399 
400  // use the true-event-type's transformation
401  // set the event true event types transformation
402  const Event* origEv = Data()->GetEvent(ievt);
403  if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
405  const Event* ev = GetTransformationHandler().Transform( origEv );
406 
407  // the event weight
408  Float_t weight = ev->GetWeight();
409 
410  // fill variable vector
411  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
412  Double_t value = ev->GetValue(ivar);
413  // verify limits
414  if (value >= xmax[ivar]) value = xmax[ivar] - 1.0e-10;
415  else if (value < xmin[ivar]) value = xmin[ivar] + 1.0e-10;
416  // inserting check if there are events in overflow or underflow
417  if (value >=(*fHistSig)[ivar]->GetXaxis()->GetXmax() ||
418  value <(*fHistSig)[ivar]->GetXaxis()->GetXmin()){
419  Log()<<kWARNING
420  <<"error in filling likelihood reference histograms var="
421  <<(*fInputVars)[ivar]
422  << ", xmin="<<(*fHistSig)[ivar]->GetXaxis()->GetXmin()
423  << ", value="<<value
424  << ", xmax="<<(*fHistSig)[ivar]->GetXaxis()->GetXmax()
425  << Endl;
426  }
427  if (DataInfo().IsSignal(ev)) (*fHistSig)[ivar]->Fill( value, weight );
428  else (*fHistBgd)[ivar]->Fill( value, weight );
429  }
430  }
431 
432  // building the pdfs
433  Log() << kINFO << "Building PDF out of reference histograms" << Endl;
434  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
435 
436  // the PDF is built from (binned) reference histograms
437  // in case of KDE, this has a large number of bins, which makes it quasi-unbinned
438  (*fPDFSig)[ivar]->BuildPDF( (*fHistSig)[ivar] );
439  (*fPDFBgd)[ivar]->BuildPDF( (*fHistBgd)[ivar] );
440 
441  (*fPDFSig)[ivar]->ValidatePDF( (*fHistSig)[ivar] );
442  (*fPDFBgd)[ivar]->ValidatePDF( (*fHistBgd)[ivar] );
443 
444  // saving the smoothed histograms
445  if ((*fPDFSig)[ivar]->GetSmoothedHist() != 0) (*fHistSig_smooth)[ivar] = (*fPDFSig)[ivar]->GetSmoothedHist();
446  if ((*fPDFBgd)[ivar]->GetSmoothedHist() != 0) (*fHistBgd_smooth)[ivar] = (*fPDFBgd)[ivar]->GetSmoothedHist();
447  }
449 }
450 
451 ////////////////////////////////////////////////////////////////////////////////
452 /// returns the likelihood estimator for signal
453 /// fill a new Likelihood branch into the testTree
454 
456 {
457  UInt_t ivar;
458 
459  // cannot determine error
460  NoErrorCalc(err, errUpper);
461 
462  // retrieve variables, and transform, if required
463  TVector vs( GetNvar() );
464  TVector vb( GetNvar() );
465 
466  // need to distinguish signal and background in case of variable transformation
467  // signal first
468 
470  // temporary: JS --> FIX
471  //GetTransformationHandler().SetTransformationReferenceClass( 0 );
472  const Event* ev = GetEvent();
473  for (ivar=0; ivar<GetNvar(); ivar++) vs(ivar) = ev->GetValue(ivar);
474 
476  // temporary: JS --> FIX
477  //GetTransformationHandler().SetTransformationReferenceClass( 1 );
478  ev = GetEvent();
479  for (ivar=0; ivar<GetNvar(); ivar++) vb(ivar) = ev->GetValue(ivar);
480 
481  // compute the likelihood (signal)
482  Double_t ps(1), pb(1), p(0);
483  for (ivar=0; ivar<GetNvar(); ivar++) {
484 
485  // drop one variable (this is ONLY used for internal variable ranking !)
486  if ((Int_t)ivar == fDropVariable) continue;
487 
488  Double_t x[2] = { vs(ivar), vb(ivar) };
489 
490  for (UInt_t itype=0; itype < 2; itype++) {
491 
492  // verify limits
493  if (x[itype] >= (*fPDFSig)[ivar]->GetXmax()) x[itype] = (*fPDFSig)[ivar]->GetXmax() - 1.0e-10;
494  else if (x[itype] < (*fPDFSig)[ivar]->GetXmin()) x[itype] = (*fPDFSig)[ivar]->GetXmin();
495 
496  // find corresponding histogram from cached indices
497  PDF* pdf = (itype == 0) ? (*fPDFSig)[ivar] : (*fPDFBgd)[ivar];
498  if (pdf == 0) Log() << kFATAL << "<GetMvaValue> Reference histograms don't exist" << Endl;
499  TH1* hist = pdf->GetPDFHist();
500 
501  // interpolate linearly between adjacent bins
502  // this is not useful for discrete variables
503  Int_t bin = hist->FindBin(x[itype]);
504 
505  // **** POTENTIAL BUG: PREFORMANCE IS WORSE WHEN USING TRUE TYPE ***
506  // ==> commented out at present
507  if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0 ||
508  DataInfo().GetVariableInfo(ivar).GetVarType() == 'N') {
509  p = TMath::Max( hist->GetBinContent(bin), fEpsilon );
510  } else { // splined PDF
511  Int_t nextbin = bin;
512  if ((x[itype] > hist->GetBinCenter(bin) && bin != hist->GetNbinsX()) || bin == 1)
513  nextbin++;
514  else
515  nextbin--;
516 
517 
518  Double_t dx = hist->GetBinCenter(bin) - hist->GetBinCenter(nextbin);
519  Double_t dy = hist->GetBinContent(bin) - hist->GetBinContent(nextbin);
520  Double_t like = hist->GetBinContent(bin) + (x[itype] - hist->GetBinCenter(bin)) * dy/dx;
521 
522  p = TMath::Max( like, fEpsilon );
523  }
524 
525  if (itype == 0) ps *= p;
526  else pb *= p;
527  }
528  }
529 
530  // the likelihood ratio (transform it ?)
531  return TransformLikelihoodOutput( ps, pb );
532 }
533 
534 ////////////////////////////////////////////////////////////////////////////////
535 /// returns transformed or non-transformed output
536 
538 {
539  if (ps < fEpsilon) ps = fEpsilon;
540  if (pb < fEpsilon) pb = fEpsilon;
541  Double_t r = ps/(ps + pb);
542  if (r >= 1.0) r = 1. - 1.e-15;
543 
545  // inverse Fermi function
546 
547  // sanity check
548  if (r <= 0.0) r = fEpsilon;
549  else if (r >= 1.0) r = 1. - 1.e-15;
550 
551  Double_t tau = 15.0;
552  r = - TMath::Log(1.0/r - 1.0)/tau;
553  }
554 
555  return r;
556 }
557 
558 ////////////////////////////////////////////////////////////////////////////////
559 /// write options to stream
560 
561 void TMVA::MethodLikelihood::WriteOptionsToStream( std::ostream& o, const TString& prefix ) const
562 {
564 
565  // writing the options defined for the different pdfs
566  if (fDefaultPDFLik != 0) {
567  o << prefix << std::endl << prefix << "#Default Likelihood PDF Options:" << std::endl << prefix << std::endl;
568  fDefaultPDFLik->WriteOptionsToStream( o, prefix );
569  }
570  for (UInt_t ivar = 0; ivar < fPDFSig->size(); ivar++) {
571  if ((*fPDFSig)[ivar] != 0) {
572  o << prefix << std::endl << prefix << Form("#Signal[%d] Likelihood PDF Options:",ivar) << std::endl << prefix << std::endl;
573  (*fPDFSig)[ivar]->WriteOptionsToStream( o, prefix );
574  }
575  if ((*fPDFBgd)[ivar] != 0) {
576  o << prefix << std::endl << prefix << "#Background[%d] Likelihood PDF Options:" << std::endl << prefix << std::endl;
577  (*fPDFBgd)[ivar]->WriteOptionsToStream( o, prefix );
578  }
579  }
580 }
581 
582 ////////////////////////////////////////////////////////////////////////////////
583 /// write weights to XML
584 
585 void TMVA::MethodLikelihood::AddWeightsXMLTo( void* parent ) const
586 {
587  void* wght = gTools().AddChild(parent, "Weights");
588  gTools().AddAttr(wght, "NVariables", GetNvar());
589  gTools().AddAttr(wght, "NClasses", 2);
590  void* pdfwrap;
591  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
592  if ( (*fPDFSig)[ivar]==0 || (*fPDFBgd)[ivar]==0 )
593  Log() << kFATAL << "Reference histograms for variable " << ivar
594  << " don't exist, can't write it to weight file" << Endl;
595  pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
596  gTools().AddAttr(pdfwrap, "VarIndex", ivar);
597  gTools().AddAttr(pdfwrap, "ClassIndex", 0);
598  (*fPDFSig)[ivar]->AddXMLTo(pdfwrap);
599  pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
600  gTools().AddAttr(pdfwrap, "VarIndex", ivar);
601  gTools().AddAttr(pdfwrap, "ClassIndex", 1);
602  (*fPDFBgd)[ivar]->AddXMLTo(pdfwrap);
603  }
604 }
605 
606 ////////////////////////////////////////////////////////////////////////////////
607 /// computes ranking of input variables
608 
610 {
611  // create the ranking object
612  if (fRanking) delete fRanking;
613  fRanking = new Ranking( GetName(), "Delta Separation" );
614 
615  Double_t sepRef = -1, sep = -1;
616  for (Int_t ivar=-1; ivar<(Int_t)GetNvar(); ivar++) {
617 
618  // this variable should not be used
619  fDropVariable = ivar;
620 
621  TString nameS = Form( "rS_%i", ivar+1 );
622  TString nameB = Form( "rB_%i", ivar+1 );
623  TH1* rS = new TH1F( nameS, nameS, 80, 0, 1 );
624  TH1* rB = new TH1F( nameB, nameB, 80, 0, 1 );
625 
626  // the event loop
627  for (Int_t ievt=0; ievt<Data()->GetNTrainingEvents(); ievt++) {
628 
629  const Event* origEv = Data()->GetEvent(ievt);
631  const Event* ev = GetTransformationHandler().Transform(Data()->GetEvent(ievt));
632 
633  Double_t lk = this->GetMvaValue();
634  Double_t w = ev->GetWeight();
635  if (DataInfo().IsSignal(ev)) rS->Fill( lk, w );
636  else rB->Fill( lk, w );
637  }
638 
639  // compute separation
640  sep = TMVA::gTools().GetSeparation( rS, rB );
641  if (ivar == -1) sepRef = sep;
642  sep = sepRef - sep;
643 
644  // don't need these histograms anymore
645  delete rS;
646  delete rB;
647 
648  if (ivar >= 0) fRanking->AddRank( Rank( DataInfo().GetVariableInfo(ivar).GetInternalName(), sep ) );
649  }
650 
651  fDropVariable = -1;
652 
653  return fRanking;
654 }
655 
656 ////////////////////////////////////////////////////////////////////////////////
657 /// write reference PDFs to ROOT file
658 
660 {
661  TString pname = "PDF_";
662  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
663  (*fPDFSig)[ivar]->Write( pname + GetInputVar( ivar ) + "_S" );
664  (*fPDFBgd)[ivar]->Write( pname + GetInputVar( ivar ) + "_B" );
665  }
666 }
667 
668 ////////////////////////////////////////////////////////////////////////////////
669 /// read weights from XML
670 
672 {
673  TString pname = "PDF_";
674  Bool_t addDirStatus = TH1::AddDirectoryStatus();
675  TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
676  UInt_t nvars=0;
677  gTools().ReadAttr(wghtnode, "NVariables",nvars);
678  void* descnode = gTools().GetChild(wghtnode);
679  for (UInt_t ivar=0; ivar<nvars; ivar++){
680  void* pdfnode = gTools().GetChild(descnode);
681  Log() << kDEBUG << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
682  if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
683  if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
684  (*fPDFSig)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Sig" );
685  (*fPDFBgd)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Bkg" );
686  (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
687  (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
688  (*(*fPDFSig)[ivar]).ReadXML(pdfnode);
689  descnode = gTools().GetNextChild(descnode);
690  pdfnode = gTools().GetChild(descnode);
691  (*(*fPDFBgd)[ivar]).ReadXML(pdfnode);
692  descnode = gTools().GetNextChild(descnode);
693  }
694  TH1::AddDirectory(addDirStatus);
695 }
696 
697 ////////////////////////////////////////////////////////////////////////////////
698 /// read weight info from file
699 /// nothing to do for this method
700 
702 {
703  TString pname = "PDF_";
704  Bool_t addDirStatus = TH1::AddDirectoryStatus();
705  TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
706  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
707  Log() << kDEBUG << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
708  if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
709  if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
710  (*fPDFSig)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Sig" );
711  (*fPDFBgd)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Bkg");
712  (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
713  (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
714  istr >> *(*fPDFSig)[ivar];
715  istr >> *(*fPDFBgd)[ivar];
716  }
717  TH1::AddDirectory(addDirStatus);
718 }
719 
720 ////////////////////////////////////////////////////////////////////////////////
721 /// read reference PDF from ROOT file
722 
724 {
725  TString pname = "PDF_";
726  Bool_t addDirStatus = TH1::AddDirectoryStatus();
727  TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
728  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
729  (*fPDFSig)[ivar] = (TMVA::PDF*)rf.Get( Form( "PDF_%s_S", GetInputVar( ivar ).Data() ) );
730  (*fPDFBgd)[ivar] = (TMVA::PDF*)rf.Get( Form( "PDF_%s_B", GetInputVar( ivar ).Data() ) );
731  }
732  TH1::AddDirectory(addDirStatus);
733 }
734 
735 ////////////////////////////////////////////////////////////////////////////////
736 /// write histograms and PDFs to file for monitoring purposes
737 
739 {
740  Log() << kINFO << "Write monitoring histograms to file: " << BaseDir()->GetPath() << Endl;
741  BaseDir()->cd();
742 
743  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
744  (*fHistSig)[ivar]->Write();
745  (*fHistBgd)[ivar]->Write();
746  if ((*fHistSig_smooth)[ivar] != 0) (*fHistSig_smooth)[ivar]->Write();
747  if ((*fHistBgd_smooth)[ivar] != 0) (*fHistBgd_smooth)[ivar]->Write();
748  (*fPDFSig)[ivar]->GetPDFHist()->Write();
749  (*fPDFBgd)[ivar]->GetPDFHist()->Write();
750 
751  if ((*fPDFSig)[ivar]->GetNSmoothHist() != 0) (*fPDFSig)[ivar]->GetNSmoothHist()->Write();
752  if ((*fPDFBgd)[ivar]->GetNSmoothHist() != 0) (*fPDFBgd)[ivar]->GetNSmoothHist()->Write();
753 
754  // add special plots to check the smoothing in the GetVal method
755  Float_t xmin=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmin();
756  Float_t xmax=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmax();
757  TH1F* mm = new TH1F( (*fInputVars)[ivar]+"_additional_check",
758  (*fInputVars)[ivar]+"_additional_check", 15000, xmin, xmax );
759  Double_t intBin = (xmax-xmin)/15000;
760  for (Int_t bin=0; bin < 15000; bin++) {
761  Double_t x = (bin + 0.5)*intBin + xmin;
762  mm->SetBinContent(bin+1 ,(*fPDFSig)[ivar]->GetVal(x));
763  }
764  mm->Write();
765 
766  // ---------- create cloned low-binned histogram for comparison in macros (mainly necessary for KDE)
767  TH1* h[2] = { (*fHistSig)[ivar], (*fHistBgd)[ivar] };
768  for (UInt_t i=0; i<2; i++) {
769  TH1* hclone = (TH1F*)h[i]->Clone( TString(h[i]->GetName()) + "_nice" );
770  hclone->SetName ( TString(h[i]->GetName()) + "_nice" );
771  hclone->SetTitle( TString(h[i]->GetTitle()) + "" );
772  if (hclone->GetNbinsX() > 100) {
773  Int_t resFactor = 5;
774  hclone->Rebin( resFactor );
775  hclone->Scale( 1.0/resFactor );
776  }
777  hclone->Write();
778  }
779  // ----------
780  }
781 }
782 
783 ////////////////////////////////////////////////////////////////////////////////
784 /// write specific header of the classifier (mostly include files)
785 
786 void TMVA::MethodLikelihood::MakeClassSpecificHeader( std::ostream& fout, const TString& ) const
787 {
788  fout << "#include <math.h>" << std::endl;
789  fout << "#include <cstdlib>" << std::endl;
790 }
791 
792 ////////////////////////////////////////////////////////////////////////////////
793 /// write specific classifier response
794 
795 void TMVA::MethodLikelihood::MakeClassSpecific( std::ostream& fout, const TString& className ) const
796 {
797  Int_t dp = fout.precision();
798  fout << " double fEpsilon;" << std::endl;
799 
800  Int_t * nbin = new Int_t[GetNvar()];
801 
802  Int_t nbinMax=-1;
803  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
804  nbin[ivar]=(*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX();
805  if (nbin[ivar] > nbinMax) nbinMax=nbin[ivar];
806  }
807 
808  fout << " static float fRefS[][" << nbinMax << "]; "
809  << "// signal reference vector [nvars][max_nbins]" << std::endl;
810  fout << " static float fRefB[][" << nbinMax << "]; "
811  << "// backgr reference vector [nvars][max_nbins]" << std::endl << std::endl;
812  fout << "// if a variable has its PDF encoded as a spline0 --> treat it like an Integer valued one" <<std::endl;
813  fout << " bool fHasDiscretPDF[" << GetNvar() <<"]; "<< std::endl;
814  fout << " int fNbin[" << GetNvar() << "]; "
815  << "// number of bins (discrete variables may have less bins)" << std::endl;
816  fout << " double fHistMin[" << GetNvar() << "]; " << std::endl;
817  fout << " double fHistMax[" << GetNvar() << "]; " << std::endl;
818 
819  fout << " double TransformLikelihoodOutput( double, double ) const;" << std::endl;
820  fout << "};" << std::endl;
821  fout << "" << std::endl;
822  fout << "inline void " << className << "::Initialize() " << std::endl;
823  fout << "{" << std::endl;
824  fout << " fEpsilon = " << fEpsilon << ";" << std::endl;
825  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
826  fout << " fNbin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ";" << std::endl;
827  fout << " fHistMin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmin() << ";" << std::endl;
828  fout << " fHistMax[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmax() << ";" << std::endl;
829  // sanity check (for previous code lines)
830  if ((((*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar] ||
831  (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar])
832  ) ||
833  (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()) {
834  Log() << kFATAL << "<MakeClassSpecific> Mismatch in binning of variable "
835  << "\"" << GetOriginalVarName(ivar) << "\" of type: \'" << DataInfo().GetVariableInfo(ivar).GetVarType()
836  << "\' : "
837  << "nxS = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ", "
838  << "nxB = " << (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()
839  << " while we expect " << nbin[ivar]
840  << Endl;
841  }
842  }
843  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
844  if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0)
845  fout << " fHasDiscretPDF[" << ivar <<"] = true; " << std::endl;
846  else
847  fout << " fHasDiscretPDF[" << ivar <<"] = false; " << std::endl;
848  }
849 
850  fout << "}" << std::endl << std::endl;
851 
852  fout << "inline double " << className
853  << "::GetMvaValue__( const std::vector<double>& inputValues ) const" << std::endl;
854  fout << "{" << std::endl;
855  fout << " double ps(1), pb(1);" << std::endl;
856  fout << " std::vector<double> inputValuesSig = inputValues;" << std::endl;
857  fout << " std::vector<double> inputValuesBgd = inputValues;" << std::endl;
859  fout << " Transform(inputValuesSig,0);" << std::endl;
860  fout << " Transform(inputValuesBgd,1);" << std::endl;
861  }
862  fout << " for (size_t ivar = 0; ivar < GetNvar(); ivar++) {" << std::endl;
863  fout << std::endl;
864  fout << " // dummy at present... will be used for variable transforms" << std::endl;
865  fout << " double x[2] = { inputValuesSig[ivar], inputValuesBgd[ivar] };" << std::endl;
866  fout << std::endl;
867  fout << " for (int itype=0; itype < 2; itype++) {" << std::endl;
868  fout << std::endl;
869  fout << " // interpolate linearly between adjacent bins" << std::endl;
870  fout << " // this is not useful for discrete variables (or forced Spline0)" << std::endl;
871  fout << " int bin = int((x[itype] - fHistMin[ivar])/(fHistMax[ivar] - fHistMin[ivar])*fNbin[ivar]) + 0;" << std::endl;
872  fout << std::endl;
873  fout << " // since the test data sample is in general different from the training sample" << std::endl;
874  fout << " // it can happen that the min/max of the training sample are trespassed --> correct this" << std::endl;
875  fout << " if (bin < 0) {" << std::endl;
876  fout << " bin = 0;" << std::endl;
877  fout << " x[itype] = fHistMin[ivar];" << std::endl;
878  fout << " }" << std::endl;
879  fout << " else if (bin >= fNbin[ivar]) {" << std::endl;
880  fout << " bin = fNbin[ivar]-1;" << std::endl;
881  fout << " x[itype] = fHistMax[ivar];" << std::endl;
882  fout << " }" << std::endl;
883  fout << std::endl;
884  fout << " // find corresponding histogram from cached indices" << std::endl;
885  fout << " float ref = (itype == 0) ? fRefS[ivar][bin] : fRefB[ivar][bin];" << std::endl;
886  fout << std::endl;
887  fout << " // sanity check" << std::endl;
888  fout << " if (ref < 0) {" << std::endl;
889  fout << " std::cout << \"Fatal error in " << className
890  << ": bin entry < 0 ==> abort\" << std::endl;" << std::endl;
891  fout << " std::exit(1);" << std::endl;
892  fout << " }" << std::endl;
893  fout << std::endl;
894  fout << " double p = ref;" << std::endl;
895  fout << std::endl;
896  fout << " if (GetType(ivar) != 'I' && !fHasDiscretPDF[ivar]) {" << std::endl;
897  fout << " float bincenter = (bin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
898  fout << " int nextbin = bin;" << std::endl;
899  fout << " if ((x[itype] > bincenter && bin != fNbin[ivar]-1) || bin == 0) " << std::endl;
900  fout << " nextbin++;" << std::endl;
901  fout << " else" << std::endl;
902  fout << " nextbin--; " << std::endl;
903  fout << std::endl;
904  fout << " double refnext = (itype == 0) ? fRefS[ivar][nextbin] : fRefB[ivar][nextbin];" << std::endl;
905  fout << " float nextbincenter = (nextbin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
906  fout << std::endl;
907  fout << " double dx = bincenter - nextbincenter;" << std::endl;
908  fout << " double dy = ref - refnext;" << std::endl;
909  fout << " p += (x[itype] - bincenter) * dy/dx;" << std::endl;
910  fout << " }" << std::endl;
911  fout << std::endl;
912  fout << " if (p < fEpsilon) p = fEpsilon; // avoid zero response" << std::endl;
913  fout << std::endl;
914  fout << " if (itype == 0) ps *= p;" << std::endl;
915  fout << " else pb *= p;" << std::endl;
916  fout << " } " << std::endl;
917  fout << " } " << std::endl;
918  fout << std::endl;
919  fout << " // the likelihood ratio (transform it ?)" << std::endl;
920  fout << " return TransformLikelihoodOutput( ps, pb ); " << std::endl;
921  fout << "}" << std::endl << std::endl;
922 
923  fout << "inline double " << className << "::TransformLikelihoodOutput( double ps, double pb ) const" << std::endl;
924  fout << "{" << std::endl;
925  fout << " // returns transformed or non-transformed output" << std::endl;
926  fout << " if (ps < fEpsilon) ps = fEpsilon;" << std::endl;
927  fout << " if (pb < fEpsilon) pb = fEpsilon;" << std::endl;
928  fout << " double r = ps/(ps + pb);" << std::endl;
929  fout << " if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
930  fout << std::endl;
931  fout << " if (" << (fTransformLikelihoodOutput ? "true" : "false") << ") {" << std::endl;
932  fout << " // inverse Fermi function" << std::endl;
933  fout << std::endl;
934  fout << " // sanity check" << std::endl;
935  fout << " if (r <= 0.0) r = fEpsilon;" << std::endl;
936  fout << " else if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
937  fout << std::endl;
938  fout << " double tau = 15.0;" << std::endl;
939  fout << " r = - log(1.0/r - 1.0)/tau;" << std::endl;
940  fout << " }" << std::endl;
941  fout << std::endl;
942  fout << " return r;" << std::endl;
943  fout << "}" << std::endl;
944  fout << std::endl;
945 
946  fout << "// Clean up" << std::endl;
947  fout << "inline void " << className << "::Clear() " << std::endl;
948  fout << "{" << std::endl;
949  fout << " // nothing to clear" << std::endl;
950  fout << "}" << std::endl << std::endl;
951 
952  fout << "// signal map" << std::endl;
953  fout << "float " << className << "::fRefS[][" << nbinMax << "] = " << std::endl;
954  fout << "{ " << std::endl;
955  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
956  fout << " { ";
957  for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
958  if (ibin-1 < nbin[ivar])
959  fout << (*fPDFSig)[ivar]->GetPDFHist()->GetBinContent(ibin);
960  else
961  fout << -1;
962 
963  if (ibin < nbinMax) fout << ", ";
964  }
965  fout << " }, " << std::endl;
966  }
967  fout << "}; " << std::endl;
968  fout << std::endl;
969 
970  fout << "// background map" << std::endl;
971  fout << "float " << className << "::fRefB[][" << nbinMax << "] = " << std::endl;
972  fout << "{ " << std::endl;
973  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
974  fout << " { ";
975  fout << std::setprecision(8);
976  for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
977  if (ibin-1 < nbin[ivar])
978  fout << (*fPDFBgd)[ivar]->GetPDFHist()->GetBinContent(ibin);
979  else
980  fout << -1;
981 
982  if (ibin < nbinMax) fout << ", ";
983  }
984  fout << " }, " << std::endl;
985  }
986  fout << "}; " << std::endl;
987  fout << std::endl;
988  fout << std::setprecision(dp);
989 
990  delete[] nbin;
991 }
992 
993 ////////////////////////////////////////////////////////////////////////////////
994 /// get help message text
995 ///
996 /// typical length of text line:
997 /// "|--------------------------------------------------------------|"
998 
1000 {
1001  Log() << Endl;
1002  Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
1003  Log() << Endl;
1004  Log() << "The maximum-likelihood classifier models the data with probability " << Endl;
1005  Log() << "density functions (PDF) reproducing the signal and background" << Endl;
1006  Log() << "distributions of the input variables. Correlations among the " << Endl;
1007  Log() << "variables are ignored." << Endl;
1008  Log() << Endl;
1009  Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
1010  Log() << Endl;
1011  Log() << "Required for good performance are decorrelated input variables" << Endl;
1012  Log() << "(PCA transformation via the option \"VarTransform=Decorrelate\"" << Endl;
1013  Log() << "may be tried). Irreducible non-linear correlations may be reduced" << Endl;
1014  Log() << "by precombining strongly correlated input variables, or by simply" << Endl;
1015  Log() << "removing one of the variables." << Endl;
1016  Log() << Endl;
1017  Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
1018  Log() << Endl;
1019  Log() << "High fidelity PDF estimates are mandatory, i.e., sufficient training " << Endl;
1020  Log() << "statistics is required to populate the tails of the distributions" << Endl;
1021  Log() << "It would be a surprise if the default Spline or KDE kernel parameters" << Endl;
1022  Log() << "provide a satisfying fit to the data. The user is advised to properly" << Endl;
1023  Log() << "tune the events per bin and smooth options in the spline cases" << Endl;
1024  Log() << "individually per variable. If the KDE kernel is used, the adaptive" << Endl;
1025  Log() << "Gaussian kernel may lead to artefacts, so please always also try" << Endl;
1026  Log() << "the non-adaptive one." << Endl;
1027  Log() << "" << Endl;
1028  Log() << "All tuning parameters must be adjusted individually for each input" << Endl;
1029  Log() << "variable!" << Endl;
1030 }
1031 
virtual Int_t Write(const char *name=0, Int_t option=0, Int_t bufsize=0)
Write this object to the current directory.
Definition: TObject.cxx:778
virtual Int_t FindBin(Double_t x, Double_t y=0, Double_t z=0)
Return Global bin number corresponding to x,y,z.
Definition: TH1.cxx:3441
void DeclareCompatibilityOptions()
options that are used ONLY for the READER to ensure backward compatibility they are hence without any...
virtual void Scale(Double_t c1=1, Option_t *option="")
Multiply this histogram by a constant c1.
Definition: TH1.cxx:5937
virtual Int_t Fill(Double_t x)
Increment bin with abscissa X by 1.
Definition: TH1.cxx:3126
UInt_t GetNVariables() const
Definition: DataSetInfo.h:110
void WriteWeightsToStream(TFile &rf) const
write reference PDFs to ROOT file
float xmin
Definition: THbookFile.cxx:93
virtual Double_t GetBinCenter(Int_t bin) const
Return bin center for 1D histogram.
Definition: TH1.cxx:8253
MsgLogger & Endl(MsgLogger &ml)
Definition: MsgLogger.h:158
Singleton class for Global types used by TMVA.
Definition: Types.h:73
Double_t Log(Double_t x)
Definition: TMath.h:649
virtual void WriteOptionsToStream(std::ostream &o, const TString &prefix) const
write options to stream
float Float_t
Definition: RtypesCore.h:53
void Train()
create reference distributions (PDFs) from signal and background events: fill histograms and smooth t...
UInt_t GetNvar() const
Definition: MethodBase.h:328
const TList & GetTransformationList() const
const TString & GetOriginalVarName(Int_t ivar) const
Definition: MethodBase.h:493
THist< 1, float, THistStatContent, THistStatUncertainty > TH1F
Definition: THist.hxx:311
TH1 * h
Definition: legend2.C:5
MsgLogger & Log() const
Definition: Configurable.h:122
virtual ~MethodLikelihood()
destructor
OptionBase * DeclareOptionRef(T &ref, const TString &name, const TString &desc="")
virtual Double_t GetBinContent(Int_t bin) const
Return content of bin number bin.
Definition: TH1.cxx:4639
A ROOT file is a suite of consecutive data records (TKey instances) with a well defined format...
Definition: TFile.h:46
EAnalysisType
Definition: Types.h:125
TVectorT.
Definition: TMatrixTBase.h:77
TH1 * GetPDFHist() const
Definition: PDF.h:92
Virtual base Class for all MVA method.
Definition: MethodBase.h:106
virtual TObject * Get(const char *namecycle)
Return pointer to object identified by namecycle.
std::vector< TH1 * > * fHistSig
Basic string class.
Definition: TString.h:129
static Bool_t AddDirectoryStatus()
Static function: cannot be inlined on Windows/NT.
Definition: TH1.cxx:700
tomato 1-D histogram with a float per channel (see TH1 documentation)}
Definition: TH1.h:551
TransformationHandler & GetTransformationHandler(Bool_t takeReroutedIfAvailable=true)
Definition: MethodBase.h:378
Ranking for variables in method (implementation)
Definition: Ranking.h:48
Short_t Min(Short_t a, Short_t b)
Definition: TMathBase.h:168
int Int_t
Definition: RtypesCore.h:41
bool Bool_t
Definition: RtypesCore.h:59
int nbins[3]
const Event * Transform(const Event *) const
the transformation
std::vector< TString > * fInputVars
Definition: MethodBase.h:570
#define NULL
Definition: RtypesCore.h:88
void AddAttr(void *node, const char *, const T &value, Int_t precision=16)
add attribute to xml
Definition: Tools.h:308
void * AddChild(void *parent, const char *childname, const char *content=0, bool isRootNode=false)
add child node
Definition: Tools.cxx:1135
static void AddDirectory(Bool_t add=kTRUE)
Sets the flag controlling the automatic add of histograms in memory.
Definition: TH1.cxx:1218
UInt_t fSignalClass
Definition: MethodBase.h:671
const TString & GetInputVar(Int_t i) const
Definition: MethodBase.h:333
char GetVarType() const
Definition: VariableInfo.h:61
Double_t x[n]
Definition: legend1.C:17
void ProcessOptions()
process user options reference cut value to distinguish signal-like from background-like events ...
void SetTransformationReferenceClass(Int_t cls)
overrides the setting for all classes! (this is put in basically for the likelihood-method) be carefu...
void MakeClassSpecific(std::ostream &, const TString &) const
write specific classifier response
void ReadWeightsFromXML(void *wghtnode)
read weights from XML
you should not use this method at all Int_t Int_t Double_t Double_t Double_t Int_t Double_t Double_t Double_t tau
Definition: TRolke.cxx:630
UInt_t GetTrainingTMVAVersionCode() const
Definition: MethodBase.h:373
const Event * GetEvent() const
Definition: MethodBase.h:733
DataSet * Data() const
Definition: MethodBase.h:393
void * GetChild(void *parent, const char *childname=0)
get child node
Definition: Tools.cxx:1161
UInt_t GetClass() const
Definition: Event.h:81
virtual void ParseOptions()
options parser
Double_t GetXmin(Int_t ivar) const
Definition: MethodBase.h:340
DataSetInfo & DataInfo() const
Definition: MethodBase.h:394
void SetOptions(const TString &s)
Definition: Configurable.h:85
Class that contains all the data information.
Definition: DataSetInfo.h:60
PDF wrapper for histograms; uses user-defined spline interpolation.
Definition: PDF.h:63
std::vector< PDF * > * fPDFSig
Double_t GetWeight() const
return the event weight - depending on whether the flag IgnoreNegWeightsInTraining is or not...
Definition: Event.cxx:382
Long64_t GetNTrainingEvents() const
Definition: DataSet.h:79
std::vector< TH1 * > * fHistBgd_smooth
Double_t GetXmax(Int_t ivar) const
Definition: MethodBase.h:341
TRandom2 r(17)
const char * GetName() const
Definition: MethodBase.h:318
virtual void SetBinContent(Int_t bin, Double_t content)
Set bin content see convention for numbering bins in TH1::GetBin In case the bin number is greater th...
Definition: TH1.cxx:8325
virtual TH1 * Rebin(Int_t ngroup=2, const char *newname="", const Double_t *xbins=0)
Rebin this histogram.
Definition: TH1.cxx:5605
std::vector< TH1 * > * fHistSig_smooth
unsigned int UInt_t
Definition: RtypesCore.h:42
const Ranking * CreateRanking()
computes ranking of input variables
char * Form(const char *fmt,...)
const TString & GetMethodName() const
Definition: MethodBase.h:315
void ReadAttr(void *node, const char *, T &value)
read attribute from xml
Definition: Tools.h:290
Likelihood analysis ("non-parametric approach")
float xmax
Definition: THbookFile.cxx:93
Tools & gTools()
void WriteMonitoringHistosToFile() const
write histograms and PDFs to file for monitoring purposes
virtual const char * GetPath() const
Returns the full path of the directory.
Definition: TDirectory.cxx:911
virtual Bool_t HasAnalysisType(Types::EAnalysisType type, UInt_t numberClasses, UInt_t numberTargets)
FDA can handle classification with 2 classes.
const Bool_t kFALSE
Definition: RtypesCore.h:92
Float_t GetValue(UInt_t ivar) const
return value of i&#39;th variable
Definition: Event.cxx:237
UInt_t fBackgroundClass
Definition: MethodBase.h:672
virtual void SetName(const char *name)
Change the name of this histogram.
Definition: TH1.cxx:8073
void DeclareOptions()
define the options (their key words) that can be set in the option string
Definition: PDF.cxx:823
Bool_t IgnoreEventsWithNegWeightsInTraining() const
Definition: MethodBase.h:668
#define ClassImp(name)
Definition: Rtypes.h:336
double Double_t
Definition: RtypesCore.h:55
void DeclareOptions()
define the options (their key words) that can be set in the option string
std::vector< PDF * > * fPDFBgd
int type
Definition: TGX11.cxx:120
void * GetNextChild(void *prevchild, const char *childname=0)
XML helpers.
Definition: Tools.cxx:1173
The TH1 histogram class.
Definition: TH1.h:56
VariableInfo & GetVariableInfo(Int_t i)
Definition: DataSetInfo.h:96
void ExitFromTraining()
Definition: MethodBase.h:446
virtual const char * GetName() const
Returns name of object.
Definition: DataSetInfo.h:67
void MakeClassSpecificHeader(std::ostream &, const TString &="") const
write specific header of the classifier (mostly include files)
const TString & GetOptions() const
Definition: Configurable.h:84
virtual TObject * Clone(const char *newname="") const
Make a clone of an object using the Streamer facility.
Definition: TNamed.cxx:65
void Init()
default initialisation called by all constructors
const TString & Color(const TString &)
human readable color strings
Definition: Tools.cxx:839
void ReadWeightsFromStream(std::istream &istr)
read weight info from file nothing to do for this method
#define REGISTER_METHOD(CLASS)
for example
Abstract ClassifierFactory template that handles arbitrary types.
Ranking * fRanking
Definition: MethodBase.h:569
virtual Bool_t cd(const char *path=0)
Change current directory to "this" directory.
Definition: TDirectory.cxx:435
TDirectory * BaseDir() const
returns the ROOT directory where info/histograms etc of the corresponding MVA method instance are sto...
virtual void AddRank(const Rank &rank)
Add a new rank take ownership of it.
Definition: Ranking.cxx:86
virtual void DeclareCompatibilityOptions()
options that are used ONLY for the READER to ensure backward compatibility they are hence without any...
Definition: MethodBase.cxx:601
void ProcessOptions()
Definition: PDF.cxx:873
Short_t Max(Short_t a, Short_t b)
Definition: TMathBase.h:200
Double_t TransformLikelihoodOutput(Double_t ps, Double_t pb) const
returns transformed or non-transformed output
Long64_t GetNEvents(Types::ETreeType type=Types::kMaxTreeType) const
Definition: DataSet.h:215
void GetHelpMessage() const
get help message text
std::vector< TH1 * > * fHistBgd
virtual void SetTitle(const char *title)
Change (i.e.
Definition: TH1.cxx:6028
virtual Int_t GetNbinsX() const
Definition: TH1.h:277
virtual Int_t GetSize() const
Definition: TCollection.h:89
MethodLikelihood(const TString &jobName, const TString &methodTitle, DataSetInfo &theData, const TString &theOption="")
standard constructor
const Bool_t kTRUE
Definition: RtypesCore.h:91
Int_t Nint(T x)
Definition: TMath.h:607
const Event * GetEvent() const
Definition: DataSet.cxx:202
void NoErrorCalc(Double_t *const err, Double_t *const errUpper)
Definition: MethodBase.cxx:829
void SetSignalReferenceCut(Double_t cut)
Definition: MethodBase.h:348
void WriteOptionsToStream(std::ostream &o, const TString &prefix) const
write options to output stream (e.g. in writing the MVA weight files
virtual const char * GetTitle() const
Returns title of object.
Definition: TNamed.h:48
Double_t GetSeparation(TH1 *S, TH1 *B) const
compute "separation" defined as
Definition: Tools.cxx:133
Double_t GetMvaValue(Double_t *err=0, Double_t *errUpper=0)
returns the likelihood estimator for signal fill a new Likelihood branch into the testTree ...
void AddWeightsXMLTo(void *parent) const
write weights to XML
const char * Data() const
Definition: TString.h:347