Logo ROOT   6.07/09
Reference Guide
MethodLikelihood.cxx
Go to the documentation of this file.
1 // @(#)root/tmva $Id$
2 // Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss, Eckhard von Toerne, Jan Therhaag
3 
4 /**********************************************************************************
5  * Project: TMVA - a Root-integrated toolkit for multivariate Data analysis *
6  * Package: TMVA *
7  * Class : TMVA::MethodLikelihood *
8  * Web : http://tmva.sourceforge.net *
9  * *
10  * Description: *
11  * Implementation (see header for description) *
12  * *
13  * Authors (alphabetical): *
14  * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
15  * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
16  * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
17  * Jan Therhaag <Jan.Therhaag@cern.ch> - U of Bonn, Germany *
18  * Eckhard v. Toerne <evt@uni-bonn.de> - U. of Bonn, Germany *
19  * *
20  * Copyright (c) 2005-2011: *
21  * CERN, Switzerland *
22  * U. of Victoria, Canada *
23  * MPI-K Heidelberg, Germany *
24  * U. of Bonn, Germany *
25  * *
26  * Redistribution and use in source and binary forms, with or without *
27  * modification, are permitted according to the terms listed in LICENSE *
28  * (http://tmva.sourceforge.net/LICENSE) *
29  **********************************************************************************/
30 
31 ////////////////////////////////////////////////////////////////////////////////
32 
33 /* Begin_Html
34 
35  Likelihood analysis ("non-parametric approach")
36 
37  <p>
38  Also implemented is a "diagonalized likelihood approach",
39  which improves over the uncorrelated likelihood ansatz by
40  transforming linearly the input variables into a diagonal space,
41  using the square-root of the covariance matrix
42 
43  <p>
44  The method of maximum likelihood is the most straightforward, and
45  certainly among the most elegant multivariate analyser approaches.
46  We define the likelihood ratio, <i>R<sub>L</sub></i>, for event
47  <i>i</i>, by:<br>
48  <center>
49  <img vspace=6 src="gif/tmva_likratio.gif" align="bottom" >
50  </center>
51  Here the signal and background likelihoods, <i>L<sub>S</sub></i>,
52  <i>L<sub>B</sub></i>, are products of the corresponding probability
53  densities, <i>p<sub>S</sub></i>, <i>p<sub>B</sub></i>, of the
54  <i>N</i><sub>var</sub> discriminating variables used in the MVA: <br>
55  <center>
56  <img vspace=6 src="gif/tmva_lik.gif" align="bottom" >
57  </center>
58  and accordingly for L<sub>B</sub>.
59  In practise, TMVA uses polynomial splines to estimate the probability
60  density functions (PDF) obtained from the distributions of the
61  training variables.<br>
62 
63  <p>
64  Note that in TMVA the output of the likelihood ratio is transformed
65  by<br>
66  <center>
67  <img vspace=6 src="gif/tmva_likratio_trans.gif" align="bottom"/>
68  </center>
69  to avoid the occurrence of heavy peaks at <i>R<sub>L</sub></i>=0,1.
70 
71  <b>Decorrelated (or "diagonalized") Likelihood</b>
72 
73  <p>
74  The biggest drawback of the Likelihood approach is that it assumes
75  that the discriminant variables are uncorrelated. If it were the case,
76  it can be proven that the discrimination obtained by the above likelihood
77  ratio is optimal, ie, no other method can beat it. However, in most
78  practical applications of MVAs correlations are present. <br><p></p>
79 
80  <p>
81  Linear correlations, measured from the training sample, can be taken
82  into account in a straightforward manner through the square-root
83  of the covariance matrix. The square-root of a matrix
84  <i>C</i> is the matrix <i>C&prime;</i> that multiplied with itself
85  yields <i>C</i>: <i>C</i>=<i>C&prime;C&prime;</i>. We compute the
86  square-root matrix (SQM) by means of diagonalising (<i>D</i>) the
87  covariance matrix: <br>
88  <center>
89  <img vspace=6 src="gif/tmva_sqm.gif" align="bottom" >
90  </center>
91  and the linear transformation of the linearly correlated into the
92  uncorrelated variables space is then given by multiplying the measured
93  variable tuple by the inverse of the SQM. Note that these transformations
94  are performed for both signal and background separately, since the
95  correlation pattern is not the same in the two samples.
96 
97  <p>
98  The above diagonalisation is complete for linearly correlated,
99  Gaussian distributed variables only. In real-world examples this
100  is not often the case, so that only little additional information
101  may be recovered by the diagonalisation procedure. In these cases,
102  non-linear methods must be applied.
103 
104  End_Html */
105 //_______________________________________________________________________
106 
107 #include "TMVA/MethodLikelihood.h"
108 
109 #include "TMVA/Configurable.h"
110 #include "TMVA/ClassifierFactory.h"
111 #include "TMVA/DataSet.h"
112 #include "TMVA/DataSetInfo.h"
113 #include "TMVA/IMethod.h"
114 #include "TMVA/MethodBase.h"
115 #include "TMVA/MsgLogger.h"
116 #include "TMVA/PDF.h"
117 #include "TMVA/Ranking.h"
118 #include "TMVA/Tools.h"
119 #include "TMVA/Types.h"
120 #include "TMVA/VariableInfo.h"
121 
122 #include "TMatrixD.h"
123 #include "TVector.h"
124 #include "TMath.h"
125 #include "TObjString.h"
126 #include "TFile.h"
127 #include "TKey.h"
128 #include "TH1.h"
129 #include "TClass.h"
130 #include "Riostream.h"
131 
132 #include <iomanip>
133 #include <vector>
134 #include <cstdlib>
135 
136 REGISTER_METHOD(Likelihood)
137 
138 ClassImp(TMVA::MethodLikelihood)
139 
140 ////////////////////////////////////////////////////////////////////////////////
141 /// standard constructor
142 
144  const TString& methodTitle,
145  DataSetInfo& theData,
146  const TString& theOption ) :
147  TMVA::MethodBase( jobName, Types::kLikelihood, methodTitle, theData, theOption),
148  fEpsilon ( 1.e3 * DBL_MIN ),
149  fTransformLikelihoodOutput( kFALSE ),
150  fDropVariable ( 0 ),
151  fHistSig ( 0 ),
152  fHistBgd ( 0 ),
153  fHistSig_smooth( 0 ),
154  fHistBgd_smooth( 0 ),
155  fDefaultPDFLik ( 0 ),
156  fPDFSig ( 0 ),
157  fPDFBgd ( 0 ),
158  fNsmooth ( 2 ),
159  fNsmoothVarS ( 0 ),
160  fNsmoothVarB ( 0 ),
161  fAverageEvtPerBin( 0 ),
162  fAverageEvtPerBinVarS (0),
163  fAverageEvtPerBinVarB (0),
164  fKDEfineFactor ( 0 ),
165  fInterpolateString(0)
166 {
167 }
168 
169 ////////////////////////////////////////////////////////////////////////////////
170 /// construct likelihood references from file
171 
173  const TString& theWeightFile) :
174  TMVA::MethodBase( Types::kLikelihood, theData, theWeightFile),
175  fEpsilon ( 1.e3 * DBL_MIN ),
176  fTransformLikelihoodOutput( kFALSE ),
177  fDropVariable ( 0 ),
178  fHistSig ( 0 ),
179  fHistBgd ( 0 ),
180  fHistSig_smooth( 0 ),
181  fHistBgd_smooth( 0 ),
182  fDefaultPDFLik ( 0 ),
183  fPDFSig ( 0 ),
184  fPDFBgd ( 0 ),
185  fNsmooth ( 2 ),
186  fNsmoothVarS ( 0 ),
187  fNsmoothVarB ( 0 ),
188  fAverageEvtPerBin( 0 ),
189  fAverageEvtPerBinVarS (0),
190  fAverageEvtPerBinVarB (0),
191  fKDEfineFactor ( 0 ),
192  fInterpolateString(0)
193 {
194 }
195 
196 ////////////////////////////////////////////////////////////////////////////////
197 /// destructor
198 
200 {
201  if (NULL != fDefaultPDFLik) delete fDefaultPDFLik;
202  if (NULL != fHistSig) delete fHistSig;
203  if (NULL != fHistBgd) delete fHistBgd;
204  if (NULL != fHistSig_smooth) delete fHistSig_smooth;
205  if (NULL != fHistBgd_smooth) delete fHistBgd_smooth;
206  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
207  if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
208  if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
209  }
210  if (NULL != fPDFSig) delete fPDFSig;
211  if (NULL != fPDFBgd) delete fPDFBgd;
212 }
213 
214 ////////////////////////////////////////////////////////////////////////////////
215 /// FDA can handle classification with 2 classes
216 
218  UInt_t numberClasses, UInt_t /*numberTargets*/ )
219 {
220  if (type == Types::kClassification && numberClasses == 2) return kTRUE;
221  return kFALSE;
222 }
223 
224 ////////////////////////////////////////////////////////////////////////////////
225 /// default initialisation called by all constructors
226 
228 {
229  // no ranking test
230  fDropVariable = -1;
231  fHistSig = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
232  fHistBgd = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
233  fHistSig_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
234  fHistBgd_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
235  fPDFSig = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
236  fPDFBgd = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
237 }
238 
239 ////////////////////////////////////////////////////////////////////////////////
240 /// define the options (their key words) that can be set in the option string
241 /// TransformOutput <bool> transform (often strongly peaked) likelihood output through sigmoid inversion
242 
244 {
245  DeclareOptionRef( fTransformLikelihoodOutput = kFALSE, "TransformOutput",
246  "Transform likelihood output by inverse sigmoid function" );
247 
248  // initialize
249 
250  // reading every PDF's definition and passing the option string to the next one to be read and marked
251  TString updatedOptions = GetOptions();
252  fDefaultPDFLik = new PDF( TString(GetName()) + " PDF", updatedOptions );
255  updatedOptions = fDefaultPDFLik->GetOptions();
256  for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
257  (*fPDFSig)[ivar] = new PDF( Form("%s PDF Sig[%d]", GetName(), ivar), updatedOptions,
258  Form("Sig[%d]",ivar), fDefaultPDFLik );
259  (*fPDFSig)[ivar]->DeclareOptions();
260  (*fPDFSig)[ivar]->ParseOptions();
261  updatedOptions = (*fPDFSig)[ivar]->GetOptions();
262  (*fPDFBgd)[ivar] = new PDF( Form("%s PDF Bkg[%d]", GetName(), ivar), updatedOptions,
263  Form("Bkg[%d]",ivar), fDefaultPDFLik );
264  (*fPDFBgd)[ivar]->DeclareOptions();
265  (*fPDFBgd)[ivar]->ParseOptions();
266  updatedOptions = (*fPDFBgd)[ivar]->GetOptions();
267  }
268 
269  // the final marked option string is written back to the original likelihood
270  SetOptions( updatedOptions );
271 }
272 
273 
275 {
276  // options that are used ONLY for the READER to ensure backward compatibility
277 
279  DeclareOptionRef( fNsmooth = 1, "NSmooth",
280  "Number of smoothing iterations for the input histograms");
281  DeclareOptionRef( fAverageEvtPerBin = 50, "NAvEvtPerBin",
282  "Average number of events per PDF bin");
283  DeclareOptionRef( fKDEfineFactor =1. , "KDEFineFactor",
284  "Fine tuning factor for Adaptive KDE: Factor to multyply the width of the kernel");
285  DeclareOptionRef( fBorderMethodString = "None", "KDEborder",
286  "Border effects treatment (1=no treatment , 2=kernel renormalization, 3=sample mirroring)" );
287  DeclareOptionRef( fKDEiterString = "Nonadaptive", "KDEiter",
288  "Number of iterations (1=non-adaptive, 2=adaptive)" );
289  DeclareOptionRef( fKDEtypeString = "Gauss", "KDEtype",
290  "KDE kernel type (1=Gauss)" );
293  fNsmoothVarS = new Int_t[GetNvar()];
294  fNsmoothVarB = new Int_t[GetNvar()];
296  for(UInt_t i=0; i<GetNvar(); ++i) {
298  fNsmoothVarS[i] = fNsmoothVarB[i] = 0;
299  fInterpolateString[i] = "";
300  }
301  DeclareOptionRef( fAverageEvtPerBinVarS, GetNvar(), "NAvEvtPerBinSig",
302  "Average num of events per PDF bin and variable (signal)");
303  DeclareOptionRef( fAverageEvtPerBinVarB, GetNvar(), "NAvEvtPerBinBkg",
304  "Average num of events per PDF bin and variable (background)");
305  DeclareOptionRef(fNsmoothVarS, GetNvar(), "NSmoothSig",
306  "Number of smoothing iterations for the input histograms");
307  DeclareOptionRef(fNsmoothVarB, GetNvar(), "NSmoothBkg",
308  "Number of smoothing iterations for the input histograms");
309  DeclareOptionRef(fInterpolateString, GetNvar(), "PDFInterpol", "Method of interpolating reference histograms (e.g. Spline2 or KDE)");
310 }
311 
312 
313 
314 
315 ////////////////////////////////////////////////////////////////////////////////
316 
318 {
319  // process user options
320  // reference cut value to distingiush signal-like from background-like events
322 
324  for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
325  (*fPDFBgd)[ivar]->ProcessOptions();
326  (*fPDFSig)[ivar]->ProcessOptions();
327  }
328 }
329 
330 ////////////////////////////////////////////////////////////////////////////////
331 /// create reference distributions (PDFs) from signal and background events:
332 /// fill histograms and smooth them; if decorrelation is required, compute
333 /// corresponding square-root matrices
334 /// the reference histograms require the correct boundaries. Since in Likelihood classification
335 /// the transformations are applied using both classes, also the corresponding boundaries
336 /// need to take this into account
337 
339 {
340  UInt_t nvar=GetNvar();
341  std::vector<Double_t> xmin(nvar), xmax(nvar);
342  for (UInt_t ivar=0; ivar<nvar; ivar++) {xmin[ivar]=1e30; xmax[ivar]=-1e30;}
343 
344  UInt_t nevents=Data()->GetNEvents();
345  for (UInt_t ievt=0; ievt<nevents; ievt++) {
346  // use the true-event-type's transformation
347  // set the event true event types transformation
348  const Event* origEv = Data()->GetEvent(ievt);
349  if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
350  // loop over classes
351  for (int cls=0;cls<2;cls++){
353  const Event* ev = GetTransformationHandler().Transform( origEv );
354  for (UInt_t ivar=0; ivar<nvar; ivar++) {
355  Float_t value = ev->GetValue(ivar);
356  if (value < xmin[ivar]) xmin[ivar] = value;
357  if (value > xmax[ivar]) xmax[ivar] = value;
358  }
359  }
360  }
361 
362  // create reference histograms
363  // (KDE smoothing requires very finely binned reference histograms)
364  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
365  TString var = (*fInputVars)[ivar];
366 
367  // the reference histograms require the correct boundaries. Since in Likelihood classification
368  // the transformations are applied using both classes, also the corresponding boundaries
369  // need to take this into account
370 
371  // special treatment for discrete variables
372  if (DataInfo().GetVariableInfo(ivar).GetVarType() == 'I') {
373  // special treatment for integer variables
374  Int_t ixmin = TMath::Nint( xmin[ivar] );
375  xmax[ivar]=xmax[ivar]+1; // make sure that all entries are included in histogram
376  Int_t ixmax = TMath::Nint( xmax[ivar] );
377  Int_t nbins = ixmax - ixmin;
378  (*fHistSig)[ivar] = new TH1F(GetMethodName()+"_"+var + "_sig", var + " signal training", nbins, ixmin, ixmax );
379  (*fHistBgd)[ivar] = new TH1F(GetMethodName()+"_"+var + "_bgd", var + " background training", nbins, ixmin, ixmax );
380  } else {
381 
382  UInt_t minNEvt = TMath::Min(Data()->GetNEvtSigTrain(),Data()->GetNEvtBkgdTrain());
383  Int_t nbinsS = (*fPDFSig)[ivar]->GetHistNBins( minNEvt );
384  Int_t nbinsB = (*fPDFBgd)[ivar]->GetHistNBins( minNEvt );
385 
386  (*fHistSig)[ivar] = new TH1F( Form("%s_%s_%s_sig",DataInfo().GetName(),GetMethodName().Data(),var.Data()),
387  Form("%s_%s_%s signal training",DataInfo().GetName(),GetMethodName().Data(),var.Data()), nbinsS, xmin[ivar], xmax[ivar] );
388  (*fHistBgd)[ivar] = new TH1F( Form("%s_%s_%s_bgd",DataInfo().GetName(),GetMethodName().Data(),var.Data()),
389  Form("%s_%s_%s background training",DataInfo().GetName(),GetMethodName().Data(),var.Data()), nbinsB, xmin[ivar], xmax[ivar] );
390  }
391  }
392 
393  // ----- fill the reference histograms
394  Log() << kINFO << "Filling reference histograms" << Endl;
395 
396  // event loop
397  for (Int_t ievt=0; ievt<Data()->GetNEvents(); ievt++) {
398 
399  // use the true-event-type's transformation
400  // set the event true event types transformation
401  const Event* origEv = Data()->GetEvent(ievt);
402  if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
404  const Event* ev = GetTransformationHandler().Transform( origEv );
405 
406  // the event weight
407  Float_t weight = ev->GetWeight();
408 
409  // fill variable vector
410  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
411  Double_t value = ev->GetValue(ivar);
412  // verify limits
413  if (value >= xmax[ivar]) value = xmax[ivar] - 1.0e-10;
414  else if (value < xmin[ivar]) value = xmin[ivar] + 1.0e-10;
415  // inserting check if there are events in overflow or underflow
416  if (value >=(*fHistSig)[ivar]->GetXaxis()->GetXmax() ||
417  value <(*fHistSig)[ivar]->GetXaxis()->GetXmin()){
418  Log()<<kWARNING
419  <<"error in filling likelihood reference histograms var="
420  <<(*fInputVars)[ivar]
421  << ", xmin="<<(*fHistSig)[ivar]->GetXaxis()->GetXmin()
422  << ", value="<<value
423  << ", xmax="<<(*fHistSig)[ivar]->GetXaxis()->GetXmax()
424  << Endl;
425  }
426  if (DataInfo().IsSignal(ev)) (*fHistSig)[ivar]->Fill( value, weight );
427  else (*fHistBgd)[ivar]->Fill( value, weight );
428  }
429  }
430 
431  // building the pdfs
432  Log() << kINFO << "Building PDF out of reference histograms" << Endl;
433  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
434 
435  // the PDF is built from (binned) reference histograms
436  // in case of KDE, this has a large number of bins, which makes it quasi-unbinned
437  (*fPDFSig)[ivar]->BuildPDF( (*fHistSig)[ivar] );
438  (*fPDFBgd)[ivar]->BuildPDF( (*fHistBgd)[ivar] );
439 
440  (*fPDFSig)[ivar]->ValidatePDF( (*fHistSig)[ivar] );
441  (*fPDFBgd)[ivar]->ValidatePDF( (*fHistBgd)[ivar] );
442 
443  // saving the smoothed histograms
444  if ((*fPDFSig)[ivar]->GetSmoothedHist() != 0) (*fHistSig_smooth)[ivar] = (*fPDFSig)[ivar]->GetSmoothedHist();
445  if ((*fPDFBgd)[ivar]->GetSmoothedHist() != 0) (*fHistBgd_smooth)[ivar] = (*fPDFBgd)[ivar]->GetSmoothedHist();
446  }
448 }
449 
450 ////////////////////////////////////////////////////////////////////////////////
451 /// returns the likelihood estimator for signal
452 /// fill a new Likelihood branch into the testTree
453 
455 {
456  UInt_t ivar;
457 
458  // cannot determine error
459  NoErrorCalc(err, errUpper);
460 
461  // retrieve variables, and transform, if required
462  TVector vs( GetNvar() );
463  TVector vb( GetNvar() );
464 
465  // need to distinguish signal and background in case of variable transformation
466  // signal first
467 
469  // temporary: JS --> FIX
470  //GetTransformationHandler().SetTransformationReferenceClass( 0 );
471  const Event* ev = GetEvent();
472  for (ivar=0; ivar<GetNvar(); ivar++) vs(ivar) = ev->GetValue(ivar);
473 
475  // temporary: JS --> FIX
476  //GetTransformationHandler().SetTransformationReferenceClass( 1 );
477  ev = GetEvent();
478  for (ivar=0; ivar<GetNvar(); ivar++) vb(ivar) = ev->GetValue(ivar);
479 
480  // compute the likelihood (signal)
481  Double_t ps(1), pb(1), p(0);
482  for (ivar=0; ivar<GetNvar(); ivar++) {
483 
484  // drop one variable (this is ONLY used for internal variable ranking !)
485  if ((Int_t)ivar == fDropVariable) continue;
486 
487  Double_t x[2] = { vs(ivar), vb(ivar) };
488 
489  for (UInt_t itype=0; itype < 2; itype++) {
490 
491  // verify limits
492  if (x[itype] >= (*fPDFSig)[ivar]->GetXmax()) x[itype] = (*fPDFSig)[ivar]->GetXmax() - 1.0e-10;
493  else if (x[itype] < (*fPDFSig)[ivar]->GetXmin()) x[itype] = (*fPDFSig)[ivar]->GetXmin();
494 
495  // find corresponding histogram from cached indices
496  PDF* pdf = (itype == 0) ? (*fPDFSig)[ivar] : (*fPDFBgd)[ivar];
497  if (pdf == 0) Log() << kFATAL << "<GetMvaValue> Reference histograms don't exist" << Endl;
498  TH1* hist = pdf->GetPDFHist();
499 
500  // interpolate linearly between adjacent bins
501  // this is not useful for discrete variables
502  Int_t bin = hist->FindBin(x[itype]);
503 
504  // **** POTENTIAL BUG: PREFORMANCE IS WORSE WHEN USING TRUE TYPE ***
505  // ==> commented out at present
506  if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0 ||
507  DataInfo().GetVariableInfo(ivar).GetVarType() == 'N') {
508  p = TMath::Max( hist->GetBinContent(bin), fEpsilon );
509  } else { // splined PDF
510  Int_t nextbin = bin;
511  if ((x[itype] > hist->GetBinCenter(bin) && bin != hist->GetNbinsX()) || bin == 1)
512  nextbin++;
513  else
514  nextbin--;
515 
516 
517  Double_t dx = hist->GetBinCenter(bin) - hist->GetBinCenter(nextbin);
518  Double_t dy = hist->GetBinContent(bin) - hist->GetBinContent(nextbin);
519  Double_t like = hist->GetBinContent(bin) + (x[itype] - hist->GetBinCenter(bin)) * dy/dx;
520 
521  p = TMath::Max( like, fEpsilon );
522  }
523 
524  if (itype == 0) ps *= p;
525  else pb *= p;
526  }
527  }
528 
529  // the likelihood ratio (transform it ?)
530  return TransformLikelihoodOutput( ps, pb );
531 }
532 
533 ////////////////////////////////////////////////////////////////////////////////
534 /// returns transformed or non-transformed output
535 
537 {
538  if (ps < fEpsilon) ps = fEpsilon;
539  if (pb < fEpsilon) pb = fEpsilon;
540  Double_t r = ps/(ps + pb);
541  if (r >= 1.0) r = 1. - 1.e-15;
542 
544  // inverse Fermi function
545 
546  // sanity check
547  if (r <= 0.0) r = fEpsilon;
548  else if (r >= 1.0) r = 1. - 1.e-15;
549 
550  Double_t tau = 15.0;
551  r = - TMath::Log(1.0/r - 1.0)/tau;
552  }
553 
554  return r;
555 }
556 
557 ////////////////////////////////////////////////////////////////////////////////
558 /// write options to stream
559 
560 void TMVA::MethodLikelihood::WriteOptionsToStream( std::ostream& o, const TString& prefix ) const
561 {
563 
564  // writing the options defined for the different pdfs
565  if (fDefaultPDFLik != 0) {
566  o << prefix << std::endl << prefix << "#Default Likelihood PDF Options:" << std::endl << prefix << std::endl;
567  fDefaultPDFLik->WriteOptionsToStream( o, prefix );
568  }
569  for (UInt_t ivar = 0; ivar < fPDFSig->size(); ivar++) {
570  if ((*fPDFSig)[ivar] != 0) {
571  o << prefix << std::endl << prefix << Form("#Signal[%d] Likelihood PDF Options:",ivar) << std::endl << prefix << std::endl;
572  (*fPDFSig)[ivar]->WriteOptionsToStream( o, prefix );
573  }
574  if ((*fPDFBgd)[ivar] != 0) {
575  o << prefix << std::endl << prefix << "#Background[%d] Likelihood PDF Options:" << std::endl << prefix << std::endl;
576  (*fPDFBgd)[ivar]->WriteOptionsToStream( o, prefix );
577  }
578  }
579 }
580 
581 ////////////////////////////////////////////////////////////////////////////////
582 /// write weights to XML
583 
584 void TMVA::MethodLikelihood::AddWeightsXMLTo( void* parent ) const
585 {
586  void* wght = gTools().AddChild(parent, "Weights");
587  gTools().AddAttr(wght, "NVariables", GetNvar());
588  gTools().AddAttr(wght, "NClasses", 2);
589  void* pdfwrap;
590  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
591  if ( (*fPDFSig)[ivar]==0 || (*fPDFBgd)[ivar]==0 )
592  Log() << kFATAL << "Reference histograms for variable " << ivar
593  << " don't exist, can't write it to weight file" << Endl;
594  pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
595  gTools().AddAttr(pdfwrap, "VarIndex", ivar);
596  gTools().AddAttr(pdfwrap, "ClassIndex", 0);
597  (*fPDFSig)[ivar]->AddXMLTo(pdfwrap);
598  pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
599  gTools().AddAttr(pdfwrap, "VarIndex", ivar);
600  gTools().AddAttr(pdfwrap, "ClassIndex", 1);
601  (*fPDFBgd)[ivar]->AddXMLTo(pdfwrap);
602  }
603 }
604 
605 ////////////////////////////////////////////////////////////////////////////////
606 /// computes ranking of input variables
607 
609 {
610  // create the ranking object
611  if (fRanking) delete fRanking;
612  fRanking = new Ranking( GetName(), "Delta Separation" );
613 
614  Double_t sepRef = -1, sep = -1;
615  for (Int_t ivar=-1; ivar<(Int_t)GetNvar(); ivar++) {
616 
617  // this variable should not be used
618  fDropVariable = ivar;
619 
620  TString nameS = Form( "rS_%i", ivar+1 );
621  TString nameB = Form( "rB_%i", ivar+1 );
622  TH1* rS = new TH1F( nameS, nameS, 80, 0, 1 );
623  TH1* rB = new TH1F( nameB, nameB, 80, 0, 1 );
624 
625  // the event loop
626  for (Int_t ievt=0; ievt<Data()->GetNTrainingEvents(); ievt++) {
627 
628  const Event* origEv = Data()->GetEvent(ievt);
630  const Event* ev = GetTransformationHandler().Transform(Data()->GetEvent(ievt));
631 
632  Double_t lk = this->GetMvaValue();
633  Double_t w = ev->GetWeight();
634  if (DataInfo().IsSignal(ev)) rS->Fill( lk, w );
635  else rB->Fill( lk, w );
636  }
637 
638  // compute separation
639  sep = TMVA::gTools().GetSeparation( rS, rB );
640  if (ivar == -1) sepRef = sep;
641  sep = sepRef - sep;
642 
643  // don't need these histograms anymore
644  delete rS;
645  delete rB;
646 
647  if (ivar >= 0) fRanking->AddRank( Rank( DataInfo().GetVariableInfo(ivar).GetInternalName(), sep ) );
648  }
649 
650  fDropVariable = -1;
651 
652  return fRanking;
653 }
654 
655 ////////////////////////////////////////////////////////////////////////////////
656 /// write reference PDFs to ROOT file
657 
659 {
660  TString pname = "PDF_";
661  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
662  (*fPDFSig)[ivar]->Write( pname + GetInputVar( ivar ) + "_S" );
663  (*fPDFBgd)[ivar]->Write( pname + GetInputVar( ivar ) + "_B" );
664  }
665 }
666 ////////////////////////////////////////////////////////////////////////////////
667 /// read weights from XML
668 
670 {
671  TString pname = "PDF_";
672  Bool_t addDirStatus = TH1::AddDirectoryStatus();
673  TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
674  UInt_t nvars=0;
675  gTools().ReadAttr(wghtnode, "NVariables",nvars);
676  void* descnode = gTools().GetChild(wghtnode);
677  for (UInt_t ivar=0; ivar<nvars; ivar++){
678  void* pdfnode = gTools().GetChild(descnode);
679  Log() << kDEBUG << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
680  if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
681  if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
682  (*fPDFSig)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Sig" );
683  (*fPDFBgd)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Bkg" );
684  (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
685  (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
686  (*(*fPDFSig)[ivar]).ReadXML(pdfnode);
687  descnode = gTools().GetNextChild(descnode);
688  pdfnode = gTools().GetChild(descnode);
689  (*(*fPDFBgd)[ivar]).ReadXML(pdfnode);
690  descnode = gTools().GetNextChild(descnode);
691  }
692  TH1::AddDirectory(addDirStatus);
693 }
694 ////////////////////////////////////////////////////////////////////////////////
695 /// read weight info from file
696 /// nothing to do for this method
697 
699 {
700  TString pname = "PDF_";
701  Bool_t addDirStatus = TH1::AddDirectoryStatus();
702  TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
703  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
704  Log() << kDEBUG << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
705  if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
706  if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
707  (*fPDFSig)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Sig" );
708  (*fPDFBgd)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Bkg");
709  (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
710  (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
711  istr >> *(*fPDFSig)[ivar];
712  istr >> *(*fPDFBgd)[ivar];
713  }
714  TH1::AddDirectory(addDirStatus);
715 }
716 
717 ////////////////////////////////////////////////////////////////////////////////
718 /// read reference PDF from ROOT file
719 
721 {
722  TString pname = "PDF_";
723  Bool_t addDirStatus = TH1::AddDirectoryStatus();
724  TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
725  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
726  (*fPDFSig)[ivar] = (TMVA::PDF*)rf.Get( Form( "PDF_%s_S", GetInputVar( ivar ).Data() ) );
727  (*fPDFBgd)[ivar] = (TMVA::PDF*)rf.Get( Form( "PDF_%s_B", GetInputVar( ivar ).Data() ) );
728  }
729  TH1::AddDirectory(addDirStatus);
730 }
731 
732 ////////////////////////////////////////////////////////////////////////////////
733 /// write histograms and PDFs to file for monitoring purposes
734 
736 {
737  Log() << kINFO << "Write monitoring histograms to file: " << BaseDir()->GetPath() << Endl;
738  BaseDir()->cd();
739 
740  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
741  (*fHistSig)[ivar]->Write();
742  (*fHistBgd)[ivar]->Write();
743  if ((*fHistSig_smooth)[ivar] != 0) (*fHistSig_smooth)[ivar]->Write();
744  if ((*fHistBgd_smooth)[ivar] != 0) (*fHistBgd_smooth)[ivar]->Write();
745  (*fPDFSig)[ivar]->GetPDFHist()->Write();
746  (*fPDFBgd)[ivar]->GetPDFHist()->Write();
747 
748  if ((*fPDFSig)[ivar]->GetNSmoothHist() != 0) (*fPDFSig)[ivar]->GetNSmoothHist()->Write();
749  if ((*fPDFBgd)[ivar]->GetNSmoothHist() != 0) (*fPDFBgd)[ivar]->GetNSmoothHist()->Write();
750 
751  // add special plots to check the smoothing in the GetVal method
752  Float_t xmin=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmin();
753  Float_t xmax=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmax();
754  TH1F* mm = new TH1F( (*fInputVars)[ivar]+"_additional_check",
755  (*fInputVars)[ivar]+"_additional_check", 15000, xmin, xmax );
756  Double_t intBin = (xmax-xmin)/15000;
757  for (Int_t bin=0; bin < 15000; bin++) {
758  Double_t x = (bin + 0.5)*intBin + xmin;
759  mm->SetBinContent(bin+1 ,(*fPDFSig)[ivar]->GetVal(x));
760  }
761  mm->Write();
762 
763  // ---------- create cloned low-binned histogram for comparison in macros (mainly necessary for KDE)
764  TH1* h[2] = { (*fHistSig)[ivar], (*fHistBgd)[ivar] };
765  for (UInt_t i=0; i<2; i++) {
766  TH1* hclone = (TH1F*)h[i]->Clone( TString(h[i]->GetName()) + "_nice" );
767  hclone->SetName ( TString(h[i]->GetName()) + "_nice" );
768  hclone->SetTitle( TString(h[i]->GetTitle()) + "" );
769  if (hclone->GetNbinsX() > 100) {
770  Int_t resFactor = 5;
771  hclone->Rebin( resFactor );
772  hclone->Scale( 1.0/resFactor );
773  }
774  hclone->Write();
775  }
776  // ----------
777  }
778 }
779 
780 ////////////////////////////////////////////////////////////////////////////////
781 /// write specific header of the classifier (mostly include files)
782 
783 void TMVA::MethodLikelihood::MakeClassSpecificHeader( std::ostream& fout, const TString& ) const
784 {
785  fout << "#include <math.h>" << std::endl;
786  fout << "#include <cstdlib>" << std::endl;
787 }
788 
789 ////////////////////////////////////////////////////////////////////////////////
790 /// write specific classifier response
791 
792 void TMVA::MethodLikelihood::MakeClassSpecific( std::ostream& fout, const TString& className ) const
793 {
794  Int_t dp = fout.precision();
795  fout << " double fEpsilon;" << std::endl;
796 
797  Int_t * nbin = new Int_t[GetNvar()];
798 
799  Int_t nbinMax=-1;
800  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
801  nbin[ivar]=(*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX();
802  if (nbin[ivar] > nbinMax) nbinMax=nbin[ivar];
803  }
804 
805  fout << " static float fRefS[][" << nbinMax << "]; "
806  << "// signal reference vector [nvars][max_nbins]" << std::endl;
807  fout << " static float fRefB[][" << nbinMax << "]; "
808  << "// backgr reference vector [nvars][max_nbins]" << std::endl << std::endl;
809  fout << "// if a variable has its PDF encoded as a spline0 --> treat it like an Integer valued one" <<std::endl;
810  fout << " bool fHasDiscretPDF[" << GetNvar() <<"]; "<< std::endl;
811  fout << " int fNbin[" << GetNvar() << "]; "
812  << "// number of bins (discrete variables may have less bins)" << std::endl;
813  fout << " double fHistMin[" << GetNvar() << "]; " << std::endl;
814  fout << " double fHistMax[" << GetNvar() << "]; " << std::endl;
815 
816  fout << " double TransformLikelihoodOutput( double, double ) const;" << std::endl;
817  fout << "};" << std::endl;
818  fout << "" << std::endl;
819  fout << "inline void " << className << "::Initialize() " << std::endl;
820  fout << "{" << std::endl;
821  fout << " fEpsilon = " << fEpsilon << ";" << std::endl;
822  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
823  fout << " fNbin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ";" << std::endl;
824  fout << " fHistMin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmin() << ";" << std::endl;
825  fout << " fHistMax[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmax() << ";" << std::endl;
826  // sanity check (for previous code lines)
827  if ((((*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar] ||
828  (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar])
829  ) ||
830  (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()) {
831  Log() << kFATAL << "<MakeClassSpecific> Mismatch in binning of variable "
832  << "\"" << GetOriginalVarName(ivar) << "\" of type: \'" << DataInfo().GetVariableInfo(ivar).GetVarType()
833  << "\' : "
834  << "nxS = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ", "
835  << "nxB = " << (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()
836  << " while we expect " << nbin[ivar]
837  << Endl;
838  }
839  }
840  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
841  if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0)
842  fout << " fHasDiscretPDF[" << ivar <<"] = true; " << std::endl;
843  else
844  fout << " fHasDiscretPDF[" << ivar <<"] = false; " << std::endl;
845  }
846 
847  fout << "}" << std::endl << std::endl;
848 
849  fout << "inline double " << className
850  << "::GetMvaValue__( const std::vector<double>& inputValues ) const" << std::endl;
851  fout << "{" << std::endl;
852  fout << " double ps(1), pb(1);" << std::endl;
853  fout << " std::vector<double> inputValuesSig = inputValues;" << std::endl;
854  fout << " std::vector<double> inputValuesBgd = inputValues;" << std::endl;
856  fout << " Transform(inputValuesSig,0);" << std::endl;
857  fout << " Transform(inputValuesBgd,1);" << std::endl;
858  }
859  fout << " for (size_t ivar = 0; ivar < GetNvar(); ivar++) {" << std::endl;
860  fout << std::endl;
861  fout << " // dummy at present... will be used for variable transforms" << std::endl;
862  fout << " double x[2] = { inputValuesSig[ivar], inputValuesBgd[ivar] };" << std::endl;
863  fout << std::endl;
864  fout << " for (int itype=0; itype < 2; itype++) {" << std::endl;
865  fout << std::endl;
866  fout << " // interpolate linearly between adjacent bins" << std::endl;
867  fout << " // this is not useful for discrete variables (or forced Spline0)" << std::endl;
868  fout << " int bin = int((x[itype] - fHistMin[ivar])/(fHistMax[ivar] - fHistMin[ivar])*fNbin[ivar]) + 0;" << std::endl;
869  fout << std::endl;
870  fout << " // since the test data sample is in general different from the training sample" << std::endl;
871  fout << " // it can happen that the min/max of the training sample are trespassed --> correct this" << std::endl;
872  fout << " if (bin < 0) {" << std::endl;
873  fout << " bin = 0;" << std::endl;
874  fout << " x[itype] = fHistMin[ivar];" << std::endl;
875  fout << " }" << std::endl;
876  fout << " else if (bin >= fNbin[ivar]) {" << std::endl;
877  fout << " bin = fNbin[ivar]-1;" << std::endl;
878  fout << " x[itype] = fHistMax[ivar];" << std::endl;
879  fout << " }" << std::endl;
880  fout << std::endl;
881  fout << " // find corresponding histogram from cached indices" << std::endl;
882  fout << " float ref = (itype == 0) ? fRefS[ivar][bin] : fRefB[ivar][bin];" << std::endl;
883  fout << std::endl;
884  fout << " // sanity check" << std::endl;
885  fout << " if (ref < 0) {" << std::endl;
886  fout << " std::cout << \"Fatal error in " << className
887  << ": bin entry < 0 ==> abort\" << std::endl;" << std::endl;
888  fout << " std::exit(1);" << std::endl;
889  fout << " }" << std::endl;
890  fout << std::endl;
891  fout << " double p = ref;" << std::endl;
892  fout << std::endl;
893  fout << " if (GetType(ivar) != 'I' && !fHasDiscretPDF[ivar]) {" << std::endl;
894  fout << " float bincenter = (bin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
895  fout << " int nextbin = bin;" << std::endl;
896  fout << " if ((x[itype] > bincenter && bin != fNbin[ivar]-1) || bin == 0) " << std::endl;
897  fout << " nextbin++;" << std::endl;
898  fout << " else" << std::endl;
899  fout << " nextbin--; " << std::endl;
900  fout << std::endl;
901  fout << " double refnext = (itype == 0) ? fRefS[ivar][nextbin] : fRefB[ivar][nextbin];" << std::endl;
902  fout << " float nextbincenter = (nextbin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
903  fout << std::endl;
904  fout << " double dx = bincenter - nextbincenter;" << std::endl;
905  fout << " double dy = ref - refnext;" << std::endl;
906  fout << " p += (x[itype] - bincenter) * dy/dx;" << std::endl;
907  fout << " }" << std::endl;
908  fout << std::endl;
909  fout << " if (p < fEpsilon) p = fEpsilon; // avoid zero response" << std::endl;
910  fout << std::endl;
911  fout << " if (itype == 0) ps *= p;" << std::endl;
912  fout << " else pb *= p;" << std::endl;
913  fout << " } " << std::endl;
914  fout << " } " << std::endl;
915  fout << std::endl;
916  fout << " // the likelihood ratio (transform it ?)" << std::endl;
917  fout << " return TransformLikelihoodOutput( ps, pb ); " << std::endl;
918  fout << "}" << std::endl << std::endl;
919 
920  fout << "inline double " << className << "::TransformLikelihoodOutput( double ps, double pb ) const" << std::endl;
921  fout << "{" << std::endl;
922  fout << " // returns transformed or non-transformed output" << std::endl;
923  fout << " if (ps < fEpsilon) ps = fEpsilon;" << std::endl;
924  fout << " if (pb < fEpsilon) pb = fEpsilon;" << std::endl;
925  fout << " double r = ps/(ps + pb);" << std::endl;
926  fout << " if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
927  fout << std::endl;
928  fout << " if (" << (fTransformLikelihoodOutput ? "true" : "false") << ") {" << std::endl;
929  fout << " // inverse Fermi function" << std::endl;
930  fout << std::endl;
931  fout << " // sanity check" << std::endl;
932  fout << " if (r <= 0.0) r = fEpsilon;" << std::endl;
933  fout << " else if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
934  fout << std::endl;
935  fout << " double tau = 15.0;" << std::endl;
936  fout << " r = - log(1.0/r - 1.0)/tau;" << std::endl;
937  fout << " }" << std::endl;
938  fout << std::endl;
939  fout << " return r;" << std::endl;
940  fout << "}" << std::endl;
941  fout << std::endl;
942 
943  fout << "// Clean up" << std::endl;
944  fout << "inline void " << className << "::Clear() " << std::endl;
945  fout << "{" << std::endl;
946  fout << " // nothing to clear" << std::endl;
947  fout << "}" << std::endl << std::endl;
948 
949  fout << "// signal map" << std::endl;
950  fout << "float " << className << "::fRefS[][" << nbinMax << "] = " << std::endl;
951  fout << "{ " << std::endl;
952  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
953  fout << " { ";
954  for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
955  if (ibin-1 < nbin[ivar])
956  fout << (*fPDFSig)[ivar]->GetPDFHist()->GetBinContent(ibin);
957  else
958  fout << -1;
959 
960  if (ibin < nbinMax) fout << ", ";
961  }
962  fout << " }, " << std::endl;
963  }
964  fout << "}; " << std::endl;
965  fout << std::endl;
966 
967  fout << "// background map" << std::endl;
968  fout << "float " << className << "::fRefB[][" << nbinMax << "] = " << std::endl;
969  fout << "{ " << std::endl;
970  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
971  fout << " { ";
972  fout << std::setprecision(8);
973  for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
974  if (ibin-1 < nbin[ivar])
975  fout << (*fPDFBgd)[ivar]->GetPDFHist()->GetBinContent(ibin);
976  else
977  fout << -1;
978 
979  if (ibin < nbinMax) fout << ", ";
980  }
981  fout << " }, " << std::endl;
982  }
983  fout << "}; " << std::endl;
984  fout << std::endl;
985  fout << std::setprecision(dp);
986 
987  delete[] nbin;
988 }
989 
990 ////////////////////////////////////////////////////////////////////////////////
991 /// get help message text
992 ///
993 /// typical length of text line:
994 /// "|--------------------------------------------------------------|"
995 
997 {
998  Log() << Endl;
999  Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
1000  Log() << Endl;
1001  Log() << "The maximum-likelihood classifier models the data with probability " << Endl;
1002  Log() << "density functions (PDF) reproducing the signal and background" << Endl;
1003  Log() << "distributions of the input variables. Correlations among the " << Endl;
1004  Log() << "variables are ignored." << Endl;
1005  Log() << Endl;
1006  Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
1007  Log() << Endl;
1008  Log() << "Required for good performance are decorrelated input variables" << Endl;
1009  Log() << "(PCA transformation via the option \"VarTransform=Decorrelate\"" << Endl;
1010  Log() << "may be tried). Irreducible non-linear correlations may be reduced" << Endl;
1011  Log() << "by precombining strongly correlated input variables, or by simply" << Endl;
1012  Log() << "removing one of the variables." << Endl;
1013  Log() << Endl;
1014  Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
1015  Log() << Endl;
1016  Log() << "High fidelity PDF estimates are mandatory, i.e., sufficient training " << Endl;
1017  Log() << "statistics is required to populate the tails of the distributions" << Endl;
1018  Log() << "It would be a surprise if the default Spline or KDE kernel parameters" << Endl;
1019  Log() << "provide a satisfying fit to the data. The user is advised to properly" << Endl;
1020  Log() << "tune the events per bin and smooth options in the spline cases" << Endl;
1021  Log() << "individually per variable. If the KDE kernel is used, the adaptive" << Endl;
1022  Log() << "Gaussian kernel may lead to artefacts, so please always also try" << Endl;
1023  Log() << "the non-adaptive one." << Endl;
1024  Log() << "" << Endl;
1025  Log() << "All tuning parameters must be adjusted individually for each input" << Endl;
1026  Log() << "variable!" << Endl;
1027 }
1028 
virtual Int_t Write(const char *name=0, Int_t option=0, Int_t bufsize=0)
Write this object to the current directory.
Definition: TObject.cxx:830
virtual const char * GetTitle() const
Returns title of object.
Definition: TNamed.h:52
virtual Int_t FindBin(Double_t x, Double_t y=0, Double_t z=0)
Return Global bin number corresponding to x,y,z.
Definition: TH1.cxx:3442
void DeclareCompatibilityOptions()
options that are used ONLY for the READER to ensure backward compatibility they are hence without any...
virtual void Scale(Double_t c1=1, Option_t *option="")
Multiply this histogram by a constant c1.
Definition: TH1.cxx:5893
virtual Int_t Fill(Double_t x)
Increment bin with abscissa X by 1.
Definition: TH1.cxx:3127
void WriteWeightsToStream(TFile &rf) const
write reference PDFs to ROOT file
float xmin
Definition: THbookFile.cxx:93
MsgLogger & Endl(MsgLogger &ml)
Definition: MsgLogger.h:162
virtual Double_t GetBinContent(Int_t bin) const
Return content of bin number bin.
Definition: TH1.cxx:4640
virtual void WriteOptionsToStream(std::ostream &o, const TString &prefix) const
write options to stream
#define REGISTER_METHOD(CLASS)
for example
Double_t Log(Double_t x)
Definition: TMath.h:526
float Float_t
Definition: RtypesCore.h:53
void Train()
create reference distributions (PDFs) from signal and background events: fill histograms and smooth t...
const char * GetName() const
Definition: MethodBase.h:330
Bool_t IgnoreEventsWithNegWeightsInTraining() const
Definition: MethodBase.h:680
THist< 1, float, THistStatContent, THistStatUncertainty > TH1F
Definition: THist.hxx:302
TH1 * h
Definition: legend2.C:5
virtual ~MethodLikelihood()
destructor
OptionBase * DeclareOptionRef(T &ref, const TString &name, const TString &desc="")
A ROOT file is a suite of consecutive data records (TKey instances) with a well defined format...
Definition: TFile.h:50
DataSet * Data() const
Definition: MethodBase.h:405
UInt_t GetNvar() const
Definition: MethodBase.h:340
EAnalysisType
Definition: Types.h:128
TVectorT.
Definition: TMatrixTBase.h:89
const TString & GetOriginalVarName(Int_t ivar) const
Definition: MethodBase.h:505
virtual TObject * Get(const char *namecycle)
Return pointer to object identified by namecycle.
std::vector< TH1 * > * fHistSig
Basic string class.
Definition: TString.h:137
static Bool_t AddDirectoryStatus()
Static function: cannot be inlined on Windows/NT.
Definition: TH1.cxx:700
tomato 1-D histogram with a float per channel (see TH1 documentation)}
Definition: TH1.h:575
TransformationHandler & GetTransformationHandler(Bool_t takeReroutedIfAvailable=true)
Definition: MethodBase.h:390
void WriteMonitoringHistosToFile() const
write histograms and PDFs to file for monitoring purposes
Short_t Min(Short_t a, Short_t b)
Definition: TMathBase.h:170
int Int_t
Definition: RtypesCore.h:41
bool Bool_t
Definition: RtypesCore.h:59
const Bool_t kFALSE
Definition: Rtypes.h:92
int nbins[3]
virtual Int_t GetNbinsX() const
Definition: TH1.h:301
std::vector< TString > * fInputVars
Definition: MethodBase.h:582
Double_t GetWeight() const
return the event weight - depending on whether the flag IgnoreNegWeightsInTraining is or not...
Definition: Event.cxx:378
void AddAttr(void *node, const char *, const T &value, Int_t precision=16)
Definition: Tools.h:309
UInt_t GetNVariables() const
Definition: DataSetInfo.h:128
const TString & GetInputVar(Int_t i) const
Definition: MethodBase.h:345
virtual const char * GetPath() const
Returns the full path of the directory.
Definition: TDirectory.cxx:911
void WriteOptionsToStream(std::ostream &o, const TString &prefix) const
write options to output stream (e.g. in writing the MVA weight files
void * AddChild(void *parent, const char *childname, const char *content=0, bool isRootNode=false)
add child node
Definition: Tools.cxx:1134
const TString & GetMethodName() const
Definition: MethodBase.h:327
static void AddDirectory(Bool_t add=kTRUE)
Sets the flag controlling the automatic add of histograms in memory.
Definition: TH1.cxx:1220
Float_t GetValue(UInt_t ivar) const
return value of i&#39;th variable
Definition: Event.cxx:233
UInt_t fSignalClass
Definition: MethodBase.h:683
virtual const char * GetName() const
Returns name of object.
Definition: DataSetInfo.h:85
const char * Data() const
Definition: TString.h:349
Tools & gTools()
Definition: Tools.cxx:79
const TList & GetTransformationList() const
Double_t x[n]
Definition: legend1.C:17
void SetTransformationReferenceClass(Int_t cls)
overrides the setting for all classes! (this is put in basically for the likelihood-method) be carefu...
void ReadWeightsFromXML(void *wghtnode)
read weights from XML
you should not use this method at all Int_t Int_t Double_t Double_t Double_t Int_t Double_t Double_t Double_t tau
Definition: TRolke.cxx:630
void * GetChild(void *parent, const char *childname=0)
get child node
Definition: Tools.cxx:1158
virtual void ParseOptions()
options parser
void SetOptions(const TString &s)
Definition: Configurable.h:91
virtual TObject * Clone(const char *newname="") const
Make a clone of an object using the Streamer facility.
Definition: TNamed.cxx:65
Definition: PDF.h:71
std::vector< PDF * > * fPDFSig
const Event * Transform(const Event *) const
the transformation
std::vector< TH1 * > * fHistBgd_smooth
Double_t TransformLikelihoodOutput(Double_t ps, Double_t pb) const
returns transformed or non-transformed output
virtual Double_t GetBinCenter(Int_t bin) const
Return bin center for 1D histogram.
Definition: TH1.cxx:8208
char GetVarType() const
Definition: VariableInfo.h:69
TRandom2 r(17)
virtual void SetBinContent(Int_t bin, Double_t content)
Set bin content see convention for numbering bins in TH1::GetBin In case the bin number is greater th...
Definition: TH1.cxx:8280
virtual TH1 * Rebin(Int_t ngroup=2, const char *newname="", const Double_t *xbins=0)
Rebin this histogram.
Definition: TH1.cxx:5578
void GetHelpMessage() const
get help message text
std::vector< TH1 * > * fHistSig_smooth
unsigned int UInt_t
Definition: RtypesCore.h:42
const Event * GetEvent() const
Definition: MethodBase.h:745
const Ranking * CreateRanking()
computes ranking of input variables
char * Form(const char *fmt,...)
Double_t GetSeparation(TH1 *S, TH1 *B) const
compute "separation" defined as <s2> = (1/2) Int_-oo..+oo { (S(x) - B(x))^2/(S(x) + B(x)) dx } ...
Definition: Tools.cxx:136
void ReadAttr(void *node, const char *, T &value)
Definition: Tools.h:296
float xmax
Definition: THbookFile.cxx:93
virtual Bool_t HasAnalysisType(Types::EAnalysisType type, UInt_t numberClasses, UInt_t numberTargets)
FDA can handle classification with 2 classes.
UInt_t fBackgroundClass
Definition: MethodBase.h:684
virtual void SetName(const char *name)
Change the name of this histogram.
Definition: TH1.cxx:8028
const Event * GetEvent() const
Definition: DataSet.cxx:211
void MakeClassSpecific(std::ostream &, const TString &) const
write specific classifier response
void DeclareOptions()
define the options (their key words) that can be set in the option string know options: PDFInterpol[i...
Definition: PDF.cxx:816
virtual Int_t GetSize() const
Definition: TCollection.h:95
#define ClassImp(name)
Definition: Rtypes.h:279
double Double_t
Definition: RtypesCore.h:55
void DeclareOptions()
define the options (their key words) that can be set in the option string TransformOutput <bool> tran...
std::vector< PDF * > * fPDFBgd
int type
Definition: TGX11.cxx:120
TDirectory * BaseDir() const
returns the ROOT directory where info/histograms etc of the corresponding MVA method instance are sto...
void * GetNextChild(void *prevchild, const char *childname=0)
XML helpers.
Definition: Tools.cxx:1170
Long64_t GetNEvents(Types::ETreeType type=Types::kMaxTreeType) const
Definition: DataSet.h:229
MsgLogger & Log() const
Definition: Configurable.h:128
The TH1 histogram class.
Definition: TH1.h:80
DataSetInfo & DataInfo() const
Definition: MethodBase.h:406
VariableInfo & GetVariableInfo(Int_t i)
Definition: DataSetInfo.h:114
UInt_t GetClass() const
Definition: Event.h:89
void AddWeightsXMLTo(void *parent) const
write weights to XML
void ExitFromTraining()
Definition: MethodBase.h:458
void Init()
default initialisation called by all constructors
const TString & Color(const TString &)
human readable color strings
Definition: Tools.cxx:837
void ReadWeightsFromStream(std::istream &istr)
read weight info from file nothing to do for this method
Abstract ClassifierFactory template that handles arbitrary types.
Ranking * fRanking
Definition: MethodBase.h:581
TH1 * GetPDFHist() const
Definition: PDF.h:100
virtual Bool_t cd(const char *path=0)
Change current directory to "this" directory.
Definition: TDirectory.cxx:435
Double_t GetXmax(Int_t ivar) const
Definition: MethodBase.h:353
virtual void AddRank(const Rank &rank)
Add a new rank take ownership of it.
Definition: Ranking.cxx:86
UInt_t GetTrainingTMVAVersionCode() const
Definition: MethodBase.h:385
virtual void DeclareCompatibilityOptions()
options that are used ONLY for the READER to ensure backward compatibility they are hence without any...
Definition: MethodBase.cxx:590
void ProcessOptions()
Definition: PDF.cxx:866
Short_t Max(Short_t a, Short_t b)
Definition: TMathBase.h:202
const TString & GetOptions() const
Definition: Configurable.h:90
void MakeClassSpecificHeader(std::ostream &, const TString &="") const
write specific header of the classifier (mostly include files)
#define NULL
Definition: Rtypes.h:82
Long64_t GetNTrainingEvents() const
Definition: DataSet.h:93
std::vector< TH1 * > * fHistBgd
virtual void SetTitle(const char *title)
Change (i.e.
Definition: TH1.cxx:5984
const Bool_t kTRUE
Definition: Rtypes.h:91
MethodLikelihood(const TString &jobName, const TString &methodTitle, DataSetInfo &theData, const TString &theOption="")
standard constructor
Int_t Nint(T x)
Definition: TMath.h:480
void NoErrorCalc(Double_t *const err, Double_t *const errUpper)
Definition: MethodBase.cxx:819
void SetSignalReferenceCut(Double_t cut)
Definition: MethodBase.h:360
Double_t GetXmin(Int_t ivar) const
Definition: MethodBase.h:352
Double_t GetMvaValue(Double_t *err=0, Double_t *errUpper=0)
returns the likelihood estimator for signal fill a new Likelihood branch into the testTree ...