MethodLikelihood.cxx
1 // @(#)root/tmva $Id$
2 // Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss, Eckhard von Toerne, Jan Therhaag
3 
4 /**********************************************************************************
5  * Project: TMVA - a Root-integrated toolkit for multivariate Data analysis *
6  * Package: TMVA *
7  * Class : TMVA::MethodLikelihood *
8  * Web : http://tmva.sourceforge.net *
9  * *
10  * Description: *
11  * Implementation (see header for description) *
12  * *
13  * Authors (alphabetical): *
14  * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
15  * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
16  * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
17  * Jan Therhaag <Jan.Therhaag@cern.ch> - U of Bonn, Germany *
18  * Eckhard v. Toerne <evt@uni-bonn.de> - U. of Bonn, Germany *
19  * *
20  * Copyright (c) 2005-2011: *
21  * CERN, Switzerland *
22  * U. of Victoria, Canada *
23  * MPI-K Heidelberg, Germany *
24  * U. of Bonn, Germany *
25  * *
26  * Redistribution and use in source and binary forms, with or without *
27  * modification, are permitted according to the terms listed in LICENSE *
28  * (http://tmva.sourceforge.net/LICENSE) *
29  **********************************************************************************/
30 
31 /*! \class TMVA::MethodLikelihood
32 \ingroup TMVA
33 
34 Likelihood analysis ("non-parametric approach")
35 
36 
37 Also implemented is a "diagonalized likelihood approach",
38 which improves over the uncorrelated likelihood approach by linearly
39 transforming the input variables into a diagonal space, using the
40 square-root of the covariance matrix.
41 
42 
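For orientation, a minimal booking sketch follows. It is not part of this class;
it assumes the usual TMVA Factory/DataLoader workflow, the helper name
BookLikelihood and the objects outputFile and dataloader are placeholders, and
the option string is illustrative only (see DeclareOptions() further below for
the options defined here; the PDF-related options are parsed by TMVA::PDF).

\code{.cpp}
#include "TFile.h"
#include "TMVA/Factory.h"
#include "TMVA/DataLoader.h"
#include "TMVA/Types.h"

// Illustrative helper: assumes an open output TFile and a DataLoader with
// variables and signal/background trees already registered.
void BookLikelihood(TFile* outputFile, TMVA::DataLoader* dataloader)
{
   TMVA::Factory factory("TMVAClassification", outputFile,
                         "!V:AnalysisType=Classification");

   // book the projective likelihood classifier with an example option string
   factory.BookMethod(dataloader, TMVA::Types::kLikelihood, "Likelihood",
                      "TransformOutput:PDFInterpol=Spline2:NAvEvtPerBin=50");

   factory.TrainAllMethods();
}
\endcode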
43 The method of maximum likelihood is the most straightforward, and
44 certainly among the most elegant multivariate analyser approaches.
45 We define the likelihood ratio, \f$ R_L \f$, for event
46 \f$ i \f$, by:
47 
48 \f[
49 R_L(i) = \frac{L_S(i)}{L_S(i) + L_B(i)}
50 \f]
51 
52 Here the signal and background likelihoods, \f$ L_S \f$,
53 \f$ L_B \f$, are products of the corresponding probability
54 densities, \f$ p_S \f$, \f$ p_B \f$, of the
55 \f$ N_{var} \f$ discriminating variables used in the MVA:
56 
57 \f[
58 L_S(i) = \prod_{j=1}^{N_{var}} p_{Sj}(i)
59 \f]
60 
61 and accordingly for \f$ L_B \f$.
62 In practice, TMVA uses polynomial splines to estimate the probability
63 density functions (PDF) obtained from the distributions of the
64 training variables.
65 
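As a purely illustrative numerical example (the numbers are invented, not taken
from any training sample): with two variables and per-event signal densities
\f$ p_{S1} = 0.8 \f$, \f$ p_{S2} = 0.6 \f$ and background densities
\f$ p_{B1} = 0.2 \f$, \f$ p_{B2} = 0.5 \f$, one obtains

\f[
L_S = 0.8 \cdot 0.6 = 0.48, \qquad
L_B = 0.2 \cdot 0.5 = 0.10, \qquad
R_L = \frac{0.48}{0.48 + 0.10} \approx 0.83 .
\f]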
66 
67 Note that in TMVA the output of the likelihood ratio is transformed by:
68 
69 \f[
70 R_L(i) \to R'_L(i) = -\frac{1}{\tau} \ln\left(R_L^{-1}(i) - 1\right)
71 \f]
72 
73 to avoid the occurrence of heavy peaks at \f$ R_L = 0 \f$ and \f$ R_L = 1 \f$.
74 
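For illustration, with \f$ \tau = 15 \f$ (the value used in
TransformLikelihoodOutput() below), \f$ R_L = 0.5 \f$ maps to \f$ R'_L = 0 \f$,
while a strongly signal-like event with \f$ R_L = 0.99 \f$ maps to

\f[
R'_L = -\frac{1}{15} \ln\left(\frac{1}{0.99} - 1\right) \approx 0.31 ,
\f]

so the densely populated regions close to 0 and 1 are stretched out.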
75 #### Decorrelated (or "diagonalized") Likelihood
76 
77 The biggest drawback of the Likelihood approach is that it assumes
78 that the discriminating variables are uncorrelated. If this were the case,
79 it can be shown that the discrimination obtained by the above likelihood
80 ratio is optimal, i.e., no other method can beat it. However, in most
81 practical applications of MVAs correlations are present.
82 
83 
84 Linear correlations, measured from the training sample, can be taken
85 into account in a straightforward manner through the square-root
86 of the covariance matrix. The square-root of a matrix
87 \f$ C \f$ is the matrix \f$ C' \f$ that, multiplied with itself,
88 yields \f$ C \f$: \f$ C = C'C' \f$. We compute the
89 square-root matrix (SQM) by means of diagonalising (\f$ D \f$) the
90 covariance matrix:
91 
92 \f[
93 D = S^T C S \Rightarrow C' = S \sqrt{D} S^T
94 \f]
95 
96 and the linear transformation from the correlated into the
97 uncorrelated variable space is then given by multiplying the measured
98 variable tuple by the inverse of the SQM. Note that these transformations
99 are performed for both signal and background separately, since the
100 correlation pattern is not the same in the two samples.
101 
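As a minimal sketch (this is not the TMVA-internal implementation, which is
handled by the variable-transformation classes; the function name is chosen for
this example only), such a square-root matrix can be computed with ROOT's
linear-algebra classes:

\code{.cpp}
#include "TMatrixD.h"
#include "TMatrixDSym.h"
#include "TMatrixDSymEigen.h"
#include "TVectorD.h"
#include "TMath.h"

// Returns C' with C' * C' = C, computed as C' = S sqrt(D) S^T,
// where D = S^T C S is the diagonalised covariance matrix.
TMatrixD SquareRootMatrix(const TMatrixDSym& C)
{
   const TMatrixDSymEigen eigen(C);
   const TMatrixD& S = eigen.GetEigenVectors();   // columns: eigenvectors of C
   const TVectorD& d = eigen.GetEigenValues();    // eigenvalues = diagonal of D

   TMatrixD sqrtD(C.GetNrows(), C.GetNrows());    // zero-initialised
   for (Int_t i = 0; i < C.GetNrows(); ++i)
      sqrtD(i, i) = TMath::Sqrt(TMath::Max(d(i), 0.)); // guard against tiny negative eigenvalues

   TMatrixD ST(TMatrixD::kTransposed, S);
   return S * sqrtD * ST;
}
\endcode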
102 
103 The above diagonalisation is complete for linearly correlated,
104 Gaussian distributed variables only. In real-world examples this
105 is often not the case, so that only little additional information
106 may be recovered by the diagonalisation procedure. In these cases,
107 non-linear methods must be applied.
108 */
109 
110 #include "TMVA/MethodLikelihood.h"
111 
112 #include "TMVA/Configurable.h"
113 #include "TMVA/ClassifierFactory.h"
114 #include "TMVA/DataSet.h"
115 #include "TMVA/DataSetInfo.h"
116 #include "TMVA/IMethod.h"
117 #include "TMVA/MethodBase.h"
118 #include "TMVA/MsgLogger.h"
119 #include "TMVA/PDF.h"
120 #include "TMVA/Ranking.h"
121 #include "TMVA/Tools.h"
122 #include "TMVA/Types.h"
123 #include "TMVA/VariableInfo.h"
124 
125 #include "TVector.h"
126 #include "TMath.h"
127 #include "TFile.h"
128 #include "TH1.h"
129 
130 #include <iostream>
131 #include <iomanip>
132 #include <vector>
133 #include <cstdlib>
134 
135 REGISTER_METHOD(Likelihood)
136 
137 ClassImp(TMVA::MethodLikelihood);
138 
139 ////////////////////////////////////////////////////////////////////////////////
140 /// standard constructor
141 
142 TMVA::MethodLikelihood::MethodLikelihood( const TString& jobName,
143  const TString& methodTitle,
144  DataSetInfo& theData,
145  const TString& theOption ) :
146  TMVA::MethodBase( jobName, Types::kLikelihood, methodTitle, theData, theOption),
147  fEpsilon ( 1.e3 * DBL_MIN ),
148  fTransformLikelihoodOutput( kFALSE ),
149  fDropVariable ( 0 ),
150  fHistSig ( 0 ),
151  fHistBgd ( 0 ),
152  fHistSig_smooth( 0 ),
153  fHistBgd_smooth( 0 ),
154  fDefaultPDFLik ( 0 ),
155  fPDFSig ( 0 ),
156  fPDFBgd ( 0 ),
157  fNsmooth ( 2 ),
158  fNsmoothVarS ( 0 ),
159  fNsmoothVarB ( 0 ),
160  fAverageEvtPerBin( 0 ),
161  fAverageEvtPerBinVarS (0),
162  fAverageEvtPerBinVarB (0),
163  fKDEfineFactor ( 0 ),
164  fInterpolateString(0)
165 {
166 }
167 
168 ////////////////////////////////////////////////////////////////////////////////
169 /// construct likelihood references from file
170 
171 TMVA::MethodLikelihood::MethodLikelihood( DataSetInfo& theData,
172  const TString& theWeightFile) :
173  TMVA::MethodBase( Types::kLikelihood, theData, theWeightFile),
174  fEpsilon ( 1.e3 * DBL_MIN ),
175  fTransformLikelihoodOutput( kFALSE ),
176  fDropVariable ( 0 ),
177  fHistSig ( 0 ),
178  fHistBgd ( 0 ),
179  fHistSig_smooth( 0 ),
180  fHistBgd_smooth( 0 ),
181  fDefaultPDFLik ( 0 ),
182  fPDFSig ( 0 ),
183  fPDFBgd ( 0 ),
184  fNsmooth ( 2 ),
185  fNsmoothVarS ( 0 ),
186  fNsmoothVarB ( 0 ),
187  fAverageEvtPerBin( 0 ),
188  fAverageEvtPerBinVarS (0),
189  fAverageEvtPerBinVarB (0),
190  fKDEfineFactor ( 0 ),
191  fInterpolateString(0)
192 {
193 }
194 
195 ////////////////////////////////////////////////////////////////////////////////
196 /// destructor
197 
198 TMVA::MethodLikelihood::~MethodLikelihood( void )
199 {
200  if (NULL != fDefaultPDFLik) delete fDefaultPDFLik;
201  if (NULL != fHistSig) delete fHistSig;
202  if (NULL != fHistBgd) delete fHistBgd;
203  if (NULL != fHistSig_smooth) delete fHistSig_smooth;
204  if (NULL != fHistBgd_smooth) delete fHistBgd_smooth;
205  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
206  if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
207  if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
208  }
209  if (NULL != fPDFSig) delete fPDFSig;
210  if (NULL != fPDFBgd) delete fPDFBgd;
211 }
212 
213 ////////////////////////////////////////////////////////////////////////////////
214 /// Likelihood can handle classification with 2 classes
215 
216 Bool_t TMVA::MethodLikelihood::HasAnalysisType( Types::EAnalysisType type,
217  UInt_t numberClasses, UInt_t /*numberTargets*/ )
218 {
219  if (type == Types::kClassification && numberClasses == 2) return kTRUE;
220  return kFALSE;
221 }
222 
223 ////////////////////////////////////////////////////////////////////////////////
224 /// default initialisation called by all constructors
225 
226 void TMVA::MethodLikelihood::Init( void )
227 {
228  // no ranking test
229  fDropVariable = -1;
230  fHistSig = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
231  fHistBgd = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
232  fHistSig_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
233  fHistBgd_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
234  fPDFSig = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
235  fPDFBgd = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
236 }
237 
238 ////////////////////////////////////////////////////////////////////////////////
239 /// define the options (their key words) that can be set in the option string
240 ///
241 /// TransformOutput <bool> transform (often strongly peaked) likelihood output through sigmoid inversion
242 
243 void TMVA::MethodLikelihood::DeclareOptions()
244 {
245  DeclareOptionRef( fTransformLikelihoodOutput = kFALSE, "TransformOutput",
246  "Transform likelihood output by inverse sigmoid function" );
247 
248  // initialize
249 
250  // reading every PDF's definition and passing the option string to the next one to be read and marked
251  TString updatedOptions = GetOptions();
252  fDefaultPDFLik = new PDF( TString(GetName()) + " PDF", updatedOptions );
253  fDefaultPDFLik->DeclareOptions();
254  fDefaultPDFLik->ParseOptions();
255  updatedOptions = fDefaultPDFLik->GetOptions();
256  for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
257  (*fPDFSig)[ivar] = new PDF( Form("%s PDF Sig[%d]", GetName(), ivar), updatedOptions,
258  Form("Sig[%d]",ivar), fDefaultPDFLik );
259  (*fPDFSig)[ivar]->DeclareOptions();
260  (*fPDFSig)[ivar]->ParseOptions();
261  updatedOptions = (*fPDFSig)[ivar]->GetOptions();
262  (*fPDFBgd)[ivar] = new PDF( Form("%s PDF Bkg[%d]", GetName(), ivar), updatedOptions,
263  Form("Bkg[%d]",ivar), fDefaultPDFLik );
264  (*fPDFBgd)[ivar]->DeclareOptions();
265  (*fPDFBgd)[ivar]->ParseOptions();
266  updatedOptions = (*fPDFBgd)[ivar]->GetOptions();
267  }
268 
269  // the final marked option string is written back to the original likelihood
270  SetOptions( updatedOptions );
271 }
272 
273 
274 void TMVA::MethodLikelihood::DeclareCompatibilityOptions()
275 {
276  // options that are used ONLY for the READER to ensure backward compatibility
277 
278  MethodBase::DeclareCompatibilityOptions();
279  DeclareOptionRef( fNsmooth = 1, "NSmooth",
280  "Number of smoothing iterations for the input histograms");
281  DeclareOptionRef( fAverageEvtPerBin = 50, "NAvEvtPerBin",
282  "Average number of events per PDF bin");
283  DeclareOptionRef( fKDEfineFactor =1. , "KDEFineFactor",
284  "Fine tuning factor for Adaptive KDE: Factor to multiply the width of the kernel");
285  DeclareOptionRef( fBorderMethodString = "None", "KDEborder",
286  "Border effects treatment (1=no treatment , 2=kernel renormalization, 3=sample mirroring)" );
287  DeclareOptionRef( fKDEiterString = "Nonadaptive", "KDEiter",
288  "Number of iterations (1=non-adaptive, 2=adaptive)" );
289  DeclareOptionRef( fKDEtypeString = "Gauss", "KDEtype",
290  "KDE kernel type (1=Gauss)" );
291  fAverageEvtPerBinVarS = new Int_t[GetNvar()];
292  fAverageEvtPerBinVarB = new Int_t[GetNvar()];
293  fNsmoothVarS = new Int_t[GetNvar()];
294  fNsmoothVarB = new Int_t[GetNvar()];
295  fInterpolateString = new TString[GetNvar()];
296  for(UInt_t i=0; i<GetNvar(); ++i) {
297  fAverageEvtPerBinVarS[i] = fAverageEvtPerBinVarB[i] = 0;
298  fNsmoothVarS[i] = fNsmoothVarB[i] = 0;
299  fInterpolateString[i] = "";
300  }
301  DeclareOptionRef( fAverageEvtPerBinVarS, GetNvar(), "NAvEvtPerBinSig",
302  "Average num of events per PDF bin and variable (signal)");
303  DeclareOptionRef( fAverageEvtPerBinVarB, GetNvar(), "NAvEvtPerBinBkg",
304  "Average num of events per PDF bin and variable (background)");
305  DeclareOptionRef(fNsmoothVarS, GetNvar(), "NSmoothSig",
306  "Number of smoothing iterations for the input histograms");
307  DeclareOptionRef(fNsmoothVarB, GetNvar(), "NSmoothBkg",
308  "Number of smoothing iterations for the input histograms");
309  DeclareOptionRef(fInterpolateString, GetNvar(), "PDFInterpol", "Method of interpolating reference histograms (e.g. Spline2 or KDE)");
310 }
311 
312 ////////////////////////////////////////////////////////////////////////////////
313 /// process user options
314 /// reference cut value to distinguish signal-like from background-like events
315 
316 void TMVA::MethodLikelihood::ProcessOptions()
317 {
318  SetSignalReferenceCut( TransformLikelihoodOutput( 0.5, 0.5 ) );
319 
320  fDefaultPDFLik->ProcessOptions();
321  for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
322  (*fPDFBgd)[ivar]->ProcessOptions();
323  (*fPDFSig)[ivar]->ProcessOptions();
324  }
325 }
326 
327 ////////////////////////////////////////////////////////////////////////////////
328 /// create reference distributions (PDFs) from signal and background events:
329 /// fill histograms and smooth them; if decorrelation is required, compute
330 /// corresponding square-root matrices
331 /// the reference histograms require the correct boundaries. Since in Likelihood classification
332 /// the transformations are applied using both classes, also the corresponding boundaries
333 /// need to take this into account
334 
335 void TMVA::MethodLikelihood::Train( void )
336 {
337  UInt_t nvar=GetNvar();
338  std::vector<Double_t> xmin(nvar), xmax(nvar);
339  for (UInt_t ivar=0; ivar<nvar; ivar++) {xmin[ivar]=1e30; xmax[ivar]=-1e30;}
340 
341  UInt_t nevents=Data()->GetNEvents();
342  for (UInt_t ievt=0; ievt<nevents; ievt++) {
343  // use the true-event-type's transformation
344  // set the event true event types transformation
345  const Event* origEv = Data()->GetEvent(ievt);
346  if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
347  // loop over classes
348  for (int cls=0;cls<2;cls++){
349  GetTransformationHandler().SetTransformationReferenceClass(cls);
350  const Event* ev = GetTransformationHandler().Transform( origEv );
351  for (UInt_t ivar=0; ivar<nvar; ivar++) {
352  Float_t value = ev->GetValue(ivar);
353  if (value < xmin[ivar]) xmin[ivar] = value;
354  if (value > xmax[ivar]) xmax[ivar] = value;
355  }
356  }
357  }
358 
359  // create reference histograms
360  // (KDE smoothing requires very finely binned reference histograms)
361  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
362  TString var = (*fInputVars)[ivar];
363 
364  // the reference histograms require the correct boundaries. Since in Likelihood classification
365  // the transformations are applied using both classes, also the corresponding boundaries
366  // need to take this into account
367 
368  // special treatment for discrete variables
369  if (DataInfo().GetVariableInfo(ivar).GetVarType() == 'I') {
370  // special treatment for integer variables
371  Int_t ixmin = TMath::Nint( xmin[ivar] );
372  xmax[ivar]=xmax[ivar]+1; // make sure that all entries are included in histogram
373  Int_t ixmax = TMath::Nint( xmax[ivar] );
374  Int_t nbins = ixmax - ixmin;
375  (*fHistSig)[ivar] = new TH1F(GetMethodName()+"_"+var + "_sig", var + " signal training", nbins, ixmin, ixmax );
376  (*fHistBgd)[ivar] = new TH1F(GetMethodName()+"_"+var + "_bgd", var + " background training", nbins, ixmin, ixmax );
377  } else {
378 
379  UInt_t minNEvt = TMath::Min(Data()->GetNEvtSigTrain(),Data()->GetNEvtBkgdTrain());
380  Int_t nbinsS = (*fPDFSig)[ivar]->GetHistNBins( minNEvt );
381  Int_t nbinsB = (*fPDFBgd)[ivar]->GetHistNBins( minNEvt );
382 
383  (*fHistSig)[ivar] = new TH1F( Form("%s_%s_%s_sig",DataInfo().GetName(),GetMethodName().Data(),var.Data()),
384  Form("%s_%s_%s signal training",DataInfo().GetName(),GetMethodName().Data(),var.Data()), nbinsS, xmin[ivar], xmax[ivar] );
385  (*fHistBgd)[ivar] = new TH1F( Form("%s_%s_%s_bgd",DataInfo().GetName(),GetMethodName().Data(),var.Data()),
386  Form("%s_%s_%s background training",DataInfo().GetName(),GetMethodName().Data(),var.Data()), nbinsB, xmin[ivar], xmax[ivar] );
387  }
388  }
389 
390  // ----- fill the reference histograms
391  Log() << kINFO << "Filling reference histograms" << Endl;
392 
393  // event loop
394  for (Int_t ievt=0; ievt<Data()->GetNEvents(); ievt++) {
395 
396  // use the true-event-type's transformation
397  // set the event true event types transformation
398  const Event* origEv = Data()->GetEvent(ievt);
399  if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
400  GetTransformationHandler().SetTransformationReferenceClass( origEv->GetClass() );
401  const Event* ev = GetTransformationHandler().Transform( origEv );
402 
403  // the event weight
404  Float_t weight = ev->GetWeight();
405 
406  // fill variable vector
407  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
408  Double_t value = ev->GetValue(ivar);
409  // verify limits
410  if (value >= xmax[ivar]) value = xmax[ivar] - 1.0e-10;
411  else if (value < xmin[ivar]) value = xmin[ivar] + 1.0e-10;
412  // inserting check if there are events in overflow or underflow
413  if (value >=(*fHistSig)[ivar]->GetXaxis()->GetXmax() ||
414  value <(*fHistSig)[ivar]->GetXaxis()->GetXmin()){
415  Log()<<kWARNING
416  <<"error in filling likelihood reference histograms var="
417  <<(*fInputVars)[ivar]
418  << ", xmin="<<(*fHistSig)[ivar]->GetXaxis()->GetXmin()
419  << ", value="<<value
420  << ", xmax="<<(*fHistSig)[ivar]->GetXaxis()->GetXmax()
421  << Endl;
422  }
423  if (DataInfo().IsSignal(ev)) (*fHistSig)[ivar]->Fill( value, weight );
424  else (*fHistBgd)[ivar]->Fill( value, weight );
425  }
426  }
427 
428  // building the pdfs
429  Log() << kINFO << "Building PDF out of reference histograms" << Endl;
430  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
431 
432  // the PDF is built from (binned) reference histograms
433  // in case of KDE, this has a large number of bins, which makes it quasi-unbinned
434  (*fPDFSig)[ivar]->BuildPDF( (*fHistSig)[ivar] );
435  (*fPDFBgd)[ivar]->BuildPDF( (*fHistBgd)[ivar] );
436 
437  (*fPDFSig)[ivar]->ValidatePDF( (*fHistSig)[ivar] );
438  (*fPDFBgd)[ivar]->ValidatePDF( (*fHistBgd)[ivar] );
439 
440  // saving the smoothed histograms
441  if ((*fPDFSig)[ivar]->GetSmoothedHist() != 0) (*fHistSig_smooth)[ivar] = (*fPDFSig)[ivar]->GetSmoothedHist();
442  if ((*fPDFBgd)[ivar]->GetSmoothedHist() != 0) (*fHistBgd_smooth)[ivar] = (*fPDFBgd)[ivar]->GetSmoothedHist();
443  }
444  ExitFromTraining();
445 }
446 
447 ////////////////////////////////////////////////////////////////////////////////
448 /// returns the likelihood estimator for signal
449 /// fill a new Likelihood branch into the testTree
450 
451 Double_t TMVA::MethodLikelihood::GetMvaValue( Double_t* err, Double_t* errUpper )
452 {
453  UInt_t ivar;
454 
455  // cannot determine error
456  NoErrorCalc(err, errUpper);
457 
458  // retrieve variables, and transform, if required
459  TVector vs( GetNvar() );
460  TVector vb( GetNvar() );
461 
462  // need to distinguish signal and background in case of variable transformation
463  // signal first
464 
465  GetTransformationHandler().SetTransformationReferenceClass( fSignalClass );
466  // temporary: JS --> FIX
467  //GetTransformationHandler().SetTransformationReferenceClass( 0 );
468  const Event* ev = GetEvent();
469  for (ivar=0; ivar<GetNvar(); ivar++) vs(ivar) = ev->GetValue(ivar);
470 
471  GetTransformationHandler().SetTransformationReferenceClass( fBackgroundClass );
472  // temporary: JS --> FIX
473  //GetTransformationHandler().SetTransformationReferenceClass( 1 );
474  ev = GetEvent();
475  for (ivar=0; ivar<GetNvar(); ivar++) vb(ivar) = ev->GetValue(ivar);
476 
477  // compute the likelihood (signal)
478  Double_t ps(1), pb(1), p(0);
479  for (ivar=0; ivar<GetNvar(); ivar++) {
480 
481  // drop one variable (this is ONLY used for internal variable ranking !)
482  if ((Int_t)ivar == fDropVariable) continue;
483 
484  Double_t x[2] = { vs(ivar), vb(ivar) };
485 
486  for (UInt_t itype=0; itype < 2; itype++) {
487 
488  // verify limits
489  if (x[itype] >= (*fPDFSig)[ivar]->GetXmax()) x[itype] = (*fPDFSig)[ivar]->GetXmax() - 1.0e-10;
490  else if (x[itype] < (*fPDFSig)[ivar]->GetXmin()) x[itype] = (*fPDFSig)[ivar]->GetXmin();
491 
492  // find corresponding histogram from cached indices
493  PDF* pdf = (itype == 0) ? (*fPDFSig)[ivar] : (*fPDFBgd)[ivar];
494  if (pdf == 0) Log() << kFATAL << "<GetMvaValue> Reference histograms don't exist" << Endl;
495  TH1* hist = pdf->GetPDFHist();
496 
497  // interpolate linearly between adjacent bins
498  // this is not useful for discrete variables
499  Int_t bin = hist->FindBin(x[itype]);
500 
501  // **** POTENTIAL BUG: PERFORMANCE IS WORSE WHEN USING TRUE TYPE ***
502  // ==> commented out at present
503  if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0 ||
504  DataInfo().GetVariableInfo(ivar).GetVarType() == 'N') {
505  p = TMath::Max( hist->GetBinContent(bin), fEpsilon );
506  } else { // splined PDF
507  Int_t nextbin = bin;
508  if ((x[itype] > hist->GetBinCenter(bin) && bin != hist->GetNbinsX()) || bin == 1)
509  nextbin++;
510  else
511  nextbin--;
512 
513 
514  Double_t dx = hist->GetBinCenter(bin) - hist->GetBinCenter(nextbin);
515  Double_t dy = hist->GetBinContent(bin) - hist->GetBinContent(nextbin);
516  Double_t like = hist->GetBinContent(bin) + (x[itype] - hist->GetBinCenter(bin)) * dy/dx;
517 
518  p = TMath::Max( like, fEpsilon );
519  }
520 
521  if (itype == 0) ps *= p;
522  else pb *= p;
523  }
524  }
525 
526  // the likelihood ratio (transform it ?)
527  return TransformLikelihoodOutput( ps, pb );
528 }
529 
530 ////////////////////////////////////////////////////////////////////////////////
531 /// returns transformed or non-transformed output
532 
533 Double_t TMVA::MethodLikelihood::TransformLikelihoodOutput( Double_t ps, Double_t pb ) const
534 {
535  if (ps < fEpsilon) ps = fEpsilon;
536  if (pb < fEpsilon) pb = fEpsilon;
537  Double_t r = ps/(ps + pb);
538  if (r >= 1.0) r = 1. - 1.e-15;
539 
540  if (fTransformLikelihoodOutput) {
541  // inverse Fermi function
542 
543  // sanity check
544  if (r <= 0.0) r = fEpsilon;
545  else if (r >= 1.0) r = 1. - 1.e-15;
546 
547  Double_t tau = 15.0;
548  r = - TMath::Log(1.0/r - 1.0)/tau;
549  }
550 
551  return r;
552 }
553 
554 ////////////////////////////////////////////////////////////////////////////////
555 /// write options to stream
556 
557 void TMVA::MethodLikelihood::WriteOptionsToStream( std::ostream& o, const TString& prefix ) const
558 {
559  Configurable::WriteOptionsToStream( o, prefix);
560 
561  // writing the options defined for the different pdfs
562  if (fDefaultPDFLik != 0) {
563  o << prefix << std::endl << prefix << "#Default Likelihood PDF Options:" << std::endl << prefix << std::endl;
564  fDefaultPDFLik->WriteOptionsToStream( o, prefix );
565  }
566  for (UInt_t ivar = 0; ivar < fPDFSig->size(); ivar++) {
567  if ((*fPDFSig)[ivar] != 0) {
568  o << prefix << std::endl << prefix << Form("#Signal[%d] Likelihood PDF Options:",ivar) << std::endl << prefix << std::endl;
569  (*fPDFSig)[ivar]->WriteOptionsToStream( o, prefix );
570  }
571  if ((*fPDFBgd)[ivar] != 0) {
572  o << prefix << std::endl << prefix << Form("#Background[%d] Likelihood PDF Options:",ivar) << std::endl << prefix << std::endl;
573  (*fPDFBgd)[ivar]->WriteOptionsToStream( o, prefix );
574  }
575  }
576 }
577 
578 ////////////////////////////////////////////////////////////////////////////////
579 /// write weights to XML
580 
581 void TMVA::MethodLikelihood::AddWeightsXMLTo( void* parent ) const
582 {
583  void* wght = gTools().AddChild(parent, "Weights");
584  gTools().AddAttr(wght, "NVariables", GetNvar());
585  gTools().AddAttr(wght, "NClasses", 2);
586  void* pdfwrap;
587  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
588  if ( (*fPDFSig)[ivar]==0 || (*fPDFBgd)[ivar]==0 )
589  Log() << kFATAL << "Reference histograms for variable " << ivar
590  << " don't exist, can't write it to weight file" << Endl;
591  pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
592  gTools().AddAttr(pdfwrap, "VarIndex", ivar);
593  gTools().AddAttr(pdfwrap, "ClassIndex", 0);
594  (*fPDFSig)[ivar]->AddXMLTo(pdfwrap);
595  pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
596  gTools().AddAttr(pdfwrap, "VarIndex", ivar);
597  gTools().AddAttr(pdfwrap, "ClassIndex", 1);
598  (*fPDFBgd)[ivar]->AddXMLTo(pdfwrap);
599  }
600 }
601 
602 ////////////////////////////////////////////////////////////////////////////////
603 /// computes ranking of input variables
604 
605 const TMVA::Ranking* TMVA::MethodLikelihood::CreateRanking()
606 {
607  // create the ranking object
608  if (fRanking) delete fRanking;
609  fRanking = new Ranking( GetName(), "Delta Separation" );
610 
611  Double_t sepRef = -1, sep = -1;
612  for (Int_t ivar=-1; ivar<(Int_t)GetNvar(); ivar++) {
613 
614  // this variable should not be used
615  fDropVariable = ivar;
616 
617  TString nameS = Form( "rS_%i", ivar+1 );
618  TString nameB = Form( "rB_%i", ivar+1 );
619  TH1* rS = new TH1F( nameS, nameS, 80, 0, 1 );
620  TH1* rB = new TH1F( nameB, nameB, 80, 0, 1 );
621 
622  // the event loop
623  for (Int_t ievt=0; ievt<Data()->GetNTrainingEvents(); ievt++) {
624 
625  const Event* origEv = Data()->GetEvent(ievt);
626  GetTransformationHandler().SetTransformationReferenceClass( origEv->GetClass() );
627  const Event* ev = GetTransformationHandler().Transform(Data()->GetEvent(ievt));
628 
629  Double_t lk = this->GetMvaValue();
630  Double_t w = ev->GetWeight();
631  if (DataInfo().IsSignal(ev)) rS->Fill( lk, w );
632  else rB->Fill( lk, w );
633  }
634 
635  // compute separation
636  sep = TMVA::gTools().GetSeparation( rS, rB );
637  if (ivar == -1) sepRef = sep;
638  sep = sepRef - sep;
639 
640  // don't need these histograms anymore
641  delete rS;
642  delete rB;
643 
644  if (ivar >= 0) fRanking->AddRank( Rank( DataInfo().GetVariableInfo(ivar).GetInternalName(), sep ) );
645  }
646 
647  fDropVariable = -1;
648 
649  return fRanking;
650 }
651 
652 ////////////////////////////////////////////////////////////////////////////////
653 /// write reference PDFs to ROOT file
654 
655 void TMVA::MethodLikelihood::WriteWeightsToStream( TFile& /*rf*/ ) const
656 {
657  TString pname = "PDF_";
658  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
659  (*fPDFSig)[ivar]->Write( pname + GetInputVar( ivar ) + "_S" );
660  (*fPDFBgd)[ivar]->Write( pname + GetInputVar( ivar ) + "_B" );
661  }
662 }
663 
664 ////////////////////////////////////////////////////////////////////////////////
665 /// read weights from XML
666 
667 void TMVA::MethodLikelihood::ReadWeightsFromXML( void* wghtnode )
668 {
669  TString pname = "PDF_";
670  Bool_t addDirStatus = TH1::AddDirectoryStatus();
671  TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
672  UInt_t nvars=0;
673  gTools().ReadAttr(wghtnode, "NVariables",nvars);
674  void* descnode = gTools().GetChild(wghtnode);
675  for (UInt_t ivar=0; ivar<nvars; ivar++){
676  void* pdfnode = gTools().GetChild(descnode);
677  Log() << kDEBUG << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
678  if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
679  if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
680  (*fPDFSig)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Sig" );
681  (*fPDFBgd)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Bkg" );
682  (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
683  (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
684  (*(*fPDFSig)[ivar]).ReadXML(pdfnode);
685  descnode = gTools().GetNextChild(descnode);
686  pdfnode = gTools().GetChild(descnode);
687  (*(*fPDFBgd)[ivar]).ReadXML(pdfnode);
688  descnode = gTools().GetNextChild(descnode);
689  }
690  TH1::AddDirectory(addDirStatus);
691 }
692 
693 ////////////////////////////////////////////////////////////////////////////////
694 /// read weight info from file
695 /// nothing to do for this method
696 
697 void TMVA::MethodLikelihood::ReadWeightsFromStream( std::istream & istr )
698 {
699  TString pname = "PDF_";
700  Bool_t addDirStatus = TH1::AddDirectoryStatus();
701  TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
702  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
703  Log() << kDEBUG << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
704  if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
705  if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
706  (*fPDFSig)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Sig" );
707  (*fPDFBgd)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Bkg");
708  (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
709  (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
710  istr >> *(*fPDFSig)[ivar];
711  istr >> *(*fPDFBgd)[ivar];
712  }
713  TH1::AddDirectory(addDirStatus);
714 }
715 
716 ////////////////////////////////////////////////////////////////////////////////
717 /// read reference PDF from ROOT file
718 
719 void TMVA::MethodLikelihood::ReadWeightsFromStream( TFile& rf )
720 {
721  TString pname = "PDF_";
722  Bool_t addDirStatus = TH1::AddDirectoryStatus();
723  TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
724  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
725  (*fPDFSig)[ivar] = (TMVA::PDF*)rf.Get( Form( "PDF_%s_S", GetInputVar( ivar ).Data() ) );
726  (*fPDFBgd)[ivar] = (TMVA::PDF*)rf.Get( Form( "PDF_%s_B", GetInputVar( ivar ).Data() ) );
727  }
728  TH1::AddDirectory(addDirStatus);
729 }
730 
731 ////////////////////////////////////////////////////////////////////////////////
732 /// write histograms and PDFs to file for monitoring purposes
733 
734 void TMVA::MethodLikelihood::WriteMonitoringHistosToFile( void ) const
735 {
736  Log() << kINFO << "Write monitoring histograms to file: " << BaseDir()->GetPath() << Endl;
737  BaseDir()->cd();
738 
739  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
740  (*fHistSig)[ivar]->Write();
741  (*fHistBgd)[ivar]->Write();
742  if ((*fHistSig_smooth)[ivar] != 0) (*fHistSig_smooth)[ivar]->Write();
743  if ((*fHistBgd_smooth)[ivar] != 0) (*fHistBgd_smooth)[ivar]->Write();
744  (*fPDFSig)[ivar]->GetPDFHist()->Write();
745  (*fPDFBgd)[ivar]->GetPDFHist()->Write();
746 
747  if ((*fPDFSig)[ivar]->GetNSmoothHist() != 0) (*fPDFSig)[ivar]->GetNSmoothHist()->Write();
748  if ((*fPDFBgd)[ivar]->GetNSmoothHist() != 0) (*fPDFBgd)[ivar]->GetNSmoothHist()->Write();
749 
750  // add special plots to check the smoothing in the GetVal method
751  Float_t xmin=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmin();
752  Float_t xmax=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmax();
753  TH1F* mm = new TH1F( (*fInputVars)[ivar]+"_additional_check",
754  (*fInputVars)[ivar]+"_additional_check", 15000, xmin, xmax );
755  Double_t intBin = (xmax-xmin)/15000;
756  for (Int_t bin=0; bin < 15000; bin++) {
757  Double_t x = (bin + 0.5)*intBin + xmin;
758  mm->SetBinContent(bin+1 ,(*fPDFSig)[ivar]->GetVal(x));
759  }
760  mm->Write();
761 
762  // ---------- create cloned low-binned histogram for comparison in macros (mainly necessary for KDE)
763  TH1* h[2] = { (*fHistSig)[ivar], (*fHistBgd)[ivar] };
764  for (UInt_t i=0; i<2; i++) {
765  TH1* hclone = (TH1F*)h[i]->Clone( TString(h[i]->GetName()) + "_nice" );
766  hclone->SetName ( TString(h[i]->GetName()) + "_nice" );
767  hclone->SetTitle( TString(h[i]->GetTitle()) + "" );
768  if (hclone->GetNbinsX() > 100) {
769  Int_t resFactor = 5;
770  hclone->Rebin( resFactor );
771  hclone->Scale( 1.0/resFactor );
772  }
773  hclone->Write();
774  }
775  // ----------
776  }
777 }
778 
779 ////////////////////////////////////////////////////////////////////////////////
780 /// write specific header of the classifier (mostly include files)
781 
782 void TMVA::MethodLikelihood::MakeClassSpecificHeader( std::ostream& fout, const TString& ) const
783 {
784  fout << "#include <math.h>" << std::endl;
785  fout << "#include <cstdlib>" << std::endl;
786 }
787 
788 ////////////////////////////////////////////////////////////////////////////////
789 /// write specific classifier response
790 
791 void TMVA::MethodLikelihood::MakeClassSpecific( std::ostream& fout, const TString& className ) const
792 {
793  Int_t dp = fout.precision();
794  fout << " double fEpsilon;" << std::endl;
795 
796  Int_t * nbin = new Int_t[GetNvar()];
797 
798  Int_t nbinMax=-1;
799  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
800  nbin[ivar]=(*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX();
801  if (nbin[ivar] > nbinMax) nbinMax=nbin[ivar];
802  }
803 
804  fout << " static float fRefS[][" << nbinMax << "]; "
805  << "// signal reference vector [nvars][max_nbins]" << std::endl;
806  fout << " static float fRefB[][" << nbinMax << "]; "
807  << "// backgr reference vector [nvars][max_nbins]" << std::endl << std::endl;
808  fout << "// if a variable has its PDF encoded as a spline0 --> treat it like an Integer valued one" <<std::endl;
809  fout << " bool fHasDiscretPDF[" << GetNvar() <<"]; "<< std::endl;
810  fout << " int fNbin[" << GetNvar() << "]; "
811  << "// number of bins (discrete variables may have less bins)" << std::endl;
812  fout << " double fHistMin[" << GetNvar() << "]; " << std::endl;
813  fout << " double fHistMax[" << GetNvar() << "]; " << std::endl;
814 
815  fout << " double TransformLikelihoodOutput( double, double ) const;" << std::endl;
816  fout << "};" << std::endl;
817  fout << "" << std::endl;
818  fout << "inline void " << className << "::Initialize() " << std::endl;
819  fout << "{" << std::endl;
820  fout << " fEpsilon = " << fEpsilon << ";" << std::endl;
821  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
822  fout << " fNbin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ";" << std::endl;
823  fout << " fHistMin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmin() << ";" << std::endl;
824  fout << " fHistMax[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmax() << ";" << std::endl;
825  // sanity check (for previous code lines)
826  if ((((*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar] ||
827  (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar])
828  ) ||
829  (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()) {
830  Log() << kFATAL << "<MakeClassSpecific> Mismatch in binning of variable "
831  << "\"" << GetOriginalVarName(ivar) << "\" of type: \'" << DataInfo().GetVariableInfo(ivar).GetVarType()
832  << "\' : "
833  << "nxS = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ", "
834  << "nxB = " << (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()
835  << " while we expect " << nbin[ivar]
836  << Endl;
837  }
838  }
839  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
840  if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0)
841  fout << " fHasDiscretPDF[" << ivar <<"] = true; " << std::endl;
842  else
843  fout << " fHasDiscretPDF[" << ivar <<"] = false; " << std::endl;
844  }
845 
846  fout << "}" << std::endl << std::endl;
847 
848  fout << "inline double " << className
849  << "::GetMvaValue__( const std::vector<double>& inputValues ) const" << std::endl;
850  fout << "{" << std::endl;
851  fout << " double ps(1), pb(1);" << std::endl;
852  fout << " std::vector<double> inputValuesSig = inputValues;" << std::endl;
853  fout << " std::vector<double> inputValuesBgd = inputValues;" << std::endl;
854  if (GetTransformationHandler().GetTransformationList().GetSize() != 0) {
855  fout << " Transform(inputValuesSig,0);" << std::endl;
856  fout << " Transform(inputValuesBgd,1);" << std::endl;
857  }
858  fout << " for (size_t ivar = 0; ivar < GetNvar(); ivar++) {" << std::endl;
859  fout << std::endl;
860  fout << " // dummy at present... will be used for variable transforms" << std::endl;
861  fout << " double x[2] = { inputValuesSig[ivar], inputValuesBgd[ivar] };" << std::endl;
862  fout << std::endl;
863  fout << " for (int itype=0; itype < 2; itype++) {" << std::endl;
864  fout << std::endl;
865  fout << " // interpolate linearly between adjacent bins" << std::endl;
866  fout << " // this is not useful for discrete variables (or forced Spline0)" << std::endl;
867  fout << " int bin = int((x[itype] - fHistMin[ivar])/(fHistMax[ivar] - fHistMin[ivar])*fNbin[ivar]) + 0;" << std::endl;
868  fout << std::endl;
869  fout << " // since the test data sample is in general different from the training sample" << std::endl;
870  fout << " // it can happen that the min/max of the training sample are trespassed --> correct this" << std::endl;
871  fout << " if (bin < 0) {" << std::endl;
872  fout << " bin = 0;" << std::endl;
873  fout << " x[itype] = fHistMin[ivar];" << std::endl;
874  fout << " }" << std::endl;
875  fout << " else if (bin >= fNbin[ivar]) {" << std::endl;
876  fout << " bin = fNbin[ivar]-1;" << std::endl;
877  fout << " x[itype] = fHistMax[ivar];" << std::endl;
878  fout << " }" << std::endl;
879  fout << std::endl;
880  fout << " // find corresponding histogram from cached indices" << std::endl;
881  fout << " float ref = (itype == 0) ? fRefS[ivar][bin] : fRefB[ivar][bin];" << std::endl;
882  fout << std::endl;
883  fout << " // sanity check" << std::endl;
884  fout << " if (ref < 0) {" << std::endl;
885  fout << " std::cout << \"Fatal error in " << className
886  << ": bin entry < 0 ==> abort\" << std::endl;" << std::endl;
887  fout << " std::exit(1);" << std::endl;
888  fout << " }" << std::endl;
889  fout << std::endl;
890  fout << " double p = ref;" << std::endl;
891  fout << std::endl;
892  fout << " if (GetType(ivar) != 'I' && !fHasDiscretPDF[ivar]) {" << std::endl;
893  fout << " float bincenter = (bin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
894  fout << " int nextbin = bin;" << std::endl;
895  fout << " if ((x[itype] > bincenter && bin != fNbin[ivar]-1) || bin == 0) " << std::endl;
896  fout << " nextbin++;" << std::endl;
897  fout << " else" << std::endl;
898  fout << " nextbin--; " << std::endl;
899  fout << std::endl;
900  fout << " double refnext = (itype == 0) ? fRefS[ivar][nextbin] : fRefB[ivar][nextbin];" << std::endl;
901  fout << " float nextbincenter = (nextbin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
902  fout << std::endl;
903  fout << " double dx = bincenter - nextbincenter;" << std::endl;
904  fout << " double dy = ref - refnext;" << std::endl;
905  fout << " p += (x[itype] - bincenter) * dy/dx;" << std::endl;
906  fout << " }" << std::endl;
907  fout << std::endl;
908  fout << " if (p < fEpsilon) p = fEpsilon; // avoid zero response" << std::endl;
909  fout << std::endl;
910  fout << " if (itype == 0) ps *= p;" << std::endl;
911  fout << " else pb *= p;" << std::endl;
912  fout << " } " << std::endl;
913  fout << " } " << std::endl;
914  fout << std::endl;
915  fout << " // the likelihood ratio (transform it ?)" << std::endl;
916  fout << " return TransformLikelihoodOutput( ps, pb ); " << std::endl;
917  fout << "}" << std::endl << std::endl;
918 
919  fout << "inline double " << className << "::TransformLikelihoodOutput( double ps, double pb ) const" << std::endl;
920  fout << "{" << std::endl;
921  fout << " // returns transformed or non-transformed output" << std::endl;
922  fout << " if (ps < fEpsilon) ps = fEpsilon;" << std::endl;
923  fout << " if (pb < fEpsilon) pb = fEpsilon;" << std::endl;
924  fout << " double r = ps/(ps + pb);" << std::endl;
925  fout << " if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
926  fout << std::endl;
927  fout << " if (" << (fTransformLikelihoodOutput ? "true" : "false") << ") {" << std::endl;
928  fout << " // inverse Fermi function" << std::endl;
929  fout << std::endl;
930  fout << " // sanity check" << std::endl;
931  fout << " if (r <= 0.0) r = fEpsilon;" << std::endl;
932  fout << " else if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
933  fout << std::endl;
934  fout << " double tau = 15.0;" << std::endl;
935  fout << " r = - log(1.0/r - 1.0)/tau;" << std::endl;
936  fout << " }" << std::endl;
937  fout << std::endl;
938  fout << " return r;" << std::endl;
939  fout << "}" << std::endl;
940  fout << std::endl;
941 
942  fout << "// Clean up" << std::endl;
943  fout << "inline void " << className << "::Clear() " << std::endl;
944  fout << "{" << std::endl;
945  fout << " // nothing to clear" << std::endl;
946  fout << "}" << std::endl << std::endl;
947 
948  fout << "// signal map" << std::endl;
949  fout << "float " << className << "::fRefS[][" << nbinMax << "] = " << std::endl;
950  fout << "{ " << std::endl;
951  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
952  fout << " { ";
953  for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
954  if (ibin-1 < nbin[ivar])
955  fout << (*fPDFSig)[ivar]->GetPDFHist()->GetBinContent(ibin);
956  else
957  fout << -1;
958 
959  if (ibin < nbinMax) fout << ", ";
960  }
961  fout << " }, " << std::endl;
962  }
963  fout << "}; " << std::endl;
964  fout << std::endl;
965 
966  fout << "// background map" << std::endl;
967  fout << "float " << className << "::fRefB[][" << nbinMax << "] = " << std::endl;
968  fout << "{ " << std::endl;
969  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
970  fout << " { ";
971  fout << std::setprecision(8);
972  for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
973  if (ibin-1 < nbin[ivar])
974  fout << (*fPDFBgd)[ivar]->GetPDFHist()->GetBinContent(ibin);
975  else
976  fout << -1;
977 
978  if (ibin < nbinMax) fout << ", ";
979  }
980  fout << " }, " << std::endl;
981  }
982  fout << "}; " << std::endl;
983  fout << std::endl;
984  fout << std::setprecision(dp);
985 
986  delete[] nbin;
987 }
988 
989 ////////////////////////////////////////////////////////////////////////////////
990 /// get help message text
991 ///
992 /// typical length of text line:
993 /// "|--------------------------------------------------------------|"
994 
995 void TMVA::MethodLikelihood::GetHelpMessage() const
996 {
997  Log() << Endl;
998  Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
999  Log() << Endl;
1000  Log() << "The maximum-likelihood classifier models the data with probability " << Endl;
1001  Log() << "density functions (PDF) reproducing the signal and background" << Endl;
1002  Log() << "distributions of the input variables. Correlations among the " << Endl;
1003  Log() << "variables are ignored." << Endl;
1004  Log() << Endl;
1005  Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
1006  Log() << Endl;
1007  Log() << "Required for good performance are decorrelated input variables" << Endl;
1008  Log() << "(PCA transformation via the option \"VarTransform=Decorrelate\"" << Endl;
1009  Log() << "may be tried). Irreducible non-linear correlations may be reduced" << Endl;
1010  Log() << "by precombining strongly correlated input variables, or by simply" << Endl;
1011  Log() << "removing one of the variables." << Endl;
1012  Log() << Endl;
1013  Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
1014  Log() << Endl;
1015  Log() << "High fidelity PDF estimates are mandatory, i.e., sufficient training " << Endl;
1016  Log() << "statistics is required to populate the tails of the distributions" << Endl;
1017  Log() << "It would be a surprise if the default Spline or KDE kernel parameters" << Endl;
1018  Log() << "provide a satisfying fit to the data. The user is advised to properly" << Endl;
1019  Log() << "tune the events per bin and smooth options in the spline cases" << Endl;
1020  Log() << "individually per variable. If the KDE kernel is used, the adaptive" << Endl;
1021  Log() << "Gaussian kernel may lead to artefacts, so please always also try" << Endl;
1022  Log() << "the non-adaptive one." << Endl;
1023  Log() << "" << Endl;
1024  Log() << "All tuning parameters must be adjusted individually for each input" << Endl;
1025  Log() << "variable!" << Endl;
1026 }
1027 