ROOT  6.06/09
Reference Guide
MethodLikelihood.cxx
Go to the documentation of this file.
1 // @(#)root/tmva $Id$
2 // Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss, Eckhard von Toerne, Jan Therhaag
3 
4 /**********************************************************************************
5  * Project: TMVA - a Root-integrated toolkit for multivariate Data analysis *
6  * Package: TMVA *
7  * Class : TMVA::MethodLikelihood *
8  * Web : http://tmva.sourceforge.net *
9  * *
10  * Description: *
11  * Implementation (see header for description) *
12  * *
13  * Authors (alphabetical): *
14  * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
15  * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
16  * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
17  * Jan Therhaag <Jan.Therhaag@cern.ch> - U of Bonn, Germany *
18  * Eckhard v. Toerne <evt@uni-bonn.de> - U. of Bonn, Germany *
19  * *
20  * Copyright (c) 2005-2011: *
21  * CERN, Switzerland *
22  * U. of Victoria, Canada *
23  * MPI-K Heidelberg, Germany *
24  * U. of Bonn, Germany *
25  * *
26  * Redistribution and use in source and binary forms, with or without *
27  * modification, are permitted according to the terms listed in LICENSE *
28  * (http://tmva.sourceforge.net/LICENSE) *
29  **********************************************************************************/
30 
31 ////////////////////////////////////////////////////////////////////////////////
32 
33 /* Begin_Html
34 
35  Likelihood analysis ("non-parametric approach")
36 
37  <p>
38  Also implemented is a "diagonalized likelihood approach",
39  which improves over the uncorrelated likelihood ansatz by
40  transforming linearly the input variables into a diagonal space,
41  using the square-root of the covariance matrix
42 
43  <p>
44  The method of maximum likelihood is the most straightforward, and
45  certainly among the most elegant multivariate analyser approaches.
46  We define the likelihood ratio, <i>R<sub>L</sub></i>, for event
47  <i>i</i>, by:<br>
48  <center>
49  <img vspace=6 src="gif/tmva_likratio.gif" align="bottom" >
50  </center>
51  Here the signal and background likelihoods, <i>L<sub>S</sub></i>,
52  <i>L<sub>B</sub></i>, are products of the corresponding probability
53  densities, <i>p<sub>S</sub></i>, <i>p<sub>B</sub></i>, of the
54  <i>N</i><sub>var</sub> discriminating variables used in the MVA: <br>
55  <center>
56  <img vspace=6 src="gif/tmva_lik.gif" align="bottom" >
57  </center>
58  and accordingly for L<sub>B</sub>.
59  In practise, TMVA uses polynomial splines to estimate the probability
60  density functions (PDF) obtained from the distributions of the
61  training variables.<br>
62 
63  <p>
64  Note that in TMVA the output of the likelihood ratio is transformed
65  by<br>
66  <center>
67  <img vspace=6 src="gif/tmva_likratio_trans.gif" align="bottom"/>
68  </center>
69  to avoid the occurrence of heavy peaks at <i>R<sub>L</sub></i>=0,1.
70 
71  <b>Decorrelated (or "diagonalized") Likelihood</b>
72 
73  <p>
74  The biggest drawback of the Likelihood approach is that it assumes
75  that the discriminant variables are uncorrelated. If it were the case,
76  it can be proven that the discrimination obtained by the above likelihood
77  ratio is optimal, ie, no other method can beat it. However, in most
78  practical applications of MVAs correlations are present. <br><p></p>
79 
80  <p>
81  Linear correlations, measured from the training sample, can be taken
82  into account in a straightforward manner through the square-root
83  of the covariance matrix. The square-root of a matrix
84  <i>C</i> is the matrix <i>C&prime;</i> that multiplied with itself
85  yields <i>C</i>: <i>C</i>=<i>C&prime;C&prime;</i>. We compute the
86  square-root matrix (SQM) by means of diagonalising (<i>D</i>) the
87  covariance matrix: <br>
88  <center>
89  <img vspace=6 src="gif/tmva_sqm.gif" align="bottom" >
90  </center>
91  and the linear transformation of the linearly correlated into the
92  uncorrelated variables space is then given by multiplying the measured
93  variable tuple by the inverse of the SQM. Note that these transformations
94  are performed for both signal and background separately, since the
95  correlation pattern is not the same in the two samples.
96 
97  <p>
98  The above diagonalisation is complete for linearly correlated,
99  Gaussian distributed variables only. In real-world examples this
100  is not often the case, so that only little additional information
101  may be recovered by the diagonalisation procedure. In these cases,
102  non-linear methods must be applied.
103 
104 End_Html */
105 //_______________________________________________________________________
106 
107 #include <iomanip>
108 #include <vector>
109 #include <cstdlib>
110 
111 #include "TMatrixD.h"
112 #include "TVector.h"
113 #include "TMath.h"
114 #include "TObjString.h"
115 #include "TFile.h"
116 #include "TKey.h"
117 #include "TH1.h"
118 #include "TClass.h"
119 #include "Riostream.h"
120 
121 #include "TMVA/ClassifierFactory.h"
122 #include "TMVA/MethodLikelihood.h"
123 #include "TMVA/Tools.h"
124 #include "TMVA/Ranking.h"
125 
126 REGISTER_METHOD(Likelihood)
127 
128 ClassImp(TMVA::MethodLikelihood)
129 
130 ////////////////////////////////////////////////////////////////////////////////
131 /// standard constructor
132 
133 TMVA::MethodLikelihood::MethodLikelihood( const TString& jobName,
134  const TString& methodTitle,
135  DataSetInfo& theData,
136  const TString& theOption,
137  TDirectory* theTargetDir ) :
138  TMVA::MethodBase( jobName, Types::kLikelihood, methodTitle, theData, theOption, theTargetDir ),
139  fEpsilon ( 1.e3 * DBL_MIN ),
140  fTransformLikelihoodOutput( kFALSE ),
141  fDropVariable ( 0 ),
142  fHistSig ( 0 ),
143  fHistBgd ( 0 ),
144  fHistSig_smooth( 0 ),
145  fHistBgd_smooth( 0 ),
146  fDefaultPDFLik ( 0 ),
147  fPDFSig ( 0 ),
148  fPDFBgd ( 0 ),
149  fNsmooth ( 2 ),
150  fNsmoothVarS ( 0 ),
151  fNsmoothVarB ( 0 ),
152  fAverageEvtPerBin( 0 ),
153  fAverageEvtPerBinVarS (0),
154  fAverageEvtPerBinVarB (0),
155  fKDEfineFactor ( 0 ),
156  fInterpolateString(0)
157 {
158 }
159 
160 ////////////////////////////////////////////////////////////////////////////////
161 /// construct likelihood references from file
162 
164  const TString& theWeightFile,
165  TDirectory* theTargetDir ) :
166  TMVA::MethodBase( Types::kLikelihood, theData, theWeightFile, theTargetDir ),
167  fEpsilon ( 1.e3 * DBL_MIN ),
168  fTransformLikelihoodOutput( kFALSE ),
169  fDropVariable ( 0 ),
170  fHistSig ( 0 ),
171  fHistBgd ( 0 ),
172  fHistSig_smooth( 0 ),
173  fHistBgd_smooth( 0 ),
174  fDefaultPDFLik ( 0 ),
175  fPDFSig ( 0 ),
176  fPDFBgd ( 0 ),
177  fNsmooth ( 2 ),
178  fNsmoothVarS ( 0 ),
179  fNsmoothVarB ( 0 ),
180  fAverageEvtPerBin( 0 ),
181  fAverageEvtPerBinVarS (0),
182  fAverageEvtPerBinVarB (0),
183  fKDEfineFactor ( 0 ),
184  fInterpolateString(0)
185 {
186 }
187 
188 ////////////////////////////////////////////////////////////////////////////////
189 /// destructor
190 
192 {
193  if (NULL != fDefaultPDFLik) delete fDefaultPDFLik;
194  if (NULL != fHistSig) delete fHistSig;
195  if (NULL != fHistBgd) delete fHistBgd;
196  if (NULL != fHistSig_smooth) delete fHistSig_smooth;
197  if (NULL != fHistBgd_smooth) delete fHistBgd_smooth;
198  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
199  if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
200  if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
201  }
202  if (NULL != fPDFSig) delete fPDFSig;
203  if (NULL != fPDFBgd) delete fPDFBgd;
204 }
205 
206 ////////////////////////////////////////////////////////////////////////////////
207 /// FDA can handle classification with 2 classes
208 
210  UInt_t numberClasses, UInt_t /*numberTargets*/ )
211 {
212  if (type == Types::kClassification && numberClasses == 2) return kTRUE;
213  return kFALSE;
214 }
215 
216 ////////////////////////////////////////////////////////////////////////////////
217 /// default initialisation called by all constructors
218 
220 {
221  // no ranking test
222  fDropVariable = -1;
223  fHistSig = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
224  fHistBgd = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
225  fHistSig_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
226  fHistBgd_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
227  fPDFSig = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
228  fPDFBgd = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
229 }
230 
231 ////////////////////////////////////////////////////////////////////////////////
232 /// define the options (their key words) that can be set in the option string
233 /// TransformOutput <bool> transform (often strongly peaked) likelihood output through sigmoid inversion
234 
236 {
237  DeclareOptionRef( fTransformLikelihoodOutput = kFALSE, "TransformOutput",
238  "Transform likelihood output by inverse sigmoid function" );
239 
240  // initialize
241 
242  // reading every PDF's definition and passing the option string to the next one to be read and marked
243  TString updatedOptions = GetOptions();
244  fDefaultPDFLik = new PDF( TString(GetName()) + " PDF", updatedOptions );
245  fDefaultPDFLik->DeclareOptions();
246  fDefaultPDFLik->ParseOptions();
247  updatedOptions = fDefaultPDFLik->GetOptions();
248  for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
249  (*fPDFSig)[ivar] = new PDF( Form("%s PDF Sig[%d]", GetName(), ivar), updatedOptions,
250  Form("Sig[%d]",ivar), fDefaultPDFLik );
251  (*fPDFSig)[ivar]->DeclareOptions();
252  (*fPDFSig)[ivar]->ParseOptions();
253  updatedOptions = (*fPDFSig)[ivar]->GetOptions();
254  (*fPDFBgd)[ivar] = new PDF( Form("%s PDF Bkg[%d]", GetName(), ivar), updatedOptions,
255  Form("Bkg[%d]",ivar), fDefaultPDFLik );
256  (*fPDFBgd)[ivar]->DeclareOptions();
257  (*fPDFBgd)[ivar]->ParseOptions();
258  updatedOptions = (*fPDFBgd)[ivar]->GetOptions();
259  }
260 
261  // the final marked option string is written back to the original likelihood
262  SetOptions( updatedOptions );
263 }
264 
265 
267 {
268  // options that are used ONLY for the READER to ensure backward compatibility
269 
271  DeclareOptionRef( fNsmooth = 1, "NSmooth",
272  "Number of smoothing iterations for the input histograms");
273  DeclareOptionRef( fAverageEvtPerBin = 50, "NAvEvtPerBin",
274  "Average number of events per PDF bin");
275  DeclareOptionRef( fKDEfineFactor =1. , "KDEFineFactor",
276  "Fine tuning factor for Adaptive KDE: Factor to multyply the width of the kernel");
277  DeclareOptionRef( fBorderMethodString = "None", "KDEborder",
278  "Border effects treatment (1=no treatment , 2=kernel renormalization, 3=sample mirroring)" );
279  DeclareOptionRef( fKDEiterString = "Nonadaptive", "KDEiter",
280  "Number of iterations (1=non-adaptive, 2=adaptive)" );
281  DeclareOptionRef( fKDEtypeString = "Gauss", "KDEtype",
282  "KDE kernel type (1=Gauss)" );
283  fAverageEvtPerBinVarS = new Int_t[GetNvar()];
284  fAverageEvtPerBinVarB = new Int_t[GetNvar()];
285  fNsmoothVarS = new Int_t[GetNvar()];
286  fNsmoothVarB = new Int_t[GetNvar()];
287  fInterpolateString = new TString[GetNvar()];
288  for(UInt_t i=0; i<GetNvar(); ++i) {
289  fAverageEvtPerBinVarS[i] = fAverageEvtPerBinVarB[i] = 0;
290  fNsmoothVarS[i] = fNsmoothVarB[i] = 0;
291  fInterpolateString[i] = "";
292  }
293  DeclareOptionRef( fAverageEvtPerBinVarS, GetNvar(), "NAvEvtPerBinSig",
294  "Average num of events per PDF bin and variable (signal)");
295  DeclareOptionRef( fAverageEvtPerBinVarB, GetNvar(), "NAvEvtPerBinBkg",
296  "Average num of events per PDF bin and variable (background)");
297  DeclareOptionRef(fNsmoothVarS, GetNvar(), "NSmoothSig",
298  "Number of smoothing iterations for the input histograms");
299  DeclareOptionRef(fNsmoothVarB, GetNvar(), "NSmoothBkg",
300  "Number of smoothing iterations for the input histograms");
301  DeclareOptionRef(fInterpolateString, GetNvar(), "PDFInterpol", "Method of interpolating reference histograms (e.g. Spline2 or KDE)");
302 }
303 
304 
305 
306 
307 ////////////////////////////////////////////////////////////////////////////////
308 
310 {
311  // process user options
312  // reference cut value to distingiush signal-like from background-like events
313  SetSignalReferenceCut( TransformLikelihoodOutput( 0.5, 0.5 ) );
314 
315  fDefaultPDFLik->ProcessOptions();
316  for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
317  (*fPDFBgd)[ivar]->ProcessOptions();
318  (*fPDFSig)[ivar]->ProcessOptions();
319  }
320 }
321 
322 ////////////////////////////////////////////////////////////////////////////////
323 /// create reference distributions (PDFs) from signal and background events:
324 /// fill histograms and smooth them; if decorrelation is required, compute
325 /// corresponding square-root matrices
326 /// the reference histograms require the correct boundaries. Since in Likelihood classification
327 /// the transformations are applied using both classes, also the corresponding boundaries
328 /// need to take this into account
329 
331 {
332  UInt_t nvar=GetNvar();
333  std::vector<Double_t> xmin(nvar), xmax(nvar);
334  for (UInt_t ivar=0; ivar<nvar; ivar++) {xmin[ivar]=1e30; xmax[ivar]=-1e30;}
335 
336  UInt_t nevents=Data()->GetNEvents();
337  for (UInt_t ievt=0; ievt<nevents; ievt++) {
338  // use the true-event-type's transformation
339  // set the event true event types transformation
340  const Event* origEv = Data()->GetEvent(ievt);
341  if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
342  // loop over classes
343  for (int cls=0;cls<2;cls++){
344  GetTransformationHandler().SetTransformationReferenceClass(cls);
345  const Event* ev = GetTransformationHandler().Transform( origEv );
346  for (UInt_t ivar=0; ivar<nvar; ivar++) {
347  Float_t value = ev->GetValue(ivar);
348  if (value < xmin[ivar]) xmin[ivar] = value;
349  if (value > xmax[ivar]) xmax[ivar] = value;
350  }
351  }
352  }
353 
354  // create reference histograms
355  // (KDE smoothing requires very finely binned reference histograms)
356  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
357  TString var = (*fInputVars)[ivar];
358 
359  // the reference histograms require the correct boundaries. Since in Likelihood classification
360  // the transformations are applied using both classes, also the corresponding boundaries
361  // need to take this into account
362 
363  // special treatment for discrete variables
364  if (DataInfo().GetVariableInfo(ivar).GetVarType() == 'I') {
365  // special treatment for integer variables
366  Int_t ixmin = TMath::Nint( xmin[ivar] );
367  xmax[ivar]=xmax[ivar]+1; // make sure that all entries are included in histogram
368  Int_t ixmax = TMath::Nint( xmax[ivar] );
369  Int_t nbins = ixmax - ixmin;
370 
371  (*fHistSig)[ivar] = new TH1F( var + "_sig", var + " signal training", nbins, ixmin, ixmax );
372  (*fHistBgd)[ivar] = new TH1F( var + "_bgd", var + " background training", nbins, ixmin, ixmax );
373  } else {
374 
375  UInt_t minNEvt = TMath::Min(Data()->GetNEvtSigTrain(),Data()->GetNEvtBkgdTrain());
376  Int_t nbinsS = (*fPDFSig)[ivar]->GetHistNBins( minNEvt );
377  Int_t nbinsB = (*fPDFBgd)[ivar]->GetHistNBins( minNEvt );
378 
379  (*fHistSig)[ivar] = new TH1F( Form("%s_sig",var.Data()),
380  Form("%s signal training",var.Data()), nbinsS, xmin[ivar], xmax[ivar] );
381  (*fHistBgd)[ivar] = new TH1F( Form("%s_bgd",var.Data()),
382  Form("%s background training",var.Data()), nbinsB, xmin[ivar], xmax[ivar] );
383  }
384  }
385 
386  // ----- fill the reference histograms
387  Log() << kINFO << "Filling reference histograms" << Endl;
388 
389  // event loop
390  for (Int_t ievt=0; ievt<Data()->GetNEvents(); ievt++) {
391 
392  // use the true-event-type's transformation
393  // set the event true event types transformation
394  const Event* origEv = Data()->GetEvent(ievt);
395  if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
396  GetTransformationHandler().SetTransformationReferenceClass( origEv->GetClass() );
397  const Event* ev = GetTransformationHandler().Transform( origEv );
398 
399  // the event weight
400  Float_t weight = ev->GetWeight();
401 
402  // fill variable vector
403  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
404  Double_t value = ev->GetValue(ivar);
405  // verify limits
406  if (value >= xmax[ivar]) value = xmax[ivar] - 1.0e-10;
407  else if (value < xmin[ivar]) value = xmin[ivar] + 1.0e-10;
408  // inserting check if there are events in overflow or underflow
409  if (value >=(*fHistSig)[ivar]->GetXaxis()->GetXmax() ||
410  value <(*fHistSig)[ivar]->GetXaxis()->GetXmin()){
411  Log()<<kWARNING
412  <<"error in filling likelihood reference histograms var="
413  <<(*fInputVars)[ivar]
414  << ", xmin="<<(*fHistSig)[ivar]->GetXaxis()->GetXmin()
415  << ", value="<<value
416  << ", xmax="<<(*fHistSig)[ivar]->GetXaxis()->GetXmax()
417  << Endl;
418  }
419  if (DataInfo().IsSignal(ev)) (*fHistSig)[ivar]->Fill( value, weight );
420  else (*fHistBgd)[ivar]->Fill( value, weight );
421  }
422  }
423 
424  // building the pdfs
425  Log() << kINFO << "Building PDF out of reference histograms" << Endl;
426  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
427 
428  // the PDF is built from (binned) reference histograms
429  // in case of KDE, this has a large number of bins, which makes it quasi-unbinned
430  (*fPDFSig)[ivar]->BuildPDF( (*fHistSig)[ivar] );
431  (*fPDFBgd)[ivar]->BuildPDF( (*fHistBgd)[ivar] );
432 
433  (*fPDFSig)[ivar]->ValidatePDF( (*fHistSig)[ivar] );
434  (*fPDFBgd)[ivar]->ValidatePDF( (*fHistBgd)[ivar] );
435 
436  // saving the smoothed histograms
437  if ((*fPDFSig)[ivar]->GetSmoothedHist() != 0) (*fHistSig_smooth)[ivar] = (*fPDFSig)[ivar]->GetSmoothedHist();
438  if ((*fPDFBgd)[ivar]->GetSmoothedHist() != 0) (*fHistBgd_smooth)[ivar] = (*fPDFBgd)[ivar]->GetSmoothedHist();
439  }
440 }
441 
442 ////////////////////////////////////////////////////////////////////////////////
443 /// returns the likelihood estimator for signal
444 /// fill a new Likelihood branch into the testTree
445 
447 {
448  UInt_t ivar;
449 
450  // cannot determine error
451  NoErrorCalc(err, errUpper);
452 
453  // retrieve variables, and transform, if required
454  TVector vs( GetNvar() );
455  TVector vb( GetNvar() );
456 
457  // need to distinguish signal and background in case of variable transformation
458  // signal first
459 
460  GetTransformationHandler().SetTransformationReferenceClass( fSignalClass );
461  // temporary: JS --> FIX
462  //GetTransformationHandler().SetTransformationReferenceClass( 0 );
463  const Event* ev = GetEvent();
464  for (ivar=0; ivar<GetNvar(); ivar++) vs(ivar) = ev->GetValue(ivar);
465 
466  GetTransformationHandler().SetTransformationReferenceClass( fBackgroundClass );
467  // temporary: JS --> FIX
468  //GetTransformationHandler().SetTransformationReferenceClass( 1 );
469  ev = GetEvent();
470  for (ivar=0; ivar<GetNvar(); ivar++) vb(ivar) = ev->GetValue(ivar);
471 
472  // compute the likelihood (signal)
473  Double_t ps(1), pb(1), p(0);
474  for (ivar=0; ivar<GetNvar(); ivar++) {
475 
476  // drop one variable (this is ONLY used for internal variable ranking !)
477  if ((Int_t)ivar == fDropVariable) continue;
478 
479  Double_t x[2] = { vs(ivar), vb(ivar) };
480 
481  for (UInt_t itype=0; itype < 2; itype++) {
482 
483  // verify limits
484  if (x[itype] >= (*fPDFSig)[ivar]->GetXmax()) x[itype] = (*fPDFSig)[ivar]->GetXmax() - 1.0e-10;
485  else if (x[itype] < (*fPDFSig)[ivar]->GetXmin()) x[itype] = (*fPDFSig)[ivar]->GetXmin();
486 
487  // find corresponding histogram from cached indices
488  PDF* pdf = (itype == 0) ? (*fPDFSig)[ivar] : (*fPDFBgd)[ivar];
489  if (pdf == 0) Log() << kFATAL << "<GetMvaValue> Reference histograms don't exist" << Endl;
490  TH1* hist = pdf->GetPDFHist();
491 
492  // interpolate linearly between adjacent bins
493  // this is not useful for discrete variables
494  Int_t bin = hist->FindBin(x[itype]);
495 
496  // **** POTENTIAL BUG: PREFORMANCE IS WORSE WHEN USING TRUE TYPE ***
497  // ==> commented out at present
498  if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0 ||
499  DataInfo().GetVariableInfo(ivar).GetVarType() == 'N') {
500  p = TMath::Max( hist->GetBinContent(bin), fEpsilon );
501  } else { // splined PDF
502  Int_t nextbin = bin;
503  if ((x[itype] > hist->GetBinCenter(bin) && bin != hist->GetNbinsX()) || bin == 1)
504  nextbin++;
505  else
506  nextbin--;
507 
508 
509  Double_t dx = hist->GetBinCenter(bin) - hist->GetBinCenter(nextbin);
510  Double_t dy = hist->GetBinContent(bin) - hist->GetBinContent(nextbin);
511  Double_t like = hist->GetBinContent(bin) + (x[itype] - hist->GetBinCenter(bin)) * dy/dx;
512 
513  p = TMath::Max( like, fEpsilon );
514  }
515 
516  if (itype == 0) ps *= p;
517  else pb *= p;
518  }
519  }
520 
521  // the likelihood ratio (transform it ?)
522  return TransformLikelihoodOutput( ps, pb );
523 }
524 
525 ////////////////////////////////////////////////////////////////////////////////
526 /// returns transformed or non-transformed output
527 
529 {
530  if (ps < fEpsilon) ps = fEpsilon;
531  if (pb < fEpsilon) pb = fEpsilon;
532  Double_t r = ps/(ps + pb);
533  if (r >= 1.0) r = 1. - 1.e-15;
534 
535  if (fTransformLikelihoodOutput) {
536  // inverse Fermi function
537 
538  // sanity check
539  if (r <= 0.0) r = fEpsilon;
540  else if (r >= 1.0) r = 1. - 1.e-15;
541 
542  Double_t tau = 15.0;
543  r = - TMath::Log(1.0/r - 1.0)/tau;
544  }
545 
546  return r;
547 }
548 
549 ////////////////////////////////////////////////////////////////////////////////
550 /// write options to stream
551 
552 void TMVA::MethodLikelihood::WriteOptionsToStream( std::ostream& o, const TString& prefix ) const
553 {
555 
556  // writing the options defined for the different pdfs
557  if (fDefaultPDFLik != 0) {
558  o << prefix << std::endl << prefix << "#Default Likelihood PDF Options:" << std::endl << prefix << std::endl;
559  fDefaultPDFLik->WriteOptionsToStream( o, prefix );
560  }
561  for (UInt_t ivar = 0; ivar < fPDFSig->size(); ivar++) {
562  if ((*fPDFSig)[ivar] != 0) {
563  o << prefix << std::endl << prefix << Form("#Signal[%d] Likelihood PDF Options:",ivar) << std::endl << prefix << std::endl;
564  (*fPDFSig)[ivar]->WriteOptionsToStream( o, prefix );
565  }
566  if ((*fPDFBgd)[ivar] != 0) {
567  o << prefix << std::endl << prefix << "#Background[%d] Likelihood PDF Options:" << std::endl << prefix << std::endl;
568  (*fPDFBgd)[ivar]->WriteOptionsToStream( o, prefix );
569  }
570  }
571 }
572 
573 ////////////////////////////////////////////////////////////////////////////////
574 /// write weights to XML
575 
576 void TMVA::MethodLikelihood::AddWeightsXMLTo( void* parent ) const
577 {
578  void* wght = gTools().AddChild(parent, "Weights");
579  gTools().AddAttr(wght, "NVariables", GetNvar());
580  gTools().AddAttr(wght, "NClasses", 2);
581  void* pdfwrap;
582  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
583  if ( (*fPDFSig)[ivar]==0 || (*fPDFBgd)[ivar]==0 )
584  Log() << kFATAL << "Reference histograms for variable " << ivar
585  << " don't exist, can't write it to weight file" << Endl;
586  pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
587  gTools().AddAttr(pdfwrap, "VarIndex", ivar);
588  gTools().AddAttr(pdfwrap, "ClassIndex", 0);
589  (*fPDFSig)[ivar]->AddXMLTo(pdfwrap);
590  pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
591  gTools().AddAttr(pdfwrap, "VarIndex", ivar);
592  gTools().AddAttr(pdfwrap, "ClassIndex", 1);
593  (*fPDFBgd)[ivar]->AddXMLTo(pdfwrap);
594  }
595 }
596 
597 ////////////////////////////////////////////////////////////////////////////////
598 /// computes ranking of input variables
599 
601 {
602  // create the ranking object
603  if (fRanking) delete fRanking;
604  fRanking = new Ranking( GetName(), "Delta Separation" );
605 
606  Double_t sepRef = -1, sep = -1;
607  for (Int_t ivar=-1; ivar<(Int_t)GetNvar(); ivar++) {
608 
609  // this variable should not be used
610  fDropVariable = ivar;
611 
612  TString nameS = Form( "rS_%i", ivar+1 );
613  TString nameB = Form( "rB_%i", ivar+1 );
614  TH1* rS = new TH1F( nameS, nameS, 80, 0, 1 );
615  TH1* rB = new TH1F( nameB, nameB, 80, 0, 1 );
616 
617  // the event loop
618  for (Int_t ievt=0; ievt<Data()->GetNTrainingEvents(); ievt++) {
619 
620  const Event* origEv = Data()->GetEvent(ievt);
621  GetTransformationHandler().SetTransformationReferenceClass( origEv->GetClass() );
622  const Event* ev = GetTransformationHandler().Transform(Data()->GetEvent(ievt));
623 
624  Double_t lk = this->GetMvaValue();
625  Double_t w = ev->GetWeight();
626  if (DataInfo().IsSignal(ev)) rS->Fill( lk, w );
627  else rB->Fill( lk, w );
628  }
629 
630  // compute separation
631  sep = TMVA::gTools().GetSeparation( rS, rB );
632  if (ivar == -1) sepRef = sep;
633  sep = sepRef - sep;
634 
635  // don't need these histograms anymore
636  delete rS;
637  delete rB;
638 
639  if (ivar >= 0) fRanking->AddRank( Rank( DataInfo().GetVariableInfo(ivar).GetInternalName(), sep ) );
640  }
641 
642  fDropVariable = -1;
643 
644  return fRanking;
645 }
646 
647 ////////////////////////////////////////////////////////////////////////////////
648 /// write reference PDFs to ROOT file
649 
651 {
652  TString pname = "PDF_";
653  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
654  (*fPDFSig)[ivar]->Write( pname + GetInputVar( ivar ) + "_S" );
655  (*fPDFBgd)[ivar]->Write( pname + GetInputVar( ivar ) + "_B" );
656  }
657 }
658 ////////////////////////////////////////////////////////////////////////////////
659 /// read weights from XML
660 
662 {
663  TString pname = "PDF_";
664  Bool_t addDirStatus = TH1::AddDirectoryStatus();
665  TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
666  UInt_t nvars=0;
667  gTools().ReadAttr(wghtnode, "NVariables",nvars);
668  void* descnode = gTools().GetChild(wghtnode);
669  for (UInt_t ivar=0; ivar<nvars; ivar++){
670  void* pdfnode = gTools().GetChild(descnode);
671  Log() << kINFO << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
672  if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
673  if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
674  (*fPDFSig)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Sig" );
675  (*fPDFBgd)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Bkg" );
676  (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
677  (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
678  (*(*fPDFSig)[ivar]).ReadXML(pdfnode);
679  descnode = gTools().GetNextChild(descnode);
680  pdfnode = gTools().GetChild(descnode);
681  (*(*fPDFBgd)[ivar]).ReadXML(pdfnode);
682  descnode = gTools().GetNextChild(descnode);
683  }
684  TH1::AddDirectory(addDirStatus);
685 }
686 ////////////////////////////////////////////////////////////////////////////////
687 /// read weight info from file
688 /// nothing to do for this method
689 
691 {
692  TString pname = "PDF_";
693  Bool_t addDirStatus = TH1::AddDirectoryStatus();
694  TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
695  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
696  Log() << kINFO << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
697  if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
698  if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
699  (*fPDFSig)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Sig" );
700  (*fPDFBgd)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Bkg");
701  (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
702  (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
703  istr >> *(*fPDFSig)[ivar];
704  istr >> *(*fPDFBgd)[ivar];
705  }
706  TH1::AddDirectory(addDirStatus);
707 }
708 
709 ////////////////////////////////////////////////////////////////////////////////
710 /// read reference PDF from ROOT file
711 
713 {
714  TString pname = "PDF_";
715  Bool_t addDirStatus = TH1::AddDirectoryStatus();
716  TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
717  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
718  (*fPDFSig)[ivar] = (TMVA::PDF*)rf.Get( Form( "PDF_%s_S", GetInputVar( ivar ).Data() ) );
719  (*fPDFBgd)[ivar] = (TMVA::PDF*)rf.Get( Form( "PDF_%s_B", GetInputVar( ivar ).Data() ) );
720  }
721  TH1::AddDirectory(addDirStatus);
722 }
723 
724 ////////////////////////////////////////////////////////////////////////////////
725 /// write histograms and PDFs to file for monitoring purposes
726 
728 {
729  Log() << kINFO << "Write monitoring histograms to file: " << BaseDir()->GetPath() << Endl;
730  BaseDir()->cd();
731 
732  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
733  (*fHistSig)[ivar]->Write();
734  (*fHistBgd)[ivar]->Write();
735  if ((*fHistSig_smooth)[ivar] != 0) (*fHistSig_smooth)[ivar]->Write();
736  if ((*fHistBgd_smooth)[ivar] != 0) (*fHistBgd_smooth)[ivar]->Write();
737  (*fPDFSig)[ivar]->GetPDFHist()->Write();
738  (*fPDFBgd)[ivar]->GetPDFHist()->Write();
739 
740  if ((*fPDFSig)[ivar]->GetNSmoothHist() != 0) (*fPDFSig)[ivar]->GetNSmoothHist()->Write();
741  if ((*fPDFBgd)[ivar]->GetNSmoothHist() != 0) (*fPDFBgd)[ivar]->GetNSmoothHist()->Write();
742 
743  // add special plots to check the smoothing in the GetVal method
744  Float_t xmin=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmin();
745  Float_t xmax=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmax();
746  TH1F* mm = new TH1F( (*fInputVars)[ivar]+"_additional_check",
747  (*fInputVars)[ivar]+"_additional_check", 15000, xmin, xmax );
748  Double_t intBin = (xmax-xmin)/15000;
749  for (Int_t bin=0; bin < 15000; bin++) {
750  Double_t x = (bin + 0.5)*intBin + xmin;
751  mm->SetBinContent(bin+1 ,(*fPDFSig)[ivar]->GetVal(x));
752  }
753  mm->Write();
754 
755  // ---------- create cloned low-binned histogram for comparison in macros (mainly necessary for KDE)
756  TH1* h[2] = { (*fHistSig)[ivar], (*fHistBgd)[ivar] };
757  for (UInt_t i=0; i<2; i++) {
758  TH1* hclone = (TH1F*)h[i]->Clone( TString(h[i]->GetName()) + "_nice" );
759  hclone->SetName ( TString(h[i]->GetName()) + "_nice" );
760  hclone->SetTitle( TString(h[i]->GetTitle()) + "" );
761  if (hclone->GetNbinsX() > 100) {
762  Int_t resFactor = 5;
763  hclone->Rebin( resFactor );
764  hclone->Scale( 1.0/resFactor );
765  }
766  hclone->Write();
767  }
768  // ----------
769  }
770 }
771 
772 ////////////////////////////////////////////////////////////////////////////////
773 /// write specific header of the classifier (mostly include files)
774 
775 void TMVA::MethodLikelihood::MakeClassSpecificHeader( std::ostream& fout, const TString& ) const
776 {
777  fout << "#include <math.h>" << std::endl;
778  fout << "#include <cstdlib>" << std::endl;
779 }
780 
781 ////////////////////////////////////////////////////////////////////////////////
782 /// write specific classifier response
783 
784 void TMVA::MethodLikelihood::MakeClassSpecific( std::ostream& fout, const TString& className ) const
785 {
786  Int_t dp = fout.precision();
787  fout << " double fEpsilon;" << std::endl;
788 
789  Int_t * nbin = new Int_t[GetNvar()];
790 
791  Int_t nbinMax=-1;
792  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
793  nbin[ivar]=(*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX();
794  if (nbin[ivar] > nbinMax) nbinMax=nbin[ivar];
795  }
796 
797  fout << " static float fRefS[][" << nbinMax << "]; "
798  << "// signal reference vector [nvars][max_nbins]" << std::endl;
799  fout << " static float fRefB[][" << nbinMax << "]; "
800  << "// backgr reference vector [nvars][max_nbins]" << std::endl << std::endl;
801  fout << "// if a variable has its PDF encoded as a spline0 --> treat it like an Integer valued one" <<std::endl;
802  fout << " bool fHasDiscretPDF[" << GetNvar() <<"]; "<< std::endl;
803  fout << " int fNbin[" << GetNvar() << "]; "
804  << "// number of bins (discrete variables may have less bins)" << std::endl;
805  fout << " double fHistMin[" << GetNvar() << "]; " << std::endl;
806  fout << " double fHistMax[" << GetNvar() << "]; " << std::endl;
807 
808  fout << " double TransformLikelihoodOutput( double, double ) const;" << std::endl;
809  fout << "};" << std::endl;
810  fout << "" << std::endl;
811  fout << "inline void " << className << "::Initialize() " << std::endl;
812  fout << "{" << std::endl;
813  fout << " fEpsilon = " << fEpsilon << ";" << std::endl;
814  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
815  fout << " fNbin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ";" << std::endl;
816  fout << " fHistMin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmin() << ";" << std::endl;
817  fout << " fHistMax[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmax() << ";" << std::endl;
818  // sanity check (for previous code lines)
819  if ((((*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar] ||
820  (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar])
821  ) ||
822  (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()) {
823  Log() << kFATAL << "<MakeClassSpecific> Mismatch in binning of variable "
824  << "\"" << GetOriginalVarName(ivar) << "\" of type: \'" << DataInfo().GetVariableInfo(ivar).GetVarType()
825  << "\' : "
826  << "nxS = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ", "
827  << "nxB = " << (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()
828  << " while we expect " << nbin[ivar]
829  << Endl;
830  }
831  }
832  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
833  if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0)
834  fout << " fHasDiscretPDF[" << ivar <<"] = true; " << std::endl;
835  else
836  fout << " fHasDiscretPDF[" << ivar <<"] = false; " << std::endl;
837  }
838 
839  fout << "}" << std::endl << std::endl;
840 
841  fout << "inline double " << className
842  << "::GetMvaValue__( const std::vector<double>& inputValues ) const" << std::endl;
843  fout << "{" << std::endl;
844  fout << " double ps(1), pb(1);" << std::endl;
845  fout << " std::vector<double> inputValuesSig = inputValues;" << std::endl;
846  fout << " std::vector<double> inputValuesBgd = inputValues;" << std::endl;
847  if (GetTransformationHandler().GetTransformationList().GetSize() != 0) {
848  fout << " Transform(inputValuesSig,0);" << std::endl;
849  fout << " Transform(inputValuesBgd,1);" << std::endl;
850  }
851  fout << " for (size_t ivar = 0; ivar < GetNvar(); ivar++) {" << std::endl;
852  fout << std::endl;
853  fout << " // dummy at present... will be used for variable transforms" << std::endl;
854  fout << " double x[2] = { inputValuesSig[ivar], inputValuesBgd[ivar] };" << std::endl;
855  fout << std::endl;
856  fout << " for (int itype=0; itype < 2; itype++) {" << std::endl;
857  fout << std::endl;
858  fout << " // interpolate linearly between adjacent bins" << std::endl;
859  fout << " // this is not useful for discrete variables (or forced Spline0)" << std::endl;
860  fout << " int bin = int((x[itype] - fHistMin[ivar])/(fHistMax[ivar] - fHistMin[ivar])*fNbin[ivar]) + 0;" << std::endl;
861  fout << std::endl;
862  fout << " // since the test data sample is in general different from the training sample" << std::endl;
863  fout << " // it can happen that the min/max of the training sample are trespassed --> correct this" << std::endl;
864  fout << " if (bin < 0) {" << std::endl;
865  fout << " bin = 0;" << std::endl;
866  fout << " x[itype] = fHistMin[ivar];" << std::endl;
867  fout << " }" << std::endl;
868  fout << " else if (bin >= fNbin[ivar]) {" << std::endl;
869  fout << " bin = fNbin[ivar]-1;" << std::endl;
870  fout << " x[itype] = fHistMax[ivar];" << std::endl;
871  fout << " }" << std::endl;
872  fout << std::endl;
873  fout << " // find corresponding histogram from cached indices" << std::endl;
874  fout << " float ref = (itype == 0) ? fRefS[ivar][bin] : fRefB[ivar][bin];" << std::endl;
875  fout << std::endl;
876  fout << " // sanity check" << std::endl;
877  fout << " if (ref < 0) {" << std::endl;
878  fout << " std::cout << \"Fatal error in " << className
879  << ": bin entry < 0 ==> abort\" << std::endl;" << std::endl;
880  fout << " std::exit(1);" << std::endl;
881  fout << " }" << std::endl;
882  fout << std::endl;
883  fout << " double p = ref;" << std::endl;
884  fout << std::endl;
885  fout << " if (GetType(ivar) != 'I' && !fHasDiscretPDF[ivar]) {" << std::endl;
886  fout << " float bincenter = (bin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
887  fout << " int nextbin = bin;" << std::endl;
888  fout << " if ((x[itype] > bincenter && bin != fNbin[ivar]-1) || bin == 0) " << std::endl;
889  fout << " nextbin++;" << std::endl;
890  fout << " else" << std::endl;
891  fout << " nextbin--; " << std::endl;
892  fout << std::endl;
893  fout << " double refnext = (itype == 0) ? fRefS[ivar][nextbin] : fRefB[ivar][nextbin];" << std::endl;
894  fout << " float nextbincenter = (nextbin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
895  fout << std::endl;
896  fout << " double dx = bincenter - nextbincenter;" << std::endl;
897  fout << " double dy = ref - refnext;" << std::endl;
898  fout << " p += (x[itype] - bincenter) * dy/dx;" << std::endl;
899  fout << " }" << std::endl;
900  fout << std::endl;
901  fout << " if (p < fEpsilon) p = fEpsilon; // avoid zero response" << std::endl;
902  fout << std::endl;
903  fout << " if (itype == 0) ps *= p;" << std::endl;
904  fout << " else pb *= p;" << std::endl;
905  fout << " } " << std::endl;
906  fout << " } " << std::endl;
907  fout << std::endl;
908  fout << " // the likelihood ratio (transform it ?)" << std::endl;
909  fout << " return TransformLikelihoodOutput( ps, pb ); " << std::endl;
910  fout << "}" << std::endl << std::endl;
911 
912  fout << "inline double " << className << "::TransformLikelihoodOutput( double ps, double pb ) const" << std::endl;
913  fout << "{" << std::endl;
914  fout << " // returns transformed or non-transformed output" << std::endl;
915  fout << " if (ps < fEpsilon) ps = fEpsilon;" << std::endl;
916  fout << " if (pb < fEpsilon) pb = fEpsilon;" << std::endl;
917  fout << " double r = ps/(ps + pb);" << std::endl;
918  fout << " if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
919  fout << std::endl;
920  fout << " if (" << (fTransformLikelihoodOutput ? "true" : "false") << ") {" << std::endl;
921  fout << " // inverse Fermi function" << std::endl;
922  fout << std::endl;
923  fout << " // sanity check" << std::endl;
924  fout << " if (r <= 0.0) r = fEpsilon;" << std::endl;
925  fout << " else if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
926  fout << std::endl;
927  fout << " double tau = 15.0;" << std::endl;
928  fout << " r = - log(1.0/r - 1.0)/tau;" << std::endl;
929  fout << " }" << std::endl;
930  fout << std::endl;
931  fout << " return r;" << std::endl;
932  fout << "}" << std::endl;
933  fout << std::endl;
934 
935  fout << "// Clean up" << std::endl;
936  fout << "inline void " << className << "::Clear() " << std::endl;
937  fout << "{" << std::endl;
938  fout << " // nothing to clear" << std::endl;
939  fout << "}" << std::endl << std::endl;
940 
941  fout << "// signal map" << std::endl;
942  fout << "float " << className << "::fRefS[][" << nbinMax << "] = " << std::endl;
943  fout << "{ " << std::endl;
944  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
945  fout << " { ";
946  for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
947  if (ibin-1 < nbin[ivar])
948  fout << (*fPDFSig)[ivar]->GetPDFHist()->GetBinContent(ibin);
949  else
950  fout << -1;
951 
952  if (ibin < nbinMax) fout << ", ";
953  }
954  fout << " }, " << std::endl;
955  }
956  fout << "}; " << std::endl;
957  fout << std::endl;
958 
959  fout << "// background map" << std::endl;
960  fout << "float " << className << "::fRefB[][" << nbinMax << "] = " << std::endl;
961  fout << "{ " << std::endl;
962  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
963  fout << " { ";
964  fout << std::setprecision(8);
965  for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
966  if (ibin-1 < nbin[ivar])
967  fout << (*fPDFBgd)[ivar]->GetPDFHist()->GetBinContent(ibin);
968  else
969  fout << -1;
970 
971  if (ibin < nbinMax) fout << ", ";
972  }
973  fout << " }, " << std::endl;
974  }
975  fout << "}; " << std::endl;
976  fout << std::endl;
977  fout << std::setprecision(dp);
978 
979  delete[] nbin;
980 }
981 
982 ////////////////////////////////////////////////////////////////////////////////
983 /// get help message text
984 ///
985 /// typical length of text line:
986 /// "|--------------------------------------------------------------|"
987 
989 {
990  Log() << Endl;
991  Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
992  Log() << Endl;
993  Log() << "The maximum-likelihood classifier models the data with probability " << Endl;
994  Log() << "density functions (PDF) reproducing the signal and background" << Endl;
995  Log() << "distributions of the input variables. Correlations among the " << Endl;
996  Log() << "variables are ignored." << Endl;
997  Log() << Endl;
998  Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
999  Log() << Endl;
1000  Log() << "Required for good performance are decorrelated input variables" << Endl;
1001  Log() << "(PCA transformation via the option \"VarTransform=Decorrelate\"" << Endl;
1002  Log() << "may be tried). Irreducible non-linear correlations may be reduced" << Endl;
1003  Log() << "by precombining strongly correlated input variables, or by simply" << Endl;
1004  Log() << "removing one of the variables." << Endl;
1005  Log() << Endl;
1006  Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
1007  Log() << Endl;
1008  Log() << "High fidelity PDF estimates are mandatory, i.e., sufficient training " << Endl;
1009  Log() << "statistics is required to populate the tails of the distributions" << Endl;
1010  Log() << "It would be a surprise if the default Spline or KDE kernel parameters" << Endl;
1011  Log() << "provide a satisfying fit to the data. The user is advised to properly" << Endl;
1012  Log() << "tune the events per bin and smooth options in the spline cases" << Endl;
1013  Log() << "individually per variable. If the KDE kernel is used, the adaptive" << Endl;
1014  Log() << "Gaussian kernel may lead to artefacts, so please always also try" << Endl;
1015  Log() << "the non-adaptive one." << Endl;
1016  Log() << "" << Endl;
1017  Log() << "All tuning parameters must be adjusted individually for each input" << Endl;
1018  Log() << "variable!" << Endl;
1019 }
1020 
virtual Int_t Write(const char *name=0, Int_t option=0, Int_t bufsize=0)
Write this object to the current directory.
Definition: TObject.cxx:823
virtual Int_t FindBin(Double_t x, Double_t y=0, Double_t z=0)
Return Global bin number corresponding to x,y,z.
Definition: TH1.cxx:3478
void DeclareCompatibilityOptions()
options that are used ONLY for the READER to ensure backward compatibility they are hence without any...
virtual void Scale(Double_t c1=1, Option_t *option="")
Multiply this histogram by a constant c1.
Definition: TH1.cxx:6174
virtual Int_t Fill(Double_t x)
Increment bin with abscissa X by 1.
Definition: TH1.cxx:3159
void WriteWeightsToStream(TFile &rf) const
write reference PDFs to ROOT file
ClassImp(TMVA::MethodLikelihood) TMVA
standard constructor
float xmin
Definition: THbookFile.cxx:93
MsgLogger & Endl(MsgLogger &ml)
Definition: MsgLogger.h:162
virtual Double_t GetBinContent(Int_t bin) const
Return content of bin number bin.
Definition: TH1.cxx:4629
virtual void WriteOptionsToStream(std::ostream &o, const TString &prefix) const
write options to stream
Double_t Log(Double_t x)
Definition: TMath.h:526
float Float_t
Definition: RtypesCore.h:53
void Train()
create reference distributions (PDFs) from signal and background events: fill histograms and smooth t...
TH1 * h
Definition: legend2.C:5
virtual ~MethodLikelihood()
destructor
A ROOT file is a suite of consecutive data records (TKey instances) with a well defined format...
Definition: TFile.h:45
EAnalysisType
Definition: Types.h:124
virtual TObject * Get(const char *namecycle)
Return pointer to object identified by namecycle.
THist< 1, float > TH1F
Definition: THist.h:315
Basic string class.
Definition: TString.h:137
static Bool_t AddDirectoryStatus()
static function: cannot be inlined on Windows/NT
Definition: TH1.cxx:709
1-D histogram with a float per channel (see TH1 documentation)}
Definition: TH1.h:570
void WriteMonitoringHistosToFile() const
write histograms and PDFs to file for monitoring purposes
Short_t Min(Short_t a, Short_t b)
Definition: TMathBase.h:170
int Int_t
Definition: RtypesCore.h:41
bool Bool_t
Definition: RtypesCore.h:59
const Bool_t kFALSE
Definition: Rtypes.h:92
int nbins[3]
virtual Int_t GetNbinsX() const
Definition: TH1.h:296
Double_t GetWeight() const
return the event weight - depending on whether the flag IgnoreNegWeightsInTraining is or not...
Definition: Event.cxx:376
void AddAttr(void *node, const char *, const T &value, Int_t precision=16)
Definition: Tools.h:308
void WriteOptionsToStream(std::ostream &o, const TString &prefix) const
write options to output stream (e.g. in writing the MVA weight files
void * AddChild(void *parent, const char *childname, const char *content=0, bool isRootNode=false)
add child node
Definition: Tools.cxx:1134
static void AddDirectory(Bool_t add=kTRUE)
Sets the flag controlling the automatic add of histograms in memory.
Definition: TH1.cxx:1231
Float_t GetValue(UInt_t ivar) const
return value of i'th variable
Definition: Event.cxx:231
const char * Data() const
Definition: TString.h:349
Tools & gTools()
Definition: Tools.cxx:79
Double_t x[n]
Definition: legend1.C:17
void ReadWeightsFromXML(void *wghtnode)
read weights from XML
void * GetChild(void *parent, const char *childname=0)
get child node
Definition: Tools.cxx:1158
std::vector< std::vector< double > > Data
Definition: PDF.h:71
Double_t TransformLikelihoodOutput(Double_t ps, Double_t pb) const
returns transformed or non-transformed output
virtual Double_t GetBinCenter(Int_t bin) const
return bin center for 1D historam Better to use h1.GetXaxis().GetBinCenter(bin)
Definition: TH1.cxx:8470
ROOT::R::TRInterface & r
Definition: Object.C:4
virtual void SetBinContent(Int_t bin, Double_t content)
Set bin content see convention for numbering bins in TH1::GetBin In case the bin number is greater th...
Definition: TH1.cxx:8543
virtual TH1 * Rebin(Int_t ngroup=2, const char *newname="", const Double_t *xbins=0)
Rebin this histogram.
Definition: TH1.cxx:5859
void GetHelpMessage() const
get help message text
MethodLikelihood(const TString &jobName, const TString &methodTitle, DataSetInfo &theData, const TString &theOption="", TDirectory *theTargetDir=0)
unsigned int UInt_t
Definition: RtypesCore.h:42
const Ranking * CreateRanking()
computes ranking of input variables
char * Form(const char *fmt,...)
Double_t GetSeparation(TH1 *S, TH1 *B) const
compute "separation" defined as = (1/2) Int_-oo..+oo { (S(x) - B(x))^2/(S(x) + B(x)) dx } ...
Definition: Tools.cxx:136
void ReadAttr(void *node, const char *, T &value)
Definition: Tools.h:295
float xmax
Definition: THbookFile.cxx:93
virtual Bool_t HasAnalysisType(Types::EAnalysisType type, UInt_t numberClasses, UInt_t numberTargets)
FDA can handle classification with 2 classes.
virtual void SetName(const char *name)
Change the name of this histogram.
Definition: TH1.cxx:8288
void MakeClassSpecific(std::ostream &, const TString &) const
write specific classifier response
double Double_t
Definition: RtypesCore.h:55
void DeclareOptions()
define the options (their key words) that can be set in the option string TransformOutput tran...
Describe directory structure in memory.
Definition: TDirectory.h:41
int type
Definition: TGX11.cxx:120
void * GetNextChild(void *prevchild, const char *childname=0)
XML helpers.
Definition: Tools.cxx:1170
The TH1 histogram class.
Definition: TH1.h:80
UInt_t GetClass() const
Definition: Event.h:86
void AddWeightsXMLTo(void *parent) const
write weights to XML
void Init()
default initialisation called by all constructors
const TString & Color(const TString &)
human readable color strings
Definition: Tools.cxx:837
void ReadWeightsFromStream(std::istream &istr)
read weight info from file nothing to do for this method
#define REGISTER_METHOD(CLASS)
for example
Abstract ClassifierFactory template that handles arbitrary types.
TH1 * GetPDFHist() const
Definition: PDF.h:100
virtual void DeclareCompatibilityOptions()
options that are used ONLY for the READER to ensure backward compatibility they are hence without any...
Definition: MethodBase.cxx:599
Short_t Max(Short_t a, Short_t b)
Definition: TMathBase.h:202
void MakeClassSpecificHeader(std::ostream &, const TString &="") const
write specific header of the classifier (mostly include files)
#define NULL
Definition: Rtypes.h:82
virtual void SetTitle(const char *title)
Change (i.e.
Definition: TH1.cxx:6268
const Bool_t kTRUE
Definition: Rtypes.h:91
float value
Definition: math.cpp:443
Int_t Nint(T x)
Definition: TMath.h:480
Definition: math.cpp:60
Double_t GetMvaValue(Double_t *err=0, Double_t *errUpper=0)
returns the likelihood estimator for signal fill a new Likelihood branch into the testTree ...