ROOT  6.07/01
Reference Guide
MethodLikelihood.cxx
1 // @(#)root/tmva $Id$
2 // Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss, Eckhard von Toerne, Jan Therhaag
3 
4 /**********************************************************************************
5  * Project: TMVA - a Root-integrated toolkit for multivariate Data analysis *
6  * Package: TMVA *
7  * Class : TMVA::MethodLikelihood *
8  * Web : http://tmva.sourceforge.net *
9  * *
10  * Description: *
11  * Implementation (see header for description) *
12  * *
13  * Authors (alphabetical): *
14  * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
15  * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
16  * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
17  * Jan Therhaag <Jan.Therhaag@cern.ch> - U of Bonn, Germany *
18  * Eckhard v. Toerne <evt@uni-bonn.de> - U. of Bonn, Germany *
19  * *
20  * Copyright (c) 2005-2011: *
21  * CERN, Switzerland *
22  * U. of Victoria, Canada *
23  * MPI-K Heidelberg, Germany *
24  * U. of Bonn, Germany *
25  * *
26  * Redistribution and use in source and binary forms, with or without *
27  * modification, are permitted according to the terms listed in LICENSE *
28  * (http://tmva.sourceforge.net/LICENSE) *
29  **********************************************************************************/
30 
31 ////////////////////////////////////////////////////////////////////////////////
32 
33 /* Begin_Html
34 
35  Likelihood analysis ("non-parametric approach")
36 
37  <p>
38  Also implemented is a "diagonalized likelihood approach",
39  which improves on the uncorrelated likelihood ansatz by
40  linearly transforming the input variables into a diagonal space,
41  using the square-root of the covariance matrix.
42 
43  <p>
44  The method of maximum likelihood is the most straightforward, and
45  certainly among the most elegant multivariate analyser approaches.
46  We define the likelihood ratio, <i>R<sub>L</sub></i>, for event
47  <i>i</i>, by:<br>
48  <center>
49  <img vspace=6 src="gif/tmva_likratio.gif" align="bottom" >
50  </center>
51  Here the signal and background likelihoods, <i>L<sub>S</sub></i>,
52  <i>L<sub>B</sub></i>, are products of the corresponding probability
53  densities, <i>p<sub>S</sub></i>, <i>p<sub>B</sub></i>, of the
54  <i>N</i><sub>var</sub> discriminating variables used in the MVA: <br>
55  <center>
56  <img vspace=6 src="gif/tmva_lik.gif" align="bottom" >
57  </center>
58  and accordingly for L<sub>B</sub>.
59  In practice, TMVA uses polynomial splines to estimate the probability
60  density functions (PDF) obtained from the distributions of the
61  training variables.<br>
62 
63  <p>
64  Note that in TMVA the output of the likelihood ratio is transformed
65  by<br>
66  <center>
67  <img vspace=6 src="gif/tmva_likratio_trans.gif" align="bottom"/>
68  </center>
69  to avoid the occurrence of heavy peaks at <i>R<sub>L</sub></i>=0,1.
70 
71  <b>Decorrelated (or "diagonalized") Likelihood</b>
72 
73  <p>
74  The biggest drawback of the Likelihood approach is that it assumes
75  that the discriminating variables are uncorrelated. If that were the case,
76  it can be proven that the discrimination obtained by the above likelihood
77  ratio is optimal, i.e., no other method can beat it. However, in most
78  practical applications of MVAs correlations are present. <br>
79 
80  <p>
81  Linear correlations, measured from the training sample, can be taken
82  into account in a straightforward manner through the square-root
83  of the covariance matrix. The square-root of a matrix
84  <i>C</i> is the matrix <i>C&prime;</i> that multiplied with itself
85  yields <i>C</i>: <i>C</i>=<i>C&prime;C&prime;</i>. We compute the
86  square-root matrix (SQM) by means of diagonalising (<i>D</i>) the
87  covariance matrix: <br>
88  <center>
89  <img vspace=6 src="gif/tmva_sqm.gif" align="bottom" >
90  </center>
91  and the linear transformation from the correlated into the
92  uncorrelated variable space is then given by multiplying the measured
93  variable tuple by the inverse of the SQM. Note that these transformations
94  are performed for both signal and background separately, since the
95  correlation pattern is not the same in the two samples.
96 
97  <p>
98  The above diagonalisation is complete for linearly correlated,
99  Gaussian distributed variables only. In real-world applications this
100  is often not the case, so that only little additional information
101  may be recovered by the diagonalisation procedure. In such cases,
102  non-linear methods must be applied.
103 
104 End_Html */
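// ---------------------------------------------------------------------------------------
// Illustrative, standalone sketch of the decorrelation step described above (assuming
// ROOT's linear-algebra classes; this helper is hypothetical and not part of TMVA): the
// square-root C' of the covariance matrix C satisfies C = C'C' and is obtained by
// diagonalising C, C' = S sqrt(D) S^T, with S the eigenvector and D the eigenvalue matrix.
// The decorrelated tuple is then x' = C'^-1 x, computed separately for signal and
// background since their correlation patterns differ.
// ---------------------------------------------------------------------------------------
#include "TMatrixD.h"
#include "TMatrixDSym.h"
#include "TMatrixDSymEigen.h"
#include "TVectorD.h"
#include "TMath.h"

static TMatrixD SquareRootOfCovariance( const TMatrixDSym& cov )
{
   TMatrixDSymEigen eigen( cov );
   TMatrixD s = eigen.GetEigenVectors();   // columns are the eigenvectors of cov
   TVectorD d = eigen.GetEigenValues();    // corresponding eigenvalues

   TMatrixD sqrtD( cov.GetNrows(), cov.GetNcols() );   // diagonal matrix sqrt(D)
   for (Int_t i = 0; i < cov.GetNrows(); i++) sqrtD(i,i) = TMath::Sqrt( d(i) );

   TMatrixD sT( TMatrixD::kTransposed, s );
   return s * sqrtD * sT;                  // C' with C' * C' = cov
}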
105 //_______________________________________________________________________
106 
107 #include "TMVA/MethodLikelihood.h"
108 
109 #include <iomanip>
110 #include <vector>
111 #include <cstdlib>
112 
113 #include "TMatrixD.h"
114 #include "TVector.h"
115 #include "TMath.h"
116 #include "TObjString.h"
117 #include "TFile.h"
118 #include "TKey.h"
119 #include "TH1.h"
120 #include "TClass.h"
121 #include "Riostream.h"
122 
123 #include "TMVA/Configurable.h"
124 #include "TMVA/ClassifierFactory.h"
125 #include "TMVA/DataSet.h"
126 #include "TMVA/DataSetInfo.h"
127 #include "TMVA/MethodBase.h"
128 #include "TMVA/MsgLogger.h"
129 #include "TMVA/PDF.h"
130 #include "TMVA/Ranking.h"
131 #include "TMVA/Tools.h"
132 #include "TMVA/Types.h"
133 #include "TMVA/VariableInfo.h"
134 
135 REGISTER_METHOD(Likelihood)
136 
137 ClassImp(TMVA::MethodLikelihood)
138 
139 ////////////////////////////////////////////////////////////////////////////////
140 /// standard constructor
141 
142 TMVA::MethodLikelihood::MethodLikelihood( const TString& jobName,
143  const TString& methodTitle,
144  DataSetInfo& theData,
145  const TString& theOption,
146  TDirectory* theTargetDir ) :
147  TMVA::MethodBase( jobName, Types::kLikelihood, methodTitle, theData, theOption, theTargetDir ),
148  fEpsilon ( 1.e3 * DBL_MIN ),
149  fTransformLikelihoodOutput( kFALSE ),
150  fDropVariable ( 0 ),
151  fHistSig ( 0 ),
152  fHistBgd ( 0 ),
153  fHistSig_smooth( 0 ),
154  fHistBgd_smooth( 0 ),
155  fDefaultPDFLik ( 0 ),
156  fPDFSig ( 0 ),
157  fPDFBgd ( 0 ),
158  fNsmooth ( 2 ),
159  fNsmoothVarS ( 0 ),
160  fNsmoothVarB ( 0 ),
161  fAverageEvtPerBin( 0 ),
162  fAverageEvtPerBinVarS (0),
163  fAverageEvtPerBinVarB (0),
164  fKDEfineFactor ( 0 ),
165  fInterpolateString(0)
166 {
167 }
168 
169 ////////////////////////////////////////////////////////////////////////////////
170 /// construct likelihood references from file
171 
172 TMVA::MethodLikelihood::MethodLikelihood( DataSetInfo& theData,
173  const TString& theWeightFile,
174  TDirectory* theTargetDir ) :
175  TMVA::MethodBase( Types::kLikelihood, theData, theWeightFile, theTargetDir ),
176  fEpsilon ( 1.e3 * DBL_MIN ),
177  fTransformLikelihoodOutput( kFALSE ),
178  fDropVariable ( 0 ),
179  fHistSig ( 0 ),
180  fHistBgd ( 0 ),
181  fHistSig_smooth( 0 ),
182  fHistBgd_smooth( 0 ),
183  fDefaultPDFLik ( 0 ),
184  fPDFSig ( 0 ),
185  fPDFBgd ( 0 ),
186  fNsmooth ( 2 ),
187  fNsmoothVarS ( 0 ),
188  fNsmoothVarB ( 0 ),
189  fAverageEvtPerBin( 0 ),
190  fAverageEvtPerBinVarS (0),
191  fAverageEvtPerBinVarB (0),
192  fKDEfineFactor ( 0 ),
193  fInterpolateString(0)
194 {
195 }
196 
197 ////////////////////////////////////////////////////////////////////////////////
198 /// destructor
199 
200 TMVA::MethodLikelihood::~MethodLikelihood( void )
201 {
202  if (NULL != fDefaultPDFLik) delete fDefaultPDFLik;
203  if (NULL != fHistSig) delete fHistSig;
204  if (NULL != fHistBgd) delete fHistBgd;
205  if (NULL != fHistSig_smooth) delete fHistSig_smooth;
206  if (NULL != fHistBgd_smooth) delete fHistBgd_smooth;
207  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
208  if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
209  if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
210  }
211  if (NULL != fPDFSig) delete fPDFSig;
212  if (NULL != fPDFBgd) delete fPDFBgd;
213 }
214 
215 ////////////////////////////////////////////////////////////////////////////////
216 /// Likelihood can handle classification with 2 classes
217 
218 Bool_t TMVA::MethodLikelihood::HasAnalysisType( Types::EAnalysisType type,
219  UInt_t numberClasses, UInt_t /*numberTargets*/ )
220 {
221  if (type == Types::kClassification && numberClasses == 2) return kTRUE;
222  return kFALSE;
223 }
224 
225 ////////////////////////////////////////////////////////////////////////////////
226 /// default initialisation called by all constructors
227 
228 void TMVA::MethodLikelihood::Init( void )
229 {
230  // no ranking test
231  fDropVariable = -1;
232  fHistSig = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
233  fHistBgd = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
234  fHistSig_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
235  fHistBgd_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
236  fPDFSig = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
237  fPDFBgd = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
238 }
239 
240 ////////////////////////////////////////////////////////////////////////////////
241 /// define the options (their key words) that can be set in the option string
242 /// TransformOutput <bool> transform (often strongly peaked) likelihood output through sigmoid inversion
243 
244 void TMVA::MethodLikelihood::DeclareOptions()
245 {
246  DeclareOptionRef( fTransformLikelihoodOutput = kFALSE, "TransformOutput",
247  "Transform likelihood output by inverse sigmoid function" );
248 
249  // initialize
250 
251  // reading every PDF's definition and passing the option string to the next one to be read and marked
252  TString updatedOptions = GetOptions();
253  fDefaultPDFLik = new PDF( TString(GetName()) + " PDF", updatedOptions );
254  fDefaultPDFLik->DeclareOptions();
255  fDefaultPDFLik->ParseOptions();
256  updatedOptions = fDefaultPDFLik->GetOptions();
257  for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
258  (*fPDFSig)[ivar] = new PDF( Form("%s PDF Sig[%d]", GetName(), ivar), updatedOptions,
259  Form("Sig[%d]",ivar), fDefaultPDFLik );
260  (*fPDFSig)[ivar]->DeclareOptions();
261  (*fPDFSig)[ivar]->ParseOptions();
262  updatedOptions = (*fPDFSig)[ivar]->GetOptions();
263  (*fPDFBgd)[ivar] = new PDF( Form("%s PDF Bkg[%d]", GetName(), ivar), updatedOptions,
264  Form("Bkg[%d]",ivar), fDefaultPDFLik );
265  (*fPDFBgd)[ivar]->DeclareOptions();
266  (*fPDFBgd)[ivar]->ParseOptions();
267  updatedOptions = (*fPDFBgd)[ivar]->GetOptions();
268  }
269 
270  // the final marked option string is written back to the original likelihood
271  SetOptions( updatedOptions );
272 }
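// ---------------------------------------------------------------------------------------
// Hedged usage sketch (not part of this file): how the options declared above are
// typically passed when booking the method through a TMVA::Factory. The option values and
// the "factory" object are purely illustrative; recent ROOT versions additionally take a
// TMVA::DataLoader* as first argument of BookMethod. Per-variable PDF options use the
// indexed syntax, e.g. NSmoothSig[0]=20.
// ---------------------------------------------------------------------------------------
#include "TMVA/Factory.h"
#include "TMVA/Types.h"

static void BookLikelihoodExample( TMVA::Factory* factory )
{
   // default spline-based PDFs, with per-variable smoothing for the first input variable
   factory->BookMethod( TMVA::Types::kLikelihood, "Likelihood",
                        "H:!V:TransformOutput:PDFInterpol=Spline2:"
                        "NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmooth=1:NAvEvtPerBin=50" );

   // KDE-based variant with adaptive Gaussian kernels
   factory->BookMethod( TMVA::Types::kLikelihood, "LikelihoodKDE",
                        "!H:!V:!TransformOutput:PDFInterpol=KDE:KDEtype=Gauss:"
                        "KDEiter=Adaptive:KDEFineFactor=0.3:KDEborder=None:NAvEvtPerBin=50" );
}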
273 
274 
275 void TMVA::MethodLikelihood::DeclareCompatibilityOptions()
276 {
277  // options that are used ONLY for the READER to ensure backward compatibility
278 
279  MethodBase::DeclareCompatibilityOptions();
280  DeclareOptionRef( fNsmooth = 1, "NSmooth",
281  "Number of smoothing iterations for the input histograms");
282  DeclareOptionRef( fAverageEvtPerBin = 50, "NAvEvtPerBin",
283  "Average number of events per PDF bin");
284  DeclareOptionRef( fKDEfineFactor =1. , "KDEFineFactor",
285  "Fine tuning factor for Adaptive KDE: Factor to multiply the width of the kernel");
286  DeclareOptionRef( fBorderMethodString = "None", "KDEborder",
287  "Border effects treatment (1=no treatment , 2=kernel renormalization, 3=sample mirroring)" );
288  DeclareOptionRef( fKDEiterString = "Nonadaptive", "KDEiter",
289  "Number of iterations (1=non-adaptive, 2=adaptive)" );
290  DeclareOptionRef( fKDEtypeString = "Gauss", "KDEtype",
291  "KDE kernel type (1=Gauss)" );
292  fAverageEvtPerBinVarS = new Int_t[GetNvar()];
293  fAverageEvtPerBinVarB = new Int_t[GetNvar()];
294  fNsmoothVarS = new Int_t[GetNvar()];
295  fNsmoothVarB = new Int_t[GetNvar()];
296  fInterpolateString = new TString[GetNvar()];
297  for(UInt_t i=0; i<GetNvar(); ++i) {
298  fAverageEvtPerBinVarS[i] = fAverageEvtPerBinVarB[i] = 0;
299  fNsmoothVarS[i] = fNsmoothVarB[i] = 0;
300  fInterpolateString[i] = "";
301  }
302  DeclareOptionRef( fAverageEvtPerBinVarS, GetNvar(), "NAvEvtPerBinSig",
303  "Average num of events per PDF bin and variable (signal)");
304  DeclareOptionRef( fAverageEvtPerBinVarB, GetNvar(), "NAvEvtPerBinBkg",
305  "Average num of events per PDF bin and variable (background)");
306  DeclareOptionRef(fNsmoothVarS, GetNvar(), "NSmoothSig",
307  "Number of smoothing iterations for the input histograms");
308  DeclareOptionRef(fNsmoothVarB, GetNvar(), "NSmoothBkg",
309  "Number of smoothing iterations for the input histograms");
310  DeclareOptionRef(fInterpolateString, GetNvar(), "PDFInterpol", "Method of interpolating reference histograms (e.g. Spline2 or KDE)");
311 }
312 
313 
314 
315 
316 ////////////////////////////////////////////////////////////////////////////////
317 
318 void TMVA::MethodLikelihood::ProcessOptions()
319 {
320  // process user options
321  // reference cut value to distinguish signal-like from background-like events
322  SetSignalReferenceCut( TransformLikelihoodOutput( 0.5, 0.5 ) );
323 
324  fDefaultPDFLik->ProcessOptions();
325  for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
326  (*fPDFBgd)[ivar]->ProcessOptions();
327  (*fPDFSig)[ivar]->ProcessOptions();
328  }
329 }
330 
331 ////////////////////////////////////////////////////////////////////////////////
332 /// create reference distributions (PDFs) from signal and background events:
333 /// fill histograms and smooth them; if decorrelation is required, compute
334 /// corresponding square-root matrices
335 /// the reference histograms require the correct boundaries. Since in Likelihood classification
336 /// the transformations are applied using both classes, also the corresponding boundaries
337 /// need to take this into account
338 
339 void TMVA::MethodLikelihood::Train( void )
340 {
341  UInt_t nvar=GetNvar();
342  std::vector<Double_t> xmin(nvar), xmax(nvar);
343  for (UInt_t ivar=0; ivar<nvar; ivar++) {xmin[ivar]=1e30; xmax[ivar]=-1e30;}
344 
345  UInt_t nevents=Data()->GetNEvents();
346  for (UInt_t ievt=0; ievt<nevents; ievt++) {
347  // use the true-event-type's transformation
348  // set the event true event types transformation
349  const Event* origEv = Data()->GetEvent(ievt);
350  if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
351  // loop over classes
352  for (int cls=0;cls<2;cls++){
353  GetTransformationHandler().SetTransformationReferenceClass(cls);
354  const Event* ev = GetTransformationHandler().Transform( origEv );
355  for (UInt_t ivar=0; ivar<nvar; ivar++) {
356  Float_t value = ev->GetValue(ivar);
357  if (value < xmin[ivar]) xmin[ivar] = value;
358  if (value > xmax[ivar]) xmax[ivar] = value;
359  }
360  }
361  }
362 
363  // create reference histograms
364  // (KDE smoothing requires very finely binned reference histograms)
365  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
366  TString var = (*fInputVars)[ivar];
367 
368  // the reference histograms require the correct boundaries. Since in Likelihood classification
369  // the transformations are applied using both classes, also the corresponding boundaries
370  // need to take this into account
371 
372  // special treatment for discrete variables
373  if (DataInfo().GetVariableInfo(ivar).GetVarType() == 'I') {
374  // special treatment for integer variables
375  Int_t ixmin = TMath::Nint( xmin[ivar] );
376  xmax[ivar]=xmax[ivar]+1; // make sure that all entries are included in histogram
377  Int_t ixmax = TMath::Nint( xmax[ivar] );
378  Int_t nbins = ixmax - ixmin;
379 
380  (*fHistSig)[ivar] = new TH1F( var + "_sig", var + " signal training", nbins, ixmin, ixmax );
381  (*fHistBgd)[ivar] = new TH1F( var + "_bgd", var + " background training", nbins, ixmin, ixmax );
382  } else {
383 
384  UInt_t minNEvt = TMath::Min(Data()->GetNEvtSigTrain(),Data()->GetNEvtBkgdTrain());
385  Int_t nbinsS = (*fPDFSig)[ivar]->GetHistNBins( minNEvt );
386  Int_t nbinsB = (*fPDFBgd)[ivar]->GetHistNBins( minNEvt );
387 
388  (*fHistSig)[ivar] = new TH1F( Form("%s_sig",var.Data()),
389  Form("%s signal training",var.Data()), nbinsS, xmin[ivar], xmax[ivar] );
390  (*fHistBgd)[ivar] = new TH1F( Form("%s_bgd",var.Data()),
391  Form("%s background training",var.Data()), nbinsB, xmin[ivar], xmax[ivar] );
392  }
393  }
394 
395  // ----- fill the reference histograms
396  Log() << kINFO << "Filling reference histograms" << Endl;
397 
398  // event loop
399  for (Int_t ievt=0; ievt<Data()->GetNEvents(); ievt++) {
400 
401  // use the true-event-type's transformation
402  // set the event true event types transformation
403  const Event* origEv = Data()->GetEvent(ievt);
404  if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
405  GetTransformationHandler().SetTransformationReferenceClass( origEv->GetClass() );
406  const Event* ev = GetTransformationHandler().Transform( origEv );
407 
408  // the event weight
409  Float_t weight = ev->GetWeight();
410 
411  // fill variable vector
412  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
413  Double_t value = ev->GetValue(ivar);
414  // verify limits
415  if (value >= xmax[ivar]) value = xmax[ivar] - 1.0e-10;
416  else if (value < xmin[ivar]) value = xmin[ivar] + 1.0e-10;
417  // check whether the value would fall into the under- or overflow of the reference histogram
418  if (value >=(*fHistSig)[ivar]->GetXaxis()->GetXmax() ||
419  value <(*fHistSig)[ivar]->GetXaxis()->GetXmin()){
420  Log()<<kWARNING
421  <<"error in filling likelihood reference histograms var="
422  <<(*fInputVars)[ivar]
423  << ", xmin="<<(*fHistSig)[ivar]->GetXaxis()->GetXmin()
424  << ", value="<<value
425  << ", xmax="<<(*fHistSig)[ivar]->GetXaxis()->GetXmax()
426  << Endl;
427  }
428  if (DataInfo().IsSignal(ev)) (*fHistSig)[ivar]->Fill( value, weight );
429  else (*fHistBgd)[ivar]->Fill( value, weight );
430  }
431  }
432 
433  // building the pdfs
434  Log() << kINFO << "Building PDF out of reference histograms" << Endl;
435  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
436 
437  // the PDF is built from (binned) reference histograms
438  // in case of KDE, this has a large number of bins, which makes it quasi-unbinned
439  (*fPDFSig)[ivar]->BuildPDF( (*fHistSig)[ivar] );
440  (*fPDFBgd)[ivar]->BuildPDF( (*fHistBgd)[ivar] );
441 
442  (*fPDFSig)[ivar]->ValidatePDF( (*fHistSig)[ivar] );
443  (*fPDFBgd)[ivar]->ValidatePDF( (*fHistBgd)[ivar] );
444 
445  // saving the smoothed histograms
446  if ((*fPDFSig)[ivar]->GetSmoothedHist() != 0) (*fHistSig_smooth)[ivar] = (*fPDFSig)[ivar]->GetSmoothedHist();
447  if ((*fPDFBgd)[ivar]->GetSmoothedHist() != 0) (*fHistBgd_smooth)[ivar] = (*fPDFBgd)[ivar]->GetSmoothedHist();
448  }
449 }
450 
451 ////////////////////////////////////////////////////////////////////////////////
452 /// returns the likelihood estimator for signal
453 /// fill a new Likelihood branch into the testTree
454 
455 Double_t TMVA::MethodLikelihood::GetMvaValue( Double_t* err, Double_t* errUpper )
456 {
457  UInt_t ivar;
458 
459  // cannot determine error
460  NoErrorCalc(err, errUpper);
461 
462  // retrieve variables, and transform, if required
463  TVector vs( GetNvar() );
464  TVector vb( GetNvar() );
465 
466  // need to distinguish signal and background in case of variable transformation
467  // signal first
468 
469  GetTransformationHandler().SetTransformationReferenceClass( fSignalClass );
470  // temporary: JS --> FIX
471  //GetTransformationHandler().SetTransformationReferenceClass( 0 );
472  const Event* ev = GetEvent();
473  for (ivar=0; ivar<GetNvar(); ivar++) vs(ivar) = ev->GetValue(ivar);
474 
475  GetTransformationHandler().SetTransformationReferenceClass( fBackgroundClass );
476  // temporary: JS --> FIX
477  //GetTransformationHandler().SetTransformationReferenceClass( 1 );
478  ev = GetEvent();
479  for (ivar=0; ivar<GetNvar(); ivar++) vb(ivar) = ev->GetValue(ivar);
480 
481  // compute the likelihood (signal)
482  Double_t ps(1), pb(1), p(0);
483  for (ivar=0; ivar<GetNvar(); ivar++) {
484 
485  // drop one variable (this is ONLY used for internal variable ranking !)
486  if ((Int_t)ivar == fDropVariable) continue;
487 
488  Double_t x[2] = { vs(ivar), vb(ivar) };
489 
490  for (UInt_t itype=0; itype < 2; itype++) {
491 
492  // verify limits
493  if (x[itype] >= (*fPDFSig)[ivar]->GetXmax()) x[itype] = (*fPDFSig)[ivar]->GetXmax() - 1.0e-10;
494  else if (x[itype] < (*fPDFSig)[ivar]->GetXmin()) x[itype] = (*fPDFSig)[ivar]->GetXmin();
495 
496  // find corresponding histogram from cached indices
497  PDF* pdf = (itype == 0) ? (*fPDFSig)[ivar] : (*fPDFBgd)[ivar];
498  if (pdf == 0) Log() << kFATAL << "<GetMvaValue> Reference histograms don't exist" << Endl;
499  TH1* hist = pdf->GetPDFHist();
500 
501  // interpolate linearly between adjacent bins
502  // this is not useful for discrete variables
503  Int_t bin = hist->FindBin(x[itype]);
504 
505  // **** POTENTIAL BUG: PERFORMANCE IS WORSE WHEN USING TRUE TYPE ***
506  // ==> commented out at present
507  if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0 ||
508  DataInfo().GetVariableInfo(ivar).GetVarType() == 'N') {
509  p = TMath::Max( hist->GetBinContent(bin), fEpsilon );
510  } else { // splined PDF
511  Int_t nextbin = bin;
512  if ((x[itype] > hist->GetBinCenter(bin) && bin != hist->GetNbinsX()) || bin == 1)
513  nextbin++;
514  else
515  nextbin--;
516 
517 
518  Double_t dx = hist->GetBinCenter(bin) - hist->GetBinCenter(nextbin);
519  Double_t dy = hist->GetBinContent(bin) - hist->GetBinContent(nextbin);
520  Double_t like = hist->GetBinContent(bin) + (x[itype] - hist->GetBinCenter(bin)) * dy/dx;
521 
522  p = TMath::Max( like, fEpsilon );
523  }
524 
525  if (itype == 0) ps *= p;
526  else pb *= p;
527  }
528  }
529 
530  // the likelihood ratio (transform it ?)
531  return TransformLikelihoodOutput( ps, pb );
532 }
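// ---------------------------------------------------------------------------------------
// Standalone sketch of the adjacent-bin linear interpolation used above when the PDF is
// spline-based (hypothetical helper, assuming only the TH1 interface): the PDF value at x
// is the content of the bin containing x plus a linear correction towards the neighbouring
// bin centre, clipped from below so the likelihood product never becomes exactly zero.
// ---------------------------------------------------------------------------------------
#include "TH1.h"
#include "TMath.h"

static Double_t InterpolatePdfValue( TH1* hist, Double_t x, Double_t epsilon )
{
   Int_t bin     = hist->FindBin( x );
   Int_t nextbin = bin;
   if ((x > hist->GetBinCenter(bin) && bin != hist->GetNbinsX()) || bin == 1) nextbin++;
   else                                                                       nextbin--;

   Double_t dx = hist->GetBinCenter(bin) - hist->GetBinCenter(nextbin);
   Double_t dy = hist->GetBinContent(bin) - hist->GetBinContent(nextbin);
   Double_t y  = hist->GetBinContent(bin) + (x - hist->GetBinCenter(bin)) * dy/dx;

   return TMath::Max( y, epsilon );
}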
533 
534 ////////////////////////////////////////////////////////////////////////////////
535 /// returns transformed or non-transformed output
536 
537 Double_t TMVA::MethodLikelihood::TransformLikelihoodOutput( Double_t ps, Double_t pb ) const
538 {
539  if (ps < fEpsilon) ps = fEpsilon;
540  if (pb < fEpsilon) pb = fEpsilon;
541  Double_t r = ps/(ps + pb);
542  if (r >= 1.0) r = 1. - 1.e-15;
543 
544  if (fTransformLikelihoodOutput) {
545  // inverse Fermi function
546 
547  // sanity check
548  if (r <= 0.0) r = fEpsilon;
549  else if (r >= 1.0) r = 1. - 1.e-15;
550 
551  Double_t tau = 15.0;
552  r = - TMath::Log(1.0/r - 1.0)/tau;
553  }
554 
555  return r;
556 }
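// ---------------------------------------------------------------------------------------
// Numerical sketch (standalone, hypothetical) of the effect of the inverse-Fermi mapping
// applied above when TransformOutput is set: r' = -ln(1/r - 1)/tau stretches the regions
// close to r = 0 and r = 1 and thereby removes the sharp peaks of the raw likelihood ratio.
// ---------------------------------------------------------------------------------------
#include <cmath>
#include <cstdio>

static void PrintInverseFermiExample()
{
   const double tau  = 15.0;                        // same constant as used above
   const double r[3] = { 0.001, 0.5, 0.999 };       // raw likelihood-ratio values
   for (int i = 0; i < 3; i++) {
      double rp = -std::log( 1.0/r[i] - 1.0 )/tau;  // transformed classifier output
      std::printf( "r = %.3f  ->  r' = %+.4f\n", r[i], rp );
   }
}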
557 
558 ////////////////////////////////////////////////////////////////////////////////
559 /// write options to stream
560 
561 void TMVA::MethodLikelihood::WriteOptionsToStream( std::ostream& o, const TString& prefix ) const
562 {
563  TMVA::MethodBase::WriteOptionsToStream( o, prefix);
564 
565  // writing the options defined for the different pdfs
566  if (fDefaultPDFLik != 0) {
567  o << prefix << std::endl << prefix << "#Default Likelihood PDF Options:" << std::endl << prefix << std::endl;
568  fDefaultPDFLik->WriteOptionsToStream( o, prefix );
569  }
570  for (UInt_t ivar = 0; ivar < fPDFSig->size(); ivar++) {
571  if ((*fPDFSig)[ivar] != 0) {
572  o << prefix << std::endl << prefix << Form("#Signal[%d] Likelihood PDF Options:",ivar) << std::endl << prefix << std::endl;
573  (*fPDFSig)[ivar]->WriteOptionsToStream( o, prefix );
574  }
575  if ((*fPDFBgd)[ivar] != 0) {
576  o << prefix << std::endl << prefix << Form("#Background[%d] Likelihood PDF Options:",ivar) << std::endl << prefix << std::endl;
577  (*fPDFBgd)[ivar]->WriteOptionsToStream( o, prefix );
578  }
579  }
580 }
581 
582 ////////////////////////////////////////////////////////////////////////////////
583 /// write weights to XML
584 
585 void TMVA::MethodLikelihood::AddWeightsXMLTo( void* parent ) const
586 {
587  void* wght = gTools().AddChild(parent, "Weights");
588  gTools().AddAttr(wght, "NVariables", GetNvar());
589  gTools().AddAttr(wght, "NClasses", 2);
590  void* pdfwrap;
591  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
592  if ( (*fPDFSig)[ivar]==0 || (*fPDFBgd)[ivar]==0 )
593  Log() << kFATAL << "Reference histograms for variable " << ivar
594  << " don't exist, can't write it to weight file" << Endl;
595  pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
596  gTools().AddAttr(pdfwrap, "VarIndex", ivar);
597  gTools().AddAttr(pdfwrap, "ClassIndex", 0);
598  (*fPDFSig)[ivar]->AddXMLTo(pdfwrap);
599  pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
600  gTools().AddAttr(pdfwrap, "VarIndex", ivar);
601  gTools().AddAttr(pdfwrap, "ClassIndex", 1);
602  (*fPDFBgd)[ivar]->AddXMLTo(pdfwrap);
603  }
604 }
605 
606 ////////////////////////////////////////////////////////////////////////////////
607 /// computes ranking of input variables
608 
609 const TMVA::Ranking* TMVA::MethodLikelihood::CreateRanking()
610 {
611  // create the ranking object
612  if (fRanking) delete fRanking;
613  fRanking = new Ranking( GetName(), "Delta Separation" );
614 
615  Double_t sepRef = -1, sep = -1;
616  for (Int_t ivar=-1; ivar<(Int_t)GetNvar(); ivar++) {
617 
618  // this variable should not be used
619  fDropVariable = ivar;
620 
621  TString nameS = Form( "rS_%i", ivar+1 );
622  TString nameB = Form( "rB_%i", ivar+1 );
623  TH1* rS = new TH1F( nameS, nameS, 80, 0, 1 );
624  TH1* rB = new TH1F( nameB, nameB, 80, 0, 1 );
625 
626  // the event loop
627  for (Int_t ievt=0; ievt<Data()->GetNTrainingEvents(); ievt++) {
628 
629  const Event* origEv = Data()->GetEvent(ievt);
630  GetTransformationHandler().SetTransformationReferenceClass( origEv->GetClass() );
631  const Event* ev = GetTransformationHandler().Transform(Data()->GetEvent(ievt));
632 
633  Double_t lk = this->GetMvaValue();
634  Double_t w = ev->GetWeight();
635  if (DataInfo().IsSignal(ev)) rS->Fill( lk, w );
636  else rB->Fill( lk, w );
637  }
638 
639  // compute separation
640  sep = TMVA::gTools().GetSeparation( rS, rB );
641  if (ivar == -1) sepRef = sep;
642  sep = sepRef - sep;
643 
644  // don't need these histograms anymore
645  delete rS;
646  delete rB;
647 
648  if (ivar >= 0) fRanking->AddRank( Rank( DataInfo().GetVariableInfo(ivar).GetInternalName(), sep ) );
649  }
650 
651  fDropVariable = -1;
652 
653  return fRanking;
654 }
655 
656 ////////////////////////////////////////////////////////////////////////////////
657 /// write reference PDFs to ROOT file
658 
659 void TMVA::MethodLikelihood::WriteWeightsToStream( TFile& rf ) const
660 {
661  TString pname = "PDF_";
662  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
663  (*fPDFSig)[ivar]->Write( pname + GetInputVar( ivar ) + "_S" );
664  (*fPDFBgd)[ivar]->Write( pname + GetInputVar( ivar ) + "_B" );
665  }
666 }
667 ////////////////////////////////////////////////////////////////////////////////
668 /// read weights from XML
669 
670 void TMVA::MethodLikelihood::ReadWeightsFromXML( void* wghtnode )
671 {
672  TString pname = "PDF_";
673  Bool_t addDirStatus = TH1::AddDirectoryStatus();
674  TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
675  UInt_t nvars=0;
676  gTools().ReadAttr(wghtnode, "NVariables",nvars);
677  void* descnode = gTools().GetChild(wghtnode);
678  for (UInt_t ivar=0; ivar<nvars; ivar++){
679  void* pdfnode = gTools().GetChild(descnode);
680  Log() << kINFO << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
681  if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
682  if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
683  (*fPDFSig)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Sig" );
684  (*fPDFBgd)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Bkg" );
685  (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
686  (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
687  (*(*fPDFSig)[ivar]).ReadXML(pdfnode);
688  descnode = gTools().GetNextChild(descnode);
689  pdfnode = gTools().GetChild(descnode);
690  (*(*fPDFBgd)[ivar]).ReadXML(pdfnode);
691  descnode = gTools().GetNextChild(descnode);
692  }
693  TH1::AddDirectory(addDirStatus);
694 }
695 ////////////////////////////////////////////////////////////////////////////////
696 /// read weight info from file
697 /// nothing to do for this method
698 
699 void TMVA::MethodLikelihood::ReadWeightsFromStream( std::istream& istr )
700 {
701  TString pname = "PDF_";
702  Bool_t addDirStatus = TH1::AddDirectoryStatus();
703  TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
704  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
705  Log() << kINFO << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
706  if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
707  if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
708  (*fPDFSig)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Sig" );
709  (*fPDFBgd)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Bkg");
710  (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
711  (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
712  istr >> *(*fPDFSig)[ivar];
713  istr >> *(*fPDFBgd)[ivar];
714  }
715  TH1::AddDirectory(addDirStatus);
716 }
717 
718 ////////////////////////////////////////////////////////////////////////////////
719 /// read reference PDF from ROOT file
720 
721 void TMVA::MethodLikelihood::ReadWeightsFromStream( TFile& rf )
722 {
723  TString pname = "PDF_";
724  Bool_t addDirStatus = TH1::AddDirectoryStatus();
725  TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
726  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
727  (*fPDFSig)[ivar] = (TMVA::PDF*)rf.Get( Form( "PDF_%s_S", GetInputVar( ivar ).Data() ) );
728  (*fPDFBgd)[ivar] = (TMVA::PDF*)rf.Get( Form( "PDF_%s_B", GetInputVar( ivar ).Data() ) );
729  }
730  TH1::AddDirectory(addDirStatus);
731 }
732 
733 ////////////////////////////////////////////////////////////////////////////////
734 /// write histograms and PDFs to file for monitoring purposes
735 
736 void TMVA::MethodLikelihood::WriteMonitoringHistosToFile( void ) const
737 {
738  Log() << kINFO << "Write monitoring histograms to file: " << BaseDir()->GetPath() << Endl;
739  BaseDir()->cd();
740 
741  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
742  (*fHistSig)[ivar]->Write();
743  (*fHistBgd)[ivar]->Write();
744  if ((*fHistSig_smooth)[ivar] != 0) (*fHistSig_smooth)[ivar]->Write();
745  if ((*fHistBgd_smooth)[ivar] != 0) (*fHistBgd_smooth)[ivar]->Write();
746  (*fPDFSig)[ivar]->GetPDFHist()->Write();
747  (*fPDFBgd)[ivar]->GetPDFHist()->Write();
748 
749  if ((*fPDFSig)[ivar]->GetNSmoothHist() != 0) (*fPDFSig)[ivar]->GetNSmoothHist()->Write();
750  if ((*fPDFBgd)[ivar]->GetNSmoothHist() != 0) (*fPDFBgd)[ivar]->GetNSmoothHist()->Write();
751 
752  // add special plots to check the smoothing in the GetVal method
753  Float_t xmin=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmin();
754  Float_t xmax=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmax();
755  TH1F* mm = new TH1F( (*fInputVars)[ivar]+"_additional_check",
756  (*fInputVars)[ivar]+"_additional_check", 15000, xmin, xmax );
757  Double_t intBin = (xmax-xmin)/15000;
758  for (Int_t bin=0; bin < 15000; bin++) {
759  Double_t x = (bin + 0.5)*intBin + xmin;
760  mm->SetBinContent(bin+1 ,(*fPDFSig)[ivar]->GetVal(x));
761  }
762  mm->Write();
763 
764  // ---------- create cloned low-binned histogram for comparison in macros (mainly necessary for KDE)
765  TH1* h[2] = { (*fHistSig)[ivar], (*fHistBgd)[ivar] };
766  for (UInt_t i=0; i<2; i++) {
767  TH1* hclone = (TH1F*)h[i]->Clone( TString(h[i]->GetName()) + "_nice" );
768  hclone->SetName ( TString(h[i]->GetName()) + "_nice" );
769  hclone->SetTitle( TString(h[i]->GetTitle()) + "" );
770  if (hclone->GetNbinsX() > 100) {
771  Int_t resFactor = 5;
772  hclone->Rebin( resFactor );
773  hclone->Scale( 1.0/resFactor );
774  }
775  hclone->Write();
776  }
777  // ----------
778  }
779 }
780 
781 ////////////////////////////////////////////////////////////////////////////////
782 /// write specific header of the classifier (mostly include files)
783 
784 void TMVA::MethodLikelihood::MakeClassSpecificHeader( std::ostream& fout, const TString& ) const
785 {
786  fout << "#include <math.h>" << std::endl;
787  fout << "#include <cstdlib>" << std::endl;
788 }
789 
790 ////////////////////////////////////////////////////////////////////////////////
791 /// write specific classifier response
792 
793 void TMVA::MethodLikelihood::MakeClassSpecific( std::ostream& fout, const TString& className ) const
794 {
795  Int_t dp = fout.precision();
796  fout << " double fEpsilon;" << std::endl;
797 
798  Int_t * nbin = new Int_t[GetNvar()];
799 
800  Int_t nbinMax=-1;
801  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
802  nbin[ivar]=(*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX();
803  if (nbin[ivar] > nbinMax) nbinMax=nbin[ivar];
804  }
805 
806  fout << " static float fRefS[][" << nbinMax << "]; "
807  << "// signal reference vector [nvars][max_nbins]" << std::endl;
808  fout << " static float fRefB[][" << nbinMax << "]; "
809  << "// backgr reference vector [nvars][max_nbins]" << std::endl << std::endl;
810  fout << "// if a variable has its PDF encoded as a spline0 --> treat it like an Integer valued one" <<std::endl;
811  fout << " bool fHasDiscretPDF[" << GetNvar() <<"]; "<< std::endl;
812  fout << " int fNbin[" << GetNvar() << "]; "
813  << "// number of bins (discrete variables may have less bins)" << std::endl;
814  fout << " double fHistMin[" << GetNvar() << "]; " << std::endl;
815  fout << " double fHistMax[" << GetNvar() << "]; " << std::endl;
816 
817  fout << " double TransformLikelihoodOutput( double, double ) const;" << std::endl;
818  fout << "};" << std::endl;
819  fout << "" << std::endl;
820  fout << "inline void " << className << "::Initialize() " << std::endl;
821  fout << "{" << std::endl;
822  fout << " fEpsilon = " << fEpsilon << ";" << std::endl;
823  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
824  fout << " fNbin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ";" << std::endl;
825  fout << " fHistMin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmin() << ";" << std::endl;
826  fout << " fHistMax[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmax() << ";" << std::endl;
827  // sanity check (for previous code lines)
828  if ((((*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar] ||
829  (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar])
830  ) ||
831  (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()) {
832  Log() << kFATAL << "<MakeClassSpecific> Mismatch in binning of variable "
833  << "\"" << GetOriginalVarName(ivar) << "\" of type: \'" << DataInfo().GetVariableInfo(ivar).GetVarType()
834  << "\' : "
835  << "nxS = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ", "
836  << "nxB = " << (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()
837  << " while we expect " << nbin[ivar]
838  << Endl;
839  }
840  }
841  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
842  if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0)
843  fout << " fHasDiscretPDF[" << ivar <<"] = true; " << std::endl;
844  else
845  fout << " fHasDiscretPDF[" << ivar <<"] = false; " << std::endl;
846  }
847 
848  fout << "}" << std::endl << std::endl;
849 
850  fout << "inline double " << className
851  << "::GetMvaValue__( const std::vector<double>& inputValues ) const" << std::endl;
852  fout << "{" << std::endl;
853  fout << " double ps(1), pb(1);" << std::endl;
854  fout << " std::vector<double> inputValuesSig = inputValues;" << std::endl;
855  fout << " std::vector<double> inputValuesBgd = inputValues;" << std::endl;
856  if (GetTransformationHandler().GetTransformationList().GetSize() != 0) {
857  fout << " Transform(inputValuesSig,0);" << std::endl;
858  fout << " Transform(inputValuesBgd,1);" << std::endl;
859  }
860  fout << " for (size_t ivar = 0; ivar < GetNvar(); ivar++) {" << std::endl;
861  fout << std::endl;
862  fout << " // dummy at present... will be used for variable transforms" << std::endl;
863  fout << " double x[2] = { inputValuesSig[ivar], inputValuesBgd[ivar] };" << std::endl;
864  fout << std::endl;
865  fout << " for (int itype=0; itype < 2; itype++) {" << std::endl;
866  fout << std::endl;
867  fout << " // interpolate linearly between adjacent bins" << std::endl;
868  fout << " // this is not useful for discrete variables (or forced Spline0)" << std::endl;
869  fout << " int bin = int((x[itype] - fHistMin[ivar])/(fHistMax[ivar] - fHistMin[ivar])*fNbin[ivar]) + 0;" << std::endl;
870  fout << std::endl;
871  fout << " // since the test data sample is in general different from the training sample" << std::endl;
872  fout << " // it can happen that the min/max of the training sample are exceeded --> correct this" << std::endl;
873  fout << " if (bin < 0) {" << std::endl;
874  fout << " bin = 0;" << std::endl;
875  fout << " x[itype] = fHistMin[ivar];" << std::endl;
876  fout << " }" << std::endl;
877  fout << " else if (bin >= fNbin[ivar]) {" << std::endl;
878  fout << " bin = fNbin[ivar]-1;" << std::endl;
879  fout << " x[itype] = fHistMax[ivar];" << std::endl;
880  fout << " }" << std::endl;
881  fout << std::endl;
882  fout << " // find corresponding histogram from cached indices" << std::endl;
883  fout << " float ref = (itype == 0) ? fRefS[ivar][bin] : fRefB[ivar][bin];" << std::endl;
884  fout << std::endl;
885  fout << " // sanity check" << std::endl;
886  fout << " if (ref < 0) {" << std::endl;
887  fout << " std::cout << \"Fatal error in " << className
888  << ": bin entry < 0 ==> abort\" << std::endl;" << std::endl;
889  fout << " std::exit(1);" << std::endl;
890  fout << " }" << std::endl;
891  fout << std::endl;
892  fout << " double p = ref;" << std::endl;
893  fout << std::endl;
894  fout << " if (GetType(ivar) != 'I' && !fHasDiscretPDF[ivar]) {" << std::endl;
895  fout << " float bincenter = (bin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
896  fout << " int nextbin = bin;" << std::endl;
897  fout << " if ((x[itype] > bincenter && bin != fNbin[ivar]-1) || bin == 0) " << std::endl;
898  fout << " nextbin++;" << std::endl;
899  fout << " else" << std::endl;
900  fout << " nextbin--; " << std::endl;
901  fout << std::endl;
902  fout << " double refnext = (itype == 0) ? fRefS[ivar][nextbin] : fRefB[ivar][nextbin];" << std::endl;
903  fout << " float nextbincenter = (nextbin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
904  fout << std::endl;
905  fout << " double dx = bincenter - nextbincenter;" << std::endl;
906  fout << " double dy = ref - refnext;" << std::endl;
907  fout << " p += (x[itype] - bincenter) * dy/dx;" << std::endl;
908  fout << " }" << std::endl;
909  fout << std::endl;
910  fout << " if (p < fEpsilon) p = fEpsilon; // avoid zero response" << std::endl;
911  fout << std::endl;
912  fout << " if (itype == 0) ps *= p;" << std::endl;
913  fout << " else pb *= p;" << std::endl;
914  fout << " } " << std::endl;
915  fout << " } " << std::endl;
916  fout << std::endl;
917  fout << " // the likelihood ratio (transform it ?)" << std::endl;
918  fout << " return TransformLikelihoodOutput( ps, pb ); " << std::endl;
919  fout << "}" << std::endl << std::endl;
920 
921  fout << "inline double " << className << "::TransformLikelihoodOutput( double ps, double pb ) const" << std::endl;
922  fout << "{" << std::endl;
923  fout << " // returns transformed or non-transformed output" << std::endl;
924  fout << " if (ps < fEpsilon) ps = fEpsilon;" << std::endl;
925  fout << " if (pb < fEpsilon) pb = fEpsilon;" << std::endl;
926  fout << " double r = ps/(ps + pb);" << std::endl;
927  fout << " if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
928  fout << std::endl;
929  fout << " if (" << (fTransformLikelihoodOutput ? "true" : "false") << ") {" << std::endl;
930  fout << " // inverse Fermi function" << std::endl;
931  fout << std::endl;
932  fout << " // sanity check" << std::endl;
933  fout << " if (r <= 0.0) r = fEpsilon;" << std::endl;
934  fout << " else if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
935  fout << std::endl;
936  fout << " double tau = 15.0;" << std::endl;
937  fout << " r = - log(1.0/r - 1.0)/tau;" << std::endl;
938  fout << " }" << std::endl;
939  fout << std::endl;
940  fout << " return r;" << std::endl;
941  fout << "}" << std::endl;
942  fout << std::endl;
943 
944  fout << "// Clean up" << std::endl;
945  fout << "inline void " << className << "::Clear() " << std::endl;
946  fout << "{" << std::endl;
947  fout << " // nothing to clear" << std::endl;
948  fout << "}" << std::endl << std::endl;
949 
950  fout << "// signal map" << std::endl;
951  fout << "float " << className << "::fRefS[][" << nbinMax << "] = " << std::endl;
952  fout << "{ " << std::endl;
953  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
954  fout << " { ";
955  for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
956  if (ibin-1 < nbin[ivar])
957  fout << (*fPDFSig)[ivar]->GetPDFHist()->GetBinContent(ibin);
958  else
959  fout << -1;
960 
961  if (ibin < nbinMax) fout << ", ";
962  }
963  fout << " }, " << std::endl;
964  }
965  fout << "}; " << std::endl;
966  fout << std::endl;
967 
968  fout << "// background map" << std::endl;
969  fout << "float " << className << "::fRefB[][" << nbinMax << "] = " << std::endl;
970  fout << "{ " << std::endl;
971  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
972  fout << " { ";
973  fout << std::setprecision(8);
974  for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
975  if (ibin-1 < nbin[ivar])
976  fout << (*fPDFBgd)[ivar]->GetPDFHist()->GetBinContent(ibin);
977  else
978  fout << -1;
979 
980  if (ibin < nbinMax) fout << ", ";
981  }
982  fout << " }, " << std::endl;
983  }
984  fout << "}; " << std::endl;
985  fout << std::endl;
986  fout << std::setprecision(dp);
987 
988  delete[] nbin;
989 }
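// ---------------------------------------------------------------------------------------
// Hedged usage sketch for the standalone class generated by the code above. TMVA writes a
// C++ file whose name and class name depend on the job name and method title used at
// booking time; the path and identifiers below are hypothetical and only illustrate the
// calling pattern.
// ---------------------------------------------------------------------------------------
// #include "weights/TMVAClassification_Likelihood.class.C"    // generated standalone class
//
// std::vector<std::string> inputVars = { "var1", "var2", "var3", "var4" };
// ReadLikelihood classifier( inputVars );                      // "Read" + method title
//
// std::vector<double> event = { 0.1, -0.5, 1.2, 0.7 };         // one event's variable values
// double mvaValue = classifier.GetMvaValue( event );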
990 
991 ////////////////////////////////////////////////////////////////////////////////
992 /// get help message text
993 ///
994 /// typical length of text line:
995 /// "|--------------------------------------------------------------|"
996 
997 void TMVA::MethodLikelihood::GetHelpMessage() const
998 {
999  Log() << Endl;
1000  Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
1001  Log() << Endl;
1002  Log() << "The maximum-likelihood classifier models the data with probability " << Endl;
1003  Log() << "density functions (PDF) reproducing the signal and background" << Endl;
1004  Log() << "distributions of the input variables. Correlations among the " << Endl;
1005  Log() << "variables are ignored." << Endl;
1006  Log() << Endl;
1007  Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
1008  Log() << Endl;
1009  Log() << "Required for good performance are decorrelated input variables" << Endl;
1010  Log() << "(a decorrelation transformation via the option \"VarTransform=Decorrelate\"" << Endl;
1011  Log() << "may be tried). Irreducible non-linear correlations may be reduced" << Endl;
1012  Log() << "by precombining strongly correlated input variables, or by simply" << Endl;
1013  Log() << "removing one of the variables." << Endl;
1014  Log() << Endl;
1015  Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
1016  Log() << Endl;
1017  Log() << "High fidelity PDF estimates are mandatory, i.e., sufficient training " << Endl;
1018  Log() << "statistics are required to populate the tails of the distributions." << Endl;
1019  Log() << "It would be a surprise if the default Spline or KDE kernel parameters" << Endl;
1020  Log() << "provide a satisfying fit to the data. The user is advised to properly" << Endl;
1021  Log() << "tune the events per bin and smooth options in the spline cases" << Endl;
1022  Log() << "individually per variable. If the KDE kernel is used, the adaptive" << Endl;
1023  Log() << "Gaussian kernel may lead to artefacts, so please always also try" << Endl;
1024  Log() << "the non-adaptive one." << Endl;
1025  Log() << "" << Endl;
1026  Log() << "All tuning parameters must be adjusted individually for each input" << Endl;
1027  Log() << "variable!" << Endl;
1028 }
1029 