ROOT  6.07/01
Reference Guide
MethodFisher.cxx
1 // @(#)root/tmva $Id$
2 // Author: Andreas Hoecker, Xavier Prudent, Joerg Stelzer, Helge Voss, Kai Voss
3 
4 /**********************************************************************************
5  * Project: TMVA - a Root-integrated toolkit for multivariate Data analysis *
6  * Package: TMVA *
7  * Class : MethodFisher *
8  * Web : http://tmva.sourceforge.net *
9  * *
10  * Description: *
11  * Implementation (see header for description) *
12  * *
13  * Original author of this Fisher-Discriminant implementation: *
14  * Andre Gaidot, CEA-France; *
15  * (Translation from FORTRAN) *
16  * *
17  * Authors (alphabetical): *
18  * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
19  * Xavier Prudent <prudent@lapp.in2p3.fr> - LAPP, France *
20  * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
21  * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
22  * *
23  * Copyright (c) 2005: *
24  * CERN, Switzerland *
25  * U. of Victoria, Canada *
26  * MPI-K Heidelberg, Germany *
27  * LAPP, Annecy, France *
28  * *
29  * Redistribution and use in source and binary forms, with or without *
30  * modification, are permitted according to the terms listed in LICENSE *
31  * (http://tmva.sourceforge.net/LICENSE) *
32  **********************************************************************************/
33 
34 ////////////////////////////////////////////////////////////////////////////////
35 
36 /* Begin_Html
37  Fisher and Mahalanobis Discriminants (Linear Discriminant Analysis)
38 
39  <p>
40  In the method of Fisher discriminants, event selection is performed
41  in a transformed variable space with zero linear correlations, by
42  distinguishing the mean values of the signal and background
43  distributions.<br></p>
44 
45  <p>
46  The linear discriminant analysis determines an axis in the (correlated)
47  hyperspace of the input variables
48  such that, when projecting the output classes (signal and background)
49  upon this axis, they are pushed as far as possible away from each other,
50  while events of the same class are confined to a close vicinity.
51  The linearity property of this method is reflected in the metric with
52  which "far apart" and "close vicinity" are determined: the covariance
53  matrix of the discriminant variable space.
54  </p>
55 
56  <p>
57  The classification of the events in signal and background classes
58  relies on the following characteristics (only): overall sample means,
59  <i><my:o>x</my:o><sub>i</sub></i>, for each input variable, <i>i</i>,
60  class-specific sample means, <i><my:o>x</my:o><sub>S(B),i</sub></i>,
61  and total covariance matrix <i>T<sub>ij</sub></i>. The covariance matrix
62  can be decomposed into the sum of a <i>within-class</i> (<i>W<sub>ij</sub></i>)
63  and a <i>between-class</i> (<i>B<sub>ij</sub></i>) matrix. They describe
64  the dispersion of events relative to the means of their own class (within-class
65  matrix), and relative to the overall sample means (between-class matrix).
66  The Fisher coefficients, <i>F<sub>i</sub></i>, are then given by <br>
67  <center>
68  <img vspace=6 src="gif/tmva_fisherC.gif" align="bottom" >
69  </center>
70  where TMVA sets <i>N<sub>S</sub>=N<sub>B</sub></i>, so that the factor
71  in front of the sum simplifies to &frac12;.
72  The Fisher discriminant then reads<br>
73  <center>
74  <img vspace=6 src="gif/tmva_fisherD.gif" align="bottom" >
75  </center>
76  The offset <i>F</i><sub>0</sub> centers the sample mean of <i>x</i><sub>Fi</sub>
77  at zero. Instead of using the within-class matrix, the Mahalanobis variant
78  determines the Fisher coefficients as follows:<br>
79  <center>
80  <img vspace=6 src="gif/tmva_mahaC.gif" align="bottom" >
81  </center>
82  with resulting <i>x</i><sub>Ma</sub> that are very similar to the
83  <i>x</i><sub>Fi</sub>. <br></p>
84 
85  TMVA provides two outputs for the ranking of the input variables:<br>
86  <ul>
87  <li> <u>Fisher test:</u> the Fisher analysis aims at simultaneously maximising
88  the between-class separation, while minimising the within-class dispersion.
89  A useful measure of the discrimination power of a variable is hence given
90  by the diagonal quantity: <i>B<sub>ii</sub>/W<sub>ii</sub></i>.
91  </li>
92 
93  <li> <u>Discrimination power:</u> the value of the Fisher coefficient is a
94  measure of the discriminating power of a variable. The discrimination power
95  of a set of input variables can therefore be measured by the scalar
96  <center>
97  <img vspace=6 src="gif/tmva_discpower.gif" align="bottom" >
98  </center>
99  </li>
100  </ul>
101  The corresponding numbers are printed on standard output.
102  End_Html */
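The formulas above can be checked with a minimal numeric sketch, independent of TMVA. The toy samples and the helper name `toyFisherCoefficients` are invented for this illustration; it mirrors the coefficient formula F = xfact * InvW * (meanS - meanB) implemented below in GetFisherCoeff(), for two variables and unit event weights.

```cpp
#include <array>
#include <cmath>

// Toy illustration of the Fisher coefficient formula:
//   F_i = sqrt(N_S*N_B)/(N_S+N_B) * sum_j InvW_ij * (xbar_S,j - xbar_B,j)
// for two input variables, three events per class, unit weights.
std::array<double, 2> toyFisherCoefficients()
{
   const double sig[3][2] = {{ 1, 2}, { 2, 3}, { 3, 4}}; // signal events
   const double bgd[3][2] = {{-1, 0}, {-2, 1}, {-3, 2}}; // background events
   const int nS = 3, nB = 3;

   // class means xbar_S, xbar_B
   double mS[2] = {0, 0}, mB[2] = {0, 0};
   for (int i = 0; i < nS; i++)
      for (int v = 0; v < 2; v++) mS[v] += sig[i][v] / nS;
   for (int i = 0; i < nB; i++)
      for (int v = 0; v < 2; v++) mB[v] += bgd[i][v] / nB;

   // within-class matrix W: per-class scatter, each normalised to its sum of weights
   double W[2][2] = {{0, 0}, {0, 0}};
   for (int x = 0; x < 2; x++)
      for (int y = 0; y < 2; y++) {
         double s = 0, b = 0;
         for (int i = 0; i < nS; i++) s += (sig[i][x] - mS[x]) * (sig[i][y] - mS[y]);
         for (int i = 0; i < nB; i++) b += (bgd[i][x] - mB[x]) * (bgd[i][y] - mB[y]);
         W[x][y] = s / nS + b / nB;
      }

   // invert the 2x2 within-class matrix
   const double det = W[0][0] * W[1][1] - W[0][1] * W[1][0];
   const double inv[2][2] = {{ W[1][1] / det, -W[0][1] / det},
                             {-W[1][0] / det,  W[0][0] / det}};

   // Fisher coefficients; the prefactor equals 1/2 when N_S == N_B
   const double xfact = std::sqrt(double(nS) * nB) / (nS + nB);
   std::array<double, 2> F{};
   for (int i = 0; i < 2; i++) {
      for (int j = 0; j < 2; j++) F[i] += inv[i][j] * (mS[j] - mB[j]);
      F[i] *= xfact;
   }
   return F;
}
```

For this toy sample the class means are (2,3) and (-2,1), the within-class matrix is diagonal, and the coefficients come out to (1.5, 0.75); the dominant weight on the first variable reflects its larger mean separation.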
103 //_______________________________________________________________________
104 
105 #include "TMVA/MethodFisher.h"
106 
107 #include <iomanip>
108 #include <cassert>
109 
110 #include "TMath.h"
111 #include "TMatrix.h"
112 #include "Riostream.h"
113 
114 #include "TMVA/ClassifierFactory.h"
115 #include "TMVA/DataSet.h"
116 #include "TMVA/DataSetInfo.h"
117 #include "TMVA/Event.h"
118 #include "TMVA/MsgLogger.h"
119 #include "TMVA/Ranking.h"
120 #include "TMVA/Tools.h"
121 #include "TMVA/TransformationHandler.h"
122 #include "TMVA/Types.h"
123 #include "TMVA/VariableTransformBase.h"
124 
125 REGISTER_METHOD(Fisher)
126 
127 ClassImp(TMVA::MethodFisher);
128 
129 ////////////////////////////////////////////////////////////////////////////////
130 /// standard constructor for the "Fisher"
131 
132 TMVA::MethodFisher::MethodFisher( const TString& jobName,
133  const TString& methodTitle,
134  DataSetInfo& dsi,
135  const TString& theOption,
136  TDirectory* theTargetDir ) :
137  MethodBase( jobName, Types::kFisher, methodTitle, dsi, theOption, theTargetDir ),
138  fMeanMatx ( 0 ),
139  fTheMethod ( "Fisher" ),
140  fFisherMethod ( kFisher ),
141  fBetw ( 0 ),
142  fWith ( 0 ),
143  fCov ( 0 ),
144  fSumOfWeightsS( 0 ),
145  fSumOfWeightsB( 0 ),
146  fDiscrimPow ( 0 ),
147  fFisherCoeff ( 0 ),
148  fF0 ( 0 )
149 {
150 }
151 
152 ////////////////////////////////////////////////////////////////////////////////
153 /// constructor from weight file
154 
155 TMVA::MethodFisher::MethodFisher( DataSetInfo& dsi,
156  const TString& theWeightFile,
157  TDirectory* theTargetDir ) :
158  MethodBase( Types::kFisher, dsi, theWeightFile, theTargetDir ),
159  fMeanMatx ( 0 ),
160  fTheMethod ( "Fisher" ),
161  fFisherMethod ( kFisher ),
162  fBetw ( 0 ),
163  fWith ( 0 ),
164  fCov ( 0 ),
165  fSumOfWeightsS( 0 ),
166  fSumOfWeightsB( 0 ),
167  fDiscrimPow ( 0 ),
168  fFisherCoeff ( 0 ),
169  fF0 ( 0 )
170 {
171 }
172 
173 ////////////////////////////////////////////////////////////////////////////////
174 /// default initialization called by all constructors
175 
176 void TMVA::MethodFisher::Init( void )
177 {
178  // allocate Fisher coefficients
179  fFisherCoeff = new std::vector<Double_t>( GetNvar() );
180 
181  // the minimum requirement to declare an event signal-like
182  SetSignalReferenceCut( 0.0 );
183 
184  // this is the preparation for training
185  InitMatrices();
186 }
187 
188 ////////////////////////////////////////////////////////////////////////////////
189 ///
190 /// MethodFisher options:
191 /// format and syntax of option string: "type"
192 /// where type is "Fisher" or "Mahalanobis"
193 ///
194 
195 void TMVA::MethodFisher::DeclareOptions()
196 {
197  DeclareOptionRef( fTheMethod = "Fisher", "Method", "Discrimination method" );
198  AddPreDefVal(TString("Fisher"));
199  AddPreDefVal(TString("Mahalanobis"));
200 }
201 
202 ////////////////////////////////////////////////////////////////////////////////
203 /// process user options
204 
205 void TMVA::MethodFisher::ProcessOptions()
206 {
207  if (fTheMethod == "Fisher" ) fFisherMethod = kFisher;
208  else fFisherMethod = kMahalanobis;
209 
210  // this is the preparation for training
211  InitMatrices();
212 }
213 
214 ////////////////////////////////////////////////////////////////////////////////
215 /// destructor
216 
217 TMVA::MethodFisher::~MethodFisher( void )
218 {
219  if (fBetw ) { delete fBetw; fBetw = 0; }
220  if (fWith ) { delete fWith; fWith = 0; }
221  if (fCov ) { delete fCov; fCov = 0; }
222  if (fDiscrimPow ) { delete fDiscrimPow; fDiscrimPow = 0; }
223  if (fFisherCoeff) { delete fFisherCoeff; fFisherCoeff = 0; }
224 }
225 
226 ////////////////////////////////////////////////////////////////////////////////
227 /// Fisher can only handle classification with 2 classes
228 
229 Bool_t TMVA::MethodFisher::HasAnalysisType( Types::EAnalysisType type, UInt_t numberClasses, UInt_t /*numberTargets*/ )
230 {
231  if (type == Types::kClassification && numberClasses == 2) return kTRUE;
232  return kFALSE;
233 }
234 
235 ////////////////////////////////////////////////////////////////////////////////
236 /// computation of Fisher coefficients by series of matrix operations
237 
238 void TMVA::MethodFisher::Train( void )
239 {
240  // get mean value of each variable for signal, backgd and signal+backgd
241  GetMean();
242 
243  // get the matrix of covariance 'within class'
244  GetCov_WithinClass();
245 
246  // get the matrix of covariance 'between class'
247  GetCov_BetweenClass();
248 
249  // get the full covariance matrix
250  GetCov_Full();
251 
252  //--------------------------------------------------------------
253 
254  // get the Fisher coefficients
255  GetFisherCoeff();
256 
257  // get the discriminating power of each variable
258  GetDiscrimPower();
259 
260  // nice output
261  PrintCoefficients();
262 }
263 
264 ////////////////////////////////////////////////////////////////////////////////
265 /// returns the Fisher value (no fixed range)
266 
267 Double_t TMVA::MethodFisher::GetMvaValue( Double_t* err, Double_t* errUpper )
268 {
269  const Event * ev = GetEvent();
270  Double_t result = fF0;
271  for (UInt_t ivar=0; ivar<GetNvar(); ivar++)
272  result += (*fFisherCoeff)[ivar]*ev->GetValue(ivar);
273 
274  // cannot determine error
275  NoErrorCalc(err, errUpper);
276 
277  return result;
278 
279 }
280 
281 ////////////////////////////////////////////////////////////////////////////////
282 /// initialization method; creates global matrices and vectors
283 
284 void TMVA::MethodFisher::InitMatrices( void )
285 {
286  // average value of each variable for S, B, S+B
287  fMeanMatx = new TMatrixD( GetNvar(), 3 );
288 
289  // the covariance 'within class' and 'between class' matrices
290  fBetw = new TMatrixD( GetNvar(), GetNvar() );
291  fWith = new TMatrixD( GetNvar(), GetNvar() );
292  fCov = new TMatrixD( GetNvar(), GetNvar() );
293 
294  // discriminating power
295  fDiscrimPow = new std::vector<Double_t>( GetNvar() );
296 }
297 
298 ////////////////////////////////////////////////////////////////////////////////
299 /// compute mean values of variables in each sample, and the overall means
300 
301 void TMVA::MethodFisher::GetMean( void )
302 {
303  // initialize internal sum-of-weights variables
304  fSumOfWeightsS = 0;
305  fSumOfWeightsB = 0;
306 
307  const UInt_t nvar = DataInfo().GetNVariables();
308 
309  // init vectors
310  Double_t* sumS = new Double_t[nvar];
311  Double_t* sumB = new Double_t[nvar];
312  for (UInt_t ivar=0; ivar<nvar; ivar++) { sumS[ivar] = sumB[ivar] = 0; }
313 
314  // compute sample means
315  for (Int_t ievt=0; ievt<Data()->GetNEvents(); ievt++) {
316 
317  // read the Training Event into "event"
318  const Event * ev = GetEvent(ievt);
319 
320  // sum of weights
321  Double_t weight = ev->GetWeight();
322  if (DataInfo().IsSignal(ev)) fSumOfWeightsS += weight;
323  else fSumOfWeightsB += weight;
324 
325  Double_t* sum = DataInfo().IsSignal(ev) ? sumS : sumB;
326 
327  for (UInt_t ivar=0; ivar<nvar; ivar++) sum[ivar] += ev->GetValue( ivar )*weight;
328  }
329 
330  for (UInt_t ivar=0; ivar<nvar; ivar++) {
331  (*fMeanMatx)( ivar, 2 ) = sumS[ivar];
332  (*fMeanMatx)( ivar, 0 ) = sumS[ivar]/fSumOfWeightsS;
333 
334  (*fMeanMatx)( ivar, 2 ) += sumB[ivar];
335  (*fMeanMatx)( ivar, 1 ) = sumB[ivar]/fSumOfWeightsB;
336 
337  // signal + background
338  (*fMeanMatx)( ivar, 2 ) /= (fSumOfWeightsS + fSumOfWeightsB);
339  }
340 
341  // fMeanMatx->Print();
342  delete [] sumS;
343  delete [] sumB;
344 }
345 
346 ////////////////////////////////////////////////////////////////////////////////
347 /// the matrix of covariance 'within class' reflects the dispersion of the
348 /// events relative to the center of gravity of their own class
349 
350 void TMVA::MethodFisher::GetCov_WithinClass( void )
351 {
352  // assert required
353  assert( fSumOfWeightsS > 0 && fSumOfWeightsB > 0 );
354 
355  // product matrices (x-<x>)(y-<y>) where x;y are variables
356 
357  // init
358  const Int_t nvar = GetNvar();
359  const Int_t nvar2 = nvar*nvar;
360  Double_t *sumSig = new Double_t[nvar2];
361  Double_t *sumBgd = new Double_t[nvar2];
362  Double_t *xval = new Double_t[nvar];
363  memset(sumSig,0,nvar2*sizeof(Double_t));
364  memset(sumBgd,0,nvar2*sizeof(Double_t));
365 
366  // 'within class' covariance
367  for (Int_t ievt=0; ievt<Data()->GetNEvents(); ievt++) {
368 
369  // read the Training Event into "event"
370  const Event* ev = GetEvent(ievt);
371 
372  Double_t weight = ev->GetWeight(); // may ignore events with negative weights
373 
374  for (Int_t x=0; x<nvar; x++) xval[x] = ev->GetValue( x );
375  Int_t k=0;
376  for (Int_t x=0; x<nvar; x++) {
377  for (Int_t y=0; y<nvar; y++) {
378  if (DataInfo().IsSignal(ev)) {
379  Double_t v = ( (xval[x] - (*fMeanMatx)(x, 0))*(xval[y] - (*fMeanMatx)(y, 0)) )*weight;
380  sumSig[k] += v;
381  }else{
382  Double_t v = ( (xval[x] - (*fMeanMatx)(x, 1))*(xval[y] - (*fMeanMatx)(y, 1)) )*weight;
383  sumBgd[k] += v;
384  }
385  k++;
386  }
387  }
388  }
389  Int_t k=0;
390  for (Int_t x=0; x<nvar; x++) {
391  for (Int_t y=0; y<nvar; y++) {
392  //(*fWith)(x, y) = (sumSig[k] + sumBgd[k])/(fSumOfWeightsS + fSumOfWeightsB);
393  // HHV: I am still convinced that THIS is how it should be (below) However, while
394  // the old version corresponded so nicely with LD, the FIXED version does not, unless
395  // we agree to change LD. For LD, it is not "defined" to my knowledge how the weights
396  // are weighted, while it is clear how the "Within" matrix for Fisher should be calculated
397  // (i.e. as seen below). In order to agree with the Fisher classifier, one would have to
398  // weigh signal and background such that they correspond to the same number of effective
399  // (weighted) events.
400  // THAT is NOT done currently, but just "event weights" are used.
401  (*fWith)(x, y) = sumSig[k]/fSumOfWeightsS + sumBgd[k]/fSumOfWeightsB;
402  k++;
403  }
404  }
405 
406  delete [] sumSig;
407  delete [] sumBgd;
408  delete [] xval;
409 }
410 
411 ////////////////////////////////////////////////////////////////////////////////
412 /// the matrix of covariance 'between class' reflects the dispersion of the
413 /// events of a class relative to the global center of gravity of all the classes,
414 /// hence the separation between classes
415 
416 void TMVA::MethodFisher::GetCov_BetweenClass( void )
417 {
418  // assert required
419  assert( fSumOfWeightsS > 0 && fSumOfWeightsB > 0);
420 
421  Double_t prodSig, prodBgd;
422 
423  for (UInt_t x=0; x<GetNvar(); x++) {
424  for (UInt_t y=0; y<GetNvar(); y++) {
425 
426  prodSig = ( ((*fMeanMatx)(x, 0) - (*fMeanMatx)(x, 2))*
427  ((*fMeanMatx)(y, 0) - (*fMeanMatx)(y, 2)) );
428  prodBgd = ( ((*fMeanMatx)(x, 1) - (*fMeanMatx)(x, 2))*
429  ((*fMeanMatx)(y, 1) - (*fMeanMatx)(y, 2)) );
430 
431  (*fBetw)(x, y) = (fSumOfWeightsS*prodSig + fSumOfWeightsB*prodBgd) / (fSumOfWeightsS + fSumOfWeightsB);
432  }
433  }
434 }
435 
436 ////////////////////////////////////////////////////////////////////////////////
437 /// compute full covariance matrix from sum of within and between matrices
438 
439 void TMVA::MethodFisher::GetCov_Full( void )
440 {
441  for (UInt_t x=0; x<GetNvar(); x++)
442  for (UInt_t y=0; y<GetNvar(); y++)
443  (*fCov)(x, y) = (*fWith)(x, y) + (*fBetw)(x, y);
444 }
445 
446 ////////////////////////////////////////////////////////////////////////////////
447 /// Fisher = Sum { [coeff]*[variables] }
448 ///
449 /// let Xs be the array of the mean values of variables for signal evts
450 /// let Xb be the array of the mean values of variables for backgd evts
451 /// let InvWith be the inverse of the 'within class' covariance matrix
452 ///
453 /// then the array of Fisher coefficients is
454 /// [coeff] = sqrt(fNsig*fNbgd)/fNevt * transpose{Xs-Xb} * InvWith
455 
456 void TMVA::MethodFisher::GetFisherCoeff( void )
457 {
458  // assert required
459  assert( fSumOfWeightsS > 0 && fSumOfWeightsB > 0);
460 
461  // invert covariance matrix
462  TMatrixD* theMat = 0;
463  switch (GetFisherMethod()) {
464  case kFisher:
465  theMat = fWith;
466  break;
467  case kMahalanobis:
468  theMat = fCov;
469  break;
470  default:
471  Log() << kFATAL << "<GetFisherCoeff> undefined method" << GetFisherMethod() << Endl;
472  }
473 
474  TMatrixD invCov( *theMat );
475 
476  if ( TMath::Abs(invCov.Determinant()) < 10E-24 ) {
477  Log() << kWARNING << "<GetFisherCoeff> matrix is almost singular with determinant="
478  << TMath::Abs(invCov.Determinant())
479  << " did you use variables that are linear combinations or highly correlated?"
480  << Endl;
481  }
482  if ( TMath::Abs(invCov.Determinant()) < 10E-120 ) {
483  theMat->Print();
484  Log() << kFATAL << "<GetFisherCoeff> matrix is singular with determinant="
485  << TMath::Abs(invCov.Determinant())
486  << " did you use variables that are linear combinations? \n"
487  << " do you have any clue as to what went wrong in the above printout of the covariance matrix? "
488  << Endl;
489  }
490 
491  invCov.Invert();
492 
493  // apply rescaling factor
494  Double_t xfact = TMath::Sqrt( fSumOfWeightsS*fSumOfWeightsB ) / (fSumOfWeightsS + fSumOfWeightsB);
495 
496  // compute difference of mean values
497  std::vector<Double_t> diffMeans( GetNvar() );
498  UInt_t ivar, jvar;
499  for (ivar=0; ivar<GetNvar(); ivar++) {
500  (*fFisherCoeff)[ivar] = 0;
501 
502  for (jvar=0; jvar<GetNvar(); jvar++) {
503  Double_t d = (*fMeanMatx)(jvar, 0) - (*fMeanMatx)(jvar, 1);
504  (*fFisherCoeff)[ivar] += invCov(ivar, jvar)*d;
505  }
506  // rescale
507  (*fFisherCoeff)[ivar] *= xfact;
508  }
509 
510 
511  // offset correction
512  fF0 = 0.0;
513  for (ivar=0; ivar<GetNvar(); ivar++){
514  fF0 += (*fFisherCoeff)[ivar]*((*fMeanMatx)(ivar, 0) + (*fMeanMatx)(ivar, 1));
515  }
516  fF0 /= -2.0;
517 }
518 
519 ////////////////////////////////////////////////////////////////////////////////
520 /// computation of discrimination power indicator for each variable
521 /// small values of "fWith" indicate little compactness of sig & of backgd
522 /// big values of "fBetw" indicate large separation between sig & backgd
523 ///
524 /// we want signal & backgd classes as compact and separated as possible
525 /// the discriminating power is then defined as the ratio "fBetw/fWith"
526 
527 void TMVA::MethodFisher::GetDiscrimPower( void )
528 {
529  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
530  if ((*fCov)(ivar, ivar) != 0)
531  (*fDiscrimPow)[ivar] = (*fBetw)(ivar, ivar)/(*fCov)(ivar, ivar);
532  else
533  (*fDiscrimPow)[ivar] = 0;
534  }
535 }
536 
537 ////////////////////////////////////////////////////////////////////////////////
538 /// computes ranking of input variables
539 
540 const TMVA::Ranking* TMVA::MethodFisher::CreateRanking()
541 {
542  // create the ranking object
543  fRanking = new Ranking( GetName(), "Discr. power" );
544 
545  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
546  fRanking->AddRank( Rank( GetInputLabel(ivar), (*fDiscrimPow)[ivar] ) );
547  }
548 
549  return fRanking;
550 }
551 
552 ////////////////////////////////////////////////////////////////////////////////
553 /// display Fisher coefficients and discriminating power for each variable
554 /// check maximum length of variable name
555 
556 void TMVA::MethodFisher::PrintCoefficients( void )
557 {
558  Log() << kINFO << "Results for Fisher coefficients:" << Endl;
559 
560  if (GetTransformationHandler().GetTransformationList().GetSize() != 0) {
561  Log() << kINFO << "NOTE: The coefficients must be applied to TRANSFORMED variables" << Endl;
562  Log() << kINFO << " List of the transformations: " << Endl;
563  TListIter trIt(&GetTransformationHandler().GetTransformationList());
564  while (VariableTransformBase *trf = (VariableTransformBase*) trIt()) {
565  Log() << kINFO << " -- " << trf->GetName() << Endl;
566  }
567  }
568  std::vector<TString> vars;
569  std::vector<Double_t> coeffs;
570  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
571  vars .push_back( GetInputLabel(ivar) );
572  coeffs.push_back( (*fFisherCoeff)[ivar] );
573  }
574  vars .push_back( "(offset)" );
575  coeffs.push_back( fF0 );
576  TMVA::gTools().FormattedOutput( coeffs, vars, "Variable" , "Coefficient", Log() );
577 
578  // for (int i=0; i<coeffs.size(); i++)
579  // std::cout << "fisher coeff["<<i<<"]="<<coeffs[i]<<std::endl;
580 
581  if (IsNormalised()) {
582  Log() << kINFO << "NOTE: You have chosen to use the \"Normalise\" booking option. Hence, the" << Endl;
583  Log() << kINFO << " coefficients must be applied to NORMALISED (') variables as follows:" << Endl;
584  Int_t maxL = 0;
585  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) if (GetInputLabel(ivar).Length() > maxL) maxL = GetInputLabel(ivar).Length();
586 
587  // Print normalisation expression (see Tools.cxx): "2*(x - xmin)/(xmax - xmin) - 1.0"
588  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
589  Log() << kINFO
590  << std::setw(maxL+9) << TString("[") + GetInputLabel(ivar) + "]' = 2*("
591  << std::setw(maxL+2) << TString("[") + GetInputLabel(ivar) + "]"
592  << std::setw(3) << (GetXmin(ivar) > 0 ? " - " : " + ")
593  << std::setw(6) << TMath::Abs(GetXmin(ivar)) << std::setw(3) << ")/"
594  << std::setw(6) << (GetXmax(ivar) - GetXmin(ivar) )
595  << std::setw(3) << " - 1"
596  << Endl;
597  }
598  Log() << kINFO << "The TMVA Reader will properly account for this normalisation, but if the" << Endl;
599  Log() << kINFO << "Fisher classifier is applied outside the Reader, the transformation must be" << Endl;
600  Log() << kINFO << "implemented -- or the \"Normalise\" option is removed and Fisher retrained." << Endl;
601  Log() << kINFO << Endl;
602  }
603 }
604 
605 ////////////////////////////////////////////////////////////////////////////////
606 /// read Fisher coefficients from weight file
607 
608 void TMVA::MethodFisher::ReadWeightsFromStream( std::istream& istr )
609 {
610  istr >> fF0;
611  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) istr >> (*fFisherCoeff)[ivar];
612 }
613 
614 ////////////////////////////////////////////////////////////////////////////////
615 /// create XML description of Fisher classifier
616 
617 void TMVA::MethodFisher::AddWeightsXMLTo( void* parent ) const
618 {
619  void* wght = gTools().AddChild(parent, "Weights");
620  gTools().AddAttr( wght, "NCoeff", GetNvar()+1 );
621  void* coeffxml = gTools().AddChild(wght, "Coefficient");
622  gTools().AddAttr( coeffxml, "Index", 0 );
623  gTools().AddAttr( coeffxml, "Value", fF0 );
624  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
625  coeffxml = gTools().AddChild( wght, "Coefficient" );
626  gTools().AddAttr( coeffxml, "Index", ivar+1 );
627  gTools().AddAttr( coeffxml, "Value", (*fFisherCoeff)[ivar] );
628  }
629 }
630 
631 ////////////////////////////////////////////////////////////////////////////////
632 /// read Fisher coefficients from xml weight file
633 
634 void TMVA::MethodFisher::ReadWeightsFromXML( void* wghtnode )
635 {
636  UInt_t ncoeff, coeffidx;
637  gTools().ReadAttr( wghtnode, "NCoeff", ncoeff );
638  fFisherCoeff->resize(ncoeff-1);
639 
640  void* ch = gTools().GetChild(wghtnode);
641  Double_t coeff;
642  while (ch) {
643  gTools().ReadAttr( ch, "Index", coeffidx );
644  gTools().ReadAttr( ch, "Value", coeff );
645  if (coeffidx==0) fF0 = coeff;
646  else (*fFisherCoeff)[coeffidx-1] = coeff;
647  ch = gTools().GetNextChild(ch);
648  }
649 }
650 
651 ////////////////////////////////////////////////////////////////////////////////
652 /// write Fisher-specific classifier response
653 
654 void TMVA::MethodFisher::MakeClassSpecific( std::ostream& fout, const TString& className ) const
655 {
656  Int_t dp = fout.precision();
657  fout << " double fFisher0;" << std::endl;
658  fout << " std::vector<double> fFisherCoefficients;" << std::endl;
659  fout << "};" << std::endl;
660  fout << "" << std::endl;
661  fout << "inline void " << className << "::Initialize() " << std::endl;
662  fout << "{" << std::endl;
663  fout << " fFisher0 = " << std::setprecision(12) << fF0 << ";" << std::endl;
664  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
665  fout << " fFisherCoefficients.push_back( " << std::setprecision(12) << (*fFisherCoeff)[ivar] << " );" << std::endl;
666  }
667  fout << std::endl;
668  fout << " // sanity check" << std::endl;
669  fout << " if (fFisherCoefficients.size() != fNvars) {" << std::endl;
670  fout << " std::cout << \"Problem in class \\\"\" << fClassName << \"\\\"::Initialize: mismatch in number of input values\"" << std::endl;
671  fout << " << fFisherCoefficients.size() << \" != \" << fNvars << std::endl;" << std::endl;
672  fout << " fStatusIsClean = false;" << std::endl;
673  fout << " } " << std::endl;
674  fout << "}" << std::endl;
675  fout << std::endl;
676  fout << "inline double " << className << "::GetMvaValue__( const std::vector<double>& inputValues ) const" << std::endl;
677  fout << "{" << std::endl;
678  fout << " double retval = fFisher0;" << std::endl;
679  fout << " for (size_t ivar = 0; ivar < fNvars; ivar++) {" << std::endl;
680  fout << " retval += fFisherCoefficients[ivar]*inputValues[ivar];" << std::endl;
681  fout << " }" << std::endl;
682  fout << std::endl;
683  fout << " return retval;" << std::endl;
684  fout << "}" << std::endl;
685  fout << std::endl;
686  fout << "// Clean up" << std::endl;
687  fout << "inline void " << className << "::Clear() " << std::endl;
688  fout << "{" << std::endl;
689  fout << " // clear coefficients" << std::endl;
690  fout << " fFisherCoefficients.clear(); " << std::endl;
691  fout << "}" << std::endl;
692  fout << std::setprecision(dp);
693 }
694 
695 ////////////////////////////////////////////////////////////////////////////////
696 /// get help message text
697 ///
698 /// typical length of text line:
699 /// "|--------------------------------------------------------------|"
700 
701 void TMVA::MethodFisher::GetHelpMessage() const
702 {
703  Log() << Endl;
704  Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
705  Log() << Endl;
706  Log() << "Fisher discriminants select events by distinguishing the mean " << Endl;
707  Log() << "values of the signal and background distributions in a trans- " << Endl;
708  Log() << "formed variable space where linear correlations are removed." << Endl;
709  Log() << Endl;
710  Log() << " (More precisely: the \"linear discriminator\" determines" << Endl;
711  Log() << " an axis in the (correlated) hyperspace of the input " << Endl;
712  Log() << " variables such that, when projecting the output classes " << Endl;
713  Log() << " (signal and background) upon this axis, they are pushed " << Endl;
714  Log() << " as far as possible away from each other, while events" << Endl;
715  Log() << " of a same class are confined in a close vicinity. The " << Endl;
716  Log() << " linearity property of this classifier is reflected in the " << Endl;
717  Log() << " metric with which \"far apart\" and \"close vicinity\" are " << Endl;
718  Log() << " determined: the covariance matrix of the discriminating" << Endl;
719  Log() << " variable space.)" << Endl;
720  Log() << Endl;
721  Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
722  Log() << Endl;
723  Log() << "Optimal performance for Fisher discriminants is obtained for " << Endl;
724  Log() << "linearly correlated Gaussian-distributed variables. Any deviation" << Endl;
725  Log() << "from this ideal reduces the achievable separation power. In " << Endl;
726  Log() << "particular, no discrimination at all is achieved for a variable" << Endl;
727  Log() << "that has the same sample mean for signal and background, even if " << Endl;
728  Log() << "the shapes of the distributions are very different. Thus, Fisher " << Endl;
729  Log() << "discriminants often benefit from suitable transformations of the " << Endl;
730  Log() << "input variables. For example, if a variable x in [-1,1] has a " << Endl;
731  Log() << "parabolic signal distribution and a uniform background " << Endl;
732  Log() << "distribution, the mean value is zero in both cases, leading " << Endl;
733  Log() << "to no separation. The simple transformation x -> |x| renders this " << Endl;
734  Log() << "variable powerful for use in a Fisher discriminant." << Endl;
735  Log() << Endl;
736  Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
737  Log() << Endl;
738  Log() << "<None>" << Endl;
739 }