ROOT 6.10/09 Reference Guide
MethodFisher.cxx
// @(#)root/tmva $Id$
// Author: Andreas Hoecker, Xavier Prudent, Joerg Stelzer, Helge Voss, Kai Voss

/**********************************************************************************
 * Project: TMVA - a Root-integrated toolkit for multivariate Data analysis *
 * Package: TMVA *
 * Class : MethodFisher *
 * Web : http://tmva.sourceforge.net *
 * *
 * Description: *
 * Implementation (see header for description) *
 * *
 * Original author of this Fisher-Discriminant implementation: *
 * Andre Gaidot, CEA-France; *
 * (Translation from FORTRAN) *
 * *
 * Authors (alphabetical): *
 * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
 * Xavier Prudent <prudent@lapp.in2p3.fr> - LAPP, France *
 * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
 * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
 * *
 * Copyright (c) 2005: *
 * CERN, Switzerland *
 * U. of Victoria, Canada *
 * MPI-K Heidelberg, Germany *
 * LAPP, Annecy, France *
 * *
 * Redistribution and use in source and binary forms, with or without *
 * modification, are permitted according to the terms listed in LICENSE *
 * (http://tmva.sourceforge.net/LICENSE) *
 **********************************************************************************/

/*! \class TMVA::MethodFisher
\ingroup TMVA

Fisher and Mahalanobis Discriminants (Linear Discriminant Analysis)

In the method of Fisher discriminants, event selection is performed
in a transformed variable space with zero linear correlations, by
distinguishing the mean values of the signal and background
distributions.

The linear discriminant analysis determines an axis in the (correlated)
hyperspace of the input variables
such that, when projecting the output classes (signal and background)
upon this axis, they are pushed as far as possible away from each other,
while events of the same class are confined in a close vicinity.
The linearity property of this method is reflected in the metric with
which "far apart" and "close vicinity" are determined: the covariance
matrix of the discriminant variable space.
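
The "far apart / close vicinity" criterion can be made concrete with a short sketch. The following is a minimal, self-contained illustration for two input variables. It is not TMVA code, and the names `FisherCriterion` and `FisherDirection` are hypothetical. It measures the squared separation of the projected class means relative to the projected within-class spread, \f$ J(w) = (w \cdot (\bar{X}_S - \bar{X}_B))^2 / (w^T W w) \f$, where \f$ W \f$ is the within-class matrix introduced below. It also computes the maximising direction \f$ W^{-1}(\bar{X}_S - \bar{X}_B) \f$ via the closed-form inverse of a 2x2 matrix.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

// Fisher criterion J(w): squared distance between the projected class
// means divided by the projected within-class spread w^T W w.
double FisherCriterion(const Vec2& w, const Vec2& meanS, const Vec2& meanB, const Mat2& W)
{
   const double sep = w[0] * (meanS[0] - meanB[0]) + w[1] * (meanS[1] - meanB[1]);
   const double spread = w[0] * (W[0][0] * w[0] + W[0][1] * w[1])
                       + w[1] * (W[1][0] * w[0] + W[1][1] * w[1]);
   return sep * sep / spread;
}

// Direction maximising J(w): W^{-1} (meanS - meanB), using the
// closed-form inverse of a 2x2 matrix (assumes det(W) != 0).
Vec2 FisherDirection(const Vec2& meanS, const Vec2& meanB, const Mat2& W)
{
   const double det = W[0][0] * W[1][1] - W[0][1] * W[1][0];
   const double d0 = meanS[0] - meanB[0];
   const double d1 = meanS[1] - meanB[1];
   return Vec2{ ( W[1][1] * d0 - W[0][1] * d1) / det,
                (-W[1][0] * d0 + W[0][0] * d1) / det };
}
```

Consider a within-class matrix with entries 1, 0.8, 0.8, 1 and a mean difference along the first variable only. Here the Fisher direction acquires a negative second component. It exploits the correlation and reaches a larger \f$ J \f$ than the naive mean-difference axis. Since \f$ J \f$ is invariant under rescaling of \f$ w \f$, only the direction matters.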

The classification of the events in signal and background classes
relies on the following characteristics (only): overall sample means, \f$ \bar{X}_i \f$,
for each input variable, \f$ i \f$,
class-specific sample means, \f$ \bar{X}_{S(B),i}\f$,
and total covariance matrix \f$ T_{ij} \f$. The covariance matrix
can be decomposed into the sum of a _within-class_ (\f$ W_{ij} \f$)
and a _between-class_ (\f$ B_{ij} \f$) matrix. They describe
the dispersion of events relative to the means of their own class (within-class
matrix), and relative to the overall sample means (between-class matrix).
The Fisher coefficients, \f$ F_i \f$, are then given by

\f[
F_i = \frac{\sqrt{N_s N_b}}{N_s + N_b} \sum_{j=1}^{N_{SB}} (W^{-1})_{ij} (\bar{X}_{Sj} - \bar{X}_{Bj})
\f]
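
The coefficient formula above can be sketched without ROOT. The sketch below is a minimal illustration, not the TMVA implementation; the helper names `Solve` and `FisherCoefficients` are hypothetical. Instead of inverting \f$ W \f$ explicitly (the implementation below uses `TMatrixD::Invert`), it solves \f$ W y = \bar{X}_S - \bar{X}_B \f$ by Gaussian elimination with partial pivoting and then applies the prefactor \f$ \sqrt{N_s N_b}/(N_s + N_b) \f$.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solve A x = b by Gaussian elimination with partial pivoting.
std::vector<double> Solve(Matrix A, std::vector<double> b)
{
   const std::size_t n = b.size();
   for (std::size_t col = 0; col < n; ++col) {
      std::size_t pivot = col;
      for (std::size_t r = col + 1; r < n; ++r)
         if (std::fabs(A[r][col]) > std::fabs(A[pivot][col])) pivot = r;
      if (std::fabs(A[pivot][col]) < 1e-300) throw std::runtime_error("singular matrix");
      std::swap(A[col], A[pivot]);
      std::swap(b[col], b[pivot]);
      for (std::size_t r = col + 1; r < n; ++r) {
         const double f = A[r][col] / A[col][col];
         for (std::size_t c = col; c < n; ++c) A[r][c] -= f * A[col][c];
         b[r] -= f * b[col];
      }
   }
   // back substitution on the upper-triangular system
   std::vector<double> x(n);
   for (std::size_t i = n; i-- > 0;) {
      double s = b[i];
      for (std::size_t c = i + 1; c < n; ++c) s -= A[i][c] * x[c];
      x[i] = s / A[i][i];
   }
   return x;
}

// F_i = sqrt(Ns Nb)/(Ns + Nb) * sum_j (W^{-1})_{ij} (meanS_j - meanB_j)
std::vector<double> FisherCoefficients(const Matrix& W,
                                       const std::vector<double>& meanS,
                                       const std::vector<double>& meanB,
                                       double ns, double nb)
{
   std::vector<double> diff(meanS.size());
   for (std::size_t j = 0; j < diff.size(); ++j) diff[j] = meanS[j] - meanB[j];
   std::vector<double> coeff = Solve(W, diff);
   const double scale = std::sqrt(ns * nb) / (ns + nb);
   for (double& c : coeff) c *= scale;
   return coeff;
}
```

With \f$ N_s = N_b \f$ the prefactor is exactly \f$ 1/2 \f$. Solving the linear system instead of forming the inverse is the usual numerically preferable choice; both give the same coefficients for a well-conditioned \f$ W \f$.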

where TMVA sets \f$ N_s = N_b \f$, so that the factor
in front of the sum simplifies to \f$ \frac{1}{2}\f$.
The Fisher discriminant then reads

\f[
x_{Fi} = F_0 + \sum_{i=1}^{N_{SB}} F_i x_i
\f]
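
Evaluating the discriminant is a single dot product plus the offset. The following minimal sketch is not TMVA code, and the function names are hypothetical. It mirrors the offset computed in `GetFisherCoeff` below, \f$ F_0 = -\frac{1}{2}\sum_i F_i (\bar{X}_{Si} + \bar{X}_{Bi}) \f$, so the midpoint between the two class means is mapped to zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Offset that maps the midpoint between the signal and background
// means to zero: F0 = -1/2 * sum_i F_i (meanS_i + meanB_i).
double FisherOffset(const std::vector<double>& coeff,
                    const std::vector<double>& meanS,
                    const std::vector<double>& meanB)
{
   double f0 = 0.0;
   for (std::size_t i = 0; i < coeff.size(); ++i)
      f0 += coeff[i] * (meanS[i] + meanB[i]);
   return -0.5 * f0;
}

// x_Fi = F0 + sum_i F_i x_i
double FisherValue(double f0, const std::vector<double>& coeff,
                   const std::vector<double>& x)
{
   double result = f0;
   for (std::size_t i = 0; i < coeff.size(); ++i) result += coeff[i] * x[i];
   return result;
}
```

Because the discriminant is linear, the signal mean and the background mean are then mapped to values of equal magnitude and opposite sign. This is consistent with the signal reference cut of 0 set in `Init` below.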

The offset \f$ F_0 \f$ is chosen such that the midpoint between the signal and
background sample means of \f$ x_{Fi} \f$ lies at zero. For equal signal and
background weights, this coincides with the overall sample mean. Instead of using the
within-class matrix, the Mahalanobis variant
determines the Fisher coefficients as follows:

\f[
F_i = \frac{\sqrt{N_s N_b}}{N_s + N_b} \sum_{j=1}^{N_{SB}} ((W + B)^{-1})_{ij} (\bar{X}_{Sj} - \bar{X}_{Bj})
\f]

with resulting \f$ x_{Ma} \f$ that are very similar to the \f$ x_{Fi} \f$.
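
The matrix \f$ W + B \f$ used by the Mahalanobis variant is, in the textbook weighted normalisation, the total covariance \f$ T \f$. The sketch below illustrates that decomposition. It is not the TMVA code path: `Sample`, `Scatter` and `ComputeScatter` are hypothetical names. Every matrix is normalised by the total event weight. `GetCov_WithinClass` below normalises each class by its own weight sum instead, so its `fCov` need not equal \f$ T \f$ exactly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// One training event: input values, event weight and class label.
struct Sample {
   std::vector<double> x;
   double weight;
   bool isSignal;
};

struct Scatter {
   Matrix within;   // W: spread around the own-class means
   Matrix between;  // B: spread of the class means around the overall mean
   Matrix total;    // T: spread around the overall mean
};

// Weighted textbook decomposition T = W + B, each normalised by the total weight.
Scatter ComputeScatter(const std::vector<Sample>& events, std::size_t nvar)
{
   std::vector<double> meanS(nvar, 0.0), meanB(nvar, 0.0), mean(nvar, 0.0);
   double wS = 0.0, wB = 0.0;
   for (const Sample& e : events) {
      (e.isSignal ? wS : wB) += e.weight;
      std::vector<double>& m = e.isSignal ? meanS : meanB;
      for (std::size_t i = 0; i < nvar; ++i) m[i] += e.weight * e.x[i];
   }
   const double wTot = wS + wB;
   for (std::size_t i = 0; i < nvar; ++i) {
      mean[i] = (meanS[i] + meanB[i]) / wTot;  // overall mean from the raw sums
      meanS[i] /= wS;
      meanB[i] /= wB;
   }

   const Matrix zero(nvar, std::vector<double>(nvar, 0.0));
   Scatter s{zero, zero, zero};
   for (const Sample& e : events) {
      const std::vector<double>& m = e.isSignal ? meanS : meanB;
      for (std::size_t i = 0; i < nvar; ++i)
         for (std::size_t j = 0; j < nvar; ++j) {
            s.within[i][j] += e.weight * (e.x[i] - m[i]) * (e.x[j] - m[j]) / wTot;
            s.total[i][j]  += e.weight * (e.x[i] - mean[i]) * (e.x[j] - mean[j]) / wTot;
         }
   }
   for (std::size_t i = 0; i < nvar; ++i)
      for (std::size_t j = 0; j < nvar; ++j)
         s.between[i][j] = (wS * (meanS[i] - mean[i]) * (meanS[j] - mean[j])
                          + wB * (meanB[i] - mean[i]) * (meanB[j] - mean[j])) / wTot;
   return s;
}
```

The identity holds because the cross terms \f$ \sum w\,(x - \bar{X}_{class}) \f$ vanish within each class. The Mahalanobis coefficients then follow from the same formula as the Fisher ones, with `total` in place of the within-class matrix.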

TMVA provides two outputs for the ranking of the input variables:

 - __Fisher test:__ the Fisher analysis aims at simultaneously maximising
the between-class separation, while minimising the within-class dispersion.
A useful measure of the discrimination power of a variable is hence given
by the diagonal quantity \f$ \frac{B_{ii}}{W_{ii}} \f$. The implementation
below normalises by the diagonal of the full matrix instead, \f$ \frac{B_{ii}}{(W+B)_{ii}} \f$.

 - __Discrimination power:__ the value of the Fisher coefficient is a
measure of the discriminating power of a variable. The discrimination power
of a set of input variables can therefore be measured by the scalar

\f[
\lambda = \frac{\sqrt{N_s N_b}}{N_s + N_b} \sum_{j=1}^{N_{SB}} F_j (\bar{X}_{Sj} - \bar{X}_{Bj})
\f]

The corresponding numbers are printed on standard output.
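
Both ranking measures reduce to a few loops. The following minimal sketch is not TMVA code; `FisherTest` and `DiscriminationPower` are hypothetical names. It computes the per-variable ratio \f$ B_{ii}/W_{ii} \f$ and the scalar \f$ \lambda \f$. Passing \f$ W + B \f$ as the second argument of `FisherTest` reproduces the variant used by `GetDiscrimPower` below.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Per-variable "Fisher test": B_ii / W_ii (0 if W_ii vanishes).
std::vector<double> FisherTest(const Matrix& B, const Matrix& W)
{
   std::vector<double> r(B.size(), 0.0);
   for (std::size_t i = 0; i < B.size(); ++i)
      if (W[i][i] != 0.0) r[i] = B[i][i] / W[i][i];
   return r;
}

// lambda = sqrt(Ns Nb)/(Ns + Nb) * sum_j F_j (meanS_j - meanB_j)
double DiscriminationPower(const std::vector<double>& coeff,
                           const std::vector<double>& meanS,
                           const std::vector<double>& meanB,
                           double ns, double nb)
{
   double sum = 0.0;
   for (std::size_t j = 0; j < coeff.size(); ++j)
      sum += coeff[j] * (meanS[j] - meanB[j]);
   return std::sqrt(ns * nb) / (ns + nb) * sum;
}
```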
*/

#include "TMVA/MethodFisher.h"

#include "TMVA/ClassifierFactory.h"
#include "TMVA/Configurable.h"
#include "TMVA/DataSet.h"
#include "TMVA/DataSetInfo.h"
#include "TMVA/Event.h"
#include "TMVA/IMethod.h"
#include "TMVA/MethodBase.h"
#include "TMVA/MsgLogger.h"
#include "TMVA/Ranking.h"
#include "TMVA/Tools.h"
#include "TMVA/TransformationHandler.h"
#include "TMVA/Types.h"
#include "TMVA/VariableTransformBase.h"

#include "TMath.h"
#include "TMatrix.h"
#include "TList.h"
#include "Riostream.h"

#include <iomanip>
#include <cassert>
#include <cstring>

REGISTER_METHOD(Fisher)

ClassImp(TMVA::MethodFisher);

131 
132 ////////////////////////////////////////////////////////////////////////////////
133 /// standard constructor for the "Fisher"
134 
136  const TString& methodTitle,
137  DataSetInfo& dsi,
138  const TString& theOption ) :
139  MethodBase( jobName, Types::kFisher, methodTitle, dsi, theOption),
140  fMeanMatx ( 0 ),
141  fTheMethod ( "Fisher" ),
142  fFisherMethod ( kFisher ),
143  fBetw ( 0 ),
144  fWith ( 0 ),
145  fCov ( 0 ),
146  fSumOfWeightsS( 0 ),
147  fSumOfWeightsB( 0 ),
148  fDiscrimPow ( 0 ),
149  fFisherCoeff ( 0 ),
150  fF0 ( 0 )
151 {
152 }

////////////////////////////////////////////////////////////////////////////////
/// constructor from weight file

TMVA::MethodFisher::MethodFisher( DataSetInfo& dsi,
                                  const TString& theWeightFile) :
   MethodBase( Types::kFisher, dsi, theWeightFile),
   fMeanMatx     ( 0 ),
   fTheMethod    ( "Fisher" ),
   fFisherMethod ( kFisher ),
   fBetw         ( 0 ),
   fWith         ( 0 ),
   fCov          ( 0 ),
   fSumOfWeightsS( 0 ),
   fSumOfWeightsB( 0 ),
   fDiscrimPow   ( 0 ),
   fFisherCoeff  ( 0 ),
   fF0           ( 0 )
{
}

////////////////////////////////////////////////////////////////////////////////
/// default initialization called by all constructors

void TMVA::MethodFisher::Init( void )
{
   // allocate Fisher coefficients
   fFisherCoeff = new std::vector<Double_t>( GetNvar() );

   // the minimum requirement to declare an event signal-like
   SetSignalReferenceCut( 0.0 );

   // this is the preparation for training
   InitMatrices();
}

////////////////////////////////////////////////////////////////////////////////
/// MethodFisher options:
/// format and syntax of option string: "Method=<type>",
/// where <type> is "Fisher" or "Mahalanobis"

void TMVA::MethodFisher::DeclareOptions()
{
   DeclareOptionRef( fTheMethod = "Fisher", "Method", "Discrimination method" );
   AddPreDefVal(TString("Fisher"));
   AddPreDefVal(TString("Mahalanobis"));
}

////////////////////////////////////////////////////////////////////////////////
/// process user options

void TMVA::MethodFisher::ProcessOptions()
{
   if (fTheMethod == "Fisher" ) fFisherMethod = kFisher;
   else                         fFisherMethod = kMahalanobis;

   // this is the preparation for training
   InitMatrices();
}

////////////////////////////////////////////////////////////////////////////////
/// destructor

TMVA::MethodFisher::~MethodFisher( void )
{
   if (fMeanMatx   ) { delete fMeanMatx;    fMeanMatx    = 0; }
   if (fBetw       ) { delete fBetw;        fBetw        = 0; }
   if (fWith       ) { delete fWith;        fWith        = 0; }
   if (fCov        ) { delete fCov;         fCov         = 0; }
   if (fDiscrimPow ) { delete fDiscrimPow;  fDiscrimPow  = 0; }
   if (fFisherCoeff) { delete fFisherCoeff; fFisherCoeff = 0; }
}

////////////////////////////////////////////////////////////////////////////////
/// Fisher can only handle classification with 2 classes

Bool_t TMVA::MethodFisher::HasAnalysisType( Types::EAnalysisType type, UInt_t numberClasses, UInt_t /*numberTargets*/ )
{
   if (type == Types::kClassification && numberClasses == 2) return kTRUE;
   return kFALSE;
}

////////////////////////////////////////////////////////////////////////////////
/// computation of Fisher coefficients by series of matrix operations

void TMVA::MethodFisher::Train( void )
{
   // get mean value of each variable for signal, backgd and signal+backgd
   GetMean();

   // get the matrix of covariance 'within class'
   GetCov_WithinClass();

   // get the matrix of covariance 'between class'
   GetCov_BetweenClass();

   // get the full covariance matrix ('within class' + 'between class')
   GetCov_Full();

   //--------------------------------------------------------------

   // get the Fisher coefficients
   GetFisherCoeff();

   // get the discriminating power of each variable
   GetDiscrimPower();

   // nice output
   PrintCoefficients();

   ExitFromTraining();
}

////////////////////////////////////////////////////////////////////////////////
/// returns the Fisher value (no fixed range)

Double_t TMVA::MethodFisher::GetMvaValue( Double_t* err, Double_t* errUpper )
{
   const Event * ev = GetEvent();
   Double_t result = fF0;
   for (UInt_t ivar=0; ivar<GetNvar(); ivar++)
      result += (*fFisherCoeff)[ivar]*ev->GetValue(ivar);

   // cannot determine error
   NoErrorCalc(err, errUpper);

   return result;
}

////////////////////////////////////////////////////////////////////////////////
/// initialization method; creates global matrices and vectors

void TMVA::MethodFisher::InitMatrices( void )
{
   // average value of each variable for S, B, S+B
   fMeanMatx = new TMatrixD( GetNvar(), 3 );

   // the covariance 'within class' and 'between class' matrices
   fBetw = new TMatrixD( GetNvar(), GetNvar() );
   fWith = new TMatrixD( GetNvar(), GetNvar() );
   fCov  = new TMatrixD( GetNvar(), GetNvar() );

   // discriminating power
   fDiscrimPow = new std::vector<Double_t>( GetNvar() );
}

////////////////////////////////////////////////////////////////////////////////
/// compute mean values of variables in each sample, and the overall means

void TMVA::MethodFisher::GetMean( void )
{
   // initialize internal sum-of-weights variables
   fSumOfWeightsS = 0;
   fSumOfWeightsB = 0;

   const UInt_t nvar = DataInfo().GetNVariables();

   // init vectors
   Double_t* sumS = new Double_t[nvar];
   Double_t* sumB = new Double_t[nvar];
   for (UInt_t ivar=0; ivar<nvar; ivar++) { sumS[ivar] = sumB[ivar] = 0; }

   // compute sample means
   for (Int_t ievt=0; ievt<Data()->GetNEvents(); ievt++) {

      // read the Training Event into "event"
      const Event * ev = GetEvent(ievt);

      // sum of weights
      Double_t weight = ev->GetWeight();
      if (DataInfo().IsSignal(ev)) fSumOfWeightsS += weight;
      else                         fSumOfWeightsB += weight;

      Double_t* sum = DataInfo().IsSignal(ev) ? sumS : sumB;

      for (UInt_t ivar=0; ivar<nvar; ivar++) sum[ivar] += ev->GetValue( ivar )*weight;
   }

   // column 0: signal means, column 1: background means, column 2: overall means
   for (UInt_t ivar=0; ivar<nvar; ivar++) {
      (*fMeanMatx)( ivar, 2 ) = sumS[ivar];
      (*fMeanMatx)( ivar, 0 ) = sumS[ivar]/fSumOfWeightsS;

      (*fMeanMatx)( ivar, 2 ) += sumB[ivar];
      (*fMeanMatx)( ivar, 1 ) = sumB[ivar]/fSumOfWeightsB;

      // signal + background
      (*fMeanMatx)( ivar, 2 ) /= (fSumOfWeightsS + fSumOfWeightsB);
   }

   // fMeanMatx->Print();
   delete [] sumS;
   delete [] sumB;
}

////////////////////////////////////////////////////////////////////////////////
/// the matrix of covariance 'within class' reflects the dispersion of the
/// events relative to the center of gravity of their own class

void TMVA::MethodFisher::GetCov_WithinClass( void )
{
   // assert required
   assert( fSumOfWeightsS > 0 && fSumOfWeightsB > 0 );

   // product matrices (x-<x>)(y-<y>) where x;y are variables

   // init
   const Int_t nvar  = GetNvar();
   const Int_t nvar2 = nvar*nvar;
   Double_t *sumSig  = new Double_t[nvar2];
   Double_t *sumBgd  = new Double_t[nvar2];
   Double_t *xval    = new Double_t[nvar];
   memset(sumSig,0,nvar2*sizeof(Double_t));
   memset(sumBgd,0,nvar2*sizeof(Double_t));

   // 'within class' covariance
   for (Int_t ievt=0; ievt<Data()->GetNEvents(); ievt++) {

      // read the Training Event into "event"
      const Event* ev = GetEvent(ievt);

      Double_t weight = ev->GetWeight(); // may ignore events with negative weights

      for (Int_t x=0; x<nvar; x++) xval[x] = ev->GetValue( x );
      Int_t k=0;
      for (Int_t x=0; x<nvar; x++) {
         for (Int_t y=0; y<nvar; y++) {
            if (DataInfo().IsSignal(ev)) {
               Double_t v = ( (xval[x] - (*fMeanMatx)(x, 0))*(xval[y] - (*fMeanMatx)(y, 0)) )*weight;
               sumSig[k] += v;
            } else {
               Double_t v = ( (xval[x] - (*fMeanMatx)(x, 1))*(xval[y] - (*fMeanMatx)(y, 1)) )*weight;
               sumBgd[k] += v;
            }
            k++;
         }
      }
   }
   Int_t k=0;
   for (Int_t x=0; x<nvar; x++) {
      for (Int_t y=0; y<nvar; y++) {
         //(*fWith)(x, y) = (sumSig[k] + sumBgd[k])/(fSumOfWeightsS + fSumOfWeightsB);
         // HHV: I am still convinced that THIS is how it should be (below). However, while
         // the old version corresponded so nicely with LD, the FIXED version does not, unless
         // we agree to change LD. For LD, it is not "defined" to my knowledge how the weights
         // are weighted, while it is clear how the "Within" matrix for Fisher should be calculated
         // (i.e. as seen below). In order to agree with the Fisher classifier, one would have to
         // weigh signal and background such that they correspond to the same number of effective
         // (weighted) events.
         // THAT is NOT done currently, but just "event weights" are used.
         (*fWith)(x, y) = sumSig[k]/fSumOfWeightsS + sumBgd[k]/fSumOfWeightsB;
         k++;
      }
   }

   delete [] sumSig;
   delete [] sumBgd;
   delete [] xval;
}

////////////////////////////////////////////////////////////////////////////////
/// the matrix of covariance 'between class' reflects the dispersion of the
/// events of a class relative to the global center of gravity of all classes,
/// hence the separation between classes

void TMVA::MethodFisher::GetCov_BetweenClass( void )
{
   // assert required
   assert( fSumOfWeightsS > 0 && fSumOfWeightsB > 0);

   Double_t prodSig, prodBgd;

   for (UInt_t x=0; x<GetNvar(); x++) {
      for (UInt_t y=0; y<GetNvar(); y++) {

         prodSig = ( ((*fMeanMatx)(x, 0) - (*fMeanMatx)(x, 2))*
                     ((*fMeanMatx)(y, 0) - (*fMeanMatx)(y, 2)) );
         prodBgd = ( ((*fMeanMatx)(x, 1) - (*fMeanMatx)(x, 2))*
                     ((*fMeanMatx)(y, 1) - (*fMeanMatx)(y, 2)) );

         (*fBetw)(x, y) = (fSumOfWeightsS*prodSig + fSumOfWeightsB*prodBgd) / (fSumOfWeightsS + fSumOfWeightsB);
      }
   }
}

////////////////////////////////////////////////////////////////////////////////
/// compute full covariance matrix from sum of within and between matrices

void TMVA::MethodFisher::GetCov_Full( void )
{
   for (UInt_t x=0; x<GetNvar(); x++)
      for (UInt_t y=0; y<GetNvar(); y++)
         (*fCov)(x, y) = (*fWith)(x, y) + (*fBetw)(x, y);
}

////////////////////////////////////////////////////////////////////////////////
/// Fisher = Sum { [coeff]*[variables] }
///
/// let Xs be the array of the mean values of variables for signal evts
/// let Xb be the array of the mean values of variables for backgd evts
/// let InvWith be the inverse matrix of the 'within class' covariance matrix
///
/// then the array of Fisher coefficients is
/// [coeff] =sqrt(fNsig*fNbgd)/fNevt*transpose{Xs-Xb}*InvWith

void TMVA::MethodFisher::GetFisherCoeff( void )
{
   // assert required
   assert( fSumOfWeightsS > 0 && fSumOfWeightsB > 0);

   // invert covariance matrix
   TMatrixD* theMat = 0;
   switch (GetFisherMethod()) {
   case kFisher:
      theMat = fWith;
      break;
   case kMahalanobis:
      theMat = fCov;
      break;
   default:
      Log() << kFATAL << "<GetFisherCoeff> undefined method " << GetFisherMethod() << Endl;
   }

   TMatrixD invCov( *theMat );

   if ( TMath::Abs(invCov.Determinant()) < 10E-24 ) {
      Log() << kWARNING << "<GetFisherCoeff> matrix is almost singular with determinant="
            << TMath::Abs(invCov.Determinant())
            << " -- did you use variables that are linear combinations of each other or highly correlated?"
            << Endl;
   }
   if ( TMath::Abs(invCov.Determinant()) < 10E-120 ) {
      theMat->Print();
      Log() << kFATAL << "<GetFisherCoeff> matrix is singular with determinant="
            << TMath::Abs(invCov.Determinant())
            << " -- did you use variables that are linear combinations of each other? \n"
            << " Do you have any clue as to what went wrong in the above printout of the covariance matrix? "
            << Endl;
   }

   invCov.Invert();

   // apply rescaling factor
   Double_t xfact = TMath::Sqrt( fSumOfWeightsS*fSumOfWeightsB ) / (fSumOfWeightsS + fSumOfWeightsB);

   // compute difference of mean values
   std::vector<Double_t> diffMeans( GetNvar() );
   UInt_t ivar, jvar;
   for (ivar=0; ivar<GetNvar(); ivar++) {
      (*fFisherCoeff)[ivar] = 0;

      for (jvar=0; jvar<GetNvar(); jvar++) {
         Double_t d = (*fMeanMatx)(jvar, 0) - (*fMeanMatx)(jvar, 1);
         (*fFisherCoeff)[ivar] += invCov(ivar, jvar)*d;
      }
      // rescale
      (*fFisherCoeff)[ivar] *= xfact;
   }


   // offset correction
   fF0 = 0.0;
   for (ivar=0; ivar<GetNvar(); ivar++){
      fF0 += (*fFisherCoeff)[ivar]*((*fMeanMatx)(ivar, 0) + (*fMeanMatx)(ivar, 1));
   }
   fF0 /= -2.0;
}

////////////////////////////////////////////////////////////////////////////////
/// computation of discrimination power indicator for each variable
/// small values of "fWith" indicate compact signal and background classes
/// large values of "fBetw" indicate a large separation between sig & backgd
///
/// we want signal & backgd classes as compact and separated as possible;
/// the discriminating power is computed here as the ratio "fBetw/fCov" of the
/// diagonal elements, where fCov = fWith + fBetw is the full covariance matrix

void TMVA::MethodFisher::GetDiscrimPower( void )
{
   for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
      if ((*fCov)(ivar, ivar) != 0)
         (*fDiscrimPow)[ivar] = (*fBetw)(ivar, ivar)/(*fCov)(ivar, ivar);
      else
         (*fDiscrimPow)[ivar] = 0;
   }
}

////////////////////////////////////////////////////////////////////////////////
/// computes ranking of input variables

const TMVA::Ranking* TMVA::MethodFisher::CreateRanking()
{
   // create the ranking object
   fRanking = new Ranking( GetName(), "Discr. power" );

   for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
      fRanking->AddRank( Rank( GetInputLabel(ivar), (*fDiscrimPow)[ivar] ) );
   }

   return fRanking;
}

////////////////////////////////////////////////////////////////////////////////
/// display Fisher coefficients for each variable and, if the "Normalise"
/// option is used, the normalisation applied to each variable
/// (column width is set from the maximum length of the variable names)

void TMVA::MethodFisher::PrintCoefficients( void )
{
   Log() << kHEADER << "Results for Fisher coefficients:" << Endl;

   if (GetTransformationHandler().GetTransformationList().GetSize() != 0) {
      Log() << kINFO << "NOTE: The coefficients must be applied to TRANSFORMED variables" << Endl;
      Log() << kINFO << "  List of the transformations: " << Endl;
      TListIter trIt(&GetTransformationHandler().GetTransformationList());
      while (VariableTransformBase *trf = (VariableTransformBase*) trIt()) {
         Log() << kINFO << "  -- " << trf->GetName() << Endl;
      }
   }
   std::vector<TString>  vars;
   std::vector<Double_t> coeffs;
   for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
      vars  .push_back( GetInputLabel(ivar) );
      coeffs.push_back( (*fFisherCoeff)[ivar] );
   }
   vars  .push_back( "(offset)" );
   coeffs.push_back( fF0 );
   TMVA::gTools().FormattedOutput( coeffs, vars, "Variable" , "Coefficient", Log() );

   // for (int i=0; i<coeffs.size(); i++)
   //    std::cout << "fisher coeff["<<i<<"]="<<coeffs[i]<<std::endl;

   if (IsNormalised()) {
      Log() << kINFO << "NOTE: You have chosen to use the \"Normalise\" booking option. Hence, the" << Endl;
      Log() << kINFO << "      coefficients must be applied to NORMALISED (') variables as follows:" << Endl;
      Int_t maxL = 0;
      for (UInt_t ivar=0; ivar<GetNvar(); ivar++) if (GetInputLabel(ivar).Length() > maxL) maxL = GetInputLabel(ivar).Length();

      // Print normalisation expression (see Tools.cxx): "2*(x - xmin)/(xmax - xmin) - 1.0"
      for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
         Log() << kINFO
               << std::setw(maxL+9) << TString("[") + GetInputLabel(ivar) + "]' = 2*("
               << std::setw(maxL+2) << TString("[") + GetInputLabel(ivar) + "]"
               << std::setw(3) << (GetXmin(ivar) > 0 ? " - " : " + ")
               << std::setw(6) << TMath::Abs(GetXmin(ivar)) << std::setw(3) << ")/"
               << std::setw(6) << (GetXmax(ivar) - GetXmin(ivar) )
               << std::setw(3) << " - 1"
               << Endl;
      }
      Log() << kINFO << "The TMVA Reader will properly account for this normalisation, but if the" << Endl;
      Log() << kINFO << "Fisher classifier is applied outside the Reader, the transformation must be" << Endl;
      Log() << kINFO << "implemented -- or the \"Normalise\" option is removed and Fisher retrained." << Endl;
      Log() << kINFO << Endl;
   }
}

////////////////////////////////////////////////////////////////////////////////
/// read Fisher coefficients from weight file

void TMVA::MethodFisher::ReadWeightsFromStream( std::istream& istr )
{
   istr >> fF0;
   for (UInt_t ivar=0; ivar<GetNvar(); ivar++) istr >> (*fFisherCoeff)[ivar];
}

////////////////////////////////////////////////////////////////////////////////
/// create XML description of Fisher classifier

void TMVA::MethodFisher::AddWeightsXMLTo( void* parent ) const
{
   void* wght = gTools().AddChild(parent, "Weights");
   gTools().AddAttr( wght, "NCoeff", GetNvar()+1 );
   void* coeffxml = gTools().AddChild(wght, "Coefficient");
   gTools().AddAttr( coeffxml, "Index", 0 );
   gTools().AddAttr( coeffxml, "Value", fF0 );
   for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
      coeffxml = gTools().AddChild( wght, "Coefficient" );
      gTools().AddAttr( coeffxml, "Index", ivar+1 );
      gTools().AddAttr( coeffxml, "Value", (*fFisherCoeff)[ivar] );
   }
}

////////////////////////////////////////////////////////////////////////////////
/// read Fisher coefficients from xml weight file

void TMVA::MethodFisher::ReadWeightsFromXML( void* wghtnode )
{
   UInt_t ncoeff, coeffidx;
   gTools().ReadAttr( wghtnode, "NCoeff", ncoeff );
   fFisherCoeff->resize(ncoeff-1);

   void* ch = gTools().GetChild(wghtnode);
   Double_t coeff;
   while (ch) {
      gTools().ReadAttr( ch, "Index", coeffidx );
      gTools().ReadAttr( ch, "Value", coeff );
      if (coeffidx==0) fF0 = coeff;
      else             (*fFisherCoeff)[coeffidx-1] = coeff;
      ch = gTools().GetNextChild(ch);
   }
}

////////////////////////////////////////////////////////////////////////////////
/// write Fisher-specific classifier response

void TMVA::MethodFisher::MakeClassSpecific( std::ostream& fout, const TString& className ) const
{
   Int_t dp = fout.precision();
   fout << "   double              fFisher0;" << std::endl;
   fout << "   std::vector<double> fFisherCoefficients;" << std::endl;
   fout << "};" << std::endl;
   fout << "" << std::endl;
   fout << "inline void " << className << "::Initialize() " << std::endl;
   fout << "{" << std::endl;
   fout << "   fFisher0 = " << std::setprecision(12) << fF0 << ";" << std::endl;
   for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
      fout << "   fFisherCoefficients.push_back( " << std::setprecision(12) << (*fFisherCoeff)[ivar] << " );" << std::endl;
   }
   fout << std::endl;
   fout << "   // sanity check" << std::endl;
   fout << "   if (fFisherCoefficients.size() != fNvars) {" << std::endl;
   fout << "      std::cout << \"Problem in class \\\"\" << fClassName << \"\\\"::Initialize: mismatch in number of input values: \"" << std::endl;
   fout << "                << fFisherCoefficients.size() << \" != \" << fNvars << std::endl;" << std::endl;
   fout << "      fStatusIsClean = false;" << std::endl;
   fout << "   }" << std::endl;
   fout << "}" << std::endl;
   fout << std::endl;
   fout << "inline double " << className << "::GetMvaValue__( const std::vector<double>& inputValues ) const" << std::endl;
   fout << "{" << std::endl;
   fout << "   double retval = fFisher0;" << std::endl;
   fout << "   for (size_t ivar = 0; ivar < fNvars; ivar++) {" << std::endl;
   fout << "      retval += fFisherCoefficients[ivar]*inputValues[ivar];" << std::endl;
   fout << "   }" << std::endl;
   fout << std::endl;
   fout << "   return retval;" << std::endl;
   fout << "}" << std::endl;
   fout << std::endl;
   fout << "// Clean up" << std::endl;
   fout << "inline void " << className << "::Clear() " << std::endl;
   fout << "{" << std::endl;
   fout << "   // clear coefficients" << std::endl;
   fout << "   fFisherCoefficients.clear(); " << std::endl;
   fout << "}" << std::endl;
   fout << std::setprecision(dp);
}

////////////////////////////////////////////////////////////////////////////////
/// get help message text
///
/// typical length of text line:
///         "|--------------------------------------------------------------|"

void TMVA::MethodFisher::GetHelpMessage() const
{
   Log() << Endl;
   Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
   Log() << Endl;
   Log() << "Fisher discriminants select events by distinguishing the mean " << Endl;
   Log() << "values of the signal and background distributions in a trans- " << Endl;
   Log() << "formed variable space where linear correlations are removed." << Endl;
   Log() << Endl;
   Log() << "   (More precisely: the \"linear discriminator\" determines" << Endl;
   Log() << "    an axis in the (correlated) hyperspace of the input " << Endl;
   Log() << "    variables such that, when projecting the output classes " << Endl;
   Log() << "    (signal and background) upon this axis, they are pushed " << Endl;
   Log() << "    as far as possible away from each other, while events" << Endl;
   Log() << "    of the same class are confined in a close vicinity. The " << Endl;
   Log() << "    linearity property of this classifier is reflected in the " << Endl;
   Log() << "    metric with which \"far apart\" and \"close vicinity\" are " << Endl;
   Log() << "    determined: the covariance matrix of the discriminating" << Endl;
   Log() << "    variable space.)" << Endl;
   Log() << Endl;
   Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
   Log() << Endl;
   Log() << "Optimal performance for Fisher discriminants is obtained for " << Endl;
   Log() << "linearly correlated Gaussian-distributed variables. Any deviation" << Endl;
   Log() << "from this ideal reduces the achievable separation power. In " << Endl;
   Log() << "particular, no discrimination at all is achieved for a variable" << Endl;
   Log() << "that has the same sample mean for signal and background, even if " << Endl;
   Log() << "the shapes of the distributions are very different. Thus, Fisher " << Endl;
   Log() << "discriminants often benefit from suitable transformations of the " << Endl;
   Log() << "input variables. For example, if a variable x in [-1,1] has a " << Endl;
   Log() << "parabolic signal distribution and a uniform background" << Endl;
   Log() << "distribution, their mean value is zero in both cases, leading " << Endl;
   Log() << "to no separation. The simple transformation x -> |x| renders this " << Endl;
   Log() << "variable powerful for the use in a Fisher discriminant." << Endl;
   Log() << Endl;
   Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
   Log() << Endl;
   Log() << "<None>" << Endl;
}