Logo ROOT  
Reference Guide
 
Loading...
Searching...
No Matches
MethodLikelihood.cxx
Go to the documentation of this file.
1// @(#)root/tmva $Id$
2// Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss, Eckhard von Toerne, Jan Therhaag
3
4/**********************************************************************************
5 * Project: TMVA - a Root-integrated toolkit for multivariate Data analysis *
6 * Package: TMVA *
7 * Class : TMVA::MethodLikelihood *
8 * Web : http://tmva.sourceforge.net *
9 * *
10 * Description: *
11 * Implementation (see header for description) *
12 * *
13 * Authors (alphabetical): *
14 * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
15 * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
16 * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
17 * Jan Therhaag <Jan.Therhaag@cern.ch> - U of Bonn, Germany *
18 * Eckhard v. Toerne <evt@uni-bonn.de> - U. of Bonn, Germany *
19 * *
20 * Copyright (c) 2005-2011: *
21 * CERN, Switzerland *
22 * U. of Victoria, Canada *
23 * MPI-K Heidelberg, Germany *
24 * U. of Bonn, Germany *
25 * *
26 * Redistribution and use in source and binary forms, with or without *
27 * modification, are permitted according to the terms listed in LICENSE *
28 * (http://tmva.sourceforge.net/LICENSE) *
29 **********************************************************************************/
30
31/*! \class TMVA::MethodLikelihood
32\ingroup TMVA
33
34Likelihood analysis ("non-parametric approach")
35
36
37Also implemented is a "diagonalized likelihood approach",
38which improves over the uncorrelated likelihood approach by
39transforming linearly the input variables into a diagonal space,
40using the square-root of the covariance matrix
41
42
43The method of maximum likelihood is the most straightforward, and
44certainly among the most elegant multivariate analyser approaches.
45We define the likelihood ratio, \f$ R_L \f$, for event
46\f$ i \f$, by:
47
48\f[
49R_L(i) = \frac{L_S(i)}{L_B(i) + L_B(i)}
50\f]
51
52Here the signal and background likelihoods, \f$ L_S \f$,
53\f$ L_B \f$, are products of the corresponding probability
54densities, \f$ p_S \f$, \f$ p_B \f$, of the
55\f$ N_{var} \f$ discriminating variables used in the MVA:
56
57\f[
58L_S(i) \ \prod_{j=1}^{N_{var}} p_{Sj} (i)
59\f]
60
61and accordingly for \f$ L_B \f$.
62In practise, TMVA uses polynomial splines to estimate the probability
63density functions (PDF) obtained from the distributions of the
64training variables.
65
66
67Note that in TMVA the output of the likelihood ratio is transformed by:
68
69\f[
70R_L(i) \to R'_L(i) = -\frac{1}{\tau} ln(R_L^{-1}(i) -1)
71\f]
72
73to avoid the occurrence of heavy peaks at \f$ R_L = 0.1 \f$ .
74
75#### Decorrelated (or "diagonalized") Likelihood
76
77The biggest drawback of the Likelihood approach is that it assumes
78that the discriminant variables are uncorrelated. If it were the case,
79it can be proven that the discrimination obtained by the above likelihood
80ratio is optimal, ie, no other method can beat it. However, in most
81practical applications of MVAs correlations are present. </p>
82
83
84Linear correlations, measured from the training sample, can be taken
85into account in a straightforward manner through the square-root
86of the covariance matrix. The square-root of a matrix
87\f$ C \f$ is the matrix \f$ C&prime; \f$ that multiplied with itself
88yields \f$ C \f$: \f$ C \f$=\f$ C&prime;C&prime; \f$. We compute the
89square-root matrix (SQM) by means of diagonalising (\f$ D \f$) the
90covariance matrix:
91
92\f[
93D = S^TCS \Rightarrow C' = S \sqrt{DS^T}
94\f]
95
96and the linear transformation of the linearly correlated into the
97uncorrelated variables space is then given by multiplying the measured
98variable tuple by the inverse of the SQM. Note that these transformations
99are performed for both signal and background separately, since the
100correlation pattern is not the same in the two samples.
101
102
103The above diagonalisation is complete for linearly correlated,
104Gaussian distributed variables only. In real-world examples this
105is not often the case, so that only little additional information
106may be recovered by the diagonalisation procedure. In these cases,
107non-linear methods must be applied.
108*/
109
111
112#include "TMVA/Configurable.h"
114#include "TMVA/DataSet.h"
115#include "TMVA/DataSetInfo.h"
116#include "TMVA/IMethod.h"
117#include "TMVA/MethodBase.h"
118#include "TMVA/MsgLogger.h"
119#include "TMVA/PDF.h"
120#include "TMVA/Ranking.h"
121#include "TMVA/Tools.h"
122#include "TMVA/Types.h"
123#include "TMVA/VariableInfo.h"
124
125#include "TVector.h"
126#include "TMath.h"
127#include "TFile.h"
128#include "TH1.h"
129
130#include <iostream>
131#include <iomanip>
132#include <vector>
133#include <cstdlib>
134
135REGISTER_METHOD(Likelihood)
136
138
139////////////////////////////////////////////////////////////////////////////////
140/// standard constructor
141
143 const TString& methodTitle,
144 DataSetInfo& theData,
145 const TString& theOption ) :
146 TMVA::MethodBase( jobName, Types::kLikelihood, methodTitle, theData, theOption),
147 fEpsilon ( 1.e3 * DBL_MIN ),
148 fTransformLikelihoodOutput( kFALSE ),
149 fDropVariable ( 0 ),
150 fHistSig ( 0 ),
151 fHistBgd ( 0 ),
152 fHistSig_smooth( 0 ),
153 fHistBgd_smooth( 0 ),
154 fDefaultPDFLik ( 0 ),
155 fPDFSig ( 0 ),
156 fPDFBgd ( 0 ),
157 fNsmooth ( 2 ),
158 fNsmoothVarS ( 0 ),
159 fNsmoothVarB ( 0 ),
160 fAverageEvtPerBin( 0 ),
161 fAverageEvtPerBinVarS (0),
162 fAverageEvtPerBinVarB (0),
163 fKDEfineFactor ( 0 ),
164 fInterpolateString(0)
165{
166}
167
168////////////////////////////////////////////////////////////////////////////////
169/// construct likelihood references from file
170
172 const TString& theWeightFile) :
173 TMVA::MethodBase( Types::kLikelihood, theData, theWeightFile),
174 fEpsilon ( 1.e3 * DBL_MIN ),
175 fTransformLikelihoodOutput( kFALSE ),
176 fDropVariable ( 0 ),
177 fHistSig ( 0 ),
178 fHistBgd ( 0 ),
179 fHistSig_smooth( 0 ),
180 fHistBgd_smooth( 0 ),
181 fDefaultPDFLik ( 0 ),
182 fPDFSig ( 0 ),
183 fPDFBgd ( 0 ),
184 fNsmooth ( 2 ),
185 fNsmoothVarS ( 0 ),
186 fNsmoothVarB ( 0 ),
187 fAverageEvtPerBin( 0 ),
188 fAverageEvtPerBinVarS (0),
189 fAverageEvtPerBinVarB (0),
190 fKDEfineFactor ( 0 ),
191 fInterpolateString(0)
192{
193}
194
195////////////////////////////////////////////////////////////////////////////////
196/// destructor
197
199{
200 if (NULL != fDefaultPDFLik) delete fDefaultPDFLik;
201 if (NULL != fHistSig) delete fHistSig;
202 if (NULL != fHistBgd) delete fHistBgd;
203 if (NULL != fHistSig_smooth) delete fHistSig_smooth;
204 if (NULL != fHistBgd_smooth) delete fHistBgd_smooth;
205 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
206 if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
207 if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
208 }
209 if (NULL != fPDFSig) delete fPDFSig;
210 if (NULL != fPDFBgd) delete fPDFBgd;
211}
212
213////////////////////////////////////////////////////////////////////////////////
214/// FDA can handle classification with 2 classes
215
217 UInt_t numberClasses, UInt_t /*numberTargets*/ )
218{
219 if (type == Types::kClassification && numberClasses == 2) return kTRUE;
220 return kFALSE;
221}
222
223////////////////////////////////////////////////////////////////////////////////
224/// default initialisation called by all constructors
225
227{
228 // no ranking test
229 fDropVariable = -1;
230 fHistSig = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
231 fHistBgd = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
232 fHistSig_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
233 fHistBgd_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
234 fPDFSig = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
235 fPDFBgd = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
236}
237
238////////////////////////////////////////////////////////////////////////////////
239/// define the options (their key words) that can be set in the option string
240///
241/// TransformOutput <bool> transform (often strongly peaked) likelihood output through sigmoid inversion
242
244{
245 DeclareOptionRef( fTransformLikelihoodOutput = kFALSE, "TransformOutput",
246 "Transform likelihood output by inverse sigmoid function" );
247
248 // initialize
249
250 // reading every PDF's definition and passing the option string to the next one to be read and marked
251 TString updatedOptions = GetOptions();
252 fDefaultPDFLik = new PDF( TString(GetName()) + " PDF", updatedOptions );
253 fDefaultPDFLik->DeclareOptions();
254 fDefaultPDFLik->ParseOptions();
255 updatedOptions = fDefaultPDFLik->GetOptions();
256 for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
257 (*fPDFSig)[ivar] = new PDF( Form("%s PDF Sig[%d]", GetName(), ivar), updatedOptions,
258 Form("Sig[%d]",ivar), fDefaultPDFLik );
259 (*fPDFSig)[ivar]->DeclareOptions();
260 (*fPDFSig)[ivar]->ParseOptions();
261 updatedOptions = (*fPDFSig)[ivar]->GetOptions();
262 (*fPDFBgd)[ivar] = new PDF( Form("%s PDF Bkg[%d]", GetName(), ivar), updatedOptions,
263 Form("Bkg[%d]",ivar), fDefaultPDFLik );
264 (*fPDFBgd)[ivar]->DeclareOptions();
265 (*fPDFBgd)[ivar]->ParseOptions();
266 updatedOptions = (*fPDFBgd)[ivar]->GetOptions();
267 }
268
269 // the final marked option string is written back to the original likelihood
270 SetOptions( updatedOptions );
271}
272
273
275{
276 // options that are used ONLY for the READER to ensure backward compatibility
277
279 DeclareOptionRef( fNsmooth = 1, "NSmooth",
280 "Number of smoothing iterations for the input histograms");
281 DeclareOptionRef( fAverageEvtPerBin = 50, "NAvEvtPerBin",
282 "Average number of events per PDF bin");
283 DeclareOptionRef( fKDEfineFactor =1. , "KDEFineFactor",
284 "Fine tuning factor for Adaptive KDE: Factor to multiply the width of the kernel");
285 DeclareOptionRef( fBorderMethodString = "None", "KDEborder",
286 "Border effects treatment (1=no treatment , 2=kernel renormalization, 3=sample mirroring)" );
287 DeclareOptionRef( fKDEiterString = "Nonadaptive", "KDEiter",
288 "Number of iterations (1=non-adaptive, 2=adaptive)" );
289 DeclareOptionRef( fKDEtypeString = "Gauss", "KDEtype",
290 "KDE kernel type (1=Gauss)" );
291 fAverageEvtPerBinVarS = new Int_t[GetNvar()];
292 fAverageEvtPerBinVarB = new Int_t[GetNvar()];
293 fNsmoothVarS = new Int_t[GetNvar()];
294 fNsmoothVarB = new Int_t[GetNvar()];
295 fInterpolateString = new TString[GetNvar()];
296 for(UInt_t i=0; i<GetNvar(); ++i) {
297 fAverageEvtPerBinVarS[i] = fAverageEvtPerBinVarB[i] = 0;
298 fNsmoothVarS[i] = fNsmoothVarB[i] = 0;
299 fInterpolateString[i] = "";
300 }
301 DeclareOptionRef( fAverageEvtPerBinVarS, GetNvar(), "NAvEvtPerBinSig",
302 "Average num of events per PDF bin and variable (signal)");
303 DeclareOptionRef( fAverageEvtPerBinVarB, GetNvar(), "NAvEvtPerBinBkg",
304 "Average num of events per PDF bin and variable (background)");
305 DeclareOptionRef(fNsmoothVarS, GetNvar(), "NSmoothSig",
306 "Number of smoothing iterations for the input histograms");
307 DeclareOptionRef(fNsmoothVarB, GetNvar(), "NSmoothBkg",
308 "Number of smoothing iterations for the input histograms");
309 DeclareOptionRef(fInterpolateString, GetNvar(), "PDFInterpol", "Method of interpolating reference histograms (e.g. Spline2 or KDE)");
310}
311
312////////////////////////////////////////////////////////////////////////////////
313/// process user options
314/// reference cut value to distinguish signal-like from background-like events
315
317{
318 SetSignalReferenceCut( TransformLikelihoodOutput( 0.5, 0.5 ) );
319
320 fDefaultPDFLik->ProcessOptions();
321 for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
322 (*fPDFBgd)[ivar]->ProcessOptions();
323 (*fPDFSig)[ivar]->ProcessOptions();
324 }
325}
326
327////////////////////////////////////////////////////////////////////////////////
328/// create reference distributions (PDFs) from signal and background events:
329/// fill histograms and smooth them; if decorrelation is required, compute
330/// corresponding square-root matrices
331/// the reference histograms require the correct boundaries. Since in Likelihood classification
332/// the transformations are applied using both classes, also the corresponding boundaries
333/// need to take this into account
334
336{
337 UInt_t nvar=GetNvar();
338 std::vector<Double_t> xmin(nvar), xmax(nvar);
339 for (UInt_t ivar=0; ivar<nvar; ivar++) {xmin[ivar]=1e30; xmax[ivar]=-1e30;}
340
341 UInt_t nevents=Data()->GetNEvents();
342 for (UInt_t ievt=0; ievt<nevents; ievt++) {
343 // use the true-event-type's transformation
344 // set the event true event types transformation
345 const Event* origEv = Data()->GetEvent(ievt);
346 if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
347 // loop over classes
348 for (int cls=0;cls<2;cls++){
349 GetTransformationHandler().SetTransformationReferenceClass(cls);
350 const Event* ev = GetTransformationHandler().Transform( origEv );
351 for (UInt_t ivar=0; ivar<nvar; ivar++) {
352 Float_t value = ev->GetValue(ivar);
353 if (value < xmin[ivar]) xmin[ivar] = value;
354 if (value > xmax[ivar]) xmax[ivar] = value;
355 }
356 }
357 }
358
359 // create reference histograms
360 // (KDE smoothing requires very finely binned reference histograms)
361 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
362 TString var = (*fInputVars)[ivar];
363
364 // the reference histograms require the correct boundaries. Since in Likelihood classification
365 // the transformations are applied using both classes, also the corresponding boundaries
366 // need to take this into account
367
368 // special treatment for discrete variables
369 if (DataInfo().GetVariableInfo(ivar).GetVarType() == 'I') {
370 // special treatment for integer variables
371 Int_t ixmin = TMath::Nint( xmin[ivar] );
372 xmax[ivar]=xmax[ivar]+1; // make sure that all entries are included in histogram
373 Int_t ixmax = TMath::Nint( xmax[ivar] );
374 Int_t nbins = ixmax - ixmin;
375 (*fHistSig)[ivar] = new TH1F(GetMethodName()+"_"+var + "_sig", var + " signal training", nbins, ixmin, ixmax );
376 (*fHistBgd)[ivar] = new TH1F(GetMethodName()+"_"+var + "_bgd", var + " background training", nbins, ixmin, ixmax );
377 } else {
378
379 UInt_t minNEvt = TMath::Min(Data()->GetNEvtSigTrain(),Data()->GetNEvtBkgdTrain());
380 Int_t nbinsS = (*fPDFSig)[ivar]->GetHistNBins( minNEvt );
381 Int_t nbinsB = (*fPDFBgd)[ivar]->GetHistNBins( minNEvt );
382
383 (*fHistSig)[ivar] = new TH1F( Form("%s_%s_%s_sig",DataInfo().GetName(),GetMethodName().Data(),var.Data()),
384 Form("%s_%s_%s signal training",DataInfo().GetName(),GetMethodName().Data(),var.Data()), nbinsS, xmin[ivar], xmax[ivar] );
385 (*fHistBgd)[ivar] = new TH1F( Form("%s_%s_%s_bgd",DataInfo().GetName(),GetMethodName().Data(),var.Data()),
386 Form("%s_%s_%s background training",DataInfo().GetName(),GetMethodName().Data(),var.Data()), nbinsB, xmin[ivar], xmax[ivar] );
387 }
388 }
389
390 // ----- fill the reference histograms
391 Log() << kINFO << "Filling reference histograms" << Endl;
392
393 // event loop
394 for (Int_t ievt=0; ievt<Data()->GetNEvents(); ievt++) {
395
396 // use the true-event-type's transformation
397 // set the event true event types transformation
398 const Event* origEv = Data()->GetEvent(ievt);
399 if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
400 GetTransformationHandler().SetTransformationReferenceClass( origEv->GetClass() );
401 const Event* ev = GetTransformationHandler().Transform( origEv );
402
403 // the event weight
404 Float_t weight = ev->GetWeight();
405
406 // fill variable vector
407 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
408 Double_t value = ev->GetValue(ivar);
409 // verify limits
410 if (value >= xmax[ivar]) value = xmax[ivar] - 1.0e-10;
411 else if (value < xmin[ivar]) value = xmin[ivar] + 1.0e-10;
412 // inserting check if there are events in overflow or underflow
413 if (value >=(*fHistSig)[ivar]->GetXaxis()->GetXmax() ||
414 value <(*fHistSig)[ivar]->GetXaxis()->GetXmin()){
415 Log()<<kWARNING
416 <<"error in filling likelihood reference histograms var="
417 <<(*fInputVars)[ivar]
418 << ", xmin="<<(*fHistSig)[ivar]->GetXaxis()->GetXmin()
419 << ", value="<<value
420 << ", xmax="<<(*fHistSig)[ivar]->GetXaxis()->GetXmax()
421 << Endl;
422 }
423 if (DataInfo().IsSignal(ev)) (*fHistSig)[ivar]->Fill( value, weight );
424 else (*fHistBgd)[ivar]->Fill( value, weight );
425 }
426 }
427
428 // building the pdfs
429 Log() << kINFO << "Building PDF out of reference histograms" << Endl;
430 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
431
432 // the PDF is built from (binned) reference histograms
433 // in case of KDE, this has a large number of bins, which makes it quasi-unbinned
434 (*fPDFSig)[ivar]->BuildPDF( (*fHistSig)[ivar] );
435 (*fPDFBgd)[ivar]->BuildPDF( (*fHistBgd)[ivar] );
436
437 (*fPDFSig)[ivar]->ValidatePDF( (*fHistSig)[ivar] );
438 (*fPDFBgd)[ivar]->ValidatePDF( (*fHistBgd)[ivar] );
439
440 // saving the smoothed histograms
441 if ((*fPDFSig)[ivar]->GetSmoothedHist() != 0) (*fHistSig_smooth)[ivar] = (*fPDFSig)[ivar]->GetSmoothedHist();
442 if ((*fPDFBgd)[ivar]->GetSmoothedHist() != 0) (*fHistBgd_smooth)[ivar] = (*fPDFBgd)[ivar]->GetSmoothedHist();
443 }
444 ExitFromTraining();
445}
446
447////////////////////////////////////////////////////////////////////////////////
448/// returns the likelihood estimator for signal
449/// fill a new Likelihood branch into the testTree
450
452{
453 UInt_t ivar;
454
455 // cannot determine error
456 NoErrorCalc(err, errUpper);
457
458 // retrieve variables, and transform, if required
459 TVector vs( GetNvar() );
460 TVector vb( GetNvar() );
461
462 // need to distinguish signal and background in case of variable transformation
463 // signal first
464
465 GetTransformationHandler().SetTransformationReferenceClass( fSignalClass );
466 // temporary: JS --> FIX
467 //GetTransformationHandler().SetTransformationReferenceClass( 0 );
468 const Event* ev = GetEvent();
469 for (ivar=0; ivar<GetNvar(); ivar++) vs(ivar) = ev->GetValue(ivar);
470
471 GetTransformationHandler().SetTransformationReferenceClass( fBackgroundClass );
472 // temporary: JS --> FIX
473 //GetTransformationHandler().SetTransformationReferenceClass( 1 );
474 ev = GetEvent();
475 for (ivar=0; ivar<GetNvar(); ivar++) vb(ivar) = ev->GetValue(ivar);
476
477 // compute the likelihood (signal)
478 Double_t ps(1), pb(1), p(0);
479 for (ivar=0; ivar<GetNvar(); ivar++) {
480
481 // drop one variable (this is ONLY used for internal variable ranking !)
482 if ((Int_t)ivar == fDropVariable) continue;
483
484 Double_t x[2] = { vs(ivar), vb(ivar) };
485
486 for (UInt_t itype=0; itype < 2; itype++) {
487
488 // verify limits
489 if (x[itype] >= (*fPDFSig)[ivar]->GetXmax()) x[itype] = (*fPDFSig)[ivar]->GetXmax() - 1.0e-10;
490 else if (x[itype] < (*fPDFSig)[ivar]->GetXmin()) x[itype] = (*fPDFSig)[ivar]->GetXmin();
491
492 // find corresponding histogram from cached indices
493 PDF* pdf = (itype == 0) ? (*fPDFSig)[ivar] : (*fPDFBgd)[ivar];
494 if (pdf == 0) Log() << kFATAL << "<GetMvaValue> Reference histograms don't exist" << Endl;
495 TH1* hist = pdf->GetPDFHist();
496
497 // interpolate linearly between adjacent bins
498 // this is not useful for discrete variables
499 Int_t bin = hist->FindBin(x[itype]);
500
501 // **** POTENTIAL BUG: PREFORMANCE IS WORSE WHEN USING TRUE TYPE ***
502 // ==> commented out at present
503 if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0 ||
504 DataInfo().GetVariableInfo(ivar).GetVarType() == 'N') {
505 p = TMath::Max( hist->GetBinContent(bin), fEpsilon );
506 } else { // splined PDF
507 Int_t nextbin = bin;
508 if ((x[itype] > hist->GetBinCenter(bin) && bin != hist->GetNbinsX()) || bin == 1)
509 nextbin++;
510 else
511 nextbin--;
512
513
514 Double_t dx = hist->GetBinCenter(bin) - hist->GetBinCenter(nextbin);
515 Double_t dy = hist->GetBinContent(bin) - hist->GetBinContent(nextbin);
516 Double_t like = hist->GetBinContent(bin) + (x[itype] - hist->GetBinCenter(bin)) * dy/dx;
517
518 p = TMath::Max( like, fEpsilon );
519 }
520
521 if (itype == 0) ps *= p;
522 else pb *= p;
523 }
524 }
525
526 // the likelihood ratio (transform it ?)
527 return TransformLikelihoodOutput( ps, pb );
528}
529
530////////////////////////////////////////////////////////////////////////////////
531/// returns transformed or non-transformed output
532
534{
535 if (ps < fEpsilon) ps = fEpsilon;
536 if (pb < fEpsilon) pb = fEpsilon;
537 Double_t r = ps/(ps + pb);
538 if (r >= 1.0) r = 1. - 1.e-15;
539
540 if (fTransformLikelihoodOutput) {
541 // inverse Fermi function
542
543 // sanity check
544 if (r <= 0.0) r = fEpsilon;
545 else if (r >= 1.0) r = 1. - 1.e-15;
546
547 Double_t tau = 15.0;
548 r = - TMath::Log(1.0/r - 1.0)/tau;
549 }
550
551 return r;
552}
553
554////////////////////////////////////////////////////////////////////////////////
555/// write options to stream
556
557void TMVA::MethodLikelihood::WriteOptionsToStream( std::ostream& o, const TString& prefix ) const
558{
560
561 // writing the options defined for the different pdfs
562 if (fDefaultPDFLik != 0) {
563 o << prefix << std::endl << prefix << "#Default Likelihood PDF Options:" << std::endl << prefix << std::endl;
564 fDefaultPDFLik->WriteOptionsToStream( o, prefix );
565 }
566 for (UInt_t ivar = 0; ivar < fPDFSig->size(); ivar++) {
567 if ((*fPDFSig)[ivar] != 0) {
568 o << prefix << std::endl << prefix << Form("#Signal[%d] Likelihood PDF Options:",ivar) << std::endl << prefix << std::endl;
569 (*fPDFSig)[ivar]->WriteOptionsToStream( o, prefix );
570 }
571 if ((*fPDFBgd)[ivar] != 0) {
572 o << prefix << std::endl << prefix << "#Background[%d] Likelihood PDF Options:" << std::endl << prefix << std::endl;
573 (*fPDFBgd)[ivar]->WriteOptionsToStream( o, prefix );
574 }
575 }
576}
577
578////////////////////////////////////////////////////////////////////////////////
579/// write weights to XML
580
582{
583 void* wght = gTools().AddChild(parent, "Weights");
584 gTools().AddAttr(wght, "NVariables", GetNvar());
585 gTools().AddAttr(wght, "NClasses", 2);
586 void* pdfwrap;
587 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
588 if ( (*fPDFSig)[ivar]==0 || (*fPDFBgd)[ivar]==0 )
589 Log() << kFATAL << "Reference histograms for variable " << ivar
590 << " don't exist, can't write it to weight file" << Endl;
591 pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
592 gTools().AddAttr(pdfwrap, "VarIndex", ivar);
593 gTools().AddAttr(pdfwrap, "ClassIndex", 0);
594 (*fPDFSig)[ivar]->AddXMLTo(pdfwrap);
595 pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
596 gTools().AddAttr(pdfwrap, "VarIndex", ivar);
597 gTools().AddAttr(pdfwrap, "ClassIndex", 1);
598 (*fPDFBgd)[ivar]->AddXMLTo(pdfwrap);
599 }
600}
601
602////////////////////////////////////////////////////////////////////////////////
603/// computes ranking of input variables
604
606{
607 // create the ranking object
608 if (fRanking) delete fRanking;
609 fRanking = new Ranking( GetName(), "Delta Separation" );
610
611 Double_t sepRef = -1, sep = -1;
612 for (Int_t ivar=-1; ivar<(Int_t)GetNvar(); ivar++) {
613
614 // this variable should not be used
615 fDropVariable = ivar;
616
617 TString nameS = Form( "rS_%i", ivar+1 );
618 TString nameB = Form( "rB_%i", ivar+1 );
619 TH1* rS = new TH1F( nameS, nameS, 80, 0, 1 );
620 TH1* rB = new TH1F( nameB, nameB, 80, 0, 1 );
621
622 // the event loop
623 for (Int_t ievt=0; ievt<Data()->GetNTrainingEvents(); ievt++) {
624
625 const Event* origEv = Data()->GetEvent(ievt);
626 GetTransformationHandler().SetTransformationReferenceClass( origEv->GetClass() );
627 const Event* ev = GetTransformationHandler().Transform(Data()->GetEvent(ievt));
628
629 Double_t lk = this->GetMvaValue();
630 Double_t w = ev->GetWeight();
631 if (DataInfo().IsSignal(ev)) rS->Fill( lk, w );
632 else rB->Fill( lk, w );
633 }
634
635 // compute separation
636 sep = TMVA::gTools().GetSeparation( rS, rB );
637 if (ivar == -1) sepRef = sep;
638 sep = sepRef - sep;
639
640 // don't need these histograms anymore
641 delete rS;
642 delete rB;
643
644 if (ivar >= 0) fRanking->AddRank( Rank( DataInfo().GetVariableInfo(ivar).GetInternalName(), sep ) );
645 }
646
647 fDropVariable = -1;
648
649 return fRanking;
650}
651
652////////////////////////////////////////////////////////////////////////////////
653/// write reference PDFs to ROOT file
654
656{
657 TString pname = "PDF_";
658 for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
659 (*fPDFSig)[ivar]->Write( pname + GetInputVar( ivar ) + "_S" );
660 (*fPDFBgd)[ivar]->Write( pname + GetInputVar( ivar ) + "_B" );
661 }
662}
663
664////////////////////////////////////////////////////////////////////////////////
665/// read weights from XML
666
668{
669 TString pname = "PDF_";
670 Bool_t addDirStatus = TH1::AddDirectoryStatus();
671 TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
672 UInt_t nvars=0;
673 gTools().ReadAttr(wghtnode, "NVariables",nvars);
674 void* descnode = gTools().GetChild(wghtnode);
675 for (UInt_t ivar=0; ivar<nvars; ivar++){
676 void* pdfnode = gTools().GetChild(descnode);
677 Log() << kDEBUG << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
678 if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
679 if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
680 (*fPDFSig)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Sig" );
681 (*fPDFBgd)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Bkg" );
682 (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
683 (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
684 (*(*fPDFSig)[ivar]).ReadXML(pdfnode);
685 descnode = gTools().GetNextChild(descnode);
686 pdfnode = gTools().GetChild(descnode);
687 (*(*fPDFBgd)[ivar]).ReadXML(pdfnode);
688 descnode = gTools().GetNextChild(descnode);
689 }
690 TH1::AddDirectory(addDirStatus);
691}
692
693////////////////////////////////////////////////////////////////////////////////
694/// read weight info from file
695/// nothing to do for this method
696
698{
699 TString pname = "PDF_";
700 Bool_t addDirStatus = TH1::AddDirectoryStatus();
701 TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
702 for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
703 Log() << kDEBUG << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
704 if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
705 if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
706 (*fPDFSig)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Sig" );
707 (*fPDFBgd)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Bkg");
708 (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
709 (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
710 istr >> *(*fPDFSig)[ivar];
711 istr >> *(*fPDFBgd)[ivar];
712 }
713 TH1::AddDirectory(addDirStatus);
714}
715
716////////////////////////////////////////////////////////////////////////////////
717/// read reference PDF from ROOT file
718
720{
721 TString pname = "PDF_";
722 Bool_t addDirStatus = TH1::AddDirectoryStatus();
723 TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
724 for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
725 (*fPDFSig)[ivar] = (TMVA::PDF*)rf.Get( Form( "PDF_%s_S", GetInputVar( ivar ).Data() ) );
726 (*fPDFBgd)[ivar] = (TMVA::PDF*)rf.Get( Form( "PDF_%s_B", GetInputVar( ivar ).Data() ) );
727 }
728 TH1::AddDirectory(addDirStatus);
729}
730
731////////////////////////////////////////////////////////////////////////////////
732/// write histograms and PDFs to file for monitoring purposes
733
735{
736 Log() << kINFO << "Write monitoring histograms to file: " << BaseDir()->GetPath() << Endl;
737 BaseDir()->cd();
738
739 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
740 (*fHistSig)[ivar]->Write();
741 (*fHistBgd)[ivar]->Write();
742 if ((*fHistSig_smooth)[ivar] != 0) (*fHistSig_smooth)[ivar]->Write();
743 if ((*fHistBgd_smooth)[ivar] != 0) (*fHistBgd_smooth)[ivar]->Write();
744 (*fPDFSig)[ivar]->GetPDFHist()->Write();
745 (*fPDFBgd)[ivar]->GetPDFHist()->Write();
746
747 if ((*fPDFSig)[ivar]->GetNSmoothHist() != 0) (*fPDFSig)[ivar]->GetNSmoothHist()->Write();
748 if ((*fPDFBgd)[ivar]->GetNSmoothHist() != 0) (*fPDFBgd)[ivar]->GetNSmoothHist()->Write();
749
750 // add special plots to check the smoothing in the GetVal method
751 Float_t xmin=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmin();
752 Float_t xmax=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmax();
753 TH1F* mm = new TH1F( (*fInputVars)[ivar]+"_additional_check",
754 (*fInputVars)[ivar]+"_additional_check", 15000, xmin, xmax );
755 Double_t intBin = (xmax-xmin)/15000;
756 for (Int_t bin=0; bin < 15000; bin++) {
757 Double_t x = (bin + 0.5)*intBin + xmin;
758 mm->SetBinContent(bin+1 ,(*fPDFSig)[ivar]->GetVal(x));
759 }
760 mm->Write();
761
762 // ---------- create cloned low-binned histogram for comparison in macros (mainly necessary for KDE)
763 TH1* h[2] = { (*fHistSig)[ivar], (*fHistBgd)[ivar] };
764 for (UInt_t i=0; i<2; i++) {
765 TH1* hclone = (TH1F*)h[i]->Clone( TString(h[i]->GetName()) + "_nice" );
766 hclone->SetName ( TString(h[i]->GetName()) + "_nice" );
767 hclone->SetTitle( TString(h[i]->GetTitle()) + "" );
768 if (hclone->GetNbinsX() > 100) {
769 Int_t resFactor = 5;
770 hclone->Rebin( resFactor );
771 hclone->Scale( 1.0/resFactor );
772 }
773 hclone->Write();
774 }
775 // ----------
776 }
777}
778
779////////////////////////////////////////////////////////////////////////////////
780/// write specific header of the classifier (mostly include files)
781
782void TMVA::MethodLikelihood::MakeClassSpecificHeader( std::ostream& fout, const TString& ) const
783{
784 fout << "#include <math.h>" << std::endl;
785 fout << "#include <cstdlib>" << std::endl;
786}
787
788////////////////////////////////////////////////////////////////////////////////
789/// write specific classifier response
790
791void TMVA::MethodLikelihood::MakeClassSpecific( std::ostream& fout, const TString& className ) const
792{
793 Int_t dp = fout.precision();
794 fout << " double fEpsilon;" << std::endl;
795
796 Int_t * nbin = new Int_t[GetNvar()];
797
798 Int_t nbinMax=-1;
799 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
800 nbin[ivar]=(*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX();
801 if (nbin[ivar] > nbinMax) nbinMax=nbin[ivar];
802 }
803
804 fout << " static float fRefS[][" << nbinMax << "]; "
805 << "// signal reference vector [nvars][max_nbins]" << std::endl;
806 fout << " static float fRefB[][" << nbinMax << "]; "
807 << "// backgr reference vector [nvars][max_nbins]" << std::endl << std::endl;
808 fout << "// if a variable has its PDF encoded as a spline0 --> treat it like an Integer valued one" <<std::endl;
809 fout << " bool fHasDiscretPDF[" << GetNvar() <<"]; "<< std::endl;
810 fout << " int fNbin[" << GetNvar() << "]; "
811 << "// number of bins (discrete variables may have less bins)" << std::endl;
812 fout << " double fHistMin[" << GetNvar() << "]; " << std::endl;
813 fout << " double fHistMax[" << GetNvar() << "]; " << std::endl;
814
815 fout << " double TransformLikelihoodOutput( double, double ) const;" << std::endl;
816 fout << "};" << std::endl;
817 fout << "" << std::endl;
818 fout << "inline void " << className << "::Initialize() " << std::endl;
819 fout << "{" << std::endl;
820 fout << " fEpsilon = " << fEpsilon << ";" << std::endl;
821 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
822 fout << " fNbin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ";" << std::endl;
823 fout << " fHistMin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmin() << ";" << std::endl;
824 fout << " fHistMax[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmax() << ";" << std::endl;
825 // sanity check (for previous code lines)
826 if ((((*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar] ||
827 (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar])
828 ) ||
829 (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()) {
830 Log() << kFATAL << "<MakeClassSpecific> Mismatch in binning of variable "
831 << "\"" << GetOriginalVarName(ivar) << "\" of type: \'" << DataInfo().GetVariableInfo(ivar).GetVarType()
832 << "\' : "
833 << "nxS = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ", "
834 << "nxB = " << (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()
835 << " while we expect " << nbin[ivar]
836 << Endl;
837 }
838 }
839 for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
840 if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0)
841 fout << " fHasDiscretPDF[" << ivar <<"] = true; " << std::endl;
842 else
843 fout << " fHasDiscretPDF[" << ivar <<"] = false; " << std::endl;
844 }
845
846 fout << "}" << std::endl << std::endl;
847
848 fout << "inline double " << className
849 << "::GetMvaValue__( const std::vector<double>& inputValues ) const" << std::endl;
850 fout << "{" << std::endl;
851 fout << " double ps(1), pb(1);" << std::endl;
852 fout << " std::vector<double> inputValuesSig = inputValues;" << std::endl;
853 fout << " std::vector<double> inputValuesBgd = inputValues;" << std::endl;
854 if (GetTransformationHandler().GetTransformationList().GetSize() != 0) {
855 fout << " Transform(inputValuesSig,0);" << std::endl;
856 fout << " Transform(inputValuesBgd,1);" << std::endl;
857 }
858 fout << " for (size_t ivar = 0; ivar < GetNvar(); ivar++) {" << std::endl;
859 fout << std::endl;
860 fout << " // dummy at present... will be used for variable transforms" << std::endl;
861 fout << " double x[2] = { inputValuesSig[ivar], inputValuesBgd[ivar] };" << std::endl;
862 fout << std::endl;
863 fout << " for (int itype=0; itype < 2; itype++) {" << std::endl;
864 fout << std::endl;
865 fout << " // interpolate linearly between adjacent bins" << std::endl;
866 fout << " // this is not useful for discrete variables (or forced Spline0)" << std::endl;
867 fout << " int bin = int((x[itype] - fHistMin[ivar])/(fHistMax[ivar] - fHistMin[ivar])*fNbin[ivar]) + 0;" << std::endl;
868 fout << std::endl;
869 fout << " // since the test data sample is in general different from the training sample" << std::endl;
870 fout << " // it can happen that the min/max of the training sample are trespassed --> correct this" << std::endl;
871 fout << " if (bin < 0) {" << std::endl;
872 fout << " bin = 0;" << std::endl;
873 fout << " x[itype] = fHistMin[ivar];" << std::endl;
874 fout << " }" << std::endl;
875 fout << " else if (bin >= fNbin[ivar]) {" << std::endl;
876 fout << " bin = fNbin[ivar]-1;" << std::endl;
877 fout << " x[itype] = fHistMax[ivar];" << std::endl;
878 fout << " }" << std::endl;
879 fout << std::endl;
880 fout << " // find corresponding histogram from cached indices" << std::endl;
881 fout << " float ref = (itype == 0) ? fRefS[ivar][bin] : fRefB[ivar][bin];" << std::endl;
882 fout << std::endl;
883 fout << " // sanity check" << std::endl;
884 fout << " if (ref < 0) {" << std::endl;
885 fout << " std::cout << \"Fatal error in " << className
886 << ": bin entry < 0 ==> abort\" << std::endl;" << std::endl;
887 fout << " std::exit(1);" << std::endl;
888 fout << " }" << std::endl;
889 fout << std::endl;
890 fout << " double p = ref;" << std::endl;
891 fout << std::endl;
892 fout << " if (GetType(ivar) != 'I' && !fHasDiscretPDF[ivar]) {" << std::endl;
893 fout << " float bincenter = (bin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
894 fout << " int nextbin = bin;" << std::endl;
895 fout << " if ((x[itype] > bincenter && bin != fNbin[ivar]-1) || bin == 0) " << std::endl;
896 fout << " nextbin++;" << std::endl;
897 fout << " else" << std::endl;
898 fout << " nextbin--; " << std::endl;
899 fout << std::endl;
900 fout << " double refnext = (itype == 0) ? fRefS[ivar][nextbin] : fRefB[ivar][nextbin];" << std::endl;
901 fout << " float nextbincenter = (nextbin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
902 fout << std::endl;
903 fout << " double dx = bincenter - nextbincenter;" << std::endl;
904 fout << " double dy = ref - refnext;" << std::endl;
905 fout << " p += (x[itype] - bincenter) * dy/dx;" << std::endl;
906 fout << " }" << std::endl;
907 fout << std::endl;
908 fout << " if (p < fEpsilon) p = fEpsilon; // avoid zero response" << std::endl;
909 fout << std::endl;
910 fout << " if (itype == 0) ps *= p;" << std::endl;
911 fout << " else pb *= p;" << std::endl;
912 fout << " } " << std::endl;
913 fout << " } " << std::endl;
914 fout << std::endl;
915 fout << " // the likelihood ratio (transform it ?)" << std::endl;
916 fout << " return TransformLikelihoodOutput( ps, pb ); " << std::endl;
917 fout << "}" << std::endl << std::endl;
918
919 fout << "inline double " << className << "::TransformLikelihoodOutput( double ps, double pb ) const" << std::endl;
920 fout << "{" << std::endl;
921 fout << " // returns transformed or non-transformed output" << std::endl;
922 fout << " if (ps < fEpsilon) ps = fEpsilon;" << std::endl;
923 fout << " if (pb < fEpsilon) pb = fEpsilon;" << std::endl;
924 fout << " double r = ps/(ps + pb);" << std::endl;
925 fout << " if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
926 fout << std::endl;
927 fout << " if (" << (fTransformLikelihoodOutput ? "true" : "false") << ") {" << std::endl;
928 fout << " // inverse Fermi function" << std::endl;
929 fout << std::endl;
930 fout << " // sanity check" << std::endl;
931 fout << " if (r <= 0.0) r = fEpsilon;" << std::endl;
932 fout << " else if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
933 fout << std::endl;
934 fout << " double tau = 15.0;" << std::endl;
935 fout << " r = - log(1.0/r - 1.0)/tau;" << std::endl;
936 fout << " }" << std::endl;
937 fout << std::endl;
938 fout << " return r;" << std::endl;
939 fout << "}" << std::endl;
940 fout << std::endl;
941
942 fout << "// Clean up" << std::endl;
943 fout << "inline void " << className << "::Clear() " << std::endl;
944 fout << "{" << std::endl;
945 fout << " // nothing to clear" << std::endl;
946 fout << "}" << std::endl << std::endl;
947
948 fout << "// signal map" << std::endl;
949 fout << "float " << className << "::fRefS[][" << nbinMax << "] = " << std::endl;
950 fout << "{ " << std::endl;
951 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
952 fout << " { ";
953 for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
954 if (ibin-1 < nbin[ivar])
955 fout << (*fPDFSig)[ivar]->GetPDFHist()->GetBinContent(ibin);
956 else
957 fout << -1;
958
959 if (ibin < nbinMax) fout << ", ";
960 }
961 fout << " }, " << std::endl;
962 }
963 fout << "}; " << std::endl;
964 fout << std::endl;
965
966 fout << "// background map" << std::endl;
967 fout << "float " << className << "::fRefB[][" << nbinMax << "] = " << std::endl;
968 fout << "{ " << std::endl;
969 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
970 fout << " { ";
971 fout << std::setprecision(8);
972 for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
973 if (ibin-1 < nbin[ivar])
974 fout << (*fPDFBgd)[ivar]->GetPDFHist()->GetBinContent(ibin);
975 else
976 fout << -1;
977
978 if (ibin < nbinMax) fout << ", ";
979 }
980 fout << " }, " << std::endl;
981 }
982 fout << "}; " << std::endl;
983 fout << std::endl;
984 fout << std::setprecision(dp);
985
986 delete[] nbin;
987}
988
989////////////////////////////////////////////////////////////////////////////////
990/// get help message text
991///
992/// typical length of text line:
993/// "|--------------------------------------------------------------|"
994
996{
997 Log() << Endl;
998 Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
999 Log() << Endl;
1000 Log() << "The maximum-likelihood classifier models the data with probability " << Endl;
1001 Log() << "density functions (PDF) reproducing the signal and background" << Endl;
1002 Log() << "distributions of the input variables. Correlations among the " << Endl;
1003 Log() << "variables are ignored." << Endl;
1004 Log() << Endl;
1005 Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
1006 Log() << Endl;
1007 Log() << "Required for good performance are decorrelated input variables" << Endl;
1008 Log() << "(PCA transformation via the option \"VarTransform=Decorrelate\"" << Endl;
1009 Log() << "may be tried). Irreducible non-linear correlations may be reduced" << Endl;
1010 Log() << "by precombining strongly correlated input variables, or by simply" << Endl;
1011 Log() << "removing one of the variables." << Endl;
1012 Log() << Endl;
1013 Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
1014 Log() << Endl;
1015 Log() << "High fidelity PDF estimates are mandatory, i.e., sufficient training " << Endl;
1016 Log() << "statistics is required to populate the tails of the distributions" << Endl;
1017 Log() << "It would be a surprise if the default Spline or KDE kernel parameters" << Endl;
1018 Log() << "provide a satisfying fit to the data. The user is advised to properly" << Endl;
1019 Log() << "tune the events per bin and smooth options in the spline cases" << Endl;
1020 Log() << "individually per variable. If the KDE kernel is used, the adaptive" << Endl;
1021 Log() << "Gaussian kernel may lead to artefacts, so please always also try" << Endl;
1022 Log() << "the non-adaptive one." << Endl;
1023 Log() << "" << Endl;
1024 Log() << "All tuning parameters must be adjusted individually for each input" << Endl;
1025 Log() << "variable!" << Endl;
1026}
1027
#define REGISTER_METHOD(CLASS)
for example
ROOT::R::TRInterface & r
Definition Object.C:4
#define h(i)
Definition RSha256.hxx:106
int Int_t
Definition RtypesCore.h:45
const Bool_t kFALSE
Definition RtypesCore.h:92
double Double_t
Definition RtypesCore.h:59
float Float_t
Definition RtypesCore.h:57
const Bool_t kTRUE
Definition RtypesCore.h:91
#define ClassImp(name)
Definition Rtypes.h:364
int type
Definition TGX11.cxx:121
float xmin
float xmax
char * Form(const char *fmt,...)
TVectorT< Float_t > TVector
Definition TVectorfwd.h:23
TObject * Get(const char *namecycle) override
Return pointer to object identified by namecycle.
A ROOT file is a suite of consecutive data records (TKey instances) with a well defined format.
Definition TFile.h:54
1-D histogram with a float per channel (see TH1 documentation)}
Definition TH1.h:575
TH1 is the base class of all histogram classes in ROOT.
Definition TH1.h:58
virtual void SetTitle(const char *title)
See GetStatOverflows for more information.
Definition TH1.cxx:6678
virtual Double_t GetBinCenter(Int_t bin) const
Return bin center for 1D histogram.
Definition TH1.cxx:8981
static void AddDirectory(Bool_t add=kTRUE)
Sets the flag controlling the automatic add of histograms in memory.
Definition TH1.cxx:1282
virtual Int_t GetNbinsX() const
Definition TH1.h:296
virtual Int_t Fill(Double_t x)
Increment bin with abscissa X by 1.
Definition TH1.cxx:3350
virtual void SetBinContent(Int_t bin, Double_t content)
Set bin content see convention for numbering bins in TH1::GetBin In case the bin number is greater th...
Definition TH1.cxx:9062
virtual void SetName(const char *name)
Change the name of this histogram.
Definition TH1.cxx:8800
virtual Double_t GetBinContent(Int_t bin) const
Return content of bin number bin.
Definition TH1.cxx:4993
virtual void Scale(Double_t c1=1, Option_t *option="")
Multiply this histogram by a constant c1.
Definition TH1.cxx:6564
virtual Int_t FindBin(Double_t x, Double_t y=0, Double_t z=0)
Return Global bin number corresponding to x,y,z.
Definition TH1.cxx:3680
static Bool_t AddDirectoryStatus()
Static function: cannot be inlined on Windows/NT.
Definition TH1.cxx:750
virtual TH1 * Rebin(Int_t ngroup=2, const char *newname="", const Double_t *xbins=0)
Rebin this histogram.
Definition TH1.cxx:6222
void WriteOptionsToStream(std::ostream &o, const TString &prefix) const
write options to output stream (e.g. in writing the MVA weight files
Class that contains all the data information.
Definition DataSetInfo.h:62
Float_t GetValue(UInt_t ivar) const
return value of i'th variable
Definition Event.cxx:236
Double_t GetWeight() const
return the event weight - depending on whether the flag IgnoreNegWeightsInTraining is or not.
Definition Event.cxx:381
UInt_t GetClass() const
Definition Event.h:86
Virtual base Class for all MVA method.
Definition MethodBase.h:111
virtual void DeclareCompatibilityOptions()
options that are used ONLY for the READER to ensure backward compatibility they are hence without any...
Likelihood analysis ("non-parametric approach")
const Ranking * CreateRanking()
computes ranking of input variables
void Train()
create reference distributions (PDFs) from signal and background events: fill histograms and smooth t...
virtual Bool_t HasAnalysisType(Types::EAnalysisType type, UInt_t numberClasses, UInt_t numberTargets)
FDA can handle classification with 2 classes.
virtual void WriteOptionsToStream(std::ostream &o, const TString &prefix) const
write options to stream
void WriteMonitoringHistosToFile() const
write histograms and PDFs to file for monitoring purposes
void DeclareCompatibilityOptions()
options that are used ONLY for the READER to ensure backward compatibility they are hence without any...
void MakeClassSpecific(std::ostream &, const TString &) const
write specific classifier response
virtual ~MethodLikelihood()
destructor
void Init()
default initialisation called by all constructors
void ReadWeightsFromStream(std::istream &istr)
read weight info from file nothing to do for this method
MethodLikelihood(const TString &jobName, const TString &methodTitle, DataSetInfo &theData, const TString &theOption="")
standard constructor
void GetHelpMessage() const
get help message text
Double_t GetMvaValue(Double_t *err=0, Double_t *errUpper=0)
returns the likelihood estimator for signal fill a new Likelihood branch into the testTree
void ReadWeightsFromXML(void *wghtnode)
read weights from XML
void MakeClassSpecificHeader(std::ostream &, const TString &="") const
write specific header of the classifier (mostly include files)
Double_t TransformLikelihoodOutput(Double_t ps, Double_t pb) const
returns transformed or non-transformed output
void ProcessOptions()
process user options reference cut value to distinguish signal-like from background-like events
void WriteWeightsToStream(TFile &rf) const
write reference PDFs to ROOT file
void AddWeightsXMLTo(void *parent) const
write weights to XML
void DeclareOptions()
define the options (their key words) that can be set in the option string
PDF wrapper for histograms; uses user-defined spline interpolation.
Definition PDF.h:63
TH1 * GetPDFHist() const
Definition PDF.h:92
@ kSpline0
Definition PDF.h:70
Ranking for variables in method (implementation)
Definition Ranking.h:48
void * GetNextChild(void *prevchild, const char *childname=0)
XML helpers.
Definition Tools.cxx:1174
Double_t GetSeparation(TH1 *S, TH1 *B) const
compute "separation" defined as
Definition Tools.cxx:133
void * AddChild(void *parent, const char *childname, const char *content=0, bool isRootNode=false)
add child node
Definition Tools.cxx:1136
const TString & Color(const TString &)
human readable color strings
Definition Tools.cxx:840
void * GetChild(void *parent, const char *childname=0)
get child node
Definition Tools.cxx:1162
void ReadAttr(void *node, const char *, T &value)
read attribute from xml
Definition Tools.h:335
void AddAttr(void *node, const char *, const T &value, Int_t precision=16)
add attribute to xml
Definition Tools.h:353
Singleton class for Global types used by TMVA.
Definition Types.h:73
@ kClassification
Definition Types.h:129
virtual Int_t Write(const char *name=0, Int_t option=0, Int_t bufsize=0)
Write this object to the current directory.
Definition TObject.cxx:798
Basic string class.
Definition TString.h:136
const char * Data() const
Definition TString.h:369
Double_t x[n]
Definition legend1.C:17
create variable transformations
Tools & gTools()
MsgLogger & Endl(MsgLogger &ml)
Definition MsgLogger.h:158
Int_t Nint(T x)
Round to nearest integer. Rounds half integers to the nearest even integer.
Definition TMath.h:713
Short_t Max(Short_t a, Short_t b)
Definition TMathBase.h:212
Double_t Log(Double_t x)
Definition TMath.h:760
Short_t Min(Short_t a, Short_t b)
Definition TMathBase.h:180