MethodLikelihood.cxx
1// @(#)root/tmva $Id$
2// Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss, Eckhard von Toerne, Jan Therhaag
3
4/**********************************************************************************
5 * Project: TMVA - a Root-integrated toolkit for multivariate Data analysis *
6 * Package: TMVA *
7 * Class : TMVA::MethodLikelihood *
8 * *
9 * *
10 * Description: *
11 * Implementation (see header for description) *
12 * *
13 * Authors (alphabetical): *
14 * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
15 * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
16 * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
17 * Jan Therhaag <Jan.Therhaag@cern.ch> - U of Bonn, Germany *
18 * Eckhard v. Toerne <evt@uni-bonn.de> - U. of Bonn, Germany *
19 * *
20 * Copyright (c) 2005-2011: *
21 * CERN, Switzerland *
22 * U. of Victoria, Canada *
23 * MPI-K Heidelberg, Germany *
24 * U. of Bonn, Germany *
25 * *
26 * Redistribution and use in source and binary forms, with or without *
27 * modification, are permitted according to the terms listed in LICENSE *
28 * (see tmva/doc/LICENSE) *
29 **********************************************************************************/
30
31/*! \class TMVA::MethodLikelihood
32\ingroup TMVA
33
34Likelihood analysis ("non-parametric approach")
35
36
37Also implemented is a "diagonalized likelihood approach",
38which improves over the uncorrelated likelihood approach by
39linearly transforming the input variables into a diagonal space,
40using the square-root of the covariance matrix.
41
42
43The method of maximum likelihood is the most straightforward, and
44certainly among the most elegant multivariate analyser approaches.
45We define the likelihood ratio, \f$ R_L \f$, for event
46\f$ i \f$, by:
47
48\f[
49R_L(i) = \frac{L_S(i)}{L_S(i) + L_B(i)}
50\f]
51
52Here the signal and background likelihoods, \f$ L_S \f$,
53\f$ L_B \f$, are products of the corresponding probability
54densities, \f$ p_S \f$, \f$ p_B \f$, of the
55\f$ N_{var} \f$ discriminating variables used in the MVA:
56
57\f[
58L_S(i) = \prod_{j=1}^{N_{var}} p_{Sj} (i)
59\f]
60
61and accordingly for \f$ L_B \f$.
62In practice, TMVA uses polynomial splines to estimate the probability
63density functions (PDF) obtained from the distributions of the
64training variables.
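
For illustration, a minimal sketch of this per-event ratio, assuming hypothetical per-variable density evaluators `pdfSig[j]` and `pdfBgd[j]` (in TMVA itself these roles are played by the `TMVA::PDF` objects built during training):

```cpp
// Minimal sketch of the per-event likelihood ratio R_L(i).
// pdfSig[j] and pdfBgd[j] are assumed, hypothetical 1-D density evaluators.
#include <vector>
#include <functional>

double LikelihoodRatio(const std::vector<double>& x,
                       const std::vector<std::function<double(double)>>& pdfSig,
                       const std::vector<std::function<double(double)>>& pdfBgd)
{
   double ls = 1.0, lb = 1.0;
   for (std::size_t j = 0; j < x.size(); ++j) {
      ls *= pdfSig[j](x[j]);   // L_S = prod_j p_Sj(x_j)
      lb *= pdfBgd[j](x[j]);   // L_B = prod_j p_Bj(x_j)
   }
   return ls / (ls + lb);      // R_L
}
```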
65
66
67Note that in TMVA the output of the likelihood ratio is transformed by:
68
69\f[
70R_L(i) \to R'_L(i) = -\frac{1}{\tau} \ln\left( R_L^{-1}(i) - 1 \right)
71\f]
72
73to avoid the occurrence of heavy peaks at \f$ R_L = 0 \f$ and \f$ R_L = 1 \f$.
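
A minimal sketch of this mapping (using \f$ \tau = 15 \f$, the value hard-coded further down in `TransformLikelihoodOutput`):

```cpp
// Minimal sketch of the inverse-Fermi output transform with tau = 15.
#include <cmath>
#include <algorithm>

double TransformOutput(double r, double tau = 15.0, double eps = 1e-15)
{
   r = std::min(std::max(r, eps), 1.0 - eps);   // keep r strictly inside (0,1)
   return -std::log(1.0 / r - 1.0) / tau;       // R'_L = -(1/tau) * ln(1/R_L - 1)
}
```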
74
75#### Decorrelated (or "diagonalized") Likelihood
76
77The biggest drawback of the Likelihood approach is that it assumes
78that the discriminating variables are uncorrelated. If this were the case,
79it can be proven that the discrimination obtained by the above likelihood
80ratio is optimal, i.e., no other method can beat it. However, in most
81practical applications of MVAs correlations are present.
82
83
84Linear correlations, measured from the training sample, can be taken
85into account in a straightforward manner through the square-root
86of the covariance matrix. The square-root of a matrix
87\f$ C \f$ is the matrix \f$ C' \f$ that multiplied with itself
88yields \f$ C \f$: \f$ C = C'C' \f$. We compute the
89square-root matrix (SQM) by means of diagonalising (\f$ D \f$) the
90covariance matrix:
91
92\f[
93D = S^T C S \Rightarrow C' = S \sqrt{D}\, S^T
94\f]
95
96and the transformation from the linearly correlated into the
97uncorrelated variable space is then given by multiplying the measured
98variable tuple by the inverse of the SQM. Note that these transformations
99are performed for both signal and background separately, since the
100correlation pattern is not the same in the two samples.
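
A minimal sketch of computing such a square-root matrix with ROOT's linear-algebra classes (illustration only; inside TMVA the decorrelation is handled by the variable-transformation machinery, not by this method directly):

```cpp
// Minimal sketch: square-root C' of a symmetric covariance matrix C,
// obtained via diagonalisation, so that C' * C' = C.
#include "TMatrixDSym.h"
#include "TMatrixDSymEigen.h"
#include "TMatrixD.h"
#include "TVectorD.h"
#include "TMath.h"

TMatrixD SquareRootMatrix(const TMatrixDSym& cov)
{
   TMatrixDSymEigen eigen(cov);
   const TMatrixD& S = eigen.GetEigenVectors();  // columns: eigenvectors of C
   const TVectorD& d = eigen.GetEigenValues();   // eigenvalues (diagonal of D)

   TMatrixD sqrtD(cov.GetNrows(), cov.GetNcols());
   for (Int_t i = 0; i < d.GetNrows(); ++i) sqrtD(i, i) = TMath::Sqrt(d(i));

   TMatrixD ST(TMatrixD::kTransposed, S);
   return S * sqrtD * ST;                        // the square-root matrix C'
}
```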
101
102
103The above diagonalisation is complete for linearly correlated,
104Gaussian-distributed variables only. In real-world examples this
105is often not the case, so that only little additional information
106may be recovered by the diagonalisation procedure. In these cases,
107non-linear methods must be applied.
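
For reference, the classifier is booked in a TMVA training macro roughly as follows (assuming `factory` and `dataloader` pointers as in the standard TMVA classification examples; the option values are only illustrative):

```cpp
// Illustrative booking of the likelihood classifier; option values are examples only.
factory->BookMethod( dataloader, TMVA::Types::kLikelihood, "Likelihood",
                     "H:!V:TransformOutput:PDFInterpol=Spline2:"
                     "NSmoothSig[0]=20:NSmoothBkg[0]=20:NAvEvtPerBin=50" );
```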
108*/
109
110#include "TMVA/MethodLikelihood.h"
111
112#include "TMVA/Configurable.h"
113#include "TMVA/ClassifierFactory.h"
114#include "TMVA/DataSet.h"
115#include "TMVA/DataSetInfo.h"
116#include "TMVA/IMethod.h"
117#include "TMVA/MethodBase.h"
118#include "TMVA/MsgLogger.h"
119#include "TMVA/PDF.h"
120#include "TMVA/Ranking.h"
121#include "TMVA/Tools.h"
122#include "TMVA/Types.h"
123#include "TMVA/VariableInfo.h"
124
125#include "TVector.h"
126#include "TMath.h"
127#include "TFile.h"
128#include "TH1.h"
129
130#include <iostream>
131#include <iomanip>
132#include <vector>
133#include <cstdlib>
134
135REGISTER_METHOD(Likelihood)
136
137ClassImp(TMVA::MethodLikelihood);
138
139////////////////////////////////////////////////////////////////////////////////
140/// standard constructor
141
142TMVA::MethodLikelihood::MethodLikelihood( const TString& jobName,
143 const TString& methodTitle,
144 DataSetInfo& theData,
145 const TString& theOption ) :
146 TMVA::MethodBase( jobName, Types::kLikelihood, methodTitle, theData, theOption),
147 fEpsilon ( 1.e3 * DBL_MIN ),
148 fTransformLikelihoodOutput( kFALSE ),
149 fDropVariable ( 0 ),
150 fHistSig ( 0 ),
151 fHistBgd ( 0 ),
152 fHistSig_smooth( 0 ),
153 fHistBgd_smooth( 0 ),
154 fDefaultPDFLik ( 0 ),
155 fPDFSig ( 0 ),
156 fPDFBgd ( 0 ),
157 fNsmooth ( 2 ),
158 fNsmoothVarS ( 0 ),
159 fNsmoothVarB ( 0 ),
160 fAverageEvtPerBin( 0 ),
161 fAverageEvtPerBinVarS (0),
162 fAverageEvtPerBinVarB (0),
163 fKDEfineFactor ( 0 ),
164 fInterpolateString(0)
165{
166}
167
168////////////////////////////////////////////////////////////////////////////////
169/// construct likelihood references from file
170
171TMVA::MethodLikelihood::MethodLikelihood( DataSetInfo& theData,
172 const TString& theWeightFile) :
173 TMVA::MethodBase( Types::kLikelihood, theData, theWeightFile),
174 fEpsilon ( 1.e3 * DBL_MIN ),
175 fTransformLikelihoodOutput( kFALSE ),
176 fDropVariable ( 0 ),
177 fHistSig ( 0 ),
178 fHistBgd ( 0 ),
179 fHistSig_smooth( 0 ),
180 fHistBgd_smooth( 0 ),
181 fDefaultPDFLik ( 0 ),
182 fPDFSig ( 0 ),
183 fPDFBgd ( 0 ),
184 fNsmooth ( 2 ),
185 fNsmoothVarS ( 0 ),
186 fNsmoothVarB ( 0 ),
187 fAverageEvtPerBin( 0 ),
188 fAverageEvtPerBinVarS (0),
189 fAverageEvtPerBinVarB (0),
190 fKDEfineFactor ( 0 ),
191 fInterpolateString(0)
192{
193}
194
195////////////////////////////////////////////////////////////////////////////////
196/// destructor
197
198TMVA::MethodLikelihood::~MethodLikelihood( void )
199{
200 if (NULL != fDefaultPDFLik) delete fDefaultPDFLik;
201 if (NULL != fHistSig) delete fHistSig;
202 if (NULL != fHistBgd) delete fHistBgd;
203 if (NULL != fHistSig_smooth) delete fHistSig_smooth;
204 if (NULL != fHistBgd_smooth) delete fHistBgd_smooth;
205 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
206 if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
207 if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
208 }
209 if (NULL != fPDFSig) delete fPDFSig;
210 if (NULL != fPDFBgd) delete fPDFBgd;
211}
212
213////////////////////////////////////////////////////////////////////////////////
214/// Likelihood can handle classification with 2 classes
215
216Bool_t TMVA::MethodLikelihood::HasAnalysisType( Types::EAnalysisType type,
217 UInt_t numberClasses, UInt_t /*numberTargets*/ )
218{
219 if (type == Types::kClassification && numberClasses == 2) return kTRUE;
220 return kFALSE;
221}
222
223////////////////////////////////////////////////////////////////////////////////
224/// default initialisation called by all constructors
225
226void TMVA::MethodLikelihood::Init( void )
227{
228 // no ranking test
229 fDropVariable = -1;
230 fHistSig = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
231 fHistBgd = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
232 fHistSig_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
233 fHistBgd_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
234 fPDFSig = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
235 fPDFBgd = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
236}
237
238////////////////////////////////////////////////////////////////////////////////
239/// define the options (their key words) that can be set in the option string
240///
241/// TransformOutput `<bool>` transform (often strongly peaked) likelihood output through sigmoid inversion
242
243void TMVA::MethodLikelihood::DeclareOptions()
244{
245 DeclareOptionRef( fTransformLikelihoodOutput = kFALSE, "TransformOutput",
246 "Transform likelihood output by inverse sigmoid function" );
247
248 // initialize
249
250 // reading every PDF's definition and passing the option string to the next one to be read and marked
251 TString updatedOptions = GetOptions();
252 fDefaultPDFLik = new PDF( TString(GetName()) + " PDF", updatedOptions );
253 fDefaultPDFLik->DeclareOptions();
254 fDefaultPDFLik->ParseOptions();
255 updatedOptions = fDefaultPDFLik->GetOptions();
256 for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
257 (*fPDFSig)[ivar] = new PDF( TString::Format("%s PDF Sig[%d]", GetName(), ivar), updatedOptions,
258 TString::Format("Sig[%d]",ivar), fDefaultPDFLik );
259 (*fPDFSig)[ivar]->DeclareOptions();
260 (*fPDFSig)[ivar]->ParseOptions();
261 updatedOptions = (*fPDFSig)[ivar]->GetOptions();
262 (*fPDFBgd)[ivar] = new PDF( TString::Format("%s PDF Bkg[%d]", GetName(), ivar), updatedOptions,
263 TString::Format("Bkg[%d]",ivar), fDefaultPDFLik );
264 (*fPDFBgd)[ivar]->DeclareOptions();
265 (*fPDFBgd)[ivar]->ParseOptions();
266 updatedOptions = (*fPDFBgd)[ivar]->GetOptions();
267 }
268
269 // the final marked option string is written back to the original likelihood
270 SetOptions( updatedOptions );
271}
272
273
274void TMVA::MethodLikelihood::DeclareCompatibilityOptions()
275{
276 // options that are used ONLY for the READER to ensure backward compatibility
277
278 MethodBase::DeclareCompatibilityOptions();
279 DeclareOptionRef( fNsmooth = 1, "NSmooth",
280 "Number of smoothing iterations for the input histograms");
281 DeclareOptionRef( fAverageEvtPerBin = 50, "NAvEvtPerBin",
282 "Average number of events per PDF bin");
283 DeclareOptionRef( fKDEfineFactor =1. , "KDEFineFactor",
284 "Fine tuning factor for Adaptive KDE: Factor to multiply the width of the kernel");
285 DeclareOptionRef( fBorderMethodString = "None", "KDEborder",
286 "Border effects treatment (1=no treatment , 2=kernel renormalization, 3=sample mirroring)" );
287 DeclareOptionRef( fKDEiterString = "Nonadaptive", "KDEiter",
288 "Number of iterations (1=non-adaptive, 2=adaptive)" );
289 DeclareOptionRef( fKDEtypeString = "Gauss", "KDEtype",
290 "KDE kernel type (1=Gauss)" );
291 fAverageEvtPerBinVarS = new Int_t[GetNvar()];
292 fAverageEvtPerBinVarB = new Int_t[GetNvar()];
293 fNsmoothVarS = new Int_t[GetNvar()];
294 fNsmoothVarB = new Int_t[GetNvar()];
295 fInterpolateString = new TString[GetNvar()];
296 for(UInt_t i=0; i<GetNvar(); ++i) {
297 fAverageEvtPerBinVarS[i] = fAverageEvtPerBinVarB[i] = 0;
298 fNsmoothVarS[i] = fNsmoothVarB[i] = 0;
299 fInterpolateString[i] = "";
300 }
301 DeclareOptionRef( fAverageEvtPerBinVarS, GetNvar(), "NAvEvtPerBinSig",
302 "Average num of events per PDF bin and variable (signal)");
303 DeclareOptionRef( fAverageEvtPerBinVarB, GetNvar(), "NAvEvtPerBinBkg",
304 "Average num of events per PDF bin and variable (background)");
305 DeclareOptionRef(fNsmoothVarS, GetNvar(), "NSmoothSig",
306 "Number of smoothing iterations for the input histograms");
307 DeclareOptionRef(fNsmoothVarB, GetNvar(), "NSmoothBkg",
308 "Number of smoothing iterations for the input histograms");
309 DeclareOptionRef(fInterpolateString, GetNvar(), "PDFInterpol", "Method of interpolating reference histograms (e.g. Spline2 or KDE)");
310}
311
312////////////////////////////////////////////////////////////////////////////////
313/// process user options
314/// reference cut value to distinguish signal-like from background-like events
315
316void TMVA::MethodLikelihood::ProcessOptions()
317{
318 SetSignalReferenceCut( TransformLikelihoodOutput( 0.5, 0.5 ) );
319
320 fDefaultPDFLik->ProcessOptions();
321 for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
322 (*fPDFBgd)[ivar]->ProcessOptions();
323 (*fPDFSig)[ivar]->ProcessOptions();
324 }
325}
326
327////////////////////////////////////////////////////////////////////////////////
328/// create reference distributions (PDFs) from signal and background events:
329/// fill histograms and smooth them; if decorrelation is required, compute
330/// corresponding square-root matrices
331/// the reference histograms require the correct boundaries. Since in Likelihood classification
332/// the transformations are applied using both classes, the corresponding boundaries
333/// need to take this into account as well
334
335void TMVA::MethodLikelihood::Train( void )
336{
337 UInt_t nvar=GetNvar();
338 std::vector<Double_t> xmin(nvar), xmax(nvar);
339 for (UInt_t ivar=0; ivar<nvar; ivar++) {xmin[ivar]=1e30; xmax[ivar]=-1e30;}
340
341 UInt_t nevents=Data()->GetNEvents();
342 for (UInt_t ievt=0; ievt<nevents; ievt++) {
343 // use the true-event-type's transformation
344 // set the event true event types transformation
345 const Event* origEv = Data()->GetEvent(ievt);
346 if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
347 // loop over classes
348 for (int cls=0;cls<2;cls++){
349 GetTransformationHandler().SetTransformationReferenceClass(cls);
350 const Event* ev = GetTransformationHandler().Transform( origEv );
351 for (UInt_t ivar=0; ivar<nvar; ivar++) {
352 Float_t value = ev->GetValue(ivar);
353 if (value < xmin[ivar]) xmin[ivar] = value;
354 if (value > xmax[ivar]) xmax[ivar] = value;
355 }
356 }
357 }
358
359 // create reference histograms
360 // (KDE smoothing requires very finely binned reference histograms)
361 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
362 TString var = (*fInputVars)[ivar];
363
364 // the reference histograms require the correct boundaries. Since in Likelihood classification
365 // the transformations are applied using both classes, also the corresponding boundaries
366 // need to take this into account
367
368 // special treatment for discrete variables
369 if (DataInfo().GetVariableInfo(ivar).GetVarType() == 'I') {
370 // special treatment for integer variables
371 Int_t ixmin = TMath::Nint( xmin[ivar] );
372 xmax[ivar]=xmax[ivar]+1; // make sure that all entries are included in histogram
373 Int_t ixmax = TMath::Nint( xmax[ivar] );
374 Int_t nbins = ixmax - ixmin;
375 (*fHistSig)[ivar] = new TH1F(GetMethodName()+"_"+var + "_sig", var + " signal training", nbins, ixmin, ixmax );
376 (*fHistBgd)[ivar] = new TH1F(GetMethodName()+"_"+var + "_bgd", var + " background training", nbins, ixmin, ixmax );
377 } else {
378
379 UInt_t minNEvt = TMath::Min(Data()->GetNEvtSigTrain(),Data()->GetNEvtBkgdTrain());
380 Int_t nbinsS = (*fPDFSig)[ivar]->GetHistNBins( minNEvt );
381 Int_t nbinsB = (*fPDFBgd)[ivar]->GetHistNBins( minNEvt );
382
383 (*fHistSig)[ivar] = new TH1F( TString::Format("%s_%s_%s_sig",DataInfo().GetName(),GetMethodName().Data(),var.Data()),
384 TString::Format("%s_%s_%s signal training",DataInfo().GetName(),GetMethodName().Data(),var.Data()),
385 nbinsS, xmin[ivar], xmax[ivar] );
386 (*fHistBgd)[ivar] = new TH1F( TString::Format("%s_%s_%s_bgd",DataInfo().GetName(),GetMethodName().Data(),var.Data()),
387 TString::Format("%s_%s_%s background training",DataInfo().GetName(),GetMethodName().Data(),var.Data()),
388 nbinsB, xmin[ivar], xmax[ivar] );
389 }
390 }
391
392 // ----- fill the reference histograms
393 Log() << kINFO << "Filling reference histograms" << Endl;
394
395 // event loop
396 for (Int_t ievt=0; ievt<Data()->GetNEvents(); ievt++) {
397
398 // use the true-event-type's transformation
399 // set the event true event types transformation
400 const Event* origEv = Data()->GetEvent(ievt);
401 if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
402 GetTransformationHandler().SetTransformationReferenceClass( origEv->GetClass() );
403 const Event* ev = GetTransformationHandler().Transform( origEv );
404
405 // the event weight
406 Float_t weight = ev->GetWeight();
407
408 // fill variable vector
409 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
410 Double_t value = ev->GetValue(ivar);
411 // verify limits
412 if (value >= xmax[ivar]) value = xmax[ivar] - 1.0e-10;
413 else if (value < xmin[ivar]) value = xmin[ivar] + 1.0e-10;
414 // inserting check if there are events in overflow or underflow
415 if (value >=(*fHistSig)[ivar]->GetXaxis()->GetXmax() ||
416 value <(*fHistSig)[ivar]->GetXaxis()->GetXmin()){
417 Log()<<kWARNING
418 <<"error in filling likelihood reference histograms var="
419 <<(*fInputVars)[ivar]
420 << ", xmin="<<(*fHistSig)[ivar]->GetXaxis()->GetXmin()
421 << ", value="<<value
422 << ", xmax="<<(*fHistSig)[ivar]->GetXaxis()->GetXmax()
423 << Endl;
424 }
425 if (DataInfo().IsSignal(ev)) (*fHistSig)[ivar]->Fill( value, weight );
426 else (*fHistBgd)[ivar]->Fill( value, weight );
427 }
428 }
429
430 // building the pdfs
431 Log() << kINFO << "Building PDF out of reference histograms" << Endl;
432 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
433
434 // the PDF is built from (binned) reference histograms
435 // in case of KDE, this has a large number of bins, which makes it quasi-unbinned
436 (*fPDFSig)[ivar]->BuildPDF( (*fHistSig)[ivar] );
437 (*fPDFBgd)[ivar]->BuildPDF( (*fHistBgd)[ivar] );
438
439 (*fPDFSig)[ivar]->ValidatePDF( (*fHistSig)[ivar] );
440 (*fPDFBgd)[ivar]->ValidatePDF( (*fHistBgd)[ivar] );
441
442 // saving the smoothed histograms
443 if ((*fPDFSig)[ivar]->GetSmoothedHist() != 0) (*fHistSig_smooth)[ivar] = (*fPDFSig)[ivar]->GetSmoothedHist();
444 if ((*fPDFBgd)[ivar]->GetSmoothedHist() != 0) (*fHistBgd_smooth)[ivar] = (*fPDFBgd)[ivar]->GetSmoothedHist();
445 }
446 ExitFromTraining();
447}
448
449////////////////////////////////////////////////////////////////////////////////
450/// returns the likelihood estimator for signal
451/// fill a new Likelihood branch into the testTree
452
453Double_t TMVA::MethodLikelihood::GetMvaValue( Double_t* err, Double_t* errUpper )
454{
455 UInt_t ivar;
456
457 // cannot determine error
458 NoErrorCalc(err, errUpper);
459
460 // retrieve variables, and transform, if required
461 TVector vs( GetNvar() );
462 TVector vb( GetNvar() );
463
464 // need to distinguish signal and background in case of variable transformation
465 // signal first
466
467 GetTransformationHandler().SetTransformationReferenceClass( fSignalClass );
468 // temporary: JS --> FIX
469 //GetTransformationHandler().SetTransformationReferenceClass( 0 );
470 const Event* ev = GetEvent();
471 for (ivar=0; ivar<GetNvar(); ivar++) vs(ivar) = ev->GetValue(ivar);
472
473 GetTransformationHandler().SetTransformationReferenceClass( fBackgroundClass );
474 // temporary: JS --> FIX
475 //GetTransformationHandler().SetTransformationReferenceClass( 1 );
476 ev = GetEvent();
477 for (ivar=0; ivar<GetNvar(); ivar++) vb(ivar) = ev->GetValue(ivar);
478
479 // compute the likelihood (signal)
480 Double_t ps(1), pb(1), p(0);
481 for (ivar=0; ivar<GetNvar(); ivar++) {
482
483 // drop one variable (this is ONLY used for internal variable ranking !)
484 if ((Int_t)ivar == fDropVariable) continue;
485
486 Double_t x[2] = { vs(ivar), vb(ivar) };
487
488 for (UInt_t itype=0; itype < 2; itype++) {
489
490 // verify limits
491 if (x[itype] >= (*fPDFSig)[ivar]->GetXmax()) x[itype] = (*fPDFSig)[ivar]->GetXmax() - 1.0e-10;
492 else if (x[itype] < (*fPDFSig)[ivar]->GetXmin()) x[itype] = (*fPDFSig)[ivar]->GetXmin();
493
494 // find corresponding histogram from cached indices
495 PDF* pdf = (itype == 0) ? (*fPDFSig)[ivar] : (*fPDFBgd)[ivar];
496 if (pdf == 0) Log() << kFATAL << "<GetMvaValue> Reference histograms don't exist" << Endl;
497 TH1* hist = pdf->GetPDFHist();
498
499 // interpolate linearly between adjacent bins
500 // this is not useful for discrete variables
501 Int_t bin = hist->FindBin(x[itype]);
502
503 // **** POTENTIAL BUG: PERFORMANCE IS WORSE WHEN USING TRUE TYPE ***
504 // ==> commented out at present
505 if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0 ||
506 DataInfo().GetVariableInfo(ivar).GetVarType() == 'N') {
507 p = TMath::Max( hist->GetBinContent(bin), fEpsilon );
508 } else { // splined PDF
509 Int_t nextbin = bin;
510 if ((x[itype] > hist->GetBinCenter(bin) && bin != hist->GetNbinsX()) || bin == 1)
511 nextbin++;
512 else
513 nextbin--;
514
515
516 Double_t dx = hist->GetBinCenter(bin) - hist->GetBinCenter(nextbin);
517 Double_t dy = hist->GetBinContent(bin) - hist->GetBinContent(nextbin);
518 Double_t like = hist->GetBinContent(bin) + (x[itype] - hist->GetBinCenter(bin)) * dy/dx;
519
520 p = TMath::Max( like, fEpsilon );
521 }
522
523 if (itype == 0) ps *= p;
524 else pb *= p;
525 }
526 }
527
528 // the likelihood ratio (transform it ?)
529 return TransformLikelihoodOutput( ps, pb );
530}
531
532////////////////////////////////////////////////////////////////////////////////
533/// returns transformed or non-transformed output
534
535Double_t TMVA::MethodLikelihood::TransformLikelihoodOutput( Double_t ps, Double_t pb ) const
536{
537 if (ps < fEpsilon) ps = fEpsilon;
538 if (pb < fEpsilon) pb = fEpsilon;
539 Double_t r = ps/(ps + pb);
540 if (r >= 1.0) r = 1. - 1.e-15;
541
542 if (fTransformLikelihoodOutput) {
543 // inverse Fermi function
544
545 // sanity check
546 if (r <= 0.0) r = fEpsilon;
547 else if (r >= 1.0) r = 1. - 1.e-15;
548
549 Double_t tau = 15.0;
550 r = - TMath::Log(1.0/r - 1.0)/tau;
551 }
552
553 return r;
554}
555
556////////////////////////////////////////////////////////////////////////////////
557/// write options to stream
558
559void TMVA::MethodLikelihood::WriteOptionsToStream( std::ostream& o, const TString& prefix ) const
560{
561 Configurable::WriteOptionsToStream( o, prefix );
562
563 // writing the options defined for the different pdfs
564 if (fDefaultPDFLik != 0) {
565 o << prefix << std::endl << prefix << "#Default Likelihood PDF Options:" << std::endl << prefix << std::endl;
566 fDefaultPDFLik->WriteOptionsToStream( o, prefix );
567 }
568 for (UInt_t ivar = 0; ivar < fPDFSig->size(); ivar++) {
569 if ((*fPDFSig)[ivar] != 0) {
570 o << prefix << std::endl << prefix << TString::Format("#Signal[%d] Likelihood PDF Options:",ivar) << std::endl << prefix << std::endl;
571 (*fPDFSig)[ivar]->WriteOptionsToStream( o, prefix );
572 }
573 if ((*fPDFBgd)[ivar] != 0) {
574 o << prefix << std::endl << prefix << TString::Format("#Background[%d] Likelihood PDF Options:",ivar) << std::endl << prefix << std::endl;
575 (*fPDFBgd)[ivar]->WriteOptionsToStream( o, prefix );
576 }
577 }
578}
579
580////////////////////////////////////////////////////////////////////////////////
581/// write weights to XML
582
583void TMVA::MethodLikelihood::AddWeightsXMLTo( void* parent ) const
584{
585 void* wght = gTools().AddChild(parent, "Weights");
586 gTools().AddAttr(wght, "NVariables", GetNvar());
587 gTools().AddAttr(wght, "NClasses", 2);
588 void* pdfwrap;
589 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
590 if ( (*fPDFSig)[ivar]==0 || (*fPDFBgd)[ivar]==0 )
591 Log() << kFATAL << "Reference histograms for variable " << ivar
592 << " don't exist, can't write it to weight file" << Endl;
593 pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
594 gTools().AddAttr(pdfwrap, "VarIndex", ivar);
595 gTools().AddAttr(pdfwrap, "ClassIndex", 0);
596 (*fPDFSig)[ivar]->AddXMLTo(pdfwrap);
597 pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
598 gTools().AddAttr(pdfwrap, "VarIndex", ivar);
599 gTools().AddAttr(pdfwrap, "ClassIndex", 1);
600 (*fPDFBgd)[ivar]->AddXMLTo(pdfwrap);
601 }
602}
603
604////////////////////////////////////////////////////////////////////////////////
605/// computes ranking of input variables
606
607const TMVA::Ranking* TMVA::MethodLikelihood::CreateRanking()
608{
609 // create the ranking object
610 if (fRanking) delete fRanking;
611 fRanking = new Ranking( GetName(), "Delta Separation" );
612
613 Double_t sepRef = -1, sep = -1;
614 for (Int_t ivar=-1; ivar<(Int_t)GetNvar(); ivar++) {
615
616 // this variable should not be used
617 fDropVariable = ivar;
618
619 TString nameS = TString::Format( "rS_%i", ivar+1 );
620 TString nameB = TString::Format( "rB_%i", ivar+1 );
621 TH1* rS = new TH1F( nameS, nameS, 80, 0, 1 );
622 TH1* rB = new TH1F( nameB, nameB, 80, 0, 1 );
623
624 // the event loop
625 for (Int_t ievt=0; ievt<Data()->GetNTrainingEvents(); ievt++) {
626
627 const Event* origEv = Data()->GetEvent(ievt);
628 GetTransformationHandler().SetTransformationReferenceClass( origEv->GetClass() );
629 const Event* ev = GetTransformationHandler().Transform(Data()->GetEvent(ievt));
630
631 Double_t lk = this->GetMvaValue();
632 Double_t w = ev->GetWeight();
633 if (DataInfo().IsSignal(ev)) rS->Fill( lk, w );
634 else rB->Fill( lk, w );
635 }
636
637 // compute separation
638 sep = TMVA::gTools().GetSeparation( rS, rB );
639 if (ivar == -1) sepRef = sep;
640 sep = sepRef - sep;
641
642 // don't need these histograms anymore
643 delete rS;
644 delete rB;
645
646 if (ivar >= 0) fRanking->AddRank( Rank( DataInfo().GetVariableInfo(ivar).GetInternalName(), sep ) );
647 }
648
649 fDropVariable = -1;
650
651 return fRanking;
652}
653
654////////////////////////////////////////////////////////////////////////////////
655/// write reference PDFs to ROOT file
656
657void TMVA::MethodLikelihood::WriteWeightsToStream( TFile& /*rf*/ ) const
658{
659 TString pname = "PDF_";
660 for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
661 (*fPDFSig)[ivar]->Write( pname + GetInputVar( ivar ) + "_S" );
662 (*fPDFBgd)[ivar]->Write( pname + GetInputVar( ivar ) + "_B" );
663 }
664}
665
666////////////////////////////////////////////////////////////////////////////////
667/// read weights from XML
668
669void TMVA::MethodLikelihood::ReadWeightsFromXML( void* wghtnode )
670{
671 TString pname = "PDF_";
672 Bool_t addDirStatus = TH1::AddDirectoryStatus();
673 TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
674 UInt_t nvars=0;
675 gTools().ReadAttr(wghtnode, "NVariables",nvars);
676 void* descnode = gTools().GetChild(wghtnode);
677 for (UInt_t ivar=0; ivar<nvars; ivar++){
678 void* pdfnode = gTools().GetChild(descnode);
679 Log() << kDEBUG << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
680 if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
681 if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
682 (*fPDFSig)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Sig" );
683 (*fPDFBgd)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Bkg" );
684 (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
685 (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
686 (*(*fPDFSig)[ivar]).ReadXML(pdfnode);
687 descnode = gTools().GetNextChild(descnode);
688 pdfnode = gTools().GetChild(descnode);
689 (*(*fPDFBgd)[ivar]).ReadXML(pdfnode);
690 descnode = gTools().GetNextChild(descnode);
691 }
692 TH1::AddDirectory(addDirStatus);
693}
694
695////////////////////////////////////////////////////////////////////////////////
696/// read weight info from file
697/// (reads the signal and background reference PDFs from the stream)
698
699void TMVA::MethodLikelihood::ReadWeightsFromStream( std::istream & istr )
700{
701 TString pname = "PDF_";
702 Bool_t addDirStatus = TH1::AddDirectoryStatus();
703 TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
704 for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
705 Log() << kDEBUG << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
706 if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
707 if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
708 (*fPDFSig)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Sig" );
709 (*fPDFBgd)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Bkg");
710 (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
711 (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
712 istr >> *(*fPDFSig)[ivar];
713 istr >> *(*fPDFBgd)[ivar];
714 }
715 TH1::AddDirectory(addDirStatus);
716}
717
718////////////////////////////////////////////////////////////////////////////////
719/// read reference PDF from ROOT file
720
721void TMVA::MethodLikelihood::ReadWeightsFromStream( TFile& rf )
722{
723 TString pname = "PDF_";
724 Bool_t addDirStatus = TH1::AddDirectoryStatus();
725 TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
726 for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
727 (*fPDFSig)[ivar] = (TMVA::PDF*)rf.Get( TString::Format( "PDF_%s_S", GetInputVar( ivar ).Data() ) );
728 (*fPDFBgd)[ivar] = (TMVA::PDF*)rf.Get( TString::Format( "PDF_%s_B", GetInputVar( ivar ).Data() ) );
729 }
730 TH1::AddDirectory(addDirStatus);
731}
732
733////////////////////////////////////////////////////////////////////////////////
734/// write histograms and PDFs to file for monitoring purposes
735
736void TMVA::MethodLikelihood::WriteMonitoringHistosToFile( void ) const
737{
738 Log() << kINFO << "Write monitoring histograms to file: " << BaseDir()->GetPath() << Endl;
739 BaseDir()->cd();
740
741 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
742 (*fHistSig)[ivar]->Write();
743 (*fHistBgd)[ivar]->Write();
744 if ((*fHistSig_smooth)[ivar] != 0) (*fHistSig_smooth)[ivar]->Write();
745 if ((*fHistBgd_smooth)[ivar] != 0) (*fHistBgd_smooth)[ivar]->Write();
746 (*fPDFSig)[ivar]->GetPDFHist()->Write();
747 (*fPDFBgd)[ivar]->GetPDFHist()->Write();
748
749 if ((*fPDFSig)[ivar]->GetNSmoothHist() != 0) (*fPDFSig)[ivar]->GetNSmoothHist()->Write();
750 if ((*fPDFBgd)[ivar]->GetNSmoothHist() != 0) (*fPDFBgd)[ivar]->GetNSmoothHist()->Write();
751
752 // add special plots to check the smoothing in the GetVal method
753 Float_t xmin=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmin();
754 Float_t xmax=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmax();
755 TH1F* mm = new TH1F( (*fInputVars)[ivar]+"_additional_check",
756 (*fInputVars)[ivar]+"_additional_check", 15000, xmin, xmax );
757 Double_t intBin = (xmax-xmin)/15000;
758 for (Int_t bin=0; bin < 15000; bin++) {
759 Double_t x = (bin + 0.5)*intBin + xmin;
760 mm->SetBinContent(bin+1 ,(*fPDFSig)[ivar]->GetVal(x));
761 }
762 mm->Write();
763
764 // ---------- create cloned low-binned histogram for comparison in macros (mainly necessary for KDE)
765 TH1* h[2] = { (*fHistSig)[ivar], (*fHistBgd)[ivar] };
766 for (UInt_t i=0; i<2; i++) {
767 TH1* hclone = (TH1F*)h[i]->Clone( TString(h[i]->GetName()) + "_nice" );
768 hclone->SetName ( TString(h[i]->GetName()) + "_nice" );
769 hclone->SetTitle( TString(h[i]->GetTitle()) + "" );
770 if (hclone->GetNbinsX() > 100) {
771 Int_t resFactor = 5;
772 hclone->Rebin( resFactor );
773 hclone->Scale( 1.0/resFactor );
774 }
775 hclone->Write();
776 }
777 // ----------
778 }
779}
780
781////////////////////////////////////////////////////////////////////////////////
782/// write specific header of the classifier (mostly include files)
783
784void TMVA::MethodLikelihood::MakeClassSpecificHeader( std::ostream& fout, const TString& ) const
785{
786 fout << "#include <math.h>" << std::endl;
787 fout << "#include <cstdlib>" << std::endl;
788}
789
790////////////////////////////////////////////////////////////////////////////////
791/// write specific classifier response
792
793void TMVA::MethodLikelihood::MakeClassSpecific( std::ostream& fout, const TString& className ) const
794{
795 Int_t dp = fout.precision();
796 fout << " double fEpsilon;" << std::endl;
797
798 Int_t * nbin = new Int_t[GetNvar()];
799
800 Int_t nbinMax=-1;
801 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
802 nbin[ivar]=(*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX();
803 if (nbin[ivar] > nbinMax) nbinMax=nbin[ivar];
804 }
805
806 fout << " static float fRefS[][" << nbinMax << "]; "
807 << "// signal reference vector [nvars][max_nbins]" << std::endl;
808 fout << " static float fRefB[][" << nbinMax << "]; "
809 << "// backgr reference vector [nvars][max_nbins]" << std::endl << std::endl;
810 fout << "// if a variable has its PDF encoded as a spline0 --> treat it like an Integer valued one" <<std::endl;
811 fout << " bool fHasDiscretPDF[" << GetNvar() <<"]; "<< std::endl;
812 fout << " int fNbin[" << GetNvar() << "]; "
813 << "// number of bins (discrete variables may have less bins)" << std::endl;
814 fout << " double fHistMin[" << GetNvar() << "]; " << std::endl;
815 fout << " double fHistMax[" << GetNvar() << "]; " << std::endl;
816
817 fout << " double TransformLikelihoodOutput( double, double ) const;" << std::endl;
818 fout << "};" << std::endl;
819 fout << "" << std::endl;
820 fout << "inline void " << className << "::Initialize() " << std::endl;
821 fout << "{" << std::endl;
822 fout << " fEpsilon = " << fEpsilon << ";" << std::endl;
823 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
824 fout << " fNbin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ";" << std::endl;
825 fout << " fHistMin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmin() << ";" << std::endl;
826 fout << " fHistMax[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmax() << ";" << std::endl;
827 // sanity check (for previous code lines)
828 if ((((*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar] ||
829 (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar])
830 ) ||
831 (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()) {
832 Log() << kFATAL << "<MakeClassSpecific> Mismatch in binning of variable "
833 << "\"" << GetOriginalVarName(ivar) << "\" of type: \'" << DataInfo().GetVariableInfo(ivar).GetVarType()
834 << "\' : "
835 << "nxS = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ", "
836 << "nxB = " << (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()
837 << " while we expect " << nbin[ivar]
838 << Endl;
839 }
840 }
841 for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
842 if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0)
843 fout << " fHasDiscretPDF[" << ivar <<"] = true; " << std::endl;
844 else
845 fout << " fHasDiscretPDF[" << ivar <<"] = false; " << std::endl;
846 }
847
848 fout << "}" << std::endl << std::endl;
849
850 fout << "inline double " << className
851 << "::GetMvaValue__( const std::vector<double>& inputValues ) const" << std::endl;
852 fout << "{" << std::endl;
853 fout << " double ps(1), pb(1);" << std::endl;
854 fout << " std::vector<double> inputValuesSig = inputValues;" << std::endl;
855 fout << " std::vector<double> inputValuesBgd = inputValues;" << std::endl;
856 if (GetTransformationHandler().GetTransformationList().GetSize() != 0) {
857 fout << " Transform(inputValuesSig,0);" << std::endl;
858 fout << " Transform(inputValuesBgd,1);" << std::endl;
859 }
860 fout << " for (size_t ivar = 0; ivar < GetNvar(); ivar++) {" << std::endl;
861 fout << std::endl;
862 fout << " // dummy at present... will be used for variable transforms" << std::endl;
863 fout << " double x[2] = { inputValuesSig[ivar], inputValuesBgd[ivar] };" << std::endl;
864 fout << std::endl;
865 fout << " for (int itype=0; itype < 2; itype++) {" << std::endl;
866 fout << std::endl;
867 fout << " // interpolate linearly between adjacent bins" << std::endl;
868 fout << " // this is not useful for discrete variables (or forced Spline0)" << std::endl;
869 fout << " int bin = int((x[itype] - fHistMin[ivar])/(fHistMax[ivar] - fHistMin[ivar])*fNbin[ivar]) + 0;" << std::endl;
870 fout << std::endl;
871 fout << " // since the test data sample is in general different from the training sample" << std::endl;
872 fout << " // it can happen that the min/max of the training sample are trespassed --> correct this" << std::endl;
873 fout << " if (bin < 0) {" << std::endl;
874 fout << " bin = 0;" << std::endl;
875 fout << " x[itype] = fHistMin[ivar];" << std::endl;
876 fout << " }" << std::endl;
877 fout << " else if (bin >= fNbin[ivar]) {" << std::endl;
878 fout << " bin = fNbin[ivar]-1;" << std::endl;
879 fout << " x[itype] = fHistMax[ivar];" << std::endl;
880 fout << " }" << std::endl;
881 fout << std::endl;
882 fout << " // find corresponding histogram from cached indices" << std::endl;
883 fout << " float ref = (itype == 0) ? fRefS[ivar][bin] : fRefB[ivar][bin];" << std::endl;
884 fout << std::endl;
885 fout << " // sanity check" << std::endl;
886 fout << " if (ref < 0) {" << std::endl;
887 fout << " std::cout << \"Fatal error in " << className
888 << ": bin entry < 0 ==> abort\" << std::endl;" << std::endl;
889 fout << " std::exit(1);" << std::endl;
890 fout << " }" << std::endl;
891 fout << std::endl;
892 fout << " double p = ref;" << std::endl;
893 fout << std::endl;
894 fout << " if (GetType(ivar) != 'I' && !fHasDiscretPDF[ivar]) {" << std::endl;
895 fout << " float bincenter = (bin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
896 fout << " int nextbin = bin;" << std::endl;
897 fout << " if ((x[itype] > bincenter && bin != fNbin[ivar]-1) || bin == 0) " << std::endl;
898 fout << " nextbin++;" << std::endl;
899 fout << " else" << std::endl;
900 fout << " nextbin--; " << std::endl;
901 fout << std::endl;
902 fout << " double refnext = (itype == 0) ? fRefS[ivar][nextbin] : fRefB[ivar][nextbin];" << std::endl;
903 fout << " float nextbincenter = (nextbin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
904 fout << std::endl;
905 fout << " double dx = bincenter - nextbincenter;" << std::endl;
906 fout << " double dy = ref - refnext;" << std::endl;
907 fout << " p += (x[itype] - bincenter) * dy/dx;" << std::endl;
908 fout << " }" << std::endl;
909 fout << std::endl;
910 fout << " if (p < fEpsilon) p = fEpsilon; // avoid zero response" << std::endl;
911 fout << std::endl;
912 fout << " if (itype == 0) ps *= p;" << std::endl;
913 fout << " else pb *= p;" << std::endl;
914 fout << " } " << std::endl;
915 fout << " } " << std::endl;
916 fout << std::endl;
917 fout << " // the likelihood ratio (transform it ?)" << std::endl;
918 fout << " return TransformLikelihoodOutput( ps, pb ); " << std::endl;
919 fout << "}" << std::endl << std::endl;
920
921 fout << "inline double " << className << "::TransformLikelihoodOutput( double ps, double pb ) const" << std::endl;
922 fout << "{" << std::endl;
923 fout << " // returns transformed or non-transformed output" << std::endl;
924 fout << " if (ps < fEpsilon) ps = fEpsilon;" << std::endl;
925 fout << " if (pb < fEpsilon) pb = fEpsilon;" << std::endl;
926 fout << " double r = ps/(ps + pb);" << std::endl;
927 fout << " if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
928 fout << std::endl;
929 fout << " if (" << (fTransformLikelihoodOutput ? "true" : "false") << ") {" << std::endl;
930 fout << " // inverse Fermi function" << std::endl;
931 fout << std::endl;
932 fout << " // sanity check" << std::endl;
933 fout << " if (r <= 0.0) r = fEpsilon;" << std::endl;
934 fout << " else if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
935 fout << std::endl;
936 fout << " double tau = 15.0;" << std::endl;
937 fout << " r = - log(1.0/r - 1.0)/tau;" << std::endl;
938 fout << " }" << std::endl;
939 fout << std::endl;
940 fout << " return r;" << std::endl;
941 fout << "}" << std::endl;
942 fout << std::endl;
943
944 fout << "// Clean up" << std::endl;
945 fout << "inline void " << className << "::Clear() " << std::endl;
946 fout << "{" << std::endl;
947 fout << " // nothing to clear" << std::endl;
948 fout << "}" << std::endl << std::endl;
949
950 fout << "// signal map" << std::endl;
951 fout << "float " << className << "::fRefS[][" << nbinMax << "] = " << std::endl;
952 fout << "{ " << std::endl;
953 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
954 fout << " { ";
955 for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
956 if (ibin-1 < nbin[ivar])
957 fout << (*fPDFSig)[ivar]->GetPDFHist()->GetBinContent(ibin);
958 else
959 fout << -1;
960
961 if (ibin < nbinMax) fout << ", ";
962 }
963 fout << " }, " << std::endl;
964 }
965 fout << "}; " << std::endl;
966 fout << std::endl;
967
968 fout << "// background map" << std::endl;
969 fout << "float " << className << "::fRefB[][" << nbinMax << "] = " << std::endl;
970 fout << "{ " << std::endl;
971 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
972 fout << " { ";
973 fout << std::setprecision(8);
974 for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
975 if (ibin-1 < nbin[ivar])
976 fout << (*fPDFBgd)[ivar]->GetPDFHist()->GetBinContent(ibin);
977 else
978 fout << -1;
979
980 if (ibin < nbinMax) fout << ", ";
981 }
982 fout << " }, " << std::endl;
983 }
984 fout << "}; " << std::endl;
985 fout << std::endl;
986 fout << std::setprecision(dp);
987
988 delete[] nbin;
989}
990
991////////////////////////////////////////////////////////////////////////////////
992/// get help message text
993///
994/// typical length of text line:
995/// "|--------------------------------------------------------------|"
996
997void TMVA::MethodLikelihood::GetHelpMessage() const
998{
999 Log() << Endl;
1000 Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
1001 Log() << Endl;
1002 Log() << "The maximum-likelihood classifier models the data with probability " << Endl;
1003 Log() << "density functions (PDF) reproducing the signal and background" << Endl;
1004 Log() << "distributions of the input variables. Correlations among the " << Endl;
1005 Log() << "variables are ignored." << Endl;
1006 Log() << Endl;
1007 Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
1008 Log() << Endl;
1009 Log() << "Required for good performance are decorrelated input variables" << Endl;
1010 Log() << "(PCA transformation via the option \"VarTransform=Decorrelate\"" << Endl;
1011 Log() << "may be tried). Irreducible non-linear correlations may be reduced" << Endl;
1012 Log() << "by precombining strongly correlated input variables, or by simply" << Endl;
1013 Log() << "removing one of the variables." << Endl;
1014 Log() << Endl;
1015 Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
1016 Log() << Endl;
1017 Log() << "High fidelity PDF estimates are mandatory, i.e., sufficient training " << Endl;
1018 Log() << "statistics is required to populate the tails of the distributions" << Endl;
1019 Log() << "It would be a surprise if the default Spline or KDE kernel parameters" << Endl;
1020 Log() << "provide a satisfying fit to the data. The user is advised to properly" << Endl;
1021 Log() << "tune the events per bin and smooth options in the spline cases" << Endl;
1022 Log() << "individually per variable. If the KDE kernel is used, the adaptive" << Endl;
1023 Log() << "Gaussian kernel may lead to artefacts, so please always also try" << Endl;
1024 Log() << "the non-adaptive one." << Endl;
1025 Log() << "" << Endl;
1026 Log() << "All tuning parameters must be adjusted individually for each input" << Endl;
1027 Log() << "variable!" << Endl;
1028}
1029