MethodCuts.cxx
1// @(#)root/tmva $Id$
2// Author: Andreas Hoecker, Matt Jachowski, Peter Speckmayer, Eckhard von Toerne, Helge Voss, Kai Voss
3
4/**********************************************************************************
5 * Project: TMVA - a Root-integrated toolkit for multivariate Data analysis *
6 * Package: TMVA *
7 * Class : TMVA::MethodCuts *
8 * Web : http://tmva.sourceforge.net *
9 * *
10 * Description: *
11 * Implementation (see header for description) *
12 * *
13 * Authors (alphabetical): *
14 * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
15 * Matt Jachowski <jachowski@stanford.edu> - Stanford University, USA *
16 * Peter Speckmayer <speckmay@mail.cern.ch> - CERN, Switzerland *
17 * Eckhard von Toerne <evt@physik.uni-bonn.de> - U. of Bonn, Germany *
18 * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
19 * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
20 * *
21 * Copyright (c) 2005: *
22 * CERN, Switzerland *
23 * U. of Victoria, Canada *
24 * MPI-K Heidelberg, Germany *
25 * *
26 * Redistribution and use in source and binary forms, with or without *
27 * modification, are permitted according to the terms listed in LICENSE *
28 * (http://tmva.sourceforge.net/LICENSE) *
29 **********************************************************************************/
30
31/*! \class TMVA::MethodCuts
32\ingroup TMVA
33
34 Multivariate optimisation of signal efficiency for given background
35 efficiency, applying rectangular minimum and maximum requirements.
36
37 Also implemented is a "decorrelate/diagonalized cuts approach",
38 which improves over the uncorrelated cuts approach by
39 transforming linearly the input variables into a diagonal space,
40 using the square-root of the covariance matrix.
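 Schematically (a sketch, not a verbatim transcription of the implementation): with
 \f$ C \f$ the covariance matrix of the input variables estimated from the training
 sample, the variables are decorrelated via
 \f$ \mathbf{x} \rightarrow \mathbf{x}' = C^{-1/2}\,\mathbf{x} \f$,
 and the rectangular cuts are then applied to the transformed variables \f$ \mathbf{x}' \f$.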
41
42 Other optimisation criteria, such as maximising the signal significance-
43 squared, \f$ \frac{S^2}{(S+B)} \f$, with S and B being the signal and background yields,
44 correspond to a particular point in the optimised background rejection
45 versus signal efficiency curve. This working point requires the knowledge
46 of the expected yields, which is not the case in general. Note also that
47 for rare signals, Poissonian statistics should be used, which modifies
48 the significance criterion.
49
50 The rectangular cut of a volume in the variable space is performed using
51 a binary tree to sort the training events. This provides a significant
52 reduction in computing time (up to several orders of magnitudes, depending
53 on the complexity of the problem at hand).
54
55 Technically, optimisation is achieved in TMVA by two methods:
56
57 1. Monte Carlo generation using uniform priors for the lower cut value,
58 and the cut width, thrown within the variable ranges.
59
60 2. A Genetic Algorithm (GA) searches for the optimal ("fittest") cut sample.
61 The GA is configurable by many external settings through the option
62 string. For difficult cases (such as many variables), some tuning
63 may be necessary to achieve satisfactory results.
64
65 Attempts to use Minuit fits (Simplex or Migrad) instead have not shown
66 superior results, and often failed due to convergence at local minima.
67
68 The tests we have performed so far showed that in generic applications,
69 the GA is superior to MC sampling, and hence GA is the default method.
70 It is worthwhile trying both anyway.
71
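 As an illustration, a typical booking of this method in a user macro might look as
 follows (a sketch only: the `factory`/`dataloader` objects and the concrete option
 values are assumptions of this example, not part of this file):

 ~~~ {.cpp}
 factory->BookMethod( dataloader, TMVA::Types::kCuts, "CutsGA",
                      "!H:!V:FitMethod=GA:EffMethod=EffSel:VarProp=FSmart" );
 ~~~

 In the application phase the cut ensemble is queried for a chosen signal efficiency,
 e.g. `reader->EvaluateMVA( "CutsGA", 0.7 )`, which returns 1 if the event passes the
 cuts at that working point and 0 otherwise.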
72 **Decorrelated (or "diagonalized") Cuts**
73
74 See the class description of MethodLikelihood for a detailed explanation.
75*/
76
77#include "TMVA/MethodCuts.h"
78
79#include "TMVA/BinarySearchTree.h"
80#include "TMVA/ClassifierFactory.h"
81#include "TMVA/Config.h"
82#include "TMVA/Configurable.h"
83#include "TMVA/DataSet.h"
84#include "TMVA/DataSetInfo.h"
85#include "TMVA/Event.h"
86#include "TMVA/IFitterTarget.h"
87#include "TMVA/IMethod.h"
88#include "TMVA/GeneticFitter.h"
89#include "TMVA/Interval.h"
90#include "TMVA/FitterBase.h"
91#include "TMVA/MCFitter.h"
92#include "TMVA/MethodBase.h"
93#include "TMVA/MethodFDA.h"
94#include "TMVA/MinuitFitter.h"
95#include "TMVA/MsgLogger.h"
96#include "TMVA/PDF.h"
97#include "TMVA/Results.h"
98#include "TMVA/SimulatedAnnealingFitter.h"
99#include "TMVA/Timer.h"
100#include "TMVA/Tools.h"
101#include "TMVA/TransformationHandler.h"
102#include "TMVA/Types.h"
103#include "TMVA/TSpline1.h"
105
106#include "TH1F.h"
107#include "TObjString.h"
108#include "TDirectory.h"
109#include "TMath.h"
110#include "TGraph.h"
111#include "TSpline.h"
112#include "TRandom3.h"
113
114#include <cstdlib>
115#include <iostream>
116#include <iomanip>
117
118using std::atof;
119
120REGISTER_METHOD(Cuts)
121
122ClassImp(TMVA::MethodCuts);
123
124const Double_t TMVA::MethodCuts::fgMaxAbsCutVal = 1.0e30;
125
126////////////////////////////////////////////////////////////////////////////////
127/// standard constructor
128
129TMVA::MethodCuts::MethodCuts( const TString& jobName,
130                               const TString& methodTitle,
131 DataSetInfo& theData,
132 const TString& theOption ) :
133 MethodBase( jobName, Types::kCuts, methodTitle, theData, theOption),
134 fFitMethod ( kUseGeneticAlgorithm ),
135 fEffMethod ( kUseEventSelection ),
136 fFitParams (0),
137 fTestSignalEff(0.7),
138 fEffSMin ( 0 ),
139 fEffSMax ( 0 ),
140 fCutRangeMin( 0 ),
141 fCutRangeMax( 0 ),
142 fBinaryTreeS( 0 ),
143 fBinaryTreeB( 0 ),
144 fCutMin ( 0 ),
145 fCutMax ( 0 ),
146 fTmpCutMin ( 0 ),
147 fTmpCutMax ( 0 ),
148 fAllVarsI ( 0 ),
149 fNpar ( 0 ),
150 fEffRef ( 0 ),
151 fRangeSign ( 0 ),
152 fRandom ( 0 ),
153 fMeanS ( 0 ),
154 fMeanB ( 0 ),
155 fRmsS ( 0 ),
156 fRmsB ( 0 ),
157 fEffBvsSLocal( 0 ),
158 fVarHistS ( 0 ),
159 fVarHistB ( 0 ),
160 fVarHistS_smooth( 0 ),
161 fVarHistB_smooth( 0 ),
162 fVarPdfS ( 0 ),
163 fVarPdfB ( 0 ),
164 fNegEffWarning( kFALSE )
165{
166}
167
168////////////////////////////////////////////////////////////////////////////////
169/// construction from weight file
170
171TMVA::MethodCuts::MethodCuts( DataSetInfo& theData,
172                               const TString& theWeightFile) :
173 MethodBase( Types::kCuts, theData, theWeightFile),
174 fFitMethod ( kUseGeneticAlgorithm ),
175 fEffMethod ( kUseEventSelection ),
176 fFitParams (0),
177 fTestSignalEff(0.7),
178 fEffSMin ( 0 ),
179 fEffSMax ( 0 ),
180 fCutRangeMin( 0 ),
181 fCutRangeMax( 0 ),
182 fBinaryTreeS( 0 ),
183 fBinaryTreeB( 0 ),
184 fCutMin ( 0 ),
185 fCutMax ( 0 ),
186 fTmpCutMin ( 0 ),
187 fTmpCutMax ( 0 ),
188 fAllVarsI ( 0 ),
189 fNpar ( 0 ),
190 fEffRef ( 0 ),
191 fRangeSign ( 0 ),
192 fRandom ( 0 ),
193 fMeanS ( 0 ),
194 fMeanB ( 0 ),
195 fRmsS ( 0 ),
196 fRmsB ( 0 ),
197 fEffBvsSLocal( 0 ),
198 fVarHistS ( 0 ),
199 fVarHistB ( 0 ),
200 fVarHistS_smooth( 0 ),
201 fVarHistB_smooth( 0 ),
202 fVarPdfS ( 0 ),
203 fVarPdfB ( 0 ),
204 fNegEffWarning( kFALSE )
205{
206}
207
208////////////////////////////////////////////////////////////////////////////////
209/// Cuts can only handle classification with 2 classes
210
211Bool_t TMVA::MethodCuts::HasAnalysisType( Types::EAnalysisType type, UInt_t numberClasses,
212                                           UInt_t /*numberTargets*/ )
213{
214 return (type == Types::kClassification && numberClasses == 2);
215}
216
217////////////////////////////////////////////////////////////////////////////////
218/// default initialisation called by all constructors
219
220void TMVA::MethodCuts::Init( void )
221{
222 fVarHistS = fVarHistB = 0;
223 fVarHistS_smooth = fVarHistB_smooth = 0;
224 fVarPdfS = fVarPdfB = 0;
225 fFitParams = 0;
226 fBinaryTreeS = fBinaryTreeB = 0;
227 fEffSMin = 0;
228 fEffSMax = 0;
229
230 // vector with fit results
231 fNpar = 2*GetNvar();
232 fRangeSign = new std::vector<Int_t> ( GetNvar() );
233 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) (*fRangeSign)[ivar] = +1;
234
235 fMeanS = new std::vector<Double_t>( GetNvar() );
236 fMeanB = new std::vector<Double_t>( GetNvar() );
237 fRmsS = new std::vector<Double_t>( GetNvar() );
238 fRmsB = new std::vector<Double_t>( GetNvar() );
239
240 // get the variable specific options, first initialize default
241 fFitParams = new std::vector<EFitParameters>( GetNvar() );
242 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) (*fFitParams)[ivar] = kNotEnforced;
243
244 fFitMethod = kUseMonteCarlo;
245 fTestSignalEff = -1;
246
247 // create LUT for cuts
248 fCutMin = new Double_t*[GetNvar()];
249 fCutMax = new Double_t*[GetNvar()];
250 for (UInt_t i=0; i<GetNvar(); i++) {
251 fCutMin[i] = new Double_t[fNbins];
252 fCutMax[i] = new Double_t[fNbins];
253 }
254
255 // init
256 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
257 for (Int_t ibin=0; ibin<fNbins; ibin++) {
258 fCutMin[ivar][ibin] = 0;
259 fCutMax[ivar][ibin] = 0;
260 }
261 }
262
263 fTmpCutMin = new Double_t[GetNvar()];
264 fTmpCutMax = new Double_t[GetNvar()];
265}
266
267////////////////////////////////////////////////////////////////////////////////
268/// destructor
269
270TMVA::MethodCuts::~MethodCuts( void )
271{
272 delete fRangeSign;
273 delete fMeanS;
274 delete fMeanB;
275 delete fRmsS;
276 delete fRmsB;
277 delete fFitParams;
278 delete fEffBvsSLocal;
279
280 if (NULL != fCutRangeMin) delete [] fCutRangeMin;
281 if (NULL != fCutRangeMax) delete [] fCutRangeMax;
282 if (NULL != fAllVarsI) delete [] fAllVarsI;
283
284 for (UInt_t i=0;i<GetNvar();i++) {
285 if (NULL != fCutMin[i] ) delete [] fCutMin[i];
286 if (NULL != fCutMax[i] ) delete [] fCutMax[i];
287 if (NULL != fCutRange[i]) delete fCutRange[i];
288 }
289
290 if (NULL != fCutMin) delete [] fCutMin;
291 if (NULL != fCutMax) delete [] fCutMax;
292
293 if (NULL != fTmpCutMin) delete [] fTmpCutMin;
294 if (NULL != fTmpCutMax) delete [] fTmpCutMax;
295
296 if (NULL != fBinaryTreeS) delete fBinaryTreeS;
297 if (NULL != fBinaryTreeB) delete fBinaryTreeB;
298}
299
300////////////////////////////////////////////////////////////////////////////////
301/// define the options (their key words) that can be set in the option string.
302///
303/// known options:
304/// - Method `<string>` Minimisation method. Available values are:
305/// - MC Monte Carlo `<default>`
306/// - GA Genetic Algorithm
307/// - SA Simulated annealing
308///
309/// - EffMethod `<string>` Efficiency selection method. Available values are:
310/// - EffSel `<default>`
311/// - EffPDF
312///
313/// - VarProp `<string>` Property of variable 1 for the MC method (taking precedence over the
314/// global setting). The same values as for the global option are available. Variables 1..10 can be
315/// set this way.
316///
317/// - CutRangeMin/Max `<float>` user-defined ranges in which cuts are varied
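///
/// As an illustration (the numerical values are placeholders, not defaults), these options
/// could be combined in a booking string such as
/// "FitMethod=GA:EffMethod=EffSel:CutRangeMin[0]=-1:CutRangeMax[0]=1:VarProp[1]=FMax".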
318
319void TMVA::MethodCuts::DeclareOptions()
320{
321   DeclareOptionRef(fFitMethodS = "GA", "FitMethod", "Minimisation Method (GA, SA, and MC are the primary methods to be used; the others have been introduced for testing purposes and are deprecated)");
322 AddPreDefVal(TString("GA"));
323 AddPreDefVal(TString("SA"));
324 AddPreDefVal(TString("MC"));
325 AddPreDefVal(TString("MCEvents"));
326 AddPreDefVal(TString("MINUIT"));
327 AddPreDefVal(TString("EventScan"));
328
329 // selection type
330 DeclareOptionRef(fEffMethodS = "EffSel", "EffMethod", "Selection Method");
331 AddPreDefVal(TString("EffSel"));
332 AddPreDefVal(TString("EffPDF"));
333
334 // cut ranges
335 fCutRange.resize(GetNvar());
336 fCutRangeMin = new Double_t[GetNvar()];
337 fCutRangeMax = new Double_t[GetNvar()];
338 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
339 fCutRange[ivar] = 0;
340 fCutRangeMin[ivar] = fCutRangeMax[ivar] = -1;
341 }
342
343 DeclareOptionRef( fCutRangeMin, GetNvar(), "CutRangeMin", "Minimum of allowed cut range (set per variable)" );
344 DeclareOptionRef( fCutRangeMax, GetNvar(), "CutRangeMax", "Maximum of allowed cut range (set per variable)" );
345
346 fAllVarsI = new TString[GetNvar()];
347
348 for (UInt_t i=0; i<GetNvar(); i++) fAllVarsI[i] = "NotEnforced";
349
350 DeclareOptionRef(fAllVarsI, GetNvar(), "VarProp", "Categorisation of cuts");
351 AddPreDefVal(TString("NotEnforced"));
352 AddPreDefVal(TString("FMax"));
353 AddPreDefVal(TString("FMin"));
354 AddPreDefVal(TString("FSmart"));
355}
356
357////////////////////////////////////////////////////////////////////////////////
358/// process user options.
359///
360/// sanity check, do not allow the input variables to be normalised, because this
361/// only creates problems when interpreting the cuts
362
363void TMVA::MethodCuts::ProcessOptions()
364{
365 if (IsNormalised()) {
366 Log() << kWARNING << "Normalisation of the input variables for cut optimisation is not" << Endl;
367 Log() << kWARNING << "supported because this provides intransparent cut values, and no" << Endl;
368 Log() << kWARNING << "improvement in the performance of the algorithm." << Endl;
369 Log() << kWARNING << "Please remove \"Normalise\" option from booking option string" << Endl;
370 Log() << kWARNING << "==> Will reset normalisation flag to \"False\"" << Endl;
371 SetNormalised( kFALSE );
372 }
373
374 if (IgnoreEventsWithNegWeightsInTraining()) {
375 Log() << kFATAL << "Mechanism to ignore events with negative weights in training not yet available for method: "
376 << GetMethodTypeName()
377 << " --> Please remove \"IgnoreNegWeightsInTraining\" option from booking string."
378 << Endl;
379 }
380
381 if (fFitMethodS == "MC" ) fFitMethod = kUseMonteCarlo;
382 else if (fFitMethodS == "MCEvents") fFitMethod = kUseMonteCarloEvents;
383 else if (fFitMethodS == "GA" ) fFitMethod = kUseGeneticAlgorithm;
384 else if (fFitMethodS == "SA" ) fFitMethod = kUseSimulatedAnnealing;
385 else if (fFitMethodS == "MINUIT" ) {
386 fFitMethod = kUseMinuit;
387 Log() << kWARNING << "poor performance of MINUIT in MethodCuts; preferred fit method: GA" << Endl;
388 }
389 else if (fFitMethodS == "EventScan" ) fFitMethod = kUseEventScan;
390 else Log() << kFATAL << "unknown minimisation method: " << fFitMethodS << Endl;
391
392 if (fEffMethodS == "EFFSEL" ) fEffMethod = kUseEventSelection; // highly recommended
393 else if (fEffMethodS == "EFFPDF" ) fEffMethod = kUsePDFs;
394 else fEffMethod = kUseEventSelection;
395
396 // options output
397 Log() << kINFO << Form("Use optimization method: \"%s\"",
398 (fFitMethod == kUseMonteCarlo) ? "Monte Carlo" :
399                          (fFitMethod == kUseMonteCarloEvents) ? "Monte-Carlo-Event sampling" :
400 (fFitMethod == kUseEventScan) ? "Full Event Scan (slow)" :
401 (fFitMethod == kUseMinuit) ? "MINUIT" : "Genetic Algorithm" ) << Endl;
402 Log() << kINFO << Form("Use efficiency computation method: \"%s\"",
403 (fEffMethod == kUseEventSelection) ? "Event Selection" : "PDF" ) << Endl;
404
405 // cut ranges
406 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
407 fCutRange[ivar] = new Interval( fCutRangeMin[ivar], fCutRangeMax[ivar] );
408 }
409
410 // individual options
411 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
412 EFitParameters theFitP = kNotEnforced;
413 if (fAllVarsI[ivar] == "" || fAllVarsI[ivar] == "NotEnforced") theFitP = kNotEnforced;
414 else if (fAllVarsI[ivar] == "FMax" ) theFitP = kForceMax;
415 else if (fAllVarsI[ivar] == "FMin" ) theFitP = kForceMin;
416 else if (fAllVarsI[ivar] == "FSmart" ) theFitP = kForceSmart;
417 else {
418 Log() << kFATAL << "unknown value \'" << fAllVarsI[ivar]
419 << "\' for fit parameter option " << Form("VarProp[%i]",ivar) << Endl;
420 }
421 (*fFitParams)[ivar] = theFitP;
422
423 if (theFitP != kNotEnforced)
424 Log() << kINFO << "Use \"" << fAllVarsI[ivar]
425 << "\" cuts for variable: " << "'" << (*fInputVars)[ivar] << "'" << Endl;
426 }
427}
428
429////////////////////////////////////////////////////////////////////////////////
430/// cut evaluation: returns 1.0 if event passed, 0.0 otherwise
431
432Double_t TMVA::MethodCuts::GetMvaValue( Double_t* err, Double_t* errUpper )
433{
434 // cannot determine error
435 NoErrorCalc(err, errUpper);
436
437 // sanity check
438 if (fCutMin == NULL || fCutMax == NULL || fNbins == 0) {
439 Log() << kFATAL << "<Eval_Cuts> fCutMin/Max have zero pointer. "
440 << "Did you book Cuts ?" << Endl;
441 }
442
443 const Event* ev = GetEvent();
444
445 // sanity check
446 if (fTestSignalEff > 0) {
447 // get efficiency bin
448 Int_t ibin = fEffBvsSLocal->FindBin( fTestSignalEff );
449 if (ibin < 0 ) ibin = 0;
450 else if (ibin >= fNbins) ibin = fNbins - 1;
451
452 Bool_t passed = kTRUE;
453 for (UInt_t ivar=0; ivar<GetNvar(); ivar++)
454 passed &= ( (ev->GetValue(ivar) > fCutMin[ivar][ibin]) &&
455 (ev->GetValue(ivar) <= fCutMax[ivar][ibin]) );
456
457 return passed ? 1. : 0. ;
458 }
459 else return 0;
460}
461
462////////////////////////////////////////////////////////////////////////////////
463/// print cuts
464
465void TMVA::MethodCuts::PrintCuts( Double_t effS ) const
466{
467 std::vector<Double_t> cutsMin;
468 std::vector<Double_t> cutsMax;
469 Int_t ibin = fEffBvsSLocal->FindBin( effS );
470
471 Double_t trueEffS = GetCuts( effS, cutsMin, cutsMax );
472
473 // retrieve variable expressions (could be transformations)
474 std::vector<TString>* varVec = 0;
475 if (GetTransformationHandler().GetNumOfTransformations() == 0) {
476 // no transformation applied, replace by current variables
477 varVec = new std::vector<TString>;
478 for (UInt_t ivar=0; ivar<cutsMin.size(); ivar++) {
479 varVec->push_back( DataInfo().GetVariableInfo(ivar).GetLabel() );
480 }
481 }
482 else if (GetTransformationHandler().GetNumOfTransformations() == 1) {
483 // get transformation string
484 varVec = GetTransformationHandler().GetTransformationStringsOfLastTransform();
485 }
486 else {
487 // replace transformation print by current variables and indicated incompleteness
488 varVec = new std::vector<TString>;
489 for (UInt_t ivar=0; ivar<cutsMin.size(); ivar++) {
490 varVec->push_back( DataInfo().GetVariableInfo(ivar).GetLabel() + " [transformed]" );
491 }
492 }
493
494 UInt_t maxL = 0;
495 for (UInt_t ivar=0; ivar<cutsMin.size(); ivar++) {
496 if ((UInt_t)(*varVec)[ivar].Length() > maxL) maxL = (*varVec)[ivar].Length();
497 }
498 UInt_t maxLine = 20+maxL+16;
499
500 for (UInt_t i=0; i<maxLine; i++) Log() << "-";
501 Log() << Endl;
502 Log() << kHEADER << "Cut values for requested signal efficiency: " << trueEffS << Endl;
503 Log() << kINFO << "Corresponding background efficiency : " << fEffBvsSLocal->GetBinContent( ibin ) << Endl;
504 if (GetTransformationHandler().GetNumOfTransformations() == 1) {
505 Log() << kINFO << "Transformation applied to input variables : \""
506 << GetTransformationHandler().GetNameOfLastTransform() << "\"" << Endl;
507 }
508 else if (GetTransformationHandler().GetNumOfTransformations() > 1) {
509 Log() << kINFO << "[ More than one (=" << GetTransformationHandler().GetNumOfTransformations() << ") "
510 << " transformations applied in transformation chain; cuts applied on transformed quantities ] " << Endl;
511 }
512 else {
513 Log() << kINFO << "Transformation applied to input variables : None" << Endl;
514 }
515 for (UInt_t i=0; i<maxLine; i++) Log() << "-";
516 Log() << Endl;
517 for (UInt_t ivar=0; ivar<cutsMin.size(); ivar++) {
518 Log() << kINFO
519 << "Cut[" << std::setw(2) << ivar << "]: "
520 << std::setw(10) << cutsMin[ivar]
521 << " < "
522 << std::setw(maxL) << (*varVec)[ivar]
523 << " <= "
524 << std::setw(10) << cutsMax[ivar] << Endl;
525 }
526 for (UInt_t i=0; i<maxLine; i++) Log() << "-";
527 Log() << Endl;
528
529 delete varVec; // yes, ownership has been given to us
530}
531
532////////////////////////////////////////////////////////////////////////////////
533/// retrieve cut values for given signal efficiency
534/// assume vector of correct size !!
535
536Double_t TMVA::MethodCuts::GetCuts( Double_t effS, Double_t* cutMin, Double_t* cutMax ) const
537{
538 std::vector<Double_t> cMin( GetNvar() );
539 std::vector<Double_t> cMax( GetNvar() );
540 Double_t trueEffS = GetCuts( effS, cMin, cMax );
541 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
542 cutMin[ivar] = cMin[ivar];
543 cutMax[ivar] = cMax[ivar];
544 }
545 return trueEffS;
546}
547
548////////////////////////////////////////////////////////////////////////////////
549/// retrieve cut values for given signal efficiency
550
551Double_t TMVA::MethodCuts::GetCuts( Double_t effS,
552                                    std::vector<Double_t>& cutMin,
553 std::vector<Double_t>& cutMax ) const
554{
555 // find corresponding bin
556 Int_t ibin = fEffBvsSLocal->FindBin( effS );
557
558 // get the true efficiency which is the one on the "left hand" side of the bin
559 Double_t trueEffS = fEffBvsSLocal->GetBinLowEdge( ibin );
560
561   ibin--; // the 'cut' arrays are indexed 0...fNbins-1
562 if (ibin < 0 ) ibin = 0;
563 else if (ibin >= fNbins) ibin = fNbins - 1;
564
565 cutMin.clear();
566 cutMax.clear();
567 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
568 cutMin.push_back( fCutMin[ivar][ibin] );
569 cutMax.push_back( fCutMax[ivar][ibin] );
570 }
571
572 return trueEffS;
573}
574
575////////////////////////////////////////////////////////////////////////////////
576/// training method: here the cuts are optimised for the training sample
577
578void TMVA::MethodCuts::Train( void )
579{
580 if (fEffMethod == kUsePDFs) CreateVariablePDFs(); // create PDFs for variables
581
582 // create binary trees (global member variables) for signal and background
583 if (fBinaryTreeS != 0) { delete fBinaryTreeS; fBinaryTreeS = 0; }
584 if (fBinaryTreeB != 0) { delete fBinaryTreeB; fBinaryTreeB = 0; }
585
586 // the variables may be transformed by a transformation method: to coherently
587 // treat signal and background one must decide which transformation type shall
588 // be used: our default is signal-type
589
590 fBinaryTreeS = new BinarySearchTree();
591 fBinaryTreeS->Fill( GetEventCollection(Types::kTraining), fSignalClass );
592 fBinaryTreeB = new BinarySearchTree();
593 fBinaryTreeB->Fill( GetEventCollection(Types::kTraining), fBackgroundClass );
594
595 for (UInt_t ivar =0; ivar < Data()->GetNVariables(); ivar++) {
596 (*fMeanS)[ivar] = fBinaryTreeS->Mean(Types::kSignal, ivar);
597 (*fRmsS)[ivar] = fBinaryTreeS->RMS (Types::kSignal, ivar);
598 (*fMeanB)[ivar] = fBinaryTreeB->Mean(Types::kBackground, ivar);
599 (*fRmsB)[ivar] = fBinaryTreeB->RMS (Types::kBackground, ivar);
600
601 // update interval ?
602 Double_t xmin = TMath::Min(fBinaryTreeS->Min(Types::kSignal, ivar),
603 fBinaryTreeB->Min(Types::kBackground, ivar));
604 Double_t xmax = TMath::Max(fBinaryTreeS->Max(Types::kSignal, ivar),
605 fBinaryTreeB->Max(Types::kBackground, ivar));
606
607 // redefine ranges to be slightly smaller and larger than xmin and xmax, respectively
608 Double_t eps = 0.01*(xmax - xmin);
609 xmin -= eps;
610 xmax += eps;
611
612 if (TMath::Abs(fCutRange[ivar]->GetMin() - fCutRange[ivar]->GetMax()) < 1.0e-300 ) {
613 fCutRange[ivar]->SetMin( xmin );
614 fCutRange[ivar]->SetMax( xmax );
615 }
616 else if (xmin > fCutRange[ivar]->GetMin()) fCutRange[ivar]->SetMin( xmin );
617 else if (xmax < fCutRange[ivar]->GetMax()) fCutRange[ivar]->SetMax( xmax );
618 }
619
620 std::vector<TH1F*> signalDist, bkgDist;
621
622 // this is important: reset the branch addresses of the training tree to the current event
623 delete fEffBvsSLocal;
624 fEffBvsSLocal = new TH1F( GetTestvarName() + "_effBvsSLocal",
625 TString(GetName()) + " efficiency of B vs S", fNbins, 0.0, 1.0 );
626 fEffBvsSLocal->SetDirectory(nullptr); // it's local
627
628 // init
629 for (Int_t ibin=1; ibin<=fNbins; ibin++) fEffBvsSLocal->SetBinContent( ibin, -0.1 );
630
631 // --------------------------------------------------------------------------
632 if (fFitMethod == kUseGeneticAlgorithm ||
633 fFitMethod == kUseMonteCarlo ||
634 fFitMethod == kUseMinuit ||
635 fFitMethod == kUseSimulatedAnnealing) {
636
637 // ranges
638 std::vector<Interval*> ranges;
639
640 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
641
642 Int_t nbins = 0;
643 if (DataInfo().GetVariableInfo(ivar).GetVarType() == 'I') {
644 nbins = Int_t(fCutRange[ivar]->GetMax() - fCutRange[ivar]->GetMin()) + 1;
645 }
646
647 if ((*fFitParams)[ivar] == kForceSmart) {
648 if ((*fMeanS)[ivar] > (*fMeanB)[ivar]) (*fFitParams)[ivar] = kForceMax;
649 else (*fFitParams)[ivar] = kForceMin;
650 }
651
652 if ((*fFitParams)[ivar] == kForceMin) {
653 ranges.push_back( new Interval( fCutRange[ivar]->GetMin(), fCutRange[ivar]->GetMin(), nbins ) );
654 ranges.push_back( new Interval( 0, fCutRange[ivar]->GetMax() - fCutRange[ivar]->GetMin(), nbins ) );
655 }
656 else if ((*fFitParams)[ivar] == kForceMax) {
657 ranges.push_back( new Interval( fCutRange[ivar]->GetMin(), fCutRange[ivar]->GetMax(), nbins ) );
658 ranges.push_back( new Interval( fCutRange[ivar]->GetMax() - fCutRange[ivar]->GetMin(),
659 fCutRange[ivar]->GetMax() - fCutRange[ivar]->GetMin(), nbins ) );
660 }
661 else {
662 ranges.push_back( new Interval( fCutRange[ivar]->GetMin(), fCutRange[ivar]->GetMax(), nbins ) );
663 ranges.push_back( new Interval( 0, fCutRange[ivar]->GetMax() - fCutRange[ivar]->GetMin(), nbins ) );
664 }
665 }
666
667 // create the fitter
668 FitterBase* fitter = NULL;
669
670 switch (fFitMethod) {
671 case kUseGeneticAlgorithm:
672 fitter = new GeneticFitter( *this, TString::Format("%sFitter_GA", GetName()), ranges, GetOptions() );
673 break;
674 case kUseMonteCarlo:
675 fitter = new MCFitter ( *this, TString::Format("%sFitter_MC", GetName()), ranges, GetOptions() );
676 break;
677 case kUseMinuit:
678 fitter = new MinuitFitter ( *this, TString::Format("%sFitter_MINUIT", GetName()), ranges, GetOptions() );
679 break;
680 case kUseSimulatedAnnealing:
681 fitter = new SimulatedAnnealingFitter( *this, TString::Format("%sFitter_SA", GetName()), ranges, GetOptions() );
682 break;
683 default:
684 Log() << kFATAL << "Wrong fit method: " << fFitMethod << Endl;
685 }
686
687 if (fInteractive) fitter->SetIPythonInteractive(&fExitFromTraining, &fIPyMaxIter, &fIPyCurrentIter);
688
689 fitter->CheckForUnusedOptions();
690
691 // perform the fit
692 fitter->Run();
693
694 // clean up
695 for (UInt_t ivar=0; ivar<ranges.size(); ivar++) delete ranges[ivar];
696 delete fitter;
697
698 }
699 // --------------------------------------------------------------------------
700 else if (fFitMethod == kUseEventScan) {
701
702 Int_t nevents = Data()->GetNEvents();
703 Int_t ic = 0;
704
705 // timing of MC
706 Int_t nsamples = Int_t(0.5*nevents*(nevents - 1));
707 Timer timer( nsamples, GetName() );
708 fIPyMaxIter = nsamples;
709
710 Log() << kINFO << "Running full event scan: " << Endl;
711 for (Int_t ievt1=0; ievt1<nevents; ievt1++) {
712 for (Int_t ievt2=ievt1+1; ievt2<nevents; ievt2++) {
713
714 fIPyCurrentIter = ic;
715 if (fExitFromTraining) break;
716 EstimatorFunction( ievt1, ievt2 );
717
718 // what's the time please?
719 ic++;
720 if ((nsamples<10000) || ic%10000 == 0) timer.DrawProgressBar( ic );
721 }
722 }
723 }
724 // --------------------------------------------------------------------------
725 else if (fFitMethod == kUseMonteCarloEvents) {
726
727 Int_t nsamples = 200000;
728 UInt_t seed = 100;
729 DeclareOptionRef( nsamples, "SampleSize", "Number of Monte-Carlo-Event samples" );
730 DeclareOptionRef( seed, "Seed", "Seed for the random generator (0 takes random seeds)" );
731 ParseOptions();
732
733 Int_t nevents = Data()->GetNEvents();
734 Int_t ic = 0;
735
736 // timing of MC
737 Timer timer( nsamples, GetName() );
738 fIPyMaxIter = nsamples;
739
740 // random generator
741 TRandom3*rnd = new TRandom3( seed );
742
743 Log() << kINFO << "Running Monte-Carlo-Event sampling over " << nsamples << " events" << Endl;
744 std::vector<Double_t> pars( 2*GetNvar() );
745
746 for (Int_t itoy=0; itoy<nsamples; itoy++) {
747 fIPyCurrentIter = ic;
748 if (fExitFromTraining) break;
749
750 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
751
752 // generate minimum and delta cuts for this variable
753
754 // retrieve signal events
755 Bool_t isSignal = kFALSE;
756 Int_t ievt1, ievt2;
757 Double_t evt1 = 0., evt2 = 0.;
758 Int_t nbreak = 0;
759 while (!isSignal) {
760 ievt1 = Int_t(rnd->Uniform(0.,1.)*nevents);
761 ievt2 = Int_t(rnd->Uniform(0.,1.)*nevents);
762
763 const Event *ev1 = GetEvent(ievt1);
764 isSignal = DataInfo().IsSignal(ev1);
765 evt1 = ev1->GetValue( ivar );
766
767 const Event *ev2 = GetEvent(ievt2);
768 isSignal &= DataInfo().IsSignal(ev2);
769 evt2 = ev2->GetValue( ivar );
770
771 if (nbreak++ > 10000) {
772 Log() << kFATAL << "<MCEvents>: could not find signal events"
773 << " after 10000 trials - do you have signal events in your sample ?"
774 << Endl;
775 isSignal = 1;
776 }
777 }
778
779 // sort
780 if (evt1 > evt2) { Double_t z = evt1; evt1 = evt2; evt2 = z; }
781 pars[2*ivar] = evt1;
782 pars[2*ivar+1] = evt2 - evt1;
783 }
784
785 // compute estimator
786 EstimatorFunction( pars );
787
788 // what's the time please?
789 ic++;
790 if ((nsamples<1000) || ic%1000 == 0) timer.DrawProgressBar( ic );
791 }
792
793 delete rnd;
794 }
795 // --------------------------------------------------------------------------
796 else Log() << kFATAL << "Unknown minimisation method: " << fFitMethod << Endl;
797
798 if (fBinaryTreeS != 0) { delete fBinaryTreeS; fBinaryTreeS = 0; }
799 if (fBinaryTreeB != 0) { delete fBinaryTreeB; fBinaryTreeB = 0; }
800
801 // force cut ranges within limits
802 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
803 for (Int_t ibin=0; ibin<fNbins; ibin++) {
804
805 if ((*fFitParams)[ivar] == kForceMin && fCutMin[ivar][ibin] > -fgMaxAbsCutVal) {
806 fCutMin[ivar][ibin] = -fgMaxAbsCutVal;
807 }
808 if ((*fFitParams)[ivar] == kForceMax && fCutMax[ivar][ibin] < fgMaxAbsCutVal) {
809 fCutMax[ivar][ibin] = fgMaxAbsCutVal;
810 }
811 }
812 }
813
814 // some output
815 // the efficiency which is asked for has to be slightly higher than the bin-borders.
816 // if not, then the wrong bin is taken in some cases.
817 Double_t epsilon = 0.0001;
818 for (Double_t eff=0.1; eff<0.95; eff += 0.1) PrintCuts( eff+epsilon );
819
820 if (!fExitFromTraining) fIPyMaxIter = fIPyCurrentIter;
821 ExitFromTraining();
822}
823
824////////////////////////////////////////////////////////////////////////////////
825/// nothing to test
826
827void TMVA::MethodCuts::TestClassification()
828{
829}
830
831////////////////////////////////////////////////////////////////////////////////
832/// for full event scan
833
834Double_t TMVA::MethodCuts::EstimatorFunction( Int_t ievt1, Int_t ievt2 )
835{
836 const Event *ev1 = GetEvent(ievt1);
837 if (!DataInfo().IsSignal(ev1)) return -1;
838
839 const Event *ev2 = GetEvent(ievt2);
840 if (!DataInfo().IsSignal(ev2)) return -1;
841
842 const Int_t nvar = GetNvar();
843 Double_t* evt1 = new Double_t[nvar];
844 Double_t* evt2 = new Double_t[nvar];
845
846 for (Int_t ivar=0; ivar<nvar; ivar++) {
847 evt1[ivar] = ev1->GetValue( ivar );
848 evt2[ivar] = ev2->GetValue( ivar );
849 }
850
851 // determine cuts
852 std::vector<Double_t> pars;
853 for (Int_t ivar=0; ivar<nvar; ivar++) {
854 Double_t cutMin;
855 Double_t cutMax;
856 if (evt1[ivar] < evt2[ivar]) {
857 cutMin = evt1[ivar];
858 cutMax = evt2[ivar];
859 }
860 else {
861 cutMin = evt2[ivar];
862 cutMax = evt1[ivar];
863 }
864
865 pars.push_back( cutMin );
866 pars.push_back( cutMax - cutMin );
867 }
868
869 delete [] evt1;
870 delete [] evt2;
871
872 return ComputeEstimator( pars );
873}
874
875////////////////////////////////////////////////////////////////////////////////
876/// returns estimator for "cut fitness" used by GA
877
878Double_t TMVA::MethodCuts::EstimatorFunction( std::vector<Double_t>& pars )
879{
880 return ComputeEstimator( pars );
881}
882
883////////////////////////////////////////////////////////////////////////////////
884/// returns estimator for "cut fitness" used by GA.
885///
886/// there are two requirements:
887/// 1. the signal efficiency must be equal to the required one in the
888/// efficiency scan
889/// 2. the background efficiency must be as small as possible
890///
891/// the requirement 1. has priority over 2.
892
893Double_t TMVA::MethodCuts::ComputeEstimator( std::vector<Double_t>& pars )
894{
895 // caution: the npar gives the _free_ parameters
896 // however: the "pars" array contains all parameters
897
898 // determine cuts
899 Double_t effS = 0, effB = 0;
900 this->MatchParsToCuts( pars, &fTmpCutMin[0], &fTmpCutMax[0] );
901
902 // retrieve signal and background efficiencies for given cut
903 switch (fEffMethod) {
904 case kUsePDFs:
905 this->GetEffsfromPDFs (&fTmpCutMin[0], &fTmpCutMax[0], effS, effB);
906 break;
907 case kUseEventSelection:
908 this->GetEffsfromSelection (&fTmpCutMin[0], &fTmpCutMax[0], effS, effB);
909 break;
910 default:
911 this->GetEffsfromSelection (&fTmpCutMin[0], &fTmpCutMax[0], effS, effB);
912 }
913
914 Double_t eta = 0;
915
916   // test for an estimator function which optimizes on the whole background-rejection signal-efficiency plot
917
918 // get the backg-reject. and sig-eff for the parameters given to this function
919 // effS, effB
920
921 // get best background rejection for given signal efficiency
922 Int_t ibinS = fEffBvsSLocal->FindBin( effS );
923
924 Double_t effBH = fEffBvsSLocal->GetBinContent( ibinS );
925 Double_t effBH_left = (ibinS > 1 ) ? fEffBvsSLocal->GetBinContent( ibinS-1 ) : effBH;
926 Double_t effBH_right = (ibinS < fNbins) ? fEffBvsSLocal->GetBinContent( ibinS+1 ) : effBH;
927
928 Double_t average = 0.5*(effBH_left + effBH_right);
929 if (effBH < effB) average = effBH;
930
931 // if the average of the bin right and left is larger than this one, add the difference to
932 // the current value of the estimator (because you can do at least so much better)
933 eta = ( -TMath::Abs(effBH-average) + (1.0 - (effBH - effB))) / (1.0 + effS);
934 // alternative idea
935 //if (effBH<0) eta = (1.e-6+effB)/(1.0 + effS);
936 //else eta = (effB - effBH) * (1.0 + 10.* effS);
937
938 // if a point is found which is better than an existing one, ... replace it.
939 // preliminary best event -> backup
940 if (effBH < 0 || effBH > effB) {
941 fEffBvsSLocal->SetBinContent( ibinS, effB );
942 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
943 fCutMin[ivar][ibinS-1] = fTmpCutMin[ivar]; // bin 1 stored in index 0
944 fCutMax[ivar][ibinS-1] = fTmpCutMax[ivar];
945 }
946 }
947
948 // caution (!) this value is not good for a decision for MC, .. it is designed for GA
949 // but .. it doesn't matter, as MC samplings are independent from the former ones
950 // and the replacement of the best variables by better ones is done about 10 lines above.
951 // ( if (effBH < 0 || effBH > effB) { .... )
952
953 if (ibinS<=1) {
954 // add penalty for effS=0 bin
955 // to avoid that the minimizer gets stuck in the zero-bin
956 // force it towards higher efficiency
957 Double_t penalty=0.,diff=0.;
958 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
959 diff=(fCutRange[ivar]->GetMax()-fTmpCutMax[ivar])/(fCutRange[ivar]->GetMax()-fCutRange[ivar]->GetMin());
960 penalty+=diff*diff;
961 diff=(fCutRange[ivar]->GetMin()-fTmpCutMin[ivar])/(fCutRange[ivar]->GetMax()-fCutRange[ivar]->GetMin());
962 penalty+=4.*diff*diff;
963 }
964
965 if (effS<1.e-4) return 10.0+penalty;
966 else return 10.*(1.-10.*effS);
967 }
968 return eta;
969}
970
971////////////////////////////////////////////////////////////////////////////////
972/// translates parameters into cuts
973
974void TMVA::MethodCuts::MatchParsToCuts( const std::vector<Double_t> & pars,
975 Double_t* cutMin, Double_t* cutMax )
976{
977 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
978 Int_t ipar = 2*ivar;
979 cutMin[ivar] = ((*fRangeSign)[ivar] > 0) ? pars[ipar] : pars[ipar] - pars[ipar+1];
980 cutMax[ivar] = ((*fRangeSign)[ivar] > 0) ? pars[ipar] + pars[ipar+1] : pars[ipar];
981 }
982}
983
984////////////////////////////////////////////////////////////////////////////////
985/// translate the cuts into parameters (obsolete function)
986
987void TMVA::MethodCuts::MatchCutsToPars( std::vector<Double_t>& pars,
988 Double_t** cutMinAll, Double_t** cutMaxAll, Int_t ibin )
989{
990 if (ibin < 1 || ibin > fNbins) Log() << kFATAL << "::MatchCutsToPars: bin error: "
991 << ibin << Endl;
992
993 const UInt_t nvar = GetNvar();
994 Double_t *cutMin = new Double_t[nvar];
995 Double_t *cutMax = new Double_t[nvar];
996 for (UInt_t ivar=0; ivar<nvar; ivar++) {
997 cutMin[ivar] = cutMinAll[ivar][ibin-1];
998 cutMax[ivar] = cutMaxAll[ivar][ibin-1];
999 }
1000
1001 MatchCutsToPars( pars, cutMin, cutMax );
1002 delete [] cutMin;
1003 delete [] cutMax;
1004}
1005
1006////////////////////////////////////////////////////////////////////////////////
1007/// translates cuts into parameters
1008
1009void TMVA::MethodCuts::MatchCutsToPars( std::vector<Double_t>& pars,
1010 Double_t* cutMin, Double_t* cutMax )
1011{
1012 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
1013 Int_t ipar = 2*ivar;
1014 pars[ipar] = ((*fRangeSign)[ivar] > 0) ? cutMin[ivar] : cutMax[ivar];
1015 pars[ipar+1] = cutMax[ivar] - cutMin[ivar];
1016 }
1017}
1018
1019////////////////////////////////////////////////////////////////////////////////
1020/// compute signal and background efficiencies from PDFs
1021/// for given cut sample
1022
1023void TMVA::MethodCuts::GetEffsfromPDFs( Double_t* cutMin, Double_t* cutMax,
1024                                        Double_t& effS, Double_t& effB )
1025{
1026 effS = 1.0;
1027 effB = 1.0;
1028 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
1029 effS *= (*fVarPdfS)[ivar]->GetIntegral( cutMin[ivar], cutMax[ivar] );
1030 effB *= (*fVarPdfB)[ivar]->GetIntegral( cutMin[ivar], cutMax[ivar] );
1031 }
1032
1033   // quick fix to prevent efficiencies < 0
1034 if( effS < 0.0 ) {
1035 effS = 0.0;
1036 if( !fNegEffWarning ) Log() << kWARNING << "Negative signal efficiency found and set to 0. This is probably due to many events with negative weights in a certain cut-region." << Endl;
1037 fNegEffWarning = kTRUE;
1038 }
1039 if( effB < 0.0 ) {
1040 effB = 0.0;
1041 if( !fNegEffWarning ) Log() << kWARNING << "Negative background efficiency found and set to 0. This is probably due to many events with negative weights in a certain cut-region." << Endl;
1042 fNegEffWarning = kTRUE;
1043 }
1044}
1045
1046////////////////////////////////////////////////////////////////////////////////
1047/// compute signal and background efficiencies from event counting
1048/// for given cut sample
1049
1050void TMVA::MethodCuts::GetEffsfromSelection( Double_t* cutMin, Double_t* cutMax,
1051                                             Double_t& effS, Double_t& effB)
1052{
1053 Float_t nTotS = 0, nTotB = 0;
1054 Float_t nSelS = 0, nSelB = 0;
1055
1056 Volume* volume = new Volume( cutMin, cutMax, GetNvar() );
1057
1058 // search for all events lying in the volume, and add up their weights
1059 nSelS = fBinaryTreeS->SearchVolume( volume );
1060 nSelB = fBinaryTreeB->SearchVolume( volume );
1061
1062 delete volume;
1063
1064 // total number of "events" (sum of weights) as reference to compute efficiency
1065 nTotS = fBinaryTreeS->GetSumOfWeights();
1066 nTotB = fBinaryTreeB->GetSumOfWeights();
1067
1068 // sanity check
1069 if (nTotS == 0 && nTotB == 0) {
1070 Log() << kFATAL << "<GetEffsfromSelection> fatal error in zero total number of events:"
1071 << " nTotS, nTotB: " << nTotS << " " << nTotB << " ***" << Endl;
1072 }
1073
1074 // efficiencies
1075 if (nTotS == 0 ) {
1076 effS = 0;
1077 effB = nSelB/nTotB;
1078 Log() << kWARNING << "<ComputeEstimator> zero number of signal events" << Endl;
1079 }
1080 else if (nTotB == 0) {
1081 effB = 0;
1082 effS = nSelS/nTotS;
1083 Log() << kWARNING << "<ComputeEstimator> zero number of background events" << Endl;
1084 }
1085 else {
1086 effS = nSelS/nTotS;
1087 effB = nSelB/nTotB;
1088 }
1089
1090   // quick fix to prevent efficiencies < 0
1091 if( effS < 0.0 ) {
1092 effS = 0.0;
1093 if( !fNegEffWarning ) Log() << kWARNING << "Negative signal efficiency found and set to 0. This is probably due to many events with negative weights in a certain cut-region." << Endl;
1094 fNegEffWarning = kTRUE;
1095 }
1096 if( effB < 0.0 ) {
1097 effB = 0.0;
1098 if( !fNegEffWarning ) Log() << kWARNING << "Negative background efficiency found and set to 0. This is probably due to many events with negative weights in a certain cut-region." << Endl;
1099 fNegEffWarning = kTRUE;
1100 }
1101}
1102
1103////////////////////////////////////////////////////////////////////////////////
1104/// for PDF method: create efficiency reference histograms and PDFs
1105
1106void TMVA::MethodCuts::CreateVariablePDFs( void )
1107{
1108 // create list of histograms and PDFs
1109 fVarHistS = new std::vector<TH1*>( GetNvar() );
1110 fVarHistB = new std::vector<TH1*>( GetNvar() );
1111 fVarHistS_smooth = new std::vector<TH1*>( GetNvar() );
1112 fVarHistB_smooth = new std::vector<TH1*>( GetNvar() );
1113 fVarPdfS = new std::vector<PDF*>( GetNvar() );
1114 fVarPdfB = new std::vector<PDF*>( GetNvar() );
1115
1116 Int_t nsmooth = 0;
1117
1118 // get min and max values of all events
1119 Double_t minVal = DBL_MAX;
1120 Double_t maxVal = -DBL_MAX;
1121   for( UInt_t ievt=0; ievt<Data()->GetNEvents(); ievt++ ){
1122      const Event *ev = GetEvent(ievt);
1123      for( UInt_t ivar=0; ivar<GetNvar(); ivar++ ){ // scan every variable of every event
1124         Float_t val = ev->GetValue(ivar); // use the variable index, not the event index
1125         if( val < minVal ) minVal = val; if( val > maxVal ) maxVal = val; // running min/max
1126      }
1127   }
1128 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
1129
1130 // ---- signal
1131 TString histTitle = (*fInputVars)[ivar] + " signal training";
1132 TString histName = (*fInputVars)[ivar] + "_sig";
1133 // TString drawOpt = (*fInputVars)[ivar] + ">>h(";
1134 // drawOpt += fNbins;
1135 // drawOpt += ")";
1136
1137 // selection
1138 // Data().GetTrainingTree()->Draw( drawOpt, "type==1", "goff" );
1139 // (*fVarHistS)[ivar] = (TH1F*)gDirectory->Get("h");
1140 // (*fVarHistS)[ivar]->SetName(histName);
1141 // (*fVarHistS)[ivar]->SetTitle(histTitle);
1142
1143 (*fVarHistS)[ivar] = new TH1F(histName.Data(), histTitle.Data(), fNbins, minVal, maxVal );
1144
1145 // ---- background
1146 histTitle = (*fInputVars)[ivar] + " background training";
1147 histName = (*fInputVars)[ivar] + "_bgd";
1148 // drawOpt = (*fInputVars)[ivar] + ">>h(";
1149 // drawOpt += fNbins;
1150 // drawOpt += ")";
1151
1152 // Data().GetTrainingTree()->Draw( drawOpt, "type==0", "goff" );
1153 // (*fVarHistB)[ivar] = (TH1F*)gDirectory->Get("h");
1154 // (*fVarHistB)[ivar]->SetName(histName);
1155 // (*fVarHistB)[ivar]->SetTitle(histTitle);
1156
1157
1158 (*fVarHistB)[ivar] = new TH1F(histName.Data(), histTitle.Data(), fNbins, minVal, maxVal );
1159
1160 for( UInt_t ievt=0; ievt<Data()->GetNEvents(); ievt++ ){
1161 const Event *ev = GetEvent(ievt);
1162         Float_t val = ev->GetValue(ivar); // value of the variable being histogrammed
1163 if( DataInfo().IsSignal(ev) ){
1164 (*fVarHistS)[ivar]->Fill( val );
1165 }else{
1166 (*fVarHistB)[ivar]->Fill( val );
1167 }
1168 }
1169
1170
1171
1172 // make copy for smoothed histos
1173 (*fVarHistS_smooth)[ivar] = (TH1F*)(*fVarHistS)[ivar]->Clone();
1174 histTitle = (*fInputVars)[ivar] + " signal training smoothed ";
1175 histTitle += nsmooth;
1176 histTitle +=" times";
1177 histName = (*fInputVars)[ivar] + "_sig_smooth";
1178 (*fVarHistS_smooth)[ivar]->SetName(histName);
1179 (*fVarHistS_smooth)[ivar]->SetTitle(histTitle);
1180
1181 // smooth
1182 (*fVarHistS_smooth)[ivar]->Smooth(nsmooth);
1183
1184 // ---- background
1185 // histTitle = (*fInputVars)[ivar] + " background training";
1186 // histName = (*fInputVars)[ivar] + "_bgd";
1187 // drawOpt = (*fInputVars)[ivar] + ">>h(";
1188 // drawOpt += fNbins;
1189 // drawOpt += ")";
1190
1191 // Data().GetTrainingTree()->Draw( drawOpt, "type==0", "goff" );
1192 // (*fVarHistB)[ivar] = (TH1F*)gDirectory->Get("h");
1193 // (*fVarHistB)[ivar]->SetName(histName);
1194 // (*fVarHistB)[ivar]->SetTitle(histTitle);
1195
1196 // make copy for smoothed histos
1197 (*fVarHistB_smooth)[ivar] = (TH1F*)(*fVarHistB)[ivar]->Clone();
1198 histTitle = (*fInputVars)[ivar]+" background training smoothed ";
1199 histTitle += nsmooth;
1200 histTitle +=" times";
1201 histName = (*fInputVars)[ivar]+"_bgd_smooth";
1202 (*fVarHistB_smooth)[ivar]->SetName(histName);
1203 (*fVarHistB_smooth)[ivar]->SetTitle(histTitle);
1204
1205 // smooth
1206 (*fVarHistB_smooth)[ivar]->Smooth(nsmooth);
1207
1208 // create PDFs
1209 (*fVarPdfS)[ivar] = new PDF( TString(GetName()) + " PDF Var Sig " + GetInputVar( ivar ), (*fVarHistS_smooth)[ivar], PDF::kSpline2 );
1210 (*fVarPdfB)[ivar] = new PDF( TString(GetName()) + " PDF Var Bkg " + GetInputVar( ivar ), (*fVarHistB_smooth)[ivar], PDF::kSpline2 );
1211 }
1212}
1213
1214////////////////////////////////////////////////////////////////////////////////
1215/// read the cuts from stream
1216
1217void TMVA::MethodCuts::ReadWeightsFromStream( std::istream& istr )
1218{
1219 TString dummy;
1220 UInt_t dummyInt;
1221
1222 // first the dimensions
1223 istr >> dummy >> dummy;
1224 // coverity[tainted_data_argument]
1225 istr >> dummy >> fNbins;
1226
1227   // get rid of one read-in here because we already read it in once to check for decorrelation
1228 istr >> dummy >> dummy >> dummy >> dummy >> dummy >> dummy >> dummyInt >> dummy ;
1229
1230 // sanity check
1231 if (dummyInt != Data()->GetNVariables()) {
1232 Log() << kFATAL << "<ReadWeightsFromStream> fatal error: mismatch "
1233 << "in number of variables: " << dummyInt << " != " << Data()->GetNVariables() << Endl;
1234 }
1235 //SetNvar(dummyInt);
1236
1237 // print some information
1238 if (fFitMethod == kUseMonteCarlo) {
1239 Log() << kWARNING << "Read cuts optimised using sample of MC events" << Endl;
1240 }
1241 else if (fFitMethod == kUseMonteCarloEvents) {
1242 Log() << kWARNING << "Read cuts optimised using sample of MC events" << Endl;
1243 }
1244 else if (fFitMethod == kUseGeneticAlgorithm) {
1245 Log() << kINFO << "Read cuts optimised using Genetic Algorithm" << Endl;
1246 }
1247 else if (fFitMethod == kUseSimulatedAnnealing) {
1248 Log() << kINFO << "Read cuts optimised using Simulated Annealing algorithm" << Endl;
1249 }
1250 else if (fFitMethod == kUseEventScan) {
1251 Log() << kINFO << "Read cuts optimised using Full Event Scan" << Endl;
1252 }
1253 else {
1254 Log() << kWARNING << "unknown method: " << fFitMethod << Endl;
1255 }
1256 Log() << kINFO << "in " << fNbins << " signal efficiency bins and for " << GetNvar() << " variables" << Endl;
1257
1258 // now read the cuts
1259 char buffer[200];
1260 istr.getline(buffer,200);
1261 istr.getline(buffer,200);
1262
1263 Int_t tmpbin;
1264 Float_t tmpeffS, tmpeffB;
1265 if (fEffBvsSLocal != 0) delete fEffBvsSLocal;
1266 fEffBvsSLocal = new TH1F( GetTestvarName() + "_effBvsSLocal",
1267 TString(GetName()) + " efficiency of B vs S", fNbins, 0.0, 1.0 );
1268 fEffBvsSLocal->SetDirectory(nullptr); // it's local
1269
1270 for (Int_t ibin=0; ibin<fNbins; ibin++) {
1271 istr >> tmpbin >> tmpeffS >> tmpeffB;
1272 fEffBvsSLocal->SetBinContent( ibin+1, tmpeffB );
1273
1274 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
1275 istr >> fCutMin[ivar][ibin] >> fCutMax[ivar][ibin];
1276 }
1277 }
1278
1279 fEffSMin = fEffBvsSLocal->GetBinCenter(1);
1280 fEffSMax = fEffBvsSLocal->GetBinCenter(fNbins);
1281}
1282
1283////////////////////////////////////////////////////////////////////////////////
1284/// create XML description of the optimised cuts, one entry per signal-efficiency bin
1285/// (see the note below on the meaning of the stored signal efficiency)
1286
1287void TMVA::MethodCuts::AddWeightsXMLTo( void* parent ) const
1288{
1289 // write all necessary information to the stream
1290 std::vector<Double_t> cutsMin;
1291 std::vector<Double_t> cutsMax;
1292
1293 void* wght = gTools().AddChild(parent, "Weights");
1294 gTools().AddAttr( wght, "OptimisationMethod", (Int_t)fEffMethod);
1295 gTools().AddAttr( wght, "FitMethod", (Int_t)fFitMethod );
1296 gTools().AddAttr( wght, "nbins", fNbins );
1297 gTools().AddComment( wght, TString::Format( "Below are the optimised cuts for %i variables: Format: ibin(hist) effS effB cutMin[ivar=0] cutMax[ivar=0] ... cutMin[ivar=n-1] cutMax[ivar=n-1]", GetNvar() ) );
1298
1299 // NOTE: The signal efficiency written out into
1300 // the weight file does not correspond to the center of the bin within which the
1301 // background rejection is maximised (as before) but to the lower left edge of it.
1302 // This is because the cut optimisation algorithm determines the best background
1303 // rejection for all signal efficiencies belonging into a bin. Since the best background
1304 // rejection is in general obtained for the lowest possible signal efficiency, the
1305 // reference signal efficiency is the lowest value in the bin.
1306
1307 for (Int_t ibin=0; ibin<fNbins; ibin++) {
1308 Double_t effS = fEffBvsSLocal->GetBinCenter ( ibin + 1 );
1309 Double_t trueEffS = GetCuts( effS, cutsMin, cutsMax );
1310 if (TMath::Abs(trueEffS) < 1e-10) trueEffS = 0;
1311
1312 void* binxml = gTools().AddChild( wght, "Bin" );
1313 gTools().AddAttr( binxml, "ibin", ibin+1 );
1314 gTools().AddAttr( binxml, "effS", trueEffS );
1315 gTools().AddAttr( binxml, "effB", fEffBvsSLocal->GetBinContent( ibin + 1 ) );
1316 void* cutsxml = gTools().AddChild( binxml, "Cuts" );
1317 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
1318 gTools().AddAttr( cutsxml, TString::Format( "cutMin_%i", ivar ), cutsMin[ivar] );
1319 gTools().AddAttr( cutsxml, TString::Format( "cutMax_%i", ivar ), cutsMax[ivar] );
1320 }
1321 }
1322}
1323
1324////////////////////////////////////////////////////////////////////////////////
1325/// read coefficients from xml weight file
1326
1327void TMVA::MethodCuts::ReadWeightsFromXML( void* wghtnode )
1328{
1329 // delete old min and max
1330 for (UInt_t i=0; i<GetNvar(); i++) {
1331 if (fCutMin[i] != 0) delete [] fCutMin[i];
1332 if (fCutMax[i] != 0) delete [] fCutMax[i];
1333 }
1334 if (fCutMin != 0) delete [] fCutMin;
1335 if (fCutMax != 0) delete [] fCutMax;
1336
1337 Int_t tmpEffMethod, tmpFitMethod;
1338 gTools().ReadAttr( wghtnode, "OptimisationMethod", tmpEffMethod );
1339 gTools().ReadAttr( wghtnode, "FitMethod", tmpFitMethod );
1340 gTools().ReadAttr( wghtnode, "nbins", fNbins );
1341
1342 fEffMethod = (EEffMethod)tmpEffMethod;
1343 fFitMethod = (EFitMethodType)tmpFitMethod;
1344
1345 // print some information
1346 if (fFitMethod == kUseMonteCarlo) {
1347 Log() << kINFO << "Read cuts optimised using sample of MC events" << Endl;
1348 }
1349 else if (fFitMethod == kUseMonteCarloEvents) {
1350      Log() << kINFO << "Read cuts optimised using Monte-Carlo-Event sampling" << Endl;
1351 }
1352 else if (fFitMethod == kUseGeneticAlgorithm) {
1353 Log() << kINFO << "Read cuts optimised using Genetic Algorithm" << Endl;
1354 }
1355 else if (fFitMethod == kUseSimulatedAnnealing) {
1356 Log() << kINFO << "Read cuts optimised using Simulated Annealing algorithm" << Endl;
1357 }
1358 else if (fFitMethod == kUseEventScan) {
1359 Log() << kINFO << "Read cuts optimised using Full Event Scan" << Endl;
1360 }
1361 else {
1362 Log() << kWARNING << "unknown method: " << fFitMethod << Endl;
1363 }
1364 Log() << kINFO << "Reading " << fNbins << " signal efficiency bins for " << GetNvar() << " variables" << Endl;
1365
1366 delete fEffBvsSLocal;
1367 fEffBvsSLocal = new TH1F( GetTestvarName() + "_effBvsSLocal",
1368 TString(GetName()) + " efficiency of B vs S", fNbins, 0.0, 1.0 );
1369 fEffBvsSLocal->SetDirectory(nullptr); // it's local
1370 for (Int_t ibin=1; ibin<=fNbins; ibin++) fEffBvsSLocal->SetBinContent( ibin, -0.1 ); // Init
1371
1372 fCutMin = new Double_t*[GetNvar()];
1373 fCutMax = new Double_t*[GetNvar()];
1374 for (UInt_t i=0;i<GetNvar();i++) {
1375 fCutMin[i] = new Double_t[fNbins];
1376 fCutMax[i] = new Double_t[fNbins];
1377 }
1378
1379 // read efficiencies and cuts
1380 Int_t tmpbin;
1381 Float_t tmpeffS, tmpeffB;
1382 void* ch = gTools().GetChild(wghtnode,"Bin");
1383 while (ch) {
1384 // if (strcmp(gTools().GetName(ch),"Bin") !=0) {
1385 // ch = gTools().GetNextChild(ch);
1386 // continue;
1387 // }
1388
1389 gTools().ReadAttr( ch, "ibin", tmpbin );
1390 gTools().ReadAttr( ch, "effS", tmpeffS );
1391 gTools().ReadAttr( ch, "effB", tmpeffB );
1392
1393 // sanity check
1394 if (tmpbin-1 >= fNbins || tmpbin-1 < 0) {
1395 Log() << kFATAL << "Mismatch in bins: " << tmpbin-1 << " >= " << fNbins << Endl;
1396 }
1397
1398 fEffBvsSLocal->SetBinContent( tmpbin, tmpeffB );
1399 void* ct = gTools().GetChild(ch);
1400 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
1401 gTools().ReadAttr( ct, TString::Format( "cutMin_%i", ivar ), fCutMin[ivar][tmpbin-1] );
1402 gTools().ReadAttr( ct, TString::Format( "cutMax_%i", ivar ), fCutMax[ivar][tmpbin-1] );
1403 }
1404 ch = gTools().GetNextChild(ch, "Bin");
1405 }
1406}
1407
1408////////////////////////////////////////////////////////////////////////////////
1409/// write histograms and PDFs to file for monitoring purposes
1410
1411void TMVA::MethodCuts::WriteMonitoringHistosToFile( void ) const
1412{
1413 Log() << kINFO << "Write monitoring histograms to file: " << BaseDir()->GetPath() << Endl;
1414
1415 fEffBvsSLocal->Write();
1416
1417 // save reference histograms to file
1418 if (fEffMethod == kUsePDFs) {
1419 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
1420 (*fVarHistS)[ivar]->Write();
1421 (*fVarHistB)[ivar]->Write();
1422 (*fVarHistS_smooth)[ivar]->Write();
1423 (*fVarHistB_smooth)[ivar]->Write();
1424 (*fVarPdfS)[ivar]->GetPDFHist()->Write();
1425 (*fVarPdfB)[ivar]->GetPDFHist()->Write();
1426 }
1427 }
1428}
1429
1430////////////////////////////////////////////////////////////////////////////////
1431/// Overloaded function to create background efficiency (rejection) versus
1432/// signal efficiency plot (first call of this function).
1433///
1434/// The function returns the signal efficiency at background efficiency
1435/// indicated in theString
1436///
1437/// "theString" must have two entries:
1438/// - `[0]`: "Efficiency"
1439/// - `[1]`: the value of background efficiency at which the signal efficiency
1440/// is to be returned
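///
/// Example: `theString = "Efficiency:0.05"` returns the training signal efficiency at a
/// background efficiency of 0.05.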
1441
1442Double_t TMVA::MethodCuts::GetTrainingEfficiency(const TString& theString)
1443{
1444 // parse input string for required background efficiency
1445 TList* list = gTools().ParseFormatLine( theString );
1446 // sanity check
1447 if (list->GetSize() != 2) {
1448 Log() << kFATAL << "<GetTrainingEfficiency> wrong number of arguments"
1449 << " in string: " << theString
1450 << " | required format, e.g., Efficiency:0.05" << Endl;
1451 return -1;
1452 }
1453
1454 Results* results = Data()->GetResults(GetMethodName(), Types::kTesting, GetAnalysisType());
1455
1456   // that will be the value of the efficiency returned (it does not affect
1457   // the efficiency-vs-bkg plot, which is done anyway).
1458 Float_t effBref = atof( ((TObjString*)list->At(1))->GetString() );
1459
1460 delete list;
1461
1462 // first round ? --> create histograms
1463 if (results->GetHist("EFF_BVSS_TR")==0) {
1464
1465 if (fBinaryTreeS != 0) { delete fBinaryTreeS; fBinaryTreeS = 0; }
1466 if (fBinaryTreeB != 0) { delete fBinaryTreeB; fBinaryTreeB = 0; }
1467
1468 fBinaryTreeS = new BinarySearchTree();
1469 fBinaryTreeS->Fill( GetEventCollection(Types::kTraining), fSignalClass );
1470 fBinaryTreeB = new BinarySearchTree();
1471 fBinaryTreeB->Fill( GetEventCollection(Types::kTraining), fBackgroundClass );
1472 // there is no really good equivalent to the fEffS; fEffB (efficiency vs cutvalue)
1473 // for the "Cuts" method (unless we had only one cut). Maybe later I might add here
1474 // histograms for each of the cuts...but this would require also a change in the
1475 // base class, and it is not really necessary, as we get exactly THIS info from the
1476 // "evaluateAllVariables" anyway.
1477
1478 // now create efficiency curve: background versus signal
1479 TH1* eff_bvss_tr = new TH1F( GetTestvarName() + "_trainingEffBvsS", GetTestvarName() + "", fNbins, 0, 1 );
1480 for (Int_t ibin=1; ibin<=fNbins; ibin++) eff_bvss_tr->SetBinContent( ibin, -0.1 ); // Init
1481 TH1* rej_bvss_tr = new TH1F( GetTestvarName() + "_trainingRejBvsS", GetTestvarName() + "", fNbins, 0, 1 );
1482 for (Int_t ibin=1; ibin<=fNbins; ibin++) rej_bvss_tr->SetBinContent( ibin, 0. ); // Init
1483 results->Store(eff_bvss_tr, "EFF_BVSS_TR");
1484 results->Store(rej_bvss_tr, "REJ_BVSS_TR");
1485
1486 // use root finder
1487
1488 // make the background-vs-signal efficiency plot
1489 Double_t* tmpCutMin = new Double_t[GetNvar()];
1490 Double_t* tmpCutMax = new Double_t[GetNvar()];
1491 Int_t nFailedBins=0;
1492 for (Int_t bini=1; bini<=fNbins; bini++) {
1493 for (UInt_t ivar=0; ivar <GetNvar(); ivar++){
1494 tmpCutMin[ivar] = fCutMin[ivar][bini-1];
1495 tmpCutMax[ivar] = fCutMax[ivar][bini-1];
1496 }
1497 // find cut value corresponding to a given signal efficiency
1498 Double_t effS, effB;
1499 this->GetEffsfromSelection( &tmpCutMin[0], &tmpCutMax[0], effS, effB);
1500 // check that effS matches bini
1501 Int_t effBin = eff_bvss_tr->GetXaxis()->FindBin(effS);
1502 if (effBin != bini){
1503 Log()<< kVERBOSE << "unable to fill efficiency bin " << bini<< " " << effBin <<Endl;
1504 nFailedBins++;
1505 }
1506 else{
1507 // and fill histograms
1508 eff_bvss_tr->SetBinContent( bini, effB );
1509 rej_bvss_tr->SetBinContent( bini, 1.0-effB );
1510 }
1511 }
1512 if (nFailedBins>0) Log()<< kWARNING << " unable to fill "<< nFailedBins <<" efficiency bins " <<Endl;
1513
1514 delete [] tmpCutMin;
1515 delete [] tmpCutMax;
1516
1517 // create splines for histogram
1518 fSplTrainEffBvsS = new TSpline1( "trainEffBvsS", new TGraph( eff_bvss_tr ) );
1519 }
1520
1521 // must exist...
1522 if (NULL == fSplTrainEffBvsS) return 0.0;
1523
1524 // now find signal efficiency that corresponds to required background efficiency
1525 Double_t effS = 0., effB, effS_ = 0., effB_ = 0.;
1526 Int_t nbins_ = 1000;
1527
1528 // loop over efficiency bins until the background eff. matches the requirement
1529 for (Int_t bini=1; bini<=nbins_; bini++) {
1530 // get corresponding signal and background efficiencies
1531 effS = (bini - 0.5)/Float_t(nbins_);
1532 effB = fSplTrainEffBvsS->Eval( effS );
1533
1534 // find signal efficiency that corresponds to required background efficiency
1535 if ((effB - effBref)*(effB_ - effBref) < 0) break;
1536 effS_ = effS;
1537 effB_ = effB;
1538 }
1539
1540 return 0.5*(effS + effS_);
1541}
1542
1543////////////////////////////////////////////////////////////////////////////////
1544/// Overloaded function to create background efficiency (rejection) versus
1545/// signal efficiency plot (first call of this function).
1546///
1547/// The function returns the signal efficiency at background efficiency
1548/// indicated in theString
1549///
1550/// "theString" must have two entries:
1551/// - `[0]`: "Efficiency"
1552/// - `[1]`: the value of background efficiency at which the signal efficiency
1553/// is to be returned
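///
/// Example: `theString = "Efficiency:0.05"`; if the background efficiency value is omitted,
/// the integrated efficiency (area) is computed instead.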
1554
1555Double_t TMVA::MethodCuts::GetEfficiency( const TString& theString, Types::ETreeType type, Double_t& effSerr )
1556{
1557 Data()->SetCurrentType(type);
1558
1559 Results* results = Data()->GetResults( GetMethodName(), Types::kTesting, GetAnalysisType() );
1560
1561 // parse input string for required background efficiency
1562 TList* list = gTools().ParseFormatLine( theString, ":" );
1563
1564 if (list->GetSize() > 2) {
1565 delete list;
1566 Log() << kFATAL << "<GetEfficiency> wrong number of arguments"
1567 << " in string: " << theString
1568 << " | required format, e.g., Efficiency:0.05, or empty string" << Endl;
1569 return -1;
1570 }
1571
1572 // sanity check
1573 Bool_t computeArea = (list->GetSize() < 2); // the area is computed
1574
1575 // this will be the value of the efficiency returned (it does not affect
1576 // the efficiency-vs-background plot, which is produced in any case)
1577 Float_t effBref = (computeArea ? 1. : atof( ((TObjString*)list->At(1))->GetString() ));
1578
1579 delete list;
1580
1581
1582 // first round ? --> create histograms
1583 if (results->GetHist("MVA_EFF_BvsS")==0) {
1584
1585 if (fBinaryTreeS!=0) { delete fBinaryTreeS; fBinaryTreeS = 0; }
1586 if (fBinaryTreeB!=0) { delete fBinaryTreeB; fBinaryTreeB = 0; }
1587
1588 // the variables may be transformed by a transformation method: to coherently
1589 // treat signal and background one must decide which transformation type shall
1590 // be used: our default is signal-type
1591 fBinaryTreeS = new BinarySearchTree();
1592 fBinaryTreeS->Fill( GetEventCollection(Types::kTesting), fSignalClass );
1593 fBinaryTreeB = new BinarySearchTree();
1594 fBinaryTreeB->Fill( GetEventCollection(Types::kTesting), fBackgroundClass );
1595
1596 // there is no really good equivalent to the fEffS; fEffB (efficiency vs cutvalue)
1597 // for the "Cuts" method (unless we had only one cut). Maybe later I might add here
1598 // histograms for each of the cuts...but this would also require a change in the
1599 // base class, and it is not really necessary, as we get exactly THIS info from the
1600 // "evaluateAllVariables" anyway.
1601
1602 // now create efficiency curve: background versus signal
1603 TH1* eff_BvsS = new TH1F( GetTestvarName() + "_effBvsS", GetTestvarName() + "", fNbins, 0, 1 );
1604 for (Int_t ibin=1; ibin<=fNbins; ibin++) eff_BvsS->SetBinContent( ibin, -0.1 ); // Init
1605 TH1* rej_BvsS = new TH1F( GetTestvarName() + "_rejBvsS", GetTestvarName() + "", fNbins, 0, 1 );
1606 for (Int_t ibin=1; ibin<=fNbins; ibin++) rej_BvsS->SetBinContent( ibin, 0.0 ); // Init
1607 results->Store(eff_BvsS, "MVA_EFF_BvsS");
1608 results->Store(rej_BvsS);
1609
1610 Double_t xmin = 0.;
1611 Double_t xmax = 1.000001;
1612
1613 TH1* eff_s = new TH1F( GetTestvarName() + "_effS", GetTestvarName() + " (signal)", fNbins, xmin, xmax);
1614 for (Int_t ibin=1; ibin<=fNbins; ibin++) eff_s->SetBinContent( ibin, -0.1 ); // Init
1615 TH1* eff_b = new TH1F( GetTestvarName() + "_effB", GetTestvarName() + " (background)", fNbins, xmin, xmax);
1616 for (Int_t ibin=1; ibin<=fNbins; ibin++) eff_b->SetBinContent( ibin, -0.1 ); // Init
1617 results->Store(eff_s, "MVA_S");
1618 results->Store(eff_b, "MVA_B");
1619
1620 // use root finder
1621
1622 // make the background-vs-signal efficiency plot
1623 Double_t* tmpCutMin = new Double_t[GetNvar()];
1624 Double_t* tmpCutMax = new Double_t[GetNvar()];
1625 TGraph* tmpBvsS = new TGraph(fNbins+1);
1626 tmpBvsS->SetPoint(0, 0., 0.);
1627
1628 for (Int_t bini=1; bini<=fNbins; bini++) {
1629 for (UInt_t ivar=0; ivar <GetNvar(); ivar++) {
1630 tmpCutMin[ivar] = fCutMin[ivar][bini-1];
1631 tmpCutMax[ivar] = fCutMax[ivar][bini-1];
1632 }
1633 // find cut value corresponding to a given signal efficiency
1634 Double_t effS, effB;
1635 this->GetEffsfromSelection( &tmpCutMin[0], &tmpCutMax[0], effS, effB);
1636 tmpBvsS->SetPoint(bini, effS, effB);
1637
1638 eff_s->SetBinContent(bini, effS);
1639 eff_b->SetBinContent(bini, effB);
1640 }
1641 tmpBvsS->SetPoint(fNbins+1, 1., 1.);
1642
1643 delete [] tmpCutMin;
1644 delete [] tmpCutMax;
1645
1646 // create splines for histogram
1647 fSpleffBvsS = new TSpline1( "effBvsS", tmpBvsS );
1648 for (Int_t bini=1; bini<=fNbins; bini++) {
1649 Double_t effS = (bini - 0.5)/Float_t(fNbins);
1650 Double_t effB = fSpleffBvsS->Eval( effS );
1651 eff_BvsS->SetBinContent( bini, effB );
1652 rej_BvsS->SetBinContent( bini, 1.0-effB );
1653 }
1654 }
1655
1656 // must exist...
1657 if (NULL == fSpleffBvsS) return 0.0;
1658
1659 // now find signal efficiency that corresponds to required background efficiency
1660 Double_t effS = 0, effB = 0, effS_ = 0, effB_ = 0;
1661 Int_t nbins_ = 1000;
1662
1663 if (computeArea) {
1664
1665 // compute area of rej-vs-eff plot
1666 Double_t integral = 0;
1667 for (Int_t bini=1; bini<=nbins_; bini++) {
1668
1669 // get corresponding signal and background efficiencies
1670 effS = (bini - 0.5)/Float_t(nbins_);
1671 effB = fSpleffBvsS->Eval( effS );
1672 integral += (1.0 - effB);
1673 }
1674 integral /= nbins_;
1675
1676 return integral;
1677 }
1678 else {
1679
1680 // loop over efficiency bins until the background eff. matches the requirement
1681 for (Int_t bini=1; bini<=nbins_; bini++) {
1682 // get corresponding signal and background efficiencies
1683 effS = (bini - 0.5)/Float_t(nbins_);
1684 effB = fSpleffBvsS->Eval( effS );
1685
1686 // find signal efficiency that corresponds to required background efficiency
1687 if ((effB - effBref)*(effB_ - effBref) < 0) break;
1688 effS_ = effS;
1689 effB_ = effB;
1690 }
1691
1692 effS = 0.5*(effS + effS_);
1693 effSerr = 0;
1694 if (Data()->GetNEvtSigTest() > 0)
1695 effSerr = TMath::Sqrt( effS*(1.0 - effS)/Double_t(Data()->GetNEvtSigTest()) );
1696
1697 return effS;
1698
1699 }
1700
1701 return -1;
1702}
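// --- Editorial sketch (not part of MethodCuts.cxx) -------------------------
// Called with an empty option string, GetEfficiency() returns (via the
// computeArea branch above) the area under the background-rejection versus
// signal-efficiency curve, approximated with the midpoint rule as
//    area ~= (1/N) * Sum_{i=1..N} [ 1 - effB(effS_i) ],  effS_i = (i-0.5)/N,  N = 1000.
// Called with a string such as "Efficiency:0.05", it instead returns the signal
// efficiency at background efficiency 0.05, together with the binomial
// uncertainty effSerr = sqrt( effS*(1-effS)/N_sig,test ). A minimal standalone
// version of the area estimate, again assuming only some callable effB(effS)
// (names are illustrative, not TMVA API), could look like:

#include <functional>

double ApproxRejectionVsEfficiencyArea(const std::function<double(double)>& effBofEffS,
                                       int nScanBins = 1000)
{
   double area = 0.;
   for (int bin = 1; bin <= nScanBins; ++bin) {
      const double effS = (bin - 0.5) / nScanBins;  // midpoint of scan bin
      area += 1.0 - effBofEffS(effS);               // background rejection at this signal eff.
   }
   return area / nScanBins;                         // average rejection over effS in (0,1)
}
// ----------------------------------------------------------------------------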
1703
1704////////////////////////////////////////////////////////////////////////////////
1705/// write specific classifier response
1706
1707void TMVA::MethodCuts::MakeClassSpecific( std::ostream& fout, const TString& className ) const
1708{
1709 fout << " // not implemented for class: \"" << className << "\"" << std::endl;
1710 fout << "};" << std::endl;
1711}
1712
1713////////////////////////////////////////////////////////////////////////////////
1714/// get help message text
1715///
1716/// typical length of text line:
1717/// "|--------------------------------------------------------------|"
1718
1719 void TMVA::MethodCuts::GetHelpMessage() const
1720{
1721 TString bold = gConfig().WriteOptionsReference() ? "<b>" : "";
1722 TString resbold = gConfig().WriteOptionsReference() ? "</b>" : "";
1723 TString brk = gConfig().WriteOptionsReference() ? "<br>" : "";
1724
1725 Log() << Endl;
1726 Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
1727 Log() << Endl;
1728 Log() << "The optimisation of rectangular cuts performed by TMVA maximises " << Endl;
1729 Log() << "the background rejection at given signal efficiency, and scans " << Endl;
1730 Log() << "over the full range of the latter quantity. Three optimisation" << Endl;
1731 Log() << "methods are available: Monte Carlo sampling (MC), a Genetic" << Endl;
1732 Log() << "Algorithm (GA), and Simulated Annealing (SA). GA and SA are" << Endl;
1733 Log() << "expected to perform best." << Endl;
1734 Log() << Endl;
1735 Log() << "The difficulty of finding the optimal cuts increases strongly with" << Endl;
1736 Log() << "the dimensionality (number of input variables) of the problem." << Endl;
1737 Log() << "This behavior is due to the non-uniqueness of the solution space." << Endl;
1738 Log() << Endl;
1739 Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
1740 Log() << Endl;
1741 Log() << "If the dimensionality exceeds, say, 4 input variables, it is " << Endl;
1742 Log() << "advisable to scrutinize the separation power of the variables," << Endl;
1743 Log() << "and to remove the weakest ones. If some among the input variables" << Endl;
1744 Log() << "can be described by a single cut (e.g., because signal tends to be" << Endl;
1745 Log() << "larger than background), this can be indicated to MethodCuts via" << Endl;
1746 Log() << "the \"Fsmart\" option (see option string). Choosing this option" << Endl;
1747 Log() << "reduces the number of requirements for the variable from 2 (min/max)" << Endl;
1748 Log() << "to a single one (TMVA finds out whether it is to be interpreted as" << Endl;
1749 Log() << "min or max)." << Endl;
1750 Log() << Endl;
1751 Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
1752 Log() << "" << Endl;
1753 Log() << bold << "Monte Carlo sampling:" << resbold << Endl;
1754 Log() << "" << Endl;
1755 Log() << "Apart from the \"Fsmart\" option for the variables, the only way" << Endl;
1756 Log() << "to improve the MC sampling is to increase the sampling rate. This" << Endl;
1757 Log() << "is done via the configuration option \"MC_NRandCuts\". The execution" << Endl;
1758 Log() << "time scales linearly with the sampling rate." << Endl;
1759 Log() << "" << Endl;
1760 Log() << bold << "Genetic Algorithm:" << resbold << Endl;
1761 Log() << "" << Endl;
1762 Log() << "The algorithm terminates if no significant fitness increase has" << Endl;
1763 Log() << "been achieved within the last \"nsteps\" steps of the calculation." << Endl;
1764 Log() << "Wiggles in the ROC curve or a constant background rejection of 1" << Endl;
1765 Log() << "indicate that the GA failed to always converge at the true maximum" << Endl;
1766 Log() << "fitness. In such a case, it is recommended to broaden the search " << Endl;
1767 Log() << "by increasing the population size (\"popSize\") and to give the GA " << Endl;
1768 Log() << "more time to find improvements by increasing the number of steps" << Endl;
1769 Log() << "(\"nsteps\"):" << Endl;
1770 Log() << " -> increase \"popSize\" (to at least 10 x the number of variables)" << Endl;
1771 Log() << " -> increase \"nsteps\"" << Endl;
1772 Log() << "" << Endl;
1773 Log() << bold << "Simulated Annealing (SA) algorithm:" << resbold << Endl;
1774 Log() << "" << Endl;
1775 Log() << "\"Increasing Adaptive\" approach:" << Endl;
1776 Log() << "" << Endl;
1777 Log() << "The algorithm seeks local minima and explores their neighborhoods, while" << Endl;
1778 Log() << "changing the ambient temperature depending on the number of failures" << Endl;
1779 Log() << "in the previous steps. The performance can be improved by increasing" << Endl;
1780 Log() << "the number of iteration steps (\"MaxCalls\"), or by adjusting the" << Endl;
1781 Log() << "minimal temperature (\"MinTemperature\"). Manual adjustments of the" << Endl;
1782 Log() << "speed of the temperature increase (\"TemperatureScale\" and \"AdaptiveSpeed\")" << Endl;
1783 Log() << "to individual data sets should also help. Summary:" << brk << Endl;
1784 Log() << " -> increase \"MaxCalls\"" << brk << Endl;
1785 Log() << " -> adjust \"MinTemperature\"" << brk << Endl;
1786 Log() << " -> adjust \"TemperatureScale\"" << brk << Endl;
1787 Log() << " -> adjust \"AdaptiveSpeed\"" << Endl;
1788 Log() << "" << Endl;
1789 Log() << "\"Decreasing Adaptive\" approach:" << Endl;
1790 Log() << "" << Endl;
1791 Log() << "The algorithm calculates the initial temperature (based on the" << Endl;
1792 Log() << "effectiveness of large steps) and the multiplier that ensures that the" << Endl;
1793 Log() << "minimal temperature is reached within the requested number of iteration steps." << Endl;
1794 Log() << "The performance can be improved by adjusting the minimal temperature" << Endl;
1795 Log() << "(\"MinTemperature\") and by increasing the number of steps (\"MaxCalls\"):" << brk << Endl;
1796 Log() << " -> increase \"MaxCalls\"" << brk << Endl;
1797 Log() << " -> adjust \"MinTemperature\"" << Endl;
1798 Log() << " " << Endl;
1799 Log() << "Other kernels:" << Endl;
1800 Log() << "" << Endl;
1801 Log() << "Alternative ways of computing the temperature change are implemented." << Endl;
1802 Log() << "Each of them starts from the maximum temperature (\"MaxTemperature\")" << Endl;
1803 Log() << "and decreases the temperature according to a given" << Endl;
1804 Log() << "prescription:" << brk << Endl;
1805 Log() << "CurrentTemperature =" << brk << Endl;
1806 Log() << " - Sqrt: InitialTemperature / Sqrt(StepNumber+2) * TemperatureScale" << brk << Endl;
1807 Log() << " - Log: InitialTemperature / Log(StepNumber+2) * TemperatureScale" << brk << Endl;
1808 Log() << " - Homo: InitialTemperature / (StepNumber+2) * TemperatureScale" << brk << Endl;
1809 Log() << " - Sin: (Sin(StepNumber / TemperatureScale) + 1) / (StepNumber + 1)*InitialTemperature + Eps" << brk << Endl;
1810 Log() << " - Geo: CurrentTemperature * TemperatureScale" << Endl;
1811 Log() << "" << Endl;
1812 Log() << "Their performance can be improved by adjusting the initial temperature" << Endl;
1813 Log() << "(\"InitialTemperature\"), the number of iteration steps (\"MaxCalls\")," << Endl;
1814 Log() << "and the multiplier that scales the temperature decrease" << Endl;
1815 Log() << "(\"TemperatureScale\"):" << brk << Endl;
1816 Log() << " -> increase \"MaxCalls\"" << brk << Endl;
1817 Log() << " -> adjust \"InitialTemperature\"" << brk << Endl;
1818 Log() << " -> adjust \"TemperatureScale\"" << brk << Endl;
1819 Log() << " -> adjust \"KernelTemperature\"" << Endl;
1820}
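// --- Editorial sketch (not part of MethodCuts.cxx) -------------------------
// The help text above only prints the temperature prescriptions of the
// alternative Simulated Annealing kernels as formulas. Spelled out as code
// (a sketch of those printed prescriptions, not the actual TMVA SA
// implementation; the enum and function names below are illustrative), they read:

#include <cmath>

enum class SAKernel { Sqrt, Log, Homo, Sin, Geo };

double NextTemperature(SAKernel kernel, int stepNumber, double initialTemperature,
                       double currentTemperature, double temperatureScale, double eps = 1e-10)
{
   switch (kernel) {
      case SAKernel::Sqrt: return initialTemperature / std::sqrt(stepNumber + 2.) * temperatureScale;
      case SAKernel::Log:  return initialTemperature / std::log(stepNumber + 2.)  * temperatureScale;
      case SAKernel::Homo: return initialTemperature / (stepNumber + 2.)          * temperatureScale;
      case SAKernel::Sin:  return (std::sin(stepNumber / temperatureScale) + 1.) / (stepNumber + 1.) * initialTemperature + eps;
      case SAKernel::Geo:  return currentTemperature * temperatureScale;
   }
   return currentTemperature; // not reached for valid kernel values
}
// ----------------------------------------------------------------------------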