ROOT Reference Guide

MethodCuts.cxx
1// @(#)root/tmva $Id$
2// Author: Andreas Hoecker, Matt Jachowski, Peter Speckmayer, Eckhard von Toerne, Helge Voss, Kai Voss
3
4/**********************************************************************************
5 * Project: TMVA - a Root-integrated toolkit for multivariate Data analysis *
6 * Package: TMVA *
7 * Class : TMVA::MethodCuts *
8 * *
9 * *
10 * Description: *
11 * Implementation (see header for description) *
12 * *
13 * Authors (alphabetical): *
14 * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
15 * Matt Jachowski <jachowski@stanford.edu> - Stanford University, USA *
16 * Peter Speckmayer <speckmay@mail.cern.ch> - CERN, Switzerland *
17 * Eckhard von Toerne <evt@physik.uni-bonn.de> - U. of Bonn, Germany *
18 * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
19 * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
20 * *
21 * Copyright (c) 2005: *
22 * CERN, Switzerland *
23 * U. of Victoria, Canada *
24 * MPI-K Heidelberg, Germany *
25 * *
26 * Redistribution and use in source and binary forms, with or without *
27 * modification, are permitted according to the terms listed in LICENSE *
28 * (see tmva/doc/LICENSE) *
29 **********************************************************************************/
30
31/*! \class TMVA::MethodCuts
32\ingroup TMVA
33
34 Multivariate optimisation of signal efficiency for given background
35 efficiency, applying rectangular minimum and maximum requirements.
36
37 Also implemented is a "decorrelated/diagonalised cuts approach",
38 which improves on the uncorrelated cuts approach by linearly
39 transforming the input variables into a diagonal space,
40 using the square root of the covariance matrix.
41
42 Other optimisation criteria, such as maximising the signal significance-
43 squared, \f$ \frac{S^2}{(S+B)} \f$, with S and B being the signal and background yields,
44 correspond to a particular point in the optimised background rejection
45 versus signal efficiency curve. This working point requires knowledge
46 of the expected yields, which is in general not available. Note also that
47 for rare signals, Poissonian statistics should be used, which modifies
48 the significance criterion.
49
50 The rectangular cut of a volume in the variable space is performed using
51 a binary tree to sort the training events. This provides a significant
52 reduction in computing time (up to several orders of magnitude, depending
53 on the complexity of the problem at hand).
54
55 Technically, optimisation is achieved in TMVA by two methods:
56
57 1. Monte Carlo generation using uniform priors for the lower cut value,
58 and the cut width, thrown within the variable ranges.
59
60 2. A Genetic Algorithm (GA) searches for the optimal ("fittest") cut sample.
61 The GA is configurable by many external settings through the option
62 string. For difficult cases (such as many variables), some tuning
63 may be necessary to achieve satisfactory results.
64
65 Attempts to use Minuit fits (Simplex or Migrad) instead have not shown
66 superior results, and often failed due to convergence at local minima.
67
68 The tests we have performed so far showed that in generic applications,
69 the GA is superior to MC sampling, and hence GA is the default method.
70 It is worthwhile trying both anyway.
71
72 **Decorrelated (or "diagonalized") Cuts**
73
74 See class description for Method Likelihood for a detailed explanation.
75*/
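// ------------------------------------------------------------------------
// Editor's usage sketch (not part of the original file): a minimal example
// of booking this classifier through the TMVA Factory. The input file, tree
// names and variable names are hypothetical placeholders; the option string
// uses only options declared further down in this file.

#include "TFile.h"
#include "TTree.h"
#include "TMVA/Factory.h"
#include "TMVA/DataLoader.h"
#include "TMVA/Types.h"

void bookCutsExample()
{
   // hypothetical input file providing signal and background trees
   TFile* input   = TFile::Open("tmva_example.root");
   TTree* sigTree = (TTree*)input->Get("TreeS");
   TTree* bkgTree = (TTree*)input->Get("TreeB");

   TFile* outFile = TFile::Open("TMVA_Cuts.root", "RECREATE");
   TMVA::Factory    factory("TMVAClassification", outFile, "AnalysisType=Classification");
   TMVA::DataLoader loader("dataset");

   loader.AddVariable("var1", 'F');   // hypothetical input variables
   loader.AddVariable("var2", 'F');
   loader.AddSignalTree(sigTree, 1.0);
   loader.AddBackgroundTree(bkgTree, 1.0);
   loader.PrepareTrainingAndTestTree("", "SplitMode=Random:NormMode=NumEvents");

   // book rectangular cut optimisation with the Genetic Algorithm fitter
   factory.BookMethod(&loader, TMVA::Types::kCuts, "CutsGA",
                      "FitMethod=GA:EffMethod=EffSel:VarProp=FSmart");

   factory.TrainAllMethods();
   factory.TestAllMethods();
   factory.EvaluateAllMethods();
   outFile->Close();
}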
76
77#include "TMVA/MethodCuts.h"
78
81#include "TMVA/Config.h"
82#include "TMVA/Configurable.h"
83#include "TMVA/DataSet.h"
84#include "TMVA/DataSetInfo.h"
85#include "TMVA/Event.h"
86#include "TMVA/IFitterTarget.h"
87#include "TMVA/IMethod.h"
88#include "TMVA/GeneticFitter.h"
89#include "TMVA/Interval.h"
90#include "TMVA/FitterBase.h"
91#include "TMVA/MCFitter.h"
92#include "TMVA/MethodBase.h"
93#include "TMVA/MethodFDA.h"
94#include "TMVA/MinuitFitter.h"
95#include "TMVA/MsgLogger.h"
96#include "TMVA/PDF.h"
97#include "TMVA/Results.h"
99#include "TMVA/Timer.h"
100#include "TMVA/Tools.h"
102#include "TMVA/Types.h"
103#include "TMVA/TSpline1.h"
105
106#include "TH1F.h"
107#include "TObjString.h"
108#include "TDirectory.h"
109#include "TMath.h"
110#include "TGraph.h"
111#include "TSpline.h"
112#include "TRandom3.h"
113
114#include <cstdlib>
115#include <iostream>
116#include <iomanip>
117
118using std::atof;
119
121
122
124
125////////////////////////////////////////////////////////////////////////////////
126/// standard constructor
127
128TMVA::MethodCuts::MethodCuts( const TString& jobName,
129 const TString& methodTitle,
130 DataSetInfo& theData,
131 const TString& theOption ) :
132 MethodBase( jobName, Types::kCuts, methodTitle, theData, theOption),
133 fFitMethod ( kUseGeneticAlgorithm ),
134 fEffMethod ( kUseEventSelection ),
135 fFitParams (0),
136 fTestSignalEff(0.7),
137 fEffSMin ( 0 ),
138 fEffSMax ( 0 ),
139 fCutRangeMin( 0 ),
140 fCutRangeMax( 0 ),
141 fBinaryTreeS( 0 ),
142 fBinaryTreeB( 0 ),
143 fCutMin ( 0 ),
144 fCutMax ( 0 ),
145 fTmpCutMin ( 0 ),
146 fTmpCutMax ( 0 ),
147 fAllVarsI ( 0 ),
148 fNpar ( 0 ),
149 fEffRef ( 0 ),
150 fRangeSign ( 0 ),
151 fRandom ( 0 ),
152 fMeanS ( 0 ),
153 fMeanB ( 0 ),
154 fRmsS ( 0 ),
155 fRmsB ( 0 ),
156 fEffBvsSLocal( 0 ),
157 fVarHistS ( 0 ),
158 fVarHistB ( 0 ),
159 fVarHistS_smooth( 0 ),
160 fVarHistB_smooth( 0 ),
161 fVarPdfS ( 0 ),
162 fVarPdfB ( 0 ),
163 fNegEffWarning( kFALSE )
164{
165}
166
167////////////////////////////////////////////////////////////////////////////////
168/// construction from weight file
169
170TMVA::MethodCuts::MethodCuts( DataSetInfo& theData,
171 const TString& theWeightFile) :
172 MethodBase( Types::kCuts, theData, theWeightFile),
173 fFitMethod ( kUseGeneticAlgorithm ),
174 fEffMethod ( kUseEventSelection ),
175 fFitParams (0),
176 fTestSignalEff(0.7),
177 fEffSMin ( 0 ),
178 fEffSMax ( 0 ),
179 fCutRangeMin( 0 ),
180 fCutRangeMax( 0 ),
181 fBinaryTreeS( 0 ),
182 fBinaryTreeB( 0 ),
183 fCutMin ( 0 ),
184 fCutMax ( 0 ),
185 fTmpCutMin ( 0 ),
186 fTmpCutMax ( 0 ),
187 fAllVarsI ( 0 ),
188 fNpar ( 0 ),
189 fEffRef ( 0 ),
190 fRangeSign ( 0 ),
191 fRandom ( 0 ),
192 fMeanS ( 0 ),
193 fMeanB ( 0 ),
194 fRmsS ( 0 ),
195 fRmsB ( 0 ),
196 fEffBvsSLocal( 0 ),
197 fVarHistS ( 0 ),
198 fVarHistB ( 0 ),
199 fVarHistS_smooth( 0 ),
200 fVarHistB_smooth( 0 ),
201 fVarPdfS ( 0 ),
202 fVarPdfB ( 0 ),
203 fNegEffWarning( kFALSE )
204{
205}
206
207////////////////////////////////////////////////////////////////////////////////
208/// Cuts can only handle classification with 2 classes
209
210Bool_t TMVA::MethodCuts::HasAnalysisType( Types::EAnalysisType type, UInt_t numberClasses,
211 UInt_t /*numberTargets*/ )
212{
213 return (type == Types::kClassification && numberClasses == 2);
214}
215
216////////////////////////////////////////////////////////////////////////////////
217/// default initialisation called by all constructors
218
219void TMVA::MethodCuts::Init( void )
220{
221 fVarHistS = fVarHistB = 0;
222 fVarHistS_smooth = fVarHistB_smooth = 0;
223 fVarPdfS = fVarPdfB = 0;
224 fFitParams = 0;
225 fBinaryTreeS = fBinaryTreeB = 0;
226 fEffSMin = 0;
227 fEffSMax = 0;
228
229 // vector with fit results
230 fNpar = 2*GetNvar();
231 fRangeSign = new std::vector<Int_t> ( GetNvar() );
232 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) (*fRangeSign)[ivar] = +1;
233
234 fMeanS = new std::vector<Double_t>( GetNvar() );
235 fMeanB = new std::vector<Double_t>( GetNvar() );
236 fRmsS = new std::vector<Double_t>( GetNvar() );
237 fRmsB = new std::vector<Double_t>( GetNvar() );
238
239 // get the variable specific options, first initialize default
240 fFitParams = new std::vector<EFitParameters>( GetNvar() );
241 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) (*fFitParams)[ivar] = kNotEnforced;
242
243 fFitMethod = kUseMonteCarlo;
244 fTestSignalEff = -1;
245
246 // create LUT for cuts
247 fCutMin = new Double_t*[GetNvar()];
248 fCutMax = new Double_t*[GetNvar()];
249 for (UInt_t i=0; i<GetNvar(); i++) {
250 fCutMin[i] = new Double_t[fNbins];
251 fCutMax[i] = new Double_t[fNbins];
252 }
253
254 // init
255 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
256 for (Int_t ibin=0; ibin<fNbins; ibin++) {
257 fCutMin[ivar][ibin] = 0;
258 fCutMax[ivar][ibin] = 0;
259 }
260 }
261
262 fTmpCutMin = new Double_t[GetNvar()];
263 fTmpCutMax = new Double_t[GetNvar()];
264}
265
266////////////////////////////////////////////////////////////////////////////////
267/// destructor
268
269TMVA::MethodCuts::~MethodCuts( void )
270{
271 delete fRangeSign;
272 delete fMeanS;
273 delete fMeanB;
274 delete fRmsS;
275 delete fRmsB;
276 delete fFitParams;
277 delete fEffBvsSLocal;
278
279 if (NULL != fCutRangeMin) delete [] fCutRangeMin;
280 if (NULL != fCutRangeMax) delete [] fCutRangeMax;
281 if (NULL != fAllVarsI) delete [] fAllVarsI;
282
283 for (UInt_t i=0;i<GetNvar();i++) {
284 if (NULL != fCutMin[i] ) delete [] fCutMin[i];
285 if (NULL != fCutMax[i] ) delete [] fCutMax[i];
286 if (NULL != fCutRange[i]) delete fCutRange[i];
287 }
288
289 if (NULL != fCutMin) delete [] fCutMin;
290 if (NULL != fCutMax) delete [] fCutMax;
291
292 if (NULL != fTmpCutMin) delete [] fTmpCutMin;
293 if (NULL != fTmpCutMax) delete [] fTmpCutMax;
294
295 if (NULL != fBinaryTreeS) delete fBinaryTreeS;
296 if (NULL != fBinaryTreeB) delete fBinaryTreeB;
297}
298
299////////////////////////////////////////////////////////////////////////////////
300/// define the options (their key words) that can be set in the option string.
301///
302/// known options:
303/// - Method `<string>` Minimisation method. Available values are:
304/// - MC Monte Carlo `<default>`
305/// - GA Genetic Algorithm
306/// - SA Simulated annealing
307///
308/// - EffMethod `<string>` Efficiency selection method. Available values are:
309/// - EffSel `<default>`
310/// - EffPDF
311///
312/// - VarProp `<string>` Property of variable 1 for the MC method (taking precedence over the
313/// global setting). The same values as for the global option are available. Variables 1..10 can be
314/// set this way.
315///
316/// - CutRangeMin/Max `<float>` user-defined ranges in which cuts are varied
317
318void TMVA::MethodCuts::DeclareOptions()
319{
320 DeclareOptionRef(fFitMethodS = "GA", "FitMethod", "Minimisation Method (GA, SA, and MC are the primary methods to be used; the others have been introduced for testing purposes and are deprecated)");
321 AddPreDefVal(TString("GA"));
322 AddPreDefVal(TString("SA"));
323 AddPreDefVal(TString("MC"));
324 AddPreDefVal(TString("MCEvents"));
325 AddPreDefVal(TString("MINUIT"));
326 AddPreDefVal(TString("EventScan"));
327
328 // selection type
329 DeclareOptionRef(fEffMethodS = "EffSel", "EffMethod", "Selection Method");
330 AddPreDefVal(TString("EffSel"));
331 AddPreDefVal(TString("EffPDF"));
332
333 // cut ranges
334 fCutRange.resize(GetNvar());
335 fCutRangeMin = new Double_t[GetNvar()];
336 fCutRangeMax = new Double_t[GetNvar()];
337 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
338 fCutRange[ivar] = 0;
339 fCutRangeMin[ivar] = fCutRangeMax[ivar] = -1;
340 }
341
342 DeclareOptionRef( fCutRangeMin, GetNvar(), "CutRangeMin", "Minimum of allowed cut range (set per variable)" );
343 DeclareOptionRef( fCutRangeMax, GetNvar(), "CutRangeMax", "Maximum of allowed cut range (set per variable)" );
344
345 fAllVarsI = new TString[GetNvar()];
346
347 for (UInt_t i=0; i<GetNvar(); i++) fAllVarsI[i] = "NotEnforced";
348
349 DeclareOptionRef(fAllVarsI, GetNvar(), "VarProp", "Categorisation of cuts");
350 AddPreDefVal(TString("NotEnforced"));
351 AddPreDefVal(TString("FMax"));
352 AddPreDefVal(TString("FMin"));
353 AddPreDefVal(TString("FSmart"));
354}
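// ------------------------------------------------------------------------
// Editor's note: an illustrative booking option string (hypothetical
// variable indices) combining the options declared above, as it would be
// passed to Factory::BookMethod for this method:
//
//    "FitMethod=GA:EffMethod=EffSel:CutRangeMin[0]=-10:CutRangeMax[0]=10:VarProp[1]=FMax"
//
// The per-variable "[i]" syntax is the usual TMVA array-option notation for
// CutRangeMin/Max and VarProp.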
355
356////////////////////////////////////////////////////////////////////////////////
357/// process user options.
358///
359/// sanity check, do not allow the input variables to be normalised, because this
360/// only creates problems when interpreting the cuts
361
362void TMVA::MethodCuts::ProcessOptions()
363{
364 if (IsNormalised()) {
365 Log() << kWARNING << "Normalisation of the input variables for cut optimisation is not" << Endl;
366 Log() << kWARNING << "supported because this produces non-transparent cut values, and no" << Endl;
367 Log() << kWARNING << "improvement in the performance of the algorithm." << Endl;
368 Log() << kWARNING << "Please remove \"Normalise\" option from booking option string" << Endl;
369 Log() << kWARNING << "==> Will reset normalisation flag to \"False\"" << Endl;
370 SetNormalised( kFALSE );
371 }
372
373 if (IgnoreEventsWithNegWeightsInTraining()) {
374 Log() << kFATAL << "Mechanism to ignore events with negative weights in training not yet available for method: "
375 << GetMethodTypeName()
376 << " --> Please remove \"IgnoreNegWeightsInTraining\" option from booking string."
377 << Endl;
378 }
379
380 if (fFitMethodS == "MC" ) fFitMethod = kUseMonteCarlo;
381 else if (fFitMethodS == "MCEvents") fFitMethod = kUseMonteCarloEvents;
382 else if (fFitMethodS == "GA" ) fFitMethod = kUseGeneticAlgorithm;
383 else if (fFitMethodS == "SA" ) fFitMethod = kUseSimulatedAnnealing;
384 else if (fFitMethodS == "MINUIT" ) {
385 fFitMethod = kUseMinuit;
386 Log() << kWARNING << "poor performance of MINUIT in MethodCuts; preferred fit method: GA" << Endl;
387 }
388 else if (fFitMethodS == "EventScan" ) fFitMethod = kUseEventScan;
389 else Log() << kFATAL << "unknown minimisation method: " << fFitMethodS << Endl;
390
391 if (fEffMethodS == "EFFSEL" ) fEffMethod = kUseEventSelection; // highly recommended
392 else if (fEffMethodS == "EFFPDF" ) fEffMethod = kUsePDFs;
393 else fEffMethod = kUseEventSelection;
394
395 // options output
396 Log() << kINFO << Form("Use optimization method: \"%s\"",
397 (fFitMethod == kUseMonteCarlo) ? "Monte Carlo" :
398 (fFitMethod == kUseMonteCarloEvents) ? "Monte-Carlo-Event sampling" :
399 (fFitMethod == kUseEventScan) ? "Full Event Scan (slow)" :
400 (fFitMethod == kUseMinuit) ? "MINUIT" : "Genetic Algorithm" ) << Endl;
401 Log() << kINFO << Form("Use efficiency computation method: \"%s\"",
402 (fEffMethod == kUseEventSelection) ? "Event Selection" : "PDF" ) << Endl;
403
404 // cut ranges
405 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
406 fCutRange[ivar] = new Interval( fCutRangeMin[ivar], fCutRangeMax[ivar] );
407 }
408
409 // individual options
410 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
411 EFitParameters theFitP = kNotEnforced;
412 if (fAllVarsI[ivar] == "" || fAllVarsI[ivar] == "NotEnforced") theFitP = kNotEnforced;
413 else if (fAllVarsI[ivar] == "FMax" ) theFitP = kForceMax;
414 else if (fAllVarsI[ivar] == "FMin" ) theFitP = kForceMin;
415 else if (fAllVarsI[ivar] == "FSmart" ) theFitP = kForceSmart;
416 else {
417 Log() << kFATAL << "unknown value \'" << fAllVarsI[ivar]
418 << "\' for fit parameter option " << Form("VarProp[%i]",ivar) << Endl;
419 }
420 (*fFitParams)[ivar] = theFitP;
421
422 if (theFitP != kNotEnforced)
423 Log() << kINFO << "Use \"" << fAllVarsI[ivar]
424 << "\" cuts for variable: " << "'" << (*fInputVars)[ivar] << "'" << Endl;
425 }
426}
427
428////////////////////////////////////////////////////////////////////////////////
429/// cut evaluation: returns 1.0 if event passed, 0.0 otherwise
430
431Double_t TMVA::MethodCuts::GetMvaValue( Double_t* err, Double_t* errUpper )
432{
433 // cannot determine error
434 NoErrorCalc(err, errUpper);
435
436 // sanity check
437 if (fCutMin == NULL || fCutMax == NULL || fNbins == 0) {
438 Log() << kFATAL << "<Eval_Cuts> fCutMin/Max have zero pointer. "
439 << "Did you book Cuts ?" << Endl;
440 }
441
442 const Event* ev = GetEvent();
443
444 // sanity check
445 if (fTestSignalEff > 0) {
446 // get efficiency bin
447 Int_t ibin = fEffBvsSLocal->FindBin( fTestSignalEff );
448 if (ibin < 0 ) ibin = 0;
449 else if (ibin >= fNbins) ibin = fNbins - 1;
450
451 Bool_t passed = kTRUE;
452 for (UInt_t ivar=0; ivar<GetNvar(); ivar++)
453 passed &= ( (ev->GetValue(ivar) > fCutMin[ivar][ibin]) &&
454 (ev->GetValue(ivar) <= fCutMax[ivar][ibin]) );
455
456 return passed ? 1. : 0. ;
457 }
458 else return 0;
459}
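// ------------------------------------------------------------------------
// Editor's sketch (assumption, not part of the original file): on the
// application side the pass/fail value returned above is obtained through
// the TMVA::Reader, where the auxiliary argument of EvaluateMVA selects the
// working point (requested signal efficiency). The weight-file path, method
// tag and variable names are hypothetical.

#include "TMVA/Reader.h"

bool passesCuts(float var1, float var2)
{
   static TMVA::Reader reader("!Color:Silent");
   static bool  booked = false;
   static float v1, v2;
   if (!booked) {
      reader.AddVariable("var1", &v1);
      reader.AddVariable("var2", &v2);
      reader.BookMVA("CutsGA", "dataset/weights/TMVAClassification_CutsGA.weights.xml");
      booked = true;
   }
   v1 = var1; v2 = var2;
   // EvaluateMVA returns 1.0 if the event passes the cut window for the
   // requested 70% signal efficiency, 0.0 otherwise
   return reader.EvaluateMVA("CutsGA", 0.7) > 0.5;
}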
460
461////////////////////////////////////////////////////////////////////////////////
462/// print cuts
463
464void TMVA::MethodCuts::PrintCuts( Double_t effS ) const
465{
466 std::vector<Double_t> cutsMin;
467 std::vector<Double_t> cutsMax;
468 Int_t ibin = fEffBvsSLocal->FindBin( effS );
469
470 Double_t trueEffS = GetCuts( effS, cutsMin, cutsMax );
471
472 // retrieve variable expressions (could be transformations)
473 std::vector<TString>* varVec = 0;
474 if (GetTransformationHandler().GetNumOfTransformations() == 0) {
475 // no transformation applied, replace by current variables
476 varVec = new std::vector<TString>;
477 for (UInt_t ivar=0; ivar<cutsMin.size(); ivar++) {
478 varVec->push_back( DataInfo().GetVariableInfo(ivar).GetLabel() );
479 }
480 }
481 else if (GetTransformationHandler().GetNumOfTransformations() == 1) {
482 // get transformation string
483 varVec = GetTransformationHandler().GetTransformationStringsOfLastTransform();
484 }
485 else {
486 // replace transformation print by current variables and indicate incompleteness
487 varVec = new std::vector<TString>;
488 for (UInt_t ivar=0; ivar<cutsMin.size(); ivar++) {
489 varVec->push_back( DataInfo().GetVariableInfo(ivar).GetLabel() + " [transformed]" );
490 }
491 }
492
493 UInt_t maxL = 0;
494 for (UInt_t ivar=0; ivar<cutsMin.size(); ivar++) {
495 if ((UInt_t)(*varVec)[ivar].Length() > maxL) maxL = (*varVec)[ivar].Length();
496 }
497 UInt_t maxLine = 20+maxL+16;
498
499 for (UInt_t i=0; i<maxLine; i++) Log() << "-";
500 Log() << Endl;
501 Log() << kHEADER << "Cut values for requested signal efficiency: " << trueEffS << Endl;
502 Log() << kINFO << "Corresponding background efficiency : " << fEffBvsSLocal->GetBinContent( ibin ) << Endl;
503 if (GetTransformationHandler().GetNumOfTransformations() == 1) {
504 Log() << kINFO << "Transformation applied to input variables : \""
505 << GetTransformationHandler().GetNameOfLastTransform() << "\"" << Endl;
506 }
507 else if (GetTransformationHandler().GetNumOfTransformations() > 1) {
508 Log() << kINFO << "[ More than one (=" << GetTransformationHandler().GetNumOfTransformations() << ") "
509 << " transformations applied in transformation chain; cuts applied on transformed quantities ] " << Endl;
510 }
511 else {
512 Log() << kINFO << "Transformation applied to input variables : None" << Endl;
513 }
514 for (UInt_t i=0; i<maxLine; i++) Log() << "-";
515 Log() << Endl;
516 for (UInt_t ivar=0; ivar<cutsMin.size(); ivar++) {
517 Log() << kINFO
518 << "Cut[" << std::setw(2) << ivar << "]: "
519 << std::setw(10) << cutsMin[ivar]
520 << " < "
521 << std::setw(maxL) << (*varVec)[ivar]
522 << " <= "
523 << std::setw(10) << cutsMax[ivar] << Endl;
524 }
525 for (UInt_t i=0; i<maxLine; i++) Log() << "-";
526 Log() << Endl;
527
528 delete varVec; // yes, ownership has been given to us
529}
530
531////////////////////////////////////////////////////////////////////////////////
532/// retrieve cut values for given signal efficiency
533/// assume vector of correct size !!
534
535Double_t TMVA::MethodCuts::GetCuts( Double_t effS, Double_t* cutMin, Double_t* cutMax ) const
536{
537 std::vector<Double_t> cMin( GetNvar() );
538 std::vector<Double_t> cMax( GetNvar() );
539 Double_t trueEffS = GetCuts( effS, cMin, cMax );
540 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
541 cutMin[ivar] = cMin[ivar];
542 cutMax[ivar] = cMax[ivar];
543 }
544 return trueEffS;
545}
546
547////////////////////////////////////////////////////////////////////////////////
548/// retrieve cut values for given signal efficiency
549
550Double_t TMVA::MethodCuts::GetCuts( Double_t effS,
551 std::vector<Double_t>& cutMin,
552 std::vector<Double_t>& cutMax ) const
553{
554 // find corresponding bin
555 Int_t ibin = fEffBvsSLocal->FindBin( effS );
556
557 // get the true efficiency which is the one on the "left hand" side of the bin
558 Double_t trueEffS = fEffBvsSLocal->GetBinLowEdge( ibin );
559
560 ibin--; // the 'cut' vector has 0...fNbins indices
561 if (ibin < 0 ) ibin = 0;
562 else if (ibin >= fNbins) ibin = fNbins - 1;
563
564 cutMin.clear();
565 cutMax.clear();
566 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
567 cutMin.push_back( fCutMin[ivar][ibin] );
568 cutMax.push_back( fCutMax[ivar][ibin] );
569 }
570
571 return trueEffS;
572}
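// ------------------------------------------------------------------------
// Editor's sketch (assumption, not part of the original file): retrieving
// the optimised cut window for a chosen signal efficiency from a booked
// Reader. The method tag and weight-file layout are hypothetical;
// Reader::FindCutsMVA is assumed to return the MethodCuts instance behind
// the booked method.

#include <vector>
#include <iostream>
#include "TMVA/Reader.h"
#include "TMVA/MethodCuts.h"

void printCutWindow(TMVA::Reader& reader, double targetEffS = 0.7)
{
   TMVA::MethodCuts* mcuts = reader.FindCutsMVA("CutsGA");
   if (!mcuts) return;

   std::vector<Double_t> cutsMin, cutsMax;
   // GetCuts returns the signal efficiency actually realised (lower bin edge)
   Double_t trueEffS = mcuts->GetCuts(targetEffS, cutsMin, cutsMax);

   std::cout << "Cuts for signal efficiency " << trueEffS << ":\n";
   for (size_t ivar = 0; ivar < cutsMin.size(); ++ivar)
      std::cout << "  " << cutsMin[ivar] << " < var" << ivar
                << " <= " << cutsMax[ivar] << "\n";
}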
573
574////////////////////////////////////////////////////////////////////////////////
575/// training method: here the cuts are optimised for the training sample
576
577void TMVA::MethodCuts::Train( void )
578{
579 if (fEffMethod == kUsePDFs) CreateVariablePDFs(); // create PDFs for variables
580
581 // create binary trees (global member variables) for signal and background
582 if (fBinaryTreeS != 0) { delete fBinaryTreeS; fBinaryTreeS = 0; }
583 if (fBinaryTreeB != 0) { delete fBinaryTreeB; fBinaryTreeB = 0; }
584
585 // the variables may be transformed by a transformation method: to coherently
586 // treat signal and background one must decide which transformation type shall
587 // be used: our default is signal-type
588
589 fBinaryTreeS = new BinarySearchTree();
590 fBinaryTreeS->Fill( GetEventCollection(Types::kTraining), fSignalClass );
591 fBinaryTreeB = new BinarySearchTree();
592 fBinaryTreeB->Fill( GetEventCollection(Types::kTraining), fBackgroundClass );
593
594 for (UInt_t ivar =0; ivar < Data()->GetNVariables(); ivar++) {
595 (*fMeanS)[ivar] = fBinaryTreeS->Mean(Types::kSignal, ivar);
596 (*fRmsS)[ivar] = fBinaryTreeS->RMS (Types::kSignal, ivar);
597 (*fMeanB)[ivar] = fBinaryTreeB->Mean(Types::kBackground, ivar);
598 (*fRmsB)[ivar] = fBinaryTreeB->RMS (Types::kBackground, ivar);
599
600 // update interval ?
601 Double_t xmin = TMath::Min(fBinaryTreeS->Min(Types::kSignal, ivar),
602 fBinaryTreeB->Min(Types::kBackground, ivar));
603 Double_t xmax = TMath::Max(fBinaryTreeS->Max(Types::kSignal, ivar),
604 fBinaryTreeB->Max(Types::kBackground, ivar));
605
606 // redefine ranges to be slightly smaller and larger than xmin and xmax, respectively
607 Double_t eps = 0.01*(xmax - xmin);
608 xmin -= eps;
609 xmax += eps;
610
611 if (TMath::Abs(fCutRange[ivar]->GetMin() - fCutRange[ivar]->GetMax()) < 1.0e-300 ) {
612 fCutRange[ivar]->SetMin( xmin );
613 fCutRange[ivar]->SetMax( xmax );
614 }
615 else if (xmin > fCutRange[ivar]->GetMin()) fCutRange[ivar]->SetMin( xmin );
616 else if (xmax < fCutRange[ivar]->GetMax()) fCutRange[ivar]->SetMax( xmax );
617 }
618
619 std::vector<TH1F*> signalDist, bkgDist;
620
621 // this is important: reset the branch addresses of the training tree to the current event
622 delete fEffBvsSLocal;
623 fEffBvsSLocal = new TH1F( GetTestvarName() + "_effBvsSLocal",
624 TString(GetName()) + " efficiency of B vs S", fNbins, 0.0, 1.0 );
625 fEffBvsSLocal->SetDirectory(nullptr); // it's local
626
627 // init
628 for (Int_t ibin=1; ibin<=fNbins; ibin++) fEffBvsSLocal->SetBinContent( ibin, -0.1 );
629
630 // --------------------------------------------------------------------------
631 if (fFitMethod == kUseGeneticAlgorithm ||
632 fFitMethod == kUseMonteCarlo ||
633 fFitMethod == kUseMinuit ||
634 fFitMethod == kUseSimulatedAnnealing) {
635
636 // ranges
637 std::vector<Interval*> ranges;
638
639 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
640
641 Int_t nbins = 0;
642 if (DataInfo().GetVariableInfo(ivar).GetVarType() == 'I') {
643 nbins = Int_t(fCutRange[ivar]->GetMax() - fCutRange[ivar]->GetMin()) + 1;
644 }
645
646 if ((*fFitParams)[ivar] == kForceSmart) {
647 if ((*fMeanS)[ivar] > (*fMeanB)[ivar]) (*fFitParams)[ivar] = kForceMax;
648 else (*fFitParams)[ivar] = kForceMin;
649 }
650
651 if ((*fFitParams)[ivar] == kForceMin) {
652 ranges.push_back( new Interval( fCutRange[ivar]->GetMin(), fCutRange[ivar]->GetMin(), nbins ) );
653 ranges.push_back( new Interval( 0, fCutRange[ivar]->GetMax() - fCutRange[ivar]->GetMin(), nbins ) );
654 }
655 else if ((*fFitParams)[ivar] == kForceMax) {
656 ranges.push_back( new Interval( fCutRange[ivar]->GetMin(), fCutRange[ivar]->GetMax(), nbins ) );
657 ranges.push_back( new Interval( fCutRange[ivar]->GetMax() - fCutRange[ivar]->GetMin(),
658 fCutRange[ivar]->GetMax() - fCutRange[ivar]->GetMin(), nbins ) );
659 }
660 else {
661 ranges.push_back( new Interval( fCutRange[ivar]->GetMin(), fCutRange[ivar]->GetMax(), nbins ) );
662 ranges.push_back( new Interval( 0, fCutRange[ivar]->GetMax() - fCutRange[ivar]->GetMin(), nbins ) );
663 }
664 }
665
666 // create the fitter
667 FitterBase* fitter = NULL;
668
669 switch (fFitMethod) {
670 case kUseGeneticAlgorithm:
671 fitter = new GeneticFitter( *this, TString::Format("%sFitter_GA", GetName()), ranges, GetOptions() );
672 break;
673 case kUseMonteCarlo:
674 fitter = new MCFitter ( *this, TString::Format("%sFitter_MC", GetName()), ranges, GetOptions() );
675 break;
676 case kUseMinuit:
677 fitter = new MinuitFitter ( *this, TString::Format("%sFitter_MINUIT", GetName()), ranges, GetOptions() );
678 break;
679 case kUseSimulatedAnnealing:
680 fitter = new SimulatedAnnealingFitter( *this, TString::Format("%sFitter_SA", GetName()), ranges, GetOptions() );
681 break;
682 default:
683 Log() << kFATAL << "Wrong fit method: " << fFitMethod << Endl;
684 }
685
686 if (fInteractive) fitter->SetIPythonInteractive(&fExitFromTraining, &fIPyMaxIter, &fIPyCurrentIter);
687
688 fitter->CheckForUnusedOptions();
689
690 // perform the fit
691 fitter->Run();
692
693 // clean up
694 for (UInt_t ivar=0; ivar<ranges.size(); ivar++) delete ranges[ivar];
695 delete fitter;
696
697 }
698 // --------------------------------------------------------------------------
699 else if (fFitMethod == kUseEventScan) {
700
701 Int_t nevents = Data()->GetNEvents();
702 Int_t ic = 0;
703
704 // timing of MC
705 Int_t nsamples = Int_t(0.5*nevents*(nevents - 1));
706 Timer timer( nsamples, GetName() );
707 fIPyMaxIter = nsamples;
708
709 Log() << kINFO << "Running full event scan: " << Endl;
710 for (Int_t ievt1=0; ievt1<nevents; ievt1++) {
711 for (Int_t ievt2=ievt1+1; ievt2<nevents; ievt2++) {
712
713 fIPyCurrentIter = ic;
714 if (fExitFromTraining) break;
715 EstimatorFunction( ievt1, ievt2 );
716
717 // what's the time please?
718 ic++;
719 if ((nsamples<10000) || ic%10000 == 0) timer.DrawProgressBar( ic );
720 }
721 }
722 }
723 // --------------------------------------------------------------------------
724 else if (fFitMethod == kUseMonteCarloEvents) {
725
726 Int_t nsamples = 200000;
727 UInt_t seed = 100;
728 DeclareOptionRef( nsamples, "SampleSize", "Number of Monte-Carlo-Event samples" );
729 DeclareOptionRef( seed, "Seed", "Seed for the random generator (0 takes random seeds)" );
730 ParseOptions();
731
732 Int_t nevents = Data()->GetNEvents();
733 Int_t ic = 0;
734
735 // timing of MC
736 Timer timer( nsamples, GetName() );
737 fIPyMaxIter = nsamples;
738
739 // random generator
740 TRandom3*rnd = new TRandom3( seed );
741
742 Log() << kINFO << "Running Monte-Carlo-Event sampling over " << nsamples << " events" << Endl;
743 std::vector<Double_t> pars( 2*GetNvar() );
744
745 for (Int_t itoy=0; itoy<nsamples; itoy++) {
746 fIPyCurrentIter = ic;
747 if (fExitFromTraining) break;
748
749 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
750
751 // generate minimum and delta cuts for this variable
752
753 // retrieve signal events
754 Bool_t isSignal = kFALSE;
755 Int_t ievt1, ievt2;
756 Double_t evt1 = 0., evt2 = 0.;
757 Int_t nbreak = 0;
758 while (!isSignal) {
759 ievt1 = Int_t(rnd->Uniform(0.,1.)*nevents);
760 ievt2 = Int_t(rnd->Uniform(0.,1.)*nevents);
761
762 const Event *ev1 = GetEvent(ievt1);
763 isSignal = DataInfo().IsSignal(ev1);
764 evt1 = ev1->GetValue( ivar );
765
766 const Event *ev2 = GetEvent(ievt2);
767 isSignal &= DataInfo().IsSignal(ev2);
768 evt2 = ev2->GetValue( ivar );
769
770 if (nbreak++ > 10000) {
771 Log() << kFATAL << "<MCEvents>: could not find signal events"
772 << " after 10000 trials - do you have signal events in your sample ?"
773 << Endl;
774 isSignal = 1;
775 }
776 }
777
778 // sort
779 if (evt1 > evt2) { Double_t z = evt1; evt1 = evt2; evt2 = z; }
780 pars[2*ivar] = evt1;
781 pars[2*ivar+1] = evt2 - evt1;
782 }
783
784 // compute estimator
785 EstimatorFunction( pars );
786
787 // what's the time please?
788 ic++;
789 if ((nsamples<1000) || ic%1000 == 0) timer.DrawProgressBar( ic );
790 }
791
792 delete rnd;
793 }
794 // --------------------------------------------------------------------------
795 else Log() << kFATAL << "Unknown minimisation method: " << fFitMethod << Endl;
796
797 if (fBinaryTreeS != 0) { delete fBinaryTreeS; fBinaryTreeS = 0; }
798 if (fBinaryTreeB != 0) { delete fBinaryTreeB; fBinaryTreeB = 0; }
799
800 // force cut ranges within limits
801 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
802 for (Int_t ibin=0; ibin<fNbins; ibin++) {
803
804 if ((*fFitParams)[ivar] == kForceMin && fCutMin[ivar][ibin] > -fgMaxAbsCutVal) {
805 fCutMin[ivar][ibin] = -fgMaxAbsCutVal;
806 }
807 if ((*fFitParams)[ivar] == kForceMax && fCutMax[ivar][ibin] < fgMaxAbsCutVal) {
808 fCutMax[ivar][ibin] = fgMaxAbsCutVal;
809 }
810 }
811 }
812
813 // some output
814 // the requested efficiency has to be slightly above the bin border;
815 // otherwise the wrong bin is taken in some cases.
816 Double_t epsilon = 0.0001;
817 for (Double_t eff=0.1; eff<0.95; eff += 0.1) PrintCuts( eff+epsilon );
818
819 if (!fExitFromTraining) fIPyMaxIter = fIPyCurrentIter;
820 ExitFromTraining();
821}
822
823////////////////////////////////////////////////////////////////////////////////
824/// nothing to test
825
826void TMVA::MethodCuts::TestClassification()
827{
828}
829
830////////////////////////////////////////////////////////////////////////////////
831/// for full event scan
832
833Double_t TMVA::MethodCuts::EstimatorFunction( Int_t ievt1, Int_t ievt2 )
834{
835 const Event *ev1 = GetEvent(ievt1);
836 if (!DataInfo().IsSignal(ev1)) return -1;
837
838 const Event *ev2 = GetEvent(ievt2);
839 if (!DataInfo().IsSignal(ev2)) return -1;
840
841 const Int_t nvar = GetNvar();
842 Double_t* evt1 = new Double_t[nvar];
843 Double_t* evt2 = new Double_t[nvar];
844
845 for (Int_t ivar=0; ivar<nvar; ivar++) {
846 evt1[ivar] = ev1->GetValue( ivar );
847 evt2[ivar] = ev2->GetValue( ivar );
848 }
849
850 // determine cuts
851 std::vector<Double_t> pars;
852 for (Int_t ivar=0; ivar<nvar; ivar++) {
853 Double_t cutMin;
854 Double_t cutMax;
855 if (evt1[ivar] < evt2[ivar]) {
856 cutMin = evt1[ivar];
857 cutMax = evt2[ivar];
858 }
859 else {
860 cutMin = evt2[ivar];
861 cutMax = evt1[ivar];
862 }
863
864 pars.push_back( cutMin );
865 pars.push_back( cutMax - cutMin );
866 }
867
868 delete [] evt1;
869 delete [] evt2;
870
871 return ComputeEstimator( pars );
872}
873
874////////////////////////////////////////////////////////////////////////////////
875/// returns estimator for "cut fitness" used by GA
876
877Double_t TMVA::MethodCuts::EstimatorFunction( std::vector<Double_t>& pars )
878{
879 return ComputeEstimator( pars );
880}
881
882////////////////////////////////////////////////////////////////////////////////
883/// returns estimator for "cut fitness" used by GA.
884///
885/// there are two requirements:
886/// 1. the signal efficiency must be equal to the required one in the
887/// efficiency scan
888/// 2. the background efficiency must be as small as possible
889///
890/// the requirement 1. has priority over 2.
891
892Double_t TMVA::MethodCuts::ComputeEstimator( std::vector<Double_t>& pars )
893{
894 // caution: the npar gives the _free_ parameters
895 // however: the "pars" array contains all parameters
896
897 // determine cuts
898 Double_t effS = 0, effB = 0;
899 this->MatchParsToCuts( pars, &fTmpCutMin[0], &fTmpCutMax[0] );
900
901 // retrieve signal and background efficiencies for given cut
902 switch (fEffMethod) {
903 case kUsePDFs:
904 this->GetEffsfromPDFs (&fTmpCutMin[0], &fTmpCutMax[0], effS, effB);
905 break;
906 case kUseEventSelection:
907 this->GetEffsfromSelection (&fTmpCutMin[0], &fTmpCutMax[0], effS, effB);
908 break;
909 default:
910 this->GetEffsfromSelection (&fTmpCutMin[0], &fTmpCutMax[0], effS, effB);
911 }
912
913 Double_t eta = 0;
914
915 // test for an estimator function which optimizes on the whole background-rejection signal-efficiency plot
916
917 // get the backg-reject. and sig-eff for the parameters given to this function
918 // effS, effB
919
920 // get best background rejection for given signal efficiency
921 Int_t ibinS = fEffBvsSLocal->FindBin( effS );
922
923 Double_t effBH = fEffBvsSLocal->GetBinContent( ibinS );
924 Double_t effBH_left = (ibinS > 1 ) ? fEffBvsSLocal->GetBinContent( ibinS-1 ) : effBH;
925 Double_t effBH_right = (ibinS < fNbins) ? fEffBvsSLocal->GetBinContent( ibinS+1 ) : effBH;
926
927 Double_t average = 0.5*(effBH_left + effBH_right);
928 if (effBH < effB) average = effBH;
929
930 // if the average of the bin right and left is larger than this one, add the difference to
931 // the current value of the estimator (because you can do at least so much better)
932 eta = ( -TMath::Abs(effBH-average) + (1.0 - (effBH - effB))) / (1.0 + effS);
933 // alternative idea
934 //if (effBH<0) eta = (1.e-6+effB)/(1.0 + effS);
935 //else eta = (effB - effBH) * (1.0 + 10.* effS);
936
937 // if a point is found which is better than an existing one, ... replace it.
938 // preliminary best event -> backup
939 if (effBH < 0 || effBH > effB) {
940 fEffBvsSLocal->SetBinContent( ibinS, effB );
941 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
942 fCutMin[ivar][ibinS-1] = fTmpCutMin[ivar]; // bin 1 stored in index 0
943 fCutMax[ivar][ibinS-1] = fTmpCutMax[ivar];
944 }
945 }
946
947 // caution (!) this value is not good for a decision for MC, .. it is designed for GA
948 // but .. it doesn't matter, as MC samplings are independent from the former ones
949 // and the replacement of the best variables by better ones is done about 10 lines above.
950 // ( if (effBH < 0 || effBH > effB) { .... )
951
952 if (ibinS<=1) {
953 // add penalty for effS=0 bin
954 // to avoid that the minimizer gets stuck in the zero-bin
955 // force it towards higher efficiency
956 Double_t penalty=0.,diff=0.;
957 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
958 diff=(fCutRange[ivar]->GetMax()-fTmpCutMax[ivar])/(fCutRange[ivar]->GetMax()-fCutRange[ivar]->GetMin());
959 penalty+=4.*diff*diff;
960 diff=(fCutRange[ivar]->GetMin()-fTmpCutMin[ivar])/(fCutRange[ivar]->GetMax()-fCutRange[ivar]->GetMin());
961 penalty+=4.*diff*diff;
962 }
963
964 if (effS<1.e-4) return 10.0+penalty;
965 else return 10.*(1.-10.*effS);
966 }
967 return eta;
968}
969
970////////////////////////////////////////////////////////////////////////////////
971/// translates parameters into cuts
972
973void TMVA::MethodCuts::MatchParsToCuts( const std::vector<Double_t> & pars,
974 Double_t* cutMin, Double_t* cutMax )
975{
976 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
977 Int_t ipar = 2*ivar;
978 cutMin[ivar] = ((*fRangeSign)[ivar] > 0) ? pars[ipar] : pars[ipar] - pars[ipar+1];
979 cutMax[ivar] = ((*fRangeSign)[ivar] > 0) ? pars[ipar] + pars[ipar+1] : pars[ipar];
980 }
981}
982
983////////////////////////////////////////////////////////////////////////////////
984/// translate the cuts into parameters (obsolete function)
985
986void TMVA::MethodCuts::MatchCutsToPars( std::vector<Double_t>& pars,
987 Double_t** cutMinAll, Double_t** cutMaxAll, Int_t ibin )
988{
989 if (ibin < 1 || ibin > fNbins) Log() << kFATAL << "::MatchCutsToPars: bin error: "
990 << ibin << Endl;
991
992 const UInt_t nvar = GetNvar();
993 Double_t *cutMin = new Double_t[nvar];
994 Double_t *cutMax = new Double_t[nvar];
995 for (UInt_t ivar=0; ivar<nvar; ivar++) {
996 cutMin[ivar] = cutMinAll[ivar][ibin-1];
997 cutMax[ivar] = cutMaxAll[ivar][ibin-1];
998 }
999
1000 MatchCutsToPars( pars, cutMin, cutMax );
1001 delete [] cutMin;
1002 delete [] cutMax;
1003}
1004
1005////////////////////////////////////////////////////////////////////////////////
1006/// translates cuts into parameters
1007
1008void TMVA::MethodCuts::MatchCutsToPars( std::vector<Double_t>& pars,
1009 Double_t* cutMin, Double_t* cutMax )
1010{
1011 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
1012 Int_t ipar = 2*ivar;
1013 pars[ipar] = ((*fRangeSign)[ivar] > 0) ? cutMin[ivar] : cutMax[ivar];
1014 pars[ipar+1] = cutMax[ivar] - cutMin[ivar];
1015 }
1016}
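// ------------------------------------------------------------------------
// Editor's note: a small worked example of the parameter layout used by
// MatchParsToCuts/MatchCutsToPars above. For each variable ivar the fitter
// works with pars[2*ivar] (a cut edge) and pars[2*ivar+1] (the window
// width), so with the default range sign +1:
//
//    cutMin[ivar] = pars[2*ivar]
//    cutMax[ivar] = pars[2*ivar] + pars[2*ivar+1]
//
// e.g. pars = {0.5, 1.2} encodes the window 0.5 < x <= 1.7 for variable 0.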
1017
1018////////////////////////////////////////////////////////////////////////////////
1019/// compute signal and background efficiencies from PDFs
1020/// for given cut sample
1021
1022void TMVA::MethodCuts::GetEffsfromPDFs( Double_t* cutMin, Double_t* cutMax,
1023 Double_t& effS, Double_t& effB )
1024{
1025 effS = 1.0;
1026 effB = 1.0;
1027 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
1028 effS *= (*fVarPdfS)[ivar]->GetIntegral( cutMin[ivar], cutMax[ivar] );
1029 effB *= (*fVarPdfB)[ivar]->GetIntegral( cutMin[ivar], cutMax[ivar] );
1030 }
1031
1032 // quick fix to prevent from efficiencies < 0
1033 if( effS < 0.0 ) {
1034 effS = 0.0;
1035 if( !fNegEffWarning ) Log() << kWARNING << "Negative signal efficiency found and set to 0. This is probably due to many events with negative weights in a certain cut-region." << Endl;
1036 fNegEffWarning = kTRUE;
1037 }
1038 if( effB < 0.0 ) {
1039 effB = 0.0;
1040 if( !fNegEffWarning ) Log() << kWARNING << "Negative background efficiency found and set to 0. This is probably due to many events with negative weights in a certain cut-region." << Endl;
1041 fNegEffWarning = kTRUE;
1042 }
1043}
1044
1045////////////////////////////////////////////////////////////////////////////////
1046/// compute signal and background efficiencies from event counting
1047/// for given cut sample
1048
1049void TMVA::MethodCuts::GetEffsfromSelection( Double_t* cutMin, Double_t* cutMax,
1050 Double_t& effS, Double_t& effB)
1051{
1052 Float_t nTotS = 0, nTotB = 0;
1053 Float_t nSelS = 0, nSelB = 0;
1054
1055 Volume* volume = new Volume( cutMin, cutMax, GetNvar() );
1056
1057 // search for all events lying in the volume, and add up their weights
1058 nSelS = fBinaryTreeS->SearchVolume( volume );
1059 nSelB = fBinaryTreeB->SearchVolume( volume );
1060
1061 delete volume;
1062
1063 // total number of "events" (sum of weights) as reference to compute efficiency
1064 nTotS = fBinaryTreeS->GetSumOfWeights();
1065 nTotB = fBinaryTreeB->GetSumOfWeights();
1066
1067 // sanity check
1068 if (nTotS == 0 && nTotB == 0) {
1069 Log() << kFATAL << "<GetEffsfromSelection> fatal error in zero total number of events:"
1070 << " nTotS, nTotB: " << nTotS << " " << nTotB << " ***" << Endl;
1071 }
1072
1073 // efficiencies
1074 if (nTotS == 0 ) {
1075 effS = 0;
1076 effB = nSelB/nTotB;
1077 Log() << kWARNING << "<ComputeEstimator> zero number of signal events" << Endl;
1078 }
1079 else if (nTotB == 0) {
1080 effB = 0;
1081 effS = nSelS/nTotS;
1082 Log() << kWARNING << "<ComputeEstimator> zero number of background events" << Endl;
1083 }
1084 else {
1085 effS = nSelS/nTotS;
1086 effB = nSelB/nTotB;
1087 }
1088
1089 // quick fix to prevent from efficiencies < 0
1090 if( effS < 0.0 ) {
1091 effS = 0.0;
1092 if( !fNegEffWarning ) Log() << kWARNING << "Negative signal efficiency found and set to 0. This is probably due to many events with negative weights in a certain cut-region." << Endl;
1093 fNegEffWarning = kTRUE;
1094 }
1095 if( effB < 0.0 ) {
1096 effB = 0.0;
1097 if( !fNegEffWarning ) Log() << kWARNING << "Negative background efficiency found and set to 0. This is probably due to many events with negative weights in a certain cut-region." << Endl;
1098 fNegEffWarning = kTRUE;
1099 }
1100}
1101
1102////////////////////////////////////////////////////////////////////////////////
1103/// for PDF method: create efficiency reference histograms and PDFs
1104
1105void TMVA::MethodCuts::CreateVariablePDFs( void )
1106{
1107 // create list of histograms and PDFs
1108 fVarHistS = new std::vector<TH1*>( GetNvar() );
1109 fVarHistB = new std::vector<TH1*>( GetNvar() );
1110 fVarHistS_smooth = new std::vector<TH1*>( GetNvar() );
1111 fVarHistB_smooth = new std::vector<TH1*>( GetNvar() );
1112 fVarPdfS = new std::vector<PDF*>( GetNvar() );
1113 fVarPdfB = new std::vector<PDF*>( GetNvar() );
1114
1115 Int_t nsmooth = 0;
1116
1117 // get min and max values of all events
1118 Double_t minVal = DBL_MAX;
1119 Double_t maxVal = -DBL_MAX;
1120 for( UInt_t ievt=0; ievt<Data()->GetNEvents(); ievt++ ){
1121 const Event *ev = GetEvent(ievt);
1122 Float_t val = ev->GetValue(ievt);
1123 if( val < minVal ) minVal = val;
1124 if( val > maxVal ) maxVal = val;
1125 }
1126
1127 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
1128
1129 // ---- signal
1130 TString histTitle = (*fInputVars)[ivar] + " signal training";
1131 TString histName = (*fInputVars)[ivar] + "_sig";
1132 // TString drawOpt = (*fInputVars)[ivar] + ">>h(";
1133 // drawOpt += fNbins;
1134 // drawOpt += ")";
1135
1136 // selection
1137 // Data().GetTrainingTree()->Draw( drawOpt, "type==1", "goff" );
1138 // (*fVarHistS)[ivar] = (TH1F*)gDirectory->Get("h");
1139 // (*fVarHistS)[ivar]->SetName(histName);
1140 // (*fVarHistS)[ivar]->SetTitle(histTitle);
1141
1142 (*fVarHistS)[ivar] = new TH1F(histName.Data(), histTitle.Data(), fNbins, minVal, maxVal );
1143
1144 // ---- background
1145 histTitle = (*fInputVars)[ivar] + " background training";
1146 histName = (*fInputVars)[ivar] + "_bgd";
1147 // drawOpt = (*fInputVars)[ivar] + ">>h(";
1148 // drawOpt += fNbins;
1149 // drawOpt += ")";
1150
1151 // Data().GetTrainingTree()->Draw( drawOpt, "type==0", "goff" );
1152 // (*fVarHistB)[ivar] = (TH1F*)gDirectory->Get("h");
1153 // (*fVarHistB)[ivar]->SetName(histName);
1154 // (*fVarHistB)[ivar]->SetTitle(histTitle);
1155
1156
1157 (*fVarHistB)[ivar] = new TH1F(histName.Data(), histTitle.Data(), fNbins, minVal, maxVal );
1158
1159 for( UInt_t ievt=0; ievt<Data()->GetNEvents(); ievt++ ){
1160 const Event *ev = GetEvent(ievt);
1161 Float_t val = ev->GetValue(ivar);
1162 if( DataInfo().IsSignal(ev) ){
1163 (*fVarHistS)[ivar]->Fill( val );
1164 }else{
1165 (*fVarHistB)[ivar]->Fill( val );
1166 }
1167 }
1168
1169
1170
1171 // make copy for smoothed histos
1172 (*fVarHistS_smooth)[ivar] = (TH1F*)(*fVarHistS)[ivar]->Clone();
1173 histTitle = (*fInputVars)[ivar] + " signal training smoothed ";
1174 histTitle += nsmooth;
1175 histTitle +=" times";
1176 histName = (*fInputVars)[ivar] + "_sig_smooth";
1177 (*fVarHistS_smooth)[ivar]->SetName(histName);
1178 (*fVarHistS_smooth)[ivar]->SetTitle(histTitle);
1179
1180 // smooth
1181 (*fVarHistS_smooth)[ivar]->Smooth(nsmooth);
1182
1183 // ---- background
1184 // histTitle = (*fInputVars)[ivar] + " background training";
1185 // histName = (*fInputVars)[ivar] + "_bgd";
1186 // drawOpt = (*fInputVars)[ivar] + ">>h(";
1187 // drawOpt += fNbins;
1188 // drawOpt += ")";
1189
1190 // Data().GetTrainingTree()->Draw( drawOpt, "type==0", "goff" );
1191 // (*fVarHistB)[ivar] = (TH1F*)gDirectory->Get("h");
1192 // (*fVarHistB)[ivar]->SetName(histName);
1193 // (*fVarHistB)[ivar]->SetTitle(histTitle);
1194
1195 // make copy for smoothed histos
1196 (*fVarHistB_smooth)[ivar] = (TH1F*)(*fVarHistB)[ivar]->Clone();
1197 histTitle = (*fInputVars)[ivar]+" background training smoothed ";
1198 histTitle += nsmooth;
1199 histTitle +=" times";
1200 histName = (*fInputVars)[ivar]+"_bgd_smooth";
1201 (*fVarHistB_smooth)[ivar]->SetName(histName);
1202 (*fVarHistB_smooth)[ivar]->SetTitle(histTitle);
1203
1204 // smooth
1205 (*fVarHistB_smooth)[ivar]->Smooth(nsmooth);
1206
1207 // create PDFs
1208 (*fVarPdfS)[ivar] = new PDF( TString(GetName()) + " PDF Var Sig " + GetInputVar( ivar ), (*fVarHistS_smooth)[ivar], PDF::kSpline2 );
1209 (*fVarPdfB)[ivar] = new PDF( TString(GetName()) + " PDF Var Bkg " + GetInputVar( ivar ), (*fVarHistB_smooth)[ivar], PDF::kSpline2 );
1210 }
1211}
1212
1213////////////////////////////////////////////////////////////////////////////////
1214/// read the cuts from stream
1215
1216void TMVA::MethodCuts::ReadWeightsFromStream( std::istream& istr )
1217{
1218 TString dummy;
1219 UInt_t dummyInt;
1220
1221 // first the dimensions
1222 istr >> dummy >> dummy;
1223 // coverity[tainted_data_argument]
1224 istr >> dummy >> fNbins;
1225
1226 // get rid of one read-in here because we already read it once to check for decorrelation
1227 istr >> dummy >> dummy >> dummy >> dummy >> dummy >> dummy >> dummyInt >> dummy ;
1228
1229 // sanity check
1230 if (dummyInt != Data()->GetNVariables()) {
1231 Log() << kFATAL << "<ReadWeightsFromStream> fatal error: mismatch "
1232 << "in number of variables: " << dummyInt << " != " << Data()->GetNVariables() << Endl;
1233 }
1234 //SetNvar(dummyInt);
1235
1236 // print some information
1237 if (fFitMethod == kUseMonteCarlo) {
1238 Log() << kWARNING << "Read cuts optimised using sample of MC events" << Endl;
1239 }
1240 else if (fFitMethod == kUseMonteCarloEvents) {
1241 Log() << kWARNING << "Read cuts optimised using sample of MC events" << Endl;
1242 }
1243 else if (fFitMethod == kUseGeneticAlgorithm) {
1244 Log() << kINFO << "Read cuts optimised using Genetic Algorithm" << Endl;
1245 }
1246 else if (fFitMethod == kUseSimulatedAnnealing) {
1247 Log() << kINFO << "Read cuts optimised using Simulated Annealing algorithm" << Endl;
1248 }
1249 else if (fFitMethod == kUseEventScan) {
1250 Log() << kINFO << "Read cuts optimised using Full Event Scan" << Endl;
1251 }
1252 else {
1253 Log() << kWARNING << "unknown method: " << fFitMethod << Endl;
1254 }
1255 Log() << kINFO << "in " << fNbins << " signal efficiency bins and for " << GetNvar() << " variables" << Endl;
1256
1257 // now read the cuts
1258 char buffer[200];
1259 istr.getline(buffer,200);
1260 istr.getline(buffer,200);
1261
1262 Int_t tmpbin;
1263 Float_t tmpeffS, tmpeffB;
1264 if (fEffBvsSLocal != 0) delete fEffBvsSLocal;
1265 fEffBvsSLocal = new TH1F( GetTestvarName() + "_effBvsSLocal",
1266 TString(GetName()) + " efficiency of B vs S", fNbins, 0.0, 1.0 );
1267 fEffBvsSLocal->SetDirectory(nullptr); // it's local
1268
1269 for (Int_t ibin=0; ibin<fNbins; ibin++) {
1270 istr >> tmpbin >> tmpeffS >> tmpeffB;
1271 fEffBvsSLocal->SetBinContent( ibin+1, tmpeffB );
1272
1273 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
1274 istr >> fCutMin[ivar][ibin] >> fCutMax[ivar][ibin];
1275 }
1276 }
1277
1278 fEffSMin = fEffBvsSLocal->GetBinCenter(1);
1279 fEffSMax = fEffBvsSLocal->GetBinCenter(fNbins);
1280}
1281
1282////////////////////////////////////////////////////////////////////////////////
1283/// create XML description of the optimised cuts
1284/// (one entry per signal-efficiency bin)
1285
1286void TMVA::MethodCuts::AddWeightsXMLTo( void* parent ) const
1287{
1288 // write all necessary information to the stream
1289 std::vector<Double_t> cutsMin;
1290 std::vector<Double_t> cutsMax;
1291
1292 void* wght = gTools().AddChild(parent, "Weights");
1293 gTools().AddAttr( wght, "OptimisationMethod", (Int_t)fEffMethod);
1294 gTools().AddAttr( wght, "FitMethod", (Int_t)fFitMethod );
1295 gTools().AddAttr( wght, "nbins", fNbins );
1296 gTools().AddComment( wght, TString::Format( "Below are the optimised cuts for %i variables: Format: ibin(hist) effS effB cutMin[ivar=0] cutMax[ivar=0] ... cutMin[ivar=n-1] cutMax[ivar=n-1]", GetNvar() ) );
1297
1298 // NOTE: The signal efficiency written out into
1299 // the weight file does not correspond to the center of the bin within which the
1300 // background rejection is maximised (as before) but to the lower left edge of it.
1301 // This is because the cut optimisation algorithm determines the best background
1302 // rejection for all signal efficiencies belonging into a bin. Since the best background
1303 // rejection is in general obtained for the lowest possible signal efficiency, the
1304 // reference signal efficiency is the lowest value in the bin.
1305
1306 for (Int_t ibin=0; ibin<fNbins; ibin++) {
1307 Double_t effS = fEffBvsSLocal->GetBinCenter ( ibin + 1 );
1308 Double_t trueEffS = GetCuts( effS, cutsMin, cutsMax );
1309 if (TMath::Abs(trueEffS) < 1e-10) trueEffS = 0;
1310
1311 void* binxml = gTools().AddChild( wght, "Bin" );
1312 gTools().AddAttr( binxml, "ibin", ibin+1 );
1313 gTools().AddAttr( binxml, "effS", trueEffS );
1314 gTools().AddAttr( binxml, "effB", fEffBvsSLocal->GetBinContent( ibin + 1 ) );
1315 void* cutsxml = gTools().AddChild( binxml, "Cuts" );
1316 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
1317 gTools().AddAttr( cutsxml, TString::Format( "cutMin_%i", ivar ), cutsMin[ivar] );
1318 gTools().AddAttr( cutsxml, TString::Format( "cutMax_%i", ivar ), cutsMax[ivar] );
1319 }
1320 }
1321}
1322
1323////////////////////////////////////////////////////////////////////////////////
1324/// read cuts from XML weight file
1325
1326void TMVA::MethodCuts::ReadWeightsFromXML( void* wghtnode )
1327{
1328 // delete old min and max
1329 for (UInt_t i=0; i<GetNvar(); i++) {
1330 if (fCutMin[i] != 0) delete [] fCutMin[i];
1331 if (fCutMax[i] != 0) delete [] fCutMax[i];
1332 }
1333 if (fCutMin != 0) delete [] fCutMin;
1334 if (fCutMax != 0) delete [] fCutMax;
1335
1336 UInt_t tmpEffMethod, tmpFitMethod;
1337 gTools().ReadAttr( wghtnode, "OptimisationMethod", tmpEffMethod );
1338 gTools().ReadAttr( wghtnode, "FitMethod", tmpFitMethod );
1339 gTools().ReadAttr( wghtnode, "nbins", fNbins );
1340
1341 fEffMethod = (EEffMethod)tmpEffMethod;
1342 fFitMethod = (EFitMethodType)tmpFitMethod;
1343
1344 // print some information
1345 if (fFitMethod == kUseMonteCarlo) {
1346 Log() << kINFO << "Read cuts optimised using sample of MC events" << Endl;
1347 }
1348 else if (fFitMethod == kUseMonteCarloEvents) {
1349 Log() << kINFO << "Read cuts optimised using sample of MC-Event events" << Endl;
1350 }
1351 else if (fFitMethod == kUseGeneticAlgorithm) {
1352 Log() << kINFO << "Read cuts optimised using Genetic Algorithm" << Endl;
1353 }
1354 else if (fFitMethod == kUseSimulatedAnnealing) {
1355 Log() << kINFO << "Read cuts optimised using Simulated Annealing algorithm" << Endl;
1356 }
1357 else if (fFitMethod == kUseEventScan) {
1358 Log() << kINFO << "Read cuts optimised using Full Event Scan" << Endl;
1359 }
1360 else {
1361 Log() << kWARNING << "unknown method: " << fFitMethod << Endl;
1362 }
1363 Log() << kINFO << "Reading " << fNbins << " signal efficiency bins for " << GetNvar() << " variables" << Endl;
1364
1365 delete fEffBvsSLocal;
1366 fEffBvsSLocal = new TH1F( GetTestvarName() + "_effBvsSLocal",
1367 TString(GetName()) + " efficiency of B vs S", fNbins, 0.0, 1.0 );
1368 fEffBvsSLocal->SetDirectory(nullptr); // it's local
1369 for (Int_t ibin=1; ibin<=fNbins; ibin++) fEffBvsSLocal->SetBinContent( ibin, -0.1 ); // Init
1370
1371 fCutMin = new Double_t*[GetNvar()];
1372 fCutMax = new Double_t*[GetNvar()];
1373 for (UInt_t i=0;i<GetNvar();i++) {
1374 fCutMin[i] = new Double_t[fNbins];
1375 fCutMax[i] = new Double_t[fNbins];
1376 }
1377
1378 // read efficiencies and cuts
1379 Int_t tmpbin;
1380 Float_t tmpeffS, tmpeffB;
1381 void* ch = gTools().GetChild(wghtnode,"Bin");
1382 while (ch) {
1383 // if (strcmp(gTools().GetName(ch),"Bin") !=0) {
1384 // ch = gTools().GetNextChild(ch);
1385 // continue;
1386 // }
1387
1388 gTools().ReadAttr( ch, "ibin", tmpbin );
1389 gTools().ReadAttr( ch, "effS", tmpeffS );
1390 gTools().ReadAttr( ch, "effB", tmpeffB );
1391
1392 // sanity check
1393 if (tmpbin-1 >= fNbins || tmpbin-1 < 0) {
1394 Log() << kFATAL << "Mismatch in bins: " << tmpbin-1 << " >= " << fNbins << Endl;
1395 }
1396
1397 fEffBvsSLocal->SetBinContent( tmpbin, tmpeffB );
1398 void* ct = gTools().GetChild(ch);
1399 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
1400 gTools().ReadAttr( ct, TString::Format( "cutMin_%i", ivar ), fCutMin[ivar][tmpbin-1] );
1401 gTools().ReadAttr( ct, TString::Format( "cutMax_%i", ivar ), fCutMax[ivar][tmpbin-1] );
1402 }
1403 ch = gTools().GetNextChild(ch, "Bin");
1404 }
1405}
1406
1407////////////////////////////////////////////////////////////////////////////////
1408/// write histograms and PDFs to file for monitoring purposes
1409
1410void TMVA::MethodCuts::WriteMonitoringHistosToFile( void ) const
1411{
1412 Log() << kINFO << "Write monitoring histograms to file: " << BaseDir()->GetPath() << Endl;
1413
1414 fEffBvsSLocal->Write();
1415
1416 // save reference histograms to file
1417 if (fEffMethod == kUsePDFs) {
1418 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
1419 (*fVarHistS)[ivar]->Write();
1420 (*fVarHistB)[ivar]->Write();
1421 (*fVarHistS_smooth)[ivar]->Write();
1422 (*fVarHistB_smooth)[ivar]->Write();
1423 (*fVarPdfS)[ivar]->GetPDFHist()->Write();
1424 (*fVarPdfB)[ivar]->GetPDFHist()->Write();
1425 }
1426 }
1427}
1428
1429////////////////////////////////////////////////////////////////////////////////
1430/// Overloaded function to create background efficiency (rejection) versus
1431/// signal efficiency plot (first call of this function).
1432///
1433/// The function returns the signal efficiency at background efficiency
1434/// indicated in theString
1435///
1436/// "theString" must have two entries:
1437/// - `[0]`: "Efficiency"
1438/// - `[1]`: the value of background efficiency at which the signal efficiency
1439/// is to be returned
1440
1441Double_t TMVA::MethodCuts::GetTrainingEfficiency(const TString& theString)
1442{
1443 // parse input string for required background efficiency
1444 TList* list = gTools().ParseFormatLine( theString );
1445 // sanity check
1446 if (list->GetSize() != 2) {
1447 Log() << kFATAL << "<GetTrainingEfficiency> wrong number of arguments"
1448 << " in string: " << theString
1449 << " | required format, e.g., Efficiency:0.05" << Endl;
1450 return -1;
1451 }
1452
1453 Results* results = Data()->GetResults(GetMethodName(), Types::kTesting, GetAnalysisType());
1454
1455 // that will be the value of the efficiency returned (it does not affect
1456 // the efficiency-vs-bkg plot, which is done anyway).
1457 Float_t effBref = atof( ((TObjString*)list->At(1))->GetString() );
1458
1459 delete list;
1460
1461 // first round ? --> create histograms
1462 if (results->GetHist("EFF_BVSS_TR")==0) {
1463
1464 if (fBinaryTreeS != 0) { delete fBinaryTreeS; fBinaryTreeS = 0; }
1465 if (fBinaryTreeB != 0) { delete fBinaryTreeB; fBinaryTreeB = 0; }
1466
1467 fBinaryTreeS = new BinarySearchTree();
1468 fBinaryTreeS->Fill( GetEventCollection(Types::kTraining), fSignalClass );
1469 fBinaryTreeB = new BinarySearchTree();
1470 fBinaryTreeB->Fill( GetEventCollection(Types::kTraining), fBackgroundClass );
1471 // there is no really good equivalent to the fEffS; fEffB (efficiency vs cutvalue)
1472 // for the "Cuts" method (unless we had only one cut). Maybe later I might add here
1473 // histograms for each of the cuts...but this would require also a change in the
1474 // base class, and it is not really necessary, as we get exactly THIS info from the
1475 // "evaluateAllVariables" anyway.
1476
1477 // now create efficiency curve: background versus signal
1478 TH1* eff_bvss_tr = new TH1F( GetTestvarName() + "_trainingEffBvsS", GetTestvarName() + "", fNbins, 0, 1 );
1479 for (Int_t ibin=1; ibin<=fNbins; ibin++) eff_bvss_tr->SetBinContent( ibin, -0.1 ); // Init
1480 TH1* rej_bvss_tr = new TH1F( GetTestvarName() + "_trainingRejBvsS", GetTestvarName() + "", fNbins, 0, 1 );
1481 for (Int_t ibin=1; ibin<=fNbins; ibin++) rej_bvss_tr->SetBinContent( ibin, 0. ); // Init
1482 results->Store(eff_bvss_tr, "EFF_BVSS_TR");
1483 results->Store(rej_bvss_tr, "REJ_BVSS_TR");
1484
1485 // use root finder
1486
1487 // make the background-vs-signal efficiency plot
1488 Double_t* tmpCutMin = new Double_t[GetNvar()];
1489 Double_t* tmpCutMax = new Double_t[GetNvar()];
1490 Int_t nFailedBins=0;
1491 for (Int_t bini=1; bini<=fNbins; bini++) {
1492 for (UInt_t ivar=0; ivar <GetNvar(); ivar++){
1493 tmpCutMin[ivar] = fCutMin[ivar][bini-1];
1494 tmpCutMax[ivar] = fCutMax[ivar][bini-1];
1495 }
1496 // find cut value corresponding to a given signal efficiency
1497 Double_t effS = (bini - 0.5)/Float_t(fNbins), effB = 0.;
1498 this->GetEffsfromSelection( &tmpCutMin[0], &tmpCutMax[0], effS, effB);
1499 // check that effS matches bini
1500 Int_t effBin = eff_bvss_tr->GetXaxis()->FindBin(effS);
1501 if (effBin != bini){
1502 Log()<< kVERBOSE << "unable to fill efficiency bin " << bini<< " " << effBin <<Endl;
1503 nFailedBins++;
1504 }
1505 else{
1506 // and fill histograms
1507 eff_bvss_tr->SetBinContent( bini, effB );
1508 rej_bvss_tr->SetBinContent( bini, 1.0-effB );
1509 }
1510 }
1511 if (nFailedBins>0) Log()<< kWARNING << " unable to fill "<< nFailedBins <<" efficiency bins " <<Endl;
1512
1513 delete [] tmpCutMin;
1514 delete [] tmpCutMax;
1515
1516 // create splines for histogram
1517 fSplTrainEffBvsS = new TSpline1( "trainEffBvsS", new TGraph( eff_bvss_tr ) );
1518 }
1519
1520 // must exist...
1521 if (NULL == fSplTrainEffBvsS) return 0.0;
1522
1523 // now find signal efficiency that corresponds to required background efficiency
1524 Double_t effS = 0., effB, effS_ = 0., effB_ = 0.;
1525 Int_t nbins_ = 1000;
1526
1527 // loop over efficiency bins until the background eff. matches the requirement
1528 for (Int_t bini=1; bini<=nbins_; bini++) {
1529 // get corresponding signal and background efficiencies
1530 effS = (bini - 0.5)/Float_t(nbins_);
1531 effB = fSplTrainEffBvsS->Eval( effS );
1532
1533 // find signal efficiency that corresponds to required background efficiency
1534 if ((effB - effBref)*(effB_ - effBref) < 0) break;
1535 effS_ = effS;
1536 effB_ = effB;
1537 }
1538
1539 return 0.5*(effS + effS_);
1540}
1541
1542////////////////////////////////////////////////////////////////////////////////
1543/// Overloaded function to create background efficiency (rejection) versus
1544/// signal efficiency plot (first call of this function).
1545///
1546/// The function returns the signal efficiency at background efficiency
1547/// indicated in theString
1548///
1549/// "theString" must have two entries:
1550/// - `[0]`: "Efficiency"
1551/// - `[1]`: the value of background efficiency at which the signal efficiency
1552/// is to be returned
1553
1554Double_t TMVA::MethodCuts::GetEfficiency( const TString& theString, Types::ETreeType type, Double_t& effSerr )
1555{
1556 Data()->SetCurrentType(type);
1557
1558 Results* results = Data()->GetResults( GetMethodName(), Types::kTesting, GetAnalysisType() );
1559
1560 // parse input string for required background efficiency
1561 TList* list = gTools().ParseFormatLine( theString, ":" );
1562
1563 if (list->GetSize() > 2) {
1564 delete list;
1565 Log() << kFATAL << "<GetEfficiency> wrong number of arguments"
1566 << " in string: " << theString
1567 << " | required format, e.g., Efficiency:0.05, or empty string" << Endl;
1568 return -1;
1569 }
1570
1571 // sanity check
1572 Bool_t computeArea = (list->GetSize() < 2); // the area is computed
1573
1574 // that will be the value of the efficiency returned (it does not affect
1575 // the efficiency-vs-bkg plot, which is done anyway).
1576 Float_t effBref = (computeArea?1.:atof( ((TObjString*)list->At(1))->GetString() ));
1577
1578 delete list;
1579
1580
1581 // first round ? --> create histograms
1582 if (results->GetHist("MVA_EFF_BvsS")==0) {
1583
1584 if (fBinaryTreeS!=0) { delete fBinaryTreeS; fBinaryTreeS = 0; }
1585 if (fBinaryTreeB!=0) { delete fBinaryTreeB; fBinaryTreeB = 0; }
1586
1587 // the variables may be transformed by a transformation method: to coherently
1588 // treat signal and background one must decide which transformation type shall
1589 // be used: our default is signal-type
1590 fBinaryTreeS = new BinarySearchTree();
1591 fBinaryTreeS->Fill( GetEventCollection(Types::kTesting), fSignalClass );
1592 fBinaryTreeB = new BinarySearchTree();
1593 fBinaryTreeB->Fill( GetEventCollection(Types::kTesting), fBackgroundClass );
1594
1595 // there is no really good equivalent to the fEffS, fEffB (efficiency vs cut value)
1596 // histograms for the "Cuts" method (unless we had only one cut). Histograms for each
1597 // of the cuts might be added later, but this would also require a change in the
1598 // base class, and it is not really necessary, since exactly this information is
1599 // provided by "evaluateAllVariables" anyway.
1600
1601 // now create efficiency curve: background versus signal
1602 TH1* eff_BvsS = new TH1F( GetTestvarName() + "_effBvsS", GetTestvarName() + "", fNbins, 0, 1 );
1603 for (Int_t ibin=1; ibin<=fNbins; ibin++) eff_BvsS->SetBinContent( ibin, -0.1 ); // Init
1604 TH1* rej_BvsS = new TH1F( GetTestvarName() + "_rejBvsS", GetTestvarName() + "", fNbins, 0, 1 );
1605 for (Int_t ibin=1; ibin<=fNbins; ibin++) rej_BvsS->SetBinContent( ibin, 0.0 ); // Init
1606 results->Store(eff_BvsS, "MVA_EFF_BvsS");
1607 results->Store(rej_BvsS);
1608
1609 Double_t xmin = 0.;
1610 Double_t xmax = 1.000001;
1611
1612 TH1* eff_s = new TH1F( GetTestvarName() + "_effS", GetTestvarName() + " (signal)", fNbins, xmin, xmax);
1613 for (Int_t ibin=1; ibin<=fNbins; ibin++) eff_s->SetBinContent( ibin, -0.1 ); // Init
1614 TH1* eff_b = new TH1F( GetTestvarName() + "_effB", GetTestvarName() + " (background)", fNbins, xmin, xmax);
1615 for (Int_t ibin=1; ibin<=fNbins; ibin++) eff_b->SetBinContent( ibin, -0.1 ); // Init
1616 results->Store(eff_s, "MVA_S");
1617 results->Store(eff_b, "MVA_B");
1618
1619 // use root finder
1620
1621 // make the background-vs-signal efficiency plot
1622 Double_t* tmpCutMin = new Double_t[GetNvar()];
1623 Double_t* tmpCutMax = new Double_t[GetNvar()];
1624 TGraph* tmpBvsS = new TGraph(fNbins+1);
1625 tmpBvsS->SetPoint(0, 0., 0.);
1626
1627 for (Int_t bini=1; bini<=fNbins; bini++) {
1628 for (UInt_t ivar=0; ivar <GetNvar(); ivar++) {
1629 tmpCutMin[ivar] = fCutMin[ivar][bini-1];
1630 tmpCutMax[ivar] = fCutMax[ivar][bini-1];
1631 }
1632 // find cut value corresponding to a given signal efficiency
1634 this->GetEffsfromSelection( &tmpCutMin[0], &tmpCutMax[0], effS, effB);
1635 tmpBvsS->SetPoint(bini, effS, effB);
1636
1637 eff_s->SetBinContent(bini, effS);
1638 eff_b->SetBinContent(bini, effB);
1639 }
1640 tmpBvsS->SetPoint(fNbins+1, 1., 1.);
1641
1642 delete [] tmpCutMin;
1643 delete [] tmpCutMax;
1644
1645 // create splines for histogram
1646 fSpleffBvsS = new TSpline1( "effBvsS", tmpBvsS );
1647 for (Int_t bini=1; bini<=fNbins; bini++) {
1648 Double_t effS = (bini - 0.5)/Float_t(fNbins);
1649 Double_t effB = fSpleffBvsS->Eval( effS );
1650 eff_BvsS->SetBinContent( bini, effB );
1651 rej_BvsS->SetBinContent( bini, 1.0-effB );
1652 }
1653 }
1654
1655 // must exist...
1656 if (NULL == fSpleffBvsS) return 0.0;
1657
1658 // now find signal efficiency that corresponds to required background efficiency
1659 Double_t effS = 0, effB = 0, effS_ = 0, effB_ = 0;
1660 Int_t nbins_ = 1000;
1661
1662 if (computeArea) {
1663
1664 // compute area of rej-vs-eff plot
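      // (the sum below is the midpoint-rule approximation of the integral of
      //  (1 - effB(effS)) over effS from 0 to 1, i.e. the area under the
      //  background-rejection versus signal-efficiency curve)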
1665 Double_t integral = 0;
1666 for (Int_t bini=1; bini<=nbins_; bini++) {
1667
1668 // get corresponding signal and background efficiencies
1669 effS = (bini - 0.5)/Float_t(nbins_);
1670 effB = fSpleffBvsS->Eval( effS );
1671 integral += (1.0 - effB);
1672 }
1673 integral /= nbins_;
1674
1675 return integral;
1676 }
1677 else {
1678
1679 // loop over efficiency bins until the background eff. matches the requirement
1680 for (Int_t bini=1; bini<=nbins_; bini++) {
1681 // get corresponding signal and background efficiencies
1682 effS = (bini - 0.5)/Float_t(nbins_);
1683 effB = fSpleffBvsS->Eval( effS );
1684
1685 // find signal efficiency that corresponds to required background efficiency
1686 if ((effB - effBref)*(effB_ - effBref) < 0) break;
1687 effS_ = effS;
1688 effB_ = effB;
1689 }
1690
1691 effS = 0.5*(effS + effS_);
1692 effSerr = 0;
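      // binomial uncertainty on the signal efficiency, sqrt( effS*(1-effS)/N ),
      // with N the number of signal events in the test sample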
1693 if (Data()->GetNEvtSigTest() > 0)
1694 effSerr = TMath::Sqrt( effS*(1.0 - effS)/Double_t(Data()->GetNEvtSigTest()) );
1695
1696 return effS;
1697
1698 }
1699
1700 return -1;
1701}
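As documented above, the content of the selection string decides what this function returns. An illustrative call, assuming a trained cut classifier reachable through a pointer named method of type TMVA::MethodCuts* (the pointer and how it was obtained, e.g. from a TMVA::Factory, are assumptions of this sketch and not shown in this file):

   Double_t effSerr = 0.;
   // signal efficiency of the optimised cuts at 5% background efficiency
   Double_t effSat5 = method->GetEfficiency( "Efficiency:0.05", TMVA::Types::kTesting, effSerr );
   // with fewer than two entries in the string (e.g. just "Efficiency"), the
   // same call instead returns the area under the rejection-vs-efficiency curve
   Double_t rocArea = method->GetEfficiency( "Efficiency", TMVA::Types::kTesting, effSerr );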
1702
1703////////////////////////////////////////////////////////////////////////////////
1704/// write specific classifier response
1705
1706void TMVA::MethodCuts::MakeClassSpecific( std::ostream& fout, const TString& className ) const
1707{
1708 fout << " // not implemented for class: \"" << className << "\"" << std::endl;
1709 fout << "};" << std::endl;
1710}
1711
1712////////////////////////////////////////////////////////////////////////////////
1713/// get help message text
1714///
1715/// typical length of text line:
1716/// "|--------------------------------------------------------------|"
1717
1719{
1720 TString bold = gConfig().WriteOptionsReference() ? "<b>" : "";
1721 TString resbold = gConfig().WriteOptionsReference() ? "</b>" : "";
1722 TString brk = gConfig().WriteOptionsReference() ? "<br>" : "";
1723
1724 Log() << Endl;
1725 Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
1726 Log() << Endl;
1727 Log() << "The optimisation of rectangular cuts performed by TMVA maximises " << Endl;
1728 Log() << "the background rejection at given signal efficiency, and scans " << Endl;
1729 Log() << "over the full range of the latter quantity. Three optimisation" << Endl;
1730 Log() << "methods are optional: Monte Carlo sampling (MC), a Genetics" << Endl;
1731 Log() << "Algorithm (GA), and Simulated Annealing (SA). GA and SA are" << Endl;
1732 Log() << "expected to perform best." << Endl;
1733 Log() << Endl;
1734 Log() << "The difficulty to find the optimal cuts strongly increases with" << Endl;
1735 Log() << "the dimensionality (number of input variables) of the problem." << Endl;
1736 Log() << "This behavior is due to the non-uniqueness of the solution space."<< Endl;
1737 Log() << Endl;
1738 Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
1739 Log() << Endl;
1740 Log() << "If the dimensionality exceeds, say, 4 input variables, it is " << Endl;
1741 Log() << "advisable to scrutinize the separation power of the variables," << Endl;
1742 Log() << "and to remove the weakest ones. If some among the input variables" << Endl;
1743 Log() << "can be described by a single cut (e.g., because signal tends to be" << Endl;
1744 Log() << "larger than background), this can be indicated to MethodCuts via" << Endl;
1745 Log() << "the \"Fsmart\" options (see option string). Choosing this option" << Endl;
1746 Log() << "reduces the number of requirements for the variable from 2 (min/max)" << Endl;
1747 Log() << "to a single one (TMVA finds out whether it is to be interpreted as" << Endl;
1748 Log() << "min or max)." << Endl;
1749 Log() << Endl;
1750 Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
1751 Log() << "" << Endl;
1752 Log() << bold << "Monte Carlo sampling:" << resbold << Endl;
1753 Log() << "" << Endl;
1754 Log() << "Apart form the \"Fsmart\" option for the variables, the only way" << Endl;
1755 Log() << "to improve the MC sampling is to increase the sampling rate. This" << Endl;
1756 Log() << "is done via the configuration option \"MC_NRandCuts\". The execution" << Endl;
1757 Log() << "time scales linearly with the sampling rate." << Endl;
1758 Log() << "" << Endl;
1759 Log() << bold << "Genetic Algorithm:" << resbold << Endl;
1760 Log() << "" << Endl;
1761 Log() << "The algorithm terminates if no significant fitness increase has" << Endl;
1762 Log() << "been achieved within the last \"nsteps\" steps of the calculation." << Endl;
1763 Log() << "Wiggles in the ROC curve or constant background rejection of 1" << Endl;
1764 Log() << "indicate that the GA failed to always converge at the true maximum" << Endl;
1765 Log() << "fitness. In such a case, it is recommended to broaden the search " << Endl;
1766 Log() << "by increasing the population size (\"popSize\") and to give the GA " << Endl;
1767 Log() << "more time to find improvements by increasing the number of steps" << Endl;
1768 Log() << "(\"nsteps\")" << Endl;
1769 Log() << " -> increase \"popSize\" (at least >10 * number of variables)" << Endl;
1770 Log() << " -> increase \"nsteps\"" << Endl;
1771 Log() << "" << Endl;
1772 Log() << bold << "Simulated Annealing (SA) algorithm:" << resbold << Endl;
1773 Log() << "" << Endl;
1774 Log() << "\"Increasing Adaptive\" approach:" << Endl;
1775 Log() << "" << Endl;
1776 Log() << "The algorithm seeks local minima and explores their neighborhoods, while" << Endl;
1777 Log() << "changing the ambient temperature depending on the number of failures" << Endl;
1778 Log() << "in the previous steps. The performance can be improved by increasing" << Endl;
1779 Log() << "the number of iteration steps (\"MaxCalls\"), or by adjusting the" << Endl;
1780 Log() << "minimal temperature (\"MinTemperature\"). Manual adjustments of the" << Endl;
1781 Log() << "speed of the temperature increase (\"TemperatureScale\" and \"AdaptiveSpeed\")" << Endl;
1782 Log() << "to individual data sets should also help. Summary:" << brk << Endl;
1783 Log() << " -> increase \"MaxCalls\"" << brk << Endl;
1784 Log() << " -> adjust \"MinTemperature\"" << brk << Endl;
1785 Log() << " -> adjust \"TemperatureScale\"" << brk << Endl;
1786 Log() << " -> adjust \"AdaptiveSpeed\"" << Endl;
1787 Log() << "" << Endl;
1788 Log() << "\"Decreasing Adaptive\" approach:" << Endl;
1789 Log() << "" << Endl;
1790 Log() << "The algorithm calculates the initial temperature (based on the effect-" << Endl;
1791 Log() << "iveness of large steps) and the multiplier that ensures to reach the" << Endl;
1792 Log() << "minimal temperature with the requested number of iteration steps." << Endl;
1793 Log() << "The performance can be improved by adjusting the minimal temperature" << Endl;
1794 Log() << " (\"MinTemperature\") and by increasing number of steps (\"MaxCalls\"):" << brk << Endl;
1795 Log() << " -> increase \"MaxCalls\"" << brk << Endl;
1796 Log() << " -> adjust \"MinTemperature\"" << Endl;
1797 Log() << " " << Endl;
1798 Log() << "Other kernels:" << Endl;
1799 Log() << "" << Endl;
1800 Log() << "Alternative ways of counting the temperature change are implemented. " << Endl;
1801 Log() << "Each of them starts with the maximum temperature (\"MaxTemperature\")" << Endl;
1802 Log() << "and decreases while changing the temperature according to a given" << Endl;
1803 Log() << "prescription:" << brk << Endl;
1804 Log() << "CurrentTemperature =" << brk << Endl;
1805 Log() << " - Sqrt: InitialTemperature / Sqrt(StepNumber+2) * TemperatureScale" << brk << Endl;
1806 Log() << " - Log: InitialTemperature / Log(StepNumber+2) * TemperatureScale" << brk << Endl;
1807 Log() << " - Homo: InitialTemperature / (StepNumber+2) * TemperatureScale" << brk << Endl;
1808 Log() << " - Sin: (Sin(StepNumber / TemperatureScale) + 1) / (StepNumber + 1)*InitialTemperature + Eps" << brk << Endl;
1809 Log() << " - Geo: CurrentTemperature * TemperatureScale" << Endl;
1810 Log() << "" << Endl;
1811 Log() << "Their performance can be improved by adjusting initial temperature" << Endl;
1812 Log() << "(\"InitialTemperature\"), the number of iteration steps (\"MaxCalls\")," << Endl;
1813 Log() << "and the multiplier that scales the temperature decrease" << Endl;
1814 Log() << "(\"TemperatureScale\")" << brk << Endl;
1815 Log() << " -> increase \"MaxCalls\"" << brk << Endl;
1816 Log() << " -> adjust \"InitialTemperature\"" << brk << Endl;
1817 Log() << " -> adjust \"TemperatureScale\"" << brk << Endl;
1818 Log() << " -> adjust \"KernelTemperature\"" << Endl;
1819}
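The GA parameters quoted in the help text above ("popSize", "nsteps") are set through the option string with which the Cuts method is booked. A hedged booking sketch using the spellings from the standard TMVAClassification tutorial (FitMethod, PopSize, Steps, VarProp); the factory and dataloader objects are assumed to exist already, and the exact option names should be checked against the TMVA Options Reference:

   // hypothetical booking of a GA-optimised cut classifier with an enlarged
   // population and more improvement steps, and variable 0 marked "smart"
   // (single-sided cut); names and option spellings are illustrative assumptions
   factory->BookMethod( dataloader, TMVA::Types::kCuts, "CutsGA",
                        "FitMethod=GA:PopSize=400:Steps=40:VarProp[0]=FSmart" );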