MethodBDT.cxx
1 // Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss, Eckhard v. Toerne, Jan Therhaag
2 
3 /**********************************************************************************
4  * Project: TMVA - a Root-integrated toolkit for multivariate data analysis *
5  * Package: TMVA *
6  * Class : MethodBDT (BDT = Boosted Decision Trees) *
7  * Web : http://tmva.sourceforge.net *
8  * *
9  * Description: *
10  * Analysis of Boosted Decision Trees *
11  * *
12  * Authors (alphabetical): *
13  * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
14  * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
15  * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
16  * Doug Schouten <dschoute@sfu.ca> - Simon Fraser U., Canada *
17  * Jan Therhaag <jan.therhaag@cern.ch> - U. of Bonn, Germany *
18  * Eckhard v. Toerne <evt@uni-bonn.de> - U of Bonn, Germany *
19  * *
20  * Copyright (c) 2005-2011: *
21  * CERN, Switzerland *
22  * U. of Victoria, Canada *
23  * MPI-K Heidelberg, Germany *
24  * U. of Bonn, Germany *
25  * *
26  * Redistribution and use in source and binary forms, with or without *
27  * modification, are permitted according to the terms listed in LICENSE *
28  * (http://tmva.sourceforge.net/LICENSE) *
29  **********************************************************************************/
30 
31 /*! \class TMVA::MethodBDT
32 \ingroup TMVA
33 
34 Analysis of Boosted Decision Trees
35 
36 Boosted decision trees have been successfully used in High Energy
37 Physics analysis, for example by the MiniBooNE experiment
38 (Yang-Roe-Zhu, physics/0508045). In Boosted Decision Trees, the
39 selection is based on a majority vote over the results of several decision
40 trees, which are all derived from the same training sample by
41 supplying different event weights during the training.
42 
43 ### Decision trees:
44 
45 Successive decision nodes are used to categorize the
46 events out of the sample as either signal or background. Each node
47 uses only a single discriminating variable to decide if the event is
48 signal-like ("goes right") or background-like ("goes left"). This
49 forms a tree-like structure with "baskets" at the end (leaf nodes),
50 and an event is classified as either signal or background according to
51 whether the basket where it ends up has been classified as signal or
52 background during the training. Training of a decision tree is the
53 process of defining the "cut criteria" for each node. The training
54 starts with the root node. Here one takes the full training event
55 sample and selects the variable and corresponding cut value that gives
56 the best separation between signal and background at this stage. Using
57 this cut criterion, the sample is then divided into two subsamples, a
58 signal-like (right) and a background-like (left) sample. Two new nodes
59 are then created for each of the two sub-samples and they are
60 constructed using the same mechanism as described for the root
61 node. The division is stopped once a certain node has reached either a
62 minimum number of events or a minimum or maximum signal purity. These
63 leaf nodes are then called "signal" or "background" depending on whether
64 they contain more signal or background events from the training sample.
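
As an illustration, classifying one event then amounts to walking the tree from the
root node to a leaf. The sketch below is a deliberately simplified stand-alone example
(the node structure and names are assumptions made for this illustration, not the TMVA classes):

~~~ {.cpp}
#include <vector>

// Simplified binary decision tree: each internal node cuts on one variable,
// each leaf carries the signal/background label assigned during training.
struct Node {
   int    ivar     = -1;       // index of the discriminating variable (unused in leaves)
   double cut      = 0.;       // cut value applied to that variable
   bool   isSignal = false;    // leaf classification from the training sample
   Node*  left     = nullptr;  // "background-like" branch (value <= cut)
   Node*  right    = nullptr;  // "signal-like" branch     (value  > cut)
};

bool ClassifyEvent(const Node* node, const std::vector<double>& x)
{
   while (node->left && node->right)                              // descend until a leaf is reached
      node = (x[node->ivar] > node->cut) ? node->right : node->left;
   return node->isSignal;                                         // the leaf decides signal vs background
}
~~~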
65 
66 ### Boosting:
67 
68 The idea behind adaptive boosting (AdaBoost) is that signal events
69 from the training sample that end up in a background node
70 (and vice versa) are given a larger weight than events that are in
71 the correct leaf node. This results in a re-weighted training event
72 sample, from which a new decision tree can then be developed.
73 The boosting can be applied several times (typically 100-500 times)
74 and one ends up with a set of decision trees (a forest).
75 Gradient boosting works more like a function expansion approach, where
76 each tree corresponds to a summand. The parameters for each summand (tree)
77 are determined by the minimization of an error function (binomial log-
78 likelihood for classification and Huber loss for regression).
79 A greedy algorithm is used, which means that only one tree is modified
80 at a time, while the other trees stay fixed.
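
As an illustration of the adaptive re-weighting step, the following stand-alone sketch
multiplies the weight of every misclassified event by ((1-err)/err)^beta, where err is the
weighted misclassification rate of the current tree and beta corresponds to the AdaBoostBeta
parameter. The event structure and the simple 0/1 misclassification test are assumptions made
for this example only; the actual bookkeeping in TMVA is more involved.

~~~ {.cpp}
#include <cmath>
#include <vector>

struct TrainEvent { double weight; bool isSignal; bool treeSaysSignal; };

// One AdaBoost-style boosting step: returns the boost weight, which is also
// used to weight this tree in the final (weighted) majority vote.
double AdaBoostStepSketch(std::vector<TrainEvent>& sample, double beta)
{
   double sumW = 0., errW = 0.;
   for (const TrainEvent& ev : sample) {
      sumW += ev.weight;
      if (ev.isSignal != ev.treeSaysSignal) errW += ev.weight;   // weighted misclassification
   }
   const double err = errW / sumW;
   const double boostWeight = std::pow((1. - err) / err, beta);
   for (TrainEvent& ev : sample)
      if (ev.isSignal != ev.treeSaysSignal) ev.weight *= boostWeight;
   return boostWeight;
}
~~~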
81 
82 ### Bagging:
83 
84 In this particular variant of the Boosted Decision Trees, the boosting
85 is not done on the basis of previous training results, but by a simple
86 stochastic re-sampling of the initial training event sample.
87 
88 ### Random Trees:
89 
90 Similar to the "Random Forests" of Leo Breiman and Adele Cutler, this
91 option uses the bagging algorithm and, in addition, bases the determination
92 of the best node split during the training on a random subset of variables
93 only, chosen individually for each split.
94 
95 ### Analysis:
96 
97 Applying an individual decision tree to a test event results in a
98 classification of the event as either signal or background. For the
99 boosted decision tree selection, an event is successively subjected to
100 the whole set of decision trees and depending on how often it is
101 classified as signal, a "likelihood" estimator is constructed for the
102 event being signal or background. The value of this estimator is
103 then used to select events from an event sample, and the cut value
104 on this estimator defines the efficiency and purity of
105 the selection.
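
For reference, booking this method from user code typically looks as follows; the option
values here are only an example, and the creation of the TMVA::Factory and TMVA::DataLoader
objects is not shown:

~~~ {.cpp}
factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDT",
   "NTrees=800:MaxDepth=3:MinNodeSize=5%:nCuts=20:"
   "BoostType=AdaBoost:AdaBoostBeta=0.5:SeparationType=GiniIndex:"
   "UseBaggedBoost:BaggedSampleFraction=0.6" );
~~~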
106 
107 */
108 
109 
110 #include "TMVA/MethodBDT.h"
111 
112 #include "TMVA/BDTEventWrapper.h"
113 #include "TMVA/BinarySearchTree.h"
114 #include "TMVA/ClassifierFactory.h"
115 #include "TMVA/Configurable.h"
116 #include "TMVA/CrossEntropy.h"
117 #include "TMVA/DecisionTree.h"
118 #include "TMVA/DataSet.h"
119 #include "TMVA/GiniIndex.h"
120 #include "TMVA/GiniIndexWithLaplace.h"
121 #include "TMVA/Interval.h"
122 #include "TMVA/IMethod.h"
123 #include "TMVA/LogInterval.h"
124 #include "TMVA/MethodBase.h"
125 #include "TMVA/MisClassificationError.h"
126 #include "TMVA/MsgLogger.h"
127 #include "TMVA/OptimizeConfigParameters.h"
128 #include "TMVA/PDF.h"
129 #include "TMVA/Ranking.h"
130 #include "TMVA/Results.h"
131 #include "TMVA/ResultsMulticlass.h"
132 #include "TMVA/SdivSqrtSplusB.h"
133 #include "TMVA/SeparationBase.h"
134 #include "TMVA/Timer.h"
135 #include "TMVA/Tools.h"
136 #include "TMVA/Types.h"
137 
138 #include "Riostream.h"
139 #include "TDirectory.h"
140 #include "TRandom3.h"
141 #include "TMath.h"
142 #include "TMatrixTSym.h"
143 #include "TObjString.h"
144 #include "TGraph.h"
145 
146 #include <algorithm>
147 #include <fstream>
148 #include <math.h>
149 #include <unordered_map>
150 
151 
152 using std::vector;
153 using std::make_pair;
154 
156 
158 
160 
161 ////////////////////////////////////////////////////////////////////////////////
162 /// The standard constructor for the "boosted decision trees".
163 
164 TMVA::MethodBDT::MethodBDT( const TString& jobName,
165  const TString& methodTitle,
166  DataSetInfo& theData,
167  const TString& theOption ) :
168  TMVA::MethodBase( jobName, Types::kBDT, methodTitle, theData, theOption)
169  , fTrainSample(0)
170  , fNTrees(0)
171  , fSigToBkgFraction(0)
172  , fAdaBoostBeta(0)
173 // , fTransitionPoint(0)
174  , fShrinkage(0)
175  , fBaggedBoost(kFALSE)
176  , fBaggedGradBoost(kFALSE)
177 // , fSumOfWeights(0)
178  , fMinNodeEvents(0)
179  , fMinNodeSize(5)
180  , fMinNodeSizeS("5%")
181  , fNCuts(0)
182  , fUseFisherCuts(0) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
183  , fMinLinCorrForFisher(.8) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
184  , fUseExclusiveVars(0) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
185  , fUseYesNoLeaf(kFALSE)
186  , fNodePurityLimit(0)
187  , fNNodesMax(0)
188  , fMaxDepth(0)
189  , fPruneMethod(DecisionTree::kNoPruning)
190  , fPruneStrength(0)
191  , fFValidationEvents(0)
192  , fAutomatic(kFALSE)
193  , fRandomisedTrees(kFALSE)
194  , fUseNvars(0)
195  , fUsePoissonNvars(0) // don't use this initialisation, only here to make Coverity happy. Is set in Init()
196  , fUseNTrainEvents(0)
197  , fBaggedSampleFraction(0)
198  , fNoNegWeightsInTraining(kFALSE)
199  , fInverseBoostNegWeights(kFALSE)
200  , fPairNegWeightsGlobal(kFALSE)
201  , fTrainWithNegWeights(kFALSE)
202  , fDoBoostMonitor(kFALSE)
203  , fITree(0)
204  , fBoostWeight(0)
205  , fErrorFraction(0)
206  , fCss(0)
207  , fCts_sb(0)
208  , fCtb_ss(0)
209  , fCbb(0)
210  , fDoPreselection(kFALSE)
211  , fSkipNormalization(kFALSE)
212  , fHistoricBool(kFALSE)
213 {
214  fMonitorNtuple = NULL;
215  fSepType = NULL;
216  fRegressionLossFunctionBDTG = nullptr;
217 }
218 
219 ////////////////////////////////////////////////////////////////////////////////
220 
221 TMVA::MethodBDT::MethodBDT( DataSetInfo& theData,
222  const TString& theWeightFile)
223  : TMVA::MethodBase( Types::kBDT, theData, theWeightFile)
224  , fTrainSample(0)
225  , fNTrees(0)
226  , fSigToBkgFraction(0)
227  , fAdaBoostBeta(0)
228 // , fTransitionPoint(0)
229  , fShrinkage(0)
232 // , fSumOfWeights(0)
233  , fMinNodeEvents(0)
234  , fMinNodeSize(5)
235  , fMinNodeSizeS("5%")
236  , fNCuts(0)
237  , fUseFisherCuts(0) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
238  , fMinLinCorrForFisher(.8) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
239  , fUseExclusiveVars(0) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
241  , fNodePurityLimit(0)
242  , fNNodesMax(0)
243  , fMaxDepth(0)
244  , fPruneMethod(DecisionTree::kNoPruning)
245  , fPruneStrength(0)
246  , fFValidationEvents(0)
247  , fAutomatic(kFALSE)
249  , fUseNvars(0)
250  , fUsePoissonNvars(0) // don't use this initialisation, only here to make Coverity happy. Is set in Init()
251  , fUseNTrainEvents(0)
258  , fITree(0)
259  , fBoostWeight(0)
260  , fErrorFraction(0)
261  , fCss(0)
262  , fCts_sb(0)
263  , fCtb_ss(0)
264  , fCbb(0)
268 {
269  fMonitorNtuple = NULL;
270  fSepType = NULL;
271  fRegressionLossFunctionBDTG = nullptr;
272  // constructor for calculating BDT-MVA using previously generated decision trees
273  // the result of the previous training (the decision trees) are read in via the
274  // weight file. Make sure the the variables correspond to the ones used in
275  // creating the "weight"-file
276 }
277 
278 ////////////////////////////////////////////////////////////////////////////////
279 /// BDT can handle classification with multiple classes and regression with one regression-target.
280 
281 Bool_t TMVA::MethodBDT::HasAnalysisType( Types::EAnalysisType type, UInt_t numberClasses, UInt_t numberTargets )
282 {
283  if (type == Types::kClassification && numberClasses == 2) return kTRUE;
284  if (type == Types::kMulticlass ) return kTRUE;
285  if( type == Types::kRegression && numberTargets == 1 ) return kTRUE;
286  return kFALSE;
287 }
288 
289 ////////////////////////////////////////////////////////////////////////////////
290 /// Define the options (their key words) that can be set in the option string.
291 ///
292 /// Known options:
293 ///
294 /// - nTrees number of trees in the forest to be created
295 /// - BoostType the boosting type for the trees in the forest (AdaBoost etc.).
296 /// Known:
297 /// - AdaBoost
298 /// - AdaBoostR2 (Adaboost for regression)
299 /// - Bagging
300 /// - GradBoost
301 /// - AdaBoostBeta the boosting parameter, beta, for AdaBoost
302 /// - UseRandomisedTrees choose at each node splitting a random set of variables
303 /// - UseNvars use UseNvars variables in randomised trees
304 /// - UsePoissonNvars use UseNvars not as a fixed number but as the mean of a Poisson distribution
305 /// - SeparationType the separation criterion applied in the node splitting.
306 /// Known:
307 /// - GiniIndex
308 /// - MisClassificationError
309 /// - CrossEntropy
310 /// - SDivSqrtSPlusB
311 /// - MinNodeSize: minimum percentage of training events in a leaf node (leaf criteria, stop splitting)
312 /// - nCuts: the number of steps in the optimisation of the cut for a node (if < 0, then
313 /// step size is determined by the events)
314 /// - UseFisherCuts: use multivariate splits using the Fisher criterion
315 /// - UseYesNoLeaf decides whether the classification is done simply by the node type, or by the S/B ratio
316 /// (from the training) in the leaf node
317 /// - NodePurityLimit the minimum purity to classify a node as a signal node (used in pruning and boosting to determine
318 /// misclassification error rate)
319 /// - PruneMethod The Pruning method.
320 /// Known:
321 /// - NoPruning // switch off pruning completely
322 /// - ExpectedError
323 /// - CostComplexity
324 /// - PruneStrength a parameter to adjust the amount of pruning. Should be large enough such that overtraining is avoided.
325 /// - PruningValFraction fraction of events to use for optimizing pruning (only if PruneStrength < 0, i.e. automatic pruning)
326 /// - NegWeightTreatment
327 /// - IgnoreNegWeightsInTraining Ignore negative weight events in the training.
328 /// - DecreaseBoostWeight Boost ev. with neg. weight with 1/boostweight instead of boostweight
329 /// - PairNegWeightsGlobal Pair ev. with neg. and pos. weights in training sample and "annihilate" them
330 /// - MaxDepth maximum depth of the decision tree allowed before further splitting is stopped
331 /// - SkipNormalization Skip normalization at initialization, to keep expectation value of BDT output
332 /// according to the fraction of events
333 
334 void TMVA::MethodBDT::DeclareOptions()
335 {
336  DeclareOptionRef(fNTrees, "NTrees", "Number of trees in the forest");
337  if (DoRegression()) {
338  DeclareOptionRef(fMaxDepth=50,"MaxDepth","Max depth of the decision tree allowed");
339  }else{
340  DeclareOptionRef(fMaxDepth=3,"MaxDepth","Max depth of the decision tree allowed");
341  }
342 
343  TString tmp="5%"; if (DoRegression()) tmp="0.2%";
344  DeclareOptionRef(fMinNodeSizeS=tmp, "MinNodeSize", "Minimum percentage of training events required in a leaf node (default: Classification: 5%, Regression: 0.2%)");
345  // MinNodeSize: minimum percentage of training events in a leaf node (leaf criteria, stop splitting)
346  DeclareOptionRef(fNCuts, "nCuts", "Number of grid points in variable range used in finding optimal cut in node splitting");
347 
348  DeclareOptionRef(fBoostType, "BoostType", "Boosting type for the trees in the forest (note: AdaCost is still experimental)");
349 
350  AddPreDefVal(TString("AdaBoost"));
351  AddPreDefVal(TString("RealAdaBoost"));
352  AddPreDefVal(TString("AdaCost"));
353  AddPreDefVal(TString("Bagging"));
354  // AddPreDefVal(TString("RegBoost"));
355  AddPreDefVal(TString("AdaBoostR2"));
356  AddPreDefVal(TString("Grad"));
357  if (DoRegression()) {
358  fBoostType = "AdaBoostR2";
359  }else{
360  fBoostType = "AdaBoost";
361  }
362  DeclareOptionRef(fAdaBoostR2Loss="Quadratic", "AdaBoostR2Loss", "Type of Loss function in AdaBoostR2");
363  AddPreDefVal(TString("Linear"));
364  AddPreDefVal(TString("Quadratic"));
365  AddPreDefVal(TString("Exponential"));
366 
367  DeclareOptionRef(fBaggedBoost=kFALSE, "UseBaggedBoost","Use only a random subsample of all events for growing the trees in each boost iteration.");
368  DeclareOptionRef(fShrinkage=1.0, "Shrinkage", "Learning rate for GradBoost algorithm");
369  DeclareOptionRef(fAdaBoostBeta=.5, "AdaBoostBeta", "Learning rate for AdaBoost algorithm");
370  DeclareOptionRef(fRandomisedTrees,"UseRandomisedTrees","Determine at each node splitting the cut variable only as the best out of a random subset of variables (like in RandomForests)");
371  DeclareOptionRef(fUseNvars,"UseNvars","Size of the subset of variables used with RandomisedTree option");
372  DeclareOptionRef(fUsePoissonNvars,"UsePoissonNvars", "Interpret \"UseNvars\" not as fixed number but as mean of a Poisson distribution in each split with RandomisedTree option");
373  DeclareOptionRef(fBaggedSampleFraction=.6,"BaggedSampleFraction","Relative size of bagged event sample to original size of the data sample (used whenever bagging is used (i.e. UseBaggedBoost, Bagging,)" );
374 
375  DeclareOptionRef(fUseYesNoLeaf=kTRUE, "UseYesNoLeaf",
376  "Use Sig or Bkg categories, or the purity=S/(S+B) as classification of the leaf node -> Real-AdaBoost");
377  if (DoRegression()) {
379  }
380 
381  DeclareOptionRef(fNegWeightTreatment="InverseBoostNegWeights","NegWeightTreatment","How to treat events with negative weights in the BDT training (in particular the boosting): IgnoreInTraining; Boost with inverse boostweight; Pair events with negative and positive weights in the training sample and *annihilate* them (experimental!)");
382  AddPreDefVal(TString("InverseBoostNegWeights"));
383  AddPreDefVal(TString("IgnoreNegWeightsInTraining"));
384  AddPreDefVal(TString("NoNegWeightsInTraining")); // well, let's be nice to users and keep at least this old name anyway ..
385  AddPreDefVal(TString("PairNegWeightsGlobal"));
386  AddPreDefVal(TString("Pray"));
387 
388 
389 
390  DeclareOptionRef(fCss=1., "Css", "AdaCost: cost of true signal selected signal");
391  DeclareOptionRef(fCts_sb=1.,"Cts_sb","AdaCost: cost of true signal selected bkg");
392  DeclareOptionRef(fCtb_ss=1.,"Ctb_ss","AdaCost: cost of true bkg selected signal");
393  DeclareOptionRef(fCbb=1., "Cbb", "AdaCost: cost of true bkg selected bkg ");
394 
395  DeclareOptionRef(fNodePurityLimit=0.5, "NodePurityLimit", "In boosting/pruning, nodes with purity > NodePurityLimit are signal; background otherwise.");
396 
397 
398  DeclareOptionRef(fSepTypeS, "SeparationType", "Separation criterion for node splitting");
399  AddPreDefVal(TString("CrossEntropy"));
400  AddPreDefVal(TString("GiniIndex"));
401  AddPreDefVal(TString("GiniIndexWithLaplace"));
402  AddPreDefVal(TString("MisClassificationError"));
403  AddPreDefVal(TString("SDivSqrtSPlusB"));
404  AddPreDefVal(TString("RegressionVariance"));
405  if (DoRegression()) {
406  fSepTypeS = "RegressionVariance";
407  }else{
408  fSepTypeS = "GiniIndex";
409  }
410 
411  DeclareOptionRef(fRegressionLossFunctionBDTGS = "Huber", "RegressionLossFunctionBDTG", "Loss function for BDTG regression.");
412  AddPreDefVal(TString("Huber"));
413  AddPreDefVal(TString("AbsoluteDeviation"));
414  AddPreDefVal(TString("LeastSquares"));
415 
416  DeclareOptionRef(fHuberQuantile = 0.7, "HuberQuantile", "In the Huber loss function this is the quantile that separates the core from the tails in the residuals distribution.");
417 
418  DeclareOptionRef(fDoBoostMonitor=kFALSE,"DoBoostMonitor","Create control plot with ROC integral vs tree number");
419 
420  DeclareOptionRef(fUseFisherCuts=kFALSE, "UseFisherCuts", "Use multivariate splits using the Fisher criterion");
421  DeclareOptionRef(fMinLinCorrForFisher=.8,"MinLinCorrForFisher", "The minimum linear correlation between two variables demanded for use in Fisher criterion in node splitting");
422  DeclareOptionRef(fUseExclusiveVars=kFALSE,"UseExclusiveVars","Variables already used in fisher criterion are not anymore analysed individually for node splitting");
423 
424 
425  DeclareOptionRef(fDoPreselection=kFALSE,"DoPreselection","Apply automatic pre-selection for 100% efficient signal (bkg) cuts prior to training");
426 
427 
428  DeclareOptionRef(fSigToBkgFraction=1,"SigToBkgFraction","Sig to Bkg ratio used in Training (similar to NodePurityLimit, which cannot be used in real AdaBoost)");
429 
430  DeclareOptionRef(fPruneMethodS, "PruneMethod", "Note: for BDTs use small trees (e.g. MaxDepth=3) and NoPruning. Pruning: method used for pruning (removal) of statistically insignificant branches");
431  AddPreDefVal(TString("NoPruning"));
432  AddPreDefVal(TString("ExpectedError"));
433  AddPreDefVal(TString("CostComplexity"));
434 
435  DeclareOptionRef(fPruneStrength, "PruneStrength", "Pruning strength");
436 
437  DeclareOptionRef(fFValidationEvents=0.5, "PruningValFraction", "Fraction of events to use for optimizing automatic pruning.");
438 
439  DeclareOptionRef(fSkipNormalization=kFALSE, "SkipNormalization", "Skip normalization at initialization, to keep expectation value of BDT output according to the fraction of events");
440 
441  // deprecated options, still kept for the moment:
442  DeclareOptionRef(fMinNodeEvents=0, "nEventsMin", "deprecated: Use MinNodeSize (in % of training events) instead");
443 
444  DeclareOptionRef(fBaggedGradBoost=kFALSE, "UseBaggedGrad","deprecated: Use *UseBaggedBoost* instead: Use only a random subsample of all events for growing the trees in each iteration.");
445  DeclareOptionRef(fBaggedSampleFraction, "GradBaggingFraction","deprecated: Use *BaggedSampleFraction* instead: Defines the fraction of events to be used in each iteration, e.g. when UseBaggedGrad=kTRUE. ");
446  DeclareOptionRef(fUseNTrainEvents,"UseNTrainEvents","deprecated: Use *BaggedSampleFraction* instead: Number of randomly picked training events used in randomised (and bagged) trees");
447  DeclareOptionRef(fNNodesMax,"NNodesMax","deprecated: Use MaxDepth instead to limit the tree size" );
448 
449 
450 }
451 
452 ////////////////////////////////////////////////////////////////////////////////
453 /// Options that are used ONLY for the READER to ensure backward compatibility.
454 
455 void TMVA::MethodBDT::DeclareCompatibilityOptions() {
456  MethodBase::DeclareCompatibilityOptions();
457 
458 
459  DeclareOptionRef(fHistoricBool=kTRUE, "UseWeightedTrees",
460  "Use weighted trees or simple average in classification from the forest");
461  DeclareOptionRef(fHistoricBool=kFALSE, "PruneBeforeBoost", "Flag to prune the tree before applying boosting algorithm");
462  DeclareOptionRef(fHistoricBool=kFALSE,"RenormByClass","Individually re-normalize each event class to the original size after boosting");
463 
464  AddPreDefVal(TString("NegWeightTreatment"),TString("IgnoreNegWeights"));
465 
466 }
467 
468 ////////////////////////////////////////////////////////////////////////////////
469 /// The option string is decoded; for available options see "DeclareOptions".
470 
471 void TMVA::MethodBDT::ProcessOptions()
472 {
473  fSepTypeS.ToLower();
474  if (fSepTypeS == "misclassificationerror") fSepType = new MisClassificationError();
475  else if (fSepTypeS == "giniindex") fSepType = new GiniIndex();
476  else if (fSepTypeS == "giniindexwithlaplace") fSepType = new GiniIndexWithLaplace();
477  else if (fSepTypeS == "crossentropy") fSepType = new CrossEntropy();
478  else if (fSepTypeS == "sdivsqrtsplusb") fSepType = new SdivSqrtSplusB();
479  else if (fSepTypeS == "regressionvariance") fSepType = NULL;
480  else {
481  Log() << kINFO << GetOptions() << Endl;
482  Log() << kFATAL << "<ProcessOptions> unknown Separation Index option " << fSepTypeS << " called" << Endl;
483  }
484 
485  if(!(fHuberQuantile >= 0.0 && fHuberQuantile <= 1.0)){
486  Log() << kINFO << GetOptions() << Endl;
487  Log() << kFATAL << "<ProcessOptions> Huber Quantile must be in range [0,1]. Value given, " << fHuberQuantile << ", does not match this criterion" << Endl;
488  }
489 
494  else {
495  Log() << kINFO << GetOptions() << Endl;
496  Log() << kFATAL << "<ProcessOptions> unknown Regression Loss Function BDT option " << fRegressionLossFunctionBDTGS << " called" << Endl;
497  }
498 
501  else if (fPruneMethodS == "costcomplexity") fPruneMethod = DecisionTree::kCostComplexityPruning;
502  else if (fPruneMethodS == "nopruning") fPruneMethod = DecisionTree::kNoPruning;
503  else {
504  Log() << kINFO << GetOptions() << Endl;
505  Log() << kFATAL << "<ProcessOptions> unknown PruneMethod " << fPruneMethodS << " option called" << Endl;
506  }
508  else fAutomatic = kFALSE;
510  Log() << kFATAL
511  << "Sorry automatic pruning strength determination is not implemented yet for ExpectedErrorPruning" << Endl;
512  }
513 
514 
515  if (fMinNodeEvents > 0){
517  Log() << kWARNING << "You have explicitly set ** nEventsMin = " << fMinNodeEvents<<" ** the min absolute number \n"
518  << "of events in a leaf node. This is DEPRECATED, please use the option \n"
519  << "*MinNodeSize* giving the relative number as percentage of training \n"
520  << "events instead. \n"
521  << "nEventsMin="<<fMinNodeEvents<< "--> MinNodeSize="<<fMinNodeSize<<"%"
522  << Endl;
523  Log() << kWARNING << "Note also that explicitly setting *nEventsMin* so far OVERWRITES the option recommended \n"
524  << " *MinNodeSize* = " << fMinNodeSizeS << " option !!" << Endl ;
525  fMinNodeSizeS = Form("%3.2f",fMinNodeSize);
526 
527  }else{
529  }
530 
531 
533 
534  if (fBoostType=="Grad") {
536  if (fNegWeightTreatment=="InverseBoostNegWeights"){
537  Log() << kINFO << "the option *InverseBoostNegWeights* does not exist for BoostType=Grad --> change" << Endl;
538  Log() << kINFO << "to new default for GradBoost *Pray*" << Endl;
539  Log() << kDEBUG << "i.e. simply keep them as is, which should work fine for GradBoost" << Endl;
540  fNegWeightTreatment="Pray";
542  }
543  } else if (fBoostType=="RealAdaBoost"){
544  fBoostType = "AdaBoost";
546  } else if (fBoostType=="AdaCost"){
548  }
549 
550  if (fFValidationEvents < 0.0) fFValidationEvents = 0.0;
551  if (fAutomatic && fFValidationEvents > 0.5) {
552  Log() << kWARNING << "You have chosen to use more than half of your training sample "
553  << "to optimize the automatic pruning algorithm. This is probably wasteful "
554  << "and your overall results will be degraded. Are you sure you want this?"
555  << Endl;
556  }
557 
558 
559  if (this->Data()->HasNegativeEventWeights()){
560  Log() << kINFO << " You are using a Monte Carlo sample that also contains negative weights. "
561  << "That should in principle be fine, as long as on average you end up with "
562  << "something positive. For this you have to make sure that the minimal number "
563  << "of (unweighted) events demanded for a tree node (currently MinNodeSize="
564  << fMinNodeSizeS << " ("<< fMinNodeSize << "%)"
565  << ", or the deprecated equivalent nEventsMin; you can set this via the "
566  << "BDT option string when booking the "
567  << "classifier) is large enough to allow for reasonable averaging. "
568  << " If this does not help, you may want to try the option IgnoreNegWeightsInTraining, "
569  << "which ignores events with negative weight in the training. " << Endl
570  << Endl << "Note: You'll get a WARNING message during the training if that should ever happen" << Endl;
571  }
572 
573  if (DoRegression()) {
575  Log() << kWARNING << "Regression Trees do not work with fUseYesNoLeaf=TRUE --> I will set it to FALSE" << Endl;
577  }
578 
579  if (fSepType != NULL){
580  Log() << kWARNING << "Regression Trees do not work with Separation type other than <RegressionVariance> --> I will use it instead" << Endl;
581  fSepType = NULL;
582  }
583  if (fUseFisherCuts){
584  Log() << kWARNING << "Sorry, UseFisherCuts is not available for regression analysis, I will ignore it!" << Endl;
586  }
587  if (fNCuts < 0) {
588  Log() << kWARNING << "Sorry, the option of nCuts<0 using a more elaborate node splitting algorithm " << Endl;
589  Log() << kWARNING << "is not implemented for regression analysis ! " << Endl;
590  Log() << kWARNING << "--> I switch to the default nCuts = 20 and use standard node splitting"<<Endl;
591  fNCuts=20;
592  }
593  }
594  if (fRandomisedTrees){
595  Log() << kINFO << " Randomised trees use no pruning" << Endl;
597  // fBoostType = "Bagging";
598  }
599 
600  if (fUseFisherCuts) {
601  Log() << kWARNING << "When using the option UseFisherCuts, the other option nCuts<0 (i.e. using" << Endl;
602  Log() << " a more elaborate node splitting algorithm) is not implemented. " << Endl;
603  //I will switch o " << Endl;
604  //Log() << "--> I switch do default nCuts = 20 and use standard node splitting WITH possible Fisher criteria"<<Endl;
605  fNCuts=20;
606  }
607 
608  if (fNTrees==0){
609  Log() << kERROR << " Zero Decision Trees demanded... that does not work !! "
610  << " I set it to 1 .. just so that the program does not crash"
611  << Endl;
612  fNTrees = 1;
613  }
614 
616  if (fNegWeightTreatment == "ignorenegweightsintraining") fNoNegWeightsInTraining = kTRUE;
617  else if (fNegWeightTreatment == "nonegweightsintraining") fNoNegWeightsInTraining = kTRUE;
618  else if (fNegWeightTreatment == "inverseboostnegweights") fInverseBoostNegWeights = kTRUE;
619  else if (fNegWeightTreatment == "pairnegweightsglobal") fPairNegWeightsGlobal = kTRUE;
620  else if (fNegWeightTreatment == "pray") Log() << kDEBUG << "Yes, good luck with praying " << Endl;
621  else {
622  Log() << kINFO << GetOptions() << Endl;
623  Log() << kFATAL << "<ProcessOptions> unknown option for treating negative event weights during training " << fNegWeightTreatment << " requested" << Endl;
624  }
625 
626  if (fNegWeightTreatment == "pairnegweightsglobal")
627  Log() << kWARNING << " you specified the option NegWeightTreatment=PairNegWeightsGlobal : This option is still considered EXPERIMENTAL !! " << Endl;
628 
629 
630  // dealing with deprecated options !
631  if (fNNodesMax>0) {
632  UInt_t tmp=1; // depth=0 == 1 node
633  fMaxDepth=0;
634  while (tmp < fNNodesMax){
635  tmp+=2*tmp;
636  fMaxDepth++;
637  }
638  Log() << kWARNING << "You have specified a deprecated option *NNodesMax="<<fNNodesMax
639  << "* \n this has been translated to MaxDepth="<<fMaxDepth<<Endl;
640  }
641 
642 
643  if (fUseNTrainEvents>0){
645  Log() << kWARNING << "You have specified a deprecated option *UseNTrainEvents="<<fUseNTrainEvents
646  << "* \n this has been translated to BaggedSampleFraction="<<fBaggedSampleFraction<<"(%)"<<Endl;
647  }
648 
649  if (fBoostType=="Bagging") fBaggedBoost = kTRUE;
650  if (fBaggedGradBoost){
652  Log() << kWARNING << "You have specified a deprecated option *UseBaggedGrad* --> please use *UseBaggedBoost* instead" << Endl;
653  }
654 
655 }
656 
657 ////////////////////////////////////////////////////////////////////////////////
658 
659 void TMVA::MethodBDT::SetMinNodeSize(Double_t sizeInPercent){
660  if (sizeInPercent > 0 && sizeInPercent < 50){
661  fMinNodeSize=sizeInPercent;
662 
663  } else {
664  Log() << kFATAL << "you have demanded a minimal node size of "
665  << sizeInPercent << "% of the training events.. \n"
666  << " that somehow does not make sense "<<Endl;
667  }
668 
669 }
670 
671 ////////////////////////////////////////////////////////////////////////////////
672 
673 void TMVA::MethodBDT::SetMinNodeSize( TString sizeInPercent ){
674  sizeInPercent.ReplaceAll("%","");
675  sizeInPercent.ReplaceAll(" ","");
676  if (sizeInPercent.IsFloat()) SetMinNodeSize(sizeInPercent.Atof());
677  else {
678  Log() << kFATAL << "I had problems reading the option MinNodeSize, which "
679  << "after removing a possible % sign now reads " << sizeInPercent << Endl;
680  }
681 }
682 
683 ////////////////////////////////////////////////////////////////////////////////
684 /// Common initialisation with defaults for the BDT-Method.
685 
686 void TMVA::MethodBDT::Init( void )
687 {
688  fNTrees = 800;
690  fMaxDepth = 3;
691  fBoostType = "AdaBoost";
692  if(DataInfo().GetNClasses()!=0) //workaround for multiclass application
693  fMinNodeSize = 5.;
694  }else {
695  fMaxDepth = 50;
696  fBoostType = "AdaBoostR2";
697  fAdaBoostR2Loss = "Quadratic";
698  if(DataInfo().GetNClasses()!=0) //workaround for multiclass application
699  fMinNodeSize = .2;
700  }
701 
702 
703  fNCuts = 20;
704  fPruneMethodS = "NoPruning";
706  fPruneStrength = 0;
707  fAutomatic = kFALSE;
708  fFValidationEvents = 0.5;
710  // fUseNvars = (GetNvar()>12) ? UInt_t(GetNvar()/8) : TMath::Max(UInt_t(2),UInt_t(GetNvar()/3));
713  fShrinkage = 1.0;
714 // fSumOfWeights = 0.0;
715 
716  // reference cut value to distinguish signal-like from background-like events
718 }
719 
720 
721 ////////////////////////////////////////////////////////////////////////////////
722 /// Reset the method, as if it had just been instantiated (forget all training etc.).
723 
724 void TMVA::MethodBDT::Reset( void )
725 {
726  // I keep the BDT EventSample and its Validation sample (eventually they should all
727  // disappear and just use the DataSet samples ..
728 
729  // remove all the trees
730  for (UInt_t i=0; i<fForest.size(); i++) delete fForest[i];
731  fForest.clear();
732 
733  fBoostWeights.clear();
735  fVariableImportance.clear();
736  fResiduals.clear();
737  fLossFunctionEventInfo.clear();
738  // now done in "InitEventSample" which is called in "Train"
739  // reset all previously stored/accumulated BOOST weights in the event sample
740  //for (UInt_t iev=0; iev<fEventSample.size(); iev++) fEventSample[iev]->SetBoostWeight(1.);
742  Log() << kDEBUG << " successfully(?) reset the method " << Endl;
743 }
744 
745 
746 ////////////////////////////////////////////////////////////////////////////////
747 /// Destructor.
748 ///
749 /// - Note: fEventSample and ValidationSample are already deleted at the end of TRAIN
750 /// When they are not used anymore
751 
752 TMVA::MethodBDT::~MethodBDT( void )
753 {
754  for (UInt_t i=0; i<fForest.size(); i++) delete fForest[i];
755 }
756 
757 ////////////////////////////////////////////////////////////////////////////////
758 /// Initialize the event sample (i.e. reset the boost-weights... etc).
759 
760 void TMVA::MethodBDT::InitEventSample( void )
761 {
762  if (!HasTrainingTree()) Log() << kFATAL << "<Init> Data().TrainingTree() is zero pointer" << Endl;
763 
764  if (fEventSample.size() > 0) { // do not re-initialise the event sample, just set all boostweights to 1. as if it were untouched
765  // reset all previously stored/accumulated BOOST weights in the event sample
766  for (UInt_t iev=0; iev<fEventSample.size(); iev++) fEventSample[iev]->SetBoostWeight(1.);
767  } else {
769  UInt_t nevents = Data()->GetNTrainingEvents();
770 
771  std::vector<const TMVA::Event*> tmpEventSample;
772  for (Long64_t ievt=0; ievt<nevents; ievt++) {
773  // const Event *event = new Event(*(GetEvent(ievt)));
774  Event* event = new Event( *GetTrainingEvent(ievt) );
775  tmpEventSample.push_back(event);
776  }
777 
778  if (!DoRegression()) DeterminePreselectionCuts(tmpEventSample);
779  else fDoPreselection = kFALSE; // just to make sure...
780 
781  for (UInt_t i=0; i<tmpEventSample.size(); i++) delete tmpEventSample[i];
782 
783 
784  Bool_t firstNegWeight=kTRUE;
785  Bool_t firstZeroWeight=kTRUE;
786  for (Long64_t ievt=0; ievt<nevents; ievt++) {
787  // const Event *event = new Event(*(GetEvent(ievt)));
788  // const Event* event = new Event( *GetTrainingEvent(ievt) );
789  Event* event = new Event( *GetTrainingEvent(ievt) );
790  if (fDoPreselection){
791  if (TMath::Abs(ApplyPreselectionCuts(event)) > 0.05) {
792  delete event;
793  continue;
794  }
795  }
796 
797  if (event->GetWeight() < 0 && (IgnoreEventsWithNegWeightsInTraining() || fNoNegWeightsInTraining)){
798  if (firstNegWeight) {
799  Log() << kWARNING << " Note, you have events with negative event weight in the sample, but you've chosen to ignore them" << Endl;
800  firstNegWeight=kFALSE;
801  }
802  delete event;
803  }else if (event->GetWeight()==0){
804  if (firstZeroWeight) {
805  firstZeroWeight = kFALSE;
806  Log() << "Events with weight == 0 are going to be simply ignored " << Endl;
807  }
808  delete event;
809  }else{
810  if (event->GetWeight() < 0) {
812  if (firstNegWeight){
813  firstNegWeight = kFALSE;
815  Log() << kWARNING << "Events with negative event weights are found and "
816  << " will be removed prior to the actual BDT training by global "
817  << " pairing (and subsequent annihilation) with positive weight events"
818  << Endl;
819  }else{
820  Log() << kWARNING << "Events with negative event weights are USED during "
821  << "the BDT training. This might cause problems with small node sizes "
822  << "or with the boosting. Please remove negative events from training "
823  << "using the option *IgnoreEventsWithNegWeightsInTraining* in case you "
824  << "observe problems with the boosting"
825  << Endl;
826  }
827  }
828  }
829  // if fAutomatic == true you need a validation sample to optimize pruning
830  if (fAutomatic) {
831  Double_t modulo = 1.0/(fFValidationEvents);
832  Int_t imodulo = static_cast<Int_t>( fmod(modulo,1.0) > 0.5 ? ceil(modulo) : floor(modulo) );
833  if (ievt % imodulo == 0) fValidationSample.push_back( event );
834  else fEventSample.push_back( event );
835  }
836  else {
837  fEventSample.push_back(event);
838  }
839  }
840  }
841 
842  if (fAutomatic) {
843  Log() << kINFO << "<InitEventSample> Internally I use " << fEventSample.size()
844  << " for Training and " << fValidationSample.size()
845  << " for Pruning Validation (" << ((Float_t)fValidationSample.size())/((Float_t)fEventSample.size()+fValidationSample.size())*100.0
846  << "% of training used for validation)" << Endl;
847  }
848 
849  // some pre-processing for events with negative weights
851  }
852 
853  if (DoRegression()) {
854  // Regression, no reweighting to do
855  } else if (DoMulticlass()) {
856  // Multiclass, only gradboost is supported. No reweighting.
857  } else if (!fSkipNormalization) {
858  // Binary classification.
859  Log() << kDEBUG << "\t<InitEventSample> For classification trees, "<< Endl;
860  Log() << kDEBUG << " \tthe effective number of backgrounds is scaled to match "<<Endl;
861  Log() << kDEBUG << " \tthe signal. Otherwise the first boosting step would do 'just that'!"<<Endl;
862  // it does not make sense in decision trees to start with unequal number of signal/background
863  // events (weights) .. hence normalize them now (happens otherwise in first 'boosting step'
864  // anyway..
865  // Also make sure, that the sum_of_weights == sample.size() .. as this is assumed in
866  // the DecisionTree to derive a sensible number for "fMinSize" (min.#events in node)
867  // that currently is an OR between "weighted" and "unweighted number"
868  // I want:
869  // nS + nB = n
870  // a*SW + b*BW = n
871  // (a*SW)/(b*BW) = fSigToBkgFraction
872  //
873  // ==> b = n/((1+f)BW) and a = (nf/(1+f))/SW
874 
875  Double_t nevents = fEventSample.size();
876  Double_t sumSigW=0, sumBkgW=0;
877  Int_t sumSig=0, sumBkg=0;
878  for (UInt_t ievt=0; ievt<fEventSample.size(); ievt++) {
879  if ((DataInfo().IsSignal(fEventSample[ievt])) ) {
880  sumSigW += fEventSample[ievt]->GetWeight();
881  sumSig++;
882  } else {
883  sumBkgW += fEventSample[ievt]->GetWeight();
884  sumBkg++;
885  }
886  }
887  if (sumSigW && sumBkgW){
888  Double_t normSig = nevents/((1+fSigToBkgFraction)*sumSigW)*fSigToBkgFraction;
889  Double_t normBkg = nevents/((1+fSigToBkgFraction)*sumBkgW);
890  Log() << kDEBUG << "\tre-normalise events such that Sig and Bkg have respective sum of weights = "
891  << fSigToBkgFraction << Endl;
892  Log() << kDEBUG << " \tsig->sig*"<<normSig << "ev. bkg->bkg*"<<normBkg << "ev." <<Endl;
893  Log() << kHEADER << "#events: (reweighted) sig: "<< sumSigW*normSig << " bkg: " << sumBkgW*normBkg << Endl;
894  Log() << kINFO << "#events: (unweighted) sig: "<< sumSig << " bkg: " << sumBkg << Endl;
895  for (Long64_t ievt=0; ievt<nevents; ievt++) {
896  if ((DataInfo().IsSignal(fEventSample[ievt])) ) fEventSample[ievt]->SetBoostWeight(normSig);
897  else fEventSample[ievt]->SetBoostWeight(normBkg);
898  }
899  }else{
900  Log() << kINFO << "--> could not determine scaling factors as either there are " << Endl;
901  Log() << kINFO << " no signal events (sumSigW="<<sumSigW<<") or no bkg ev. (sumBkgW="<<sumBkgW<<")"<<Endl;
902  }
903 
904  }
905 
907  if (fBaggedBoost){
910  }
911 
912  //just for debug purposes..
913  /*
914  sumSigW=0;
915  sumBkgW=0;
916  for (UInt_t ievt=0; ievt<fEventSample.size(); ievt++) {
917  if ((DataInfo().IsSignal(fEventSample[ievt])) ) sumSigW += fEventSample[ievt]->GetWeight();
918  else sumBkgW += fEventSample[ievt]->GetWeight();
919  }
920  Log() << kWARNING << "sigSumW="<<sumSigW<<"bkgSumW="<<sumBkgW<< Endl;
921  */
922 }
923 
924 ////////////////////////////////////////////////////////////////////////////////
925 /// O.k. you know there are events with negative event weights. This routine will remove
926 /// them by pairing them with the closest event(s) of the same event class with positive
927 /// weights.
928 /// A first attempt is "brute force": I don't try to be clever using search trees etc.,
929 /// just quick and dirty to see if the result is any good.
930 
931 void TMVA::MethodBDT::PreProcessNegativeEventWeights(){
932  Double_t totalNegWeights = 0;
933  Double_t totalPosWeights = 0;
934  Double_t totalWeights = 0;
935  std::vector<const Event*> negEvents;
936  for (UInt_t iev = 0; iev < fEventSample.size(); iev++){
937  if (fEventSample[iev]->GetWeight() < 0) {
938  totalNegWeights += fEventSample[iev]->GetWeight();
939  negEvents.push_back(fEventSample[iev]);
940  } else {
941  totalPosWeights += fEventSample[iev]->GetWeight();
942  }
943  totalWeights += fEventSample[iev]->GetWeight();
944  }
945  if (totalNegWeights == 0 ) {
946  Log() << kINFO << "no negative event weights found .. no preprocessing necessary" << Endl;
947  return;
948  } else {
949  Log() << kINFO << "found a total of " << totalNegWeights << " of negative event weights which I am going to try to pair with positive events to annihilate them" << Endl;
950  Log() << kINFO << "found a total of " << totalPosWeights << " of events with positive weights" << Endl;
951  Log() << kINFO << "--> total sum of weights = " << totalWeights << " = " << totalNegWeights+totalPosWeights << Endl;
952  }
953 
954  std::vector<TMatrixDSym*>* cov = gTools().CalcCovarianceMatrices( fEventSample, 2);
955 
956  TMatrixDSym *invCov;
957 
958  for (Int_t i=0; i<2; i++){
959  invCov = ((*cov)[i]);
960  if ( TMath::Abs(invCov->Determinant()) < 10E-24 ) {
961  std::cout << "<MethodBDT::PreProcessNeg...> matrix is almost singular with determinant="
962  << TMath::Abs(invCov->Determinant())
963  << " did you use variables that are linear combinations or highly correlated?"
964  << std::endl;
965  }
966  if ( TMath::Abs(invCov->Determinant()) < 10E-120 ) {
967  std::cout << "<MethodBDT::PreProcessNeg...> matrix is singular with determinant="
968  << TMath::Abs(invCov->Determinant())
969  << " did you use variables that are linear combinations?"
970  << std::endl;
971  }
972 
973  invCov->Invert();
974  }
975 
976 
977 
978  Log() << kINFO << "Found a total of " << totalNegWeights << " in negative weights out of " << fEventSample.size() << " training events " << Endl;
979  Timer timer(negEvents.size(),"Negative Event paired");
980  for (UInt_t nev = 0; nev < negEvents.size(); nev++){
981  timer.DrawProgressBar( nev );
982  Double_t weight = negEvents[nev]->GetWeight();
983  UInt_t iClassID = negEvents[nev]->GetClass();
984  invCov = ((*cov)[iClassID]);
985  while (weight < 0){
986  // find closest event with positive event weight and "pair" it with the negative event
987  // (add their weight) until there is no negative weight anymore
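 // the distance used below is the (squared) Mahalanobis distance d = (x-y)^T C^-1 (x-y),
 // computed with the inverted per-class covariance matrix prepared above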
988  Int_t iMin=-1;
989  Double_t dist, minDist=10E270;
990  for (UInt_t iev = 0; iev < fEventSample.size(); iev++){
991  if (iClassID==fEventSample[iev]->GetClass() && fEventSample[iev]->GetWeight() > 0){
992  dist=0;
993  for (UInt_t ivar=0; ivar < GetNvar(); ivar++){
994  for (UInt_t jvar=0; jvar<GetNvar(); jvar++){
995  dist += (negEvents[nev]->GetValue(ivar)-fEventSample[iev]->GetValue(ivar))*
996  (*invCov)[ivar][jvar]*
997  (negEvents[nev]->GetValue(jvar)-fEventSample[iev]->GetValue(jvar));
998  }
999  }
1000  if (dist < minDist) { iMin=iev; minDist=dist;}
1001  }
1002  }
1003 
1004  if (iMin > -1) {
1005  // std::cout << "Happily pairing .. weight before : " << negEvents[nev]->GetWeight() << " and " << fEventSample[iMin]->GetWeight();
1006  Double_t newWeight = (negEvents[nev]->GetWeight() + fEventSample[iMin]->GetWeight());
1007  if (newWeight > 0){
1008  negEvents[nev]->SetBoostWeight( 0 );
1009  fEventSample[iMin]->SetBoostWeight( newWeight/fEventSample[iMin]->GetOriginalWeight() ); // note the weight*boostweight should be "newWeight"
1010  } else {
1011  negEvents[nev]->SetBoostWeight( newWeight/negEvents[nev]->GetOriginalWeight() ); // note the weight*boostweight should be "newWeight"
1012  fEventSample[iMin]->SetBoostWeight( 0 );
1013  }
1014  // std::cout << " and afterwards " << negEvents[nev]->GetWeight() << " and the paired " << fEventSample[iMin]->GetWeight() << " dist="<<minDist<< std::endl;
1015  } else Log() << kFATAL << "preprocessing didn't find event to pair with the negative weight ... probably a bug" << Endl;
1016  weight = negEvents[nev]->GetWeight();
1017  }
1018  }
1019  Log() << kINFO << "<Negative Event Pairing> took: " << timer.GetElapsedTime()
1020  << " " << Endl;
1021 
1022  // just check.. now there should be no negative event weight left anymore
1023  totalNegWeights = 0;
1024  totalPosWeights = 0;
1025  totalWeights = 0;
1026  Double_t sigWeight=0;
1027  Double_t bkgWeight=0;
1028  Int_t nSig=0;
1029  Int_t nBkg=0;
1030 
1031  std::vector<const Event*> newEventSample;
1032 
1033  for (UInt_t iev = 0; iev < fEventSample.size(); iev++){
1034  if (fEventSample[iev]->GetWeight() < 0) {
1035  totalNegWeights += fEventSample[iev]->GetWeight();
1036  totalWeights += fEventSample[iev]->GetWeight();
1037  } else {
1038  totalPosWeights += fEventSample[iev]->GetWeight();
1039  totalWeights += fEventSample[iev]->GetWeight();
1040  }
1041  if (fEventSample[iev]->GetWeight() > 0) {
1042  newEventSample.push_back(new Event(*fEventSample[iev]));
1043  if (fEventSample[iev]->GetClass() == fSignalClass){
1044  sigWeight += fEventSample[iev]->GetWeight();
1045  nSig+=1;
1046  }else{
1047  bkgWeight += fEventSample[iev]->GetWeight();
1048  nBkg+=1;
1049  }
1050  }
1051  }
1052  if (totalNegWeights < 0) Log() << kFATAL << " compensation of negative event weights with positive ones did not work " << totalNegWeights << Endl;
1053 
1054  for (UInt_t i=0; i<fEventSample.size(); i++) delete fEventSample[i];
1055  fEventSample = newEventSample;
1056 
1057  Log() << kINFO << " after PreProcessing, the Event sample is left with " << fEventSample.size() << " events (unweighted), all with positive weights, adding up to " << totalWeights << Endl;
1058  Log() << kINFO << " nSig="<<nSig << " sigWeight="<<sigWeight << " nBkg="<<nBkg << " bkgWeight="<<bkgWeight << Endl;
1059 
1060 
1061 }
1062 
1063 ////////////////////////////////////////////////////////////////////////////////
1064 /// Call the Optimizer with the set of parameters and ranges that
1065 /// are meant to be tuned.
1066 
1067 std::map<TString,Double_t> TMVA::MethodBDT::OptimizeTuningParameters(TString fomType, TString fitType)
1068 {
1069  // fill all the tuning parameters that should be optimized into a map:
1070  std::map<TString,TMVA::Interval*> tuneParameters;
1071  std::map<TString,Double_t> tunedParameters;
1072 
1073  // note: the 3rd parameter in the interval is the "number of bins", NOT the stepsize!
1074  // the actual VALUES (at least for the scan, presumably also in the GA) are always
1075  // read from the middle of the bins. Hence the choice of intervals, e.g. for
1076  // MaxDepth, is made such that they give nice integer values!
1077 
1078  // find some reasonable ranges for the optimisation of MinNodeEvents:
1079 
1080  tuneParameters.insert(std::pair<TString,Interval*>("NTrees", new Interval(10,1000,5))); // stepsize 50
1081  tuneParameters.insert(std::pair<TString,Interval*>("MaxDepth", new Interval(2,4,3))); // stepsize 1
1082  tuneParameters.insert(std::pair<TString,Interval*>("MinNodeSize", new LogInterval(1,30,30))); //
1083  //tuneParameters.insert(std::pair<TString,Interval*>("NodePurityLimit",new Interval(.4,.6,3))); // stepsize .1
1084  //tuneParameters.insert(std::pair<TString,Interval*>("BaggedSampleFraction",new Interval(.4,.9,6))); // stepsize .1
1085 
1086  // method-specific parameters
1087  if (fBoostType=="AdaBoost"){
1088  tuneParameters.insert(std::pair<TString,Interval*>("AdaBoostBeta", new Interval(.2,1.,5)));
1089 
1090  }else if (fBoostType=="Grad"){
1091  tuneParameters.insert(std::pair<TString,Interval*>("Shrinkage", new Interval(0.05,0.50,5)));
1092 
1093  }else if (fBoostType=="Bagging" && fRandomisedTrees){
1094  Int_t min_var = TMath::FloorNint( GetNvar() * .25 );
1095  Int_t max_var = TMath::CeilNint( GetNvar() * .75 );
1096  tuneParameters.insert(std::pair<TString,Interval*>("UseNvars", new Interval(min_var,max_var,4)));
1097 
1098  }
1099 
1100  Log()<<kINFO << " the following BDT parameters will be tuned on the respective *grid*\n"<<Endl;
1101  std::map<TString,TMVA::Interval*>::iterator it;
1102  for(it=tuneParameters.begin(); it!= tuneParameters.end(); it++){
1103  Log() << kWARNING << it->first << Endl;
1104  std::ostringstream oss;
1105  (it->second)->Print(oss);
1106  Log()<<oss.str();
1107  Log()<<Endl;
1108  }
1109 
1110  OptimizeConfigParameters optimize(this, tuneParameters, fomType, fitType);
1111  tunedParameters=optimize.optimize();
1112 
1113  return tunedParameters;
1114 
1115 }
1116 
1117 ////////////////////////////////////////////////////////////////////////////////
1118 /// Set the tuning parameters according to the argument.
1119 
1120 void TMVA::MethodBDT::SetTuneParameters(std::map<TString,Double_t> tuneParameters)
1121 {
1122  std::map<TString,Double_t>::iterator it;
1123  for(it=tuneParameters.begin(); it!= tuneParameters.end(); it++){
1124  Log() << kWARNING << it->first << " = " << it->second << Endl;
1125  if (it->first == "MaxDepth" ) SetMaxDepth ((Int_t)it->second);
1126  else if (it->first == "MinNodeSize" ) SetMinNodeSize (it->second);
1127  else if (it->first == "NTrees" ) SetNTrees ((Int_t)it->second);
1128  else if (it->first == "NodePurityLimit") SetNodePurityLimit (it->second);
1129  else if (it->first == "AdaBoostBeta" ) SetAdaBoostBeta (it->second);
1130  else if (it->first == "Shrinkage" ) SetShrinkage (it->second);
1131  else if (it->first == "UseNvars" ) SetUseNvars ((Int_t)it->second);
1132  else if (it->first == "BaggedSampleFraction" ) SetBaggedSampleFraction (it->second);
1133  else Log() << kFATAL << " SetParameter for " << it->first << " not yet implemented " <<Endl;
1134  }
1135 }
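// Illustrative usage from user code (not part of this file): assuming `bdt` points to a
// booked MethodBDT, and using example figure-of-merit / fitter names,
//   std::map<TString,Double_t> tuned = bdt->OptimizeTuningParameters("ROCIntegral", "FitGA");
//   bdt->SetTuneParameters(tuned);
// applies the tuned values before training.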
1136 
1137 ////////////////////////////////////////////////////////////////////////////////
1138 /// BDT training.
1139 
1140 void TMVA::MethodBDT::Train()
1141 {
1143 
1144  // fill the STL Vector with the event sample
1145  // (needs to be done here and cannot be done in "init" as the options need to be
1146  // known).
1147  InitEventSample();
1148 
1149  if (fNTrees==0){
1150  Log() << kERROR << " Zero Decision Trees demanded... that does not work !! "
1151  << " I set it to 1 .. just so that the program does not crash"
1152  << Endl;
1153  fNTrees = 1;
1154  }
1155 
1157  std::vector<TString> titles = {"Boost weight", "Error Fraction"};
1158  fInteractive->Init(titles);
1159  }
1160  fIPyMaxIter = fNTrees;
1161  fExitFromTraining = false;
1162 
1163  // HHV (it's been here for a long time, but I really don't know why we cannot handle
1164  // normalized variables in BDTs... todo)
1165  if (IsNormalised()) Log() << kFATAL << "\"Normalise\" option cannot be used with BDT; "
1166  << "please remove the option from the configuration string, or "
1167  << "use \"!Normalise\""
1168  << Endl;
1169 
1170  if(DoRegression())
1171  Log() << kINFO << "Regression Loss Function: "<< fRegressionLossFunctionBDTG->Name() << Endl;
1172 
1173  Log() << kINFO << "Training "<< fNTrees << " Decision Trees ... patience please" << Endl;
1174 
1175  Log() << kDEBUG << "Training with maximal depth = " <<fMaxDepth
1176  << ", MinNodeEvents=" << fMinNodeEvents
1177  << ", NTrees="<<fNTrees
1178  << ", NodePurityLimit="<<fNodePurityLimit
1179  << ", AdaBoostBeta="<<fAdaBoostBeta
1180  << Endl;
1181 
1182  // weights applied in boosting
1183  Int_t nBins;
1184  Double_t xMin,xMax;
1185  TString hname = "AdaBoost weight distribution";
1186 
1187  nBins= 100;
1188  xMin = 0;
1189  xMax = 30;
1190 
1191  if (DoRegression()) {
1192  nBins= 100;
1193  xMin = 0;
1194  xMax = 1;
1195  hname="Boost event weights distribution";
1196  }
1197 
1198  // book monitoring histograms (for AdaBoost only)
1199 
1200  TH1* h = new TH1F(Form("%s_BoostWeight",DataInfo().GetName()),hname,nBins,xMin,xMax);
1201  TH1* nodesBeforePruningVsTree = new TH1I(Form("%s_NodesBeforePruning",DataInfo().GetName()),"nodes before pruning",fNTrees,0,fNTrees);
1202  TH1* nodesAfterPruningVsTree = new TH1I(Form("%s_NodesAfterPruning",DataInfo().GetName()),"nodes after pruning",fNTrees,0,fNTrees);
1203 
1204 
1205 
1206  if(!DoMulticlass()){
1208 
1209  h->SetXTitle("boost weight");
1210  results->Store(h, "BoostWeights");
1211 
1212 
1213  // Monitor the performance (on TEST sample) versus number of trees
1214  if (fDoBoostMonitor){
1215  TH2* boostMonitor = new TH2F("BoostMonitor","ROC Integral Vs iTree",2,0,fNTrees,2,0,1.05);
1216  boostMonitor->SetXTitle("#tree");
1217  boostMonitor->SetYTitle("ROC Integral");
1218  results->Store(boostMonitor, "BoostMonitor");
1219  TGraph *boostMonitorGraph = new TGraph();
1220  boostMonitorGraph->SetName("BoostMonitorGraph");
1221  boostMonitorGraph->SetTitle("ROCIntegralVsNTrees");
1222  results->Store(boostMonitorGraph, "BoostMonitorGraph");
1223  }
1224 
1225  // weights applied in boosting vs tree number
1226  h = new TH1F("BoostWeightVsTree","Boost weights vs tree",fNTrees,0,fNTrees);
1227  h->SetXTitle("#tree");
1228  h->SetYTitle("boost weight");
1229  results->Store(h, "BoostWeightsVsTree");
1230 
1231  // error fraction vs tree number
1232  h = new TH1F("ErrFractHist","error fraction vs tree number",fNTrees,0,fNTrees);
1233  h->SetXTitle("#tree");
1234  h->SetYTitle("error fraction");
1235  results->Store(h, "ErrorFrac");
1236 
1237  // nNodesBeforePruning vs tree number
1238  nodesBeforePruningVsTree->SetXTitle("#tree");
1239  nodesBeforePruningVsTree->SetYTitle("#tree nodes");
1240  results->Store(nodesBeforePruningVsTree);
1241 
1242  // nNodesAfterPruning vs tree number
1243  nodesAfterPruningVsTree->SetXTitle("#tree");
1244  nodesAfterPruningVsTree->SetYTitle("#tree nodes");
1245  results->Store(nodesAfterPruningVsTree);
1246 
1247  }
1248 
1249  fMonitorNtuple= new TTree("MonitorNtuple","BDT variables");
1250  fMonitorNtuple->Branch("iTree",&fITree,"iTree/I");
1251  fMonitorNtuple->Branch("boostWeight",&fBoostWeight,"boostWeight/D");
1252  fMonitorNtuple->Branch("errorFraction",&fErrorFraction,"errorFraction/D");
1253 
1254  Timer timer( fNTrees, GetName() );
1255  Int_t nNodesBeforePruningCount = 0;
1256  Int_t nNodesAfterPruningCount = 0;
1257 
1258  Int_t nNodesBeforePruning = 0;
1259  Int_t nNodesAfterPruning = 0;
1260 
1261 
1262  if(fBoostType=="Grad"){
1264  }
1265 
1266  Int_t itree=0;
1267  Bool_t continueBoost=kTRUE;
1268  //for (int itree=0; itree<fNTrees; itree++) {
1269  while (itree < fNTrees && continueBoost){
1270  if (fExitFromTraining) break;
1271  fIPyCurrentIter = itree;
1272  timer.DrawProgressBar( itree );
1273  // Results* results = Data()->GetResults(GetMethodName(), Types::kTraining, GetAnalysisType());
1274  // TH1 *hxx = new TH1F(Form("swdist%d",itree),Form("swdist%d",itree),10000,0,15);
1275  // results->Store(hxx,Form("swdist%d",itree));
1276  // TH1 *hxy = new TH1F(Form("bwdist%d",itree),Form("bwdist%d",itree),10000,0,15);
1277  // results->Store(hxy,Form("bwdist%d",itree));
1278  // for (Int_t iev=0; iev<fEventSample.size(); iev++) {
1279  // if (fEventSample[iev]->GetClass()!=0) hxy->Fill((fEventSample[iev])->GetWeight());
1280  // else hxx->Fill((fEventSample[iev])->GetWeight());
1281  // }
1282 
1283  if(DoMulticlass()){
1284  if (fBoostType!="Grad"){
1285  Log() << kFATAL << "Multiclass is currently only supported by gradient boost. "
1286  << "Please change boost option accordingly (GradBoost)."
1287  << Endl;
1288  }
1289 
1290  UInt_t nClasses = DataInfo().GetNClasses();
1291  for (UInt_t i=0;i<nClasses;i++){
1292  // Careful: If fSepType is nullptr, the tree will be considered a regression tree and
1293  // use the correct output for gradboost (response rather than yesnoleaf) in checkEvent.
1294  // See TMVA::MethodBDT::InitGradBoost.
1295  fForest.push_back( new DecisionTree( fSepType, fMinNodeSize, fNCuts, &(DataInfo()), i,
1297  itree*nClasses+i, fNodePurityLimit, itree*nClasses+1));
1298  fForest.back()->SetNVars(GetNvar());
1299  if (fUseFisherCuts) {
1300  fForest.back()->SetUseFisherCuts();
1301  fForest.back()->SetMinLinCorrForFisher(fMinLinCorrForFisher);
1302  fForest.back()->SetUseExclusiveVars(fUseExclusiveVars);
1303  }
1304  // the minimum linear correlation between two variables demanded for use in fisher criterion in node splitting
1305 
1306  nNodesBeforePruning = fForest.back()->BuildTree(*fTrainSample);
1307  Double_t bw = this->Boost(*fTrainSample, fForest.back(),i);
1308  if (bw > 0) {
1309  fBoostWeights.push_back(bw);
1310  }else{
1311  fBoostWeights.push_back(0);
1312  Log() << kWARNING << "stopped boosting at itree="<<itree << Endl;
1313  // fNTrees = itree+1; // that should stop the boosting
1314  continueBoost=kFALSE;
1315  }
1316  }
1317  }
1318  else{
1321  itree, fNodePurityLimit, itree));
1322  fForest.back()->SetNVars(GetNvar());
1323  if (fUseFisherCuts) {
1324  fForest.back()->SetUseFisherCuts();
1325  fForest.back()->SetMinLinCorrForFisher(fMinLinCorrForFisher);
1326  fForest.back()->SetUseExclusiveVars(fUseExclusiveVars);
1327  }
1328 
1329  nNodesBeforePruning = fForest.back()->BuildTree(*fTrainSample);
1330 
1331  if (fUseYesNoLeaf && !DoRegression() && fBoostType!="Grad") { // remove leaf nodes where both daughter nodes are of same type
1332  nNodesBeforePruning = fForest.back()->CleanTree();
1333  }
1334 
1335  nNodesBeforePruningCount += nNodesBeforePruning;
1336  nodesBeforePruningVsTree->SetBinContent(itree+1,nNodesBeforePruning);
1337 
1338  fForest.back()->SetPruneMethod(fPruneMethod); // set the pruning method for the tree
1339  fForest.back()->SetPruneStrength(fPruneStrength); // set the strength parameter
1340 
1341  std::vector<const Event*> * validationSample = NULL;
1342  if(fAutomatic) validationSample = &fValidationSample;
1343 
1344  Double_t bw = this->Boost(*fTrainSample, fForest.back());
1345  if (bw > 0) {
1346  fBoostWeights.push_back(bw);
1347  }else{
1348  fBoostWeights.push_back(0);
1349  Log() << kWARNING << "stopped boosting at itree="<<itree << Endl;
1350  continueBoost=kFALSE;
1351  }
1352 
1353 
1354 
1355  // if fAutomatic == true, pruneStrength will be the optimal pruning strength
1356  // determined by the pruning algorithm; otherwise, it is simply the strength parameter
1357  // set by the user
1358  if (fPruneMethod != DecisionTree::kNoPruning) fForest.back()->PruneTree(validationSample);
1359 
1360  if (fUseYesNoLeaf && !DoRegression() && fBoostType!="Grad"){ // remove leaf nodes where both daughter nodes are of same type
1361  fForest.back()->CleanTree();
1362  }
1363  nNodesAfterPruning = fForest.back()->GetNNodes();
1364  nNodesAfterPruningCount += nNodesAfterPruning;
1365  nodesAfterPruningVsTree->SetBinContent(itree+1,nNodesAfterPruning);
1366 
1367  if (fInteractive){
1369  }
1370  fITree = itree;
1371  fMonitorNtuple->Fill();
1372  if (fDoBoostMonitor){
1373  if (! DoRegression() ){
1374  if ( itree==fNTrees-1 || (!(itree%500)) ||
1375  (!(itree%250) && itree <1000)||
1376  (!(itree%100) && itree < 500)||
1377  (!(itree%50) && itree < 250)||
1378  (!(itree%25) && itree < 150)||
1379  (!(itree%10) && itree < 50)||
1380  (!(itree%5) && itree < 20)
1381  ) BoostMonitor(itree);
1382  }
1383  }
1384  }
1385  itree++;
1386  }
1387 
1388  // get elapsed time
1389  Log() << kDEBUG << "\t<Train> elapsed time: " << timer.GetElapsedTime()
1390  << " " << Endl;
1391  if (fPruneMethod == DecisionTree::kNoPruning) {
1392  Log() << kDEBUG << "\t<Train> average number of nodes (w/o pruning) : "
1393  << nNodesBeforePruningCount/GetNTrees() << Endl;
1394  }
1395  else {
1396  Log() << kDEBUG << "\t<Train> average number of nodes before/after pruning : "
1397  << nNodesBeforePruningCount/GetNTrees() << " / "
1398  << nNodesAfterPruningCount/GetNTrees()
1399  << Endl;
1400  }
1402 
1403 
1404  // reset all previously stored/accumulated BOOST weights in the event sample
1405  // for (UInt_t iev=0; iev<fEventSample.size(); iev++) fEventSample[iev]->SetBoostWeight(1.);
1406  Log() << kDEBUG << "Now I delete the private data sample"<< Endl;
1407  for (UInt_t i=0; i<fEventSample.size(); i++) delete fEventSample[i];
1408  for (UInt_t i=0; i<fValidationSample.size(); i++) delete fValidationSample[i];
1409  fEventSample.clear();
1410  fValidationSample.clear();
1411 
1413  ExitFromTraining();
1414 }
1415 
1416 
1417 ////////////////////////////////////////////////////////////////////////////////
1418 /// Returns MVA value: -1 for background, 1 for signal.
1419 
1420 Double_t TMVA::MethodBDT::GetGradBoostMVA(const TMVA::Event* e, UInt_t nTrees)
1421 {
1422  Double_t sum=0;
1423  for (UInt_t itree=0; itree<nTrees; itree++) {
1424  //loop over all trees in forest
1425  sum += fForest[itree]->CheckEvent(e,kFALSE);
1426 
1427  }
1428  return 2.0/(1.0+exp(-2.0*sum))-1; //MVA output between -1 and 1
1429 }
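// A minimal sketch (with hypothetical names) of the transformation used above: the raw sum of
// leaf responses from the gradient-boosted trees is squashed onto (-1,1) via
// 2/(1+exp(-2*sum))-1, so sum=0 maps to 0 and large |sum| approaches +-1.
//
//    double ToyGradBoostMva(const std::vector<double>& leafResponses)
//    {
//       double sum = 0;
//       for (double r : leafResponses) sum += r;        // accumulate tree responses
//       return 2.0/(1.0 + std::exp(-2.0*sum)) - 1.0;    // MVA output in (-1,1)
//    }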
1430 
1431 ////////////////////////////////////////////////////////////////////////////////
1432 /// Calculate residual for all events.
1433 
1434 void TMVA::MethodBDT::UpdateTargets(std::vector<const TMVA::Event*>& eventSample, UInt_t cls)
1435 {
1436  if (DoMulticlass()) {
1437  UInt_t nClasses = DataInfo().GetNClasses();
1438  std::vector<Double_t> expCache;
1439  if (cls == nClasses - 1) {
1440  expCache.resize(nClasses);
1441  }
1442  for (auto e : eventSample) {
1443  fResiduals[e].at(cls) += fForest.back()->CheckEvent(e, kFALSE);
1444  if (cls == nClasses - 1) {
1445  auto &residualsThisEvent = fResiduals[e];
1446  std::transform(residualsThisEvent.begin(),
1447  residualsThisEvent.begin() + nClasses,
1448  expCache.begin(), [](Double_t d) { return exp(d); });
1449  for (UInt_t i = 0; i < nClasses; i++) {
1450  Double_t norm = 0.0;
1451  for (UInt_t j = 0; j < nClasses; j++) {
1452  if (i != j) {
1453  norm += expCache[j] / expCache[i];
1454  }
1455  }
1456  Double_t p_cls = 1.0 / (1.0 + norm);
1457  Double_t res = (e->GetClass() == i) ? (1.0 - p_cls) : (-p_cls);
1458  const_cast<TMVA::Event *>(e)->SetTarget(i, res);
1459  }
1460  }
1461  }
1462  } else {
1463  for (auto e : eventSample) {
1464  auto &residualAt0 = fResiduals[e].at(0);
1465  residualAt0 += fForest.back()->CheckEvent(e, kFALSE);
1466  Double_t p_sig = 1.0 / (1.0 + exp(-2.0 * residualAt0));
1467  Double_t res = (DataInfo().IsSignal(e) ? 1 : 0) - p_sig;
1468  const_cast<TMVA::Event *>(e)->SetTarget(0, res);
1469  }
1470  }
1471 }
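// A minimal sketch (hypothetical names, binary case only) of the target update performed above:
// the accumulated model response F(x) is converted to a signal probability p = 1/(1+exp(-2F)),
// and the new target is the residual y - p (y = 1 for signal, 0 for background), i.e. the
// negative gradient of the binomial log-likelihood loss. The multiclass branch above does the
// analogous update with a softmax-like normalisation over all classes.
//
//    double ToyBinaryResidual(double accumulatedResponse, bool isSignal)
//    {
//       double pSig = 1.0/(1.0 + std::exp(-2.0*accumulatedResponse)); // current P(signal)
//       double y    = isSignal ? 1.0 : 0.0;                           // true label
//       return y - pSig;                                              // target for the next tree
//    }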
1472 
1473 ////////////////////////////////////////////////////////////////////////////////
1474 /// Calculate current residuals for all events and update targets for next iteration.
1475 
1476 void TMVA::MethodBDT::UpdateTargetsRegression(std::vector<const TMVA::Event*>& eventSample, Bool_t first)
1477 {
1478  if(!first){
1479  for (std::vector<const TMVA::Event*>::const_iterator e=fEventSample.begin(); e!=fEventSample.end();e++) {
1480  fLossFunctionEventInfo[*e].predictedValue += fForest.back()->CheckEvent(*e,kFALSE);
1481  }
1482  }
1483 
1485 }
1486 
1487 ////////////////////////////////////////////////////////////////////////////////
1488 /// Calculate the desired response value for each region.
1489 
1490 Double_t TMVA::MethodBDT::GradBoost(std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt, UInt_t cls)
1491 {
1492  struct LeafInfo {
1493  Double_t sumWeightTarget = 0;
1494  Double_t sum2 = 0;
1495  };
1496 
1497  std::unordered_map<TMVA::DecisionTreeNode*, LeafInfo> leaves;
1498  for (auto e : eventSample) {
1499  Double_t weight = e->GetWeight();
1500  TMVA::DecisionTreeNode* node = dt->GetEventNode(*e);
1501  auto &v = leaves[node];
1502  auto target = e->GetTarget(cls);
1503  v.sumWeightTarget += target * weight;
1504  v.sum2 += fabs(target) * (1.0 - fabs(target)) * weight;
1505  }
1506  for (auto &iLeave : leaves) {
1507  constexpr auto minValue = 1e-30;
1508  if (iLeave.second.sum2 < minValue) {
1509  iLeave.second.sum2 = minValue;
1510  }
1511  const Double_t K = DataInfo().GetNClasses();
1512  iLeave.first->SetResponse(fShrinkage * (K - 1) / K * iLeave.second.sumWeightTarget / iLeave.second.sum2);
1513  }
1514 
1515  //call UpdateTargets before next tree is grown
1516 
1518  return 1; //trees all have the same weight
1519 }
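// A minimal sketch (hypothetical names) of the per-leaf response computed above, following the
// multiclass gradient-boost update: a Newton-like step
// shrinkage * (K-1)/K * sum(w*t) / sum(w*|t|*(1-|t|)), with the denominator floored so that
// leaves with a vanishing gradient do not cause a division by zero.
//
//    double ToyLeafResponse(const std::vector<double>& targets,
//                           const std::vector<double>& weights,
//                           double shrinkage, double nClasses)
//    {
//       double num = 0, den = 0;
//       for (size_t i = 0; i < targets.size(); ++i) {
//          num += weights[i]*targets[i];
//          den += weights[i]*std::fabs(targets[i])*(1.0 - std::fabs(targets[i]));
//       }
//       if (den < 1e-30) den = 1e-30;                            // guard against empty gradient
//       return shrinkage * (nClasses - 1.0)/nClasses * num/den;  // constant response of the leaf
//    }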
1520 
1521 ////////////////////////////////////////////////////////////////////////////////
1522 /// Implementation of M_TreeBoost using any loss function as described by Friedman 1999.
1523 
1524 Double_t TMVA::MethodBDT::GradBoostRegression(std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt )
1525 {
1526  // get the vector of events for each terminal so that we can calculate the constant fit value in each
1527  // terminal node
1528  std::map<TMVA::DecisionTreeNode*,vector< TMVA::LossFunctionEventInfo > > leaves;
1529  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1530  TMVA::DecisionTreeNode* node = dt->GetEventNode(*(*e));
1531  (leaves[node]).push_back(fLossFunctionEventInfo[*e]);
1532  }
1533 
1534  // calculate the constant fit for each terminal node based upon the events in the node
1535  // node (iLeave->first), vector of event information (iLeave->second)
1536  for (std::map<TMVA::DecisionTreeNode*,vector< TMVA::LossFunctionEventInfo > >::iterator iLeave=leaves.begin();
1537  iLeave!=leaves.end();++iLeave){
1538  Double_t fit = fRegressionLossFunctionBDTG->Fit(iLeave->second);
1539  (iLeave->first)->SetResponse(fShrinkage*fit);
1540  }
1541 
1543  return 1;
1544 }
1545 
1546 ////////////////////////////////////////////////////////////////////////////////
1547 /// Initialize targets for first tree.
1548 
1549 void TMVA::MethodBDT::InitGradBoost( std::vector<const TMVA::Event*>& eventSample)
1550 {
1551  // Should get rid of this line. It's just for debugging.
1552  //std::sort(eventSample.begin(), eventSample.end(), [](const TMVA::Event* a, const TMVA::Event* b){
1553  // return (a->GetTarget(0) < b->GetTarget(0)); });
1554  fSepType=NULL; //set fSepType to NULL (regression trees are used for both classification and regression)
1555  if(DoRegression()){
1556  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1557  fLossFunctionEventInfo[*e]= TMVA::LossFunctionEventInfo((*e)->GetTarget(0), 0, (*e)->GetWeight());
1558  }
1559 
1562  return;
1563  }
1564  else if(DoMulticlass()){
1565  UInt_t nClasses = DataInfo().GetNClasses();
1566  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1567  for (UInt_t i=0;i<nClasses;i++){
1568  //Calculate initial residuals, assuming equal probability for all classes
1569  Double_t r = (*e)->GetClass()==i?(1-1.0/nClasses):(-1.0/nClasses);
1570  const_cast<TMVA::Event*>(*e)->SetTarget(i,r);
1571  fResiduals[*e].push_back(0);
1572  }
1573  }
1574  }
1575  else{
1576  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1577  Double_t r = (DataInfo().IsSignal(*e)?1:0)-0.5; //Calculate initial residuals
1578  const_cast<TMVA::Event*>(*e)->SetTarget(0,r);
1579  fResiduals[*e].push_back(0);
1580  }
1581  }
1582 
1583 }
1584 ////////////////////////////////////////////////////////////////////////////////
1585 /// Test the tree quality in terms of misclassification.
1586 
1587 Double_t TMVA::MethodBDT::TestTreeQuality( DecisionTree *dt )
1588 {
1589  Double_t ncorrect=0, nfalse=0;
1590  for (UInt_t ievt=0; ievt<fValidationSample.size(); ievt++) {
1591  Bool_t isSignalType= (dt->CheckEvent(fValidationSample[ievt]) > fNodePurityLimit ) ? 1 : 0;
1592 
1593  if (isSignalType == (DataInfo().IsSignal(fValidationSample[ievt])) ) {
1594  ncorrect += fValidationSample[ievt]->GetWeight();
1595  }
1596  else{
1597  nfalse += fValidationSample[ievt]->GetWeight();
1598  }
1599  }
1600 
1601  return ncorrect / (ncorrect + nfalse);
1602 }
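// A minimal sketch (hypothetical names) of the quality measure above: the weighted fraction of
// validation events whose signal/background classification by the tree agrees with the truth.
//
//    double ToyTreeQuality(const std::vector<double>& treeOutputs,   // per-event tree output
//                          const std::vector<bool>&   isSignal,      // per-event truth
//                          const std::vector<double>& weights,
//                          double purityLimit)
//    {
//       double correct = 0, wrong = 0;
//       for (size_t i = 0; i < treeOutputs.size(); ++i) {
//          bool predictedSignal = treeOutputs[i] > purityLimit;
//          if (predictedSignal == isSignal[i]) correct += weights[i];
//          else                                wrong   += weights[i];
//       }
//       return correct/(correct + wrong);   // weighted accuracy in [0,1]
//    }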
1603 
1604 ////////////////////////////////////////////////////////////////////////////////
1605 /// Apply the boosting algorithm (the algorithm is selected via the "option" given
1606 /// in the constructor). The return value is the boosting weight.
1607 
1608 Double_t TMVA::MethodBDT::Boost( std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt, UInt_t cls )
1609 {
1610  Double_t returnVal=-1;
1611 
1612  if (fBoostType=="AdaBoost") returnVal = this->AdaBoost (eventSample, dt);
1613  else if (fBoostType=="AdaCost") returnVal = this->AdaCost (eventSample, dt);
1614  else if (fBoostType=="Bagging") returnVal = this->Bagging ( );
1615  else if (fBoostType=="RegBoost") returnVal = this->RegBoost (eventSample, dt);
1616  else if (fBoostType=="AdaBoostR2") returnVal = this->AdaBoostR2(eventSample, dt);
1617  else if (fBoostType=="Grad"){
1618  if(DoRegression())
1619  returnVal = this->GradBoostRegression(eventSample, dt);
1620  else if(DoMulticlass())
1621  returnVal = this->GradBoost (eventSample, dt, cls);
1622  else
1623  returnVal = this->GradBoost (eventSample, dt);
1624  }
1625  else {
1626  Log() << kINFO << GetOptions() << Endl;
1627  Log() << kFATAL << "<Boost> unknown boost option " << fBoostType<< " called" << Endl;
1628  }
1629 
1630  if (fBaggedBoost){
1631  GetBaggedSubSample(fEventSample);
1632  }
1633 
1634 
1635  return returnVal;
1636 }
1637 
1638 ////////////////////////////////////////////////////////////////////////////////
1639 /// Fills the ROC integral vs. iTree graph for the monitoring plots
1640 /// during the training, using the testing events.
1641 
1642 void TMVA::MethodBDT::BoostMonitor(Int_t iTree)
1643 {
1645 
1646  TH1F *tmpS = new TH1F( "tmpS", "", 100 , -1., 1.00001 );
1647  TH1F *tmpB = new TH1F( "tmpB", "", 100 , -1., 1.00001 );
1648  TH1F *tmp;
1649 
1650 
1651  UInt_t signalClassNr = DataInfo().GetClassInfo("Signal")->GetNumber();
1652 
1653  // const std::vector<Event*> events=Data()->GetEventCollection(Types::kTesting);
1654  // // fMethod->GetTransformationHandler().CalcTransformations(fMethod->Data()->GetEventCollection(Types::kTesting));
1655  // for (UInt_t iev=0; iev < events.size() ; iev++){
1656  // if (events[iev]->GetClass() == signalClassNr) tmp=tmpS;
1657  // else tmp=tmpB;
1658  // tmp->Fill(PrivateGetMvaValue(*(events[iev])),events[iev]->GetWeight());
1659  // }
1660 
1661  UInt_t nevents = Data()->GetNTestEvents();
1662  for (UInt_t iev=0; iev < nevents; iev++){
1663  const Event* event = GetTestingEvent(iev);
1664 
1665  if (event->GetClass() == signalClassNr) {tmp=tmpS;}
1666  else {tmp=tmpB;}
1667  tmp->Fill(PrivateGetMvaValue(event),event->GetWeight());
1668  }
1669  Double_t max=1;
1670 
1671  std::vector<TH1F*> hS;
1672  std::vector<TH1F*> hB;
1673  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
1674  hS.push_back(new TH1F(Form("SigVar%dAtTree%d",ivar,iTree),Form("SigVar%dAtTree%d",ivar,iTree),100,DataInfo().GetVariableInfo(ivar).GetMin(),DataInfo().GetVariableInfo(ivar).GetMax()));
1675  hB.push_back(new TH1F(Form("BkgVar%dAtTree%d",ivar,iTree),Form("BkgVar%dAtTree%d",ivar,iTree),100,DataInfo().GetVariableInfo(ivar).GetMin(),DataInfo().GetVariableInfo(ivar).GetMax()));
1676  results->Store(hS.back(),hS.back()->GetTitle());
1677  results->Store(hB.back(),hB.back()->GetTitle());
1678  }
1679 
1680 
1681  for (UInt_t iev=0; iev < fEventSample.size(); iev++){
1682  if (fEventSample[iev]->GetBoostWeight() > max) max = 1.01*fEventSample[iev]->GetBoostWeight();
1683  }
1684  TH1F *tmpBoostWeightsS = new TH1F(Form("BoostWeightsInTreeS%d",iTree),Form("BoostWeightsInTreeS%d",iTree),100,0.,max);
1685  TH1F *tmpBoostWeightsB = new TH1F(Form("BoostWeightsInTreeB%d",iTree),Form("BoostWeightsInTreeB%d",iTree),100,0.,max);
1686  results->Store(tmpBoostWeightsS,tmpBoostWeightsS->GetTitle());
1687  results->Store(tmpBoostWeightsB,tmpBoostWeightsB->GetTitle());
1688 
1689  TH1F *tmpBoostWeights;
1690  std::vector<TH1F*> *h;
1691 
1692  for (UInt_t iev=0; iev < fEventSample.size(); iev++){
1693  if (fEventSample[iev]->GetClass() == signalClassNr) {
1694  tmpBoostWeights=tmpBoostWeightsS;
1695  h=&hS;
1696  }else{
1697  tmpBoostWeights=tmpBoostWeightsB;
1698  h=&hB;
1699  }
1700  tmpBoostWeights->Fill(fEventSample[iev]->GetBoostWeight());
1701  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
1702  (*h)[ivar]->Fill(fEventSample[iev]->GetValue(ivar),fEventSample[iev]->GetWeight());
1703  }
1704  }
1705 
1706 
1707  TMVA::PDF *sig = new TMVA::PDF( " PDF Sig", tmpS, TMVA::PDF::kSpline3 );
1708  TMVA::PDF *bkg = new TMVA::PDF( " PDF Bkg", tmpB, TMVA::PDF::kSpline3 );
1709 
1710 
1711  TGraph* gr=results->GetGraph("BoostMonitorGraph");
1712  Int_t nPoints = gr->GetN();
1713  gr->Set(nPoints+1);
1714  gr->SetPoint(nPoints,(Double_t)iTree+1,GetROCIntegral(sig,bkg));
1715 
1716  tmpS->Delete();
1717  tmpB->Delete();
1718 
1719  delete sig;
1720  delete bkg;
1721 
1722  return;
1723 }
1724 
1725 ////////////////////////////////////////////////////////////////////////////////
1726 /// The AdaBoost implementation.
1727 /// A new training sample is generated by re-weighting
1728 /// events that are misclassified by the decision tree. The weight
1729 /// applied is \f$ w = \frac{(1-err)}{err} \f$ or, more generally,
1730 /// \f$ w = (\frac{(1-err)}{err})^\beta \f$,
1731 /// where \f$err\f$ is the fraction of misclassified events in the tree (\f$ err<0.5 \f$, assuming
1732 /// that the previous selection was better than random guessing),
1733 /// with "beta" being a free parameter (default: beta = 1) that modifies the
1734 /// strength of the boosting.
1735 
1736 Double_t TMVA::MethodBDT::AdaBoost( std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt )
1737 {
1738  Double_t err=0, sumGlobalw=0, sumGlobalwfalse=0, sumGlobalwfalse2=0;
1739 
1740  std::vector<Double_t> sumw(DataInfo().GetNClasses(),0); //for individually re-scaling each class
1741 
1742  Double_t maxDev=0;
1743  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1744  Double_t w = (*e)->GetWeight();
1745  sumGlobalw += w;
1746  UInt_t iclass=(*e)->GetClass();
1747  sumw[iclass] += w;
1748 
1749  if ( DoRegression() ) {
1750  Double_t tmpDev = TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) );
1751  sumGlobalwfalse += w * tmpDev;
1752  sumGlobalwfalse2 += w * tmpDev*tmpDev;
1753  if (tmpDev > maxDev) maxDev = tmpDev;
1754  }else{
1755 
1756  if (fUseYesNoLeaf){
1757  Bool_t isSignalType = (dt->CheckEvent(*e,fUseYesNoLeaf) > fNodePurityLimit );
1758  if (!(isSignalType == DataInfo().IsSignal(*e))) {
1759  sumGlobalwfalse+= w;
1760  }
1761  }else{
1762  Double_t dtoutput = (dt->CheckEvent(*e,fUseYesNoLeaf) - 0.5)*2.;
1763  Int_t trueType;
1764  if (DataInfo().IsSignal(*e)) trueType = 1;
1765  else trueType = -1;
1766  sumGlobalwfalse+= w*trueType*dtoutput;
1767  }
1768  }
1769  }
1770 
1771  err = sumGlobalwfalse/sumGlobalw ;
1772  if ( DoRegression() ) {
1773  // compute the error depending on the chosen loss type (linear, quadratic or exponential):
1774  if (fAdaBoostR2Loss=="linear"){
1775  err = sumGlobalwfalse/maxDev/sumGlobalw ;
1776  }
1777  else if (fAdaBoostR2Loss=="quadratic"){
1778  err = sumGlobalwfalse2/maxDev/maxDev/sumGlobalw ;
1779  }
1780  else if (fAdaBoostR2Loss=="exponential"){
1781  err = 0;
1782  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1783  Double_t w = (*e)->GetWeight();
1784  Double_t tmpDev = TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) );
1785  err += w * (1 - exp (-tmpDev/maxDev)) / sumGlobalw;
1786  }
1787 
1788  }
1789  else {
1790  Log() << kFATAL << " you've chosen a loss type for AdaBoost other than linear, quadratic or exponential, "
1791  << " namely " << fAdaBoostR2Loss << ",\n"
1792  << "which is not implemented... perhaps a typo in the options?" <<Endl;
1793  }
1794  }
1795 
1796  Log() << kDEBUG << "BDT AdaBoost wrong/all: " << sumGlobalwfalse << "/" << sumGlobalw << Endl;
1797 
1798 
1799  Double_t newSumGlobalw=0;
1800  std::vector<Double_t> newSumw(sumw.size(),0);
1801 
1802  Double_t boostWeight=1.;
1803  if (err >= 0.5 && fUseYesNoLeaf) { // sanity check ... should never happen as otherwise there is apparently
1804  // something odd with the assignment of the leaf nodes (rem: you use the training
1805  // events for this determination of the error rate)
1806  if (dt->GetNNodes() == 1){
1807  Log() << kERROR << " YOUR tree has only 1 Node... kind of a funny *tree*. I cannot "
1808  << "boost such a thing... if after 1 step the error rate is == 0.5"
1809  << Endl
1810  << "please check why this happens, maybe too many events per node requested ?"
1811  << Endl;
1812 
1813  }else{
1814  Log() << kERROR << " The error rate in the BDT boosting is > 0.5. ("<< err
1815  << ") That should not happen, please check your code (i.e... the BDT code), I "
1816  << " stop boosting here" << Endl;
1817  return -1;
1818  }
1819  err = 0.5;
1820  } else if (err < 0) {
1821  Log() << kERROR << " The error rate in the BDT boosting is < 0. That can happen"
1822  << " due to improper treatment of negative weights in a Monte Carlo.. (if you have"
1823  << " an idea on how to do it in a better way, please let me know (Helge.Voss@cern.ch)"
1824  << " for the time being I set it to its absolute value.. just to continue.." << Endl;
1825  err = TMath::Abs(err);
1826  }
1827  if (fUseYesNoLeaf)
1828  boostWeight = TMath::Log((1.-err)/err)*fAdaBoostBeta;
1829  else
1830  boostWeight = TMath::Log((1.+err)/(1-err))*fAdaBoostBeta;
1831 
1832 
1833  Log() << kDEBUG << "BDT AdaBoost wrong/all: " << sumGlobalwfalse << "/" << sumGlobalw << " 1-err/err="<<boostWeight<< " log.."<<TMath::Log(boostWeight)<<Endl;
1834 
1836 
1837 
1838  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1839 
1840  if (fUseYesNoLeaf||DoRegression()){
1841  if ((!( (dt->CheckEvent(*e,fUseYesNoLeaf) > fNodePurityLimit ) == DataInfo().IsSignal(*e))) || DoRegression()) {
1842  Double_t boostfactor = TMath::Exp(boostWeight);
1843 
1844  if (DoRegression()) boostfactor = TMath::Power(1/boostWeight,(1.-TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) )/maxDev ) );
1845  if ( (*e)->GetWeight() > 0 ){
1846  (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1847  // Helge change back (*e)->ScaleBoostWeight(boostfactor);
1848  if (DoRegression()) results->GetHist("BoostWeights")->Fill(boostfactor);
1849  } else {
1850  if ( fInverseBoostNegWeights )(*e)->ScaleBoostWeight( 1. / boostfactor); // if the original event weight is negative, and you want to "increase" the event's "positive" influence, you'd rather make the event weight "smaller" in terms of its absolute value while still keeping it something "negative"
1851  else (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1852 
1853  }
1854  }
1855 
1856  }else{
1857  Double_t dtoutput = (dt->CheckEvent(*e,fUseYesNoLeaf) - 0.5)*2.;
1858  Int_t trueType;
1859  if (DataInfo().IsSignal(*e)) trueType = 1;
1860  else trueType = -1;
1861  Double_t boostfactor = TMath::Exp(-1*boostWeight*trueType*dtoutput);
1862 
1863  if ( (*e)->GetWeight() > 0 ){
1864  (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1865  // Helge change back (*e)->ScaleBoostWeight(boostfactor);
1866  if (DoRegression()) results->GetHist("BoostWeights")->Fill(boostfactor);
1867  } else {
1868  if ( fInverseBoostNegWeights )(*e)->ScaleBoostWeight( 1. / boostfactor); // if the original event weight is negative, and you want to "increase" the event's "positive" influence, you'd rather make the event weight "smaller" in terms of its absolute value while still keeping it something "negative"
1869  else (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1870  }
1871  }
1872  newSumGlobalw+=(*e)->GetWeight();
1873  newSumw[(*e)->GetClass()] += (*e)->GetWeight();
1874  }
1875 
1876 
1877  // Double_t globalNormWeight=sumGlobalw/newSumGlobalw;
1878  Double_t globalNormWeight=( (Double_t) eventSample.size())/newSumGlobalw;
1879  Log() << kDEBUG << "new Nsig="<<newSumw[0]*globalNormWeight << " new Nbkg="<<newSumw[1]*globalNormWeight << Endl;
1880 
1881 
1882  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1883  // if (fRenormByClass) (*e)->ScaleBoostWeight( normWeightByClass[(*e)->GetClass()] );
1884  // else (*e)->ScaleBoostWeight( globalNormWeight );
1885  // else (*e)->ScaleBoostWeight( globalNormWeight );
1886  if (DataInfo().IsSignal(*e))(*e)->ScaleBoostWeight( globalNormWeight * fSigToBkgFraction );
1887  else (*e)->ScaleBoostWeight( globalNormWeight );
1888  }
1889 
1890  if (!(DoRegression()))results->GetHist("BoostWeights")->Fill(boostWeight);
1891  results->GetHist("BoostWeightsVsTree")->SetBinContent(fForest.size(),boostWeight);
1892  results->GetHist("ErrorFrac")->SetBinContent(fForest.size(),err);
1893 
1894  fBoostWeight = boostWeight;
1895  fErrorFraction = err;
1896 
1897  return boostWeight;
1898 }
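// A minimal sketch (hypothetical names) of the AdaBoost bookkeeping above for the yes/no-leaf
// case: from the weighted misclassification fraction err one derives the tree weight
// alpha = beta*ln((1-err)/err), and every misclassified event with positive weight gets its
// weight multiplied by exp(alpha) before the whole sample is renormalised.
//
//    double ToyAdaBoostTreeWeight(double err, double beta)
//    {
//       return beta * std::log((1.0 - err)/err);          // weight of this tree in the vote
//    }
//
//    double ToyAdaBoostEventWeight(double oldWeight, double treeWeight, bool misclassified)
//    {
//       return misclassified ? oldWeight*std::exp(treeWeight) : oldWeight;
//    }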
1899 
1900 ////////////////////////////////////////////////////////////////////////////////
1901 /// The AdaCost boosting algorithm takes a simple cost matrix (currently fixed for
1902 /// all events; it could later be modified to use individual cost matrices for each
1903 /// event, as in the original paper):
1904 ///
1905 ///                  true_signal   true_bkg
1906 ///     ----------------------------------------
1907 ///     sel_signal |     Css          Ctb_ss       Cxx in the range [0,1]
1908 ///     sel_bkg    |     Cts_sb       Cbb
1909 ///
1910 /// and takes this into account when calculating the misclassification cost (formerly: error fraction):
1911 ///
1912 /// err = sum_events( weight * y_true * y_sel * cost(event) )
1913 
1914 Double_t TMVA::MethodBDT::AdaCost( vector<const TMVA::Event*>& eventSample, DecisionTree *dt )
1915 {
1916  Double_t Css = fCss;
1917  Double_t Cbb = fCbb;
1918  Double_t Cts_sb = fCts_sb;
1919  Double_t Ctb_ss = fCtb_ss;
1920 
1921  Double_t err=0, sumGlobalWeights=0, sumGlobalCost=0;
1922 
1923  std::vector<Double_t> sumw(DataInfo().GetNClasses(),0); //for individually re-scaling each class
1924 
1925  for (vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1926  Double_t w = (*e)->GetWeight();
1927  sumGlobalWeights += w;
1928  UInt_t iclass=(*e)->GetClass();
1929 
1930  sumw[iclass] += w;
1931 
1932  if ( DoRegression() ) {
1933  Log() << kFATAL << " AdaCost not implemented for regression"<<Endl;
1934  }else{
1935 
1936  Double_t dtoutput = (dt->CheckEvent(*e,false) - 0.5)*2.;
1937  Int_t trueType;
1938  Bool_t isTrueSignal = DataInfo().IsSignal(*e);
1939  Bool_t isSelectedSignal = (dtoutput>0);
1940  if (isTrueSignal) trueType = 1;
1941  else trueType = -1;
1942 
1943  Double_t cost=0;
1944  if (isTrueSignal && isSelectedSignal) cost=Css;
1945  else if (isTrueSignal && !isSelectedSignal) cost=Cts_sb;
1946  else if (!isTrueSignal && isSelectedSignal) cost=Ctb_ss;
1947  else if (!isTrueSignal && !isSelectedSignal) cost=Cbb;
1948  else Log() << kERROR << "something went wrong in AdaCost" << Endl;
1949 
1950  sumGlobalCost+= w*trueType*dtoutput*cost;
1951 
1952  }
1953  }
1954 
1955  if ( DoRegression() ) {
1956  Log() << kFATAL << " AdaCost not implemented for regression"<<Endl;
1957  }
1958 
1959  // Log() << kDEBUG << "BDT AdaBoos wrong/all: " << sumGlobalCost << "/" << sumGlobalWeights << Endl;
1960  // Log() << kWARNING << "BDT AdaBoos wrong/all: " << sumGlobalCost << "/" << sumGlobalWeights << Endl;
1961  sumGlobalCost /= sumGlobalWeights;
1962  // Log() << kWARNING << "BDT AdaBoos wrong/all: " << sumGlobalCost << "/" << sumGlobalWeights << Endl;
1963 
1964 
1965  Double_t newSumGlobalWeights=0;
1966  vector<Double_t> newSumClassWeights(sumw.size(),0);
1967 
1968  Double_t boostWeight = TMath::Log((1+sumGlobalCost)/(1-sumGlobalCost)) * fAdaBoostBeta;
1969 
1971 
1972  for (vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1973  Double_t dtoutput = (dt->CheckEvent(*e,false) - 0.5)*2.;
1974  Int_t trueType;
1975  Bool_t isTrueSignal = DataInfo().IsSignal(*e);
1976  Bool_t isSelectedSignal = (dtoutput>0);
1977  if (isTrueSignal) trueType = 1;
1978  else trueType = -1;
1979 
1980  Double_t cost=0;
1981  if (isTrueSignal && isSelectedSignal) cost=Css;
1982  else if (isTrueSignal && !isSelectedSignal) cost=Cts_sb;
1983  else if (!isTrueSignal && isSelectedSignal) cost=Ctb_ss;
1984  else if (!isTrueSignal && !isSelectedSignal) cost=Cbb;
1985  else Log() << kERROR << "something went wrong in AdaCost" << Endl;
1986 
1987  Double_t boostfactor = TMath::Exp(-1*boostWeight*trueType*dtoutput*cost);
1988  if (DoRegression())Log() << kFATAL << " AdaCost not implemented for regression"<<Endl;
1989  if ( (*e)->GetWeight() > 0 ){
1990  (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1991  // Helge change back (*e)->ScaleBoostWeight(boostfactor);
1992  if (DoRegression())Log() << kFATAL << " AdaCost not implemented for regression"<<Endl;
1993  } else {
1994  if ( fInverseBoostNegWeights )(*e)->ScaleBoostWeight( 1. / boostfactor); // if the original event weight is negative, and you want to "increase" the event's "positive" influence, you'd rather make the event weight "smaller" in terms of its absolute value while still keeping it something "negative"
1995  }
1996 
1997  newSumGlobalWeights+=(*e)->GetWeight();
1998  newSumClassWeights[(*e)->GetClass()] += (*e)->GetWeight();
1999  }
2000 
2001 
2002  // Double_t globalNormWeight=sumGlobalWeights/newSumGlobalWeights;
2003  Double_t globalNormWeight=Double_t(eventSample.size())/newSumGlobalWeights;
2004  Log() << kDEBUG << "new Nsig="<<newSumClassWeights[0]*globalNormWeight << " new Nbkg="<<newSumClassWeights[1]*globalNormWeight << Endl;
2005 
2006 
2007  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2008  // if (fRenormByClass) (*e)->ScaleBoostWeight( normWeightByClass[(*e)->GetClass()] );
2009  // else (*e)->ScaleBoostWeight( globalNormWeight );
2010  if (DataInfo().IsSignal(*e))(*e)->ScaleBoostWeight( globalNormWeight * fSigToBkgFraction );
2011  else (*e)->ScaleBoostWeight( globalNormWeight );
2012  }
2013 
2014 
2015  if (!(DoRegression()))results->GetHist("BoostWeights")->Fill(boostWeight);
2016  results->GetHist("BoostWeightsVsTree")->SetBinContent(fForest.size(),boostWeight);
2017  results->GetHist("ErrorFrac")->SetBinContent(fForest.size(),err);
2018 
2019  fBoostWeight = boostWeight;
2020  fErrorFraction = err;
2021 
2022 
2023  return boostWeight;
2024 }
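// A minimal sketch (hypothetical names) of the cost lookup used above: each event contributes
// w * y_true * y_sel * cost to the misclassification cost, where the cost depends on the
// combination of true class and selected class.
//
//    double ToyAdaCost(bool isTrueSignal, bool isSelectedSignal,
//                      double Css, double Cts_sb, double Ctb_ss, double Cbb)
//    {
//       if ( isTrueSignal &&  isSelectedSignal) return Css;     // signal selected as signal
//       if ( isTrueSignal && !isSelectedSignal) return Cts_sb;  // signal lost to background
//       if (!isTrueSignal &&  isSelectedSignal) return Ctb_ss;  // background faking signal
//       return Cbb;                                             // background kept as background
//    }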
2025 
2026 ////////////////////////////////////////////////////////////////////////////////
2027 /// Call it bootstrapping, re-sampling or whatever you like; in the end it is nothing
2028 /// else but applying "random" Poisson weights to each event.
2029 
2030 Double_t TMVA::MethodBDT::Bagging( )
2031 {
2032  // this is now done in "MethodBDT::Boost as it might be used by other boost methods, too
2033  // GetBaggedSample(eventSample);
2034 
2035  return 1.; //here as there are random weights for each event, just return a constant==1;
2036 }
2037 
2038 ////////////////////////////////////////////////////////////////////////////////
2039 /// Fills fSubSample with fBaggedSampleFraction*NEvents random training events.
2040 
2041 void TMVA::MethodBDT::GetBaggedSubSample(std::vector<const TMVA::Event*>& eventSample)
2042 {
2043 
2044  Double_t n;
2045  TRandom3 *trandom = new TRandom3(100*fForest.size()+1234);
2046 
2047  if (!fSubSample.empty()) fSubSample.clear();
2048 
2049  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2050  n = trandom->PoissonD(fBaggedSampleFraction);
2051  for (Int_t i=0;i<n;i++) fSubSample.push_back(*e);
2052  }
2053 
2054  delete trandom;
2055  return;
2056 
2057  /*
2058  UInt_t nevents = fEventSample.size();
2059 
2060  if (!fSubSample.empty()) fSubSample.clear();
2061  TRandom3 *trandom = new TRandom3(fForest.size()+1);
2062 
2063  for (UInt_t ievt=0; ievt<nevents; ievt++) { // recreate new random subsample
2064  if(trandom->Rndm()<fBaggedSampleFraction)
2065  fSubSample.push_back(fEventSample[ievt]);
2066  }
2067  delete trandom;
2068  */
2069 
2070 }
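// A minimal sketch (hypothetical names) of the resampling above: each training event is copied
// into the bagged subsample n times, where n is drawn from a Poisson distribution with mean
// fBaggedSampleFraction, so an event may enter a given tree 0, 1, 2, ... times.
//
//    #include "TRandom3.h"
//    std::vector<const TMVA::Event*> ToyBaggedSample(const std::vector<const TMVA::Event*>& sample,
//                                                    double meanMultiplicity, UInt_t seed)
//    {
//       TRandom3 rng(seed);
//       std::vector<const TMVA::Event*> bagged;
//       for (const TMVA::Event* ev : sample) {
//          Int_t n = static_cast<Int_t>(rng.PoissonD(meanMultiplicity)); // random multiplicity
//          for (Int_t i = 0; i < n; ++i) bagged.push_back(ev);
//       }
//       return bagged;
//    }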
2071 
2072 ////////////////////////////////////////////////////////////////////////////////
2073 /// A special boosting only for Regression (not implemented).
2074 
2075 Double_t TMVA::MethodBDT::RegBoost( std::vector<const TMVA::Event*>& /* eventSample */, DecisionTree* /* dt */ )
2076 {
2077  return 1;
2078 }
2079 
2080 ////////////////////////////////////////////////////////////////////////////////
2081 /// Adaptation of AdaBoost to regression problems (see H. Drucker 1997).
2082 
2083 Double_t TMVA::MethodBDT::AdaBoostR2( std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt )
2084 {
2085  if ( !DoRegression() ) Log() << kFATAL << "Somehow you chose a regression boost method for a classification job" << Endl;
2086 
2087  Double_t err=0, sumw=0, sumwfalse=0, sumwfalse2=0;
2088  Double_t maxDev=0;
2089  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2090  Double_t w = (*e)->GetWeight();
2091  sumw += w;
2092 
2093  Double_t tmpDev = TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) );
2094  sumwfalse += w * tmpDev;
2095  sumwfalse2 += w * tmpDev*tmpDev;
2096  if (tmpDev > maxDev) maxDev = tmpDev;
2097  }
2098 
2099  // compute the error depending on the chosen loss type (linear, quadratic or exponential):
2100  if (fAdaBoostR2Loss=="linear"){
2101  err = sumwfalse/maxDev/sumw ;
2102  }
2103  else if (fAdaBoostR2Loss=="quadratic"){
2104  err = sumwfalse2/maxDev/maxDev/sumw ;
2105  }
2106  else if (fAdaBoostR2Loss=="exponential"){
2107  err = 0;
2108  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2109  Double_t w = (*e)->GetWeight();
2110  Double_t tmpDev = TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) );
2111  err += w * (1 - exp (-tmpDev/maxDev)) / sumw;
2112  }
2113 
2114  }
2115  else {
2116  Log() << kFATAL << " you've chosen a loss type for AdaBoost other than linear, quadratic or exponential, "
2117  << " namely " << fAdaBoostR2Loss << ",\n"
2118  << "which is not implemented... perhaps a typo in the options?" <<Endl;
2119  }
2120 
2121 
2122  if (err >= 0.5) { // sanity check ... should never happen as otherwise there is apparently
2123  // something odd with the assignment of the leaf nodes (rem: you use the training
2124  // events for this determination of the error rate)
2125  if (dt->GetNNodes() == 1){
2126  Log() << kERROR << " YOUR tree has only 1 Node... kind of a funny *tree*. I cannot "
2127  << "boost such a thing... if after 1 step the error rate is == 0.5"
2128  << Endl
2129  << "please check why this happens, maybe too many events per node requested ?"
2130  << Endl;
2131 
2132  }else{
2133  Log() << kERROR << " The error rate in the BDT boosting is > 0.5. ("<< err
2134  << ") That should not happen, but is possible for regression trees, and"
2135  << " should trigger a stop for the boosting. please check your code (i.e... the BDT code), I "
2136  << " stop boosting " << Endl;
2137  return -1;
2138  }
2139  err = 0.5;
2140  } else if (err < 0) {
2141  Log() << kERROR << " The error rate in the BDT boosting is < 0. That can happen"
2142  << " due to improper treatment of negative weights in a Monte Carlo.. (if you have"
2143  << " an idea on how to do it in a better way, please let me know (Helge.Voss@cern.ch)"
2144  << " for the time being I set it to its absolute value.. just to continue.." << Endl;
2145  err = TMath::Abs(err);
2146  }
2147 
2148  Double_t boostWeight = err / (1.-err);
2149  Double_t newSumw=0;
2150 
2152 
2153  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2154  Double_t boostfactor = TMath::Power(boostWeight,(1.-TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) )/maxDev ) );
2155  results->GetHist("BoostWeights")->Fill(boostfactor);
2156  // std::cout << "R2 " << boostfactor << " " << boostWeight << " " << (1.-TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) )/maxDev) << std::endl;
2157  if ( (*e)->GetWeight() > 0 ){
2158  Float_t newBoostWeight = (*e)->GetBoostWeight() * boostfactor;
2159  Float_t newWeight = (*e)->GetWeight() * (*e)->GetBoostWeight() * boostfactor;
2160  if (newWeight == 0) {
2161  Log() << kINFO << "Weight= " << (*e)->GetWeight() << Endl;
2162  Log() << kINFO << "BoostWeight= " << (*e)->GetBoostWeight() << Endl;
2163  Log() << kINFO << "boostweight="<<boostWeight << " err= " <<err << Endl;
2164  Log() << kINFO << "NewBoostWeight= " << newBoostWeight << Endl;
2165  Log() << kINFO << "boostfactor= " << boostfactor << Endl;
2166  Log() << kINFO << "maxDev = " << maxDev << Endl;
2167  Log() << kINFO << "tmpDev = " << TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) ) << Endl;
2168  Log() << kINFO << "target = " << (*e)->GetTarget(0) << Endl;
2169  Log() << kINFO << "estimate = " << dt->CheckEvent(*e,kFALSE) << Endl;
2170  }
2171  (*e)->SetBoostWeight( newBoostWeight );
2172  // (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
2173  } else {
2174  (*e)->SetBoostWeight( (*e)->GetBoostWeight() / boostfactor);
2175  }
2176  newSumw+=(*e)->GetWeight();
2177  }
2178 
2179  // re-normalise the weights
2180  Double_t normWeight = sumw / newSumw;
2181  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2182  //Helge (*e)->ScaleBoostWeight( sumw/newSumw);
2183  // (*e)->ScaleBoostWeight( normWeight);
2184  (*e)->SetBoostWeight( (*e)->GetBoostWeight() * normWeight );
2185  }
2186 
2187 
2188  results->GetHist("BoostWeightsVsTree")->SetBinContent(fForest.size(),1./boostWeight);
2189  results->GetHist("ErrorFrac")->SetBinContent(fForest.size(),err);
2190 
2191  fBoostWeight = boostWeight;
2192  fErrorFraction = err;
2193 
2194  return TMath::Log(1./boostWeight);
2195 }
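// A minimal sketch (hypothetical names) of the per-event re-weighting above (Drucker 1997):
// with beta = err/(1-err) < 1, an event whose prediction deviates little from its target
// (relative to the largest deviation maxDev) gets its weight multiplied by beta^(1-dev/maxDev),
// so well-predicted events are suppressed while badly predicted ones keep their weight.
//
//    double ToyAdaBoostR2Factor(double err, double deviation, double maxDeviation)
//    {
//       double beta = err/(1.0 - err);                          // < 1 for err < 0.5
//       return std::pow(beta, 1.0 - deviation/maxDeviation);    // multiplies the event weight
//    }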
2196 
2197 ////////////////////////////////////////////////////////////////////////////////
2198 /// Write weights to XML.
2199 
2200 void TMVA::MethodBDT::AddWeightsXMLTo( void* parent ) const
2201 {
2202  void* wght = gTools().AddChild(parent, "Weights");
2203 
2204  if (fDoPreselection){
2205  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
2206  gTools().AddAttr( wght, Form("PreselectionLowBkgVar%d",ivar), fIsLowBkgCut[ivar]);
2207  gTools().AddAttr( wght, Form("PreselectionLowBkgVar%dValue",ivar), fLowBkgCut[ivar]);
2208  gTools().AddAttr( wght, Form("PreselectionLowSigVar%d",ivar), fIsLowSigCut[ivar]);
2209  gTools().AddAttr( wght, Form("PreselectionLowSigVar%dValue",ivar), fLowSigCut[ivar]);
2210  gTools().AddAttr( wght, Form("PreselectionHighBkgVar%d",ivar), fIsHighBkgCut[ivar]);
2211  gTools().AddAttr( wght, Form("PreselectionHighBkgVar%dValue",ivar),fHighBkgCut[ivar]);
2212  gTools().AddAttr( wght, Form("PreselectionHighSigVar%d",ivar), fIsHighSigCut[ivar]);
2213  gTools().AddAttr( wght, Form("PreselectionHighSigVar%dValue",ivar),fHighSigCut[ivar]);
2214  }
2215  }
2216 
2217 
2218  gTools().AddAttr( wght, "NTrees", fForest.size() );
2219  gTools().AddAttr( wght, "AnalysisType", fForest.back()->GetAnalysisType() );
2220 
2221  for (UInt_t i=0; i< fForest.size(); i++) {
2222  void* trxml = fForest[i]->AddXMLTo(wght);
2223  gTools().AddAttr( trxml, "boostWeight", fBoostWeights[i] );
2224  gTools().AddAttr( trxml, "itree", i );
2225  }
2226 }
2227 
2228 ////////////////////////////////////////////////////////////////////////////////
2229 /// Reads the BDT from the xml file.
2230 
2231 void TMVA::MethodBDT::ReadWeightsFromXML(void* parent) {
2232  UInt_t i;
2233  for (i=0; i<fForest.size(); i++) delete fForest[i];
2234  fForest.clear();
2235  fBoostWeights.clear();
2236 
2237  UInt_t ntrees;
2238  UInt_t analysisType;
2239  Float_t boostWeight;
2240 
2241 
2242  if (gTools().HasAttr( parent, Form("PreselectionLowBkgVar%d",0))) {
2243  fIsLowBkgCut.resize(GetNvar());
2244  fLowBkgCut.resize(GetNvar());
2245  fIsLowSigCut.resize(GetNvar());
2246  fLowSigCut.resize(GetNvar());
2247  fIsHighBkgCut.resize(GetNvar());
2248  fHighBkgCut.resize(GetNvar());
2249  fIsHighSigCut.resize(GetNvar());
2250  fHighSigCut.resize(GetNvar());
2251 
2252  Bool_t tmpBool;
2253  Double_t tmpDouble;
2254  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
2255  gTools().ReadAttr( parent, Form("PreselectionLowBkgVar%d",ivar), tmpBool);
2256  fIsLowBkgCut[ivar]=tmpBool;
2257  gTools().ReadAttr( parent, Form("PreselectionLowBkgVar%dValue",ivar), tmpDouble);
2258  fLowBkgCut[ivar]=tmpDouble;
2259  gTools().ReadAttr( parent, Form("PreselectionLowSigVar%d",ivar), tmpBool);
2260  fIsLowSigCut[ivar]=tmpBool;
2261  gTools().ReadAttr( parent, Form("PreselectionLowSigVar%dValue",ivar), tmpDouble);
2262  fLowSigCut[ivar]=tmpDouble;
2263  gTools().ReadAttr( parent, Form("PreselectionHighBkgVar%d",ivar), tmpBool);
2264  fIsHighBkgCut[ivar]=tmpBool;
2265  gTools().ReadAttr( parent, Form("PreselectionHighBkgVar%dValue",ivar), tmpDouble);
2266  fHighBkgCut[ivar]=tmpDouble;
2267  gTools().ReadAttr( parent, Form("PreselectionHighSigVar%d",ivar),tmpBool);
2268  fIsHighSigCut[ivar]=tmpBool;
2269  gTools().ReadAttr( parent, Form("PreselectionHighSigVar%dValue",ivar), tmpDouble);
2270  fHighSigCut[ivar]=tmpDouble;
2271  }
2272  }
2273 
2274 
2275  gTools().ReadAttr( parent, "NTrees", ntrees );
2276 
2277  if(gTools().HasAttr(parent, "TreeType")) { // pre 4.1.0 version
2278  gTools().ReadAttr( parent, "TreeType", analysisType );
2279  } else { // from 4.1.0 onwards
2280  gTools().ReadAttr( parent, "AnalysisType", analysisType );
2281  }
2282 
2283  void* ch = gTools().GetChild(parent);
2284  i=0;
2285  while(ch) {
2286  fForest.push_back( dynamic_cast<DecisionTree*>( DecisionTree::CreateFromXML(ch, GetTrainingTMVAVersionCode()) ) );
2287  fForest.back()->SetAnalysisType(Types::EAnalysisType(analysisType));
2288  fForest.back()->SetTreeID(i++);
2289  gTools().ReadAttr(ch,"boostWeight",boostWeight);
2290  fBoostWeights.push_back(boostWeight);
2291  ch = gTools().GetNextChild(ch);
2292  }
2293 }
2294 
2295 ////////////////////////////////////////////////////////////////////////////////
2296 /// Read the weights (BDT coefficients).
2297 
2298 void TMVA::MethodBDT::ReadWeightsFromStream( std::istream& istr )
2299 {
2300  TString dummy;
2301  // Types::EAnalysisType analysisType;
2302  Int_t analysisType(0);
2303 
2304  // coverity[tainted_data_argument]
2305  istr >> dummy >> fNTrees;
2306  Log() << kINFO << "Read " << fNTrees << " Decision trees" << Endl;
2307 
2308  for (UInt_t i=0;i<fForest.size();i++) delete fForest[i];
2309  fForest.clear();
2310  fBoostWeights.clear();
2311  Int_t iTree;
2312  Double_t boostWeight;
2313  for (int i=0;i<fNTrees;i++) {
2314  istr >> dummy >> iTree >> dummy >> boostWeight;
2315  if (iTree != i) {
2316  fForest.back()->Print( std::cout );
2317  Log() << kFATAL << "Error while reading weight file; mismatch iTree="
2318  << iTree << " i=" << i
2319  << " dummy " << dummy
2320  << " boostweight " << boostWeight
2321  << Endl;
2322  }
2323  fForest.push_back( new DecisionTree() );
2324  fForest.back()->SetAnalysisType(Types::EAnalysisType(analysisType));
2325  fForest.back()->SetTreeID(i);
2326  fForest.back()->Read(istr, GetTrainingTMVAVersionCode());
2327  fBoostWeights.push_back(boostWeight);
2328  }
2329 }
2330 
2331 ////////////////////////////////////////////////////////////////////////////////
2332 
2333 Double_t TMVA::MethodBDT::GetMvaValue( Double_t* err, Double_t* errUpper ){
2334  return this->GetMvaValue( err, errUpper, 0 );
2335 }
2336 
2337 ////////////////////////////////////////////////////////////////////////////////
2338 /// Return the MVA value (range [-1;1]) that classifies the
2339 /// event according to the majority vote from the total number of
2340 /// decision trees.
2341 
2342 Double_t TMVA::MethodBDT::GetMvaValue( Double_t* err, Double_t* errUpper, UInt_t useNTrees )
2343 {
2344  const Event* ev = GetEvent();
2345  if (fDoPreselection) {
2346  Double_t val = ApplyPreselectionCuts(ev);
2347  if (TMath::Abs(val)>0.05) return val;
2348  }
2349  return PrivateGetMvaValue(ev, err, errUpper, useNTrees);
2350 
2351 }
2352 
2353 ////////////////////////////////////////////////////////////////////////////////
2354 /// Return the MVA value (range [-1;1]) that classifies the
2355 /// event according to the majority vote from the total number of
2356 /// decision trees.
2357 
2358 Double_t TMVA::MethodBDT::PrivateGetMvaValue(const TMVA::Event* ev, Double_t* err, Double_t* errUpper, UInt_t useNTrees )
2359 {
2360  // cannot determine error
2361  NoErrorCalc(err, errUpper);
2362 
2363  // allow for the possibility to use fewer trees in the actual MVA calculation
2364  // than were originally trained.
2365  UInt_t nTrees = fForest.size();
2366 
2367  if (useNTrees > 0 ) nTrees = useNTrees;
2368 
2369  if (fBoostType=="Grad") return GetGradBoostMVA(ev,nTrees);
2370 
2371  Double_t myMVA = 0;
2372  Double_t norm = 0;
2373  for (UInt_t itree=0; itree<nTrees; itree++) {
2374  //
2375  myMVA += fBoostWeights[itree] * fForest[itree]->CheckEvent(ev,fUseYesNoLeaf);
2376  norm += fBoostWeights[itree];
2377  }
2378  return ( norm > std::numeric_limits<double>::epsilon() ) ? myMVA /= norm : 0 ;
2379 }
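// A minimal sketch (hypothetical names) of the weighted vote above for the non-gradient boost
// types: each tree contributes its output weighted by its boost weight, and the sum is
// normalised by the total boost weight.
//
//    double ToyWeightedVote(const std::vector<double>& treeOutputs,
//                           const std::vector<double>& boostWeights)
//    {
//       double mva = 0, norm = 0;
//       for (size_t i = 0; i < treeOutputs.size(); ++i) {
//          mva  += boostWeights[i]*treeOutputs[i];
//          norm += boostWeights[i];
//       }
//       return norm > 0 ? mva/norm : 0;   // weighted average of the tree outputs
//    }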
2380 
2381 
2382 ////////////////////////////////////////////////////////////////////////////////
2383 /// Get the multiclass MVA response for the BDT classifier.
2384 
2385 const std::vector<Float_t>& TMVA::MethodBDT::GetMulticlassValues()
2386 {
2387  const TMVA::Event *e = GetEvent();
2388  if (fMulticlassReturnVal == NULL) fMulticlassReturnVal = new std::vector<Float_t>();
2389  fMulticlassReturnVal->clear();
2390 
2391  UInt_t nClasses = DataInfo().GetNClasses();
2392  std::vector<Double_t> temp(nClasses);
2393  auto forestSize = fForest.size();
2394  // trees 0, nClasses, 2*nClasses, ... belong to class 0
2395  // trees 1, nClasses+1, 2*nClasses+1, ... belong to class 1 and so forth
2396  UInt_t classOfTree = 0;
2397  for (UInt_t itree = 0; itree < forestSize; ++itree) {
2398  temp[classOfTree] += fForest[itree]->CheckEvent(e, kFALSE);
2399  if (++classOfTree == nClasses) classOfTree = 0; // cheap modulo
2400  }
2401 
2402  // we want to calculate sum of exp(temp[j] - temp[i]) for all i,j (i!=j)
2403  // first calculate exp(), then replace minus with division.
2404  std::transform(temp.begin(), temp.end(), temp.begin(), [](Double_t d){return exp(d);});
2405 
2406  for(UInt_t iClass=0; iClass<nClasses; iClass++){
2407  Double_t norm = 0.0;
2408  for(UInt_t j=0;j<nClasses;j++){
2409  if(iClass!=j)
2410  norm += temp[j] / temp[iClass];
2411  }
2412  (*fMulticlassReturnVal).push_back(1.0/(1.0+norm));
2413  }
2414 
2415  return *fMulticlassReturnVal;
2416 }
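// A minimal sketch (hypothetical names) of the softmax-like normalisation above: given the
// per-class accumulated responses F_i, the probability of class i is
// 1 / (1 + sum_{j!=i} exp(F_j - F_i)), which equals exp(F_i)/sum_j exp(F_j).
//
//    std::vector<double> ToySoftmax(const std::vector<double>& classResponses)
//    {
//       std::vector<double> prob(classResponses.size());
//       for (size_t i = 0; i < classResponses.size(); ++i) {
//          double norm = 0;
//          for (size_t j = 0; j < classResponses.size(); ++j)
//             if (i != j) norm += std::exp(classResponses[j] - classResponses[i]);
//          prob[i] = 1.0/(1.0 + norm);               // class probability, sums to 1 over i
//       }
//       return prob;
//    }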
2417 
2418 ////////////////////////////////////////////////////////////////////////////////
2419 /// Get the regression value generated by the BDTs.
2420 
2421 const std::vector<Float_t> & TMVA::MethodBDT::GetRegressionValues()
2422 {
2423 
2424  if (fRegressionReturnVal == NULL) fRegressionReturnVal = new std::vector<Float_t>();
2425  fRegressionReturnVal->clear();
2426 
2427  const Event * ev = GetEvent();
2428  Event * evT = new Event(*ev);
2429 
2430  Double_t myMVA = 0;
2431  Double_t norm = 0;
2432  if (fBoostType=="AdaBoostR2") {
2433  // rather than using the weighted average of the tree responses in the forest,
2434  // H. Drucker (1997) proposed to use the "weighted median"
2435 
2436  // sort all individual tree responses according to the prediction value
2437  // (keep the association to their tree weight)
2438  // then sum up all the associated weights (starting from the one whose tree
2439  // yielded the smallest response) up to the tree "t" at which you've
2440  // added enough tree weights to have more than half of the sum of all tree weights.
2441  // choose as the forest response the one that belongs to this "t"
2442 
2443  vector< Double_t > response(fForest.size());
2444  vector< Double_t > weight(fForest.size());
2445  Double_t totalSumOfWeights = 0;
2446 
2447  for (UInt_t itree=0; itree<fForest.size(); itree++) {
2448  response[itree] = fForest[itree]->CheckEvent(ev,kFALSE);
2449  weight[itree] = fBoostWeights[itree];
2450  totalSumOfWeights += fBoostWeights[itree];
2451  }
2452 
2453  std::vector< std::vector<Double_t> > vtemp;
2454  vtemp.push_back( response ); // this is the vector that will get sorted
2455  vtemp.push_back( weight );
2456  gTools().UsefulSortAscending( vtemp );
2457 
2458  Int_t t=0;
2459  Double_t sumOfWeights = 0;
2460  while (sumOfWeights <= totalSumOfWeights/2.) {
2461  sumOfWeights += vtemp[1][t];
2462  t++;
2463  }
2464 
2465  Double_t rVal=0;
2466  Int_t count=0;
2467  for (UInt_t i= TMath::Max(UInt_t(0),UInt_t(t-(fForest.size()/6)-0.5));
2468  i< TMath::Min(UInt_t(fForest.size()),UInt_t(t+(fForest.size()/6)+0.5)); i++) {
2469  count++;
2470  rVal+=vtemp[0][i];
2471  }
2472  // fRegressionReturnVal->push_back( rVal/Double_t(count));
2473  evT->SetTarget(0, rVal/Double_t(count) );
2474  }
2475  else if(fBoostType=="Grad"){
2476  for (UInt_t itree=0; itree<fForest.size(); itree++) {
2477  myMVA += fForest[itree]->CheckEvent(ev,kFALSE);
2478  }
2479  // fRegressionReturnVal->push_back( myMVA+fBoostWeights[0]);
2480  evT->SetTarget(0, myMVA+fBoostWeights[0] );
2481  }
2482  else{
2483  for (UInt_t itree=0; itree<fForest.size(); itree++) {
2484  //
2485  myMVA += fBoostWeights[itree] * fForest[itree]->CheckEvent(ev,kFALSE);
2486  norm += fBoostWeights[itree];
2487  }
2488  // fRegressionReturnVal->push_back( ( norm > std::numeric_limits<double>::epsilon() ) ? myMVA /= norm : 0 );
2489  evT->SetTarget(0, ( norm > std::numeric_limits<double>::epsilon() ) ? myMVA /= norm : 0 );
2490  }
2491 
2492 
2493 
2494  const Event* evT2 = GetTransformationHandler().InverseTransform( evT );
2495  fRegressionReturnVal->push_back( evT2->GetTarget(0) );
2496 
2497  delete evT;
2498 
2499 
2500  return *fRegressionReturnVal;
2501 }
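// A minimal sketch (hypothetical names) of the weighted-median regression response used above
// for AdaBoostR2: sort the tree responses, accumulate the associated boost weights until half
// of the total is reached, and take (an average around) the response at that point.
//
//    double ToyWeightedMedian(std::vector<std::pair<double,double>> responseAndWeight)
//    {
//       std::sort(responseAndWeight.begin(), responseAndWeight.end());  // sort by response
//       double total = 0;
//       for (const auto& rw : responseAndWeight) total += rw.second;
//       double running = 0;
//       for (const auto& rw : responseAndWeight) {
//          running += rw.second;
//          if (running > total/2.0) return rw.first;   // first response past half the weight
//       }
//       return responseAndWeight.back().first;
//    }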
2502 
2503 ////////////////////////////////////////////////////////////////////////////////
2504 /// Here we could write some histograms created during the processing
2505 /// to the output file.
2506 
2507 void TMVA::MethodBDT::WriteMonitoringHistosToFile( void ) const
2508 {
2509  Log() << kDEBUG << "\tWrite monitoring histograms to file: " << BaseDir()->GetPath() << Endl;
2510 
2511  //Results* results = Data()->GetResults(GetMethodName(), Types::kTraining, Types::kMaxAnalysisType);
2512  //results->GetStorage()->Write();
2513  fMonitorNtuple->Write();
2514 }
2515 
2516 ////////////////////////////////////////////////////////////////////////////////
2517 /// Return the relative variable importance, normalized to all
2518 /// variables together having the importance 1. The importance is
2519 /// evaluated as the total separation-gain that this variable had in
2520 /// the decision trees (weighted by the number of events).
2521 
2522 std::vector<Double_t> TMVA::MethodBDT::GetVariableImportance()
2523 {
2524  fVariableImportance.resize(GetNvar());
2525  for (UInt_t ivar = 0; ivar < GetNvar(); ivar++) {
2526  fVariableImportance[ivar]=0;
2527  }
2528  Double_t sum=0;
2529  for (UInt_t itree = 0; itree < GetNTrees(); itree++) {
2530  std::vector<Double_t> relativeImportance(fForest[itree]->GetVariableImportance());
2531  for (UInt_t i=0; i< relativeImportance.size(); i++) {
2532  fVariableImportance[i] += fBoostWeights[itree] * relativeImportance[i];
2533  }
2534  }
2535 
2536  for (UInt_t ivar=0; ivar< fVariableImportance.size(); ivar++){
2538  sum += fVariableImportance[ivar];
2539  }
2540  for (UInt_t ivar=0; ivar< fVariableImportance.size(); ivar++) fVariableImportance[ivar] /= sum;
2541 
2542  return fVariableImportance;
2543 }
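// A minimal sketch (hypothetical names) of the final normalisation above: each variable's raw
// importance is accumulated as the boost-weighted separation gain over all trees, and the
// resulting vector is rescaled so that the importances add up to 1.
//
//    std::vector<double> ToyNormalizedImportance(std::vector<double> rawImportance)
//    {
//       double sum = 0;
//       for (double v : rawImportance) sum += v;
//       for (double& v : rawImportance) v /= sum;   // relative importance, sums to 1
//       return rawImportance;
//    }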
2544 
2545 ////////////////////////////////////////////////////////////////////////////////
2546 /// Returns the measure for the variable importance of variable "ivar"
2547 /// which is later used in GetVariableImportance() to calculate the
2548 /// relative variable importances.
2549 
2550 Double_t TMVA::MethodBDT::GetVariableImportance( UInt_t ivar )
2551 {
2552  std::vector<Double_t> relativeImportance = this->GetVariableImportance();
2553  if (ivar < (UInt_t)relativeImportance.size()) return relativeImportance[ivar];
2554  else Log() << kFATAL << "<GetVariableImportance> ivar = " << ivar << " is out of range " << Endl;
2555 
2556  return -1;
2557 }
2558 
2559 ////////////////////////////////////////////////////////////////////////////////
2560 /// Compute ranking of input variables
2561 
2562 const TMVA::Ranking* TMVA::MethodBDT::CreateRanking()
2563 {
2564  // create the ranking object
2565  fRanking = new Ranking( GetName(), "Variable Importance" );
2566  vector< Double_t> importance(this->GetVariableImportance());
2567 
2568  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
2569 
2570  fRanking->AddRank( Rank( GetInputLabel(ivar), importance[ivar] ) );
2571  }
2572 
2573  return fRanking;
2574 }
2575 
2576 ////////////////////////////////////////////////////////////////////////////////
2577 /// Get help message text.
2578 
2579 void TMVA::MethodBDT::GetHelpMessage() const
2580 {
2581  Log() << Endl;
2582  Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
2583  Log() << Endl;
2584  Log() << "Boosted Decision Trees are a collection of individual decision" << Endl;
2585  Log() << "trees which form a multivariate classifier by (weighted) majority " << Endl;
2586  Log() << "vote of the individual trees. Consecutive decision trees are " << Endl;
2587  Log() << "trained using the original training data set with re-weighted " << Endl;
2588  Log() << "events. By default, the AdaBoost method is employed, which gives " << Endl;
2589  Log() << "events that were misclassified in the previous tree a larger " << Endl;
2590  Log() << "weight in the training of the following tree." << Endl;
2591  Log() << Endl;
2592  Log() << "Decision trees are a sequence of binary splits of the data sample" << Endl;
2593  Log() << "using a single discriminant variable at a time. A test event " << Endl;
2594  Log() << "ending up after the sequence of left-right splits in a final " << Endl;
2595  Log() << "(\"leaf\") node is classified as either signal or background" << Endl;
2596  Log() << "depending on the majority type of training events in that node." << Endl;
2597  Log() << Endl;
2598  Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
2599  Log() << Endl;
2600  Log() << "By the nature of the binary splits performed on the individual" << Endl;
2601  Log() << "variables, decision trees do not deal well with linear correlations" << Endl;
2602  Log() << "between variables (they need to approximate the linear split in" << Endl;
2603  Log() << "the two dimensional space by a sequence of splits on the two " << Endl;
2604  Log() << "variables individually). Hence decorrelation could be useful " << Endl;
2605  Log() << "to optimise the BDT performance." << Endl;
2606  Log() << Endl;
2607  Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
2608  Log() << Endl;
2609  Log() << "The two most important parameters in the configuration are the " << Endl;
2610  Log() << "minimal number of events requested by a leaf node as percentage of the " <<Endl;
2611  Log() << " number of training events (option \"MinNodeSize\", replacing the actual number " << Endl;
2612  Log() << " of events \"nEventsMin\" as given in earlier versions)." << Endl;
2613  Log() << "If this number is too large, detailed features " << Endl;
2614  Log() << "in the parameter space are hard to model. If it is too small, " << Endl;
2615  Log() << "the risk of overtraining rises and boosting becomes less effective. " << Endl;
2616  Log() << " Typical values from our current experience for best performance " << Endl;
2617  Log() << " are between 0.5(%) and 10(%). " << Endl;
2618  Log() << Endl;
2619  Log() << "The default minimal number is currently set to " << Endl;
2620  Log() << " max(20, (N_training_events / N_variables^2 / 10)) " << Endl;
2621  Log() << "and can be changed by the user." << Endl;
2622  Log() << Endl;
2623  Log() << "The other crucial parameter, the pruning strength (\"PruneStrength\")," << Endl;
2624  Log() << "is also related to overtraining. It is a regularisation parameter " << Endl;
2625  Log() << "that is used when determining after the training which splits " << Endl;
2626  Log() << "are considered statistically insignificant and are removed. The" << Endl;
2627  Log() << "user is advised to carefully watch the BDT screen output for" << Endl;
2628  Log() << "the comparison between efficiencies obtained on the training and" << Endl;
2629  Log() << "the independent test sample. They should be equal within statistical" << Endl;
2630  Log() << "errors, in order to minimize statistical fluctuations in different samples." << Endl;
2631 }
2632 
2633 ////////////////////////////////////////////////////////////////////////////////
2634 /// Make ROOT-independent C++ class for classifier response (classifier-specific implementation).
2635 
2636 void TMVA::MethodBDT::MakeClassSpecific( std::ostream& fout, const TString& className ) const
2637 {
2638  TString nodeName = className;
2639  nodeName.ReplaceAll("Read","");
2640  nodeName.Append("Node");
2641  // write BDT-specific classifier response
2642  fout << " std::vector<"<<nodeName<<"*> fForest; // i.e. root nodes of decision trees" << std::endl;
2643  fout << " std::vector<double> fBoostWeights; // the weights applied in the individual boosts" << std::endl;
2644  fout << "};" << std::endl << std::endl;
2645  fout << "double " << className << "::GetMvaValue__( const std::vector<double>& inputValues ) const" << std::endl;
2646  fout << "{" << std::endl;
2647  fout << " double myMVA = 0;" << std::endl;
2648  if (fDoPreselection){
2649  for (UInt_t ivar = 0; ivar< fIsLowBkgCut.size(); ivar++){
2650  if (fIsLowBkgCut[ivar]){
2651  fout << " if (inputValues["<<ivar<<"] < " << fLowBkgCut[ivar] << ") return -1; // is background preselection cut" << std::endl;
2652  }
2653  if (fIsLowSigCut[ivar]){
2654  fout << " if (inputValues["<<ivar<<"] < "<< fLowSigCut[ivar] << ") return 1; // is signal preselection cut" << std::endl;
2655  }
2656  if (fIsHighBkgCut[ivar]){
2657  fout << " if (inputValues["<<ivar<<"] > "<<fHighBkgCut[ivar] <<") return -1; // is background preselection cut" << std::endl;
2658  }
2659  if (fIsHighSigCut[ivar]){
2660  fout << " if (inputValues["<<ivar<<"] > "<<fHighSigCut[ivar]<<") return 1; // is signal preselection cut" << std::endl;
2661  }
2662  }
2663  }
2664 
2665  if (fBoostType!="Grad"){
2666  fout << " double norm = 0;" << std::endl;
2667  }
2668  fout << " for (unsigned int itree=0; itree<fForest.size(); itree++){" << std::endl;
2669  fout << " "<<nodeName<<" *current = fForest[itree];" << std::endl;
2670  fout << " while (current->GetNodeType() == 0) { //intermediate node" << std::endl;
2671  fout << " if (current->GoesRight(inputValues)) current=("<<nodeName<<"*)current->GetRight();" << std::endl;
2672  fout << " else current=("<<nodeName<<"*)current->GetLeft();" << std::endl;
2673  fout << " }" << std::endl;
2674  if (fBoostType=="Grad"){
2675  fout << " myMVA += current->GetResponse();" << std::endl;
2676  }else{
2677  if (fUseYesNoLeaf) fout << " myMVA += fBoostWeights[itree] * current->GetNodeType();" << std::endl;
2678  else fout << " myMVA += fBoostWeights[itree] * current->GetPurity();" << std::endl;
2679  fout << " norm += fBoostWeights[itree];" << std::endl;
2680  }
2681  fout << " }" << std::endl;
2682  if (fBoostType=="Grad"){
2683  fout << " return 2.0/(1.0+exp(-2.0*myMVA))-1.0;" << std::endl;
2684  }
2685  else fout << " return myMVA /= norm;" << std::endl;
2686  fout << "};" << std::endl << std::endl;
2687  fout << "void " << className << "::Initialize()" << std::endl;
2688  fout << "{" << std::endl;
2689  //Now for each decision tree, write directly the constructors of the nodes in the tree structure
2690  for (UInt_t itree=0; itree<GetNTrees(); itree++) {
2691  fout << " // itree = " << itree << std::endl;
2692  fout << " fBoostWeights.push_back(" << fBoostWeights[itree] << ");" << std::endl;
2693  fout << " fForest.push_back( " << std::endl;
2694  this->MakeClassInstantiateNode((DecisionTreeNode*)fForest[itree]->GetRoot(), fout, className);
2695  fout <<" );" << std::endl;
2696  }
2697  fout << " return;" << std::endl;
2698  fout << "};" << std::endl;
2699  fout << " " << std::endl;
2700  fout << "// Clean up" << std::endl;
2701  fout << "inline void " << className << "::Clear() " << std::endl;
2702  fout << "{" << std::endl;
2703  fout << " for (unsigned int itree=0; itree<fForest.size(); itree++) { " << std::endl;
2704  fout << " delete fForest[itree]; " << std::endl;
2705  fout << " }" << std::endl;
2706  fout << "}" << std::endl;
2707 }
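// Usage sketch only, not part of MethodBDT.cxx: it assumes the usual TMVA
// conventions that the complete standalone class is written to a "*.class.C"
// file by the generic MakeClass() machinery, that the class name is "ReadBDT"
// (so nodeName above becomes "BDTNode"), and that the generated class exposes a
// constructor taking the input-variable names plus GetMvaValue(). The file
// name, class name and variable names below are assumptions for illustration.

#include <string>
#include <vector>
#include "TMVAClassification_BDT.class.C"   // hypothetical generated file

double StandaloneBDTResponse(const std::vector<double>& inputValues)
{
   std::vector<std::string> vars = {"var1", "var2", "var3", "var4"};  // must match the training variables
   ReadBDT bdt(vars);                    // Initialize() rebuilds fForest and fBoostWeights
   return bdt.GetMvaValue(inputValues);  // walks every tree exactly as GetMvaValue__ above
}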
2708 
2709 ////////////////////////////////////////////////////////////////////////////////
2710 /// Specific class header.
2711 
2712 void TMVA::MethodBDT::MakeClassSpecificHeader( std::ostream& fout, const TString& className) const
2713 {
2714  TString nodeName = className;
2715  nodeName.ReplaceAll("Read","");
2716  nodeName.Append("Node");
2717  //fout << "#ifndef NN" << std::endl; commented out on purpose see next line
2718  fout << "#define NN new "<<nodeName << std::endl; // NN definition depends on individual methods. Important to have NO #ifndef if several BDT methods compile together
2719  //fout << "#endif" << std::endl; commented out on purpose see previous line
2720  fout << " " << std::endl;
2721  fout << "#ifndef "<<nodeName<<"__def" << std::endl;
2722  fout << "#define "<<nodeName<<"__def" << std::endl;
2723  fout << " " << std::endl;
2724  fout << "class "<<nodeName<<" {" << std::endl;
2725  fout << " " << std::endl;
2726  fout << "public:" << std::endl;
2727  fout << " " << std::endl;
2728  fout << " // constructor of an essentially \"empty\" node floating in space" << std::endl;
2729  fout << " "<<nodeName<<" ( "<<nodeName<<"* left,"<<nodeName<<"* right," << std::endl;
2730  if (fUseFisherCuts){
2731  fout << " int nFisherCoeff," << std::endl;
2732  for (UInt_t i=0;i<GetNVariables()+1;i++){
2733  fout << " double fisherCoeff"<<i<<"," << std::endl;
2734  }
2735  }
2736  fout << " int selector, double cutValue, bool cutType, " << std::endl;
2737  fout << " int nodeType, double purity, double response ) :" << std::endl;
2738  fout << " fLeft ( left )," << std::endl;
2739  fout << " fRight ( right )," << std::endl;
2740  if (fUseFisherCuts) fout << " fNFisherCoeff ( nFisherCoeff )," << std::endl;
2741  fout << " fSelector ( selector )," << std::endl;
2742  fout << " fCutValue ( cutValue )," << std::endl;
2743  fout << " fCutType ( cutType )," << std::endl;
2744  fout << " fNodeType ( nodeType )," << std::endl;
2745  fout << " fPurity ( purity )," << std::endl;
2746  fout << " fResponse ( response ){" << std::endl;
2747  if (fUseFisherCuts){
2748  for (UInt_t i=0;i<GetNVariables()+1;i++){
2749  fout << " fFisherCoeff.push_back(fisherCoeff"<<i<<");" << std::endl;
2750  }
2751  }
2752  fout << " }" << std::endl << std::endl;
2753  fout << " virtual ~"<<nodeName<<"();" << std::endl << std::endl;
2754  fout << " // test event if it descends the tree at this node to the right" << std::endl;
2755  fout << " virtual bool GoesRight( const std::vector<double>& inputValues ) const;" << std::endl;
2756  fout << " "<<nodeName<<"* GetRight( void ) {return fRight; };" << std::endl << std::endl;
2757  fout << " // test event if it descends the tree at this node to the left " << std::endl;
2758  fout << " virtual bool GoesLeft ( const std::vector<double>& inputValues ) const;" << std::endl;
2759  fout << " "<<nodeName<<"* GetLeft( void ) { return fLeft; }; " << std::endl << std::endl;
2760  fout << " // return S/(S+B) (purity) at this node (from training)" << std::endl << std::endl;
2761  fout << " double GetPurity( void ) const { return fPurity; } " << std::endl;
2762  fout << " // return the node type" << std::endl;
2763  fout << " int GetNodeType( void ) const { return fNodeType; }" << std::endl;
2764  fout << " double GetResponse(void) const {return fResponse;}" << std::endl << std::endl;
2765  fout << "private:" << std::endl << std::endl;
2766  fout << " "<<nodeName<<"* fLeft; // pointer to the left daughter node" << std::endl;
2767  fout << " "<<nodeName<<"* fRight; // pointer to the right daughter node" << std::endl;
2768  if (fUseFisherCuts){
2769  fout << " int fNFisherCoeff; // =0 if this node doesn't use fisher, else =nvar+1 " << std::endl;
2770  fout << " std::vector<double> fFisherCoeff; // the fisher coeff (offset at the last element)" << std::endl;
2771  }
2772  fout << " int fSelector; // index of variable used in node selection (decision tree) " << std::endl;
2773  fout << " double fCutValue; // cut value applied on this node to discriminate bkg against sig" << std::endl;
2774  fout << " bool fCutType; // true: if event variable > cutValue ==> signal , false otherwise" << std::endl;
2775  fout << " int fNodeType; // Type of node: -1 == Bkg-leaf, 1 == Signal-leaf, 0 = internal " << std::endl;
2776  fout << " double fPurity; // Purity of node from training"<< std::endl;
2777  fout << " double fResponse; // Regression response value of node" << std::endl;
2778  fout << "}; " << std::endl;
2779  fout << " " << std::endl;
2780  fout << "//_______________________________________________________________________" << std::endl;
2781  fout << " "<<nodeName<<"::~"<<nodeName<<"()" << std::endl;
2782  fout << "{" << std::endl;
2783  fout << " if (fLeft != NULL) delete fLeft;" << std::endl;
2784  fout << " if (fRight != NULL) delete fRight;" << std::endl;
2785  fout << "}; " << std::endl;
2786  fout << " " << std::endl;
2787  fout << "//_______________________________________________________________________" << std::endl;
2788  fout << "bool "<<nodeName<<"::GoesRight( const std::vector<double>& inputValues ) const" << std::endl;
2789  fout << "{" << std::endl;
2790  fout << " // test event if it descends the tree at this node to the right" << std::endl;
2791  fout << " bool result;" << std::endl;
2792  if (fUseFisherCuts){
2793  fout << " if (fNFisherCoeff == 0){" << std::endl;
2794  fout << " result = (inputValues[fSelector] > fCutValue );" << std::endl;
2795  fout << " }else{" << std::endl;
2796  fout << " double fisher = fFisherCoeff.at(fFisherCoeff.size()-1);" << std::endl;
2797  fout << " for (unsigned int ivar=0; ivar<fFisherCoeff.size()-1; ivar++)" << std::endl;
2798  fout << " fisher += fFisherCoeff.at(ivar)*inputValues.at(ivar);" << std::endl;
2799  fout << " result = fisher > fCutValue;" << std::endl;
2800  fout << " }" << std::endl;
2801  }else{
2802  fout << " result = (inputValues[fSelector] > fCutValue );" << std::endl;
2803  }
2804  fout << " if (fCutType == true) return result; //the cuts are selecting Signal ;" << std::endl;
2805  fout << " else return !result;" << std::endl;
2806  fout << "}" << std::endl;
2807  fout << " " << std::endl;
2808  fout << "//_______________________________________________________________________" << std::endl;
2809  fout << "bool "<<nodeName<<"::GoesLeft( const std::vector<double>& inputValues ) const" << std::endl;
2810  fout << "{" << std::endl;
2811  fout << " // test event if it descends the tree at this node to the left" << std::endl;
2812  fout << " if (!this->GoesRight(inputValues)) return true;" << std::endl;
2813  fout << " else return false;" << std::endl;
2814  fout << "}" << std::endl;
2815  fout << " " << std::endl;
2816  fout << "#endif" << std::endl;
2817  fout << " " << std::endl;
2818 }
2819 
2820 ////////////////////////////////////////////////////////////////////////////////
2821 /// Recursively descends a tree and writes the node instance to the output stream.
2822 
2823 void TMVA::MethodBDT::MakeClassInstantiateNode( DecisionTreeNode *n, std::ostream& fout, const TString& className ) const
2824 {
2825  if (n == NULL) {
2826  Log() << kFATAL << "MakeClassInstantiateNode: started with undefined node" <<Endl;
2827  return ;
2828  }
2829  fout << "NN("<<std::endl;
2830  if (n->GetLeft() != NULL){
2831  this->MakeClassInstantiateNode( (DecisionTreeNode*)n->GetLeft() , fout, className);
2832  }
2833  else {
2834  fout << "0";
2835  }
2836  fout << ", " <<std::endl;
2837  if (n->GetRight() != NULL){
2838  this->MakeClassInstantiateNode( (DecisionTreeNode*)n->GetRight(), fout, className );
2839  }
2840  else {
2841  fout << "0";
2842  }
2843  fout << ", " << std::endl
2844  << std::setprecision(6);
2845  if (fUseFisherCuts){
2846  fout << n->GetNFisherCoeff() << ", ";
2847  for (UInt_t i=0; i< GetNVariables()+1; i++) {
2848  if (n->GetNFisherCoeff() == 0 ){
2849  fout << "0, ";
2850  }else{
2851  fout << n->GetFisherCoeff(i) << ", ";
2852  }
2853  }
2854  }
2855  fout << n->GetSelector() << ", "
2856  << n->GetCutValue() << ", "
2857  << n->GetCutType() << ", "
2858  << n->GetNodeType() << ", "
2859  << n->GetPurity() << ","
2860  << n->GetResponse() << ") ";
2861 }
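// For orientation, this is the shape of the code the recursion above emits into
// Initialize() for a depth-1 tree without Fisher cuts (all numbers invented; the
// argument order follows the node constructor written by MakeClassSpecificHeader,
// and NN expands to "new <nodeName>" via the #define emitted there):
//
//   fBoostWeights.push_back(0.52);
//   fForest.push_back(
//   NN(
//   NN(
//   0,
//   0,
//   0, 0, 1, -1, 0.08, -0.84),    // left leaf:  nodeType -1 => background-like
//   NN(
//   0,
//   0,
//   0, 0, 1, 1, 0.91, 0.82),      // right leaf: nodeType +1 => signal-like
//   2, 1.7352, 1, 0, 0.47, 0.03)  // root: selector 2, cutValue 1.7352, intermediate node (type 0)
//   );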
2862 
2863 ////////////////////////////////////////////////////////////////////////////////
2864 /// Find useful preselection cuts that will be applied before
2865 /// the Decision Tree training (and of course also applied
2866 /// in GetMVA: -1 for background, +1 for signal).
2867 
2868 void TMVA::MethodBDT::DeterminePreselectionCuts(const std::vector<const TMVA::Event*>& eventSample)
2869 {
2870  Double_t nTotS = 0.0, nTotB = 0.0;
2871  Int_t nTotS_unWeighted = 0, nTotB_unWeighted = 0;
2872 
2873  std::vector<TMVA::BDTEventWrapper> bdtEventSample;
2874 
2875  fIsLowSigCut.assign(GetNvar(),kFALSE);
2876  fIsLowBkgCut.assign(GetNvar(),kFALSE);
2877  fIsHighSigCut.assign(GetNvar(),kFALSE);
2878  fIsHighBkgCut.assign(GetNvar(),kFALSE);
2879 
2880  fLowSigCut.assign(GetNvar(),0.); // ---------------| --> in var is signal (accept all above lower cut)
2881  fLowBkgCut.assign(GetNvar(),0.); // ---------------| --> in var is bkg (accept all above lower cut)
2882  fHighSigCut.assign(GetNvar(),0.); // <-- | -------------- in var is signal (accept all below cut)
2883  fHighBkgCut.assign(GetNvar(),0.); // <-- | -------------- in var is bkg (accept all below cut)
2884 
2885 
2886  // Initialize (un)weighted counters for signal & background
2887  // Construct a list of event wrappers that point to the original data
2888  for( std::vector<const TMVA::Event*>::const_iterator it = eventSample.begin(); it != eventSample.end(); ++it ) {
2889  if (DataInfo().IsSignal(*it)){
2890  nTotS += (*it)->GetWeight();
2891  ++nTotS_unWeighted;
2892  }
2893  else {
2894  nTotB += (*it)->GetWeight();
2895  ++nTotB_unWeighted;
2896  }
2897  bdtEventSample.push_back(TMVA::BDTEventWrapper(*it));
2898  }
2899 
2900  for( UInt_t ivar = 0; ivar < GetNvar(); ivar++ ) { // loop over all discriminating variables
2901  TMVA::BDTEventWrapper::SetVarIndex(ivar); // select the variable to sort by
2902  std::sort( bdtEventSample.begin(),bdtEventSample.end() ); // sort the event data
2903 
2904  Double_t bkgWeightCtr = 0.0, sigWeightCtr = 0.0;
2905  std::vector<TMVA::BDTEventWrapper>::iterator it = bdtEventSample.begin(), it_end = bdtEventSample.end();
2906  for( ; it != it_end; ++it ) {
2907  if (DataInfo().IsSignal(**it))
2908  sigWeightCtr += (**it)->GetWeight();
2909  else
2910  bkgWeightCtr += (**it)->GetWeight();
2911  // Store the accumulated signal (background) weights
2912  it->SetCumulativeWeight(false,bkgWeightCtr);
2913  it->SetCumulativeWeight(true,sigWeightCtr);
2914  }
2915 
2916  // dVal is a safety margin that determines how close to the boundary found in the training data
2917  // the preselection cut is placed. Here I chose 1% of the variable range...
2918  Double_t dVal = (DataInfo().GetVariableInfo(ivar).GetMax() - DataInfo().GetVariableInfo(ivar).GetMin())/100. ;
2919  Double_t nSelS, nSelB, effS=0.05, effB=0.05, rejS=0.05, rejB=0.05;
2920  Double_t tmpEffS, tmpEffB, tmpRejS, tmpRejB;
2921  // Locate the optimal cut for this (ivar-th) variable
2922 
2923 
2924 
2925  for(UInt_t iev = 1; iev < bdtEventSample.size(); iev++) {
2926  //dVal = bdtEventSample[iev].GetVal() - bdtEventSample[iev-1].GetVal();
2927 
2928  nSelS = bdtEventSample[iev].GetCumulativeWeight(true);
2929  nSelB = bdtEventSample[iev].GetCumulativeWeight(false);
2930  // look for a 100% efficient pre-selection cut that removes background, i.e. nSelS==0 && nSelB > 5% of nTotB (or, for signal, nSelB==0 && nSelS > 5% of nTotS)
2931  tmpEffS=nSelS/nTotS;
2932  tmpEffB=nSelB/nTotB;
2933  tmpRejS=1-tmpEffS;
2934  tmpRejB=1-tmpEffB;
2935  if (nSelS==0 && tmpEffB>effB) {effB=tmpEffB; fLowBkgCut[ivar] = bdtEventSample[iev].GetVal() - dVal; fIsLowBkgCut[ivar]=kTRUE;}
2936  else if (nSelB==0 && tmpEffS>effS) {effS=tmpEffS; fLowSigCut[ivar] = bdtEventSample[iev].GetVal() - dVal; fIsLowSigCut[ivar]=kTRUE;}
2937  else if (nSelB==nTotB && tmpRejS>rejS) {rejS=tmpRejS; fHighSigCut[ivar] = bdtEventSample[iev].GetVal() + dVal; fIsHighSigCut[ivar]=kTRUE;}
2938  else if (nSelS==nTotS && tmpRejB>rejB) {rejB=tmpRejB; fHighBkgCut[ivar] = bdtEventSample[iev].GetVal() + dVal; fIsHighBkgCut[ivar]=kTRUE;}
2939 
2940  }
2941  }
2942 
2943  Log() << kDEBUG << " \tfound and suggests the following possible pre-selection cuts " << Endl;
2944  if (fDoPreselection) Log() << kDEBUG << "\tthe training will be done after applying these cuts... and GetMVA returns +1 (-1) for a signal (bkg) event that passes these cuts" << Endl;
2945  else Log() << kDEBUG << "\tas the option DoPreselection was not used, these cuts will not be applied, and the training will see the full sample"<<Endl;
2946  for (UInt_t ivar=0; ivar < GetNvar(); ivar++ ) { // loop over all discriminating variables
2947  if (fIsLowBkgCut[ivar]){
2948  Log() << kDEBUG << " \tfound cut: Bkg if var " << ivar << " < " << fLowBkgCut[ivar] << Endl;
2949  }
2950  if (fIsLowSigCut[ivar]){
2951  Log() << kDEBUG << " \tfound cut: Sig if var " << ivar << " < " << fLowSigCut[ivar] << Endl;
2952  }
2953  if (fIsHighBkgCut[ivar]){
2954  Log() << kDEBUG << " \tfound cut: Bkg if var " << ivar << " > " << fHighBkgCut[ivar] << Endl;
2955  }
2956  if (fIsHighSigCut[ivar]){
2957  Log() << kDEBUG << " \tfound cut: Sig if var " << ivar << " > " << fHighSigCut[ivar] << Endl;
2958  }
2959  }
2960 
2961  return;
2962 }
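// Worked example of the scan above (numbers invented): suppose that for variable 0,
// after sorting, every event below x = 3.0 is background and those events carry 12%
// of the total background weight, while the variable range is [0, 100] so that
// dVal = (100 - 0)/100 = 1. At that point nSelS == 0 and tmpEffB = 0.12 > 0.05, so
// the scan records fLowBkgCut[0] = 3.0 - 1.0 = 2.0 and fIsLowBkgCut[0] = kTRUE:
// with DoPreselection enabled, any event with variable 0 below 2.0 is later tagged
// as background (-1) without ever consulting the forest.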
2963 
2964 ////////////////////////////////////////////////////////////////////////////////
2965 /// Apply the preselection cuts before even bothering about any
2966 /// Decision Trees in GetMVA: -1 for background, +1 for signal.
2967 
2968 Double_t TMVA::MethodBDT::ApplyPreselectionCuts( const Event* ev)
2969 {
2970  Double_t result=0;
2971 
2972  for (UInt_t ivar=0; ivar < GetNvar(); ivar++ ) { // loop over all discriminating variables
2973  if (fIsLowBkgCut[ivar]){
2974  if (ev->GetValue(ivar) < fLowBkgCut[ivar]) result = -1; // is background
2975  }
2976  if (fIsLowSigCut[ivar]){
2977  if (ev->GetValue(ivar) < fLowSigCut[ivar]) result = 1; // is signal
2978  }
2979  if (fIsHighBkgCut[ivar]){
2980  if (ev->GetValue(ivar) > fHighBkgCut[ivar]) result = -1; // is background
2981  }
2982  if (fIsHighSigCut[ivar]){
2983  if (ev->GetValue(ivar) > fHighSigCut[ivar]) result = 1; // is signal
2984  }
2985  }
2986 
2987  return result;
2988 }
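// Sketch of how this verdict is meant to be consumed (the actual combination with
// the forest response lives in GetMvaValue earlier in this file and its exact
// threshold test may differ; the lines below only illustrate the idea that a
// nonzero preselection verdict bypasses the trees):
//
//   Double_t verdict = ApplyPreselectionCuts(ev);
//   if (verdict != 0.) return verdict;       // event decided by a preselection cut alone
//   return PrivateGetMvaValue(ev);           // otherwise evaluate the boosted forest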
2989 