Logo ROOT   6.07/09
Reference Guide
MethodBDT.cxx
Go to the documentation of this file.
1 // Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss, Eckhard v. Toerne, Jan Therhaag
2 
3 /**********************************************************************************
4  * Project: TMVA - a Root-integrated toolkit for multivariate data analysis *
5  * Package: TMVA *
6  * Class : MethodBDT (BDT = Boosted Decision Trees) *
7  * Web : http://tmva.sourceforge.net *
8  * *
9  * Description: *
10  * Analysis of Boosted Decision Trees *
11  * *
12  * Authors (alphabetical): *
13  * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
14  * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
15  * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
16  * Doug Schouten <dschoute@sfu.ca> - Simon Fraser U., Canada *
17  * Jan Therhaag <jan.therhaag@cern.ch> - U. of Bonn, Germany *
18  * Eckhard v. Toerne <evt@uni-bonn.de> - U of Bonn, Germany *
19  * *
20  * Copyright (c) 2005-2011: *
21  * CERN, Switzerland *
22  * U. of Victoria, Canada *
23  * MPI-K Heidelberg, Germany *
24  * U. of Bonn, Germany *
25  * *
26  * Redistribution and use in source and binary forms, with or without *
27  * modification, are permitted according to the terms listed in LICENSE *
28  * (http://tmva.sourceforge.net/LICENSE) *
29  **********************************************************************************/
30 
31 //_______________________________________________________________________
32 //
33 // Analysis of Boosted Decision Trees
34 //
35 // Boosted decision trees have been successfully used in High Energy
36 // Physics analysis for example by the MiniBooNE experiment
37 // (Yang-Roe-Zhu, physics/0508045). In Boosted Decision Trees, the
38 // selection is done on a majority vote on the result of several decision
39 // trees, which are all derived from the same training sample by
40 // supplying different event weights during the training.
41 //
42 // Decision trees:
43 //
44 // Successive decision nodes are used to categorize the
45 // events out of the sample as either signal or background. Each node
46 // uses only a single discriminating variable to decide if the event is
47 // signal-like ("goes right") or background-like ("goes left"). This
48 // forms a tree like structure with "baskets" at the end (leave nodes),
49 // and an event is classified as either signal or background according to
50 // whether the basket where it ends up has been classified signal or
51 // background during the training. Training of a decision tree is the
52 // process to define the "cut criteria" for each node. The training
53 // starts with the root node. Here one takes the full training event
54 // sample and selects the variable and corresponding cut value that gives
55 // the best separation between signal and background at this stage. Using
56 // this cut criterion, the sample is then divided into two subsamples, a
57 // signal-like (right) and a background-like (left) sample. Two new nodes
58 // are then created for each of the two sub-samples and they are
59 // constructed using the same mechanism as described for the root
60 // node. The devision is stopped once a certain node has reached either a
61 // minimum number of events, or a minimum or maximum signal purity. These
62 // leave nodes are then called "signal" or "background" if they contain
63 // more signal respective background events from the training sample.
64 //
65 // Boosting:
66 //
67 // The idea behind adaptive boosting (AdaBoost) is, that signal events
68 // from the training sample, that end up in a background node
69 // (and vice versa) are given a larger weight than events that are in
70 // the correct leave node. This results in a re-weighed training event
71 // sample, with which then a new decision tree can be developed.
72 // The boosting can be applied several times (typically 100-500 times)
73 // and one ends up with a set of decision trees (a forest).
74 // Gradient boosting works more like a function expansion approach, where
75 // each tree corresponds to a summand. The parameters for each summand (tree)
76 // are determined by the minimization of a error function (binomial log-
77 // likelihood for classification and Huber loss for regression).
78 // A greedy algorithm is used, which means, that only one tree is modified
79 // at a time, while the other trees stay fixed.
80 //
81 // Bagging:
82 //
83 // In this particular variant of the Boosted Decision Trees the boosting
84 // is not done on the basis of previous training results, but by a simple
85 // stochastic re-sampling of the initial training event sample.
86 //
87 // Random Trees:
88 // Similar to the "Random Forests" from Leo Breiman and Adele Cutler, it
89 // uses the bagging algorithm together and bases the determination of the
90 // best node-split during the training on a random subset of variables only
91 // which is individually chosen for each split.
92 //
93 // Analysis:
94 //
95 // Applying an individual decision tree to a test event results in a
96 // classification of the event as either signal or background. For the
97 // boosted decision tree selection, an event is successively subjected to
98 // the whole set of decision trees and depending on how often it is
99 // classified as signal, a "likelihood" estimator is constructed for the
100 // event being signal or background. The value of this estimator is the
101 // one which is then used to select the events from an event sample, and
102 // the cut value on this estimator defines the efficiency and purity of
103 // the selection.
104 //
105 //_______________________________________________________________________
106 
107 #include "TMVA/MethodBDT.h"
108 
109 #include "TMVA/BDTEventWrapper.h"
110 #include "TMVA/BinarySearchTree.h"
111 #include "TMVA/ClassifierFactory.h"
112 #include "TMVA/Configurable.h"
113 #include "TMVA/CrossEntropy.h"
114 #include "TMVA/DecisionTree.h"
115 #include "TMVA/DataSet.h"
116 #include "TMVA/GiniIndex.h"
118 #include "TMVA/Interval.h"
119 #include "TMVA/IMethod.h"
120 #include "TMVA/LogInterval.h"
121 #include "TMVA/MethodBase.h"
123 #include "TMVA/MsgLogger.h"
125 #include "TMVA/PDF.h"
126 #include "TMVA/Ranking.h"
127 #include "TMVA/Results.h"
128 #include "TMVA/ResultsMulticlass.h"
129 #include "TMVA/SdivSqrtSplusB.h"
130 #include "TMVA/SeparationBase.h"
131 #include "TMVA/Timer.h"
132 #include "TMVA/Tools.h"
133 #include "TMVA/Types.h"
134 
135 #include "Riostream.h"
136 #include "TDirectory.h"
137 #include "TRandom3.h"
138 #include "TMath.h"
139 #include "TMatrixTSym.h"
140 #include "TObjString.h"
141 #include "TGraph.h"
142 
143 #include <algorithm>
144 #include <fstream>
145 #include <math.h>
146 
147 
148 using std::vector;
149 using std::make_pair;
150 
152 
153 ClassImp(TMVA::MethodBDT)
154 
155  const Int_t TMVA::MethodBDT::fgDebugLevel = 0;
156 
157 ////////////////////////////////////////////////////////////////////////////////
158 /// the standard constructor for the "boosted decision trees"
159 
160 TMVA::MethodBDT::MethodBDT( const TString& jobName,
161  const TString& methodTitle,
162  DataSetInfo& theData,
163  const TString& theOption ) :
164  TMVA::MethodBase( jobName, Types::kBDT, methodTitle, theData, theOption)
165  , fTrainSample(0)
166  , fNTrees(0)
167  , fSigToBkgFraction(0)
168  , fAdaBoostBeta(0)
169 // , fTransitionPoint(0)
170  , fShrinkage(0)
171  , fBaggedBoost(kFALSE)
172  , fBaggedGradBoost(kFALSE)
173 // , fSumOfWeights(0)
174  , fMinNodeEvents(0)
175  , fMinNodeSize(5)
176  , fMinNodeSizeS("5%")
177  , fNCuts(0)
178  , fUseFisherCuts(0) // don't use this initialisation, only here to make Coverity happy. Is set in DeclarOptions()
179  , fMinLinCorrForFisher(.8) // don't use this initialisation, only here to make Coverity happy. Is set in DeclarOptions()
180  , fUseExclusiveVars(0) // don't use this initialisation, only here to make Coverity happy. Is set in DeclarOptions()
181  , fUseYesNoLeaf(kFALSE)
182  , fNodePurityLimit(0)
183  , fNNodesMax(0)
184  , fMaxDepth(0)
185  , fPruneMethod(DecisionTree::kNoPruning)
186  , fPruneStrength(0)
187  , fFValidationEvents(0)
188  , fAutomatic(kFALSE)
189  , fRandomisedTrees(kFALSE)
190  , fUseNvars(0)
191  , fUsePoissonNvars(0) // don't use this initialisation, only here to make Coverity happy. Is set in Init()
192  , fUseNTrainEvents(0)
193  , fBaggedSampleFraction(0)
194  , fNoNegWeightsInTraining(kFALSE)
195  , fInverseBoostNegWeights(kFALSE)
196  , fPairNegWeightsGlobal(kFALSE)
197  , fTrainWithNegWeights(kFALSE)
198  , fDoBoostMonitor(kFALSE)
199  , fITree(0)
200  , fBoostWeight(0)
201  , fErrorFraction(0)
202  , fCss(0)
203  , fCts_sb(0)
204  , fCtb_ss(0)
205  , fCbb(0)
206  , fDoPreselection(kFALSE)
207  , fSkipNormalization(kFALSE)
208  , fHistoricBool(kFALSE)
209 {
210  fMonitorNtuple = NULL;
211  fSepType = NULL;
212 }
213 
214 ////////////////////////////////////////////////////////////////////////////////
215 
217  const TString& theWeightFile)
218  : TMVA::MethodBase( Types::kBDT, theData, theWeightFile)
219  , fTrainSample(0)
220  , fNTrees(0)
221  , fSigToBkgFraction(0)
222  , fAdaBoostBeta(0)
223 // , fTransitionPoint(0)
224  , fShrinkage(0)
225  , fBaggedBoost(kFALSE)
226  , fBaggedGradBoost(kFALSE)
227 // , fSumOfWeights(0)
228  , fMinNodeEvents(0)
229  , fMinNodeSize(5)
230  , fMinNodeSizeS("5%")
231  , fNCuts(0)
232  , fUseFisherCuts(0) // don't use this initialisation, only here to make Coverity happy. Is set in DeclarOptions()
233  , fMinLinCorrForFisher(.8) // don't use this initialisation, only here to make Coverity happy. Is set in DeclarOptions()
234  , fUseExclusiveVars(0) // don't use this initialisation, only here to make Coverity happy. Is set in DeclarOptions()
235  , fUseYesNoLeaf(kFALSE)
236  , fNodePurityLimit(0)
237  , fNNodesMax(0)
238  , fMaxDepth(0)
239  , fPruneMethod(DecisionTree::kNoPruning)
240  , fPruneStrength(0)
241  , fFValidationEvents(0)
242  , fAutomatic(kFALSE)
243  , fRandomisedTrees(kFALSE)
244  , fUseNvars(0)
245  , fUsePoissonNvars(0) // don't use this initialisation, only here to make Coverity happy. Is set in Init()
246  , fUseNTrainEvents(0)
247  , fBaggedSampleFraction(0)
248  , fNoNegWeightsInTraining(kFALSE)
249  , fInverseBoostNegWeights(kFALSE)
250  , fPairNegWeightsGlobal(kFALSE)
251  , fTrainWithNegWeights(kFALSE)
252  , fDoBoostMonitor(kFALSE)
253  , fITree(0)
254  , fBoostWeight(0)
255  , fErrorFraction(0)
256  , fCss(0)
257  , fCts_sb(0)
258  , fCtb_ss(0)
259  , fCbb(0)
260  , fDoPreselection(kFALSE)
261  , fSkipNormalization(kFALSE)
262  , fHistoricBool(kFALSE)
263 {
265  fSepType = NULL;
266  // constructor for calculating BDT-MVA using previously generated decision trees
267  // the result of the previous training (the decision trees) are read in via the
268  // weight file. Make sure the the variables correspond to the ones used in
269  // creating the "weight"-file
270 }
271 
272 ////////////////////////////////////////////////////////////////////////////////
273 /// BDT can handle classification with multiple classes and regression with one regression-target
274 
276 {
277  if (type == Types::kClassification && numberClasses == 2) return kTRUE;
278  if (type == Types::kMulticlass ) return kTRUE;
279  if( type == Types::kRegression && numberTargets == 1 ) return kTRUE;
280  return kFALSE;
281 }
282 
283 ////////////////////////////////////////////////////////////////////////////////
284 /// define the options (their key words) that can be set in the option string
285 /// know options:
286 /// nTrees number of trees in the forest to be created
287 /// BoostType the boosting type for the trees in the forest (AdaBoost e.t.c..)
288 /// known: AdaBoost
289 /// AdaBoostR2 (Adaboost for regression)
290 /// Bagging
291 /// GradBoost
292 /// AdaBoostBeta the boosting parameter, beta, for AdaBoost
293 /// UseRandomisedTrees choose at each node splitting a random set of variables
294 /// UseNvars use UseNvars variables in randomised trees
295 /// UsePoission Nvars use UseNvars not as fixed number but as mean of a possion distribution
296 /// SeparationType the separation criterion applied in the node splitting
297 /// known: GiniIndex
298 /// MisClassificationError
299 /// CrossEntropy
300 /// SDivSqrtSPlusB
301 /// MinNodeSize: minimum percentage of training events in a leaf node (leaf criteria, stop splitting)
302 /// nCuts: the number of steps in the optimisation of the cut for a node (if < 0, then
303 /// step size is determined by the events)
304 /// UseFisherCuts: use multivariate splits using the Fisher criterion
305 /// UseYesNoLeaf decide if the classification is done simply by the node type, or the S/B
306 /// (from the training) in the leaf node
307 /// NodePurityLimit the minimum purity to classify a node as a signal node (used in pruning and boosting to determine
308 /// misclassification error rate)
309 /// PruneMethod The Pruning method:
310 /// known: NoPruning // switch off pruning completely
311 /// ExpectedError
312 /// CostComplexity
313 /// PruneStrength a parameter to adjust the amount of pruning. Should be large enough such that overtraining is avoided.
314 /// PruningValFraction number of events to use for optimizing pruning (only if PruneStrength < 0, i.e. automatic pruning)
315 /// NegWeightTreatment IgnoreNegWeightsInTraining Ignore negative weight events in the training.
316 /// DecreaseBoostWeight Boost ev. with neg. weight with 1/boostweight instead of boostweight
317 /// PairNegWeightsGlobal Pair ev. with neg. and pos. weights in traning sample and "annihilate" them
318 /// MaxDepth maximum depth of the decision tree allowed before further splitting is stopped
319 /// SkipNormalization Skip normalization at initialization, to keep expectation value of BDT output
320 /// according to the fraction of events
321 
322 
324 {
325  DeclareOptionRef(fNTrees, "NTrees", "Number of trees in the forest");
326  if (DoRegression()) {
327  DeclareOptionRef(fMaxDepth=50,"MaxDepth","Max depth of the decision tree allowed");
328  }else{
329  DeclareOptionRef(fMaxDepth=3,"MaxDepth","Max depth of the decision tree allowed");
330  }
331 
332  TString tmp="5%"; if (DoRegression()) tmp="0.2%";
333  DeclareOptionRef(fMinNodeSizeS=tmp, "MinNodeSize", "Minimum percentage of training events required in a leaf node (default: Classification: 5%, Regression: 0.2%)");
334  // MinNodeSize: minimum percentage of training events in a leaf node (leaf criteria, stop splitting)
335  DeclareOptionRef(fNCuts, "nCuts", "Number of grid points in variable range used in finding optimal cut in node splitting");
336 
337  DeclareOptionRef(fBoostType, "BoostType", "Boosting type for the trees in the forest (note: AdaCost is still experimental)");
338 
339  AddPreDefVal(TString("AdaBoost"));
340  AddPreDefVal(TString("RealAdaBoost"));
341  AddPreDefVal(TString("AdaCost"));
342  AddPreDefVal(TString("Bagging"));
343  // AddPreDefVal(TString("RegBoost"));
344  AddPreDefVal(TString("AdaBoostR2"));
345  AddPreDefVal(TString("Grad"));
346  if (DoRegression()) {
347  fBoostType = "AdaBoostR2";
348  }else{
349  fBoostType = "AdaBoost";
350  }
351  DeclareOptionRef(fAdaBoostR2Loss="Quadratic", "AdaBoostR2Loss", "Type of Loss function in AdaBoostR2");
352  AddPreDefVal(TString("Linear"));
353  AddPreDefVal(TString("Quadratic"));
354  AddPreDefVal(TString("Exponential"));
355 
356  DeclareOptionRef(fBaggedBoost=kFALSE, "UseBaggedBoost","Use only a random subsample of all events for growing the trees in each boost iteration.");
357  DeclareOptionRef(fShrinkage=1.0, "Shrinkage", "Learning rate for GradBoost algorithm");
358  DeclareOptionRef(fAdaBoostBeta=.5, "AdaBoostBeta", "Learning rate for AdaBoost algorithm");
359  DeclareOptionRef(fRandomisedTrees,"UseRandomisedTrees","Determine at each node splitting the cut variable only as the best out of a random subset of variables (like in RandomForests)");
360  DeclareOptionRef(fUseNvars,"UseNvars","Size of the subset of variables used with RandomisedTree option");
361  DeclareOptionRef(fUsePoissonNvars,"UsePoissonNvars", "Interpret \"UseNvars\" not as fixed number but as mean of a Possion distribution in each split with RandomisedTree option");
362  DeclareOptionRef(fBaggedSampleFraction=.6,"BaggedSampleFraction","Relative size of bagged event sample to original size of the data sample (used whenever bagging is used (i.e. UseBaggedBoost, Bagging,)" );
363 
364  DeclareOptionRef(fUseYesNoLeaf=kTRUE, "UseYesNoLeaf",
365  "Use Sig or Bkg categories, or the purity=S/(S+B) as classification of the leaf node -> Real-AdaBoost");
366  if (DoRegression()) {
368  }
369 
370  DeclareOptionRef(fNegWeightTreatment="InverseBoostNegWeights","NegWeightTreatment","How to treat events with negative weights in the BDT training (particular the boosting) : IgnoreInTraining; Boost With inverse boostweight; Pair events with negative and positive weights in traning sample and *annihilate* them (experimental!)");
371  AddPreDefVal(TString("InverseBoostNegWeights"));
372  AddPreDefVal(TString("IgnoreNegWeightsInTraining"));
373  AddPreDefVal(TString("NoNegWeightsInTraining")); // well, let's be nice to users and keep at least this old name anyway ..
374  AddPreDefVal(TString("PairNegWeightsGlobal"));
375  AddPreDefVal(TString("Pray"));
376 
377 
378 
379  DeclareOptionRef(fCss=1., "Css", "AdaCost: cost of true signal selected signal");
380  DeclareOptionRef(fCts_sb=1.,"Cts_sb","AdaCost: cost of true signal selected bkg");
381  DeclareOptionRef(fCtb_ss=1.,"Ctb_ss","AdaCost: cost of true bkg selected signal");
382  DeclareOptionRef(fCbb=1., "Cbb", "AdaCost: cost of true bkg selected bkg ");
383 
384  DeclareOptionRef(fNodePurityLimit=0.5, "NodePurityLimit", "In boosting/pruning, nodes with purity > NodePurityLimit are signal; background otherwise.");
385 
386 
387  DeclareOptionRef(fSepTypeS, "SeparationType", "Separation criterion for node splitting");
388  AddPreDefVal(TString("CrossEntropy"));
389  AddPreDefVal(TString("GiniIndex"));
390  AddPreDefVal(TString("GiniIndexWithLaplace"));
391  AddPreDefVal(TString("MisClassificationError"));
392  AddPreDefVal(TString("SDivSqrtSPlusB"));
393  AddPreDefVal(TString("RegressionVariance"));
394  if (DoRegression()) {
395  fSepTypeS = "RegressionVariance";
396  }else{
397  fSepTypeS = "GiniIndex";
398  }
399 
400  DeclareOptionRef(fRegressionLossFunctionBDTGS = "Huber", "RegressionLossFunctionBDTG", "Loss function for BDTG regression.");
401  AddPreDefVal(TString("Huber"));
402  AddPreDefVal(TString("AbsoluteDeviation"));
403  AddPreDefVal(TString("LeastSquares"));
404 
405  DeclareOptionRef(fHuberQuantile = 0.7, "HuberQuantile", "In the Huber loss function this is the quantile that separates the core from the tails in the residuals distribution.");
406 
407  DeclareOptionRef(fDoBoostMonitor=kFALSE,"DoBoostMonitor","Create control plot with ROC integral vs tree number");
408 
409  DeclareOptionRef(fUseFisherCuts=kFALSE, "UseFisherCuts", "Use multivariate splits using the Fisher criterion");
410  DeclareOptionRef(fMinLinCorrForFisher=.8,"MinLinCorrForFisher", "The minimum linear correlation between two variables demanded for use in Fisher criterion in node splitting");
411  DeclareOptionRef(fUseExclusiveVars=kFALSE,"UseExclusiveVars","Variables already used in fisher criterion are not anymore analysed individually for node splitting");
412 
413 
414  DeclareOptionRef(fDoPreselection=kFALSE,"DoPreselection","and and apply automatic pre-selection for 100% efficient signal (bkg) cuts prior to training");
415 
416 
417  DeclareOptionRef(fSigToBkgFraction=1,"SigToBkgFraction","Sig to Bkg ratio used in Training (similar to NodePurityLimit, which cannot be used in real adaboost");
418 
419  DeclareOptionRef(fPruneMethodS, "PruneMethod", "Note: for BDTs use small trees (e.g.MaxDepth=3) and NoPruning: Pruning: Method used for pruning (removal) of statistically insignificant branches ");
420  AddPreDefVal(TString("NoPruning"));
421  AddPreDefVal(TString("ExpectedError"));
422  AddPreDefVal(TString("CostComplexity"));
423 
424  DeclareOptionRef(fPruneStrength, "PruneStrength", "Pruning strength");
425 
426  DeclareOptionRef(fFValidationEvents=0.5, "PruningValFraction", "Fraction of events to use for optimizing automatic pruning.");
427 
428  DeclareOptionRef(fSkipNormalization=kFALSE, "SkipNormalization", "Skip normalization at initialization, to keep expectation value of BDT output according to the fraction of events");
429 
430  // deprecated options, still kept for the moment:
431  DeclareOptionRef(fMinNodeEvents=0, "nEventsMin", "deprecated: Use MinNodeSize (in % of training events) instead");
432 
433  DeclareOptionRef(fBaggedGradBoost=kFALSE, "UseBaggedGrad","deprecated: Use *UseBaggedBoost* instead: Use only a random subsample of all events for growing the trees in each iteration.");
434  DeclareOptionRef(fBaggedSampleFraction, "GradBaggingFraction","deprecated: Use *BaggedSampleFraction* instead: Defines the fraction of events to be used in each iteration, e.g. when UseBaggedGrad=kTRUE. ");
435  DeclareOptionRef(fUseNTrainEvents,"UseNTrainEvents","deprecated: Use *BaggedSampleFraction* instead: Number of randomly picked training events used in randomised (and bagged) trees");
436  DeclareOptionRef(fNNodesMax,"NNodesMax","deprecated: Use MaxDepth instead to limit the tree size" );
437 
438 
439 }
440 
441 ////////////////////////////////////////////////////////////////////////////////
442 /// options that are used ONLY for the READER to ensure backward compatibility
443 
446 
447 
448  DeclareOptionRef(fHistoricBool=kTRUE, "UseWeightedTrees",
449  "Use weighted trees or simple average in classification from the forest");
450  DeclareOptionRef(fHistoricBool=kFALSE, "PruneBeforeBoost", "Flag to prune the tree before applying boosting algorithm");
451  DeclareOptionRef(fHistoricBool=kFALSE,"RenormByClass","Individually re-normalize each event class to the original size after boosting");
452 
453  AddPreDefVal(TString("NegWeightTreatment"),TString("IgnoreNegWeights"));
454 
455 }
456 
457 
458 
459 
460 ////////////////////////////////////////////////////////////////////////////////
461 /// the option string is decoded, for available options see "DeclareOptions"
462 
464 {
465  fSepTypeS.ToLower();
466  if (fSepTypeS == "misclassificationerror") fSepType = new MisClassificationError();
467  else if (fSepTypeS == "giniindex") fSepType = new GiniIndex();
468  else if (fSepTypeS == "giniindexwithlaplace") fSepType = new GiniIndexWithLaplace();
469  else if (fSepTypeS == "crossentropy") fSepType = new CrossEntropy();
470  else if (fSepTypeS == "sdivsqrtsplusb") fSepType = new SdivSqrtSplusB();
471  else if (fSepTypeS == "regressionvariance") fSepType = NULL;
472  else {
473  Log() << kINFO << GetOptions() << Endl;
474  Log() << kFATAL << "<ProcessOptions> unknown Separation Index option " << fSepTypeS << " called" << Endl;
475  }
476 
477  if(!(fHuberQuantile >= 0.0 && fHuberQuantile <= 1.0)){
478  Log() << kINFO << GetOptions() << Endl;
479  Log() << kFATAL << "<ProcessOptions> Huber Quantile must be in range [0,1]. Value given, " << fHuberQuantile << ", does not match this criteria" << Endl;
480  }
481 
486  else {
487  Log() << kINFO << GetOptions() << Endl;
488  Log() << kFATAL << "<ProcessOptions> unknown Regression Loss Function BDT option " << fRegressionLossFunctionBDTGS << " called" << Endl;
489  }
490 
493  else if (fPruneMethodS == "costcomplexity") fPruneMethod = DecisionTree::kCostComplexityPruning;
494  else if (fPruneMethodS == "nopruning") fPruneMethod = DecisionTree::kNoPruning;
495  else {
496  Log() << kINFO << GetOptions() << Endl;
497  Log() << kFATAL << "<ProcessOptions> unknown PruneMethod " << fPruneMethodS << " option called" << Endl;
498  }
500  else fAutomatic = kFALSE;
502  Log() << kFATAL
503  << "Sorry autmoatic pruning strength determination is not implemented yet for ExpectedErrorPruning" << Endl;
504  }
505 
506 
507  if (fMinNodeEvents > 0){
509  Log() << kWARNING << "You have explicitly set ** nEventsMin = " << fMinNodeEvents<<" ** the min ablsolut number \n"
510  << "of events in a leaf node. This is DEPRECATED, please use the option \n"
511  << "*MinNodeSize* giving the relative number as percentage of training \n"
512  << "events instead. \n"
513  << "nEventsMin="<<fMinNodeEvents<< "--> MinNodeSize="<<fMinNodeSize<<"%"
514  << Endl;
515  Log() << kWARNING << "Note also that explicitly setting *nEventsMin* so far OVERWRITES the option recomeded \n"
516  << " *MinNodeSize* = " << fMinNodeSizeS << " option !!" << Endl ;
517  fMinNodeSizeS = Form("%F3.2",fMinNodeSize);
518 
519  }else{
521  }
522 
523 
525 
526  if (fBoostType=="Grad") {
528  if (fNegWeightTreatment=="InverseBoostNegWeights"){
529  Log() << kINFO << "the option *InverseBoostNegWeights* does not exist for BoostType=Grad --> change" << Endl;
530  Log() << kINFO << "to new default for GradBoost *Pray*" << Endl;
531  Log() << kDEBUG << "i.e. simply keep them as if which should work fine for Grad Boost" << Endl;
532  fNegWeightTreatment="Pray";
534  }
535  } else if (fBoostType=="RealAdaBoost"){
536  fBoostType = "AdaBoost";
538  } else if (fBoostType=="AdaCost"){
540  }
541 
542  if (fFValidationEvents < 0.0) fFValidationEvents = 0.0;
543  if (fAutomatic && fFValidationEvents > 0.5) {
544  Log() << kWARNING << "You have chosen to use more than half of your training sample "
545  << "to optimize the automatic pruning algorithm. This is probably wasteful "
546  << "and your overall results will be degraded. Are you sure you want this?"
547  << Endl;
548  }
549 
550 
551  if (this->Data()->HasNegativeEventWeights()){
552  Log() << kINFO << " You are using a Monte Carlo that has also negative weights. "
553  << "That should in principle be fine as long as on average you end up with "
554  << "something positive. For this you have to make sure that the minimal number "
555  << "of (un-weighted) events demanded for a tree node (currently you use: MinNodeSize="
556  << fMinNodeSizeS << " ("<< fMinNodeSize << "%)"
557  <<", (or the deprecated equivalent nEventsMin) you can set this via the "
558  <<"BDT option string when booking the "
559  << "classifier) is large enough to allow for reasonable averaging!!! "
560  << " If this does not help.. maybe you want to try the option: IgnoreNegWeightsInTraining "
561  << "which ignores events with negative weight in the training. " << Endl
562  << Endl << "Note: You'll get a WARNING message during the training if that should ever happen" << Endl;
563  }
564 
565  if (DoRegression()) {
567  Log() << kWARNING << "Regression Trees do not work with fUseYesNoLeaf=TRUE --> I will set it to FALSE" << Endl;
569  }
570 
571  if (fSepType != NULL){
572  Log() << kWARNING << "Regression Trees do not work with Separation type other than <RegressionVariance> --> I will use it instead" << Endl;
573  fSepType = NULL;
574  }
575  if (fUseFisherCuts){
576  Log() << kWARNING << "Sorry, UseFisherCuts is not available for regression analysis, I will ignore it!" << Endl;
578  }
579  if (fNCuts < 0) {
580  Log() << kWARNING << "Sorry, the option of nCuts<0 using a more elaborate node splitting algorithm " << Endl;
581  Log() << kWARNING << "is not implemented for regression analysis ! " << Endl;
582  Log() << kWARNING << "--> I switch do default nCuts = 20 and use standard node splitting"<<Endl;
583  fNCuts=20;
584  }
585  }
586  if (fRandomisedTrees){
587  Log() << kINFO << " Randomised trees use no pruning" << Endl;
589  // fBoostType = "Bagging";
590  }
591 
592  if (fUseFisherCuts) {
593  Log() << kWARNING << "When using the option UseFisherCuts, the other option nCuts<0 (i.e. using" << Endl;
594  Log() << " a more elaborate node splitting algorithm) is not implemented. " << Endl;
595  //I will switch o " << Endl;
596  //Log() << "--> I switch do default nCuts = 20 and use standard node splitting WITH possible Fisher criteria"<<Endl;
597  fNCuts=20;
598  }
599 
600  if (fNTrees==0){
601  Log() << kERROR << " Zero Decision Trees demanded... that does not work !! "
602  << " I set it to 1 .. just so that the program does not crash"
603  << Endl;
604  fNTrees = 1;
605  }
606 
608  if (fNegWeightTreatment == "ignorenegweightsintraining") fNoNegWeightsInTraining = kTRUE;
609  else if (fNegWeightTreatment == "nonegweightsintraining") fNoNegWeightsInTraining = kTRUE;
610  else if (fNegWeightTreatment == "inverseboostnegweights") fInverseBoostNegWeights = kTRUE;
611  else if (fNegWeightTreatment == "pairnegweightsglobal") fPairNegWeightsGlobal = kTRUE;
612  else if (fNegWeightTreatment == "pray") Log() << kDEBUG << "Yes, good luck with praying " << Endl;
613  else {
614  Log() << kINFO << GetOptions() << Endl;
615  Log() << kFATAL << "<ProcessOptions> unknown option for treating negative event weights during training " << fNegWeightTreatment << " requested" << Endl;
616  }
617 
618  if (fNegWeightTreatment == "pairnegweightsglobal")
619  Log() << kWARNING << " you specified the option NegWeightTreatment=PairNegWeightsGlobal : This option is still considered EXPERIMENTAL !! " << Endl;
620 
621 
622  // dealing with deprecated options !
623  if (fNNodesMax>0) {
624  UInt_t tmp=1; // depth=0 == 1 node
625  fMaxDepth=0;
626  while (tmp < fNNodesMax){
627  tmp+=2*tmp;
628  fMaxDepth++;
629  }
630  Log() << kWARNING << "You have specified a deprecated option *NNodesMax="<<fNNodesMax
631  << "* \n this has been translated to MaxDepth="<<fMaxDepth<<Endl;
632  }
633 
634 
635  if (fUseNTrainEvents>0){
637  Log() << kWARNING << "You have specified a deprecated option *UseNTrainEvents="<<fUseNTrainEvents
638  << "* \n this has been translated to BaggedSampleFraction="<<fBaggedSampleFraction<<"(%)"<<Endl;
639  }
640 
641  if (fBoostType=="Bagging") fBaggedBoost = kTRUE;
642  if (fBaggedGradBoost){
644  Log() << kWARNING << "You have specified a deprecated option *UseBaggedGrad* --> please use *UseBaggedBoost* instead" << Endl;
645  }
646 
647 }
648 
649 
650 //_______________________________________________________________________
651 
653  if (sizeInPercent > 0 && sizeInPercent < 50){
654  fMinNodeSize=sizeInPercent;
655 
656  } else {
657  Log() << kFATAL << "you have demanded a minimal node size of "
658  << sizeInPercent << "% of the training events.. \n"
659  << " that somehow does not make sense "<<Endl;
660  }
661 
662 }
663 ////////////////////////////////////////////////////////////////////////////////
664 
666  sizeInPercent.ReplaceAll("%","");
667  sizeInPercent.ReplaceAll(" ","");
668  if (sizeInPercent.IsFloat()) SetMinNodeSize(sizeInPercent.Atof());
669  else {
670  Log() << kFATAL << "I had problems reading the option MinNodeEvents, which "
671  << "after removing a possible % sign now reads " << sizeInPercent << Endl;
672  }
673 }
674 
675 
676 
677 ////////////////////////////////////////////////////////////////////////////////
678 /// common initialisation with defaults for the BDT-Method
679 
681 {
682  fNTrees = 800;
684  fMaxDepth = 3;
685  fBoostType = "AdaBoost";
686  if(DataInfo().GetNClasses()!=0) //workaround for multiclass application
687  fMinNodeSize = 5.;
688  }else {
689  fMaxDepth = 50;
690  fBoostType = "AdaBoostR2";
691  fAdaBoostR2Loss = "Quadratic";
692  if(DataInfo().GetNClasses()!=0) //workaround for multiclass application
693  fMinNodeSize = .2;
694  }
695 
696 
697  fNCuts = 20;
698  fPruneMethodS = "NoPruning";
700  fPruneStrength = 0;
701  fAutomatic = kFALSE;
702  fFValidationEvents = 0.5;
704  // fUseNvars = (GetNvar()>12) ? UInt_t(GetNvar()/8) : TMath::Max(UInt_t(2),UInt_t(GetNvar()/3));
707  fShrinkage = 1.0;
708 // fSumOfWeights = 0.0;
709 
710  // reference cut value to distinguish signal-like from background-like events
712 }
713 
714 
715 ////////////////////////////////////////////////////////////////////////////////
716 /// reset the method, as if it had just been instantiated (forget all training etc.)
717 
719 {
720  // I keep the BDT EventSample and its Validation sample (eventuall they should all
721  // disappear and just use the DataSet samples ..
722 
723  // remove all the trees
724  for (UInt_t i=0; i<fForest.size(); i++) delete fForest[i];
725  fForest.clear();
726 
727  fBoostWeights.clear();
729  fVariableImportance.clear();
730  fResiduals.clear();
731  fLossFunctionEventInfo.clear();
732  // now done in "InitEventSample" which is called in "Train"
733  // reset all previously stored/accumulated BOOST weights in the event sample
734  //for (UInt_t iev=0; iev<fEventSample.size(); iev++) fEventSample[iev]->SetBoostWeight(1.);
736  Log() << kDEBUG << " successfully(?) reset the method " << Endl;
737 }
738 
739 
740 ////////////////////////////////////////////////////////////////////////////////
741 ///destructor
742 /// Note: fEventSample and ValidationSample are already deleted at the end of TRAIN
743 /// When they are not used anymore
744 /// for (UInt_t i=0; i<fEventSample.size(); i++) delete fEventSample[i];
745 /// for (UInt_t i=0; i<fValidationSample.size(); i++) delete fValidationSample[i];
746 
748 {
749  for (UInt_t i=0; i<fForest.size(); i++) delete fForest[i];
750 }
751 
752 ////////////////////////////////////////////////////////////////////////////////
753 /// initialize the event sample (i.e. reset the boost-weights... etc)
754 
756 {
757  if (!HasTrainingTree()) Log() << kFATAL << "<Init> Data().TrainingTree() is zero pointer" << Endl;
758 
759  if (fEventSample.size() > 0) { // do not re-initialise the event sample, just set all boostweights to 1. as if it were untouched
760  // reset all previously stored/accumulated BOOST weights in the event sample
761  for (UInt_t iev=0; iev<fEventSample.size(); iev++) fEventSample[iev]->SetBoostWeight(1.);
762  } else {
764  UInt_t nevents = Data()->GetNTrainingEvents();
765 
766  std::vector<const TMVA::Event*> tmpEventSample;
767  for (Long64_t ievt=0; ievt<nevents; ievt++) {
768  // const Event *event = new Event(*(GetEvent(ievt)));
769  Event* event = new Event( *GetTrainingEvent(ievt) );
770  tmpEventSample.push_back(event);
771  }
772 
773  if (!DoRegression()) DeterminePreselectionCuts(tmpEventSample);
774  else fDoPreselection = kFALSE; // just to make sure...
775 
776  for (UInt_t i=0; i<tmpEventSample.size(); i++) delete tmpEventSample[i];
777 
778 
779  Bool_t firstNegWeight=kTRUE;
780  Bool_t firstZeroWeight=kTRUE;
781  for (Long64_t ievt=0; ievt<nevents; ievt++) {
782  // const Event *event = new Event(*(GetEvent(ievt)));
783  // const Event* event = new Event( *GetTrainingEvent(ievt) );
784  Event* event = new Event( *GetTrainingEvent(ievt) );
785  if (fDoPreselection){
786  if (TMath::Abs(ApplyPreselectionCuts(event)) > 0.05) {
787  delete event;
788  continue;
789  }
790  }
791 
792  if (event->GetWeight() < 0 && (IgnoreEventsWithNegWeightsInTraining() || fNoNegWeightsInTraining)){
793  if (firstNegWeight) {
794  Log() << kWARNING << " Note, you have events with negative event weight in the sample, but you've chosen to ignore them" << Endl;
795  firstNegWeight=kFALSE;
796  }
797  delete event;
798  }else if (event->GetWeight()==0){
799  if (firstZeroWeight) {
800  firstZeroWeight = kFALSE;
801  Log() << "Events with weight == 0 are going to be simply ignored " << Endl;
802  }
803  delete event;
804  }else{
805  if (event->GetWeight() < 0) {
807  if (firstNegWeight){
808  firstNegWeight = kFALSE;
810  Log() << kWARNING << "Events with negative event weights are found and "
811  << " will be removed prior to the actual BDT training by global "
812  << " paring (and subsequent annihilation) with positiv weight events"
813  << Endl;
814  }else{
815  Log() << kWARNING << "Events with negative event weights are USED during "
816  << "the BDT training. This might cause problems with small node sizes "
817  << "or with the boosting. Please remove negative events from training "
818  << "using the option *IgnoreEventsWithNegWeightsInTraining* in case you "
819  << "observe problems with the boosting"
820  << Endl;
821  }
822  }
823  }
824  // if fAutomatic == true you need a validation sample to optimize pruning
825  if (fAutomatic) {
826  Double_t modulo = 1.0/(fFValidationEvents);
827  Int_t imodulo = static_cast<Int_t>( fmod(modulo,1.0) > 0.5 ? ceil(modulo) : floor(modulo) );
828  if (ievt % imodulo == 0) fValidationSample.push_back( event );
829  else fEventSample.push_back( event );
830  }
831  else {
832  fEventSample.push_back(event);
833  }
834  }
835  }
836 
837  if (fAutomatic) {
838  Log() << kINFO << "<InitEventSample> Internally I use " << fEventSample.size()
839  << " for Training and " << fValidationSample.size()
840  << " for Pruning Validation (" << ((Float_t)fValidationSample.size())/((Float_t)fEventSample.size()+fValidationSample.size())*100.0
841  << "% of training used for validation)" << Endl;
842  }
843 
844  // some pre-processing for events with negative weights
846  }
847 
848  if (!DoRegression() && !fSkipNormalization){
849  Log() << kDEBUG << "\t<InitEventSample> For classification trees, "<< Endl;
850  Log() << kDEBUG << " \tthe effective number of backgrounds is scaled to match "<<Endl;
851  Log() << kDEBUG << " \tthe signal. Otherwise the first boosting step would do 'just that'!"<<Endl;
852  // it does not make sense in decision trees to start with unequal number of signal/background
853  // events (weights) .. hence normalize them now (happens atherwise in first 'boosting step'
854  // anyway..
855  // Also make sure, that the sum_of_weights == sample.size() .. as this is assumed in
856  // the DecisionTree to derive a sensible number for "fMinSize" (min.#events in node)
857  // that currently is an OR between "weighted" and "unweighted number"
858  // I want:
859  // nS + nB = n
860  // a*SW + b*BW = n
861  // (a*SW)/(b*BW) = fSigToBkgFraction
862  //
863  // ==> b = n/((1+f)BW) and a = (nf/(1+f))/SW
864 
865  Double_t nevents = fEventSample.size();
866  Double_t sumSigW=0, sumBkgW=0;
867  Int_t sumSig=0, sumBkg=0;
868  for (UInt_t ievt=0; ievt<fEventSample.size(); ievt++) {
869  if ((DataInfo().IsSignal(fEventSample[ievt])) ) {
870  sumSigW += fEventSample[ievt]->GetWeight();
871  sumSig++;
872  } else {
873  sumBkgW += fEventSample[ievt]->GetWeight();
874  sumBkg++;
875  }
876  }
877  if (sumSigW && sumBkgW){
878  Double_t normSig = nevents/((1+fSigToBkgFraction)*sumSigW)*fSigToBkgFraction;
879  Double_t normBkg = nevents/((1+fSigToBkgFraction)*sumBkgW); ;
880  Log() << kDEBUG << "\tre-normalise events such that Sig and Bkg have respective sum of weights = "
881  << fSigToBkgFraction << Endl;
882  Log() << kDEBUG << " \tsig->sig*"<<normSig << "ev. bkg->bkg*"<<normBkg << "ev." <<Endl;
883  Log() << kHEADER << "#events: (reweighted) sig: "<< sumSigW*normSig << " bkg: " << sumBkgW*normBkg << Endl;
884  Log() << kINFO << "#events: (unweighted) sig: "<< sumSig << " bkg: " << sumBkg << Endl;
885  for (Long64_t ievt=0; ievt<nevents; ievt++) {
886  if ((DataInfo().IsSignal(fEventSample[ievt])) ) fEventSample[ievt]->SetBoostWeight(normSig);
887  else fEventSample[ievt]->SetBoostWeight(normBkg);
888  }
889  }else{
890  Log() << kINFO << "--> could not determine scaleing factors as either there are " << Endl;
891  Log() << kINFO << " no signal events (sumSigW="<<sumSigW<<") or no bkg ev. (sumBkgW="<<sumBkgW<<")"<<Endl;
892  }
893 
894  }
895 
897  if (fBaggedBoost){
900  }
901 
902  //just for debug purposes..
903  /*
904  sumSigW=0;
905  sumBkgW=0;
906  for (UInt_t ievt=0; ievt<fEventSample.size(); ievt++) {
907  if ((DataInfo().IsSignal(fEventSample[ievt])) ) sumSigW += fEventSample[ievt]->GetWeight();
908  else sumBkgW += fEventSample[ievt]->GetWeight();
909  }
910  Log() << kWARNING << "sigSumW="<<sumSigW<<"bkgSumW="<<sumBkgW<< Endl;
911  */
912 }
913 
914 ////////////////////////////////////////////////////////////////////////////////
915 /// o.k. you know there are events with negative event weights. This routine will remove
916 /// them by pairing them with the closest event(s) of the same event class with positive
917 /// weights
918 /// A first attempt is "brute force", I dont' try to be clever using search trees etc,
919 /// just quick and dirty to see if the result is any good
920 
922  Double_t totalNegWeights = 0;
923  Double_t totalPosWeights = 0;
924  Double_t totalWeights = 0;
925  std::vector<const Event*> negEvents;
926  for (UInt_t iev = 0; iev < fEventSample.size(); iev++){
927  if (fEventSample[iev]->GetWeight() < 0) {
928  totalNegWeights += fEventSample[iev]->GetWeight();
929  negEvents.push_back(fEventSample[iev]);
930  } else {
931  totalPosWeights += fEventSample[iev]->GetWeight();
932  }
933  totalWeights += fEventSample[iev]->GetWeight();
934  }
935  if (totalNegWeights == 0 ) {
936  Log() << kINFO << "no negative event weights found .. no preprocessing necessary" << Endl;
937  return;
938  } else {
939  Log() << kINFO << "found a total of " << totalNegWeights << " of negative event weights which I am going to try to pair with positive events to annihilate them" << Endl;
940  Log() << kINFO << "found a total of " << totalPosWeights << " of events with positive weights" << Endl;
941  Log() << kINFO << "--> total sum of weights = " << totalWeights << " = " << totalNegWeights+totalPosWeights << Endl;
942  }
943 
944  std::vector<TMatrixDSym*>* cov = gTools().CalcCovarianceMatrices( fEventSample, 2);
945 
946  TMatrixDSym *invCov;
947 
948  for (Int_t i=0; i<2; i++){
949  invCov = ((*cov)[i]);
950  if ( TMath::Abs(invCov->Determinant()) < 10E-24 ) {
951  std::cout << "<MethodBDT::PreProcessNeg...> matrix is almost singular with deterninant="
952  << TMath::Abs(invCov->Determinant())
953  << " did you use the variables that are linear combinations or highly correlated?"
954  << std::endl;
955  }
956  if ( TMath::Abs(invCov->Determinant()) < 10E-120 ) {
957  std::cout << "<MethodBDT::PreProcessNeg...> matrix is singular with determinant="
958  << TMath::Abs(invCov->Determinant())
959  << " did you use the variables that are linear combinations?"
960  << std::endl;
961  }
962 
963  invCov->Invert();
964  }
965 
966 
967 
968  Log() << kINFO << "Found a total of " << totalNegWeights << " in negative weights out of " << fEventSample.size() << " training events " << Endl;
969  Timer timer(negEvents.size(),"Negative Event paired");
970  for (UInt_t nev = 0; nev < negEvents.size(); nev++){
971  timer.DrawProgressBar( nev );
972  Double_t weight = negEvents[nev]->GetWeight();
973  UInt_t iClassID = negEvents[nev]->GetClass();
974  invCov = ((*cov)[iClassID]);
975  while (weight < 0){
976  // find closest event with positive event weight and "pair" it with the negative event
977  // (add their weight) until there is no negative weight anymore
978  Int_t iMin=-1;
979  Double_t dist, minDist=10E270;
980  for (UInt_t iev = 0; iev < fEventSample.size(); iev++){
981  if (iClassID==fEventSample[iev]->GetClass() && fEventSample[iev]->GetWeight() > 0){
982  dist=0;
983  for (UInt_t ivar=0; ivar < GetNvar(); ivar++){
984  for (UInt_t jvar=0; jvar<GetNvar(); jvar++){
985  dist += (negEvents[nev]->GetValue(ivar)-fEventSample[iev]->GetValue(ivar))*
986  (*invCov)[ivar][jvar]*
987  (negEvents[nev]->GetValue(jvar)-fEventSample[iev]->GetValue(jvar));
988  }
989  }
990  if (dist < minDist) { iMin=iev; minDist=dist;}
991  }
992  }
993 
994  if (iMin > -1) {
995  // std::cout << "Happily pairing .. weight before : " << negEvents[nev]->GetWeight() << " and " << fEventSample[iMin]->GetWeight();
996  Double_t newWeight = (negEvents[nev]->GetWeight() + fEventSample[iMin]->GetWeight());
997  if (newWeight > 0){
998  negEvents[nev]->SetBoostWeight( 0 );
999  fEventSample[iMin]->SetBoostWeight( newWeight/fEventSample[iMin]->GetOriginalWeight() ); // note the weight*boostweight should be "newWeight"
1000  } else {
1001  negEvents[nev]->SetBoostWeight( newWeight/negEvents[nev]->GetOriginalWeight() ); // note the weight*boostweight should be "newWeight"
1002  fEventSample[iMin]->SetBoostWeight( 0 );
1003  }
1004  // std::cout << " and afterwards " << negEvents[nev]->GetWeight() << " and the paired " << fEventSample[iMin]->GetWeight() << " dist="<<minDist<< std::endl;
1005  } else Log() << kFATAL << "preprocessing didn't find event to pair with the negative weight ... probably a bug" << Endl;
1006  weight = negEvents[nev]->GetWeight();
1007  }
1008  }
1009  Log() << kINFO << "<Negative Event Pairing> took: " << timer.GetElapsedTime()
1010  << " " << Endl;
1011 
1012  // just check.. now there should be no negative event weight left anymore
1013  totalNegWeights = 0;
1014  totalPosWeights = 0;
1015  totalWeights = 0;
1016  Double_t sigWeight=0;
1017  Double_t bkgWeight=0;
1018  Int_t nSig=0;
1019  Int_t nBkg=0;
1020 
1021  std::vector<const Event*> newEventSample;
1022 
1023  for (UInt_t iev = 0; iev < fEventSample.size(); iev++){
1024  if (fEventSample[iev]->GetWeight() < 0) {
1025  totalNegWeights += fEventSample[iev]->GetWeight();
1026  totalWeights += fEventSample[iev]->GetWeight();
1027  } else {
1028  totalPosWeights += fEventSample[iev]->GetWeight();
1029  totalWeights += fEventSample[iev]->GetWeight();
1030  }
1031  if (fEventSample[iev]->GetWeight() > 0) {
1032  newEventSample.push_back(new Event(*fEventSample[iev]));
1033  if (fEventSample[iev]->GetClass() == fSignalClass){
1034  sigWeight += fEventSample[iev]->GetWeight();
1035  nSig+=1;
1036  }else{
1037  bkgWeight += fEventSample[iev]->GetWeight();
1038  nBkg+=1;
1039  }
1040  }
1041  }
1042  if (totalNegWeights < 0) Log() << kFATAL << " compenstion of negative event weights with positive ones did not work " << totalNegWeights << Endl;
1043 
1044  for (UInt_t i=0; i<fEventSample.size(); i++) delete fEventSample[i];
1045  fEventSample = newEventSample;
1046 
1047  Log() << kINFO << " after PreProcessing, the Event sample is left with " << fEventSample.size() << " events (unweighted), all with positive weights, adding up to " << totalWeights << Endl;
1048  Log() << kINFO << " nSig="<<nSig << " sigWeight="<<sigWeight << " nBkg="<<nBkg << " bkgWeight="<<bkgWeight << Endl;
1049 
1050 
1051 }
1052 
1053 //
1054 
1055 ////////////////////////////////////////////////////////////////////////////////
1056 /// call the Optimzier with the set of paremeters and ranges that
1057 /// are meant to be tuned.
1058 
1059 std::map<TString,Double_t> TMVA::MethodBDT::OptimizeTuningParameters(TString fomType, TString fitType)
1060 {
1061  // fill all the tuning parameters that should be optimized into a map:
1062  std::map<TString,TMVA::Interval*> tuneParameters;
1063  std::map<TString,Double_t> tunedParameters;
1064 
1065  // note: the 3rd paraemter in the inteval is the "number of bins", NOT the stepsize !!
1066  // the actual VALUES at (at least for the scan, guess also in GA) are always
1067  // read from the middle of the bins. Hence.. the choice of Intervals e.g. for the
1068  // MaxDepth, in order to make nice interger values!!!
1069 
1070  // find some reasonable ranges for the optimisation of MinNodeEvents:
1071 
1072  tuneParameters.insert(std::pair<TString,Interval*>("NTrees", new Interval(10,1000,5))); // stepsize 50
1073  tuneParameters.insert(std::pair<TString,Interval*>("MaxDepth", new Interval(2,4,3))); // stepsize 1
1074  tuneParameters.insert(std::pair<TString,Interval*>("MinNodeSize", new LogInterval(1,30,30))); //
1075  //tuneParameters.insert(std::pair<TString,Interval*>("NodePurityLimit",new Interval(.4,.6,3))); // stepsize .1
1076  //tuneParameters.insert(std::pair<TString,Interval*>("BaggedSampleFraction",new Interval(.4,.9,6))); // stepsize .1
1077 
1078  // method-specific parameters
1079  if (fBoostType=="AdaBoost"){
1080  tuneParameters.insert(std::pair<TString,Interval*>("AdaBoostBeta", new Interval(.2,1.,5)));
1081 
1082  }else if (fBoostType=="Grad"){
1083  tuneParameters.insert(std::pair<TString,Interval*>("Shrinkage", new Interval(0.05,0.50,5)));
1084 
1085  }else if (fBoostType=="Bagging" && fRandomisedTrees){
1086  Int_t min_var = TMath::FloorNint( GetNvar() * .25 );
1087  Int_t max_var = TMath::CeilNint( GetNvar() * .75 );
1088  tuneParameters.insert(std::pair<TString,Interval*>("UseNvars", new Interval(min_var,max_var,4)));
1089 
1090  }
1091 
1092  Log()<<kINFO << " the following BDT parameters will be tuned on the respective *grid*\n"<<Endl;
1093  std::map<TString,TMVA::Interval*>::iterator it;
1094  for(it=tuneParameters.begin(); it!= tuneParameters.end(); it++){
1095  Log() << kWARNING << it->first << Endl;
1096  std::ostringstream oss;
1097  (it->second)->Print(oss);
1098  Log()<<oss.str();
1099  Log()<<Endl;
1100  }
1101 
1102  OptimizeConfigParameters optimize(this, tuneParameters, fomType, fitType);
1103  tunedParameters=optimize.optimize();
1104 
1105  return tunedParameters;
1106 
1107 }
1108 
1109 ////////////////////////////////////////////////////////////////////////////////
1110 /// set the tuning parameters accoding to the argument
1111 
1112 void TMVA::MethodBDT::SetTuneParameters(std::map<TString,Double_t> tuneParameters)
1113 {
1114  std::map<TString,Double_t>::iterator it;
1115  for(it=tuneParameters.begin(); it!= tuneParameters.end(); it++){
1116  Log() << kWARNING << it->first << " = " << it->second << Endl;
1117  if (it->first == "MaxDepth" ) SetMaxDepth ((Int_t)it->second);
1118  else if (it->first == "MinNodeSize" ) SetMinNodeSize (it->second);
1119  else if (it->first == "NTrees" ) SetNTrees ((Int_t)it->second);
1120  else if (it->first == "NodePurityLimit") SetNodePurityLimit (it->second);
1121  else if (it->first == "AdaBoostBeta" ) SetAdaBoostBeta (it->second);
1122  else if (it->first == "Shrinkage" ) SetShrinkage (it->second);
1123  else if (it->first == "UseNvars" ) SetUseNvars ((Int_t)it->second);
1124  else if (it->first == "BaggedSampleFraction" ) SetBaggedSampleFraction (it->second);
1125  else Log() << kFATAL << " SetParameter for " << it->first << " not yet implemented " <<Endl;
1126  }
1127 
1128 
1129 }
1130 
1131 ////////////////////////////////////////////////////////////////////////////////
1132 /// BDT training
1133 
1135 {
1137 
1138  // fill the STL Vector with the event sample
1139  // (needs to be done here and cannot be done in "init" as the options need to be
1140  // known).
1141  InitEventSample();
1142 
1143  if (fNTrees==0){
1144  Log() << kERROR << " Zero Decision Trees demanded... that does not work !! "
1145  << " I set it to 1 .. just so that the program does not crash"
1146  << Endl;
1147  fNTrees = 1;
1148  }
1149 
1151  std::vector<TString> titles = {"Boost weight", "Error Fraction"};
1152  fInteractive->Init(titles);
1153  }
1154  fIPyMaxIter = fNTrees;
1155 
1156  // HHV (it's been here since looong but I really don't know why we cannot handle
1157  // normalized variables in BDTs... todo
1158  if (IsNormalised()) Log() << kFATAL << "\"Normalise\" option cannot be used with BDT; "
1159  << "please remove the option from the configuration string, or "
1160  << "use \"!Normalise\""
1161  << Endl;
1162 
1163  if(DoRegression())
1164  Log() << kINFO << "Regression Loss Function: "<< fRegressionLossFunctionBDTG->Name() << Endl;
1165 
1166  Log() << kINFO << "Training "<< fNTrees << " Decision Trees ... patience please" << Endl;
1167 
1168  Log() << kDEBUG << "Training with maximal depth = " <<fMaxDepth
1169  << ", MinNodeEvents=" << fMinNodeEvents
1170  << ", NTrees="<<fNTrees
1171  << ", NodePurityLimit="<<fNodePurityLimit
1172  << ", AdaBoostBeta="<<fAdaBoostBeta
1173  << Endl;
1174 
1175  // weights applied in boosting
1176  Int_t nBins;
1177  Double_t xMin,xMax;
1178  TString hname = "AdaBooost weight distribution";
1179 
1180  nBins= 100;
1181  xMin = 0;
1182  xMax = 30;
1183 
1184  if (DoRegression()) {
1185  nBins= 100;
1186  xMin = 0;
1187  xMax = 1;
1188  hname="Boost event weights distribution";
1189  }
1190 
1191  // book monitoring histograms (for AdaBost only)
1192 
1193  TH1* h = new TH1F(Form("%s_BoostWeight",DataInfo().GetName()),hname,nBins,xMin,xMax);
1194  TH1* nodesBeforePruningVsTree = new TH1I(Form("%s_NodesBeforePruning",DataInfo().GetName()),"nodes before pruning",fNTrees,0,fNTrees);
1195  TH1* nodesAfterPruningVsTree = new TH1I(Form("%s_NodesAfterPruning",DataInfo().GetName()),"nodes after pruning",fNTrees,0,fNTrees);
1196 
1197 
1198 
1199  if(!DoMulticlass()){
1201 
1202  h->SetXTitle("boost weight");
1203  results->Store(h, "BoostWeights");
1204 
1205 
1206  // Monitor the performance (on TEST sample) versus number of trees
1207  if (fDoBoostMonitor){
1208  TH2* boostMonitor = new TH2F("BoostMonitor","ROC Integral Vs iTree",2,0,fNTrees,2,0,1.05);
1209  boostMonitor->SetXTitle("#tree");
1210  boostMonitor->SetYTitle("ROC Integral");
1211  results->Store(boostMonitor, "BoostMonitor");
1212  TGraph *boostMonitorGraph = new TGraph();
1213  boostMonitorGraph->SetName("BoostMonitorGraph");
1214  boostMonitorGraph->SetTitle("ROCIntegralVsNTrees");
1215  results->Store(boostMonitorGraph, "BoostMonitorGraph");
1216  }
1217 
1218  // weights applied in boosting vs tree number
1219  h = new TH1F("BoostWeightVsTree","Boost weights vs tree",fNTrees,0,fNTrees);
1220  h->SetXTitle("#tree");
1221  h->SetYTitle("boost weight");
1222  results->Store(h, "BoostWeightsVsTree");
1223 
1224  // error fraction vs tree number
1225  h = new TH1F("ErrFractHist","error fraction vs tree number",fNTrees,0,fNTrees);
1226  h->SetXTitle("#tree");
1227  h->SetYTitle("error fraction");
1228  results->Store(h, "ErrorFrac");
1229 
1230  // nNodesBeforePruning vs tree number
1231  nodesBeforePruningVsTree->SetXTitle("#tree");
1232  nodesBeforePruningVsTree->SetYTitle("#tree nodes");
1233  results->Store(nodesBeforePruningVsTree);
1234 
1235  // nNodesAfterPruning vs tree number
1236  nodesAfterPruningVsTree->SetXTitle("#tree");
1237  nodesAfterPruningVsTree->SetYTitle("#tree nodes");
1238  results->Store(nodesAfterPruningVsTree);
1239 
1240  }
1241 
1242  fMonitorNtuple= new TTree("MonitorNtuple","BDT variables");
1243  fMonitorNtuple->Branch("iTree",&fITree,"iTree/I");
1244  fMonitorNtuple->Branch("boostWeight",&fBoostWeight,"boostWeight/D");
1245  fMonitorNtuple->Branch("errorFraction",&fErrorFraction,"errorFraction/D");
1246 
1247  Timer timer( fNTrees, GetName() );
1248  Int_t nNodesBeforePruningCount = 0;
1249  Int_t nNodesAfterPruningCount = 0;
1250 
1251  Int_t nNodesBeforePruning = 0;
1252  Int_t nNodesAfterPruning = 0;
1253 
1254 
1255  if(fBoostType=="Grad"){
1257  }
1258 
1259  Int_t itree=0;
1260  Bool_t continueBoost=kTRUE;
1261  //for (int itree=0; itree<fNTrees; itree++) {
1262  while (itree < fNTrees && continueBoost){
1263  if (fExitFromTraining) break;
1264  fIPyCurrentIter = itree;
1265  timer.DrawProgressBar( itree );
1266  // Results* results = Data()->GetResults(GetMethodName(), Types::kTraining, GetAnalysisType());
1267  // TH1 *hxx = new TH1F(Form("swdist%d",itree),Form("swdist%d",itree),10000,0,15);
1268  // results->Store(hxx,Form("swdist%d",itree));
1269  // TH1 *hxy = new TH1F(Form("bwdist%d",itree),Form("bwdist%d",itree),10000,0,15);
1270  // results->Store(hxy,Form("bwdist%d",itree));
1271  // for (Int_t iev=0; iev<fEventSample.size(); iev++) {
1272  // if (fEventSample[iev]->GetClass()!=0) hxy->Fill((fEventSample[iev])->GetWeight());
1273  // else hxx->Fill((fEventSample[iev])->GetWeight());
1274  // }
1275 
1276  if(DoMulticlass()){
1277  if (fBoostType!="Grad"){
1278  Log() << kFATAL << "Multiclass is currently only supported by gradient boost. "
1279  << "Please change boost option accordingly (GradBoost)."
1280  << Endl;
1281  }
1282  UInt_t nClasses = DataInfo().GetNClasses();
1283  for (UInt_t i=0;i<nClasses;i++){
1284  fForest.push_back( new DecisionTree( fSepType, fMinNodeSize, fNCuts, &(DataInfo()), i,
1286  itree*nClasses+i, fNodePurityLimit, itree*nClasses+1));
1287  fForest.back()->SetNVars(GetNvar());
1288  if (fUseFisherCuts) {
1289  fForest.back()->SetUseFisherCuts();
1290  fForest.back()->SetMinLinCorrForFisher(fMinLinCorrForFisher);
1291  fForest.back()->SetUseExclusiveVars(fUseExclusiveVars);
1292  }
1293  // the minimum linear correlation between two variables demanded for use in fisher criterion in node splitting
1294 
1295  nNodesBeforePruning = fForest.back()->BuildTree(*fTrainSample);
1296  Double_t bw = this->Boost(*fTrainSample, fForest.back(),i);
1297  if (bw > 0) {
1298  fBoostWeights.push_back(bw);
1299  }else{
1300  fBoostWeights.push_back(0);
1301  Log() << kWARNING << "stopped boosting at itree="<<itree << Endl;
1302  // fNTrees = itree+1; // that should stop the boosting
1303  continueBoost=kFALSE;
1304  }
1305  }
1306  }
1307  else{
1310  itree, fNodePurityLimit, itree));
1311  fForest.back()->SetNVars(GetNvar());
1312  if (fUseFisherCuts) {
1313  fForest.back()->SetUseFisherCuts();
1314  fForest.back()->SetMinLinCorrForFisher(fMinLinCorrForFisher);
1315  fForest.back()->SetUseExclusiveVars(fUseExclusiveVars);
1316  }
1317 
1318  nNodesBeforePruning = fForest.back()->BuildTree(*fTrainSample);
1319 
1320  if (fUseYesNoLeaf && !DoRegression() && fBoostType!="Grad") { // remove leaf nodes where both daughter nodes are of same type
1321  nNodesBeforePruning = fForest.back()->CleanTree();
1322  }
1323 
1324  nNodesBeforePruningCount += nNodesBeforePruning;
1325  nodesBeforePruningVsTree->SetBinContent(itree+1,nNodesBeforePruning);
1326 
1327  fForest.back()->SetPruneMethod(fPruneMethod); // set the pruning method for the tree
1328  fForest.back()->SetPruneStrength(fPruneStrength); // set the strength parameter
1329 
1330  std::vector<const Event*> * validationSample = NULL;
1331  if(fAutomatic) validationSample = &fValidationSample;
1332 
1333  Double_t bw = this->Boost(*fTrainSample, fForest.back());
1334  if (bw > 0) {
1335  fBoostWeights.push_back(bw);
1336  }else{
1337  fBoostWeights.push_back(0);
1338  Log() << kWARNING << "stopped boosting at itree="<<itree << Endl;
1339  continueBoost=kFALSE;
1340  }
1341 
1342 
1343 
1344  // if fAutomatic == true, pruneStrength will be the optimal pruning strength
1345  // determined by the pruning algorithm; otherwise, it is simply the strength parameter
1346  // set by the user
1347  if (fPruneMethod != DecisionTree::kNoPruning) fForest.back()->PruneTree(validationSample);
1348 
1349  if (fUseYesNoLeaf && !DoRegression() && fBoostType!="Grad"){ // remove leaf nodes where both daughter nodes are of same type
1350  fForest.back()->CleanTree();
1351  }
1352  nNodesAfterPruning = fForest.back()->GetNNodes();
1353  nNodesAfterPruningCount += nNodesAfterPruning;
1354  nodesAfterPruningVsTree->SetBinContent(itree+1,nNodesAfterPruning);
1355 
1356  if (fInteractive){
1358  }
1359  fITree = itree;
1360  fMonitorNtuple->Fill();
1361  if (fDoBoostMonitor){
1362  if (! DoRegression() ){
1363  if ( itree==fNTrees-1 || (!(itree%500)) ||
1364  (!(itree%250) && itree <1000)||
1365  (!(itree%100) && itree < 500)||
1366  (!(itree%50) && itree < 250)||
1367  (!(itree%25) && itree < 150)||
1368  (!(itree%10) && itree < 50)||
1369  (!(itree%5) && itree < 20)
1370  ) BoostMonitor(itree);
1371  }
1372  }
1373  }
1374  itree++;
1375  }
1376 
1377  // get elapsed time
1378  Log() << kDEBUG << "\t<Train> elapsed time: " << timer.GetElapsedTime()
1379  << " " << Endl;
1381  Log() << kDEBUG << "\t<Train> average number of nodes (w/o pruning) : "
1382  << nNodesBeforePruningCount/GetNTrees() << Endl;
1383  }
1384  else {
1385  Log() << kDEBUG << "\t<Train> average number of nodes before/after pruning : "
1386  << nNodesBeforePruningCount/GetNTrees() << " / "
1387  << nNodesAfterPruningCount/GetNTrees()
1388  << Endl;
1389  }
1391 
1392 
1393  // reset all previously stored/accumulated BOOST weights in the event sample
1394  // for (UInt_t iev=0; iev<fEventSample.size(); iev++) fEventSample[iev]->SetBoostWeight(1.);
1395  Log() << kDEBUG << "Now I delete the privat data sample"<< Endl;
1396  for (UInt_t i=0; i<fEventSample.size(); i++) delete fEventSample[i];
1397  for (UInt_t i=0; i<fValidationSample.size(); i++) delete fValidationSample[i];
1398  fEventSample.clear();
1399  fValidationSample.clear();
1400 
1402  ExitFromTraining();
1403 }
1404 
1405 
1406 ////////////////////////////////////////////////////////////////////////////////
1407 ///returns MVA value: -1 for background, 1 for signal
1408 
1410 {
1411  Double_t sum=0;
1412  for (UInt_t itree=0; itree<nTrees; itree++) {
1413  //loop over all trees in forest
1414  sum += fForest[itree]->CheckEvent(e,kFALSE);
1415 
1416  }
1417  return 2.0/(1.0+exp(-2.0*sum))-1; //MVA output between -1 and 1
1418 }
1419 
1420 ////////////////////////////////////////////////////////////////////////////////
1421 ///Calculate residua for all events;
1422 
1423 void TMVA::MethodBDT::UpdateTargets(std::vector<const TMVA::Event*>& eventSample, UInt_t cls)
1424 {
1425  if(DoMulticlass()){
1426  UInt_t nClasses = DataInfo().GetNClasses();
1427  for (std::vector<const TMVA::Event*>::iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1428  fResiduals[*e].at(cls)+=fForest.back()->CheckEvent(*e,kFALSE);
1429  if(cls == nClasses-1){
1430  for(UInt_t i=0;i<nClasses;i++){
1431  Double_t norm = 0.0;
1432  for(UInt_t j=0;j<nClasses;j++){
1433  if(i!=j)
1434  norm+=exp(fResiduals[*e].at(j)-fResiduals[*e].at(i));
1435  }
1436  Double_t p_cls = 1.0/(1.0+norm);
1437  Double_t res = ((*e)->GetClass()==i)?(1.0-p_cls):(-p_cls);
1438  const_cast<TMVA::Event*>(*e)->SetTarget(i,res);
1439  }
1440  }
1441  }
1442  }
1443  else{
1444  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1445  fResiduals[*e].at(0)+=fForest.back()->CheckEvent(*e,kFALSE);
1446  Double_t p_sig=1.0/(1.0+exp(-2.0*fResiduals[*e].at(0)));
1447  Double_t res = (DataInfo().IsSignal(*e)?1:0)-p_sig;
1448  const_cast<TMVA::Event*>(*e)->SetTarget(0,res);
1449  }
1450  }
1451 }
1452 
1453 ////////////////////////////////////////////////////////////////////////////////
1454 ///Calculate current residuals for all events and update targets for next iteration
1455 
1456 void TMVA::MethodBDT::UpdateTargetsRegression(std::vector<const TMVA::Event*>& eventSample, Bool_t first)
1457 {
1458  if(!first){
1459  for (std::vector<const TMVA::Event*>::const_iterator e=fEventSample.begin(); e!=fEventSample.end();e++) {
1460  fLossFunctionEventInfo[*e].predictedValue += fForest.back()->CheckEvent(*e,kFALSE);
1461  }
1462  }
1463 
1465 }
1466 
1467 ////////////////////////////////////////////////////////////////////////////////
1468 ///Calculate the desired response value for each region
1469 
1470 Double_t TMVA::MethodBDT::GradBoost(std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt, UInt_t cls)
1471 {
1472  std::map<TMVA::DecisionTreeNode*,std::vector<Double_t> > leaves;
1473  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1474  Double_t weight = (*e)->GetWeight();
1475  TMVA::DecisionTreeNode* node = dt->GetEventNode(*(*e));
1476  if ((leaves[node]).empty()){
1477  (leaves[node]).push_back((*e)->GetTarget(cls)* weight);
1478  (leaves[node]).push_back(fabs((*e)->GetTarget(cls))*(1.0-fabs((*e)->GetTarget(cls))) * weight* weight);
1479  }
1480  else {
1481  (leaves[node])[0]+=((*e)->GetTarget(cls)* weight);
1482  (leaves[node])[1]+=fabs((*e)->GetTarget(cls))*(1.0-fabs((*e)->GetTarget(cls))) * weight* weight;
1483  }
1484  }
1485  for (std::map<TMVA::DecisionTreeNode*,std::vector<Double_t> >::iterator iLeave=leaves.begin();
1486  iLeave!=leaves.end();++iLeave){
1487  if ((iLeave->second)[1]<1e-30) (iLeave->second)[1]=1e-30;
1488 
1489  (iLeave->first)->SetResponse(fShrinkage/DataInfo().GetNClasses()*(iLeave->second)[0]/((iLeave->second)[1]));
1490  }
1491 
1492  //call UpdateTargets before next tree is grown
1493 
1495  return 1; //trees all have the same weight
1496 }
1497 
1498 ////////////////////////////////////////////////////////////////////////////////
1499 /// Implementation of M_TreeBoost using any loss function as desribed by Friedman 1999
1500 
1501 Double_t TMVA::MethodBDT::GradBoostRegression(std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt )
1502 {
1503  // get the vector of events for each terminal so that we can calculate the constant fit value in each
1504  // terminal node
1505  std::map<TMVA::DecisionTreeNode*,vector< TMVA::LossFunctionEventInfo > > leaves;
1506  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1507  TMVA::DecisionTreeNode* node = dt->GetEventNode(*(*e));
1508  (leaves[node]).push_back(fLossFunctionEventInfo[*e]);
1509  }
1510 
1511  // calculate the constant fit for each terminal node based upon the events in the node
1512  // node (iLeave->first), vector of event information (iLeave->second)
1513  for (std::map<TMVA::DecisionTreeNode*,vector< TMVA::LossFunctionEventInfo > >::iterator iLeave=leaves.begin();
1514  iLeave!=leaves.end();++iLeave){
1515  Double_t fit = fRegressionLossFunctionBDTG->Fit(iLeave->second);
1516  (iLeave->first)->SetResponse(fShrinkage*fit);
1517  }
1518 
1520  return 1;
1521 }
1522 
1523 ////////////////////////////////////////////////////////////////////////////////
1524 /// initialize targets for first tree
1525 
1526 void TMVA::MethodBDT::InitGradBoost( std::vector<const TMVA::Event*>& eventSample)
1527 {
1528  // Should get rid of this line. It's just for debugging.
1529  //std::sort(eventSample.begin(), eventSample.end(), [](const TMVA::Event* a, const TMVA::Event* b){
1530  // return (a->GetTarget(0) < b->GetTarget(0)); });
1531  fSepType=NULL; //set fSepType to NULL (regression trees are used for both classification an regression)
1532  if(DoRegression()){
1533  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1534  fLossFunctionEventInfo[*e]= TMVA::LossFunctionEventInfo((*e)->GetTarget(0), 0, (*e)->GetWeight());
1535  }
1536 
1539  return;
1540  }
1541  else if(DoMulticlass()){
1542  UInt_t nClasses = DataInfo().GetNClasses();
1543  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1544  for (UInt_t i=0;i<nClasses;i++){
1545  //Calculate initial residua, assuming equal probability for all classes
1546  Double_t r = (*e)->GetClass()==i?(1-1.0/nClasses):(-1.0/nClasses);
1547  const_cast<TMVA::Event*>(*e)->SetTarget(i,r);
1548  fResiduals[*e].push_back(0);
1549  }
1550  }
1551  }
1552  else{
1553  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1554  Double_t r = (DataInfo().IsSignal(*e)?1:0)-0.5; //Calculate initial residua
1555  const_cast<TMVA::Event*>(*e)->SetTarget(0,r);
1556  fResiduals[*e].push_back(0);
1557  }
1558  }
1559 
1560 }
1561 ////////////////////////////////////////////////////////////////////////////////
1562 /// test the tree quality.. in terms of Miscalssification
1563 
1565 {
1566  Double_t ncorrect=0, nfalse=0;
1567  for (UInt_t ievt=0; ievt<fValidationSample.size(); ievt++) {
1568  Bool_t isSignalType= (dt->CheckEvent(fValidationSample[ievt]) > fNodePurityLimit ) ? 1 : 0;
1569 
1570  if (isSignalType == (DataInfo().IsSignal(fValidationSample[ievt])) ) {
1571  ncorrect += fValidationSample[ievt]->GetWeight();
1572  }
1573  else{
1574  nfalse += fValidationSample[ievt]->GetWeight();
1575  }
1576  }
1577 
1578  return ncorrect / (ncorrect + nfalse);
1579 }
1580 
1581 ////////////////////////////////////////////////////////////////////////////////
1582 /// apply the boosting alogrithim (the algorithm is selecte via the the "option" given
1583 /// in the constructor. The return value is the boosting weight
1584 
1585 Double_t TMVA::MethodBDT::Boost( std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt, UInt_t cls )
1586 {
1587  Double_t returnVal=-1;
1588 
1589  if (fBoostType=="AdaBoost") returnVal = this->AdaBoost (eventSample, dt);
1590  else if (fBoostType=="AdaCost") returnVal = this->AdaCost (eventSample, dt);
1591  else if (fBoostType=="Bagging") returnVal = this->Bagging ( );
1592  else if (fBoostType=="RegBoost") returnVal = this->RegBoost (eventSample, dt);
1593  else if (fBoostType=="AdaBoostR2") returnVal = this->AdaBoostR2(eventSample, dt);
1594  else if (fBoostType=="Grad"){
1595  if(DoRegression())
1596  returnVal = this->GradBoostRegression(eventSample, dt);
1597  else if(DoMulticlass())
1598  returnVal = this->GradBoost (eventSample, dt, cls);
1599  else
1600  returnVal = this->GradBoost (eventSample, dt);
1601  }
1602  else {
1603  Log() << kINFO << GetOptions() << Endl;
1604  Log() << kFATAL << "<Boost> unknown boost option " << fBoostType<< " called" << Endl;
1605  }
1606 
1607  if (fBaggedBoost){
1609  }
1610 
1611 
1612  return returnVal;
1613 }
1614 
1615 ////////////////////////////////////////////////////////////////////////////////
1616 /// fills the ROCIntegral vs Itree from the testSample for the monitoring plots
1617 /// during the training .. but using the testing events
1618 
1620 {
1622 
1623  TH1F *tmpS = new TH1F( "tmpS", "", 100 , -1., 1.00001 );
1624  TH1F *tmpB = new TH1F( "tmpB", "", 100 , -1., 1.00001 );
1625  TH1F *tmp;
1626 
1627 
1628  UInt_t signalClassNr = DataInfo().GetClassInfo("Signal")->GetNumber();
1629 
1630  // const std::vector<Event*> events=Data()->GetEventCollection(Types::kTesting);
1631  // // fMethod->GetTransformationHandler().CalcTransformations(fMethod->Data()->GetEventCollection(Types::kTesting));
1632  // for (UInt_t iev=0; iev < events.size() ; iev++){
1633  // if (events[iev]->GetClass() == signalClassNr) tmp=tmpS;
1634  // else tmp=tmpB;
1635  // tmp->Fill(PrivateGetMvaValue(*(events[iev])),events[iev]->GetWeight());
1636  // }
1637 
1638  UInt_t nevents = Data()->GetNTestEvents();
1639  for (UInt_t iev=0; iev < nevents; iev++){
1640  const Event* event = GetTestingEvent(iev);
1641 
1642  if (event->GetClass() == signalClassNr) {tmp=tmpS;}
1643  else {tmp=tmpB;}
1644  tmp->Fill(PrivateGetMvaValue(event),event->GetWeight());
1645  }
1646  Double_t max=1;
1647 
1648  std::vector<TH1F*> hS;
1649  std::vector<TH1F*> hB;
1650  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
1651  hS.push_back(new TH1F(Form("SigVar%dAtTree%d",ivar,iTree),Form("SigVar%dAtTree%d",ivar,iTree),100,DataInfo().GetVariableInfo(ivar).GetMin(),DataInfo().GetVariableInfo(ivar).GetMax()));
1652  hB.push_back(new TH1F(Form("BkgVar%dAtTree%d",ivar,iTree),Form("BkgVar%dAtTree%d",ivar,iTree),100,DataInfo().GetVariableInfo(ivar).GetMin(),DataInfo().GetVariableInfo(ivar).GetMax()));
1653  results->Store(hS.back(),hS.back()->GetTitle());
1654  results->Store(hB.back(),hB.back()->GetTitle());
1655  }
1656 
1657 
1658  for (UInt_t iev=0; iev < fEventSample.size(); iev++){
1659  if (fEventSample[iev]->GetBoostWeight() > max) max = 1.01*fEventSample[iev]->GetBoostWeight();
1660  }
1661  TH1F *tmpBoostWeightsS = new TH1F(Form("BoostWeightsInTreeS%d",iTree),Form("BoostWeightsInTreeS%d",iTree),100,0.,max);
1662  TH1F *tmpBoostWeightsB = new TH1F(Form("BoostWeightsInTreeB%d",iTree),Form("BoostWeightsInTreeB%d",iTree),100,0.,max);
1663  results->Store(tmpBoostWeightsS,tmpBoostWeightsS->GetTitle());
1664  results->Store(tmpBoostWeightsB,tmpBoostWeightsB->GetTitle());
1665 
1666  TH1F *tmpBoostWeights;
1667  std::vector<TH1F*> *h;
1668 
1669  for (UInt_t iev=0; iev < fEventSample.size(); iev++){
1670  if (fEventSample[iev]->GetClass() == signalClassNr) {
1671  tmpBoostWeights=tmpBoostWeightsS;
1672  h=&hS;
1673  }else{
1674  tmpBoostWeights=tmpBoostWeightsB;
1675  h=&hB;
1676  }
1677  tmpBoostWeights->Fill(fEventSample[iev]->GetBoostWeight());
1678  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
1679  (*h)[ivar]->Fill(fEventSample[iev]->GetValue(ivar),fEventSample[iev]->GetWeight());
1680  }
1681  }
1682 
1683 
1684  TMVA::PDF *sig = new TMVA::PDF( " PDF Sig", tmpS, TMVA::PDF::kSpline3 );
1685  TMVA::PDF *bkg = new TMVA::PDF( " PDF Bkg", tmpB, TMVA::PDF::kSpline3 );
1686 
1687 
1688  TGraph* gr=results->GetGraph("BoostMonitorGraph");
1689  Int_t nPoints = gr->GetN();
1690  gr->Set(nPoints+1);
1691  gr->SetPoint(nPoints,(Double_t)iTree+1,GetROCIntegral(sig,bkg));
1692 
1693  tmpS->Delete();
1694  tmpB->Delete();
1695 
1696  delete sig;
1697  delete bkg;
1698 
1699  return;
1700 }
1701 
1702 ////////////////////////////////////////////////////////////////////////////////
1703 /// the AdaBoost implementation.
1704 /// a new training sample is generated by weighting
1705 /// events that are misclassified by the decision tree. The weight
1706 /// applied is w = (1-err)/err or more general:
1707 /// w = ((1-err)/err)^beta
1708 /// where err is the fraction of misclassified events in the tree ( <0.5 assuming
1709 /// demanding the that previous selection was better than random guessing)
1710 /// and "beta" being a free parameter (standard: beta = 1) that modifies the
1711 /// boosting.
1712 
1713 Double_t TMVA::MethodBDT::AdaBoost( std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt )
1714 {
1715  Double_t err=0, sumGlobalw=0, sumGlobalwfalse=0, sumGlobalwfalse2=0;
1716 
1717  std::vector<Double_t> sumw(DataInfo().GetNClasses(),0); //for individually re-scaling each class
1718  std::map<Node*,Int_t> sigEventsInNode; // how many signal events of the training tree
1719 
1720  Double_t maxDev=0;
1721  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1722  Double_t w = (*e)->GetWeight();
1723  sumGlobalw += w;
1724  UInt_t iclass=(*e)->GetClass();
1725  sumw[iclass] += w;
1726 
1727  if ( DoRegression() ) {
1728  Double_t tmpDev = TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) );
1729  sumGlobalwfalse += w * tmpDev;
1730  sumGlobalwfalse2 += w * tmpDev*tmpDev;
1731  if (tmpDev > maxDev) maxDev = tmpDev;
1732  }else{
1733 
1734  if (fUseYesNoLeaf){
1735  Bool_t isSignalType = (dt->CheckEvent(*e,fUseYesNoLeaf) > fNodePurityLimit );
1736  if (!(isSignalType == DataInfo().IsSignal(*e))) {
1737  sumGlobalwfalse+= w;
1738  }
1739  }else{
1740  Double_t dtoutput = (dt->CheckEvent(*e,fUseYesNoLeaf) - 0.5)*2.;
1741  Int_t trueType;
1742  if (DataInfo().IsSignal(*e)) trueType = 1;
1743  else trueType = -1;
1744  sumGlobalwfalse+= w*trueType*dtoutput;
1745  }
1746  }
1747  }
1748 
1749  err = sumGlobalwfalse/sumGlobalw ;
1750  if ( DoRegression() ) {
1751  //if quadratic loss:
1752  if (fAdaBoostR2Loss=="linear"){
1753  err = sumGlobalwfalse/maxDev/sumGlobalw ;
1754  }
1755  else if (fAdaBoostR2Loss=="quadratic"){
1756  err = sumGlobalwfalse2/maxDev/maxDev/sumGlobalw ;
1757  }
1758  else if (fAdaBoostR2Loss=="exponential"){
1759  err = 0;
1760  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1761  Double_t w = (*e)->GetWeight();
1762  Double_t tmpDev = TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) );
1763  err += w * (1 - exp (-tmpDev/maxDev)) / sumGlobalw;
1764  }
1765 
1766  }
1767  else {
1768  Log() << kFATAL << " you've chosen a Loss type for Adaboost other than linear, quadratic or exponential "
1769  << " namely " << fAdaBoostR2Loss << "\n"
1770  << "and this is not implemented... a typo in the options ??" <<Endl;
1771  }
1772  }
1773 
1774  Log() << kDEBUG << "BDT AdaBoos wrong/all: " << sumGlobalwfalse << "/" << sumGlobalw << Endl;
1775 
1776 
1777  Double_t newSumGlobalw=0;
1778  std::vector<Double_t> newSumw(sumw.size(),0);
1779 
1780  Double_t boostWeight=1.;
1781  if (err >= 0.5 && fUseYesNoLeaf) { // sanity check ... should never happen as otherwise there is apparently
1782  // something odd with the assignement of the leaf nodes (rem: you use the training
1783  // events for this determination of the error rate)
1784  if (dt->GetNNodes() == 1){
1785  Log() << kERROR << " YOUR tree has only 1 Node... kind of a funny *tree*. I cannot "
1786  << "boost such a thing... if after 1 step the error rate is == 0.5"
1787  << Endl
1788  << "please check why this happens, maybe too many events per node requested ?"
1789  << Endl;
1790 
1791  }else{
1792  Log() << kERROR << " The error rate in the BDT boosting is > 0.5. ("<< err
1793  << ") That should not happen, please check your code (i.e... the BDT code), I "
1794  << " stop boosting here" << Endl;
1795  return -1;
1796  }
1797  err = 0.5;
1798  } else if (err < 0) {
1799  Log() << kERROR << " The error rate in the BDT boosting is < 0. That can happen"
1800  << " due to improper treatment of negative weights in a Monte Carlo.. (if you have"
1801  << " an idea on how to do it in a better way, please let me know (Helge.Voss@cern.ch)"
1802  << " for the time being I set it to its absolute value.. just to continue.." << Endl;
1803  err = TMath::Abs(err);
1804  }
1805  if (fUseYesNoLeaf)
1806  boostWeight = TMath::Log((1.-err)/err)*fAdaBoostBeta;
1807  else
1808  boostWeight = TMath::Log((1.+err)/(1-err))*fAdaBoostBeta;
1809 
1810 
1811  Log() << kDEBUG << "BDT AdaBoos wrong/all: " << sumGlobalwfalse << "/" << sumGlobalw << " 1-err/err="<<boostWeight<< " log.."<<TMath::Log(boostWeight)<<Endl;
1812 
1814 
1815 
1816  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1817 
1818  if (fUseYesNoLeaf||DoRegression()){
1819  if ((!( (dt->CheckEvent(*e,fUseYesNoLeaf) > fNodePurityLimit ) == DataInfo().IsSignal(*e))) || DoRegression()) {
1820  Double_t boostfactor = TMath::Exp(boostWeight);
1821 
1822  if (DoRegression()) boostfactor = TMath::Power(1/boostWeight,(1.-TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) )/maxDev ) );
1823  if ( (*e)->GetWeight() > 0 ){
1824  (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1825  // Helge change back (*e)->ScaleBoostWeight(boostfactor);
1826  if (DoRegression()) results->GetHist("BoostWeights")->Fill(boostfactor);
1827  } else {
1828  if ( fInverseBoostNegWeights )(*e)->ScaleBoostWeight( 1. / boostfactor); // if the original event weight is negative, and you want to "increase" the events "positive" influence, you'd reather make the event weight "smaller" in terms of it's absolute value while still keeping it something "negative"
1829  else (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1830 
1831  }
1832  }
1833 
1834  }else{
1835  Double_t dtoutput = (dt->CheckEvent(*e,fUseYesNoLeaf) - 0.5)*2.;
1836  Int_t trueType;
1837  if (DataInfo().IsSignal(*e)) trueType = 1;
1838  else trueType = -1;
1839  Double_t boostfactor = TMath::Exp(-1*boostWeight*trueType*dtoutput);
1840 
1841  if ( (*e)->GetWeight() > 0 ){
1842  (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1843  // Helge change back (*e)->ScaleBoostWeight(boostfactor);
1844  if (DoRegression()) results->GetHist("BoostWeights")->Fill(boostfactor);
1845  } else {
1846  if ( fInverseBoostNegWeights )(*e)->ScaleBoostWeight( 1. / boostfactor); // if the original event weight is negative, and you want to "increase" the events "positive" influence, you'd reather make the event weight "smaller" in terms of it's absolute value while still keeping it something "negative"
1847  else (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1848  }
1849  }
1850  newSumGlobalw+=(*e)->GetWeight();
1851  newSumw[(*e)->GetClass()] += (*e)->GetWeight();
1852  }
1853 
1854 
1855  // Double_t globalNormWeight=sumGlobalw/newSumGlobalw;
1856  Double_t globalNormWeight=( (Double_t) eventSample.size())/newSumGlobalw;
1857  Log() << kDEBUG << "new Nsig="<<newSumw[0]*globalNormWeight << " new Nbkg="<<newSumw[1]*globalNormWeight << Endl;
1858 
1859 
1860  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1861  // if (fRenormByClass) (*e)->ScaleBoostWeight( normWeightByClass[(*e)->GetClass()] );
1862  // else (*e)->ScaleBoostWeight( globalNormWeight );
1863  // else (*e)->ScaleBoostWeight( globalNormWeight );
1864  if (DataInfo().IsSignal(*e))(*e)->ScaleBoostWeight( globalNormWeight * fSigToBkgFraction );
1865  else (*e)->ScaleBoostWeight( globalNormWeight );
1866  }
1867 
1868  if (!(DoRegression()))results->GetHist("BoostWeights")->Fill(boostWeight);
1869  results->GetHist("BoostWeightsVsTree")->SetBinContent(fForest.size(),boostWeight);
1870  results->GetHist("ErrorFrac")->SetBinContent(fForest.size(),err);
1871 
1872  fBoostWeight = boostWeight;
1873  fErrorFraction = err;
1874 
1875  return boostWeight;
1876 }
1877 
1878 
1879 ////////////////////////////////////////////////////////////////////////////////
1880 /// the AdaCost boosting algorithm takes a simple cost Matrix (currently fixed for
1881 /// all events... later could be modified to use individual cost matrices for each
1882 /// events as in the original paper...
1883 ///
1884 /// true_signal true_bkg
1885 /// ----------------------------------
1886 /// sel_signal | Css Ctb_ss Cxx.. in the range [0,1]
1887 /// sel_bkg | Cts_sb Cbb
1888 ///
1889 /// and takes this into account when calculating the misclass. cost (former: error fraction):
1890 ///
1891 /// err = sum_events ( weight* y_true*y_sel * beta(event)
1892 ///
1893 
1894 Double_t TMVA::MethodBDT::AdaCost( vector<const TMVA::Event*>& eventSample, DecisionTree *dt )
1895 {
1896  Double_t Css = fCss;
1897  Double_t Cbb = fCbb;
1898  Double_t Cts_sb = fCts_sb;
1899  Double_t Ctb_ss = fCtb_ss;
1900 
1901  Double_t err=0, sumGlobalWeights=0, sumGlobalCost=0;
1902 
1903  std::vector<Double_t> sumw(DataInfo().GetNClasses(),0); //for individually re-scaling each class
1904  std::map<Node*,Int_t> sigEventsInNode; // how many signal events of the training tree
1905 
1906  for (vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1907  Double_t w = (*e)->GetWeight();
1908  sumGlobalWeights += w;
1909  UInt_t iclass=(*e)->GetClass();
1910 
1911  sumw[iclass] += w;
1912 
1913  if ( DoRegression() ) {
1914  Log() << kFATAL << " AdaCost not implemented for regression"<<Endl;
1915  }else{
1916 
1917  Double_t dtoutput = (dt->CheckEvent(*e,false) - 0.5)*2.;
1918  Int_t trueType;
1919  Bool_t isTrueSignal = DataInfo().IsSignal(*e);
1920  Bool_t isSelectedSignal = (dtoutput>0);
1921  if (isTrueSignal) trueType = 1;
1922  else trueType = -1;
1923 
1924  Double_t cost=0;
1925  if (isTrueSignal && isSelectedSignal) cost=Css;
1926  else if (isTrueSignal && !isSelectedSignal) cost=Cts_sb;
1927  else if (!isTrueSignal && isSelectedSignal) cost=Ctb_ss;
1928  else if (!isTrueSignal && !isSelectedSignal) cost=Cbb;
1929  else Log() << kERROR << "something went wrong in AdaCost" << Endl;
1930 
1931  sumGlobalCost+= w*trueType*dtoutput*cost;
1932 
1933  }
1934  }
1935 
1936  if ( DoRegression() ) {
1937  Log() << kFATAL << " AdaCost not implemented for regression"<<Endl;
1938  }
1939 
1940  // Log() << kDEBUG << "BDT AdaBoos wrong/all: " << sumGlobalCost << "/" << sumGlobalWeights << Endl;
1941  // Log() << kWARNING << "BDT AdaBoos wrong/all: " << sumGlobalCost << "/" << sumGlobalWeights << Endl;
1942  sumGlobalCost /= sumGlobalWeights;
1943  // Log() << kWARNING << "BDT AdaBoos wrong/all: " << sumGlobalCost << "/" << sumGlobalWeights << Endl;
1944 
1945 
1946  Double_t newSumGlobalWeights=0;
1947  vector<Double_t> newSumClassWeights(sumw.size(),0);
1948 
1949  Double_t boostWeight = TMath::Log((1+sumGlobalCost)/(1-sumGlobalCost)) * fAdaBoostBeta;
1950 
1952 
1953  for (vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1954  Double_t dtoutput = (dt->CheckEvent(*e,false) - 0.5)*2.;
1955  Int_t trueType;
1956  Bool_t isTrueSignal = DataInfo().IsSignal(*e);
1957  Bool_t isSelectedSignal = (dtoutput>0);
1958  if (isTrueSignal) trueType = 1;
1959  else trueType = -1;
1960 
1961  Double_t cost=0;
1962  if (isTrueSignal && isSelectedSignal) cost=Css;
1963  else if (isTrueSignal && !isSelectedSignal) cost=Cts_sb;
1964  else if (!isTrueSignal && isSelectedSignal) cost=Ctb_ss;
1965  else if (!isTrueSignal && !isSelectedSignal) cost=Cbb;
1966  else Log() << kERROR << "something went wrong in AdaCost" << Endl;
1967 
1968  Double_t boostfactor = TMath::Exp(-1*boostWeight*trueType*dtoutput*cost);
1969  if (DoRegression())Log() << kFATAL << " AdaCost not implemented for regression"<<Endl;
1970  if ( (*e)->GetWeight() > 0 ){
1971  (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1972  // Helge change back (*e)->ScaleBoostWeight(boostfactor);
1973  if (DoRegression())Log() << kFATAL << " AdaCost not implemented for regression"<<Endl;
1974  } else {
1975  if ( fInverseBoostNegWeights )(*e)->ScaleBoostWeight( 1. / boostfactor); // if the original event weight is negative, and you want to "increase" the events "positive" influence, you'd reather make the event weight "smaller" in terms of it's absolute value while still keeping it something "negative"
1976  }
1977 
1978  newSumGlobalWeights+=(*e)->GetWeight();
1979  newSumClassWeights[(*e)->GetClass()] += (*e)->GetWeight();
1980  }
1981 
1982 
1983  // Double_t globalNormWeight=sumGlobalWeights/newSumGlobalWeights;
1984  Double_t globalNormWeight=Double_t(eventSample.size())/newSumGlobalWeights;
1985  Log() << kDEBUG << "new Nsig="<<newSumClassWeights[0]*globalNormWeight << " new Nbkg="<<newSumClassWeights[1]*globalNormWeight << Endl;
1986 
1987 
1988  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1989  // if (fRenormByClass) (*e)->ScaleBoostWeight( normWeightByClass[(*e)->GetClass()] );
1990  // else (*e)->ScaleBoostWeight( globalNormWeight );
1991  if (DataInfo().IsSignal(*e))(*e)->ScaleBoostWeight( globalNormWeight * fSigToBkgFraction );
1992  else (*e)->ScaleBoostWeight( globalNormWeight );
1993  }
1994 
1995 
1996  if (!(DoRegression()))results->GetHist("BoostWeights")->Fill(boostWeight);
1997  results->GetHist("BoostWeightsVsTree")->SetBinContent(fForest.size(),boostWeight);
1998  results->GetHist("ErrorFrac")->SetBinContent(fForest.size(),err);
1999 
2000  fBoostWeight = boostWeight;
2001  fErrorFraction = err;
2002 
2003 
2004  return boostWeight;
2005 }
2006 
2007 
2008 ////////////////////////////////////////////////////////////////////////////////
2009 /// call it boot-strapping, re-sampling or whatever you like, in the end it is nothing
2010 /// else but applying "random" poisson weights to each event.
2011 
2013 {
2014  // this is now done in "MethodBDT::Boost as it might be used by other boost methods, too
2015  // GetBaggedSample(eventSample);
2016 
2017  return 1.; //here as there are random weights for each event, just return a constant==1;
2018 }
2019 
2020 ////////////////////////////////////////////////////////////////////////////////
2021 /// fills fEventSample with fBaggedSampleFraction*NEvents random training events
2022 
2023 void TMVA::MethodBDT::GetBaggedSubSample(std::vector<const TMVA::Event*>& eventSample)
2024 {
2025 
2026  Double_t n;
2027  TRandom3 *trandom = new TRandom3(100*fForest.size()+1234);
2028 
2029  if (!fSubSample.empty()) fSubSample.clear();
2030 
2031  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2032  n = trandom->PoissonD(fBaggedSampleFraction);
2033  for (Int_t i=0;i<n;i++) fSubSample.push_back(*e);
2034  }
2035 
2036  delete trandom;
2037  return;
2038 
2039  /*
2040  UInt_t nevents = fEventSample.size();
2041 
2042  if (!fSubSample.empty()) fSubSample.clear();
2043  TRandom3 *trandom = new TRandom3(fForest.size()+1);
2044 
2045  for (UInt_t ievt=0; ievt<nevents; ievt++) { // recreate new random subsample
2046  if(trandom->Rndm()<fBaggedSampleFraction)
2047  fSubSample.push_back(fEventSample[ievt]);
2048  }
2049  delete trandom;
2050  */
2051 
2052 }
2053 
2054 ////////////////////////////////////////////////////////////////////////////////
2055 /// a special boosting only for Regression ...
2056 /// maybe I'll implement it later...
2057 
2058 Double_t TMVA::MethodBDT::RegBoost( std::vector<const TMVA::Event*>& /* eventSample */, DecisionTree* /* dt */ )
2059 {
2060  return 1;
2061 }
2062 
2063 ////////////////////////////////////////////////////////////////////////////////
2064 /// adaption of the AdaBoost to regression problems (see H.Drucker 1997)
2065 
2066 Double_t TMVA::MethodBDT::AdaBoostR2( std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt )
2067 {
2068  if ( !DoRegression() ) Log() << kFATAL << "Somehow you chose a regression boost method for a classification job" << Endl;
2069 
2070  Double_t err=0, sumw=0, sumwfalse=0, sumwfalse2=0;
2071  Double_t maxDev=0;
2072  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2073  Double_t w = (*e)->GetWeight();
2074  sumw += w;
2075 
2076  Double_t tmpDev = TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) );
2077  sumwfalse += w * tmpDev;
2078  sumwfalse2 += w * tmpDev*tmpDev;
2079  if (tmpDev > maxDev) maxDev = tmpDev;
2080  }
2081 
2082  //if quadratic loss:
2083  if (fAdaBoostR2Loss=="linear"){
2084  err = sumwfalse/maxDev/sumw ;
2085  }
2086  else if (fAdaBoostR2Loss=="quadratic"){
2087  err = sumwfalse2/maxDev/maxDev/sumw ;
2088  }
2089  else if (fAdaBoostR2Loss=="exponential"){
2090  err = 0;
2091  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2092  Double_t w = (*e)->GetWeight();
2093  Double_t tmpDev = TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) );
2094  err += w * (1 - exp (-tmpDev/maxDev)) / sumw;
2095  }
2096 
2097  }
2098  else {
2099  Log() << kFATAL << " you've chosen a Loss type for Adaboost other than linear, quadratic or exponential "
2100  << " namely " << fAdaBoostR2Loss << "\n"
2101  << "and this is not implemented... a typo in the options ??" <<Endl;
2102  }
2103 
2104 
2105  if (err >= 0.5) { // sanity check ... should never happen as otherwise there is apparently
2106  // something odd with the assignement of the leaf nodes (rem: you use the training
2107  // events for this determination of the error rate)
2108  if (dt->GetNNodes() == 1){
2109  Log() << kERROR << " YOUR tree has only 1 Node... kind of a funny *tree*. I cannot "
2110  << "boost such a thing... if after 1 step the error rate is == 0.5"
2111  << Endl
2112  << "please check why this happens, maybe too many events per node requested ?"
2113  << Endl;
2114 
2115  }else{
2116  Log() << kERROR << " The error rate in the BDT boosting is > 0.5. ("<< err
2117  << ") That should not happen, but is possible for regression trees, and"
2118  << " should trigger a stop for the boosting. please check your code (i.e... the BDT code), I "
2119  << " stop boosting " << Endl;
2120  return -1;
2121  }
2122  err = 0.5;
2123  } else if (err < 0) {
2124  Log() << kERROR << " The error rate in the BDT boosting is < 0. That can happen"
2125  << " due to improper treatment of negative weights in a Monte Carlo.. (if you have"
2126  << " an idea on how to do it in a better way, please let me know (Helge.Voss@cern.ch)"
2127  << " for the time being I set it to its absolute value.. just to continue.." << Endl;
2128  err = TMath::Abs(err);
2129  }
2130 
2131  Double_t boostWeight = err / (1.-err);
2132  Double_t newSumw=0;
2133 
2135 
2136  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2137  Double_t boostfactor = TMath::Power(boostWeight,(1.-TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) )/maxDev ) );
2138  results->GetHist("BoostWeights")->Fill(boostfactor);
2139  // std::cout << "R2 " << boostfactor << " " << boostWeight << " " << (1.-TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) )/maxDev) << std::endl;
2140  if ( (*e)->GetWeight() > 0 ){
2141  Float_t newBoostWeight = (*e)->GetBoostWeight() * boostfactor;
2142  Float_t newWeight = (*e)->GetWeight() * (*e)->GetBoostWeight() * boostfactor;
2143  if (newWeight == 0) {
2144  Log() << kINFO << "Weight= " << (*e)->GetWeight() << Endl;
2145  Log() << kINFO << "BoostWeight= " << (*e)->GetBoostWeight() << Endl;
2146  Log() << kINFO << "boostweight="<<boostWeight << " err= " <<err << Endl;
2147  Log() << kINFO << "NewBoostWeight= " << newBoostWeight << Endl;
2148  Log() << kINFO << "boostfactor= " << boostfactor << Endl;
2149  Log() << kINFO << "maxDev = " << maxDev << Endl;
2150  Log() << kINFO << "tmpDev = " << TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) ) << Endl;
2151  Log() << kINFO << "target = " << (*e)->GetTarget(0) << Endl;
2152  Log() << kINFO << "estimate = " << dt->CheckEvent(*e,kFALSE) << Endl;
2153  }
2154  (*e)->SetBoostWeight( newBoostWeight );
2155  // (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
2156  } else {
2157  (*e)->SetBoostWeight( (*e)->GetBoostWeight() / boostfactor);
2158  }
2159  newSumw+=(*e)->GetWeight();
2160  }
2161 
2162  // re-normalise the weights
2163  Double_t normWeight = sumw / newSumw;
2164  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2165  //Helge (*e)->ScaleBoostWeight( sumw/newSumw);
2166  // (*e)->ScaleBoostWeight( normWeight);
2167  (*e)->SetBoostWeight( (*e)->GetBoostWeight() * normWeight );
2168  }
2169 
2170 
2171  results->GetHist("BoostWeightsVsTree")->SetBinContent(fForest.size(),1./boostWeight);
2172  results->GetHist("ErrorFrac")->SetBinContent(fForest.size(),err);
2173 
2174  fBoostWeight = boostWeight;
2175  fErrorFraction = err;
2176 
2177  return TMath::Log(1./boostWeight);
2178 }
2179 
2180 ////////////////////////////////////////////////////////////////////////////////
2181 /// write weights to XML
2182 
2183 void TMVA::MethodBDT::AddWeightsXMLTo( void* parent ) const
2184 {
2185  void* wght = gTools().AddChild(parent, "Weights");
2186 
2187  if (fDoPreselection){
2188  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
2189  gTools().AddAttr( wght, Form("PreselectionLowBkgVar%d",ivar), fIsLowBkgCut[ivar]);
2190  gTools().AddAttr( wght, Form("PreselectionLowBkgVar%dValue",ivar), fLowBkgCut[ivar]);
2191  gTools().AddAttr( wght, Form("PreselectionLowSigVar%d",ivar), fIsLowSigCut[ivar]);
2192  gTools().AddAttr( wght, Form("PreselectionLowSigVar%dValue",ivar), fLowSigCut[ivar]);
2193  gTools().AddAttr( wght, Form("PreselectionHighBkgVar%d",ivar), fIsHighBkgCut[ivar]);
2194  gTools().AddAttr( wght, Form("PreselectionHighBkgVar%dValue",ivar),fHighBkgCut[ivar]);
2195  gTools().AddAttr( wght, Form("PreselectionHighSigVar%d",ivar), fIsHighSigCut[ivar]);
2196  gTools().AddAttr( wght, Form("PreselectionHighSigVar%dValue",ivar),fHighSigCut[ivar]);
2197  }
2198  }
2199 
2200 
2201  gTools().AddAttr( wght, "NTrees", fForest.size() );
2202  gTools().AddAttr( wght, "AnalysisType", fForest.back()->GetAnalysisType() );
2203 
2204  for (UInt_t i=0; i< fForest.size(); i++) {
2205  void* trxml = fForest[i]->AddXMLTo(wght);
2206  gTools().AddAttr( trxml, "boostWeight", fBoostWeights[i] );
2207  gTools().AddAttr( trxml, "itree", i );
2208  }
2209 }
2210 
2211 ////////////////////////////////////////////////////////////////////////////////
2212 /// reads the BDT from the xml file
2213 
2215  UInt_t i;
2216  for (i=0; i<fForest.size(); i++) delete fForest[i];
2217  fForest.clear();
2218  fBoostWeights.clear();
2219 
2220  UInt_t ntrees;
2221  UInt_t analysisType;
2222  Float_t boostWeight;
2223 
2224 
2225  if (gTools().HasAttr( parent, Form("PreselectionLowBkgVar%d",0))) {
2226  fIsLowBkgCut.resize(GetNvar());
2227  fLowBkgCut.resize(GetNvar());
2228  fIsLowSigCut.resize(GetNvar());
2229  fLowSigCut.resize(GetNvar());
2230  fIsHighBkgCut.resize(GetNvar());
2231  fHighBkgCut.resize(GetNvar());
2232  fIsHighSigCut.resize(GetNvar());
2233  fHighSigCut.resize(GetNvar());
2234 
2235  Bool_t tmpBool;
2236  Double_t tmpDouble;
2237  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
2238  gTools().ReadAttr( parent, Form("PreselectionLowBkgVar%d",ivar), tmpBool);
2239  fIsLowBkgCut[ivar]=tmpBool;
2240  gTools().ReadAttr( parent, Form("PreselectionLowBkgVar%dValue",ivar), tmpDouble);
2241  fLowBkgCut[ivar]=tmpDouble;
2242  gTools().ReadAttr( parent, Form("PreselectionLowSigVar%d",ivar), tmpBool);
2243  fIsLowSigCut[ivar]=tmpBool;
2244  gTools().ReadAttr( parent, Form("PreselectionLowSigVar%dValue",ivar), tmpDouble);
2245  fLowSigCut[ivar]=tmpDouble;
2246  gTools().ReadAttr( parent, Form("PreselectionHighBkgVar%d",ivar), tmpBool);
2247  fIsHighBkgCut[ivar]=tmpBool;
2248  gTools().ReadAttr( parent, Form("PreselectionHighBkgVar%dValue",ivar), tmpDouble);
2249  fHighBkgCut[ivar]=tmpDouble;
2250  gTools().ReadAttr( parent, Form("PreselectionHighSigVar%d",ivar),tmpBool);
2251  fIsHighSigCut[ivar]=tmpBool;
2252  gTools().ReadAttr( parent, Form("PreselectionHighSigVar%dValue",ivar), tmpDouble);
2253  fHighSigCut[ivar]=tmpDouble;
2254  }
2255  }
2256 
2257 
2258  gTools().ReadAttr( parent, "NTrees", ntrees );
2259 
2260  if(gTools().HasAttr(parent, "TreeType")) { // pre 4.1.0 version
2261  gTools().ReadAttr( parent, "TreeType", analysisType );
2262  } else { // from 4.1.0 onwards
2263  gTools().ReadAttr( parent, "AnalysisType", analysisType );
2264  }
2265 
2266  void* ch = gTools().GetChild(parent);
2267  i=0;
2268  while(ch) {
2269  fForest.push_back( dynamic_cast<DecisionTree*>( DecisionTree::CreateFromXML(ch, GetTrainingTMVAVersionCode()) ) );
2270  fForest.back()->SetAnalysisType(Types::EAnalysisType(analysisType));
2271  fForest.back()->SetTreeID(i++);
2272  gTools().ReadAttr(ch,"boostWeight",boostWeight);
2273  fBoostWeights.push_back(boostWeight);
2274  ch = gTools().GetNextChild(ch);
2275  }
2276 }
2277 
2278 ////////////////////////////////////////////////////////////////////////////////
2279 /// read the weights (BDT coefficients)
2280 
2281 void TMVA::MethodBDT::ReadWeightsFromStream( std::istream& istr )
2282 {
2283  TString dummy;
2284  // Types::EAnalysisType analysisType;
2285  Int_t analysisType(0);
2286 
2287  // coverity[tainted_data_argument]
2288  istr >> dummy >> fNTrees;
2289  Log() << kINFO << "Read " << fNTrees << " Decision trees" << Endl;
2290 
2291  for (UInt_t i=0;i<fForest.size();i++) delete fForest[i];
2292  fForest.clear();
2293  fBoostWeights.clear();
2294  Int_t iTree;
2295  Double_t boostWeight;
2296  for (int i=0;i<fNTrees;i++) {
2297  istr >> dummy >> iTree >> dummy >> boostWeight;
2298  if (iTree != i) {
2299  fForest.back()->Print( std::cout );
2300  Log() << kFATAL << "Error while reading weight file; mismatch iTree="
2301  << iTree << " i=" << i
2302  << " dummy " << dummy
2303  << " boostweight " << boostWeight
2304  << Endl;
2305  }
2306  fForest.push_back( new DecisionTree() );
2307  fForest.back()->SetAnalysisType(Types::EAnalysisType(analysisType));
2308  fForest.back()->SetTreeID(i);
2309  fForest.back()->Read(istr, GetTrainingTMVAVersionCode());
2310  fBoostWeights.push_back(boostWeight);
2311  }
2312 }
2313 
2314 ////////////////////////////////////////////////////////////////////////////////
2315 
2317  return this->GetMvaValue( err, errUpper, 0 );
2318 }
2319 
2320 ////////////////////////////////////////////////////////////////////////////////
2321 /// Return the MVA value (range [-1;1]) that classifies the
2322 /// event according to the majority vote from the total number of
2323 /// decision trees.
2324 
2326 {
2327  const Event* ev = GetEvent();
2328  if (fDoPreselection) {
2329  Double_t val = ApplyPreselectionCuts(ev);
2330  if (TMath::Abs(val)>0.05) return val;
2331  }
2332  return PrivateGetMvaValue(ev, err, errUpper, useNTrees);
2333 
2334 }
2335 ////////////////////////////////////////////////////////////////////////////////
2336 /// Return the MVA value (range [-1;1]) that classifies the
2337 /// event according to the majority vote from the total number of
2338 /// decision trees.
2339 
2341 {
2342  // cannot determine error
2343  NoErrorCalc(err, errUpper);
2344 
2345  // allow for the possibility to use less trees in the actual MVA calculation
2346  // than have been originally trained.
2347  UInt_t nTrees = fForest.size();
2348 
2349  if (useNTrees > 0 ) nTrees = useNTrees;
2350 
2351  if (fBoostType=="Grad") return GetGradBoostMVA(ev,nTrees);
2352 
2353  Double_t myMVA = 0;
2354  Double_t norm = 0;
2355  for (UInt_t itree=0; itree<nTrees; itree++) {
2356  //
2357  myMVA += fBoostWeights[itree] * fForest[itree]->CheckEvent(ev,fUseYesNoLeaf);
2358  norm += fBoostWeights[itree];
2359  }
2360  return ( norm > std::numeric_limits<double>::epsilon() ) ? myMVA /= norm : 0 ;
2361 }
2362 
2363 
2364 ////////////////////////////////////////////////////////////////////////////////
2365 /// get the multiclass MVA response for the BDT classifier
2366 
2367 const std::vector<Float_t>& TMVA::MethodBDT::GetMulticlassValues()
2368 {
2369  const TMVA::Event *e = GetEvent();
2370  if (fMulticlassReturnVal == NULL) fMulticlassReturnVal = new std::vector<Float_t>();
2371  fMulticlassReturnVal->clear();
2372 
2373  std::vector<double> temp;
2374 
2375  UInt_t nClasses = DataInfo().GetNClasses();
2376  for(UInt_t iClass=0; iClass<nClasses; iClass++){
2377  temp.push_back(0.0);
2378  for(UInt_t itree = iClass; itree<fForest.size(); itree+=nClasses){
2379  temp[iClass] += fForest[itree]->CheckEvent(e,kFALSE);
2380  }
2381  }
2382 
2383  for(UInt_t iClass=0; iClass<nClasses; iClass++){
2384  Double_t norm = 0.0;
2385  for(UInt_t j=0;j<nClasses;j++){
2386  if(iClass!=j)
2387  norm+=exp(temp[j]-temp[iClass]);
2388  }
2389  (*fMulticlassReturnVal).push_back(1.0/(1.0+norm));
2390  }
2391 
2392 
2393  return *fMulticlassReturnVal;
2394 }
2395 
2396 
2397 
2398 
2399 ////////////////////////////////////////////////////////////////////////////////
2400 /// get the regression value generated by the BDTs
2401 
2402 const std::vector<Float_t> & TMVA::MethodBDT::GetRegressionValues()
2403 {
2404 
2405  if (fRegressionReturnVal == NULL) fRegressionReturnVal = new std::vector<Float_t>();
2406  fRegressionReturnVal->clear();
2407 
2408  const Event * ev = GetEvent();
2409  Event * evT = new Event(*ev);
2410 
2411  Double_t myMVA = 0;
2412  Double_t norm = 0;
2413  if (fBoostType=="AdaBoostR2") {
2414  // rather than using the weighted average of the tree respones in the forest
2415  // H.Decker(1997) proposed to use the "weighted median"
2416 
2417  // sort all individual tree responses according to the prediction value
2418  // (keep the association to their tree weight)
2419  // the sum up all the associated weights (starting from the one whose tree
2420  // yielded the smalles response) up to the tree "t" at which you've
2421  // added enough tree weights to have more than half of the sum of all tree weights.
2422  // choose as response of the forest that one which belongs to this "t"
2423 
2424  vector< Double_t > response(fForest.size());
2425  vector< Double_t > weight(fForest.size());
2426  Double_t totalSumOfWeights = 0;
2427 
2428  for (UInt_t itree=0; itree<fForest.size(); itree++) {
2429  response[itree] = fForest[itree]->CheckEvent(ev,kFALSE);
2430  weight[itree] = fBoostWeights[itree];
2431  totalSumOfWeights += fBoostWeights[itree];
2432  }
2433 
2434  std::vector< std::vector<Double_t> > vtemp;
2435  vtemp.push_back( response ); // this is the vector that will get sorted
2436  vtemp.push_back( weight );
2437  gTools().UsefulSortAscending( vtemp );
2438 
2439  Int_t t=0;
2440  Double_t sumOfWeights = 0;
2441  while (sumOfWeights <= totalSumOfWeights/2.) {
2442  sumOfWeights += vtemp[1][t];
2443  t++;
2444  }
2445 
2446  Double_t rVal=0;
2447  Int_t count=0;
2448  for (UInt_t i= TMath::Max(UInt_t(0),UInt_t(t-(fForest.size()/6)-0.5));
2449  i< TMath::Min(UInt_t(fForest.size()),UInt_t(t+(fForest.size()/6)+0.5)); i++) {
2450  count++;
2451  rVal+=vtemp[0][i];
2452  }
2453  // fRegressionReturnVal->push_back( rVal/Double_t(count));
2454  evT->SetTarget(0, rVal/Double_t(count) );
2455  }
2456  else if(fBoostType=="Grad"){
2457  for (UInt_t itree=0; itree<fForest.size(); itree++) {
2458  myMVA += fForest[itree]->CheckEvent(ev,kFALSE);
2459  }
2460  // fRegressionReturnVal->push_back( myMVA+fBoostWeights[0]);
2461  evT->SetTarget(0, myMVA+fBoostWeights[0] );
2462  }
2463  else{
2464  for (UInt_t itree=0; itree<fForest.size(); itree++) {
2465  //
2466  myMVA += fBoostWeights[itree] * fForest[itree]->CheckEvent(ev,kFALSE);
2467  norm += fBoostWeights[itree];
2468  }
2469  // fRegressionReturnVal->push_back( ( norm > std::numeric_limits<double>::epsilon() ) ? myMVA /= norm : 0 );
2470  evT->SetTarget(0, ( norm > std::numeric_limits<double>::epsilon() ) ? myMVA /= norm : 0 );
2471  }
2472 
2473 
2474 
2475  const Event* evT2 = GetTransformationHandler().InverseTransform( evT );
2476  fRegressionReturnVal->push_back( evT2->GetTarget(0) );
2477 
2478  delete evT;
2479 
2480 
2481  return *fRegressionReturnVal;
2482 }
2483 
2484 ////////////////////////////////////////////////////////////////////////////////
2485 /// Here we could write some histograms created during the processing
2486 /// to the output file.
2487 
2489 {
2490  Log() << kDEBUG << "\tWrite monitoring histograms to file: " << BaseDir()->GetPath() << Endl;
2491 
2492  //Results* results = Data()->GetResults(GetMethodName(), Types::kTraining, Types::kMaxAnalysisType);
2493  //results->GetStorage()->Write();
2494  fMonitorNtuple->Write();
2495 }
2496 
2497 ////////////////////////////////////////////////////////////////////////////////
2498 /// Return the relative variable importance, normalized to all
2499 /// variables together having the importance 1. The importance in
2500 /// evaluated as the total separation-gain that this variable had in
2501 /// the decision trees (weighted by the number of events)
2502 
2504 {
2505  fVariableImportance.resize(GetNvar());
2506  for (UInt_t ivar = 0; ivar < GetNvar(); ivar++) {
2507  fVariableImportance[ivar]=0;
2508  }
2509  Double_t sum=0;
2510  for (UInt_t itree = 0; itree < GetNTrees(); itree++) {
2511  std::vector<Double_t> relativeImportance(fForest[itree]->GetVariableImportance());
2512  for (UInt_t i=0; i< relativeImportance.size(); i++) {
2513  fVariableImportance[i] += fBoostWeights[itree] * relativeImportance[i];
2514  }
2515  }
2516 
2517  for (UInt_t ivar=0; ivar< fVariableImportance.size(); ivar++){
2519  sum += fVariableImportance[ivar];
2520  }
2521  for (UInt_t ivar=0; ivar< fVariableImportance.size(); ivar++) fVariableImportance[ivar] /= sum;
2522 
2523  return fVariableImportance;
2524 }
2525 
2526 ////////////////////////////////////////////////////////////////////////////////
2527 /// Returns the measure for the variable importance of variable "ivar"
2528 /// which is later used in GetVariableImportance() to calculate the
2529 /// relative variable importances.
2530 
2532 {
2533  std::vector<Double_t> relativeImportance = this->GetVariableImportance();
2534  if (ivar < (UInt_t)relativeImportance.size()) return relativeImportance[ivar];
2535  else Log() << kFATAL << "<GetVariableImportance> ivar = " << ivar << " is out of range " << Endl;
2536 
2537  return -1;
2538 }
2539 
2540 ////////////////////////////////////////////////////////////////////////////////
2541 /// Compute ranking of input variables
2542 
2544 {
2545  // create the ranking object
2546  fRanking = new Ranking( GetName(), "Variable Importance" );
2547  vector< Double_t> importance(this->GetVariableImportance());
2548 
2549  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
2550 
2551  fRanking->AddRank( Rank( GetInputLabel(ivar), importance[ivar] ) );
2552  }
2553 
2554  return fRanking;
2555 }
2556 
2557 ////////////////////////////////////////////////////////////////////////////////
2558 /// Get help message text
2559 ///
2560 /// typical length of text line:
2561 /// "|--------------------------------------------------------------|"
2562 
2564 {
2565  Log() << Endl;
2566  Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
2567  Log() << Endl;
2568  Log() << "Boosted Decision Trees are a collection of individual decision" << Endl;
2569  Log() << "trees which form a multivariate classifier by (weighted) majority " << Endl;
2570  Log() << "vote of the individual trees. Consecutive decision trees are " << Endl;
2571  Log() << "trained using the original training data set with re-weighted " << Endl;
2572  Log() << "events. By default, the AdaBoost method is employed, which gives " << Endl;
2573  Log() << "events that were misclassified in the previous tree a larger " << Endl;
2574  Log() << "weight in the training of the following tree." << Endl;
2575  Log() << Endl;
2576  Log() << "Decision trees are a sequence of binary splits of the data sample" << Endl;
2577  Log() << "using a single descriminant variable at a time. A test event " << Endl;
2578  Log() << "ending up after the sequence of left-right splits in a final " << Endl;
2579  Log() << "(\"leaf\") node is classified as either signal or background" << Endl;
2580  Log() << "depending on the majority type of training events in that node." << Endl;
2581  Log() << Endl;
2582  Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
2583  Log() << Endl;
2584  Log() << "By the nature of the binary splits performed on the individual" << Endl;
2585  Log() << "variables, decision trees do not deal well with linear correlations" << Endl;
2586  Log() << "between variables (they need to approximate the linear split in" << Endl;
2587  Log() << "the two dimensional space by a sequence of splits on the two " << Endl;
2588  Log() << "variables individually). Hence decorrelation could be useful " << Endl;
2589  Log() << "to optimise the BDT performance." << Endl;
2590  Log() << Endl;
2591  Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
2592  Log() << Endl;
2593  Log() << "The two most important parameters in the configuration are the " << Endl;
2594  Log() << "minimal number of events requested by a leaf node as percentage of the " <<Endl;
2595  Log() << " number of training events (option \"MinNodeSize\" replacing the actual number " << Endl;
2596  Log() << " of events \"nEventsMin\" as given in earlier versions" << Endl;
2597  Log() << "If this number is too large, detailed features " << Endl;
2598  Log() << "in the parameter space are hard to be modelled. If it is too small, " << Endl;
2599  Log() << "the risk to overtrain rises and boosting seems to be less effective" << Endl;
2600  Log() << " typical values from our current expericience for best performance " << Endl;
2601  Log() << " are between 0.5(%) and 10(%) " << Endl;
2602  Log() << Endl;
2603  Log() << "The default minimal number is currently set to " << Endl;
2604  Log() << " max(20, (N_training_events / N_variables^2 / 10)) " << Endl;
2605  Log() << "and can be changed by the user." << Endl;
2606  Log() << Endl;
2607  Log() << "The other crucial parameter, the pruning strength (\"PruneStrength\")," << Endl;
2608  Log() << "is also related to overtraining. It is a regularisation parameter " << Endl;
2609  Log() << "that is used when determining after the training which splits " << Endl;
2610  Log() << "are considered statistically insignificant and are removed. The" << Endl;
2611  Log() << "user is advised to carefully watch the BDT screen output for" << Endl;
2612  Log() << "the comparison between efficiencies obtained on the training and" << Endl;
2613  Log() << "the independent test sample. They should be equal within statistical" << Endl;
2614  Log() << "errors, in order to minimize statistical fluctuations in different samples." << Endl;
2615 }
2616 
2617 ////////////////////////////////////////////////////////////////////////////////
2618 /// make ROOT-independent C++ class for classifier response (classifier-specific implementation)
2619 
2620 void TMVA::MethodBDT::MakeClassSpecific( std::ostream& fout, const TString& className ) const
2621 {
2622  TString nodeName = className;
2623  nodeName.ReplaceAll("Read","");
2624  nodeName.Append("Node");
2625  // write BDT-specific classifier response
2626  fout << " std::vector<"<<nodeName<<"*> fForest; // i.e. root nodes of decision trees" << std::endl;
2627  fout << " std::vector<double> fBoostWeights; // the weights applied in the individual boosts" << std::endl;
2628  fout << "};" << std::endl << std::endl;
2629  fout << "double " << className << "::GetMvaValue__( const std::vector<double>& inputValues ) const" << std::endl;
2630  fout << "{" << std::endl;
2631  fout << " double myMVA = 0;" << std::endl;
2632  if (fDoPreselection){
2633  for (UInt_t ivar = 0; ivar< fIsLowBkgCut.size(); ivar++){
2634  if (fIsLowBkgCut[ivar]){
2635  fout << " if (inputValues["<<ivar<<"] < " << fLowBkgCut[ivar] << ") return -1; // is background preselection cut" << std::endl;
2636  }
2637  if (fIsLowSigCut[ivar]){
2638  fout << " if (inputValues["<<ivar<<"] < "<< fLowSigCut[ivar] << ") return 1; // is signal preselection cut" << std::endl;
2639  }
2640  if (fIsHighBkgCut[ivar]){
2641  fout << " if (inputValues["<<ivar<<"] > "<<fHighBkgCut[ivar] <<") return -1; // is background preselection cut" << std::endl;
2642  }
2643  if (fIsHighSigCut[ivar]){
2644  fout << " if (inputValues["<<ivar<<"] > "<<fHighSigCut[ivar]<<") return 1; // is signal preselection cut" << std::endl;
2645  }
2646  }
2647  }
2648 
2649  if (fBoostType!="Grad"){
2650  fout << " double norm = 0;" << std::endl;
2651  }
2652  fout << " for (unsigned int itree=0; itree<fForest.size(); itree++){" << std::endl;
2653  fout << " "<<nodeName<<" *current = fForest[itree];" << std::endl;
2654  fout << " while (current->GetNodeType() == 0) { //intermediate node" << std::endl;
2655  fout << " if (current->GoesRight(inputValues)) current=("<<nodeName<<"*)current->GetRight();" << std::endl;
2656  fout << " else current=("<<nodeName<<"*)current->GetLeft();" << std::endl;
2657  fout << " }" << std::endl;
2658  if (fBoostType=="Grad"){
2659  fout << " myMVA += current->GetResponse();" << std::endl;
2660  }else{
2661  if (fUseYesNoLeaf) fout << " myMVA += fBoostWeights[itree] * current->GetNodeType();" << std::endl;
2662  else fout << " myMVA += fBoostWeights[itree] * current->GetPurity();" << std::endl;
2663  fout << " norm += fBoostWeights[itree];" << std::endl;
2664  }
2665  fout << " }" << std::endl;
2666  if (fBoostType=="Grad"){
2667  fout << " return 2.0/(1.0+exp(-2.0*myMVA))-1.0;" << std::endl;
2668  }
2669  else fout << " return myMVA /= norm;" << std::endl;
2670  fout << "};" << std::endl << std::endl;
2671  fout << "void " << className << "::Initialize()" << std::endl;
2672  fout << "{" << std::endl;
2673  //Now for each decision tree, write directly the constructors of the nodes in the tree structure
2674  for (UInt_t itree=0; itree<GetNTrees(); itree++) {
2675  fout << " // itree = " << itree << std::endl;
2676  fout << " fBoostWeights.push_back(" << fBoostWeights[itree] << ");" << std::endl;
2677  fout << " fForest.push_back( " << std::endl;
2678  this->MakeClassInstantiateNode((DecisionTreeNode*)fForest[itree]->GetRoot(), fout, className);
2679  fout <<" );" << std::endl;
2680  }
2681  fout << " return;" << std::endl;
2682  fout << "};" << std::endl;
2683  fout << " " << std::endl;
2684  fout << "// Clean up" << std::endl;
2685  fout << "inline void " << className << "::Clear() " << std::endl;
2686  fout << "{" << std::endl;
2687  fout << " for (unsigned int itree=0; itree<fForest.size(); itree++) { " << std::endl;
2688  fout << " delete fForest[itree]; " << std::endl;
2689  fout << " }" << std::endl;
2690  fout << "}" << std::endl;
2691 }
2692 
2693 ////////////////////////////////////////////////////////////////////////////////
2694 /// specific class header
2695 
2696 void TMVA::MethodBDT::MakeClassSpecificHeader( std::ostream& fout, const TString& className) const
2697 {
2698  TString nodeName = className;
2699  nodeName.ReplaceAll("Read","");
2700  nodeName.Append("Node");
2701  //fout << "#ifndef NN" << std::endl; commented out on purpose see next line
2702  fout << "#define NN new "<<nodeName << std::endl; // NN definition depends on individual methods. Important to have NO #ifndef if several BDT methods compile together
2703  //fout << "#endif" << std::endl; commented out on purpose see previous line
2704  fout << " " << std::endl;
2705  fout << "#ifndef "<<nodeName<<"__def" << std::endl;
2706  fout << "#define "<<nodeName<<"__def" << std::endl;
2707  fout << " " << std::endl;
2708  fout << "class "<<nodeName<<" {" << std::endl;
2709  fout << " " << std::endl;
2710  fout << "public:" << std::endl;
2711  fout << " " << std::endl;
2712  fout << " // constructor of an essentially \"empty\" node floating in space" << std::endl;
2713  fout << " "<<nodeName<<" ( "<<nodeName<<"* left,"<<nodeName<<"* right," << std::endl;
2714  if (fUseFisherCuts){
2715  fout << " int nFisherCoeff," << std::endl;
2716  for (UInt_t i=0;i<GetNVariables()+1;i++){
2717  fout << " double fisherCoeff"<<i<<"," << std::endl;
2718  }
2719  }
2720  fout << " int selector, double cutValue, bool cutType, " << std::endl;
2721  fout << " int nodeType, double purity, double response ) :" << std::endl;
2722  fout << " fLeft ( left )," << std::endl;
2723  fout << " fRight ( right )," << std::endl;
2724  if (fUseFisherCuts) fout << " fNFisherCoeff ( nFisherCoeff )," << std::endl;
2725  fout << " fSelector ( selector )," << std::endl;
2726  fout << " fCutValue ( cutValue )," << std::endl;
2727  fout << " fCutType ( cutType )," << std::endl;
2728  fout << " fNodeType ( nodeType )," << std::endl;
2729  fout << " fPurity ( purity )," << std::endl;
2730  fout << " fResponse ( response ){" << std::endl;
2731  if (fUseFisherCuts){
2732  for (UInt_t i=0;i<GetNVariables()+1;i++){
2733  fout << " fFisherCoeff.push_back(fisherCoeff"<<i<<");" << std::endl;
2734  }
2735  }
2736  fout << " }" << std::endl << std::endl;
2737  fout << " virtual ~"<<nodeName<<"();" << std::endl << std::endl;
2738  fout << " // test event if it decends the tree at this node to the right" << std::endl;
2739  fout << " virtual bool GoesRight( const std::vector<double>& inputValues ) const;" << std::endl;
2740  fout << " "<<nodeName<<"* GetRight( void ) {return fRight; };" << std::endl << std::endl;
2741  fout << " // test event if it decends the tree at this node to the left " << std::endl;
2742  fout << " virtual bool GoesLeft ( const std::vector<double>& inputValues ) const;" << std::endl;
2743  fout << " "<<nodeName<<"* GetLeft( void ) { return fLeft; }; " << std::endl << std::endl;
2744  fout << " // return S/(S+B) (purity) at this node (from training)" << std::endl << std::endl;
2745  fout << " double GetPurity( void ) const { return fPurity; } " << std::endl;
2746  fout << " // return the node type" << std::endl;
2747  fout << " int GetNodeType( void ) const { return fNodeType; }" << std::endl;
2748  fout << " double GetResponse(void) const {return fResponse;}" << std::endl << std::endl;
2749  fout << "private:" << std::endl << std::endl;
2750  fout << " "<<nodeName<<"* fLeft; // pointer to the left daughter node" << std::endl;
2751  fout << " "<<nodeName<<"* fRight; // pointer to the right daughter node" << std::endl;
2752  if (fUseFisherCuts){
2753  fout << " int fNFisherCoeff; // =0 if this node doesn use fisher, else =nvar+1 " << std::endl;
2754  fout << " std::vector<double> fFisherCoeff; // the fisher coeff (offset at the last element)" << std::endl;
2755  }
2756  fout << " int fSelector; // index of variable used in node selection (decision tree) " << std::endl;
2757  fout << " double fCutValue; // cut value appplied on this node to discriminate bkg against sig" << std::endl;
2758  fout << " bool fCutType; // true: if event variable > cutValue ==> signal , false otherwise" << std::endl;
2759  fout << " int fNodeType; // Type of node: -1 == Bkg-leaf, 1 == Signal-leaf, 0 = internal " << std::endl;
2760  fout << " double fPurity; // Purity of node from training"<< std::endl;
2761  fout << " double fResponse; // Regression response value of node" << std::endl;
2762  fout << "}; " << std::endl;
2763  fout << " " << std::endl;
2764  fout << "//_______________________________________________________________________" << std::endl;
2765  fout << " "<<nodeName<<"::~"<<nodeName<<"()" << std::endl;
2766  fout << "{" << std::endl;
2767  fout << " if (fLeft != NULL) delete fLeft;" << std::endl;
2768  fout << " if (fRight != NULL) delete fRight;" << std::endl;
2769  fout << "}; " << std::endl;
2770  fout << " " << std::endl;
2771  fout << "//_______________________________________________________________________" << std::endl;
2772  fout << "bool "<<nodeName<<"::GoesRight( const std::vector<double>& inputValues ) const" << std::endl;
2773  fout << "{" << std::endl;
2774  fout << " // test event if it decends the tree at this node to the right" << std::endl;
2775  fout << " bool result;" << std::endl;
2776  if (fUseFisherCuts){
2777  fout << " if (fNFisherCoeff == 0){" << std::endl;
2778  fout << " result = (inputValues[fSelector] > fCutValue );" << std::endl;
2779  fout << " }else{" << std::endl;
2780  fout << " double fisher = fFisherCoeff.at(fFisherCoeff.size()-1);" << std::endl;
2781  fout << " for (unsigned int ivar=0; ivar<fFisherCoeff.size()-1; ivar++)" << std::endl;
2782  fout << " fisher += fFisherCoeff.at(ivar)*inputValues.at(ivar);" << std::endl;
2783  fout << " result = fisher > fCutValue;" << std::endl;
2784  fout << " }" << std::endl;
2785  }else{
2786  fout << " result = (inputValues[fSelector] > fCutValue );" << std::endl;
2787  }
2788  fout << " if (fCutType == true) return result; //the cuts are selecting Signal ;" << std::endl;
2789  fout << " else return !result;" << std::endl;
2790  fout << "}" << std::endl;
2791  fout << " " << std::endl;
2792  fout << "//_______________________________________________________________________" << std::endl;
2793  fout << "bool "<<nodeName<<"::GoesLeft( const std::vector<double>& inputValues ) const" << std::endl;
2794  fout << "{" << std::endl;
2795  fout << " // test event if it decends the tree at this node to the left" << std::endl;
2796  fout << " if (!this->GoesRight(inputValues)) return true;" << std::endl;
2797  fout << " else return false;" << std::endl;
2798  fout << "}" << std::endl;
2799  fout << " " << std::endl;
2800  fout << "#endif" << std::endl;
2801  fout << " " << std::endl;
2802 }
2803 
2804 ////////////////////////////////////////////////////////////////////////////////
2805 /// recursively descends a tree and writes the node instance to the output streem
2806 
2807 void TMVA::MethodBDT::MakeClassInstantiateNode( DecisionTreeNode *n, std::ostream& fout, const TString& className ) const
2808 {
2809  if (n == NULL) {
2810  Log() << kFATAL << "MakeClassInstantiateNode: started with undefined node" <<Endl;
2811  return ;
2812  }
2813  fout << "NN("<<std::endl;
2814  if (n->GetLeft() != NULL){
2815  this->MakeClassInstantiateNode( (DecisionTreeNode*)n->GetLeft() , fout, className);
2816  }
2817  else {
2818  fout << "0";
2819  }
2820  fout << ", " <<std::endl;
2821  if (n->GetRight() != NULL){
2822  this->MakeClassInstantiateNode( (DecisionTreeNode*)n->GetRight(), fout, className );
2823  }
2824  else {
2825  fout << "0";
2826  }
2827  fout << ", " << std::endl
2828  << std::setprecision(6);
2829  if (fUseFisherCuts){
2830  fout << n->GetNFisherCoeff() << ", ";
2831  for (UInt_t i=0; i< GetNVariables()+1; i++) {
2832  if (n->GetNFisherCoeff() == 0 ){
2833  fout << "0, ";
2834  }else{
2835  fout << n->GetFisherCoeff(i) << ", ";
2836  }
2837  }
2838  }
2839  fout << n->GetSelector() << ", "
2840  << n->GetCutValue() << ", "
2841  << n->GetCutType() << ", "
2842  << n->GetNodeType() << ", "
2843  << n->GetPurity() << ","
2844  << n->GetResponse() << ") ";
2845 }
2846 
2847 ////////////////////////////////////////////////////////////////////////////////
2848 /// find useful preselection cuts that will be applied before
2849 /// and Decision Tree training.. (and of course also applied
2850 /// in the GetMVA .. --> -1 for background +1 for Signal
2851 /// /*
2852 
2853 void TMVA::MethodBDT::DeterminePreselectionCuts(const std::vector<const TMVA::Event*>& eventSample)
2854 {
2855  Double_t nTotS = 0.0, nTotB = 0.0;
2856  Int_t nTotS_unWeighted = 0, nTotB_unWeighted = 0;
2857 
2858  std::vector<TMVA::BDTEventWrapper> bdtEventSample;
2859 
2860  fIsLowSigCut.assign(GetNvar(),kFALSE);
2861  fIsLowBkgCut.assign(GetNvar(),kFALSE);
2862  fIsHighSigCut.assign(GetNvar(),kFALSE);
2863  fIsHighBkgCut.assign(GetNvar(),kFALSE);
2864 
2865  fLowSigCut.assign(GetNvar(),0.); // ---------------| --> in var is signal (accept all above lower cut)
2866  fLowBkgCut.assign(GetNvar(),0.); // ---------------| --> in var is bkg (accept all above lower cut)
2867  fHighSigCut.assign(GetNvar(),0.); // <-- | -------------- in var is signal (accept all blow cut)
2868  fHighBkgCut.assign(GetNvar(),0.); // <-- | -------------- in var is blg (accept all blow cut)
2869 
2870 
2871  // Initialize (un)weighted counters for signal & background
2872  // Construct a list of event wrappers that point to the original data
2873  for( std::vector<const TMVA::Event*>::const_iterator it = eventSample.begin(); it != eventSample.end(); ++it ) {
2874  if (DataInfo().IsSignal(*it)){
2875  nTotS += (*it)->GetWeight();
2876  ++nTotS_unWeighted;
2877  }
2878  else {
2879  nTotB += (*it)->GetWeight();
2880  ++nTotB_unWeighted;
2881  }
2882  bdtEventSample.push_back(TMVA::BDTEventWrapper(*it));
2883  }
2884 
2885  for( UInt_t ivar = 0; ivar < GetNvar(); ivar++ ) { // loop over all discriminating variables
2886  TMVA::BDTEventWrapper::SetVarIndex(ivar); // select the variable to sort by
2887  std::sort( bdtEventSample.begin(),bdtEventSample.end() ); // sort the event data
2888 
2889  Double_t bkgWeightCtr = 0.0, sigWeightCtr = 0.0;
2890  std::vector<TMVA::BDTEventWrapper>::iterator it = bdtEventSample.begin(), it_end = bdtEventSample.end();
2891  for( ; it != it_end; ++it ) {
2892  if (DataInfo().IsSignal(**it))
2893  sigWeightCtr += (**it)->GetWeight();
2894  else
2895  bkgWeightCtr += (**it)->GetWeight();
2896  // Store the accumulated signal (background) weights
2897  it->SetCumulativeWeight(false,bkgWeightCtr);
2898  it->SetCumulativeWeight(true,sigWeightCtr);
2899  }
2900 
2901  //variable that determines how "exact" you cut on the preslection found in the training data. Here I chose
2902  //1% of the variable range...
2903  Double_t dVal = (DataInfo().GetVariableInfo(ivar).GetMax() - DataInfo().GetVariableInfo(ivar).GetMin())/100. ;
2904  Double_t nSelS, nSelB, effS=0.05, effB=0.05, rejS=0.05, rejB=0.05;
2905  Double_t tmpEffS, tmpEffB, tmpRejS, tmpRejB;
2906  // Locate the optimal cut for this (ivar-th) variable
2907 
2908 
2909 
2910  for(UInt_t iev = 1; iev < bdtEventSample.size(); iev++) {
2911  //dVal = bdtEventSample[iev].GetVal() - bdtEventSample[iev-1].GetVal();
2912 
2913  nSelS = bdtEventSample[iev].GetCumulativeWeight(true);
2914  nSelB = bdtEventSample[iev].GetCumulativeWeight(false);
2915  // you look for some 100% efficient pre-selection cut to remove background.. i.e. nSelS=0 && nSelB>5%nTotB or ( nSelB=0 nSelS>5%nTotS)
2916  tmpEffS=nSelS/nTotS;
2917  tmpEffB=nSelB/nTotB;
2918  tmpRejS=1-tmpEffS;
2919  tmpRejB=1-tmpEffB;
2920  if (nSelS==0 && tmpEffB>effB) {effB=tmpEffB; fLowBkgCut[ivar] = bdtEventSample[iev].GetVal() - dVal; fIsLowBkgCut[ivar]=kTRUE;}
2921  else if (nSelB==0 && tmpEffS>effS) {effS=tmpEffS; fLowSigCut[ivar] = bdtEventSample[iev].GetVal() - dVal; fIsLowSigCut[ivar]=kTRUE;}
2922  else if (nSelB==nTotB && tmpRejS>rejS) {rejS=tmpRejS; fHighSigCut[ivar] = bdtEventSample[iev].GetVal() + dVal; fIsHighSigCut[ivar]=kTRUE;}
2923  else if (nSelS==nTotS && tmpRejB>rejB) {rejB=tmpRejB; fHighBkgCut[ivar] = bdtEventSample[iev].GetVal() + dVal; fIsHighBkgCut[ivar]=kTRUE;}
2924 
2925  }
2926  }
2927 
2928  Log() << kDEBUG << " \tfound and suggest the following possible pre-selection cuts " << Endl;
2929  if (fDoPreselection) Log() << kDEBUG << "\tthe training will be done after these cuts... and GetMVA value returns +1, (-1) for a signal (bkg) event that passes these cuts" << Endl;
2930  else Log() << kDEBUG << "\tas option DoPreselection was not used, these cuts however will not be performed, but the training will see the full sample"<<Endl;
2931  for (UInt_t ivar=0; ivar < GetNvar(); ivar++ ) { // loop over all discriminating variables
2932  if (fIsLowBkgCut[ivar]){
2933  Log() << kDEBUG << " \tfound cut: Bkg if var " << ivar << " < " << fLowBkgCut[ivar] << Endl;
2934  }
2935  if (fIsLowSigCut[ivar]){
2936  Log() << kDEBUG << " \tfound cut: Sig if var " << ivar << " < " << fLowSigCut[ivar] << Endl;
2937  }
2938  if (fIsHighBkgCut[ivar]){
2939  Log() << kDEBUG << " \tfound cut: Bkg if var " << ivar << " > " << fHighBkgCut[ivar] << Endl;
2940  }
2941  if (fIsHighSigCut[ivar]){
2942  Log() << kDEBUG << " \tfound cut: Sig if var " << ivar << " > " << fHighSigCut[ivar] << Endl;
2943  }
2944  }
2945 
2946  return;
2947 }
2948 
2949 ////////////////////////////////////////////////////////////////////////////////
2950 /// aply the preselection cuts before even bothing about any
2951 /// Decision Trees in the GetMVA .. --> -1 for background +1 for Signal
2952 
2954 {
2955  Double_t result=0;
2956 
2957  for (UInt_t ivar=0; ivar < GetNvar(); ivar++ ) { // loop over all discriminating variables
2958  if (fIsLowBkgCut[ivar]){
2959  if (ev->GetValue(ivar) < fLowBkgCut[ivar]) result = -1; // is background
2960  }
2961  if (fIsLowSigCut[ivar]){
2962  if (ev->GetValue(ivar) < fLowSigCut[ivar]) result = 1; // is signal
2963  }
2964  if (fIsHighBkgCut[ivar]){
2965  if (ev->GetValue(ivar) > fHighBkgCut[ivar]) result = -1; // is background
2966  }
2967  if (fIsHighSigCut[ivar]){
2968  if (ev->GetValue(ivar) > fHighSigCut[ivar]) result = 1; // is signal
2969  }
2970  }
2971 
2972  return result;
2973 }
2974 
Bool_t fUseYesNoLeaf
Definition: MethodBDT.h:238
Double_t AdaCost(std::vector< const TMVA::Event * > &, DecisionTree *dt)
the AdaCost boosting algorithm takes a simple cost Matrix (currently fixed for all events...
Definition: MethodBDT.cxx:1894
Types::EAnalysisType fAnalysisType
Definition: MethodBase.h:589
void Train(void)
BDT training.
Definition: MethodBDT.cxx:1134
virtual const char * GetTitle() const
Returns title of object.
Definition: TNamed.h:52
void PreProcessNegativeEventWeights()
o.k.
Definition: MethodBDT.cxx:921
Double_t AdaBoostR2(std::vector< const TMVA::Event * > &, DecisionTree *dt)
adaption of the AdaBoost to regression problems (see H.Drucker 1997)
Definition: MethodBDT.cxx:2066
virtual Int_t Fill(Double_t x)
Increment bin with abscissa X by 1.
Definition: TH1.cxx:3127
double dist(Rotation3D const &r1, Rotation3D const &r2)
Definition: 3DDistances.cxx:48
static long int sum(long int i)
Definition: Factory.cxx:1785
Long64_t GetNTestEvents() const
Definition: DataSet.h:94
virtual Double_t Fit(std::vector< LossFunctionEventInfo > &evs)=0
Random number generator class based on M.
Definition: TRandom3.h:29
THist< 1, int, THistStatContent > TH1I
Definition: THist.hxx:304
virtual Double_t PoissonD(Double_t mean)
Generates a random number according to a Poisson law.
Definition: TRandom.cxx:414
MsgLogger & Endl(MsgLogger &ml)
Definition: MsgLogger.h:162
Double_t Boost(std::vector< const TMVA::Event * > &, DecisionTree *dt, UInt_t cls=0)
apply the boosting alogrithim (the algorithm is selecte via the the "option" given in the constructor...
Definition: MethodBDT.cxx:1585
TH1 * GetHist(const TString &alias) const
Definition: Results.cxx:127
long long Long64_t
Definition: RtypesCore.h:69
void WriteMonitoringHistosToFile(void) const
Here we could write some histograms created during the processing to the output file.
Definition: MethodBDT.cxx:2488
std::vector< Bool_t > fIsLowSigCut
Definition: MethodBDT.h:288
void DeclareCompatibilityOptions()
options that are used ONLY for the READER to ensure backward compatibility
Definition: MethodBDT.cxx:444
std::map< const TMVA::Event *, LossFunctionEventInfo > fLossFunctionEventInfo
Definition: MethodBDT.h:223
Bool_t fPairNegWeightsGlobal
Definition: MethodBDT.h:257
void AddPoint(Double_t x, Double_t y1, Double_t y2)
This function is used only in 2 TGraph case, and it will add new data points to graphs.
Definition: MethodBase.cxx:206
void SetUseNvars(Int_t n)
Definition: MethodBDT.h:141
Bool_t fRandomisedTrees
Definition: MethodBDT.h:248
#define REGISTER_METHOD(CLASS)
for example
TString fMinNodeSizeS
Definition: MethodBDT.h:232
void AddWeightsXMLTo(void *parent) const
write weights to XML
Definition: MethodBDT.cxx:2183
Double_t Log(Double_t x)
Definition: TMath.h:526
Double_t GradBoost(std::vector< const TMVA::Event * > &, DecisionTree *dt, UInt_t cls=0)
Calculate the desired response value for each region.
Definition: MethodBDT.cxx:1470
const Ranking * CreateRanking()
Compute ranking of input variables.
Definition: MethodBDT.cxx:2543
virtual void Delete(Option_t *option="")
Delete this tree from memory or/and disk.
Definition: TTree.cxx:3562
virtual void SetTargets(std::vector< const TMVA::Event * > &evs, std::map< const TMVA::Event *, LossFunctionEventInfo > &evinfomap)=0
void MakeClassSpecificHeader(std::ostream &, const TString &) const
specific class header
Definition: MethodBDT.cxx:2696
float Float_t
Definition: RtypesCore.h:53
TString fPruneMethodS
Definition: MethodBDT.h:244
TString fSepTypeS
Definition: MethodBDT.h:229
void BDT(TString dataset, const TString &fin="TMVA.root")
Definition: BDT.cxx:372
Double_t GetMin() const
Definition: VariableInfo.h:71
TString & ReplaceAll(const TString &s1, const TString &s2)
Definition: TString.h:635
const char * GetName() const
Definition: MethodBase.h:330
TTree * fMonitorNtuple
Definition: MethodBDT.h:263
Bool_t IgnoreEventsWithNegWeightsInTraining() const
Definition: MethodBase.h:680
std::vector< TMatrixDSym * > * CalcCovarianceMatrices(const std::vector< Event * > &events, Int_t maxCls, VariableTransformBase *transformBase=0)
compute covariance matrices
Definition: Tools.cxx:1522
virtual Int_t Fill()
Fill all branches.
Definition: TTree.cxx:4374
virtual void SetName(const char *name)
Set the name of the TNamed.
Definition: TNamed.cxx:131
UInt_t GetNClasses() const
Definition: DataSetInfo.h:154
THist< 1, float, THistStatContent, THistStatUncertainty > TH1F
Definition: THist.hxx:302
TH1 * h
Definition: legend2.C:5
Double_t fAdaBoostBeta
Definition: MethodBDT.h:215
std::vector< Bool_t > fIsHighSigCut
Definition: MethodBDT.h:290
OptionBase * DeclareOptionRef(T &ref, const TString &name, const TString &desc="")
void DeclareOptions()
define the options (their key words) that can be set in the option string know options: nTrees number...
Definition: MethodBDT.cxx:323
std::vector< Double_t > fVariableImportance
Definition: MethodBDT.h:277
Double_t Atof() const
Return floating-point value contained in string.
Definition: TString.cxx:2031
UInt_t GetNFisherCoeff() const
const Event * GetTestingEvent(Long64_t ievt) const
Definition: MethodBase.h:771
UInt_t GetNvar() const
Definition: MethodBase.h:340
DataSet * Data() const
Definition: MethodBase.h:405
EAnalysisType
Definition: Types.h:128
Double_t fMinLinCorrForFisher
Definition: MethodBDT.h:236
std::vector< const TMVA::Event * > fEventSample
Definition: MethodBDT.h:205
virtual DecisionTreeNode * GetRight() const
TMVA::DecisionTreeNode * GetEventNode(const TMVA::Event &e) const
get the pointer to the leaf node where a particular event ends up in...
Double_t Bagging()
call it boot-strapping, re-sampling or whatever you like, in the end it is nothing else but applying ...
Definition: MethodBDT.cxx:2012
void DrawProgressBar(Int_t, const TString &comment="")
draws progress bar in color or B&W caution:
Definition: Timer.cxx:186
bool fExitFromTraining
Definition: MethodBase.h:443
Bool_t fBaggedGradBoost
Definition: MethodBDT.h:220
Bool_t fDoBoostMonitor
Definition: MethodBDT.h:259
Bool_t IsNormalised() const
Definition: MethodBase.h:490
Double_t GetMax() const
Definition: VariableInfo.h:72
Basic string class.
Definition: TString.h:137
tomato 1-D histogram with a float per channel (see TH1 documentation)}
Definition: TH1.h:575
TransformationHandler & GetTransformationHandler(Bool_t takeReroutedIfAvailable=true)
Definition: MethodBase.h:390
Short_t Min(Short_t a, Short_t b)
Definition: TMathBase.h:170
void ToLower()
Change string to lower-case.
Definition: TString.cxx:1089
Int_t GetNodeType(void) const
int Int_t
Definition: RtypesCore.h:41
virtual void SetYTitle(const char *title)
Definition: TH1.h:414
bool Bool_t
Definition: RtypesCore.h:59
const Bool_t kFALSE
Definition: Rtypes.h:92
virtual void SetTitle(const char *title="")
Set graph title.
Definition: TGraph.cxx:2176
void DeterminePreselectionCuts(const std::vector< const TMVA::Event * > &eventSample)
find useful preselection cuts that will be applied before and Decision Tree training.
Definition: MethodBDT.cxx:2853
void ProcessOptions()
the option string is decoded, for available options see "DeclareOptions"
Definition: MethodBDT.cxx:463
void UpdateTargetsRegression(std::vector< const TMVA::Event * > &, Bool_t first=kFALSE)
Calculate current residuals for all events and update targets for next iteration. ...
Definition: MethodBDT.cxx:1456
Bool_t fBaggedBoost
Definition: MethodBDT.h:219
Int_t FloorNint(Double_t x)
Definition: TMath.h:476
virtual DecisionTreeNode * GetLeft() const
std::vector< Bool_t > fIsHighBkgCut
Definition: MethodBDT.h:291
void SetShrinkage(Double_t s)
Definition: MethodBDT.h:140
Bool_t fAutomatic
Definition: MethodBDT.h:247
Int_t GetN() const
Definition: TGraph.h:133
TString GetElapsedTime(Bool_t Scientific=kTRUE)
Definition: Timer.cxx:129
void AddAttr(void *node, const char *, const T &value, Int_t precision=16)
Definition: Tools.h:309
Double_t fCts_sb
Definition: MethodBDT.h:269
virtual const char * GetPath() const
Returns the full path of the directory.
Definition: TDirectory.cxx:911
void * AddChild(void *parent, const char *childname, const char *content=0, bool isRootNode=false)
add child node
Definition: Tools.cxx:1134
Short_t Abs(Short_t d)
Definition: TMathBase.h:110
TString fRegressionLossFunctionBDTGS
Definition: MethodBDT.h:295
Double_t fBoostWeight
Definition: MethodBDT.h:265
Double_t GetMvaValue(Double_t *err=0, Double_t *errUpper=0)
Definition: MethodBDT.cxx:2316
LongDouble_t Power(LongDouble_t x, LongDouble_t y)
Definition: TMath.h:501
const TString & GetMethodName() const
Definition: MethodBase.h:327
Bool_t IsConstructedFromWeightFile() const
Definition: MethodBase.h:534
Float_t GetValue(UInt_t ivar) const
return value of i&#39;th variable
Definition: Event.cxx:233
void GetHelpMessage() const
Get help message text.
Definition: MethodBDT.cxx:2563
UInt_t fSignalClass
Definition: MethodBase.h:683
Double_t fPruneStrength
Definition: MethodBDT.h:245
std::vector< Double_t > fHighBkgCut
Definition: MethodBDT.h:286
Double_t GetGradBoostMVA(const TMVA::Event *e, UInt_t nTrees)
returns MVA value: -1 for background, 1 for signal
Definition: MethodBDT.cxx:1409
Types::EAnalysisType GetAnalysisType() const
Definition: MethodBase.h:433
TClass * GetClass(T *)
Definition: TClass.h:555
Tools & gTools()
Definition: Tools.cxx:79
Double_t fBaggedSampleFraction
Definition: MethodBDT.h:253
Bool_t fInverseBoostNegWeights
Definition: MethodBDT.h:256
TStopwatch timer
Definition: pirndm.C:37
UInt_t GetNTrees() const
Definition: MethodBDT.h:112
virtual void SetTuneParameters(std::map< TString, Double_t > tuneParameters)
set the tuning parameters accoding to the argument
Definition: MethodBDT.cxx:1112
virtual Double_t Determinant() const
Bool_t IsSignal(const Event *ev) const
Float_t GetPurity(void) const
Bool_t GetCutType(void) const
Double_t fSigToBkgFraction
Definition: MethodBDT.h:213
virtual Bool_t HasAnalysisType(Types::EAnalysisType type, UInt_t numberClasses, UInt_t numberTargets)
BDT can handle classification with multiple classes and regression with one regression-target.
Definition: MethodBDT.cxx:275
Bool_t fDoPreselection
Definition: MethodBDT.h:273
void * GetChild(void *parent, const char *childname=0)
get child node
Definition: Tools.cxx:1158
void Reset(void)
reset the method, as if it had just been instantiated (forget all training etc.)
Definition: MethodBDT.cxx:718
Bool_t DoMulticlass() const
Definition: MethodBase.h:435
TString & Append(const char *cs)
Definition: TString.h:492
Double_t RegBoost(std::vector< const TMVA::Event * > &, DecisionTree *dt)
a special boosting only for Regression ...
Definition: MethodBDT.cxx:2058
void SetMinNodeSize(Double_t sizeInPercent)
Definition: MethodBDT.cxx:652
TString fAdaBoostR2Loss
Definition: MethodBDT.h:216
void Init(std::vector< TString > &graphTitles)
This function gets some title and it creates a TGraph for every title.
Definition: MethodBase.cxx:171
Int_t fMinNodeEvents
Definition: MethodBDT.h:230
void SetNTrees(Int_t d)
Definition: MethodBDT.h:137
std::vector< Double_t > fHighSigCut
Definition: MethodBDT.h:285
Definition: PDF.h:71
void MakeClassInstantiateNode(DecisionTreeNode *n, std::ostream &fout, const TString &className) const
recursively descends a tree and writes the node instance to the output streem
Definition: MethodBDT.cxx:2807
Double_t fCtb_ss
Definition: MethodBDT.h:270
const std::vector< Float_t > & GetMulticlassValues()
get the multiclass MVA response for the BDT classifier
Definition: MethodBDT.cxx:2367
Double_t GradBoostRegression(std::vector< const TMVA::Event * > &, DecisionTree *dt)
Implementation of M_TreeBoost using any loss function as desribed by Friedman 1999.
Definition: MethodBDT.cxx:1501
UInt_t fIPyCurrentIter
Definition: MethodBase.h:444
Float_t fMinNodeSize
Definition: MethodBDT.h:231
void InitGradBoost(std::vector< const TMVA::Event * > &)
initialize targets for first tree
Definition: MethodBDT.cxx:1526
Double_t CheckEvent(const TMVA::Event *, Bool_t UseYesNoLeaf=kFALSE) const
the event e is put into the decision tree (starting at the root node) and the output is NodeType (sig...
Bool_t fNoNegWeightsInTraining
Definition: MethodBDT.h:255
UInt_t GetNVariables() const
Definition: MethodBase.h:341
const std::vector< Float_t > & GetRegressionValues()
get the regression value generated by the BDTs
Definition: MethodBDT.cxx:2402
void InitEventSample()
initialize the event sample (i.e. reset the boost-weights... etc)
Definition: MethodBDT.cxx:755
std::vector< Bool_t > fIsLowBkgCut
Definition: MethodBDT.h:289
virtual void Delete(Option_t *option="")
Delete this object.
Definition: TObject.cxx:229
Bool_t HasTrainingTree() const
Definition: MethodBase.h:507
Double_t fErrorFraction
Definition: MethodBDT.h:266
const Event * GetTrainingEvent(Long64_t ievt) const
Definition: MethodBase.h:765
VecExpr< UnaryOp< Fabs< T >, VecExpr< A, T, D >, T >, T, D > fabs(const VecExpr< A, T, D > &rhs)
virtual Int_t Write(const char *name=0, Int_t option=0, Int_t bufsize=0)
Write this object to the current directory.
Definition: TTree.cxx:9001
Results * GetResults(const TString &, Types::ETreeType type, Types::EAnalysisType analysistype)
TString info(resultsName+"/"); switch(type) { case Types::kTraining: info += "kTraining/"; break; cas...
Definition: DataSet.cxx:286
std::vector< Double_t > fLowBkgCut
Definition: MethodBDT.h:284
TRandom2 r(17)
Double_t fNodePurityLimit
Definition: MethodBDT.h:239
Service class for 2-Dim histogram classes.
Definition: TH2.h:36
Bool_t fHistoricBool
Definition: MethodBDT.h:293
void SetBaggedSampleFraction(Double_t f)
Definition: MethodBDT.h:142
virtual TString Name()=0
std::map< TString, Double_t > optimize()
void BoostMonitor(Int_t iTree)
fills the ROCIntegral vs Itree from the testSample for the monitoring plots during the training ...
Definition: MethodBDT.cxx:1619
Double_t GetFisherCoeff(Int_t ivar) const
Bool_t fTrainWithNegWeights
Definition: MethodBDT.h:258
Bool_t fSkipNormalization
Definition: MethodBDT.h:275
ClassInfo * GetClassInfo(Int_t clNum) const
void DeleteResults(const TString &, Types::ETreeType type, Types::EAnalysisType analysistype)
delete the results stored for this particulary Method instance (here appareantly called resultsName i...
Definition: DataSet.cxx:337
virtual ~MethodBDT(void)
destructor Note: fEventSample and ValidationSample are already deleted at the end of TRAIN When they ...
Definition: MethodBDT.cxx:747
virtual void SetBinContent(Int_t bin, Double_t content)
Set bin content see convention for numbering bins in TH1::GetBin In case the bin number is greater th...
Definition: TH1.cxx:8280
void SetNodePurityLimit(Double_t l)
Definition: MethodBDT.h:139
UInt_t fIPyMaxIter
Definition: MethodBase.h:444
Double_t PrivateGetMvaValue(const TMVA::Event *ev, Double_t *err=0, Double_t *errUpper=0, UInt_t useNTrees=0)
Return the MVA value (range [-1;1]) that classifies the event according to the majority vote from the...
Definition: MethodBDT.cxx:2340
unsigned int UInt_t
Definition: RtypesCore.h:42
const Event * GetEvent() const
Definition: MethodBase.h:745
char * Form(const char *fmt,...)
Double_t E()
Definition: TMath.h:54
double floor(double)
void SetTarget(UInt_t itgt, Float_t value)
set the target value (dimension itgt) to value
Definition: Event.cxx:356
Double_t fCbb
Definition: MethodBDT.h:271
void ReadAttr(void *node, const char *, T &value)
Definition: Tools.h:296
SeparationBase * fSepType
Definition: MethodBDT.h:228
void Init(void)
common initialisation with defaults for the BDT-Method
Definition: MethodBDT.cxx:680
void ReadWeightsFromXML(void *parent)
reads the BDT from the xml file
Definition: MethodBDT.cxx:2214
virtual void Print(Option_t *option="") const
Print TNamed name and title.
Definition: TNamed.cxx:119
Bool_t fUseExclusiveVars
Definition: MethodBDT.h:237
TGraphErrors * gr
Definition: legend1.C:25
REAL epsilon
Definition: triangle.c:617
Double_t AdaBoost(std::vector< const TMVA::Event * > &, DecisionTree *dt)
the AdaBoost implementation.
Definition: MethodBDT.cxx:1713
Double_t TestTreeQuality(DecisionTree *dt)
test the tree quality.. in terms of Miscalssification
Definition: MethodBDT.cxx:1564
DecisionTree::EPruneMethod fPruneMethod
Definition: MethodBDT.h:243
static void SetVarIndex(Int_t iVar)
UInt_t fUseNvars
Definition: MethodBDT.h:249
UInt_t fMaxDepth
Definition: MethodBDT.h:241
void SetCurrentType(Types::ETreeType type) const
Definition: DataSet.h:114
TGraph * GetGraph(const TString &alias) const
Definition: Results.cxx:144
Double_t Exp(Double_t x)
Definition: TMath.h:495
#define ClassImp(name)
Definition: Rtypes.h:279
void ReadWeightsFromStream(std::istream &istr)
read the weights (BDT coefficients)
Definition: MethodBDT.cxx:2281
double Double_t
Definition: RtypesCore.h:55
Double_t ApplyPreselectionCuts(const Event *ev)
aply the preselection cuts before even bothing about any Decision Trees in the GetMVA ...
Definition: MethodBDT.cxx:2953
void UpdateTargets(std::vector< const TMVA::Event * > &, UInt_t cls=0)
Calculate residua for all events;.
Definition: MethodBDT.cxx:1423
std::vector< Float_t > * fMulticlassReturnVal
Definition: MethodBase.h:592
virtual Double_t GetROCIntegral(TH1D *histS, TH1D *histB) const
calculate the area (integral) under the ROC curve as a overall quality measure of the classification ...
void SetMaxDepth(Int_t d)
Definition: MethodBDT.h:133
int type
Definition: TGX11.cxx:120
Double_t fShrinkage
Definition: MethodBDT.h:218
Bool_t fUsePoissonNvars
Definition: MethodBDT.h:250
TDirectory * BaseDir() const
returns the ROOT directory where info/histograms etc of the corresponding MVA method instance are sto...
static DecisionTree * CreateFromXML(void *node, UInt_t tmva_Version_Code=TMVA_VERSION_CODE)
re-create a new tree (decision tree or search tree) from XML
UInt_t GetNNodes() const
Definition: BinaryTree.h:92
static RooMathCoreReg dummy
void * GetNextChild(void *prevchild, const char *childname=0)
XML helpers.
Definition: Tools.cxx:1170
void SetAdaBoostBeta(Double_t b)
Definition: MethodBDT.h:138
Bool_t IsFloat() const
Returns kTRUE if string contains a floating point or integer number.
Definition: TString.cxx:1835
MsgLogger & Log() const
Definition: Configurable.h:128
The TH1 histogram class.
Definition: TH1.h:80
std::vector< const TMVA::Event * > * fTrainSample
Definition: MethodBDT.h:208
you should not use this method at all Int_t Int_t Double_t Double_t Double_t e
Definition: TRolke.cxx:630
DataSetInfo & DataInfo() const
Definition: MethodBase.h:406
void UsefulSortAscending(std::vector< std::vector< Double_t > > &, std::vector< TString > *vs=0)
sort 2D vector (AND in parallel a TString vector) in such a way that the "first vector is sorted" and...
Definition: Tools.cxx:547
VariableInfo & GetVariableInfo(Int_t i)
Definition: DataSetInfo.h:114
void AddPreDefVal(const T &)
Definition: Configurable.h:174
const TString & GetInputLabel(Int_t i) const
Definition: MethodBase.h:346
void ExitFromTraining()
Definition: MethodBase.h:458
LossFunctionBDT * fRegressionLossFunctionBDTG
Definition: MethodBDT.h:298
TString fBoostType
Definition: MethodBDT.h:214
Bool_t fUseFisherCuts
Definition: MethodBDT.h:235
const TString & Color(const TString &)
human readable color strings
Definition: Tools.cxx:837
TMatrixTSym< Element > & Invert(Double_t *det=0)
Invert the matrix and calculate its determinant Notice that the LU decomposition is used instead of B...
Float_t GetTarget(UInt_t itgt) const
Definition: Event.h:104
UInt_t fUseNTrainEvents
Definition: MethodBDT.h:251
virtual std::map< TString, Double_t > OptimizeTuningParameters(TString fomType="ROCIntegral", TString fitType="FitGA")
call the Optimzier with the set of paremeters and ranges that are meant to be tuned.
Definition: MethodBDT.cxx:1059
virtual Int_t Branch(TCollection *list, Int_t bufsize=32000, Int_t splitlevel=99, const char *name="")
Create one branch for each element in the collection.
Definition: TTree.cxx:1651
Short_t GetSelector() const
Bool_t DoRegression() const
Definition: MethodBase.h:434
TString fNegWeightTreatment
Definition: MethodBDT.h:254
Abstract ClassifierFactory template that handles arbitrary types.
Ranking * fRanking
Definition: MethodBase.h:581
void GetBaggedSubSample(std::vector< const TMVA::Event * > &)
fills fEventSample with fBaggedSampleFraction*NEvents random training events
Definition: MethodBDT.cxx:2023
virtual void SetXTitle(const char *title)
Definition: TH1.h:413
virtual void SetPoint(Int_t i, Double_t x, Double_t y)
Set x and y values for point number i.
Definition: TGraph.cxx:2150
IPythonInteractive * fInteractive
Definition: MethodBase.h:442
virtual void AddRank(const Rank &rank)
Add a new rank take ownership of it.
Definition: Ranking.cxx:86
UInt_t GetTrainingTMVAVersionCode() const
Definition: MethodBase.h:385
virtual void DeclareCompatibilityOptions()
options that are used ONLY for the READER to ensure backward compatibility they are hence without any...
Definition: MethodBase.cxx:590
Short_t Max(Short_t a, Short_t b)
Definition: TMathBase.h:202
const TString & GetOptions() const
Definition: Configurable.h:90
double ceil(double)
A Graph is a graphics object made of two arrays X and Y with npoints each.
Definition: TGraph.h:53
std::vector< const TMVA::Event * > fValidationSample
Definition: MethodBDT.h:206
#define NULL
Definition: Rtypes.h:82
std::vector< DecisionTree * > fForest
Definition: MethodBDT.h:211
std::vector< Double_t > GetVariableImportance()
Return the relative variable importance, normalized to all variables together having the importance 1...
Definition: MethodBDT.cxx:2503
Double_t fFValidationEvents
Definition: MethodBDT.h:246
std::vector< Double_t > fLowSigCut
Definition: MethodBDT.h:283
std::vector< Float_t > * fRegressionReturnVal
Definition: MethodBase.h:591
Long64_t GetNTrainingEvents() const
Definition: DataSet.h:93
A TTree object has a header with a name and a title.
Definition: TTree.h:98
double result[121]
std::map< const TMVA::Event *, std::vector< double > > fResiduals
Definition: MethodBDT.h:225
Definition: first.py:1
void Store(TObject *obj, const char *alias=0)
Definition: Results.cxx:83
virtual void Init(std::map< const TMVA::Event *, LossFunctionEventInfo > &evinfomap, std::vector< double > &boostWeights)=0
Double_t Sqrt(Double_t x)
Definition: TMath.h:464
virtual void Set(Int_t n)
Set number of points in the graph Existing coordinates are preserved New coordinates above fNpoints a...
Definition: TGraph.cxx:2099
double exp(double)
const Bool_t kTRUE
Definition: Rtypes.h:91
THist< 2, float, THistStatContent, THistStatUncertainty > TH2F
Definition: THist.hxx:308
UInt_t GetNumber() const
Definition: ClassInfo.h:73
Double_t fCss
Definition: MethodBDT.h:268
double norm(double *x, double *p)
Definition: unuranDistr.cxx:40
return
Definition: HLFactory.cxx:514
const Int_t n
Definition: legend1.C:16
Float_t GetResponse(void) const
std::vector< const TMVA::Event * > fSubSample
Definition: MethodBDT.h:207
void MakeClassSpecific(std::ostream &, const TString &) const
make ROOT-independent C++ class for classifier response (classifier-specific implementation) ...
Definition: MethodBDT.cxx:2620
Int_t CeilNint(Double_t x)
Definition: TMath.h:470
Double_t fHuberQuantile
Definition: MethodBDT.h:296
UInt_t fNNodesMax
Definition: MethodBDT.h:240
void NoErrorCalc(Double_t *const err, Double_t *const errUpper)
Definition: MethodBase.cxx:819
void SetSignalReferenceCut(Double_t cut)
Definition: MethodBase.h:360
std::vector< double > fBoostWeights
Definition: MethodBDT.h:212
Float_t GetCutValue(void) const
const Event * InverseTransform(const Event *, Bool_t suppressIfNoTargets=true) const
MethodBDT(const TString &jobName, const TString &methodTitle, DataSetInfo &theData, const TString &theOption="")
the standard constructor for the "boosted decision trees"
Definition: MethodBDT.cxx:160