ROOT 6.10/09 Reference Guide
MethodDT.cxx
1 // @(#)root/tmva $Id$
2 // Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss
3 
4 /**********************************************************************************
5  * Project: TMVA - a Root-integrated toolkit for multivariate data analysis *
6  * Package: TMVA *
7  * Class : MethodDT (DT = Decision Trees) *
8  * Web : http://tmva.sourceforge.net *
9  * *
10  * Description: *
11  * Analysis of Boosted Decision Trees *
12  * *
13  * Authors (alphabetical): *
14  * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
15  * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
16  * Or Cohen <orcohenor@gmail.com> - Weizmann Inst., Israel *
17  * *
18  * Copyright (c) 2005: *
19  * CERN, Switzerland *
20  * MPI-K Heidelberg, Germany *
21  * *
22  * Redistribution and use in source and binary forms, with or without *
23  * modification, are permitted according to the terms listed in LICENSE *
24  * (http://tmva.sourceforge.net/LICENSE) *
25  **********************************************************************************/
26 
27 /*! \class TMVA::MethodDT
28 \ingroup TMVA
29 
30 Analysis of Boosted Decision Trees
31 
32 Boosted decision trees have been successfully used in High Energy
33 Physics analysis for example by the MiniBooNE experiment
34 (Yang-Roe-Zhu, physics/0508045). In Boosted Decision Trees, the
35 selection is done on a majority vote on the result of several decision
36 trees, which are all derived from the same training sample by
37 supplying different event weights during the training.
38 
39 ### Decision trees:
40 
41 Successive decision nodes are used to categorize the
42 events out of the sample as either signal or background. Each node
43 uses only a single discriminating variable to decide if the event is
44 signal-like ("goes right") or background-like ("goes left"). This
45 forms a tree-like structure with "baskets" at the end (leaf nodes),
46 and an event is classified as either signal or background according to
47 whether the basket where it ends up has been classified signal or
48 background during the training. Training of a decision tree is the
49 process of defining the "cut criteria" for each node. The training
50 starts with the root node. Here one takes the full training event
51 sample and selects the variable and corresponding cut value that gives
52 the best separation between signal and background at this stage. Using
53 this cut criterion, the sample is then divided into two subsamples, a
54 signal-like (right) and a background-like (left) sample. Two new nodes
55 are then created, one for each of the two subsamples, and they are
56 constructed using the same mechanism as described for the root
57 node. The division is stopped once a certain node has reached either a
58 minimum number of events, or a minimum or maximum signal purity. These
59 leaf nodes are then called "signal" or "background" depending on whether
60 they contain more signal or background events from the training sample.
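
As a rough illustration of the root-node step described above, the
following self-contained sketch scans a fixed number of equidistant cut
values on a single variable and keeps the one with the largest decrease in
the Gini index. It does not use the TMVA classes: `ToyEvent`, `CutResult`,
`WeightedGini` and `FindBestCut` are hypothetical names introduced only for
this example, and the equidistant grid is an assumption about how the
`nCuts` steps might be placed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event record for this sketch (not TMVA::Event).
struct ToyEvent {
    double value;     // the single discriminating variable
    bool   isSignal;  // true label from the training sample
    double weight;    // event weight
};

// Gini index p*(1-p) multiplied by the total node weight; zero for a pure node.
double WeightedGini(double sig, double bkg) {
    const double total = sig + bkg;
    if (total <= 0.0) return 0.0;
    return sig * bkg / total;
}

struct CutResult {
    double cutValue;  // events with value > cutValue "go right"
    double gain;      // decrease of the weighted Gini index
};

// Scan nCuts equidistant cut values strictly inside (lo, hi), keep the best.
CutResult FindBestCut(const std::vector<ToyEvent>& events,
                      double lo, double hi, int nCuts) {
    assert(nCuts > 0 && hi > lo);
    double sigAll = 0.0, bkgAll = 0.0;
    for (const ToyEvent& e : events) (e.isSignal ? sigAll : bkgAll) += e.weight;
    const double parent = WeightedGini(sigAll, bkgAll);

    CutResult best{lo, 0.0};
    for (int i = 1; i <= nCuts; ++i) {
        const double cut = lo + (hi - lo) * i / (nCuts + 1);
        double sigR = 0.0, bkgR = 0.0;
        for (const ToyEvent& e : events)
            if (e.value > cut) (e.isSignal ? sigR : bkgR) += e.weight;
        const double gain = parent - WeightedGini(sigR, bkgR)
                                   - WeightedGini(sigAll - sigR, bkgAll - bkgR);
        if (gain > best.gain) best = {cut, gain};
    }
    return best;
}
```

Applying `FindBestCut` to every variable and keeping the overall best
result would give the node's cut criterion; the two subsamples are then
treated in the same way.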
61 
62 ### Boosting:
63 
64 The idea behind boosting is that signal events from the training
65 sample that end up in a background node (and vice versa) are given a
66 larger weight than events that are in the correct leaf node. This
67 results in a re-weighted training event sample, from which a new
68 decision tree can then be developed. The boosting can be applied several
69 times (typically 100-500 times) and one ends up with a set of decision
70 trees (a forest).
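
The re-weighting step can be sketched as follows. The text above only
states that misclassified events receive a larger weight; the specific
factor used here, (1 - err) / err with err the weighted misclassification
rate, is the usual AdaBoost choice and is an assumption of this example,
as is the hypothetical function name `ReweightMisclassified`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Multiplies the weights of misclassified events by an AdaBoost-style factor
// and renormalises so that the sum of weights is unchanged.
// Returns the factor that was applied (1.0 if nothing was changed).
double ReweightMisclassified(std::vector<double>& weights,
                             const std::vector<bool>& misclassified) {
    assert(weights.size() == misclassified.size());
    double sumAll = 0.0, sumWrong = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        sumAll += weights[i];
        if (misclassified[i]) sumWrong += weights[i];
    }
    if (sumAll <= 0.0) return 1.0;
    const double err = sumWrong / sumAll;
    // Only a tree that is better than random guessing gives a usable factor.
    if (err <= 0.0 || err >= 0.5) return 1.0;

    const double boost = (1.0 - err) / err;
    double newSum = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (misclassified[i]) weights[i] *= boost;
        newSum += weights[i];
    }
    for (double& w : weights) w *= sumAll / newSum;
    return boost;
}
```

The next tree would then be trained on the same events with the updated
weights.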
71 
72 ### Bagging:
73 
74 In this particular variant of the Boosted Decision Trees the boosting
75 is not done on the basis of previous training results, but by a simple
76 stochastic re-sampling of the initial training event sample.
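
A minimal sketch of such a re-sampling, assuming sampling with replacement
and one independent seed per tree (neither detail is specified above).
`BootstrapIndices` is a hypothetical helper, not part of TMVA; it returns
the indices of the events that would form the training sample of one tree.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw nEvents indices uniformly, with replacement, from [0, nEvents).
std::vector<std::size_t> BootstrapIndices(std::size_t nEvents, unsigned seed) {
    assert(nEvents > 0);
    std::mt19937 rng(seed);  // a different seed per tree gives different samples
    std::uniform_int_distribution<std::size_t> pick(0, nEvents - 1);
    std::vector<std::size_t> sample(nEvents);
    for (std::size_t& idx : sample) idx = pick(rng);
    return sample;
}
```

Some events appear several times in the returned sample and others not at
all, which is what makes the individual trees differ.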
77 
78 ### Analysis:
79 
80 Applying an individual decision tree to a test event results in a
81 classification of the event as either signal or background. For the
82 boosted decision tree selection, an event is successively subjected to
83 the whole set of decision trees and depending on how often it is
84 classified as signal, a "likelihood" estimator is constructed for the
85 event being signal or background. The value of this estimator is the
86 one which is then used to select the events from an event sample, and
87 the cut value on this estimator defines the efficiency and purity of
88 the selection.
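
One simple way to build such an estimator is the fraction of trees that
classify the event as signal, sketched below. Treating every tree with
equal weight is an assumption of this example, and `SignalVoteFraction` and
`PassesSelection` are hypothetical names that do not exist in TMVA.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// votes[i] is true if tree i classified the event as signal.
// Returns a value in [0, 1]; larger means more signal-like.
double SignalVoteFraction(const std::vector<bool>& votes) {
    assert(!votes.empty());
    std::size_t nSignal = 0;
    for (bool v : votes)
        if (v) ++nSignal;
    return static_cast<double>(nSignal) / static_cast<double>(votes.size());
}

// The cut on the estimator sets the efficiency and purity of the selection.
bool PassesSelection(const std::vector<bool>& votes, double cut) {
    return SignalVoteFraction(votes) > cut;
}
```

Raising the cut keeps fewer events and typically increases the purity at
the cost of efficiency.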
89 */
90 
91 #include "TMVA/MethodDT.h"
92 
93 #include "TMVA/BinarySearchTree.h"
94 #include "TMVA/CCPruner.h"
95 #include "TMVA/ClassifierFactory.h"
96 #include "TMVA/Configurable.h"
97 #include "TMVA/CrossEntropy.h"
98 #include "TMVA/DataSet.h"
99 #include "TMVA/DecisionTree.h"
100 #include "TMVA/GiniIndex.h"
101 #include "TMVA/IMethod.h"
102 #include "TMVA/MethodBase.h"
103 #include "TMVA/MethodBoost.h"
104 #include "TMVA/MisClassificationError.h"
105 #include "TMVA/MsgLogger.h"
106 #include "TMVA/Ranking.h"
107 #include "TMVA/SdivSqrtSplusB.h"
108 #include "TMVA/SeparationBase.h"
109 #include "TMVA/Timer.h"
110 #include "TMVA/Tools.h"
111 #include "TMVA/Types.h"
112 
113 #include "Riostream.h"
114 #include "TRandom3.h"
115 #include "TMath.h"
116 #include "TObjString.h"
117 
118 #include <algorithm>
119 
120 using std::vector;
121 
122 REGISTER_METHOD(DT)
123 
125 
126 ////////////////////////////////////////////////////////////////////////////////
127 /// the standard constructor for just an ordinary "decision tree"
128 
129  TMVA::MethodDT::MethodDT( const TString& jobName,
130  const TString& methodTitle,
131  DataSetInfo& theData,
132  const TString& theOption) :
133  TMVA::MethodBase( jobName, Types::kDT, methodTitle, theData, theOption)
134  , fTree(0)
135  , fSepType(0)
136  , fMinNodeEvents(0)
137  , fMinNodeSize(0)
138  , fNCuts(0)
139  , fUseYesNoLeaf(kFALSE)
140  , fNodePurityLimit(0)
141  , fMaxDepth(0)
142  , fErrorFraction(0)
143  , fPruneStrength(0)
144  , fPruneMethod(DecisionTree::kNoPruning)
145  , fAutomatic(kFALSE)
146  , fRandomisedTrees(kFALSE)
147  , fUseNvars(0)
148  , fUsePoissonNvars(0) // don't use this initialisation, only here to make Coverity happy. Is set in Init()
149  , fDeltaPruneStrength(0)
150 {
151  fPruneBeforeBoost = kFALSE;
152 }
153 
154 ////////////////////////////////////////////////////////////////////////////////
155 ///constructor from Reader
156 
157 TMVA::MethodDT::MethodDT( DataSetInfo& dsi,
158  const TString& theWeightFile) :
159  TMVA::MethodBase( Types::kDT, dsi, theWeightFile)
160  , fTree(0)
161  , fSepType(0)
162  , fMinNodeEvents(0)
163  , fMinNodeSize(0)
164  , fNCuts(0)
165  , fUseYesNoLeaf(kFALSE)
166  , fNodePurityLimit(0)
167  , fMaxDepth(0)
168  , fErrorFraction(0)
169  , fPruneStrength(0)
170  , fPruneMethod(DecisionTree::kNoPruning)
171  , fAutomatic(kFALSE)
172  , fRandomisedTrees(kFALSE)
173  , fUseNvars(0)
174  , fDeltaPruneStrength(0)
175 {
177 }
178 
179 ////////////////////////////////////////////////////////////////////////////////
180 /// DT can handle classification with 2 classes only
181 
182 Bool_t TMVA::MethodDT::HasAnalysisType( Types::EAnalysisType type, UInt_t numberClasses, UInt_t /*numberTargets*/ )
183 {
184  if( type == Types::kClassification && numberClasses == 2 ) return kTRUE;
185  return kFALSE;
186 }
187 
188 
189 ////////////////////////////////////////////////////////////////////////////////
190 /// Define the options (their key words) that can be set in the option string.
191 ///
192 /// - UseRandomisedTrees choose at each node splitting a random set of variables
193 /// - UseNvars use UseNvars variables in randomised trees
194 /// - SeparationType the separation criterion applied in the node splitting.
195 /// known:
196 /// - GiniIndex
197 /// - MisClassificationError
198 /// - CrossEntropy
199 /// - SDivSqrtSPlusB
200 /// - nEventsMin: the minimum number of events in a node (leaf criteria, stop splitting)
201 /// - nCuts: the number of steps in the optimisation of the cut for a node (if < 0, then
202 /// step size is determined by the events)
203 /// - UseYesNoLeaf decide if the classification is done simply by the node type, or the S/B
204 /// (from the training) in the leaf node
205 /// - NodePurityLimit the minimum purity to classify a node as a signal node (used in pruning and boosting to determine
206 /// misclassification error rate)
207 /// - PruneMethod The Pruning method:
208 /// known:
209 /// - NoPruning // switch off pruning completely
210 /// - ExpectedError
211 /// - CostComplexity
212 /// - PruneStrength a parameter to adjust the amount of pruning. Should be large enough such that overtraining is avoided
213 
214 void TMVA::MethodDT::DeclareOptions()
215 {
216  DeclareOptionRef(fRandomisedTrees,"UseRandomisedTrees","Choose at each node splitting a random set of variables and *bagging*");
217  DeclareOptionRef(fUseNvars,"UseNvars","Number of variables used if randomised Tree option is chosen");
218  DeclareOptionRef(fUsePoissonNvars,"UsePoissonNvars", "Interpret \"UseNvars\" not as fixed number but as mean of a Poisson distribution in each split with RandomisedTree option");
219  DeclareOptionRef(fUseYesNoLeaf=kTRUE, "UseYesNoLeaf",
220  "Use Sig or Bkg node type or the ratio S/B as classification in the leaf node");
221  DeclareOptionRef(fNodePurityLimit=0.5, "NodePurityLimit", "In boosting/pruning, nodes with purity > NodePurityLimit are signal; background otherwise.");
222  DeclareOptionRef(fSepTypeS="GiniIndex", "SeparationType", "Separation criterion for node splitting");
223  AddPreDefVal(TString("MisClassificationError"));
224  AddPreDefVal(TString("GiniIndex"));
225  AddPreDefVal(TString("CrossEntropy"));
226  AddPreDefVal(TString("SDivSqrtSPlusB"));
227  DeclareOptionRef(fMinNodeEvents=-1, "nEventsMin", "deprecated !!! Minimum number of events required in a leaf node");
228  DeclareOptionRef(fMinNodeSizeS, "MinNodeSize", "Minimum percentage of training events required in a leaf node (default: Classification: 10%, Regression: 1%)");
229  DeclareOptionRef(fNCuts, "nCuts", "Number of steps during node cut optimisation");
230  DeclareOptionRef(fPruneStrength, "PruneStrength", "Pruning strength (negative value == automatic adjustment)");
231  DeclareOptionRef(fPruneMethodS="NoPruning", "PruneMethod", "Pruning method: NoPruning (switched off), ExpectedError or CostComplexity");
232 
233  AddPreDefVal(TString("NoPruning"));
234  AddPreDefVal(TString("ExpectedError"));
235  AddPreDefVal(TString("CostComplexity"));
236 
237  if (DoRegression()) {
238  DeclareOptionRef(fMaxDepth=50,"MaxDepth","Max depth of the decision tree allowed");
239  }else{
240  DeclareOptionRef(fMaxDepth=3,"MaxDepth","Max depth of the decision tree allowed");
241  }
242 }
243 
244 ////////////////////////////////////////////////////////////////////////////////
245 /// options that are used ONLY for the READER to ensure backward compatibility
246 
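247 void TMVA::MethodDT::DeclareCompatibilityOptions() {
248 
249  MethodBase::DeclareCompatibilityOptions();
250 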
251  DeclareOptionRef(fPruneBeforeBoost=kFALSE, "PruneBeforeBoost",
252  "--> removed option .. only kept for reader backward compatibility");
253 }
254 
255 ////////////////////////////////////////////////////////////////////////////////
256 /// the option string is decoded, for available options see "DeclareOptions"
257 
258 void TMVA::MethodDT::ProcessOptions()
259 {
260  fSepTypeS.ToLower();
261  if (fSepTypeS == "misclassificationerror") fSepType = new MisClassificationError();
262  else if (fSepTypeS == "giniindex") fSepType = new GiniIndex();
263  else if (fSepTypeS == "crossentropy") fSepType = new CrossEntropy();
264  else if (fSepTypeS == "sdivsqrtsplusb") fSepType = new SdivSqrtSplusB();
265  else {
266  Log() << kINFO << GetOptions() << Endl;
267  Log() << kFATAL << "<ProcessOptions> unknown Separation Index option called" << Endl;
268  }
269 
270  // std::cout << "fSeptypes " << fSepTypeS << " fseptype " << fSepType << std::endl;
271 
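272  fPruneMethodS.ToLower();
273  if (fPruneMethodS == "expectederror" ) fPruneMethod = DecisionTree::kExpectedErrorPruning;
274  else if (fPruneMethodS == "costcomplexity" ) fPruneMethod = DecisionTree::kCostComplexityPruning;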
275  else if (fPruneMethodS == "nopruning" ) fPruneMethod = DecisionTree::kNoPruning;
276  else {
277  Log() << kINFO << GetOptions() << Endl;
278  Log() << kFATAL << "<ProcessOptions> unknown PruneMethod option:" << fPruneMethodS <<" called" << Endl;
279  }
280 
281  if (fPruneStrength < 0) fAutomatic = kTRUE;
282  else fAutomatic = kFALSE;
283  if (fAutomatic && fPruneMethod == DecisionTree::kExpectedErrorPruning){
284  Log() << kFATAL
285  << "Sorry automatic pruning strength determination is not implemented yet for ExpectedErrorPruning" << Endl;
286  }
287 
288 
289  if (this->Data()->HasNegativeEventWeights()){
290  Log() << kINFO << " You are using a Monte Carlo that has also negative weights. "
291  << "That should in principle be fine as long as on average you end up with "
292  << "something positive. For this you have to make sure that the minimal number "
293  << "of (un-weighted) events demanded for a tree node (currently you use: MinNodeSize="
294  <<fMinNodeSizeS
295  <<", (or the deprecated equivalent nEventsMin) you can set this via the "
296  <<"MethodDT option string when booking the "
297  << "classifier) is large enough to allow for reasonable averaging!!! "
298  << " If this does not help.. maybe you want to try the option: IgnoreNegWeightsInTraining "
299  << "which ignores events with negative weight in the training. " << Endl
300  << Endl << "Note: You'll get a WARNING message during the training if that should ever happen" << Endl;
301  }
302 
303  if (fRandomisedTrees){
304  Log() << kINFO << " Randomised trees should use *bagging* as *boost* method. Did you set this in the *MethodBoost* ? . Here I can enforce only the *no pruning*" << Endl;
305  fPruneMethod = DecisionTree::kNoPruning;
306  // fBoostType = "Bagging";
307  }
308 
309  if (fMinNodeEvents > 0){
310  fMinNodeSize = 100.0 * fMinNodeEvents / Data()->GetNTrainingEvents();
311  Log() << kWARNING << "You have explicitly set *nEventsMin*, the min absolute number \n"
312  << "of events in a leaf node. This is DEPRECATED, please use the option \n"
313  << "*MinNodeSize* giving the relative number as percentage of training \n"
314  << "events instead. \n"
315  << "nEventsMin="<<fMinNodeEvents<< "--> MinNodeSize="<<fMinNodeSize<<"%"
316  << Endl;
317  }else{
318  SetMinNodeSize(fMinNodeSizeS);
319  }
320 }
321 
322 void TMVA::MethodDT::SetMinNodeSize(Double_t sizeInPercent){
323  if (sizeInPercent > 0 && sizeInPercent < 50){
324  fMinNodeSize=sizeInPercent;
325 
326  } else {
327  Log() << kERROR << "you have demanded a minimal node size of "
328  << sizeInPercent << "% of the training events.. \n"
329  << " that somehow does not make sense "<<Endl;
330  }
331 
332 }
333 void TMVA::MethodDT::SetMinNodeSize(TString sizeInPercent){
334  sizeInPercent.ReplaceAll("%","");
335  if (sizeInPercent.IsFloat()) SetMinNodeSize(sizeInPercent.Atof());
336  else {
337  Log() << kERROR << "I had problems reading the option MinNodeSize, which\n"
338  << "after removing a possible % sign now reads " << sizeInPercent << Endl;
339  }
340 }
341 
342 ////////////////////////////////////////////////////////////////////////////////
343 /// common initialisation with defaults for the DT-Method
344 
345 void TMVA::MethodDT::Init( void )
346 {
347  fMinNodeEvents = -1;
348  fMinNodeSize = 5;
349  fMinNodeSizeS = "5%";
350  fNCuts = 20;
352  fPruneStrength = 5; // -1 means automatic determination of the prune strength using a validation sample
355  fUseNvars = GetNvar();
357 
358  // reference cut value to distinguish signal-like from background-like events
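359  SetSignalReferenceCut( 0 );
360  if (fAnalysisType == Types::kClassification || fAnalysisType == Types::kMulticlass ) {
361  fMaxDepth = 3;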
362  }else {
363  fMaxDepth = 50;
364  }
365 }
366 
367 ////////////////////////////////////////////////////////////////////////////////
368 ///destructor
369 
370 TMVA::MethodDT::~MethodDT( void )
371 {
372  delete fTree;
373 }
374 
375 ////////////////////////////////////////////////////////////////////////////////
376 
377 void TMVA::MethodDT::Train( void )
378 {
382  fTree->SetNVars(GetNvar());
383  if (fRandomisedTrees) Log()<<kWARNING<<" randomised Trees do not work yet in this framework,"
384  << " as I do not know how to give each tree a new random seed, now they"
385  << " will be all the same and that is not good " << Endl;
387 
388  //fTree->BuildTree(GetEventCollection(Types::kTraining));
390  UInt_t nevents = Data()->GetNTrainingEvents();
391  std::vector<const TMVA::Event*> tmp;
392  for (Long64_t ievt=0; ievt<nevents; ievt++) {
393  const Event *event = GetEvent(ievt);
394  tmp.push_back(event);
395  }
396  fTree->BuildTree(tmp);
398 
401 }
402 
403 ////////////////////////////////////////////////////////////////////////////////
404 /// prune the decision tree if requested (good for individual trees that are best grown out, and then
405 /// pruned back, while boosted decision trees are best 'small' trees to start with). Well, at least the
406 /// standard "optimal pruning algorithms" don't result in 'weak enough' classifiers !!
407 
408 Double_t TMVA::MethodDT::PruneTree( )
409 {
410  // remember the number of nodes beforehand (for monitoring purposes)
411 
412 
413  if (fAutomatic && fPruneMethod == DecisionTree::kCostComplexityPruning) { // automatic cost complexity pruning
414  CCPruner* pruneTool = new CCPruner(fTree, this->Data() , fSepType);
415  pruneTool->Optimize();
416  std::vector<DecisionTreeNode*> nodes = pruneTool->GetOptimalPruneSequence();
418  for(UInt_t i = 0; i < nodes.size(); i++)
419  fTree->PruneNode(nodes[i]);
420  delete pruneTool;
421  }
423  /*
424 
425  Double_t alpha = 0;
426  Double_t delta = fDeltaPruneStrength;
427 
428  DecisionTree* dcopy;
429  std::vector<Double_t> q;
430  multimap<Double_t,Double_t> quality;
431  Int_t nnodes=fTree->GetNNodes();
432 
433  // find the maximum prune strength that still leaves some nodes
434  Bool_t forceStop = kFALSE;
435  Int_t troubleCount=0, previousNnodes=nnodes;
436 
437 
438  nnodes=fTree->GetNNodes();
439  while (nnodes > 3 && !forceStop) {
440  dcopy = new DecisionTree(*fTree);
441  dcopy->SetPruneStrength(alpha+=delta);
442  dcopy->PruneTree();
443  q.push_back(TestTreeQuality(dcopy));
444  quality.insert(std::pair<const Double_t,Double_t>(q.back(),alpha));
445  nnodes=dcopy->GetNNodes();
446  if (previousNnodes == nnodes) troubleCount++;
447  else {
448  troubleCount=0; // reset counter
449  if (nnodes < previousNnodes / 2 ) fDeltaPruneStrength /= 2.;
450  }
451  previousNnodes = nnodes;
452  if (troubleCount > 20) {
453  if (methodIndex == 0 && fPruneStrength <=0) {//maybe you need larger stepsize ??
454  fDeltaPruneStrength *= 5;
455  Log() << kINFO << "<PruneTree> trouble determining optimal prune strength"
456  << " for Tree " << methodIndex
457  << " --> first try to increase the step size"
458  << " currently Prunestrenght= " << alpha
459  << " stepsize " << fDeltaPruneStrength << " " << Endl;
460  troubleCount = 0; // try again
461  fPruneStrength = 1; // if it was for the first time..
462  } else if (methodIndex == 0 && fPruneStrength <=2) {//maybe you need much larger stepsize ??
463  fDeltaPruneStrength *= 5;
464  Log() << kINFO << "<PruneTree> trouble determining optimal prune strength"
465  << " for Tree " << methodIndex
466  << " --> try to increase the step size even more.. "
467  << " if that still didn't work, TRY IT BY HAND"
468  << " currently Prunestrenght= " << alpha
469  << " stepsize " << fDeltaPruneStrength << " " << Endl;
470  troubleCount = 0; // try again
471  fPruneStrength = 3; // if it was for the first time..
472  } else {
473  forceStop=kTRUE;
474  Log() << kINFO << "<PruneTree> trouble determining optimal prune strength"
475  << " for Tree " << methodIndex << " at tested prune strength: " << alpha << " --> abort forced, use same strength as for previous tree:"
476  << fPruneStrength << Endl;
477  }
478  }
479  if (fgDebugLevel==1) Log() << kINFO << "Pruneed with ("<<alpha
480  << ") give quality: " << q.back()
481  << " and #nodes: " << nnodes
482  << Endl;
483  delete dcopy;
484  }
485  if (!forceStop) {
486  multimap<Double_t,Double_t>::reverse_iterator it=quality.rend();
487  it++;
488  fPruneStrength = it->second;
489  // adjust the step size for the next tree.. think that 20 steps are sort of
490  // fine enough.. could become a tunable option later..
491  fDeltaPruneStrength *= Double_t(q.size())/20.;
492  }
493 
494  fTree->SetPruneStrength(fPruneStrength);
495  fTree->PruneTree();
496  */
497  }
498  else {
500  fTree->PruneTree();
501  }
502 
503  return fPruneStrength;
504 }
505 
506 ////////////////////////////////////////////////////////////////////////////////
507 
508 Double_t TMVA::MethodDT::TestTreeQuality( DecisionTree *dt )
509 {
511  // test the tree quality.. in terms of Misclassification
512  Double_t SumCorrect=0,SumWrong=0;
513  for (Long64_t ievt=0; ievt<Data()->GetNEvents(); ievt++)
514  {
515  const Event * ev = Data()->GetEvent(ievt);
516  if ((dt->CheckEvent(ev) > dt->GetNodePurityLimit() ) == DataInfo().IsSignal(ev)) SumCorrect+=ev->GetWeight();
517  else SumWrong+=ev->GetWeight();
518  }
520  return SumCorrect / (SumCorrect + SumWrong);
521 }
522 
523 ////////////////////////////////////////////////////////////////////////////////
524 
525 void TMVA::MethodDT::AddWeightsXMLTo( void* parent ) const
526 {
527  fTree->AddXMLTo(parent);
528  //Log() << kFATAL << "Please implement writing of weights as XML" << Endl;
529 }
530 
531 ////////////////////////////////////////////////////////////////////////////////
532 
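533 void TMVA::MethodDT::ReadWeightsFromXML( void* wghtnode)
534 {
535  if(fTree)
536  delete fTree;
537  fTree = new DecisionTree();
538  fTree->ReadXML(wghtnode, GetTrainingTMVAVersionCode());
539 }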
540 
541 ////////////////////////////////////////////////////////////////////////////////
542 
543 void TMVA::MethodDT::ReadWeightsFromStream( std::istream& istr )
544 {
545  delete fTree;
546  fTree = new DecisionTree();
547  fTree->Read(istr);
548 }
549 
550 ////////////////////////////////////////////////////////////////////////////////
551 /// returns MVA value
552 
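553 Double_t TMVA::MethodDT::GetMvaValue( Double_t* err, Double_t* errUpper )
554 {
555  // cannot determine error
556  NoErrorCalc(err, errUpper);
557 
558  return fTree->CheckEvent(GetEvent(), fUseYesNoLeaf);
559 }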
560 
561 ////////////////////////////////////////////////////////////////////////////////
562 
563 void TMVA::MethodDT::GetHelpMessage() const
564 {
565 }
566 ////////////////////////////////////////////////////////////////////////////////
567 
568 const TMVA::Ranking* TMVA::MethodDT::CreateRanking()
569 {
570  return 0;
571 }