ROOT Reference Guide
MethodDT.cxx
// @(#)root/tmva $Id$
// Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss

/**********************************************************************************
 * Project: TMVA - a Root-integrated toolkit for multivariate data analysis      *
 * Package: TMVA                                                                 *
 * Class  : MethodDT (DT = Decision Trees)                                       *
 *                                                                               *
 *                                                                               *
 * Description:                                                                  *
 *      Analysis of Boosted Decision Trees                                       *
 *                                                                               *
 * Authors (alphabetical):                                                       *
 *      Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland             *
 *      Helge Voss      <Helge.Voss@cern.ch>     - MPI-K Heidelberg, Germany     *
 *      Or Cohen        <orcohenor@gmail.com>    - Weizmann Inst., Israel        *
 *                                                                               *
 * Copyright (c) 2005:                                                           *
 *      CERN, Switzerland                                                        *
 *      MPI-K Heidelberg, Germany                                                *
 *                                                                               *
 * Redistribution and use in source and binary forms, with or without            *
 * modification, are permitted according to the terms listed in LICENSE          *
 * (see tmva/doc/LICENSE)                                                        *
 **********************************************************************************/

/*! \class TMVA::MethodDT
\ingroup TMVA

Analysis of Boosted Decision Trees

Boosted decision trees have been used successfully in High Energy
Physics analyses, for example by the MiniBooNE experiment
(Yang-Roe-Zhu, physics/0508045). In Boosted Decision Trees, the
selection is based on a majority vote over the results of several decision
trees, which are all derived from the same training sample by
supplying different event weights during the training.

### Decision trees:

Successive decision nodes are used to categorize the
events of the sample as either signal or background. Each node
uses only a single discriminating variable to decide whether the event is
signal-like ("goes right") or background-like ("goes left"). This
forms a tree-like structure with "baskets" at the end (leaf nodes),
and an event is classified as either signal or background according to
whether the basket where it ends up has been classified as signal or
background during the training. Training a decision tree is the
process of defining the "cut criteria" for each node. The training
starts with the root node. Here one takes the full training event
sample and selects the variable and corresponding cut value that give
the best separation between signal and background at this stage. Using
this cut criterion, the sample is then divided into two subsamples, a
signal-like (right) and a background-like (left) sample. Two new nodes
are then created, one for each of the two subsamples, and they are
constructed using the same mechanism as described for the root
node. The division is stopped once a certain node has reached either a
minimum number of events, or a minimum or maximum signal purity. These
leaf nodes are then called "signal" or "background" if they contain
more signal or background events, respectively, from the training sample.

### Boosting:

The idea behind boosting is that signal events from the training
sample that end up in a background node (and vice versa) are given a
larger weight than events that are in the correct leaf node. This
results in a re-weighted training event sample, with which a new
decision tree can then be developed. The boosting can be applied several
times (typically 100-500 times) and one ends up with a set of decision
trees (a forest).

### Bagging:

In this particular variant of Boosted Decision Trees the boosting
is not done on the basis of previous training results, but by a simple
stochastic re-sampling of the initial training event sample.

### Analysis:

Applying an individual decision tree to a test event results in a
classification of the event as either signal or background. For the
boosted decision tree selection, an event is successively subjected to
the whole set of decision trees, and depending on how often it is
classified as signal, a "likelihood" estimator is constructed for the
event being signal or background. The value of this estimator is then
used to select events from an event sample, and
the cut value on this estimator defines the efficiency and purity of
the selection.
*/

#include "TMVA/MethodDT.h"

#include "TMVA/CCPruner.h"
#include "TMVA/Configurable.h"
#include "TMVA/CrossEntropy.h"
#include "TMVA/DataSet.h"
#include "TMVA/DecisionTree.h"
#include "TMVA/GiniIndex.h"
#include "TMVA/IMethod.h"
#include "TMVA/MethodBase.h"
#include "TMVA/MethodBoost.h"
#include "TMVA/MisClassificationError.h"
#include "TMVA/MsgLogger.h"
#include "TMVA/Ranking.h"
#include "TMVA/SdivSqrtSplusB.h"
#include "TMVA/SeparationBase.h"
#include "TMVA/Timer.h"
#include "TMVA/Tools.h"
#include "TMVA/Types.h"

#include "TRandom3.h"

#include <iostream>
#include <algorithm>

using std::vector;


REGISTER_METHOD(DT)

ClassImp(TMVA::MethodDT);

////////////////////////////////////////////////////////////////////////////////
/// The standard constructor for an ordinary "decision tree".

TMVA::MethodDT::MethodDT( const TString& jobName,
                          const TString& methodTitle,
                          DataSetInfo& theData,
                          const TString& theOption) :
   TMVA::MethodBase( jobName, Types::kDT, methodTitle, theData, theOption)
   , fTree(0)
   , fSepType(0)
   , fMinNodeEvents(0)
   , fMinNodeSize(0)
   , fNCuts(0)
   , fMaxDepth(0)
   , fErrorFraction(0)
   , fPruneStrength(0)
   , fPruneMethod(DecisionTree::kNoPruning)
   , fUseNvars(0)
   , fUsePoissonNvars(0) // don't use this initialisation, only here to make Coverity happy. Is set in Init()
{
}

////////////////////////////////////////////////////////////////////////////////
/// Constructor from a weight file.

TMVA::MethodDT::MethodDT( DataSetInfo& dsi,
                          const TString& theWeightFile) :
   TMVA::MethodBase( Types::kDT, dsi, theWeightFile)
   , fTree(0)
   , fSepType(0)
   , fMinNodeEvents(0)
   , fMinNodeSize(0)
   , fNCuts(0)
   , fMaxDepth(0)
   , fErrorFraction(0)
   , fPruneStrength(0)
   , fPruneMethod(DecisionTree::kNoPruning)
   , fUseNvars(0)
{
}

////////////////////////////////////////////////////////////////////////////////
/// DT can handle classification with 2 classes.

Bool_t TMVA::MethodDT::HasAnalysisType( Types::EAnalysisType type, UInt_t numberClasses, UInt_t /*numberTargets*/ )
{
   if( type == Types::kClassification && numberClasses == 2 ) return kTRUE;
   return kFALSE;
}

////////////////////////////////////////////////////////////////////////////////
/// Define the options (their key words) that can be set in the option string.
///
/// - UseRandomisedTrees  choose at each node splitting a random set of variables
/// - UseNvars            use UseNvars variables in randomised trees
/// - SeparationType      the separation criterion applied in the node splitting.
///                       known:
///                       - GiniIndex
///                       - MisClassificationError
///                       - CrossEntropy
///                       - SDivSqrtSPlusB
/// - nEventsMin:         the minimum number of events in a node (leaf criterion, stop splitting)
/// - nCuts:              the number of steps in the optimisation of the cut for a node (if < 0, then
///                       the step size is determined by the events)
/// - UseYesNoLeaf        decide if the classification is done simply by the node type, or the S/B
///                       (from the training) in the leaf node
/// - NodePurityLimit     the minimum purity to classify a node as a signal node (used in pruning and
///                       boosting to determine the misclassification error rate)
/// - PruneMethod         the pruning method:
///                       known:
///                       - NoPruning   // switch off pruning completely
///                       - ExpectedError
///                       - CostComplexity
/// - PruneStrength       a parameter to adjust the amount of pruning. Should be large enough such that
///                       overtraining is avoided

void TMVA::MethodDT::DeclareOptions()
{
   DeclareOptionRef(fRandomisedTrees,"UseRandomisedTrees","Choose at each node splitting a random set of variables and *bagging*");
   DeclareOptionRef(fUseNvars,"UseNvars","Number of variables used if randomised Tree option is chosen");
   DeclareOptionRef(fUsePoissonNvars,"UsePoissonNvars", "Interpret \"UseNvars\" not as fixed number but as mean of a Poisson distribution in each split with RandomisedTree option");
   DeclareOptionRef(fUseYesNoLeaf=kTRUE, "UseYesNoLeaf",
                    "Use Sig or Bkg node type or the ratio S/B as classification in the leaf node");
   DeclareOptionRef(fNodePurityLimit=0.5, "NodePurityLimit", "In boosting/pruning, nodes with purity > NodePurityLimit are signal; background otherwise.");
   DeclareOptionRef(fSepTypeS="GiniIndex", "SeparationType", "Separation criterion for node splitting");
   AddPreDefVal(TString("MisClassificationError"));
   AddPreDefVal(TString("GiniIndex"));
   AddPreDefVal(TString("CrossEntropy"));
   AddPreDefVal(TString("SDivSqrtSPlusB"));
   DeclareOptionRef(fMinNodeEvents=-1, "nEventsMin", "deprecated !!! Minimum number of events required in a leaf node");
   DeclareOptionRef(fMinNodeSizeS, "MinNodeSize", "Minimum percentage of training events required in a leaf node (default: Classification: 10%, Regression: 1%)");
   DeclareOptionRef(fNCuts, "nCuts", "Number of steps during node cut optimisation");
   DeclareOptionRef(fPruneStrength, "PruneStrength", "Pruning strength (negative value == automatic adjustment)");
   DeclareOptionRef(fPruneMethodS="NoPruning", "PruneMethod", "Pruning method: NoPruning (switched off), ExpectedError or CostComplexity");

   AddPreDefVal(TString("NoPruning"));
   AddPreDefVal(TString("ExpectedError"));
   AddPreDefVal(TString("CostComplexity"));

   if (DoRegression()) {
      DeclareOptionRef(fMaxDepth=50,"MaxDepth","Max depth of the decision tree allowed");
   }else{
      DeclareOptionRef(fMaxDepth=3,"MaxDepth","Max depth of the decision tree allowed");
   }
}

////////////////////////////////////////////////////////////////////////////////
/// Options that are used ONLY for the READER, to ensure backward compatibility.

void TMVA::MethodDT::DeclareCompatibilityOptions() {
   MethodBase::DeclareCompatibilityOptions();

   DeclareOptionRef(fPruneBeforeBoost=kFALSE, "PruneBeforeBoost",
                    "--> removed option .. only kept for reader backward compatibility");
}

////////////////////////////////////////////////////////////////////////////////
/// The option string is decoded; for available options see "DeclareOptions".

void TMVA::MethodDT::ProcessOptions()
{
   fSepTypeS.ToLower();
   if      (fSepTypeS == "misclassificationerror") fSepType = new MisClassificationError();
   else if (fSepTypeS == "giniindex")              fSepType = new GiniIndex();
   else if (fSepTypeS == "crossentropy")           fSepType = new CrossEntropy();
   else if (fSepTypeS == "sdivsqrtsplusb")         fSepType = new SdivSqrtSplusB();
   else {
      Log() << kINFO << GetOptions() << Endl;
      Log() << kFATAL << "<ProcessOptions> unknown Separation Index option called" << Endl;
   }

   //   std::cout << "fSeptypes " << fSepTypeS << "  fseptype " << fSepType << std::endl;

   fPruneMethodS.ToLower();
   if      (fPruneMethodS == "expectederror" )  fPruneMethod = DecisionTree::kExpectedErrorPruning;
   else if (fPruneMethodS == "costcomplexity" ) fPruneMethod = DecisionTree::kCostComplexityPruning;
   else if (fPruneMethodS == "nopruning" )      fPruneMethod = DecisionTree::kNoPruning;
   else {
      Log() << kINFO << GetOptions() << Endl;
      Log() << kFATAL << "<ProcessOptions> unknown PruneMethod option:" << fPruneMethodS << " called" << Endl;
   }

   if (fPruneStrength < 0) fAutomatic = kTRUE;
   else fAutomatic = kFALSE;
   if (fAutomatic && fPruneMethod == DecisionTree::kExpectedErrorPruning){
      Log() << kFATAL
            << "Sorry automatic pruning strength determination is not implemented yet for ExpectedErrorPruning" << Endl;
   }


   if (this->Data()->HasNegativeEventWeights()){
      Log() << kINFO << " You are using a Monte Carlo that has also negative weights. "
            << "That should in principle be fine as long as on average you end up with "
            << "something positive. For this you have to make sure that the minimal number "
            << "of (un-weighted) events demanded for a tree node (currently you use: MinNodeSize="
            << fMinNodeSizeS << ", (or the deprecated equivalent nEventsMin) you can set this via the "
            << "MethodDT option string when booking the "
            << "classifier) is large enough to allow for reasonable averaging!!! "
            << " If this does not help.. maybe you want to try the option: IgnoreNegWeightsInTraining "
            << "which ignores events with negative weight in the training. " << Endl
            << Endl << "Note: You'll get a WARNING message during the training if that should ever happen" << Endl;
   }

   if (fRandomisedTrees){
      Log() << kINFO << " Randomised trees should use *bagging* as *boost* method. Did you set this in the *MethodBoost*? Here I can enforce only the *no pruning*" << Endl;
      fPruneMethod = DecisionTree::kNoPruning;
      //      fBoostType   = "Bagging";
   }

   if (fMinNodeEvents > 0){
      fMinNodeSize = fMinNodeEvents * 100.0 / Data()->GetNTrainingEvents(); // avoid integer division
      Log() << kWARNING << "You have explicitly set *nEventsMin*, the min absolute number \n"
            << "of events in a leaf node. This is DEPRECATED, please use the option \n"
            << "*MinNodeSize* giving the relative number as percentage of training \n"
            << "events instead. \n"
            << "nEventsMin=" << fMinNodeEvents << "--> MinNodeSize=" << fMinNodeSize << "%"
            << Endl;
   }else{
      SetMinNodeSize(fMinNodeSizeS);
   }
}

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::SetMinNodeSize(Double_t sizeInPercent){
   if (sizeInPercent > 0 && sizeInPercent < 50){
      fMinNodeSize = sizeInPercent;

   } else {
      Log() << kERROR << "you have demanded a minimal node size of "
            << sizeInPercent << "% of the training events.. \n"
            << " that somehow does not make sense " << Endl;
   }

}

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::SetMinNodeSize(TString sizeInPercent){
   sizeInPercent.ReplaceAll("%","");
   if (sizeInPercent.IsAlnum()) SetMinNodeSize(sizeInPercent.Atof());
   else {
      Log() << kERROR << "I had problems reading the option MinNodeEvents, which\n"
            << "after removing a possible % sign now reads " << sizeInPercent << Endl;
   }
}

////////////////////////////////////////////////////////////////////////////////
/// Common initialisation with defaults for the DT-Method.

void TMVA::MethodDT::Init( void )
{
   fMinNodeEvents = -1;
   fMinNodeSize   = 5;
   fMinNodeSizeS  = "5%";
   fNCuts         = 20;
   fPruneStrength = 5;   // -1 means automatic determination of the prune strength using a validation sample
   fUseNvars      = GetNvar();

   // reference cut value to distinguish signal-like from background-like events
   SetSignalReferenceCut( 0 );
   if (fAnalysisType == Types::kClassification || fAnalysisType == Types::kMulticlass ) {
      fMaxDepth = 3;
   }else {
      fMaxDepth = 50;
   }
}

////////////////////////////////////////////////////////////////////////////////
/// Destructor.

TMVA::MethodDT::~MethodDT( void )
{
   delete fTree;
}

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::Train( void )
{
   TMVA::DecisionTreeNode::SetIsTraining(true);
   fTree = new DecisionTree( fSepType, fMinNodeSize, fNCuts, &(DataInfo()), 0,
                             fRandomisedTrees, fUseNvars, fUsePoissonNvars);
   fTree->SetNVars(GetNvar());
   if (fRandomisedTrees) Log() << kWARNING << " randomised Trees do not work yet in this framework,"
                               << " as I do not know how to give each tree a new random seed, now they"
                               << " will be all the same and that is not good " << Endl;
   fTree->SetAnalysisType( GetAnalysisType() );

   //fTree->BuildTree(GetEventCollection(Types::kTraining));
   Data()->SetCurrentType(Types::kTraining);
   UInt_t nevents = Data()->GetNTrainingEvents();
   std::vector<const TMVA::Event*> tmp;
   for (UInt_t ievt=0; ievt<nevents; ievt++) {
      const Event *event = GetEvent(ievt);
      tmp.push_back(event);
   }
   fTree->BuildTree(tmp);
   if (fPruneMethod != DecisionTree::kNoPruning) fTree->PruneTree();

   TMVA::DecisionTreeNode::SetIsTraining(false);
   ExitFromTraining();
}

////////////////////////////////////////////////////////////////////////////////
/// Prune the decision tree if requested. This is good for individual trees, which are best grown
/// out and then pruned back, while boosted decision trees are best started as 'small' trees. Well,
/// at least the standard "optimal pruning algorithms" don't result in 'weak enough' classifiers !!

Double_t TMVA::MethodDT::PruneTree( )
{
   // remember the number of nodes beforehand (for monitoring purposes)

   if (fAutomatic && fPruneMethod == DecisionTree::kCostComplexityPruning) { // automatic cost complexity pruning
      CCPruner* pruneTool = new CCPruner(fTree, this->Data() , fSepType);
      pruneTool->Optimize();
      std::vector<DecisionTreeNode*> nodes = pruneTool->GetOptimalPruneSequence();
      fPruneStrength = pruneTool->GetOptimalPruneStrength();
      for(UInt_t i = 0; i < nodes.size(); i++)
         fTree->PruneNode(nodes[i]);
      delete pruneTool;
   }
   else if (fAutomatic && fPruneMethod != DecisionTree::kCostComplexityPruning){
      /*

      Double_t alpha = 0;
      Double_t delta = fDeltaPruneStrength;

      DecisionTree* dcopy;
      std::vector<Double_t> q;
      multimap<Double_t,Double_t> quality;
      Int_t nnodes=fTree->GetNNodes();

      // find the maximum prune strength that still leaves some nodes
      Bool_t forceStop = kFALSE;
      Int_t troubleCount=0, previousNnodes=nnodes;


      nnodes=fTree->GetNNodes();
      while (nnodes > 3 && !forceStop) {
         dcopy = new DecisionTree(*fTree);
         dcopy->SetPruneStrength(alpha+=delta);
         dcopy->PruneTree();
         q.push_back(TestTreeQuality(dcopy));
         quality.insert(std::pair<const Double_t,Double_t>(q.back(),alpha));
         nnodes=dcopy->GetNNodes();
         if (previousNnodes == nnodes) troubleCount++;
         else {
            troubleCount=0; // reset counter
            if (nnodes < previousNnodes / 2 ) fDeltaPruneStrength /= 2.;
         }
         previousNnodes = nnodes;
         if (troubleCount > 20) {
            if (methodIndex == 0 && fPruneStrength <=0) { //maybe you need a larger step size ??
               fDeltaPruneStrength *= 5;
               Log() << kINFO << "<PruneTree> trouble determining optimal prune strength"
                     << " for Tree " << methodIndex
                     << " --> first try to increase the step size"
                     << " currently PruneStrength= " << alpha
                     << " stepsize " << fDeltaPruneStrength << " " << Endl;
               troubleCount = 0;   // try again
               fPruneStrength = 1; // if it was for the first time..
            } else if (methodIndex == 0 && fPruneStrength <=2) { //maybe you need a much larger step size ??
               fDeltaPruneStrength *= 5;
               Log() << kINFO << "<PruneTree> trouble determining optimal prune strength"
                     << " for Tree " << methodIndex
                     << " --> try to increase the step size even more.. "
                     << " if that still didn't work, TRY IT BY HAND"
                     << " currently PruneStrength= " << alpha
                     << " stepsize " << fDeltaPruneStrength << " " << Endl;
               troubleCount = 0;   // try again
               fPruneStrength = 3; // if it was for the first time..
            } else {
               forceStop=kTRUE;
               Log() << kINFO << "<PruneTree> trouble determining optimal prune strength"
                     << " for Tree " << methodIndex << " at tested prune strength: " << alpha
                     << " --> abort forced, use same strength as for previous tree:"
                     << fPruneStrength << Endl;
            }
         }
         if (fgDebugLevel==1) Log() << kINFO << "Pruned with ("<<alpha
                                    << ") give quality: " << q.back()
                                    << " and #nodes: " << nnodes
                                    << Endl;
         delete dcopy;
      }
      if (!forceStop) {
         multimap<Double_t,Double_t>::reverse_iterator it=quality.rend();
         it++;
         fPruneStrength = it->second;
         // adjust the step size for the next tree.. think that 20 steps are sort of
         // fine enough.. could become a tunable option later..
         fDeltaPruneStrength *= Double_t(q.size())/20.;
      }

      fTree->SetPruneStrength(fPruneStrength);
      fTree->PruneTree();
      */
   }
   else {
      fTree->SetPruneStrength(fPruneStrength);
      fTree->PruneTree();
   }

   return fPruneStrength;
}

////////////////////////////////////////////////////////////////////////////////

Double_t TMVA::MethodDT::TestTreeQuality( DecisionTree *dt )
{
   Data()->SetCurrentType(Types::kValidation);
   // test the tree quality.. in terms of misclassification
   Double_t SumCorrect=0, SumWrong=0;
   for (Long64_t ievt=0; ievt<Data()->GetNEvents(); ievt++)
      {
         const Event * ev = Data()->GetEvent(ievt);
         if ((dt->CheckEvent(ev) > dt->GetNodePurityLimit() ) == DataInfo().IsSignal(ev)) SumCorrect+=ev->GetWeight();
         else SumWrong+=ev->GetWeight();
      }
   Data()->SetCurrentType(Types::kTraining);
   return SumCorrect / (SumCorrect + SumWrong);
}

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::AddWeightsXMLTo( void* parent ) const
{
   fTree->AddXMLTo(parent);
   //Log() << kFATAL << "Please implement writing of weights as XML" << Endl;
}

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::ReadWeightsFromXML( void* wghtnode )
{
   if (fTree)
      delete fTree;
   fTree = new DecisionTree();
   fTree->ReadXML(wghtnode,GetTrainingTMVAVersionCode());
}

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::ReadWeightsFromStream( std::istream& istr )
{
   delete fTree;
   fTree = new DecisionTree();
   fTree->Read(istr);
}

////////////////////////////////////////////////////////////////////////////////
/// Returns the MVA value.

Double_t TMVA::MethodDT::GetMvaValue( Double_t* err, Double_t* errUpper )
{
   // cannot determine error
   NoErrorCalc(err, errUpper);

   return fTree->CheckEvent(GetEvent(),fUseYesNoLeaf);
}

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::GetHelpMessage() const
{
}

////////////////////////////////////////////////////////////////////////////////

const TMVA::Ranking* TMVA::MethodDT::CreateRanking()
{
   return 0;
}