Logo ROOT  
Reference Guide
TMVA::CostComplexityPruneTool Class Reference

A class to prune a decision tree using the Cost Complexity method.

(see "Classification and Regression Trees" by Leo Breiman et al)

Some definitions:

  • \( T_{max} \) - the initial, usually highly overtrained tree, that is to be pruned back
  • \( R(T) \) - quality index (Gini, misclassification rate, or other) of a tree \( T \)
  • \( \sim T \) - set of terminal nodes in \( T \)
  • \( T' \) - the pruned subtree of \( T_max \) that has the best quality index \( R(T') \)
  • \( \alpha \) - the prune strength parameter in Cost Complexity pruning \( (R_{\alpha}(T) = R(T) + \alpha*|\sim T|) \)

There are two running modes in CCPruner: (i) one may select a prune strength and prune back the tree \( T_{max}\) until the criterion:

\[ \alpha < \frac{R(T) - R(t)}{|\sim T_t| - 1} \]

is true for all nodes t in \( T \), or (ii) the algorithm finds the sequence of critical points \( \alpha_k < \alpha_{k+1} ... < \alpha_K \) such that \( T_K = root(T_{max}) \) and then selects the optimally-pruned subtree, defined to be the subtree with the best quality index for the validation sample.

Definition at line 61 of file CostComplexityPruneTool.h.

Public Member Functions

 CostComplexityPruneTool (SeparationBase *qualityIndex=NULL)
 the constructor for the cost complexity pruning More...
 
virtual ~CostComplexityPruneTool ()
 the destructor for the cost complexity pruning More...
 
virtual PruningInfoCalculatePruningInfo (DecisionTree *dt, const IPruneTool::EventSample *testEvents=NULL, Bool_t isAutomatic=kFALSE)
 the routine that basically "steers" the pruning process. More...
 
- Public Member Functions inherited from TMVA::IPruneTool
 IPruneTool ()
 
virtual ~IPruneTool ()
 
virtual PruningInfoCalculatePruningInfo (DecisionTree *dt, const EventSample *testEvents=NULL, Bool_t isAutomatic=kFALSE)=0
 
Double_t GetPruneStrength () const
 
Bool_t IsAutomatic () const
 
void SetAutomatic ()
 
void SetPruneStrength (Double_t alpha)
 

Private Member Functions

void InitTreePruningMetaData (DecisionTreeNode *n)
 the optimal index of the prune sequence More...
 
MsgLoggerLog () const
 output stream to save logging information More...
 
void Optimize (DecisionTree *dt, Double_t weights)
 after the critical \( \alpha \) values (at which the corresponding nodes would be pruned away) had been established in the "InitMetaData" we need now: automatic pruning: More...
 

Private Attributes

MsgLoggerfLogger
 
Int_t fOptimalK
 map of R(T) -> pruning index More...
 
std::vector< DecisionTreeNode * > fPruneSequence
 the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) } More...
 
std::vector< Double_tfPruneStrengthList
 map of weakest links (i.e., branches to prune) -> pruning index More...
 
std::vector< Double_tfQualityIndexList
 map of alpha -> pruning index More...
 
SeparationBasefQualityIndexTool
 

Additional Inherited Members

- Public Types inherited from TMVA::IPruneTool
typedef std::vector< const Event * > EventSample
 
- Protected Attributes inherited from TMVA::IPruneTool
Double_t B
 
Double_t fPruneStrength
 
Double_t S
 regularization parameter in pruning More...
 

#include <TMVA/CostComplexityPruneTool.h>

Inheritance diagram for TMVA::CostComplexityPruneTool:
[legend]

Constructor & Destructor Documentation

◆ CostComplexityPruneTool()

CostComplexityPruneTool::CostComplexityPruneTool ( SeparationBase qualityIndex = NULL)

the constructor for the cost complexity pruning

Definition at line 69 of file CostComplexityPruneTool.cxx.

◆ ~CostComplexityPruneTool()

CostComplexityPruneTool::~CostComplexityPruneTool ( )
virtual

the destructor for the cost complexity pruning

Definition at line 90 of file CostComplexityPruneTool.cxx.

Member Function Documentation

◆ CalculatePruningInfo()

PruningInfo * CostComplexityPruneTool::CalculatePruningInfo ( DecisionTree dt,
const IPruneTool::EventSample validationSample = NULL,
Bool_t  isAutomatic = kFALSE 
)
virtual

the routine that basically "steers" the pruning process.

Call the calculation of the pruning sequence, the tree quality and alike..

Implements TMVA::IPruneTool.

Definition at line 99 of file CostComplexityPruneTool.cxx.

◆ InitTreePruningMetaData()

void CostComplexityPruneTool::InitTreePruningMetaData ( DecisionTreeNode n)
private

the optimal index of the prune sequence

initialise "meta data" for the pruning, like the "costcomplexity", the critical alpha, the minimal alpha down the tree, etc... for each node!!

Definition at line 182 of file CostComplexityPruneTool.cxx.

◆ Log()

MsgLogger & TMVA::CostComplexityPruneTool::Log ( ) const
inlineprivate

output stream to save logging information

Definition at line 86 of file CostComplexityPruneTool.h.

◆ Optimize()

void CostComplexityPruneTool::Optimize ( DecisionTree dt,
Double_t  weights 
)
private

after the critical \( \alpha \) values (at which the corresponding nodes would be pruned away) had been established in the "InitMetaData" we need now: automatic pruning:

find the value of \( \alpha \) for which the test sample gives minimal error, on the tree with all nodes pruned that have \( \alpha_{critical} < \alpha \), fixed parameter pruning

Definition at line 237 of file CostComplexityPruneTool.cxx.

Member Data Documentation

◆ fLogger

MsgLogger* TMVA::CostComplexityPruneTool::fLogger
mutableprivate

Definition at line 85 of file CostComplexityPruneTool.h.

◆ fOptimalK

Int_t TMVA::CostComplexityPruneTool::fOptimalK
private

map of R(T) -> pruning index

Definition at line 76 of file CostComplexityPruneTool.h.

◆ fPruneSequence

std::vector<DecisionTreeNode*> TMVA::CostComplexityPruneTool::fPruneSequence
private

the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) }

Definition at line 72 of file CostComplexityPruneTool.h.

◆ fPruneStrengthList

std::vector<Double_t> TMVA::CostComplexityPruneTool::fPruneStrengthList
private

map of weakest links (i.e., branches to prune) -> pruning index

Definition at line 73 of file CostComplexityPruneTool.h.

◆ fQualityIndexList

std::vector<Double_t> TMVA::CostComplexityPruneTool::fQualityIndexList
private

map of alpha -> pruning index

Definition at line 74 of file CostComplexityPruneTool.h.

◆ fQualityIndexTool

SeparationBase* TMVA::CostComplexityPruneTool::fQualityIndexTool
private

Definition at line 70 of file CostComplexityPruneTool.h.

Libraries for TMVA::CostComplexityPruneTool:
[legend]

The documentation for this class was generated from the following files: