ROOT 6.10/09 Reference Guide
TMVA::CostComplexityPruneTool Class Reference

A class to prune a decision tree using the Cost Complexity method.

(See "Classification and Regression Trees" by Leo Breiman et al.)

### Some definitions:

• $$T_{max}$$ - the initial, usually highly overtrained tree, that is to be pruned back
• $$R(T)$$ - quality index (Gini, misclassification rate, or other) of a tree $$T$$
• $$\sim T$$ - set of terminal nodes in $$T$$
• $$T'$$ - the pruned subtree of $$T_{max}$$ that has the best quality index $$R(T')$$
• $$\alpha$$ - the prune strength parameter in Cost Complexity pruning $$(R_{\alpha}(T) = R(T) + \alpha \cdot |\sim T|)$$
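As a worked illustration with made-up numbers (they are not taken from TMVA), assume that $$R(T)$$ is the sum of $$R(t)$$ over the terminal nodes. Suppose $$T$$ has three terminal nodes with $$R(t)$$ of 0.10, 0.05 and 0.05, so that $$R(T) = 0.20$$, and that the tree collapsed to its root alone has $$R = 0.30$$. For $$\alpha = 0.02$$:

$R_{\alpha}(T) = 0.20 + 0.02 \cdot 3 = 0.26, \qquad R_{\alpha}(root) = 0.30 + 0.02 \cdot 1 = 0.32$

so the three-leaf tree has the lower cost complexity. For $$\alpha = 0.06$$ the values become 0.38 and 0.36, and the root alone is preferred. The two are equal at $$\alpha = (0.30 - 0.20)/(3 - 1) = 0.05$$.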

There are two running modes in CCPruner: (i) one may select a prune strength and prune back the tree $$T_{max}$$ until the criterion:

$\alpha < \frac{R(t) - R(T_t)}{|\sim T_t| - 1}$

is true for all nodes $$t$$ in $$T$$, where $$T_t$$ denotes the subtree of $$T$$ rooted at $$t$$, or (ii) the algorithm finds the sequence of critical points $$\alpha_k < \alpha_{k+1} < \dots < \alpha_K$$ such that $$T_K = root(T_{max})$$ and then selects the optimally pruned subtree, defined as the subtree with the best quality index on the validation sample.
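The weakest-link idea behind both modes can be sketched on a toy binary tree. This is a minimal illustration and not the TMVA implementation: `ToyNode`, `makeNode`, `makeToyTree` and the other helpers are hypothetical names, the R values are made up, and the critical strength of a node is taken in the Breiman et al. form $$(R(t) - R(T_t)) / (|\sim T_t| - 1)$$.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy node; this is NOT TMVA::DecisionTreeNode.
struct ToyNode {
    double nodeR = 0.0;                    // R(t): quality index of t taken as a leaf
    std::unique_ptr<ToyNode> left, right;  // both set (internal node) or both empty (leaf)
    bool isLeaf() const { return !left && !right; }
};

std::unique_ptr<ToyNode> makeNode(double r,
                                  std::unique_ptr<ToyNode> l = nullptr,
                                  std::unique_ptr<ToyNode> rgt = nullptr) {
    std::unique_ptr<ToyNode> n(new ToyNode);
    n->nodeR = r;
    n->left = std::move(l);
    n->right = std::move(rgt);
    return n;
}

// Made-up example tree: a root with two internal children, four leaves in total.
std::unique_ptr<ToyNode> makeToyTree() {
    auto a = makeNode(0.25, makeNode(0.0625), makeNode(0.0625));
    auto b = makeNode(0.1875, makeNode(0.09375), makeNode(0.0625));
    return makeNode(0.75, std::move(a), std::move(b));
}

// R(T_t): sum of R over the terminal nodes of the subtree rooted at t.
double subtreeR(const ToyNode& t) {
    return t.isLeaf() ? t.nodeR : subtreeR(*t.left) + subtreeR(*t.right);
}

// |~T_t|: number of terminal nodes of the subtree rooted at t.
std::size_t countLeaves(const ToyNode& t) {
    return t.isLeaf() ? 1 : countLeaves(*t.left) + countLeaves(*t.right);
}

// Critical strength (R(t) - R(T_t)) / (|~T_t| - 1) of an internal node.
double criticalAlpha(const ToyNode& t) {
    assert(!t.isLeaf());
    return (t.nodeR - subtreeR(t)) / static_cast<double>(countLeaves(t) - 1);
}

// Internal node with the smallest critical strength, or nullptr for a leaf.
ToyNode* weakestLink(ToyNode& t) {
    if (t.isLeaf()) return nullptr;
    ToyNode* best = &t;
    for (ToyNode* child : {t.left.get(), t.right.get()}) {
        ToyNode* cand = weakestLink(*child);
        if (cand && criticalAlpha(*cand) < criticalAlpha(*best)) best = cand;
    }
    return best;
}

// Turn an internal node into a leaf by dropping everything below it.
void collapse(ToyNode& t) {
    t.left.reset();
    t.right.reset();
}

// Mode (i): prune until alpha < criticalAlpha(t) holds for every internal node left.
void pruneWithStrength(ToyNode& root, double alpha) {
    while (ToyNode* t = weakestLink(root)) {
        if (alpha < criticalAlpha(*t)) break;
        collapse(*t);
    }
}

// Mode (ii), first half: collapse weakest links down to the root and
// record the critical strength of every step.
std::vector<double> criticalAlphaSequence(ToyNode& root) {
    std::vector<double> alphas;
    while (ToyNode* t = weakestLink(root)) {
        alphas.push_back(criticalAlpha(*t));
        collapse(*t);
    }
    return alphas;
}
```

For the tree built by `makeToyTree()`, `criticalAlphaSequence` records the strengths 0.03125, 0.125 and 0.3125 and leaves only the root, while `pruneWithStrength` with a strength of 0.1 collapses just the weakest branch and keeps three terminal nodes.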

Definition at line 61 of file CostComplexityPruneTool.h.

## Public Member Functions

CostComplexityPruneTool (SeparationBase *qualityIndex=NULL)
the constructor for the cost complexity pruning

virtual ~CostComplexityPruneTool ()
the destructor for the cost complexity pruning

virtual PruningInfo * CalculatePruningInfo (DecisionTree *dt, const IPruneTool::EventSample *testEvents=NULL, Bool_t isAutomatic=kFALSE)
the routine that basically "steers" the pruning process.

Public Member Functions inherited from TMVA::IPruneTool
IPruneTool ()

virtual ~IPruneTool ()

Double_t GetPruneStrength () const

Bool_t IsAutomatic () const

void SetAutomatic ()

void SetPruneStrength (Double_t alpha)

## Private Member Functions

void InitTreePruningMetaData (DecisionTreeNode *n)
initialise the "meta data" for the pruning, such as the cost complexity, the critical alpha, and the minimal alpha down the tree, for each node.

MsgLogger & Log () const
output stream to save logging information

void Optimize (DecisionTree *dt, Double_t weights)
once the critical $$\alpha$$ values (at which the corresponding nodes would be pruned away) have been established in InitTreePruningMetaData, carry out automatic or fixed-parameter pruning.

## Private Attributes

MsgLogger * fLogger

Int_t fOptimalK
the optimal index of the prune sequence

std::vector< DecisionTreeNode * > fPruneSequence
map of weakest links (i.e., branches to prune) -> pruning index

std::vector< Double_t > fPruneStrengthList
map of alpha -> pruning index

std::vector< Double_t > fQualityIndexList
map of R(T) -> pruning index

SeparationBase * fQualityIndexTool
the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) }

Public Types inherited from TMVA::IPruneTool
typedef std::vector< const Event * > EventSample

Protected Attributes inherited from TMVA::IPruneTool
Double_t B

Double_t fPruneStrength
regularization parameter in pruning

Double_t S

#include <TMVA/CostComplexityPruneTool.h>

Inheritance diagram for TMVA::CostComplexityPruneTool (figure omitted): the class derives from TMVA::IPruneTool.

## ◆ CostComplexityPruneTool()

 CostComplexityPruneTool::CostComplexityPruneTool ( SeparationBase * qualityIndex = NULL )

the constructor for the cost complexity pruning

Definition at line 69 of file CostComplexityPruneTool.cxx.

## ◆ ~CostComplexityPruneTool()

 CostComplexityPruneTool::~CostComplexityPruneTool ( )
virtual

the destructor for the cost complexity pruning

Definition at line 90 of file CostComplexityPruneTool.cxx.

## ◆ CalculatePruningInfo()

 PruningInfo * CostComplexityPruneTool::CalculatePruningInfo ( DecisionTree * dt, const IPruneTool::EventSample * validationSample = NULL, Bool_t isAutomatic = kFALSE )
virtual

The routine that "steers" the pruning process.

It triggers the calculation of the pruning sequence, the tree quality, and the like.

Implements TMVA::IPruneTool.

Definition at line 99 of file CostComplexityPruneTool.cxx.
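How the two running modes could turn a precomputed pruning sequence into a result is sketched below. This is an assumption-laden illustration, not the code at the definition above: `ToyPruningInfo` and `selectSteps` are hypothetical names, nodes are represented by plain integer ids, and the rule "apply every step whose critical strength does not exceed the chosen prune strength" is inferred from the class description rather than read from the source.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the returned pruning information.
struct ToyPruningInfo {
    double pruneStrength = 0.0;      // critical strength of the last applied step
    std::vector<int> pruneSequence;  // ids of the nodes to collapse, in order
};

// nodeIds[k] is collapsed at critical strength alphas[k] (ascending order).
// Fixed mode: apply steps while alphas[k] <= userAlpha.
// Automatic mode: apply steps 0..optimalK (optimalK == -1 means "do not prune").
ToyPruningInfo selectSteps(const std::vector<int>& nodeIds,
                           const std::vector<double>& alphas,
                           bool isAutomatic, double userAlpha, int optimalK) {
    ToyPruningInfo info;
    for (std::size_t k = 0; k < nodeIds.size() && k < alphas.size(); ++k) {
        const bool keepGoing = isAutomatic ? static_cast<int>(k) <= optimalK
                                           : alphas[k] <= userAlpha;
        if (!keepGoing) break;
        info.pruneSequence.push_back(nodeIds[k]);
        info.pruneStrength = alphas[k];
    }
    return info;
}
```

A caller would then collapse the listed nodes in order. Returning the sequence instead of pruning in place keeps the decision ("which nodes") separate from the tree modification.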

## ◆ InitTreePruningMetaData()

 void CostComplexityPruneTool::InitTreePruningMetaData ( DecisionTreeNode * n )
private

Initialise the "meta data" for the pruning, such as the cost complexity, the critical alpha, and the minimal alpha down the tree, for each node.

Definition at line 182 of file CostComplexityPruneTool.cxx.

## ◆ Log()

 MsgLogger& TMVA::CostComplexityPruneTool::Log ( ) const
inline private

output stream to save logging information

Definition at line 86 of file CostComplexityPruneTool.h.

## ◆ Optimize()

 void CostComplexityPruneTool::Optimize ( DecisionTree * dt, Double_t weights )
private

Called once the critical $$\alpha$$ values (at which the corresponding nodes would be pruned away) have been established in InitTreePruningMetaData. Two cases are distinguished:

• automatic pruning: find the value of $$\alpha$$ for which the test sample gives minimal error, on the tree with all nodes pruned that have $$\alpha_{critical} < \alpha$$
• fixed parameter pruning: the prune strength $$\alpha$$ is the one selected by the user

Definition at line 237 of file CostComplexityPruneTool.cxx.
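The selection step of automatic pruning can be sketched as a search for the pruning step with the smallest test-sample error. This is a minimal sketch under stated assumptions: `findOptimalK` is a hypothetical helper, the error values are supplied by the caller, and returning -1 when no pruned tree beats the unpruned one is a convention chosen for this example.

```cpp
#include <cstddef>
#include <vector>

// validationError[k] is the test-sample error of the tree after pruning
// steps 0..k have been applied; unprunedError is the error before any pruning.
// Returns the index of the step with the smallest error, or -1 if no step
// improves on the unpruned tree.
int findOptimalK(const std::vector<double>& validationError, double unprunedError) {
    int best = -1;
    double minError = unprunedError;
    for (std::size_t k = 0; k < validationError.size(); ++k) {
        if (validationError[k] < minError) {
            minError = validationError[k];
            best = static_cast<int>(k);
        }
    }
    return best;
}
```

Because the comparison is strict, ties are resolved in favour of the earlier, less heavily pruned tree.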

## ◆ fLogger

 MsgLogger* TMVA::CostComplexityPruneTool::fLogger
mutable private

Definition at line 85 of file CostComplexityPruneTool.h.

## ◆ fOptimalK

 Int_t TMVA::CostComplexityPruneTool::fOptimalK
private

the optimal index of the prune sequence

Definition at line 76 of file CostComplexityPruneTool.h.

## ◆ fPruneSequence

 std::vector< DecisionTreeNode * > TMVA::CostComplexityPruneTool::fPruneSequence
private

map of weakest links (i.e., branches to prune) -> pruning index

Definition at line 72 of file CostComplexityPruneTool.h.

## ◆ fPruneStrengthList

 std::vector< Double_t > TMVA::CostComplexityPruneTool::fPruneStrengthList
private

map of alpha -> pruning index

Definition at line 73 of file CostComplexityPruneTool.h.

## ◆ fQualityIndexList

 std::vector< Double_t > TMVA::CostComplexityPruneTool::fQualityIndexList
private

map of R(T) -> pruning index

Definition at line 74 of file CostComplexityPruneTool.h.

## ◆ fQualityIndexTool

 SeparationBase* TMVA::CostComplexityPruneTool::fQualityIndexTool
private

the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) }

Definition at line 70 of file CostComplexityPruneTool.h.

The documentation for this class was generated from the following files:

• CostComplexityPruneTool.h
• CostComplexityPruneTool.cxx