A class to prune a decision tree using the Cost Complexity method.
(see "Classification and Regression Trees" by Leo Breiman et al)
There are two running modes in CCPruner: (i) one may select a prune strength and prune back the tree \( T_{max}\) until the criterion:
\[ \alpha < \frac{R(T) - R(t)}{|\sim T_t| - 1} \]
is true for all nodes t in \( T \), or (ii) the algorithm finds the sequence of critical points \( \alpha_k < \alpha_{k+1} ... < \alpha_K \) such that \( T_K = root(T_{max}) \) and then selects the optimally-pruned subtree, defined to be the subtree with the best quality index for the validation sample.
Definition at line 61 of file CostComplexityPruneTool.h.
Public Member Functions | |
CostComplexityPruneTool (SeparationBase *qualityIndex=NULL) | |
the constructor for the cost complexity pruning More... | |
virtual | ~CostComplexityPruneTool () |
the destructor for the cost complexity pruning More... | |
virtual PruningInfo * | CalculatePruningInfo (DecisionTree *dt, const IPruneTool::EventSample *testEvents=NULL, Bool_t isAutomatic=kFALSE) |
the routine that basically "steers" the pruning process. More... | |
Public Member Functions inherited from TMVA::IPruneTool | |
IPruneTool () | |
virtual | ~IPruneTool () |
virtual PruningInfo * | CalculatePruningInfo (DecisionTree *dt, const EventSample *testEvents=NULL, Bool_t isAutomatic=kFALSE)=0 |
Double_t | GetPruneStrength () const |
Bool_t | IsAutomatic () const |
void | SetAutomatic () |
void | SetPruneStrength (Double_t alpha) |
Private Member Functions | |
void | InitTreePruningMetaData (DecisionTreeNode *n) |
the optimal index of the prune sequence More... | |
MsgLogger & | Log () const |
output stream to save logging information More... | |
void | Optimize (DecisionTree *dt, Double_t weights) |
after the critical \( \alpha \) values (at which the corresponding nodes would be pruned away) had been established in the "InitMetaData" we need now: automatic pruning: More... | |
Private Attributes | |
MsgLogger * | fLogger |
Int_t | fOptimalK |
map of R(T) -> pruning index More... | |
std::vector< DecisionTreeNode * > | fPruneSequence |
the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) } More... | |
std::vector< Double_t > | fPruneStrengthList |
map of weakest links (i.e., branches to prune) -> pruning index More... | |
std::vector< Double_t > | fQualityIndexList |
map of alpha -> pruning index More... | |
SeparationBase * | fQualityIndexTool |
Additional Inherited Members | |
Public Types inherited from TMVA::IPruneTool | |
typedef std::vector< const Event * > | EventSample |
Protected Attributes inherited from TMVA::IPruneTool | |
Double_t | B |
Double_t | fPruneStrength |
Double_t | S |
regularization parameter in pruning More... | |
#include <TMVA/CostComplexityPruneTool.h>
CostComplexityPruneTool::CostComplexityPruneTool | ( | SeparationBase * | qualityIndex = NULL | ) |
the constructor for the cost complexity pruning
Definition at line 69 of file CostComplexityPruneTool.cxx.
|
virtual |
the destructor for the cost complexity pruning
Definition at line 90 of file CostComplexityPruneTool.cxx.
|
virtual |
the routine that basically "steers" the pruning process.
Call the calculation of the pruning sequence, the tree quality and alike..
Implements TMVA::IPruneTool.
Definition at line 99 of file CostComplexityPruneTool.cxx.
|
private |
the optimal index of the prune sequence
initialise "meta data" for the pruning, like the "costcomplexity", the critical alpha, the minimal alpha down the tree, etc... for each node!!
Definition at line 182 of file CostComplexityPruneTool.cxx.
|
inlineprivate |
output stream to save logging information
Definition at line 86 of file CostComplexityPruneTool.h.
|
private |
after the critical \( \alpha \) values (at which the corresponding nodes would be pruned away) had been established in the "InitMetaData" we need now: automatic pruning:
find the value of \( \alpha \) for which the test sample gives minimal error, on the tree with all nodes pruned that have \( \alpha_{critical} < \alpha \), fixed parameter pruning
Definition at line 237 of file CostComplexityPruneTool.cxx.
|
mutableprivate |
Definition at line 85 of file CostComplexityPruneTool.h.
|
private |
map of R(T) -> pruning index
Definition at line 76 of file CostComplexityPruneTool.h.
|
private |
the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) }
Definition at line 72 of file CostComplexityPruneTool.h.
|
private |
map of weakest links (i.e., branches to prune) -> pruning index
Definition at line 73 of file CostComplexityPruneTool.h.
|
private |
map of alpha -> pruning index
Definition at line 74 of file CostComplexityPruneTool.h.
|
private |
Definition at line 70 of file CostComplexityPruneTool.h.