A class to prune a decision tree using the Cost Complexity method.
(see "Classification and Regression Trees" by Leo Breiman et al)
There are two running modes in CCPruner: (i) one may select a prune strength and prune back the tree \( T_{max}\) until the criterion:
\[ \alpha < \frac{R(T) - R(t)}{|\sim T_t| - 1} \]
is true for all nodes t in \( T \), or (ii) the algorithm finds the sequence of critical points \( \alpha_k < \alpha_{k+1} ... < \alpha_K \) such that \( T_K = root(T_{max}) \) and then selects the optimally-pruned subtree, defined to be the subtree with the best quality index for the validation sample.
Definition at line 62 of file CostComplexityPruneTool.h.
Public Member Functions | |
CostComplexityPruneTool (SeparationBase *qualityIndex=NULL) | |
the constructor for the cost complexity pruning | |
virtual | ~CostComplexityPruneTool () |
the destructor for the cost complexity pruning | |
virtual PruningInfo * | CalculatePruningInfo (DecisionTree *dt, const IPruneTool::EventSample *testEvents=NULL, Bool_t isAutomatic=kFALSE) |
the routine that basically "steers" the pruning process. | |
Public Member Functions inherited from TMVA::IPruneTool | |
IPruneTool () | |
virtual | ~IPruneTool () |
Double_t | GetPruneStrength () const |
Bool_t | IsAutomatic () const |
void | SetAutomatic () |
void | SetPruneStrength (Double_t alpha) |
Private Member Functions | |
void | InitTreePruningMetaData (DecisionTreeNode *n) |
the optimal index of the prune sequence | |
MsgLogger & | Log () const |
output stream to save logging information | |
void | Optimize (DecisionTree *dt, Double_t weights) |
after the critical \( \alpha \) values (at which the corresponding nodes would be pruned away) had been established in the "InitMetaData" we need now: automatic pruning: | |
Private Attributes | |
MsgLogger * | fLogger |
Int_t | fOptimalK |
map of R(T) -> pruning index | |
std::vector< DecisionTreeNode * > | fPruneSequence |
the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) } | |
std::vector< Double_t > | fPruneStrengthList |
map of weakest links (i.e., branches to prune) -> pruning index | |
std::vector< Double_t > | fQualityIndexList |
map of alpha -> pruning index | |
SeparationBase * | fQualityIndexTool |
Additional Inherited Members | |
Public Types inherited from TMVA::IPruneTool | |
typedef std::vector< const Event * > | EventSample |
Protected Attributes inherited from TMVA::IPruneTool | |
Double_t | B |
Double_t | fPruneStrength |
Double_t | S |
regularization parameter in pruning | |
#include <TMVA/CostComplexityPruneTool.h>
CostComplexityPruneTool::CostComplexityPruneTool | ( | SeparationBase * | qualityIndex = NULL | ) |
the constructor for the cost complexity pruning
Definition at line 68 of file CostComplexityPruneTool.cxx.
|
virtual |
the destructor for the cost complexity pruning
Definition at line 89 of file CostComplexityPruneTool.cxx.
|
virtual |
the routine that basically "steers" the pruning process.
Call the calculation of the pruning sequence, the tree quality and alike..
Implements TMVA::IPruneTool.
Definition at line 98 of file CostComplexityPruneTool.cxx.
|
private |
the optimal index of the prune sequence
initialise "meta data" for the pruning, like the "costcomplexity", the critical alpha, the minimal alpha down the tree, etc... for each node!!
Definition at line 181 of file CostComplexityPruneTool.cxx.
|
inlineprivate |
output stream to save logging information
Definition at line 87 of file CostComplexityPruneTool.h.
|
private |
after the critical \( \alpha \) values (at which the corresponding nodes would be pruned away) had been established in the "InitMetaData" we need now: automatic pruning:
find the value of \( \alpha \) for which the test sample gives minimal error, on the tree with all nodes pruned that have \( \alpha_{critical} < \alpha \), fixed parameter pruning
Definition at line 236 of file CostComplexityPruneTool.cxx.
|
mutableprivate |
Definition at line 86 of file CostComplexityPruneTool.h.
|
private |
map of R(T) -> pruning index
Definition at line 77 of file CostComplexityPruneTool.h.
|
private |
the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) }
Definition at line 73 of file CostComplexityPruneTool.h.
|
private |
map of weakest links (i.e., branches to prune) -> pruning index
Definition at line 74 of file CostComplexityPruneTool.h.
|
private |
map of alpha -> pruning index
Definition at line 75 of file CostComplexityPruneTool.h.
|
private |
Definition at line 71 of file CostComplexityPruneTool.h.