A class to prune a decision tree using the Cost Complexity method.
(see "Classification and Regression Trees" by Leo Breiman et al)
There are two running modes in CCPruner: (i) one may select a prune strength and prune back the tree \( T_{max}\) until the criterion:
\[ \alpha < \frac{R(T) - R(t)}{|\sim T_t| - 1} \]
is true for all nodes t in \( T \), or (ii) the algorithm finds the sequence of critical points \( \alpha_k < \alpha_{k+1} ... < \alpha_K \) such that \( T_K = root(T_{max}) \) and then selects the optimally-pruned subtree, defined to be the subtree with the best quality index for the validation sample.
Definition at line 62 of file CostComplexityPruneTool.h.
| Public Member Functions | |
| CostComplexityPruneTool (SeparationBase *qualityIndex=nullptr) | |
| the constructor for the cost complexity pruning | |
| virtual | ~CostComplexityPruneTool () | 
| the destructor for the cost complexity pruning | |
| PruningInfo * | CalculatePruningInfo (DecisionTree *dt, const IPruneTool::EventSample *testEvents=nullptr, Bool_t isAutomatic=kFALSE) override | 
| the routine that basically "steers" the pruning process. | |
|  Public Member Functions inherited from TMVA::IPruneTool | |
| IPruneTool () | |
| virtual | ~IPruneTool () | 
| Double_t | GetPruneStrength () const | 
| Bool_t | IsAutomatic () const | 
| void | SetAutomatic () | 
| void | SetPruneStrength (Double_t alpha) | 
| Private Member Functions | |
| void | InitTreePruningMetaData (DecisionTreeNode *n) | 
| initialise "meta data" for the pruning, like the "costcomplexity", the critical alpha, the minimal alpha down the tree, etc... for each node!! | |
| MsgLogger & | Log () const | 
| output stream to save logging information | |
| void | Optimize (DecisionTree *dt, Double_t weights) | 
| after the critical \( \alpha \) values (at which the corresponding nodes would be pruned away) had been established in the "InitMetaData" we need now: automatic pruning: | |
| Private Attributes | |
| MsgLogger * | fLogger | 
| Int_t | fOptimalK | 
| ! the optimal index of the prune sequence | |
| std::vector< DecisionTreeNode * > | fPruneSequence | 
| ! map of weakest links (i.e., branches to prune) -> pruning index | |
| std::vector< Double_t > | fPruneStrengthList | 
| ! map of alpha -> pruning index | |
| std::vector< Double_t > | fQualityIndexList | 
| ! map of R(T) -> pruning index | |
| SeparationBase * | fQualityIndexTool | 
| ! the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) } | |
| Additional Inherited Members | |
|  Public Types inherited from TMVA::IPruneTool | |
| typedef std::vector< const Event * > | EventSample | 
|  Protected Attributes inherited from TMVA::IPruneTool | |
| Double_t | B | 
| Double_t | fPruneStrength | 
| ! regularization parameter in pruning | |
| Double_t | S | 
#include <TMVA/CostComplexityPruneTool.h>
| CostComplexityPruneTool::CostComplexityPruneTool | ( | SeparationBase * | qualityIndex = nullptr | ) | 
the constructor for the cost complexity pruning
Definition at line 68 of file CostComplexityPruneTool.cxx.
| 
 | virtual | 
the destructor for the cost complexity pruning
Definition at line 89 of file CostComplexityPruneTool.cxx.
| 
 | overridevirtual | 
the routine that basically "steers" the pruning process.
Call the calculation of the pruning sequence, the tree quality and alike..
Implements TMVA::IPruneTool.
Definition at line 98 of file CostComplexityPruneTool.cxx.
| 
 | private | 
initialise "meta data" for the pruning, like the "costcomplexity", the critical alpha, the minimal alpha down the tree, etc... for each node!!
Definition at line 181 of file CostComplexityPruneTool.cxx.
| 
 | inlineprivate | 
output stream to save logging information
Definition at line 87 of file CostComplexityPruneTool.h.
| 
 | private | 
after the critical \( \alpha \) values (at which the corresponding nodes would be pruned away) had been established in the "InitMetaData" we need now: automatic pruning:
find the value of \( \alpha \) for which the test sample gives minimal error, on the tree with all nodes pruned that have \( \alpha_{critical} < \alpha \), fixed parameter pruning
Definition at line 236 of file CostComplexityPruneTool.cxx.
| 
 | mutableprivate | 
Definition at line 86 of file CostComplexityPruneTool.h.
| 
 | private | 
! the optimal index of the prune sequence
Definition at line 77 of file CostComplexityPruneTool.h.
| 
 | private | 
! map of weakest links (i.e., branches to prune) -> pruning index
Definition at line 73 of file CostComplexityPruneTool.h.
| 
 | private | 
! map of alpha -> pruning index
Definition at line 74 of file CostComplexityPruneTool.h.
| 
 | private | 
! map of R(T) -> pruning index
Definition at line 75 of file CostComplexityPruneTool.h.
| 
 | private | 
! the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) }
Definition at line 71 of file CostComplexityPruneTool.h.