A helper class to prune a decision tree using the Cost Complexity method (see Classification and Regression Trees by Leo Breiman et al.).
There are two running modes in CCPruner: (i) one may select a prune strength and prune back the tree \( T_{max}\) until the criterion:
\[ \alpha < \frac{R(t) - R(T_t)}{|\sim T_t| - 1} \]
is true for all nodes t in \( T \), or (ii) the algorithm finds the sequence of critical points \( \alpha_k < \alpha_{k+1} < \dots < \alpha_K \) such that \( T_K = \mathrm{root}(T_{max}) \) and then selects the optimally pruned subtree, defined to be the subtree with the best quality index for the validation sample.
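The weakest-link idea behind the criterion can be sketched on a toy tree. This is a minimal illustration, not the TMVA implementation: `ToyNode`, `MakeNode`, `CountLeaves`, `SubtreeCost`, `CriticalAlpha` and `WeakestLink` are hypothetical names, and the sketch assumes the standard measure from Breiman et al., g(t) = (R(t) - R(T_t)) / (|~T_t| - 1), where R(t) is the cost of node t taken as a leaf, R(T_t) is the summed cost of the leaves below t, and |~T_t| is the number of those leaves.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <utility>

// Hypothetical toy node (NOT TMVA::DecisionTreeNode).
// Assumption: every node has either zero or two children.
struct ToyNode {
    double nodeCost = 0.0;                 // R(t): cost if t were a leaf
    std::unique_ptr<ToyNode> left, right;
    bool IsLeaf() const { return !left && !right; }
};

// Convenience builder for the toy tree.
std::unique_ptr<ToyNode> MakeNode(double cost,
                                  std::unique_ptr<ToyNode> l = nullptr,
                                  std::unique_ptr<ToyNode> r = nullptr) {
    std::unique_ptr<ToyNode> n(new ToyNode());
    n->nodeCost = cost;
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// |~T_t|: number of leaves in the subtree rooted at t.
std::size_t CountLeaves(const ToyNode& t) {
    if (t.IsLeaf()) return 1;
    return CountLeaves(*t.left) + CountLeaves(*t.right);
}

// R(T_t): sum of R over the leaves of the subtree rooted at t.
double SubtreeCost(const ToyNode& t) {
    if (t.IsLeaf()) return t.nodeCost;
    return SubtreeCost(*t.left) + SubtreeCost(*t.right);
}

// g(t) = (R(t) - R(T_t)) / (|~T_t| - 1), defined for internal nodes only.
double CriticalAlpha(const ToyNode& t) {
    assert(!t.IsLeaf());
    return (t.nodeCost - SubtreeCost(t)) /
           (static_cast<double>(CountLeaves(t)) - 1.0);
}

// Internal node with the smallest g(t); nullptr if t is a leaf.
const ToyNode* WeakestLink(const ToyNode& t) {
    if (t.IsLeaf()) return nullptr;
    const ToyNode* best = &t;
    for (const ToyNode* child : {t.left.get(), t.right.get()}) {
        const ToyNode* cand = WeakestLink(*child);
        if (cand && CriticalAlpha(*cand) < CriticalAlpha(*best)) best = cand;
    }
    return best;
}
```

In mode (i), with a fixed prune strength, one would keep collapsing the weakest link while its g(t) does not exceed alpha; in mode (ii), the successive minima of g(t) give the critical points of the sequence.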
Definition at line 62 of file CCPruner.h.
Public Types | |
typedef std::vector< Event * > | EventList |
Public Member Functions | |
CCPruner (DecisionTree *t_max, const DataSet *validationSample, SeparationBase *qualityIndex=NULL) | |
constructor | |
CCPruner (DecisionTree *t_max, const EventList *validationSample, SeparationBase *qualityIndex=NULL) | |
constructor | |
~CCPruner () | |
std::vector< TMVA::DecisionTreeNode * > | GetOptimalPruneSequence () const |
return the optimal prune sequence (the nodes at which to prune) | |
Float_t | GetOptimalPruneStrength () const |
return the prune strength (=alpha) corresponding to the prune sequence | |
Float_t | GetOptimalQualityIndex () const |
void | Optimize () |
determine the pruning sequence | |
void | SetPruneStrength (Float_t alpha=-1.0) |
Private Attributes | |
Float_t | fAlpha |
regularization parameter in CC pruning | |
Bool_t | fDebug |
Int_t | fOptimalK |
index of the optimal tree in the pruned tree sequence | |
Bool_t | fOwnQIndex |
flag indicating whether fQualityIndex is owned by this object | |
std::vector< TMVA::DecisionTreeNode * > | fPruneSequence |
map of weakest links (i.e., branches to prune) -> pruning index | |
std::vector< Float_t > | fPruneStrengthList |
map of alpha -> pruning index | |
SeparationBase * | fQualityIndex |
the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) } | |
std::vector< Float_t > | fQualityIndexList |
map of R(T) -> pruning index | |
DecisionTree * | fTree |
(pruned) decision tree | |
const DataSet * | fValidationDataSet |
the event sample to select the optimally-pruned tree | |
const EventList * | fValidationSample |
the event sample to select the optimally-pruned tree | |
#include <TMVA/CCPruner.h>
typedef std::vector<Event*> TMVA::CCPruner::EventList |
Definition at line 64 of file CCPruner.h.
CCPruner::CCPruner (DecisionTree *t_max, const EventList *validationSample, SeparationBase *qualityIndex=NULL)
constructor
Definition at line 69 of file CCPruner.cxx.
CCPruner::CCPruner (DecisionTree *t_max, const DataSet *validationSample, SeparationBase *qualityIndex=NULL)
constructor
Definition at line 92 of file CCPruner.cxx.
CCPruner::~CCPruner ()
Definition at line 115 of file CCPruner.cxx.
std::vector< DecisionTreeNode * > CCPruner::GetOptimalPruneSequence () const
return the optimal prune sequence (the nodes at which to prune)
Definition at line 240 of file CCPruner.cxx.
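Float_t CCPruner::GetOptimalPruneStrength () const | inline |
return the prune strength (=alpha) corresponding to the prune sequence
Definition at line 89 of file CCPruner.h.
Float_t CCPruner::GetOptimalQualityIndex () const | inline |
Definition at line 85 of file CCPruner.h.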
void CCPruner::Optimize ()
determine the pruning sequence
Definition at line 124 of file CCPruner.cxx.
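The selection step of the second running mode can be sketched as a lookup over two parallel lists, one entry per tree in the pruned sequence. This is a minimal sketch and not the code in CCPruner.cxx: `OptimalChoice` and `SelectOptimal` are hypothetical names, and it assumes that a larger quality index is better and that ties go to the earliest tree in the sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result record; field names are illustrative only.
struct OptimalChoice {
    std::size_t k;   // index of the chosen tree in the pruned sequence
    float alpha;     // prune strength at that index
    float quality;   // validation-sample quality index at that index
};

// Pick the index k whose quality index is largest.
// Assumptions: both lists are non-empty, have equal length,
// and "best" means "largest"; the first maximum wins on ties.
OptimalChoice SelectOptimal(const std::vector<float>& alphas,
                            const std::vector<float>& qualities) {
    assert(!alphas.empty() && alphas.size() == qualities.size());
    std::size_t best = 0;
    for (std::size_t k = 1; k < qualities.size(); ++k) {
        if (qualities[k] > qualities[best]) best = k;
    }
    return {best, alphas[best], qualities[best]};
}
```

The three returned values play the roles that fOptimalK, GetOptimalPruneStrength() and GetOptimalQualityIndex() appear to play in the class; the prune sequence up to index k would then define the optimally pruned subtree.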
void CCPruner::SetPruneStrength (Float_t alpha=-1.0)
Definition at line 110 of file CCPruner.h.
Float_t TMVA::CCPruner::fAlpha | private |
regularization parameter in CC pruning
Definition at line 93 of file CCPruner.h.
Bool_t TMVA::CCPruner::fDebug | private |
Definition at line 106 of file CCPruner.h.
Int_t TMVA::CCPruner::fOptimalK | private |
index of the optimal tree in the pruned tree sequence
Definition at line 105 of file CCPruner.h.
Bool_t TMVA::CCPruner::fOwnQIndex | private |
flag indicating whether fQualityIndex is owned by this object
Definition at line 97 of file CCPruner.h.
std::vector< TMVA::DecisionTreeNode * > TMVA::CCPruner::fPruneSequence | private |
map of weakest links (i.e., branches to prune) -> pruning index
Definition at line 101 of file CCPruner.h.
std::vector< Float_t > TMVA::CCPruner::fPruneStrengthList | private |
map of alpha -> pruning index
Definition at line 102 of file CCPruner.h.
SeparationBase * TMVA::CCPruner::fQualityIndex | private |
the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) }
Definition at line 96 of file CCPruner.h.
std::vector< Float_t > TMVA::CCPruner::fQualityIndexList | private |
map of R(T) -> pruning index
Definition at line 103 of file CCPruner.h.
DecisionTree * TMVA::CCPruner::fTree | private |
(pruned) decision tree
Definition at line 99 of file CCPruner.h.
const DataSet * TMVA::CCPruner::fValidationDataSet | private |
the event sample to select the optimally-pruned tree
Definition at line 95 of file CCPruner.h.
const EventList * TMVA::CCPruner::fValidationSample | private |
the event sample to select the optimally-pruned tree
Definition at line 94 of file CCPruner.h.