A class to prune a decision tree using the Cost Complexity method.
(see "Classification and Regression Trees" by Leo Breiman et al)
Some definitions:
- T_{max} - the initial, usually highly overtrained tree, that is to be pruned back
- R(T) - quality index (Gini, misclassification rate, or other) of a tree T
- \sim T - set of terminal nodes in T
- T' - the pruned subtree of T_max that has the best quality index R(T')
- \alpha - the prune strength parameter in Cost Complexity pruning (R_{\alpha}(T) = R(T) + \alpha*|\sim T|)
There are two running modes in CCPruner: (i) one may select a prune strength and prune back the tree T_{max} until the criterion:
\alpha < \frac{R(T) - R(t)}{|\sim T_t| - 1}
is true for all nodes t in T , or (ii) the algorithm finds the sequence of critical points \alpha_k < \alpha_{k+1} ... < \alpha_K such that T_K = root(T_{max}) and then selects the optimally-pruned subtree, defined to be the subtree with the best quality index for the validation sample.
Definition at line 61 of file CostComplexityPruneTool.h.