library: libTMVA
#include "MethodBDT.h"

TMVA::MethodBDT



class TMVA::MethodBDT : public TMVA::MethodBase

Inheritance Chart:
TObject <- TMVA::MethodBase <- TMVA::MethodBDT
    private:
 Double_t          AdaBoost(vector<TMVA::Event*>, TMVA::DecisionTree* dt)
 Double_t          Bagging(vector<TMVA::Event*>, Int_t iTree)
 void              InitBDT()

    public:
 MethodBDT(TString jobName, vector<TString>* theVariables, TTree* theTree, TString theOption = "100:AdaBoost:GiniIndex:10:0:20:-1", TDirectory* theTargetDir = 0)
 MethodBDT(vector<TString>* theVariables, TString theWeightFile, TDirectory* theTargetDir = NULL)
 MethodBDT(const TMVA::MethodBDT&)
 virtual ~MethodBDT()
 virtual Double_t  Boost(vector<TMVA::Event*>, TMVA::DecisionTree* dt, Int_t iTree)
 static TClass*    Class()
 virtual Double_t  GetMvaValue(TMVA::Event* e)
 virtual void      InitEventSample()
 virtual TClass*   IsA() const
 TMVA::MethodBDT&  operator=(const TMVA::MethodBDT&)
 virtual void      ReadWeightsFromFile()
 virtual void      ShowMembers(TMemberInspector& insp, char* parent)
 virtual void      Streamer(TBuffer& b)
 void              StreamerNVirtual(TBuffer& b)
 virtual void      Train()
 virtual void      WriteHistosToFile()
 virtual void      WriteWeightsToFile()

Data Members

    private:
 Double_t               fAdaBoostBeta     parameter in AdaBoost
 vector<TMVA::Event*>   fEventSample      the training events
 Int_t                  fNTrees           number of decision trees requested
 vector<DecisionTree*>  fForest           the collection of decision trees
 vector<double>         fBoostWeights     the weights applied in the individual boosts
 TString                fBoostType        string specifying the boost type
 TMVA::SeparationBase*  fSepType          the separation criterion used in node splitting
 Int_t                  fNodeMinEvents    minimum number of events in a node
 Double_t               fDummyOpt         dummy option (for backward compatibility)
 Int_t                  fNCuts            grid used for the cut scan in node splitting
 Double_t               fSignalFraction   scale factor for background events to modify the initial S/B fraction in the training data
 TH1F*                  fBoostWeightHist  weights applied in boosting
 TH2F*                  fErrFractHist     error fraction vs. tree number
 TTree*                 fMonitorNtuple    monitoring ntuple
 Int_t                  fITree            ntuple var: i-th tree
 Double_t               fBoostWeight      ntuple var: boost weight
 Double_t               fErrorFraction    ntuple var: misclassification error fraction
 Int_t                  fNnodes           ntuple var: number of nodes

Class Description

_______________________________________________________________________

 Analysis of Boosted Decision Trees

 Boosted decision trees have been successfully used in High Energy
 Physics analysis, for example by the MiniBooNE experiment
 (Yang-Roe-Zhu, physics/0508045). In boosted decision trees, the
 selection is based on a majority vote over the results of several
 decision trees, which are all derived from the same training sample
 by supplying different event weights during the training.

 Decision trees:

 Successive decision nodes are used to categorize the
 events of the sample as either signal or background. Each node
 uses only a single discriminating variable to decide if the event is
 signal-like ("goes right") or background-like ("goes left"). This
 forms a tree-like structure with "baskets" at the end (leaf nodes),
 and an event is classified as either signal or background according to
 whether the basket where it ends up was classified as signal or
 background during the training. Training a decision tree is the
 process of defining the "cut criteria" for each node. The training
 starts with the root node: the full training event sample is used to
 select the variable and corresponding cut value that give the best
 separation between signal and background at this stage. Using
 this cut criterion, the sample is then divided into two subsamples, a
 signal-like (right) and a background-like (left) sample. Two new nodes
 are then created, one for each of the two subsamples, and they are
 constructed using the same mechanism as described for the root
 node. The division is stopped once a node has reached either a
 minimum number of events or a minimum or maximum signal purity. These
 leaf nodes are then called "signal" or "background" depending on whether
 they contain more signal or more background events from the training sample.
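
 For illustration, a minimal sketch (not TMVA code; the function names
 are placeholders) of a Gini-index-style separation criterion, one
 common choice for ranking candidate cuts at a node:

   // Separation index of a node with nSig signal and nBkg background events;
   // one common form of the Gini index is p*(1-p), with p the signal purity.
   double GiniIndex(double nSig, double nBkg)
   {
      double n = nSig + nBkg;
      if (n <= 0) return 0;
      double p = nSig / n;
      return p * (1.0 - p);
   }

   // Separation gain of splitting a parent node into left/right daughters;
   // the cut with the largest gain is the one chosen for the node.
   double SeparationGain(double sL, double bL, double sR, double bR)
   {
      double nL = sL + bL, nR = sR + bR, n = nL + nR;
      return GiniIndex(sL + sR, bL + bR)
           - (nL / n) * GiniIndex(sL, bL)
           - (nR / n) * GiniIndex(sR, bR);
   }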

 Boosting:

 The idea behind boosting is that signal events from the training
 sample that end up in a background node (and vice versa) are given a
 larger weight than events that are in the correct leaf node. This
 results in a re-weighted training event sample, from which a new
 decision tree can then be grown. The boosting can be applied several
 times (typically 100-500 times), and one ends up with a set of decision
 trees (a forest).

 Bagging:

 In this particular variant of the boosted decision trees, the boosting
 is not done on the basis of previous training results, but by a simple
 stochastic re-sampling of the initial training event sample.

 Analysis:

 Applying an individual decision tree to a test event results in a
 classification of the event as either signal or background. For the
 boosted decision tree selection, an event is successively subjected to
 the whole set of decision trees, and depending on how often it is
 classified as signal, a "likelihood" estimator for the event being
 signal or background is constructed. The value of this estimator is
 then used to select the events from an event sample, and the cut value
 on this estimator defines the efficiency and purity of the selection.

_______________________________________________________________________
MethodBDT( TString jobName, vector<TString>* theVariables, TTree* theTree, TString theOption, TDirectory* theTargetDir )
 the standard constructor for the "boosted decision trees"

 MethodBDT (Boosted Decision Trees) options:
 format and syntax of option string: "nTrees:BoostType:SeparationType:
                                      nEventsMin:dummy:
                                      nCuts:SignalFraction"
 nTrees:          number of trees in the forest to be created
 BoostType:       the boosting type for the trees in the forest (AdaBoost etc.)
 SeparationType:  the separation criterion applied in the node splitting
 nEventsMin:      the minimum number of events in a node (leaf criterion, stop splitting)
 dummy:           a dummy variable, kept for backward compatibility
 nCuts:           the number of steps in the optimisation of the cut for a node
 SignalFraction:  scale factor applied to the number of background events
                  in the training sample to simulate a different initial purity
                  of the data sample

 known SeparationTypes are:
    - MisClassificationError
    - GiniIndex
    - CrossEntropy
 known BoostTypes are:
    - AdaBoost
    - Bagging
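
 As an illustration of the option string, here is a minimal sketch of
 booking the method directly via the constructor documented above (the
 job name, the variable names "var1"/"var2" and the training tree are
 assumptions, not part of this class):

   #include <vector>
   #include "TString.h"
   #include "TTree.h"
   #include "MethodBDT.h"

   void bookBDT(TTree* trainingTree)
   {
      // list of discriminating variables; the names are placeholders
      std::vector<TString>* variables = new std::vector<TString>;
      variables->push_back("var1");
      variables->push_back("var2");

      // 200 trees, AdaBoost, Gini-index splitting, >= 10 events per node,
      // 20 cut steps, signal fraction left unchanged (-1)
      TMVA::MethodBDT* bdt =
         new TMVA::MethodBDT("MyJob", variables, trainingTree,
                             "200:AdaBoost:GiniIndex:10:0:20:-1");

      bdt->Train();               // grow and boost the forest
      bdt->WriteWeightsToFile();  // store the forest for later use
   }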
MethodBDT( vector<TString> *theVariables, TString theWeightFile, TDirectory* theTargetDir )
 constructor for calculating the BDT MVA value using previously generated decision
 trees: the results of the previous training (the decision trees) are read in via
 the weight file. Make sure that theVariables correspond to the ones used when
 creating the weight file.
void InitBDT( void )
 common initialisation with defaults for the BDT-Method
~MethodBDT( void )
destructor
void InitEventSample( void )
 write all events from the tree into a vector of TMVA::Event objects, which
 are more easily manipulated. This method should never be called without an
 existing training tree, as it fills the vector of events from the ROOT
 training tree.
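
 The underlying pattern is reading a ROOT TTree once into an in-memory
 vector for fast repeated access. A generic sketch (the SimpleEvent struct
 and branch names are placeholders, not the actual TMVA::Event interface):

   #include <vector>
   #include "TTree.h"

   struct SimpleEvent {            // stand-in for TMVA::Event
      Float_t var1, var2;
      Int_t   type;                // assumed convention: 1 = signal, 0 = background
   };

   std::vector<SimpleEvent> ReadEvents(TTree* tree)
   {
      SimpleEvent ev;
      tree->SetBranchAddress("var1", &ev.var1);  // branch names are placeholders
      tree->SetBranchAddress("var2", &ev.var2);
      tree->SetBranchAddress("type", &ev.type);

      std::vector<SimpleEvent> sample;
      for (Long64_t i = 0; i < tree->GetEntries(); ++i) {
         tree->GetEntry(i);
         sample.push_back(ev);
      }
      return sample;
   }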
void Train( void )
 default sanity checks
Double_t Boost( vector<TMVA::Event*> eventSample, TMVA::DecisionTree *dt, Int_t iTree )
 apply the boosting algorithm (the algorithm is selected via the "option" string
 given in the constructor). The return value is the boost weight.
Double_t AdaBoost( vector<TMVA::Event*> eventSample, TMVA::DecisionTree *dt )
 the AdaBoost implementation:
 a new training sample is generated by re-weighting
 events that are misclassified by the decision tree. The weight
 applied is w = (1-err)/err, or more generally:
            w = ((1-err)/err)^beta
 where err is the fraction of misclassified events in the tree (err < 0.5,
 assuming the previous selection was better than random guessing)
 and beta is a free parameter (default: beta = 1) that modifies the
 boosting.
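
 A minimal sketch of this re-weighting rule (illustrative only, not the
 TMVA implementation):

   #include <cmath>

   // boost weight for a tree with misclassification fraction err (< 0.5)
   double AdaBoostWeight(double err, double beta = 1.0)
   {
      return std::pow((1.0 - err) / err, beta);
   }

   // misclassified events get their weight multiplied by this factor,
   // e.g.  w_i -> w_i * AdaBoostWeight(err)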
Double_t Bagging( vector<TMVA::Event*> eventSample, Int_t iTree )
 call it bootstrapping, re-sampling or whatever you like; in the end it is nothing
 but applying "random weights" to each event.
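
 One common way to implement such a re-sampling is to draw a Poisson-
 distributed weight per event, which mimics a bootstrap sample of the
 same size; a sketch (illustrative only):

   #include <vector>
   #include "TRandom3.h"

   std::vector<double> BaggingWeights(size_t nEvents, UInt_t seed = 0)
   {
      TRandom3 rng(seed);
      std::vector<double> weights(nEvents);
      for (size_t i = 0; i < nEvents; ++i)
         weights[i] = rng.PoissonD(1.0);   // random weight with mean 1
      return weights;
   }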
void WriteWeightsToFile( void )
 write the whole forest (set of decision trees) to a file for later use.
void ReadWeightsFromFile( void )
 read back the decision trees from the file
Double_t GetMvaValue(TMVA::Event *e)
return the MVA value (range [-1,1]) that classifies the
event according to the majority vote over the total number of
decision trees.
In the literature people actually use a weighted majority vote
(using the boost weights); however, no improvement was seen in
doing so, so this is currently switched off.
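
A sketch of the (unweighted) majority vote: each tree votes +1 for signal
or -1 for background, and the average over the forest gives a value in
[-1,1] (the per-tree decisions are passed in as booleans here, sidestepping
the actual tree interface):

  #include <vector>

  double MajorityVote(const std::vector<bool>& treeSaysSignal)
  {
     if (treeSaysSignal.empty()) return 0.0;
     double sum = 0.0;
     for (size_t i = 0; i < treeSaysSignal.size(); ++i)
        sum += treeSaysSignal[i] ? +1.0 : -1.0;
     return sum / treeSaysSignal.size();
  }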
void WriteHistosToFile( void )
here we could write some histograms created during the processing
to the output file.
MethodBDT( TString jobName, vector<TString>* theVariables, TTree* theTree , TString theOption = "100:AdaBoost:GiniIndex:10:0:20:-1", TDirectory* theTargetDir = 0 )
 constructor for training and reading

Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss
Last update: root/tmva $Id: MethodBDT.cxx,v 1.4 2006/05/26 09:22:13 brun Exp $
Copyright (c) 2005

