The input data is a toy-MC sample consisting of two Gaussian distributions.
The output file "TMVACV.root" can be analysed with dedicated macros (simply say: root -l <macro.C>). These can be conveniently invoked through a GUI that appears at the end of the run of this macro. Launch the GUI via the command:
Cross evaluation is a special case of k-fold cross validation where the splitting into k folds is computed deterministically. This ensures that a given event will always end up in the same fold.
In addition, all resulting classifiers are saved and can be applied to new data using MethodCrossValidation. One requirement for this to work is a splitting function that is evaluated for each event to determine which fold it goes into (for training/evaluation) or which classifier it is passed to (for application).
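The application step described above can be sketched in plain C++, independent of TMVA. This is only an illustration under stated assumptions: `Classifier`, `evaluate`, and the `eventID` value are hypothetical names, and the splitting function is assumed to be the absolute value of `eventID` modulo the number of folds. It is not the internal implementation of MethodCrossValidation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical per-fold classifier: maps the input variables (x, y) to a response.
using Classifier = std::function<double(double, double)>;

// Routes an event to the classifier associated with its fold.
// The fold index depends only on eventID, so the same event is always
// handled by the same classifier.
double evaluate(const std::vector<Classifier> &perFold, double eventID, double x, double y)
{
    assert(!perFold.empty()); // at least one fold classifier is required
    const long long numFolds = static_cast<long long>(perFold.size());
    // Assumed splitting function: |eventID| truncated to an integer, modulo the fold count.
    const long long fold = static_cast<long long>(std::fabs(eventID)) % numFolds;
    return perFold[static_cast<std::size_t>(fold)](x, y);
}
```

Calling `evaluate` twice with the same `eventID` always selects the same entry of `perFold`, which is the property that makes the saved classifiers usable on new data.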
Cross evaluation uses a deterministic split, called the split expression, to partition the data into folds. The expression can be any valid TFormula as long as all parts used are defined.
For each event the split expression is evaluated to a number and the event is put in the fold corresponding to that number.
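As a minimal sketch of this evaluation, the function below mimics in plain C++ what a split expression of the form `int(fabs([eventID]))%int([NumFolds])` would compute. The spectator name `eventID` and the helper `foldOf` are assumptions made for illustration, and the real evaluation is done by TFormula rather than by hand-written code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a split expression: maps a spectator value
// to a fold index in the range [0, numFolds).
unsigned int foldOf(double eventID, unsigned int numFolds)
{
    assert(numFolds > 0); // a split needs at least one fold
    // Truncate |eventID| to an integer, then reduce it modulo the fold count.
    const long long id = static_cast<long long>(std::fabs(eventID));
    return static_cast<unsigned int>(id % numFolds);
}
```

With two folds, events with even `eventID` land in fold 0 and events with odd `eventID` land in fold 1, regardless of the order in which the events are read.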
The split expression has access to all spectators and variables defined in the dataloader. Additionally, the number of folds in the split can be accessed with NumFolds (or numFolds).
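To show how such an expression might be passed along, the sketch below assembles a colon-separated option string in plain C++. The option names `SplitType`, `NumFolds`, and `SplitExpr`, the bracket syntax around `[eventID]` and `[NumFolds]`, and the helper `makeCvOptions` are assumptions for illustration and should be checked against the TMVA documentation for your ROOT version.

```cpp
#include <cassert>
#include <string>

// Assembles an option string for a deterministic split.
// Option names and the expression syntax are assumptions, not a verified API.
std::string makeCvOptions(int numFolds, const std::string &splitExpr)
{
    assert(numFolds > 1); // cross validation needs at least two folds
    return "SplitType=Deterministic:NumFolds=" + std::to_string(numFolds) +
           ":SplitExpr=" + splitExpr;
}
```

For example, `makeCvOptions(2, "int(fabs([eventID]))%int([NumFolds])")` yields a string in which the expression refers to both a spectator and the number of folds.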
DataSetInfo : [datasetcv] : Added class "Signal"
: Add Tree of type Signal with 1000 events
DataSetInfo : [datasetcv] : Added class "Background"
: Add Tree of type Background with 1000 events
<HEADER> Factory : You are running ROOT Version: 6.28/13, Jan 30, 2024
:
: _/_/_/_/_/ _| _| _| _| _|_|
: _/ _|_| _|_| _| _| _| _|
: _/ _| _| _| _| _| _|_|_|_|
: _/ _| _| _| _| _| _|
: _/ _| _| _| _| _|
:
: ___________TMVA Version 4.2.1, Feb 5, 2015
:
: Rebuilding Dataset datasetcv
: Building event vectors for type 2 Signal
: Dataset[datasetcv] : create input formulas for tree
: Building event vectors for type 2 Background
: Dataset[datasetcv] : create input formulas for tree
<HEADER> DataSetFactory : [datasetcv] : Number of events in input trees
:
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 999
: Signal -- testing events : 1
: Signal -- training and testing events: 1000
: Background -- training events : 999
: Background -- testing events : 1
: Background -- training and testing events: 1000
:
<HEADER> DataSetInfo : Correlation matrix (Signal):
: ------------------------
: x y
: x: +1.000 +0.075
: y: +0.075 +1.000
: ------------------------
<HEADER> DataSetInfo : Correlation matrix (Background):
: ------------------------
: x y
: x: +1.000 +0.020
: y: +0.020 +1.000
: ------------------------
<HEADER> DataSetFactory : [datasetcv] :
:
:
:
: ========================================
: ========================================
:
<HEADER> Factory : Booking method: BDTG_fold1
:
<HEADER> BDTG_fold1 : #events: (reweighted) sig: 500 bkg: 500
: #events: (unweighted) sig: 500 bkg: 500
: Training 100 Decision Trees ... patience please
: Elapsed time for training with 1000 events: 0.0429 sec
<HEADER> BDTG_fold1 : [datasetcv] : Evaluation of BDTG_fold1 on training sample (1000 events)
: Elapsed time for evaluation of 1000 events: 0.00336 sec
: Creating xml weight file: datasetcv/weights/TMVACrossValidation_BDTG_fold1.weights.xml
: Creating standalone class: datasetcv/weights/TMVACrossValidation_BDTG_fold1.class.C
<HEADER> Factory : Test all methods
<HEADER> Factory : Test method: BDTG_fold1 for Classification performance
:
<HEADER> BDTG_fold1 : [datasetcv] : Evaluation of BDTG_fold1 on testing sample (998 events)
: Elapsed time for evaluation of 998 events: 0.00333 sec
<HEADER> Factory : Evaluate all methods
<HEADER> Factory : Evaluate classifier: BDTG_fold1
:
<HEADER> BDTG_fold1 : [datasetcv] : Loop over test events and fill histograms with classifier response...
:
:
: Evaluation results ranked by best signal efficiency and purity (area)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA
: Name: Method: ROC-integ
: datasetcv BDTG_fold1 : 0.973
: -------------------------------------------------------------------------------------------------------------------
:
: Testing efficiency compared to training efficiency (overtraining check)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA Signal efficiency: from test sample (from training sample)
: Name: Method: @B=0.01 @B=0.10 @B=0.30
: -------------------------------------------------------------------------------------------------------------------
: datasetcv BDTG_fold1 : 0.575 (0.725) 0.947 (0.933) 0.981 (0.980)
: -------------------------------------------------------------------------------------------------------------------
:
<HEADER> Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
<HEADER> Factory : Booking method: BDTG_fold2
:
<HEADER> BDTG_fold2 : #events: (reweighted) sig: 499 bkg: 499
: #events: (unweighted) sig: 499 bkg: 499
: Training 100 Decision Trees ... patience please
: Elapsed time for training with 998 events: 0.0438 sec
<HEADER> BDTG_fold2 : [datasetcv] : Evaluation of BDTG_fold2 on training sample (998 events)
: Elapsed time for evaluation of 998 events: 0.0035 sec
: Creating xml weight file: datasetcv/weights/TMVACrossValidation_BDTG_fold2.weights.xml
: Creating standalone class: datasetcv/weights/TMVACrossValidation_BDTG_fold2.class.C
<HEADER> Factory : Test all methods
<HEADER> Factory : Test method: BDTG_fold2 for Classification performance
:
<HEADER> BDTG_fold2 : [datasetcv] : Evaluation of BDTG_fold2 on testing sample (1000 events)
: Elapsed time for evaluation of 1000 events: 0.00349 sec
<HEADER> Factory : Evaluate all methods
<HEADER> Factory : Evaluate classifier: BDTG_fold2
:
<HEADER> BDTG_fold2 : [datasetcv] : Loop over test events and fill histograms with classifier response...
:
:
: Evaluation results ranked by best signal efficiency and purity (area)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA
: Name: Method: ROC-integ
: datasetcv BDTG_fold2 : 0.961
: -------------------------------------------------------------------------------------------------------------------
:
: Testing efficiency compared to training efficiency (overtraining check)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA Signal efficiency: from test sample (from training sample)
: Name: Method: @B=0.01 @B=0.10 @B=0.30
: -------------------------------------------------------------------------------------------------------------------
: datasetcv BDTG_fold2 : 0.646 (0.696) 0.868 (0.930) 0.975 (0.976)
: -------------------------------------------------------------------------------------------------------------------
:
<HEADER> Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
<HEADER> Factory : Booking method: BDTG
:
: Reading weightfile: datasetcv/weights/TMVACrossValidation_BDTG_fold1.weights.xml
: Reading weight file: datasetcv/weights/TMVACrossValidation_BDTG_fold1.weights.xml
: Reading weightfile: datasetcv/weights/TMVACrossValidation_BDTG_fold2.weights.xml
: Reading weight file: datasetcv/weights/TMVACrossValidation_BDTG_fold2.weights.xml
:
:
: ========================================
: ========================================
:
<HEADER> Factory : Booking method: Fisher_fold1
:
<HEADER> Fisher_fold1 : Results for Fisher coefficients:
: -----------------------
: Variable: Coefficient:
: -----------------------
: x: +0.449
: y: +0.436
: (offset): +0.019
: -----------------------
: Elapsed time for training with 1000 events: 0.000368 sec
<HEADER> Fisher_fold1 : [datasetcv] : Evaluation of Fisher_fold1 on training sample (1000 events)
: Elapsed time for evaluation of 1000 events: 7.3e-05 sec
: Creating xml weight file: datasetcv/weights/TMVACrossValidation_Fisher_fold1.weights.xml
: Creating standalone class: datasetcv/weights/TMVACrossValidation_Fisher_fold1.class.C
<HEADER> Factory : Test all methods
<HEADER> Factory : Test method: Fisher_fold1 for Classification performance
:
<HEADER> Fisher_fold1 : [datasetcv] : Evaluation of Fisher_fold1 on testing sample (998 events)
: Elapsed time for evaluation of 998 events: 0.000129 sec
<HEADER> Factory : Evaluate all methods
<HEADER> Factory : Evaluate classifier: Fisher_fold1
:
<HEADER> Fisher_fold1 : [datasetcv] : Loop over test events and fill histograms with classifier response...
:
:
: Evaluation results ranked by best signal efficiency and purity (area)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA
: Name: Method: ROC-integ
: datasetcv Fisher_fold1 : 0.976
: -------------------------------------------------------------------------------------------------------------------
:
: Testing efficiency compared to training efficiency (overtraining check)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA Signal efficiency: from test sample (from training sample)
: Name: Method: @B=0.01 @B=0.10 @B=0.30
: -------------------------------------------------------------------------------------------------------------------
: datasetcv Fisher_fold1 : 0.660 (0.665) 0.952 (0.923) 0.986 (0.985)
: -------------------------------------------------------------------------------------------------------------------
:
<HEADER> Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
<HEADER> Factory : Booking method: Fisher_fold2
:
<HEADER> Fisher_fold2 : Results for Fisher coefficients:
: -----------------------
: Variable: Coefficient:
: -----------------------
: x: +0.501
: y: +0.467
: (offset): -0.000
: -----------------------
: Elapsed time for training with 998 events: 0.000259 sec
<HEADER> Fisher_fold2 : [datasetcv] : Evaluation of Fisher_fold2 on training sample (998 events)
: Elapsed time for evaluation of 998 events: 7.61e-05 sec
: Creating xml weight file: datasetcv/weights/TMVACrossValidation_Fisher_fold2.weights.xml
: Creating standalone class: datasetcv/weights/TMVACrossValidation_Fisher_fold2.class.C
<HEADER> Factory : Test all methods
<HEADER> Factory : Test method: Fisher_fold2 for Classification performance
:
<HEADER> Fisher_fold2 : [datasetcv] : Evaluation of Fisher_fold2 on testing sample (1000 events)
: Elapsed time for evaluation of 1000 events: 7.82e-05 sec
<HEADER> Factory : Evaluate all methods
<HEADER> Factory : Evaluate classifier: Fisher_fold2
:
<HEADER> Fisher_fold2 : [datasetcv] : Loop over test events and fill histograms with classifier response...
:
:
: Evaluation results ranked by best signal efficiency and purity (area)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA
: Name: Method: ROC-integ
: datasetcv Fisher_fold2 : 0.966
: -------------------------------------------------------------------------------------------------------------------
:
: Testing efficiency compared to training efficiency (overtraining check)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA Signal efficiency: from test sample (from training sample)
: Name: Method: @B=0.01 @B=0.10 @B=0.30
: -------------------------------------------------------------------------------------------------------------------
: datasetcv Fisher_fold2 : 0.655 (0.645) 0.900 (0.928) 0.975 (0.977)
: -------------------------------------------------------------------------------------------------------------------
:
<HEADER> Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
<HEADER> Factory : Booking method: Fisher
:
: Reading weightfile: datasetcv/weights/TMVACrossValidation_Fisher_fold1.weights.xml
: Reading weight file: datasetcv/weights/TMVACrossValidation_Fisher_fold1.weights.xml
: Reading weightfile: datasetcv/weights/TMVACrossValidation_Fisher_fold2.weights.xml
: Reading weight file: datasetcv/weights/TMVACrossValidation_Fisher_fold2.weights.xml
:
:
: ========================================
: Folds processed for all methods, evaluating.
: ========================================
:
<HEADER> Factory : [datasetcv] : Create Transformation "I" with events from all classes.
:
<HEADER> : Transformation, Variable selection :
: Input : variable 'x' <---> Output : variable 'x'
: Input : variable 'y' <---> Output : variable 'y'
<HEADER> TFHandler_Factory : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: x: -0.014284 1.4061 [ -4.1075 4.0969 ]
: y: -0.0066370 1.4204 [ -4.8520 4.0761 ]
: -----------------------------------------------------------
: Ranking input variables (method unspecific)...
<HEADER> IdTransformation : Ranking result (top variable is best ranked)
: --------------------------
: Rank : Variable : Separation
: --------------------------
: 1 : x : 5.429e-01
: 2 : y : 5.230e-01
: --------------------------
: Elapsed time for training with 1998 events: 5.01e-06 sec
<HEADER> BDTG : [datasetcv] : Evaluation of BDTG on training sample (1998 events)
: Elapsed time for evaluation of 1998 events: 0.00649 sec
: Creating xml weight file: datasetcv/weights/TMVACrossValidation_BDTG.weights.xml
: Creating standalone class: datasetcv/weights/TMVACrossValidation_BDTG.class.C
<WARNING> <WARNING> : MakeClassSpecificHeader not implemented for CrossValidation
<WARNING> <WARNING> : MakeClassSpecific not implemented for CrossValidation
: Elapsed time for training with 1998 events: 4.05e-06 sec
<HEADER> Fisher : [datasetcv] : Evaluation of Fisher on training sample (1998 events)
: Elapsed time for evaluation of 1998 events: 0.000336 sec
: Creating xml weight file: datasetcv/weights/TMVACrossValidation_Fisher.weights.xml
: Creating standalone class: datasetcv/weights/TMVACrossValidation_Fisher.class.C
<WARNING> <WARNING> : MakeClassSpecificHeader not implemented for CrossValidation
<WARNING> <WARNING> : MakeClassSpecific not implemented for CrossValidation
<HEADER> Factory : Test all methods
<HEADER> Factory : Test method: BDTG for Classification performance
:
<HEADER> BDTG : [datasetcv] : Evaluation of BDTG on testing sample (1998 events)
: Elapsed time for evaluation of 1998 events: 0.00628 sec
<HEADER> Factory : Test method: Fisher for Classification performance
:
<HEADER> Fisher : [datasetcv] : Evaluation of Fisher on testing sample (1998 events)
: Elapsed time for evaluation of 1998 events: 0.000301 sec
<HEADER> Factory : Evaluate all methods
<HEADER> Factory : Evaluate classifier: BDTG
:
<HEADER> BDTG : [datasetcv] : Loop over test events and fill histograms with classifier response...
:
<HEADER> TFHandler_BDTG : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: x: -0.014284 1.4061 [ -4.1075 4.0969 ]
: y: -0.0066370 1.4204 [ -4.8520 4.0761 ]
: -----------------------------------------------------------
<HEADER> Factory : Evaluate classifier: Fisher
:
<HEADER> Fisher : [datasetcv] : Loop over test events and fill histograms with classifier response...
:
<HEADER> TFHandler_Fisher : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: x: -0.014284 1.4061 [ -4.1075 4.0969 ]
: y: -0.0066370 1.4204 [ -4.8520 4.0761 ]
: -----------------------------------------------------------
:
: Evaluation results ranked by best signal efficiency and purity (area)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA
: Name: Method: ROC-integ
: datasetcv Fisher : 0.971
: datasetcv BDTG : 0.965
: -------------------------------------------------------------------------------------------------------------------
:
: Testing efficiency compared to training efficiency (overtraining check)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA Signal efficiency: from test sample (from training sample)
: Name: Method: @B=0.01 @B=0.10 @B=0.30
: -------------------------------------------------------------------------------------------------------------------
: datasetcv Fisher : 0.665 (0.665) 0.922 (0.922) 0.980 (0.980)
: datasetcv BDTG : 0.617 (0.617) 0.914 (0.914) 0.974 (0.974)
: -------------------------------------------------------------------------------------------------------------------
:
<HEADER> Dataset:datasetcv : Created tree 'TestTree' with 1998 events
:
<HEADER> Dataset:datasetcv : Created tree 'TrainTree' with 1998 events
:
<HEADER> Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
: Evaluation done.
Summary for method BDT
Fold 0: ROC int: 0.972504, BkgEff@SigEff=0.3: 0.981
Fold 1: ROC int: 0.96115, BkgEff@SigEff=0.3: 0.975
Summary for method Fisher
Fold 0: ROC int: 0.976137, BkgEff@SigEff=0.3: 0.986
Fold 1: ROC int: 0.96584, BkgEff@SigEff=0.3: 0.975
==> Wrote root file: TMVACV.root
==> TMVACrossValidation is done!
(int) 0