ROOT 6.10/09
Reference Guide
TMVARegression.C File Reference

Detailed Description

This macro provides examples for the training and testing of the TMVA regression methods.

The input data is a toy-MC sample consisting of Gaussian-distributed, linearly correlated input variables (here var1 and var2) together with the regression target fvalue.

The methods to be used can be switched on and off by means of the Use[...] booleans in the macro, or via the prompt command, for example:

root -l TMVARegression.C\(\"LD,MLP\"\)

(note that the backslashes are mandatory). If no method is given, a default set is used.
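In a POSIX shell you can equivalently quote the whole argument, which avoids the backslash escaping:

root -l 'TMVARegression.C("LD,MLP")'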

The output file "TMVAReg.root" can be analysed with dedicated macros (simply say: root -l <macro.C>), which can conveniently be invoked through a GUI that appears at the end of the run of this macro.
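To reopen that GUI later on an existing output file, a one-liner from the shell works as well (a minimal sketch using TMVA::TMVARegGui, the same launcher this macro calls at the end of its run):

root -l -e 'TMVA::TMVARegGui("TMVAReg.root")'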

Processing /mnt/build/workspace/root-makedoc-v610/rootspi/rdoc/src/v6-10-00-patches/tutorials/tmva/TMVARegression.C...
==> Start TMVARegression
--- TMVARegression : Using input file: ./files/tmva_reg_example.root
DataSetInfo : [dataset] : Added class "Regression"
: Add Tree TreeR of type Regression with 10000 events
: Dataset[dataset] : Class index : 0 name : Regression
Factory : Booking method: PDEFoam
:
DataSetFactory : [dataset] : Number of events in input trees
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Regression -- training events : 1000
: Regression -- testing events : 9000
: Regression -- training and testing events: 10000
:
DataSetInfo : Correlation matrix (Regression):
: ------------------------
: var1 var2
: var1: +1.000 -0.017
: var2: -0.017 +1.000
: ------------------------
DataSetFactory : [dataset] :
:
Factory : Booking method: KNN
:
Factory : Booking method: LD
:
Factory : Booking method: FDA_GA
:
FDA_GA : [dataset] : Create Transformation "Norm" with events from all classes.
:
: Transformation, Variable selection :
: Input : variable 'var1' <---> Output : variable 'var1'
: Input : variable 'var2' <---> Output : variable 'var2'
: Input : target 'fvalue' <---> Output : target 'fvalue'
: Create parameter interval for parameter 0 : [-100,100]
: Create parameter interval for parameter 1 : [-100,100]
: Create parameter interval for parameter 2 : [-100,100]
: User-defined formula string : "(0)+(1)*x0+(2)*x1"
: TFormula-compatible formula string: "[0]+[1]*[3]+[2]*[4]"
Factory : Booking method: MLP
:
MLP : [dataset] : Create Transformation "Norm" with events from all classes.
:
: Transformation, Variable selection :
: Input : variable 'var1' <---> Output : variable 'var1'
: Input : variable 'var2' <---> Output : variable 'var2'
: Input : target 'fvalue' <---> Output : target 'fvalue'
MLP : Building Network.
: Initializing weights
Factory : Booking method: BDTG
:
<WARNING> : Value for option maxdepth was previously set to 3
: the option *InverseBoostNegWeights* does not exist for BoostType=Grad --> change
: to new default for GradBoost *Pray*
Factory : Train all methods
Factory : [dataset] : Create Transformation "I" with events from all classes.
:
: Transformation, Variable selection :
: Input : variable 'var1' <---> Output : variable 'var1'
: Input : variable 'var2' <---> Output : variable 'var2'
TFHandler_Factory : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.4152 1.1962 [ 0.0026062 4.9957 ]
: var2: 2.4350 1.4125 [ 0.0092062 4.9990 ]
: fvalue: 164.97 82.189 [ 1.7144 391.23 ]
: -----------------------------------------------------------
: Ranking input variables (method unspecific)...
IdTransformation : Ranking result (top variable is best ranked)
: --------------------------------------------
: Rank : Variable : |Correlation with target|
: --------------------------------------------
: 1 : var2 : 7.418e-01
: 2 : var1 : 5.999e-01
: --------------------------------------------
IdTransformation : Ranking result (top variable is best ranked)
: -------------------------------------
: Rank : Variable : Mutual information
: -------------------------------------
: 1 : var2 : 2.029e+00
: 2 : var1 : 1.950e+00
: -------------------------------------
IdTransformation : Ranking result (top variable is best ranked)
: ------------------------------------
: Rank : Variable : Correlation Ratio
: ------------------------------------
: 1 : var1 : 6.546e+00
: 2 : var2 : 2.461e+00
: ------------------------------------
IdTransformation : Ranking result (top variable is best ranked)
: ----------------------------------------
: Rank : Variable : Correlation Ratio (T)
: ----------------------------------------
: 1 : var2 : 9.176e-01
: 2 : var1 : 2.977e-01
: ----------------------------------------
Factory : Train method: PDEFoam for Regression
:
: Build mono target regression foam
: Elapsed time: 0.583 sec
: Elapsed time for training with 1000 events: 0.59 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of PDEFoam on training sample
: Dataset[dataset] : Elapsed time for evaluation of 1000 events: 0.016 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVARegression_PDEFoam.weights.xml
: writing foam MonoTargetRegressionFoam to file
: Foams written to file: dataset/weights/TMVARegression_PDEFoam.weights_foams.root
Factory : Training finished
:
Factory : Train method: KNN for Regression
:
KNN : <Train> start...
: Reading 1000 events
: Number of signal events 1000
: Number of background events 0
: Creating kd-tree with 1000 events
: Computing scale factor for 1d distributions: (ifrac, bottom, top) = (80%, 10%, 90%)
ModulekNN : Optimizing tree for 2 variables with 1000 values
: <Fill> Class 1 has 1000 events
: Elapsed time for training with 1000 events: 0.00156 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of KNN on training sample
: Dataset[dataset] : Elapsed time for evaluation of 1000 events: 0.0172 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVARegression_KNN.weights.xml
Factory : Training finished
:
Factory : Train method: LD for Regression
:
LD : Results for LD coefficients:
: -----------------------
: Variable: Coefficient:
: -----------------------
: var1: +42.090
: var2: +44.604
: (offset): -87.385
: -----------------------
: Elapsed time for training with 1000 events: 0.000518 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of LD on training sample
: Dataset[dataset] : Elapsed time for evaluation of 1000 events: 0.00861 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVARegression_LD.weights.xml
Factory : Training finished
:
Factory : Train method: FDA_GA for Regression
:
TFHandler_FDA_GA : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 0.36693 0.47914 [ -1.0000 1.0000 ]
: var2: -0.027695 0.56614 [ -1.0000 1.0000 ]
: fvalue: -0.16174 0.42200 [ -1.0000 1.0000 ]
: -----------------------------------------------------------
FitterBase : <GeneticFitter> Optimisation, please be patient ... (inaccurate progress timing for GA)
: Elapsed time: 11.5 sec
FDA_GA : Results for parameter fit using "GA" fitter:
: -----------------------
: Parameter: Fit result:
: -----------------------
: Par(0): -0.345151
: Par(1): 0.539931
: Par(2): 0.569534
: -----------------------
: Discriminator expression: "(0)+(1)*x0+(2)*x1"
: Value of estimator at minimum: 0.0094457
: Elapsed time for training with 1000 events: 11.7 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of FDA_GA on training sample
: Dataset[dataset] : Elapsed time for evaluation of 1000 events: 0.00832 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVARegression_FDA_GA.weights.xml
Factory : Training finished
:
Factory : Train method: MLP for Regression
:
TFHandler_MLP : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 0.36693 0.47914 [ -1.0000 1.0000 ]
: var2: -0.027695 0.56614 [ -1.0000 1.0000 ]
: fvalue: -0.16174 0.42200 [ -1.0000 1.0000 ]
: -----------------------------------------------------------
: Training Network
:
: Inaccurate progress timing for MLP...
<WARNING> : Line search increased error! Something is wrong. fLastAlpha=2.5046 al123=0.986836 2.96051 8.88152 err1=0.00131353 errfinal=0.00131354
<WARNING> : Line search increased error! Something is wrong. fLastAlpha=5.07736 al123=2 6 18 err1=0.00131344 errfinal=0.00131344
: Elapsed time for training with 1000 events: 9.16 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of MLP on training sample
: Dataset[dataset] : Elapsed time for evaluation of 1000 events: 0.00953 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVARegression_MLP.weights.xml
: Write special histos to file: TMVAReg.root:/dataset/Method_MLP/MLP
Factory : Training finished
:
Factory : Train method: BDTG for Regression
:
: Regression Loss Function: Huber
: Training 2000 Decision Trees ... patience please
: Elapsed time for training with 1000 events: 1.3 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of BDTG on training sample
: Dataset[dataset] : Elapsed time for evaluation of 1000 events: 0.362 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVARegression_BDTG.weights.xml
: TMVAReg.root:/dataset/Method_BDTG/BDTG
Factory : Training finished
:
Factory : === Destroy and recreate all methods via weight files for testing ===
:
: Read foams from file: dataset/weights/TMVARegression_PDEFoam.weights_foams.root
: Creating kd-tree with 1000 events
: Computing scale factor for 1d distributions: (ifrac, bottom, top) = (80%, 10%, 90%)
ModulekNN : Optimizing tree for 2 variables with 1000 values
: <Fill> Class 1 has 1000 events
: User-defined formula string : "(0)+(1)*x0+(2)*x1"
: TFormula-compatible formula string: "[0]+[1]*[3]+[2]*[4]"
MLP : Building Network.
: Initializing weights
Factory : Test all methods
Factory : Test method: PDEFoam for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of PDEFoam on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 9000 events: 0.1 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
Factory : Test method: KNN for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of KNN on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 9000 events: 0.0864 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
Factory : Test method: LD for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of LD on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 9000 events: 0.0153 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
Factory : Test method: FDA_GA for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of FDA_GA on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 9000 events: 0.0205 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
Factory : Test method: MLP for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of MLP on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 9000 events: 0.0313 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
Factory : Test method: BDTG for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of BDTG on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 9000 events: 2.1 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
Factory : Evaluate all methods
: Evaluate regression method: PDEFoam
TFHandler_PDEFoam : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.3308 1.1858 [ 0.00020069 5.0000 ]
: var2: 2.4914 1.4394 [ 0.00071490 5.0000 ]
: fvalue: 164.02 83.934 [ 1.6186 394.84 ]
: -----------------------------------------------------------
: Evaluate regression method: KNN
TFHandler_KNN : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.3308 1.1858 [ 0.00020069 5.0000 ]
: var2: 2.4914 1.4394 [ 0.00071490 5.0000 ]
: fvalue: 164.02 83.934 [ 1.6186 394.84 ]
: -----------------------------------------------------------
: Evaluate regression method: LD
TFHandler_LD : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.3308 1.1858 [ 0.00020069 5.0000 ]
: var2: 2.4914 1.4394 [ 0.00071490 5.0000 ]
: fvalue: 164.02 83.934 [ 1.6186 394.84 ]
: -----------------------------------------------------------
: Evaluate regression method: FDA_GA
TFHandler_FDA_GA : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 0.33312 0.47497 [ -1.0010 1.0017 ]
: var2: -0.0050675 0.57694 [ -1.0034 1.0004 ]
: fvalue: -0.16662 0.43096 [ -1.0005 1.0185 ]
: -----------------------------------------------------------
TFHandler_FDA_GA : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 0.33312 0.47497 [ -1.0010 1.0017 ]
: var2: -0.0050675 0.57694 [ -1.0034 1.0004 ]
: fvalue: -0.16662 0.43096 [ -1.0005 1.0185 ]
: -----------------------------------------------------------
: Evaluate regression method: MLP
TFHandler_MLP : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 0.33312 0.47497 [ -1.0010 1.0017 ]
: var2: -0.0050675 0.57694 [ -1.0034 1.0004 ]
: fvalue: -0.16662 0.43096 [ -1.0005 1.0185 ]
: -----------------------------------------------------------
TFHandler_MLP : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 0.33312 0.47497 [ -1.0010 1.0017 ]
: var2: -0.0050675 0.57694 [ -1.0034 1.0004 ]
: fvalue: -0.16662 0.43096 [ -1.0005 1.0185 ]
: -----------------------------------------------------------
: Evaluate regression method: BDTG
TFHandler_BDTG : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.3308 1.1858 [ 0.00020069 5.0000 ]
: var2: 2.4914 1.4394 [ 0.00071490 5.0000 ]
: fvalue: 164.02 83.934 [ 1.6186 394.84 ]
: -----------------------------------------------------------
:
: Evaluation results ranked by smallest RMS on test sample:
: ("Bias" quotes the mean deviation of the regression from true target.
: "MutInf" is the "Mutual Information" between regression and target.
: Indicated by "_T" are the corresponding "truncated" quantities ob-
: tained when removing events deviating more than 2sigma from average.)
: --------------------------------------------------------------------------------------------------
: DataSet Name: MVA Method: <Bias> <Bias_T> RMS RMS_T | MutInf MutInf_T
: --------------------------------------------------------------------------------------------------
: dataset MLP : -0.0166 -0.0160 0.309 0.298 | 3.435 3.438
: dataset BDTG : 0.213 0.173 2.38 1.87 | 3.119 3.184
: dataset KNN : -0.511 0.432 5.76 3.78 | 2.871 2.903
: dataset PDEFoam : -1.23 -0.960 9.63 7.88 | 2.263 2.341
: dataset FDA_GA : -0.302 1.39 19.8 17.9 | 1.990 1.980
: dataset LD : -0.0832 1.61 19.7 17.9 | 1.987 1.981
: --------------------------------------------------------------------------------------------------
:
: Evaluation results ranked by smallest RMS on training sample:
: (overtraining check)
: --------------------------------------------------------------------------------------------------
: DataSet Name: MVA Method: <Bias> <Bias_T> RMS RMS_T | MutInf MutInf_T
: --------------------------------------------------------------------------------------------------
: dataset MLP :-0.000756 0.00366 0.305 0.294 | 3.435 3.432
: dataset BDTG : 0.0297 0.00807 0.487 0.261 | 3.430 3.462
: dataset KNN : -0.525 0.296 5.55 3.82 | 2.936 2.951
: dataset PDEFoam : 1.97e-07 0.344 7.90 6.18 | 2.517 2.583
: dataset FDA_GA : -0.208 1.57 18.9 16.9 | 2.113 2.105
: dataset LD : 2.81e-06 1.76 18.9 16.9 | 2.106 2.104
: --------------------------------------------------------------------------------------------------
:
Dataset:dataset : Created tree 'TestTree' with 9000 events
:
Dataset:dataset : Created tree 'TrainTree' with 1000 events
:
Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
==> Wrote root file: TMVAReg.root
==> TMVARegression is done!
#include <cstdlib>
#include <iostream>
#include <map>
#include <string>
#include "TChain.h"
#include "TFile.h"
#include "TTree.h"
#include "TString.h"
#include "TObjString.h"
#include "TSystem.h"
#include "TROOT.h"
#include "TMVA/Tools.h"
#include "TMVA/Factory.h"
using namespace TMVA;
void TMVARegression( TString myMethodList = "" )
{
   // The explicit loading of the shared libTMVA is done in TMVAlogon.C, defined in .rootrc
   // if you use your private .rootrc, or run from a different directory, please copy the
   // corresponding lines from .rootrc
   // methods to be processed can be given as an argument; use format:
   //
   // mylinux~> root -l TMVARegression.C\(\"myMethod1,myMethod2,myMethod3\"\)
   //
   //---------------------------------------------------------------
   // This loads the library
   TMVA::Tools::Instance();
   // Default MVA methods to be trained + tested
   std::map<std::string,int> Use;
   // Multidimensional likelihood and Nearest-Neighbour methods
   Use["PDERS"]   = 0;
   Use["PDEFoam"] = 1;
   Use["KNN"]     = 1;
   //
   // Linear Discriminant Analysis
   Use["LD"] = 1;
   //
   // Function Discriminant analysis
   Use["FDA_GA"]   = 1;
   Use["FDA_MC"]   = 0;
   Use["FDA_MT"]   = 0;
   Use["FDA_GAMT"] = 0;
   //
   // Neural Network
   Use["MLP"]     = 1;
   Use["DNN_CPU"] = 0;
   //
   // Support Vector Machine
   Use["SVM"] = 0;
   //
   // Boosted Decision Trees
   Use["BDT"]  = 0;
   Use["BDTG"] = 1;
   // ---------------------------------------------------------------
   std::cout << std::endl;
   std::cout << "==> Start TMVARegression" << std::endl;
   // Select methods (don't look at this code - not of interest)
   if (myMethodList != "") {
      for (std::map<std::string,int>::iterator it = Use.begin(); it != Use.end(); it++) it->second = 0;
      std::vector<TString> mlist = gTools().SplitString( myMethodList, ',' );
      for (UInt_t i=0; i<mlist.size(); i++) {
         std::string regMethod(mlist[i]);
         if (Use.find(regMethod) == Use.end()) {
            std::cout << "Method \"" << regMethod << "\" not known in TMVA under this name. Choose among the following:" << std::endl;
            for (std::map<std::string,int>::iterator it = Use.begin(); it != Use.end(); it++) std::cout << it->first << " ";
            std::cout << std::endl;
            return;
         }
         Use[regMethod] = 1;
      }
   }
   // --------------------------------------------------------------------------------------------------
   // Here the preparation phase begins
   // Create a new root output file
   TString outfileName( "TMVAReg.root" );
   TFile* outputFile = TFile::Open( outfileName, "RECREATE" );
   // Create the factory object. Later you can choose the methods
   // whose performance you'd like to investigate. The factory will
   // then run the performance analysis for you.
   //
   // The first argument is the base of the name of all the
   // weightfiles in the directory weight/
   //
   // The second argument is the output file for the training results
   // All TMVA output can be suppressed by removing the "!" (not) in
   // front of the "Silent" argument in the option string
   TMVA::Factory *factory = new TMVA::Factory( "TMVARegression", outputFile,
                                               "!V:!Silent:Color:DrawProgressBar:AnalysisType=Regression" );
   TMVA::DataLoader *dataloader = new TMVA::DataLoader("dataset");
   // If you wish to modify default settings
   // (please check "src/Config.h" to see all available global options)
   //
   //     (TMVA::gConfig().GetVariablePlotting()).fTimesRMS = 8.0;
   //     (TMVA::gConfig().GetIONames()).fWeightFileDir = "myWeightDirectory";
   // Define the input variables that shall be used for the MVA training
   // note that you may also use variable expressions, such as: "3*var1/var2*abs(var3)"
   // [all types of expressions that can also be parsed by TTree::Draw( "expression" )]
   dataloader->AddVariable( "var1", "Variable 1", "units", 'F' );
   dataloader->AddVariable( "var2", "Variable 2", "units", 'F' );
   // You can add so-called "Spectator variables", which are not used in the MVA training,
   // but will appear in the final "TestTree" produced by TMVA. This TestTree will contain the
   // input variables, the response values of all trained MVAs, and the spectator variables
   dataloader->AddSpectator( "spec1:=var1*2", "Spectator 1", "units", 'F' );
   dataloader->AddSpectator( "spec2:=var1*3", "Spectator 2", "units", 'F' );
   // Add the variable carrying the regression target
   dataloader->AddTarget( "fvalue" );
   // It is also possible to declare additional targets for multi-dimensional regression, ie:
   //     dataloader->AddTarget( "fvalue2" );
   // BUT: this is currently ONLY implemented for MLP
   // Read training and test data (see TMVAClassification for reading ASCII files)
   // load the signal and background event samples from ROOT trees
   TFile *input(0);
   TString fname = "./tmva_reg_example.root";
   if (!gSystem->AccessPathName( fname )) {
      input = TFile::Open( fname ); // check if file in local directory exists
   }
   else {
      input = TFile::Open("http://root.cern.ch/files/tmva_reg_example.root", "CACHEREAD"); // if not: download from ROOT server
   }
   if (!input) {
      std::cout << "ERROR: could not open data file" << std::endl;
      exit(1);
   }
   std::cout << "--- TMVARegression : Using input file: " << input->GetName() << std::endl;
   // Register the regression tree
   TTree *regTree = (TTree*)input->Get("TreeR");
   // global event weights per tree (see below for setting event-wise weights)
   Double_t regWeight = 1.0;
   // You can add an arbitrary number of regression trees
   dataloader->AddRegressionTree( regTree, regWeight );
   // This would set individual event weights (the variables defined in the
   // expression need to exist in the original TTree)
   dataloader->SetWeightExpression( "var1", "Regression" );
   // Apply additional cuts on the signal and background samples (can be different)
   TCut mycut = ""; // for example: TCut mycut = "abs(var1)<0.5 && abs(var2-0.5)<1";
   // tell the DataLoader to use all remaining events in the trees after training for testing:
   dataloader->PrepareTrainingAndTestTree( mycut,
                                           "nTrain_Regression=1000:nTest_Regression=0:SplitMode=Random:NormMode=NumEvents:!V" );
   //
   //     dataloader->PrepareTrainingAndTestTree( mycut,
   //            "nTrain_Regression=0:nTest_Regression=0:SplitMode=Random:NormMode=NumEvents:!V" );
   // If no numbers of events are given, half of the events in the tree are used
   // for training, and the other half for testing:
   //
   //     dataloader->PrepareTrainingAndTestTree( mycut, "SplitMode=random:!V" );
   // Book MVA methods
   //
   // Please lookup the various method configuration options in the corresponding cxx files, eg:
   // src/MethodCuts.cxx, etc, or here: http://tmva.sourceforge.net/optionRef.html
   // it is possible to preset ranges in the option string in which the cut optimisation should be done:
   // "...:CutRangeMin[2]=-1:CutRangeMax[2]=1:...", where [2] is the third input variable
   // PDE - RS method
   if (Use["PDERS"])
      factory->BookMethod( dataloader, TMVA::Types::kPDERS, "PDERS",
                           "!H:!V:NormTree=T:VolumeRangeMode=Adaptive:KernelEstimator=Gauss:GaussSigma=0.3:NEventsMin=40:NEventsMax=60:VarTransform=None" );
   // And the options strings for the MinMax and RMS methods, respectively:
   //
   //      "!H:!V:VolumeRangeMode=MinMax:DeltaFrac=0.2:KernelEstimator=Gauss:GaussSigma=0.3" );
   //      "!H:!V:VolumeRangeMode=RMS:DeltaFrac=3:KernelEstimator=Gauss:GaussSigma=0.3" );
   if (Use["PDEFoam"])
      factory->BookMethod( dataloader, TMVA::Types::kPDEFoam, "PDEFoam",
                           "!H:!V:MultiTargetRegression=F:TargetSelection=Mpv:TailCut=0.001:VolFrac=0.0666:nActiveCells=500:nSampl=2000:nBin=5:Compress=T:Kernel=None:Nmin=10:VarTransform=None" );
   // K-Nearest Neighbour classifier (KNN)
   if (Use["KNN"])
      factory->BookMethod( dataloader, TMVA::Types::kKNN, "KNN",
                           "nkNN=20:ScaleFrac=0.8:SigmaFact=1.0:Kernel=Gaus:UseKernel=F:UseWeight=T:!Trim" );
   // Linear discriminant
   if (Use["LD"])
      factory->BookMethod( dataloader, TMVA::Types::kLD, "LD",
                           "!H:!V:VarTransform=None" );
   // Function discrimination analysis (FDA) -- test of various fitters - the recommended one is Minuit (or GA or SA)
   if (Use["FDA_MC"])
      factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_MC",
                           "!H:!V:Formula=(0)+(1)*x0+(2)*x1:ParRanges=(-100,100);(-100,100);(-100,100):FitMethod=MC:SampleSize=100000:Sigma=0.1:VarTransform=D" );
   if (Use["FDA_GA"]) // can also use Simulated Annealing (SA) algorithm (see Cuts_SA options); the formula of this example is good for parabolas
      factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_GA",
                           "!H:!V:Formula=(0)+(1)*x0+(2)*x1:ParRanges=(-100,100);(-100,100);(-100,100):FitMethod=GA:PopSize=100:Cycles=3:Steps=30:Trim=True:SaveBestGen=1:VarTransform=Norm" );
   if (Use["FDA_MT"])
      factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_MT",
                           "!H:!V:Formula=(0)+(1)*x0+(2)*x1:ParRanges=(-100,100);(-100,100);(-100,100);(-10,10):FitMethod=MINUIT:ErrorLevel=1:PrintLevel=-1:FitStrategy=2:UseImprove:UseMinos:SetBatch" );
   if (Use["FDA_GAMT"])
      factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_GAMT",
                           "!H:!V:Formula=(0)+(1)*x0+(2)*x1:ParRanges=(-100,100);(-100,100);(-100,100):FitMethod=GA:Converger=MINUIT:ErrorLevel=1:PrintLevel=-1:FitStrategy=0:!UseImprove:!UseMinos:SetBatch:Cycles=1:PopSize=5:Steps=5:Trim" );
   // Neural network (MLP)
   if (Use["MLP"])
      factory->BookMethod( dataloader, TMVA::Types::kMLP, "MLP",
                           "!H:!V:VarTransform=Norm:NeuronType=tanh:NCycles=20000:HiddenLayers=N+20:TestRate=6:TrainingMethod=BFGS:Sampling=0.3:SamplingEpoch=0.8:ConvergenceImprove=1e-6:ConvergenceTests=15:!UseRegulator" );
if (Use["DNN_CPU"]) {
/*
TString layoutString ("Layout=TANH|(N+100)*2,LINEAR");
TString layoutString ("Layout=SOFTSIGN|100,SOFTSIGN|50,SOFTSIGN|20,LINEAR");
TString layoutString ("Layout=RELU|300,RELU|100,RELU|30,RELU|10,LINEAR");
TString layoutString ("Layout=SOFTSIGN|50,SOFTSIGN|30,SOFTSIGN|20,SOFTSIGN|10,LINEAR");
TString layoutString ("Layout=TANH|50,TANH|30,TANH|20,TANH|10,LINEAR");
TString layoutString ("Layout=SOFTSIGN|50,SOFTSIGN|20,LINEAR");
TString layoutString ("Layout=TANH|100,TANH|30,LINEAR");
*/
TString layoutString("Layout=TANH|100,LINEAR");
TString training0("LearningRate=1e-5,Momentum=0.5,Repetitions=1,ConvergenceSteps=500,BatchSize=50,"
"TestRepetitions=7,WeightDecay=0.01,Regularization=NONE,DropConfig=0.5+0.5+0.5+0.5,"
"DropRepetitions=2");
TString training1("LearningRate=1e-5,Momentum=0.9,Repetitions=1,ConvergenceSteps=170,BatchSize=30,"
"TestRepetitions=7,WeightDecay=0.01,Regularization=L2,DropConfig=0.1+0.1+0.1,DropRepetitions="
"1");
TString training2("LearningRate=1e-5,Momentum=0.3,Repetitions=1,ConvergenceSteps=150,BatchSize=40,"
"TestRepetitions=7,WeightDecay=0.01,Regularization=NONE");
TString training3("LearningRate=1e-6,Momentum=0.1,Repetitions=1,ConvergenceSteps=500,BatchSize=100,"
"TestRepetitions=7,WeightDecay=0.0001,Regularization=NONE");
TString trainingStrategyString("TrainingStrategy=");
trainingStrategyString += training0 + "|" + training1 + "|" + training2 + "|" + training3;
// TString trainingStrategyString
// ("TrainingStrategy=LearningRate=1e-1,Momentum=0.3,Repetitions=3,ConvergenceSteps=20,BatchSize=30,TestRepetitions=7,WeightDecay=0.0,L1=false,DropFraction=0.0,DropRepetitions=5");
TString nnOptions(
"!H:V:ErrorStrategy=SUMOFSQUARES:VarTransform=G:WeightInitialization=XAVIERUNIFORM:Architecture=CPU");
// TString nnOptions ("!H:V:VarTransform=Normalize:ErrorStrategy=CHECKGRADIENTS");
nnOptions.Append(":");
nnOptions.Append(layoutString);
nnOptions.Append(":");
nnOptions.Append(trainingStrategyString);
factory->BookMethod(dataloader, TMVA::Types::kDNN, "DNN_CPU", nnOptions); // NN
}
// Support Vector Machine
if (Use["SVM"])
factory->BookMethod( dataloader, TMVA::Types::kSVM, "SVM", "Gamma=0.25:Tol=0.001:VarTransform=Norm" );
// Boosted Decision Trees
if (Use["BDT"])
factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDT",
"!H:!V:NTrees=100:MinNodeSize=1.0%:BoostType=AdaBoostR2:SeparationType=RegressionVariance:nCuts=20:PruneMethod=CostComplexity:PruneStrength=30" );
if (Use["BDTG"])
factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDTG",
"!H:!V:NTrees=2000::BoostType=Grad:Shrinkage=0.1:UseBaggedBoost:BaggedSampleFraction=0.5:nCuts=20:MaxDepth=3:MaxDepth=4" );
   // --------------------------------------------------------------------------------------------------
   // Now you can tell the factory to train, test, and evaluate the MVAs
   // Train MVAs using the set of training events
   factory->TrainAllMethods();
   // Evaluate all MVAs using the set of test events
   factory->TestAllMethods();
   // Evaluate and compare performance of all configured MVAs
   factory->EvaluateAllMethods();
   // --------------------------------------------------------------
   // Save the output
   outputFile->Close();
   std::cout << "==> Wrote root file: " << outputFile->GetName() << std::endl;
   std::cout << "==> TMVARegression is done!" << std::endl;
   delete factory;
   delete dataloader;
   // Launch the GUI for the root macros
   if (!gROOT->IsBatch()) TMVA::TMVARegGui( outfileName );
}
int main( int argc, char** argv )
{
   // Select methods (don't look at this code - not of interest)
   TString methodList;
   for (int i=1; i<argc; i++) {
      TString regMethod(argv[i]);
      if (regMethod=="-b" || regMethod=="--batch") continue;
      if (!methodList.IsNull()) methodList += TString(",");
      methodList += regMethod;
   }
   TMVARegression(methodList);
   return 0;
}
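The weight files written under dataset/weights/ can later be applied to new data with the TMVA::Reader class. The following is a minimal, self-contained sketch, not part of this tutorial: the tree and branch names (TreeR, var1, var2) and the weight-file path come from the example above, while the function name and the choice of the MLP method are purely illustrative.

#include <iostream>
#include "TFile.h"
#include "TTree.h"
#include "TMVA/Reader.h"

// Minimal application sketch: evaluate the trained MLP on the example tree.
void TMVARegressionApplicationSketch()
{
   // The Reader needs the same input variables, in the same order as in training
   Float_t var1, var2;
   TMVA::Reader reader( "!Color:!Silent" );
   reader.AddVariable( "var1", &var1 );
   reader.AddVariable( "var2", &var2 );
   // Spectators declared during training must be registered as well
   Float_t spec1, spec2;
   reader.AddSpectator( "spec1:=var1*2", &spec1 );
   reader.AddSpectator( "spec2:=var1*3", &spec2 );
   // Book the method from the weight file written by TMVARegression.C
   reader.BookMVA( "MLP method", "dataset/weights/TMVARegression_MLP.weights.xml" );

   TFile *input = TFile::Open( "./tmva_reg_example.root" );
   TTree *tree = (TTree*)input->Get( "TreeR" );
   tree->SetBranchAddress( "var1", &var1 );
   tree->SetBranchAddress( "var2", &var2 );
   for (Long64_t i = 0; i < 10 && i < tree->GetEntries(); i++) {
      tree->GetEntry(i);
      // EvaluateRegression returns one value per regression target
      Float_t val = reader.EvaluateRegression( "MLP method" )[0];
      std::cout << "event " << i << ": predicted fvalue = " << val << std::endl;
   }
   delete input;
}

Run it like the training macro (root -l, after the training step has produced dataset/weights/); the companion tutorial TMVARegressionApplication.C covers the same application step in full.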
Author
Andreas Hoecker

Definition in file TMVARegression.C.