# TMVAClassification
This macro provides examples for the training and testing of the
TMVA classifiers.

The input data is a toy-MC sample consisting of four Gaussian-distributed,
linearly correlated input variables.
The methods to be used can be switched on and off by means of booleans, or
via the prompt command, for example:

    root -l ./TMVAClassification.C\(\"Fisher,Likelihood\"\)

(Note that the backslashes are mandatory.)
If no method is given, a default set of classifiers is used.
The output file "TMVAC.root" can be analysed with the use of dedicated
macros (simply say: root -l <macro.C>), which can be conveniently
invoked through a GUI that will appear at the end of the run of this macro.
Launch the GUI via the command:

    root -l ./TMVAGui.C

You can also compile and run the example with the following commands:

    make
    ./TMVAClassification <Methods>

where `<Methods> = "method1 method2"` are the TMVA classifier names, for example:

    ./TMVAClassification Fisher LikelihoodPCA BDT

If no method is given, a default set of classifiers is used.
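
As a rough sketch of how the compiled program could turn its space-separated arguments into the comma-separated list used by the macro form, the helper below joins `argv` entries with commas. The name `joinMethodArgs` is hypothetical and is not part of TMVA; it only illustrates the conversion.

```cpp
#include <string>

// Hypothetical helper (not a TMVA function): join the command-line arguments
// into a comma-separated method list such as "Fisher,LikelihoodPCA,BDT".
std::string joinMethodArgs(int argc, const char* const* argv) {
    std::string methodList;
    for (int i = 1; i < argc; ++i) {   // argv[0] is the program name
        const std::string arg(argv[i]);
        if (arg.empty()) continue;     // ignore empty arguments
        if (!methodList.empty()) methodList += ",";
        methodList += arg;
    }
    return methodList;
}
```

An empty result corresponds to the "no method given" case, in which the default set of classifiers stays enabled.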

- Project   : TMVA - a ROOT-integrated toolkit for multivariate data analysis
- Package   : TMVA
- Root Macro: TMVAClassification



**Author:** Andreas Hoecker  
<i><small>This notebook tutorial was automatically generated with <a href= "https://github.com/root-project/root/blob/master/documentation/doxygen/converttonotebook.py">ROOTBOOK-izer</a> from the macro found in the ROOT repository  on Monday, May 13, 2024 at 11:22 AM.</small></i>

 Arguments are defined. 

In [1]:
TString myMethodList = "";

The explicit loading of the shared libTMVA is done in TMVAlogon.C, defined in .rootrc.
If you use your private .rootrc, or run from a different directory, please copy the
corresponding lines from .rootrc.

Methods to be processed can be given as an argument; use format:

    mylinux~> root -l TMVAClassification.C\(\"myMethod1,myMethod2,myMethod3\"\)

---------------------------------------------------------------
This loads the library

In [2]:
TMVA::Tools::Instance();

Default MVA methods to be trained + tested

In [3]:
std::map<std::string,int> Use;

Cut optimisation

In [4]:
Use["Cuts"]            = 1;
Use["CutsD"]           = 1;
Use["CutsPCA"]         = 0;
Use["CutsGA"]          = 0;
Use["CutsSA"]          = 0;

1-dimensional likelihood ("naive Bayes estimator")

In [5]:
Use["Likelihood"]      = 1;
Use["LikelihoodD"]     = 0; // the "D" extension indicates decorrelated input variables (see option strings)
Use["LikelihoodPCA"]   = 1; // the "PCA" extension indicates PCA-transformed input variables (see option strings)
Use["LikelihoodKDE"]   = 0;
Use["LikelihoodMIX"]   = 0;

Multidimensional likelihood and Nearest-Neighbour methods

In [6]:
Use["PDERS"]           = 1;
Use["PDERSD"]          = 0;
Use["PDERSPCA"]        = 0;
Use["PDEFoam"]         = 1;
Use["PDEFoamBoost"]    = 0; // uses generalised MVA method boosting
Use["KNN"]             = 1; // k-nearest neighbour method

Linear Discriminant Analysis

In [7]:
Use["LD"]              = 1; // Linear Discriminant identical to Fisher
Use["Fisher"]          = 0;
Use["FisherG"]         = 0;
Use["BoostedFisher"]   = 0; // uses generalised MVA method boosting
Use["HMatrix"]         = 0;

Function Discriminant analysis

In [8]:
Use["FDA_GA"]          = 1; // minimisation of user-defined function using Genetic Algorithm
Use["FDA_SA"]          = 0;
Use["FDA_MC"]          = 0;
Use["FDA_MT"]          = 0;
Use["FDA_GAMT"]        = 0;
Use["FDA_MCMT"]        = 0;

Neural Networks (all are feed-forward Multilayer Perceptrons)

In [9]:
Use["MLP"]             = 0; // Recommended ANN
Use["MLPBFGS"]         = 0; // Recommended ANN with optional training method
Use["MLPBNN"]          = 1; // Recommended ANN with BFGS training method and Bayesian regulator
Use["CFMlpANN"]        = 0; // Deprecated ANN from ALEPH
Use["TMlpANN"]         = 0; // ROOT's own ANN
#ifdef R__HAS_TMVAGPU
Use["DNN_GPU"]         = 1; // CUDA-accelerated DNN training.
#else
Use["DNN_GPU"]         = 0;
#endif

#ifdef R__HAS_TMVACPU
Use["DNN_CPU"]         = 1; // Multi-core accelerated DNN.
#else
Use["DNN_CPU"]         = 0;
#endif

Support Vector Machine

In [10]:
Use["SVM"]             = 1;

Boosted Decision Trees

In [11]:
Use["BDT"]             = 1; // uses Adaptive Boost
Use["BDTG"]            = 0; // uses Gradient Boost
Use["BDTB"]            = 0; // uses Bagging
Use["BDTD"]            = 0; // decorrelation + Adaptive Boost
Use["BDTF"]            = 0; // allow usage of fisher discriminant for node splitting

Friedman's RuleFit method, i.e. an optimised series of cuts ("rules")

In [12]:
Use["RuleFit"]         = 1;

---------------------------------------------------------------

In [13]:
std::cout << std::endl;
std::cout << "==> Start TMVAClassification" << std::endl;


==> Start TMVAClassification


Select methods (bookkeeping only; this code can be skipped on a first reading)

In [14]:
if (myMethodList != "") {
   for (std::map<std::string,int>::iterator it = Use.begin(); it != Use.end(); it++) it->second = 0;

   std::vector<TString> mlist = TMVA::gTools().SplitString( myMethodList, ',' );
   for (UInt_t i=0; i<mlist.size(); i++) {
      std::string regMethod(mlist[i]);

      if (Use.find(regMethod) == Use.end()) {
         std::cout << "Method \"" << regMethod << "\" not known in TMVA under this name. Choose among the following:" << std::endl;
         for (std::map<std::string,int>::iterator it = Use.begin(); it != Use.end(); it++) std::cout << it->first << " ";
         std::cout << std::endl;
         return 1;
      }
      Use[regMethod] = 1;
   }
}
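
The selection logic above can be sketched without ROOT types. The function below is a standalone illustration using only the standard library; `selectMethods` is a hypothetical name, and unlike the cell above it leaves the map untouched when an unknown name is found instead of returning after the flags were already cleared.

```cpp
#include <map>
#include <sstream>
#include <string>

// Standalone illustration of the selection logic (hypothetical helper,
// not a TMVA function). Returns false, leaving `use` unchanged, if any
// requested name is unknown.
bool selectMethods(std::map<std::string, int>& use, const std::string& methodList) {
    if (methodList.empty()) return true;            // keep the default set

    std::map<std::string, int> selected = use;
    for (auto& entry : selected) entry.second = 0;  // switch every method off

    std::istringstream stream(methodList);
    std::string name;
    while (std::getline(stream, name, ',')) {
        auto it = selected.find(name);
        if (it == selected.end()) return false;     // unknown method name
        it->second = 1;                             // switch requested method on
    }
    use = selected;
    return true;
}
```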

--------------------------------------------------------------------------------------------------

Here the preparation phase begins

Read training and test data
(it is also possible to use ASCII format as input -> see TMVA Users Guide)
Set the cache directory for the TFile to the current directory. The input
data file will be downloaded here if not present yet, then it will be read
from the cache path directly.

In [15]:
TFile::SetCacheFileDir(".");
std::unique_ptr<TFile> input{TFile::Open("http://root.cern/files/tmva_class_example.root", "CACHEREAD")};
if (!input || input->IsZombie()) {
   throw std::runtime_error("ERROR: could not open data file");
}
std::cout << "--- TMVAClassification       : Using input file: " << input->GetName() << std::endl;

--- TMVAClassification       : Using input file: ./files/tmva_class_example.root


Info in <TFile::OpenFromCache>: using local cache copy of http://root.cern/files/tmva_class_example.root [./files/tmva_class_example.root]


Register the training and test trees

In [16]:
TTree *signalTree     = (TTree*)input->Get("TreeS");
TTree *background     = (TTree*)input->Get("TreeB");

Create a ROOT output file where TMVA will store ntuples, histograms, etc.

In [17]:
TString outfileName("TMVAC.root");
std::unique_ptr<TFile> outputFile{TFile::Open(outfileName, "RECREATE")};
if (!outputFile || outputFile->IsZombie()) {
   throw std::runtime_error("ERROR: could not open output file");
}

Create the factory object. Later you can choose the methods
whose performance you'd like to investigate. The factory is
the only TMVA object you have to interact with

The first argument is the base of the name of all the
weight files in the weights directory.

The second argument is the output file for the training results.
All TMVA output can be suppressed by removing the "!" (not) in
front of the "Silent" argument in the option string.

In [18]:
auto factory = std::make_unique<TMVA::Factory>(
   "TMVAClassification", outputFile.get(),
   "!V:!Silent:Color:!DrawProgressBar:Transformations=I;D;P;G,D:AnalysisType=Classification");
auto dataloader_raii = std::make_unique<TMVA::DataLoader>("dataset");
auto *dataloader = dataloader_raii.get();
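
Option strings such as the one passed to the factory are colon-separated lists in which a leading "!" negates a boolean flag. The sketch below shows one way to read such a string into key/value pairs. It is an illustration written for this tutorial, not TMVA's own parser, and `parseOptions` is a hypothetical name.

```cpp
#include <map>
#include <sstream>
#include <string>

// Illustrative parser (not TMVA's implementation): "Key=Value" is stored
// verbatim, a bare "Flag" becomes "True" and "!Flag" becomes "False".
std::map<std::string, std::string> parseOptions(const std::string& options) {
    std::map<std::string, std::string> result;
    std::istringstream stream(options);
    std::string token;
    while (std::getline(stream, token, ':')) {
        if (token.empty()) continue;
        const auto eq = token.find('=');            // split at the first '=' only
        if (eq != std::string::npos) {
            result[token.substr(0, eq)] = token.substr(eq + 1);
        } else if (token[0] == '!') {
            result[token.substr(1)] = "False";      // negated boolean flag
        } else {
            result[token] = "True";                 // plain boolean flag
        }
    }
    return result;
}
```

Splitting at the first "=" only means that nested values, for example a value that itself contains "=" signs, are kept intact as a single string.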

If you wish to modify default settings
(please check "src/Config.h" to see all available global options):

    (TMVA::gConfig().GetVariablePlotting()).fTimesRMS = 8.0;
    (TMVA::gConfig().GetIONames()).fWeightFileDir = "myWeightDirectory";

Define the input variables that shall be used for the MVA training.
Note that you may also use variable expressions, such as: "3*var1/var2*abs(var3)"
(all types of expressions that can also be parsed by TTree::Draw( "expression" )).

In [19]:
dataloader->AddVariable( "myvar1 := var1+var2", 'F' );
dataloader->AddVariable( "myvar2 := var1-var2", "Expression 2", "", 'F' );
dataloader->AddVariable( "var3",                "Variable 3", "units", 'F' );
dataloader->AddVariable( "var4",                "Variable 4", "units", 'F' );
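
The `label := expression` form used in the first two calls gives the expression a short name, which is the name that appears in the log output (`myvar1`, `myvar2`). As a minimal sketch of that convention, assuming the separator is the literal `:=`, the hypothetical helper below splits a definition into its two parts. It is not a TMVA function.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: split "label := expression" into {label, expression}.
// Without ":=", the whole string serves as both label and expression.
std::pair<std::string, std::string> splitVariableDefinition(const std::string& def) {
    // Remove leading and trailing blanks and tabs.
    auto trim = [](const std::string& s) {
        const auto first = s.find_first_not_of(" \t");
        if (first == std::string::npos) return std::string();
        const auto last = s.find_last_not_of(" \t");
        return s.substr(first, last - first + 1);
    };
    const auto pos = def.find(":=");
    if (pos == std::string::npos) {
        const std::string whole = trim(def);
        return {whole, whole};
    }
    return {trim(def.substr(0, pos)), trim(def.substr(pos + 2))};
}
```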

You can add so-called "Spectator variables", which are not used in the MVA training,
but will appear in the final "TestTree" produced by TMVA. This TestTree will contain the
input variables, the response values of all trained MVAs, and the spectator variables

In [20]:
dataloader->AddSpectator( "spec1 := var1*2",  "Spectator 1", "units", 'F' );
dataloader->AddSpectator( "spec2 := var1*3",  "Spectator 2", "units", 'F' );

Global event weights per tree (see below for setting event-wise weights)

In [21]:
Double_t signalWeight     = 1.0;
Double_t backgroundWeight = 1.0;

You can add an arbitrary number of signal or background trees

In [22]:
dataloader->AddSignalTree    ( signalTree,     signalWeight );
dataloader->AddBackgroundTree( background, backgroundWeight );

DataSetInfo              : [dataset] : Added class "Signal"
                         : Add Tree TreeS of type Signal with 6000 events
DataSetInfo              : [dataset] : Added class "Background"
                         : Add Tree TreeB of type Background with 6000 events


To give different trees for training and testing, do as follows:

    dataloader->AddSignalTree( signalTrainingTree, signalTrainWeight, "Training" );
    dataloader->AddSignalTree( signalTestTree,     signalTestWeight,  "Test" );

Use the following code instead of the two or four lines above to add signal and background
training and test events "by hand".
NOTE: in this case do not give expressions (such as "var1+var2") in the input
variable definition; compute the expression yourself before adding the event.
```cpp
// --- begin ----------------------------------------------------------
std::vector<Double_t> vars( 4 ); // vector has size of number of input variables
Float_t  treevars[4], weight;

// Signal
for (UInt_t ivar=0; ivar<4; ivar++) signalTree->SetBranchAddress( Form( "var%i", ivar+1 ), &(treevars[ivar]) );
for (UInt_t i=0; i<signalTree->GetEntries(); i++) {
   signalTree->GetEntry(i);
   for (UInt_t ivar=0; ivar<4; ivar++) vars[ivar] = treevars[ivar];
   // add training and test events; here: first half is training, second is testing
   // note that the weight can also be event-wise
   if (i < signalTree->GetEntries()/2.0) dataloader->AddSignalTrainingEvent( vars, signalWeight );
   else                                  dataloader->AddSignalTestEvent    ( vars, signalWeight );
}

// Background (has event weights)
background->SetBranchAddress( "weight", &weight );
for (UInt_t ivar=0; ivar<4; ivar++) background->SetBranchAddress( Form( "var%i", ivar+1 ), &(treevars[ivar]) );
for (UInt_t i=0; i<background->GetEntries(); i++) {
   background->GetEntry(i);
   for (UInt_t ivar=0; ivar<4; ivar++) vars[ivar] = treevars[ivar];
   // add training and test events; here: first half is training, second is testing
   // note that the weight can also be event-wise
   if (i < background->GetEntries()/2) dataloader->AddBackgroundTrainingEvent( vars, backgroundWeight*weight );
   else                                dataloader->AddBackgroundTestEvent    ( vars, backgroundWeight*weight );
}
// --- end ------------------------------------------------------------
```
End of tree registration

Set individual event weights (the variables must exist in the original TTree)
-  for signal    : `dataloader->SetSignalWeightExpression    ("weight1*weight2");`
-  for background: `dataloader->SetBackgroundWeightExpression("weight1*weight2");`

In [23]:
dataloader->SetBackgroundWeightExpression( "weight" );

Apply additional cuts on the signal and background samples (can be different)

In [24]:
TCut mycuts = ""; // for example: TCut mycuts = "abs(var1)<0.5 && abs(var2-0.5)<1";
TCut mycutb = ""; // for example: TCut mycutb = "abs(var1)<0.5";

Tell the dataloader how to use the training and testing events

If no numbers of events are given, half of the events in the tree are used
for training, and the other half for testing:

    dataloader->PrepareTrainingAndTestTree( mycut, "SplitMode=random:!V" );

To also specify the number of testing events, use:

    dataloader->PrepareTrainingAndTestTree( mycut,
       "NSigTrain=3000:NBkgTrain=3000:NSigTest=3000:NBkgTest=3000:SplitMode=Random:!V" );

In [25]:
dataloader->PrepareTrainingAndTestTree( mycuts, mycutb,
                                     "nTrain_Signal=1000:nTrain_Background=1000:SplitMode=Random:NormMode=NumEvents:!V" );

### Book MVA methods

Please look up the various method configuration options in the corresponding cxx files, e.g.
src/MethodCuts.cxx, or here: http://tmva.sourceforge.net/old_site/optionRef.html

It is possible to preset ranges in the option string in which the cut optimisation should be done:
"...:CutRangeMin[2]=-1:CutRangeMax[2]=1:...", where [2] is the third input variable.

Cut optimisation

In [26]:
if (Use["Cuts"])
   factory->BookMethod( dataloader, TMVA::Types::kCuts, "Cuts",
                        "!H:!V:FitMethod=MC:EffSel:SampleSize=200000:VarProp=FSmart" );

if (Use["CutsD"])
   factory->BookMethod( dataloader, TMVA::Types::kCuts, "CutsD",
                        "!H:!V:FitMethod=MC:EffSel:SampleSize=200000:VarProp=FSmart:VarTransform=Decorrelate" );

if (Use["CutsPCA"])
   factory->BookMethod( dataloader, TMVA::Types::kCuts, "CutsPCA",
                        "!H:!V:FitMethod=MC:EffSel:SampleSize=200000:VarProp=FSmart:VarTransform=PCA" );

if (Use["CutsGA"])
   factory->BookMethod( dataloader, TMVA::Types::kCuts, "CutsGA",
                        "H:!V:FitMethod=GA:CutRangeMin[0]=-10:CutRangeMax[0]=10:VarProp[1]=FMax:EffSel:Steps=30:Cycles=3:PopSize=400:SC_steps=10:SC_rate=5:SC_factor=0.95" );

if (Use["CutsSA"])
   factory->BookMethod( dataloader, TMVA::Types::kCuts, "CutsSA",
                        "!H:!V:FitMethod=SA:EffSel:MaxCalls=150000:KernelTemp=IncAdaptive:InitialTemp=1e+6:MinTemp=1e-6:Eps=1e-10:UseDefaultScale" );

Factory                  : Booking method: Cuts
                         : 
                         : Use optimization method: "Monte Carlo"
                         : Use efficiency computation method: "Event Selection"
                         : Use "FSmart" cuts for variable: 'myvar1'
                         : Use "FSmart" cuts for variable: 'myvar2'
                         : Use "FSmart" cuts for variable: 'var3'
                         : Use "FSmart" cuts for variable: 'var4'
Factory                  : Booking method: CutsD
                         : 
CutsD                    : [dataset] : Create Transformation "Decorrelate" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'myvar1' <---> Output : variable 'myvar1'
                         : Input : variable 'myvar2' <---> Output : variable 'myvar2'
                         : Input : variable 'va

Likelihood ("naive Bayes estimator")

In [27]:
if (Use["Likelihood"])
   factory->BookMethod( dataloader, TMVA::Types::kLikelihood, "Likelihood",
                        "H:!V:TransformOutput:PDFInterpol=Spline2:NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmoothBkg[1]=10:NSmooth=1:NAvEvtPerBin=50" );

Factory                  : Booking method: Likelihood
                         : 


Decorrelated likelihood

In [28]:
if (Use["LikelihoodD"])
   factory->BookMethod( dataloader, TMVA::Types::kLikelihood, "LikelihoodD",
                        "!H:!V:TransformOutput:PDFInterpol=Spline2:NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmooth=5:NAvEvtPerBin=50:VarTransform=Decorrelate" );

PCA-transformed likelihood

In [29]:
if (Use["LikelihoodPCA"])
   factory->BookMethod( dataloader, TMVA::Types::kLikelihood, "LikelihoodPCA",
                        "!H:!V:!TransformOutput:PDFInterpol=Spline2:NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmooth=5:NAvEvtPerBin=50:VarTransform=PCA" );

Factory                  : Booking method: LikelihoodPCA
                         : 
LikelihoodPCA            : [dataset] : Create Transformation "PCA" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'myvar1' <---> Output : variable 'myvar1'
                         : Input : variable 'myvar2' <---> Output : variable 'myvar2'
                         : Input : variable 'var3' <---> Output : variable 'var3'
                         : Input : variable 'var4' <---> Output : variable 'var4'


Use a kernel density estimator to approximate the PDFs

In [30]:
if (Use["LikelihoodKDE"])
   factory->BookMethod( dataloader, TMVA::Types::kLikelihood, "LikelihoodKDE",
                        "!H:!V:!TransformOutput:PDFInterpol=KDE:KDEtype=Gauss:KDEiter=Adaptive:KDEFineFactor=0.3:KDEborder=None:NAvEvtPerBin=50" );

Use a variable-dependent mix of splines and kernel density estimator

In [31]:
if (Use["LikelihoodMIX"])
   factory->BookMethod( dataloader, TMVA::Types::kLikelihood, "LikelihoodMIX",
                        "!H:!V:!TransformOutput:PDFInterpolSig[0]=KDE:PDFInterpolBkg[0]=KDE:PDFInterpolSig[1]=KDE:PDFInterpolBkg[1]=KDE:PDFInterpolSig[2]=Spline2:PDFInterpolBkg[2]=Spline2:PDFInterpolSig[3]=Spline2:PDFInterpolBkg[3]=Spline2:KDEtype=Gauss:KDEiter=Nonadaptive:KDEborder=None:NAvEvtPerBin=50" );

Test the multi-dimensional probability density estimator.
Here are the option strings for the MinMax and RMS methods, respectively:

    "!H:!V:VolumeRangeMode=MinMax:DeltaFrac=0.2:KernelEstimator=Gauss:GaussSigma=0.3"
    "!H:!V:VolumeRangeMode=RMS:DeltaFrac=3:KernelEstimator=Gauss:GaussSigma=0.3"

In [32]:
if (Use["PDERS"])
   factory->BookMethod( dataloader, TMVA::Types::kPDERS, "PDERS",
                        "!H:!V:NormTree=T:VolumeRangeMode=Adaptive:KernelEstimator=Gauss:GaussSigma=0.3:NEventsMin=400:NEventsMax=600" );

if (Use["PDERSD"])
   factory->BookMethod( dataloader, TMVA::Types::kPDERS, "PDERSD",
                        "!H:!V:VolumeRangeMode=Adaptive:KernelEstimator=Gauss:GaussSigma=0.3:NEventsMin=400:NEventsMax=600:VarTransform=Decorrelate" );

if (Use["PDERSPCA"])
   factory->BookMethod( dataloader, TMVA::Types::kPDERS, "PDERSPCA",
                        "!H:!V:VolumeRangeMode=Adaptive:KernelEstimator=Gauss:GaussSigma=0.3:NEventsMin=400:NEventsMax=600:VarTransform=PCA" );

Factory                  : Booking method: PDERS
                         : 


Multi-dimensional likelihood estimator using self-adapting phase-space binning

In [33]:
if (Use["PDEFoam"])
   factory->BookMethod( dataloader, TMVA::Types::kPDEFoam, "PDEFoam",
                        "!H:!V:SigBgSeparate=F:TailCut=0.001:VolFrac=0.0666:nActiveCells=500:nSampl=2000:nBin=5:Nmin=100:Kernel=None:Compress=T" );

if (Use["PDEFoamBoost"])
   factory->BookMethod( dataloader, TMVA::Types::kPDEFoam, "PDEFoamBoost",
                        "!H:!V:Boost_Num=30:Boost_Transform=linear:SigBgSeparate=F:MaxDepth=4:UseYesNoCell=T:DTLogic=MisClassificationError:FillFoamWithOrigWeights=F:TailCut=0:nActiveCells=500:nBin=20:Nmin=400:Kernel=None:Compress=T" );

Factory                  : Booking method: PDEFoam
                         : 


K-Nearest Neighbour classifier (KNN)

In [34]:
if (Use["KNN"])
   factory->BookMethod( dataloader, TMVA::Types::kKNN, "KNN",
                        "H:nkNN=20:ScaleFrac=0.8:SigmaFact=1.0:Kernel=Gaus:UseKernel=F:UseWeight=T:!Trim" );

Factory                  : Booking method: KNN
                         : 


H-Matrix (chi-squared) method

In [35]:
if (Use["HMatrix"])
   factory->BookMethod( dataloader, TMVA::Types::kHMatrix, "HMatrix", "!H:!V:VarTransform=None" );

Linear discriminant (same as Fisher discriminant)

In [36]:
if (Use["LD"])
   factory->BookMethod( dataloader, TMVA::Types::kLD, "LD", "H:!V:VarTransform=None:CreateMVAPdfs:PDFInterpolMVAPdf=Spline2:NbinsMVAPdf=50:NsmoothMVAPdf=10" );

Factory                  : Booking method: LD
                         : 
                         : Rebuilding Dataset dataset
                         : Building event vectors for type 2 Signal
                         : Dataset[dataset] :  create input formulas for tree TreeS
                         : Building event vectors for type 2 Background
                         : Dataset[dataset] :  create input formulas for tree TreeB
DataSetFactory           : [dataset] : Number of events in input trees
                         : 
                         : 
                         : Number of training and testing events
                         : ---------------------------------------------------------------------------
                         : Signal     -- training events            : 1000
                         : Signal     -- testing events             : 5000
                         : Signal     -- training and testing events: 6000
                         : Backgroun
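
The counts in the log above (1000 training and 5000 testing signal events out of 6000) are consistent with a simple rule: a requested number of 0 means "not specified", an unspecified test sample receives the events left over after training, and if neither is specified the sample is split in half. The sketch below encodes that rule. `splitCounts` is a hypothetical helper, and the branch for a specified test size with an unspecified training size is an assumption made by symmetry rather than something shown in this tutorial.

```cpp
#include <stdexcept>
#include <utility>

// Hypothetical helper: number of {training, testing} events for one class.
// A request of 0 means "not specified".
std::pair<long, long> splitCounts(long total, long nTrain, long nTest) {
    if (total < 0 || nTrain < 0 || nTest < 0 || nTrain + nTest > total)
        throw std::invalid_argument("inconsistent event counts");
    if (nTrain == 0 && nTest == 0) return {total / 2, total - total / 2}; // half and half
    if (nTest == 0)  return {nTrain, total - nTrain};  // remainder goes to testing
    if (nTrain == 0) return {total - nTest, nTest};    // assumption: remainder goes to training
    return {nTrain, nTest};                            // both given explicitly
}
```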

Fisher discriminant (same as LD)

In [37]:
if (Use["Fisher"])
   factory->BookMethod( dataloader, TMVA::Types::kFisher, "Fisher", "H:!V:Fisher:VarTransform=None:CreateMVAPdfs:PDFInterpolMVAPdf=Spline2:NbinsMVAPdf=50:NsmoothMVAPdf=10" );

Fisher with Gauss-transformed input variables

In [38]:
if (Use["FisherG"])
   factory->BookMethod( dataloader, TMVA::Types::kFisher, "FisherG", "H:!V:VarTransform=Gauss" );

Composite classifier: ensemble (tree) of boosted Fisher classifiers

In [39]:
if (Use["BoostedFisher"])
   factory->BookMethod( dataloader, TMVA::Types::kFisher, "BoostedFisher",
                        "H:!V:Boost_Num=20:Boost_Transform=log:Boost_Type=AdaBoost:Boost_AdaBoostBeta=0.2:!Boost_DetailedMonitoring" );

Function discriminant analysis (FDA) -- test of various fitters - the recommended one is Minuit (or GA or SA)

In [40]:
if (Use["FDA_MC"])
   factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_MC",
                        "H:!V:Formula=(0)+(1)*x0+(2)*x1+(3)*x2+(4)*x3:ParRanges=(-1,1);(-10,10);(-10,10);(-10,10);(-10,10):FitMethod=MC:SampleSize=100000:Sigma=0.1" );

if (Use["FDA_GA"]) // can also use Simulated Annealing (SA) algorithm (see CutsSA options)
   factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_GA",
                        "H:!V:Formula=(0)+(1)*x0+(2)*x1+(3)*x2+(4)*x3:ParRanges=(-1,1);(-10,10);(-10,10);(-10,10);(-10,10):FitMethod=GA:PopSize=100:Cycles=2:Steps=5:Trim=True:SaveBestGen=1" );

if (Use["FDA_SA"]) // Simulated Annealing (SA) algorithm (see CutsSA options)
   factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_SA",
                        "H:!V:Formula=(0)+(1)*x0+(2)*x1+(3)*x2+(4)*x3:ParRanges=(-1,1);(-10,10);(-10,10);(-10,10);(-10,10):FitMethod=SA:MaxCalls=15000:KernelTemp=IncAdaptive:InitialTemp=1e+6:MinTemp=1e-6:Eps=1e-10:UseDefaultScale" );

if (Use["FDA_MT"])
   factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_MT",
                        "H:!V:Formula=(0)+(1)*x0+(2)*x1+(3)*x2+(4)*x3:ParRanges=(-1,1);(-10,10);(-10,10);(-10,10);(-10,10):FitMethod=MINUIT:ErrorLevel=1:PrintLevel=-1:FitStrategy=2:UseImprove:UseMinos:SetBatch" );

if (Use["FDA_GAMT"])
   factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_GAMT",
                        "H:!V:Formula=(0)+(1)*x0+(2)*x1+(3)*x2+(4)*x3:ParRanges=(-1,1);(-10,10);(-10,10);(-10,10);(-10,10):FitMethod=GA:Converger=MINUIT:ErrorLevel=1:PrintLevel=-1:FitStrategy=0:!UseImprove:!UseMinos:SetBatch:Cycles=1:PopSize=5:Steps=5:Trim" );

if (Use["FDA_MCMT"])
   factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_MCMT",
                        "H:!V:Formula=(0)+(1)*x0+(2)*x1+(3)*x2+(4)*x3:ParRanges=(-1,1);(-10,10);(-10,10);(-10,10);(-10,10):FitMethod=MC:Converger=MINUIT:ErrorLevel=1:PrintLevel=-1:FitStrategy=0:!UseImprove:!UseMinos:SetBatch:SampleSize=20" );

Factory                  : Booking method: FDA_GA
                         : 
                         : Create parameter interval for parameter 0 : [-1,1]
                         : Create parameter interval for parameter 1 : [-10,10]
                         : Create parameter interval for parameter 2 : [-10,10]
                         : Create parameter interval for parameter 3 : [-10,10]
                         : Create parameter interval for parameter 4 : [-10,10]
                         : User-defined formula string       : "(0)+(1)*x0+(2)*x1+(3)*x2+(4)*x3"
                         : TFormula-compatible formula string: "[0]+[1]*[5]+[2]*[6]+[3]*[7]+[4]*[8]"
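
The last two log lines show how the user formula is rewritten: each fit parameter `(i)` becomes `[i]`, and each input variable `xj` becomes `[nPars + j]`, with five parameters in this example. The sketch below reproduces that mapping for simple formulas. It is an illustration only, not TMVA's implementation; `toTFormulaString` is a hypothetical name, and the code assumes that an `x` followed by digits always denotes an input variable.

```cpp
#include <cctype>
#include <string>

// Illustrative conversion (hypothetical helper, not TMVA's implementation):
//   "(i)" -> "[i]"           fit parameter i
//   "xj"  -> "[nPars + j]"   input variable j
std::string toTFormulaString(const std::string& formula, int nPars) {
    std::string out;
    std::size_t i = 0;
    while (i < formula.size()) {
        const char c = formula[i];
        // Find the end of the digit run that starts right after position i.
        std::size_t j = i + 1;
        while (j < formula.size() && std::isdigit(static_cast<unsigned char>(formula[j]))) ++j;
        const bool hasDigits = j > i + 1;
        if (c == '(' && hasDigits && j < formula.size() && formula[j] == ')') {
            out += "[" + formula.substr(i + 1, j - i - 1) + "]";   // parameter
            i = j + 1;
        } else if (c == 'x' && hasDigits) {
            const int index = std::stoi(formula.substr(i + 1, j - i - 1));
            out += "[" + std::to_string(nPars + index) + "]";      // input variable
            i = j;
        } else {
            out += c;                                              // copy unchanged
            ++i;
        }
    }
    return out;
}
```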


TMVA ANN: MLP (recommended ANN) -- all ANNs in TMVA are Multilayer Perceptrons

In [41]:
if (Use["MLP"])
   factory->BookMethod( dataloader, TMVA::Types::kMLP, "MLP", "H:!V:NeuronType=tanh:VarTransform=N:NCycles=600:HiddenLayers=N+5:TestRate=5:!UseRegulator" );

if (Use["MLPBFGS"])
   factory->BookMethod( dataloader, TMVA::Types::kMLP, "MLPBFGS", "H:!V:NeuronType=tanh:VarTransform=N:NCycles=600:HiddenLayers=N+5:TestRate=5:TrainingMethod=BFGS:!UseRegulator" );

if (Use["MLPBNN"])
   factory->BookMethod( dataloader, TMVA::Types::kMLP, "MLPBNN", "H:!V:NeuronType=tanh:VarTransform=N:NCycles=60:HiddenLayers=N+5:TestRate=5:TrainingMethod=BFGS:UseRegulator" ); // BFGS training with bayesian regulators

Factory                  : Booking method: MLPBNN
                         : 
MLPBNN                   : [dataset] : Create Transformation "N" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'myvar1' <---> Output : variable 'myvar1'
                         : Input : variable 'myvar2' <---> Output : variable 'myvar2'
                         : Input : variable 'var3' <---> Output : variable 'var3'
                         : Input : variable 'var4' <---> Output : variable 'var4'
MLPBNN                   : Building Network. 
                         : Initializing weights


Multi-architecture DNN implementation.

In [42]:
if (Use["DNN_CPU"] or Use["DNN_GPU"]) {
   // General layout.
   TString layoutString ("Layout=TANH|128,TANH|128,TANH|128,LINEAR");

   // Define the training strategy. Multiple strategy strings can be given, separated by the "|" delimiter

   TString trainingStrategyString = ("TrainingStrategy=LearningRate=1e-2,Momentum=0.9,"
                                     "ConvergenceSteps=20,BatchSize=100,TestRepetitions=1,"
                                     "WeightDecay=1e-4,Regularization=None,"
                                     "DropConfig=0.0+0.5+0.5+0.5");

   // General Options.
   TString dnnOptions ("!H:V:ErrorStrategy=CROSSENTROPY:VarTransform=N:"
                       "WeightInitialization=XAVIERUNIFORM");
   dnnOptions.Append (":"); dnnOptions.Append (layoutString);
   dnnOptions.Append (":"); dnnOptions.Append (trainingStrategyString);

   // Cuda implementation.
   if (Use["DNN_GPU"]) {
      TString gpuOptions = dnnOptions + ":Architecture=GPU";
      factory->BookMethod(dataloader, TMVA::Types::kDL, "DNN_GPU", gpuOptions);
   }
   // Multi-core CPU implementation.
   if (Use["DNN_CPU"]) {
      TString cpuOptions = dnnOptions + ":Architecture=CPU";
      factory->BookMethod(dataloader, TMVA::Types::kDL, "DNN_CPU", cpuOptions);
   }
}

Factory                  : Booking method: DNN_CPU
                         : 
                         : Parsing option string: 
                         : ... "!H:V:ErrorStrategy=CROSSENTROPY:VarTransform=N:WeightInitialization=XAVIERUNIFORM:Layout=TANH|128,TANH|128,TANH|128,LINEAR:TrainingStrategy=LearningRate=1e-2,Momentum=0.9,ConvergenceSteps=20,BatchSize=100,TestRepetitions=1,WeightDecay=1e-4,Regularization=None,DropConfig=0.0+0.5+0.5+0.5:Architecture=CPU"
                         : The following options are set:
                         : - By User:
                         :     <none>
                         : - Default:
                         :     Boost_num: "0" [Number of times the classifier will be boosted]
                         : Parsing option string: 
                         : ... "!H:V:ErrorStrategy=CROSSENTROPY:VarTransform=N:WeightInitialization=XAVIERUNIFORM:Layout=TANH|128,TANH|128,TANH|128,LINEAR:TrainingStrategy=LearningRate=1e-2,Momentum=0.9,Conv
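
The `Layout` value used above lists the layers separated by commas, each written as an activation name and a width joined by "|"; the final `LINEAR` entry carries no width in the string. As a minimal sketch of reading that notation, the hypothetical helper below splits a layout string into (activation, width) pairs and reports a width of 0 where none is given. It is not the parser used by TMVA.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Illustrative parser (hypothetical helper): split a layout such as
// "TANH|128,TANH|128,LINEAR" into {activation, width} pairs.
// A layer without an explicit width is reported with width 0.
std::vector<std::pair<std::string, int>> parseLayout(const std::string& layout) {
    std::vector<std::pair<std::string, int>> layers;
    std::istringstream stream(layout);
    std::string layer;
    while (std::getline(stream, layer, ',')) {
        const auto bar = layer.find('|');
        if (bar == std::string::npos) {
            layers.emplace_back(layer, 0);   // no width given in the string
        } else {
            layers.emplace_back(layer.substr(0, bar), std::stoi(layer.substr(bar + 1)));
        }
    }
    return layers;
}
```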

CF(Clermont-Ferrand)ANN

In [43]:
if (Use["CFMlpANN"])
   factory->BookMethod( dataloader, TMVA::Types::kCFMlpANN, "CFMlpANN", "!H:!V:NCycles=200:HiddenLayers=N+1,N"  ); // n_cycles:#nodes:#nodes:...

Tmlp(Root)ANN

In [44]:
if (Use["TMlpANN"])
   factory->BookMethod( dataloader, TMVA::Types::kTMlpANN, "TMlpANN", "!H:!V:NCycles=200:HiddenLayers=N+1,N:LearningMethod=BFGS:ValidationFraction=0.3"  ); // n_cycles:#nodes:#nodes:...

Support Vector Machine

In [45]:
if (Use["SVM"])
   factory->BookMethod( dataloader, TMVA::Types::kSVM, "SVM", "Gamma=0.25:Tol=0.001:VarTransform=Norm" );

Factory                  : Booking method: SVM
                         : 
SVM                      : [dataset] : Create Transformation "Norm" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'myvar1' <---> Output : variable 'myvar1'
                         : Input : variable 'myvar2' <---> Output : variable 'myvar2'
                         : Input : variable 'var3' <---> Output : variable 'var3'
                         : Input : variable 'var4' <---> Output : variable 'var4'


Boosted Decision Trees

In [46]:
if (Use["BDTG"]) // Gradient Boost
   factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDTG",
                        "!H:!V:NTrees=1000:MinNodeSize=2.5%:BoostType=Grad:Shrinkage=0.10:UseBaggedBoost:BaggedSampleFraction=0.5:nCuts=20:MaxDepth=2" );

if (Use["BDT"])  // Adaptive Boost
   factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDT",
                        "!H:!V:NTrees=850:MinNodeSize=2.5%:MaxDepth=3:BoostType=AdaBoost:AdaBoostBeta=0.5:UseBaggedBoost:BaggedSampleFraction=0.5:SeparationType=GiniIndex:nCuts=20" );

if (Use["BDTB"]) // Bagging
   factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDTB",
                        "!H:!V:NTrees=400:BoostType=Bagging:SeparationType=GiniIndex:nCuts=20" );

if (Use["BDTD"]) // Decorrelation + Adaptive Boost
   factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDTD",
                        "!H:!V:NTrees=400:MinNodeSize=5%:MaxDepth=3:BoostType=AdaBoost:SeparationType=GiniIndex:nCuts=20:VarTransform=Decorrelate" );

if (Use["BDTF"])  // Allow Using Fisher discriminant in node splitting for (strong) linearly correlated variables
   factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDTF",
                        "!H:!V:NTrees=50:MinNodeSize=2.5%:UseFisherCuts:MaxDepth=3:BoostType=AdaBoost:AdaBoostBeta=0.5:SeparationType=GiniIndex:nCuts=20" );

Factory                  : Booking method: BDT
                         : 


RuleFit -- TMVA implementation of Friedman's method

In [47]:
if (Use["RuleFit"])
   factory->BookMethod( dataloader, TMVA::Types::kRuleFit, "RuleFit",
                        "H:!V:RuleFitModule=RFTMVA:Model=ModRuleLinear:MinImp=0.001:RuleMinDist=0.001:NTrees=20:fEventsMin=0.01:fEventsMax=0.5:GDTau=-1.0:GDTauPrec=0.01:GDStep=0.01:GDNSteps=10000:GDErrScale=1.02" );

Factory                  : Booking method: RuleFit
                         : 


For an example of the category classifier usage, see: TMVAClassificationCategory

--------------------------------------------------------------------------------------------------
Now you can optimize the settings (configuration) of the MVAs using the set of training events.
This is STILL EXPERIMENTAL and only implemented for BDTs!

    factory->OptimizeAllMethods("SigEffAtBkg0.01","Scan");
    factory->OptimizeAllMethods("ROCIntegral","FitGA");

--------------------------------------------------------------------------------------------------

Now you can tell the factory to train, test, and evaluate the MVAs

Train MVAs using the set of training events

In [48]:
factory->TrainAllMethods();

Factory                  : Train all methods
Factory                  : [dataset] : Create Transformation "I" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'myvar1' <---> Output : variable 'myvar1'
                         : Input : variable 'myvar2' <---> Output : variable 'myvar2'
                         : Input : variable 'var3' <---> Output : variable 'var3'
                         : Input : variable 'var4' <---> Output : variable 'var4'
Factory                  : [dataset] : Create Transformation "D" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'myvar1' <---> Output : variable 'myvar1'
                         : Input : variable 'myvar2' <---> Output : variable 'myvar2'
                         : Input : variable 'var3' <---> Output

Evaluate all MVAs using the set of test events

In [49]:
factory->TestAllMethods();

Factory                  : Test all methods
Factory                  : Test method: Cuts for Classification performance
                         : 
Cuts                     : [dataset] : Evaluation of Cuts on testing sample (10000 events)
                         : Elapsed time for evaluation of 10000 events: 0.00043 sec       
Factory                  : Test method: CutsD for Classification performance
                         : 
CutsD                    : [dataset] : Evaluation of CutsD on testing sample (10000 events)
                         : Elapsed time for evaluation of 10000 events: 0.00725 sec       
Factory                  : Test method: Likelihood for Classification performance
                         : 
Likelihood               : [dataset] : Evaluation of Likelihood on testing sample (10000 events)
                         : Elapsed time for evaluation of 10000 events: 0.00975 sec       
Factory                  : Test method: LikelihoodPCA for Classification per

Evaluate and compare performance of all configured MVAs

In [50]:
factory->EvaluateAllMethods();

Factory                  : Evaluate all methods
Factory                  : Evaluate classifier: Cuts
                         : 
TFHandler_Cuts           : Variable        Mean        RMS   [        Min        Max ]
                         : -----------------------------------------------------------
                         :   myvar1:    0.21781     1.7248   [    -9.8605     7.9024 ]
                         :   myvar2:  -0.062175     1.1106   [    -4.0854     4.0259 ]
                         :     var3:    0.16451     1.0589   [    -5.3563     4.6422 ]
                         :     var4:    0.43566     1.2253   [    -6.9675     5.0307 ]
                         : -----------------------------------------------------------
Factory                  : Evaluate classifier: CutsD
                         : 
TFHandler_CutsD          : Variable        Mean        RMS   [        Min        Max ]
                         : ----------------------------------------------------------

--------------------------------------------------------------

Save the output

In [51]:
outputFile->Write();

std::cout << "==> Wrote root file: " << outputFile->GetName() << std::endl;
std::cout << "==> TMVAClassification is done!" << std::endl;

==> Wrote root file: TMVAC.root
==> TMVAClassification is done!


Launch the GUI for the root macros

In [52]:
if (!gROOT->IsBatch()) TMVA::TMVAGui( outfileName );

return 0;