As input data is used a toy-MC sample consisting of four Gaussian-distributed and linearly correlated input variables.
The methods to be used can be switched on and off by means of booleans, or via the prompt command, for example:
(note that the backslashes are mandatory) If no method given, a default set is used.
The output file "TMVAReg.root" can be analysed with the use of dedicated macros (simply say: root -l <macro.C>), which can be conveniently invoked through a GUI that will appear at the end of the run of this macro.
==> Start TMVARegression
--- TMVARegression : Using input file: /github/home/ROOT-CI/build/tutorials/tmva/data/tmva_reg_example.root
DataSetInfo : [datasetreg] : Added class "Regression"
: Add Tree TreeR of type Regression with 10000 events
: Dataset[datasetreg] : Class index : 0 name : Regression
Factory : Booking method: ␛[1mPDEFoam␛[0m
:
: Rebuilding Dataset datasetreg
: Building event vectors for type 2 Regression
: Dataset[datasetreg] : create input formulas for tree TreeR
DataSetFactory : [datasetreg] : Number of events in input trees
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Regression -- training events : 1000
: Regression -- testing events : 9000
: Regression -- training and testing events: 10000
:
DataSetInfo : Correlation matrix (Regression):
: ------------------------
: var1 var2
: var1: +1.000 -0.032
: var2: -0.032 +1.000
: ------------------------
DataSetFactory : [datasetreg] :
:
Factory : Booking method: ␛[1mKNN␛[0m
:
Factory : Booking method: ␛[1mLD␛[0m
:
Factory : Booking method: ␛[1mBDTG␛[0m
:
<WARNING> : Value for option maxdepth was previously set to 3
: the option NegWeightTreatment=InverseBoostNegWeights does not exist for BoostType=Grad
: --> change to new default NegWeightTreatment=Pray
Factory : ␛[1mTrain all methods␛[0m
Factory : [datasetreg] : Create Transformation "I" with events from all classes.
:
: Transformation, Variable selection :
: Input : variable 'var1' <---> Output : variable 'var1'
: Input : variable 'var2' <---> Output : variable 'var2'
TFHandler_Factory : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.3615 1.1815 [ 0.0010317 4.9864 ]
: var2: 2.4456 1.4269 [ 0.0039980 4.9846 ]
: fvalue: 163.04 79.540 [ 1.8147 358.73 ]
: -----------------------------------------------------------
: Ranking input variables (method unspecific)...
IdTransformation : Ranking result (top variable is best ranked)
: --------------------------------------------
: Rank : Variable : |Correlation with target|
: --------------------------------------------
: 1 : var2 : 7.559e-01
: 2 : var1 : 6.143e-01
: --------------------------------------------
IdTransformation : Ranking result (top variable is best ranked)
: -------------------------------------
: Rank : Variable : Mutual information
: -------------------------------------
: 1 : var2 : 2.014e+00
: 2 : var1 : 1.978e+00
: -------------------------------------
IdTransformation : Ranking result (top variable is best ranked)
: ------------------------------------
: Rank : Variable : Correlation Ratio
: ------------------------------------
: 1 : var1 : 6.270e+00
: 2 : var2 : 2.543e+00
: ------------------------------------
IdTransformation : Ranking result (top variable is best ranked)
: ----------------------------------------
: Rank : Variable : Correlation Ratio (T)
: ----------------------------------------
: 1 : var2 : 1.051e+00
: 2 : var1 : 5.263e-01
: ----------------------------------------
Factory : Train method: PDEFoam for Regression
:
: Build mono target regression foam
: Elapsed time: 0.261 sec
: Elapsed time for training with 1000 events: 0.265 sec
: Dataset[datasetreg] : Create results for training
: Dataset[datasetreg] : Evaluation of PDEFoam on training sample
: Dataset[datasetreg] : Elapsed time for evaluation of 1000 events: 0.0033 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: ␛[0;36mdatasetreg/weights/TMVARegression_PDEFoam.weights.xml␛[0m
: writing foam MonoTargetRegressionFoam to file
: Foams written to file: ␛[0;36mdatasetreg/weights/TMVARegression_PDEFoam.weights_foams.root␛[0m
Factory : Training finished
:
Factory : Train method: KNN for Regression
:
KNN : <Train> start...
: Reading 1000 events
: Number of signal events 1000
: Number of background events 0
: Creating kd-tree with 1000 events
: Computing scale factor for 1d distributions: (ifrac, bottom, top) = (80%, 10%, 90%)
ModulekNN : Optimizing tree for 2 variables with 1000 values
: <Fill> Class 1 has 1000 events
: Elapsed time for training with 1000 events: 0.000658 sec
: Dataset[datasetreg] : Create results for training
: Dataset[datasetreg] : Evaluation of KNN on training sample
: Dataset[datasetreg] : Elapsed time for evaluation of 1000 events: 0.00451 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: ␛[0;36mdatasetreg/weights/TMVARegression_KNN.weights.xml␛[0m
Factory : Training finished
:
Factory : Train method: LD for Regression
:
LD : Results for LD coefficients:
: -----------------------
: Variable: Coefficient:
: -----------------------
: var1: +41.434
: var2: +42.995
: (offset): -81.387
: -----------------------
: Elapsed time for training with 1000 events: 0.000179 sec
: Dataset[datasetreg] : Create results for training
: Dataset[datasetreg] : Evaluation of LD on training sample
: Dataset[datasetreg] : Elapsed time for evaluation of 1000 events: 0.00116 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: ␛[0;36mdatasetreg/weights/TMVARegression_LD.weights.xml␛[0m
Factory : Training finished
:
Factory : Train method: BDTG for Regression
:
: Regression Loss Function: Huber
: Training 2000 Decision Trees ... patience please
: Elapsed time for training with 1000 events: 0.809 sec
: Dataset[datasetreg] : Create results for training
: Dataset[datasetreg] : Evaluation of BDTG on training sample
: Dataset[datasetreg] : Elapsed time for evaluation of 1000 events: 0.151 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: ␛[0;36mdatasetreg/weights/TMVARegression_BDTG.weights.xml␛[0m
: TMVAReg.root:/datasetreg/Method_BDT/BDTG
Factory : Training finished
:
Factory : === Destroy and recreate all methods via weight files for testing ===
:
: Reading weight file: ␛[0;36mdatasetreg/weights/TMVARegression_PDEFoam.weights.xml␛[0m
: Read foams from file: ␛[0;36mdatasetreg/weights/TMVARegression_PDEFoam.weights_foams.root␛[0m
: Reading weight file: ␛[0;36mdatasetreg/weights/TMVARegression_KNN.weights.xml␛[0m
: Creating kd-tree with 1000 events
: Computing scale factor for 1d distributions: (ifrac, bottom, top) = (80%, 10%, 90%)
ModulekNN : Optimizing tree for 2 variables with 1000 values
: <Fill> Class 1 has 1000 events
: Reading weight file: ␛[0;36mdatasetreg/weights/TMVARegression_LD.weights.xml␛[0m
: Reading weight file: ␛[0;36mdatasetreg/weights/TMVARegression_BDTG.weights.xml␛[0m
Factory : ␛[1mTest all methods␛[0m
Factory : Test method: PDEFoam for Regression performance
:
: Dataset[datasetreg] : Create results for testing
: Dataset[datasetreg] : Evaluation of PDEFoam on testing sample
: Dataset[datasetreg] : Elapsed time for evaluation of 9000 events: 0.036 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
Factory : Test method: KNN for Regression performance
:
: Dataset[datasetreg] : Create results for testing
: Dataset[datasetreg] : Evaluation of KNN on testing sample
: Dataset[datasetreg] : Elapsed time for evaluation of 9000 events: 0.0378 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
Factory : Test method: LD for Regression performance
:
: Dataset[datasetreg] : Create results for testing
: Dataset[datasetreg] : Evaluation of LD on testing sample
: Dataset[datasetreg] : Elapsed time for evaluation of 9000 events: 0.0015 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
Factory : Test method: BDTG for Regression performance
:
: Dataset[datasetreg] : Create results for testing
: Dataset[datasetreg] : Evaluation of BDTG on testing sample
: Dataset[datasetreg] : Elapsed time for evaluation of 9000 events: 0.913 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
Factory : ␛[1mEvaluate all methods␛[0m
: Evaluate regression method: PDEFoam
: TestRegression (testing)
: Calculate regression for all events
: Elapsed time for evaluation of 9000 events: 0.0203 sec
: TestRegression (training)
: Calculate regression for all events
: Elapsed time for evaluation of 1000 events: 0.00373 sec
TFHandler_PDEFoam : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.3370 1.1877 [ 0.00020069 5.0000 ]
: var2: 2.4902 1.4378 [ 0.00071490 5.0000 ]
: fvalue: 164.24 84.217 [ 1.6186 394.84 ]
: -----------------------------------------------------------
: Evaluate regression method: KNN
: TestRegression (testing)
: Calculate regression for all events
: Elapsed time for evaluation of 9000 events: 0.0374 sec
: TestRegression (training)
: Calculate regression for all events
: Elapsed time for evaluation of 1000 events: 0.00486 sec
TFHandler_KNN : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.3370 1.1877 [ 0.00020069 5.0000 ]
: var2: 2.4902 1.4378 [ 0.00071490 5.0000 ]
: fvalue: 164.24 84.217 [ 1.6186 394.84 ]
: -----------------------------------------------------------
: Evaluate regression method: LD
: TestRegression (testing)
: Calculate regression for all events
: Elapsed time for evaluation of 9000 events: 0.00283 sec
: TestRegression (training)
: Calculate regression for all events
: Elapsed time for evaluation of 1000 events: 0.000443 sec
TFHandler_LD : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.3370 1.1877 [ 0.00020069 5.0000 ]
: var2: 2.4902 1.4378 [ 0.00071490 5.0000 ]
: fvalue: 164.24 84.217 [ 1.6186 394.84 ]
: -----------------------------------------------------------
: Evaluate regression method: BDTG
: TestRegression (testing)
: Calculate regression for all events
: Elapsed time for evaluation of 9000 events: 0.901 sec
: TestRegression (training)
: Calculate regression for all events
: Elapsed time for evaluation of 1000 events: 0.101 sec
TFHandler_BDTG : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.3370 1.1877 [ 0.00020069 5.0000 ]
: var2: 2.4902 1.4378 [ 0.00071490 5.0000 ]
: fvalue: 164.24 84.217 [ 1.6186 394.84 ]
: -----------------------------------------------------------
:
: Evaluation results ranked by smallest RMS on test sample:
: ("Bias" quotes the mean deviation of the regression from true target.
: "MutInf" is the "Mutual Information" between regression and target.
: Indicated by "_T" are the corresponding "truncated" quantities ob-
: tained when removing events deviating more than 2sigma from average.)
: --------------------------------------------------------------------------------------------------
: --------------------------------------------------------------------------------------------------
: datasetreg BDTG : 0.0489 0.0694 2.42 1.86 | 3.157 3.194
: datasetreg KNN : -1.25 0.0612 7.84 4.47 | 2.870 2.864
: datasetreg PDEFoam : -1.10 -0.585 10.2 8.00 | 2.281 2.331
: datasetreg LD : -0.301 1.50 19.9 17.9 | 1.984 1.960
: --------------------------------------------------------------------------------------------------
:
: Evaluation results ranked by smallest RMS on training sample:
: (overtraining check)
: --------------------------------------------------------------------------------------------------
: DataSet Name: MVA Method: <Bias> <Bias_T> RMS RMS_T | MutInf MutInf_T
: --------------------------------------------------------------------------------------------------
: datasetreg BDTG : 0.0188 0.0107 0.445 0.254 | 3.483 3.514
: datasetreg KNN : -0.486 0.354 5.18 3.61 | 2.948 2.988
: datasetreg PDEFoam :-3.12e-07 0.265 7.58 6.09 | 2.514 2.592
: datasetreg LD :-9.54e-07 1.31 19.0 17.5 | 2.081 2.113
: --------------------------------------------------------------------------------------------------
:
Dataset:datasetreg : Created tree 'TestTree' with 9000 events
:
Dataset:datasetreg : Created tree 'TrainTree' with 1000 events
:
Factory : ␛[1mThank you for using TMVA!␛[0m
: ␛[1mFor citation information, please visit: http://tmva.sf.net/citeTMVA.html␛[0m
==> Wrote root file: TMVAReg.root
==> TMVARegression is done!
#include <cstdlib>
#include <iostream>
#include <map>
#include <string>
void TMVARegression(
TString myMethodList =
"" )
{
std::map<std::string,int> Use;
Use["PDERS"] = 0;
Use["PDEFoam"] = 1;
Use["KNN"] = 1;
Use["LD"] = 1;
Use["FDA_GA"] = 0;
Use["FDA_MC"] = 0;
Use["FDA_MT"] = 0;
Use["FDA_GAMT"] = 0;
Use["MLP"] = 0;
#ifdef R__HAS_TMVAGPU
Use["DNN_GPU"] = 1;
Use["DNN_CPU"] = 0;
#else
Use["DNN_GPU"] = 0;
#ifdef R__HAS_TMVACPU
Use["DNN_CPU"] = 1;
#else
Use["DNN_CPU"] = 0;
#endif
#endif
Use["SVM"] = 0;
Use["BDT"] = 0;
Use["BDTG"] = 1;
std::cout << std::endl;
std::cout << "==> Start TMVARegression" << std::endl;
if (myMethodList != "") {
for (std::map<std::string,int>::iterator it = Use.begin(); it != Use.end(); it++) it->second = 0;
std::vector<TString> mlist =
gTools().SplitString( myMethodList,
',' );
std::string regMethod(mlist[
i].Data());
if (Use.find(regMethod) == Use.end()) {
std::cout << "Method \"" << regMethod << "\" not known in TMVA under this name. Choose among the following:" << std::endl;
for (std::map<std::string,int>::iterator it = Use.begin(); it != Use.end(); it++) std::cout << it->first << " ";
std::cout << std::endl;
return;
}
Use[regMethod] = 1;
}
}
TString outfileName(
"TMVAReg.root" );
"!V:!Silent:Color:DrawProgressBar:AnalysisType=Regression" );
dataloader->
AddVariable(
"var1",
"Variable 1",
"units",
'F' );
dataloader->
AddVariable(
"var2",
"Variable 2",
"units",
'F' );
dataloader->
AddSpectator(
"spec1:=var1*2",
"Spectator 1",
"units",
'F' );
dataloader->
AddSpectator(
"spec2:=var1*3",
"Spectator 2",
"units",
'F' );
TString fname =
gROOT->GetTutorialDir() +
"/tmva/data/tmva_reg_example.root";
if (!
gSystem->AccessPathName( fname )) {
}
std::cout << "ERROR: could not open data file" << std::endl;
exit(1);
}
std::cout <<
"--- TMVARegression : Using input file: " <<
input->GetName() << std::endl;
"nTrain_Regression=1000:nTest_Regression=0:SplitMode=Random:NormMode=NumEvents:!V" );
if (Use["PDERS"])
"!H:!V:NormTree=T:VolumeRangeMode=Adaptive:KernelEstimator=Gauss:GaussSigma=0.3:NEventsMin=40:NEventsMax=60:VarTransform=None" );
if (Use["PDEFoam"])
"!H:!V:MultiTargetRegression=F:TargetSelection=Mpv:TailCut=0.001:VolFrac=0.0666:nActiveCells=500:nSampl=2000:nBin=5:Compress=T:Kernel=None:Nmin=10:VarTransform=None" );
if (Use["KNN"])
"nkNN=20:ScaleFrac=0.8:SigmaFact=1.0:Kernel=Gaus:UseKernel=F:UseWeight=T:!Trim" );
if (Use["LD"])
"!H:!V:VarTransform=None" );
if (Use["FDA_MC"])
"!H:!V:Formula=(0)+(1)*x0+(2)*x1:ParRanges=(-100,100);(-100,100);(-100,100):FitMethod=MC:SampleSize=100000:Sigma=0.1:VarTransform=D" );
if (Use["FDA_GA"])
"!H:!V:Formula=(0)+(1)*x0+(2)*x1:ParRanges=(-100,100);(-100,100);(-100,100):FitMethod=GA:PopSize=100:Cycles=3:Steps=30:Trim=True:SaveBestGen=1:VarTransform=Norm" );
if (Use["FDA_MT"])
"!H:!V:Formula=(0)+(1)*x0+(2)*x1:ParRanges=(-100,100);(-100,100);(-100,100);(-10,10):FitMethod=MINUIT:ErrorLevel=1:PrintLevel=-1:FitStrategy=2:UseImprove:UseMinos:SetBatch" );
if (Use["FDA_GAMT"])
"!H:!V:Formula=(0)+(1)*x0+(2)*x1:ParRanges=(-100,100);(-100,100);(-100,100):FitMethod=GA:Converger=MINUIT:ErrorLevel=1:PrintLevel=-1:FitStrategy=0:!UseImprove:!UseMinos:SetBatch:Cycles=1:PopSize=5:Steps=5:Trim" );
if (Use["MLP"])
factory->
BookMethod( dataloader,
TMVA::Types::kMLP,
"MLP",
"!H:!V:VarTransform=Norm:NeuronType=tanh:NCycles=20000:HiddenLayers=N+20:TestRate=6:TrainingMethod=BFGS:Sampling=0.3:SamplingEpoch=0.8:ConvergenceImprove=1e-6:ConvergenceTests=15:!UseRegulator" );
if (Use["DNN_CPU"] || Use["DNN_GPU"]) {
TString archOption = Use[
"DNN_GPU"] ?
"GPU" :
"CPU";
TString layoutString(
"Layout=TANH|50,TANH|50,TANH|50,LINEAR");
TString trainingStrategyString(
"TrainingStrategy=");
trainingStrategyString +="LearningRate=1e-3,Momentum=0.3,ConvergenceSteps=20,BatchSize=50,TestRepetitions=1,WeightDecay=0.0,Regularization=None,Optimizer=Adam";
TString nnOptions(
"!H:V:ErrorStrategy=SUMOFSQUARES:VarTransform=G:WeightInitialization=XAVIERUNIFORM:Architecture=");
nnOptions.Append(archOption);
nnOptions.Append(":");
nnOptions.Append(layoutString);
nnOptions.Append(":");
nnOptions.Append(trainingStrategyString);
}
if (Use["SVM"])
if (Use["BDT"])
"!H:!V:NTrees=100:MinNodeSize=1.0%:BoostType=AdaBoostR2:SeparationType=RegressionVariance:nCuts=20:PruneMethod=CostComplexity:PruneStrength=30" );
if (Use["BDTG"])
"!H:!V:NTrees=2000::BoostType=Grad:Shrinkage=0.1:UseBaggedBoost:BaggedSampleFraction=0.5:nCuts=20:MaxDepth=3:MaxDepth=4" );
std::cout <<
"==> Wrote root file: " << outputFile->
GetName() << std::endl;
std::cout << "==> TMVARegression is done!" << std::endl;
delete factory;
delete dataloader;
}
int main(
int argc,
char** argv )
{
for (
int i=1;
i<argc;
i++) {
if(regMethod=="-b" || regMethod=="--batch") continue;
methodList += regMethod;
}
TMVARegression(methodList);
return 0;
}
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void input
R__EXTERN TSystem * gSystem
A specialized string object used for TTree selections.
A file, usually with extension .root, that stores data and code in the form of serialized objects in ...
static TFile * Open(const char *name, Option_t *option="", const char *ftitle="", Int_t compress=ROOT::RCompressionSetting::EDefaults::kUseCompiledDefault, Int_t netopt=0)
Create / open a file.
void Close(Option_t *option="") override
Close a file.
void AddSpectator(const TString &expression, const TString &title="", const TString &unit="", Double_t min=0, Double_t max=0)
user inserts target in data set info
void AddRegressionTree(TTree *tree, Double_t weight=1.0, Types::ETreeType treetype=Types::kMaxTreeType)
void SetWeightExpression(const TString &variable, const TString &className="")
void PrepareTrainingAndTestTree(const TCut &cut, const TString &splitOpt)
prepare the training and test trees -> same cuts for signal and background
void AddTarget(const TString &expression, const TString &title="", const TString &unit="", Double_t min=0, Double_t max=0)
user inserts target in data set info
void AddVariable(const TString &expression, const TString &title, const TString &unit, char type='F', Double_t min=0, Double_t max=0)
user inserts discriminating variable in data set info
This is the main MVA steering class.
void TrainAllMethods()
Iterates through all booked methods and calls training.
MethodBase * BookMethod(DataLoader *loader, TString theMethodName, TString methodTitle, TString theOption="")
Book a classifier or regression method.
void TestAllMethods()
Evaluates all booked methods on the testing data and adds the output to the Results in the corresponi...
void EvaluateAllMethods(void)
Iterates over all MVAs that have been booked, and calls their evaluation methods.
const char * GetName() const override
Returns name of object.
A TTree represents a columnar dataset.
create variable transformations
void TMVARegGui(const char *fName="TMVAReg.root", TString dataset="")