This tutorial shows how you can train a machine learning model with any package, reading the training data directly from ROOT files.
Using XGBoost, we illustrate how you can convert an externally trained model into a format that is serializable and readable by the fast tree inference engine offered by TMVA.
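from xgboost import XGBClassifier
import ROOT
import numpy as np

from tmva100_DataPreparation import variables


def load_data(signal_filename, background_filename):
    # Read data from ROOT files (tree name "Events", as written by tmva100_DataPreparation)
    data_sig = ROOT.RDataFrame("Events", signal_filename).AsNumpy()
    data_bkg = ROOT.RDataFrame("Events", background_filename).AsNumpy()

    # Convert inputs to a format readable by machine learning tools:
    # one row per event, one column per variable
    x_sig = np.vstack([data_sig[var] for var in variables]).T
    x_bkg = np.vstack([data_bkg[var] for var in variables]).T
    x = np.vstack([x_sig, x_bkg])

    # Create labels: 1 for signal, 0 for background
    num_sig = x_sig.shape[0]
    num_bkg = x_bkg.shape[0]
    y = np.hstack([np.ones(num_sig), np.zeros(num_bkg)])

    # Compute weights balancing both classes
    num_all = num_sig + num_bkg
    w = np.hstack([np.ones(num_sig) * num_all / num_sig, np.ones(num_bkg) * num_all / num_bkg])

    return x, y, w


if __name__ == "__main__":
    # Load data
    x, y, w = load_data("train_signal.root", "train_background.root")

    # Fit XGBoost model
    bdt = XGBClassifier(max_depth=3, n_estimators=500)
    bdt.fit(x, y, sample_weight=w)

    # Save model in TMVA format
    print("Training done on", x.shape[0], "events. Saving model in tmva101.root")
    ROOT.TMVA.Experimental.SaveXGBoost(bdt, "myBDT", "tmva101.root", num_inputs=x.shape[1])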
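The input conversion and class weighting done in load_data can be tried without ROOT or XGBoost. The following is a minimal sketch, assuming the columns arrive as a dictionary of one-dimensional NumPy arrays; the variable names "pt" and "eta" and all numbers are invented for illustration and are not taken from the tutorial's data set.

```python
import numpy as np

# Hypothetical stand-ins for the column dictionaries read from the ROOT files
variables = ["pt", "eta"]
data_sig = {"pt": np.array([10.0, 20.0]),
            "eta": np.array([0.1, -0.2])}
data_bkg = {"pt": np.array([5.0, 6.0, 7.0, 8.0, 9.0, 4.0, 3.0, 2.0]),
            "eta": np.array([1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7])}

# vstack gives one row per variable; transposing yields one row per event
x_sig = np.vstack([data_sig[var] for var in variables]).T
x_bkg = np.vstack([data_bkg[var] for var in variables]).T
x = np.vstack([x_sig, x_bkg])

# Labels: 1 for signal, 0 for background
num_sig, num_bkg = x_sig.shape[0], x_bkg.shape[0]
y = np.hstack([np.ones(num_sig), np.zeros(num_bkg)])

# Per-event weights chosen so that both classes carry the same total weight
num_all = num_sig + num_bkg
w = np.hstack([np.ones(num_sig) * num_all / num_sig,
               np.ones(num_bkg) * num_all / num_bkg])

print(x.shape)          # (10, 2)
print(w[y == 1].sum())  # 10.0
print(w[y == 0].sum())  # 10.0
```

With 2 signal and 8 background events, each signal event gets weight 5 and each background event weight 1.25, so both classes sum to the total number of events and the smaller class is not underrepresented in the fit.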
Date: August 2019
Author: Stefan Wunsch
Definition in file tmva101_Training.py.