Logo ROOT  
Reference Guide
Loading...
Searching...
No Matches
tmva100_DataPreparation.py File Reference

View in nbviewer Open in SWAN
This tutorial illustrates how to prepare ROOT datasets to be nicely readable by most machine learning methods.

This requires filtering the initial complex datasets and writing the data in a flat format.

import ROOT
def filter_events(df):
"""
Reduce initial dataset to only events which shall be used for training
"""
return df.Filter("nElectron>=2 && nMuon>=2", "At least two electrons and two muons")
def define_variables(df):
"""
Define the variables which shall be used for training
"""
return df.Define("Muon_pt_1", "Muon_pt[0]")\
.Define("Muon_pt_2", "Muon_pt[1]")\
.Define("Electron_pt_1", "Electron_pt[0]")\
.Define("Electron_pt_2", "Electron_pt[1]")
variables = ["Muon_pt_1", "Muon_pt_2", "Electron_pt_1", "Electron_pt_2"]
if __name__ == "__main__":
for filename, label in [["SMHiggsToZZTo4L.root", "signal"], ["ZZTo2e2mu.root", "background"]]:
print(">>> Extract the training and testing events for {} from the {} dataset.".format(
label, filename))
# Load dataset, filter the required events and define the training variables
filepath = "root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/" + filename
df = ROOT.RDataFrame("Events", filepath)
df = filter_events(df)
df = define_variables(df)
# Book cutflow report
report = df.Report()
# Split dataset by event number for training and testing
columns = ROOT.std.vector["string"](variables)
df.Filter("event % 2 == 0", "Select events with even event number for training")\
.Snapshot("Events", "train_" + label + ".root", columns)
df.Filter("event % 2 == 1", "Select events with odd event number for training")\
.Snapshot("Events", "test_" + label + ".root", columns)
# Print cutflow report
report.Print()
ROOT's RDataFrame offers a modern, high-level interface for analysis of data stored in TTree ,...
>>> Extract the training and testing events for signal from the SMHiggsToZZTo4L.root dataset.
>>> Extract the training and testing events for background from the ZZTo2e2mu.root dataset.
At least two electrons and two muons: pass=45352 all=299973 -- eff=15.12 % cumulative eff=15.12 %
At least two electrons and two muons: pass=262776 all=1497445 -- eff=17.55 % cumulative eff=17.55 %
Date
August 2019
Author
Stefan Wunsch

Definition in file tmva100_DataPreparation.py.