df007_snapshot.py
## \file
## \ingroup tutorial_dataframe
## \notebook -draw
## This tutorial shows how to write out datasets in ROOT format using RDataFrame.
## \macro_code
##
## \date April 2017
## \author Danilo Piparo

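import ROOT

# A simple helper function to fill a test tree: this makes the example stand-alone.
def fill_tree(treeName, fileName):
    # Assumption: `df` is not defined anywhere in this listing, so an RDataFrame
    # with no input source is created here. The entry count (10000) is an
    # illustrative choice.
    df = ROOT.RDataFrame(10000)
    df.Define("b1", "(int) rdfentry_")\
      .Define("b2", "(float) rdfentry_ * rdfentry_").Snapshot(treeName, fileName)
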
# We prepare an input tree to run on
fileName = "df007_snapshot_py.root"
outFileName = "df007_snapshot_output_py.root"
outFileNameAllColumns = "df007_snapshot_output_allColumns_py.root"
treeName = "myTree"
fill_tree(treeName, fileName)

# We read the tree from the file and create a RDataFrame.
d = ROOT.RDataFrame(treeName, fileName)

# ## Select entries
# We now select some entries in the dataset
d_cut = d.Filter("b1 % 2 == 0")
# ## Enrich the dataset
# Build some temporary columns: we'll write them out

getVector_code = '''
std::vector<float> getVector (float b2)
{
   std::vector<float> v;
   for (int i = 0; i < 3; i++) v.push_back(b2*i);
   return v;
}
'''
ROOT.gInterpreter.Declare(getVector_code)

d2 = d_cut.Define("b1_square", "b1 * b1") \
          .Define("b2_vector", "getVector( b2 )")

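# The Filter and Define steps above can be mirrored in plain Python to see which
# values end up in each column. This is only an illustrative sketch and not part
# of the ROOT API: `get_vector` and `expected_rows` are hypothetical helpers, and
# the sketch assumes the input tree holds b1 = entry index and b2 = entry index
# squared, as written by fill_tree.

```python
def get_vector(b2):
    """Plain-Python counterpart of the C++ getVector helper: [b2*0, b2*1, b2*2]."""
    return [b2 * i for i in range(3)]


def expected_rows(n_entries):
    """Return the rows that would survive the cut, with the derived columns added."""
    rows = []
    for entry in range(n_entries):
        b1 = entry                  # mirrors "(int) rdfentry_"
        b2 = float(entry * entry)   # mirrors "(float) rdfentry_ * rdfentry_"
        if b1 % 2 != 0:             # mirrors Filter("b1 % 2 == 0")
            continue
        rows.append({
            "b1": b1,
            "b2": b2,
            "b1_square": b1 * b1,         # mirrors Define("b1_square", ...)
            "b2_vector": get_vector(b2),  # mirrors Define("b2_vector", ...)
        })
    return rows


for row in expected_rows(5):
    print(row)
```

# Note that a C++ float is single precision while a Python float is double
# precision, so large values may differ slightly from what ROOT stores.
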
# ## Write it to disk in ROOT format
# We now write to disk a new dataset with one of the variables originally
# present in the tree and the new variables.
# In C++ the column types can be given explicitly as template arguments of
# the Snapshot method; when they are not given, as here, they are inferred
# automatically.
branchList = ROOT.vector('string')()
for branchName in ["b1", "b1_square", "b2_vector"]:
    branchList.push_back(branchName)
d2.Snapshot(treeName, outFileName, branchList)

# Open the new file and list the columns of the tree
f1 = ROOT.TFile(outFileName)
t = f1.myTree
print("These are the columns b1, b1_square and b2_vector:")
for branch in t.GetListOfBranches():
    print("Branch: %s" % branch.GetName())

f1.Close()

# We are not forced to list every column name explicitly: a regular
# expression selecting the columns can be passed instead. If no column
# selection is given, all columns are written out.
d2.Snapshot(treeName, outFileNameAllColumns)

# Open the new file and list the columns of the tree
f2 = ROOT.TFile(outFileNameAllColumns)
t = f2.myTree
print("These are all the columns available to this dataframe:")
for branch in t.GetListOfBranches():
    print("Branch: %s" % branch.GetName())

f2.Close()

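# To illustrate how a regular expression narrows down the set of written
# columns, here is a small stand-alone sketch using Python's `re` module. It
# does not call ROOT: `select_columns` is a hypothetical helper, and it assumes
# that a pattern has to match the whole column name to select it.

```python
import re


def select_columns(all_columns, pattern):
    """Return the column names that the pattern matches in full."""
    regex = re.compile(pattern)
    return [name for name in all_columns if regex.fullmatch(name)]


columns = ["b1", "b2", "b1_square", "b2_vector"]

# Columns whose name starts with "b1"
print(select_columns(columns, "b1.*"))

# Columns whose name ends with "_vector"
print(select_columns(columns, ".*_vector"))
```

# With ROOT, the pattern string would be passed to Snapshot in place of the
# column list, along the lines of d2.Snapshot(treeName, someFileName, "b1.*"),
# where someFileName is a placeholder for an output path of your choice.
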
# We can also get a fresh RDataFrame out of the snapshot and restart the
# analysis chain from it.

branchList.clear()
branchList.push_back("b1_square")
snapshot_df = d2.Snapshot(treeName, outFileName, branchList)
h = snapshot_df.Histo1D("b1_square")
c = ROOT.TCanvas()
h.Draw()
