df001_introduction.py
## \file
## \ingroup tutorial_dataframe
## \notebook -nodraw
## Basic usage of RDataFrame from Python.
##
## This tutorial illustrates the basic features of the RDataFrame class,
## a utility that lets you interact with data stored in TTrees using
## a functional-chain approach.
##
## \macro_code
## \macro_output
##
## \date May 2017
## \author Danilo Piparo (CERN)

import ROOT

# A simple helper function to fill a test tree: this makes the example stand-alone.
def fill_tree(treeName, fileName):
    df = ROOT.RDataFrame(10)
    df.Define("b1", "(double) rdfentry_")\
      .Define("b2", "(int) rdfentry_ * rdfentry_").Snapshot(treeName, fileName)

# We prepare an input tree to run on
fileName = "df001_introduction_py.root"
treeName = "myTree"
fill_tree(treeName, fileName)


# We read the tree from the file and create an RDataFrame, a class that
# allows us to interact with the data contained in the tree.
d = ROOT.RDataFrame(treeName, fileName)

# Operations on the dataframe
# We now review some *actions* which can be performed on the dataframe.
# Actions can be divided into instant actions (e.g. Foreach()) and lazy
# actions (e.g. Count()), depending on whether they trigger the event
# loop immediately or only when one of the results is accessed for the
# first time. Actions that return "something" either return their result
# wrapped in an RResultPtr or in an RDataFrame.
# But first of all, let us define our cut-flow with two strings.
# Filters can be expressed as strings. The content must be C++ code. The
# names of the variables must be the names of the branches. The code is
# just-in-time compiled.
cutb1 = 'b1 < 5.'
cutb1b2 = 'b2 % 2 && b1 < 4.'

# `Count` action
# The `Count` action returns the number of entries that passed the
# filters. Because the filters are strings, the columns they read are
# deduced from the expressions themselves.
entries1 = d.Filter(cutb1) \
            .Filter(cutb1b2) \
            .Count()

print('{} entries passed all filters'.format(entries1.GetValue()))

entries2 = d.Filter("b1 < 5.").Count()
print('{} entries passed all filters'.format(entries2.GetValue()))
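
# A minimal sketch of what "lazy" means for the actions described above (not
# part of the original tutorial). It assumes a ROOT version in which
# GetNRuns() is available on RDataFrame nodes; lazy_base, lazy_df and the
# other names are illustrative. Booking Count() and Max() does not run the
# event loop: the first GetValue() runs it once and fills both results.
import ROOT  # already imported above; repeated so the sketch runs on its own

lazy_base = ROOT.RDataFrame(10)
lazy_df = lazy_base.Define("x", "(double) rdfentry_")
lazy_count = lazy_df.Filter("x < 5.").Count()  # booked, nothing computed yet
lazy_max = lazy_df.Max("x")                    # booked on the same dataframe
runs_before = lazy_df.GetNRuns()               # no event loop has run so far
n_lazy = lazy_count.GetValue()                 # triggers the single event loop
x_max = lazy_max.GetValue()                    # already filled, no second loop
runs_after = lazy_df.GetNRuns()
print('Event loops run: {} before, {} after'.format(runs_before, runs_after))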

# `Min`, `Max` and `Mean` actions
# These actions return statistical information about the entries
# passing the cuts, if any.
b1b2_cut = d.Filter(cutb1b2)
minVal = b1b2_cut.Min('b1')
maxVal = b1b2_cut.Max('b1')
meanVal = b1b2_cut.Mean('b1')
nonDefmeanVal = b1b2_cut.Mean("b2")
print('The mean is always included between the min and the max: {0} <= {1} <= {2}'.format(
    minVal.GetValue(), meanVal.GetValue(), maxVal.GetValue()))

# `Histo1D` action
# The `Histo1D` action fills a histogram. It returns a TH1D filled
# with the values of the column that passed the filters. For the most common
# types, the type of the values stored in the column is automatically
# guessed.
hist = d.Filter(cutb1).Histo1D('b1')
print('Filled h {0} times, mean: {1}'.format(hist.GetEntries(), hist.GetMean()))
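
# A hedged sketch (not part of the original tutorial) of booking the same
# kind of histogram with an explicit model instead of the default binning.
# The tuple holds name, title, number of bins and axis range; the names
# hist_base, hist_df and model_hist and the binning are illustrative choices.
import ROOT  # already imported above; repeated so the sketch runs on its own

hist_base = ROOT.RDataFrame(10)
hist_df = hist_base.Define("b1", "(double) rdfentry_")
model_hist = hist_df.Filter("b1 < 5.") \
                    .Histo1D(("h_b1", "b1 for b1 < 5;b1;entries", 5, 0., 5.), "b1")
n_filled = model_hist.GetEntries()  # accessing the result runs the event loop
n_bins = model_hist.GetNbinsX()
print('Filled {0} entries into {1} bins'.format(n_filled, n_bins))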

# Express your chain of operations with clarity!
# We are discussing a simple example here, but it is not hard to imagine much
# more complex pipelines of actions acting on data. Those might require code
# which is well organised, for example to add filters conditionally or to
# clearly separate filters and actions without writing the entire pipeline
# on one line. This can be easily achieved.
# We'll show this by reworking the `Count` example:
cutb1_result = d.Filter(cutb1)
cutb1b2_result = d.Filter(cutb1b2)
cutb1_cutb1b2_result = cutb1_result.Filter(cutb1b2)

# Now we want to count:
evts_cutb1_result = cutb1_result.Count()
evts_cutb1b2_result = cutb1b2_result.Count()
evts_cutb1_cutb1b2_result = cutb1_cutb1b2_result.Count()

print('Events passing cutb1: {}'.format(evts_cutb1_result.GetValue()))
print('Events passing cutb1b2: {}'.format(evts_cutb1b2_result.GetValue()))
print('Events passing both: {}'.format(evts_cutb1_cutb1b2_result.GetValue()))
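
# One possible way to add a filter conditionally, as mentioned above (a
# sketch, not part of the original tutorial). build_selection and its
# apply_odd_cut flag are hypothetical names; every Filter() call returns a
# new node, so the chain can be extended step by step in ordinary Python.
import ROOT  # already imported above; repeated so the sketch runs on its own

def build_selection(df, apply_odd_cut):
    node = df.Filter("b1 < 5.")
    if apply_odd_cut:
        node = node.Filter("b2 % 2 == 1")  # only added when requested
    return node

cond_base = ROOT.RDataFrame(10)
cond_df = cond_base.Define("b1", "(double) rdfentry_") \
                   .Define("b2", "(int) (rdfentry_ * rdfentry_)")
n_loose = build_selection(cond_df, False).Count()
n_tight = build_selection(cond_df, True).Count()
print('Loose selection: {}, tight selection: {}'.format(n_loose.GetValue(), n_tight.GetValue()))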

# Calculating quantities starting from existing columns
# Often, operations need to be carried out on quantities calculated starting
# from the ones present in the columns. In this example we create a third
# column, the values of which are the sum of the *b1* and *b2* ones, entry by
# entry. Here the new quantity is defined via a string of C++ code.
# It is important to note three aspects at this point:
# - The value is created on the fly only if the entry passed the existing
#   filters.
# - The newly created column behaves like one present in the file on disk.
# - The operation creates a new value, without modifying anything. De facto,
#   this is like having a general container at disposal able to accommodate
#   any value of any type.
# Let's dive into an example:
entries_sum = d.Define('sum', 'b2 + b1') \
               .Filter('sum > 4.2') \
               .Count()
print(entries_sum.GetValue())
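
# A short sketch (not part of the original tutorial) of the point that a
# defined column behaves like any other: it is listed among the column names
# and can be passed to actions such as Max(). The names sum_base, sum_df,
# sum_cols and sum_max are illustrative.
import ROOT  # already imported above; repeated so the sketch runs on its own

sum_base = ROOT.RDataFrame(10)
sum_df = sum_base.Define("b1", "(double) rdfentry_") \
                 .Define("b2", "(int) (rdfentry_ * rdfentry_)") \
                 .Define("sum", "b2 + b1")
sum_cols = [str(name) for name in sum_df.GetColumnNames()]
sum_max = sum_df.Max("sum")
print('Columns: {0}, largest sum: {1}'.format(sum_cols, sum_max.GetValue()))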