_rdataframe.pyzdoc
\pythondoc ROOT::RDataFrame

You can use RDataFrame in Python thanks to the dynamic Python/C++ translation of [PyROOT](https://root.cern/manual/python). In general, the interface
is the same as for C++; a simple example follows.

~~~{.py}
import ROOT

df = ROOT.RDataFrame("myTree", "myFile.root")
sum = df.Filter("x > 10").Sum("y")
print(sum.GetValue())
~~~

### User code in the RDataFrame workflow

#### C++ code

In the simple example shown above, a C++ expression is passed to the Filter() operation as a string
(`"x > 10"`), even though we call the method from Python. Indeed, under the hood, the analysis computations run in
C++, while Python is just the interface language.

To perform more complex operations that don't fit into a simple expression string, you can just-in-time compile
C++ functions (via the C++ interpreter cling) and use those functions in an expression. See the following
snippet for an example:

~~~{.py}
# JIT a C++ function from Python
ROOT.gInterpreter.Declare("""
bool myFilter(float x) {
    return x > 10;
}
""")

df = ROOT.RDataFrame("myTree", "myFile.root")
# Use the function in an RDF operation
sum = df.Filter("myFilter(x)").Sum("y")
print(sum.GetValue())
~~~

To increase the performance even further, you can also pre-compile a C++ library with full code optimizations
and load the function into the RDataFrame computation as follows.

~~~{.py}
ROOT.gSystem.Load("path/to/myLibrary.so") # Library with the myFilter function
ROOT.gInterpreter.Declare('#include "myLibrary.h"') # Header with the declaration of the myFilter function
df = ROOT.RDataFrame("myTree", "myFile.root")
sum = df.Filter("myFilter(x)").Sum("y")
print(sum.GetValue())
~~~

A more thorough explanation of how to use C++ code from Python can be found in the [PyROOT manual](https://root.cern/manual/python/#loading-user-libraries-and-just-in-time-compilation-jitting).

#### Python code

ROOT also offers the option to compile Python functions that operate on fundamental types and arrays thereof using [Numba](https://numba.pydata.org/).
Such compiled functions can then be used in a C++ expression provided to RDataFrame.

The function to be compiled should be decorated with `ROOT.Numba.Declare`, which lets you specify the parameter and
return types. See the following snippet for a simple example or the full tutorial [here](pyroot004__NumbaDeclare_8py.html).

~~~{.py}
@ROOT.Numba.Declare(["float"], "bool")
def myFilter(x):
    return x > 10

df = ROOT.RDataFrame("myTree", "myFile.root")
sum = df.Filter("Numba::myFilter(x)").Sum("y")
print(sum.GetValue())
~~~

It also works with collections: `RVec` objects of fundamental types can be transparently converted to/from NumPy arrays:

~~~{.py}
@ROOT.Numba.Declare(['RVec<float>', 'int'], 'RVec<float>')
def pypowarray(numpyvec, pow):
    return numpyvec**pow

df.Define('array', 'ROOT::RVecF{1.,2.,3.}')\
  .Define('arraySquared', 'Numba::pypowarray(array, 2)')
~~~

Note that this functionality requires the Python packages `numba` and `cffi` to be installed.
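
Because a function passed to `ROOT.Numba.Declare` is ordinary Python, it can be checked with plain NumPy input before it is declared. The following is a minimal sketch of that workflow, not an official recipe: it applies `ROOT.Numba.Declare` as a regular function call (equivalent to the decorator syntax) and does so only when ROOT, `numba` and `cffi` can be imported. The one-entry dataframe is an assumption made purely for illustration.

```python
import numpy as np

def pypowarray(numpyvec, pow):
    # Element-wise power, same body as the decorated function above
    return numpyvec**pow

# Plain-Python check with a float32 array standing in for an RVec<float>
check = pypowarray(np.array([1.0, 2.0, 3.0], dtype=np.float32), 2)
print(check)

try:
    import ROOT
    import numba  # noqa: F401 (needed by ROOT.Numba.Declare)
    import cffi   # noqa: F401 (needed by ROOT.Numba.Declare)
except ImportError:
    ROOT = None  # the RDataFrame part below is skipped

if ROOT is not None:
    # Decorator applied as a function call: same effect as @ROOT.Numba.Declare(...)
    ROOT.Numba.Declare(["RVec<float>", "int"], "RVec<float>")(pypowarray)
    # Assumption: a dataframe with a single entry, just to evaluate the expression once
    df = (ROOT.RDataFrame(1)
          .Define("array", "ROOT::RVecF{1., 2., 3.}")
          .Define("arraySquared", "Numba::pypowarray(array, 2)"))
    df.Display("arraySquared").Print()
```

Testing the function in pure Python first keeps type or logic mistakes separate from any errors raised during the Numba compilation step.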

### Interoperability with NumPy

#### Conversion to NumPy arrays

To inspect the content of an RDataFrame or process the data further
with Python libraries, use the `AsNumpy()` function, which returns the columns
of your RDataFrame as a dictionary of NumPy arrays. See a simple example below or a full tutorial [here](df026__AsNumpyArrays_8py.html).

~~~{.py}
df = ROOT.RDataFrame("myTree", "myFile.root")
cols = df.Filter("x > 10").AsNumpy(["x", "y"]) # retrieve columns "x" and "y" as NumPy arrays
print(cols["x"], cols["y"]) # the values of the cols dictionary are NumPy arrays
~~~
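
Since the values returned by `AsNumpy()` are regular NumPy arrays, the usual NumPy idioms apply to them. The sketch below assumes a small in-memory dataframe (created with `ROOT.RDataFrame(5)` and two illustrative `Define` calls) instead of `myFile.root`. When ROOT cannot be imported, a hand-built dictionary with the same layout stands in for the `AsNumpy()` result.

```python
import numpy as np

try:
    import ROOT
except ImportError:
    ROOT = None

if ROOT is not None:
    # Illustrative dataframe: 5 entries with x = 0..4 and y = x * x
    df = ROOT.RDataFrame(5).Define("x", "(double)rdfentry_").Define("y", "x * x")
    cols = df.AsNumpy(["x", "y"])
else:
    # Stand-in with the same layout as the AsNumpy() result
    x = np.arange(5, dtype=np.float64)
    cols = {"x": x, "y": x * x}

# Boolean masking and reductions work as on any NumPy array
mask = cols["x"] > 1
selected_y = cols["y"][mask]
print(selected_y, selected_y.mean())
```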

#### Processing data stored in NumPy arrays

If you have data in NumPy arrays in Python and want to process it with ROOT, you can
create an RDataFrame using `ROOT.RDF.FromNumpy`. The factory function accepts a dictionary where
the keys are the column names and the values are NumPy arrays, and returns a new RDataFrame with the provided
columns.

Only arrays of fundamental types (integers and floating point values) are supported, and the arrays must have the same length.
Data is read directly from the arrays: no copies are performed.

~~~{.py}
import numpy
import ROOT

# Read data from NumPy arrays
# The column names in the RDataFrame are taken from the dictionary keys
x, y = numpy.array([1, 2, 3]), numpy.array([4, 5, 6])
df = ROOT.RDF.FromNumpy({"x": x, "y": y})

# Use RDataFrame as usual, e.g. write out a ROOT file
df.Define("z", "x + y").Snapshot("tree", "file.root")
~~~
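
The two constraints above (fundamental types only, equal lengths) can be checked on the Python side before the dictionary is handed to `ROOT.RDF.FromNumpy`. This is a minimal sketch: `check_columns` is a hypothetical helper, not part of ROOT, and the ROOT call is skipped when ROOT cannot be imported.

```python
import numpy as np

def check_columns(columns):
    """Hypothetical helper: enforce the constraints stated above, return the row count."""
    lengths = set()
    for name, arr in columns.items():
        # dtype.kind: 'i' signed integer, 'u' unsigned integer, 'f' floating point
        if arr.dtype.kind not in "iuf":
            raise ValueError(f"column {name!r} has unsupported dtype {arr.dtype}")
        lengths.add(len(arr))
    if len(lengths) > 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    return lengths.pop() if lengths else 0

columns = {"x": np.array([1, 2, 3]), "y": np.array([4.0, 5.0, 6.0])}
n_rows = check_columns(columns)

try:
    import ROOT
except ImportError:
    ROOT = None

if ROOT is not None:
    df = ROOT.RDF.FromNumpy(columns)
    # Both numbers should agree: entries in the dataframe and rows in the arrays
    print(df.Count().GetValue(), n_rows)
```

Failing early in Python gives a clearer error message than a failure inside the dataframe construction would.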

### Interoperability with [AwkwardArray](https://awkward-array.org/doc/main/user-guide/how-to-convert-rdataframe.html)

The function for RDataFrame to Awkward conversion is ak.from_rdataframe(). It takes a tuple of strings naming the RDataFrame columns to convert. By default, it returns an ak.Array.

~~~{.py}
import awkward as ak
import ROOT

# df is an existing RDataFrame with columns "x", "y" and "z"
array = ak.from_rdataframe(
    df,
    columns=(
        "x",
        "y",
        "z",
    ),
)
~~~

The function for Awkward to RDataFrame conversion is ak.to_rdataframe().

It takes a dictionary of the form { <column name string> : <awkward array> } and always returns an RDataFrame object.

The arrays given for each column must have equal length:

~~~{.py}
array_x = ak.Array(
    [
        {"x": [1.1, 1.2, 1.3]},
        {"x": [2.1, 2.2]},
        {"x": [3.1]},
        {"x": [4.1, 4.2, 4.3, 4.4]},
        {"x": [5.1]},
    ]
)
array_y = ak.Array([1, 2, 3, 4, 5])
array_z = ak.Array([[1.1], [2.1, 2.3, 2.4], [3.1], [4.1, 4.2, 4.3], [5.1]])

assert len(array_x) == len(array_y) == len(array_z)

df = ak.to_rdataframe({"x": array_x, "y": array_y, "z": array_z})
~~~

### Construct histogram and profile models from a tuple

The Histo1D(), Histo2D(), Histo3D(), Profile1D() and Profile2D() methods return
histograms and profiles, respectively, which can be constructed using a model
argument.

In Python, we can specify the arguments for the constructor of such a histogram or
profile model with a Python tuple, as shown in the example below:

~~~{.py}
# First argument is a tuple with the arguments to construct a TH1D model
h = df.Histo1D(("histName", "histTitle", 64, 0., 128.), "myColumn")
~~~
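
The same convention carries over to the other model types: each tuple lists the constructor arguments of the corresponding model, in order. The sketch below assumes TH2D-style arguments (name, title, x binning, y binning) for Histo2D() and TProfile-style arguments (name, title, x binning) for Profile1D(). The dataframe, column names and binnings are illustrative, and the ROOT part is skipped when ROOT cannot be imported.

```python
# Model tuples follow the constructor argument order of the matching ROOT class
h1_model = ("h1", "x distribution", 64, 0.0, 128.0)         # name, title, nbinsx, xlow, xup
h2_model = ("h2", "y vs x", 64, 0.0, 128.0, 32, 0.0, 64.0)  # ... plus nbinsy, ylow, yup
p1_model = ("p1", "mean y vs x", 64, 0.0, 128.0)            # name, title, nbinsx, xlow, xup

try:
    import ROOT
except ImportError:
    ROOT = None

if ROOT is not None:
    # Illustrative dataframe: 100 entries with x = 0..99 and y = x / 2
    df = ROOT.RDataFrame(100).Define("x", "rdfentry_ * 1.0").Define("y", "x / 2.0")
    h1 = df.Histo1D(h1_model, "x")
    h2 = df.Histo2D(h2_model, "x", "y")
    p1 = df.Profile1D(p1_model, "x", "y")
    # Accessing the results triggers the event loop once for all three
    print(h1.GetEntries(), h2.GetEntries(), p1.GetEntries())
```

Keeping the models in named tuples makes it easy to reuse the same binning across several dataframes.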

### AsRNode helper function

The ROOT::RDF::AsRNode function casts an RDataFrame node to the generic ROOT::RDF::RNode type. From Python, it can be used to pass any RDataFrame node as an argument of a C++ function, as shown below:

~~~{.py}
ROOT.gInterpreter.Declare("""
ROOT::RDF::RNode MyTransformation(ROOT::RDF::RNode df) {
    auto myFunc = [](float x){ return -x;};
    return df.Define("y", myFunc, {"x"});
}
""")

# Cast the RDataFrame head node
df = ROOT.RDataFrame("myTree", "myFile.root")
df_transformed = ROOT.MyTransformation(ROOT.RDF.AsRNode(df))

# ... or any other node
df2 = df.Filter("x > 42")
df2_transformed = ROOT.MyTransformation(ROOT.RDF.AsRNode(df2))
~~~

\endpythondoc