ROOT Reference Guide
 
_rdataframe.pyzdoc
/**
\class ROOT::RDataFrame
\brief \parblock \endparblock
\htmlonly
<div class="pyrootbox">
\endhtmlonly
\anchor python
## Efficient analysis in Python

You can use RDataFrame in Python thanks to the dynamic Python/C++ translation of [PyROOT](https://root.cern/manual/python). In general, the interface
is the same as in C++. A simple example follows.

~~~{.py}
import ROOT

df = ROOT.RDataFrame("myTree", "myFile.root")
sum_y = df.Filter("x > 10").Sum("y")
print(sum_y.GetValue())
~~~
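The value returned by `Sum()` is lazy: RDataFrame books the operations and only runs the event loop when the result is first requested, for example via `GetValue()`. The plain-Python sketch below is a toy model of that behaviour, not ROOT code: the class `LazySum` and the in-memory `rows` are hypothetical stand-ins for the result object and the dataset, meant only to picture what `Filter("x > 10").Sum("y")` computes and when.

~~~{.py}
# Toy model of a lazy RDataFrame result, written in plain Python (no ROOT).
# `rows` and `LazySum` are hypothetical stand-ins, not part of the ROOT API.
rows = [
    {"x": 5.0, "y": 1.0},
    {"x": 12.0, "y": 2.0},
    {"x": 20.0, "y": 3.0},
]


class LazySum:
    """Sum `column` over the rows that pass `predicate`, computed on first access."""

    def __init__(self, rows, predicate, column):
        self._rows = rows
        self._predicate = predicate
        self._column = column
        self._value = None
        self.loops_run = 0  # how many times the toy "event loop" has run

    def GetValue(self):
        # Run the loop only once; later calls return the cached value.
        if self._value is None:
            self.loops_run += 1
            self._value = sum(
                row[self._column] for row in self._rows if self._predicate(row)
            )
        return self._value


result = LazySum(rows, lambda row: row["x"] > 10, "y")
print(result.loops_run)   # 0: nothing has been computed yet
print(result.GetValue())  # 5.0: y summed over the rows with x > 10
print(result.loops_run)   # 1: further calls reuse the cached result
~~~

In real RDataFrame code, booking several actions before the first `GetValue()` call lets all of them be filled in a single event loop.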

### User code in the RDataFrame workflow

#### C++ code

In the simple example shown above, a C++ expression is passed to the Filter() operation as a string
(`"x > 10"`), even though the method is called from Python. Under the hood, the analysis computations run in
C++, while Python is only the interface language.

To perform more complex operations that don't fit into a simple expression string, you can just-in-time compile
C++ functions with cling, the C++ interpreter of ROOT, and use those functions in an expression. See the following
snippet for an example:

~~~{.py}
import ROOT

# JIT a C++ function from Python
ROOT.gInterpreter.Declare("""
bool myFilter(float x) {
    return x > 10;
}
""")

df = ROOT.RDataFrame("myTree", "myFile.root")
# Use the function in an RDF operation
sum_y = df.Filter("myFilter(x)").Sum("y")
print(sum_y.GetValue())
~~~

To increase performance even further, you can also pre-compile a C++ library with full code optimizations
and load the function into the RDataFrame computation as follows.

~~~{.py}
import ROOT

ROOT.gSystem.Load("path/to/myLibrary.so")  # Library with the myFilter function
ROOT.gInterpreter.Declare('#include "myLibrary.h"')  # Header with the declaration of the myFilter function
df = ROOT.RDataFrame("myTree", "myFile.root")
sum_y = df.Filter("myFilter(x)").Sum("y")
print(sum_y.GetValue())
~~~

A more thorough explanation of how to use C++ code from Python can be found in the [PyROOT manual](https://root.cern/manual/python/#loading-user-libraries-and-just-in-time-compilation-jitting).

#### Python code

ROOT also offers the option to compile Python functions with fundamental types and arrays thereof using [Numba](https://numba.pydata.org/).
Such compiled functions can then be used in a C++ expression provided to RDataFrame.

The function to be compiled should be decorated with `ROOT.Numba.Declare`, which lets you specify the parameter and
return types. See the following snippet for a simple example or the full tutorial [here](pyroot004__NumbaDeclare_8py.html).

~~~{.py}
import ROOT

@ROOT.Numba.Declare(["float"], "bool")
def myFilter(x):
    return x > 10

df = ROOT.RDataFrame("myTree", "myFile.root")
# The compiled function is available to C++ expressions in the Numba namespace
sum_y = df.Filter("Numba::myFilter(x)").Sum("y")
print(sum_y.GetValue())
~~~

It also works with collections: `RVec` objects of fundamental types are transparently converted to and from NumPy arrays:

~~~{.py}
import ROOT

@ROOT.Numba.Declare(['RVec<float>', 'int'], 'RVec<float>')
def pypowarray(numpyvec, pow):
    return numpyvec**pow

df = ROOT.RDataFrame(3)  # empty dataset with 3 entries
df2 = df.Define('array', 'ROOT::RVecF{1.,2.,3.}')\
        .Define('arraySquared', 'Numba::pypowarray(array, 2)')
~~~

Note that this functionality requires the Python packages `numba` and `cffi` to be installed.
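Before declaring a function with `ROOT.Numba.Declare`, it can help to check its logic on small inputs in plain Python. The sketch below re-implements the two functions above without ROOT, Numba or NumPy; the `_reference` names are hypothetical, and lists stand in for the `RVec`/NumPy arrays, so it only mirrors the per-entry arithmetic, not the compiled code path.

~~~{.py}
# Plain-Python references for the Numba examples (no ROOT, Numba or NumPy).
# The function names are hypothetical; lists stand in for RVec/NumPy arrays.


def my_filter_reference(x):
    """Same predicate as Numba::myFilter: keep entries with x > 10."""
    return x > 10


def pypowarray_reference(values, power):
    """Element-wise power, as numpyvec**pow computes on a NumPy array."""
    return [v ** power for v in values]


print(my_filter_reference(5.0), my_filter_reference(12.0))  # False True
# Same values as the 'array' column, ROOT::RVecF{1.,2.,3.}
print(pypowarray_reference([1.0, 2.0, 3.0], 2))  # [1.0, 4.0, 9.0]
~~~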

### Interoperability with NumPy

#### Conversion to NumPy arrays

Eventually, you will probably want to inspect the content of the RDataFrame or process the data further
with Python libraries. For this purpose, we provide the `AsNumpy()` function, which returns the columns
of your RDataFrame as a dictionary of NumPy arrays. See a simple example below or a full tutorial [here](df026__AsNumpyArrays_8py.html).

~~~{.py}
import ROOT

df = ROOT.RDataFrame("myTree", "myFile.root")
cols = df.Filter("x > 10").AsNumpy(["x", "y"])  # retrieve columns "x" and "y" as NumPy arrays
print(cols["x"], cols["y"])  # the values of the cols dictionary are NumPy arrays
~~~
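The result of `AsNumpy()` is columnar: one array per requested column, each holding one value for every entry that passed the filters. The plain-Python sketch below, which is not ROOT code, mimics that layout with lists; the helper `as_columns` and the sample rows are hypothetical.

~~~{.py}
# Toy illustration (no ROOT, no NumPy) of the columnar layout returned by AsNumpy().
# `as_columns` and `rows` are hypothetical; lists stand in for NumPy arrays.
rows = [
    {"x": 5.0, "y": 1.0, "z": 0.5},
    {"x": 12.0, "y": 2.0, "z": 0.1},
    {"x": 20.0, "y": 3.0, "z": 0.7},
]


def as_columns(rows, predicate, columns):
    """Return {column name: list of values} for the rows passing `predicate`."""
    selected = [row for row in rows if predicate(row)]
    return {name: [row[name] for row in selected] for name in columns}


cols = as_columns(rows, lambda row: row["x"] > 10, ["x", "y"])
print(cols)  # {'x': [12.0, 20.0], 'y': [2.0, 3.0]}
~~~

Only the requested columns appear in the dictionary, and all of them have the same length.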

#### Processing data stored in NumPy arrays

If you have data in NumPy arrays in Python and want to process it with ROOT, you can easily
create an RDataFrame using `ROOT.RDF.FromNumpy`. The factory function accepts a dictionary where
the keys are the column names and the values are NumPy arrays, and returns a new RDataFrame with the provided
columns.

Only arrays of fundamental types (integers and floating-point values) are supported, and all arrays must have the same length.
Data is read directly from the arrays: no copies are performed.

~~~{.py}
import numpy
import ROOT

# Read data from NumPy arrays
# The column names in the RDataFrame are taken from the dictionary keys
x, y = numpy.array([1, 2, 3]), numpy.array([4, 5, 6])
df = ROOT.RDF.FromNumpy({"x": x, "y": y})

# Use RDataFrame as usual, e.g. write out a ROOT file
df.Define("z", "x + y").Snapshot("tree", "file.root")
~~~
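To make the requirements above concrete, the following plain-Python sketch (no ROOT, no NumPy) checks that all columns have the same length and then computes the values that `Define("z", "x + y")` adds for each entry. The helper names are hypothetical, lists stand in for the NumPy arrays, and this is not how ROOT itself implements these steps.

~~~{.py}
# Plain-Python sketch (no ROOT, no NumPy); helper names are hypothetical.
# Lists stand in for the NumPy arrays passed to ROOT.RDF.FromNumpy.


def check_same_length(columns):
    """Return the common number of entries, or raise ValueError if lengths differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns have different lengths: {lengths}")
    return next(iter(lengths.values()), 0)


def define_sum(columns, new_name, a, b):
    """Return a copy of `columns` with `new_name` = a + b, entry by entry."""
    out = dict(columns)
    out[new_name] = [va + vb for va, vb in zip(columns[a], columns[b])]
    return out


columns = {"x": [1, 2, 3], "y": [4, 5, 6]}
n_entries = check_same_length(columns)
print(n_entries)  # 3
print(define_sum(columns, "z", "x", "y")["z"])  # [5, 7, 9]
~~~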

### Construct histogram and profile models from a tuple

The Histo1D(), Histo2D() and Histo3D() methods return histograms, and the Profile1D() and Profile2D() methods
return profiles. Each of them can take a model argument that specifies how the resulting object is constructed.

In Python, we can specify the arguments for the constructor of such a histogram or
profile model with a Python tuple, as shown in the example below:

~~~{.py}
import ROOT

df = ROOT.RDataFrame("myTree", "myFile.root")
# First argument is a tuple with the arguments to construct a TH1D model
h = df.Histo1D(("histName", "histTitle", 64, 0., 128.), "myColumn")
~~~
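The tuple entries are matched by position to the model constructor: for a one-dimensional model (`ROOT::RDF::TH1DModel`) these are name, title, number of bins, lower edge and upper edge. The plain-Python sketch below, which does not use ROOT, spells out that mapping with a hypothetical `Hist1DArgs` named tuple and computes which bin a value falls into, following ROOT's convention of bin 0 for underflow and bin `nbins + 1` for overflow. The helper `find_bin` is likewise hypothetical.

~~~{.py}
# Plain-Python sketch (no ROOT) of the 1D model tuple and fixed-width binning.
# `Hist1DArgs` and `find_bin` are hypothetical names used for illustration.
from collections import namedtuple

Hist1DArgs = namedtuple("Hist1DArgs", ["name", "title", "nbins", "xlow", "xup"])

model = Hist1DArgs("histName", "histTitle", 64, 0.0, 128.0)
bin_width = (model.xup - model.xlow) / model.nbins


def find_bin(x, nbins, xlow, xup):
    """Bin number for x: 0 = underflow, 1..nbins = in range, nbins + 1 = overflow."""
    if x < xlow:
        return 0
    if x >= xup:
        return nbins + 1
    return 1 + int(nbins * (x - xlow) / (xup - xlow))


print(bin_width)  # 2.0
print([find_bin(v, *model[2:]) for v in (-1.0, 0.0, 1.5, 2.0, 127.9, 128.0)])
# [0, 1, 1, 2, 64, 65]
~~~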

### AsRNode helper function

The ROOT::RDF::AsRNode function casts an RDataFrame node to the generic ROOT::RDF::RNode type. From Python, it can be used to pass any RDataFrame node as an argument of a C++ function, as shown below:

~~~{.py}
import ROOT

ROOT.gInterpreter.Declare("""
ROOT::RDF::RNode MyTransformation(ROOT::RDF::RNode df) {
    auto myFunc = [](float x){ return -x; };
    return df.Define("y", myFunc, {"x"});
}
""")

# Cast the RDataFrame head node
df = ROOT.RDataFrame("myTree", "myFile.root")
df_transformed = ROOT.MyTransformation(ROOT.RDF.AsRNode(df))

# ... or any other node
df2 = df.Filter("x > 42")
df2_transformed = ROOT.MyTransformation(ROOT.RDF.AsRNode(df2))
~~~
\htmlonly
</div>
\endhtmlonly

\anchor reference
*/