\pythondoc ROOT::RDataFrame

You can use RDataFrame in Python thanks to the dynamic Python/C++ translation of [PyROOT](https://root.cern/manual/python). In general, the interface
is the same as for C++; a simple example follows.

~~~{.py}
import ROOT
df = ROOT.RDataFrame("myTree", "myFile.root")  # placeholder tree and file names
sum = df.Filter("x > 10").Sum("y")
print(sum.GetValue())
~~~

### User code in the RDataFrame workflow

#### C++ code

In the simple example shown above, a C++ expression is passed to the Filter() operation as a string
(`"x > 10"`), even though we call the method from Python. Indeed, under the hood, the analysis computations run in
C++, while Python is just the interface language.

To perform more complex operations that don't fit into a simple expression string, you can just-in-time compile
C++ functions - via the C++ interpreter cling - and use those functions in an expression. See the following
snippet for an example:

~~~{.py}
# JIT a C++ function from Python
ROOT.gInterpreter.Declare("""
bool myFilter(float x) {
    return x > 10;
}
""")

df = ROOT.RDataFrame("myTree", "myFile.root")  # placeholder tree and file names
# Use the function in an RDF operation
sum = df.Filter("myFilter(x)").Sum("y")
print(sum.GetValue())
~~~

To improve performance even further, you can also pre-compile a C++ library with full code optimizations
and load the function into the RDataFrame computation as follows.

~~~{.py}
ROOT.gSystem.Load("path/to/myLibrary.so") # Library with the myFilter function
ROOT.gInterpreter.Declare('#include "myLibrary.h"') # Header with the declaration of the myFilter function
df = ROOT.RDataFrame("myTree", "myFile.root")  # placeholder tree and file names
sum = df.Filter("myFilter(x)").Sum("y")
print(sum.GetValue())
~~~

A more thorough explanation of how to use C++ code from Python can be found in the [PyROOT manual](https://root.cern/manual/python/#loading-user-libraries-and-just-in-time-compilation-jitting).

#### Python code

ROOT also offers the option to compile Python functions that operate on fundamental types and arrays thereof using [Numba](https://numba.pydata.org/).
Such compiled functions can then be used in a C++ expression provided to RDataFrame.

The function to be compiled should be decorated with `ROOT.Numba.Declare`, which lets you specify the parameter and
return types. See the following snippet for a simple example or the full tutorial [here](pyroot004__NumbaDeclare_8py.html).

~~~{.py}
@ROOT.Numba.Declare(["float"], "bool")
def myFilter(x):
    return x > 10

df = ROOT.RDataFrame("myTree", "myFile.root")  # placeholder tree and file names
sum = df.Filter("Numba::myFilter(x)").Sum("y")
print(sum.GetValue())
~~~

It also works with collections: `RVec` objects of fundamental types can be transparently converted to/from NumPy arrays:

~~~{.py}
@ROOT.Numba.Declare(['RVec<float>', 'int'], 'RVec<float>')
def pypowarray(numpyvec, pow):
    return numpyvec**pow

df.Define('array', 'ROOT::RVecF{1.,2.,3.}')\
  .Define('arraySquared', 'Numba::pypowarray(array, 2)')
~~~

Note that this functionality requires the Python packages `numba` and `cffi` to be installed.

### Interoperability with NumPy

#### Conversion to NumPy arrays

At some point, you will probably want to inspect the content of the RDataFrame or process the data further
with Python libraries. For this purpose, we provide the `AsNumpy()` function, which returns the columns
of your RDataFrame as a dictionary of NumPy arrays. See a few simple examples below or a full tutorial [here](df026__AsNumpyArrays_8py.html).

\anchor asnumpy_scalar_columns
##### Scalar columns
If your column contains scalar values of fundamental types (e.g., integers, floats), `AsNumpy()` produces NumPy arrays with the appropriate `dtype`:
~~~{.py}
rdf = ROOT.RDataFrame(10).Define("int_col", "1").Define("float_col", "2.3")
print(rdf.AsNumpy(["int_col", "float_col"]))
# Output: {'int_col': array([...], dtype=int32), 'float_col': array([...], dtype=float64)}
~~~

Columns containing non-fundamental types (e.g., objects, strings) will result in NumPy arrays with `dtype=object`.

##### Collection columns
If your column contains collections of fundamental types (e.g., `std::vector<int>`), `AsNumpy()` produces a NumPy array with `dtype=object` where each
element is a NumPy array representing the collection for its corresponding entry in the column.

If the collection at a certain entry contains values of fundamental types, or if it is a regularly shaped multi-dimensional array of a fundamental type,
then the NumPy array representing the collection for that entry will have the `dtype` associated with the value type of the collection, for example:
~~~{.py}
rdf = rdf.Define("v_col", "std::vector<int>{{1, 2, 3}}")
print(rdf.AsNumpy(["v_col", "int_col", "float_col"]))
# Output: {'v_col': array([array([1, 2, 3], dtype=int32), ...], dtype=object), ...}
~~~

If the collection at a certain entry contains values of a non-fundamental type, `AsNumpy()` will fall back on the [default behavior](\ref asnumpy_scalar_columns) and produce a NumPy array with `dtype=object` for that collection.

For more complex collection types in your entries, e.g. when every entry has a jagged array value, refer to the section on [interoperability with AwkwardArray](\ref awkward_interop).

#### Processing data stored in NumPy arrays

If you have data in NumPy arrays in Python and want to process it with ROOT, you can
create an RDataFrame using `ROOT.RDF.FromNumpy`. The factory function accepts a dictionary where
the keys are the column names and the values are NumPy arrays, and returns a new RDataFrame with the provided
columns.

Only arrays of fundamental types (integers and floating point values) are supported, and all arrays must have the same length.
Data is read directly from the arrays: no copies are performed.

~~~{.py}
import numpy
import ROOT

# Read data from NumPy arrays
# The column names in the RDataFrame are taken from the dictionary keys
x, y = numpy.array([1, 2, 3]), numpy.array([4, 5, 6])
df = ROOT.RDF.FromNumpy({"x": x, "y": y})

# Use RDataFrame as usual, e.g. write out a ROOT file
df.Define("z", "x + y").Snapshot("tree", "file.root")
~~~

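Since all arrays must share one length, it can help to validate the dictionary before handing it to `ROOT.RDF.FromNumpy`. The following is a minimal sketch; `check_equal_lengths` is a hypothetical helper, not a ROOT function, and the part that builds the RDataFrame runs only if both NumPy and PyROOT can be imported.

```python
def check_equal_lengths(columns):
    """Return the common length of all columns, or raise ValueError if they differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    # An empty dictionary has no columns, hence length 0
    return next(iter(lengths.values()), 0)


try:
    import numpy
    import ROOT
except ImportError:
    numpy = ROOT = None  # skip the RDataFrame part if either package is missing

if ROOT is not None:
    columns = {
        "x": numpy.array([1, 2, 3], dtype=numpy.int32),
        "y": numpy.array([4.0, 5.0, 6.0]),
    }
    check_equal_lengths(columns)  # raises before ROOT sees inconsistent input
    df = ROOT.RDF.FromNumpy(columns)
    print(df.Sum("x").GetValue())
```
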
\anchor awkward_interop
### Interoperability with [AwkwardArray](https://awkward-array.org/doc/main/user-guide/how-to-convert-rdataframe.html)

Use `ak.from_rdataframe()` to convert an RDataFrame to an Awkward Array. Its `columns` argument takes a tuple of strings naming the RDataFrame columns to convert. By default, the function returns an `ak.Array`.

~~~{.py}
import awkward as ak
import ROOT

array = ak.from_rdataframe(
    df,
    columns=(
        "x",
        "y",
        "z",
    ),
)
~~~

Use `ak.to_rdataframe()` to convert Awkward Arrays to an RDataFrame.

The function takes a dictionary of the form { <column name string> : <awkward array> } and always returns an RDataFrame object.

The arrays given for each column must have equal length:

~~~{.py}
array_x = ak.Array(
    [
        {"x": [1.1, 1.2, 1.3]},
        {"x": [2.1, 2.2]},
        {"x": [3.1]},
        {"x": [4.1, 4.2, 4.3, 4.4]},
        {"x": [5.1]},
    ]
)
array_y = ak.Array([1, 2, 3, 4, 5])
array_z = ak.Array([[1.1], [2.1, 2.3, 2.4], [3.1], [4.1, 4.2, 4.3], [5.1]])

assert len(array_x) == len(array_y) == len(array_z)

df = ak.to_rdataframe({"x": array_x, "y": array_y, "z": array_z})
~~~

### Construct histogram and profile models from a tuple

The Histo1D(), Histo2D(), Histo3D(), Profile1D() and Profile2D() methods return
histograms and profiles, respectively, which can be constructed using a model
argument.

In Python, we can specify the arguments for the constructor of such a histogram or
profile model with a Python tuple, as shown in the example below:

~~~{.py}
# First argument is a tuple with the arguments to construct a TH1D model
h = df.Histo1D(("histName", "histTitle", 64, 0., 128.), "myColumn")
~~~

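The same convention extends to two-dimensional models. The sketch below assumes that the TH2D model takes its arguments in the order (name, title, x bins, x low, x up, y bins, y low, y up) and that the dataframe has columns named `x` and `y`. The histogram names, titles and binning are illustrative, and `book_histograms` is a hypothetical helper. The RDataFrame part runs only if PyROOT can be imported.

```python
# Model tuples mirror the constructor arguments of the histogram models
H1_MODEL = ("h_x", "x distribution;x;entries", 64, 0.0, 128.0)
H2_MODEL = ("h_xy", "y versus x;x;y", 64, 0.0, 128.0, 32, 0.0, 64.0)


def book_histograms(df):
    """Book a 1D and a 2D histogram on df (columns "x" and "y" are assumed)."""
    h1 = df.Histo1D(H1_MODEL, "x")
    h2 = df.Histo2D(H2_MODEL, "x", "y")
    return h1, h2


try:
    import ROOT
except ImportError:
    ROOT = None  # PyROOT not available: skip the RDataFrame part

if ROOT is not None:
    # Toy dataset: x is the entry number, y is half of it
    df = ROOT.RDataFrame(100).Define("x", "double(rdfentry_)").Define("y", "x / 2.")
    h1, h2 = book_histograms(df)
    print(h1.GetEntries(), h2.GetEntries())
```

Keeping the model tuples in named constants makes it easy to reuse the same binning across several dataframes.
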
### AsRNode helper function

The ROOT::RDF::AsRNode function casts an RDataFrame node to the generic ROOT::RDF::RNode type. From Python, it can be used to pass any RDataFrame node as an argument of a C++ function, as shown below:

~~~{.py}
ROOT.gInterpreter.Declare("""
ROOT::RDF::RNode MyTransformation(ROOT::RDF::RNode df) {
    auto myFunc = [](float x){ return -x;};
    return df.Define("y", myFunc, {"x"});
}
""")

# Cast the RDataFrame head node
df = ROOT.RDataFrame("myTree", "myFile.root")  # placeholder tree and file names
df_transformed = ROOT.MyTransformation(ROOT.RDF.AsRNode(df))

# ... or any other node
df2 = df.Filter("x > 42")
df2_transformed = ROOT.MyTransformation(ROOT.RDF.AsRNode(df2))
~~~

\endpythondoc