You can use RDataFrame in Python thanks to the dynamic Python/C++ translation of PyROOT.
In general, the interface is the same as for C++; a simple example follows.

```python
import ROOT

df = ROOT.RDataFrame("myTree", "myFile.root")
sum = df.Filter("x > 10").Sum("y")
```

### User code in the RDataFrame workflow

In the simple example shown above, a C++ expression is passed to the Filter() operation as a string
(`"x > 10"`), even though the method is called from Python. Under the hood, the analysis computations run in
C++, while Python is just the interface language.
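
Because the expression is a C++ string, Python variables are not automatically visible inside it; their values have to be formatted into the string first. The following is a minimal sketch of that idea. `make_cut` is a hypothetical helper introduced only for illustration and is not part of ROOT.

```python
def make_cut(column, op, value):
    """Build a C++ comparison expression such as "x > 10" from Python values.

    Hypothetical helper for illustration; it is not part of ROOT.
    The value is expected to be an int or a float.
    """
    allowed = {"<", "<=", ">", ">=", "==", "!="}
    if op not in allowed:
        raise ValueError(f"unsupported operator: {op!r}")
    # repr() keeps the full precision of floats
    return f"{column} {op} {value!r}"


threshold = 10
cut = make_cut("x", ">", threshold)
print(cut)
```

The resulting string can be passed to `Filter()` like any other expression, e.g. `df.Filter(cut).Sum("y")`.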

To perform more complex operations that don't fit into a simple expression string, you can just-in-time compile
C++ functions via the C++ interpreter cling and use those functions in an expression. See the following
snippet for an example:

```python
import ROOT

# JIT a C++ function from Python
ROOT.gInterpreter.Declare("""
bool myFilter(float x) {
    return x > 10;
}
""")

df = ROOT.RDataFrame("myTree", "myFile.root")
# Use the function in an RDF operation
sum = df.Filter("myFilter(x)").Sum("y")
```

To increase the performance even further, you can also pre-compile a C++ library with full code optimizations
and load the function into the RDataFrame computation as follows.

```python
ROOT.gSystem.Load("path/to/myLibrary.so")  # Library with the myFilter function
ROOT.gInterpreter.Declare('#include "myLibrary.h"')  # Header with the declaration of the myFilter function
df = ROOT.RDataFrame("myTree", "myFile.root")
sum = df.Filter("myFilter(x)").Sum("y")
```

A more thorough explanation of how to use C++ code from Python can be found in the [PyROOT manual](https://root.cern/manual/python/#loading-user-libraries-and-just-in-time-compilation-jitting).

ROOT also offers the option to compile Python functions that operate on fundamental types and arrays thereof using [Numba](https://numba.pydata.org/).
Such compiled functions can then be used in a C++ expression provided to RDataFrame.

The function to be compiled should be decorated with `ROOT.Numba.Declare`, which lets you specify the parameter and
return types. See the following snippet for a simple example or the full tutorial [here](pyroot004__NumbaDeclare_8py.html).

```python
@ROOT.Numba.Declare(["float"], "bool")
def myFilter(x):
    return x > 10

df = ROOT.RDataFrame("myTree", "myFile.root")
sum = df.Filter("Numba::myFilter(x)").Sum("y")
```

It also works with collections: `RVec` objects of fundamental types can be transparently converted to/from NumPy arrays:

```python
@ROOT.Numba.Declare(['RVec<float>', 'int'], 'RVec<float>')
def pypowarray(numpyvec, pow):
    return numpyvec**pow

# The 'array' column is defined here with assumed values so that the example is complete
df.Define('array', 'ROOT::RVecF{1., 2., 3.}')\
  .Define('arraySquared', 'Numba::pypowarray(array, 2)')
```

Note that this functionality requires the Python packages `numba` and `cffi` to be installed.

### Interoperability with NumPy

#### Conversion to NumPy arrays

Eventually, you will probably want to inspect the content of the RDataFrame or process the data further
with Python libraries. For this purpose, we provide the `AsNumpy()` function, which returns the columns
of your RDataFrame as a dictionary of NumPy arrays. See a few simple examples below or a full tutorial [here](df026__AsNumpyArrays_8py.html).

\anchor asnumpy_scalar_columns

If your column contains scalar values of fundamental types (e.g., integers, floats), `AsNumpy()` produces NumPy arrays with the appropriate `dtype`:

```python
rdf = ROOT.RDataFrame(10).Define("int_col", "1").Define("float_col", "2.3")
print(rdf.AsNumpy(["int_col", "float_col"]))
# Output: {'int_col': array([...], dtype=int32), 'float_col': array([...], dtype=float64)}
```

Columns containing non-fundamental types (e.g., objects, strings) will result in NumPy arrays with `dtype=object`.

##### Collection Columns

If your column contains collections of fundamental types (e.g., `std::vector<int>`), `AsNumpy()` produces a NumPy array with `dtype=object` where each
element is a NumPy array representing the collection for its corresponding entry in the column.

If the collection at a certain entry contains values of fundamental types, or if it is a regularly shaped multi-dimensional array of a fundamental type,
then the NumPy array representing the collection for that entry will have the `dtype` associated with the value type of the collection, for example:

```python
rdf = rdf.Define("v_col", "std::vector<int>{{1, 2, 3}}")
print(rdf.AsNumpy(["v_col", "int_col", "float_col"]))
# Output: {'v_col': array([array([1, 2, 3], dtype=int32), ...], dtype=object), ...}
```

If the collection at a certain entry contains values of a non-fundamental type, `AsNumpy()` will fall back on the [default behavior](\ref asnumpy_scalar_columns) and produce a NumPy array with `dtype=object` for that collection.

For more complex collection types in your entries, e.g. when every entry has a jagged array value, refer to the section on [interoperability with AwkwardArray](\ref awkward_interop).
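
A collection column returned by `AsNumpy()` is a one-dimensional object array whose elements are themselves NumPy arrays, so vectorized operations do not apply to it directly. The sketch below builds such a structure by hand and shows two common ways to work with it in plain NumPy. Here `v_col` is only a stand-in for `rdf.AsNumpy(["v_col"])["v_col"]`, and its values are made up for illustration.

```python
import numpy as np

# Stand-in for rdf.AsNumpy(["v_col"])["v_col"]: one inner array per entry.
# The values are made up for illustration.
entries = [
    np.array([1, 2, 3], dtype=np.int32),
    np.array([4], dtype=np.int32),
    np.array([5, 6], dtype=np.int32),
]
v_col = np.empty(len(entries), dtype=object)
for i, entry in enumerate(entries):
    v_col[i] = entry

# Per-entry quantities need an explicit loop over the outer array
sizes = np.array([len(v) for v in v_col])
sums = np.array([v.sum() for v in v_col])

# Flatten all collections into a single typed array
flat = np.concatenate(list(v_col))

print(sizes, sums, flat, flat.dtype)
```

Filling a pre-allocated object array element by element avoids NumPy turning equally sized inner arrays into a two-dimensional array.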

#### Processing data stored in NumPy arrays

If you have data in NumPy arrays in Python and want to process it with ROOT, you can
create an RDataFrame using `ROOT.RDF.FromNumpy`. The factory function accepts a dictionary where
the keys are the column names and the values are NumPy arrays, and returns a new RDataFrame with the provided
columns.

Only arrays of fundamental types (integers and floating point values) are supported, and the arrays must have the same length.
Data is read directly from the arrays: no copies are performed.

```python
import numpy
import ROOT

# Read data from NumPy arrays
# The column names in the RDataFrame are taken from the dictionary keys
x, y = numpy.array([1, 2, 3]), numpy.array([4, 5, 6])
df = ROOT.RDF.FromNumpy({"x": x, "y": y})

# Use RDataFrame as usual, e.g. write out a ROOT file
df.Define("z", "x + y").Snapshot("tree", "file.root")
```
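
Since `FromNumpy` expects equally long arrays of fundamental types, it can help to check the inputs before building the data frame. The sketch below uses a hypothetical helper, `check_numpy_columns`, which is not part of ROOT. Accepting only the signed integer, unsigned integer and floating point dtype kinds is an assumption derived from the requirement stated above.

```python
import numpy as np


def check_numpy_columns(columns):
    """Validate a {name: array} dict before handing it to ROOT.RDF.FromNumpy.

    Hypothetical helper, not part of ROOT. Returns the common length.
    """
    if not columns:
        raise ValueError("no columns given")
    lengths = set()
    for name, arr in columns.items():
        if not isinstance(arr, np.ndarray):
            raise TypeError(f"column {name!r} is not a NumPy array")
        # dtype.kind: 'i' signed int, 'u' unsigned int, 'f' floating point
        if arr.dtype.kind not in "iuf":
            raise TypeError(f"column {name!r} has unsupported dtype {arr.dtype}")
        lengths.add(len(arr))
    if len(lengths) != 1:
        raise ValueError(f"columns differ in length: {sorted(lengths)}")
    return lengths.pop()


x, y = np.array([1, 2, 3]), np.array([4.0, 5.0, 6.0])
n_entries = check_numpy_columns({"x": x, "y": y})
print(n_entries)
```

Once the check passes, the same dictionary can be passed to `ROOT.RDF.FromNumpy`.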

\anchor awkward_interop
### Interoperability with [AwkwardArray](https://awkward-array.org/doc/main/user-guide/how-to-convert-rdataframe.html)

The function for RDataFrame to Awkward conversion is `ak.from_rdataframe()`. Its `columns` argument accepts a tuple of strings that are the RDataFrame column names. By default, this function returns an `ak.Array`.

```python
import awkward as ak

# The column names below are assumed for illustration
array = ak.from_rdataframe(df, columns=("x", "y", "z"))
```

The function for Awkward to RDataFrame conversion is `ak.to_rdataframe()`.

The argument to this function must be a dictionary: `{ <column name string> : <awkward array> }`. This function always returns an RDataFrame object.

The arrays given for each column must have equal length:

```python
import awkward as ak

array_x = ak.Array([
    {"x": [1.1, 1.2, 1.3]},
    {"x": [2.1, 2.2]},  # records 2, 3 and 5 are illustrative values
    {"x": [3.1]},
    {"x": [4.1, 4.2, 4.3, 4.4]},
    {"x": [5.1]},
])
array_y = ak.Array([1, 2, 3, 4, 5])
array_z = ak.Array([[1.1], [2.1, 2.3, 2.4], [3.1], [4.1, 4.2, 4.3], [5.1]])

assert len(array_x) == len(array_y) == len(array_z)

df = ak.to_rdataframe({"x": array_x, "y": array_y, "z": array_z})
```
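
The equal-length requirement can be checked up front for any number of columns. The following is a minimal sketch, assuming only that each column supports `len()`, as `ak.Array` does. `common_length` is a hypothetical helper, and plain lists stand in for Awkward arrays so that the snippet runs without extra packages.

```python
def common_length(columns):
    """Return the shared length of all columns, or raise if they differ.

    Hypothetical helper, not part of Awkward Array or ROOT.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns differ in length: {lengths}")
    # next(iter(...)) picks the single shared value; 0 for an empty dict
    return next(iter(lengths.values()), 0)


# Plain lists stand in for ak.Array objects here
columns = {
    "y": [1, 2, 3, 4, 5],
    "z": [[1.1], [2.1, 2.3, 2.4], [3.1], [4.1, 4.2, 4.3], [5.1]],
}
print(common_length(columns))
```

Raising an error that names each column and its length makes a mismatch easier to locate than a bare `assert`.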

### Construct histogram and profile models from a tuple

The Histo1D(), Histo2D(), Histo3D(), Profile1D() and Profile2D() methods return
histograms and profiles, respectively, which can be constructed using a model.

In Python, we can specify the arguments for the constructor of such a histogram or
profile model with a Python tuple, as shown in the example below:

```python
# First argument is a tuple with the arguments to construct a TH1D model
h = df.Histo1D(("histName", "histTitle", 64, 0., 128.), "myColumn")
```
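
The tuple elements follow the argument order of the model constructor; for a TH1D model that is name, title, number of bins, lower edge and upper edge. The sketch below uses a hypothetical helper, `th1d_model`, to build such a tuple with basic sanity checks. The helper is not part of ROOT.

```python
def th1d_model(name, title, nbins, xlow, xup):
    """Return a tuple usable as the first argument of Histo1D().

    Hypothetical helper, not part of ROOT. The element order mirrors the
    TH1D model constructor: (name, title, nbinsx, xlow, xup).
    """
    if nbins <= 0:
        raise ValueError("nbins must be positive")
    if not xlow < xup:
        raise ValueError("xlow must be smaller than xup")
    return (str(name), str(title), int(nbins), float(xlow), float(xup))


model = th1d_model("histName", "histTitle", 64, 0, 128)
print(model)
```

The tuple can then be passed in place of the literal one, e.g. `df.Histo1D(model, "myColumn")`.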

### AsRNode helper function

The ROOT::RDF::AsRNode function casts an RDataFrame node to the generic ROOT::RDF::RNode type. From Python, it can be used to pass any RDataFrame node as an argument of a C++ function, as shown below:

```python
ROOT.gInterpreter.Declare("""
ROOT::RDF::RNode MyTransformation(ROOT::RDF::RNode df) {
    auto myFunc = [](float x){ return -x;};
    return df.Define("y", myFunc, {"x"});
}
""")

# Cast the RDataFrame head node
df = ROOT.RDataFrame("myTree", "myFile.root")
df_transformed = ROOT.MyTransformation(ROOT.RDF.AsRNode(df))

# ... or any other node
df2 = df.Filter("x > 42")
df2_transformed = ROOT.MyTransformation(ROOT.RDF.AsRNode(df2))
```