You can use RDataFrame in Python thanks to the dynamic Python/C++ translation of [PyROOT](https:
The interface is the same as for C++; a simple example follows.
sum = df.Filter("x > 10").Sum("y")
### User code in the RDataFrame workflow
In the simple example shown above, a C++ expression is passed to the Filter() operation as a string
(`"x > 10"`), even though we call the method from Python. Under the hood, the analysis computations run in
C++, while Python is just the interface language.
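
To make concrete what the chain from the simple example computes, the following plain-Python stand-in mimics `Filter("x > 10").Sum("y")` on a handful of made-up entries. It needs no ROOT installation; the `rows` list and its values are hypothetical, and RDataFrame itself evaluates the expression in C++ rather than in a Python loop.

```python
# Hypothetical stand-in data: one dict per entry, with columns "x" and "y".
rows = [
    {"x": 5.0, "y": 1.0},
    {"x": 12.0, "y": 2.0},
    {"x": 30.0, "y": 4.0},
]

# Filter("x > 10") keeps only the entries whose x exceeds 10 ...
selected = (row for row in rows if row["x"] > 10)

# ... and Sum("y") adds up the y values of the surviving entries.
total = sum(row["y"] for row in selected)
print(total)
```

Only the second and third entries pass the selection, so only their `y` values contribute to the sum.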
To perform more complex operations that don't fit into a simple expression string, you can just-in-time compile
C++ functions (via the C++ interpreter cling) and use those functions in an expression. See the following
snippet for an example:
# JIT a C++ function from Python
ROOT.gInterpreter.Declare("""
bool myFilter(float x) {
    return x > 10; // illustrative condition
}
""")

# Use the function in an RDF operation
sum = df.Filter("myFilter(x)").Sum("y")
To increase the performance even further, you can also pre-compile a C++ library with full code optimizations
and load the function into the RDataFrame computation as follows.
ROOT.gSystem.Load("path/to/myLibrary.so") # Library with the myFilter function
ROOT.gInterpreter.Declare('#include "myLibrary.h"') # Header with the declaration of the myFilter function

sum = df.Filter("myFilter(x)").Sum("y")
A more thorough explanation of how to use C++ code from Python can be found in the [PyROOT manual](https:
ROOT also offers the option to compile Python functions that operate on fundamental types and arrays thereof using [Numba](https:
Such compiled functions can then be used in a C++ expression provided to RDataFrame.

The function to be compiled should be decorated with `ROOT.Numba.Declare`, which lets you specify the parameter and
return types. See the following snippet for a simple example or the full tutorial [here](pyroot004__NumbaDeclare_8py.html).
@ROOT.Numba.Declare(["float"], "bool")
def myFilter(x):  # illustrative function name and body
    return x > 10
It also works with collections: `RVec` objects of fundamental types can be transparently converted to/from numpy arrays:
@ROOT.Numba.Declare(['RVec<float>', 'int'], 'RVec<float>')
def pypowarray(numpyvec, pow):
    return numpyvec**pow

df.Define('array', 'ROOT::RVecF{1.,2.,3.}')\
  .Define('arraySquared', 'Numba::pypowarray(array, 2)')
Note that this functionality requires the Python packages `numba` and `cffi` to be installed.
### Interoperability with NumPy

#### Conversion to NumPy arrays
Eventually, you will probably want to inspect the content of the RDataFrame or process the data further
with Python libraries. For this purpose, we provide the `AsNumpy()` function, which returns the columns
of your RDataFrame as a dictionary of NumPy arrays. See a few simple examples below or a full tutorial [here](df026__AsNumpyArrays_8py.html).
\anchor asnumpy_scalar_columns
If your column contains scalar values of fundamental types (e.g., integers, floats), `AsNumpy()` produces NumPy arrays with the appropriate `dtype`:
print(rdf.AsNumpy(["int_col", "float_col"]))
# Output: {'int_col': array([...], dtype=int32), 'float_col': array([...], dtype=float64)}
Columns containing non-fundamental types (e.g., objects, strings) will result in NumPy arrays with `dtype=object`.
If your column contains collections of fundamental types (e.g., std::vector<int>), `AsNumpy()` produces a NumPy array with `dtype=object` where each
element is a NumPy array representing the collection for its corresponding entry in the column.
If the collection at a certain entry contains values of fundamental types, or if it is a regularly shaped multi-dimensional array of a fundamental type,
then the numpy array representing the collection for that entry will have the `dtype` associated with the value type of the collection, for example:
rdf = rdf.Define("v_col", "std::vector<int>{{1, 2, 3}}")
print(rdf.AsNumpy(["v_col", "int_col", "float_col"]))
# Output: {'v_col': array([array([1, 2, 3], dtype=int32), ...], dtype=object), ...}
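
The nested layout described above can be reproduced with NumPy alone, which may help when deciding how to post-process such a column. The sketch below builds by hand an array shaped like the `v_col` output (the second entry and its values are made up for illustration) and then flattens it; it does not call `AsNumpy()` itself.

```python
import numpy as np

# Hand-built stand-in for the 'v_col' result: an object array whose
# elements are int32 arrays, one per entry (values are illustrative).
v_col = np.empty(2, dtype=object)
v_col[0] = np.array([1, 2, 3], dtype=np.int32)
v_col[1] = np.array([4, 5], dtype=np.int32)

print(v_col.dtype)     # the outer array holds Python objects
print(v_col[0].dtype)  # each inner array keeps the value type

# Per-entry sizes and a flat array of all values across entries.
sizes = [len(entry) for entry in v_col]
flat = np.concatenate(list(v_col))
```

Keeping the per-entry sizes alongside the flattened values makes it possible to map each value back to its entry later on.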
If the collection at a certain entry contains values of a non-fundamental type, `AsNumpy()` will fall back on the [default behavior](\ref asnumpy_scalar_columns) and produce a NumPy array with `dtype=object` for that collection.
For more complex collection types in your entries, e.g. when every entry has a jagged array value, refer to the section on [interoperability with AwkwardArray](\ref awkward_interop).
#### Processing data stored in NumPy arrays
If you have data in NumPy arrays in Python and want to process it with ROOT, you can easily
create an RDataFrame using `ROOT.RDF.FromNumpy`. The factory function accepts a dictionary where
the keys are the column names and the values are NumPy arrays, and returns a new RDataFrame with the provided columns.
Only arrays of fundamental types (integers and floating point values) are supported, and all arrays must have the same length.
Data is read directly from the arrays: no copies are performed.
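
Because only equally long arrays of fundamental types are accepted, it can be convenient to check a dictionary before handing it over. The helper below is a hypothetical sketch: the name `check_columns` is not part of ROOT, and treating the NumPy dtype kinds `i`, `u` and `f` as the supported "fundamental" types is an assumption of this example. It uses only NumPy.

```python
import numpy as np

def check_columns(columns):
    """Raise ValueError unless all arrays are numeric and equally long."""
    lengths = set()
    for name, array in columns.items():
        # dtype.kind is 'i'/'u' for integers and 'f' for floating point.
        if array.dtype.kind not in "iuf":
            raise ValueError(f"column {name!r} has unsupported dtype {array.dtype}")
        lengths.add(len(array))
    if len(lengths) > 1:
        raise ValueError(f"columns differ in length: {sorted(lengths)}")
    return True

x, y = np.array([1, 2, 3]), np.array([4, 5, 6])
ok = check_columns({"x": x, "y": y})
```

A dictionary that passes this check could then be given to the factory function; one that fails raises a `ValueError` naming the offending column or the mismatched lengths.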
# Read data from NumPy arrays
# The column names in the RDataFrame are taken from the dictionary keys
x, y = numpy.array([1, 2, 3]), numpy.array([4, 5, 6])
df = ROOT.RDF.FromNumpy({"x": x, "y": y})

# Use RDataFrame as usual, e.g. write out a ROOT file
df.Define("z", "x + y").Snapshot("tree", "file.root")
\anchor awkward_interop
### Interoperability with [AwkwardArray](https:
The function for RDataFrame to Awkward conversion is ak.from_rdataframe(). It takes a tuple of strings naming the RDataFrame columns to convert and, by default, returns an ak.Array.
array = ak.from_rdataframe(
The function for Awkward to RDataFrame conversion is ak.to_rdataframe().

Its argument is a dictionary of the form { <column name string> : <awkward array> }. This function always returns an RDataFrame object.
The arrays given for each column must have equal length:
    {"x": [1.1, 1.2, 1.3]},
    {"x": [4.1, 4.2, 4.3, 4.4]},
array_y = ak.Array([1, 2, 3, 4, 5])
array_z = ak.Array([[1.1], [2.1, 2.3, 2.4], [3.1], [4.1, 4.2, 4.3], [5.1]])

assert len(array_x) == len(array_y) == len(array_z)

df = ak.to_rdataframe({"x": array_x, "y": array_y, "z": array_z})
### Construct histogram and profile models from a tuple
The Histo1D(), Histo2D(), Histo3D(), Profile1D() and Profile2D() methods return
histograms and profiles, respectively, which can be constructed using a model.
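
As an illustration of what a one-dimensional histogram model carries, the plain-Python sketch below unpacks a model given as a tuple and derives the bin width and the bin a value would fall into. It assumes the TH1D constructor order (name, title, number of bins, lower edge, upper edge) and ROOT's convention that regular bins are numbered from 1, with 0 and nbins + 1 reserved for underflow and overflow; `find_bin` is a hypothetical helper, not a ROOT function.

```python
import math

model = ("histName", "histTitle", 64, 0., 128.)
name, title, nbins, xlow, xup = model

bin_width = (xup - xlow) / nbins  # uniform binning

def find_bin(value):
    """Return the 1-based bin index, 0 for underflow, nbins + 1 for overflow."""
    if value < xlow:
        return 0
    if value >= xup:
        return nbins + 1
    return 1 + math.floor((value - xlow) / bin_width)

print(bin_width, find_bin(3.0), find_bin(127.9), find_bin(200.0))
```

With 64 bins between 0 and 128, each bin is 2 units wide, so a value of 3.0 lands in the second bin and anything at or above 128 counts as overflow.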
In Python, we can specify the arguments for the constructor of such a histogram or
profile model with a Python tuple, as shown in the example below:
# First argument is a tuple with the arguments to construct a TH1D model
h = df.Histo1D(("histName", "histTitle", 64, 0., 128.), "myColumn")
### AsRNode helper function
ROOT.gInterpreter.Declare("""
ROOT::RDF::RNode MyTransformation(ROOT::RDF::RNode df) {
    auto myFunc = [](float x){ return -x;};
    return df.Define("y", myFunc, {"x"});
}
""")

# Cast the RDataFrame head node
df_transformed = ROOT.MyTransformation(ROOT.RDF.AsRNode(df))

# ... or any other node
df2 = df.Filter("x > 42")
df2_transformed = ROOT.MyTransformation(ROOT.RDF.AsRNode(df2))