ROOT Reference Guide
df026_AsNumpyArrays.py File Reference

Namespaces

namespace df026_AsNumpyArrays

Detailed Description

Read data from RDataFrame into Numpy arrays.
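Conceptually, the conversion shown here turns the entries of a dataframe into a columnar dictionary, with one key per column name and one 1D array per column. Below is a minimal pure-Python model of that layout. It assumes only the standard library and uses the `array` module in place of NumPy. The helper name `to_columns` is hypothetical and not part of ROOT. The typecodes `'i'` and `'f'` stand in for the `int32` and `float32` dtypes that the tutorial output reports for `x` and `y`.

```python
from array import array


def to_columns(n_entries):
    """Hypothetical model of the tutorial's dataframe, read out column by column.

    Mirrors the two Define calls of the script: x = (int)rdfentry_ and
    y = 1.f/(1.f+rdfentry_), where rdfentry_ is the entry number 0..n_entries-1.
    """
    # 'i' is a C int (32 bits on common platforms), standing in for int32.
    x = array("i", range(n_entries))
    # 'f' is a C float, standing in for float32: storing each quotient rounds it
    # to single precision, so the values match the float division in the script.
    y = array("f", [1.0 / (1.0 + entry) for entry in range(n_entries)])
    # One key per column name, one 1D array per column.
    return {"x": x, "y": y}


columns = to_columns(10)
print(list(columns["x"]))
print([round(value, 8) for value in columns["y"]])
```

Rounded to eight decimals, the `y` values of this sketch agree with the single-precision values in the tutorial output, such as 0.33333334 and 0.14285715.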

import ROOT
from sys import exit
# Let's create a simple dataframe with ten rows and two columns
df = ROOT.RDataFrame(10) \
    .Define("x", "(int)rdfentry_") \
    .Define("y", "1.f/(1.f+rdfentry_)")
# Next, we want to access the data from Python as Numpy arrays. To do so, the
# content of the dataframe is converted using the AsNumpy method. The returned
# object is a dictionary with the column names as keys and 1D numpy arrays with
# the content as values.
npy = df.AsNumpy()
print("Read-out of the full RDataFrame:\n{}\n".format(npy))
# Since reading data out into memory is expensive, always try to read out only
# what your analysis needs. You can use all RDataFrame features to reduce the
# dataset, e.g., the Filter transformation. Furthermore, you can pass the AsNumpy
# method a list of columns to include with the `columns` option, or a list of
# columns to leave out with the `exclude` option.
df2 = df.Filter("x>5")
npy2 = df2.AsNumpy()
print("Read-out of the filtered RDataFrame:\n{}\n".format(npy2))
npy3 = df2.AsNumpy(columns=["x"])
print("Read-out of the filtered RDataFrame with the columns option:\n{}\n".format(npy3))
npy4 = df2.AsNumpy(exclude=["x"])
print("Read-out of the filtered RDataFrame with the exclude option:\n{}\n".format(npy4))
# You can read out all objects from ROOT files, since PyROOT wraps them for use
# in Python. However, be aware that reading out objects that are not fundamental
# types, i.e., complex C++ objects rather than int or float, is costly.
ROOT.gInterpreter.Declare("""
// Inject the C++ class CustomObject into the C++ runtime.
class CustomObject {
public:
    int x = 42;
};
// Create a function that returns such an object. This is called to fill the dataframe.
CustomObject fill_object() { return CustomObject(); }
""")
df3 = df.Define("custom_object", "fill_object()")
npy5 = df3.AsNumpy()
print("Read-out of C++ objects:\n{}\n".format(npy5["custom_object"]))
print("Access to all methods and data members of the C++ object:\nObject: {}\nAccess data member: custom_object.x = {}\n".format(
    repr(npy5["custom_object"][0]), npy5["custom_object"][0].x))
# Note that you can pass the object returned by AsNumpy directly to
# pandas.DataFrame, including any complex C++ objects that were read out.
try:
    import pandas
except ImportError:
    print("Please install the pandas package to run this section of the tutorial.")
    exit(1)
# Use a new name so the ROOT dataframe `df` is not shadowed by the pandas one.
pandas_df = pandas.DataFrame(npy5)
print("Content of the ROOT.RDataFrame as pandas.DataFrame:\n{}\n".format(pandas_df))
Read-out of the full RDataFrame:
{'x': ndarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32), 'y': ndarray([1. , 0.5 , 0.33333334, 0.25 , 0.2 ,
0.16666667, 0.14285715, 0.125 , 0.11111111, 0.1 ],
dtype=float32)}
Read-out of the filtered RDataFrame:
{'x': ndarray([6, 7, 8, 9], dtype=int32), 'y': ndarray([0.14285715, 0.125 , 0.11111111, 0.1 ], dtype=float32)}
Read-out of the filtered RDataFrame with the columns option:
{'x': ndarray([6, 7, 8, 9], dtype=int32)}
Read-out of the filtered RDataFrame with the exclude option:
{'y': ndarray([0.14285715, 0.125 , 0.11111111, 0.1 ], dtype=float32)}
Read-out of C++ objects:
[<cppyy.gbl.CustomObject object at 0x9b91200>
<cppyy.gbl.CustomObject object at 0x9b91204>
<cppyy.gbl.CustomObject object at 0x9b91208>
<cppyy.gbl.CustomObject object at 0x9b9120c>
<cppyy.gbl.CustomObject object at 0x9b91210>
<cppyy.gbl.CustomObject object at 0x9b91214>
<cppyy.gbl.CustomObject object at 0x9b91218>
<cppyy.gbl.CustomObject object at 0x9b9121c>
<cppyy.gbl.CustomObject object at 0x9b91220>
<cppyy.gbl.CustomObject object at 0x9b91224>]
Access to all methods and data members of the C++ object:
Object: <cppyy.gbl.CustomObject object at 0x9b91200>
Access data member: custom_object.x = 42
Content of the ROOT.RDataFrame as pandas.DataFrame:
custom_object x y
0 <cppyy.gbl.CustomObject object at 0x9b91200> 0 1.000000
1 <cppyy.gbl.CustomObject object at 0x9b91204> 1 0.500000
2 <cppyy.gbl.CustomObject object at 0x9b91208> 2 0.333333
3 <cppyy.gbl.CustomObject object at 0x9b9120c> 3 0.250000
4 <cppyy.gbl.CustomObject object at 0x9b91210> 4 0.200000
5 <cppyy.gbl.CustomObject object at 0x9b91214> 5 0.166667
6 <cppyy.gbl.CustomObject object at 0x9b91218> 6 0.142857
7 <cppyy.gbl.CustomObject object at 0x9b9121c> 7 0.125000
8 <cppyy.gbl.CustomObject object at 0x9b91220> 8 0.111111
9 <cppyy.gbl.CustomObject object at 0x9b91224> 9 0.100000
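In the script, `Filter` removes rows before anything is read out. The `columns` and `exclude` options only decide which keys end up in the returned dictionary. The sketch below models that split in plain Python on a dictionary of lists. The helper `read_out` and the predicate `x_above_five` are hypothetical and not ROOT API. The sketch assumes, for illustration, that passing neither option selects every column and that `exclude` is applied after `columns`.

```python
def read_out(table, keep_row=None, columns=None, exclude=None):
    """Hypothetical model of df.Filter(...).AsNumpy(columns=..., exclude=...).

    table maps each column name to a list of values; all lists have equal length.
    keep_row receives one row as a dict and returns True to keep that row.
    """
    # Column selection: start from `columns` (or every column), then drop `exclude`.
    names = list(columns) if columns else list(table)
    names = [name for name in names if name not in (exclude or [])]
    # Row selection: like Filter, the predicate may look at any column,
    # including columns that are not part of the read-out.
    n_rows = len(next(iter(table.values()), []))
    kept = [
        i for i in range(n_rows)
        if keep_row is None or keep_row({name: values[i] for name, values in table.items()})
    ]
    return {name: [table[name][i] for i in kept] for name in names}


def x_above_five(row):
    # Plays the role of the "x>5" filter expression in the script.
    return row["x"] > 5


table = {"x": list(range(10)), "y": [1.0 / (1.0 + i) for i in range(10)]}
print(read_out(table, x_above_five, columns=["x"]))  # {'x': [6, 7, 8, 9]}
print(read_out(table, x_above_five, exclude=["x"]))
```

The second call keeps only rows with `x` above five, even though `x` itself is excluded from the result. This mirrors the `exclude` read-out in the output above, which contains four `y` values and no `x` key.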
Date
December 2018
Author
Stefan Wunsch (KIT, CERN)

Definition in file df026_AsNumpyArrays.py.