Logo ROOT  
Reference Guide
No Matches
df019_Cache.C File Reference

Detailed Description

View in nbviewer Open in SWAN
Cache a processed RDataFrame in memory for further usage.

This tutorial shows how the content of a data frame can be cached in memory in form of a dataframe. The content of the columns is stored in memory in contiguous slabs of memory and is "ready to use", i.e. no ROOT IO operation is performed.

Creating a cached data frame storing all of its content deserialised and uncompressed in memory is particularly useful when dealing with datasets of a moderate size (small enough to fit the RAM) over which several explorative loops need to be performed as fast as possible. In addition, caching can be useful when no file on disk needs to be created as a side effect of checkpointing part of the analysis.

All steps in the caching are lazy, i.e. the cached data frame is actually filled only when the event loop is triggered on it.

// We create a data frame on top of the hsimple example.
auto hsimplePath = gROOT->GetTutorialDir();
hsimplePath += "/hsimple.root";
ROOT::RDataFrame df("ntuple", hsimplePath.Data());
// We apply a simple cut and define a new column.
auto df_cut = df.Filter([](float py) { return py > 0.f; }, {"py"})
.Define("px_plus_py", [](float px, float py) { return px + py; }, {"px", "py"});
// We cache the content of the dataset. Nothing has happened yet: the work to accomplish
// has been described. As for `Snapshot`, the types and columns can be written out explicitly
// or left for the jitting to handle (`df_cached` is intentionally unused - it shows how
// to create a *cached* dataframe specifying column types explicitly):
auto df_cached = df_cut.Cache<float, float>({"px_plus_py", "py"});
auto df_cached_implicit = df_cut.Cache();
auto h = df_cached_implicit.Histo1D<float>("px_plus_py");
// Now the event loop on the cached dataset is triggered. This event triggers the loop
// on the `df` data frame lazily.
#define h(i)
Definition RSha256.hxx:106
#define gROOT
Definition TROOT.h:406
ROOT's RDataFrame offers a modern, high-level interface for analysis of data stored in TTree ,...
June 2018
Danilo Piparo (CERN)

Definition in file df019_Cache.C.