df019_Cache.C
/// \file
/// \ingroup tutorial_dataframe
/// \notebook -draw
/// Cache a processed RDataFrame in memory for further usage.
///
/// This tutorial shows how the content of a data frame can be cached in memory
/// in the form of a data frame. The content of the columns is stored in
/// contiguous slabs of memory and is "ready to use", i.e. no ROOT IO operation
/// is performed.
///
/// Creating a cached data frame storing all of its content deserialised and uncompressed
/// in memory is particularly useful when dealing with datasets of a moderate size
/// (small enough to fit in RAM) over which several exploratory loops need to be
/// performed as fast as possible. In addition, caching can be useful when no file
/// on disk needs to be created as a side effect of checkpointing part of the analysis.
///
/// All steps in the caching are lazy, i.e. the cached data frame is actually filled
/// only when the event loop is triggered on it.
///
/// \macro_code
/// \macro_image
///
/// \date June 2018
/// \author Danilo Piparo (CERN)

void df019_Cache()
{
   // We create a data frame on top of the hsimple example.
   auto hsimplePath = gROOT->GetTutorialDir();
   hsimplePath += "/hsimple.root";
   ROOT::RDataFrame df("ntuple", hsimplePath.Data());

   // We apply a simple cut and define a new column.
   auto df_cut = df.Filter([](float py) { return py > 0.f; }, {"py"})
                   .Define("px_plus_py", [](float px, float py) { return px + py; }, {"px", "py"});

   // We cache the content of the dataset. Nothing has happened yet: only the work to be done
   // has been described. As for `Snapshot`, the types and columns can be written out explicitly
   // or left for the jitting to handle (`df_cached` is intentionally unused - it shows how
   // to create a *cached* dataframe specifying column types explicitly):
   auto df_cached = df_cut.Cache<float, float>({"px_plus_py", "py"});
   auto df_cached_implicit = df_cut.Cache();
   auto h = df_cached_implicit.Histo1D<float>("px_plus_py");

   // Now the event loop on the cached dataset is triggered. Triggering it also runs the
   // event loop on the original `df` data frame, which has been deferred until now.
   h->DrawCopy();
}
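
The lazy fill described in the header comment can be modelled without ROOT. The sketch below is plain standard C++ and is not how RDataFrame implements `Cache`. `LazyColumnCache`, `DemoResult` and `RunCacheDemo` are hypothetical names introduced only for illustration, and the four input values are made up. It shows the two properties the tutorial relies on: declaring the cache does no work, and the upstream loop runs once no matter how many passes are made over the cached values.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for one cached column; not part of ROOT.
class LazyColumnCache {
public:
   // `producer` plays the role of the upstream event loop (read, filter, define).
   explicit LazyColumnCache(std::function<std::vector<float>()> producer)
      : fProducer(std::move(producer)) {}

   // The first call runs the producer and stores the values contiguously;
   // later calls return the stored values without running it again.
   const std::vector<float> &Values()
   {
      if (!fFilled) {
         fValues = fProducer();
         fFilled = true;
      }
      return fValues;
   }

   bool IsFilled() const { return fFilled; }

private:
   std::function<std::vector<float>()> fProducer;
   std::vector<float> fValues;
   bool fFilled = false;
};

struct DemoResult {
   bool filledBeforeUse;  // was the cache filled right after it was declared?
   int sourceLoops;       // how many times the upstream loop ran
   float sum;             // result of the first pass over the cache
   std::size_t nPositive; // result of the second pass over the cache
};

DemoResult RunCacheDemo()
{
   int sourceLoops = 0;
   LazyColumnCache cache([&sourceLoops]() {
      ++sourceLoops;
      // Made-up input values standing in for the px and py columns.
      const std::vector<float> px = {1.f, -2.f, 3.f, 0.5f};
      const std::vector<float> py = {0.5f, 1.f, -1.f, 2.f};
      std::vector<float> out;
      for (std::size_t i = 0; i < px.size(); ++i) {
         if (py[i] > 0.f)                 // same cut as in the tutorial
            out.push_back(px[i] + py[i]); // same derived column as in the tutorial
      }
      return out;
   });

   DemoResult r{cache.IsFilled(), 0, 0.f, 0};

   // Two exploratory passes; only the first one triggers the upstream loop.
   for (float v : cache.Values())
      r.sum += v;
   for (float v : cache.Values())
      if (v > 0.f)
         ++r.nPositive;

   r.sourceLoops = sourceLoops;
   return r;
}
```

Calling `RunCacheDemo()` from a `main` function or from another macro returns the counters for inspection. In the tutorial itself, the analogous point is that any further action booked on `df_cached_implicit` reads from memory instead of going back to `hsimple.root`.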