ROOT Reference Guide
 
RDataFrame analysis tutorials

These examples show various features of RDataFrame: ROOT's declarative analysis interface.

RDataFrame offers a high level interface for the analysis of data stored in TTrees, CSV files and other data formats.

In addition, multi-threading and other low-level optimisations allow users to exploit all the resources available on their machines transparently.

In a nutshell:

ROOT::EnableImplicitMT(); // Enable ROOT's implicit multi-threading
ROOT::RDataFrame d("myTree", "file_*.root"); // Interface to TTree and TChain
auto histoA = d.Histo1D("Branch_A"); // Book the filling of a histogram
auto histoB = d.Histo1D("Branch_B"); // Book the filling of another histogram
// Data processing is triggered by the next line, which accesses a booked result for the first time
// All booked results are evaluated during the same parallel event loop.
histoA->Draw(); // <-- event loop runs here!
histoB->Draw(); // histoB has already been filled, so no event loop runs here
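
The "book first, run once" behaviour described in the comments above can be illustrated without ROOT. The following is a minimal sketch in plain C++ and is not how RDataFrame is implemented: `ToyFrame`, `BookSum`, `BookMax` and `LoopCount` are hypothetical names invented for this illustration. Results are booked first, and a single loop over the data fills all of them the first time any result is read.

```cpp
#include <algorithm>
#include <functional>
#include <limits>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy class (not part of ROOT): results are booked first,
// and one loop over the data fills all of them on first access.
class ToyFrame {
public:
    explicit ToyFrame(std::vector<double> data) : fData(std::move(data)) {}

    // Handle to a booked result. Reading it runs the loop if it has not run yet.
    class Result {
    public:
        Result(ToyFrame &owner, std::shared_ptr<double> value)
            : fOwner(&owner), fValue(std::move(value)) {}

        double Get() const {
            fOwner->RunLoopOnce();
            return *fValue;
        }

    private:
        ToyFrame *fOwner;
        std::shared_ptr<double> fValue;
    };

    // Book the sum of all values; nothing is computed yet.
    Result BookSum() {
        auto value = std::make_shared<double>(0.0);
        fActions.push_back([value](double x) { *value += x; });
        return Result(*this, value);
    }

    // Book the maximum of all values; nothing is computed yet.
    Result BookMax() {
        auto value = std::make_shared<double>(std::numeric_limits<double>::lowest());
        fActions.push_back([value](double x) { *value = std::max(*value, x); });
        return Result(*this, value);
    }

    // Number of loops over the data executed so far.
    int LoopCount() const { return fLoopCount; }

private:
    // Toy simplification: the loop runs at most once, so results booked
    // after the first access are never filled.
    void RunLoopOnce() {
        if (fLoopCount > 0) return;
        for (double x : fData)
            for (auto &action : fActions)
                action(x);
        ++fLoopCount;
    }

    std::vector<double> fData;
    std::vector<std::function<void(double)>> fActions;
    int fLoopCount = 0;
};
```

With this toy, booking a sum and a maximum on the values {1.0, 2.0, 3.5} leaves `LoopCount()` at 0. Reading the sum runs the loop once, and reading the maximum afterwards does not run it again.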

Explore the examples below or go to RDataFrame's user guide. A list of all the RDataFrame tutorials can be found here.

Table of contents

Introduction

To get started, these examples show how to create a simple RDataFrame, how to process data in a simple analysis, and how to plot distributions.

Tutorial Description
df000_simple.C df000_simple.py Simple RDataFrame example in C++ and Python.
df001_introduction.C df001_introduction.py Basic RDataFrame usage.
df002_dataModel.C df002_dataModel.py Show how to work with non-flat data models, e.g. vectors of tracks.

Processing your data

A collection of building block examples for your analysis.

Tutorial Description
df003_profiles.C df003_profiles.py Use TProfiles.
df005_fillAnyObject.C Fill any object whose class exposes a Fill method.
df006_ranges.C df006_ranges.py Use Range to limit the amount of data processed.
df012_DefinesAndFiltersAsStrings.C df012_DefinesAndFiltersAsStrings.py Use just-in-time-compiled Filters and Defines for quick prototyping.
df016_vecOps.C df016_vecOps.py Process collections in RDataFrame with the help of RVec.
df018_customActions.C Implement a custom action to fill THns.
df020_helpers.C Show usage of RDataFrame's helper tools.
df021_createTGraph.C df021_createTGraph.py Fill a TGraph.
df022_useKahan.C Implement a custom action that evaluates a Kahan sum.
df023_aggregate.C Use the Aggregate action to specify arbitrary data aggregations.
df025_RNode.C Manipulate RDF objects in functions, loops and conditional branches.
df036_missingBranches.C Deal with missing values due to a missing branch when switching to a new file in a chain.
df037_TTreeEventMatching.C Deal with missing values due to not finding a matching event in an auxiliary dataset.
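
For readers unfamiliar with the technique named in df022_useKahan.C: Kahan (compensated) summation carries a running correction term that recovers the low-order bits lost when a small value is added to a much larger sum. The sketch below is the textbook algorithm in plain C++, independent of RDataFrame. The function names `KahanSum` and `NaiveSum` are illustrative assumptions and are not taken from the tutorial.

```cpp
#include <vector>

// Plain left-to-right summation, shown for comparison.
double NaiveSum(const std::vector<double> &values) {
    double sum = 0.0;
    for (double v : values)
        sum += v;
    return sum;
}

// Kahan (compensated) summation.
double KahanSum(const std::vector<double> &values) {
    double sum = 0.0;
    double compensation = 0.0; // rounding error accumulated so far
    for (double v : values) {
        const double y = v - compensation; // apply the pending correction
        const double t = sum + y;          // low-order bits of y may be lost here
        compensation = (t - sum) - y;      // recover what was lost (negated)
        sum = t;
    }
    return sum;
}
```

As a usage example, summing 1.0 followed by ten values of 1e-16 with `NaiveSum` leaves the result at 1.0 in IEEE double precision, because each addend is below half the spacing between adjacent doubles near 1.0. `KahanSum` returns a value close to 1.000000000000001. Note that aggressive floating-point optimisations (for instance a fast-math compiler mode) can remove the compensation step.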

Write and read from many sources

The content of a dataframe can be written to a ROOT file. In addition to ROOT files, other file formats can be read.

Tutorial Description
df007_snapshot.C df007_snapshot.py Write out a dataset.
df008_createDataSetFromScratch.C df008_createDataSetFromScratch.py Generate data from scratch.
df009_FromScratchVSTTree.C Compare creation of a ROOT dataset with RDataFrame and TTree.
df010_trivialDataSource.C df010_trivialDataSource.py Simplest possible data source.
df014_CSVDataSource.C df014_CSVDataSource.py Process a CSV.
df015_LazyDataSource.C Concatenate computation graphs with the "lazy data source".
df019_Cache.C df019_Cache.py Cache a processed RDataFrame in memory for further usage.
df027_SQliteDependencyOverVersion.C Analyse a remote sqlite3 file.
df028_SQliteIPLocation.C Plot the location of ROOT downloads reading a remote sqlite3 file.
df029_SQlitePlatformDistribution.C Analyse data in a sqlite3 file.
df030_SQliteVersionsOfROOT.C Analyse data in a sqlite3 file and create a plot.

Interface with NumPy and Pandas

From Python, NumPy arrays can be imported into RDataFrame and columns from RDataFrame can be converted to NumPy arrays. A Pandas DataFrame can also be converted into an RDataFrame.

Tutorial Description
df026_AsNumpyArrays.py Read data into NumPy arrays.
df032_RDFFromNumpy.py Read data from NumPy arrays.
df035_RDFFromPandas.py Read data from a Pandas DataFrame.

Distributed execution in Python

RDataFrame applications can be executed in parallel on a set of remote machines through the distributed computing frameworks Apache Spark or Dask.

Tutorial Description
distrdf001_spark_connection.py Configure a Spark connection and fill two histograms distributedly.
distrdf002_dask_connection.py Configure a Dask connection and fill two histograms distributedly.
distrdf003_live_visualization.py Configure a Dask connection and visualize the filling of a 1D and a 2D histogram distributedly.

Know more about your analysis

RDataFrame provides methods to inspect the data and the computation graph.

Tutorial Description
df004_cutFlowReport.C df004_cutFlowReport.py Display cut/Filter efficiencies.
df013_InspectAnalysis.C Use callbacks to update a plot and a progress bar during the event loop.
df024_Display.C df024_Display.py Use the Display action to inspect entry values.
df031_Stats.C df031_Stats.py Use the Stats action to extract the statistics of a column.
df033_Describe.py Get information about your analysis.
df034_SaveGraph.C df034_SaveGraph.py Look at the DAG of your analysis.

Example HEP analysis tutorials

With RDataFrame, advanced analyses can be executed on large amounts of data. These examples show how particle physics analyses can be carried out using Open Data from different experiments.

Tutorial Description
df017_vecOpsHEP.C df017_vecOpsHEP.py Use RVecs to plot the transverse momentum of selected particles.
df101_h1Analysis.C Express ROOT's standard H1 analysis.
df102_NanoAODDimuonAnalysis.C df102_NanoAODDimuonAnalysis.py Process NanoAOD files.
df103_NanoAODHiggsAnalysis.C df103_NanoAODHiggsAnalysis.py An example of complex analysis: reconstructing the Higgs boson.
df104_HiggsToTwoPhotons.py The Higgs to two photons analysis from the ATLAS Open Data 2020 release.
df105_WBosonAnalysis.py The W boson mass analysis from the ATLAS Open Data release of 2020.
df106_HiggsToFourLeptons.C df106_HiggsToFourLeptons.py The Higgs to four lepton analysis from the ATLAS Open Data release of 2020.
df107_SingleTopAnalysis.py A single top analysis using the ATLAS Open Data release of 2020.

Files

file df000_simple.C
Simple RDataFrame example in C++.

file df000_simple.py
Simple RDataFrame example in Python.

file df001_introduction.C
Basic RDataFrame usage.

file df001_introduction.py
Basic usage of RDataFrame from Python.

file df002_dataModel.C
Show how to work with non-flat data models, e.g. vectors of tracks.

file df002_dataModel.py
Show how to work with non-flat data models, e.g. vectors of tracks.

file df003_profiles.C
Use TProfiles with RDataFrame.

file df003_profiles.py
Use TProfiles with RDataFrame.

file df004_cutFlowReport.C
Display cut/Filter efficiencies with RDataFrame.

file df004_cutFlowReport.py
Display cut/Filter efficiencies with RDataFrame.

file df005_fillAnyObject.C
Using the generic Fill action.

file df006_ranges.C
Use Range to limit the amount of data processed.

file df006_ranges.py
Use Range to limit the amount of data processed.

file df007_snapshot.C
Write ROOT data with RDataFrame.

file df007_snapshot.py
Write ROOT data with RDataFrame.

file df008_createDataSetFromScratch.C
Create data from scratch with RDataFrame.

file df008_createDataSetFromScratch.py
Create data from scratch with RDataFrame.

file df009_FromScratchVSTTree.C
Compare creation of a ROOT dataset with RDataFrame and TTree.

file df010_trivialDataSource.C
Use the "trivial data source", an example data source implementation.

file df010_trivialDataSource.py
Use the "trivial data source", an example data source implementation.

file df012_DefinesAndFiltersAsStrings.C
Use just-in-time-compiled Filters and Defines for quick prototyping.

file df012_DefinesAndFiltersAsStrings.py
Use just-in-time-compiled Filters and Defines for quick prototyping.

file df013_InspectAnalysis.C
Use callbacks to update a plot and a progress bar during the event loop.

file df014_CSVDataSource.C
Process a CSV file with RDataFrame and the CSV data source.

file df014_CSVDataSource.py
Process a CSV file with RDataFrame and the CSV data source.
 
file df015_LazyDataSource.C
Use the lazy RDataFrame data source to concatenate computation graphs.

file df016_vecOps.C
Process collections in RDataFrame with the help of RVec.

file df016_vecOps.py
Process collections in RDataFrame with the help of RVec.

file df017_vecOpsHEP.C
Use RVecs to plot the transverse momentum of selected particles.

file df017_vecOpsHEP.py
Use RVecs to plot the transverse momentum of selected particles.

file df018_customActions.C
Implement a custom action to fill THns.

file df019_Cache.C
Cache a processed RDataFrame in memory for further usage.

file df019_Cache.py
Cache a processed RDataFrame in memory for further usage.

file df020_helpers.C
Show usage of RDataFrame's helper tools, contained in ROOT/RDFHelpers.hxx.

file df021_createTGraph.C
Fill a TGraph using RDataFrame.

file df021_createTGraph.py
Fill a TGraph using RDataFrame.

file df022_useKahan.C
Implement a custom action that evaluates a Kahan sum.

file df023_aggregate.C
Use the Aggregate action to specify arbitrary data aggregations.

file df024_Display.C
Use the Display action to inspect entry values.

file df024_Display.py
Use the Display action to inspect entry values.

file df025_RNode.C
Manipulate RDF objects in functions, loops and conditional branches.

file df026_AsNumpyArrays.py
Read data from RDataFrame into NumPy arrays.

file df027_SQliteDependencyOverVersion.C
Plot the ROOT downloads based on the version reading a remote sqlite3 file.

file df028_SQliteIPLocation.C
Plot the location of ROOT downloads reading a remote sqlite3 file.

file df029_SQlitePlatformDistribution.C
Use RDataFrame to display data about ROOT downloads.

file df030_SQliteVersionsOfROOT.C
Read an sqlite3 database with RDataFrame and plot statistics on ROOT downloads.

file df031_Stats.C
Use the Stats action to extract the statistics of a column.

file df031_Stats.py
Use the Stats action to extract the statistics of a column.

file df032_RDFFromNumpy.py
Read data from NumPy arrays into RDataFrame.

file df033_Describe.py
Get information about the dataframe with the convenience method Describe.

file df034_SaveGraph.C
Basic SaveGraph usage.

file df034_SaveGraph.py
Basic SaveGraph usage.

file df035_RDFFromPandas.py
Read data from a Pandas DataFrame into RDataFrame.

file df036_missingBranches.C
Deal with missing values due to a missing branch when switching to a new file in a chain.

file df037_TTreeEventMatching.C
Deal with missing values due to not finding a matching event in an auxiliary dataset.
 
file df038_NumbaDeclare.py
This tutorial illustrates how PyROOT supports declaring C++ callables from Python callables, making them usable, for example, with RDataFrame.

file df101_h1Analysis.C
Show how to express ROOT's standard H1 analysis with RDataFrame.

file df102_NanoAODDimuonAnalysis.C
Show how NanoAOD files can be processed with RDataFrame.

file df102_NanoAODDimuonAnalysis.py
Show how NanoAOD files can be processed with RDataFrame.

file df103_NanoAODHiggsAnalysis.C
An example of complex analysis with RDataFrame: reconstructing the Higgs boson.

file df103_NanoAODHiggsAnalysis.py
An example of complex analysis with RDataFrame: reconstructing the Higgs boson.

file df103_NanoAODHiggsAnalysis_python.h
Header file with functions needed to execute the Python version of the NanoAOD Higgs tutorial.

file df104_HiggsToTwoPhotons.py
The Higgs to two photons analysis from the ATLAS Open Data 2020 release, with RDataFrame.

file df105_WBosonAnalysis.py
The W boson mass analysis from the ATLAS Open Data release of 2020, with RDataFrame.

file df106_HiggsToFourLeptons.C
The Higgs to four lepton analysis from the ATLAS Open Data release of 2020, with RDataFrame.

file df106_HiggsToFourLeptons.py
The Higgs to four lepton analysis from the ATLAS Open Data release of 2020, with RDataFrame.

file df107_SingleTopAnalysis.py
A single top analysis using the ATLAS Open Data release of 2020, with RDataFrame.

file distrdf001_spark_connection.py
Configure a Spark connection and fill two histograms distributedly.

file distrdf002_dask_connection.py
Configure a Dask connection and fill two histograms distributedly.

file distrdf003_live_visualization.py
Configure a Dask connection and visualize the filling of a 1D and a 2D histogram distributedly.