These examples show various features of RDataFrame: ROOT's declarative analysis interface.
RDataFrame offers a high-level interface for the analysis of data stored in TTrees, CSV files and other data formats.
In addition, multi-threading and other low-level optimisations allow users to exploit all the resources available on their machines transparently.
Explore the examples below or go to RDataFrame's user guide. A list of all the RDataFrame tutorials can be found here.
To get started, these examples show how to create a simple RDataFrame, how to process data in a simple analysis, and how to plot distributions.
| C++ tutorial | Python tutorial | Description |
|---|---|---|
| df000_simple.C | df000_simple.py | Simple RDataFrame example in C++ and Python. |
| df001_introduction.C | df001_introduction.py | Basic RDataFrame usage. |
| df002_dataModel.C | df002_dataModel.py | Show how to work with non-flat data models, e.g. vectors of tracks. |
| df039_RResultPtr_basics.C | | Learn the difference between lazy and immediate actions. |
A collection of building block examples for your analysis.
| C++ tutorial | Python tutorial | Description |
|---|---|---|
| df003_profiles.C | df003_profiles.py | Use TProfiles. |
| df005_fillAnyObject.C | | Fill any object whose class exposes a Fill method. |
| df006_ranges.C | df006_ranges.py | Use Range to limit the amount of data processed. |
| df012_DefinesAndFiltersAsStrings.C | df012_DefinesAndFiltersAsStrings.py | Use just-in-time-compiled Filters and Defines for quick prototyping. |
| df016_vecOps.C | df016_vecOps.py | Process collections in RDataFrame with the help of RVec. |
| df018_customActions.C | | Implement a custom action to fill THns. |
| df020_helpers.C | | Show usage of RDataFrame's helper tools. |
| df021_createTGraph.C | df021_createTGraph.py | Fill a TGraph. |
| df022_useKahan.C | | Implement a custom action that evaluates a Kahan sum. |
| df023_aggregate.C | | Use the Aggregate action to specify arbitrary data aggregations. |
| df025_RNode.C | | Manipulate RDF objects in functions, loops and conditional branches. |
| df036_missingBranches.C | df036_missingBranches.py | Deal with missing values due to a missing branch when switching to a new file in a chain. |
| df037_TTreeEventMatching.C | df037_TTreeEventMatching.py | Deal with missing values due to not finding a matching event in an auxiliary dataset. |
| df040_RResultPtr_lifetimeManagement.C | | Lifetime management of RResultPtr and the underlying objects. |
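
The Kahan sum evaluated in df022_useKahan.C is a compensated summation: it carries a correction term that recovers the low-order bits lost when a small value is added to a large running total. The following is a minimal standalone sketch of the algorithm in plain Python, not the tutorial's custom action and not RDataFrame code; the function names `naive_sum` and `kahan_sum` are chosen here for illustration.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for value in values:
        total += value
    return total


def kahan_sum(values):
    """Sum floats with Kahan compensation to limit rounding error."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for value in values:
        corrected = value - compensation
        new_total = total + corrected
        # (new_total - total) is what was actually added; subtracting
        # `corrected` isolates the part that was rounded away.
        compensation = (new_total - total) - corrected
        total = new_total
    return total


values = [0.1] * 1_000_000
reference = math.fsum(values)  # correctly rounded sum, used as the reference
naive = naive_sum(values)
compensated = kahan_sum(values)

print(f"naive error:       {abs(naive - reference):.3e}")
print(f"compensated error: {abs(compensated - reference):.3e}")
```

The explicit loop in `naive_sum` is deliberate: recent Python versions already apply a compensated algorithm inside the built-in `sum` for floats, which would hide the effect being demonstrated.
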
The content of a dataframe can be written to a ROOT file. In addition to ROOT files, other file formats can be read.
| C++ tutorial | Python tutorial | Description |
|---|---|---|
| df007_snapshot.C | df007_snapshot.py | Write out a dataset. |
| df008_createDataSetFromScratch.C | df008_createDataSetFromScratch.py | Generate data from scratch. |
| df009_FromScratchVSTTree.C | | Compare creation of a ROOT dataset with RDataFrame and TTree. |
| df010_trivialDataSource.C | df010_trivialDataSource.py | Simplest possible data source. |
| df014_CSVDataSource.C | df014_CSVDataSource.py | Process a CSV file. |
| df015_LazyDataSource.C | | Concatenate computation graphs with the "lazy data source". |
| df019_Cache.C | df019_Cache.py | Cache a processed RDataFrame in memory for further usage. |
| df027_SQliteDependencyOverVersion.C | | Analyse a remote sqlite3 file. |
| df028_SQliteIPLocation.C | | Plot the location of ROOT downloads reading a remote sqlite3 file. |
| df029_SQlitePlatformDistribution.C | | Analyse data in a sqlite3 file. |
| df030_SQliteVersionsOfROOT.C | | Analyse data in a sqlite3 file and create a plot. |
From Python, NumPy arrays can be imported into RDataFrame and columns from RDataFrame can be converted to NumPy arrays. A Pandas DataFrame can also be converted into an RDataFrame.
| Tutorial | Description | 
|---|---|
| df026_AsNumpyArrays.py | Read data into NumPy arrays. |
| df032_RDFFromNumpy.py | Read data from NumPy arrays. |
| df035_RDFFromPandas.py | Read data from a Pandas DataFrame. |
RDataFrame applications can be executed in parallel on a set of remote machines through distributed computing frameworks such as Apache Spark or Dask.
| Tutorial | Description | 
|---|---|
| distrdf001_spark_connection.py | Configure a Spark connection and fill two histograms in a distributed fashion. |
| distrdf002_dask_connection.py | Configure a Dask connection and fill two histograms in a distributed fashion. |
| distrdf003_live_visualization.py | Configure a Dask connection and visualize the distributed filling of a 1D and a 2D histogram. |
RDataFrame provides methods to inspect the data and the computation graph.
| C++ tutorial | Python tutorial | Description |
|---|---|---|
| df004_cutFlowReport.C | df004_cutFlowReport.py | Display cut/Filter efficiencies. |
| df013_InspectAnalysis.C | | Use callbacks to update a plot and a progress bar during the event loop. |
| df024_Display.C | df024_Display.py | Use the Display action to inspect entry values. |
| df031_Stats.C | df031_Stats.py | Use the Stats action to extract the statistics of a column. |
| | df033_Describe.py | Get information about your analysis. |
| df034_SaveGraph.C | df034_SaveGraph.py | Look at the DAG of your analysis. |
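
The cut/Filter efficiencies shown by df004_cutFlowReport boil down to simple bookkeeping: for each cut applied in sequence, count how many entries reached it, how many passed, and what fraction of the initial dataset is left. The sketch below illustrates that bookkeeping in plain Python on a toy dataset. It does not use or reproduce RDataFrame's API; `cut_flow`, `events` and `cuts` are hypothetical names introduced only for this example.

```python
def cut_flow(events, cuts):
    """Apply named cuts in order and record per-cut and cumulative efficiency.

    `events` is a list of dicts; `cuts` is a list of (name, predicate) pairs.
    Efficiencies are returned in percent.
    """
    report = []
    n_initial = len(events)
    surviving = events
    for name, predicate in cuts:
        n_before = len(surviving)
        surviving = [event for event in surviving if predicate(event)]
        n_pass = len(surviving)
        report.append({
            "name": name,
            "pass": n_pass,
            "all": n_before,
            "eff": 100.0 * n_pass / n_before if n_before else 0.0,
            "cumulative_eff": 100.0 * n_pass / n_initial if n_initial else 0.0,
        })
    return report


# Toy dataset: 100 events with a single integer column "x" in [0, 100).
events = [{"x": i} for i in range(100)]
cuts = [
    ("x >= 50", lambda e: e["x"] >= 50),
    ("x even", lambda e: e["x"] % 2 == 0),
]

report = cut_flow(events, cuts)
for row in report:
    print(f'{row["name"]:10s} pass={row["pass"]:<4d} all={row["all"]:<4d} '
          f'eff={row["eff"]:.2f} % cumulative eff={row["cumulative_eff"]:.2f} %')
```

Each cut's efficiency is measured against the entries that survived the previous cuts, while the cumulative efficiency is measured against the initial dataset, so the order of the cuts changes the per-cut numbers but not the final cumulative one.
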
With RDataFrame, advanced analyses can be executed on large amounts of data. These examples show how particle physics analyses can be carried out using Open Data from different experiments.
| C++ tutorial | Python tutorial | Description |
|---|---|---|
| df017_vecOpsHEP.C | df017_vecOpsHEP.py | Use RVecs to plot the transverse momentum of selected particles. |
| df101_h1Analysis.C | | Express ROOT's standard H1 analysis. |
| df102_NanoAODDimuonAnalysis.C | df102_NanoAODDimuonAnalysis.py | Process NanoAOD files. |
| df103_NanoAODHiggsAnalysis.C | df103_NanoAODHiggsAnalysis.py | An example of complex analysis: reconstructing the Higgs boson. |
| | df104_HiggsToTwoPhotons.py | The Higgs to two photons analysis from the ATLAS Open Data 2020 release. |
| | df105_WBosonAnalysis.py | The W boson mass analysis from the ATLAS Open Data release of 2020. |
| df106_HiggsToFourLeptons.C | df106_HiggsToFourLeptons.py | The Higgs to four lepton analysis from the ATLAS Open Data release of 2020. |
| | df107_SingleTopAnalysis.py | A single top analysis using the ATLAS Open Data release of 2020. |
Files
| File | Description |
|---|---|
| df000_simple.C | Simple RDataFrame example in C++. |
| df000_simple.py | Simple RDataFrame example in Python. |
| df001_introduction.C | Basic RDataFrame usage. |
| df001_introduction.py | Basic usage of RDataFrame from Python. |
| df002_dataModel.C | Show how to work with non-flat data models, e.g. vectors of tracks. |
| df002_dataModel.py | Show how to work with non-flat data models, e.g. vectors of tracks. |
| df003_profiles.C | Use TProfiles with RDataFrame. |
| df003_profiles.py | Use TProfiles with RDataFrame. |
| df004_cutFlowReport.C | Display cut/Filter efficiencies with RDataFrame. |
| df004_cutFlowReport.py | Display cut/Filter efficiencies with RDataFrame. |
| df005_fillAnyObject.C | Using the generic Fill action. |
| df006_ranges.C | Use Range to limit the amount of data processed. |
| df006_ranges.py | Use Range to limit the amount of data processed. |
| df007_snapshot.C | Write ROOT data with RDataFrame. |
| df007_snapshot.py | Write ROOT data with RDataFrame. |
| df008_createDataSetFromScratch.C | Create data from scratch with RDataFrame. |
| df008_createDataSetFromScratch.py | Create data from scratch with RDataFrame. |
| df009_FromScratchVSTTree.C | Compare creation of a ROOT dataset with RDataFrame and TTree. |
| df010_trivialDataSource.C | Use the "trivial data source", an example data source implementation. |
| df010_trivialDataSource.py | Use the "trivial data source", an example data source implementation. |
| df012_DefinesAndFiltersAsStrings.C | Use just-in-time-compiled Filters and Defines for quick prototyping. |
| df012_DefinesAndFiltersAsStrings.py | Use just-in-time-compiled Filters and Defines for quick prototyping. |
| df013_InspectAnalysis.C | Use callbacks to update a plot and a progress bar during the event loop. |
| df014_CSVDataSource.C | Process a CSV file with RDataFrame and the CSV data source. |
| df014_CSVDataSource.py | Process a CSV file with RDataFrame and the CSV data source. |
| df015_LazyDataSource.C | Use the lazy RDataFrame data source to concatenate computation graphs. |
| df016_vecOps.C | Process collections in RDataFrame with the help of RVec. |
| df016_vecOps.py | Process collections in RDataFrame with the help of RVec. |
| df017_vecOpsHEP.C | Use RVecs to plot the transverse momentum of selected particles. |
| df017_vecOpsHEP.py | Use RVecs to plot the transverse momentum of selected particles. |
| df018_customActions.C | Implement a custom action to fill THns. |
| df019_Cache.C | Cache a processed RDataFrame in memory for further usage. |
| df019_Cache.py | Cache a processed RDataFrame in memory for further usage. |
| df020_helpers.C | Show usage of RDataFrame's helper tools, contained in ROOT/RDFHelpers.hxx. |
| df021_createTGraph.C | Fill a TGraph using RDataFrame. |
| df021_createTGraph.py | Fill a TGraph using RDataFrame. |
| df022_useKahan.C | Implement a custom action that evaluates a Kahan sum. |
| df023_aggregate.C | Use the Aggregate action to specify arbitrary data aggregations. |
| df024_Display.C | Use the Display action to inspect entry values. |
| df024_Display.py | Use the Display action to inspect entry values. |
| df025_RNode.C | Manipulate RDF objects in functions, loops and conditional branches. |
| df026_AsNumpyArrays.py | Read data from RDataFrame into NumPy arrays. |
| df027_SQliteDependencyOverVersion.C | Plot the ROOT downloads based on the version reading a remote sqlite3 file. |
| df028_SQliteIPLocation.C | Plot the location of ROOT downloads reading a remote sqlite3 file. |
| df029_SQlitePlatformDistribution.C | Use RDataFrame to display data about ROOT downloads. |
| df030_SQliteVersionsOfROOT.C | Read an sqlite3 database with RDataFrame and plot statistics on ROOT downloads. |
| df031_Stats.C | Use the Stats action to extract the statistics of a column. |
| df031_Stats.py | Use the Stats action to extract the statistics of a column. |
| df032_RDFFromNumpy.py | Read data from NumPy arrays into RDataFrame. |
| df033_Describe.py | Get information about the dataframe with the convenience method Describe. |
| df034_SaveGraph.C | Basic SaveGraph usage. |
| df034_SaveGraph.py | Basic SaveGraph usage. |
| df035_RDFFromPandas.py | Read data from a Pandas DataFrame into RDataFrame. |
| df036_missingBranches.C | |
| df037_TTreeEventMatching.C | |
| df038_NumbaDeclare.py | This tutorial illustrates how PyROOT supports declaring C++ callables from Python callables, making them usable, for example, with RDataFrame. |
| df039_RResultPtr_basics.C | Usage of RResultPtr. |
| df040_RResultPtr_lifetimeManagement.C | Usage of RResultPtr: Lifetime management. |
| df101_h1Analysis.C | Show how to express ROOT's standard H1 analysis with RDataFrame. |
| df102_NanoAODDimuonAnalysis.C | Show how NanoAOD files can be processed with RDataFrame. |
| df102_NanoAODDimuonAnalysis.py | Show how NanoAOD files can be processed with RDataFrame. |
| df103_NanoAODHiggsAnalysis.C | An example of complex analysis with RDataFrame: reconstructing the Higgs boson. |
| df103_NanoAODHiggsAnalysis.py | An example of complex analysis with RDataFrame: reconstructing the Higgs boson. |
| df103_NanoAODHiggsAnalysis_python.h | Header file with functions needed to execute the Python version of the NanoAOD Higgs tutorial. |
| df104_HiggsToTwoPhotons.py | The Higgs to two photons analysis from the ATLAS Open Data 2020 release, with RDataFrame. |
| df105_WBosonAnalysis.py | The W boson mass analysis from the ATLAS Open Data release of 2020, with RDataFrame. |
| df106_HiggsToFourLeptons.C | The Higgs to four lepton analysis from the ATLAS Open Data release of 2020, with RDataFrame. |
| df106_HiggsToFourLeptons.py | The Higgs to four lepton analysis from the ATLAS Open Data release of 2020, with RDataFrame. |
| df107_SingleTopAnalysis.py | A single top analysis using the ATLAS Open Data release of 2020, with RDataFrame. |
| distrdf001_spark_connection.py | Configure a Spark connection and fill two histograms in a distributed fashion. |
| distrdf002_dask_connection.py | Configure a Dask connection and fill two histograms in a distributed fashion. |
| distrdf003_live_visualization.py | Configure a Dask connection and visualize the distributed filling of a 1D and a 2D histogram. |