These examples show various features of RDataFrame, ROOT's declarative analysis interface.
RDataFrame offers a high-level interface for the analysis of data stored in TTrees, CSV files and other data formats.
In addition, multi-threading and other low-level optimisations allow users to exploit all the resources available on their machines transparently.
Explore the examples below or go to RDataFrame's user guide. The complete list of RDataFrame tutorials appears in the Files section at the end of this page.
To get started, these examples show how to create a simple RDataFrame, how to process the data in a simple analysis and how to plot distributions.
C++ tutorial | Python tutorial | Description |
---|---|---|
df000_simple.C | df000_simple.py | Simple RDataFrame example in C++ and Python. |
df001_introduction.C | df001_introduction.py | Basic RDataFrame usage. |
df002_dataModel.C | df002_dataModel.py | Show how to work with non-flat data models, e.g. vectors of tracks. |
A collection of building-block examples for your analysis.
C++ tutorial | Python tutorial | Description |
---|---|---|
df003_profiles.C | df003_profiles.py | Use TProfiles. |
df005_fillAnyObject.C | | Fill any object whose class exposes a Fill method. |
df006_ranges.C | df006_ranges.py | Use Range to limit the amount of data processed. |
df012_DefinesAndFiltersAsStrings.C | df012_DefinesAndFiltersAsStrings.py | Use just-in-time-compiled Filters and Defines for quick prototyping. |
df016_vecOps.C | df016_vecOps.py | Process collections in RDataFrame with the help of RVec. |
df018_customActions.C | | Implement a custom action to fill THns. |
df020_helpers.C | | Show usage of RDataFrame's helper tools. |
df021_createTGraph.C | df021_createTGraph.py | Fill a TGraph. |
df022_useKahan.C | | Implement a custom action that evaluates a Kahan sum. |
df023_aggregate.C | | Use the Aggregate action to specify arbitrary data aggregations. |
df025_RNode.C | | Manipulate RDF objects in functions, loops and conditional branches. |
df036_missingBranches.C | | Deal with missing values due to a missing branch when switching to a new file in a chain. |
df037_TTreeEventMatching.C | | Deal with missing values due to not finding a matching event in an auxiliary dataset. |
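The Kahan sum mentioned for df022_useKahan.C is a compensated summation: it carries a correction term for the low-order bits that ordinary floating-point addition discards. The sketch below shows the idea in plain Python, independent of ROOT. It is not the tutorial's code, and the function name `kahan_sum` and the toy input are illustrative assumptions.

```python
import math


def kahan_sum(values):
    """Sum floats while compensating for rounding error (Kahan summation)."""
    total = 0.0
    compensation = 0.0  # running estimate of the rounding error made so far
    for value in values:
        corrected = value - compensation
        new_total = total + corrected
        # (new_total - total) is what was actually added; subtracting
        # `corrected` isolates the part that rounding got wrong.
        compensation = (new_total - total) - corrected
        total = new_total
    return total


# Hypothetical toy input: one large value followed by many tiny ones.
values = [1.0] + [1e-16] * 10

# Plain accumulation loop: every tiny term is rounded away.
naive = 0.0
for v in values:
    naive += v

print("naive:", naive)
print("kahan:", kahan_sum(values))
print("fsum :", math.fsum(values))  # correctly rounded reference from the stdlib
```

With this input the plain loop never moves away from 1.0, while the compensated sum stays close to the `math.fsum` reference.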
The content of a dataframe can be written to a ROOT file. In addition to ROOT files, other file formats can be read.
C++ tutorial | Python tutorial | Description |
---|---|---|
df007_snapshot.C | df007_snapshot.py | Write out a dataset. |
df008_createDataSetFromScratch.C | df008_createDataSetFromScratch.py | Generate data from scratch. |
df009_FromScratchVSTTree.C | | Compare creation of a ROOT dataset with RDataFrame and TTree. |
df010_trivialDataSource.C | df010_trivialDataSource.py | Simplest possible data source. |
df014_CSVDataSource.C | df014_CSVDataSource.py | Process a CSV. |
df015_LazyDataSource.C | | Concatenate computation graphs with the "lazy data source". |
df019_Cache.C | df019_Cache.py | Cache a processed RDataFrame in memory for further usage. |
df027_SQliteDependencyOverVersion.C | | Analyse a remote sqlite3 file. |
df028_SQliteIPLocation.C | | Plot the location of ROOT downloads reading a remote sqlite3 file. |
df029_SQlitePlatformDistribution.C | | Analyse data in a sqlite3 file. |
df030_SQliteVersionsOfROOT.C | | Analyse data in a sqlite3 file and create a plot. |
From Python, NumPy arrays can be imported into RDataFrame and columns from RDataFrame can be converted to NumPy arrays. A Pandas DataFrame can also be converted into an RDataFrame.
Tutorial | Description |
---|---|
df026_AsNumpyArrays.py | Read data into NumPy arrays. |
df032_RDFFromNumpy.py | Read data from NumPy arrays. |
df035_RDFFromPandas.py | Read data from a Pandas DataFrame. |
RDataFrame applications can be executed in parallel through distributed computing frameworks on a set of remote machines via Apache Spark or Dask.
Tutorial | Description |
---|---|
distrdf001_spark_connection.py | Configure a Spark connection and fill two histograms distributedly. |
distrdf002_dask_connection.py | Configure a Dask connection and fill two histograms distributedly. |
distrdf003_live_visualization.py | Configure a Dask connection and visualize the filling of 1D and 2D histograms distributedly. |
RDataFrame provides methods to inspect the data and the computation graph.
| C++ tutorial | Python tutorial | Description |
|---|---|---|
| df004_cutFlowReport.C | df004_cutFlowReport.py | Display cut/Filter efficiencies. |
| df013_InspectAnalysis.C | | Use callbacks to update a plot and a progress bar during the event loop. |
| df024_Display.C | df024_Display.py | Use the Display action to inspect entry values. |
| df031_Stats.C | df031_Stats.py | Use the Stats action to extract the statistics of a column. |
| | df033_Describe.py | Get information about your analysis. |
| df034_SaveGraph.C | df034_SaveGraph.py | Look at the DAG of your analysis. |
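As a rough illustration of what a cut-flow report such as the one in df004_cutFlowReport conveys, the sketch below applies named cuts in sequence and records, for each cut, how many entries it saw, how many passed, and the per-cut and cumulative efficiencies. It is plain Python rather than RDataFrame code. The function `cut_flow`, the toy events and the cut names are assumptions made for illustration, and the printed layout is not meant to reproduce ROOT's output.

```python
def cut_flow(events, cuts):
    """Apply (name, predicate) cuts in order and collect pass/all statistics."""
    report = []
    remaining = list(events)
    n_initial = len(remaining)
    for name, predicate in cuts:
        n_all = len(remaining)  # entries reaching this cut
        remaining = [e for e in remaining if predicate(e)]
        n_pass = len(remaining)  # entries surviving this cut
        eff = 100.0 * n_pass / n_all if n_all else 0.0
        cumulative = 100.0 * n_pass / n_initial if n_initial else 0.0
        report.append((name, n_pass, n_all, eff, cumulative))
    return report


# Hypothetical toy "events": the integers 0..99.
events = range(100)
cuts = [
    ("above_20", lambda x: x > 20),
    ("even", lambda x: x % 2 == 0),
]

report = cut_flow(events, cuts)
for name, n_pass, n_all, eff, cumulative in report:
    print(f"{name}: pass={n_pass} all={n_all} eff={eff:.2f}% cumulative={cumulative:.2f}%")
```

Each cut only sees the entries that survived the previous ones, which is why the per-cut and cumulative efficiencies differ from the second cut onwards.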
With RDataFrame, advanced analyses can be executed on large amounts of data. These examples show how particle physics analyses can be carried out using Open Data from different experiments.
| C++ tutorial | Python tutorial | Description |
|---|---|---|
| df017_vecOpsHEP.C | df017_vecOpsHEP.py | Use RVecs to plot the transverse momentum of selected particles. |
| df101_h1Analysis.C | | Express ROOT's standard H1 analysis. |
| df102_NanoAODDimuonAnalysis.C | df102_NanoAODDimuonAnalysis.py | Process NanoAOD files. |
| df103_NanoAODHiggsAnalysis.C | df103_NanoAODHiggsAnalysis.py | An example of complex analysis: reconstructing the Higgs boson. |
| | df104_HiggsToTwoPhotons.py | The Higgs to two photons analysis from the ATLAS Open Data 2020 release. |
| | df105_WBosonAnalysis.py | The W boson mass analysis from the ATLAS Open Data release of 2020. |
| df106_HiggsToFourLeptons.C | df106_HiggsToFourLeptons.py | The Higgs to four lepton analysis from the ATLAS Open Data release of 2020. |
| | df107_SingleTopAnalysis.py | A single top analysis using the ATLAS Open Data release of 2020. |
Files
File | Description |
---|---|
df000_simple.C | Simple RDataFrame example in C++. |
df000_simple.py | Simple RDataFrame example in Python. |
df001_introduction.C | Basic RDataFrame usage. |
df001_introduction.py | Basic usage of RDataFrame from Python. |
df002_dataModel.C | Show how to work with non-flat data models, e.g. vectors of tracks. |
df002_dataModel.py | Show how to work with non-flat data models, e.g. vectors of tracks. |
df003_profiles.C | Use TProfiles with RDataFrame. |
df003_profiles.py | Use TProfiles with RDataFrame. |
df004_cutFlowReport.C | Display cut/Filter efficiencies with RDataFrame. |
df004_cutFlowReport.py | Display cut/Filter efficiencies with RDataFrame. |
df005_fillAnyObject.C | Using the generic Fill action. |
df006_ranges.C | Use Range to limit the amount of data processed. |
df006_ranges.py | Use Range to limit the amount of data processed. |
df007_snapshot.C | Write ROOT data with RDataFrame. |
df007_snapshot.py | Write ROOT data with RDataFrame. |
df008_createDataSetFromScratch.C | Create data from scratch with RDataFrame. |
df008_createDataSetFromScratch.py | Create data from scratch with RDataFrame. |
df009_FromScratchVSTTree.C | Compare creation of a ROOT dataset with RDataFrame and TTree. |
df010_trivialDataSource.C | Use the "trivial data source", an example data source implementation. |
df010_trivialDataSource.py | Use the "trivial data source", an example data source implementation. |
df012_DefinesAndFiltersAsStrings.C | Use just-in-time-compiled Filters and Defines for quick prototyping. |
df012_DefinesAndFiltersAsStrings.py | Use just-in-time-compiled Filters and Defines for quick prototyping. |
df013_InspectAnalysis.C | Use callbacks to update a plot and a progress bar during the event loop. |
df014_CSVDataSource.C | Process a CSV file with RDataFrame and the CSV data source. |
df014_CSVDataSource.py | Process a CSV file with RDataFrame and the CSV data source. |
df015_LazyDataSource.C | Use the lazy RDataFrame data source to concatenate computation graphs. |
df016_vecOps.C | Process collections in RDataFrame with the help of RVec. |
df016_vecOps.py | Process collections in RDataFrame with the help of RVec. |
df017_vecOpsHEP.C | Use RVecs to plot the transverse momentum of selected particles. |
df017_vecOpsHEP.py | Use RVecs to plot the transverse momentum of selected particles. |
df018_customActions.C | Implement a custom action to fill THns. |
df019_Cache.C | Cache a processed RDataFrame in memory for further usage. |
df019_Cache.py | Cache a processed RDataFrame in memory for further usage. |
df020_helpers.C | Show usage of RDataFrame's helper tools, contained in ROOT/RDFHelpers.hxx. |
df021_createTGraph.C | Fill a TGraph using RDataFrame. |
df021_createTGraph.py | Fill a TGraph using RDataFrame. |
df022_useKahan.C | Implement a custom action that evaluates a Kahan sum. |
df023_aggregate.C | Use the Aggregate action to specify arbitrary data aggregations. |
df024_Display.C | Use the Display action to inspect entry values. |
df024_Display.py | Use the Display action to inspect entry values. |
df025_RNode.C | Manipulate RDF objects in functions, loops and conditional branches. |
df026_AsNumpyArrays.py | Read data from RDataFrame into NumPy arrays. |
df027_SQliteDependencyOverVersion.C | Plot the ROOT downloads based on the version reading a remote sqlite3 file. |
df028_SQliteIPLocation.C | Plot the location of ROOT downloads reading a remote sqlite3 file. |
df029_SQlitePlatformDistribution.C | Use RDataFrame to display data about ROOT downloads. |
df030_SQliteVersionsOfROOT.C | Read an sqlite3 database with RDataFrame and plot statistics on ROOT downloads. |
df031_Stats.C | Use the Stats action to extract the statistics of a column. |
df031_Stats.py | Use the Stats action to extract the statistics of a column. |
df032_RDFFromNumpy.py | Read data from NumPy arrays into RDataFrame. |
df033_Describe.py | Get information about the dataframe with the convenience method Describe. |
df034_SaveGraph.C | Basic SaveGraph usage. |
df034_SaveGraph.py | Basic SaveGraph usage. |
df035_RDFFromPandas.py | Read data from a Pandas DataFrame into RDataFrame. |
df036_missingBranches.C | Deal with missing values due to a missing branch when switching to a new file in a chain. |
df037_TTreeEventMatching.C | Deal with missing values due to not finding a matching event in an auxiliary dataset. |
df038_NumbaDeclare.py | This tutorial illustrates how PyROOT supports declaring C++ callables from Python callables, making them usable, for example, with RDataFrame. |
df101_h1Analysis.C | Show how to express ROOT's standard H1 analysis with RDataFrame. |
df102_NanoAODDimuonAnalysis.C | Show how NanoAOD files can be processed with RDataFrame. |
df102_NanoAODDimuonAnalysis.py | Show how NanoAOD files can be processed with RDataFrame. |
df103_NanoAODHiggsAnalysis.C | An example of complex analysis with RDataFrame: reconstructing the Higgs boson. |
df103_NanoAODHiggsAnalysis.py | An example of complex analysis with RDataFrame: reconstructing the Higgs boson. |
df103_NanoAODHiggsAnalysis_python.h | Header file with functions needed to execute the Python version of the NanoAOD Higgs tutorial. |
df104_HiggsToTwoPhotons.py | The Higgs to two photons analysis from the ATLAS Open Data 2020 release, with RDataFrame. |
df105_WBosonAnalysis.py | The W boson mass analysis from the ATLAS Open Data release of 2020, with RDataFrame. |
df106_HiggsToFourLeptons.C | The Higgs to four lepton analysis from the ATLAS Open Data release of 2020, with RDataFrame. |
df106_HiggsToFourLeptons.py | The Higgs to four lepton analysis from the ATLAS Open Data release of 2020, with RDataFrame. |
df107_SingleTopAnalysis.py | A single top analysis using the ATLAS Open Data release of 2020, with RDataFrame. |
distrdf001_spark_connection.py | Configure a Spark connection and fill two histograms distributedly. |
distrdf002_dask_connection.py | Configure a Dask connection and fill two histograms distributedly. |
distrdf003_live_visualization.py | Configure a Dask connection and visualize the filling of 1D and 2D histograms distributedly. |