ROOT
Reference Guide
 
ROOT::RDF Namespace Reference

Namespaces

namespace  Experimental
 
namespace  Internal
 

Classes

class  RArrowDS
 RDataFrame data source class to interface with Apache Arrow. More...
 
class  RCsvDS
 RDataFrame data source class for reading CSV files. More...
 
class  RCutFlowReport
 
class  RDataSource
 RDataSource defines an API that RDataFrame can use to read arbitrary data formats. More...
 
class  RDFDescription
 An RDFDescription contains useful information about a given RDataFrame computation graph. More...
 
class  RDFTypeNameGetter
 Helper to get the contents of a given column. More...
 
class  RDisplay
 This class is the textual representation of the content of a columnar dataset. More...
 
class  RInterface
 The public interface to the RDataFrame federation of classes. More...
 
class  RInterfaceBase
 
class  RLazyDS
 An RDataSource implementation that is built on top of result proxies. More...
 
class  RResultHandle
 A type-erased version of RResultPtr and RResultMap. More...
 
class  RResultPtr
 Smart pointer for the return type of actions. More...
 
class  RSampleInfo
 This type represents a sample identifier, to be used in conjunction with RDataFrame features such as DefinePerSample() and per-sample callbacks. More...
 
struct  RSnapshotOptions
 A collection of options to steer the creation of the dataset on file. More...
 
class  RSqliteDS
 RSqliteDS is an RDF data source implementation for SQL result sets from sqlite3 files. More...
 
class  RTrivialDS
 A simple data-source implementation, for demo purposes. More...
 
class  RVariationsDescription
 A descriptor for the systematic variations known to a given RDataFrame node. More...
 
class  TCutInfo
 
class  TH1DModel
 A struct which stores the parameters of a TH1D. More...
 
class  TH2DModel
 A struct which stores the parameters of a TH2D. More...
 
class  TH3DModel
 A struct which stores the parameters of a TH3D. More...
 
class  THnDModel
 A struct which stores the parameters of a THnD. More...
 
class  TProfile1DModel
 A struct which stores the parameters of a TProfile. More...
 
class  TProfile2DModel
 A struct which stores the parameters of a TProfile2D. More...
 
class  VerifyValidColumnType
 Helper to determine if a given Column is a supported type. More...
 

Typedefs

using ColumnNames_t = std::vector< std::string >
 
using RNode = RInterface<::ROOT::Detail::RDF::RNodeBase, void >
 
using SampleCallback_t = std::function< void(unsigned int, const ROOT::RDF::RSampleInfo &)>
 The type of a data-block callback, registered with an RDataFrame computation graph via e.g. DefinePerSample() or by certain actions (e.g. Snapshot()).
 
template<typename T >
using TResultProxy = RResultPtr< T >
 

Functions

template<typename NodeType >
RNode AsRNode (NodeType node)
 Cast an RDataFrame node to the common type ROOT::RDF::RNode.
 
RDataFrame FromArrow (std::shared_ptr< arrow::Table > table, std::vector< std::string > const &columnNames)
 Factory method to create an Apache Arrow RDataFrame.
 
RDataFrame FromCSV (std::string_view fileName, bool readHeaders=true, char delimiter=',', Long64_t linesChunkSize=-1LL, std::unordered_map< std::string, char > &&colTypes={})
 Factory method to create a CSV RDataFrame.
 
RDataFrame FromCSV (std::string_view fileName, const RCsvDS::ROptions &options)
 Factory method to create a CSV RDataFrame.
 
RDataFrame FromSqlite (std::string_view fileName, std::string_view query)
 Factory method to create an SQLite RDataFrame.
 
template<typename T >
std::shared_ptr< arrow::ChunkedArray > getData (T p)
 
int getNRecords (std::shared_ptr< arrow::Table > &table, std::vector< std::string > &columnNames)
 
template<typename... ColumnTypes>
RDataFrame MakeLazyDataFrame (std::pair< std::string, RResultPtr< std::vector< ColumnTypes > > > &&... colNameProxyPairs)
 Factory method to create a Lazy RDataFrame.
 
RInterface< RDFDetail::RLoopManager > MakeTrivialDataFrame ()
 Make an RDF wrapping an RTrivialDS with infinite entries, for demo purposes.
 
RInterface< RDFDetail::RLoopManager > MakeTrivialDataFrame (ULong64_t size, bool skipEvenEntries=false)
 Make an RDF wrapping an RTrivialDS with the specified number of entries.
 
template<typename F , typename Args = typename ROOT::TypeTraits::CallableTraits<std::decay_t<F>>::arg_types_nodecay, typename Ret = typename ROOT::TypeTraits::CallableTraits<std::decay_t<F>>::ret_type>
auto Not (F &&f) -> decltype(RDFInternal::NotHelper(Args(), std::forward< F >(f)))
 Given a callable with signature bool(T1, T2, ...), return a callable with the same signature that returns the negated result.
 
template<class T1 , class T2 >
bool operator!= (const RResultPtr< T1 > &lhs, const RResultPtr< T2 > &rhs)
 
template<class T1 >
bool operator!= (const RResultPtr< T1 > &lhs, std::nullptr_t rhs)
 
template<class T1 >
bool operator!= (std::nullptr_t lhs, const RResultPtr< T1 > &rhs)
 
std::ostream & operator<< (std::ostream &os, const RDFDescription &description)
 
template<class T1 , class T2 >
bool operator== (const RResultPtr< T1 > &lhs, const RResultPtr< T2 > &rhs)
 
template<class T1 >
bool operator== (const RResultPtr< T1 > &lhs, std::nullptr_t rhs)
 
template<class T1 >
bool operator== (std::nullptr_t lhs, const RResultPtr< T1 > &rhs)
 
template<std::size_t N, typename T , typename F >
auto PassAsVec (F &&f) -> RDFInternal::PassAsVecHelper< std::make_index_sequence< N >, T, F >
 PassAsVec is a callable generator that allows passing N variables of type T to a function as a single collection.
 
unsigned int RunGraphs (std::vector< RResultHandle > handles)
 Trigger the event loop of multiple RDataFrames concurrently.
 
template<typename NodeType >
std::string SaveGraph (NodeType node)
 Create a graphviz representation of the dataframe computation graph, return it as a string.
 
template<typename NodeType >
void SaveGraph (NodeType node, const std::string &outputFile)
 Create a graphviz representation of the dataframe computation graph, write it to the specified file.
 
void splitInEqualRanges (std::vector< std::pair< ULong64_t, ULong64_t > > &ranges, int nRecords, unsigned int nSlots)
 

Typedef Documentation

◆ ColumnNames_t

typedef std::vector< std::string > ROOT::RDF::ColumnNames_t

Definition at line 35 of file RInterfaceBase.hxx.

◆ RNode

using ROOT::RDF::RNode = RInterface<::ROOT::Detail::RDF::RNodeBase, void>

◆ SampleCallback_t

using ROOT::RDF::SampleCallback_t = std::function<void(unsigned int, const ROOT::RDF::RSampleInfo &)>

The type of a data-block callback, registered with an RDataFrame computation graph via e.g. DefinePerSample() or by certain actions (e.g. Snapshot()).

Definition at line 134 of file RSampleInfo.hxx.

◆ TResultProxy

template<typename T >
using ROOT::RDF::TResultProxy = RResultPtr<T>

Definition at line 18 of file TResultProxy.hxx.

Function Documentation

◆ AsRNode()

template<typename NodeType >
RNode ROOT::RDF::AsRNode ( NodeType  node)

Cast an RDataFrame node to the common type ROOT::RDF::RNode.

Parameters
[in] node  Any node of an RDataFrame graph.

Definition at line 158 of file RDFHelpers.hxx.

◆ FromArrow()

RDataFrame ROOT::RDF::FromArrow ( std::shared_ptr< arrow::Table >  table,
std::vector< std::string > const &  columnNames 
)

Factory method to create an Apache Arrow RDataFrame.

Creates an RDataFrame using an arrow::Table as input.

Parameters
[in] table  An arrow::Table to use as a source, i.e. the table to observe.
[in] columnNames  The names of the columns to use. If columnNames is empty, all the columns found in the table are used.

Definition at line 606 of file RArrowDS.cxx.

◆ FromCSV() [1/2]

RDataFrame ROOT::RDF::FromCSV ( std::string_view  fileName,
bool  readHeaders = true,
char  delimiter = ',',
Long64_t  linesChunkSize = -1LL,
std::unordered_map< std::string, char > &&  colTypes = {} 
)

Factory method to create a CSV RDataFrame.

Parameters
[in] fileName  Path of the CSV file.
[in] readHeaders  true if the CSV file contains headers as its first row, false otherwise (default true).
[in] delimiter  Delimiter character (default ',').
[in] linesChunkSize  Number of lines to read per chunk; use -1 to read all lines.
[in] colTypes  Lets the user specify custom column types. Accepts an unordered map whose keys are column names and whose values are type aliases ('O' for boolean, 'D' for double, 'L' for Long64_t, 'T' for std::string).
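
As a rough illustration of the colTypes argument, the sketch below builds a map of the accepted shape using only the standard library. It assumes the map is keyed by column name. The column names and both helper functions (typeNameForAlias, makeExampleColTypes) are invented for this example and are not part of ROOT; only the four aliases come from the parameter description above.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical helper, not part of ROOT: translates the one-letter
// type aliases accepted by colTypes into the C++ type they stand for.
inline std::string typeNameForAlias(char alias)
{
   switch (alias) {
   case 'O': return "bool";
   case 'D': return "double";
   case 'L': return "Long64_t";
   case 'T': return "std::string";
   default: return "unknown";
   }
}

// Builds a map of the same type as the colTypes parameter.
// Assumption: keys are column names; the names used here are invented.
inline std::unordered_map<std::string, char> makeExampleColTypes()
{
   return {{"isGood", 'O'}, {"energy", 'D'}, {"eventId", 'L'}, {"label", 'T'}};
}
```

Since colTypes is declared as an rvalue reference, a map like this would be passed as a temporary or through std::move.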

Definition at line 666 of file RCsvDS.cxx.

◆ FromCSV() [2/2]

RDataFrame ROOT::RDF::FromCSV ( std::string_view  fileName,
const RCsvDS::ROptions &  options 
)

Factory method to create a CSV RDataFrame.

Parameters
[in] fileName  Path of the CSV file.
[in] options  File parsing settings.

Definition at line 660 of file RCsvDS.cxx.

◆ FromSqlite()

RDataFrame ROOT::RDF::FromSqlite ( std::string_view  fileName,
std::string_view  query 
)

Factory method to create an SQLite RDataFrame.

Parameters
[in] fileName  Path of the sqlite file.
[in] query  SQL query that defines the data set.

Definition at line 538 of file RSqliteDS.cxx.

◆ getData()

template<typename T >
std::shared_ptr< arrow::ChunkedArray > ROOT::RDF::getData ( T  p)

Definition at line 542 of file RArrowDS.cxx.

◆ getNRecords()

int ROOT::RDF::getNRecords ( std::shared_ptr< arrow::Table > &  table,
std::vector< std::string > &  columnNames 
)

Definition at line 535 of file RArrowDS.cxx.

◆ MakeLazyDataFrame()

template<typename... ColumnTypes>
RDataFrame ROOT::RDF::MakeLazyDataFrame ( std::pair< std::string, RResultPtr< std::vector< ColumnTypes > > > &&...  colNameProxyPairs)

Factory method to create a Lazy RDataFrame.

Parameters
[in] colNameProxyPairs  The series of pairs that describe the columns of the data source. The first element of each pair is the name of the column and the second is the RResultPtr to the column in the parent data frame.

Definition at line 29 of file RLazyDS.hxx.

◆ MakeTrivialDataFrame() [1/2]

RInterface< RDFDetail::RLoopManager > ROOT::RDF::MakeTrivialDataFrame ( )

Make an RDF wrapping an RTrivialDS with infinite entries, for demo purposes.

Definition at line 126 of file RTrivialDS.cxx.

◆ MakeTrivialDataFrame() [2/2]

RInterface< RDFDetail::RLoopManager > ROOT::RDF::MakeTrivialDataFrame ( ULong64_t  size,
bool  skipEvenEntries = false 
)

Make an RDF wrapping an RTrivialDS with the specified number of entries.

Constructing an RDataFrame as RDataFrame(nEntries) is a superior alternative. If size is std::numeric_limits<ULong64_t>::max(), this acts as an infinite data-source: it returns entries from GetEntryRanges forever or until a Range stops the event loop (for test purposes).

Definition at line 119 of file RTrivialDS.cxx.

◆ Not()

template<typename F , typename Args = typename ROOT::TypeTraits::CallableTraits<std::decay_t<F>>::arg_types_nodecay, typename Ret = typename ROOT::TypeTraits::CallableTraits<std::decay_t<F>>::ret_type>
auto ROOT::RDF::Not ( F &&  f) -> decltype(RDFInternal::NotHelper(Args(), std::forward<F>(f)))

Given a callable with signature bool(T1, T2, ...), return a callable with the same signature that returns the negated result.

The callable must have one single non-template definition of operator(). This is a limitation with respect to std::not_fn, required for interoperability with RDataFrame.
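
The negation itself can be illustrated without ROOT. The sketch below is not ROOT's implementation: makeNegated is a hypothetical helper built on a generic lambda, shown next to the standard std::not_fn (C++17). Both wrappers expose a templated call operator, whereas Not is documented above to return a callable with the same signature as its input.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper for illustration only; this is NOT how ROOT implements Not.
// It wraps a predicate in a generic lambda that returns the negated result.
template <typename F>
auto makeNegated(F f)
{
   return [f = std::move(f)](auto &&...args) {
      return !f(std::forward<decltype(args)>(args)...);
   };
}

// A predicate with a single, non-template call signature: bool(int).
inline bool isPositive(int x) { return x > 0; }

// Exercises both the hand-written wrapper and the standard std::not_fn.
inline bool negationExample()
{
   auto notPositive = makeNegated(isPositive);
   auto notPositiveStd = std::not_fn(isPositive);
   return notPositive(-3) && !notPositive(5) && notPositiveStd(-3) && !notPositiveStd(5);
}
```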

Definition at line 83 of file RDFHelpers.hxx.

◆ operator!=() [1/3]

template<class T1 , class T2 >
bool ROOT::RDF::operator!= ( const RResultPtr< T1 > &  lhs,
const RResultPtr< T2 > &  rhs 
)

Definition at line 407 of file RResultPtr.hxx.

◆ operator!=() [2/3]

template<class T1 >
bool ROOT::RDF::operator!= ( const RResultPtr< T1 > &  lhs,
std::nullptr_t  rhs 
)

Definition at line 425 of file RResultPtr.hxx.

◆ operator!=() [3/3]

template<class T1 >
bool ROOT::RDF::operator!= ( std::nullptr_t  lhs,
const RResultPtr< T1 > &  rhs 
)

Definition at line 431 of file RResultPtr.hxx.

◆ operator<<()

std::ostream & ROOT::RDF::operator<< ( std::ostream &  os,
const RDFDescription &  description 
)

Definition at line 34 of file RDFDescription.cxx.

◆ operator==() [1/3]

template<class T1 , class T2 >
bool ROOT::RDF::operator== ( const RResultPtr< T1 > &  lhs,
const RResultPtr< T2 > &  rhs 
)

Definition at line 401 of file RResultPtr.hxx.

◆ operator==() [2/3]

template<class T1 >
bool ROOT::RDF::operator== ( const RResultPtr< T1 > &  lhs,
std::nullptr_t  rhs 
)

Definition at line 413 of file RResultPtr.hxx.

◆ operator==() [3/3]

template<class T1 >
bool ROOT::RDF::operator== ( std::nullptr_t  lhs,
const RResultPtr< T1 > &  rhs 
)

Definition at line 419 of file RResultPtr.hxx.

◆ PassAsVec()

template<std::size_t N, typename T , typename F >
auto ROOT::RDF::PassAsVec ( F &&  f) -> RDFInternal::PassAsVecHelper<std::make_index_sequence<N>, T, F>

PassAsVec is a callable generator that allows passing N variables of type T to a function as a single collection.

PassAsVec<N, T>(func) returns a callable that takes N arguments of type T, passes them down to the function func as an initializer list {t1, t2, t3, ..., tN}, and returns whatever func({t1, t2, t3, ..., tN}) returns.

Note that for this to work with RDataFrame the type of all columns that the callable is applied to must be exactly T. Example usage together with RDataFrame ("varX" columns must all be float variables):

bool myVecFunc(std::vector<float> args);
df.Filter(PassAsVec<3, float>(myVecFunc), {"var1", "var2", "var3"});
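
The mechanism can be sketched in standard C++ with std::index_sequence. This is a minimal illustration under stated assumptions, not the ROOT source: PackIntoVector, packIntoVector and sumAll are hypothetical names, and the wrapped function is assumed to take a std::vector<T> by value.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for illustration; not ROOT's PassAsVecHelper.
template <typename Seq, typename T, typename F>
class PackIntoVector;

template <std::size_t... Is, typename T, typename F>
class PackIntoVector<std::index_sequence<Is...>, T, F> {
   // Maps every index to T, so the call operator takes exactly N parameters of type T.
   template <std::size_t>
   using AlwaysT = T;

   F fFunc;

public:
   explicit PackIntoVector(F f) : fFunc(std::move(f)) {}

   // Collects the N arguments into a braced list and forwards it to the wrapped function.
   auto operator()(AlwaysT<Is>... args) { return fFunc({args...}); }
};

// Hypothetical factory mirroring the shape of PassAsVec<N, T>(func).
template <std::size_t N, typename T, typename F>
auto packIntoVector(F f)
{
   return PackIntoVector<std::make_index_sequence<N>, T, F>(std::move(f));
}

// Example function that receives all values as one collection.
inline float sumAll(std::vector<float> values)
{
   float total = 0.f;
   for (float v : values)
      total += v;
   return total;
}
```

Because the call operator is not a template, its parameter list is fixed at N values of type T, which fits the requirement above that every column passed to the callable has exactly type T.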

Definition at line 103 of file RDFHelpers.hxx.

◆ RunGraphs()

unsigned int ROOT::RDF::RunGraphs ( std::vector< RResultHandle >  handles)

Trigger the event loop of multiple RDataFrames concurrently.

Parameters
[in] handles  A vector of RResultHandles.
Returns
The number of distinct computation graphs that have been processed

This function triggers the event loop of all computation graphs which relate to the given RResultHandles. The advantage compared to running the event loop implicitly by accessing the RResultPtr is that the event loops will run concurrently. Therefore, the overall computation of all results is generally more efficient. It should be noted that user-defined operations (e.g., Filters and Defines) of the different RDataFrame graphs are assumed to be safe to call concurrently.

ROOT::RDataFrame df1("tree1", "file1.root");
auto r1 = df1.Histo1D("var1");

ROOT::RDataFrame df2("tree2", "file2.root");
auto r2 = df2.Sum("var2");

// RResultPtr -> RResultHandle conversion is automatic
ROOT::RDF::RunGraphs({r1, r2});

Definition at line 66 of file RDFHelpers.cxx.

◆ SaveGraph() [1/2]

template<typename NodeType >
std::string ROOT::RDF::SaveGraph ( NodeType  node)

Create a graphviz representation of the dataframe computation graph, return it as a string.

Parameters
[in] node  Any node of the graph. Called on the head (first) node, it prints the entire graph. Otherwise, only the branch the node belongs to.

The output can be displayed with a command akin to dot -Tpng output.dot > output.png && open output.png.

Note that "hanging" Defines, i.e. Defines without downstream nodes, will not be displayed by SaveGraph as they are effectively optimized away from the computation graph.

Note that SaveGraph is not thread-safe and must not be called concurrently from different threads.

Definition at line 120 of file RDFHelpers.hxx.

◆ SaveGraph() [2/2]

template<typename NodeType >
void ROOT::RDF::SaveGraph ( NodeType  node,
const std::string &  outputFile 
)

Create a graphviz representation of the dataframe computation graph, write it to the specified file.

Parameters
[in] node  Any node of the graph. Called on the head (first) node, it prints the entire graph. Otherwise, only the branch the node belongs to.
[in] outputFile  File where the representation is saved.

The output can be displayed with a command akin to dot -Tpng output.dot > output.png && open output.png.

Note that "hanging" Defines, i.e. Defines without downstream nodes, will not be displayed by SaveGraph as they are effectively optimized away from the computation graph.

Note that SaveGraph is not thread-safe and must not be called concurrently from different threads.

Definition at line 139 of file RDFHelpers.hxx.

◆ splitInEqualRanges()

void ROOT::RDF::splitInEqualRanges ( std::vector< std::pair< ULong64_t, ULong64_t > > &  ranges,
int  nRecords,
unsigned int  nSlots 
)

Definition at line 519 of file RArrowDS.cxx.