RooAbsDataHelper.h
/*****************************************************************************
 * Project: RooFit *
 * Package: RooFitCore *
 * Authors: *
 * WV, Wouter Verkerke, UC Santa Barbara, verkerke@slac.stanford.edu *
 * DK, David Kirkby, UC Irvine, dkirkby@uci.edu *
 * *
 * Copyright (c) 2000-2021, Regents of the University of California *
 * and Stanford University. All rights reserved. *
 * *
 * Redistribution and use in source and binary forms, *
 * with or without modification, are permitted according to the terms *
 * listed in LICENSE (http://roofit.sourceforge.net/license.txt) *
 *****************************************************************************/
/// Create RooDataSet/RooDataHist from RDataFrame.
/// \date Mar 2021
/// \author Stephan Hageboeck (CERN)
#ifndef ROOABSDATAHELPER
#define ROOABSDATAHELPER

#include <RooRealVar.h>
#include <RooArgSet.h>
#include <RooDataSet.h>
#include <RooDataHist.h>
#include <RooMsgService.h>

#include <ROOT/RDataFrame.hxx>
#include <TROOT.h>

#include <vector>
#include <mutex>
#include <memory>
#include <cstddef>
#include <string>
#include <stdexcept>

class TTreeReader;

/// This is a helper for an RDataFrame action, which fills RooFit data classes.
///
/// \tparam DataSet_t Either RooDataSet or RooDataHist.
///
/// To construct a RooDataSet / RooDataHist within RDataFrame:
/// - Construct one of the two action helpers RooDataSetHelper or RooDataHistHelper. Pass constructor arguments
///   to RooAbsDataHelper::RooAbsDataHelper() as for the original classes.
///   The arguments are forwarded to the actual data classes without any changes.
/// - Book the helper as an RDataFrame action. Here, the RDataFrame column types have to be passed as template parameters.
/// - Pass the column names to the Book action. These are matched by position to the variables of the dataset.
///
/// All arguments passed to the helper's constructor are forwarded to RooDataSet::RooDataSet() / RooDataHist::RooDataHist().
///
/// #### Usage example:
/// ```
/// RooRealVar x("x", "x", -5., 5.);
/// RooRealVar y("y", "y", -50., 50.);
/// auto myDataSet = rdataframe.Book<double, double>(
///   RooDataSetHelper{"dataset",          // Name (directly forwarded to RooDataSet::RooDataSet())
///                    "Title of dataset", // Title ( ~ " ~ )
///                    RooArgSet(x, y) },  // Variables to create in dataset
///   {"x", "y"}                           // Column names from RDataFrame
/// );
/// ```
/// \warning Variables in the dataset and columns in RDataFrame are **matched by position, not by name**.
/// This makes it easy to swap which columns are filled into the dataset.
template<class DataSet_t>
class RooAbsDataHelper : public ROOT::Detail::RDF::RActionImpl<RooAbsDataHelper<DataSet_t>> {
public:
  using Result_t = DataSet_t;

private:
  std::shared_ptr<DataSet_t> _dataset;
  std::mutex _mutex_dataset;
  std::size_t _numInvalid = 0;

  std::vector<std::vector<double>> _events; // One vector of values per data-processing slot
  const std::size_t _eventSize; // Number of variables in dataset

public:

  /// Construct a helper to create RooDataSet/RooDataHist.
  /// \tparam Args_t Parameter pack of arguments.
  /// \param args Constructor arguments for RooDataSet::RooDataSet() or RooDataHist::RooDataHist().
  /// All arguments will be forwarded as they are.
  template<typename... Args_t>
  RooAbsDataHelper(Args_t&&... args) :
    _dataset{ new DataSet_t(std::forward<Args_t>(args)...) },
    _eventSize{ _dataset->get()->size() }
  {
    const auto nSlots = ROOT::IsImplicitMTEnabled() ? ROOT::GetThreadPoolSize() : 1;
    _events.resize(nSlots);
  }


  /// Move constructor. It transfers ownership of the internal RooAbsData object.
  RooAbsDataHelper(RooAbsDataHelper&& other) :
    _dataset{ std::move(other._dataset) },
    _events{ std::move(other._events) },
    _eventSize{ other._eventSize }
  {

  }

  /// Copy is discouraged.
  /// Use `rdataframe.Book<...>(std::move(absDataHelper), ...)` instead.
  RooAbsDataHelper(const RooAbsDataHelper&) = delete;
  /// Return internal dataset/hist.
  std::shared_ptr<DataSet_t> GetResultPtr() const { return _dataset; }
  /// RDataFrame interface method. Nothing has to be initialised.
  void Initialize() {}
  /// RDataFrame interface method. No tasks.
  void InitTask(TTreeReader *, unsigned int) {}
  /// RDataFrame interface method.
  std::string GetActionName() { return "RooDataSetHelper"; }

  /// Method that RDataFrame calls to pass a new event.
  ///
  /// \param slot When IMT is used, this is a number in the range [0, nSlots) to fill lock free.
  /// \param values x, y, z, ... coordinates of the event.
  template <typename... ColumnTypes>
  void Exec(unsigned int slot, ColumnTypes... values)
  {
    if (sizeof...(values) != _eventSize) {
      throw std::invalid_argument(std::string("RooDataSet can hold ")
          + std::to_string(_eventSize)
          + " variables per event, but RDataFrame passed "
          + std::to_string(sizeof...(values))
          + " columns.");
    }

    auto& vector = _events[slot];
    for (auto&& val : {values...}) {
      vector.push_back(val);
    }

    if (vector.size() > 1024 && _mutex_dataset.try_lock()) {
      const std::lock_guard<std::mutex> guard(_mutex_dataset, std::adopt_lock_t());
      FillDataSet(vector, _eventSize);
      vector.clear();
    }
  }

  /// Empty all buffers into the dataset/hist to finish processing.
  void Finalize() {
    for (auto& vector : _events) {
      FillDataSet(vector, _eventSize);
      vector.clear();
    }

    if (_numInvalid>0) {
      const auto prefix = std::string(_dataset->ClassName()) + "Helper::Finalize(" + _dataset->GetName() + ") ";
      oocoutW(nullptr, DataHandling) << prefix << "Ignored " << _numInvalid << " out-of-range events\n";
    }
  }


private:
  /// Append all `events` to the internal RooDataSet or increment the bins of a RooDataHist at the given locations.
  ///
  /// \param events Events to fill into `data`. The layout is assumed to be `(x, y, z, ...) (x, y, z, ...), (...)`.
  /// \note The order of the variables inside `events` must be consistent with the order given in the constructor.
  /// No matching by name is performed.
  /// \param eventSize Size of a single event.
  void FillDataSet(const std::vector<double>& events, unsigned int eventSize) {
    if (events.empty())
      return;

    const RooArgSet& argSet = *_dataset->get();

    for (std::size_t i = 0; i < events.size(); i += eventSize) {

      // Creating a RooDataSet from an RDataFrame should be consistent with the
      // creation from a TTree. The construction from a TTree discards entries
      // outside the variable definition range, so we have to do that too (see
      // also RooTreeDataStore::loadValues).

      bool allOK = true;
      for (std::size_t j=0; j < eventSize; ++j) {
        auto * destArg = static_cast<RooAbsRealLValue*>(argSet[j]);
        double sourceVal = events[i+j];

        if (!destArg->inRange(sourceVal, nullptr)) {
          _numInvalid++ ;
          allOK = false;
          const auto prefix = std::string(_dataset->ClassName()) + "Helper::FillDataSet(" + _dataset->GetName() + ") ";
          if (_numInvalid < 5) {
            // Unlike in the TreeVectorStore case, we don't log the event
            // number here because we don't know it anyway, because of
            // RDataFrame slots and multithreading.
            oocoutI(nullptr, DataHandling) << prefix << "Skipping event because " << destArg->GetName()
                << " cannot accommodate the value " << sourceVal << "\n";
          } else if (_numInvalid == 5) {
            oocoutI(nullptr, DataHandling) << prefix << "Skipping ...\n";
          }
          break ;
        }
        destArg->setVal(sourceVal);
      }
      if(allOK) {
        _dataset->add(argSet);
      }
    }
  }
};

/// Helper for creating a RooDataSet inside RDataFrame. \see RooAbsDataHelper
using RooDataSetHelper = RooAbsDataHelper<RooDataSet>;
/// Helper for creating a RooDataHist inside RDataFrame. \see RooAbsDataHelper
using RooDataHistHelper = RooAbsDataHelper<RooDataHist>;

#endif
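
The buffering scheme behind `Exec()`, `Finalize()` and `FillDataSet()` can be tried out without ROOT. The following is a minimal standalone sketch, not part of RooFit. The class `BufferedSink`, its `lo`/`hi` limits and the accessors `nAccepted()`/`nRejected()` are hypothetical names chosen for illustration, and one shared range stands in for the per-variable `inRange()` check. Each slot appends to its own buffer without locking, a buffer is flushed opportunistically when the shared mutex happens to be free, and a final pass empties whatever is left.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the helper: keeps events whose values all lie in [lo, hi].
class BufferedSink {
public:
  BufferedSink(std::size_t nSlots, std::size_t eventSize, double lo, double hi)
    : _buffers(nSlots), _eventSize(eventSize), _lo(lo), _hi(hi)
  {
    assert(eventSize > 0);
  }

  // May be called concurrently, provided each thread uses its own slot.
  void Exec(unsigned int slot, const std::vector<double>& event)
  {
    assert(slot < _buffers.size() && event.size() == _eventSize);
    auto& buffer = _buffers[slot];
    buffer.insert(buffer.end(), event.begin(), event.end());

    // Flush only if the lock is free; otherwise keep buffering and retry later.
    if (buffer.size() > 1024 && _mutex.try_lock()) {
      const std::lock_guard<std::mutex> guard(_mutex, std::adopt_lock);
      Flush(buffer);
      buffer.clear();
    }
  }

  // Call from a single thread once all Exec() calls have returned.
  void Finalize()
  {
    for (auto& buffer : _buffers) {
      Flush(buffer);
      buffer.clear();
    }
  }

  std::size_t nAccepted() const { return _accepted.size() / _eventSize; }
  std::size_t nRejected() const { return _numInvalid; }

private:
  // The buffer layout is (x, y, ...), (x, y, ...), ... as in FillDataSet().
  void Flush(const std::vector<double>& buffer)
  {
    for (std::size_t i = 0; i + _eventSize <= buffer.size(); i += _eventSize) {
      bool allOK = true;
      for (std::size_t j = 0; j < _eventSize; ++j) {
        if (buffer[i + j] < _lo || buffer[i + j] > _hi) {
          allOK = false;
          break;
        }
      }
      if (!allOK) {
        ++_numInvalid; // The whole event is dropped, not just the offending value.
        continue;
      }
      for (std::size_t j = 0; j < _eventSize; ++j) {
        _accepted.push_back(buffer[i + j]);
      }
    }
  }

  std::mutex _mutex;
  std::vector<std::vector<double>> _buffers; // One buffer per slot
  std::size_t _eventSize;
  double _lo;
  double _hi;
  std::vector<double> _accepted;
  std::size_t _numInvalid = 0;
};
```

With two slots and two values per event, for instance, feeding `{1.0, 2.0}` and `{-4.0, 0.5}` into slot 0 and `{3.0, 9.0}` into slot 1 with limits `[-5, 5]` leaves two accepted events and one rejected event after `Finalize()`. Because `try_lock()` never blocks, a slot that finds the mutex busy simply keeps growing its buffer, which trades some memory for not stalling the event loop.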