Loads TTree/RNTuple clusters from one or more RDataFrames into RFlat2DMatrix buffers for ML training and validation.
At construction the loader scans the cluster boundaries of every provided RDataFrame and stores them as a flat list of RClusterRange objects. SplitDataset() then partitions those ranges into training and validation sets according to validationSplit.
A fraction (1 - validationSplit) of the entries goes to training. At most one cluster is split at the boundary. [...] validationSplit) so both sets draw entries from every part of the dataset.
ShuffleTrainingClusters() and ShuffleValidationClusters() re-order the cluster lists at the start of each epoch. A second shuffling step, at the entry level, happens inside LoadTrainingClusterInto() and LoadValidationClusterInto() when the data is loaded into the tensors.
When any RDataFrame carries a filter, the true entry count is not known until the computation graph is executed. In this case SplitDataset() is a no-op and the split is discovered lazily inside LoadTrainingClusterInto() during the first epoch. After the first epoch, FinaliseSplitDiscovery() marks the split as stable and all subsequent epochs use the same pre-computed ranges.
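The entry-level shuffling step described above can be pictured as a row permutation of a row-major buffer. The following is a minimal sketch and not the RClusterLoader implementation: the helper `ShuffleRows`, the plain `std::vector<float>` standing in for RFlat2DMatrix, and the way the seed is used are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not part of ROOT): returns a copy of a row-major
// buffer with `numCols` columns whose rows are permuted using `seed`.
std::vector<float> ShuffleRows(const std::vector<float> &data, std::size_t numCols, std::size_t seed)
{
   assert(numCols > 0 && data.size() % numCols == 0);
   const std::size_t numRows = data.size() / numCols;

   // Build the identity permutation and shuffle it deterministically.
   std::vector<std::size_t> order(numRows);
   std::iota(order.begin(), order.end(), std::size_t{0});
   std::mt19937_64 rng(seed);
   std::shuffle(order.begin(), order.end(), rng);

   // Copy each source row to its new position; the values of a row stay together.
   std::vector<float> shuffled(data.size());
   for (std::size_t dst = 0; dst < numRows; ++dst) {
      const auto srcBegin = data.begin() + static_cast<std::ptrdiff_t>(order[dst] * numCols);
      const auto dstBegin = shuffled.begin() + static_cast<std::ptrdiff_t>(dst * numCols);
      std::copy_n(srcBegin, numCols, dstBegin);
   }
   return shuffled;
}
```

Permuting an index vector rather than the data itself keeps each row intact, and a fixed seed makes the resulting order reproducible.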
Definition at line 149 of file RClusterLoader.hxx.
Public Member Functions | |
| RClusterLoader (std::vector< ROOT::RDF::RNode > &rdfs, const std::vector< std::string > &cols, const std::vector< std::size_t > &vecSizes, float vecPadding, float validationSplit, bool shuffle, std::size_t setSeed) | |
| void | FinaliseSplitDiscovery () |
| Mark the train/val split as finalised after the first epoch. | |
| std::size_t | GetNmTotalClusters () const |
| std::size_t | GetNumChunkCols () const |
| std::size_t | GetNumTrainingClusters () const |
| std::size_t | GetNumTrainingEntries () const |
| std::size_t | GetNumValidationClusters () const |
| std::size_t | GetNumValidationEntries () const |
| const std::vector< RClusterRange > & | GetTrainingClusters () const |
| const std::vector< RClusterRange > & | GetValidationClusters () const |
| bool | IsSplitDiscovered () const |
| void | LoadClusterInto (RFlat2DMatrix &dest, std::size_t rdfIdx, std::uint64_t startRow, std::uint64_t endRow, std::size_t rowOffset=0) |
| std::size_t | LoadTrainingClusterInto (RFlat2DMatrix &dest, std::size_t rdfIdx, std::uint64_t startRow, std::uint64_t endRow, std::size_t rowOffset=0) |
| Load one training cluster and return the number of rows written. | |
| void | LoadValidationClusterInto (RFlat2DMatrix &dest, std::size_t rdfIdx, std::uint64_t startRow, std::uint64_t endRow, std::size_t rowOffset=0) |
| Load one validation cluster into dest starting at rowOffset. | |
| void | ShuffleTrainingClusters (std::size_t epochIdx) |
| Re-order training clusters for the upcoming epoch. | |
| void | ShuffleValidationClusters (std::size_t epochIdx) |
| Re-order validation clusters for the upcoming epoch. | |
| void | SplitDataset () |
| Distribute the clusters into training and validation datasets. No-op for filtered RDataFrames: the split is discovered lazily during the first epoch. | |
Private Attributes | |
| std::size_t | fAccumulatedFilteredForTrain {0} |
| std::vector< RClusterRange > | fAllClusters |
| std::vector< std::string > | fCols |
| bool | fIsFiltered {false} |
| std::size_t | fNumChunkCols |
| std::size_t | fNumCols |
| std::size_t | fNumTrainingEntries {0} |
| std::size_t | fNumValidationEntries {0} |
| std::vector< ROOT::RDF::RNode > & | fRdfs |
| std::vector< std::size_t > | fRdfSizes |
| std::size_t | fSetSeed |
| bool | fShuffle |
| bool | fSplitDiscovered {false} |
| std::size_t | fSumVecSizes |
| std::size_t | fTotalEntries {0} |
| std::vector< RClusterRange > | fTrainingClusters |
| std::vector< RClusterRange > | fValidationClusters |
| float | fValidationSplit |
| float | fVecPadding |
| std::vector< std::size_t > | fVecSizes |
#include <ROOT/ML/RClusterLoader.hxx>
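| RClusterLoader (std::vector< ROOT::RDF::RNode > &rdfs, const std::vector< std::string > &cols, const std::vector< std::size_t > &vecSizes, float vecPadding, float validationSplit, bool shuffle, std::size_t setSeed) | inline |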
Definition at line 177 of file RClusterLoader.hxx.
| void FinaliseSplitDiscovery () | inline |
Mark the train/val split as finalised after the first epoch.
Definition at line 422 of file RClusterLoader.hxx.
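| std::size_t GetNmTotalClusters () const | inline |
Definition at line 447 of file RClusterLoader.hxx.
| std::size_t GetNumChunkCols () const | inline |
Definition at line 434 of file RClusterLoader.hxx.
| std::size_t GetNumTrainingClusters () const | inline |
Definition at line 442 of file RClusterLoader.hxx.
| std::size_t GetNumTrainingEntries () const | inline |
Definition at line 432 of file RClusterLoader.hxx.
| std::size_t GetNumValidationClusters () const | inline |
Definition at line 446 of file RClusterLoader.hxx.
| std::size_t GetNumValidationEntries () const | inline |
Definition at line 433 of file RClusterLoader.hxx.
| const std::vector< RClusterRange > & GetTrainingClusters () const | inline |
Definition at line 436 of file RClusterLoader.hxx.
| const std::vector< RClusterRange > & GetValidationClusters () const | inline |
Definition at line 440 of file RClusterLoader.hxx.
| bool IsSplitDiscovered () const | inline |
Definition at line 428 of file RClusterLoader.hxx.
| void LoadClusterInto (RFlat2DMatrix &dest, std::size_t rdfIdx, std::uint64_t startRow, std::uint64_t endRow, std::size_t rowOffset=0) | inline |
Definition at line 316 of file RClusterLoader.hxx.
| std::size_t LoadTrainingClusterInto (RFlat2DMatrix &dest, std::size_t rdfIdx, std::uint64_t startRow, std::uint64_t endRow, std::size_t rowOffset=0) | inline |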
Load one training cluster and return the number of rows written.
- Unfiltered: delegates directly to LoadClusterInto().
- Filtered, epoch 1 (!fSplitDiscovered): [...] dest.
- All subsequent epochs: delegates directly to LoadClusterInto().
Definition at line 340 of file RClusterLoader.hxx.
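| void LoadValidationClusterInto (RFlat2DMatrix &dest, std::size_t rdfIdx, std::uint64_t startRow, std::uint64_t endRow, std::size_t rowOffset=0) | inline |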
Load one validation cluster into dest starting at rowOffset.
Definition at line 414 of file RClusterLoader.hxx.
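To illustrate what "into dest starting at rowOffset" means for a flat 2D buffer, here is a minimal sketch. It does not use ROOT: `FlatMatrix` is a hypothetical stand-in for RFlat2DMatrix, `CopyRowsInto` is an invented helper, and treating `endRow` as exclusive is an assumption rather than something the documentation states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for RFlat2DMatrix: a zero-initialised row-major float buffer.
struct FlatMatrix {
   std::size_t rows;
   std::size_t cols;
   std::vector<float> data;
   FlatMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.f) {}
};

// Copy source rows [startRow, endRow) into `dest`, beginning at row `rowOffset`.
// Returns the number of rows written. An exclusive endRow is an assumption.
std::size_t CopyRowsInto(FlatMatrix &dest, const FlatMatrix &source, std::uint64_t startRow,
                         std::uint64_t endRow, std::size_t rowOffset = 0)
{
   assert(dest.cols == source.cols);
   assert(startRow <= endRow && endRow <= source.rows);
   const std::size_t numRows = static_cast<std::size_t>(endRow - startRow);
   assert(rowOffset + numRows <= dest.rows);

   // Rows are contiguous in a row-major layout, so one block copy is enough.
   const auto srcBegin = source.data.begin() + static_cast<std::ptrdiff_t>(startRow * source.cols);
   const auto dstBegin = dest.data.begin() + static_cast<std::ptrdiff_t>(rowOffset * dest.cols);
   std::copy_n(srcBegin, numRows * source.cols, dstBegin);
   return numRows;
}
```

With a row offset, several clusters can be written back to back into one destination buffer by passing the running total of rows already written as the next `rowOffset`.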
| void ShuffleTrainingClusters (std::size_t epochIdx) | inline |
Re-order training clusters for the upcoming epoch.
Definition at line 295 of file RClusterLoader.hxx.
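A per-epoch re-ordering of a cluster list can be sketched as follows. This is an illustration only: `ClusterRange` is a hypothetical mirror of RClusterRange, `ShuffleForEpoch` is an invented helper, and deriving the generator seed as `seed + epochIdx` is an assumption about how a fixed seed and the epoch index might be combined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical mirror of a cluster range: rows [start, end) of one RDataFrame.
struct ClusterRange {
   std::size_t rdfIdx;
   std::uint64_t start;
   std::uint64_t end;
};

// Re-order `clusters` in place for epoch `epochIdx`. Seeding with seed + epochIdx
// is an assumption: it gives each epoch its own order while keeping runs reproducible.
void ShuffleForEpoch(std::vector<ClusterRange> &clusters, std::size_t seed, std::size_t epochIdx)
{
   std::mt19937_64 rng(seed + epochIdx);
   std::shuffle(clusters.begin(), clusters.end(), rng);
}
```

Calling the helper twice with the same seed and epoch index on identical inputs yields the same order, which is what makes a training run repeatable.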
| void ShuffleValidationClusters (std::size_t epochIdx) | inline |
Re-order validation clusters for the upcoming epoch.
Definition at line 307 of file RClusterLoader.hxx.
| void SplitDataset () | inline |
Distribute the clusters into training and validation datasets. No-op for filtered RDataFrames: the split is discovered lazily during the first epoch.
Definition at line 217 of file RClusterLoader.hxx.
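For the unfiltered case, a split in which a fraction (1 - validationSplit) of the entries goes to training and at most one cluster is cut at the boundary could look like the sketch below. It is not the code in RClusterLoader.hxx: `ClusterRange`, `SplitResult` and `SplitSequentially` are hypothetical names, and rounding the training entry count to the nearest integer is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of a cluster range: rows [start, end) of one RDataFrame.
struct ClusterRange {
   std::size_t rdfIdx;
   std::uint64_t start;
   std::uint64_t end;
};

struct SplitResult {
   std::vector<ClusterRange> training;
   std::vector<ClusterRange> validation;
};

// Assign clusters to training until the training quota is used up, cut the
// boundary cluster if necessary, and send everything after it to validation.
SplitResult SplitSequentially(const std::vector<ClusterRange> &clusters, float validationSplit)
{
   assert(validationSplit >= 0.f && validationSplit <= 1.f);
   std::uint64_t total = 0;
   for (const auto &c : clusters)
      total += c.end - c.start;

   // Entries still to be assigned to training (rounding to nearest is an assumption).
   auto remaining = static_cast<std::uint64_t>(std::llround(total * (1.0 - validationSplit)));

   SplitResult result;
   for (const auto &c : clusters) {
      const std::uint64_t size = c.end - c.start;
      if (remaining >= size) {
         result.training.push_back(c); // whole cluster fits into training
         remaining -= size;
      } else if (remaining > 0) {
         // Boundary cluster: head goes to training, tail to validation.
         result.training.push_back({c.rdfIdx, c.start, c.start + remaining});
         result.validation.push_back({c.rdfIdx, c.start + remaining, c.end});
         remaining = 0;
      } else {
         result.validation.push_back(c);
      }
   }
   return result;
}
```

Because the quota reaches zero inside at most one cluster, only that cluster is divided; every other cluster lands whole in one of the two sets.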
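| std::size_t fAccumulatedFilteredForTrain {0} | private |
Definition at line 174 of file RClusterLoader.hxx.
| std::vector< RClusterRange > fAllClusters | private |
Definition at line 164 of file RClusterLoader.hxx.
| std::vector< std::string > fCols | private |
Definition at line 153 of file RClusterLoader.hxx.
| bool fIsFiltered {false} | private |
Definition at line 172 of file RClusterLoader.hxx.
| std::size_t fNumChunkCols | private |
Definition at line 162 of file RClusterLoader.hxx.
| std::size_t fNumCols | private |
Definition at line 160 of file RClusterLoader.hxx.
| std::size_t fNumTrainingEntries {0} | private |
Definition at line 169 of file RClusterLoader.hxx.
| std::size_t fNumValidationEntries {0} | private |
Definition at line 170 of file RClusterLoader.hxx.
| std::vector< ROOT::RDF::RNode > & fRdfs | private |
Definition at line 151 of file RClusterLoader.hxx.
| std::vector< std::size_t > fRdfSizes | private |
Definition at line 152 of file RClusterLoader.hxx.
| std::size_t fSetSeed | private |
Definition at line 158 of file RClusterLoader.hxx.
| bool fShuffle | private |
Definition at line 157 of file RClusterLoader.hxx.
| bool fSplitDiscovered {false} | private |
Definition at line 173 of file RClusterLoader.hxx.
| std::size_t fSumVecSizes | private |
Definition at line 161 of file RClusterLoader.hxx.
| std::size_t fTotalEntries {0} | private |
Definition at line 168 of file RClusterLoader.hxx.
| std::vector< RClusterRange > fTrainingClusters | private |
Definition at line 165 of file RClusterLoader.hxx.
| std::vector< RClusterRange > fValidationClusters | private |
Definition at line 166 of file RClusterLoader.hxx.
| float fValidationSplit | private |
Definition at line 156 of file RClusterLoader.hxx.
| float fVecPadding | private |
Definition at line 155 of file RClusterLoader.hxx.
| std::vector< std::size_t > fVecSizes | private |
Definition at line 154 of file RClusterLoader.hxx.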