Builds and loads the chunks from the blocks and chunks constructed in RChunkConstructor.
In this class the blocks are stitched together to form chunks that are loaded into memory. The blocks used to create each chunk come from different parts of the dataset, which is achieved by shuffling the blocks before distributing them into chunks. The purpose of this process is to reduce bias during machine learning training by ensuring that the data is well mixed. The dataset is also split into training and validation sets according to the user-defined validation split fraction.
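The shuffle-then-distribute idea described above can be sketched in a few lines. This is a minimal illustration, not the RChunkLoader implementation: `BlockInterval`, `MakeBlocks` and `ShuffleIntoChunks` are hypothetical names, and representing a block as a half-open entry range is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical: half-open entry range [first, second) covered by one block.
using BlockInterval = std::pair<std::size_t, std::size_t>;

// Cut [0, numEntries) into consecutive blocks of at most blockSize entries.
std::vector<BlockInterval> MakeBlocks(std::size_t numEntries, std::size_t blockSize)
{
   assert(blockSize > 0);
   std::vector<BlockInterval> blocks;
   for (std::size_t start = 0; start < numEntries; start += blockSize)
      blocks.emplace_back(start, std::min(start + blockSize, numEntries));
   return blocks;
}

// Shuffle the blocks, then group them into chunks of blocksPerChunk blocks each.
// The last chunk may hold fewer blocks.
std::vector<std::vector<BlockInterval>>
ShuffleIntoChunks(std::vector<BlockInterval> blocks, std::size_t blocksPerChunk, unsigned seed)
{
   assert(blocksPerChunk > 0);
   std::mt19937 rng(seed);
   std::shuffle(blocks.begin(), blocks.end(), rng);

   std::vector<std::vector<BlockInterval>> chunks;
   for (std::size_t i = 0; i < blocks.size(); i += blocksPerChunk) {
      const std::size_t last = std::min(i + blocksPerChunk, blocks.size());
      chunks.emplace_back(blocks.begin() + i, blocks.begin() + last);
   }
   return chunks;
}
```

Because the blocks are shuffled before grouping, each chunk draws its entries from scattered regions of the dataset while reads within a block stay contiguous.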
Definition at line 105 of file RChunkLoader.hxx.
Public Member Functions | |
| RChunkLoader (ROOT::RDF::RNode &rdf, const std::size_t chunkSize, const std::size_t blockSize, const float validationSplit, const std::vector< std::string > &cols, const std::vector< std::size_t > &vecSizes={}, const float vecPadding=0.0, bool shuffle=true, const std::size_t setSeed=0) | |
| void | CheckIfOverlap (RFlat2DMatrix &Tensor1, RFlat2DMatrix &Tensor2) |
| void | CheckIfUnique (RFlat2DMatrix &Tensor) |
| void | CreateTrainingChunksIntervals () |
| Create training chunks consisting of block intervals of different types. | |
| void | CreateValidationChunksIntervals () |
| Create validation chunks consisting of block intervals of different types. | |
| std::size_t | GetNumTrainingChunks () |
| std::size_t | GetNumTrainingEntries () |
| std::size_t | GetNumValidationChunks () |
| std::size_t | GetNumValidationEntries () |
| std::vector< std::size_t > | GetTrainingChunkSizes () |
| std::vector< std::size_t > | GetValidationChunkSizes () |
| void | LoadTrainingChunk (RFlat2DMatrix &TrainChunkTensor, std::size_t chunk) |
| Load the nth chunk from the training dataset into a tensor. | |
| void | LoadValidationChunk (RFlat2DMatrix &ValidationChunkTensor, std::size_t chunk) |
| Load the nth chunk from the validation dataset into a tensor. | |
| void | ResetDataframe () |
| void | SplitDataset () |
| Distribute the blocks into training and validation datasets. | |
Private Attributes | |
| ROOT::RDF::RNode & | f_rdf |
| std::size_t | fBlockSize |
| std::size_t | fChunkSize |
| std::vector< std::string > | fCols |
| ROOT::RDF::RResultPtr< std::vector< ULong64_t > > | fEntries |
| bool | fNotFiltered |
| std::size_t | fNumChunkCols |
| std::size_t | fNumCols |
| std::size_t | fNumEntries |
| std::size_t | fNumTrainEntries |
| std::size_t | fNumValidationEntries |
| std::size_t | fSetSeed |
| bool | fShuffle |
| std::size_t | fSumVecSizes |
| std::unique_ptr< RFlat2DMatrixOperators > | fTensorOperators |
| std::unique_ptr< RChunkConstructor > | fTraining |
| std::unique_ptr< RChunkConstructor > | fValidation |
| float | fValidationSplit |
| std::size_t | fVecPadding |
| std::vector< std::size_t > | fVecSizes |
#include <ROOT/ML/RChunkLoader.hxx>
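| RChunkLoader (ROOT::RDF::RNode &rdf, const std::size_t chunkSize, const std::size_t blockSize, const float validationSplit, const std::vector< std::string > &cols, const std::vector< std::size_t > &vecSizes={}, const float vecPadding=0.0, bool shuffle=true, const std::size_t setSeed=0) | inline |
Definition at line 135 of file RChunkLoader.hxx.
| void CheckIfOverlap (RFlat2DMatrix &Tensor1, RFlat2DMatrix &Tensor2) | inline |
Definition at line 441 of file RChunkLoader.hxx.
| void CheckIfUnique (RFlat2DMatrix &Tensor) | inline |
Definition at line 433 of file RChunkLoader.hxx.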
| void CreateTrainingChunksIntervals () | inline |
Create training chunks consisting of block intervals of different types.
Definition at line 248 of file RChunkLoader.hxx.
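The phrase "block intervals of different types" is not spelled out on this page. One plausible reading, offered only as an assumption, is that a range of entries yields full-size pieces plus one shorter leftover piece, both at the block level and at the chunk level. A minimal sketch of that size accounting, using the hypothetical helper `PieceSizes` (not part of RChunkLoader):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sizes of the pieces obtained by cutting numEntries
// into pieces of pieceSize, with one shorter trailing piece if needed.
std::vector<std::size_t> PieceSizes(std::size_t numEntries, std::size_t pieceSize)
{
   assert(pieceSize > 0);
   // numEntries / pieceSize full pieces, each of pieceSize entries.
   std::vector<std::size_t> sizes(numEntries / pieceSize, pieceSize);
   const std::size_t leftover = numEntries % pieceSize;
   if (leftover != 0)
      sizes.push_back(leftover);
   return sizes;
}
```

Called with a chunk size, this gives per-chunk entry counts of the kind a method such as GetTrainingChunkSizes() might report; called with a block size, it gives the block lengths inside one chunk.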
| void CreateValidationChunksIntervals () | inline |
Create validation chunks consisting of block intervals of different types.
Definition at line 288 of file RChunkLoader.hxx.
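| std::size_t GetNumTrainingChunks () | inline |
Definition at line 464 of file RChunkLoader.hxx.
| std::size_t GetNumTrainingEntries () | inline |
Definition at line 430 of file RChunkLoader.hxx.
| std::size_t GetNumValidationChunks () | inline |
Definition at line 466 of file RChunkLoader.hxx.
| std::size_t GetNumValidationEntries () | inline |
Definition at line 431 of file RChunkLoader.hxx.
| std::vector< std::size_t > GetTrainingChunkSizes () | inline |
Definition at line 427 of file RChunkLoader.hxx.
| std::vector< std::size_t > GetValidationChunkSizes () | inline |
Definition at line 428 of file RChunkLoader.hxx.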
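| void LoadTrainingChunk (RFlat2DMatrix &TrainChunkTensor, std::size_t chunk) | inline |
Load the nth chunk from the training dataset into a tensor.
| [out] | TrainChunkTensor | Tensor (RFlat2DMatrix) that receives the training chunk |
| [in] | chunk | Index of the chunk in the training dataset |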
Definition at line 326 of file RChunkLoader.hxx.
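Loading a chunk amounts to gathering the rows of its block intervals into one contiguous buffer. The sketch below illustrates that step under stated assumptions: `FlatMatrix` is a hypothetical stand-in for a flat, row-major 2D buffer (it is not the RFlat2DMatrix type), `LoadChunk` is a hypothetical free function, and the source data is assumed to be in memory already, whereas the real class reads from an RDataFrame.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a flat, row-major 2D buffer.
struct FlatMatrix {
   std::size_t rows = 0;
   std::size_t cols = 0;
   std::vector<float> data; // rows * cols values, row-major
};

// Hypothetical: half-open row range [first, second) covered by one block.
using BlockInterval = std::pair<std::size_t, std::size_t>;

// Fill `out` with the rows of `source` named by the chunk's block intervals,
// in the order in which the intervals are listed.
void LoadChunk(const FlatMatrix &source, const std::vector<BlockInterval> &chunkBlocks, FlatMatrix &out)
{
   out.cols = source.cols;
   out.rows = 0;
   out.data.clear();
   for (const auto &block : chunkBlocks) {
      assert(block.first <= block.second && block.second <= source.rows);
      // Rows of one block are contiguous in the source, so copy them in one go.
      out.data.insert(out.data.end(),
                      source.data.begin() + block.first * source.cols,
                      source.data.begin() + block.second * source.cols);
      out.rows += block.second - block.first;
   }
}
```

Passing the output buffer by reference, as LoadTrainingChunk does with its tensor argument, lets the caller reuse one allocation across chunks.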
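| void LoadValidationChunk (RFlat2DMatrix &ValidationChunkTensor, std::size_t chunk) | inline |
Load the nth chunk from the validation dataset into a tensor.
| [out] | ValidationChunkTensor | Tensor (RFlat2DMatrix) that receives the validation chunk |
| [in] | chunk | Index of the chunk in the validation dataset |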
Definition at line 378 of file RChunkLoader.hxx.
| void ResetDataframe () | inline |
Definition at line 425 of file RChunkLoader.hxx.
| void SplitDataset () | inline |
Distribute the blocks into training and validation datasets.
Definition at line 173 of file RChunkLoader.hxx.
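A block-level train/validation split driven by a validation fraction could look roughly like the sketch below. It is an illustration under assumptions, not the code in RChunkLoader.hxx: `Split` and `SplitBlocks` are hypothetical names, and the rounding rule (truncating `validationSplit * blocks.size()`) is a guess, since the actual rule is not documented here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical: half-open entry range [first, second) covered by one block.
using BlockInterval = std::pair<std::size_t, std::size_t>;

struct Split {
   std::vector<BlockInterval> training;
   std::vector<BlockInterval> validation;
};

// Optionally shuffle the blocks, then hand the first validationSplit fraction
// of them to the validation set and the remainder to the training set.
Split SplitBlocks(std::vector<BlockInterval> blocks, float validationSplit, bool shuffle, unsigned seed)
{
   assert(validationSplit >= 0.f && validationSplit <= 1.f);
   if (shuffle) {
      std::mt19937 rng(seed);
      std::shuffle(blocks.begin(), blocks.end(), rng);
   }
   // Assumed rounding: truncate toward zero.
   const auto numValidation = static_cast<std::size_t>(validationSplit * blocks.size());

   Split split;
   split.validation.assign(blocks.begin(), blocks.begin() + numValidation);
   split.training.assign(blocks.begin() + numValidation, blocks.end());
   return split;
}
```

Every block lands in exactly one of the two sets, so no entry appears in both; CheckIfOverlap() appears intended for verifying that kind of property on loaded tensors.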
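| ROOT::RDF::RNode & f_rdf | private |
Definition at line 121 of file RChunkLoader.hxx.
| std::size_t fBlockSize | private |
Definition at line 109 of file RChunkLoader.hxx.
| std::size_t fChunkSize | private |
Definition at line 108 of file RChunkLoader.hxx.
| std::vector< std::string > fCols | private |
Definition at line 122 of file RChunkLoader.hxx.
| ROOT::RDF::RResultPtr< std::vector< ULong64_t > > fEntries | private |
Definition at line 129 of file RChunkLoader.hxx.
| bool fNotFiltered | private |
Definition at line 126 of file RChunkLoader.hxx.
| std::size_t fNumChunkCols | private |
Definition at line 115 of file RChunkLoader.hxx.
| std::size_t fNumCols | private |
Definition at line 123 of file RChunkLoader.hxx.
| std::size_t fNumEntries | private |
Definition at line 107 of file RChunkLoader.hxx.
| std::size_t fNumTrainEntries | private |
Definition at line 117 of file RChunkLoader.hxx.
| std::size_t fNumValidationEntries | private |
Definition at line 118 of file RChunkLoader.hxx.
| std::size_t fSetSeed | private |
Definition at line 124 of file RChunkLoader.hxx.
| bool fShuffle | private |
Definition at line 127 of file RChunkLoader.hxx.
| std::size_t fSumVecSizes | private |
Definition at line 113 of file RChunkLoader.hxx.
| std::unique_ptr< RFlat2DMatrixOperators > fTensorOperators | private |
Definition at line 119 of file RChunkLoader.hxx.
| std::unique_ptr< RChunkConstructor > fTraining | private |
Definition at line 131 of file RChunkLoader.hxx.
| std::unique_ptr< RChunkConstructor > fValidation | private |
Definition at line 132 of file RChunkLoader.hxx.
| float fValidationSplit | private |
Definition at line 110 of file RChunkLoader.hxx.
| std::size_t fVecPadding | private |
Definition at line 114 of file RChunkLoader.hxx.
| std::vector< std::size_t > fVecSizes | private |
Definition at line 112 of file RChunkLoader.hxx.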