ROOT Reference Guide
RooFit::Detail::CudaKernels::Reducers Namespace Reference

Dedicated namespace for reduction kernels.

Functions

template<int BlockSize_n>
__global__ void Covariance2D (Size_t arraySize, float const *__restrict__ x, const float *__restrict__ y, double xMean, double yMean, double *__restrict__ output)
 Computes the covariance, variance of 'x', and variance of 'y' for a 2D data set.
 
template<int BlockSize_n, int Bins1_n, int Bins2_n, class Idx_t, class Elem_t, class Sum_t, class Counts_t>
__global__ void SumBinwise2D (int arraySize, Idx_t const *__restrict__ x1, Idx_t const *__restrict__ x2, const Elem_t *__restrict__ arr, Sum_t *__restrict__ outputSum, Counts_t *__restrict__ outputCounts)
 Computes bin-wise sum and count of elements from the 'arr' array into separate output arrays based on indices provided in 'x1' and 'x2' arrays, using a 2D grid-stride loop approach.
 
template<int BlockSize_n, int ElemDim_n, class Elem_t, class Sum_t>
__global__ void SumVectors (Size_t nElems, const Elem_t *__restrict__ arr, Sum_t *__restrict__ output)
 Performs a multi-block sum reduction on the input array arr.
 

Detailed Description

Dedicated namespace for reduction kernels.

Function Documentation

◆ Covariance2D()

template<int BlockSize_n>
__global__ void RooFit::Detail::CudaKernels::Reducers::Covariance2D ( Size_t  arraySize,
float const *__restrict__  x,
const float *__restrict__  y,
double  xMean,
double  yMean,
double *__restrict__  output 
)

Computes the covariance, variance of 'x', and variance of 'y' for a 2D data set.

The output array has shape [ gridDim.x, 3 ], flattened in row-major order, so the three values written by one block are contiguous. Three values per block suffice because the 2-by-2 covariance matrix is symmetric.

Template Parameters
BlockSize_n  Needs to be identical to the number of threads per block. Has to be known statically because it determines the size of the shared memory.
Parameters
[in]  arraySize  Size of the input arrays 'x' and 'y'.
[in]  x  Input array containing 'x' data.
[in]  y  Input array containing 'y' data.
[in]  xMean  Mean of the 'x' data.
[in]  yMean  Mean of the 'y' data.
[out]  output  Output array for the computed variances and covariance. Three values are stored per block, in this order: variance of 'x', covariance, variance of 'y'.
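
Because the kernel writes one triple per block, the caller still has to combine the per-block results. The following is a minimal host-side sketch of that step, not part of the documented API. It assumes the output buffer has already been copied to host memory as a vector of length 3 * gridDim.x, and that the per-block values are partial results that add up across blocks. Whether they still need to be normalised (for example by arraySize) is not stated here, so the helper only sums. The names CovarianceResult and combineCovarianceBlocks are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for the three independent entries of the
// symmetric 2-by-2 covariance matrix.
struct CovarianceResult {
   double varX = 0.0;
   double covXY = 0.0;
   double varY = 0.0;
};

// Sums the per-block triples of a buffer laid out as [nBlocks, 3], row-major.
// Assumption: per-block values are additive partial results.
inline CovarianceResult combineCovarianceBlocks(const std::vector<double> &blockOutput)
{
   CovarianceResult total;
   const std::size_t nBlocks = blockOutput.size() / 3;
   for (std::size_t b = 0; b < nBlocks; ++b) {
      total.varX += blockOutput[3 * b + 0];  // variance of 'x'
      total.covXY += blockOutput[3 * b + 1]; // covariance
      total.varY += blockOutput[3 * b + 2];  // variance of 'y'
   }
   return total;
}
```

Keeping the final pass on the host is only one option; with many blocks, a second kernel launch over the partial results may be preferable.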

Definition at line 178 of file CudaKernels.cuh.

◆ SumBinwise2D()

template<int BlockSize_n, int Bins1_n, int Bins2_n, class Idx_t, class Elem_t, class Sum_t, class Counts_t>
__global__ void RooFit::Detail::CudaKernels::Reducers::SumBinwise2D ( int  arraySize,
Idx_t const *__restrict__  x1,
Idx_t const *__restrict__  x2,
const Elem_t *__restrict__  arr,
Sum_t *__restrict__  outputSum,
Counts_t *__restrict__  outputCounts 
)

Computes bin-wise sum and count of elements from the 'arr' array into separate output arrays based on indices provided in 'x1' and 'x2' arrays, using a 2D grid-stride loop approach.

The output arrays have shape [ gridDim.x, Bins1_n, Bins2_n ], flattened in row-major order, so the bins written by one block are contiguous and the second bin index varies fastest.

Template Parameters
Bins1_n  Number of bins in the first dimension.
Bins2_n  Number of bins in the second dimension.
BlockSize_n  Needs to be identical to the number of threads per block. Has to be known statically because it determines the size of the shared memory.
Idx_t  Data type of the index arrays x1 and x2.
Elem_t  Data type of the elements of the input array arr.
Sum_t  Data type of the bin-wise sum output.
Counts_t  Data type of the bin-wise count output.
Parameters
[in]  arraySize  Size of the input arrays x1, x2, and arr.
[in]  x1  Input array containing bin indices for the first dimension.
[in]  x2  Input array containing bin indices for the second dimension.
[in]  arr  Input array containing the elements to be summed.
[out]  outputSum  Output array for the bin-wise sums of elements.
[out]  outputCounts  Output array for the bin-wise counts of elements.
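
To validate the kernel output, it can help to compare it against a plain host-side computation of the same bin-wise sums and counts. The sketch below is such a reference and is not part of the documented API. The name sumBinwise2DReference is hypothetical, the bin counts are passed at run time rather than as template parameters, and the result is a single [bins1, bins2] slice, that is, what the kernel output should give after summing over the gridDim.x dimension. Skipping out-of-range indices is a choice of this sketch; the kernel's behaviour in that case is not stated here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side reference producing one [bins1, bins2] slice, row-major.
// Assumption: x1, x2 and arr all have the same length.
template <class Idx_t, class Elem_t, class Sum_t, class Counts_t>
void sumBinwise2DReference(std::size_t bins1, std::size_t bins2, const std::vector<Idx_t> &x1,
                           const std::vector<Idx_t> &x2, const std::vector<Elem_t> &arr,
                           std::vector<Sum_t> &sums, std::vector<Counts_t> &counts)
{
   sums.assign(bins1 * bins2, Sum_t{});
   counts.assign(bins1 * bins2, Counts_t{});
   for (std::size_t i = 0; i < arr.size(); ++i) {
      const auto i1 = static_cast<std::size_t>(x1[i]);
      const auto i2 = static_cast<std::size_t>(x2[i]);
      if (i1 >= bins1 || i2 >= bins2)
         continue; // choice of this sketch: ignore out-of-range (or negative) indices
      const std::size_t flat = i1 * bins2 + i2; // row-major: second index varies fastest
      sums[flat] += static_cast<Sum_t>(arr[i]);
      counts[flat] += 1;
   }
}
```

With floating-point Sum_t, the kernel and this reference add the elements in different orders, so a comparison should allow for a small rounding tolerance.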

Definition at line 113 of file CudaKernels.cuh.

◆ SumVectors()

template<int BlockSize_n, int ElemDim_n, class Elem_t, class Sum_t>
__global__ void RooFit::Detail::CudaKernels::Reducers::SumVectors ( Size_t  nElems,
const Elem_t *__restrict__  arr,
Sum_t *__restrict__  output 
)

Performs a multi-block sum reduction on the input array arr.

The input array can be either an array of scalars or a flattened array of vectors with inner dimension ElemDim_n.

The output array has shape [ gridDim.x, ElemDim_n ], flattened in row-major order, so the ElemDim_n components of one block's partial sum are contiguous.

Template Parameters
ElemDim_n  Inner dimension of the flattened input array. Set to 1 when summing scalar values.
BlockSize_n  Needs to be identical to the number of threads per block. Has to be known statically because it determines the size of the shared memory.
Elem_t  Data type of the input array elements.
Sum_t  Data type of the output and shared memory array elements.
Parameters
[in]  nElems  Number of elements in the input array.
[in]  arr  Input array containing the data to be summed.
[out]  output  Output array containing the partial sums for each block. Each block's sum starts at 'output[ElemDim_n * blockIdx.x]'.
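
Since each block only produces a partial sum, a final pass over the output is needed to obtain the total. The following host-side sketch illustrates that pass under the layout described above and is not part of the documented API. The name combineBlockSums is hypothetical, the inner dimension is passed at run time, and the buffer is assumed to have been copied to host memory already.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: adds up per-block partial sums stored as
// [nBlocks, elemDim], row-major. Assumption: elemDim > 0.
template <class Sum_t>
std::vector<Sum_t> combineBlockSums(const std::vector<Sum_t> &blockOutput, std::size_t elemDim)
{
   std::vector<Sum_t> total(elemDim, Sum_t{});
   const std::size_t nBlocks = blockOutput.size() / elemDim;
   for (std::size_t b = 0; b < nBlocks; ++b) {
      for (std::size_t d = 0; d < elemDim; ++d) {
         // Block b's partial sum starts at blockOutput[elemDim * b].
         total[d] += blockOutput[elemDim * b + d];
      }
   }
   return total;
}
```

For scalar input (ElemDim_n set to 1) this reduces to summing one value per block.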

Definition at line 51 of file CudaKernels.cuh.