Dedicated namespace for reduction kernels.
Functions
template<int BlockSize_n>
__global__ void Covariance2D (Size_t arraySize, float const *__restrict__ x, const float *__restrict__ y, double xMean, double yMean, double *__restrict__ output)
Computes the covariance, variance of 'x', and variance of 'y' for a 2D data set.
template<int BlockSize_n, int Bins1_n, int Bins2_n, class Idx_t, class Elem_t, class Sum_t, class Counts_t>
__global__ void SumBinwise2D (int arraySize, Idx_t const *__restrict__ x1, Idx_t const *__restrict__ x2, const Elem_t *__restrict__ arr, Sum_t *__restrict__ outputSum, Counts_t *__restrict__ outputCounts)
Computes bin-wise sum and count of elements from the 'arr' array into separate output arrays based on indices provided in 'x1' and 'x2' arrays, using a 2D grid-stride loop approach.
template<int BlockSize_n, int ElemDim_n, class Elem_t, class Sum_t>
__global__ void SumVectors (Size_t nElems, const Elem_t *__restrict__ arr, Sum_t *__restrict__ output)
Performs a multi-block sum reduction on the input array arr.
template<int BlockSize_n>
__global__ void RooFit::Detail::CudaKernels::Reducers::Covariance2D (
    Size_t arraySize,
    float const *__restrict__ x,
    const float *__restrict__ y,
    double xMean,
    double yMean,
    double *__restrict__ output
)
Computes the covariance, variance of 'x', and variance of 'y' for a 2D data set.
The output array is of shape [ gridDim.x, 3 ] (row-major flattened, so the three values written by one block are contiguous). Only three output elements per block are needed because the 2-by-2 covariance matrix is symmetric.
BlockSize_n | Must equal the number of threads per block (blockDim.x) used at launch. It has to be known at compile time because it sets the size of the shared memory. |
[in] | arraySize | Size of the input arrays 'x' and 'y'. |
[in] | x | Input array containing 'x' data. |
[in] | y | Input array containing 'y' data. |
[in] | xMean | Mean of the 'x' data. |
[in] | yMean | Mean of the 'y' data. |
[out] | output | Output array to store computed covariance and variances. (Three values are stored per block: variance of 'x', covariance, variance of 'y'.) |
Definition at line 178 of file CudaKernels.cuh.
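Because Covariance2D writes one triple per block, the host (or a follow-up kernel) still has to merge the gridDim.x rows. The following is a minimal host-side sketch of that step. The helper name combineCovarianceBlocks is hypothetical and not part of RooFit, and the sketch assumes that the per-block triples are additive partial contributions; whether the kernel already normalises by arraySize is not stated on this page.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of RooFit: merges the per-block triples of a
// Covariance2D launch after the output has been copied back to the host.
// Assumption: the per-block values are additive partial contributions.
std::array<double, 3> combineCovarianceBlocks(const std::vector<double> &output)
{
   assert(output.size() % 3 == 0); // shape [ gridDim.x, 3 ], row-major
   std::array<double, 3> total{0.0, 0.0, 0.0};
   for (std::size_t block = 0; block < output.size() / 3; ++block) {
      total[0] += output[3 * block + 0]; // variance of 'x'
      total[1] += output[3 * block + 1]; // covariance
      total[2] += output[3 * block + 2]; // variance of 'y'
   }
   return total;
}
```

The index expression 3 * block + k follows directly from the documented row-major shape [ gridDim.x, 3 ].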
template<int BlockSize_n, int Bins1_n, int Bins2_n, class Idx_t, class Elem_t, class Sum_t, class Counts_t>
__global__ void RooFit::Detail::CudaKernels::Reducers::SumBinwise2D (
    int arraySize,
    Idx_t const *__restrict__ x1,
    Idx_t const *__restrict__ x2,
    const Elem_t *__restrict__ arr,
    Sum_t *__restrict__ outputSum,
    Counts_t *__restrict__ outputCounts
)
Computes bin-wise sum and count of elements from the 'arr' array into separate output arrays based on indices provided in 'x1' and 'x2' arrays, using a 2D grid-stride loop approach.
The output arrays are of shape [ gridDim.x, Bins1_n, Bins2_n ] (row-major flattened, so the bins written by one block are contiguous and the second bin index varies fastest).
Bins1_n | Number of bins in the first dimension. |
Bins2_n | Number of bins in the second dimension. |
BlockSize_n | Must equal the number of threads per block (blockDim.x) used at launch. It has to be known at compile time because it sets the size of the shared memory. |
Idx_t | Data type of index arrays x1 and x2. |
Elem_t | Data type of the input array arr elements. |
Sum_t | Data type for bin-wise sum output. |
Counts_t | Data type for bin-wise count output. |
[in] | arraySize | Size of the input arrays x1, x2, and arr. |
[in] | x1 | Input array containing bin indices for the first dimension. |
[in] | x2 | Input array containing bin indices for the second dimension. |
[in] | arr | Input array containing elements to be summed. |
[out] | outputSum | Output array for storing bin-wise sum of elements. |
[out] | outputCounts | Output array for storing bin-wise count of elements. |
Definition at line 113 of file CudaKernels.cuh.
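The row-major shape [ gridDim.x, Bins1_n, Bins2_n ] fixes where each bin of each block lives in outputSum and outputCounts. The sketch below shows the resulting offset arithmetic and one way the per-block histograms could be merged on the host. Both helper names (flatBinIndex, combineBinBlocks) are hypothetical and not part of RooFit; the merge assumes the per-block results are simply added.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: flat offset of (block, bin1, bin2) in a row-major
// array of shape [ nBlocks, Bins1_n, Bins2_n ]; bin2 varies fastest.
template <int Bins1_n, int Bins2_n>
std::size_t flatBinIndex(std::size_t block, int bin1, int bin2)
{
   assert(bin1 >= 0 && bin1 < Bins1_n && bin2 >= 0 && bin2 < Bins2_n);
   return (block * Bins1_n + bin1) * Bins2_n + bin2;
}

// Hypothetical helper: adds the per-block histograms into a single
// [ Bins1_n, Bins2_n ] histogram. Works for sums and for counts.
template <int Bins1_n, int Bins2_n, class T>
std::vector<T> combineBinBlocks(const std::vector<T> &perBlock)
{
   constexpr std::size_t nBins = std::size_t(Bins1_n) * Bins2_n;
   assert(perBlock.size() % nBins == 0);
   std::vector<T> total(nBins, T{});
   for (std::size_t i = 0; i < perBlock.size(); ++i)
      total[i % nBins] += perBlock[i]; // i % nBins is the bin offset inside a block
   return total;
}
```

With Bins1_n = 2 and Bins2_n = 3, the bin (0, 2) of block 1 sits at offset (1 * 2 + 0) * 3 + 2 = 8.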
template<int BlockSize_n, int ElemDim_n, class Elem_t, class Sum_t>
__global__ void RooFit::Detail::CudaKernels::Reducers::SumVectors (
    Size_t nElems,
    const Elem_t *__restrict__ arr,
    Sum_t *__restrict__ output
)
Performs a multi-block sum reduction on the input array arr.
The input array can hold either scalars or flattened vectors with inner dimension ElemDim_n.
The output array is of shape [ gridDim.x, ElemDim_n ] (row-major flattened, so the ElemDim_n components written by one block are contiguous).
ElemDim_n | Inner dimension of the flattened input array. Set to 1 when summing scalar values. |
BlockSize_n | Must equal the number of threads per block (blockDim.x) used at launch. It has to be known at compile time because it sets the size of the shared memory. |
Elem_t | Data type of the input array elements. |
Sum_t | Data type of the output and shared memory array elements. |
[in] | nElems | Number of elements in the input array. |
[in] | arr | Input array containing data to be summed. |
[out] | output | Output array containing partial sums for each block. (Each block's ElemDim_n partial sums are stored starting at 'output[ElemDim_n * blockIdx.x]'.) |
Definition at line 51 of file CudaKernels.cuh.
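SumVectors leaves one partial sum vector per block, so a final pass over the gridDim.x rows is still needed to obtain the total. The following host-side sketch illustrates that pass under the documented layout [ gridDim.x, ElemDim_n ]. The helper name combineVectorBlocks is hypothetical and not part of RooFit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of RooFit: adds the per-block partial sums of
// a SumVectors launch after the output has been copied back to the host.
template <int ElemDim_n, class Sum_t>
std::array<Sum_t, ElemDim_n> combineVectorBlocks(const std::vector<Sum_t> &output)
{
   assert(output.size() % ElemDim_n == 0); // shape [ gridDim.x, ElemDim_n ]
   std::array<Sum_t, ElemDim_n> total{};   // value-initialised to zero
   for (std::size_t block = 0; block < output.size() / ElemDim_n; ++block) {
      for (int d = 0; d < ElemDim_n; ++d) {
         // A block's partial sums start at output[ElemDim_n * block].
         total[d] += output[ElemDim_n * block + d];
      }
   }
   return total;
}
```

For scalar input (ElemDim_n = 1) this reduces to a plain sum over the per-block values.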