Dedicated namespace for reduction kernels.

Functions
template<int BlockSize_n>
__global__ void	Covariance2D (Size_t arraySize, float const *__restrict__ x, const float *__restrict__ y, double xMean, double yMean, double *__restrict__ output)
	Computes the covariance, variance of 'x', and variance of 'y' for a 2D data set.

template<int BlockSize_n, int Bins1_n, int Bins2_n, class Idx_t , class Elem_t , class Sum_t , class Counts_t >
__global__ void	SumBinwise2D (int arraySize, Idx_t const *__restrict__ x1, Idx_t const *__restrict__ x2, const Elem_t *__restrict__ arr, Sum_t *__restrict__ outputSum, Counts_t *__restrict__ outputCounts)
	Computes bin-wise sum and count of elements from the 'arr' array into separate output arrays based on indices provided in 'x1' and 'x2' arrays, using a 2D grid-stride loop approach.

template<int BlockSize_n, int ElemDim_n, class Elem_t , class Sum_t >
__global__ void	SumVectors (Size_t nElems, const Elem_t *__restrict__ arr, Sum_t *__restrict__ output)
	Performs a multi-block sum reduction on the input array `arr`.

Detailed Description

Dedicated namespace for reduction kernels.

Function Documentation

◆ Covariance2D()

template<int BlockSize_n>

__global__ void RooFit::Detail::CudaKernels::Reducers::Covariance2D	(	Size_t	arraySize,
		float const *__restrict__	x,
		const float *__restrict__	y,
		double	xMean,
		double	yMean,
		double *__restrict__	output
	)

Computes the covariance, variance of 'x', and variance of 'y' for a 2D data set.

The output array has shape [ gridDim.x, 3 ] and is flattened in row-major order, so the three values written by one block are contiguous. Three elements per block suffice because the 2-by-2 covariance matrix is symmetric.

Template Parameters

BlockSize_n	Must equal the number of threads per block (`blockDim.x`). It has to be known at compile time because it determines the size of the shared-memory array.

Parameters

[in]	arraySize	Size of the input arrays 'x' and 'y'.
[in]	x	Input array containing 'x' data.
[in]	y	Input array containing 'y' data.
[in]	xMean	Mean of the 'x' data.
[in]	yMean	Mean of the 'y' data.
[out]	output	Output array to store computed covariance and variances. (Three values are stored per block: variance of 'x', covariance, variance of 'y'.)

Definition at line 178 of file CudaKernels.cuh.
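
The per-block output layout can be illustrated with a host-side C++ reference. This is a minimal sketch, not the kernel's implementation: `covariancePartialsReference` and `combineCovariancePartials` are hypothetical helpers, the grid-stride assignment of elements to blocks is an assumption, and the sketch assumes each block writes unnormalized sums of products of deviations (the documentation above does not state whether the kernel already divides by `arraySize`).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: fills a [nBlocks, 3] row-major array with
// per-block partial sums (sum dx*dx, sum dx*dy, sum dy*dy).
// Assumption: element i is handled by global thread (i % nThreadsTotal),
// mimicking a grid-stride loop over nBlocks * blockSize threads.
std::vector<double> covariancePartialsReference(const std::vector<float> &x, const std::vector<float> &y,
                                                double xMean, double yMean, std::size_t nBlocks,
                                                std::size_t blockSize)
{
   assert(x.size() == y.size());
   assert(nBlocks > 0 && blockSize > 0);
   std::vector<double> output(3 * nBlocks, 0.0);
   const std::size_t nThreadsTotal = nBlocks * blockSize;
   for (std::size_t i = 0; i < x.size(); ++i) {
      const std::size_t block = (i % nThreadsTotal) / blockSize;
      const double dx = double(x[i]) - xMean;
      const double dy = double(y[i]) - yMean;
      output[3 * block + 0] += dx * dx; // contribution to variance of x
      output[3 * block + 1] += dx * dy; // contribution to covariance
      output[3 * block + 2] += dy * dy; // contribution to variance of y
   }
   return output;
}

// Adds up the rows of the [nBlocks, 3] array and normalizes by the element count.
// Assumption: the partials are unnormalized sums, as produced by the helper above.
std::array<double, 3> combineCovariancePartials(const std::vector<double> &partials, std::size_t arraySize)
{
   assert(partials.size() % 3 == 0 && arraySize > 0);
   std::array<double, 3> result{0.0, 0.0, 0.0};
   for (std::size_t i = 0; i < partials.size(); ++i)
      result[i % 3] += partials[i];
   for (double &v : result)
      v /= double(arraySize);
   return result;
}
```

Calling `combineCovariancePartials(covariancePartialsReference(x, y, xMean, yMean, nBlocks, blockSize), x.size())` yields the variance of 'x', the covariance, and the variance of 'y' in that order, which matches the per-block ordering described for `output`.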

◆ SumBinwise2D()

template<int BlockSize_n, int Bins1_n, int Bins2_n, class Idx_t , class Elem_t , class Sum_t , class Counts_t >

__global__ void RooFit::Detail::CudaKernels::Reducers::SumBinwise2D	(	int	arraySize,
		Idx_t const *__restrict__	x1,
		Idx_t const *__restrict__	x2,
		const Elem_t *__restrict__	arr,
		Sum_t *__restrict__	outputSum,
		Counts_t *__restrict__	outputCounts
	)

Computes bin-wise sum and count of elements from the 'arr' array into separate output arrays based on indices provided in 'x1' and 'x2' arrays, using a 2D grid-stride loop approach.

The output arrays have shape [ gridDim.x, Bins1_n, Bins2_n ] and are flattened in row-major order, so the bins belonging to one block are contiguous.

Template Parameters

Bins1_n	Number of bins in the first dimension.
Bins2_n	Number of bins in the second dimension.
BlockSize_n	Must equal the number of threads per block (`blockDim.x`). It has to be known at compile time because it determines the size of the shared-memory array.
Idx_t	Data type of index arrays `x1` and `x2`.
Elem_t	Data type of the input array `arr` elements.
Sum_t	Data type for bin-wise sum output.
Counts_t	Data type for bin-wise count output.

Parameters

[in]	arraySize	Size of the input arrays `x1`, `x2`, and `arr`.
[in]	x1	Input array containing bin indices for the first dimension.
[in]	x2	Input array containing bin indices for the second dimension.
[in]	arr	Input array containing elements to be summed.
[out]	outputSum	Output array for storing bin-wise sum of elements.
[out]	outputCounts	Output array for storing bin-wise count of elements.

Definition at line 113 of file CudaKernels.cuh.
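
Because each block writes its own [ Bins1_n, Bins2_n ] slice, a final pass over the block dimension is needed to obtain one value per bin. The following host-side C++ sketch shows the row-major indexing and that final pass. `binOffset` and `sumOverBlocks` are hypothetical helpers introduced here for illustration and are not part of the documented API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: flat index of bin (i1, i2) in block `block` for an
// array of shape [nBlocks, bins1, bins2] stored in row-major order.
inline std::size_t binOffset(std::size_t block, std::size_t i1, std::size_t i2, std::size_t bins1,
                             std::size_t bins2)
{
   return (block * bins1 + i1) * bins2 + i2;
}

// Hypothetical host-side step: collapse per-block partial results into a
// single [bins1, bins2] array by summing over the block dimension.
// Works for both the sum array and the counts array.
template <class T>
std::vector<T> sumOverBlocks(const std::vector<T> &partials, std::size_t nBlocks, std::size_t bins1,
                             std::size_t bins2)
{
   assert(partials.size() == nBlocks * bins1 * bins2);
   std::vector<T> total(bins1 * bins2, T{});
   for (std::size_t b = 0; b < nBlocks; ++b)
      for (std::size_t i1 = 0; i1 < bins1; ++i1)
         for (std::size_t i2 = 0; i2 < bins2; ++i2)
            total[i1 * bins2 + i2] += partials[binOffset(b, i1, i2, bins1, bins2)];
   return total;
}
```

The same function can be applied to copies of `outputSum` and `outputCounts`; dividing the two results element-wise would then give a per-bin mean.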

◆ SumVectors()

template<int BlockSize_n, int ElemDim_n, class Elem_t , class Sum_t >

__global__ void RooFit::Detail::CudaKernels::Reducers::SumVectors	(	Size_t	nElems,
		const Elem_t *__restrict__	arr,
		Sum_t *__restrict__	output
	)

Performs a multi-block sum reduction on the input array arr.

The input array can either be scalar, or a flattened vector array with inner dimension ElemDim_n.

The output array has shape [ gridDim.x, ElemDim_n ] and is flattened in row-major order, so the ElemDim_n components of one block's partial sum are contiguous.

Template Parameters

ElemDim_n	Inner dimension of the flattened input array. Set to `1` when summing scalar values.
BlockSize_n	Must equal the number of threads per block (`blockDim.x`). It has to be known at compile time because it determines the size of the shared-memory array.
Elem_t	Data type of the input array elements.
Sum_t	Data type of the output and shared memory array elements.

Parameters

[in]	nElems	Number of elements in the input array.
[in]	arr	Input array containing data to be summed.
[out]	output	Output array containing the partial sums of each block. (The ElemDim_n components of a block's sum are stored starting at 'output[ElemDim_n * blockIdx.x]'.)

Definition at line 51 of file CudaKernels.cuh.
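
Since the kernel produces one partial sum per block, the caller still has to add the rows of the [ gridDim.x, ElemDim_n ] output to get the final vector sum. The host-side C++ sketch below mimics that layout on the CPU. It is an illustration under stated assumptions: `sumVectorsPartialsReference` and `combineVectorPartials` are hypothetical names, and the way whole elements are assigned to blocks (grid-stride by element index) is an assumption rather than the kernel's documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference producing a [nBlocks, elemDim] row-major array
// of partial sums. Assumption: element e is handled by global thread
// (e % nThreadsTotal), mimicking a grid-stride loop.
template <class Sum_t, class Elem_t>
std::vector<Sum_t> sumVectorsPartialsReference(const std::vector<Elem_t> &arr, std::size_t elemDim,
                                               std::size_t nBlocks, std::size_t blockSize)
{
   assert(elemDim > 0 && nBlocks > 0 && blockSize > 0);
   assert(arr.size() % elemDim == 0);
   const std::size_t nElems = arr.size() / elemDim;
   const std::size_t nThreadsTotal = nBlocks * blockSize;
   std::vector<Sum_t> output(nBlocks * elemDim, Sum_t{});
   for (std::size_t e = 0; e < nElems; ++e) {
      const std::size_t block = (e % nThreadsTotal) / blockSize;
      for (std::size_t d = 0; d < elemDim; ++d)
         output[elemDim * block + d] += static_cast<Sum_t>(arr[elemDim * e + d]);
   }
   return output;
}

// Adds up the per-block rows to obtain the final sum with elemDim components.
template <class Sum_t>
std::vector<Sum_t> combineVectorPartials(const std::vector<Sum_t> &partials, std::size_t elemDim)
{
   assert(elemDim > 0 && partials.size() % elemDim == 0);
   std::vector<Sum_t> total(elemDim, Sum_t{});
   for (std::size_t i = 0; i < partials.size(); ++i)
      total[i % elemDim] += partials[i];
   return total;
}
```

With `elemDim` set to 1 the same two functions cover the scalar case, in which each block contributes exactly one value.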
