ROOT 6.12/07
Reference Guide
TMVA::DNN::TCudaMatrix< AFloat > Class Template Reference

template<typename AFloat>
class TMVA::DNN::TCudaMatrix< AFloat >

TCudaMatrix Class.

The TCudaMatrix class represents matrices on a CUDA device. The elements of the matrix are stored in a TCudaDeviceBuffer object, which takes care of allocating and freeing the device memory. TCudaMatrix objects are lightweight: assignment and copy construction perform only a shallow copy, and no new element buffer is allocated. To perform a deep copy, use the static Copy method of the TCuda architecture class.
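The difference between a shallow and a deep copy can be illustrated with a small host-only sketch. HostMatrix and DeepCopy are hypothetical names introduced for this illustration and are not part of TMVA; a std::shared_ptr stands in for the device buffer, and a column-major layout is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical host-side stand-in for a matrix handle; not part of TMVA.
// The element storage is shared, so copying the handle copies no elements.
struct HostMatrix {
    std::shared_ptr<std::vector<double>> buffer; // shared element storage
    std::size_t nRows;
    std::size_t nCols;

    HostMatrix(std::size_t m, std::size_t n)
        : buffer(std::make_shared<std::vector<double>>(m * n, 0.0)),
          nRows(m), nCols(n) {}

    // Column-major element access (layout assumed for this sketch).
    double &At(std::size_t i, std::size_t j) {
        assert(i < nRows && j < nCols);
        return (*buffer)[j * nRows + i];
    }
};

// Deep copy: allocates a new buffer and copies every element.
inline HostMatrix DeepCopy(const HostMatrix &src) {
    HostMatrix dst(src.nRows, src.nCols);
    *dst.buffer = *src.buffer;
    return dst;
}
```

With this sketch, writing through a copy made by assignment is visible in the original, while writing through the result of DeepCopy is not.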

The TCudaDeviceBuffer has an associated CUDA stream, on which the data is transferred to the device. This stream can be accessed through the GetComputeStream member function and used to synchronize computations.

The TCudaMatrix class also holds static references to CUDA resources. These are the cuBLAS handle, a buffer of cuRAND states for random number generation, and a vector of ones that is used to sum matrix columns by matrix-vector multiplication. The class also has a static buffer for returning results from the device.
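The role of the vector of ones can be shown with a host-only sketch: for an m x n matrix A, the product of the transpose of A with a vector of m ones yields the n column sums. MultiplyTransposed and SumColumns are hypothetical helpers written for this illustration; plain loops stand in for a BLAS matrix-vector call, and a column-major layout is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = A^T * x for a column-major m x n matrix A.
// Hypothetical helper; plain loops stand in for a BLAS gemv call.
inline std::vector<double> MultiplyTransposed(const std::vector<double> &A,
                                              std::size_t m, std::size_t n,
                                              const std::vector<double> &x) {
    assert(A.size() == m * n && x.size() == m);
    std::vector<double> y(n, 0.0);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < m; ++i)
            y[j] += A[j * m + i] * x[i];
    return y;
}

// Column sums obtained by multiplying with a vector of ones.
inline std::vector<double> SumColumns(const std::vector<double> &A,
                                      std::size_t m, std::size_t n) {
    const std::vector<double> ones(m, 1.0);
    return MultiplyTransposed(A, m, n, ones);
}
```

Expressing the sum as a matrix-vector product lets an existing, tuned BLAS routine do the work, at the cost of keeping a vector of ones that is at least as long as the largest row count.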

Definition at line 98 of file CudaMatrix.h.

Public Member Functions

 TCudaMatrix ()
 
 TCudaMatrix (size_t i, size_t j)
 
 TCudaMatrix (const TMatrixT< Double_t > &)
 
 TCudaMatrix (TCudaDeviceBuffer< AFloat > buffer, size_t m, size_t n)
 
 TCudaMatrix (const TCudaMatrix &)=default
 
 TCudaMatrix (TCudaMatrix &&)=default
 
 ~TCudaMatrix ()=default
 
cudaStream_t GetComputeStream () const
 
const cublasHandle_t & GetCublasHandle () const
 
const AFloat * GetDataPointer () const
 
AFloat * GetDataPointer ()
 
size_t GetNcols () const
 
size_t GetNoElements () const
 
size_t GetNrows () const
 
 operator TMatrixT< Double_t > () const
 Convert CUDA matrix to ROOT TMatrix.
 
TCudaDeviceReference< AFloat > operator() (size_t i, size_t j) const
 Access to elements of device matrices provided through TCudaDeviceReference class.
 
TCudaMatrix & operator= (const TCudaMatrix &)=default
 
TCudaMatrix & operator= (TCudaMatrix &&)=default
 
void SetComputeStream (cudaStream_t stream)
 
void Synchronize (const TCudaMatrix &) const
 Blocking synchronization with the associated compute stream, if it's not the default stream.
 

Static Public Member Functions

static curandState_t * GetCurandStatesPointer ()
 
static AFloat GetDeviceReturn ()
 Transfer the value in the device return buffer to the host.
 
static AFloat * GetDeviceReturnPointer ()
 Return device pointer to the device return buffer.
 
static AFloat * GetOnes ()
 
static void ResetDeviceReturn (AFloat value=0.0)
 Set the return buffer on the device to the specified value.
 

Private Member Functions

void InitializeCuda ()
 Initializes all shared device resources and makes sure that a sufficient number of cuRAND states are allocated and initialized on the device, and that the vector of ones used for the summation over columns has the right size.
 
void InitializeCurandStates ()
 

Private Attributes

TCudaDeviceBuffer< AFloat > fElementBuffer
 
size_t fNCols
 
size_t fNRows
 

Static Private Attributes

static cublasHandle_t fCublasHandle
 
static curandState_t * fCurandStates
 
static AFloat * fDeviceReturn
 Buffer for kernel return values.
 
static size_t fInstances
 Current number of matrix instances.
 
static size_t fNCurandStates
 
static size_t fNOnes
 Current length of the one vector.
 
static AFloat * fOnes
 Vector used for summations of columns.
 

#include <TMVA/DNN/Architectures/Cuda/CudaMatrix.h>

Constructor & Destructor Documentation

◆ TCudaMatrix() [1/6]

template<typename AFloat>
TMVA::DNN::TCudaMatrix< AFloat >::TCudaMatrix ( )

◆ TCudaMatrix() [2/6]

template<typename AFloat>
TMVA::DNN::TCudaMatrix< AFloat >::TCudaMatrix ( size_t  i,
size_t  j 
)

◆ TCudaMatrix() [3/6]

template<typename AFloat>
TMVA::DNN::TCudaMatrix< AFloat >::TCudaMatrix ( const TMatrixT< Double_t > &  )

◆ TCudaMatrix() [4/6]

template<typename AFloat>
TMVA::DNN::TCudaMatrix< AFloat >::TCudaMatrix ( TCudaDeviceBuffer< AFloat >  buffer,
size_t  m,
size_t  n 
)

◆ TCudaMatrix() [5/6]

template<typename AFloat>
TMVA::DNN::TCudaMatrix< AFloat >::TCudaMatrix ( const TCudaMatrix< AFloat > &  )
default

◆ TCudaMatrix() [6/6]

template<typename AFloat>
TMVA::DNN::TCudaMatrix< AFloat >::TCudaMatrix ( TCudaMatrix< AFloat > &&  )
default

◆ ~TCudaMatrix()

template<typename AFloat>
TMVA::DNN::TCudaMatrix< AFloat >::~TCudaMatrix ( )
default

Member Function Documentation

◆ GetComputeStream()

template<typename AFloat >
cudaStream_t TMVA::DNN::TCudaMatrix< AFloat >::GetComputeStream ( ) const
inline

Definition at line 247 of file CudaMatrix.h.

◆ GetCublasHandle()

template<typename AFloat>
const cublasHandle_t& TMVA::DNN::TCudaMatrix< AFloat >::GetCublasHandle ( ) const
inline

Definition at line 156 of file CudaMatrix.h.

◆ GetCurandStatesPointer()

template<typename AFloat>
static curandState_t* TMVA::DNN::TCudaMatrix< AFloat >::GetCurandStatesPointer ( )
inline static

Definition at line 145 of file CudaMatrix.h.

◆ GetDataPointer() [1/2]

template<typename AFloat>
const AFloat* TMVA::DNN::TCudaMatrix< AFloat >::GetDataPointer ( ) const
inline

Definition at line 154 of file CudaMatrix.h.

◆ GetDataPointer() [2/2]

template<typename AFloat>
AFloat* TMVA::DNN::TCudaMatrix< AFloat >::GetDataPointer ( )
inline

Definition at line 155 of file CudaMatrix.h.

◆ GetDeviceReturn()

template<typename AFloat >
AFloat TMVA::DNN::TCudaMatrix< AFloat >::GetDeviceReturn ( )
inline static

Transfer the value in the device return buffer to the host.

This transfer is synchronous.

Definition at line 280 of file CudaMatrix.h.

◆ GetDeviceReturnPointer()

template<typename AFloat>
static AFloat* TMVA::DNN::TCudaMatrix< AFloat >::GetDeviceReturnPointer ( )
inline static

Return device pointer to the device return buffer.

Definition at line 144 of file CudaMatrix.h.

◆ GetNcols()

template<typename AFloat>
size_t TMVA::DNN::TCudaMatrix< AFloat >::GetNcols ( ) const
inline

Definition at line 152 of file CudaMatrix.h.

◆ GetNoElements()

template<typename AFloat>
size_t TMVA::DNN::TCudaMatrix< AFloat >::GetNoElements ( ) const
inline

Definition at line 153 of file CudaMatrix.h.

◆ GetNrows()

template<typename AFloat>
size_t TMVA::DNN::TCudaMatrix< AFloat >::GetNrows ( ) const
inline

Definition at line 151 of file CudaMatrix.h.

◆ GetOnes()

template<typename AFloat>
static AFloat* TMVA::DNN::TCudaMatrix< AFloat >::GetOnes ( )
inline static

Definition at line 118 of file CudaMatrix.h.

◆ InitializeCuda()

template<typename AFloat>
void TMVA::DNN::TCudaMatrix< AFloat >::InitializeCuda ( )
private

Initializes all shared device resources and makes sure that a sufficient number of cuRAND states are allocated and initialized on the device, and that the vector of ones used for the summation over columns has the right size.
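One way to keep a shared vector of ones at the right size is to grow it whenever a larger length is requested and never shrink it. The sketch below is a host-only illustration under that assumption; SharedOnes and its members are hypothetical names, and the actual implementation in CudaMatrix.h may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shared resource: one vector of ones reused by all matrices.
class SharedOnes {
public:
    // Grow the vector if the requested length exceeds the current one.
    // Requests for a smaller length leave the vector unchanged.
    static void EnsureSize(std::size_t n) {
        std::vector<double> &ones = Storage();
        if (n > ones.size()) ones.resize(n, 1.0);
    }
    static std::size_t Size() { return Storage().size(); }
    static const double *Data() { return Storage().data(); }

private:
    // Function-local static: created on first use, shared by all callers.
    static std::vector<double> &Storage() {
        static std::vector<double> ones;
        return ones;
    }
};
```

In this sketch a constructor would call EnsureSize with its row count, so the vector always covers the largest matrix created so far.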

◆ InitializeCurandStates()

template<typename AFloat>
void TMVA::DNN::TCudaMatrix< AFloat >::InitializeCurandStates ( )
private

◆ operator TMatrixT< Double_t >()

template<typename AFloat>
TMVA::DNN::TCudaMatrix< AFloat >::operator TMatrixT< Double_t > ( ) const

Convert CUDA matrix to ROOT TMatrix.

Performs synchronous data transfer.

◆ operator()()

template<typename AFloat >
TCudaDeviceReference< AFloat > TMVA::DNN::TCudaMatrix< AFloat >::operator() ( size_t  i,
size_t  j 
) const

Access to elements of device matrices provided through TCudaDeviceReference class.

Note that access is synchronous and enforces device synchronization on all streams. Only used for testing.

Definition at line 289 of file CudaMatrix.h.
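Element access through a reference class follows the proxy pattern: operator() returns a small object whose conversion and assignment operators perform the actual transfer. The host-only sketch below illustrates the idea. ElementRef and ProxyMatrix are hypothetical names, std::memcpy stands in for a device transfer, and, unlike the member documented here, the sketch's operator() is not const.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical proxy: every read and write goes through an explicit copy,
// standing in for a device-to-host or host-to-device transfer.
template <typename T>
class ElementRef {
public:
    explicit ElementRef(T *address) : fAddress(address) {}

    // Read: copy the element out.
    operator T() const {
        T value;
        std::memcpy(&value, fAddress, sizeof(T));
        return value;
    }

    // Write: copy the new value in.
    void operator=(T value) { std::memcpy(fAddress, &value, sizeof(T)); }

private:
    T *fAddress;
};

// Hypothetical matrix whose elements are only reachable through the proxy.
template <typename T>
class ProxyMatrix {
public:
    ProxyMatrix(std::size_t m, std::size_t n)
        : fData(m * n, T(0)), fNRows(m), fNCols(n) {}

    // Column-major indexing (layout assumed for this sketch).
    ElementRef<T> operator()(std::size_t i, std::size_t j) {
        assert(i < fNRows && j < fNCols);
        return ElementRef<T>(fData.data() + j * fNRows + i);
    }

private:
    std::vector<T> fData;
    std::size_t fNRows;
    std::size_t fNCols;
};
```

Because each access is a separate transfer, this style suits tests and debugging rather than inner loops.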

◆ operator=() [1/2]

template<typename AFloat>
TCudaMatrix& TMVA::DNN::TCudaMatrix< AFloat >::operator= ( const TCudaMatrix< AFloat > &  )
default

◆ operator=() [2/2]

template<typename AFloat>
TCudaMatrix& TMVA::DNN::TCudaMatrix< AFloat >::operator= ( TCudaMatrix< AFloat > &&  )
default

◆ ResetDeviceReturn()

template<typename AFloat >
void TMVA::DNN::TCudaMatrix< AFloat >::ResetDeviceReturn ( AFloat  value = 0.0)
inline static

Set the return buffer on the device to the specified value.

This is required for example for reductions in order to initialize the accumulator.

Definition at line 272 of file CudaMatrix.h.
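The need to reset the buffer before a reduction can be seen in a host-only sketch. ReturnBuffer and AccumulateSum are hypothetical names; the loop accumulates into a shared value in the way an accumulating kernel might, so a stale value from an earlier call would carry over without a reset.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side stand-in for a shared return buffer; not TMVA code.
struct ReturnBuffer {
    static double &Value() {
        static double value = 0.0;
        return value;
    }
    static void Reset(double v = 0.0) { Value() = v; }
    static double Get() { return Value(); }
};

// A reduction that adds into the shared buffer instead of returning a value.
inline void AccumulateSum(const std::vector<double> &x) {
    for (double v : x) ReturnBuffer::Value() += v;
}
```

Calling AccumulateSum twice without a reset in between leaves the sum of both calls in the buffer, which is why the accumulator is reset before each reduction.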

◆ SetComputeStream()

template<typename AFloat >
void TMVA::DNN::TCudaMatrix< AFloat >::SetComputeStream ( cudaStream_t  stream)
inline

Definition at line 254 of file CudaMatrix.h.

◆ Synchronize()

template<typename AFloat >
void TMVA::DNN::TCudaMatrix< AFloat >::Synchronize ( const TCudaMatrix< AFloat > &  A) const
inline

Blocking synchronization with the associated compute stream, if it's not the default stream.

Definition at line 261 of file CudaMatrix.h.

Member Data Documentation

◆ fCublasHandle

template<typename AFloat>
cublasHandle_t TMVA::DNN::TCudaMatrix< AFloat >::fCublasHandle
static private

Definition at line 105 of file CudaMatrix.h.

◆ fCurandStates

template<typename AFloat>
curandState_t* TMVA::DNN::TCudaMatrix< AFloat >::fCurandStates
static private

Definition at line 109 of file CudaMatrix.h.

◆ fDeviceReturn

template<typename AFloat>
AFloat* TMVA::DNN::TCudaMatrix< AFloat >::fDeviceReturn
static private

Buffer for kernel return values.

Definition at line 106 of file CudaMatrix.h.

◆ fElementBuffer

template<typename AFloat>
TCudaDeviceBuffer<AFloat> TMVA::DNN::TCudaMatrix< AFloat >::fElementBuffer
private

Definition at line 114 of file CudaMatrix.h.

◆ fInstances

template<typename AFloat>
size_t TMVA::DNN::TCudaMatrix< AFloat >::fInstances
static private

Current number of matrix instances.

Definition at line 104 of file CudaMatrix.h.

◆ fNCols

template<typename AFloat>
size_t TMVA::DNN::TCudaMatrix< AFloat >::fNCols
private

Definition at line 113 of file CudaMatrix.h.

◆ fNCurandStates

template<typename AFloat>
size_t TMVA::DNN::TCudaMatrix< AFloat >::fNCurandStates
static private

Definition at line 110 of file CudaMatrix.h.

◆ fNOnes

template<typename AFloat>
size_t TMVA::DNN::TCudaMatrix< AFloat >::fNOnes
static private

Current length of the one vector.

Definition at line 108 of file CudaMatrix.h.

◆ fNRows

template<typename AFloat>
size_t TMVA::DNN::TCudaMatrix< AFloat >::fNRows
private

Definition at line 112 of file CudaMatrix.h.

◆ fOnes

template<typename AFloat>
AFloat* TMVA::DNN::TCudaMatrix< AFloat >::fOnes
static private

Vector used for summations of columns.

Definition at line 107 of file CudaMatrix.h.


The documentation for this class was generated from the following file: