Functions

void componentSumReduction(FLOAT_ARRAY_TYPE *dInData, FLOAT_ARRAY_TYPE *dInterData, int numInterData, int numThreads, int N)
    Parallel componentwise sum procedure.
__global__ void gpuComponentSumReduce1(FLOAT_ARRAY_TYPE *dInData, FLOAT_ARRAY_TYPE *dInterData, int numInterData, int numThreads, int N)
    Kernel for the first componentwise reduction step.
__global__ void gpuComponentSumReduce2(FLOAT_ARRAY_TYPE *dInterData, int numInterData, int numThreads)
    Kernel for the second componentwise reduction step.
__global__ void gpuReduce1(float *dInData, float *dInterData, int numInterData, int numThreads, int N)
    Kernel for the first sum reduction step.
__global__ void gpuReduce2(float *dInterData, int numInterData, int numThreads)
    Kernel for the second sum reduction step.
__global__ void gpuSqSumReduce1(FLOAT_ARRAY_TYPE *dInData, float *dInterData, int numInterData, int numThreads, int N)
    Kernel for the first squared-sum reduction step.
void SqVecSumReduction(FLOAT_ARRAY_TYPE *dInData, float *dInterData, int numInterData, int numThreads, int N)
    Parallel squared vector sum procedure.
void sumReduction(float *dInData, float *dInterData, int numInterData, int numThreads, int N)
    Parallel summation procedure.
This module contains parallel GPU implementations of typical reduction tasks: summation, squared vector summation, and componentwise summation.
void componentSumReduction(FLOAT_ARRAY_TYPE *dInData, FLOAT_ARRAY_TYPE *dInterData, int numInterData, int numThreads, int N)

Parallel componentwise sum function, used as a wrapper for gpuComponentSumReduce1 and gpuComponentSumReduce2. The final result is stored in the first element of dInterData.
Parameters:
    [in]      dInData       Input data
    [in,out]  dInterData    Used for intermediate storage; content will be overwritten
    [in]      numInterData  Number of elements in the intermediate output data
    [in]      numThreads    Number of threads used
    [in]      N             Number of elements to be reduced
Definition at line 804 of file gpu_tools.cu.
References checkCudaError(), and FLOAT_ARRAY_TYPE.
Referenced by removeDrift().
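The two-phase launch pattern behind this wrapper can be sketched as follows. This is a minimal illustration only: the grid/block mapping, the dynamic shared-memory size, and the error check are assumptions rather than the documented implementation, and the documented kernels are merely forward-declared here instead of reimplemented.

#include <cuda_runtime.h>
#include <cstdio>

typedef float3 FLOAT_ARRAY_TYPE;   // assumption: stand-in for the project's FLOAT_ARRAY_TYPE, which may differ

// The documented kernels (defined in gpu_tools.cu), declared here only so the sketch compiles.
__global__ void gpuComponentSumReduce1(FLOAT_ARRAY_TYPE *dInData, FLOAT_ARRAY_TYPE *dInterData,
                                       int numInterData, int numThreads, int N);
__global__ void gpuComponentSumReduce2(FLOAT_ARRAY_TYPE *dInterData, int numInterData, int numThreads);

// Hypothetical wrapper mirroring componentSumReduction(): phase 1 produces
// numInterData componentwise partial sums, phase 2 folds them into dInterData[0].
void componentSumReductionSketch(FLOAT_ARRAY_TYPE *dInData, FLOAT_ARRAY_TYPE *dInterData,
                                 int numInterData, int numThreads, int N)
{
    size_t shmem = numThreads * sizeof(FLOAT_ARRAY_TYPE);

    // Phase 1: one block per intermediate element.
    gpuComponentSumReduce1<<<numInterData, numThreads, shmem>>>(dInData, dInterData,
                                                                numInterData, numThreads, N);
    // Phase 2: a single block reduces the numInterData partial sums.
    gpuComponentSumReduce2<<<1, numThreads, shmem>>>(dInterData, numInterData, numThreads);

    // The real wrapper uses checkCudaError(); a plain runtime check stands in here.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "componentSumReduction sketch: %s\n", cudaGetErrorString(err));
}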
__global__ void gpuComponentSumReduce1(FLOAT_ARRAY_TYPE *dInData, FLOAT_ARRAY_TYPE *dInterData, int numInterData, int numThreads, int N)
This kernel function sums all N elements of the input data componentwise into numInterData intermediate partial sums.
Parameters:
    [in]   dInData       Input data
    [out]  dInterData    Intermediate output data
    [in]   numInterData  Number of elements in the intermediate output data
    [in]   numThreads    Number of threads used
    [in]   N             Number of elements to be reduced
Definition at line 707 of file gpu_tools.cu.
References FLOAT_ARRAY_TYPE.
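A first-stage componentwise reduction of this kind typically combines a grid-stride accumulation with an in-block tree reduction. The sketch below illustrates only that general pattern, not the documented kernel; it assumes FLOAT_ARRAY_TYPE is a float3-like struct with x, y, z components, that numThreads is a power of two, and that the kernel is launched with numThreads * sizeof(float3) bytes of dynamic shared memory.

// Sketch of a first-stage componentwise partial-sum kernel (pattern only;
// the documented gpuComponentSumReduce1 may be organised differently).
__global__ void componentSumReduce1Sketch(const float3 *dInData, float3 *dInterData,
                                          int numInterData, int numThreads, int N)
{
    extern __shared__ float3 sdata[];          // numThreads elements, supplied at launch
    int tid = threadIdx.x;

    // Grid-stride accumulation: each thread sums a strided subset of the input,
    // component by component.
    float3 acc = make_float3(0.0f, 0.0f, 0.0f);
    for (int i = blockIdx.x * numThreads + tid; i < N; i += numInterData * numThreads) {
        acc.x += dInData[i].x;
        acc.y += dInData[i].y;
        acc.z += dInData[i].z;
    }
    sdata[tid] = acc;
    __syncthreads();

    // In-block tree reduction (numThreads assumed to be a power of two).
    for (int stride = numThreads / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            sdata[tid].x += sdata[tid + stride].x;
            sdata[tid].y += sdata[tid + stride].y;
            sdata[tid].z += sdata[tid + stride].z;
        }
        __syncthreads();
    }

    // One componentwise partial sum per block.
    if (tid == 0)
        dInterData[blockIdx.x] = sdata[0];
}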
__global__ void gpuComponentSumReduce2(FLOAT_ARRAY_TYPE *dInterData, int numInterData, int numThreads)
This kernel function takes the intermediate componentwise sums of gpuComponentSumReduce1 and calculates the final componentwise sum, which is stored in the first element of dInterData.
Parameters:
    [in,out]  dInterData    Intermediate output data
    [in]      numInterData  Number of elements in the intermediate output data
    [in]      numThreads    Number of threads used
Definition at line 745 of file gpu_tools.cu.
References FLOAT_ARRAY_TYPE.
__global__ void gpuReduce1(float *dInData, float *dInterData, int numInterData, int numThreads, int N)
This kernel function sums all N elements of the input data into numInterData intermediate partial sums.
Parameters:
    [in]   dInData       Input data
    [out]  dInterData    Intermediate output data
    [in]   numInterData  Number of elements in the intermediate output data
    [in]   numThreads    Number of threads used
    [in]   N             Number of elements to be reduced
Definition at line 637 of file gpu_tools.cu.
__global__ void gpuReduce2(float *dInterData, int numInterData, int numThreads)
This kernel function takes the intermediate sums of gpuReduce1 and calculates the final sum, which is stored in the first element of dInterData.
Parameters:
    [in,out]  dInterData    Intermediate output data
    [in]      numInterData  Number of elements in the intermediate output data
    [in]      numThreads    Number of threads used
Definition at line 660 of file gpu_tools.cu.
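The second stage runs in a single block and folds the numInterData partial sums into one value. A minimal sketch of that step (again assuming a power-of-two numThreads and dynamic shared memory; not the documented kernel's exact code):

// Sketch of a second-stage reduction: one block reduces the numInterData
// partial sums produced by the first stage into dInterData[0].
__global__ void reduce2Sketch(float *dInterData, int numInterData, int numThreads)
{
    extern __shared__ float sdata[];           // numThreads floats, supplied at launch
    int tid = threadIdx.x;

    // Each thread gathers its strided share of the partial sums.
    float sum = 0.0f;
    for (int i = tid; i < numInterData; i += numThreads)
        sum += dInterData[i];
    sdata[tid] = sum;
    __syncthreads();

    // In-block tree reduction (numThreads assumed to be a power of two).
    for (int stride = numThreads / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            sdata[tid] += sdata[tid + stride];
        __syncthreads();
    }

    // The final result overwrites the first intermediate element.
    if (tid == 0)
        dInterData[0] = sdata[0];
}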
__global__ void gpuSqSumReduce1(FLOAT_ARRAY_TYPE *dInData, float *dInterData, int numInterData, int numThreads, int N)
This kernel function sums the squares of all N elements of the input data into numInterData intermediate partial sums.
Parameters:
    [in]   dInData       Input data
    [out]  dInterData    Intermediate output data
    [in]   numInterData  Number of elements in the intermediate output data
    [in]   numThreads    Number of threads used
    [in]   N             Number of elements to be reduced
Definition at line 684 of file gpu_tools.cu.
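The squared-sum variant follows the same first-stage pattern, except that each vector element contributes the sum of its squared components to a scalar partial sum. The sketch below assumes float3-like input and is an illustration of that accumulation only; the documented kernel's exact layout may differ.

// Sketch of a first-stage squared-sum kernel: each thread adds the squared
// components of its strided elements to a scalar running total, then the
// block tree-reduces those totals exactly as in the plain sum kernels.
__global__ void sqSumReduce1Sketch(const float3 *dInData, float *dInterData,
                                   int numInterData, int numThreads, int N)
{
    extern __shared__ float sdata[];           // numThreads floats, supplied at launch
    int tid = threadIdx.x;

    float sum = 0.0f;
    for (int i = blockIdx.x * numThreads + tid; i < N; i += numInterData * numThreads) {
        float3 v = dInData[i];
        sum += v.x * v.x + v.y * v.y + v.z * v.z;   // squared magnitude of one element
    }
    sdata[tid] = sum;
    __syncthreads();

    for (int stride = numThreads / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            sdata[tid] += sdata[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        dInterData[blockIdx.x] = sdata[0];
}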
void SqVecSumReduction(FLOAT_ARRAY_TYPE *dInData, float *dInterData, int numInterData, int numThreads, int N)
Parallel squared vector sum function, used as a wrapper for gpuSqSumReduce1 and gpuReduce2. The final result is stored in the first element of dInterData.
Parameters:
    [in]      dInData       Input data
    [in,out]  dInterData    Used for intermediate storage; content will be overwritten
    [in]      numInterData  Number of elements in the intermediate output data
    [in]      numThreads    Number of threads used
    [in]      N             Number of elements to be reduced
Definition at line 794 of file gpu_tools.cu.
References checkCudaError().
Referenced by getKineticEnergy(), and rescaleT().
void sumReduction(float *dInData, float *dInterData, int numInterData, int numThreads, int N)
Parallel sum function, used as a wrapper for gpuReduce1 and gpuReduce2. The final result is stored in the first element of dInterData.
Parameters:
    [in]      dInData       Input data
    [in,out]  dInterData    Used for intermediate storage; content will be overwritten
    [in]      numInterData  Number of elements in the intermediate output data
    [in]      numThreads    Number of threads used
    [in]      N             Number of elements to be reduced
Definition at line 783 of file gpu_tools.cu.
References checkCudaError().
Referenced by getHarmonicPotentialEnergy(), and getLJCutPotentialEnergy().
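A typical call sequence for this wrapper might look like the following. Buffer sizes, the thread count, and the copy-back of dInterData[0] are illustrative assumptions, not values taken from the source; the documented wrapper itself is only declared here so the sketch compiles.

#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Documented wrapper, defined in gpu_tools.cu.
void sumReduction(float *dInData, float *dInterData, int numInterData, int numThreads, int N);

int main()
{
    const int N = 1 << 20;          // number of values to sum (illustrative)
    const int numThreads = 256;     // threads per block (assumed power of two)
    const int numInterData = 64;    // number of first-stage partial sums

    // Host input: all ones, so the expected sum is exactly N.
    std::vector<float> hData(N, 1.0f);

    float *dInData = NULL, *dInterData = NULL;
    cudaMalloc(&dInData, N * sizeof(float));
    cudaMalloc(&dInterData, numInterData * sizeof(float));
    cudaMemcpy(dInData, hData.data(), N * sizeof(float), cudaMemcpyHostToDevice);

    // Two-phase reduction; the final sum ends up in dInterData[0].
    sumReduction(dInData, dInterData, numInterData, numThreads, N);

    float result = 0.0f;
    cudaMemcpy(&result, dInterData, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f (expected %d)\n", result, N);

    cudaFree(dInData);
    cudaFree(dInterData);
    return 0;
}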