Implementation and performance analysis of the AXPY, DOT, and SpMV functions on Intel Xeon Phi and NVIDIA Tesla using OpenCL
The present work analyses the performance of the AXPY, DOT, and SpMV functions implemented in OpenCL. The code was tested on the NVIDIA Tesla S2050 GPU and the Intel Xeon Phi 3120A coprocessor. Due to the nature of the AXPY function, only two versions were implemented: the routine executed by the CPU and the kernel executed on the previously mentioned devices. Their performance was studied for different vector sizes; the results show that the NVIDIA architecture is better suited to smaller vectors and the Intel architecture to larger ones. For the DOT and SpMV functions, three versions were implemented: the CPU routine, an OpenCL kernel that uses local memory, and an OpenCL kernel that uses only global memory. The kernels that use local memory were tested by varying the work-group size; the kernels that use only global memory were tested by varying the array sizes. For the former, the results show the optimum work-group size and that the NVIDIA architecture benefits from the use of local memory. For the latter, the results show that larger computational loads favour the Intel architecture.
Publication: Congress
June 18, 2021
E. Coronado-Barrientos, G. Indalecio and A. Garcia-Loureiro