Using sampled information: is it enough for the sparse matrix-vector product locality optimization?

One of the main factors limiting the performance of the sparse matrix–vector product (SpMV) is the low data reuse caused by its irregular, indirect memory access patterns. Several strategies have been proposed to address this problem, among them data reordering techniques. The computational cost of these techniques is typically high because they consider all the nonzeros of the sparse matrix in order to find a permutation of rows and columns that improves SpMV performance. In this paper, we analyze whether the locality of the SpMV can be increased using incomplete information in the reordering process. This partial information results from considering only a subset of the nonzero elements of the matrix, obtained from the original matrix through a sampling process. In particular, two sampling methods are considered: random sampling and event-based sampling using hardware counters. We find that a small number of samples is enough to obtain good-quality reorderings. As a consequence, sampling-based reorderings yield noticeable performance improvements over the non-reordered matrices, with speedups of up to 2.1×. In addition, a significant reduction in the computational time required by the reordering technique is observed.

keywords: sparse matrix, locality, hardware counters, sampling, performance
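
The idea of reordering from a random subset of the nonzeros can be illustrated with a minimal sketch. It is not the reordering technique evaluated in the paper, which the abstract does not specify. The matrix is assumed to be stored as a list of (row, column) coordinates, and the function names (sample_nonzeros, row_order_from_sample, permute_rows) are hypothetical. The ordering heuristic, sorting rows by the mean column index of their sampled nonzeros, is a simple stand-in chosen only to show how a permutation computed from a sample is applied to the full matrix.

```python
import random


def sample_nonzeros(coords, fraction, seed=0):
    """Return a uniformly random subset of the (row, col) nonzero coordinates."""
    rng = random.Random(seed)
    k = max(1, int(len(coords) * fraction))
    return rng.sample(coords, k)


def row_order_from_sample(sample, n_rows):
    """Stand-in heuristic (an assumption, not the paper's method):
    order rows by the mean column index of their sampled nonzeros.
    Rows with no sampled nonzero are placed last, in their original order."""
    sums = [0] * n_rows
    counts = [0] * n_rows
    for r, c in sample:
        sums[r] += c
        counts[r] += 1

    def key(r):
        if counts[r] == 0:
            return (1, 0.0, r)
        return (0, sums[r] / counts[r], r)

    return sorted(range(n_rows), key=key)


def permute_rows(coords, order):
    """Apply a row permutation to ALL nonzeros, not only the sampled ones."""
    new_index = {old: new for new, old in enumerate(order)}
    return [(new_index[r], c) for r, c in coords]


# Toy 4-row pattern: rows 0 and 2 touch high columns, rows 1 and 3 low ones.
coords = [(0, 5), (0, 6), (1, 0), (1, 1), (2, 5), (2, 7), (3, 0), (3, 2)]
sample = sample_nonzeros(coords, fraction=0.5)
order = row_order_from_sample(sample, n_rows=4)
reordered = permute_rows(coords, order)
```

Only the sampled coordinates enter the ordering step, so its cost scales with the sample size rather than with the total number of nonzeros; the full matrix is touched once, when the permutation is applied.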