3DyRM: a dynamic roofline model including memory latency information

Modern systems present complex memory hierarchies and heterogeneity among cores and processors. As a consequence, efficient programming is challenging. An easy-to-understand performance model, offering guidelines and information about the behaviour of a code, may be useful to alleviate these issues. In this paper, we present two extensions of the well known Berkeley Roofline Model. The first of these extensions, the DyRM, takes into consideration the complexities of multicore and heterogeneous systems, offering a more detailed view of the evolution of the execution of a code. The second, the 3DyRM, also adds information about the latency of memory accesses to better represent the behaviour on systems with complex memory hierarchies. A set of tools to obtain and represent the models has been implemented. These tools obtain the needed data from hardware counters, with low overhead. Different views are displayed by the tool that can be used to extract the main features of the code. Results of studying, with these tools, the NAS Parallel Benchmarks for OpenMP on two different systems are presented.

keywords: Roofline model, Performance, Hardware counters, PEBS, NPB, Multicore