Multicore systems have on-board memory hierarchies and communication networks that influence performance when executing shared-memory parallel codes.
Characterising this influence is complex, and understanding the effect of particular hardware configurations on different codes is of paramount importance.
In this paper, monitoring information extracted from hardware counters at runtime is used to characterise the behaviour of each thread in the processes running in the system. This characterisation is given in terms of floating-point operations per second, operational intensity, and memory access latency.
Keywords: 3DyRM, Roofline Model, Hardware Counters, Performance, Thread Migration
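
As an illustration of the three per-thread metrics named in the abstract, the following is a minimal sketch, not the implementation used in this work. It assumes that raw counter deltas for one thread over one sampling interval are already available; the `CounterSample` type, its field names, and the `characterise` function are hypothetical. Operational intensity is taken here in the usual Roofline sense of floating-point operations per byte of memory traffic.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CounterSample:
    """Hypothetical counter deltas for one thread over one sampling interval."""
    flops: int                # floating-point operations retired
    bytes_moved: int          # bytes transferred to/from memory
    latency_cycles_sum: int   # summed latency of the sampled memory accesses
    sampled_accesses: int     # number of memory accesses sampled
    interval_s: float         # length of the interval in seconds


def characterise(s: CounterSample) -> Tuple[float, Optional[float], Optional[float]]:
    """Return (GFLOPS, operational intensity in FLOPs/byte, mean latency in cycles).

    Operational intensity and mean latency are None when the interval
    recorded no memory traffic or no sampled accesses, respectively.
    """
    if s.interval_s <= 0:
        raise ValueError("interval_s must be positive")
    gflops = s.flops / s.interval_s / 1e9
    intensity = s.flops / s.bytes_moved if s.bytes_moved > 0 else None
    latency = (s.latency_cycles_sum / s.sampled_accesses
               if s.sampled_accesses > 0 else None)
    return gflops, intensity, latency


if __name__ == "__main__":
    sample = CounterSample(
        flops=2_000_000_000,
        bytes_moved=500_000_000,
        latency_cycles_sum=12_000,
        sampled_accesses=100,
        interval_s=1.0,
    )
    print(characterise(sample))
```

Each thread then maps to one point in a three-dimensional space per interval, which is the kind of per-thread characterisation the abstract describes.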