Multiobjective Optimization Technique Based on Monitoring Information to Increase the Performance of Thread Migration on Multicores

Multicore systems have on-board memory hierarchies and communication networks that influence their performance when they execute shared-memory parallel codes. Characterizing this influence is complex, and understanding the effect of particular hardware configurations on different codes is of paramount importance. In this paper, monitoring information extracted from hardware counters at runtime is used to characterize the behaviour of each thread of the parallel code in terms of three values: the number of floating-point operations per second, the operational intensity, and the memory access latency. These values correspond to the Roofline Model extended with information about memory access latencies. We propose to use this information to guide thread migration strategies that improve the efficiency of the execution of the code by increasing locality and affinity. The idea behind this proposal is to treat these three values as the objective functions of a multiobjective optimization problem. The proposed technique is an iterative method inspired by evolutionary optimization algorithms. To this end, an individual utility function is defined to represent the relative importance of these values. This function is a weighted product that can be considered representative of the performance of each parallel thread. Different configurations of the SAXPY and SDOT kernels on multicores were used to validate the benefits of the proposed thread migration strategies. The results show that our strategy produces improvements of up to 25% in scenarios where locality and affinity are low, and negligible degradation when they are high. The use of hardware counters incurs low overhead when extracting monitoring information.
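As an illustration of the weighted-product utility mentioned above, the following minimal sketch combines the three monitored values for one thread into a single score. The exact functional form is not given in the abstract, so everything here is an assumption made for illustration: the names `ThreadSample` and `utility`, the weights, and in particular the choice to place latency in the denominator so that lower latency raises the score.

```python
from dataclasses import dataclass


@dataclass
class ThreadSample:
    """Hypothetical per-thread measurement derived from hardware counters."""
    gflops: float        # floating-point operations per second (in GFLOPS)
    op_intensity: float  # operational intensity (FLOPs per byte of memory traffic)
    latency: float       # mean memory access latency (e.g. in cycles)


def utility(sample: ThreadSample,
            w_gflops: float = 1.0,
            w_oi: float = 1.0,
            w_lat: float = 1.0) -> float:
    """Weighted product of the three values (assumed form).

    Higher GFLOPS and operational intensity increase the score; higher
    latency decreases it. The weights express relative importance.
    """
    if sample.latency <= 0:
        raise ValueError("latency must be positive")
    return (sample.gflops ** w_gflops) * (sample.op_intensity ** w_oi) \
        / (sample.latency ** w_lat)


# Two hypothetical samples of the same thread: one placed near its data,
# one placed on a remote node with higher memory latency.
local = ThreadSample(gflops=2.0, op_intensity=0.5, latency=100.0)
remote = ThreadSample(gflops=1.5, op_intensity=0.5, latency=300.0)
```

Under these assumptions, a migration strategy could compare `utility(local)` with `utility(remote)` and prefer the placement with the higher score; the weights let one objective dominate when, for example, latency matters more than raw throughput.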