Dynamic workload optimisation on NUMA and heterogeneous architectures

This thesis addresses dynamic workload optimisation and workload balancing in two settings: conventional systems that exploit heterogeneous (CPU and GPU) parallelism, and NUMA systems. For the first, a library named IHP is proposed. IHP evaluates the performance of the CPU and the GPU at run time and divides the workload between them accordingly. Results show that execution times improve by between 3% and 55%, depending on the code and on the relative performance of the computing units. For the second, a tool for migrating threads and memory pages in NUMA systems has been developed. The tool incorporates several algorithms that use performance measurements to decide whether a migration is required. Experiments show that performance can be improved by up to 47%, particularly in multi-tasking scenarios.
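
As an illustration of the performance-proportional division mentioned above, the following Python sketch splits a number of work items between a CPU and a GPU according to their measured throughput. It is not IHP's interface: the function name `split_workload` and the parameters `cpu_rate` and `gpu_rate` (items processed per second in a previous measurement) are hypothetical, and a simple proportional rule is assumed here only for the sake of the example.

```python
def split_workload(total_items, cpu_rate, gpu_rate):
    """Divide total_items between CPU and GPU in proportion to throughput.

    cpu_rate and gpu_rate are hypothetical measurements (items per second)
    taken from a previous chunk of work. Returns (cpu_items, gpu_items).
    """
    if total_items < 0 or cpu_rate < 0 or gpu_rate < 0:
        raise ValueError("item count and rates must be non-negative")

    total_rate = cpu_rate + gpu_rate
    if total_rate == 0:
        # No measurement available yet: fall back to an even split.
        cpu_items = total_items // 2
    else:
        # The CPU share is proportional to its fraction of total throughput.
        cpu_items = round(total_items * cpu_rate / total_rate)

    # The GPU takes the remainder, so no item is lost to rounding.
    return cpu_items, total_items - cpu_items


if __name__ == "__main__":
    # Example: the GPU was measured to be three times faster than the CPU.
    cpu_items, gpu_items = split_workload(1000, cpu_rate=1.0, gpu_rate=3.0)
    print(cpu_items, gpu_items)
```

Repeating the measurement after each chunk and calling the function again would let the split follow changes in the relative performance of the two devices, which is the sense in which such a division is dynamic.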

Keywords: High Performance Computing, NUMA Scheduling, Heterogeneous Parallelism, Hardware Counters