On the Influence of Thread Allocation for Irregular Codes in NUMA Systems
This work presents a study undertaken to characterise the FINISTERRAE supercomputer, one of the biggest NUMA systems in Europe. The main objective was to determine the performance effect of bus contention and cache coherency as well as the suitability of porting strategies regarding irregular codes in such a complex architecture. Results show that: (1) cores which share a socket can be considered as independent processors in this context; (2) for big data sizes, the effect of sharing a bus degrades the final performance but masks the cache coherency effects; (3) the NUMA factor (remote to local memory latency ratio) is an important factor on irregular codes and (4) the default kernel allocation policy is not optimal in this system. These results allow us to understand the behaviour of thread-to-core mappings and memory allocation policies.
keywords: Itanium2, Hardware Counters, Irregular Codes, FinisTerrae