Lessons Learnt Porting Parallelisation Techniques for Irregular Codes to NUMA Systems

This work presents a study characterising the behaviour of several parallelisation techniques for irregular codes, previously developed for SMP architectures, on a multi-node SMP NUMA system. The main objective is to determine the performance effect of bus contention and cache coherency in such a complex architecture. Results show that: (1) cores which share a socket can be considered independent processors in this context; (2) for large data sizes, sharing a bus degrades performance but masks the cache coherency effects; and (3) the NUMA ratio is a critical factor for irregular codes. These results allow us to study the effect on performance of thread-to-core mappings and memory allocation policies.

Keywords: Irregular Codes, Itanium2, Hardware Counters.