The divide-and-conquer (D&C) pattern appears in a large number of problems and is well suited to exploiting parallelism. This has led to much research on applying it easily and efficiently in both shared-memory and distributed-memory parallel systems.
One of the most successful approaches explored in this area consists of expressing this pattern by means of parallel skeletons, which automate the parallelization and hide its complexity from the user while trying to provide good performance. In this paper, we tackle the development of a skeleton oriented to the efficient parallel resolution of D&C problems with a high degree of imbalance among the generated subproblems and/or a deep level of recursion. In our experiments, the skeleton achieves average speedups between 11% and 18% higher than those of other solutions, reaching a maximum speedup of 78% in some tests. Moreover, the new proposal requires on average between 13% and 29% less programming effort than the usual alternatives.
Keywords: Algorithmic skeletons, Divide-and-conquer, Template metaprogramming, Load balancing