Load balanced heterogeneous parallelism for finite difference problems on image denoising

In this work, we introduce a heterogeneous scheme for computing iterative (or time-step) methods based on finite differences, using an image denoising problem as a case study. The proposed approach dynamically splits the problem domain into smaller regions according to CPU and GPU performance, balancing the workload between them. Results show that this approach reduces execution time compared with using the GPU alone, which is typically faster than the CPU for this kind of problem. In our experiments, performance improvements range from 3%, in scenarios where the CPU can handle only a small portion of the workload, to more than 30%, when the CPU can take on more work.

keywords: CUDA, finite differences, heterogeneous parallelism, image processing, OpenMP
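
The dynamic splitting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the image is partitioned by rows, that the time each device spent on the previous iteration is available, and that the next split is simply made proportional to the observed throughput. The function name `rebalance` and its parameters are hypothetical.

```python
def rebalance(total_rows, cpu_rows, cpu_seconds, gpu_seconds):
    """Return (cpu_rows, gpu_rows) for the next iteration.

    Assumption (not from the paper): the domain is split by rows, and each
    device's throughput is estimated from the rows it processed and the
    time it took in the previous iteration.
    """
    gpu_rows = total_rows - cpu_rows
    if cpu_rows <= 0 or gpu_rows <= 0:
        raise ValueError("both devices must have processed at least one row")
    if cpu_seconds <= 0 or gpu_seconds <= 0:
        raise ValueError("measured times must be positive")

    # Observed throughput of each device, in rows per second.
    cpu_rate = cpu_rows / cpu_seconds
    gpu_rate = gpu_rows / gpu_seconds

    # Give each device a share proportional to its throughput, so that both
    # are expected to finish the next iteration at about the same time.
    cpu_share = cpu_rate / (cpu_rate + gpu_rate)
    new_cpu_rows = round(total_rows * cpu_share)

    # Keep at least one row on each device so both rates stay measurable.
    new_cpu_rows = max(1, min(total_rows - 1, new_cpu_rows))
    return new_cpu_rows, total_rows - new_cpu_rows
```

For example, if 1000 rows are initially split evenly and the CPU takes 4 s while the GPU takes 1 s, `rebalance(1000, 500, 4.0, 1.0)` assigns 200 rows to the CPU and 800 to the GPU. At the measured rates both devices would then need about 1.6 s per iteration.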