New technologies to bridge the gap between High Performance Computing (HPC) and Big Data

The unification of HPC and Big Data has received increasing attention in the last years. It is a common belief that exascale computing and Big Data are closely associated since HPC requires processing large-scale data from scientific instruments and simulations. But, at the same time, it was observed that tools and cultures of HPC and Big Data communities differ significantly. One of the most important issues in the path to the convergence is caused by the differences in their software stacks. This thesis will address the research challenge of bridging the gap between Big Data and HPC worlds. With this goal in mind, a set of tools and technologies will be developed and integrated into a new unified Big Data-HPC framework that will allow the execution of scientific multi-language applications on both environments using containers.

keywords: HPC, Big Data, MPI, Containers, Spark