Big Data meets High Performance Computing: Genomics and Natural Language Processing as case studies
The main objective of this thesis is to help clarify a path toward the convergence of Big Data and High Performance Computing (HPC). To this end, a computational study is carried out on the application of Big Data and HPC technologies to two real-world scientific problems that, a priori, fit well in both worlds. The two problems addressed are sequence alignment in genomics and natural language processing. The results obtained in the works presented in this thesis help, to some extent, to clarify the road to convergence between HPC and Big Data. This work not only opens a possible way toward Exascale computing, but also provides new tools that support very important tasks in two present-day scientific areas. Because these tools work efficiently and scale well, they represent a significant improvement for researchers in these areas, who can perform their daily work faster and more efficiently.
keywords: big data, high performance computing, genomics, bioinformatics, natural language processing