Perldoop2: a Big Data-oriented source-to-source Perl-Java compiler

Perl is one of the most important programming languages in many research areas. However, the most relevant Big Data frameworks, Apache Hadoop, Apache Spark and Apache Storm, do not natively support this language. To take advantage of these Big Data engines, Perl programmers must either port their applications to Java or Scala, which requires a huge effort, or use utilities such as Hadoop Streaming, at the cost of degraded performance. For this reason we introduce Perldoop2, a Big Data-oriented Perl-Java source-to-source compiler. The compiler can generate Java code from Perl applications for sequential execution, and also for execution on clusters using the Hadoop, Spark and Storm engines. Perl programmers only need to tag their source code in order to use the compiler. Experimental results demonstrate the benefits of Perldoop2 in terms of ease of use, performance and scalability.

Keywords: Big Data, Compiler, Perl, Java, Hadoop, Spark