Towards Large Scale Environmental Data Processing with Apache Spark
Currently available environmental datasets are either manually constructed by professionals or
automatically generated from the observations provided by sensing devices. Usually, the former are
modelled and recorded with traditional general-purpose relational technologies, whereas the latter
require more specific scientific array formats and tools. Declarative data processing technologies are
available for both relational and array data; however, the efficient, declarative, integrated processing
of array and relational environmental data is a problem for which a satisfactory solution has still not
been provided. To address this, an integrated data processing language called MAPAL has been
proposed. This paper briefly describes the design decisions and challenges related to
data storage and data processing that arise during the ongoing implementation of MAPAL on top of
the Apache Spark large-scale data processing framework.
Keywords: Environmental Data, Data Processing, Big Data, Apache Spark
Publication: Congress
June 18, 2021
Authors: Ferrón D., Villarroya S., Viqueira J.R.R., Pena T.F.