OPERA-P: An Adaptive Scheduler for Dynamically Provisioning Big Data Frameworks On-demand

The proposed OPERA-P (OPportunistically, Elastically Resource Allocation and Provisioning) scheduler is a new hybrid Big Data (BD) platform that combines High-Throughput Computing and High-Performance Computing, i.e., HTCondor and YARN (see Figure 1). With OPERA-P, an opportunistic HTCondor pool and a dedicated Apache YARN cluster can collaborate, which improves task throughput for BD applications at minimal deployment cost. This model closely resembles the way multiple applications run concurrently on a laptop or smartphone: new threads are spawned and more resources are requested as they are needed, and the OS arbitrates among all of the requests. By analogy, OPERA-P plays the role of the OS: it keeps spawning new Docker containers on idle HTCondor workstations (creating an opportunistic container-based cluster on the HTCondor pool) and efficiently provisions resources for the dedicated Hadoop cluster on demand.

Keywords: Big Data, Large-scale Distributed Cluster, Workload Scheduling, Analytical Framework