Core Research Activities in Reconfigurable Computing Laboratory

HPC Productivity and Cloud Resource Management

Please refer to the Productivity-Aware Scheduling in HPC project page for the current state of the project.

Funded by: NSF (I/UCRC Cloud and Autonomic Computing)

Collaborators: H. J. Siegel (Colorado State University), Aniruddha Marathe and Ghaleb Abdulla (Lawrence Livermore National Laboratory)

Researcher: Nirmal Kumbhare

Data infrastructures such as Google, Amazon, eBay, and E-Trade are powered by data centers (DCs) that contain tens to hundreds of thousands of computers and storage devices running complex software applications. Between 2013 and 2020, organizations’ investment in software for mobile, social, cloud, and big data technologies was projected to grow more than 20 times faster than their investment in Information Technology (IT) hardware. Traditional IT architectures are not designed to provide an agile infrastructure that keeps pace with rapidly evolving next-generation mobile and big data applications, which are distinct from “traditional” enterprise applications. Common characteristics of these next-generation applications are nonlinear scaling and relatively unpredictable growth as a function of the inputs being processed, their size, and their dynamic behavior. The dynamic and unpredictable changes in the Service Level Objectives (SLOs) of these applications (e.g., availability, response time, reliability, and energy) require continuous provisioning and re-provisioning of DC resources. Building a new DC that efficiently supports the distinct design principles of each unique workload or application is clearly impractical.

The goal of our research is to design an innovative, composable “Just in Time Architecture” (JITA) DC and associated resource management techniques built from flexible building blocks that can be dynamically and automatically assembled and re-assembled to meet the changing SLOs of current and future DC applications. To enable the design of JITA for next-generation DC applications, we address the dynamic composition and resource management challenges by building a Virtual DC and investigating:

  • Novel application-aware DC management systems that continuously monitor and analyze the execution of DC resources, dynamically control the logical and physical resource components, and automatically re-configure them to meet application SLOs at runtime (a minimal control-loop sketch in Python follows this list).
  • Novel model-driven resource management heuristics and scheduling algorithms based on time-dependent metrics that measure the value of a service, balancing competing goals such as completion time and energy consumption.
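
As a minimal illustration of the first item, the sketch below implements a monitor-analyze-reconfigure loop that compares observed metrics against an application's SLOs and triggers a stubbed re-provisioning step. All names here (read_metrics, reconfigure, the SLO targets) are hypothetical placeholders of ours, not the actual interfaces of the JITA prototype.

    import time

    # Hypothetical SLO targets; values are illustrative only.
    SLO = {"response_time_ms": 200.0, "availability": 0.999}

    def read_metrics() -> dict:
        """Monitor: stand-in for continuous DC telemetry collection."""
        return {"response_time_ms": 250.0, "availability": 0.9995}

    def violated_slos(metrics: dict) -> list:
        """Analyze: compare observed metrics against the application's SLOs."""
        bad = []
        if metrics["response_time_ms"] > SLO["response_time_ms"]:
            bad.append("response_time_ms")
        if metrics["availability"] < SLO["availability"]:
            bad.append("availability")
        return bad

    def reconfigure(violations: list) -> None:
        """Plan/execute: stub for re-assembling the composable building
        blocks (logical and physical resources) to restore the SLOs."""
        for slo in violations:
            print(f"SLO '{slo}' violated -> re-provisioning resources")

    def control_loop(period_s: float = 1.0, iterations: int = 3) -> None:
        """The runtime loop: monitor, analyze, and re-configure continuously."""
        for _ in range(iterations):
            bad = violated_slos(read_metrics())
            if bad:
                reconfigure(bad)
            time.sleep(period_s)

    if __name__ == "__main__":
        control_loop()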

To measure HPC productivity, we use a monotonically decreasing, time-dependent value function that represents the value to an organization of completing a job; HPC productivity is then quantified by accumulating the values achieved by the completed jobs. We leverage this Value of Service (VoS) metric as the basis for power-aware, value-based scheduling algorithms [7] that improve the productivity of a power-constrained and oversubscribed HPC system. The first algorithm distributes the system power uniformly among the nodes to improve node utilization; the second allocates power greedily to high-value jobs to meet their timing requirements. We study these two algorithms under different system-wide power constraints on a real HPC prototype, using a synthetic workload trace built from real scientific routines.
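A minimal Python sketch of these ideas follows, assuming a linear value decay and per-job power bounds. The parameter names (initial_value, decay_rate, min_power, max_power) and the greedy tie-breaking are illustrative simplifications of ours, not the precise models and algorithms evaluated in [7].

    from dataclasses import dataclass

    @dataclass
    class Job:
        name: str
        initial_value: float  # value of completing the job immediately
        decay_rate: float     # value lost per unit of time after submission
        submit_time: float
        min_power: float      # watts needed to run at the slowest setting
        max_power: float      # watts needed to run at the fastest setting

    def job_value(job: Job, completion_time: float) -> float:
        """Monotonically decreasing time-dependent value function: the later
        a job completes, the less value it yields (never below zero)."""
        elapsed = completion_time - job.submit_time
        return max(0.0, job.initial_value - job.decay_rate * elapsed)

    def productivity(completed: list) -> float:
        """HPC productivity: accumulated value of the completed jobs,
        given (job, completion_time) pairs."""
        return sum(job_value(job, t) for job, t in completed)

    def uniform_power(jobs: list, system_power: float) -> dict:
        """Policy 1: split the system power budget uniformly across jobs
        to keep node utilization high."""
        share = system_power / len(jobs)
        return {j.name: min(share, j.max_power) for j in jobs}

    def greedy_power(jobs: list, system_power: float) -> dict:
        """Policy 2: allocate power greedily to the highest-value jobs
        (at their fastest setting) so they meet their timing requirements."""
        alloc, remaining = {}, system_power
        for j in sorted(jobs, key=lambda j: j.initial_value, reverse=True):
            power = min(j.max_power, remaining)
            if power < j.min_power:
                continue  # remaining budget cannot power this job at all
            alloc[j.name] = power
            remaining -= power
        return alloc

For example, with a 150 W budget over two jobs:

    jobs = [
        Job("sim-A", 100.0, 1.0, 0.0, min_power=50.0, max_power=120.0),
        Job("sim-B", 40.0, 0.5, 0.0, min_power=30.0, max_power=60.0),
    ]
    print(uniform_power(jobs, 150.0))  # {'sim-A': 75.0, 'sim-B': 60.0}
    print(greedy_power(jobs, 150.0))   # {'sim-A': 120.0, 'sim-B': 30.0}

The uniform policy favors utilization, while the greedy policy protects the completion times (and thus the values) of the most valuable jobs, mirroring the trade-off studied under different system-wide power constraints.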

Publications:

  1. Nirmal Kumbhare, Cihan Tunc, Salim Hariri, Ivan Djordjevic, Ali Akoglu, and Howard Jay Siegel, “Just in Time Architecture (JITA) for Dynamically Composable Data Centers,” 13th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA’16), Nov./Dec. 2016.
  2. Dylan Machovec, Bhavesh Khemka, Nirmal Kumbhare, Sudeep Pasricha, Anthony A. Maciejewski, Howard Jay Siegel, Ali Akoglu, Gregory A. Koenig, Salim Hariri, Cihan Tunc, Michael Wright, Marcia Hilton, Rajendra Rambharos, Christopher Blandin, Farah Fargo, Ahmed Louri, and Neena Imam, "Utility-Based Resource Management in an Oversubscribed Energy-Constrained Heterogeneous Environment Executing Parallel Applications," Parallel Computing, Vol. 83, pp. 48-72, Apr. 2019.
  3. Cihan Tunc, Dylan Machovec, Nirmal Kumbhare, Ali Akoglu, Salim Hariri, Bhavesh Khemka, Howard J. Siegel, “Value of Service Based Resource Management for Large-Scale Computing Systems,” Cluster Computing, pp. 1-18, May 2017. DOI: 10.1007/s10586-017-0901-9.
  4. Cihan Tunc, Nirmal Kumbhare, Ali Akoglu, Salim Hariri, Dylan Machovec, Howard Jay Siegel, “Value of Service Based Task Scheduling for Cloud Computing Systems,” IEEE International Conference on Cloud and Autonomic Computing (ICCAC), Augsburg, Germany, September 12-16, 2016, pp. 1-11.
  5. Dylan Machovec, Cihan Tunc, Nirmal Kumbhare, Bhavesh Khemka, Ali Akoglu, Salim Hariri, Howard Jay Siegel, “Value-Based Resource Management in High-Performance Computing Systems,” 7th Workshop on Scientific Cloud Computing (ScienceCloud 2016), in the proceedings of the 25th International Symposium on High Performance Parallel and Distributed Computing (HPDC ’16), pp. 19-26, Kyoto, Japan, May/June 2016.
  6. Farah Fargo, Cihan Tunc, Youssif Al-Nashif, Ali Akoglu, and Salim Hariri, “Autonomic Workload and Resources Management of Cloud Computing Resources,” IEEE International Conference on Cloud and Autonomic Computing (ICCAC’14), London, Sept 8-12, 2014, pp. 101-110.
  7. Nirmal Kumbhare, Cihan Tunc, Dylan Machovec, Ali Akoglu, Salim Hariri, and Howard Jay Siegel, “Value-Based Scheduling for Oversubscribed Power-Constrained Homogeneous HPC Systems,” IEEE International Conference on Cloud and Autonomic Computing (ICCAC), Tucson, USA, September 18-22, 2017, pp. 120-130.