Resource management and monitoring

Authors: C. Cavazzoni (a), A. Federico (b), D. Galetti (a), G. Morelli (b), A. Pieretti (b)
(a) CINECA, via Magnanelli 6/3, 40033 Casalecchio di Reno, Italy
(b) CINECA, via dei Tizii 6/b, 00185 Roma, Italy

Abstract: Green thinking in HPC calls for reducing power consumption, which conflicts with the ever-growing demand for computational power. To circumvent this dilemma, several types of computing accelerators have been adopted. Using an accelerator means partial, if not total, code rewriting, with the aim of achieving a speed-up that would be difficult to attain with the current evolution of CPUs and general-purpose hardware. After an initial period of wariness towards this new technology, programmers have shown a growing interest in the field, and several of the most widely used scientific codes have undergone an intensive software restyling.
Accelerators have introduced a new class of requests that resource schedulers on hybrid clusters need to fulfil. Exploring what schedulers can offer in terms of minimizing this effort and maximizing resource exploitation has therefore become a fundamental issue. Since the CINECA supercomputing centre runs a new-generation hybrid cluster with two different accelerators, GPUs and MICs, it is testing the PBSPro resource scheduler in order to put its cluster to the best possible use.

Download PDF


Authors: Seren Soner, Can Ozturan
Computer Engineering Department, Bogazici University, Istanbul, Turkey

Abstract: SLURM is a popular resource management system that is used on many supercomputers in the TOP500 list. In this work, we describe our new AUCSCHED3 SLURM scheduler plug-in, which extends our earlier AUCSCHED2 plug-in with the capability to compute topology-aware mappings of jobs on hierarchically interconnected systems such as trees or fat trees. Our approach builds on the auction-based scheduling algorithm of AUCSCHED2 and generates bids for topologically good mappings of jobs onto the resources. The priorities of the jobs are also adjusted slightly, without changing the original priority ordering of the jobs, so as to favour topologically better candidate mappings. SLURM emulation results are presented for a heterogeneous 1024-node system with 16 cores and 3 GPUs on each of its nodes. The results show that our heuristic generates better topological mappings than SLURM/Backfill. AUCSCHED3 is available at http://code.google.com/p/slurm-ipsched/.
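
As a rough illustration of the priority adjustment described above, the following Python sketch adds to each bid a topology bonus that is kept strictly smaller than the smallest gap between distinct job priorities, so the original priority ordering of jobs cannot change. The function names, the scoring interface and the epsilon scheme are illustrative assumptions, not the released AUCSCHED3 code.

# Hedged sketch: one way to nudge bid values toward topologically better
# mappings without reordering jobs. Names and the epsilon scheme are
# assumptions made for illustration only.

def adjusted_bid_values(jobs, bids, topo_score):
    """
    jobs:       list of (job_id, priority), higher priority = more urgent
    bids:       dict job_id -> list of candidate node sets for that job
    topo_score: function(node_set) -> value in [0, 1], higher = more compact
    Returns a dict (job_id, bid_index) -> adjusted bid value.
    """
    # The smallest gap between distinct priorities bounds the bonus, so a bid
    # of a lower-priority job can never outrank one of a higher-priority job.
    prios = sorted({p for _, p in jobs}, reverse=True)
    gaps = [a - b for a, b in zip(prios, prios[1:])]
    eps = 0.5 * min(gaps) if gaps else 1.0

    values = {}
    for job_id, prio in jobs:
        for i, node_set in enumerate(bids.get(job_id, [])):
            values[(job_id, i)] = prio + eps * topo_score(node_set)
    return values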

Download PDF


Authors: Seren Soner, Can Ozturan, Itir Karac
Computer Engineering Department, Bogazici University, Istanbul, Turkey

Abstract: The SLURM resource management system is used on many TOP500 supercomputers. In this work, we present enhancements that we added to our AUCSCHED heterogeneous CPU-GPU scheduler plug-in, whose first version was released in December 2012. In this new version, called AUCSCHED2, two enhancements are contributed. The first is the extension of SLURM to support GPU ranges: the current version of SLURM supports specification of node ranges but not of GPU ranges, a feature that can be very useful to runtime auto-tuning applications and systems that can make use of a variable number of GPUs. The second enhancement is a new integer programming formulation in AUCSCHED2 that drastically reduces the number of variables, allowing faster solution and a larger number of bids to be generated. SLURM emulation results are presented for the heterogeneous 1408-node Tsubame supercomputer, which has 12 cores and 3 GPUs on each of its nodes. AUCSCHED2 is available at http://code.google.com/p/slurm-ipsched/.
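
The GPU-range idea can be pictured with a small Python sketch that expands a request with a per-node GPU range into one candidate bid per feasible GPU count, preferring larger counts so that auto-tuning jobs receive more GPUs when they are free. The field names and the bid layout are assumptions made for illustration; they do not reflect the actual SLURM extension or the AUCSCHED2 data structures.

# Hedged sketch: expanding a per-node GPU range into candidate bids.
from dataclasses import dataclass

@dataclass
class JobRequest:
    job_id: int
    nodes: int
    cores_per_node: int
    gpus_min: int   # lower end of the requested per-node GPU range
    gpus_max: int   # upper end of the requested per-node GPU range

def expand_gpu_range(req, gpus_per_node_available):
    """Yield one candidate bid per admissible per-node GPU count,
    largest first, so auto-tuning jobs get more GPUs when they are free."""
    hi = min(req.gpus_max, gpus_per_node_available)
    for g in range(hi, req.gpus_min - 1, -1):
        yield {"job_id": req.job_id,
               "nodes": req.nodes,
               "cores_per_node": req.cores_per_node,
               "gpus_per_node": g}

# Example: 2 nodes, 12 cores per node, 1-3 GPUs per node requested, on a
# system with 3 GPUs per node -> bids for 3, 2 and 1 GPUs per node.
bids = list(expand_gpu_range(JobRequest(7, 2, 12, 1, 3), 3))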

Download PDF


Authors: Seren Soner, Can Ozturan, Itir Karac
Computer Engineering Department, Bogazici University, Istanbul, Turkey

Abstract: SLURM is a resource management system that is used on many TOP500 supercomputers. We present a heterogeneous CPU-GPU scheduler plug-in for SLURM, called AUCSCHED, that implements an auction-based algorithm. In order to tune the topological mapping of jobs to resources, our plug-in determines at scheduling time, for each job, the best resource choices from the available ones based on node contiguity. Each of these choices is then expressed as a bid that a job makes in an auction. Our algorithm takes a window of jobs from the front of the job queue, generates multiple bids on the available resources for each job, and solves an assignment problem that maximizes an objective function involving the priorities of the jobs. We generate several CPU-GPU synthetic workloads and perform realistic SLURM emulation tests to compare the performance of our auction-based scheduler with that of SLURM's own backfill scheduler. In general, AUCSCHED achieves a few percentage points better utilization than the SLURM/BF plug-in; topologically, SLURM/BF leads to less fragmentation whereas AUCSCHED leads to less spread. Both SLURM's plug-in and ours produce high utilizations, around 90%, when workloads are made up of jobs requesting no more than 1 GPU per node. On the other hand, when workloads contain jobs that request 2 GPUs per node, the system utilization drops drastically to the 65-75% range with both AUCSCHED and SLURM's own plug-in. This points to the need for further study of scheduling jobs that use multiple GPU cards per node. Our plug-in, which builds on our earlier plug-in called IPSCHED, is available at http://code.google.com/p/slurm-ipsched/.
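
A minimal Python sketch of the auction step is shown below. The real plug-in solves the bid-selection problem as an integer program; here a simple greedy selection over a job window is used, with invented data layouts, purely to keep the illustration self-contained.

# Hedged sketch: pick at most one winning bid per job in the window so
# that no node is oversubscribed, favouring higher-priority jobs.

def run_auction(window, free_cores):
    """
    window:     list of (job_id, priority, bids); each bid is a dict
                node_id -> number of cores requested on that node
    free_cores: dict node_id -> currently free cores
    Returns a dict job_id -> winning bid (at most one per job).
    """
    free = dict(free_cores)          # work on a copy of the free-core map
    chosen = {}
    # Consider higher-priority jobs first; the window is a prefix of the queue.
    for job_id, priority, bids in sorted(window, key=lambda j: -j[1]):
        for bid in bids:
            if all(free.get(n, 0) >= c for n, c in bid.items()):
                for n, c in bid.items():
                    free[n] -= c     # commit the resources of this bid
                chosen[job_id] = bid
                break                # at most one winning bid per job
    return chosen

Restricting the auction to a window of jobs keeps the selection problem small enough to be solved at every scheduling pass while still looking past the job at the head of the queue.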

Download PDF


Authors: Reha Oguz Selvitopi, Ata Turk, Altay Guvenir, Cevdet Aykanat
Bilkent University, Computer Engineering Department, 06800 Ankara, Turkey

Abstract: Topology-aware mapping has attracted renewed interest with the development of supercomputers whose topologies connect thousands of processors and have large diameters. In such parallel architectures, it is possible to improve the performance of executed parallel programs via careful mapping of tasks to processors, taking into account the properties of the underlying topology and the communication pattern of the mapped program. One of the most widely used metrics for capturing a parallel program's communication overhead is the hop-bytes metric, which takes the processor topology into account, in contrast to the assumptions made by wormhole routing. In this work, we propose a KL-based iterative improvement heuristic for mapping the tasks of a given program to the processors of a parallel architecture, where the objective is the reduction of the communication overhead modeled with the hop-bytes metric. We assume that the communication pattern of the program is known beforehand and that the processor topology information is available. The algorithm tries to improve a given initial mapping with a number of successive task swaps defined within a given processor neighbourhood. We test our algorithm for different numbers of tasks and processors and demonstrate its results by comparing it to random mapping, which is widely used in recent supercomputers.
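
To make the hop-bytes objective and the swap-based improvement concrete, here is a small Python sketch. The 2D-mesh distance function and the all-pairs swap loop are simplifying assumptions: the paper addresses general processor topologies and restricts swaps to a given processor neighbourhood.

# Hedged sketch: hop-bytes on a 2D mesh and a simple swap-improvement pass.
import itertools

def hops(p, q, mesh_width):
    """Manhattan distance between two processors laid out on a 2D mesh."""
    px, py = p % mesh_width, p // mesh_width
    qx, qy = q % mesh_width, q // mesh_width
    return abs(px - qx) + abs(py - qy)

def hop_bytes(mapping, comm, mesh_width):
    """Sum over communicating task pairs of (bytes exchanged) x (hop count)."""
    return sum(vol * hops(mapping[t1], mapping[t2], mesh_width)
               for (t1, t2), vol in comm.items())

def improve_by_swaps(mapping, comm, mesh_width, passes=3):
    """Try pairwise task swaps and keep those that reduce hop-bytes.
    Recomputing the full metric per swap keeps the sketch short; a real
    implementation would evaluate only the gain of the affected tasks."""
    tasks = list(mapping)
    for _ in range(passes):
        improved = False
        for t1, t2 in itertools.combinations(tasks, 2):
            before = hop_bytes(mapping, comm, mesh_width)
            mapping[t1], mapping[t2] = mapping[t2], mapping[t1]
            if hop_bytes(mapping, comm, mesh_width) < before:
                improved = True                                   # keep swap
            else:
                mapping[t1], mapping[t2] = mapping[t2], mapping[t1]  # undo
        if not improved:
            break
    return mapping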

Download PDF


Authors: Seren Soner, Can Ozturan, Itir Karac
Computer Engineering Department, Bogazici University, Istanbul, Turkey

Abstract: We present an integer programming based heterogeneous CPU-GPU cluster scheduler for the widely used SLURM resource manager. Our scheduler algorithm takes windows of jobs and solves an allocation problem in which free CPU cores and GPU cards are allocated collectively to jobs so as to maximize some objective function. We perform realistic SLURM emulation tests using the Effective System Performance (ESP) workloads. The test results show that our scheduler produces better resource utilization and shorter average job waiting times. The SLURM scheduler plug-in that implements our algorithm is available at http://code.google.com/p/slurm-ipsched/.
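
For illustration, the window-based allocation can be written as a small integer program. The Python sketch below uses the open-source PuLP modeller and a deliberately simplified, cluster-wide capacity model; the actual plug-in formulates the allocation per node and per GPU card and is not reproduced here.

# Hedged sketch: choose which jobs in the window to start now so that
# free cores and GPUs are not exceeded and total priority is maximized.
from pulp import LpProblem, LpMaximize, LpVariable, lpSum

def schedule_window(jobs, free_cores, free_gpus):
    """
    jobs: list of dicts with keys 'id', 'priority', 'cores', 'gpus'
    Returns the ids of the jobs selected to start now.
    """
    prob = LpProblem("window_allocation", LpMaximize)
    x = {j["id"]: LpVariable(f"x_{j['id']}", cat="Binary") for j in jobs}

    # Objective: start the highest-priority set of jobs that fits.
    prob += lpSum(j["priority"] * x[j["id"]] for j in jobs)

    # Capacity constraints on free CPU cores and free GPU cards.
    prob += lpSum(j["cores"] * x[j["id"]] for j in jobs) <= free_cores
    prob += lpSum(j["gpus"] * x[j["id"]] for j in jobs) <= free_gpus

    prob.solve()
    return [j["id"] for j in jobs if x[j["id"]].value() == 1]

# Example: three waiting jobs competing for 32 free cores and 4 free GPUs.
picked = schedule_window(
    [{"id": 1, "priority": 10, "cores": 16, "gpus": 2},
     {"id": 2, "priority": 8,  "cores": 24, "gpus": 1},
     {"id": 3, "priority": 6,  "cores": 16, "gpus": 2}],
    free_cores=32, free_gpus=4)

Maximizing priority-weighted starts over a window, rather than dispatching strictly in queue order, is what lets the scheduler trade a slight reordering for better packing of cores and GPUs.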

Download PDF


Authors: Daniela Galetti, Federico Paladin
SuperComputing Applications and Innovation Dept., CINECA, Bologna, Italy

Abstract: This document describes the design, development and improvement of the Nagios monitoring system carried out at CINECA and used for the Tier-1 systems participating in the PRACE projects. Starting from the issues arising from the complexity of HPC systems and the related monitoring activities, the targeted solutions and their implementation are explained. The most important aspects of the implementation and the specific issues related to HPC are described, with particular attention to exascale clusters.

Download PDF


Disclaimer

These whitepapers have been prepared by the PRACE Implementation Phase Projects and in accordance with the Consortium Agreements and Grant Agreements n° RI-261557, n° RI-283493, or n° RI-312763.

They solely reflect the opinion of the parties to such agreements on a collective basis in the context of the PRACE Implementation Phase Projects and to the extent foreseen in such agreements. Please note that even though all participants to the PRACE IP Projects are members of PRACE AISBL, these whitepapers have not been approved by the Council of PRACE AISBL and therefore do not emanate from it nor should be considered to reflect PRACE AISBL’s individual opinion.

Copyright notices

© 2014 PRACE Consortium Partners. All rights reserved. This document is a project document of a PRACE Implementation Phase project. All contents are reserved by default and may not be disclosed to third parties without the written consent of the PRACE partners, except as mandated by the European Commission contracts RI-261557, RI-283493, or RI-312763 for reviewing and dissemination purposes.

All trademarks and other rights on third party products mentioned in the document are acknowledged as own by the respective holders.
