PRACE Preparatory Access – 10th cut-off evaluation in September 2012

Type A – Code scalability testing

Project name: Large Scale Discrete Element Simulations for Geotechnical Engineering Applications

Project leader: Catherine O’Sullivan, Imperial College London (UK)
Collaborators: Kevin Hanley, Imperial College London (UK)
Research field: Engineering and Energy
Resource awarded: 50.000 core-hours on CURIE Thin Nodes partition, GENCI@CEA, France

Abstract: The Discrete Element Method (DEM) is now established as a key tool in fundamental soil mechanics research. It is a computational method that explicitly models the particles in a granular material and their interactions. To keep the simulations computationally tractable, the particles are assumed to be rigid and their interactions are described by simple rheological models. The method is based on a time-stepping algorithm, and during a simulation contacting particles may separate and new contacts may form. Since these time-stepping algorithms are often only conditionally stable, a small time increment must be used. To date, the computational cost of DEM simulations has restricted the number of particles, and consequently the range of particle sizes, that can be considered. By making use of the HPC systems offered by PRACE, DEM simulations may be run with realistic particle size distributions and representative numbers of particles. The applicants are already users of HECToR, the principal high-performance computing resource in the UK funded by the UK Research Councils, and wish to evaluate the performance of the granular LAMMPS code on the CURIE x86 system. LAMMPS is an established and widely used open-source molecular dynamics code which the applicants have extended with functionality for granular materials not currently included in the code. LAMMPS was specifically developed for HPC environments: the acronym stands for ‘Large-scale Atomic/Molecular Massively Parallel Simulator’. It is therefore an ideal base for a high-performance DEM code. The current version of LAMMPS is written in C++, uses MPI for parallelisation and velocity-Verlet as the default time-integration algorithm. The purpose of the preparatory access is to evaluate the scaling performance of LAMMPS on the CURIE x86 system, with a view to submitting a full access proposal if the scaling characteristics are positive.
The applicants are engaged in two ongoing research studies. One considers the undrained response of Dunkirk and Toyoura sands to cyclic loading within a critical-state soil mechanics framework. This is important because many structures of interest to geotechnical engineers, including offshore wind turbines and offshore oil and gas installations, are subjected to cyclic loading due to forces imposed by the action of waves and winds. The other ongoing research study focuses on the problem of internal erosion in embankment dams, i.e., the removal of fine soil particles from within an embankment dam by seepage. Internal erosion is recognised by dam engineers as a primary hazard to embankment dams, causing around half of all failures, and the aim of this study is to assess whether a micromechanical justification exists for commonly used empirical guidelines in dam design and monitoring.
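The velocity-Verlet integrator mentioned above can be sketched in a few lines. The following is a minimal illustration only, not the LAMMPS implementation: a single unit-mass particle subject to a hypothetical linear-spring force, with the usual caveat that the time step must stay below the critical value set by the stiffest contact.

```python
import numpy as np

def velocity_verlet(x, v, accel, dt, n_steps):
    """Advance position x and velocity v with the velocity-Verlet scheme.

    accel(x) returns the acceleration; dt must stay well below the
    critical time step of the stiffest contact for the scheme to be stable.
    """
    a = accel(x)
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt**2   # position update
        a_new = accel(x)                   # forces at the new positions
        v = v + 0.5 * (a + a_new) * dt     # velocity update (trapezoidal)
        a = a_new
    return x, v

# Toy check: a unit-mass particle on a linear spring (k = 1) performs
# simple harmonic motion with period 2*pi, so after t ~ 6.28 the
# particle should be back near its starting state.
x, v = velocity_verlet(1.0, 0.0, lambda x: -x, dt=0.01, n_steps=628)
```

For a real DEM contact model the acceleration callback would sum pairwise contact forces over neighbouring particles, but the integration loop itself is unchanged.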

Project name: Cryo-GPU-Crunch

Project leader: Xavier Cavin, INRIA (FR)
Collaborators: Olivier Demengeon, INRIA (FR)
Research field: Medicine and Life Sciences
Resource awarded: 50.000 GPU hours on CURIE HYBRID partition, GENCI@CEA, France

Abstract: The goal of our Cryo-GPU-Crunch project is to develop high-performance, scalable parallel algorithms that fully utilize the available CPU, GPU and network resources to solve an important problem in cryo-electron microscopy (cryo-EM): the accurate and automatic picking of particles from the raw EM data. Some algorithms and implementations have been proposed in the literature to solve this problem efficiently, but none of them has been designed to benefit from modern hybrid supercomputing resources. Within the scope of this PRACE project, we want to investigate the scalability of our parallel algorithms on a large-scale hybrid supercomputer, such as the CURIE hybrid nodes.

Project name: Hectometric scale simulation of subsynoptically-driven severe convective Mediterranean events

Project leader: Pierre Benard, METEO-FRANCE (FR)
Collaborators: Ludovic Auger, Fabrice Voitus, METEO-FRANCE (FR)
Research field: Earth Sciences and Environment
Resource awarded: 50.000 core-hours on CURIE Thin Nodes partition, GENCI@CEA, France

Abstract: The aim of the project is to achieve high-resolution, large-domain numerical simulations of the atmosphere in order to better understand the role of small-scale phenomena in sub-synoptically driven extreme convective events, and thereby to better identify the aspects which must be refined for future operational forecasts. Severe Mediterranean convective events are a crucial type of event in terms of public safety, mostly through floods. They are mainly driven by regional anomalies above the Mediterranean Sea, while the characteristics of the phenomenon itself may depend on very small-scale details. The simulations considered in this study are not accessible on current machines, due to the size of the relevant domain and the target resolution.

Project name: Interacting electrons in small constrictions and chains

Project leader: Avi Cohen, Bar-Ilan University (IL)
Collaborators: Richard Berkovits, Bar-Ilan University (IL)
Research field: Fundamental Physics
Resource awarded: 50.000 core-hours on Hermit, GAUSS@HLRS, Germany

Abstract: We investigate interaction effects of many electrons in quantum dots, particularly the nature of the interaction-induced transition from a liquid to a Wigner crystal state in the intermediate range of interaction strength, which is the more challenging regime to investigate. We have developed Hartree-Fock and Configuration-Interaction FORTRAN codes that can run on Cray machines. A DFT code is about to be completed. A code for the classical limit of crystalline configurations applies classical Monte Carlo calculations that run on a PC. In the future we also wish to run Density Matrix Renormalisation Group (DMRG) calculations for electrons in one- and two-dimensional systems. The results should be compared with other computational methods.

Project name: Analysis of Computational Performance of FLEDS Software Package

Project leader: Annarita Viggiano, University of Basilicata (IT)
Collaborators: Vinicio Magi, University of Basilicata (IT)
Research field: Engineering and Energy
Resource awarded: 100.000 core-hours on FERMI, CINECA, Italy and 50.000 core-hours on CURIE Thin Nodes partition, GENCI@CEA, France

Abstract: The aim of this project is to analyse the computational performance, on PRACE High Performance Computing systems, of software packages developed at the University of Basilicata and parallelised using the MPI libraries. The software packages solve the conservation equations of compressible, multi-component, reacting premixed and non-premixed mixtures of thermally perfect gases. They have a wide range of applications, from the study of fundamental phenomena using a Direct Numerical Simulation approach to the study of the performance and emissions of propulsion systems with advanced combustion strategies and both conventional and renewable fuels.

Project name: Scalability Testing of Astrophysics Software Gadget on Massively Parallel System

Project leader: Plamenka Borovska, Technical University of Sofia (BG)
Collaborators: Veska Gancheva, Technical University of Sofia (BG)
Research field: Astrophysics
Resource awarded: 100.000 core-hours on JUQUEEN, GAUSS@FZJ, Germany

Abstract: The objective of this project is to carry out scalability tests of the GADGET software on the JUQUEEN system in order to assess the efficiency of the algorithm on parallel supercomputers. The GADGET software is intended for astrophysical simulations: colliding and merging galaxies, the formation of large-scale structure in space, the dynamics of the gaseous intergalactic medium, star formation and its regulation, etc. The test case will be simulations of galaxy systems and the cosmological expansion of space.

Project name: A scalability study for an in-homogeneous subgrid scale modelling approach for separated two-phase flows

Project leader: Ozgur Ulas Kirlangic, Istanbul Technical University (TR)
Research field: Engineering and Energy
Resource awarded: 50.000 core-hours on CURIE Thin Nodes partition and 50.000 core-hours on CURIE Fat Nodes partition, GENCI@CEA, France

Abstract: The physics of naturally separated two-phase flows dictates huge differences in the material properties, as well as in the turbulence characteristics, of the separated fluids which reside in neighbouring regions. Under certain severe conditions, the close vicinity of the interface, a thin zone of steep gradients (or jumps) in such properties, may also exhibit spontaneous dispersed two-phase flow characteristics consisting of finite particles, drops or bubbles, and so appears as an additional characteristic region in the problem. The problem can therefore be pictured as a heterogeneous phenomenon from an overall perspective. In a large eddy simulation (LES) study of this class of problems, the use of an in-homogeneous sub-grid scale (SGS) modelling approach, in which the eddy viscosity coefficient is computed dynamically and locally, may provide significant improvements in the results, and there is a need for such an investigation.

Project name: Parallel Simulation and Scalability Testing of Oceanology Software NEMO

Project leader: Plamenka Borovska, Technical University of Sofia (BG)
Collaborators: Desislava Ivanova, Technical University of Sofia (BG)
Research field: Earth Sciences and Environment
Resource awarded: 100.000 core-hours on JUQUEEN, GAUSS@FZJ, Germany

Abstract: NEMO is a portable modelling framework for oceanology. The code is written in Fortran 90/95 and uses the Message Passing Interface (MPI) library for communication between processing elements. NEMO configurations include the ocean engine, the Louvain-la-Neuve sea-ice model and biogeochemical models. The source code is freely downloadable for six configurations, with associated data sets, from the ORCA2, GYRE and POMME families. The latest version of NEMO is 3.4. The framework covers the blue ocean, the white ocean, the green ocean, adaptive mesh refinement software and the NEMO assimilation component. The aim of the project is to adapt and execute the NEMO code on the JUQUEEN architecture for the sea-ice models, using the ORCA2 and GYRE data sets. The simulation results will be evaluated and analysed with respect to scalability.

Project name: Response of the Atlantic Ocean Circulation to Greenland Ice Sheet Melting

Project leader: Henk Dijkstra, Utrecht University (NL)
Collaborators: Michael Kliphuis, Matthijs Den Toom, Utrecht University (NL), Wilbert Weijer, Los Alamos National Laboratory (USA), Frank Seinstra, Netherlands eScience Center (NL), Henri Bal, Free University (NL), Walter Lioen, SARA (NL), Nicole Grégoire, SURFnet (NL)
Research field: Earth Sciences and Environment
Resource awarded: 50.000 core-hours on Hermit, GAUSS@HLRS, Germany

Abstract: The Atlantic Meridional Overturning Circulation (MOC) is sensitive to freshwater anomalies such as those arising from the melting of the Greenland Ice Sheet (GrIS). Changes in the MOC affect the meridional heat transport in the ocean and hence the global climate system. The response of the MOC to current and future GrIS meltwater input is one of the important uncertainties in projections of future climate change. In this project, we want to use a strongly eddying ocean model to determine the decadal time scale response of the MOC to GrIS freshwater anomalies. Preliminary work on this problem was recently published (Weijer et al., Geophysical Research Letters, 39, L09606, doi: 10.1029/2012GL051611, 2012; a .pdf can be downloaded from dij…), but many more simulations are needed to address how ocean eddies affect the behaviour of the MOC and how the response differs from that in (non-eddying) climate models. A major result from this project would be to establish (and understand) that the real (strongly eddying) MOC is much more sensitive to GrIS freshwater anomalies than current climate models indicate.

Type B – Code development and optimization by the applicant (without PRACE support)

Project name: Parallel Blood Flow Simulation

Project leader: Nenad Filipovic, University of Kragujevac (RS)
Collaborators: Milos Ivanovic, Tijana Djukic, University of Kragujevac (RS)
Research field: Medicine and Life Sciences
Resource awarded: 100.000 GPU hours on CURIE HYBRID partition, GENCI@CEA, France

Abstract: Atherosclerosis is a progressive disease characterized by the accumulation of lipids and fibrous elements in arteries. Over the past decade, scientists have come to appreciate the prominent role of inflammation in atherosclerosis. Understanding and predicting the evolution of atherosclerotic plaques, into either vulnerable or stable plaques, are major tasks for the medical community. Large-scale computer modeling is essential for better understanding and prediction of plaque formation and development. Our in-house developed software provides continuum and discrete algorithms, validated by comparing results with our own experimental and clinical data and with data from the literature. Lesion growth is a complex phenomenon with events occurring at different time and space scales (genetic, cellular and macroscopic). Hence, several models are currently being developed, including discrete particle dynamics, cellular automata and agent-based models. The main application within this system uses a computational fluid dynamics approach to model the blood flow. It aims to provide a complete solution throughout the process of performing a blood flow simulation: (1) 3D artery model preparation from DICOM images, (2) dynamic Navier-Stokes analysis and, finally, (3) visualization, graphing and post-processing of the resulting quantity fields. The idea implemented in this software originates from PAK-F, a software for finite element (FE) computer simulation of viscous fluid flow with mass and heat transport. Discrete methods such as discrete particle dynamics (DPD) and the lattice Boltzmann method (LBM) were also used to model the blood flow. This software aims to provide a patient-specific computational model of the cardiovascular system, used to improve the quality of prediction of atherosclerosis progression and its propagation into life-threatening events that need to be treated accordingly.
The application provides a three-level patient model describing the 3D arterial tree anatomy, the patient-specific blood flow, and the biological processes that lead to the creation and progression of atherosclerotic plaques. Our parallel blood flow simulation software applies the developed patient-specific model to two main applications: clinical decision support and training. It introduces two decision support tools to assist clinical cardiologists in providing personalized treatment selection and real-time, on-the-fly advice during invasive interventions, such as stent positioning. The aim is to minimize future therapy costs by providing a higher level of personalized treatment support than previously possible. The same patient-specific model will also be used to develop a real-case training simulator, which will support realistic hands-on skill development for clinical cardiologists. Based on the computed patient model and the recognition and characterization of the formed plaque(s), the software will estimate the severity of the current situation and the potential clinical outcomes (e.g. plaque rupture, artery stenosis). The existing medical knowledge and expertise will be recorded to allow the most appropriate treatment to be selected according to the patient's situation. The aim is to provide high-quality support to cardiologists and thus assist in high-quality patient treatment. The organizations involved in the development of the PBFS application, part of the ARTreat FP7 project, are among the leading European organizations for products and services in the areas of medical IT and cardiovascular and arterial surgery.

Project name: PARFLUX: Parallel computing using a finite volumes code for simulation of impacts on a rigid wall

Project leader: Joris Costes, Eurobios (FR)
Collaborators: Jean-Michel Ghidaglia, Jean-Philippe Braeunig, Mathieu Peybernes, ENS de Cachan (FR)
Research field: Mathematics and Computer Science
Resource awarded: 200.000 core-hours on CURIE THIN Nodes partition, GENCI@CEA, France.

Abstract: PARFLUX is a high performance computing project based on a finite volume code with interface capture called FLUX-IC. The method was developed by Jean-Philippe Braeunig et al. in his 2007 PhD thesis and belongs to the family of CmFD (Computational multi-Fluid Dynamics) codes. FLUX-IC has proved its efficiency on various physical test cases, more particularly for impact calculations; results of this type have been presented at several ISOPE conferences and are the subject of scientific publications. The goal of the project is to compute the free fall of a liquid block in an enclosed rectangular space filled with gas and to describe the impact on a rigid wall as precisely as possible. Thanks to a supercomputer, we expect to reach millimetre accuracy at the wall using over a thousand cores.

Project name: Advancing scalability of the code for searching of gravitational wave signals from rotating neutron stars in LIGO and VIRGO data

Project leader: Gevorg Poghosyan, Karlsruhe Institute of Technology (GE)
Collaborators: Andrzej Krolak, Michal Bejger, Polish Academy of Sciences (PL)
Research field: Astrophysics
Resource awarded: 50.000 core-hours on Hermit, GCS@HLRS, Germany

Abstract: The search for Gravitational Waves (GW) in the data collected from detectors is computationally very intensive; depending on the scope, the required data analysis varies over several orders of magnitude, up to practically intractable ranges. Depending on the type of search, one has to match the data against several million theoretical waveform templates, look for temporal power excesses in the detector output, and analyse the data of several detectors to find correlations. We estimate that almost 15 million CPU hours will be needed to analyse, at least partially, the data sets collected by the GW detector Virgo to find the most relevant sources. Moreover, to perform a search for all possible sources and exploit the full physical potential of the data, roughly 100 times more computation time will be needed. Hence, only if we reach the target of a massively parallel version of the data analysis software, applicable to runs on systems with more than 10 000 CPUs/parallel tasks, can one speak of acceptable times to perform the full analysis. In the framework of the advanced support service of SimLab E&A Particle for scientific groups facing extreme software challenges when using modern HPC systems, we have parallelized the simulation code PolGrawAllSky, developed by the Polgraw-Virgo group for the search for gravitational radiation from spinning neutron stars. We have already tested the scaling performance of the parallel version of the code on several cluster computers with up to 2000 CPU cores. The results are promising, but have to be extended to more than 10 000 cores in order to keep up with the dimensions of the necessary parameter space (…). Furthermore, we expect a loss of efficiency when using more than 5 000 cores/parallel tasks with the present code version, e.g. non-optimal use of memory or a high frequency of data input and output leading to a loss of performance. The goal of the project is to identify the bottlenecks and possible non-optimal usage of resources by testing the code on massively parallel systems, and to develop solutions for the optimal usage of the data analysis code on systems with more than 10 000 cores.
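The template matching described above can be illustrated with a toy example. The sketch below is not the PolGrawAllSky algorithm; it simply correlates noisy data against a hypothetical bank of sinusoidal templates and recovers the best-matching frequency, which is the elementary operation such searches repeat millions of times.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy matched filter: a sinusoidal signal buried in noise.
t = np.linspace(0.0, 1.0, 4096)
true_f = 80.0
data = np.sin(2 * np.pi * true_f * t) + 0.5 * rng.standard_normal(t.size)

# Correlate against every template in a (coarse, illustrative) bank of
# trial frequencies and keep the one with the largest overlap.
template_bank = np.arange(20.0, 200.0, 1.0)
scores = [abs(np.dot(data, np.sin(2 * np.pi * f * t))) for f in template_bank]
best_f = template_bank[int(np.argmax(scores))]
```

The computational cost scales with the product of data length and template-bank size, which is why the full multi-dimensional parameter space (frequency, spin-down, sky position) demands the core counts discussed above.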

Project name: Scalable Preconditioners for Saddle-Point Problems Approximated by the Finite Element Method

Project leader: Alfio Quarteroni, EPFL (SW)
Collaborators: Luca Formaggia, Antonio Cervone, Nur Aiman Fadel, Politecnico di Milano (IT), Simone Deparis, Radu Popescu, Gwenol Grandperrin, EPFL (SW)
Research field: Mathematics and Computer Science
Resource awarded: 50.000 core-hours on Hermit, GCS@HLRS and 250.000 core-hours on JUQUEEN, GCS@FZJ, Germany

Abstract: This research project aims at testing the weak and strong scalability of state-of-the-art parallel algorithms to solve saddle-point problems (Darcy, Stokes and Navier-Stokes equations) and coupled multi-physics problems, like the fluid-structure interaction problems arising in cardiovascular simulations. The simulations are based on the parallel finite element library LifeV, which provides implementations of mathematical and numerical methods including multi-scale and multi-physics models. It serves as both a research and a production library and is a joint collaboration between four institutions: Ecole Polytechnique Federale de Lausanne (CMCS) in Switzerland, Politecnico di Milano (MOX) in Italy, INRIA (REO) in France, and Emory University (ECM2) in the USA. Because of the increasing complexity of the models implemented in LifeV and the size of the problems, we need to deploy the numerical simulations on high performance clusters. For example, recent advances in the preconditioning techniques for the discretized Navier-Stokes equations permitted a step forward in the size of the problems that we are able to handle, namely up to 35 million unknowns on 8192 cores. This is already huge progress, since without large-scale parallel simulations it would not be possible to face realistic problems of practical interest for the target application. However, there is still a need to lower the time to solution and to tackle larger problem sizes. The two main scientific cases we aim to address are finite element simulations of blood flow in arteries and of geological basins. Although the two applications are very different, they share common computational challenges: complex geometries requiring large, unstructured computational meshes, the solution of saddle-point differential problems, and the need for effective parallel preconditioners.
The researchers working on these two subjects do indeed share the same software library, LifeV, which we plan to use in this project. Because of the very high computational cost, the numerical simulations need to be deployed using High Performance Computing. For example, modeling Fluid-Structure Interaction (FSI) in the vascular system is mandatory to reliably compute flow indicators when the vessels undergo relatively large deformations. The resolution of the fully 3D FSI problem is very expensive; in order to lower the time to solution and to address complex problems, a parallel framework is necessary. Sedimentary basins extend for several kilometers and therefore require extremely large meshes to be discretized properly. In this context HPC becomes a necessity to be able to simulate realistic cases. Furthermore, different materials with very different physical properties coexist inside a domain of investigation. The jumps in the coefficients lead to very high condition numbers of the discretized matrices; hence the requirement for suitable preconditioners that can also scale with respect to the coefficient jumps.

Project name: Scalable simulations for water management with Delft3D

Project leader: John Donners, SARA (NL)
Collaborators: Adri Mourits, Menno Genseberger, Bert Jagers, Edwin Spee, Deltares (NL), Marcin Zielinski, SARA (NL)
Research field: Earth Sciences and Environment
Resource awarded: 250.000 core-hours on FERMI, CINECA (IT), 200.000 core-hours on CURIE Thin Nodes partition and 50.000 core-hours on Hermit, GCS@HLRS

Abstract: Delft3D is a world-leading, open-source 3D modelling suite used to investigate hydrodynamics, sediment transport, morphology and water quality in fluvial, estuarine and coastal environments. The software is used, and has proven its capabilities, in many places around the world, such as the Netherlands, the USA, Hong Kong, Singapore, Australia and Venice. The goal of this project is to improve the portability and scalability of Delft3D for a range of input datasets. This project is part of PRACE-2IP Work Package 9 (Industrial Support). Results will be publicly available through deliverables and a white paper on the PRACE website.

Project name: Parallel Hypergraph Partitioner

Project leader: Gündüz Vehbi Demirci, Bilkent University (TR)
Collaborators: Aykanat Cevdet, Ata Turk, Bilkent University (TR)
Research field: Mathematics and Computer Science
Resource awarded: 200.000 core-hours on CURIE Thin Nodes partition and 50.000 core-hours on Hermit, GCS@HLRS

Abstract: In parallel computing, the balanced partitioning/distribution of tasks among processors is very important, since balanced partitioning minimizes idle time. One way of distributing the work among processors is to partition the data. There are efficient and effective modeling schemes based on graph and hypergraph partitioning, and there are also effective multi-level graph and hypergraph partitioning tools to partition these models once they are generated. Different partitioning models used in the load-balancing step cause different communication patterns. Some models reduce communication volume better (1D-rowwise, 1D-columnwise [2], 2D-FineGrain [1]), whereas others reduce the number of messages sent over the network better (2D-Jagged, 2D-Checkerboard [1]). There are a few parallel hypergraph partitioning tools that can partition the 1D-rowwise, 1D-columnwise and 2D-FineGrain hypergraph models. However, it is known that reducing the number of messages is as important as reducing communication volume, especially in peta-flop computing settings [5]. Unfortunately there are no parallel tools for the automatic partitioning of the 2D-Checkerboard and 2D-Jagged partitioning models. In this project we will develop a parallel 2D-Checkerboard hypergraph partitioner which will make use of hierarchical partitioning based on the PaToH hypergraph partitioning tool. Hierarchical partitioning can be useful especially in heterogeneous systems [4]: a costly partitioning can first be made to minimize the communication across the slow network interfaces, and then a less costly partitioning can be carried out within the computing nodes, where communication is much faster. We are also aware of the bottlenecks of the load-balancing scheme that we have proposed; the biggest problem is that the available version of PaToH [3] can only run serially. We propose to develop a hierarchical parallel PaToH for 1D partitioning and a scheme to apply that parallel PaToH version to produce a parallel 2D-Checkerboard hypergraph partitioner.
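The message-count advantage that motivates 2D-Checkerboard partitioning can be made concrete with a back-of-the-envelope sketch. The constants below follow from the standard expand/fold communication scheme of 2D sparse matrix-vector multiplication and are illustrative only:

```python
def max_messages_1d(p):
    # 1D (rowwise) partitioning: a processor may need x-vector entries
    # from every other processor, so up to p - 1 messages per processor.
    return p - 1

def max_messages_2d(q, r):
    # 2D checkerboard on a q x r processor grid: the expand and fold
    # phases confine communication to grid columns and rows, bounding
    # the count at (q - 1) + (r - 1) messages per processor.
    return (q - 1) + (r - 1)

p = 1024
print(max_messages_1d(p))        # up to 1023 messages per processor
print(max_messages_2d(32, 32))   # at most 62 on a 32 x 32 grid
```

The communication *volume* of the 2D scheme may be higher than that of a good 1D partition, which is exactly the trade-off between the model families described above.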

Project name: Automated Network Topology Identification and Topology Aware MPI Collectives

Project leader: Chandan Basu, Linkoping University (SE)
Collaborators: Soon-Heum Ko, Johan Raber, Linkoping University (SE)
Research field: Mathematics and Computer Science
Resource awarded: 200.000 core-hours on CURIE Thin Nodes partition GENCI@CEA, France

Abstract: This project is part of the PRACE-2IP WP12.1 R&D work on runtime environments. We are trying to develop an efficient runtime system for MPI parallel jobs, focusing on minimizing the communication cost through the effective reassignment of MPI ranks. We will collect different network statistics, aiming to determine the network topology through statistical clustering of the data. We will use this topology information to efficiently map ranks onto suitable resources depending on the communication pattern. Our previous work in PRACE-1IP WP7.5 on a synthetic benchmark showed around a 40% improvement in MPI_Alltoallv performance. This work is relevant to very wide parallel jobs, and testing requires access to large systems.

Project name: Reducing communication through computational redundancy in parallel iterative solvers

Project leader: Cevdet Aykanat, Bilkent University (TR)
Collaborators: Sukru Torun, Bilkent University (TR)
Research field: Mathematics and Computer Science
Resource awarded: 100.000 GPU hours on CURIE Hybrid partition and 200.000 core-hours on CURIE Thin Nodes partition GENCI@CEA, France

Abstract: In this project, a data replication model for parallel sparse matrix-vector multiplication (pSpMxV) of the form y = Ax, which is a kernel operation in iterative solvers, is proposed. In this model, a processor may compute a y-vector entry redundantly, which becomes an x-vector entry in the following iteration, instead of receiving that x-vector entry from another processor. Thus, redundant computation of that y-vector entry may lead to a reduction in communication. For this model, we devise a directed-graph-based model that correctly captures the computation and communication pattern in iterative solvers. Moreover, we formulate the reduction of communication through the redundant computation of y-vector entries as a combinatorial problem on this directed graph model. Initial experimental results indicate that this communication-reducing strategy based on redundant computation is promising.
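A minimal numerical illustration of the replication idea (a hypothetical two-processor toy, not the proposed directed-graph model):

```python
import numpy as np

# Toy 4x4 system; rows (and the matching x/y entries) are split
# between P0 = {0, 1} and P1 = {2, 3}.
A = np.array([[2.0, 1.0, 0.0, 0.0],
              [0.0, 3.0, 0.0, 1.0],
              [0.0, 1.0, 4.0, 0.0],
              [0.0, 0.0, 1.0, 2.0]])
x = np.array([1.0, 2.0, 3.0, 4.0])

# P1's row 2 touches x[1], which P0 owns, so a plain pSpMxV sends
# x[1] from P0 to P1 in every iteration (one word per iteration).
#
# Replication instead: P1 also stores row 1 of A. That row touches
# only x[1] and x[3]; x[3] is already local to P1, and x[1] is the
# previous iteration's y[1], which P1 now computes redundantly
# itself (after one initial transfer to bootstrap), so the recurring
# message disappears at the cost of one extra row's worth of flops.
y_local = A[2:4] @ x        # P1's own entries y[2], y[3]
y1_redundant = A[1] @ x     # redundant copy of y[1] computed on P1
```

Whether such a trade is worthwhile for a given matrix and partition is exactly the combinatorial question the abstract poses.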

Type C – Code development with support from experts from PRACE

Project name: Strong Scaling of LibGeoDecomp on Blue Gene/Q

Project leader: Andreas Schaefer, Friedrich-Alexander-Universität Erlangen-Nürnberg (GE)
Research field: Mathematics and Computer Science
Resource awarded: 250.000 core-hours on JUQUEEN, GCS@FZJ, Germany

Abstract: LibGeoDecomp (Library for Geometric Decomposition codes) is a generic computer simulation library. It is aimed at stencil codes, but also supports N-body simulations. In this project we wish to develop an optimized parallelization module for the IBM Blue Gene/Q to improve the library's strong scaling behaviour.
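For readers unfamiliar with the term, a stencil code updates every grid cell from a fixed neighbourhood pattern. The serial sketch below (a hypothetical Jacobi-style heat diffusion step, not LibGeoDecomp code) shows the kind of kernel the library parallelizes by decomposing the grid geometrically and exchanging ghost zones between nodes:

```python
import numpy as np

def jacobi_step(grid):
    """One Jacobi sweep: each interior cell becomes the average of
    its four von Neumann neighbours; boundary cells stay fixed."""
    new = grid.copy()
    new[1:-1, 1:-1] = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1] +
                              grid[1:-1, :-2] + grid[1:-1, 2:])
    return new

grid = np.zeros((8, 8))
grid[0, :] = 1.0            # fixed hot boundary along one edge
for _ in range(200):
    grid = jacobi_step(grid)
# Heat has diffused from the hot edge: interior values decrease
# monotonically with distance from row 0.
```

Strong scaling for such kernels is hard precisely because the per-node ghost-zone communication shrinks more slowly than the per-node compute as cores are added.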

Project name: Combinatorial Models for Topology Aware Mapping

Project leader: Cevdet Aykanat, Bilkent University (TR)
Collaborators: Ata Turk, Oguz Selvitopi, Bilkent University (TR)
Research field: Mathematics and Computer Science
Resource awarded: 200.000 core-hours on CURIE TN/FN partition GENCI@CEA, France and 50.000 core-hours on Hermit, GCS@HLRS

Abstract: Topology-aware mapping has started to gain momentum again with the development of supercomputers equipped with thousands of cores. In such architectures, where many users submit parallel applications that heavily rely on scalability, the performance implications are of primary concern. These parallel programs find application in a wide range of research disciplines. The mapping of the tasks of parallel applications onto the processors of the parallel system is crucial for the performance and scalability of these applications. Parallel applications can benefit from performance improvements obtained by carefully designed mapping algorithms that take various parameters into account, including the processor topology and the network architecture of the parallel system, certain characteristics of the applications such as communication patterns and the computational loads of tasks, live information about the parallel system, etc. The aim of this project is to investigate combinatorial models and propose efficient algorithms for topology-aware mapping. The current approaches to topology-aware mapping are generally centered around specific types of applications and designed for specific processor and network architectures; they are not general enough to be used as generic mapping algorithms. Our main goal in this project is to fill this gap by proposing fast methods and algorithms, based on combinatorial models, that provide performance improvements independent of the characteristics of the underlying parallel system architecture and of certain characteristics of the parallel applications. Initially, these approaches will be tested on a widely used parallel application, Sparse Matrix-Vector Multiplication (SpMxV), which is used as a kernel operation in various scientific and high performance computing applications. There are various models for SpMxV, which makes it attractive for assessing the quality of the proposed approaches; these models represent different test cases, differing in communication patterns and computational loads. Later, the scalability and performance of our approaches will be further tested on different parallel applications from different domains. With this project, we plan to propose a generic tool for mapping the tasks of a parallel application onto a processor topology, utilizing various features of the application and the underlying parallel system layout. In this way, we believe that the performance of parallel applications can be greatly improved.

Project name: CASINO for large solid catalyst systems: configuration numbers and population control

Project leader: Philip Hoggan, Clermont University (FR)
Collaborators: Neil Drummond, Lancaster University (UK)
Research field: Chemistry and Materials
Resource awarded: 200.000 core-hours on CURIE FN partition GENCI@CEA, FRANCE and 250.000 core-hours on JUQUEEN, GCS@FZJ, GERMANY

Abstract: The test reaction of hydrogen dissociation on copper surfaces is well documented, and some geometries for the system are available (see C. Díaz, E. Pijper, R. A. Olsen, H. F. Busnengo, D. J. Auerbach and G. J. Kroes, Science, 2009, 326, 832-834). A very accurate benchmark using Quantum Monte Carlo simulations is required and could be obtained using the CASINO code. This involves handling large input files for the trial wave-function and the initial equilibrium configurations. The code is known to scale linearly to at least 100 000 cores, and shared memory has been shown to allow it to run efficiently within the RAM of the Blue Gene/P at IDRIS (France). The unknown aspect is how best to handle such large systems on over 8192 cores, and how to ensure that the configuration file does not grow so much that it saturates the RAM (population explosion) in the early phases of equilibrating a Diffusion Monte Carlo calculation. A fail-safe input flag (trip weight) avoids this, and the population can also be monitored step by step. These precautions are temporary, as the population is more stable after equilibration; the peak in population is largely caused by variance in the trial wave-function. The aim is to improve the trial wave-function quality and thereby reach this equilibrium state more efficiently.
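The population-control problem discussed above can be caricatured in a few lines. This is a generic DMC-style branching sketch, not CASINO code; the hard cap plays the role of the fail-safe flag mentioned in the abstract.

```python
import random

def branch(walkers, weights, max_pop):
    """One DMC-style branching step: each walker spawns int(w + u)
    copies (u uniform in [0, 1)), so the expected number of copies
    equals its weight w. A hard cap on the total population guards
    against a population explosion during equilibration."""
    new = []
    for walker, w in zip(walkers, weights):
        copies = int(w + random.random())
        new.extend([walker] * copies)
    if len(new) > max_pop:
        raise RuntimeError("population explosion: %d walkers" % len(new))
    return new
```

With weights near 1 the population is statistically stable, whereas weights well above 1 (as can happen with a poor trial wave-function early in equilibration) trip the cap instead of silently exhausting the RAM.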