PRACE Preparatory Access – 30th cut-off evaluation in September 2017

Find below the results of the 30th cut-off evaluation of 1 September 2017 for the PRACE Preparatory Access.

Projects from the following research areas:

Engineering; Biochemistry, Bioinformatics and Life sciences; Earth System Sciences; Universe Sciences; Fundamental Physics; Mathematics and Computer Sciences.

Type A: Code scalability testing (11)

On the Strong Scaling of Maritime CFD: Open-Usage Community based viscous-flow CFD code ReFRESCO (www.refresco.org) towards the Exascale Era (Part2)

Project Name: On the Strong Scaling of Maritime CFD: Open-Usage Community based viscous-flow CFD code ReFRESCO (www.refresco.org) towards the Exascale Era (Part2)
Project leader: Dr. Eng. Guilherme Nuno Vasconcelos Beleza Vaz
Research field: Engineering
Resource awarded: 100000 core hours on Hazel Hen
Description

ReFRESCO (www.refresco.org) is a community-based, open-usage viscous-flow CFD solver for the maritime world. It solves multiphase (unsteady) incompressible viscous flows using the Navier-Stokes equations, complemented with turbulence models, cavitation models and volume-fraction transport equations for the different phases. ReFRESCO (v2.4.0) is currently being developed, verified and, for its several applications, validated at MARIN (the Netherlands) in collaboration with IST (Portugal), USP-TPN (University of Sao Paulo, Brazil), TUDelft (Technical University of Delft, the Netherlands), UoS (University of Southampton, UK), UTwente (University of Twente, the Netherlands), Chalmers (Chalmers University, Sweden), UMaine (University of Maine, USA), Texas A&M (Texas A&M University, USA), UPB (Universidad Pontificia Bolivariana, Colombia), WUR (Wageningen University and Research, the Netherlands), Iowa University (Iowa, USA) and UDE (University of Duisburg-Essen, Germany). Like most current CFD solvers, it relies on domain decomposition and the MPI parallel paradigm for performance, mostly tailored to “traditional” multi-core distributed-memory HPC machines. In recent years an effort has been made to assess the strong scalability of the whole code and of all its individual components. Bottlenecks have been identified, for instance single-process I/O, excessive global MPI communication and low scalability of the linear-system-of-equations solver. To address some of these issues, new parallelization techniques (MPI+OpenMP), new solvers (PETSc, Trilinos, in-house asynchronous solvers), new compilers (GNU, Intel, PGI/NVIDIA) and new algorithmic choices have been implemented. Preliminary tests on small HPC machines show encouraging improvements, and tests on CPU + co-processor (Intel Knights Landing, KNL) platforms are ongoing. The objective of the current project is therefore to test all these new paradigms, and their combinations, for several maritime problems on world-class Tier-0 PRACE HPC machines, with a view to extending ReFRESCO to forthcoming hardware and preparing it for the Exascale era.
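
To make the parallelization strategy concrete, below is a minimal hybrid MPI+OpenMP sketch in C++; it is an illustration only, not ReFRESCO code, and the update kernel and variable names are hypothetical.

// Minimal hybrid MPI+OpenMP sketch (hypothetical kernel, not ReFRESCO code):
// each MPI rank owns one subdomain, OpenMP threads share the local loop.
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n_local = 1000000;            // cells owned by this rank (assumed)
    std::vector<double> phi(n_local, 1.0);  // some transported quantity

    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < n_local; ++i) {
        phi[i] *= 0.99;                     // placeholder for the real cell update
        local_sum += phi[i];
    }

    // One global reduction instead of many small ones limits global MPI traffic.
    double global_sum = 0.0;
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) std::printf("ranks=%d global sum=%f\n", size, global_sum);

    MPI_Finalize();
    return 0;
}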


High-Order Methods for LES (HOMLES)

Project Name: High-Order Methods for LES (HOMLES)
Project leader: Prof. Francesco Bassi
Research field: Engineering
Resource awarded: 50000 core hours on Marconi – Broadwell, 100000 core hours on Marconi – KNL, 50000 core hours on MareNostrum, 100000 core hours on Hazel Hen
Description

The objective of the project “High-Order Methods for LES” (HOMLES) is to assess the parallel performance of high-order Computational Fluid Dynamics (CFD) solvers for the Implicit Large Eddy Simulation (ILES) of compressible flows. In ILES no explicit subgrid model is included; the numerical discretization itself acts as the subgrid-scale (SGS) model. The Discontinuous Galerkin (DG) and, more recently, the Flux Reconstruction (FR) methods have, thanks to their favourable numerical properties, proven to be well suited for the high-fidelity ILES of turbulent flows. The interest of the scientific community in such methods is demonstrated by the ongoing EU Horizon 2020 project TILDA (Towards Industrial LES/DNS in Aeronautics – Paving the Way for Future Accurate CFD, http://cordis.europa.eu/project/rcn/193362_en.html, grant agreement No. 635962). The HOMLES project involves three partners with a strong background in the development of modern, efficient and accurate solvers: the University of Bergamo, Italy (UNIBG), CENAERO, Belgium, and NUMECA, Belgium. The HOMLES partners are also part of the TILDA consortium, coordinated by NUMECA, and this project is intended as a further joint effort towards the very accurate simulation of turbulent flows on industrial configurations. With HOMLES the partners want to demonstrate the scaling of their solvers on as many platforms as possible; with this in mind, they request access to several of the architectures available within the call. The partners will evaluate solver scalability on computational problems close to the final intended application, i.e. turbomachinery, possibly comparing the parallel performance of explicit versus implicit time integrators.
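
As a toy illustration of the explicit-versus-implicit trade-off mentioned at the end (not the DG/FR solvers themselves), consider the linear decay equation du/dt = -a u integrated with both Euler schemes:

// Toy comparison of explicit vs. implicit Euler for du/dt = -a*u
// (illustration of the trade-off only; not the project's solvers).
#include <cstdio>

int main() {
    const double a = 50.0, dt = 0.05, T = 1.0;   // fast decay rate, large time step
    double u_exp = 1.0, u_imp = 1.0;
    for (double t = 0.0; t < T; t += dt) {
        u_exp = u_exp - dt * a * u_exp;          // explicit: cheap, but unstable when dt*a > 2
        u_imp = u_imp / (1.0 + dt * a);          // implicit: needs a (here trivial) solve, stable
    }
    std::printf("explicit: %g, implicit: %g\n", u_exp, u_imp);
    return 0;
}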


Chemo-mechanical transduction in myosin molecular motors: insight from optimal path calculations

Project Name: Chemo-mechanical transduction in myosin molecular motors: insight from optimal path calculations
Project leader: Dr Marco Cecchini
Research field: Biochemistry, Bioinformatics and Life sciences
Resource awarded: 50000 core hours on Marconi – Broadwell, 100000 core hours on Marconi – KNL, 50000 core hours on MareNostrum, 50000 core hours on Curie, 50000 core hours on SuperMUC
Description

The biomolecular motor myosin converts the chemical free energy of ATP hydrolysis into mechanical work. Understanding the structural and energetic underpinnings of this process is key to achieving control over energy storage and conversion and, eventually, to engineering nanomolecular machines. The recovery stroke is a key step of the functional cycle, in which ATP hydrolysis is coupled to re-priming of the motor into an armed configuration. By combining X-ray crystallography and Molecular Dynamics, we have recently characterized a novel structural intermediate along the recovery stroke of myosin VI. Surprisingly, this intermediate shows that the re-priming of the lever arm precedes the switching on of ATPase activity. This observation suggests a novel picture of chemo-mechanical transduction, which we would like to elucidate. To that end, we plan to compute the minimum free energy path (MFEP) for the recovery stroke using the cutting-edge string method. Importantly, since the recovery stroke involves only states with low affinity for actin, a full understanding of this step will open new pharmaceutical avenues to treat metastatic cancers in which myosins are involved.


Flexible Aerodynamic Solver Technology in an HPC environment

Project Name: Flexible Aerodynamic Solver Technology in an HPC environment
Project leader: Dr Nicolas Alferez
Research field: Engineering
Resource awarded: 100000 core hours on Marconi – KNL, 50000 core hours on MareNostrum
Description

The CFD department at ONERA has been developing computational fluid dynamics software for decades, both for its own research and for industrial partners. Numerical simulations of the noise generated under realistic flow conditions typically require marching complex algorithms with billions of unknowns in time in order to compute a few seconds of the relevant timescale. A major evolution of our industry-oriented finite-volume solver (elsA) is currently being developed in collaboration with Intel, through an Intel Parallel Computing Center, on a prototype structured solver: FastS. Special attention has been paid to achieving efficient shared-memory parallelism (OpenMP) and to using modern vectorization capabilities (appropriate memory layouts and vectorization-friendly algorithms). In particular, an exhaustive and systematic optimisation approach has been carried out in order to achieve satisfactory intra-node performance on both the multi-core (e.g. cache blocking, reduction of memory requirements) and the many-core (OpenMP strategy and corresponding algorithm) Intel Xeon and Xeon Phi architectures. This first step of the optimisation process has led to a substantial intra-node performance improvement: the application’s hotspot (>75% of overall application time) reaches more than 75% of the double-precision vectorised roofline limit on a dual-socket Haswell architecture. Recent developments on a KNL node have led to equally satisfactory intra-node performance on this new architecture. The aim of the project (Type A preparatory access) is to explore large-scale distributed-memory parallelism (using hybrid MPI + OpenMP). The expected outcomes are: 1) to prepare a large-scale physical case for a PRACE project that will be submitted to the next call; 2) to explore distributed-memory parallelism on KNL in support of the current IPCC project and, first of all, to choose the appropriate Intel architecture for the upcoming PRACE submission.
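
The sketch below illustrates the kind of cache-blocked, OpenMP-threaded, vectorization-friendly loop structure described above; it is a generic 2-D Jacobi-type stencil, not code from FastS or elsA, and all names are hypothetical.

// Illustrative cache-blocked, OpenMP-threaded 2-D stencil update
// (hypothetical kernel; not code from FastS or elsA).
#include <algorithm>
#include <vector>

void jacobi_step(std::vector<double>& out, const std::vector<double>& in,
                 int nx, int ny, int block) {
    #pragma omp parallel for collapse(2) schedule(static)
    for (int jb = 1; jb < ny - 1; jb += block)
        for (int ib = 1; ib < nx - 1; ib += block) {
            // Process one cache-sized block at a time so the working set stays
            // resident; the innermost loop is unit-stride and vectorization-friendly.
            const int jmax = std::min(jb + block, ny - 1);
            const int imax = std::min(ib + block, nx - 1);
            for (int j = jb; j < jmax; ++j)
                for (int i = ib; i < imax; ++i)
                    out[j*nx + i] = 0.25 * (in[j*nx + i - 1] + in[j*nx + i + 1] +
                                            in[(j-1)*nx + i] + in[(j+1)*nx + i]);
        }
}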


DISpOSED-DIrect numerical SimulatiOn of SquarE Duct flow

Project Name: DISpOSED-DIrect numerical SimulatiOn of SquarE Duct flow
Project leader: Dr Davide Modesti
Research field: Engineering
Resource awarded: 100000 core hours on Hazel Hen
Description

The aim of the present project is to perform weak and strong scalability tests of a compressible solver for the Navier-Stokes equations in Direct Numerical Simulation (DNS), also in preparation for the next PRACE call. A novel semi-implicit algorithm for time-step advancement has recently been developed that allows the same solver to be used at all Mach numbers. The solver has already been tested on different architectures (FERMI BG/Q, MARCONI Broadwell/KNL, JUQUEEN BG/Q) and has been used to perform large-scale DNS at high Reynolds numbers. The project will focus on DNS of square duct flow at different Mach numbers, so as to test the performance of the solver in both the subsonic and the supersonic regime.
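
For reference, a strong-scaling measurement of this kind reduces to a simple timing harness such as the sketch below; this is a generic illustration, and the DNS solver itself is represented by a placeholder routine.

// Generic strong-scaling timing harness (illustration only; the DNS solver is
// not shown). Run the same problem on increasing rank counts and compare wall
// times: efficiency = T(p_ref) * p_ref / (T(p) * p).
#include <mpi.h>
#include <cstdio>

// Placeholder standing in for one solver time step (hypothetical work).
void advance_one_step() {
    volatile double x = 0.0;
    for (int i = 0; i < 1000000; ++i) x += 1e-6;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int nsteps = 100;
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int n = 0; n < nsteps; ++n) advance_one_step();
    MPI_Barrier(MPI_COMM_WORLD);
    double elapsed = MPI_Wtime() - t0;

    if (rank == 0)
        std::printf("%d ranks: %.4f s per step\n", nprocs, elapsed / nsteps);
    MPI_Finalize();
    return 0;
}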


SPHPore

Project Name: SPHPore
Project leader: Mr Malte Schirwon
Research field: Earth System Sciences
Resource awarded: 100000 core hours on Piz Daint,
Description

The study of fluid flow through porous materials is important for carbon dioxide sequestration in underground reservoirs, groundwater contamination remediation, thermal energy generation, soil irrigation for food production, and much more. We study immiscible two-phase flow in subsurface porous media to examine the stability and evolution of fluid interfaces. Here, the role that microscopic heterogeneities at the length scale of individual pores play in the macroscopic behavior remains largely unknown and needs to be addressed for better predictions. As a partial remedy, recent advances in pore-scale imaging-based characterization methods have provided valuable insights into the interplay of viscous, capillary, gravitational and inertial forces, which together constitute the complexity of interface dynamics at the pore scale. Experimental measurements, however, do not tell the whole story, and simulations are required to complement experimental results. We therefore consider the combination of experimental imaging data of the pore-space geometry with direct numerical simulation (DNS) of the flow, which we believe to be the most beneficial tool for the quantitative characterization of flow in porous media. Our method of choice for DNS is a quasi-incompressible Smoothed Particle Hydrodynamics (SPH) model which incorporates the Navier-Stokes equations together with the Continuum Surface Stress method to account for the interfacial balance equations. Due to the geometric complexity of porous-media samples obtained from micro-CT scans, particle methods like SPH are favourable. For the same reason, however, the number of particles required for scientifically meaningful simulation results can easily reach billions, and HPC techniques are mandatory. In this preparatory project, we want to study the scalability of our extension of the general HOOMD-blue particle code for simulating two-phase flow in large three-dimensional domains obtained directly from voxelized micro-CT data of porous media.
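
As a minimal illustration of the SPH building blocks mentioned above (not the HOOMD-blue extension itself), the following sketch computes particle densities with the standard cubic-spline kernel; the names and the brute-force O(N^2) neighbour search are purely illustrative.

// Minimal SPH density summation with the standard cubic-spline kernel
// (illustration only; not the HOOMD-blue extension described above).
#include <cmath>
#include <vector>

struct Particle { double x, y, z, mass, rho; };

// 3-D cubic spline (M4) kernel with smoothing length h, support radius 2h.
double W(double r, double h) {
    const double pi = 3.141592653589793;
    const double q = r / h, sigma = 1.0 / (pi * h * h * h);
    if (q < 1.0) return sigma * (1.0 - 1.5*q*q + 0.75*q*q*q);
    if (q < 2.0) return sigma * 0.25 * std::pow(2.0 - q, 3);
    return 0.0;
}

// O(N^2) density summation; a real code would use neighbour lists and domain decomposition.
void compute_density(std::vector<Particle>& particles, double h) {
    for (auto& a : particles) {
        a.rho = 0.0;
        for (const auto& b : particles) {
            const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
            a.rho += b.mass * W(std::sqrt(dx*dx + dy*dy + dz*dz), h);
        }
    }
}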


Nyx scalability tests at PRACE

Project Name: Nyx scalability tests at PRACE
Project leader: Dr Jose Onorbe
Research field: Universe Sciences
Resource awarded: 100000 core hours on Marconi – KNL, 50000 core hours on MareNostrum, 50000 core hours on Curie, 100000 core hours on Hazel Hen
Description

In this project we request computing time on several major PRACE HPC facilities in order to test the scalability of the Nyx code. The results of these scalability tests will be used to submit a PRACE Tier-0 proposal at the next call. Nyx is a massively parallel cosmological N-body + hydrodynamics code. It has been developed particularly for simulations of the Lyman-alpha forest, its accuracy has already been established, and it has been instrumental in a number of recent IGM studies. We will employ a new method, already implemented in the code by the PI, to model inhomogeneous reionization, which allows us to reliably vary the timing of inhomogeneous reionization and its associated heat injection without increasing the cost of the simulation. The method thus makes it possible to explore the full physical parameter space and provide accurate fits to high-z Ly-α observations, at a cost dramatically lower than that of full radiative-transfer simulations. The scaling behaviour of Nyx has already been tested at other large HPC centers. Although Nyx has shown good weak and strong scaling for even larger numbers of MPI tasks and OpenMP threads than we will require for our future PRACE proposal, we think it is important to analyse its performance at PRACE with exactly the setup that we want to use. For this reason we request Type A preparatory access to test the different HPC architectures available at PRACE and decide which one will be best for our future PRACE proposal. Based on the architectures available at each PRACE centre, we apply for time to run scalability tests on MareNostrum, Marconi, Hazel Hen and Curie.


SwORD- direct numerical simulation of SupersOnic Rectangular Duct

Project Name: SwORD- direct numerical simulation of SupersOnic Rectangular Duct
Project leader: Dr Davide Modesti
Research field: Engineering
Resource awarded: 50000 core hours on Curie
Description

The aim of the present project is to perform weak and strong scalability tests of a compressible solver for the Navier-Stokes equations in Direct Numerical Simulation (DNS), also in preparation for the next PRACE call. A novel semi-implicit algorithm for time-step advancement has recently been developed that allows the same solver to be used at all Mach numbers. The solver has already been tested on different architectures (FERMI BG/Q, MARCONI Broadwell/KNL, JUQUEEN BG/Q) and has been used to perform large-scale DNS at high Reynolds numbers. The project will focus on DNS of supersonic rectangular duct flow at different aspect ratios and Mach numbers.


AquEDUCT-Aspect ratio Effect in DUCT flow

Project Name: AquEDUCT-Aspect ratio Effect in DUCT flow
Project leader: Dr Davide Modesti
Research field: Engineering
Resource awarded: 50000 core hours on SuperMUC
Description

The aim of the present project is to perform weak and strong scalability tests of a compressible solver for the Navier-Stokes equations in Direct Numerical Simulation (DNS), also in preparation for the next PRACE call. A novel semi-implicit algorithm for time-step advancement has recently been developed that allows the same solver to be used at all Mach numbers. The project will focus on DNS of subsonic rectangular duct flow at different aspect ratios.


Large-scale SUSY phenomenology with GAMBIT

Project Name: Large-scale SUSY phenomenology with GAMBIT
Project leader: Dr Pat Scott
Research field: Fundamental Physics
Resource awarded: 50000 core hours on Marconi – Broadwell, 100000 core hours on Marconi – KNL
Description

The Global and Modular Beyond-the-Standard-Model Inference Tool (GAMBIT) is a project aimed at producing the most rigorous analyses and comparisons possible of particle physics theories Beyond the Standard Model. It achieves this by combining the latest experimental results from dark matter searches, high-energy collider experiments such as the LHC, flavour physics, cosmology and neutrino physics. It then compares these results to the most accurate theoretical predictions of cross-sections, particle masses, scattering and decay rates, cosmic-ray fluxes and neutrino oscillations, using cutting-edge statistical methods, in order to produce the most up-to-date and complete picture possible of the search for dark matter and new physics. The GAMBIT codebase has been developed over a period of five years by a team of 30 experimentalists, theorists, statisticians and computer scientists working in very close collaboration. It draws on the expertise of members of nearly all of the leading particle and astroparticle experiments around the world, as well as all of the leading pieces of software in the field. To date, GAMBIT has led to three landmark physics papers [1-3]. Two of these [2,3] have focused on supersymmetry, arguably the most promising theoretical framework for explaining dark matter and predicting the existence of other new particles. Due to computational constraints, however, the most extensive of these analyses was able to explore just 7 of the 25 most interesting parameters of this framework. The current project will obtain performance data that will help determine just how many of these parameters could be rigorously explored using the power of the PRACE Tier-0 infrastructure. References: [1] GAMBIT Collaboration: P. Athron et al., Status of the scalar singlet dark matter model, EPJC in press [arXiv:1705.07931]. [2] GAMBIT Collaboration: P. Athron et al., Global fits of GUT-scale SUSY models with GAMBIT, submitted to EPJC [arXiv:1705.07935]. [3] GAMBIT Collaboration: P. Athron et al., A global fit of the MSSM with GAMBIT, submitted to EPJC [arXiv:1705.07917].


AMGCL scalability testing

Project Name: AMGCL scalability testing
Project leader: Prof Riccardo Rossi
Research field: Mathematics and Computer Sciences
Resource awarded: 50000 core hours on MareNostrum, 100000 core hours on Piz Daint
Description

The capability to solve large, sparse systems of equations is a cornerstone of modern numerical methods, sparse linear systems being ubiquitous in engineering and physics. Direct techniques, despite their attractiveness, simply become unviable beyond a certain size, typically of the order of a few million unknowns, due to their intrinsic memory requirements and sheer computational cost. The proposed project aims at performing a scalability study of a distributed parallel iterative algorithm for solving large sparse linear systems of equations arising from the discretization of partial differential equations. The algorithm is implemented as part of the open-source AMGCL library and uses a subdomain deflation approach combined with algebraic multigrid as a local preconditioner on each of the subdomains. The local preconditioning component may be parallelized with OpenMP, OpenCL, or CUDA through one of the AMGCL backends. The solver is currently working and is publicly available. Scalability was tested up to 10k cores within the NUMEXAS project. PRACE support is requested in order to complete the strong and weak scalability tests at larger scale, a step needed to publish the developments. Poisson-type problems on regular domains will be generated automatically using a stencil-based approach. Matrices corresponding to Navier-Stokes problems will be generated using the finite element code Kratos. The project represents a collaboration between Prof. Riccardo Rossi (project leader and main developer of the Kratos framework) and Dr Denis Demidov, the author of the AMGCL and VexCL libraries.
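
A minimal sketch of such a stencil-generated Poisson-type test matrix in compressed-row (CRS) format is shown below; the struct and function names are hypothetical and the code is not taken from AMGCL or Kratos.

// Sketch of stencil-based generation of a Poisson-type problem on a regular
// 2-D grid in CRS format (illustration only; not AMGCL or Kratos code).
#include <cstddef>
#include <vector>

struct CRS {
    std::vector<std::ptrdiff_t> ptr, col;  // row pointers and column indices
    std::vector<double> val;               // nonzero values
};

// Assemble the standard 5-point Laplacian on an n-by-n grid
// (homogeneous Dirichlet boundaries folded into the stencil).
CRS poisson2d(std::ptrdiff_t n) {
    const std::ptrdiff_t N = n * n;
    CRS A;
    A.ptr.reserve(N + 1);
    A.ptr.push_back(0);
    for (std::ptrdiff_t j = 0; j < n; ++j)
        for (std::ptrdiff_t i = 0; i < n; ++i) {
            const std::ptrdiff_t row = j * n + i;
            if (j > 0)     { A.col.push_back(row - n); A.val.push_back(-1.0); }
            if (i > 0)     { A.col.push_back(row - 1); A.val.push_back(-1.0); }
                             A.col.push_back(row);     A.val.push_back( 4.0);
            if (i < n - 1) { A.col.push_back(row + 1); A.val.push_back(-1.0); }
            if (j < n - 1) { A.col.push_back(row + n); A.val.push_back(-1.0); }
            A.ptr.push_back(static_cast<std::ptrdiff_t>(A.col.size()));
        }
    return A;
}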


 

Type B: Code development and optimization by the applicant (without PRACE support) (3)

Scaling sparse matrix factorization with stochastic gradient descent

Project Name: Scaling sparse matrix factorization with stochastic gradient descent
Project leader: Assistant Professor Mustafa Ozdal
Research field: Mathematics and Computer Sciences
Resource awarded: 100000 core hours on Marconi – Broadwell, 200000 core hours on 200000
Description

Stochastic Gradient Descent (SGD) is a latent-factor-based method that is widely used in machine learning to solve a variety of optimization problems. SGD repeatedly computes the gradient of a loss function on a single training example, or a small number of training examples, and follows the negative gradient of the objective. It is a good method for problems that have many local optima, where the idea at each step is to move the model into a hopefully more optimal region via a noisy gradient computed from a few samples. In computer science it is generally used in the context of matrix factorization and completion. SGD is especially preferred for large-scale sparse datasets, as it uses only a very small number of samples to update the parameters in each iteration. It is memory-efficient and has been shown in several studies to be robust and fast-converging. Moreover, it is suitable for an online setting where there is limited time to produce answers, and the ease of its implementation provides many opportunities for optimization and tuning. There are also different variants of the SGD algorithm, such as DSGD and ASGD, that may be suitable for different needs. Although SGD has been used very actively in machine learning applications [1][2], its performance has not been studied much by the scientific computing community. Our main aim in this project is to implement and optimize SGD using a readily available parallel toolkit (i.e., PETSc) and test its scalability on HPC systems. There are already existing approaches to efficiently parallelize SGD on shared-memory systems or relatively small-scale commodity clusters. Although HPC systems differ from these systems in various aspects, we aim to investigate the applicability of these approaches on large-scale HPC systems as well. The proven scalability of SGD on those small-scale systems may be an indicator of its scalability on HPC systems. The end goal is to test the suitability of SGD for running on large-scale systems. [1] Koren, Yehuda, Robert Bell, and Chris Volinsky. “Matrix factorization techniques for recommender systems.” Computer 42, no. 8 (2009). [2] Gemulla, Rainer, Erik Nijkamp, Peter J. Haas, and Yannis Sismanis. “Large-scale matrix factorization with distributed stochastic gradient descent.” In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 69-77. ACM, 2011.
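
The following serial sketch shows the core SGD update for sparse matrix factorization described above; it is an illustration only and does not reflect the planned PETSc-based parallel implementation, and all names and parameter values are hypothetical.

// Serial sketch of SGD for sparse matrix factorization (illustration only;
// the project targets a parallel PETSc-based implementation, not shown here).
#include <random>
#include <vector>

struct Entry { int i, j; double v; };   // one observed entry of the sparse matrix

// Factorize A ~= P * Q^T from observed entries, with rank-k factors.
void sgd_factorize(const std::vector<Entry>& obs, int rows, int cols, int k,
                   std::vector<double>& P, std::vector<double>& Q,
                   double lr = 0.01, double reg = 0.02, int epochs = 20) {
    std::mt19937 rng(42);
    std::normal_distribution<double> init(0.0, 0.1);
    P.assign(static_cast<size_t>(rows) * k, 0.0);
    Q.assign(static_cast<size_t>(cols) * k, 0.0);
    for (auto& x : P) x = init(rng);
    for (auto& x : Q) x = init(rng);

    for (int e = 0; e < epochs; ++e)
        for (const auto& r : obs) {
            double* p = &P[static_cast<size_t>(r.i) * k];
            double* q = &Q[static_cast<size_t>(r.j) * k];
            double pred = 0.0;
            for (int f = 0; f < k; ++f) pred += p[f] * q[f];
            const double err = r.v - pred;   // residual on this single sample
            for (int f = 0; f < k; ++f) {    // follow the negative gradient
                const double pf = p[f];
                p[f] += lr * (err * q[f] - reg * pf);
                q[f] += lr * (err * pf   - reg * q[f]);
            }
        }
}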


Evaluation of large-scale deep neural network training using Intel Caffe on modern CPU architectures

Project Name: Evaluation of large-scale deep neural network training using Intel Caffe on modern CPU architectures
Project leader: Dr. Valeriu Codreanu
Research field: Mathematics and Computer Sciences
Resource awarded: 200000 core hours on Marconi – KNL, 100000 core hours on MareNostrum
Description

Machine Learning (ML) in general, and Deep Learning in particular, has developed into an exciting field in recent years, with notable results in computer vision, text classification, speech recognition, and almost any activity in which some form of recognition is required. Even though ML has been around for more than half a century, notable results such as those for the ILSVRC classification challenge have only been attained in recent years by deep neural models. These recent results already surpass human-level accuracy, with misclassification rates going down from over 20% in 2011 to below 5% in 2015. The current de-facto hardware architecture used by ML researchers for training deep neural network models is the GPU, due to its high degree of parallelism and memory bandwidth. However, we are well aware of the misconception that “only” GPUs are good for this task. The reason for this misconception is that projects such as Caffe provide un-optimized CPU implementations of the convolutional layers, these being the most computationally expensive operations in recent deep networks. We thus identify two interesting aspects: neural network models become deeper and more complex, but are constrained by GPU memory capacity; and even networks that fit memory-wise on a single GPU can exhibit prohibitive training times. We are aware of Intel’s efforts to update the MKL library (and integrate it in Intel Caffe) with deep learning routines, of the addition of the MLSL communication library, and of the performance improvements that the Skylake and Knights Landing architectures can deliver for deep learning. The Knights Landing architecture in particular could be a very good solution for the issues above because: 1. It features more and faster memory than any GPU board available, thus allowing bigger models to be trained on a single node. 2. It features the on-package Omni-Path controller, allowing for low-latency, high-bandwidth communication between participating nodes, thus enabling distributed deep learning. We therefore propose to tackle the mentioned issues by using traditional neural network data/model parallelism together with Intel’s MLSL to distribute the training phase across multiple Knights Landing/Skylake nodes, while efficiently using each node through the Intel MKL libraries. This will allow for both reduced training times and larger, more complex models.
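
As a generic illustration of the data-parallel scheme referred to above, the sketch below averages locally computed gradients across workers with plain MPI before each weight update; MLSL's actual API is not shown, and the function and variable names are hypothetical.

// Generic data-parallel gradient averaging (illustration; plain MPI instead of MLSL).
#include <mpi.h>
#include <vector>

// Sum the locally computed gradient across all workers, then every worker
// applies the same averaged update, keeping the model replicas in sync.
void allreduce_and_update(std::vector<double>& weights,
                          const std::vector<double>& local_grad, double lr) {
    int nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    std::vector<double> global_grad(local_grad.size());
    MPI_Allreduce(local_grad.data(), global_grad.data(),
                  static_cast<int>(local_grad.size()),
                  MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    for (std::size_t i = 0; i < weights.size(); ++i)
        weights[i] -= lr * global_grad[i] / nprocs;   // averaged gradient step
}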


Scaling GYSELA on Xeon Phi KNL

Project Name: Scaling GYSELA on Xeon Phi KNL
Project leader: Dr. Yuuichi Asahi
Research field: Engineering
Resource awarded: 200000 core hours on Marconi – KNL
Description

Numerical simulations of plasma turbulence play a central role in predicting the performance of magnetic confinement fusion devices. The fundamental difficulty of fusion plasma turbulence simulation lies in its low-collisionality characteristics (high Knudsen number), which give rise to non-Maxwellian features. Thus, five-dimensional kinetic simulations, called gyrokinetic simulations, are often employed rather than the usual 3D fluid simulations. Since these simulations are very costly, it is appealing to take advantage of the high computational performance of state-of-the-art many-core architectures, including the Xeon Phi KNL. The latest simulations with kinetic electrons may not even be achievable without acceleration by these architectures. Our aim is to accelerate our code on KNL and to understand the possible difficulties of a real application on the latest many-core architectures, which is an open issue in this field. To this purpose, we would like to measure the scalability of our code on more than 1000 KNL nodes, which is relevant to production runs with kinetic electrons. The Marconi supercomputer, equipped with 3600 KNL nodes, will allow us to investigate the scalability of GYSELA at larger-scale parallelism (relevant to production runs). This project aims at improving GYSELA’s performance on KNL through optimizations of communications, in addition to kernel-level optimizations.


 

Type C: Code development with support from experts from PRACE (1)

Optimisation of EC-Earth 3.2 model

Project Name: Optimisation of EC-Earth 3.2 model
Project leader: Dr Virginie Guemas
Research field: Earth System Sciences
Resource awarded: 100000 core hours on MareNostrum
Description

EC-Earth is the community European global climate model, based on the world-leading weather forecast model of ECMWF (the European Centre for Medium-Range Weather Forecasts) in its seasonal prediction configuration, along with NEMO, a state-of-the-art modelling framework for oceanographic research, forecasting and climate studies developed by the NEMO European Consortium. BSC has developed a coupled version of EC-Earth 3.2 at a groundbreaking resolution. In the atmosphere, the horizontal domain is based on a spectral truncation of the atmospheric model (IFS) at T1279 (approx. 15 km globally, i.e. the highest resolution we can use with the standard IFS – higher resolutions would require e.g. non-hydrostatic parameterizations), together with 91 vertical levels. The ocean component (NEMO) is run on the so-called ORCA12 tripolar grid, at a horizontal resolution of about 1/12° (approximately 16 km), with 75 vertical levels whose thickness increases from 1 m below the surface up to 500 m in the deep ocean.


 

Type D: Optimisation work on a PRACE Tier-1 (1)

Automatic generation and optimization of meshes for industrial CFD

Project Name: Automatic generation and optimization of meshes for industrial CFD
Project leader: Prof. Johan Hoffman
Research field: Mathematics and Computer Sciences
Resource awarded: 150000 core hours on Tier-1
Description

We develop algorithms for the automatic generation and optimization of unstructured tetrahedral meshes for CFD in complex geometry, with particular focus on efficient representation of the surface geometry. We will compare indirect surface wrapping and immersed boundary techniques with direct brute-force resolution of the surface geometry, where our adaptive algorithms will be used to enforce fine resolution of the surface only where needed to achieve high accuracy in the quantity of interest. The goal of this project is to develop adaptive algorithms that obtain comparable accuracy with a dramatically reduced number of cells.
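
Below is a minimal sketch of the kind of adaptive marking step implied above, assuming per-cell error indicators are already available; the fixed-fraction strategy and all names are illustrative, not the project's actual algorithm.

// Illustrative fixed-fraction marking step for adaptive refinement
// (hypothetical; not the project's actual algorithm).
#include <algorithm>
#include <numeric>
#include <vector>

// Return indices of the cells with the largest error indicators; these are the
// cells to refine, so that resolution is spent only where it improves the
// quantity of interest.
std::vector<std::size_t> mark_cells(const std::vector<double>& indicator,
                                    double fraction = 0.1) {
    std::vector<std::size_t> order(indicator.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return indicator[a] > indicator[b]; });
    order.resize(static_cast<std::size_t>(fraction * indicator.size()));
    return order;
}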
