PRACE Preparatory Access – 29th cut-off evaluation in September 2017

Find below the results of the 29th cut-off evaluation of September 2017 for PRACE Preparatory Access.

Projects from the following research areas:


Scalability of PCJ applications on HLRS

Project Name: Scalability of PCJ applications on HLRS
Project leader: Dr Marek Nowicki
Research field: Mathematics and Computer Sciences
Resource awarded: 100000 core hours on Hazel Hen
Description

The PCJ library [1,2] is an HPC Challenge 2014 award-winning Java library for high-performance parallel computing. It implements the PGAS (Partitioned Global Address Space) paradigm for running concurrent applications on multicore computers or computing clusters, i.e. systems consisting of many multicore nodes. Benchmark results [3] show that solutions based on the Java language, executed on modern Java Virtual Machines, can be as fast as solutions written in lower-level programming languages such as C++. PCJ is an open-source library (BSD license). It hides from the user the communication mechanisms used both within a node and between the Java Virtual Machines (JVMs) across the system. The PCJ library has been successfully used to adapt scientific applications for running on parallel systems. A good example is a multiparameter optimisation using a genetic algorithm for modelling the connectome of the nematode Caenorhabditis elegans. Other examples are the processing of graph data from the Graph500 benchmark suite and the parallelization of BLAST invocations for finding similar DNA sequences.
Several microbenchmarks and applications help to benchmark the PCJ library on different systems:
– ping-pong – measures the time and speed of communication between two PCJ threads on the same node or on different nodes, depending on the size of the sent variable,
– barrier – measures the time to synchronize all running threads,
– broadcast – broadcasts variables of various sizes to all PCJ threads,
– PiInt – parallel approximation of the value of pi using the rectangle method,
– PiMC – parallel approximation of pi using the Monte Carlo method,
– PiEstimator – parallel approximation of pi using a quasi-Monte Carlo method based on the 2D Halton sequence,
– RayTracer – parallel ray tracing for scenes of varying size,
– GameOfLife – parallel computation of the state of a very large cellular-automaton universe.
Most of these applications, prepared using the PCJ library, scale up to almost the whole Cray XC40 system (about 960 of 1084 nodes) available at the Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw (ICM UW, Poland). Under this proposal, I plan to run example applications and benchmarks on a larger system than I currently have access to, namely the Cray XC40 (Hazel Hen) with 7712 compute nodes. Thanks to the use of Java and the availability of the standard Oracle Java Virtual Machine for this system, the applications and benchmarks should run properly with few or no changes to their code.
[1] PCJ: http://pcj.icm.edu.pl [Accessed: 7.06.2017]
[2] M. Nowicki. Opracowanie nowych metod programowania równoleglego w Javie w oparciu o paradygmat PGAS (Partitioned Global Address Space) [Development of new methods for parallel programming in Java based on the PGAS (Partitioned Global Address Space) paradigm]. PhD Dissertation (in Polish), University of Warsaw, 2015, http://ssdnm.mimuw.edu.pl/pliki/prace-studentow/st/pliki/marek-nowicki-d.pdf [Accessed: 12.11.2016]
[3] M. Nowicki, P. Bala. Parallel computations in Java with the PCJ library. In: 2012 International Conference on High Performance Computing and Simulation (HPCS), pp. 381–387. IEEE (2012)
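The PiEstimator benchmark described above combines quasi-Monte Carlo sampling with a 2D Halton sequence. As a minimal serial illustration of that idea (a Python sketch for exposition only — the actual benchmark is a parallel Java/PCJ program, and all names below are ours):

```python
# Serial sketch of quasi-Monte Carlo pi estimation with a 2D Halton
# sequence: sample low-discrepancy points in the unit square and count
# those inside the quarter circle. Illustrative only; not PCJ code.

def halton(index: int, base: int) -> float:
    """Element `index` (1-based) of the van der Corput sequence in `base`."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def pi_quasi_monte_carlo(n: int) -> float:
    """Estimate pi from n points of the 2D Halton sequence (bases 2 and 3)."""
    inside = 0
    for i in range(1, n + 1):
        x, y = halton(i, 2), halton(i, 3)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n

print(pi_quasi_monte_carlo(100_000))
```

In a PGAS-style parallel version, each of the N threads would evaluate a disjoint stride of the index range and the partial counts would be reduced at the end — which is what makes such benchmarks useful for scaling tests.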


Parallel Voronoi diagram generation

Project Name: Parallel Voronoi diagram generation
Project leader: Dr. Jean-Claude Charr
Research field: Mathematics and Computer Sciences
Resource awarded: 50000 core hours on Curie
Description

This project concerns the development of a message passing application that generates, in parallel, the 3D Voronoi diagram of a set of 3D particles, using the serial Voro++ library. Each process of the distributed application computes the Voronoi diagram of its local particles using the serial Voro++. Particles whose Voronoi cell intersects the local boundaries, together with their neighbors, are considered dependent on particles located on the neighboring processes, and are sent to those neighbors in order to compute the global 3D Voronoi diagram. After receiving the particles from its neighbors, each process recomputes the Voronoi cells only for the exchanged local particles, since Voro++ can compute the Voronoi cell of a given particle individually. The application has already been developed and tested on a small local cluster. The main objective of this project is to test the scalability of the application on thousands of cores.


Topology Aware Collective Communications

Project Name: Topology Aware Collective Communications
Project leader: Mr. Salvatore Di Girolamo
Research field: Mathematics and Computer Sciences
Resource awarded: 50000 core hours on SuperMUC
Description

Collective communications are among the most critical communication operations at scale, since they can involve a large number of nodes, sometimes the entire machine. Alpha-Beta and LogGP are the performance models most commonly used for designing algorithms for collective communications. However, these models rely on some important assumptions, such as a fully connected network and the absence of congestion. This project aims to extend the design space of collective operations by relaxing these assumptions and including a description of the physical network topology in the generation of the collective operation schedule. We plan to use the PRACE allocation to evaluate the performance of standard and proposed topology-aware algorithms for collective operations. In particular, the first stage (Type A) will focus on showing that an oblivious virtual-to-physical topology mapping can degrade the performance of collective communications and hence limit application scalability. We plan to explore, benchmark, and improve the proposed topology-aware algorithms in the second phase of the project (Type B allocation).
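In the alpha-beta model mentioned above, each message costs a latency term alpha plus a per-byte term beta. A minimal sketch of how such a model ranks broadcast algorithms (the parameter values are purely illustrative, not measurements of any machine):

```python
import math

# Alpha-beta cost model for broadcasting m bytes to P nodes: each message
# costs alpha (latency) + m * beta (per-byte time). Values are assumed,
# not measured; this is the congestion-free model the project relaxes.

def linear_bcast(P: int, m: int, alpha: float, beta: float) -> float:
    """Root sends the data to each of the other P-1 nodes in turn."""
    return (P - 1) * (alpha + m * beta)

def binomial_bcast(P: int, m: int, alpha: float, beta: float) -> float:
    """Binomial tree: every node holding the data forwards it, doubling
    the informed set each round -> ceil(log2 P) rounds in this model."""
    return math.ceil(math.log2(P)) * (alpha + m * beta)

alpha, beta = 1e-6, 1e-9   # 1 us latency, ~1 GB/s bandwidth (illustrative)
for P in (16, 1024, 65536):
    print(P, linear_bcast(P, 8, alpha, beta), binomial_bcast(P, 8, alpha, beta))
```

The model's fully-connected, congestion-free assumption is visible in the code: every round costs the same regardless of which physical links the messages traverse — exactly the simplification this project sets out to remove.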


Bottleneck problem for cloud droplet and planetesimal formation

Project Name: Bottleneck problem for cloud droplet and planetesimal formation
Project leader: Mr Xiang-Yu Li
Research field: Universe Sciences
Resource awarded: 50000 core hours on Marconi-Broadwell, 100000 core hours on Marconi-KNL, 5000 core hours on Curie, 100000 core hours on Hazel Hen, 50000 core hours on SuperMUC
Description

The effect of turbulence on inertial particles is a fundamental problem in both the natural and engineering sciences. Rapid rain formation in warm clouds, in the meteorological context, and planetesimal formation, in the astrophysical context, are two outstanding challenges. Observations of radar reflectivity in tropical regions suggest that rain can form in cumulus clouds by the warm rain process in approximately 15-20 minutes. The growth of cloud droplets in warm rain is dominated by two processes: condensation and coagulation. Condensation of water vapor on active cloud condensation nuclei is important in the size range of 2-15 µm. To form rain droplets, which fall and initiate rain, droplets should grow to 50 µm in radius within 15-20 minutes. Since the droplet radius growth rate is inversely proportional to the radius, a larger droplet grows more slowly than a smaller one, which generates a very narrow size distribution. Therefore, droplet growth by coagulation is required to facilitate rapid growth and thus the formation of rain drops. Gravitational coagulation plays a significant role for droplet growth at sizes larger than 40-50 µm. However, it cannot explain the fast growth across the size gap, which is called the bottleneck problem. Therefore, turbulence-induced coagulation has been suggested as an explanation. An even more severe size gap in planetesimal formation is one of the most challenging problems in astrophysics. In this project, we aim to use high-resolution direct numerical simulation (DNS) to tackle the longstanding bottleneck problem. We will particularly focus on the numerical study of cloud droplet growth in turbulence.
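The statement that the condensational growth rate is inversely proportional to the radius, dr/dt = G/r, integrates to r(t) = sqrt(r0^2 + 2Gt), so differences between droplet radii shrink over time. A toy numerical illustration (the growth constant G below is purely illustrative, not a physical cloud value):

```python
import math

# Toy illustration of condensational droplet growth dr/dt = G/r, whose
# exact solution is r(t) = sqrt(r0**2 + 2*G*t). G is an assumed,
# illustrative value in um^2/s, not a real microphysical parameter.

def radius(r0_um: float, t_s: float, G: float = 50.0) -> float:
    """Droplet radius (um) after t_s seconds, starting from r0_um."""
    return math.sqrt(r0_um ** 2 + 2.0 * G * t_s)

# Two droplets starting at 2 um and 15 um:
for t in (0.0, 60.0, 900.0):  # 0 s, 1 min, 15 min
    r_small, r_large = radius(2.0, t), radius(15.0, t)
    print(t, round(r_small, 1), round(r_large, 1), round(r_large - r_small, 1))
```

The gap between the two radii shrinks monotonically, which is why condensation alone yields a narrow size distribution and cannot bridge the bottleneck between the condensation-dominated and gravity-dominated size ranges.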


Cosmological Volumes of Simulated Galaxies with Resolved Structure and Multiphase Interstellar Medium

Project Name: Cosmological Volumes of Simulated Galaxies with Resolved Structure and Multiphase Interstellar Medium
Project leader: Prof. Robert Feldmann
Research field: Universe Sciences
Resource awarded: 50000 core hours on Marconi-Broadwell
Description

Recently, large-volume cosmological simulations that reproduce many properties of observed galaxies have become available. However, these simulations model core physical processes on interstellar medium (ISM) scales (such as star formation, stellar feedback, and supermassive black hole growth) using finely tuned sub-grid models, limiting their predictive power and compromising their ability to capture physics internal to galaxies. Zoom-in simulations have begun to implement much more detailed ISM models, but are available only for selected halos. We propose a new set of simulations designed to bridge the gap between current large-volume and zoom-in simulations. Their resolution is comparable to that of today’s best zoom-ins, but each volume contains hundreds of galaxies. Our simulations implement the comprehensive stellar feedback model developed as part of the FIRE zoom-in simulation project, which has been demonstrated to reproduce a wide range of key galaxy observables without the need to tune parameters. Furthermore, all runs will be based on a new Meshless Finite Mass (MFM) hydrodynamic solver, which has been demonstrated to provide superior accuracy relative to smoothed particle hydrodynamics (SPH) for a wide range of problems. The preparatory project will be used for scaling tests of the proposed target simulations.


Code scalability testing for atomistic molecular dynamics simulations of respiratory complex I

Project Name: Code scalability testing for atomistic molecular dynamics simulations of respiratory complex I
Project leader: Dr Vivek Sharma
Research field: Biochemistry, Bioinformatics and Life sciences
Resource awarded: 100000 core hours on Marconi-KNL, 100000 core hours on Piz Daint,
Description

Energy plays a central role in our lives; it is required for heating, lighting, and fuel. In the human body, energy is stored in the form of a small molecule called ATP (adenosine triphosphate). It is produced in the mitochondria of the cell by the action of various enzymes. One such enzyme is respiratory complex I, which contributes about 40 % of ATP generation in mitochondria. Besides its central role in ATP production, complex I is known to be associated with various mitochondrial disorders. How the enzyme functions, or dysfunctions in disease, remains unknown. In this project, we will perform scalability tests on a large atomistic model system of complex I, which has been constructed from an X-ray structure. We will use the highly efficient and highly parallelized molecular dynamics simulation software GROMACS for this purpose. The scalability plots will be used for forthcoming large-scale PRACE projects.


Strain-specific interplay of alpha-synuclein with membranes

Project Name: Strain-specific interplay of alpha-synuclein with membranes
Project leader: Dr Liang Xu
Research field: Biochemistry, Bioinformatics and Life sciences
Resource awarded: 100000 core hours on Hazel Hen, 100000 core hours on Piz Daint
Description

The pathology of many neurodegenerative diseases is closely related to amyloid deposits in patients’ brains. The accumulation of the a-synuclein protein results in Parkinson’s disease (PD), dementia with Lewy bodies (DLB), and multiple system atrophy (MSA). The precise mechanisms by which a-synuclein leads to toxicity and cell death are largely elusive. However, it is clear that a-synuclein interacts with a variety of cellular membranes and that these interactions may contribute to its function and/or pathology. Modulating the membrane binding of a-synuclein may therefore provide a therapeutic strategy. Recent studies revealed that different a-synuclein strains exhibit distinct properties, such as differences in secondary structure, neurotoxicity, seeding/cross-seeding ability, and propagation. To better understand the molecular signature that determines the strain-specific interactions between a-synuclein and membranes, we aim to perform multi-scale molecular dynamics (MSMD) simulations to examine the mode of action of a-synuclein in the presence of various model membranes. The aggregation propensity of diverse a-synuclein strains (oligomers or fibrils) will be tested, and the effect of membrane composition on the binding preference of a-synuclein will be carefully investigated. The combination of large-scale simulation and advanced analysis methods will provide valuable insight into the strain-dependent interactions of a-synuclein with membranes.


Testing the scalability of Lattice QCD codes on new computer architectures

Project Name: Testing the scalability of Lattice QCD codes on new computer architectures
Project leader: Dr Piotr Korcyl
Research field: Fundamental Physics
Resource awarded: 50000 core hours on Marconi-Broadwell, 100000 core hours on Marconi-KNL,
Description

The aim of the project is to benchmark the scaling properties of lattice QCD software on new computer architectures. Lattice QCD is a numerical approach to Quantum Chromodynamics which enables calculations of the physical properties of hadronic matter from first principles. It is computationally very demanding, as it involves systems on the order of 10^10 variables with complex interactions, and hence requires highly optimized software tailored to specific computer architectures. In this project we will benchmark the performance of our software on the Marconi system, in particular its weak and strong scaling properties, with a view to a subsequent large-scale PRACE computing-time application.


Characterizing the structural basis for the nucleosome recognition by pioneer transcription factors: preparatory phase

Project Name: Characterizing the structural basis for the nucleosome recognition by pioneer transcription factors: preparatory phase
Project leader: Dr Vlad Cojocaru
Research field: Biochemistry, Bioinformatics and Life sciences
Resource awarded: 50000 core hours on Marconi-Broadwell, 50000 core hours on SuperMUC,
Description

Transcription factors are proteins that directly or indirectly bind to DNA in order to transcribe genetic information into RNA. In most cases, accessibility of the DNA is a prerequisite for the binding of transcription factors. However, in the nucleus the DNA is packed into chromatin, which is quite often inaccessible to transcription factors. The fundamental unit of chromatin is the nucleosome, which is formed by wrapping 147 DNA base pairs around a core of eight histone proteins. Interestingly, a series of transcription factors are able to bind to closed chromatin states, recognizing their binding sites even in the presence of nucleosomes. These factors, known as “pioneer transcription factors”, can help open chromatin, increase DNA accessibility, and support the binding of other transcription factors. How transcription factors recognize their binding sites on a nucleosome is not known. In particular, structural data on nucleosome–transcription factor complexes are not available. Such data would be of utmost importance, as characterizing the molecular mechanism of how transcription factors bind to nucleosomes is a crucial step towards understanding how chromatin is opened to eventually exert a certain biological function. In recent years, it has been reported that many of the transcription factors involved in transitions between different cellular identities are pioneer factors. In particular, in a Nobel Prize-winning discovery, it was shown that three such factors, Oct4, Sox2, and Klf4, are required to convert a somatic skin cell into a pluripotent stem cell. When introduced into such skin cells, Oct4 and Sox2 recognize their binding sites in DNA wrapped in nucleosomes. Based on chromatin immunoprecipitation followed by sequencing, and on nucleosome footprinting experiments, Oct4 and Sox2 binding sites and their overlap with nucleosome positions have been identified.
Based on the available experimental data, we have built atomic-resolution structural models of the Oct4–nucleosome interaction on two native DNA sequences. In collaboration with experimentalists at our institute, we partially validated these models using mutations in the DNA binding sites and electrophoretic mobility shift assays. To further validate the models and decode the structural dynamics involved in Oct4–nucleosome binding at atomic resolution, we will now perform a series of molecular dynamics simulations of alternative Oct4–nucleosome configurations. Because these molecular systems are large and many simulations are required, access to additional computational resources is essential for the success of this project. We will therefore submit a full PRACE proposal for the upcoming call (deadline 30th of May 2017). To prepare for this, we are now requesting preparatory access to several PRACE sites to optimize the scaling performance of the software we will be using for our specific systems.


Distributed Computation of Matrix Inverse using a Probabilistic Approach

Project Name: Distributed Computation of Matrix Inverse using a Probabilistic Approach
Project leader: Prof Juan Acebron
Research field: Mathematics and Computer Sciences
Resource awarded: 50000 core hours on Marconi-Broadwell, 100000 core hours on Marconi-KNL, 50000 core hours on Curie, 50000 core hours on SuperMUC
Description

Current (and future) applications require increasingly powerful and efficient resources. In recent years, hardware has experienced extraordinary advances; the most advanced computer systems today are made up of millions of cores. However, it is now the software, in particular parallel numerical algorithms, that proves to be the weakest element in solving problems. Large-scale computations have seen an intolerable waste of resources, making it impossible in some cases to exploit the full potential of the available hardware, mostly due to communication overheads. The study of the properties of complex networks has been the topic of increasingly intense research in the last few years. The reason for this interest is that complex networks arise in many different groundbreaking areas of science, including technological, social, and biological networks, among others. Important metrics, such as entropy, node centrality, and communicability, require the computation of a function of the network adjacency matrix, an operation that is not viable for large networks. In many cases the representation of the adjacency matrix for these networks is only possible because they are naturally sparse. Unfortunately, in general, a function of a matrix is a full matrix that simply cannot be stored due to its large size. This limitation severely hinders the analysis of networks of interest. Moreover, even for smaller matrices, for which the function of the matrix fits in the available memory, classical methods for computing the function are not easily parallelizable: they require an amount of communication that significantly reduces the efficiency and scalability of a parallel solution. A new computational paradigm is proposed as a solution.
We are developing probabilistic numerical methods for computing different operations on matrices, in particular the inverse and the exponential. These methods are intrinsically parallel. More importantly, they allow the computation of individual entries of the inverse matrix; hence the operation is feasible on matrices of arbitrary size, as long as a representation of the matrix is available.
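One classical probabilistic scheme of this general kind — offered here as a hedged illustration, not necessarily the authors' method — is the Ulam–von Neumann random-walk estimator: writing (I − C)^{-1} = I + C + C² + … for a matrix C with spectral radius below 1, a single entry of the inverse can be estimated by scoring weighted random walks, with no global communication:

```python
import random

# Sketch of an Ulam-von Neumann-style Monte Carlo estimator for ONE entry
# of (I - C)^{-1} = I + C + C^2 + ...  (requires spectral radius of C < 1).
# Illustrative only: each entry can be estimated independently, which is
# what makes such methods intrinsically parallel.

def inv_entry(C, i, j, walks=200_000, p_stop=0.5, seed=1):
    """Estimate entry (i, j) of (I - C)^{-1} by weighted random walks."""
    rng = random.Random(seed)
    n = len(C)
    total = 0.0
    for _ in range(walks):
        state, weight = i, 1.0
        while True:
            if state == j:
                total += weight          # contributes a term of the Neumann sum
            if rng.random() < p_stop:    # geometric termination of the walk
                break
            nxt = rng.randrange(n)       # uniform transition, probability 1/n
            weight *= C[state][nxt] / ((1.0 - p_stop) / n)
            state = nxt
    return total / walks

# Example with C = [[0.1, 0.2], [0.2, 0.1]]:
# exact (I - C)^{-1}[0][0] = 0.9 / 0.77 (about 1.1688)
C = [[0.1, 0.2], [0.2, 0.1]]
print(inv_entry(C, 0, 0))
```

Each walk uses only rows of C along its path, so walks for different entries (or on different processors) never need to communicate until a final reduction — the communication pattern the project description identifies as the bottleneck of classical methods.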


Scalability tests for phonon and electron-phonon calculations using Quantum ESPRESSO and GW calculations using Yambo on typical 2D-materials systems

Project Name: Scalability tests for phonon and electron-phonon calculations using Quantum ESPRESSO and GW calculations using Yambo on typical 2D-materials systems
Project leader: Prof Nicola Marzari
Research field: Chemical Sciences and Materials
Resource awarded: 100000 core hours on Marconi-KNL
Description

In this project we will perform scalability tests and performance analysis for the Quantum ESPRESSO ph.x code (phonon and electron-phonon calculations) and Yambo (GW) on systems of small, medium, and large size, representative of the materials contained in the database of 1844 2D materials built in our group by performing a “computational exfoliation” of more than 110000 bulk materials. These tests will be a preparatory phase for the submission of a project to the 15th PRACE call, in which we will propose a detailed study of the electronic, dynamical, and transport properties of selected materials from this database.


Dynamics in Hybrid Perovskite Materials

Project Name: Dynamics in Hybrid Perovskite Materials
Project leader: Prof. Alison Walker
Research field: Chemical Sciences and Materials
Resource awarded: 50000 core hours on Curie, 100000 core hours on Hazel Hen, 50000 core hours on SuperMUC
Description

In recent years, organic-inorganic hybrid perovskite solar cells have attracted huge research interest due to their exceptional photovoltaic properties as well as easy and cheap fabrication processes.[1] Solar cells made of these materials have recently demonstrated certified power conversion efficiencies of more than 22%.[2] Due to the ‘soft’ nature of these hybrid perovskites, dynamical structural changes influence their optoelectronic properties quite noticeably.[3] Recently, various experimental as well as computational studies have further pointed out that the dynamics of the inorganic framework are coupled with the organic molecular motions,[3] with prominent effects on band-gap fluctuations, charge transport, and exciton binding energy.[4] The dynamical variation of these properties at ambient conditions has been investigated very rarely.[5] Ab initio molecular dynamics (AIMD) based on density functional theory (DFT) has recently emerged as a well-suited method to explore this field of research.[6] AIMD efficiently and cost-effectively captures the dynamical properties of hybrid perovskites on time scales of tens of picoseconds. In our work, we propose to perform AIMD simulations on various kinds of perovskite materials using the CP2K package.[7] The stability of 3D perovskites at ambient conditions is currently one of the major problems, creating a roadblock to commercializing them at large scale.[8] Apart from various external factors, it has been found that perovskites are intrinsically unstable due to their low crystallization energy. In our project, we intend to look at the dynamical behavior of perovskite crystals at the atomistic level and to propose possible ways to enhance their stability as well as to modify their photovoltaic properties. In particular, by incorporating different kinds of doping, we will investigate how one can increase the internal interactions between the organic and inorganic components.
That, in turn, can result in enhanced stability of these materials in their perovskite phases. Along with exploring the stability of the perovskites, we are also interested in looking into ion migration in these materials.[9] It is known that ion migration causes the hysteresis in the current-voltage diagram and is also responsible for the degradation of solar-cell devices. Although various computational studies have thoroughly investigated this issue with static DFT-based calculations, the finite-temperature effect, which has a large impact, has been largely overlooked due to the cost of computation. In our project, we are going to explore defect migration in these perovskites using AIMD. The exact mechanism of defect migration and its effect on structural and optoelectronic properties under ambient conditions can thus be studied at the atomistic level. With this knowledge, we can explain the physical origin of various recent experimental reports of suppressed hysteresis. We can further propose other probable easy and cost-effective ways to reduce ion migration in operating solar-cell devices. [1] Science 2012, 338, 643 [2] NREL efficiency chart; https://www.nrel.gov/pv/assets/images/efficiency-chart.png [3] Acc. Chem. Res. 2016, 49, 573 [4] Phys. Rev. B: Condens. Matter Mater. Phys. 2016, 94, 04520 [5] J. Am. Chem. Soc. 2017, 139, 4068 [6] J. Phys.: Condens. Matter 2016, 29, 043001 [7] https://www.cp2k.org/ [8] Angew. Chem. Int. Ed. 2017, 56, 1190 [9] Acc. Chem. Res. 2016, 49, 286


Scalability test of C++ code developed for solving compressible Navier-Stokes equations

Project Name: Scalability test of C++ code developed for solving compressible Navier-Stokes equations
Project leader: Professor Vassilis Theofilis
Research field: Engineering
Resource awarded: 50000 core hours on Marconi-Broadwell, 100000 core hours on Marconi-KNL, 100000 core hours on Hazel Hen,
Description

An in-house high-fidelity C++ DNS/LES parallel code has been developed. The code solves for density, momentum, and total energy in cylindrical coordinates with a hybrid solver employing a sixth-order central finite difference scheme in smooth regions and a fifth-order weighted essentially non-oscillatory (WENO) scheme with local Lax-Friedrichs flux splitting in discontinuous regions. Temporal integration is performed using a five-stage fourth-order Runge-Kutta scheme. The subgrid-scale terms are computed using Germano’s dynamic model. One-dimensional non-reflecting boundary conditions are used for the adiabatic walls and outflow regions. The code will be used in a collaborative research project between the University of Liverpool (UK) and Monash University (Australia). Good scaling was achieved on national HPC facilities in Australia. The main purpose of this application is to test the scalability of the code on the Marconi and Hazel Hen facilities. The long-term goal is to apply for computing time on Tier-0 supercomputer systems to extend our simulations to larger domains and higher Reynolds numbers.
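The local Lax-Friedrichs splitting mentioned above divides the physical flux into right- and left-going parts, f±(u) = ½(f(u) ± αu), with α the local maximum wave speed. A minimal one-dimensional sketch using the Burgers flux as a stand-in (an assumption for illustration — the project's code applies the same idea to the compressible Navier-Stokes fluxes):

```python
# Local Lax-Friedrichs flux splitting for the 1D Burgers flux
# f(u) = u**2 / 2 (illustrative stand-in). The split parts f+ and f-
# have non-negative and non-positive wave speeds respectively, so each
# can be reconstructed one-sidedly (e.g. by a WENO scheme) and upwinded.

def burgers_flux(u: float) -> float:
    return 0.5 * u * u

def llf_split(u_left: float, u_right: float):
    """Build the split fluxes at an interface; alpha is the local maximum
    wave speed |f'(u)| = |u| over the two states."""
    alpha = max(abs(u_left), abs(u_right))
    f_plus = lambda u: 0.5 * (burgers_flux(u) + alpha * u)   # right-going part
    f_minus = lambda u: 0.5 * (burgers_flux(u) - alpha * u)  # left-going part
    return f_plus, f_minus

f_plus, f_minus = llf_split(1.0, -2.0)
print(f_plus(1.0) + f_minus(1.0))   # recombines to the full flux f(1.0)
```

By construction f+ + f− recovers the original flux, and the added dissipation is proportional to the local wave speed α, which is what makes the splitting "local" rather than using one global α for the whole domain.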


SOWFA scalability tests

Project Name: SOWFA scalability tests
Project leader: Dr Paolo Schito
Research field: Engineering
Resource awarded: 50000 core hours on Marconi-Broadwell, 100000 core hours on Marconi-KNL,
Description

The energy harvesting capability of a wind farm is generally estimated through a wind resource assessment, while the real energy production is measured during operation. The loads on the wind turbines are not explicitly calculated during wind farm operation, and current control strategies do not take into account the interference of upwind turbines. The current state of the art in wind farm control is to design control strategies that minimize the interference between wind turbine wakes and reduce possible extreme loading conditions on the rotors, in order to achieve higher energy extraction and reduce fatigue loads on the machines. An H2020 European project is currently investigating advanced wind farm control possibilities by means of full-scale, wind tunnel, and CFD modelling. The CFD modelling tool is an advanced framework in which the dynamics of each single turbine, the control of each turbine, and the control of the entire wind farm are all taken into account and related to the incoming wind characteristics. This tool is called a high-fidelity model, since a detailed model of the wind flow in the wind farm is available. The study will investigate different control strategies to achieve the goal of a lower LCOE (levelized cost of energy), by reducing the turbine loads and increasing the power extraction of the wind farm.


Scalability of ComprEssible DUct fLow

Project Name: Scalability of ComprEssible DUct fLow
Project leader: Dr Davide Modesti
Research field: Engineering
Resource awarded: 100000 core hours on Marconi-KNL,
Description

The aim of the project is to develop and test the scalability performance of an implicit MPI solver for the compressible Navier-Stokes equations in a duct flow. The main goal is to obtain scalability data on the Marconi-KNL supercomputer in order to apply for the next PRACE call on the same machine.


Scaling of a discontinuous Galerkin code for simulation of low-pressure turbines.

Project Name: Scaling of a discontinuous Galerkin code for simulation of low-pressure turbines.
Project leader: Dr Jean-Sebastien Cagnone
Research field: Engineering
Resource awarded: 100000 core hours on Marconi-KNL,
Description

The objective is to perform scaling studies with Argo, a high-order discontinuous Galerkin (DGM) solver. We will evaluate the parallel performance of the code, with the final objective of performing high-resolution Large Eddy Simulations (LES) of the flow in three-dimensional low-pressure turbine passages, including end-wall effects and corner separation. The parallel performance of Argo was previously demonstrated on Intel and BlueGene machines of the PRACE network. In this preparatory call, we will port the code to Intel’s KNL many-core architecture and measure the achieved speed-up on this new computing infrastructure.


HOSTEL DG (High Order Space-Time Eulerian-Lagrangian Discontinuous Galerkin)

Project Name: HOSTEL DG (High Order Space-Time Eulerian-Lagrangian Discontinuous Galerkin)
Project leader: Ing. Walter Boscheri
Research field: Engineering
Resource awarded: 100000 core hours on Marconi-KNL,
Description

Lagrangian algorithms have become very popular in the last decades because they allow the discontinuities which affect the flow motion to be precisely located and tracked. Moving mesh schemes typically exhibit less numerical diffusion, hence achieving a better resolution of the flow features. As a drawback, since the mesh moves in time and continuously changes its configuration, the computational grid may develop bad elements, i.e. highly stretched or distorted control volumes that would yield a significant reduction of the admissible time step. In some extreme cases, the computational cells can even become invalid, meaning that an element has a negative volume, which would inevitably blow up the simulation. This is why a new family of Arbitrary-Lagrangian-Eulerian (ALE) finite volume methods has been developed, in which the mesh velocity can be chosen independently of the local fluid velocity. This allows the mesh to be moved more carefully, taking care of the geometrical quality of the elements. Specifically, we employ a rezoning strategy that aims at locally optimizing the shape of the control volumes, thereby avoiding the occurrence of invalid elements. We will focus on the development of the first better-than-third-order Discontinuous Galerkin (DG) schemes in the context of ADER ALE methods. In the high-order DG framework we have to take into account a curved geometry configuration, which leads to curved boundaries for the definition of the control volumes. Furthermore, classical DG methods are known to produce strong oscillations in the presence of discontinuous solutions; we therefore plan to implement an a posteriori sub-cell limiter to overcome the limiting problem. The discrete solution in those cells affected by a strong discontinuity is recomputed with a very robust TVD finite volume scheme which operates on the sub-grid level.
The sub-grid is composed of a total number of M=2N+1 subcells per space dimension, where N is the degree of the DG scheme. We want to develop the new algorithm in two and three space dimensions on unstructured meshes, and we require the new schemes to be high-order accurate both in space and time, keeping the ALE approach that has proved to provide excellent resolution properties, especially across contact discontinuities. The DG method will certainly produce more accurate and less diffusive results with respect to those obtained with a finite volume scheme, hence representing a very powerful tool for Lagrangian-like simulations. The goal of this project is to improve the scalability of our code in order to be ready for a submission to a regular PRACE call.


Type B: Code development and optimization by the applicant (without PRACE support) (5)

Parallel Partitioning methods for Hybrid Linear System Solvers

Project Name: Parallel Partitioning methods for Hybrid Linear System Solvers
Project leader: Prof. Cevdet Aykanat
Research field: Mathematics and Computer Sciences
Resource awarded: 250000 core hours on Hazel Hen, 250000 core hours on Juqueen
Description

We propose a new parallel partitioning method for hybrid linear system solvers. The objective is to increase the performance and scalability of the hybrid sparse linear system solver. Our algorithm utilizes the structure of the coefficient matrix and divides it into sub-matrices according to the number of processors in the parallel environment. Each processor solves its small linear system individually. An iterative solver is then used to improve the partial solution; after a number of iterations the solution is obtained.
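The partition-and-iterate idea described above can be sketched serially as a block-Jacobi iteration: the matrix is split into diagonal blocks (one per processor in the parallel setting), each block system is solved independently, and the partial solutions are refined iteratively. This is a minimal stand-in under that assumption, not the project's actual algorithm:

```python
import numpy as np

def block_jacobi_solve(A, b, n_blocks, iterations=200):
    """Serial sketch of a partitioned hybrid solve: each diagonal block
    is solved directly (the 'small linear system' per processor), and
    the combined solution is improved iteratively."""
    n = A.shape[0]
    bounds = np.linspace(0, n, n_blocks + 1, dtype=int)
    x = np.zeros(n)
    for _ in range(iterations):
        x_new = x.copy()
        for k in range(n_blocks):
            i, j = bounds[k], bounds[k + 1]
            # right-hand side for this block: b minus coupling to the
            # other blocks' current iterates
            r = b[i:j] - A[i:j] @ x + A[i:j, i:j] @ x[i:j]
            x_new[i:j] = np.linalg.solve(A[i:j, i:j], r)
        x = x_new
    return x
```

The iteration converges for diagonally dominant systems; real hybrid solvers accelerate this outer loop with a Krylov method, which the sketch omits.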


Development of plasma model for aeroacoustics simulations for noise reduction devices in airship design

Project Name: Development of plasma model for aeroacoustics simulations for noise reduction devices in airship design
Project leader: Prof. Joan Baiges
Research field: Engineering
Resource awarded: 100000 core hours on Marconi-Broadwell, 200000 core hours on Curie, 100000 core hours on SuperMUC
Description

The current project aims at the development of the tools required for the numerical simulation of plasma actuators in noise reduction mechanisms in aircraft design. The project is framed in the IMAGE H2020 European project, a Europe-China collaboration (http://www.cimne.com/vpage/2/2189/Objectives). The control approach proposed in IMAGE involves plasma actuation. To date, Dielectric Barrier Discharge plasma actuators represent one of the most promising avenues for controlling boundary layer flow and noise production. Technologically, advantages of the approach include the lack of any mechanical parts, light weight, and the virtually unlimited control strategies offered, considering they can be distributed in patches of almost any form. Furthermore, it has been shown that their electro-mechanical efficiency can also be altered by varying the amplitude and temporal variation of the driving voltage. Much has still to be discovered about these actuation devices, but the perspectives are very encouraging considering the results reported in the literature. Two main code developments need to be tested and validated in an MPI setting. The first is the plasma body force, which acts at the aerodynamic level and accounts for the effect of the plasma actuators. We need to ensure that the implemented body force does not harm the scalability of the code, for which several scalability tests are going to be performed. This body force also needs to be calibrated with several test cases, which will be run in the current project. The first test case of interest is the flow past a tandem cylinder configuration, for which experimental data at high Reynolds numbers is already available. The second test case will be the flow in a Wing-Mock-Up configuration, for which the developed model will again be compared against experimental results.
The second implementation aspect, which will be dealt with in this project, is an efficient MPI-IO (output) strategy, which needs to be implemented for the visualization of the results. Our current implementation is based on the GiD postprocessor; however, for very large simulations this format is not suitable. We have implemented a VTK post-processing format which needs to be tested when using a large number of processors. Depending on the performance of this VTK postprocessor, an HDF5 post-processing format will also be implemented and tested during the project.
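One common way to realize parallel VTK output of the kind described above is for each MPI rank to write its own `.vtu` piece while one rank writes a small `.pvtu` master file referencing all pieces, so the postprocessor loads the distributed result as one data set. The sketch below only generates the master file's XML; file names and field names are illustrative, not from the project's code:

```python
def write_pvtu(basename, n_pieces, point_fields):
    """Build the XML text of a .pvtu master file that references one
    .vtu piece per MPI rank (standard VTK parallel-file convention)."""
    lines = [
        '<?xml version="1.0"?>',
        '<VTKFile type="PUnstructuredGrid" version="0.1">',
        '  <PUnstructuredGrid GhostLevel="0">',
        '    <PPointData>',
    ]
    for name in point_fields:
        lines.append(f'      <PDataArray type="Float64" Name="{name}"/>')
    lines += [
        '    </PPointData>',
        '    <PPoints>',
        '      <PDataArray type="Float64" NumberOfComponents="3"/>',
        '    </PPoints>',
    ]
    # one <Piece> entry per rank-local .vtu file
    for rank in range(n_pieces):
        lines.append(f'    <Piece Source="{basename}_{rank:04d}.vtu"/>')
    lines += ['  </PUnstructuredGrid>', '</VTKFile>']
    return "\n".join(lines)
```

Because every rank writes its own piece independently, this strategy avoids serializing output through a single process, which is the bottleneck the project aims to eliminate.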


A Scalable Discrete Adjoint Method For Large-Scale Aerodynamic Shape Optimization Problems In Incompressible Turbulent Flows

Project Name: A Scalable Discrete Adjoint Method For Large-Scale Aerodynamic Shape Optimization Problems In Incompressible Turbulent Flows
Project leader: Dr. Mariano Vazquez
Research field: Engineering
Resource awarded: 100000 core hours on Marconi-Broadwell, 100000 core hours on SuperMUC
Description

The objective of this project is to develop a parallel framework to handle shape optimization problems using the discrete adjoint method. This framework is to be utilized for designing aerodynamic devices, including airplane wings as well as wind turbines, under real flow conditions. Because realistic 3D flow simulations require millions of degrees of freedom, scalable schemes are proposed for resolving the flow and the adjoint equations, which makes this framework suitable for large-scale design problems. Since turbulence effects are considered not only for the flow simulations but also for the adjoint sensitivities, the derived sensitivities contain all the physical characteristics of the real flows. In contrast to the conventional turbulent adjoint formulation, the one proposed here is based on a hand-coded linearization of the turbulence model without any approximation, giving rise to exact sensitivities. Hence, the aerodynamic characteristics are enhanced more efficiently in the developed framework.


NEWA2HPC

Project Name: NEWA2HPC
Project leader: Dr. Gokhan Kirkil
Research field: Engineering
Resource awarded: 100000 core hours on Marconi-Broadwell
Description

The New European Wind Atlas (NEWA, http://www.neweuropeanwindatlas.eu) project is funded under the European Commission’s 7th Framework Programme ERA-Net Plus (http://euwindatlas.eu/), which comprises nine funding agencies from eight EU Member States. The project aims at creating a new and detailed European wind resource atlas using meso- and microscale modeling, as well as collecting data from field experiments to generate a high-fidelity database of wind characteristics. The NEWA project will develop a new reference methodology for wind resource assessment and wind turbine site suitability based on a mesoscale-to-microscale model-chain. This new methodology will produce a more reliable wind characterization than current models, leading to a significant improvement in the quantification of uncertainties on wind energy production and wind conditions that affect the design of wind turbines. The model-chain, i.e. how the models at various spatial scales share information, will be thoroughly validated across Europe with dedicated experiments and historical wind resource assessment campaigns from industry. High-fidelity experiments will be executed to address wind energy specific modelling challenges in complex and forested terrain, coastal transitions, and offshore. The reference model-chain code will be offered as open source, together with best-practice guidelines to help standardize the methodology for industrial use. As a result, the NEWA database will be published open access, based on a publicly available reference model-chain whose credibility will be built upon strong-sense validation benchmarks. We need PRACE computing resources to support the production phase of the wind atlas, for which a PRACE Project Access application will be submitted in 2017. This proposal was initially submitted for MareNostrum, but it was rejected due to the upcoming upgrade. As we must start the preparation work for the wind atlas as soon as possible, we now re-submit it for Marconi.
The wind atlas climatology will be produced with the Weather Research and Forecast (WRF) model (http://www.wrfmodel.org/index.php). The WRF modeling system should first be installed and tested on Marconi. Afterwards, the focus of this preparatory phase will be on the optimization of the model set-up to make the best use of the allocation of the production phase.


Adoption of high performance computing in Neural Designer

Project Name: Adoption of high performance computing in Neural Designer
Project leader: Mr. Fernando Gomez Perez
Research field: Mathematics and Computer Sciences
Resource awarded: 200000 core hours on Marconi-KNL, 100000 core hours on SuperMUC
Description

The predictive analytics market is undergoing impressive growth. Indeed, organizations that incorporate this technique into their daily operations not only better manage the present, but also increase their probability of future success. Artelnics develops the professional predictive analytics solution Neural Designer. It makes intelligent use of data by discovering complex relationships, recognizing unknown patterns, predicting actual trends, or finding associations. Neural Designer stands out in terms of usability, functionality, and performance. Current technology lacks advanced model selection techniques and usually requires many computational resources. The main challenge for Neural Designer is to include a framework capable of untangling complex interactions in big data sets. To achieve that, the software must attain high performance by means of parallel processing. The users of the solution are professional data scientists who work at the analytics departments of innovative companies, consulting firms specialized in analytics, or research centres. Neural Designer will be capable of analysing bigger data sets in less time, providing our customers with results in a way previously unachievable.


Type C: Code development with support from experts from PRACE (3)

Quasi-particle self-consistent GW approximation: avoiding the I/O bottleneck

Project Name: Quasi-particle self-consistent GW approximation: avoiding the I/O bottleneck
Project leader: Prof. Mark van Schilfgaarde
Research field: Chemical Sciences and Materials
Resource awarded: 100000 core hours on Marconi-Broadwell, 250000 core hours on Hazel Hen, 100000 core hours on SuperMUC
Description

The Questaal Suite offers a range of electronic structure programs that can be used to model different materials and nanoscale structures. The majority of the codes use an all-electron implementation of density-functional theory. This includes several forms (Hamiltonian and Green’s function) that serve different purposes. Additionally, there is an all-electron implementation of GW theory, including a quasiparticle self-consistent form of it. These codes share a basis set of atom-centred functions. The basis has its genesis in the Linear Muffin Tin Orbitals (LMTO) method of O. K. Andersen, who formulated the theory of linear methods in band theory. The LMTO and LAPW (Linear Augmented Plane Wave) methods are the most common direct forms of the linear methods, though most approaches (including those based on pseudopotentials) depend on a linearization as well. The present code is a descendant of the “tight binding linear method” that formed the mainstay of Andersen’s group in Stuttgart for many years. Applications include modeling electronic structure, magnetic properties of materials, the Landauer-Buttiker formulation of electronic transport, impurity effects in solids, and linear response. Packages distributed in the Questaal suite include: Full Potential LMTO: an all-electron implementation of density-functional theory using convolutions of Hankel functions and Gaussian orbitals as a basis set. This code also provides an interface to a QSGW package. It is a fairly accurate basis, and has been benchmarked against other all-electron schemes. QSGW: GW is usually implemented as an extension to the LDA, i.e. G and W are generated from the LDA. The GW package also has the ability to carry out quasiparticle self-consistency (QSGW). QSGW may be thought of as an optimised form of the GW approximation of Hedin.
Self-consistent calculations are more expensive than the usual formulations of GW based on a perturbation of density functional theory, but they are much more accurate and systematic. Self-consistency also removes the dependence on the starting point and makes it possible to generate ground state properties that are sensitive to self-consistency, such as the magnetic moment. QSGW is perhaps the most universally applicable, true ab initio theory for electronic states in extended systems that exists today. It has a proven ability to reliably predict quasiparticle (QP) levels for a wide range of materials, such as graphene, Fe-based superconductors, and NH3CH3PbI3 (a recently popular solar-cell material), in a consistent and parameter-free manner that cannot be achieved by other theories. Many other properties, such as Dresselhaus coefficients, electric field gradients, transmission probability, and spin waves, are well described by the theory. QSGW is more expensive than the usual forms of GW because the entire self-energy (iGW) must be calculated. A parallel version of this code has been written. It contains a significant bottleneck, which prohibits realistic application to systems with more than 40 atoms or so. The aim of this project is to redesign the code and eliminate this bottleneck.


Type D: Optimisation work on a PRACE Tier-1 (5)

High-precision nonadiabatic rotational states of hydrogen molecule

Project Name: High-precision nonadiabatic rotational states of hydrogen molecule
Project leader: Prof. Jacek Komasa
Research field: Chemical Sciences and Materials
Resource awarded: 150000 core hours on Tier-1
Description

The main goal of this project is to eliminate memory limitations in solving the general symmetric eigenvalue problem with large (180000+) dense matrices. It is planned to modify the existing OpenMP-based code into a distributed version with acceptable memory-per-node requirements. Successful implementation of this project will have a significant impact on the scientific goal, which is the determination of rovibrational energy levels in four-particle molecular systems.
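The general symmetric eigenvalue problem mentioned above has the form A x = lambda B x with A symmetric and B symmetric positive definite. A minimal single-node sketch of the standard solution path (Cholesky reduction to an ordinary symmetric problem) is shown below; the distributed version the project targets would instead spread block-cyclic panels of the matrices over nodes, e.g. with ScaLAPACK or ELPA, which this sketch deliberately ignores:

```python
import numpy as np

def generalized_eigh(a, b):
    """Solve A x = lambda B x (A symmetric, B s.p.d.) via Cholesky
    reduction.  Single-node sketch only; the memory bottleneck the
    project addresses arises when these dense matrices exceed one
    node's RAM."""
    L = np.linalg.cholesky(b)        # B = L L^T
    Linv = np.linalg.inv(L)
    c = Linv @ a @ Linv.T            # standard problem C y = lambda y
    w, y = np.linalg.eigh(c)
    x = Linv.T @ y                   # back-transform the eigenvectors
    return w, x
```

At the 180000+ dimension quoted above, one double-precision matrix alone is roughly 260 GB, which is why a distributed memory layout is required.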


Extending the scalability and parallelization of SEDITRANS code

Project Name: Extending the scalability and parallelization of SEDITRANS code
Project leader: Dr. Guillermo Oyarzun
Research field: Engineering
Resource awarded: 150000 core hours on Tier-1
Description

This project is a WP5 activity of the Initial Training Network SEDITRANS (GA number: 607394), implemented within the 7th Framework Programme of the European Commission under call FP7-PEOPLE-2013-ITN. SEDITRANS is a research network that consists of six academic and four industrial partners within Europe. It is focused on advancing the understanding of coastal processes by utilizing high performance computing (HPC) for the numerical simulation of the three-dimensional, turbulent flow induced in the coastal zone, and mainly in the surf zone, by wave propagation (oblique to the shore), refraction, breaking, and dissipation. Currently, the parallel code is optimized for medium-size clusters by means of an MPI parallelization, typically using between 100 and 1024 CPUs for each execution. Our aim is to extend the parallelization of the code in order to run it on hybrid architectures such as those of the Tier-0 systems. So far we have tested the GPU parallelization of some parts of the code, with promising results. However, we need to analyze this strategy when using more GPUs simultaneously, combining the distributed memory parallelization using MPI with the stream processing within the GPUs.


Automation of high fidelity CFD analysis for aircraft design and optimization

Project Name: Automation of high fidelity CFD analysis for aircraft design and optimization
Project leader: Dr Mengmeng Zhang
Research field: Engineering
Resource awarded: 150000 core hours on Tier-1
Description

Airinnova is a company developing computational solutions for aerodynamic shape optimization, which is an important task in aircraft design. High fidelity CFD (computational fluid dynamics) analysis is a major tool for modern aircraft design and optimization, and computational power is a limiting factor. Carrying out high fidelity CFD requires engineers with special skills in mesh generation and in executing the analysis code, which constrains the use of CFD analysis to a limited number of people. The goal of the proposed project is to help engineers design aircraft in a more efficient and simpler way by making the core processes automatic. Airinnova has been conducting a PRACE SHAPE project in collaboration with the PRACE partner SNIC-KTH. The outcome of the research work has been presented in an AIAA conference paper. In this proposed project, we will follow up on our previous work and take advantage of the optimization results and the existing scripts. We will continue to carry out high fidelity CFD analysis (RANS), with emphasis on running CFD in an automated way starting from a watertight aircraft geometry. Gradient-based optimization algorithms, solving the adjoint equations, will be applied in the final step of the automation process, which allows the flexibility to integrate the whole automation process into an MDO (Multi-Disciplinary Optimization) design environment. The tasks mainly consist of: 1. Automation process development: develop the automation process starting from a watertight Common Research Model (CRM) aircraft geometry with designed pylons and nacelles. 2. Benchmark: performance analysis for the desired model using the proposed automation process, including auto-meshing and CFD solver auto-run. 3. Port: deploy and run on a PRACE Tier-1 system and prepare for a Tier-0 system.


Scalable Delft3D FM for efficient modelling of shallow water and transport processes

Project Name: Scalable Delft3D FM for efficient modelling of shallow water and transport processes
Project leader: Dr Menno Genseberger
Research field: Earth System Sciences
Resource awarded: 150000 core hours on Tier-1
Description

Forecasting of flooding, morphology, and water quality in coastal and estuarine areas, rivers, and lakes is of great importance for society. To tackle this, the modelling suite Delft3D was developed by Deltares (an independent non-profit institute for applied research in the field of water and subsurface). Delft3D is used worldwide; users range from consultants, engineers, and contractors to regulators and government officials. Delft3D has been open source since 2011. It consists of modules for modelling hydrodynamics, waves, morphology, water quality, and ecology. In two previous (small) PRACE projects [1, 2] and the FP7 Fortissimo experiment Delft3D as a Service (see for instance the example in [3]), steps have been taken (among others with SURFsara and CINECA) to make Delft3D modules more efficient and scalable for high performance computing. Currently, Delft3D is in transition from the shallow water solver Delft3D-FLOW for structured computational meshes to D-Flow FM (Flexible Mesh) for unstructured computational meshes. D-Flow FM will be the main computational core of the Delft3D Flexible Mesh Suite. For typical real-life applications, for instance highly detailed modelling and operational forecasting, there is urgency to make D-Flow FM more efficient and scalable for high performance computing as well. As the solver in D-Flow FM is quite different from the one in Delft3D-FLOW, some major steps have to be taken, also for the modules for modelling waves, morphology, water quality, and ecology that connect to D-Flow FM. The aim of the current project is to make significant progress towards Tier-0 systems for the shallow water and transport solvers in the Delft3D Flexible Mesh Suite. The starting points are the results and experiences of the previous projects, mainly obtained on the Tier-1 system Cartesius. First steps with D-Flow FM on Cartesius were taken in the Fortissimo project. For D-Flow FM the computational work is parallelized by domain decomposition.
Work has started on optimal boundary conditions at the interfaces of the subdomains to minimize the number of solver iterations and therefore the amount of required communication between subdomains (a strategy originally developed in [4, 5]). This aspect is essential when making the step to exascale computations. The same technique has been applied before to the shallow water solver Simona [6, 7], which is almost identical to Delft3D-FLOW and was recently used by Deltares and SURFsara on Cartesius to optimize work for forecasting of flooding around the Dutch lakes [8]. With this technique, for more academic eigenvalue problems, weak scaling up to 16000 cores was obtained on the Curie thin nodes [9, 10]. [1] http://www.prace-project.eu/IMG/pdf/wp100.pdf [2] http://www.prace-project.eu/IMG/pdf/wp177.pdf [3] https://ir.cwi.nl/pub/24648 (under revision for publication) [4] http://www.ddm.org/DD07/Tan_Borsboom.pdf [5] Tan, K.H.: Local coupling in domain decomposition. Ph.D. thesis, Utrecht University (1995) [6] http://www.ddm.org/DD21/homepage/dd21.inria.fr/pdf/borsboomgensebergerhofspee_mini_8.pdf [7] Vollebregt, Roest, Lander: Large scale computing at Rijkswaterstaat. Parallel Computing 29, pp. 1-20 (2003) [8] https://ir.cwi.nl/pub/25171 [9] http://www.ddm.org/DD21/homepage/dd21.inria.fr/pdf/gensebergerm_contrib.pdf [10] Genseberger: Improving the parallel performance of a domain decomposition preconditioning technique in the Jacobi-Davidson method for large scale eigenvalue problems. Applied Numerical Mathematics 60, pp. 1083-1099 (preprint at https://ir.cwi.nl/pub/13582)
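The domain decomposition with interface conditions described above can be illustrated with a toy overlapping Schwarz iteration on a 1D Poisson problem: each subdomain is solved with Dirichlet data taken from its neighbour's current iterate, and the exchange repeats until the interface values stop changing. This is only a schematic analogue of D-Flow FM's approach, with illustrative names:

```python
import numpy as np

def poisson_matrix(m, h):
    """Standard second-order finite-difference matrix for -u'' on m points."""
    return (np.diag(2.0 * np.ones(m))
            - np.diag(np.ones(m - 1), 1)
            - np.diag(np.ones(m - 1), -1)) / h**2

def schwarz_poisson(n=41, overlap=5, tol=1e-10, max_iter=500):
    """Alternating overlapping Schwarz for -u'' = 1 on (0,1), u(0)=u(1)=0.
    Two subdomains exchange Dirichlet interface values each sweep; the
    iteration count reflects the interface-condition quality."""
    h = 1.0 / (n + 1)
    f = np.ones(n)
    u = np.zeros(n)
    mid = n // 2
    lo_end = mid + overlap     # left subdomain covers indices 0 .. lo_end-1
    hi_start = mid - overlap   # right subdomain covers indices hi_start .. n-1
    for it in range(max_iter):
        u_old = u.copy()
        # left solve with interface value u[lo_end] as right boundary
        A = poisson_matrix(lo_end, h)
        rhs = f[:lo_end].copy()
        rhs[-1] += u[lo_end] / h**2
        u[:lo_end] = np.linalg.solve(A, rhs)
        # right solve with interface value u[hi_start-1] as left boundary
        m2 = n - hi_start
        A2 = poisson_matrix(m2, h)
        rhs2 = f[hi_start:].copy()
        rhs2[0] += u[hi_start - 1] / h**2
        u[hi_start:] = np.linalg.solve(A2, rhs2)
        if np.max(np.abs(u - u_old)) < tol:
            return u, it + 1
    return u, max_iter
```

Better interface conditions (as in [4, 5]) reduce the number of these outer sweeps, and hence the inter-subdomain communication, which is the point made above.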


Radiative Transfer Forward Modelling of Solar Observations with ALMA

Project Name: Radiative Transfer Forward Modelling of Solar Observations with ALMA
Project leader: Dr. Sven Wedemeyer
Research field: Universe Sciences
Resource awarded: 150000 core hours on Tier-1
Description

The Atacama Large Millimeter/submillimeter Array (ALMA), currently the world’s largest astronomical observatory, has opened up a new window on the universe. The interferometric array is located on the Chajnantor plateau in the Chilean Andes at an altitude of 5000 m and consists of 66 antennas, most of them with a diameter of 12 m. Combined, the antennas act like a giant telescope with baselines of up to 16 km. Already within the first few years of operation, ALMA has led to many scientific discoveries. Since December 2016, ALMA has also been used for observations of our Sun. It observes the Sun at a spatial resolution that is unprecedented in this wavelength range and offers novel means of determining the properties of the plasma in the Sun’s outer atmospheric layers. Due to the properties of solar radiation at millimeter wavelengths, ALMA serves as a linear thermometer, mapping narrow layers at different heights. It can measure the thermal structure and dynamics of the solar atmosphere and thus sources and sinks of atmospheric heating. Among other expected scientific results, ALMA promises significant steps towards understanding the intricate dynamics and physical processes that, in combination, might yield the solution of the coronal heating problem – a long-standing fundamental question in modern astrophysics. However, ALMA’s novel diagnostic capabilities, which will ultimately advance our understanding of the Sun, still need to be developed and understood further in order to fully exploit the instrument’s potential. Detailed numerical simulations of the solar atmosphere and artificial observations of the Sun play a key role in this respect. Such artificial observations of the Sun will be produced as part of the SolarALMA project at the University of Oslo, which is funded with a Consolidator Grant by the European Research Council (ERC), in collaboration with Dr. de la Cruz Rodriguez from the University of Stockholm.
The overall aim of the SolarALMA project is to utilize the first observations of the Sun with ALMA and to develop the required observing and analysis techniques. An important step in this endeavour is the development of realistic numerical models of the solar atmosphere, which can be used to test how to optimally set up solar observations and how to analyse and interpret them. While 3D numerical models are routinely produced on high-performance computers, the codes available for producing the corresponding (artificial) ALMA observations of such models have not performed well enough so far. We have developed a new code that solves the radiative transfer equation for a 3D numerical model and thus reveals how the modeled part of the Sun would look through the eyes of ALMA at (sub)millimeter wavelengths. The new code is in an advanced stage but still needs to be optimized in order to provide the basis for essential studies, which will optimize ALMA’s scientific impact for observing and understanding the Sun.
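The core operation of such a code, solving the radiative transfer equation along a ray, can be sketched with the formal solution of dI/dtau = S - I, assuming a piecewise-constant source function between optical-depth grid points. The project's 3D code repeats this for every ray through the model cube and every wavelength; the function below is a simplified illustration, not the actual implementation:

```python
import numpy as np

def formal_solution(tau, source, intensity_in=0.0):
    """March the formal solution of dI/dtau = S - I along one ray:
    over each optical-depth interval the incoming intensity is
    attenuated by exp(-dtau) and the local source function fills in
    the rest."""
    intensity = intensity_in
    for k in range(len(tau) - 1):
        dtau = tau[k + 1] - tau[k]
        s = 0.5 * (source[k] + source[k + 1])  # mean source on the interval
        attenuation = np.exp(-dtau)
        intensity = intensity * attenuation + s * (1.0 - attenuation)
    return intensity
```

Two limiting behaviours make this easy to check: with zero source the intensity just decays as exp(-total tau), and at large optical depth the emergent intensity approaches the source function, which is exactly the "linear thermometer" property exploited at millimeter wavelengths.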

