PRACE Preparatory Access – 21st cut-off evaluation in June 2015

Find below the results of the 21st cut-off evaluation of June 2015 for the PRACE Preparatory Access.

Type A: Code scalability testing (5)

Convection in massive stars: towards a better understanding of stellar evolution

Project Name: Convection in massive stars: towards a better understanding of stellar evolution
Project leader: Dr. Cyril Georgy, Keele University, United Kingdom
Research field: Astrophysics
Resource awarded: 50.000 CPU on Hornet


Description: The goal of this preparatory project is to port and set up our hydrodynamics code, PROMPI, on the HORNET cluster in order to ensure its scalability for future scientific runs. PROMPI is a modern, MPI-parallelised hydrodynamics code based on a finite-volume scheme. It has been used extensively to study stellar hydrodynamics at the University of Arizona, and has been ported to extremely large HPC platforms, including Jaguar at Oak Ridge in the United States and the Magnus supercomputer in Western Australia, a Cray XC40 machine (the same architecture as HORNET).

The test runs will consist of setting up very high resolution simulations of convective zones in deep stellar interiors (the future scientific runs) and running them for a short time with various numbers of CPU cores. The aim of these tests is to check our code set-up and acquire experience on HORNET, with the short- to mid-term goal of submitting a request for Project Access. The future request will consist of running high resolution simulations (already set up during this preparatory project) of convection at various stages of stellar evolution, in order to study the behaviour of the fluid under different conditions. We aim to answer questions such as: where are the boundaries of the convective zones located? How does mixing proceed across these boundaries? These simulations will form the basis for developing realistic prescriptions for traditional one-dimensional stellar evolution codes, based on the average behaviour of the fluid in our three-dimensional hydrodynamical simulations.
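In practice, a test campaign of this kind amounts to timing a fixed problem at several core counts and computing speed-up and parallel efficiency. A minimal sketch of that bookkeeping (the timing numbers below are illustrative placeholders, not PROMPI measurements):

```python
# Strong-scaling analysis: fixed problem size, increasing core counts.
# The timings are illustrative placeholders, not PROMPI results.
timings = {
    256:  1000.0,   # cores : wall-clock seconds for a short test run
    512:   520.0,
    1024:  280.0,
    2048:  160.0,
}

base_cores = min(timings)
base_time = timings[base_cores]

for cores in sorted(timings):
    speedup = base_time / timings[cores]
    ideal = cores / base_cores
    efficiency = speedup / ideal
    print(f"{cores:5d} cores: speed-up {speedup:5.2f} "
          f"(ideal {ideal:4.1f}), efficiency {efficiency:6.1%}")
```

Runs are usually considered to scale acceptably while the efficiency stays above some threshold (often 70-80%), which then fixes the core count requested in the Project Access proposal.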



3DNRS

Project Name: 3DNRS
Project leader: Prof. Sandro Frigio, University of Camerino, Italy
Research field: Mathematics and Computer Science
Resource awarded: 100.000 CPU on Fermi, 50.000 CPU on MareNostrum, 50.000 CPU on Curie Fat Nodes (FN), 50.000 CPU on Hornet


Description: The Navier-Stokes (NS) equations are the main mathematical model for viscous fluid motion and have been intensively studied both theoretically and numerically. One of the main open problems is the existence of possible singularities (blow-up) for three-dimensional (3d) flow, which is on the list of the Millennium Prize Problems of the Clay Institute. Previous results obtained by computer simulations have been inconclusive. In fact, simulating the NS equations in dimension 3 is computationally onerous, and reliable evidence of a blow-up is difficult to obtain if one does not know its structure.

We propose a numerical study of a particular class of solutions of the 3d incompressible Navier-Stokes equations suggested by the theoretical work of Li and Sinai (Li, D. and Sinai, Ya. G.: “Blowups of complex solutions of the 3D Navier-Stokes system and renormalization group method”. J. Eur. Math. Soc. 10, 267–313, 2008), who proved the existence of a blow-up for complex-valued solutions with suitable initial data. The behavior of such solutions in Fourier space is simple and gives a guideline for the computer simulations. We have already obtained results by computer simulation which describe the details of the blow-up for the 3d NS equations and the 2d Burgers equations, for which solutions of the Li-Sinai type also exist.
The work of Li and Sinai suggests the study of real-valued solutions of the NS equations which have a structure in Fourier space similar to that of the complex blow-up solutions. Preliminary computer simulations show that such solutions share some interesting features of the complex solutions, such as a concentration of energy in a small region of physical space (a kind of “tornado” effect). They are worthy of further study with greater computational resources.
The expected results of the study that we propose may be summarized as follows:

  • Deeper understanding of the blow-up mechanism for the 3-d complex NS equations.
  • For the real solutions of the 3-d NS an understanding of the phenomenon of concentration of energy, with hope of showing evidence of a blow-up.
  • Analysis of simplified models of fluid motion which produce a “tornado effect”.

Owing to the nature of the problem, we need to work with a large number of grid points and to repeat the evolution iterations many times. For this reason we need Tier-0 resources (of the order of 10 million core hours). To apply for a future Tier-0 project we want to test the scalability of our code on the different PRACE architectures.
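As a toy illustration of the kind of pseudospectral time evolution used in such studies, a one-dimensional viscous Burgers solver can be sketched as follows. This is a drastically simplified stand-in for the 3d NS and 2d Burgers runs described above, with explicit Euler stepping chosen only for brevity:

```python
import numpy as np

# Pseudospectral solver for the 1d viscous Burgers equation
# u_t + u u_x = nu u_xx, on a periodic domain of length L.
# Toy illustration only; the project's codes are far more elaborate.
N, L, nu = 128, 2.0 * np.pi, 0.05
x = np.arange(N) * L / N
k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)   # wavenumbers

u = np.sin(x)                                   # initial condition
dt, steps = 1e-3, 500
for _ in range(steps):
    u_hat = np.fft.fft(u)
    ux = np.fft.ifft(1j * k * u_hat).real       # spectral first derivative
    uxx = np.fft.ifft(-(k ** 2) * u_hat).real   # spectral second derivative
    u = u + dt * (-u * ux + nu * uxx)           # explicit Euler step

energy = 0.5 * np.mean(u ** 2)                  # decays for nu > 0
```

Tracking quantities such as the energy and the width of the energy-containing region in Fourier space is exactly the kind of diagnostic used to detect concentration or blow-up behaviour.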


Enhanced sampling Molecular Dynamics of metamorphic proteins: scalability test on CPU and GPU-based architectures

Project Name: Enhanced sampling Molecular Dynamics of metamorphic proteins: scalability test on CPU and GPU-based architectures
Project leader: Dr. Alessandro Pandini, Brunel University London, United Kingdom
Research field: Medicine and Life Sciences
Resource awarded: 50.000 CPU on Curie Fat Nodes (FN), 50.000 GPU on Curie Hybrid Nodes, 50.000 CPU on Curie Thin Nodes (TN)



Description: Until recently it was widely accepted that the native state of a protein comprises a limited number of closely related conformations associated with one fold. The discovery of several examples of ‘metamorphic’ proteins has challenged this assumption and demonstrated that some proteins can switch between multiple folds with different function. The shift may be induced by the binding of small molecules as well as by environmental changes. Specific mutations can stabilise one of the forms. Indeed, examples are known of nearly identical amino acid sequences with different folds.

In addition to providing insight into the evolution and the emergence of novel folds and the theory of protein folding, metamorphic proteins are considered as promising scaffolds for protein design in synthetic biology. In the future, it will be possible to design specific biological functions for one of the available fold states and activate them on-demand by changes in environmental conditions or by targeting the protein with small molecules.

While extremely promising, metamorphic proteins are particularly challenging to study with traditional computational strategies due to their intrinsic bistability and tendency to undergo drastic conformational changes upon small perturbations. In particular, equilibrium molecular dynamics simulations are generally limited in their ability to sample large structural changes such as those associated with a fold switch. A more suitable approach is to use enhanced sampling techniques, in which a bias is added to accelerate the sampling of the desired changes.

Fold switches generally involve multiple rearrangements of secondary structures. A possible strategy to induce such transitions is to use bias-exchange MetaDynamics (BE-MTD) simulations, in which a replica exchange framework is applied over multiple collective variables (CVs).

We propose to benchmark the performance of this strategy on CPU- and GPU-based machines to identify the best architecture and the optimal ratio between the number of computing cores and the number of replicas. Scaling tests will also be performed to determine the maximum number of CVs that can be used in a single BE-MTD run. This preparatory stage will focus on the metamorphic protein Mad2. The resulting benchmark data will be used to support future PRACE proposals.
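The core question of such a benchmark, the optimal cores-per-replica ratio, reduces to maximizing the aggregate sampling throughput of the whole allocation. A sketch of that selection with made-up timing numbers (not BE-MTD measurements):

```python
# Choose the cores-per-replica ratio maximizing aggregate throughput.
# Per-replica ns/day figures are illustrative placeholders only.
total_cores = 512
# cores per replica -> ns/day achieved by one replica at that size
ns_per_day = {8: 10.0, 16: 18.0, 32: 30.0, 64: 44.0}

def throughput(cores_per_replica):
    """Aggregate ns/day over all replicas that fit in the allocation."""
    n_replicas = total_cores // cores_per_replica
    return n_replicas * ns_per_day[cores_per_replica]

best = max(ns_per_day, key=throughput)
```

Because per-replica speed-up is sublinear, many small replicas typically beat few large ones in aggregate throughput, but the number of replicas is itself bounded by the number of CVs the BE-MTD scheme can usefully employ, which is what the scaling tests determine.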


Respiratory Chain

Project Name: Respiratory Chain
Project leader: Dr. Marco Stenta, Syngenta Crop Protection Münchwilen AG, Switzerland
Research field: Medicine and Life Sciences
Resource awarded: 100.000 CPU on Fermi


Description: Oxidative phosphorylation (OXPHOS) is a complex metabolic pathway that uses the energy released by the oxidation of nutrients to form ATP molecules. In eukaryotes, OXPHOS takes place in the mitochondrion. The mitochondrial electron transport chain (mtETC) transfers electrons from NADH to, ultimately, O2 through a series of spatially separated redox reactions. This process is coupled with the generation of a transmembrane proton electrochemical gradient, which is then used by the enzyme ATP synthase to generate ATP molecules. Electron and proton transfer are carried out by a series of protein complexes anchored at the inner mitochondrial membrane:

  • Complex I NADH-coenzyme Q oxidoreductase
  • Complex II Succinate-Q oxidoreductase
  • Complex III Q-cytochrome c oxidoreductase
  • Complex IV Cytochrome c oxidase

The structure of the respiratory complexes has been resolved, and a wealth of experimental data elucidates their functioning and interactions. Nevertheless, many aspects of their biochemical functioning remain elusive. Respiratory complexes are the target of several inhibitors of pharmaceutical and agrochemical relevance. Better knowledge of the mechanism of substrate/inhibitor access to and egress from the active pocket would help in designing new inhibitors with improved kinetic profiles. A detailed atomistic view of ligand binding could help shed light on the resistance mechanism observed for some known inhibitors.
The purpose of this research program is to:
1) Model respiratory Complexes I/II/III/IV of relevant species starting from available X-ray structures
2) Derive accurate molecular mechanics parameters for the various cofactors involved in the mtETC (Fe-S clusters, heme groups, etc.)
3) Simulate the dynamic behavior of the respiratory complexes in a realistic environment including water and the mitochondrial membrane
4a) Investigate the energetics of binding of natural substrates and of a few selected inhibitors
4b) Investigate the kinetics of binding of natural substrates and of a few selected inhibitors

The objective of this Preparatory Access proposal is to quantify the computational cost of modeling each of the four respiratory complexes, in order to prepare a Project Access proposal for submission in September 2015.


The Interaction of Alzheimer’s Amyloid-Beta Peptide With Bilayer Containing Neuronal Lipids

Project Name: The Interaction of Alzheimer’s Amyloid-Beta Peptide With Bilayer Containing Neuronal Lipids
Project leader: Prof. Birgit Strodel, Jülich Research Centre, Germany
Research field: Medicine and Life Sciences
Resource awarded: 50.000 CPU on Curie Thin Nodes (TN), 50.000 CPU on Hornet, 100.000 CPU on SuperMUC



Description: Aberrant protein aggregation is one of the main causes of the onset of many neurodegenerative diseases such as Alzheimer’s disease (AD) or Parkinson’s disease. The peptide most closely related to the etiology of Alzheimer’s disease is amyloid beta (AB), which exists in two main alloforms of 40 (AB40) and 42 (AB42) residues. AB has been shown to form pores in the plasma membrane of neurons, which disrupts Ca2+ homeostasis and subsequently causes neuronal death. The neuronal membrane has been shown to promote aggregation, and experimental studies have demonstrated that AB may have a higher affinity for particular lipid types.

However, in order to provide a surface large enough to accommodate peptide aggregates, and lipid types in statistically significant quantities, a large membrane must be used. We propose a bilayer comprising 23×23 lipids (~190 Å × ~190 Å surface area) per leaflet, containing several lipids that are present in neuronal membranes. The system will include explicit solvent in sufficient volume to enable the aggregation of up to 10-12 AB42 peptides to be studied and to determine how the membrane lipids affect the aggregation process. All-atom molecular dynamics (MD) in explicit solvent, as implemented in the state-of-the-art parallel GROMACS software, and enhanced sampling techniques such as Hamiltonian replica exchange will be used to adequately explore the conformational space of the peptide under the influence of the bilayer.

A system of this size, however, requires thorough testing to ensure that computational resources are used as efficiently as possible. In this project we will fine-tune and benchmark the performance of GROMACS for the solvated membrane system and evaluate the scaling behaviour of the enhanced sampling techniques.
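For reference, replica exchange schemes accept or reject swaps between neighbouring replicas with a Metropolis criterion. The sketch below shows the simplest (temperature) variant; the Hamiltonian variant used in this project evaluates cross-Hamiltonian energies instead, and GROMACS implements all of this internally, so this is a generic illustration only:

```python
import math
import random

def exchange_probability(beta_i, beta_j, E_i, E_j):
    """Metropolis acceptance for swapping configurations between two
    replicas in temperature replica exchange:
    min(1, exp((beta_i - beta_j) * (E_i - E_j)))."""
    delta = (beta_i - beta_j) * (E_i - E_j)
    return min(1.0, math.exp(delta))

def attempt_swap(beta_i, beta_j, E_i, E_j, rng=random.random):
    """Accept the swap with the Metropolis probability."""
    return rng() < exchange_probability(beta_i, beta_j, E_i, E_j)
```

The acceptance rate between neighbouring replicas is one of the quantities the benchmark must monitor, since it fixes how many replicas are needed to bridge the sampled range efficiently.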




Type B: Code development and optimization by the applicant (without PRACE support) (5)

Fast Linear Algebra

Project Name: Fast Linear Algebra
Project leader: Dr. Oded Schwartz, Hebrew University, Israel
Research field: Mathematics and Computer Science
Resource awarded: 250.000 CPU on Fermi, 50.000 CPU on Hornet, 250.000 CPU on Juqueen



Description: In high-performance computing (HPC), the major bottleneck is the cost of communication between processors and within the memory hierarchy. These costs take orders of magnitude more time (and energy) than arithmetic computations, and judging by hardware trends, their share of the total cost is expected to increase further. Hence the need for communication-minimizing algorithms. Ideally, we would like to obtain lower bounds on the amount of communication required for fundamental problems, and to design communication-optimal algorithms, i.e., algorithms attaining those bounds.
We have obtained several lower bounds and optimal algorithms within dense and sparse numerical linear algebra. In this project we intend to implement, tune, and benchmark some of these algorithms.
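A classic instance of such a bound is dense matrix multiplication, where blocking reduces the data moved between fast and slow memory from O(n^3) to O(n^3/sqrt(M)) words for a fast memory of size M. The sketch below is an illustrative cost model of that effect, not one of the project's algorithms:

```python
import math

def words_moved(n, M, b):
    """Modeled traffic (in words) between a fast memory of size M and
    slow memory for a blocked n x n matrix multiply with b x b blocks:
    (n/b)^3 block-pair products, each touching 3 blocks of b^2 words,
    i.e. 3 n^3 / b words in total."""
    assert 3 * b * b <= M, "three blocks must fit in fast memory"
    return (n // b) ** 3 * 3 * b * b

n, M = 4096, 3 * 64 * 64                 # fast memory holds three 64x64 blocks
naive = words_moved(n, M, 1)             # unblocked: 3 n^3 words
blocked = words_moved(n, M, int(math.sqrt(M / 3)))  # near-optimal block size
```

With the largest block size that fits (b ~ sqrt(M/3)), traffic drops by a factor of b, which matches the known Omega(n^3 / sqrt(M)) communication lower bound up to a constant.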


Improving the methodology of generating galaxy mock catalogues for very large galaxy surveys and very small scales

Project Name: Improving the methodology of generating galaxy mock catalogues for very large galaxy surveys and very small scales
Project leader: Dr. Chia-Hsun Chuang, Spanish National Research Council and Autonomous University of Madrid – Institute for Theoretical Physics (IFT), Spain
Research field: Astrophysics
Resource awarded: 200.000 CPU on Curie Fat Nodes (FN), 200.000 CPU on Curie Thin Nodes (TN)



Description: This project aims to extend the methodology/code of the Effective Zeldovich approximation mock catalogue (EZmock) to produce halo/galaxy catalogues for very large volume galaxy surveys. EZmock is the cheapest method that can generate mock halo/galaxy catalogues with accurate clustering statistics (i.e., 2-point and 3-point clustering statistics in real and redshift space). By replacing the initial Gaussian density field, one can generate a large number of mock catalogues, which can be used to estimate the cosmic variance of the large-scale structure. This is a fundamental task for the data analysis of any on-going and future large volume galaxy survey. The current version of the code has been shown to produce mock halo catalogues for a box with a side length of 2.5 Gpc/h, which can be used to construct mock galaxy catalogues for the Baryon Oscillation Spectroscopic Survey (BOSS). The method/code needs to be modified and tested for the larger boxes corresponding to future surveys.

In addition, the method/code has been validated for scales larger than 10 Mpc/h (i.e., k < 0.7 h/Mpc). It would be meaningful to extend the study to smaller scales. The method has also been validated for high-bias tracers (luminous red galaxies), but it would be useful to understand its performance for lower-bias tracers.
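The Zeldovich approximation at the heart of EZmock maps an initial Gaussian density field delta(k) to particle displacements psi(k) = i k delta(k)/k^2, so that div(psi) = -delta to linear order. A minimal one-dimensional illustration of that mapping (not the EZmock code itself):

```python
import numpy as np

# Zeldovich displacement from a density field in 1d: psi_k = i delta_k / k,
# so that d(psi)/dx = -delta to linear order. Illustration only, not EZmock.
N, L = 256, 100.0
x = np.arange(N) * L / N
k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)

k0 = 2.0 * np.pi * 4 / L            # a single Fourier mode as test density
delta = 0.1 * np.sin(k0 * x)

delta_k = np.fft.fft(delta)
psi_k = np.zeros_like(delta_k)
nonzero = k != 0
psi_k[nonzero] = 1j * delta_k[nonzero] / k[nonzero]
psi = np.fft.ifft(psi_k).real       # displacement field

x_displaced = (x + psi) % L         # Zeldovich-moved particle positions
```

Generating a new mock then amounts to drawing a fresh Gaussian realization of delta and repeating the (cheap) FFT-based displacement step, which is what makes the method inexpensive enough to build thousands of catalogues for covariance estimation.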


Improving the simulation performance of a large biological-system by load balancing among its multiple active zones.

Project Name: Improving the simulation performance of a large biological-system by load balancing among its multiple active zones.
Project leader: Dr Juan Torras, Universitat Politecnica de Catalunya, Spain
Research field: Medicine and Life Sciences
Resource awarded: 100.000 CPU on MareNostrum, 100.000 GPU on Curie Hybrid Nodes, 200.000 CPU on Curie Thin Nodes (TN)



Description: Large biological systems are attracting much interest because of their involvement in real metabolic processes. Among them, large systems with differentiated active zones (AZs) demand special attention, because the separate evolution of each zone is linked to the global behavior of the system. An example is a protein with multiple metallic centers such as ferritin, where the metallic ions play an important role in the protein-protein interactions that form metal-induced self-assembled cages. The interest in these structures lies in their potential for targeted drug delivery. This project aims to improve the performance of an existing hybrid QM/MM MD code (PUPIL) that can treat multiple active zones in large biological systems by means of a concurrent set of QM/MM calculations of the different active zones within a single QM/MM-MD simulation. Since the different active zones of a single large system may require different computational resources, with different execution times, a load-balancing scheme will be implemented in this project to improve the overall simulation performance. Thus, a new resource distribution centre, responsible for launching the several QM/MM calculations and assigning the necessary resources at each MD step within the QM/MM-MD approach, will be implemented and tested.
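The load-balancing idea can be illustrated with a proportional allocation of cores to active zones based on their measured per-step QM cost, so that all zones finish at roughly the same time. This is a hypothetical sketch, not the PUPIL implementation:

```python
# Distribute a fixed pool of cores over QM active zones in proportion to
# their measured per-step cost. Illustrative only, not the PUPIL code.
def balance(costs, total_cores):
    """costs: measured QM time per zone at equal resources (arbitrary units).
    Returns a cores-per-zone list; every zone gets at least one core."""
    total = sum(costs)
    alloc = [max(1, round(total_cores * c / total)) for c in costs]
    # Absorb any rounding drift in the most expensive zone.
    alloc[costs.index(max(costs))] += total_cores - sum(alloc)
    return alloc

zone_costs = [12.0, 3.0, 5.0]     # hypothetical per-zone QM timings
cores = balance(zone_costs, 40)
```

Because QM cost per zone changes as the geometry evolves, such an allocation would be re-evaluated periodically from fresh timings, which is exactly the role of the resource distribution centre described above.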


Incorporating latency-based communication cost metrics into recursive bipartitioning based decomposition tools

Project Name: Incorporating latency-based communication cost metrics into recursive bipartitioning based decomposition tools
Project leader: Prof. Cevdet Aykanat, Bilkent University, Turkey
Research field: Mathematics and Computer Science
Resource awarded: 250.000 CPU on Juqueen, 250.000 CPU on SuperMUC



Description: Partitioning tools are widely used to decompose a given domain into sub-domains to enable efficient parallelism. These tools fall into two categories according to how they obtain a partition: those that use a recursive bipartitioning approach and those that use a direct k-way approach. Recursive bipartitioning is the most widely used approach and is realized in partitioners such as METIS and Scotch. The problem is to partition a given matrix (modeled as a graph or hypergraph) into P parts in such a way that inter-processor communication is minimized while computational load balance is maintained. In the obtained partition, the edges that cross parts of the graph require inter-processor communication. Partitioners aim at minimizing the number of these edges, which corresponds to minimizing the inter-processor communication volume. Although minimization of this metric has been heavily studied, ever larger-scale computations also require the message count (which determines latency) to be considered, as it can be a crucial factor in overall communication time. There are very recent works centered around this subject [1][2].
Sparse matrix operations have low computational density, making them hard to scale beyond a certain number of cores. This low computational density makes communication costs all the more important for obtaining good performance. Communication time depends on multiple factors, and these factors should all be considered together in the partitioner to achieve scalable performance. In this project, we wish to scale irregular sparse domains beyond thousands of cores. We investigate models and methods that are capable of minimizing both the bandwidth and latency portions of the communication bottleneck through partitioning. Our models are based on recursive bipartitioning and exploit it in such a way that both bottlenecks are addressed at the same time. Our approach achieves a symmetric partition of the given matrix and can be integrated into any readily available partitioner that adopts recursive bipartitioning. We plan to test the generated partitions by using them as partition vectors in PETSc (PETSc also uses ParMetis as a library). Our eventual goal is to improve the scalability of PETSc; any application relying on PETSc to model and solve partial differential equations may benefit from our work.
[1] M. Deveci, K. Kaya, B. Ucar, Umit V. Catalyurek, Hypergraph partitioning for multiple communication cost metrics: Model and methods, Journal of Parallel and Distributed Computing 77 (2015) 69-83.
[2] O. Selvitopi, M. Ozdal, C. Aykanat, A novel method for scaling iterative solvers: Avoiding latency overhead of parallel sparse-matrix vector multiplies, 2014. doi:10.1109/TPDS.2014.2311804.
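The recursive bipartitioning scheme underlying these models can be illustrated with a deliberately naive sketch that splits an ordered vertex list in half k times to get P = 2^k parts and then counts the resulting cut edges. Real partitioners such as METIS and Scotch choose each bisection so as to minimize the cut; this is illustration only:

```python
# Recursive bipartitioning into P = 2^k parts, plus an edge-cut counter.
# Naive halving of an ordered vertex list; real partitioners minimize
# the cut at each bisection instead.
def recursive_bipartition(vertices, parts):
    if parts == 1:
        return [vertices]
    mid = len(vertices) // 2
    return (recursive_bipartition(vertices[:mid], parts // 2) +
            recursive_bipartition(vertices[mid:], parts // 2))

def cut_edges(edges, partition):
    """Count edges whose endpoints land in different parts."""
    part_of = {v: i for i, block in enumerate(partition) for v in block}
    return sum(1 for u, v in edges if part_of[u] != part_of[v])

# A path graph 0-1-2-...-7 split into 4 parts cuts exactly 3 edges.
path_edges = [(i, i + 1) for i in range(7)]
parts = recursive_bipartition(list(range(8)), 4)
```

Each cut edge contributes to communication volume, and each pair of parts sharing at least one cut edge contributes a message; the project's latency-aware metrics act on the second quantity as well as the first.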


Parameter Optimization and Evaluating OpenFOAM Simulations for Magnetohydrodynamics

Project Name: Parameter Optimization and Evaluating OpenFOAM Simulations for Magnetohydrodynamics
Project leader: Assoc. Prof. Ahmet Duran, Istanbul Technical University, Turkey
Research field: Mathematics and Computer Science
Resource awarded: 250.000 CPU on Fermi, 200.000 CPU on Curie Thin Nodes (TN), 250.000 CPU on Juqueen



Description: In a PRACE-3IP project on TGCC Curie (a modern Tier-0 system), we investigated the challenges facing CFD solvers applied to bio-medical fluid flow simulations, and in particular the OpenFOAM 2.1.1 solver icoFoam, for the large penta-diagonal matrices arising from the simulation of blood flow in arteries on a structured mesh domain (see [1] and references therein). We achieved scaled speed-up for matrices up to 64 million × 64 million, and speed-up on up to 16384 cores on Curie.

We will be working with Ergolines s.r.l. on a SHAPE project under PRACE-4IP WP7. In this project we will focus on parameter optimization for magnetohydrodynamics (MHD). We will test the OpenFOAM “mhdFoam” solver for various geometries, and will evaluate the performance, scalability and robustness of OpenFOAM in order to assess its potential and limitations.

[1] A. Duran, M.S. Celebi, S. Piskin, and M. Tuncel, “Scalability of OpenFOAM for Bio-medical Flow Simulations”, Journal of Supercomputing, 71(3), 2015, pp. 938-951, DOI 10.1007/s11227-014-1344-1. This work was financially supported by the PRACE project, funded in part by the EU’s Seventh Framework Programme (FP7/2007–2013) under grant agreement no. RI-312763 (see PRACE white paper WP 162, June 9, 2014, for an early version).



Type C: Code development with support from experts from PRACE (1)


HPCWelding

Project Name: HPCWelding
Project leader: Dr. Tobias Loose, Ingenieurbüro Tobias Loose, Germany
Research field: Engineering and Energy
Resource awarded: 50.000 CPU on Hornet



Description: Ingenieurbüro Tobias Loose is an engineering office specialized in simulations of welding and heat treatment, providing consulting for industrial customers, training, and software for customer applications. Welding and heat treatment simulations aim, on the one hand, to determine the final state of the assemblies after the manufacturing processes: we would like to know the distortion, residual stress, material properties and microstructure. On the other hand, these simulations are used to optimize the processes.
Welding simulation models need a fine discretisation in the weld area. Furthermore, the industry has been requesting the analysis of large assemblies as well as the analysis of thick plates with multilayered welds. In addition, welding is a transient process and its numerical analysis involves a large number of time steps. These circumstances lead to welding simulation models for industrial cases with a large number of elements and a large number of time steps. This yields the problem of long simulation times on small computing clusters.
The long duration of welding analyses is a major barrier to the acceptance of this simulation technique in industry. A welding simulation needs to be completed within one week, which requires calculation times within one day. Consulting on welding simulation can be performed in a reasonable manner if the issue of calculation time is resolved.
High performance computing can provide a solution to this issue. The finite element code LS-DYNA offers good performance and permits parallelized computation using domain decomposition. While the parallelized LS-DYNA code has been used successfully in explicit crash analysis with up to 2048 cores, parallelized welding simulation is a new field for the LS-DYNA solver, and it is not clear how the solver behaves when a welding analysis is run at high levels of parallelism.
The goal of this SHAPE project coached by PRACE-4IP WP7 is to check the feasibility of parallelized welding analysis with LS-DYNA and its performance.