Below are the results of the 27th cut-off evaluation (December 2016) for PRACE Preparatory Access.

The awarded projects are listed below by access type.

**Type A: Code scalability testing (12)**

### Scalability of Alya on Piz Daint hybrid nodes

**Project Name**: Scalability of Alya on Piz Daint hybrid nodes

**Project leader**: Dr Guillaume Houzeaux

**Research field**: Engineering

**Resource awarded**: 100000 core hours on Piz Daint

Description

This project aims to analyze the scalability and performance of Alya on the hybrid supercomputer Piz Daint of the Swiss National Supercomputing Centre (CSCS). Alya is a high-performance computational mechanics code designed to solve coupled multi-physics problems. The physics solvable with Alya include incompressible/compressible flow, solid mechanics, chemistry, particle transport, heat transfer, turbulent flows, electrical propagation, etc. Alya is one of the twelve simulation codes of the Unified European Applications Benchmark Suite (UEABS) and thus complies with the highest standards in HPC. It has proven to scale efficiently up to 100K CPU cores for industrial applications. Recently, execution on GPU accelerators has also been enabled in Alya. We have assessed the performance of a hybrid approach on the MinoTauro supercomputer of the Barcelona Supercomputing Center (BSC), using up to 128 GPUs for a single job. This project goes one step further: we want to analyze in detail the performance of Alya on a hybrid Tier-0 system such as Piz Daint in order to understand its capabilities on such a system for future production projects.

### Proton Transfer in Dry Protic Ionic Liquids: Modelling Innovative Materials for Electrochemistry

**Project Name**: Proton Transfer in Dry Protic Ionic Liquids: Modelling Innovative Materials for Electrochemistry

**Project leader**: Prof. Enrico Bodo

**Research field**: Chemical Sciences and Materials

**Resource awarded**: 50000 core hours on Marconi-Broadwell

Description

Ionic liquids (ILs) are salts made of bulky, sterically mismatched molecular ions which possess a low melting point owing to the fact that the electrostatic interactions are weakened and lattice formation is frustrated by geometric effects. In contrast to traditional organic solvents, ILs possess negligible flammability and volatility and represent a new class of “green” solvents that are inherently safer and more environmentally friendly than conventional solvents. A protic ionic liquid is formed through a simple acid–base reaction and the removal of the produced water. When the difference in pKa between the acid and the conjugate acid of the base is large (>10 pKa units), the ensuing liquid is generally completely ionised. In this case, the acidic proton is transferred quantitatively to the base during the acid–base reaction and remains strongly bound to it. Proton mobility is therefore very limited. For this reason, conduction in these liquids is due to ion drifting (Walden mechanism) and is inversely proportional to the liquid viscosity. In order to have a larger conductivity, one has to find a way to promote the formation of different, possibly more mobile, charge carriers. One way to achieve this objective is to have proton transfer from one molecule to another through what is commonly known as the Grotthuss mechanism, where the charge is transferred through H-bond chains. Recently published results from us [1] have revealed that the combination of amino acids in their deprotonated, and thus anionic, form with choline cations ([Ch]) could represent a good candidate for achieving dry proton transfer in ionic liquids. These materials give rise to a novel and potentially important class of protic ionic liquids (PILs) where H-bonds play an important role. The possibility we intend to explore is that of using PILs made of molecular ions that carry an additional protic function. Compounds such as [Ch][Asp] and [Ch][Cys] have these features. The former has a weak acid terminal in addition to the one that is deprotonated in the formation reaction, while the latter has a weak basic proton attached to the sulphur atom.
[1] M. Campetella, E. Bodo, M. Montagna, S. De Santis and L. Gontrani, “Theoretical study of ionic liquids based on the cholinium cation. Ab initio simulations of their condensed phases”, J. Chem. Phys. 144, 104504 (2016).
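The ionicity criterion quoted above follows from the standard acid–base equilibrium (textbook relation, our notation, not the proposal's): for an acid HA reacting with a base B,

```latex
\mathrm{HA} + \mathrm{B} \rightleftharpoons \mathrm{A^-} + \mathrm{BH^+}, \qquad
K = \frac{[\mathrm{A^-}][\mathrm{BH^+}]}{[\mathrm{HA}][\mathrm{B}]} = 10^{\Delta \mathrm{p}K_a}, \qquad
\Delta \mathrm{p}K_a = \mathrm{p}K_a(\mathrm{BH^+}) - \mathrm{p}K_a(\mathrm{HA})
```

so a ΔpKa above 10 corresponds to K > 10¹⁰, i.e. essentially complete proton transfer to the base.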

### Benchmarks on European HPC systems

**Project Name**: Benchmarks on European HPC systems

**Project leader**: Dr Carmen Domene

**Research field**: Chemical Sciences and Materials

**Resource awarded**: 50000 core hours on Marconi-Broadwell, 50000 core hours on Curie, 100000 core hours on Juqueen, 10000 core hours on SuperMUC, 100000 core hours on Piz Daint

Description

The growing resistance of bacteria against conventional antibiotics has led to an intense search for alternatives such as surfactin. The unique feature of surfactin is its cyclic peptide headgroup, which is not completely hydrophilic and confers a certain degree of amphiphilic character. A variety of remarkable applications and physiological activities (antibacterial, antifungal, and hemolytic activities) have been proposed for surfactin, and it has therefore been the focus of increasing experimental and computational efforts for its potential biotechnological and biomedical applications. The goal of this project is to use computer simulations to characterise the aggregation properties of surfactin and its modes of action with model lipid membranes, as well as to quantify its free energy of binding to membranes of characteristic compositions. The computational results, obtained using the NAMD code, will be used to interpret experimental data produced in large-scale European neutron facilities.

### Accurately simulating the physics of the intra-cluster medium

**Project Name**: Accurately simulating the physics of the intra-cluster medium

**Project leader**: Dr Rahul Kannan

**Research field**: Universe Sciences

**Resource awarded**: 100000 core hours on Hazel Hen

Description

Thermal conduction is the process through which internal energy is diffusively transported by collisions of particles. Conduction has been invoked by many authors to explain the low radiative cooling rate in clusters, in terms of a conductive heat flow that offsets the central cooling losses. Another interesting phenomenon is the generation of turbulent pressure support in clusters induced by buoyancy instabilities coupled to thermal conduction. This may drive macroscopic turbulence with a pressure support of ~20% that is claimed to be additive to the turbulence driven by cosmological infall. There also exists the possibility that turbulence driven by the magneto-thermal instability can explain the existence of Mpc-size magnetic fields in the outer parts of clusters. In order to investigate these interesting astrophysical phenomena, we have developed a novel extremum-preserving anisotropic diffusion solver for thermal conduction on the unstructured moving Voronoi mesh of the AREPO code (Kannan et al. 2015c). The method relies on splitting the one-sided facet fluxes into normal and oblique components, with the oblique fluxes being limited such that the total flux is both locally conservative and extremum preserving. Efficient fully implicit and semi-implicit time integration schemes are also implemented. I plan to run a series of high-resolution cluster simulations with MHD and anisotropic thermal conduction turned on in order to accurately simulate and quantify the effect of conduction on cluster scales. Using this Type-A allocation, I will test the performance and scaling of the new implementation on Hazel Hen and Piz Daint.

### Benchmark of NAMD for binding free energy calculations

**Project Name**: Benchmark of NAMD for binding free energy calculations

**Project leader**: Dr. Shunzhou Wan

**Research field**: Biochemistry, Bioinformatics and Life sciences

**Resource awarded**: 100000 core hours on Piz Daint

Description

The project aims to predict the strength of macromolecular binding free energies using computationally based molecular modelling. For successful uptake in drug design and discovery, reliable predictions of binding affinities need to be made on time scales which can influence experimental programmes. Speed is therefore of the essence if we wish to use free-energy-based calculation methods in this area. GPU accelerators can significantly reduce the time required for the calculations. In this project, key proteins in cancer will be studied. Benchmarks of the MD packages NAMD and AMBER will be performed, and the GPU performance will be compared with that of CPU-only machines.

### LESCHATS – Large Eddy Simulation of a CHAnnel flow to study Turbulence and Scalars

**Project Name**: LESCHATS – Large Eddy Simulation of a CHAnnel flow to study Turbulence and Scalars

**Project leader**: Prof. Iztok Tiselj

**Research field**: Engineering

**Resource awarded**: 100000 core hours on Juqueen

Description

In some industrial applications, turbulent flows lead to fluctuating thermal stresses inside neighbouring walls. Among those applications, some are critical (long-term ageing of materials and severe emergency cooling in the nuclear industry, for instance). For such complex applications, investigations often rely on turbulence models. Wall-resolved Large Eddy Simulation (LES) provides valuable insights for understanding the flow physics while providing reliable data to improve turbulence models. We will perform wall-resolved LES of a turbulent channel flow to produce a validation database for turbulence models at a high Reynolds number that includes temperature-related statistics. Most of the publicly available databases for the turbulent channel flow or the turbulent boundary layer do not include any statistics related to temperature fluctuations because no passive/active scalar was solved in the simulation. A few databases do include a passive scalar but, most of the time, the boundary condition on that scalar is an imposed temperature (Dirichlet). A couple of databases include passive scalars with an imposed heat flux (Neumann), but none includes cases with a turbulent heat exchange coefficient (Robin). However, it is established that the boundary condition used for the scalar strongly impacts the near-wall statistics (turbulent heat fluxes, scalar variance and associated dissipation rates). In addition, it was recently established that Robin boundary conditions can mimic some cases with fluid–solid coupling better than Dirichlet or Neumann boundary conditions. Initially, our validation database will include temperature-related statistics for different boundary conditions: imposed temperature (Dirichlet), imposed heat flux (Neumann) and various turbulent heat exchange coefficients (Robin). Later on, it will include cases with fluid–solid coupling.
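For reference, the three scalar boundary conditions discussed above can be written as follows (T the scalar, n the wall-normal direction, λ the conductivity, q_w an imposed wall flux and h a heat exchange coefficient; the notation is ours, not the project's):

```latex
T\big|_{w} = T_w \;\;\text{(Dirichlet)}, \qquad
-\lambda\,\frac{\partial T}{\partial n}\bigg|_{w} = q_w \;\;\text{(Neumann)}, \qquad
-\lambda\,\frac{\partial T}{\partial n}\bigg|_{w} = h\left(T\big|_{w} - T_{\mathrm{ref}}\right) \;\;\text{(Robin)}
```

The Robin condition contains the other two as limits (h → ∞ recovers an imposed temperature, h → 0 with fixed hT_ref an imposed flux), which is why it can better mimic a conjugate fluid–solid heat transfer problem.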

### The role of AGN in cosmic reionisation by galaxies

**Project Name**: The role of AGN in cosmic reionisation by galaxies

**Project leader**: Dr Maxime Trebitsch

**Research field**: Universe Sciences

**Resource awarded**: 50000 core hours on Marconi-Broadwell, 50000 core hours on Curie, 100000 core hours on Hazel Hen

Description

The Epoch of Reionization is the era of the first light sources, when the Universe was less than a billion years old. The nature of these radiative sources is highly debated, but there seems to be a consensus that faint objects just below the current observational limit, which will be observed by the upcoming James Webb Space Telescope, are important contributors to the reionization budget. Even in a scenario where reionization is dominated by galaxies, the presence of massive black holes has been suggested at the centre of these galaxies, and their feedback could affect the ionising efficiency of the galaxies. Our scientific project aims to model high-redshift galaxies and understand how the feedback from a central black hole can affect the transfer of ionising photons from the stars to the intergalactic medium. We will perform high-resolution, adaptive mesh refinement (AMR) cosmological simulations tracking ionising radiation from high-redshift galaxies. We will compare two similar runs, differing only by the inclusion (or not) of black hole physics.

### Performance evaluation of the ultrasound k-Wave toolbox

**Project Name**: Performance evaluation of the ultrasound k-Wave toolbox

**Project leader**: Dr Jiri Jaros

**Research field**: Biochemistry, Bioinformatics and Life sciences

**Resource awarded**: 100000 core hours on Piz Daint

Description

Over the last couple of years, our research group has been developing the k-Wave toolbox. This toolbox targets the full-wave simulation of non-linear ultrasound wave propagation in heterogeneous absorbing media. Since the first beta release in 2010, k-Wave has rapidly become the de facto standard software in the field, with almost 8000 registered users in 60 countries (from both academia and industry). The toolbox now underpins a wide range of international research in ultrasound and photoacoustics, ranging from the reconstruction of clinical photoacoustic images to fundamental studies into the design of ultrasound transducers. The clinically relevant simulations are computationally very demanding. The biggest simulations executed so far have consumed over 3 TB of RAM and 250k core hours on 1024 cores. Recently, we have developed a novel GPU-based simulation code based on local Fourier basis decomposition, which reduces the communication overhead by replacing the global communication executed inside 3D FFTs with local neighbour communications similar to those of classic FDTD methods. We have tested this code on the Anselm supercomputer with 16 K20m GPUs and on the Emerald system with 128 M2070 GPUs. Both scaling tests showed very high parallel efficiency (over 95%), with the GPU code being almost 7 times faster than the CPU code running on the same number of nodes while saving over 70% of the core hours. Unfortunately, the Emerald system is quite outdated, while Anselm only offers 16 GPUs. The goal of this project is to test our simulation code on a large number of recent Pascal GPUs and collect results for the upcoming SC17 supercomputing conference. This project is also the first step towards running large production simulations on GPU clusters.
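The k-space idea underlying this family of solvers can be illustrated in one dimension: for a homogeneous linear medium, a Fourier pseudospectral update with a cos(c·k·Δt) propagator is exact in time for every resolved mode, so there is no accumulating time-stepping error. The sketch below is our own minimal illustration of that principle, not k-Wave code (k-Wave itself handles 3D, heterogeneity, nonlinearity and absorption):

```python
import numpy as np

def kspace_wave_1d(u0, c, dx, dt, nsteps):
    """Propagate u_tt = c^2 u_xx on a periodic grid with a k-space scheme.

    Assumes zero initial velocity, u_t(x, 0) = 0. For a homogeneous medium
    each Fourier mode evolves as cos(c*|k|*t), so the three-term recurrence
    U^{n+1} = 2 cos(c|k|dt) U^n - U^{n-1} is exact in time."""
    n = u0.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)       # angular wavenumbers
    cos_wdt = np.cos(c * np.abs(k) * dt)
    u_hat = np.fft.fft(u0)                           # spectrum at t = 0
    u_hat_prev = cos_wdt * u_hat                     # exact spectrum at t = -dt
    for _ in range(nsteps):
        u_hat, u_hat_prev = 2.0 * cos_wdt * u_hat - u_hat_prev, u_hat
    return np.fft.ifft(u_hat).real
```

A Gaussian pulse propagated this way splits into two counter-propagating pulses that match the d'Alembert solution to spectral accuracy, regardless of the time step; this exactness for homogeneous media is what distinguishes k-space schemes from ordinary finite-difference time stepping.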

### 3D MHD Simulation of the Kruskal-Schwarzschild Instability in a Strongly Magnetized Outflow

**Project Name**: 3D MHD Simulation of the Kruskal-Schwarzschild Instability in a Strongly Magnetized Outflow

**Project leader**: Dr Ramandeep Gill

**Research field**: Universe Sciences

**Resource awarded**: 50000 core hours on Curie, 100000 core hours on Hazel Hen, 100000 core hours on SuperMUC

Description

Relativistic outflows occur in a wide variety of astrophysical systems. These sources probe extreme physical conditions that are beyond our reach on Earth, such as strong gravity, very large densities and magnetic fields, extremely energetic particles, and relativistic bulk motions. They are promising sources of gravitational waves or high-energy neutrinos, and most likely produce the highest energy cosmic rays. However, their origin and inner workings are still poorly understood. A widely accepted hypothesis is that all of these sources share a common mechanism originally developed for pulsar winds. Namely, it is assumed that relativistic jets in AGNs, GRBs, and micro-quasars are driven by rotating, twisted magnetic fields that transfer the rotational energy to large distances in the form of a Poynting flux. In pulsar winds and AGNs there is good evidence that the jet is indeed initially highly magnetized, while for GRBs and micro-quasars the jet composition, and in particular its initial degree of magnetization is unclear, and therefore of great interest. The focus of this preparatory work is on magnetically dominated jets in which energy is dissipated through the process of magnetic reconnection that is facilitated by plasma instabilities and/or magneto-hydrodynamic (MHD) turbulence. An attractive mechanism that may efficiently dissipate magnetic energy and at the same time contribute to the acceleration of the Poynting flux dominated outflow is the Kruskal-Schwarzschild (KS) plasma instability. This instability has only been studied analytically (Lyubarsky 2010) in the context of strongly magnetized relativistic jets, where the analytic treatment only considered the simplest case of a striped wind with exactly anti-aligned magnetic field lines. 
More complex and realistic scenarios, where the magnetic field lines can have a general misalignment angle and where the plasma modes can develop at different angles with respect to the magnetic field lines, can only be realized numerically with the help of computationally expensive 3D MHD numerical simulations. To this end, we aim to use the preparatory access to study this instability using the publicly available MHD code Athena. We have already conducted 2D simulations of the KS instability, which allowed us to simulate only the exactly anti-aligned magnetic field lines case. These simulations were performed on a single node with 8 cores and thus restricted us to low-resolution runs. High-resolution 3D simulations of strongly magnetized relativistic plasmas are still challenging, and therefore during the preparatory access the code will be tested at different resolutions to see if it is able to resolve the instability at physically important scales. A key test for the simulations will be the comparison of the growth rate of the instability in the linear stage with that obtained from analytic linear stability analysis (which we have already calculated). This will guide us to the minimum resolution needed to resolve the instability so that a longer-running simulation, which will explore the non-linear development, can be set up during the production runs. This preparatory access will also be used to conduct a parameter-space study to prepare for the higher-resolution publication runs.

### Two dimensional fission landscapes with the Gogny force for r-process abundances calculation.

**Project Name**: Two dimensional fission landscapes with the Gogny force for r-process abundances calculation.

**Project leader**: Dr Jean-François Lemaître

**Research field**: Universe Sciences

**Resource awarded**: 50000 core hours on Curie

Description

The nuclear fission process is a fundamental mechanism for our understanding of the nucleosynthesis of the elements heavier than iron by the so-called rapid neutron capture process, or r-process, which remains one of the main puzzles of modern astrophysics. More specifically, fission plays a major role in explaining the final r-process abundance distribution, in particular the abundances resulting from the decompression of high-density matter dynamically ejected from neutron star merging systems [1]. From the theoretical point of view, despite more than 70 years of research, fission is one of the least well understood processes in low-energy nuclear physics. The recent discovery of asymmetric fission in the light mercury region [2] and the controversy over its interpretation [3-4] is one illustration among others of the difficulties one has to face. In practical applications, it remains of major importance to be able to estimate accurately the probability that fission occurs when competing with other decay channels, or the number of neutrons released during the fission process. Almost all existing evaluations of the fission observables rely on the multiple-humped fission penetration model, where barriers are described by inverted decoupled parabolas [5]. Such approaches consider all ingredients as free parameters in order to be able to achieve more or less accurate fits to experimental cross sections. Although such adjustments respond to the needs of crucial nuclear applications (in particular, nuclear energy production and nuclear waste transmutation), their predictive power remains poor due to the large number of free parameters and, therefore, these methods cannot be used in applications requiring a purely theoretical description of fission for experimentally unknown nuclei, such as nuclear astrophysics. Our goal thus consists in the computation of potential energy surfaces (PES) as a function of the nucleus elongation and asymmetry for a large number of nuclei (i.e. from Z=80 to Z=110), and from the neutron to the proton drip line, using a microscopic quantum mechanical approach starting from the sole Gogny effective nucleon-nucleon interaction. The fission paths deduced from these PES will provide the fundamental nuclear ingredients required for the calculations of neutron-induced fission cross sections, as well as spontaneous and beta-delayed fission rates [6]. It is worth mentioning that such a systematic study over a huge range of nuclei has never been performed before using a purely microscopic approach, mainly because of the computing time it requires.
[1] S. Goriely et al., “New Fission Fragment Distributions and r-Process Origin of the Rare-Earth Elements”, Phys. Rev. Lett. 111, 242502 (2013). [2] A. N. Andreyev et al., “New Type of Asymmetric Fission in Proton-Rich Nuclei”, Phys. Rev. Lett. 105, 252502 (2010). [3] P. Möller et al., “Calculated fission yields of neutron-deficient mercury isotopes”, Phys. Rev. C 85, 024306 (2012). [4] S. Panebianco et al., “Role of deformed shell effects on the mass asymmetry in nuclear fission of mercury isotopes”, Phys. Rev. C 86, 064601 (2012). [5] D. L. Hill and J. A. Wheeler, “Nuclear constitution and the interpretation of fission phenomena”, Phys. Rev. 89, 1102 (1953). [6] A. J. Koning et al., “TALYS-1.0”, Proceedings of the International Conference on Nuclear Data for Science and Technology, April 22-27, 2007.

### Computational model of plasticity in the cerebellar input layer

**Project Name**: Computational model of plasticity in the cerebellar input layer

**Project leader**: Dr Jesus Garrido

**Research field**: Mathematics and Computer Sciences

**Resource awarded**: 50000 core hours on Marconi-Broadwell, 10000 core hours on Juqueen, 100000 core hours on Piz Daint

Description

The ability to perceive and understand the state of the surrounding environment and of one's own body is critical for next-generation robotic systems. To that aim, the human brain is still far beyond current artificial systems' performance owing to its capability of processing huge amounts of heterogeneous sensorial data. Interestingly, the cerebellum has been shown to play a crucial role in the generation of dexterous movements, as evidenced by cerebellar ataxic patients. Behavioural studies suggest that the cerebellum actively improves sensorial discrimination and proprioception thanks to the prediction of the sensorial consequences of actions. In the last decade, several forms of long-term synaptic plasticity have been observed within the cerebellum, suggesting that distributed plasticity could support this predictive action. However, the way in which those mechanisms cooperate to improve the function of the whole cerebellar network is not completely understood. In this project, a novel theory of sensorial information representation and processing based on the cerebellar architecture will be developed. The proposed model will make use of long-term synaptic plasticity mechanisms distributed along the connections of the cerebellar input layer (granular layer) to iteratively create sparse representations of the information, allowing fast and effective learning in successive layers. The predictions extracted from this model will be useful to design new experimental protocols to unveil the cerebellar role in acting and sensing. The cerebellar model will be mainly focused on the plastic capabilities that each synaptic layer has been shown to maintain. Strangely enough, the plastic properties of cerebellar synapses have long been ignored in computational modelling (with the exception of parallel-fibre plasticity), mainly due to the absence of sufficiently powerful computational resources. One of the main issues that computational modelling of synaptic plasticity faces nowadays is the large amount of simulation time it requires. Although synaptic plasticity in neuronal systems occurs along different time scales [Garrido et al., 2013], the sensorial discrimination process has been shown to require long-term plasticity (e.g., consider the time a newborn takes to correctly discriminate and coordinate its own movements). Thus, the study of synaptic plasticity mechanisms in biological systems will require stimulating the neuronal networks with realistic patterns of activity emulating (as much as possible) the real signals that sensorial systems provide. 
This activity will take place in the framework of the two following research projects, in which computational modelling plays a main role and where the requested HPC facilities will be crucial: the Human Brain Project (HBP), granted to Prof. Ros at the University of Granada and Prof. D’Angelo at the University of Pavia from 1/04/2016 to 31/03/2018 (during the SGA1 phase, Prof. Ros’ group develops cerebellar models for the neurorobotics platform (SP10), while Prof. D’Angelo’s group is involved in the development of cerebellar simulation models (SP6)); and CEREBSENSING: Cerebellar Distributed Plasticity Towards Active Sensing and Motor Control, recently granted to Dr. Garrido by the European Commission (Marie Curie IF Actions), from late 2015 to early 2018. This project will allow the research group to develop novel cerebellar models where synaptic plasticity (distributed in synapses along the whole cerebellar network) will play a key role, and to explore its influence in active sensing and motor control.

### Electron transport through blue copper azurin

**Project Name**: Electron transport through blue copper azurin

**Project leader**: Dr Linda Angela Zotti

**Research field**: Chemical Sciences and Materials

**Resource awarded**: 50000 core hours on Curie

Description

The first measurements of the electrical conductance through single proteins have been performed in the past few years using the STM (Scanning Tunneling Microscopy) break-junction technique [1]. This field has so far lacked theoretical support, mainly because proteins are very complex systems. Within the project “Protein Based Electronics”, which has been granted by the Spanish Ministry programme for young researchers (Ref. MAT2014-58982-JIN), I intend to tackle this issue. Research on the conductive properties of proteins has up until now focused mainly on their possible employment as active elements in electronic devices (switches, nanocables, sensors, etc.) [2]. Nevertheless, it can be foreseen that electron current measurements may in future provide information about structural differences between proteins, useful for medical purposes. For instance, it is well known that the activity of a protein strongly depends on its conformation, which can vary with small changes in the sequence. In cancerous tissues, some critical proteins were found to be damaged or inactive; incorrect protein folding is known to be a possible cause of various kinds of disease. Since previous theoretical and experimental studies have shown a strong dependence of the electron current on the geometry of the molecular backbone for various kinds of organic molecules, the same is expected for proteins. Electron transport through proteins is known to consist of two main components, namely a coherent part (tunneling) and an incoherent part (hopping). From a theoretical point of view, while incoherent transport has been extensively analyzed, this is not the case for the coherent component, which has only been tackled in a very limited number of studies, with methods based on strong approximations. Instead, it needs a proper analysis which can provide information about the nature of the transport, for instance whether it is electron or hole transport and which specific orbital carries the current. In this project, I am focusing on this aspect, studying the coherent transport through protein-based junctions and exploring its structural dependence. DFT (Density Functional Theory) and NEGF (Non-Equilibrium Green’s Functions) techniques will be used. I will carry out the task in collaboration with Juan Carlos Cuevas, an expert in the theory of molecular electronics, and with Ruben Perez and Jose Guilherme Vilhena, who have acquired extensive experience in molecular dynamics applied to biophysics.
[1] Artés, Juan M., Ismael Díez-Pérez, and Pau Gorostiza, “Transistor-like Behavior of Single Metalloprotein Junctions”, Nano Letters 12.6 (2011): 2679-2684. [2] Chen, Yu-Shiun, Meng-Yen Hong, and G. Steven Huang, “A protein transistor made of an antibody molecule and two gold nanoparticles”, Nature Nanotechnology 7.3 (2012): 197-203.
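For context, the coherent (tunneling) contribution referred to above is conventionally obtained within the NEGF framework from the standard Landauer expression (textbook form; this is background, not a statement about the project's specific implementation):

```latex
G = G_0\,T(E_F), \qquad
T(E) = \operatorname{Tr}\!\left[\Gamma_L(E)\,G^{r}(E)\,\Gamma_R(E)\,G^{a}(E)\right], \qquad
G_0 = \frac{2e^2}{h}
```

where G^r and G^a are the retarded and advanced Green's functions of the junction (built here from a DFT description of the protein) and Γ_L, Γ_R describe the coupling to the two electrodes. Resolving T(E) into molecular-orbital contributions is what reveals whether transport is electron- or hole-like and which orbital carries the current.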

**Type B: Code development and optimization by the applicant (without PRACE support) (11)**

### Improving the Performance of Canonical Polyadic Decomposition of Tensors on Distributed Memory Systems

**Project Name**: Improving the Performance of Canonical Polyadic Decomposition of Tensors on Distributed Memory Systems

**Project leader**: Prof. Dr. Cevdet Aykanat

**Research field**: Mathematics and Computer Sciences

**Resource awarded**: 100000 core hours on Marconi-Broadwell, 200000 core hours on Curie, 250000 core hours on Hazel Hen, 250000 core hours on Juqueen

Description

Tensors are multidimensional arrays and arise in many applications. Most of the time, the applications that use tensors require decomposing them into components, which is accomplished by the Canonical Polyadic Decomposition (CPD) [1,2,3]. CPD approximates a given tensor as the sum of R rank-one tensors, where R denotes the rank of the decomposition. The problem of finding the CPD is then that of finding the vectors whose outer products give the rank-one tensors; those vectors can be organized as n factor matrices, where n is the number of tensor dimensions. The most popular method for computing the CPD of a given tensor is the Alternating Least Squares (ALS) method, which recomputes each of the factor matrices at each iteration. While recomputing a factor matrix, ALS needs to multiply the tensor with the remaining n-1 factor matrices, which is an expensive operation in terms of both time and space. To efficiently parallelize the ALS operation on a given tensor for distributed memory systems, the computational dependencies of the processors should be taken into account carefully. Our aim is to reorder the tensor indices in such a way that each processor is assigned a disjoint subset of nonzeros and of the rows of the factor matrices, where the computational dependencies arising from the nonzeros owned by each processor are upper bounded by a limit. In our proposed model, those limited computational dependencies are encoded via a hypergraph partitioning model. The proposed hypergraph partitioning model minimizes the total communication volume of the processors for communicating the factor matrix rows needed by other processors to complete their assigned multiplications. We also aim at maintaining a balance on the computational workloads of the processors, which is again ensured by the proposed hypergraph model.
We plan to reorder a given tensor using the proposed hypergraph partitioning model, distribute the tensor nonzeros and factor matrix rows accordingly, and run the parallel ALS algorithm. We will compare the proposed tensor decomposition scheme against the state-of-the-art ALS method [4], which is reported to perform best among distributed memory implementations of ALS.
References:
[1] A. H. Andersen and W. S. Rayens. Structure-seeking multilinear methods for the analysis of fMRI data. NeuroImage, 22(2):728–739, 2004.
[2] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka, and T. M. Mitchell. Toward an architecture for never-ending language learning. In Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010.
[3] Y. Shi, A. Karatzoglou, L. Baltrunas, M. Larson, A. Hanjalic, and N. Oliver. TFMAP: Optimizing MAP for top-n context-aware recommendation. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '12, pages 155–164, New York, NY, USA, 2012. ACM.
[4] S. Smith and G. Karypis. A medium-grained algorithm for distributed sparse tensor factorization. 30th IEEE International Parallel & Distributed Processing Symposium, 2016.
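The ALS update described above can be illustrated, at small scale and without the distributed-memory machinery, by a dense NumPy sketch of CP-ALS for a 3-way tensor. The function names and the pseudo-inverse-based update are illustrative assumptions, not the project's actual implementation:

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Khatri-Rao product of B (J x R) and C (K x R) -> (J*K x R)."""
    R = B.shape[1]
    return np.einsum('jr,kr->jkr', B, C).reshape(-1, R)

def cp_als(X, R, n_iter=200, seed=0):
    """Rank-R CP decomposition of a dense 3-way tensor X by alternating
    least squares: each factor matrix is recomputed in turn from the
    matricized tensor times the Khatri-Rao product of the other two."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    X1 = X.reshape(I, J * K)                      # mode-1 unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)   # mode-2 unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)   # mode-3 unfolding
    for _ in range(n_iter):
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

In a distributed setting, each of the three matricized multiplications above is exactly the step whose nonzero and factor-row placement the hypergraph model optimizes.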

### Open MP/MPI Parallelization of a Kinetic Octree Multi Species Polyatomic Code (KOPPA)

**Project Name**: Open MP/MPI Parallelization of a Kinetic Octree Multi Species Polyatomic Code (KOPPA)

**Project leader**: Prof Angelo Iollo

**Research field**: Mathematics and Computer Sciences

**Resource awarded**: 100000 core hours on Marconi-Broadwell, 200000 core hours on Curie

Description

The objective of this project is to develop a hybrid OpenMP/MPI version of a rarefied flow dynamics code (KOPPA). The code is currently fully parallel with MPI in the physical space dimensions. The kinetic degrees of freedom are an ideal candidate for shared-memory parallelization. The objective of this preparatory access project is to develop and test such a hybrid approach to parallelism for Boltzmann-like models. This improvement will significantly increase the code's scalability and hence its potential impact in real applications such as atmospheric entry problems.

### Fast two-dimensional sparse matrix decomposition based on graph partitioning

**Project Name**: Fast two-dimensional sparse matrix decomposition based on graph partitioning

**Project leader**: Prof. Cevdet Aykanat

**Research field**: Mathematics and Computer Sciences

**Resource awarded**: 100000 core hours on Marconi-Broadwell, 250000 core hours on Hazel Hen, 250000 core hours on Juqueen

Description

Solving large sparse linear systems on distributed systems necessitates a decomposition of the sparse matrix among processors. If the discretized application is irregular, it usually pays off to model the sparse matrix as a graph or hypergraph and partition it with a partitioning tool, since the communication overhead incurred in parallelization is hard to predict from the irregular sparsity pattern of the matrix. The available partitioning tools generally aim at reducing the total amount of communication incurred in solving the linear system. Several tools exist for this purpose: Metis, PaToH and Scotch are the most commonly used sequential ones, while ParMetis and Zoltan are the common parallel tools. There are many ways to decompose the sparse matrix in this context via graph and hypergraph models. These models can broadly be categorized as one-dimensional (1D) and two-dimensional (2D) according to the smallest indivisible computational tasks they utilize. In 1D models, the computational tasks are defined on whole rows or columns of the sparse matrix, while in 2D models they are defined at finer granularity, i.e., on nonzeros or on subsets of nonzeros in a row or column. The advantage of 2D partitioning models is that they offer more flexibility in reducing communication overheads. However, since they possess more computational tasks, they are also bigger in size than their 1D counterparts, which results in increased partitioning overhead. Due to the inherent difficulty of representing the sparse matrix for 2D partitioning, and the several different approaches to doing so, only the more general hypergraph models [1][2] have found use in the literature. Partitioning hypergraphs is, however, more expensive than partitioning graphs, usually by a factor of two to four. 
Considering also the increased size of models for 2D partitioning, using hypergraph models for 2D partitioning becomes expensive due to the high partitioning overhead. In this project, we aim to reduce the high partitioning overhead of 2D models via parallel graph partitioning. Our graph model is novel in the sense that the vertices of the graph correspond to subsets of nonzeros in rows/columns in order to achieve a 2D decomposition. This graph model aims to obtain a Cartesian decomposition of the matrix in an intelligent manner, in which the total volume of communication is minimized and a bound is provided on the maximum number of messages communicated. In this way, our model is expected to obtain partitions of superior quality considerably faster than existing 2D models. We plan to realize our model with ParMetis [3], a successful parallel graph partitioner commonly used by computational scientists.
References:
[1] Çatalyürek, Umit V., Cevdet Aykanat, and Bora Uçar. "On two-dimensional sparse matrix partitioning: Models, methods, and a recipe." SIAM Journal on Scientific Computing 32, no. 2 (2010): 656–683.
[2] Selvitopi, Oguz, and Cevdet Aykanat. "Reducing latency cost in 2D sparse matrix partitioning models." Parallel Computing 57 (2016): 1–24.
[3] Karypis, George, Kirk Schloegel, and Vipin Kumar. "ParMETIS: Parallel graph partitioning and sparse matrix ordering library." Version 1.0, Dept. of Computer Science, University of Minnesota (1997).
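To make the cost model concrete, here is a small, hedged Python sketch (not the project's code) of how the total communication volume of a parallel sparse matrix-vector multiply can be counted under a 1D row-wise partition — the quantity that both 1D and 2D partitioning models try to minimize:

```python
def spmv_comm_volume(nonzeros, row_part):
    """Total communication volume of y = A*x under a 1D row-wise partition:
    processor row_part[i] owns row i and the vector entry x[i]. Each x[j]
    must be sent once to every non-owner processor holding a nonzero in
    column j, so the volume is the number of such (entry, processor) pairs."""
    need = {}  # column j -> set of processors that need x[j]
    for i, j in nonzeros:
        need.setdefault(j, set()).add(row_part[i])
    return sum(len(procs - {row_part[j]}) for j, procs in need.items())
```

For example, a 4x4 matrix with nonzeros {(0,0), (0,2), (1,1), (2,2), (3,0), (3,3)} split between two processors as `row_part = [0, 0, 1, 1]` requires sending x[0] to processor 1 and x[2] to processor 0, for a volume of 2.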

### Wall modeling for Finite Element LES

**Project Name**: Wall modeling for Finite Element LES

**Project leader**: Dr Herbert Owen Coppola

**Research field**: Engineering

**Resource awarded**: 100000 core hours on Marconi-Broadwell, 200000 core hours on Curie, 100000 core hours on SuperMUC

Description

In the near future, we will use the prodigious potential offered by the ever-growing computing infrastructure to foster and accelerate the European transition to a reliable and low-carbon energy supply. We are fully committed to this goal through the establishment of an Energy Oriented Centre of Excellence for computing applications (EoCoE), under the on-going contract H2020-EINFRA-2015-1. EoCoE aims to assist the energy transition via targeted support to four renewable energy pillars: Meteo, Materials, Water and Fusion, each with a heavy reliance on numerical modelling. The primary goal of EoCoE is to create a new, long-lasting and sustainable community around computational energy science. To this end, we are resolving the current bottlenecks in application codes, leading to new modelling capabilities and scientific advances among the four user communities. Furthermore, we are developing cutting-edge mathematical and numerical methods and tools to foster the usage of Exascale computing. For the EoCoE project, we are particularly interested in improving our turbulent flow simulation code for complex topographies. This code is already used successfully by the renewable energy company IBERDROLA to improve the production of their wind farm designs. The current state of the art for wind farm assessment is to use RANS (Reynolds-averaged Navier-Stokes) turbulence models. It is well known that these suffer from limitations for flows with important separation, as happens in complex terrain. With the advent of Exascale computing, LES (Large Eddy Simulation), which provides better accuracy at a higher computational cost, becomes an interesting alternative. For the atmospheric boundary layer, wall modeling is key to making LES feasible due to the very high Reynolds numbers involved. The standard implementation of the wall law model in the finite element context provides poor results for LES. 
We have been working on an improved implementation that must now be tested on Atmospheric Boundary Layer problems of interest for the wind energy community.
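As an illustration of the kind of wall law referred to above, the following Python sketch inverts the standard logarithmic law of the wall for the friction velocity by fixed-point iteration; the constants (kappa, B) and the iteration scheme are textbook choices, not necessarily those used in the project's code:

```python
import math

def friction_velocity(u, y, nu, kappa=0.41, B=5.2, n_iter=50):
    """Solve the log law  u/u_tau = (1/kappa) * ln(y * u_tau / nu) + B
    for the friction velocity u_tau, given the tangential velocity u at
    wall distance y and the kinematic viscosity nu."""
    u_tau = math.sqrt(nu * u / y)  # viscous-sublayer initial guess
    for _ in range(n_iter):
        u_tau = u / ((1.0 / kappa) * math.log(y * u_tau / nu) + B)
    return u_tau
```

A wall-model LES applies this at every wall face to convert the resolved near-wall velocity into a wall shear stress tau_w = rho * u_tau**2, instead of resolving the viscous sublayer.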

### Increased Asynchrony for Adaptive Mesh Refinements

**Project Name**: Increased Asynchrony for Adaptive Mesh Refinements

**Project leader**: Asst. Prof. Didem Unat

**Research field**: Mathematics and Computer Sciences

**Resource awarded**: 200000 core hours on Marconi-KNL, 250000 core hours on Hazel Hen

Description

Adaptive Mesh Refinement (AMR) is a well-known method for efficiently solving PDEs. A straightforward AMR algorithm typically exhibits many synchronization points, where costly communication often degrades performance and inhibits scalability. We are developing a runtime system to hide the effects of communication in an AMR application, optimizing the synchronous AMR algorithm in the BoxLib software framework without severely affecting the productivity of the application programmer.

### Communication and Multicore Optimization of Multiphase Flow Simulations

**Project Name**: Communication and Multicore Optimization of Multiphase Flow Simulations

**Project leader**: Asst. Prof. Didem Unat

**Research field**: Engineering

**Resource awarded**: 200000 core hours on Marconi-KNL

Description

The goal of the proposed project is to reduce the data movement overhead within a shared-memory node as well as across distributed nodes for large-scale flow simulations. Fluid mechanics plays an important role in natural events and engineering applications, either explicitly or implicitly. There are two reasons why we chose to study flow simulations in this project: 1) mathematical models based on the Navier-Stokes equations are accurate and reliable, but their flow simulations generally require parallel platforms for realistic solutions; as a result, efficient parallelization of flow simulations is of great importance to industrial applications and, in turn, to the national economy. 2) Simulations using hybrid Eulerian-Lagrangian approaches, such as the ones within the scope of this project, are particularly complex in terms of their communication pattern: while Lagrangian particles move with the local flow, they interact both with the Eulerian system and with other particles. For this reason, Eulerian-Lagrangian flow simulations are attractive and challenging from a research perspective.

### Tiling Based Programming Model for GPUs

**Project Name**: Tiling Based Programming Model for GPUs

**Project leader**: Asst. Prof. Didem Unat

**Research field**: Mathematics and Computer Sciences

**Resource awarded**: 100000 core hours on Piz Daint

Description

In this project we are extending a synchronous tiling library to run asynchronously on heterogeneous systems. We focus on the development of an asynchronous runtime system and its scheduler. Previously we developed the TiDA library, which runs only on homogeneous multicore systems; we are now extending it to exploit GPU systems. With the help of the PRACE systems, we will develop and test its performance on emerging parallel architectures using our combustion simulations.

### Computational design of nanobodies for the molecular recognition of protein biomarkers

**Project Name**: Computational design of nanobodies for the molecular recognition of protein biomarkers

**Project leader**: Dr MIGUEL SOLER

**Research field**: Biochemistry, Bioinformatics and Life sciences

**Resource awarded**: 100000 core hours on Marconi-Broadwell, 200000 core hours on Marconi-KNL, 200000 core hours on Curie, 250000 core hours on Hazel Hen

Description

In the last two decades, biomarkers have emerged as promising tools in cancer diagnosis. Typical receptors capable of recognizing protein biomarkers are antibodies. However, while monoclonal antibodies are highly successful in therapeutics and diagnostics, they are costly, since they have to be optimized in vivo. Another important drawback is their low specificity in discriminating different isoforms of the same protein biomarker, leading to erroneous diagnosis/prognosis signals [1]. The goal of this project is to provide an alternative method for designing receptors for protein biomarkers. Specifically, we will demonstrate that in silico design can generate binders able to discriminate between closely related protein isoforms by guiding their binding to selective epitopes on tumor biomarkers. To this aim, we will use as alternative receptors nanobodies (VHH), the smallest antibody fragments that still preserve the binding capacity of the whole original antibody from which they derive [2]. VHH are easy to engineer and are already widely used in the development of diagnostic and therapeutic reagents [1,2]. Their single domain of 120-135 amino acids can be modelled with little computational effort compared to that required for larger antibodies (>1200 amino acids), and with greater accuracy, allowing time- and cost-effective in silico optimization and screening. Moreover, since molecular modelling allows control over the target binding site, any epitope of choice can be selected to capture a target biomarker. Our computational protocol for the design of high-specificity binders is an adapted version of our previous design protocol, which has proven successful for optimizing peptides for drugs [3] and protein recognition [4]. The design approach is based on a combination of molecular dynamics (MD), cluster analysis, binding scoring and replica exchange Monte Carlo [4]. 
As a proof of concept, we will design VHH for the recognition of HER2, a protein biomarker overexpressed in ovarian, breast, stomach, and uterine cancer [5]. Starting from a low-affinity VHH acquired from a synthetic library, we will optimize the VHH-protein binding by iteratively performing: (1) a single random mutation in the VHH binding region sequence; (2) an MD simulation in explicit solvent of the VHH-protein complex; (3) the evaluation of the binding affinity of the new sequence using a scoring function; and (4) a Metropolis move to accept or reject the mutation, based on the comparison of the binding energies of the original and mutated VHH with the protein. This preparatory access project is a preliminary benchmark for the adaptation and optimization of the VHH design algorithm for the high-specificity recognition of protein biomarkers, possibly leading to a production project to be submitted in the next calls.
References:
[1] Tsé C, et al. Cancer Treat. Rev. 2012;38:133-42.
[2] Muyldermans S. Annu. Rev. Biochem. 2013;82:775-97.
[3] Gladich I, et al. J. Phys. Chem. B. 2015;119:12963-9.
[4] Soler M, et al. Phys. Chem. Chem. Phys. 2017;19:2740-8.
[5] Lee CK, et al. J. Clin. Oncol. 2016;34:936-44.
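Steps (1)-(4) form a standard Metropolis Monte Carlo loop over sequence space. Below is a minimal Python sketch; the `mutate` and `score` callables are hypothetical stand-ins for the mutation step and the MD-based scoring function, which in practice dominate the cost:

```python
import math
import random

def metropolis_design(seq, mutate, score, n_steps=100, beta=1.0, seed=0):
    """Iterative binder design loop: propose a single random mutation,
    score the mutated sequence, and accept or reject with the Metropolis
    criterion on the binding energy (lower is better)."""
    rng = random.Random(seed)
    e = score(seq)
    for _ in range(n_steps):
        cand = mutate(seq, rng)        # (1) single random mutation
        e_cand = score(cand)           # (2)+(3) simulate and score (stubbed)
        # (4) Metropolis: accept downhill moves always,
        #     uphill moves with probability exp(-beta * dE)
        if e_cand <= e or rng.random() < math.exp(-beta * (e_cand - e)):
            seq, e = cand, e_cand
    return seq, e
```

The occasional acceptance of uphill moves is what lets the search escape local minima in the binding-energy landscape, at a rate controlled by `beta`.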

### Plasmonics on the quantum level: ab initio study of surface plasmons in noble metals (gold, silver and copper)

**Project Name**: Plasmonics on the quantum level: ab initio study of surface plasmons in noble metals (gold, silver and copper)

**Project leader**: Dr Nathalie VAST

**Research field**: Fundamental Physics

**Resource awarded**: 200000 core hours on Curie

Description

Our project is devoted to the ab initio study of surface plasmons using time-dependent density-functional perturbation theory (TDDFPT) and the linear response approach. In order to characterize surface plasmons we simulate electron energy loss spectra (EELS) of the material using the Quantum ESPRESSO package for density functional theory (DFT) calculations, and specifically the turboEELS code for TDDFPT calculations. In our implementation of TDDFPT, we use the Liouville-Lanczos (LL) recursion scheme to compute, for a given value of the transferred momentum q, the frequency-dependent charge-density susceptibility. When q = 0, the susceptibility is the electronic response to an (optical) excitation with light, while the case of finite q corresponds to excitation by an electron beam, as in a scanning electron microscope. The global aim of our project is to perform large-scale ab initio calculations of the EELS for high-Miller-index surfaces of noble metals, in particular the (788) gold surface. We would like to characterize the surface plasmons of this system, particularly those whose frequency behaves linearly in the vanishing magnitude of the in-plane transferred momentum q: the so-called acoustic surface plasmons, which have the property of localizing and enhancing the electromagnetic field, hence their potential use in plasmonics. In the present preparatory project, we plan to study and optimize the performance of the turboEELS code of the Quantum ESPRESSO package in order to prepare the above-mentioned large-scale calculations. To this end we aim at performing turboEELS calculations for the (111) silver and gold surfaces, determining the convergence parameters and studying the scaling of our code when increasing the number of atoms in the system. 
Thus, in order to handle large systems, we plan to optimize our code on the (111) surface and make sure that we reach the best possible performance before launching large-scale calculations on the (788) surface. We are presently developing two new features of the turboEELS code: a dedicated algorithm for the case of exactly zero transferred momentum q = 0, and a dedicated algorithm that takes into account spin-orbit coupling together with ultrasoft pseudopotentials. We have already performed tests on Tier-1 clusters and Tier-0 HPC systems, and new MPI optimizations are also planned within the framework of the present project. The first dedicated algorithm (the q = 0 case) is already implemented and needs to be tested; it will allow us to double the code performance in the optical case. The second dedicated algorithm will allow pertinent calculations for heavy materials in which relativistic effects are important. In gold, for instance, spin-orbit coupling yields a Rashba splitting in the band structure, which has not been accounted for so far in EELS calculations. It is hence mandatory to study the role of spin-orbit coupling in order to compute accurate spectra over a wide range of frequencies.
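The core of the Liouville-Lanczos idea — building a tridiagonal representation by a three-term recursion and reading the response function off a continued fraction — can be illustrated on a finite-dimensional Hermitian analogue. This is only a sketch: the actual turboEELS implementation works with the Liouvillian superoperator and a non-Hermitian variant of the recursion:

```python
import numpy as np

def lanczos_coeffs(H, v, m):
    """m-step Lanczos recursion: tridiagonalize Hermitian H in the Krylov
    space of v. Returns diagonal a[0..m-1] and couplings b[0..m-2]."""
    v = v / np.linalg.norm(v)
    v_prev = np.zeros_like(v)
    a, b = [], []
    beta = 0.0
    for _ in range(m):
        w = H @ v - beta * v_prev
        alpha = np.vdot(v, w).real
        w = w - alpha * v
        a.append(alpha)
        beta = np.linalg.norm(w)
        b.append(beta)
        if beta == 0.0:          # Krylov space exhausted
            break
        v_prev, v = v, w / beta
    return np.array(a), np.array(b[:-1])

def resolvent(a, b, z):
    """<v|(z - H)^{-1}|v> evaluated as a continued fraction from the
    Lanczos coefficients, as in recursion-based response calculations."""
    g = 0.0
    for j in range(len(a) - 1, -1, -1):
        tail = (b[j] ** 2) * g if j < len(b) else 0.0
        g = 1.0 / (z - a[j] - tail)
    return g
```

With z = omega + i*eta, the imaginary part of the resolvent gives the spectral (loss) function at frequency omega; the appeal of the scheme is that only the short recursion, not the full spectrum, is ever needed.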

**Type C: Code development with support from experts from PRACE (2)**

### OPmized mulTI-fluid plasMA Solver (OPTIMAS)

**Project Name**: OPmized mulTI-fluid plasMA Solver (OPTIMAS)

**Project leader**: Dr Andrea Lani

**Research field**: Universe Sciences

**Resource awarded**: 250000 core hours on Hazel Hen, 250000 core hours on Curie

Description

A massively parallel numerical solver for the simulation of multi-fluid, chemically reacting and radiative magnetized plasmas on unstructured grids has been under development since 2013 within the open source COOLFluiD platform. The C++ code, which is unique of its kind and results from a collaboration among VKI, KU Leuven and NASA researchers, is unsteady and time-implicit, and its main target applications include computationally intensive 3D simulations of crucial phenomena such as magnetic reconnection, wave propagation through the solar atmosphere and solar wind/Earth magnetosphere interactions. Before moving towards more realistic, scale-resolving 3D simulations, which require prohibitively large meshes to fully capture the physics of the phenomena we want to investigate (millions of grid points are already needed in 2D cases), a careful performance assessment and optimization is needed on Tier-0 systems. The use of the latter is mandatory for fulfilling the memory and run-time requirements of our target 3D test cases, which will help our research move forward and which are expected to be groundbreaking in the field. During this project, we will test and optimize our MPI-based parallel algorithms (i.e. parallel mesh extrusion from 2D to 3D, mesh partitioning), including I/O (i.e. parallel reading of input mesh data files, parallel writing of solution files in different formats), for some target, realistic 3D setups involving 18 advection-diffusion-reaction partial differential equations and more than 1 billion mesh points, to be run on up to 100,000 CPU cores. 
Getting access to three different architectures (IBM BlueGene/Q, CRAY, IBM System X iDataPlex) will enable us to:

- ensure portability of our code to those architectures, which may require tuning COOLFluiD's CMake-based configuration system and implementing some compilation-related fixes;
- compare simulation and I/O performance on those systems;
- identify and tackle pitfalls (especially in I/O) by upgrading existing parallel algorithms;
- maximize memory scalability which, if not optimal, could be an issue especially on BlueGene/Q systems, which have very limited memory per core.

As a final task, we also plan to test the GPU-enabled version of the same code (which has so far only been run on one node with 8 GPUs for toy problems, due to the lack of computing resources at our disposal) on a reduced but still realistic 3D setup (

### Water droplets and turbulence interaction inside warm cloud — clear air interface

**Project Name**: Water droplets and turbulence interaction inside warm cloud — clear air interface

**Project leader**: Prof Daniela Tordella

**Research field**: Earth System Sciences

**Resource awarded**: 100000 core hours on Marconi-Broadwell, 100000 core hours on SuperMUC

Description

We focus on one of the main objectives of the H2020 MSCA ITN network COMPLETE, which started in June 2016 and is dedicated to the study of the microphysics, turbulence and telemetry of clouds. The project was conceived and written within this group; the PI is Daniela Tordella. For the abstract and the partnership, please see Cordis: http://cordis.europa.eu/project/rcn/203353_en.html. Within COMPLETE, our contribution focuses on the unsteady dynamics of transport through the interfaces between warm clouds and the surrounding clear air. Clouds are fugitive in nature. If one looks at them for a few seconds, they seem to keep the same form; looking again after a minute, one finds that they have somewhat changed. Extended cloud formations can hardly live for more than 2-3 days. Their spatial structure is inhomogeneous and anisotropic, with continuous changes associated with a large set of coexisting timescales. Until now, numerical simulations of clouds have assumed nearly steady and homogeneous conditions. To address the clear air-cloud interaction dynamics, we build on the knowledge we have produced in recent years on turbulent mixing in inhomogeneous conditions and time decay. Our contribution relates to the anisotropic nature of small-scale turbulence (Tordella & Iovieno, PRL 2011), the acceleration-containment properties of turbulence self-transport (Iovieno & Tordella: JFM 2006, PRE 2008, Physica D 2012) and the transport of a scalar quantity such as water vapor (Iovieno et al., JoT, 2014). The cloud interface is modeled through two interacting regions at different turbulent intensity, which represent the cloud and the clear air, respectively. Different initial conditions reproduce possible local stratifications in density and temperature. These conditions can be stable or unstable and simultaneously include water droplets (30 microns in diameter) subject to evaporation, condensation and coalescence (EFMC 2016, APS DFD 2014, 2015). 
We foresee simulating a small portion of atmosphere between a warm cloud and the clear air above or below it. Vertically, the portion extends for six meters and, horizontally, for three meters in each direction. The total water content is typically one ppm in volume which, for an initial condition where the drops are 30 microns in diameter, leads to an initial number of drops of 10^11. The grid is of 4092x2048x2048 points, which leads to a Taylor-microscale Reynolds number of 500. The governing equations are the Navier-Stokes equations in the Boussinesq approximation, coupled to the equations describing the trajectories of the water drops, seen as particles with inertia, transported by the background turbulence and subject to gravity. The aim of this preliminary submission is code optimization, both per se and to obtain good scaling up to 16k cores. In the future (the regular Tier-0 call following this one), we aim at runs producing data sets of about 180 TB (4 initial conditions, each repeated 3 times). In this submission, we foresee two runs with initial conditions modeling different perturbations of the cloud-clear air interface. An evolution lasting 6-7 turbulence time scales is simulated. The grid is reduced to 2048 x 1024 x 1024 points, and the total data storage is 30 TB.