PRACE Preparatory Access – 14th cut-off evaluation in September 2013

Find below the results of the 14th cut-off evaluation of September 2013 for the PRACE Preparatory Access Call.

Type A – Code scalability testing

Project name: DFT+U investigation of irradiation-induced defects in uranium dioxide

Project leader: Dr Marjorie Bertolus; CEA, DEN, FRANCE
Collaborators: Mr Emerson Vathonne (CEA – FR), Dr Michel Freyss (CEA – FR)
Research field: Chemistry and Materials
Resource awarded: 100,000 core hours on JUQUEEN @ GCS@Jülich, Germany; 50,000 core hours on CURIE FN @ GENCI@CEA, France; 100,000 core hours on SuperMUC @ GCS@LRZ, Germany


Abstract: Defect energies in UO2 are calculated within DFT+U using the supercell approach. This type of calculation is expensive, and numerous configurations need to be considered for each defect studied. Because of computational limitations, supercells consisting of 2x2x2 repetitions of the 12-atom conventional cell, i.e. containing 96 atoms, are currently used, and the largest cell considered so far in the literature contained 144 atoms (2x2x3 supercell). It is therefore important to determine the error on the defect formation energies and fission product incorporation energies induced by the supercell size, in order to propagate the uncertainties to higher-scale models. The study of large defects involving 4 to 6 atoms observed experimentally after ion irradiation, as well as calculations on charged supercells, for which the convergence of the energy as a function of supercell size is slow because of the long-range electrostatic potential, make this even more necessary. Moreover, larger supercells are necessary for the study of other uranium oxide phases, which exhibit complex structures (see J.P. Crocombette’s proposal on the feasibility of the investigation of U4O9 in the preparatory access call).

We would therefore like to study supercells consisting of

  • 3x3x3 repetitions of the 12-atom conventional cell, i.e. 324-atom supercells
  • 4x4x4 repetitions of the 12-atom conventional cell, i.e. 768-atom supercells

Since the calculation time is approximately proportional to the cube of the number of valence electrons in the system (which itself grows with the cube of the supercell parameter), this increase in size induces a drastic increase in the calculation time. In addition, an increased number of atoms slows down the convergence of the calculations. These calculations can therefore only be performed on Tier-0 machines.
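The cost growth quoted above can be made concrete with a back-of-envelope estimate. This sketch assumes, as the abstract states, that the DFT+U calculation time scales roughly with the cube of the number of valence electrons, which is proportional to the atom count; the numbers are illustrative, not measured timings.

```python
# Relative cost of larger UO2 supercells, assuming time ~ (atom count)^3
# (the cubic scaling in valence electrons stated above; illustrative only).

def relative_cost(n_atoms, n_atoms_ref=96):
    """Cost of an n_atoms supercell relative to the 96-atom reference."""
    return (n_atoms / n_atoms_ref) ** 3

for cell, atoms in [("2x2x2", 96), ("3x3x3", 324), ("4x4x4", 768)]:
    print(f"{cell} ({atoms} atoms): ~{relative_cost(atoms):.0f}x the 96-atom cost")
```

Under this assumption the 3x3x3 supercell is roughly 38 times more expensive than the current 96-atom cell, and the 4x4x4 supercell roughly 512 times, which is why Tier-0 resources are required.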

The study of defect and fission product incorporation is mainly performed using the VASP code, which is computationally efficient for geometry optimization and energy determination but above all for the determination of migration pathways.

The objectives of the preparatory access are as follows:

  • Check the feasibility of calculations on very large supercells, in particular check convergence difficulties which arise with large number of atoms
  • Determine the speedup of DFT+U calculations on large UO2 supercells
  • Estimate calculation times for defect energy calculations
  • Test the performance on Curie thin nodes, on which we already perform 96-atom supercell calculations, as well as on machines we are not familiar with, especially SuperMUC and JUQUEEN
  • Prepare the computational aspects of a proposal on electronic structure investigation of uranium oxides jointly with J.P. Crocombette for the eighth PRACE call

Project name: Scalability of post-Wannier codes for transport properties of thin layers.

Project leader: Dr Malgorzata Wierzbowska; University of Warsaw, POLAND
Collaborators: MSc. Karolina Milowska (Ludwig-Maximilians-Universität München – DE), Dr Svjetlana Galamic-Mulaomerovic (BarbaTech Ltd – IE)
Research field: Chemistry and Materials
Resource awarded: 100,000 core hours on SuperMUC @ GCS@LRZ, Germany


Abstract: Transport properties of thin layers are of great interest for future electronic devices. In particular, ballistic transport, magneto-electric manipulation, thermal conduction and optoelectronic transport open new routes for the magnetic, optical and thermal manipulation of new materials. Effects such as the ordinary, anomalous and spin Hall and Nernst conductivities, as well as the Seebeck and spin-Seebeck effects and optically enhanced transport, have their origin in the band structure and phonon properties. They can be calculated by means of density functional theory (DFT) and beyond-DFT methods. We are interested in studying thin layers of (Ga,Mn)As and organic semiconductors. It is known that the width, doping concentration, temperature and surface reconstructions can drastically change the magneto-electric characteristics of (Ga,Mn)As (Chiba et al., PRL 104, 106601 (2010)). Interest in organic semiconductors is growing quickly in the scientific community, driven by the hope of predicting new topological insulators (Z.F. Wang, Zheng Liu, Feng Liu, Nature Communications, DOI: 10.1038/ncomms2451). These materials form a class of systems whose transport properties we want to tailor by changing chemically and electronically active groups. All calculations will be carried out with open-source codes: Quantum ESPRESSO, wannier90, EPW and BerkeleyGW. The preparatory access is necessary to test the scaling properties of the post-wannier90 codes with respect to wall time, number of CPUs, memory, and work-file disk usage. This preparation will provide information necessary to prepare a full access proposal, and we will share the results of our tests with the developers of these newly released codes, helping to optimize them.

Project name: Massively parallel aggregation-based algebraic multigrid

Project leader: Prof. Yvan Notay; Universite Libre de Bruxelles, BELGIUM
Collaborators: Prof. Artem Napov (Universite Libre de Bruxelles – BE)
Research field: Mathematics and Computer Science
Resource awarded: 100,000 core hours on JUQUEEN @ GCS@Jülich, Germany; 50,000 core hours on CURIE FN @ GENCI@CEA, France; 50,000 core hours on HERMIT @ GCS@HLRS, Germany


Abstract: AGMG (AGgregation-based algebraic MultiGrid solver) is a software package that solves large sparse systems of linear equations; it is especially well suited for discretized partial differential equations. AGMG is an algebraic solver that can be used as a black box and can thus substitute for direct solvers based on Gaussian elimination. It uses a method of the multigrid type, with coarse grids obtained automatically by aggregation of the unknowns. Sequential AGMG is scalable in the sense that the time needed to solve a system is (under known conditions) proportional to the number of unknowns. A parallel version exists which is known to be efficient on moderate-size Intel clusters (with up to 200 cores), solving systems with up to 10,000,000,000 unknowns in less than 1 minute. On such parallel computers, AGMG exhibits weak scalability, in the sense that, if the number of unknowns is increased proportionally to the number of computing nodes used, the time needed to solve the linear system remains roughly constant. AGMG also appears strongly scalable for certain ranges of computer and problem sizes: if the linear system to solve is large enough that, after parallelization, a significant number of unknowns remains on each processor (of the order of 100,000 on Intel clusters), then the time needed is roughly proportional to the inverse of the number of processors used.
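The two notions of scalability described above can be expressed as simple efficiency ratios. The following sketch uses hypothetical timings, not AGMG measurements, purely to illustrate the definitions.

```python
# Strong scaling: fixed problem size, growing processor count.
# Weak scaling: problem size grown in proportion to processor count.
# (Timings below are made-up examples, not AGMG benchmark data.)

def strong_scaling_efficiency(t_ref, p_ref, t, p):
    """Fraction of ideal speedup achieved going from p_ref to p processors."""
    return (t_ref * p_ref) / (t * p)

def weak_scaling_efficiency(t_ref, t):
    """Ideal weak scaling keeps the solve time constant, i.e. ratio = 1."""
    return t_ref / t

# Fixed-size problem: 100 s on 64 cores, 27 s on 256 cores
print(strong_scaling_efficiency(100.0, 64, 27.0, 256))  # ~0.93
# Problem grown 4x together with the core count: 55 s -> 58 s
print(weak_scaling_efficiency(55.0, 58.0))              # ~0.95
```

An efficiency near 1.0 in either metric is what the abstract means by AGMG being weakly or strongly scalable over a given range of problem and machine sizes.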

Although no scalability issue has been reported for AGMG so far, its performance on massively parallel computers cannot be predicted. In particular, AGMG uses a special multigrid cycle (namely, the K-cycle), which requires more communication and synchronization than the standard multigrid V-cycle and therefore represents a potential performance bottleneck, whose impact can differ between Intel farms, Cray, and IBM BG systems. On the other hand, the simple aggregation process used by AGMG frees it from issues that typically limit the performance of other algebraic multigrid solvers, issues related to the more complex (and less inherently parallel) way those methods construct the successive coarse grids.

The project aims at testing AGMG on different types of massively parallel computers, to identify any real bottlenecks while tuning some parameters for optimal performance. In the end we expect a clear picture of the capabilities of the software and, if the scalability appears limited at some point, the identification of the performance bottlenecks, which can be the starting point of a further project aiming at improving the efficiency of the method.

Project name: Scalability of the CFD code elsA for chimera grid computations.

Project leader: Mr Vincent Brunet; Onera, FRANCE
Collaborators: Dr Michel Gazaix (ONERA – FR)
Research field: Engineering and Energy
Resource awarded: 50,000 core hours on CURIE FN @ GENCI@CEA, France


Abstract: The aim of the project is to assess the scalability of the CFD code elsA for Chimera grid computations on Tier-0 machines. The ONERA aerodynamic code elsA solves the Navier-Stokes equations with different turbulence models on structured grids. It is widely used in the aeronautical community, both in academic institutions and in industry (Safran, Airbus, Eurocopter, etc.). Its scalability for classical structured grids has already been demonstrated in a previous PRACE preparatory project. The objective here is to extend the assessment of scalability to a specific meshing technology called Chimera grids. For this kind of computation, the different structured grids which discretize the fluid flow may be constructed independently of each other and independently of the bodies. This implies that the communications between the different grids are much more complex and must exchange larger sets of data than those required for usual continuous grids. As a result, the message-passing streams between the different processes may be significantly affected, as well as the overall distribution of computations on the cores needed to reach efficient scalability on Tier-0 machines. The planned test case for this study is a realistic configuration consisting of a civil-aircraft-like geometry in a transonic buffeting regime. It includes 500 million nodes. As the timescales necessary to describe the full physics of the buffeting are of the order of a few tenths of a second, it is anticipated that the future full computation will require about 10 million core hours and will therefore be within the scope of the PRACE initiative.

Project name: Indentation of Metallic Nanoparticles in a Multi-million Atom Molecular Dynamics simulation

Project leader: Prof. Dan Mordehai; Technion, ISRAEL
Research field: Chemistry and Materials
Resource awarded: 100,000 core hours on JUQUEEN @ GCS@Jülich, Germany; 50,000 core hours on CURIE FN @ GENCI@CEA, France; 50,000 core hours on HERMIT @ GCS@HLRS, Germany; 100,000 core hours on SuperMUC @ GCS@LRZ, Germany; 100,000 core hours on FERMI @ CINECA, Italy


Abstract: The following study is part of our efforts to understand the strength of metallic specimens at the nanoscale. The goal of the preparatory stage is to examine the efficiency of a hybrid MPI+OpenMP molecular dynamics (MD) simulation on a large supercomputer, towards a detailed future study of the indentation process of metallic nanoparticles. As a benchmark problem, we choose a multi-million-atom problem of a nanoparticle under indentation. Efficient MD simulation of this problem is a great challenge, since it includes a large number of atoms which are distributed inhomogeneously in space and interact via multi-body interatomic potentials. One approach to improving the efficiency of the simulations in this case is to combine different parallelization techniques.

In this project we employ the MD simulation code LAMMPS, the Large-scale Atomic/Molecular Massively Parallel Simulator. Multilevel parallelization was recently incorporated in LAMMPS, which can now combine MPI tasks with OpenMP multi-threading. Since the atoms are not distributed homogeneously in our simulation box, the highest efficiency is expected from this hybrid method: the computational cell will be spatially decomposed via MPI parallelization, running 1-2 MPI tasks per node, and multi-threading with OpenMP will provide additional parallelism within the node. We have already obtained good results on our laboratory's cluster, and in the preparatory stage we will benchmark this multilevel parallelization approach on a large supercomputer. The outcomes of this stage will support a PRACE request in a regular call on the study of nanomechanical properties at the nanoscale.
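The node-level split described above (1-2 MPI tasks per node, with OpenMP threads filling the remaining cores) can be sketched as follows. The core counts, node count, and the input file name `in.indent` are assumptions for illustration; the printed launch line follows LAMMPS's OpenMP-suffix convention (`-sf omp -pk omp N`) but is not the project's actual command.

```python
# Sketch of a hybrid MPI x OpenMP layout: each node's cores are split
# between a few MPI tasks and OpenMP threads within each task.
# (Node/core counts and the input file name are illustrative assumptions.)

def hybrid_layout(nodes, cores_per_node, mpi_tasks_per_node):
    """Return (total MPI tasks, OpenMP threads per task) for a given split."""
    total_tasks = nodes * mpi_tasks_per_node
    threads_per_task = cores_per_node // mpi_tasks_per_node
    return total_tasks, threads_per_task

tasks, threads = hybrid_layout(nodes=512, cores_per_node=16, mpi_tasks_per_node=2)
print(f"mpirun -np {tasks} lmp -sf omp -pk omp {threads} -in in.indent")
```

With 2 MPI tasks per 16-core node, each task runs 8 OpenMP threads; the benchmark then varies this split to find the most efficient combination for the inhomogeneous atom distribution.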

Project name: Use Cases for Curie Best Practice Guide (PRACE 3IP – Task 7.3)

Project leader: Dr Nikos Anastopoulos; GRNET, GREECE
Research field: Mathematics and Computer Science
Resource awarded: 50,000 core hours on CURIE FN @ GENCI@CEA, France


Abstract: The purpose of the project is to evaluate tools, techniques and methodologies for optimizing and tuning MPI/OpenMP/hybrid applications on the Curie thin, fat and hybrid nodes, and demonstrate their best usage in the context of Curie Best Practice Guide (PRACE-3IP, Task 7.3). The codes that will be used are simple micro-benchmarks (STREAM benchmark) for evaluating the memory system capability of the Curie nodes, as well as a complete solver based on the Jacobi method. Furthermore, exploration scripts will be run on the hybrid nodes to extract processor topology information. This is a follow-on project to “2010PA1482”, where a sparse matrix-vector multiplication kernel was evaluated for the same purpose (Curie Best Practice Guide, PRACE-3IP, Task 7.3).
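A minimal version of the Jacobi-method solver mentioned above can be sketched in a few lines. This is an illustrative NumPy reference implementation, not the project's actual benchmark code, which targets MPI/OpenMP/hybrid execution on the Curie nodes.

```python
import numpy as np

# Minimal Jacobi iteration: x_{k+1} = D^{-1} (b - R x_k), where D is the
# diagonal of A and R the off-diagonal part. Converges for (e.g.)
# diagonally dominant matrices. Illustrative sketch only.

def jacobi(A, b, tol=1e-10, max_iter=10000):
    """Solve Ax = b iteratively; A should be diagonally dominant."""
    D = np.diag(A)                  # diagonal entries of A
    R = A - np.diagflat(D)          # off-diagonal part of A
    x = np.zeros_like(b, dtype=float)
    for _ in range(max_iter):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
print(jacobi(A, b))
```

Each Jacobi sweep is a sparse matrix-vector product plus a pointwise division, which is why such a solver, like the STREAM micro-benchmark, mainly exercises the memory system of a node.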

Project name: Towards AIMD of Ionic Liquid at electrochemical interface

Project leader: Prof. Maxim Fedorov; University of Strathclyde, UNITED KINGDOM
Collaborators: Dr. Ari Paavo Seitsonen (University of Zurich – CH), Dr. Vladislav Ivanistsev (University of Strathclyde – UK)
Research field: Chemistry and Materials
Resource awarded: 100,000 core hours on JUQUEEN @ GCS@Jülich, Germany


Abstract: In the present project we aim to test the feasibility of Ab Initio Molecular Dynamics (AIMD) simulations of solvation at electrochemical interfaces by ionic liquids (ILs). ILs possess a unique combination of properties that makes them more attractive solvents, from both fundamental and applied points of view, than common aqueous electrolytes. We plan to use an IL model (thousands of atoms) and simulation timescale (up to hundreds of picoseconds) larger than those achieved in state-of-the-art AIMD simulations of electrified aqueous interfaces. The need for the increase in size and time is justified by the complex molecular structure of ionic liquids and their high viscosity. The crucial role of dispersion interactions in ILs makes the calculations all the more challenging. With the allocated time, we plan not to perform any production simulations but primarily to test the scalability and single-step calculation times for a test model (1-ethyl-3-methylimidazolium tetrafluoroborate near a graphene surface: EMImBF4-Gr). We then plan to propose a first-of-its-kind AIMD study in a regular submission, for which we will have a realistic estimate of the required CPU time. The scientific interest of such simulations is detailed below. We would like to test the ability of the CP2K program to treat the EMImBF4-Gr interfacial model on the JUQUEEN Tier-0 platform, for which the code has previously been optimized. Foremost, we will focus on effects of quantum origin: the semi-metallic nature of graphene, electron density redistribution between graphene and EMImBF4, and dispersion interactions. All of these will require the adjustment of specific parameters determining the convergence of the calculations, as well as careful tests of the DFT functional against intrinsic DFT errors. Next, we aim to make an educated guess for the optimal model size and MD time step that balances reliability of the results with performance. We note that such pioneering AIMD simulations require computer resources unavailable on Tier-1 and smaller computation platforms.

Project name: First Principles investigations of solvent effect in photophysical properties of ellipticine

Project leader: Dr Prasenjit Ghosh; Indian Institute of Science Education and Research, Pune, INDIA
Collaborators: Mr. Subrahmanyam Sappati (Indian Institute of Science Education and Research, Pune – IN), Dr Ralph Gebauer, Dr Ivan Girotto (Abdus Salam International Centre for Theoretical Physics – IT)
Research field: Chemistry and Materials
Resource awarded: 100,000 core hours on FERMI @ CINECA, Italy


Abstract: Ellipticine, an important anti-cancer agent, shows interesting photophysical behaviour in different solvents. For example, though it exhibits a single fluorescence peak in most protic solvents, in certain protic solvents like methanol (MeOH) and ethylene glycol (EG) dual fluorescence is observed. The dual emission has been attributed to the presence of both normal ellipticine and its protonated form. However, the mechanism behind the protonation is highly debated: while some experiments claim that there is intra-molecular proton transfer, others show that the proton transfer is intermolecular. Additionally, the absorption spectra in certain protic solvents like ethylene glycol (EG) and 1,1,1,3,3,3-hexafluoro-2-propanol (HFP) show a new absorption band at 420 nm. In this project, using a combination of ab initio molecular dynamics and time-dependent density functional theory, we plan to understand the microscopic origin of these interesting photophysical properties.

Project name: Amorphous GeTe Phase-Change Material

Project leader: Prof. John Robertson; Cambridge University, UNITED KINGDOM
Collaborators: Mr Yuzheng Guo, Mr Xiaoming Yu (Cambridge University – UK)
Research field: Engineering and Energy
Resource awarded: 100,000 core hours on JUQUEEN @ GCS@Jülich, Germany; 50,000 core hours on CURIE FN @ GENCI@CEA, France


Abstract: The fast development of Si-based integrated circuits has followed Moore's law for the past 40 years. The size of electronic devices is approaching the quantum limit of several nanometers, where quantum effects greatly affect performance. New materials and designs must be introduced to further improve device performance. In this work we focus on the design and materials of Dynamic Random Access Memory (DRAM). Modern DRAM stores information as the charge in a capacitor connected to a transistor (a "1T-1C" cell architecture). To read a cell, the capacitor is discharged and the stored charge routed to a signal amplifier. Read operations are therefore destructive, and the cell must be refreshed each time. Furthermore, the stored charge tends to leak, so the entire memory bank must be refreshed periodically to prevent data loss. This makes DRAM power-hungry, and the refresh power can be a substantial fraction of a computer's standby power drain. A new design of DRAM is based on the fast, reversible switching of a phase-change material (PCM) between one or more material states, typically amorphous and (partially) crystalline phases, with measurable contrasts in physical properties such as optical reflectance and electrical resistivity. PCMs which show reflectance contrasts are suitable for optical-memory applications and are the basis for rewritable optical discs. In this work we will use ab initio molecular dynamics to study the physical properties of PCMs during heating and cooling processes. More specifically, we will test the performance of the CASTEP code on molecular dynamics simulations of amorphous and crystalline GeTe. For a cooling or heating process, the simulation must be long enough to reproduce a realistic cooling or heating rate, so a large number of time steps (typically around 10,000-100,000) must be performed.
The performance of CASTEP is therefore critical to keeping the simulation time manageable. We will also check the convergence of the amorphous-phase simulation with sample size. Because of computational limitations, the amorphous sample should be made as small as possible; however, the periodic boundary conditions used in CASTEP introduce mirror interactions, which can easily drive the system to the crystalline phase. The amorphous sample must be large enough to suppress this effect. We will test different sizes of the amorphous phase and compare the structures during the process.

Type B – Code development and optimization by the applicant (without PRACE support)

Project name: Molecular Dynamics by quantum Monte Carlo forces for liquid water

Project leader: Prof Sandro Sorella; SISSA, ITALY

Collaborators: Mr. Ye Luo, Mr. Guglielmo Mazzola (SISSA – IT)
Research field: Chemistry and Materials
Resource awarded: 250,000 core hours on SuperMUC @ GCS@LRZ, Germany


Abstract: In the present work we wish to investigate the possibility of simulating liquid water at room temperature and experimental density on SuperMUC, using an ab initio approach in which the electronic structure calculations are based on an accurate Quantum Monte Carlo (QMC) method. This preparatory access is intended as the development/test part of a more ambitious project: the MODYQA Tier-0 application submitted by the applicant in the 7th call.

Project name: Preparing for very large scale MD simulations of H2O nucleation

Project leader: Prof. Juerg Diemand; University of Zurich, SWITZERLAND
Research field: Chemistry and Materials
Resource awarded: 50,000 core hours on HERMIT @ GCS@HLRS, Germany; 250,000 core hours on SuperMUC @ GCS@LRZ, Germany


Abstract: Nucleation out of a homogeneous gas phase is suppressed by the Kelvin effect: growing the surface of a nano-scale droplet below some critical size costs more energy than is gained from the additional liquid volume, so such small droplets are more likely to evaporate than to grow. This allows homogeneous substances to remain in the gas phase at higher pressures, i.e. they can reach a metastable supersaturated state.
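The critical size mentioned above follows from the classical Kelvin relation, r* = 2σv / (kB T ln S), where σ is the surface tension, v the volume per molecule, and S the supersaturation. The sketch below evaluates it with rough textbook values for water near 300 K; both constants are assumptions for illustration, not values from this project.

```python
import math

# Kelvin critical radius: droplets smaller than r* tend to evaporate,
# larger ones tend to grow. Constants are rough values for water ~300 K
# (assumptions for illustration).

SIGMA = 0.072        # surface tension of water, N/m (assumed)
V_MOL = 3.0e-29      # volume per H2O molecule, m^3 (assumed)
KB = 1.380649e-23    # Boltzmann constant, J/K

def kelvin_radius(supersaturation, temperature=300.0):
    """Critical droplet radius r* = 2*sigma*v / (kB*T*ln S), in metres."""
    return 2.0 * SIGMA * V_MOL / (KB * temperature * math.log(supersaturation))

for S in (2.0, 5.0, 10.0):
    print(f"S = {S:4.1f}: r* ~ {kelvin_radius(S) * 1e9:.2f} nm")
```

Note how r* shrinks as the supersaturation grows: at the very high supersaturations accessible to MD, critical clusters are only a few tens of molecules, whereas laboratory experiments probe much larger critical sizes, which is the gap the project aims to close.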

Molecular dynamics (MD) simulations allow detailed, direct tracking of the cluster formation process. Unfortunately, due to computational limitations, MD simulations have been limited to far higher supersaturations and nucleation rates, and to far smaller critical cluster sizes, than those probed by laboratory experiments. PRACE resources have recently allowed us to close this gap for the case of argon, where our billion-atom MD simulations made the first direct comparison with experiments possible. The goal of this project is to achieve the same for the more complex cases of homogeneous H2O nucleation and of inhomogeneous nucleation.

The current state-of-the-art MD simulations of water nucleation contain up to 10,000 H2O molecules. Our proposed runs contain up to one billion molecules and up to 100 million time steps. We use the highly optimised code LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator), which performs large-scale simulations of this type efficiently on up to 60,000 cores.

Project name: PRACE-3IP T7.2: Exploiting State-of-the-Art HPC Tools and Techniques to enable applications for Exascale (Part B)

Project leader: Dr Michael Lysaght; ICHEC, IRELAND
Research field: Engineering and Energy
Resource awarded: 200,000 core hours on CURIE FN @ GENCI@CEA, France; 250,000 core hours on SuperMUC @ GCS@LRZ, Germany


Abstract: The objective of PRACE-3IP Work Package 7 (WP7), “Application Enabling and Support”, is to provide application enabling support for HPC application codes which are important for European researchers, to ensure that these applications can effectively exploit multi-petaflop systems. This application enabling activity aims to use the most promising tools, algorithms and standards for optimisation and parallel scaling that have recently been developed through research and experience in PRACE and other projects.

It is widely expected that the exascale systems of the future will be qualitatively different from current and past computer systems. They will be built from many-core processors with hundreds of cores per chip, their performance will be driven by parallelism and constrained by energy, and, with so many parts, they will be subject to frequent faults and failures. While the focus of WP7 is on enabling European applications for current multi-petascale systems, this does not mean that WP7 should ignore the challenges expected to confront applications on the road to exascale. While there is still no general consensus on what a future exascale machine will look like, it is becoming increasingly likely that it will share some characteristics of the current No. 1 systems in both the Top500 and Green500 lists, indicating a possible convergence towards heterogeneous architectures as a means to reach exascale. With this view in mind, there are opportunities now for WP7 to anticipate and prepare for the challenges that will be faced as we advance from the petascale era to the exascale frontier.

In this project we aim to optimize and enable several scientific and engineering software applications that are of interest to the European scientific and engineering research community, with a particular focus on enabling these applications for future multi-petascale and exascale systems. Our work will be reported in D7.2.2 and will also be available to the European research community in the form of PRACE whitepapers.

Project name: Aghora_hpc_2

Project leader: Dr. Vincent Couaillier; ONERA, the French Aerospace Lab, FRANCE
Collaborators: Dr Marta de la Llave Plata, Dr Emeric Martin, Dr Florent Renac (ONERA, the French Aerospace Lab – FR)
Research field: Mathematics and Computer Science
Resource awarded: 200,000 core hours on CURIE FN @ GENCI@CEA, France


Abstract: ** This new project is a continuation of the PRACE project “pa1428”, which will finish on October 15th. **

The research project AGHORA (Algorithm Generation for High-Order Resolution in Aerodynamics), started in January 2012, involves four departments of Onera (DSNA, DAAP, DADS, DTIM) and is managed by the NUMF team (NUmerical Methods for Fluid dynamics) in the CFD & Aeroacoustics Department (DSNA). The software demonstrator Aghora is being developed for the solution of complex flow systems, including turbulent flows (RANS, LES, DNS), multi-phase flows and multi-species flows. The main goal is to investigate the potential of high-order schemes based on Discontinuous Galerkin methods to deliver accurate solutions, providing not only information about overall flow features but also local values of the quantities of interest. The downside of these methods, however, is that they require the solution of very large discrete systems and result in long execution times and high memory requirements. Designing new, efficient, scalable algorithms for very high-performance supercomputers therefore appears essential to address this research challenge.

The goal of the PRACE project “pa1428” was to compare the efficiency of a pure MPI version against hybrid MPI/OpenMP versions. A strong scalability analysis will be presented in a forthcoming PRACE report. This new project will focus on the parallelization of the boundary conditions by threads, exploiting the MPI thread support levels. On top of that, we would like to profile our application to measure performance indicators (FLOPS, modified data sharing ratio for the threads, bus and parallelization ratios) and to detect bottlenecks arising from possible synchronization problems. Tools such as Intel VTune, Paraver, Scalasca and Vampir would be very useful for that. Finally, we plan to reduce the elapsed time of the emission and reception phases by exchanging a smaller amount of data. This improvement is expected to lead to better performance for large polynomial degrees.

Project name: Performance improvement of a hybrid OpenMP/MPI code for the solution of the Navier-Stokes equations by using new libraries and a different domain decomposition.

Project leader: Mr Luca Gallana; Politecnico di Torino, ITALY
Collaborators: Ms Francesca De Santi, Mr Silvio Di Savino, Mr Federico Fraternale, Dr Michele Iovieno, Prof. Daniela Tordella (Politecnico di Torino – IT)
Research field: Earth Sciences and Environment
Resource awarded: 200,000 core hours on CURIE FN @ GENCI@CEA, France; 250,000 core hours on FERMI @ CINECA, Italy


Abstract: With this project we wish to upgrade a pseudo-spectral hybrid OpenMP/MPI code for the solution of the Navier-Stokes equations (see M. Iovieno, C. Cavazzoni, and D. Tordella, 2001 – CPC 141, and subsequent updates). This code has been used to simulate homogeneous turbulence and also simple anisotropic and inhomogeneous flows such as shear-less turbulent mixing (D. Tordella, M. Iovieno, and P. R. Bailey, 2008 – PRE 77; D. Tordella and M. Iovieno, 2011 – PRL 107; D. Tordella and K. R. Sreenivasan, 2012 – PhysD), recently also in the presence of density stratification (2012/2013 PRACE project n°2011050773).

In a parallelepiped domain with NxNxN3 grid points, the code solves the NS equations to obtain the velocity field, an energy equation under the Boussinesq approximation to obtain the density (or temperature) field, and a transport equation for passive scalars (for instance water vapour, pollutants, tracers, etc.).

Different kinds of parallelization have been implemented, such as slab (1D) and stencil (2D) pure MPI parallelizations; recently a hybrid OpenMP/MPI version has also been developed.

The difficulties in direct numerical simulation of turbulent phenomenologies are related to the non-linear growth of the required grid resolution and to the time needed to perform a single simulation: both depend on the Reynolds lambda parameter (a measure of turbulence intensity based on the Taylor micro-scale) through power-law relationships. In particular, the number of points in the computational grid scales with this control parameter to the 9/2 power, while the computational time scales with the 6th power.
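The two power laws quoted above make the cost of pushing to higher Reynolds numbers easy to quantify. This is a purely arithmetic sketch of those stated scalings, not a measurement from the code.

```python
# Cost growth in DNS when the Taylor-microscale Reynolds number grows:
# grid points ~ Re_lambda^(9/2), compute time ~ Re_lambda^6 (as stated above).

def cost_growth(re_ratio):
    """(grid-point factor, compute-time factor) when Re_lambda grows by re_ratio."""
    return re_ratio ** 4.5, re_ratio ** 6.0

points, time = cost_growth(2.0)   # doubling Re_lambda
print(f"grid points: x{points:.1f}, compute time: x{time:.0f}")
```

Doubling Re_lambda thus multiplies the grid by roughly 23 and the compute time by 64, which is why code efficiency, rather than just more cores, is the focus of this project.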

In this project we want to focus not directly on the parallelization method used, but rather on the libraries used and on how they are integrated into the computational code. In fact, to be able to study turbulent fields under complex physical configurations involving multiple control parameters (Reynolds, Schmidt and Froude numbers, kinetic energy and integral scale ratios), we have to improve the efficiency of the code as much as possible.

To reach this target we plan to implement in the code new libraries (such as FFTW and MPI I/O), a new kind of data distribution among MPI instances and a global optimization to simultaneously improve time performance, scalability and memory usage.
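The core operation that an FFT library such as FFTW accelerates in a pseudo-spectral code is differentiation in Fourier space. The following 1-D NumPy sketch illustrates the idea; the production code would perform distributed 3-D transforms with FFTW across MPI ranks, so this is an assumption-laden toy, not the project's implementation.

```python
import numpy as np

# Pseudo-spectral differentiation: transform to Fourier space, multiply
# by i*k, transform back. The heart of a pseudo-spectral NS solver.

def spectral_derivative(u, length):
    """d/dx of a periodic field u sampled on a uniform grid of size len(u)."""
    n = len(u)
    # angular wavenumbers k = 2*pi*m/length for each Fourier mode m
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    return np.real(np.fft.ifft(1j * k * np.fft.fft(u)))

x = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
du = spectral_derivative(np.sin(x), 2.0 * np.pi)
print(np.max(np.abs(du - np.cos(x))))  # error near machine precision
```

Because each time step requires many such forward and inverse transforms over the whole 3-D grid, the choice and integration of the FFT library, together with the data distribution among MPI instances, dominates the overall performance.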

Project name: Extracting cosmological constraints from large scale structure of galaxy clustering

Project leader: Dr. Chia-Hsun Chuang; Spanish National Research Council (CSIC) and the Autonomous University of Madrid (UAM), SPAIN
Research field: Astrophysics
Resource awarded: 200,000 core hours on CURIE FN @ GENCI@CEA, France


Abstract: With the increasing volume of galaxy surveys, it is crucial to improve the precision of the methodology for extracting cosmological constraints from observed data. For this purpose, there are a couple of major methods/problems that need to be improved/solved, including error estimation, theoretical modelling, and the minimization of systematic errors. N-body simulations or semi-N-body simulations are needed to test the possible solutions. In addition, we use Markov Chain Monte Carlo analysis to obtain the cosmological constraints. High-performance computation is required for all the tasks described above.

Project name: Direct numerical simulation of a high-Reynolds-number homogeneous shear turbulence

Project leader: Prof Javier Jimenez; Universidad Politecnica Madrid, SPAIN
Collaborators: Mr Siwei Dong, Dr Atsushi Sekimoto (Universidad Politecnica Madrid – ES)
Research field: Fundamental Physics
Resource awarded: 250,000 core hours on JUQUEEN @ GCS@Jülich, Germany


Abstract: Turbulence is often induced by shear, and one of the most fundamental problems in fluid dynamics is to reveal the interaction between the mean flow and the kinetic energy of the turbulent fluctuations. The simplest flow in which to investigate this chaotic interaction is the so-called homogeneous shear turbulence (HST), which has a constant velocity gradient (shear) in one direction but whose statistics are not a function of space (homogeneity). It is known that HST has velocity “streaks” like those often observed in wall-bounded turbulent flows, and some features similar to the logarithmic layer in those flows. The logarithmic layer has been investigated for a long time, recently using simulations, and the multi-scale interactions among its eddies are of great interest. However, because of the non-linear nature of those interactions, the mechanisms by which large-scale motions are generated and later collapse into smaller eddies are not well understood. The key is to achieve a high-enough Reynolds number to include a healthy range of scales. The current state of the art in simulations of isotropic turbulence, which do not include shear, is Re_lambda=600-800, while the largest available simulations of wall-bounded flows (Re_tau=2000) reach Re_lambda=130. Since the cost of wall-bounded simulations increases roughly as Re_lambda^8, increasing the range of scales in real wall-bounded flows is out of reach for present computational resources, while HST is only slightly more expensive than the isotropic case.

The purpose of this project is to investigate the turbulent structures of the logarithmic layer, and of other shear flows, using data from direct numerical simulations of HST. That would be the subject of a later project request which we expect to be of the order of 40 Mcpuh, based on BG/Q. We have run preliminary tests to determine the necessary box sizes and Reynolds numbers, but the simulation code, which is a modification of a previous one for turbulent channels, needs to be ported to large machines. Our group has experience in large-scale computation. The channel code has been used extensively by several groups up to 2K cores, with good scaling, and a previous modification for boundary layers has run successfully in production in Jugene (Entrainment Effects in Rough-wall Boundary Layers; 2011040595), and Intrepid (Incite) on 32K cores. The present modification has been tested in JUQUEEN up to 32K cores with efficiency higher than 90%. The data from the final computation will be compared with those of turbulent channels and boundary layers at high Reynolds numbers, and the results will be made available to the community.
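The cost argument in the abstract (wall-bounded DNS cost growing roughly as Re_lambda^8) can be checked with one line of arithmetic. The figure below is our own illustrative estimate based on the numbers quoted in the abstract, not a statement from the project.

```python
# Rough arithmetic behind the abstract's claim: if wall-bounded DNS cost
# grows roughly as Re_lambda**8, pushing the largest wall-bounded runs
# (Re_lambda ~ 130) up to the isotropic state of the art (Re_lambda ~ 600)
# would multiply the cost enormously. Illustrative estimate only.
cost_factor = (600 / 130) ** 8
print(f"~{cost_factor:.0e}x more expensive")
```

The resulting factor is of order 10^5, which is why HST, being only slightly more expensive than the isotropic case, is the attractive route to a healthy range of scales.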

Project name: Large Scale Multidimensional Fission Landscapes with the Gogny force

Project leader: Dr Noel Dubray; CEA DAM DIF, FRANCE
Research field: Fundamental Physics
Resource awarded: 200,000 core hours on CURIE FN @ GENCI@CEA, France


Abstract: Despite more than 70 years of research, fission remains one of the least well described nuclear reaction phenomena. At the same time, it is one of the most efficient methods currently employed to produce energy, and probably the only way to deal with nuclear waste in the future through transmutation. This situation makes it necessary to continue theoretical as well as experimental research to model this complex process. Beyond these societal considerations, fission is also an important process through which elements heavier than iron have been produced in the Universe.

In quantum mechanical approaches, the starting point of fission modelling is the potential energy surface (PES), which describes the possible paths that a nucleus follows while evolving from its ground state towards a very deformed configuration, to finally separate into at least two fragments [1,2]. These PESs are only the starting point of more elaborate time-dependent quantum mechanical approaches [1,2,3,4], which make it possible to deal with the time evolution of the nuclear system during the process and have shown their ability to reproduce several important features of fission, such as the total kinetic energy and the fragment yields [3,4].

Up to now, most PESs have been determined as functions of the two main deformation characteristics, namely the elongation and the asymmetry of the nucleus. Both are required to proceed towards configurations which can lead, as observed experimentally [5], to the formation of a heavy and a light fragment. However, it has recently been shown that such two-dimensional PESs are, in most cases, not continuous [6], and that other degrees of freedom have to be explored to reduce the importance of the observed discontinuities, thereby providing a smooth potential for solving the time-dependent quantum mechanical Schrödinger equation and rigorously propagating the nucleus wave function.
Our project consists of computing systematically, within a Hartree-Fock-Bogoliubov microscopic approach based on the Gogny effective nucleon-nucleon interaction, PESs as functions of three deformation parameters for all even-even actinides between Thorium (Z=90) and Darmstadtium (Z=110). From such microscopic calculations, we plan (i) to extract the fission path required to compute nuclear fission cross sections [7], (ii) to analyse the nuclear matter densities close to the scission configurations (points where the fissioning system is close to separation into two fragments) in order to give the SPY model more fundamental foundations [8], and (iii) to wash out first-order discontinuities of the PESs, a necessary first step before performing a correct microscopic time-dependent description of the fission process.
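The structure of such a PES scan can be illustrated with a toy example. Nothing below is the project's HFB calculation: the energy function, the grids and the "preferred path" extraction are all invented for illustration, showing only the general pattern of scanning a grid of deformation parameters and reading off the energetically preferred configuration at each elongation.

```python
# Toy PES scan over two hypothetical deformation parameters:
# elongation q and asymmetry a. Illustrative only; the energy
# function is invented, not a nuclear-physics model.
import math

def toy_energy(q, a):
    # made-up double-humped barrier in q plus an asymmetry cost
    # that changes sign at large elongation
    return math.sin(3 * q) * math.exp(-q) + 0.5 * a ** 2 * (1.5 - q)

qs = [i * 0.1 for i in range(31)]           # elongation grid
asym = [j * 0.1 for j in range(-10, 11)]    # asymmetry grid

path = []
for q in qs:
    a_min = min(asym, key=lambda a: toy_energy(q, a))
    path.append((q, a_min, toy_energy(q, a_min)))

# for each elongation, the energetically preferred asymmetry
print(path[0], path[-1])
```

Even this toy surface shows the qualitative feature discussed above: at small elongation the symmetric shape (a = 0) is preferred, while at large elongation the minimum moves to non-zero asymmetry, the kind of behaviour a two-parameter scan can miss when discontinuities appear.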

Type C – Code development with support from experts from PRACE

Project name: Parsek2D-MLMD

Project leader: Dr. Maria Elena Innocenti; KULeuven, BELGIUM
Research field: Astrophysics
Resource awarded: 200,000 core hours on CURIE FN @ GENCI@CEA, France


Abstract: First-principles simulations of plasma evolution can be obtained through the Particle-In-Cell (PIC) approach, where both electrons and ions are represented as particles moving through a grid that discretizes the domain. However, PIC simulations are computationally very expensive, and implicit or adaptive techniques often have to be used to simulate the problems of interest. Implicit methods can be used to lift the strict stability constraints of explicit PIC simulations. Adaptive methods adjust the grid resolution to the physics of interest rather than using the same spatial resolution for the entire grid.

To our knowledge, our code, the C++ parallel code Parsek2D-MLMD, is the only implicit adaptive code available in the plasma PIC community. This means that the advantages of the two techniques are combined, and the spatial and (in future implementations) temporal discretization steps can be tailored to the local physics of interest, with a notable saving of computational resources and thus the possibility to simulate larger domains than single-level simulations.

The Implicit Moment Method (IMM) described in Vu & Brackbill, 1992, and Lapenta et al., 2006, is used here. The Multi-Level Multi-Domain (MLMD) technique, described in Innocenti et al., 2013 and Beck et al., submitted, can be considered part of the adaptive family, with critical differences. Mainly, fields and particles are represented at all levels (we identify as levels the grids simulated with different resolutions), including in the areas which are also simulated at higher resolution. This offers notable algorithmic advantages in dealing with particles crossing into sub-domains at different resolutions.

The MLMD technique introduces three additional communication steps between the levels with respect to single-level IMM simulations: field and particle boundary-condition communication from the coarse to the refined levels, and field projection from the refined to the coarse levels.
These communication points constitute bottlenecks which are unavoidable in principle, but their current implementation can certainly be made more efficient. Such optimization is the aim of this proposal.

Parsek2D-MLMD reduces resource consumption without degrading the physical significance of the simulation, especially for scientific problems which are multi-scale in nature, i.e. where processes at different scales develop in physically separated sub-domains. As such, we propose to use magnetic reconnection problems as the scientific case for the project, since there it is desirable to simulate large domains (tens to hundreds of ion skin depths in each direction) while only relatively small areas of the domain require very high resolution (fractions of the electron skin depth in the electron diffusion region), and the rest of the domain can be represented with a coarser grid.
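The basic PIC cycle described in the abstract (deposit charge to the grid, solve for the field, gather it back and push the particles) can be sketched in a few lines. This is a deliberately minimal explicit 1D electrostatic toy with nearest-grid-point deposition and a crude field solve; the project's implicit multi-level scheme (IMM/MLMD) is far more sophisticated, and all numerical choices below are our own illustrative assumptions.

```python
# Minimal 1D electrostatic PIC sketch: deposit / solve / push cycle.
# Illustrative only; not the IMM/MLMD algorithm of Parsek2D-MLMD.
import numpy as np

np.random.seed(1)
nx, L, npart, dt = 32, 1.0, 1000, 0.05
dx = L / nx
x = np.random.rand(npart) * L            # particle positions (periodic box)
v = np.random.randn(npart) * 0.01        # particle velocities

for _ in range(10):
    # 1) deposit charge to the nearest grid point (NGP weighting)
    idx = (x / dx).astype(int) % nx
    rho = np.bincount(idx, minlength=nx).astype(float) / npart - 1.0 / nx
    # 2) crude periodic field solve: integrate dE/dx = rho, remove the mean
    E = np.cumsum(rho) * dx
    E -= E.mean()
    # 3) gather the field at particle positions and push the particles
    v -= E[idx] * dt
    x = (x + v * dt) % L

print(f"mean |v| after 10 steps: {np.abs(v).mean():.4f}")
```

In the MLMD setting, steps 1-3 run on every level of the grid hierarchy, and the three extra inter-level communications mentioned above (boundary conditions down, field projection up) are inserted between them, which is exactly where the optimization effort of this proposal is directed.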