PRACEdays14 Posters

PRACEdays14 Poster Session (Tuesday 20 May 2014, 18:30 – 19:30)

Award for Best Poster:

Simulating an Electrodialysis Desalination Process with HPC

PDF - 3.5 Mb

Kannan Masilamani is a PhD student under the supervision of Prof. Sabine Roller,
University of Siegen.

He is working on the BMBF-funded project “HISEEM: Highly Efficient Integrated Simulation
of Electro-Membrane Processes for Desalination of Sea Water”. His research focus is
the multi-species LBM method and its coupling with electrodynamics.


Electrodialysis can be used for efficient seawater desalination. For this, an electric field is used in combination
with selective membranes to separate salt ions from the seawater. Those membranes are kept apart by a
complex spacer structure. Within the spacer-filled flow channel, the process involves the transport of ions and
the bulk mixture. A multi-species Lattice Boltzmann Method (LBM) for liquid mixtures is implemented in our
highly scalable simulation framework Adaptable Poly-Engineering Simulator (APES) and deployed on High
Performance Computing (HPC) systems to gain insight into this complex process.
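
As a rough illustration of the kind of building block an LBM solver rests on, the sketch below evaluates the standard single-species D2Q9 equilibrium distribution and recovers density and momentum from it. It is not the multi-species scheme implemented in APES; the function names and sample values are assumptions made for this example.

```python
# D2Q9 lattice: discrete velocities and their standard weights.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]
WEIGHTS = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def d2q9_equilibrium(rho, ux, uy):
    """Return the nine equilibrium populations for density rho and velocity (ux, uy)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(VELOCITIES, WEIGHTS):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def moments(f):
    """Recover density and momentum components from a set of populations."""
    rho = sum(f)
    jx = sum(fi * cx for fi, (cx, _) in zip(f, VELOCITIES))
    jy = sum(fi * cy for fi, (_, cy) in zip(f, VELOCITIES))
    return rho, jx, jy
```

Calling `moments(d2q9_equilibrium(rho, ux, uy))` returns `rho` and `rho * u` up to rounding, which is the conservation property a collision step relies on.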

For relevant results, it is necessary to simulate the full device used at laboratory and industrial scale,
which results in simulations with half a billion elements. A performance analysis of the method is carried out on the
Cray XE6 system Hermit at HLRS, Stuttgart.

Perturbation-Response Scanning method reveals hot residues responsible for conformational
transitions of human serum transferrin protein

PDF - 1.7 Mb

Haleh Abdizadeh is a research assistant at the Computational Materials Science Laboratory,
Sabanci University, Turkey working on her PhD. She received her M.Sc. in Polymers
Engineering from Amirkabir University of Technology, Iran. Currently, her research focuses
on structural dynamics of proteins to detect conformational changes and allosteric modulation
of function in ferric binding proteins.


Proteins usually undergo conformational changes between structurally different forms to fulfill their functions.
The large-scale allosteric conformational transitions are believed to involve some key residues that mediate
the conformational movements between different regions of the protein. In the present work, we have
employed the Perturbation-Response Scanning (PRS) method, based on linear response theory, to
predict the key residues involved in protein conformational transitions. The key functional sites are identified
as the residues whose perturbations largely influence the conformational transition between the initial and
target conformations. Ten different states of the human serum transferrin (hTF) protein in apo, holo and partially
open forms under different initial conditions have been used as case studies to identify critical residues
responsible for transitions between the closed, partially open and open forms. The results show that the functionally important residues
are mainly confined to highly specific regions. Interestingly, we observe a rich mixture of both conservation
and variability within the identified sites. In addition, perturbation directionality is an important factor in
recovering the conformational change, implying that highly selective binding must occur near these sites to
invoke the necessary conformational change.
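
The scanning idea can be sketched with a heavily simplified one-dimensional toy: in linear response theory the displacement caused by a force is proportional to the covariance matrix of equilibrium fluctuations times that force, so each coordinate can be perturbed in turn and scored by how well the predicted displacement overlaps with the observed one. The covariance matrix, target displacement and function names below are hypothetical, and the constant prefactor 1/(k_B T) is dropped because it does not affect the overlap. A real study would work with 3N coordinates and many force directions per residue.

```python
import math


def predicted_response(cov, force):
    """Linear response: displacement is proportional to covariance times force."""
    return [sum(c * f for c, f in zip(row, force)) for row in cov]


def overlap(a, b):
    """Cosine similarity between predicted and target displacement vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def scan(cov, target):
    """Perturb each coordinate with a unit force and score it against the target."""
    n = len(cov)
    scores = []
    for i in range(n):
        force = [1.0 if j == i else 0.0 for j in range(n)]
        scores.append(abs(overlap(predicted_response(cov, force), target)))
    return scores


# Hypothetical 3-coordinate covariance matrix and observed displacement.
COV = [[2.0, 1.0, 0.0],
       [1.0, 2.0, 1.0],
       [0.0, 1.0, 2.0]]
TARGET = [0.0, 1.0, 2.0]
SCORES = scan(COV, TARGET)
BEST = max(range(len(SCORES)), key=SCORES.__getitem__)  # index of the best-scoring site
```

In this toy, the coordinate with the highest score plays the role of a "hot residue": perturbing it alone reproduces the target displacement most closely.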

Moreover, our extensive Molecular Dynamics (MD) simulations of holo hTF at physiological and endosomal
pH are in remarkable agreement with experimental observations. Our results indicate domain
motions in the N-lobe as well as domain rigidity in the C-lobe at physiological pH. However, the
C-lobe shows more flexible dynamics at low pH, as a result of protonation of pKa-upshifted
residues. This flexibility in turn leads to the selective release of iron within this cellular compartment.

Old-fashioned CPU optimisation of a fluid simulation for investigating turbophoresis

PDF - 1.3 Mb

John Donners received his PhD in oceanography at Utrecht University in 2005. He was
a research scientist doing climate modelling on the Earth Simulator in Yokohama for the
UK-Japan climate collaboration from 2004 to 2008. He then joined SURFsara as a consultant
for supercomputing. His tasks include the parallelization, optimization and scaling of
HPC applications on national and European HPC systems.


Turbulent flows with embedded particles occur frequently in the environment and in industry. These flows
have richer physics than flow of a single-phase fluid and new numerical simulation techniques have been developed
in recent years. One of the main interests of this research is turbophoresis, the tendency of particles
to migrate in the direction of decreasing turbulence. This principle tends to segregate particles in a turbulent
flow toward the wall region and is expected to increase the deposition rate onto a surface. High-resolution
simulations with a spectral model are used to correctly predict the particle equation of motion in models that
do not resolve all turbulent scales.

Long integrations are required to reach statistical equilibrium of the higher-order moments of the particle
velocities. Most of the runtime of the spectral model is taken up by Fourier transforms and collective
communications. To reach the required performance, the MPI-only parallelization scheme was extended with
the use of MPI datatypes, multi-threaded FFTW calls and OpenMP parallelization. To maximize efficiency,
MPI communication and multi-threaded FFTW calls are overlapped: the master thread is used to complete
the blocking collective communication, while computations are split across the other threads. To accomplish
this overlap, communication and computation of multiple variables are interleaved. When no communication
is required, computations are split across all threads. The core count was increased by a factor of 5.2, while
the total runtime was reduced by a factor of 6.7.
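
The interleaving of variables can be sketched structurally as follows. This is only an illustration of the control flow, assuming Python threads in place of OpenMP threads and placeholder functions in place of MPI collectives and FFTW calls; `exchange`, `transform`, `split` and `pipeline` are hypothetical names, and Python threads would not give a real speed-up for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def exchange(field):
    """Stand-in for a blocking collective (e.g. an all-to-all transpose)."""
    return list(reversed(field))


def transform(chunk):
    """Stand-in for one worker thread's share of a transform."""
    return [2.0 * x for x in chunk]


def split(field, parts):
    """Cut a field into at most `parts` contiguous chunks of nearly equal size."""
    size = max(1, -(-len(field) // parts))  # ceiling division
    return [field[i:i + size] for i in range(0, len(field), size)]


def pipeline(fields, workers=3):
    """Interleave variables: communicate field k while workers compute field k-1."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = None  # futures for the field currently being computed
        for field in fields:
            received = exchange(field)  # master thread: blocking communication
            if pending is not None:     # collect the previous field (waits if needed)
                results.append([x for fut in pending for x in fut.result()])
            pending = [pool.submit(transform, c) for c in split(received, workers)]
        if pending is not None:
            results.append([x for fut in pending for x in fut.result()])
    return results
```

While the main thread sits in `exchange` for one variable, the worker threads are still busy with the chunks of the previous variable, which is the overlap described above.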

Faster simulations allow for a tighter loop of hypothesis building and testing, which results in faster scientific
discovery. The parallelization techniques presented here only require relatively small modifications to the
code, without introducing revolutionary new paradigms for accelerators. This can keep the focus of the scientist
on the generation of knowledge.

Toward next stage of design method of polymer nano-composites by X-ray scattering
analysis and large-scale simulations on supercomputers

Dr. Katsumi Hagita is a lecturer in the Department of Applied Physics, National Defense Academy
of Japan. He received his Ph.D. in Physics from Keio University, Japan, in March 2001.
His research interests are polymer physics and polymer material science, as well as data
analysis based on statistical physics and large-scale simulations in computational physics.


Polymer nano-composites (PNC), e.g. polymer films and tire rubber, are widely used in everyday life. The geometry
of the nano-fillers plays an important role in tuning their function. Recent nano-science and technology allow
molecular-level control of synthesis, producing various polymer branchings, polymer end modifications and
grafting onto a substrate or a nano-particle, as well as observation of nano-space from the nanometer to the
submicron scale. Thanks to recent progress in massively parallel supercomputing, virtual experiments
that study the effects of polymer architecture and nano-particle morphology can be performed for basic science
on current top supercomputers, and will be performed for R&D of industrial products on future top supercomputers. We
propose an approach that combines X-ray scattering analysis with large-scale simulations of a bead-spring model
of PNC. An overview of our simulation model and approach, together with results, is shown in the poster presentation.

This work is partially supported by JHPCN (Joint Usage/Research Center for Interdisciplinary Large-scale
Information Infrastructures) in Japan for efficient and advanced use of networked supercomputers.

The development of scalable unstructured high-resolution discretisations for Large Eddy
Simulation of turbomachinery flows

PDF - 102.2 Mb

Koen Hillewaert is the team leader of the Argo group at Cenaero. His group has been active
for several years in the industrialisation and large scale deployment of implicit high-order
discontinuous Galerkin methods for the DNS and LES of industrial flows. The group has
benefited from PRACE resources in the industrial pilot project noFUDGE (transitional flow in
a LP turbine) and the currently running two year project PadDLES (demonstration of p-variable
discretisations for LES). The group was furthermore involved in the European research
projects Adigma and IDIHOM on the industrialisation of high-order methods.

Koen is the head developer of the Argo code, responsible for discretisation and computational efficiency aspects.
He is one of the organisers of the workshop on higher-order methods for CFD. Koen is currently also
vice-chair of the PRACE user forum.


To allow the design for more reliable off-design operation, higher overall efficiency and lower environmental
nuisance of jet engines, more precise CFD tools will be required in complement to the currently available

The industry state of the art in CFD is largely based on statistical turbulence modeling. Scale-resolving approaches on the other hand compute (large) turbulent flow structures directly, thereby removing turbulence
modeling altogether (Direct Numerical Simulation/DNS) or reducing its scope to the smaller turbulent structures,
which are more universal in nature (Large Eddy Simulation/LES). Given the limitation of statistical models
to the prediction of near-design aerodynamic performance, there is a need for scale-resolving approaches
for the prediction of off-design aerodynamic performance, noise generation, combustion, transitional flows …

The stumbling block towards an industrial use of DNS and LES is the huge computational cost. The detailed
representation of turbulent flow structures imposes huge resolution and accuracy requirements, unobtainable
by the low-order discretisation methods currently used in industry. High-resolution codes used for the fundamental
study of turbulence are, on the other hand, not sufficiently flexible to tackle real industrial geometries,
and often do not provide possibilities for adaptive resolution, which could drastically enhance solution reliability.
The combination of high performance computing with adaptive unstructured high-resolution codes promises
a breakthrough in modeling capabilities.

This talk discusses recent developments in the discontinuous Galerkin method for
the large-scale DNS and LES of turbomachinery flows. Due to its elementwise-defined discontinuous interpolation,
this method features high accuracy on unstructured meshes, excellent serial and (strong) parallel
performance and high flexibility for adaptive resolution. The main focus of the talk will be the further assessment
of the LES models on benchmark test cases, as well as the assessment of the benefits of local
order-adaptation currently pursued in the PRACE project ‘PadDLES’. Furthermore, serial and parallel efficiency
optimisation will be discussed.

Accelerating Simulations of Hydrogen Rich Systems by a Factor of 2.5

PDF - 14.8 Mb

Himanshu Khandelia has a Masters in Biochemical Engineering and Biotechnology from
the Indian Institute of Technology, Delhi, after which he pursued a PhD in Chemical Engineering
from the University of Minnesota. HK moved to Denmark in 2006 for a postdoc,
and is currently Associate Professor at MEMPHYS, Center for Biomembrane Physics at the
University of Southern Denmark, Odense, Denmark. HK is supported by a Lundbeckfonden
Young Investigator award which is awarded to 6-8 outstanding young scientists each year.
We are interested in the biophysics of membranes, and of membrane associated phenomena,
such as the mechanism of ion pumping across the membrane, the biogenesis of lipid
droplets in the lipid bilayer and drug-membrane interactions. PRACE resources have been
instrumental in furthering our research.


Biological molecules are hydrogen-rich. Fast vibrations of H-bonded atoms and angles limit the time-step in
molecular dynamics simulations to 2 fs. We implement a method to improve the performance of all-atom lipid
simulations by a factor of 2.5. We extend the virtual sites (VS) procedure to POPC lipids, thus permitting a
time-step of 5 fs. We test our algorithm on a simple bilayer, on a small peptide in a membrane, and on a large
transmembrane protein in a lipid bilayer, the latter requiring the use of HPC at HECToR. Membrane properties
are mostly unaffected, and the reorientation of a small peptide in the membrane and the lipid binding and
ion binding of a large membrane protein are unaffected by the new VS procedure.
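
The factor of 2.5 follows directly from the ratio of the two time-steps, under the idealised assumption that the cost of a single integration step stays the same. The short calculation below makes this explicit; the 100 ns trajectory length and the function name are made up for the example.

```python
FS_PER_NS = 1_000_000  # femtoseconds in one nanosecond


def steps_needed(simulated_ns, timestep_fs):
    """Number of integration steps needed to cover simulated_ns nanoseconds."""
    return round(simulated_ns * FS_PER_NS / timestep_fs)


STEPS_2FS = steps_needed(100, 2)   # conventional 2 fs time-step
STEPS_5FS = steps_needed(100, 5)   # 5 fs time-step enabled by virtual sites
SPEEDUP = STEPS_2FS / STEPS_5FS    # ratio of step counts, equal to 5 / 2
```

In practice the measured gain can differ from this ratio, since the cost per step and the output frequency may change as well.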

The procedure is compatible with the previously implemented virtual sites method for proteins, thus allowing
for VS simulations of protein-lipid complexes.

Currently, the method has been implemented for the CHARMM36 force field, and is applicable to other lipids,
proteins and force fields, thus potentially accelerating molecular simulations of all lipid-containing biological

Self-consistent charge carrier mobility calculation in organic semiconductors with explicit
polaron treatment

Ivan Kondov obtained a master’s degree in theoretical chemistry and chemical physics
at Sofia University and a PhD in theoretical physics at the Technical University Chemnitz,
with a thesis on computational studies of electron transfer processes. He has over twelve
years of experience with scientific computing and simulations. Since 2009 he has headed the
Simulation Laboratory NanoMicro at the Steinbuch Centre for Computing, Karlsruhe Institute
of Technology. His scientific interests span the broad fields of computational chemistry,
computational nanoscience and high performance computing.


Whole-device simulation of organic electronics is important for improving device performance. We present
a multi-step simulation of electronic processes in organic light-emitting diodes (OLEDs) achieved by multi-scale
modelling, i.e. by integrating different simulation techniques covering multiple length scales. A typical
model with 3000 molecules consists of about 1000 pairs of charge hopping sites in the core region, which
contains about 100 electrostatically interacting molecules. The energy levels of each site depend on the
local electrostatic environment, yielding a significant contribution to the energy disorder. This effect is explicitly
taken into account in the quantum mechanics sub-model in a self-consistent manner, which, however, represents
a considerable computational challenge. Thus we find that the total number of computationally
expensive density functional theory (DFT) calculations needed is very high (about 10^5). Each of these calculations
is parallelized using the MPI library and scales up to 1024 Blue Gene/Q cores for small organic
molecules of about 50-100 atoms. Next, data are exchanged between all contained molecules at each iteration
of the self-consistency loop to update the electrostatic environment of each site. This requires that the
quantum mechanics sub-model is executed on a high-performance computing system employing a special
scheduling strategy for a second-level parallelisation of the model. In this study we use this procedure to
investigate charge transport in thin films based on the experimentally known electron-conducting small molecule
Alq3, but the same model can be applied to, for example, two-component organic guest/host systems.
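
The structure of such a self-consistency loop can be sketched with a toy model in which a cheap analytic function stands in for each DFT calculation. Everything in this sketch is an assumption made for illustration: the Fermi-like `occupation` function, the linear electrostatic coupling, and the parameter values are not taken from the actual workflow.

```python
import math


def occupation(energy):
    """Toy charge on a site: a Fermi-like function of its energy."""
    return 1.0 / (1.0 + math.exp(energy))


def self_consistent_energies(bare, coupling, tol=1e-10, max_iter=200):
    """Iterate site energies until the electrostatic environment stops changing."""
    n = len(bare)
    energies = list(bare)
    for iteration in range(1, max_iter + 1):
        # In the real workflow this step is one expensive calculation per site.
        charges = [occupation(e) for e in energies]
        # Exchange step: every site sees the updated charges of all other sites.
        updated = [
            bare[i] + sum(coupling[i][j] * charges[j] for j in range(n) if j != i)
            for i in range(n)
        ]
        change = max(abs(u - e) for u, e in zip(updated, energies))
        energies = updated
        if change < tol:
            return energies, iteration
    raise RuntimeError("self-consistency loop did not converge")
```

The per-site calculations inside one iteration are independent of each other, which is what makes a second level of parallelisation across sites possible; only the exchange step requires data from all sites.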

CFD Simulations by Open Source Software

PDF - 53.2 Mb

Tomas Kozubek is a professor of applied mathematics at the VSB Technical University of
Ostrava and head of the Libraries for Parallel Computing department at the IT4Innovations National
Supercomputing Center. He obtained his PhD in Computer Sciences and Applied Mathematics
from the VSB Technical University of Ostrava. His research interests are scalable algorithms
for solving large problems of mechanics, FETI-type domain decomposition methods,
quadratic programming algorithms and the reliable solution of nonlinear problems.

Tomas is also a local coordinator of work packages within the Partnership for Advanced Computing in Europe (PRACE)
project and principal investigator for the Czech Republic of the EXascale Algorithms and Advanced
Computational Techniques (EXA2CT) project.


Demand from end users who need to solve problems that are in many cases very complex is, and
always has been, the driving force behind the development of new efficient algorithms. This is even more apparent in the era
of supercomputers. Today's high performance computers give their users computational power that was unimaginable
a few years ago. Demand for algorithms able to tame and utilize this power has lately been the driving force
behind the parallelization of existing algorithms and the development of new parallel ones.

This poster presents examples of engineering problems, such as external aerodynamics, urban flow and thermodynamics,
solved on a High Performance Computing (HPC) platform. To obtain high-fidelity results,
numerical models consisting of meshes with a huge number of cells have to be created. As a consequence, a large
number of equations has to be solved to obtain the final solution. To do so in acceptable time, the supercomputer
Anselm at the National Supercomputing Center IT4Innovations, Czech Republic, was employed. To emphasize
the advantage of supercomputers in terms of computational time, scalability results for all cases are
presented on this poster as well.

Deployment of open source codes on HPC systems, together with the development of new algorithms for solving
large numbers of equations, will enable researchers and engineers to solve even more challenging problems
in many areas and industries such as aerospace, automotive, biomechanics or urban flow.

GPGPU based Lanczos algorithm for large symmetric eigenvalue problems


PDF - 2.2 Mb

Mr. Vishal Mehta has a bachelor’s degree in Electronics and Communication from Nirma University,
India and an M.Sc. in High Performance Computing from Trinity College Dublin. His active
fields of research include heterogeneous computing models and algorithms, high performance
architectures and the Hadoop Distributed File System. He is currently pursuing his M.Sc.
degree at Trinity College Dublin. He previously worked at the Space Application Centre,
Indian Space Research Organization, porting Synthetic Aperture Radar algorithms to the CUDA
platform. He is also a recipient of the Government of Ireland award for international scholars
(2012) and an Nvidia academic research grant (2011).


Eigenvalue problems are at the heart of many science and engineering applications. However, they are computationally
expensive, especially when the eigenvalue systems are very large. Techniques like power
iteration, Arnoldi’s algorithm and the Lanczos procedure exist for cases where only a few of the largest or smallest eigenvalues are required.

The use of GPGPU for these computations is challenging. The CUDA computing model and PTX assembly
from Nvidia provide a flexible environment in which a programmer can push the hardware to its limits.

An implicitly restarted Lanczos algorithm has been developed for an NVIDIA GPU, providing a notable speed-up over
a standard shared-memory OpenMP model. The salient features include Householder transformations for QR
decomposition and Sturm sequence techniques for the eigenvalues of a symmetric tridiagonal matrix. The memory
levels, such as shared memory, caches and registers, have been used efficiently along with highly efficient
PTX assemblies. PTX assembly optimization includes reducing the number of registers in use by managing assembly instructions
pertaining to false shared memory initializations and false movements of values around registers.
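
To illustrate the Sturm sequence idea for a symmetric tridiagonal matrix, the sketch below counts how many eigenvalues lie below a given value and uses bisection to locate one of them. It is a plain serial Python sketch, not the GPU implementation described above; the function names and the zero-pivot nudge are assumptions of this example.

```python
def count_below(diag, off, x):
    """Number of eigenvalues of the symmetric tridiagonal matrix that are < x."""
    count = 0
    pivot = 1.0
    for i, d in enumerate(diag):
        coupling = off[i - 1] ** 2 if i > 0 else 0.0
        pivot = d - x - coupling / pivot
        if pivot == 0.0:
            pivot = 1e-300  # nudge off an exact zero to keep the recurrence defined
        if pivot < 0.0:
            count += 1
    return count


def kth_eigenvalue(diag, off, k, tol=1e-12):
    """k-th smallest eigenvalue (k = 0, 1, ...) by bisection on count_below."""
    n = len(diag)
    radius = [0.0] * n
    for i in range(n - 1):
        radius[i] += abs(off[i])
        radius[i + 1] += abs(off[i])
    lo = min(d - r for d, r in zip(diag, radius))  # Gershgorin lower bound
    hi = max(d + r for d, r in zip(diag, radius))  # Gershgorin upper bound
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) > k:
            hi = mid   # at least k + 1 eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Here `diag` holds the main diagonal and `off` the off-diagonal entries. Each eigenvalue can be located independently of the others, which is one reason the technique maps well onto parallel hardware.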

Car body design in crash: a new optimization challenge

PDF - 570.8 kb

Marc Pariente. I am a mechanical engineer and received my degree in 2003 from IFMA (Institut
Français de Mécanique Avancée) in Clermont-Ferrand, France. My expertise
covers mechanics, materials, optimization and the Finite Element Method. I have worked in the
Renault research department since 2004, first on topics linked to engine ignition and spark
plugs. In 2012, I joined the research optimization team to work on improving car crash
models and optimization tools.


The presentation will focus on the results of a PRACE HPC project initiated in March 2013 and completed in
March 2014. The purpose of the project is the optimal design of a vehicle body to reach the safety objective,
with representative means and targets that the automakers will use in the next 3-5 years. The project consists
of two complementary phases:

- The development of a crash numerical model integrating a more precise representation of the
physics than the current models (about 20 MFE, calculated with 1024 cores within 24 hours);

- The use of this model in a design study using optimization techniques in large dimension (about 100
parameters), representative of the combinatorial aspects of industrial issues, such as the
re-use of existing parts to design a new vehicle.

An application of model reduction techniques in crash will help to draw conclusions on the prospects for
large-scale optimization problems with heavy numerical simulations.

Harnessing Performance Variability for HPC Applications

PDF - 6.2 Mb

Antonio Portero. He received a degree in Electronic Engineering, a master's degree in Microelectronics
and a PhD (Summa Cum Laude) in Computer Science from the Universitat Autònoma de Barcelona
(Spain) in 1997, 2000 and 2008 respectively.

Currently, he is Senior Researcher at IT4Innovations National Supercomputer Center, Czech
Republic. Before, he was Research Associate at the University of Siena (Italy). He has been
involved in several European Projects: HARPA (Harnessing Performance Variability), TERAFLUX
in the area of Future and Emerging Technologies for Tera-device Computing, HiPEAC
(High Performance Embedded-system Architecture and Compiler) and ERA (Embedded Reconfigurable Architectures).

His current interests include computer architecture themes such as embedded systems, multiprocessors,
memory system performance, workload characterization and network-on-chip. He is a member of HiPEAC
(European Network of Excellence on High Performance and Embedded Architecture and Compilation),
IEEE (Institute of Electrical and Electronics Engineers) and ACM (Association for Computing Machinery).


The overall goal of the HARPA project is to provide High Performance Computing (HPC)-oriented architectures
with efficient mechanisms to offer performance dependability guarantees in the presence of unreliable
time-dependent variations and aging throughout the lifetime of the system. This will be done by utilizing both
proactive (in the absence of hard failures) and reactive (in the presence of hard failures) techniques.
The term “performance dependability guarantee” refers to time-criticality (i.e., meeting deadlines), and a
predefined bound on the performance deviation from the nominal specifications in the case of HPC. The
promise is to achieve this reliability guarantee with a reasonable energy overhead (e.g. less than 10%
on average). A significant improvement is hence achieved compared to the state of the art (SotA), which currently provides guarantees
at the cost of at least 50% overhead. In addition, we will provide better flexibility in the platform design
while still achieving power savings of at least 20%. To the best of our knowledge, this is the first project to
attempt a holistic approach of providing dependable performance guarantees in HPC systems. This is done
while taking into account various non-functional factors, such as timing, reliability, power, and aging effects.
The HARPA project aims to address several scientific challenges in this direction:

  1. Shaving margins. Similar to the circuit technique Razor, but with different techniques at the microarchitecture
    and middleware, our aim is to introduce margin shaving concepts into aspects of a system that are typically
    over-provisioned for the worst case.
  2. A more predictable system with real-time guarantees, where needed. The different monitors, knobs,
    and the HARPA engine will make the target system more predictable and pro-actively act on performance
    variability prior to hard failures.
  3. Implementation of effective platform monitors and knobs. HARPA will select
    the appropriate monitors and knobs and their correct implementation to reduce efficiency and performance

Technical Approach: HARPA Engine Overview

Figure shows the main concepts of the HARPA architecture and the main components of an architecture
that can provide performance-dependability guarantees. The main elements that distinguish a HARPA-enabled
system are: (i) Monitors and knobs, (ii) User requirements and (iii) HARPA Engine. The HARPA engine
actuates the knobs to bias the execution flow as desired, based on the state of the system and the
performance (timing/throughput) requirements of the application.

The concepts that are to be developed within the HARPA context address the HPC domain. More specifically, from
the HPC domain we will use a disaster and flood management simulation.

Web page:

Engineering simulations at CSUC

PDF - 2.2 Mb

Pere Puigdomènech Thibaut studied in the Lycée Français de Barcelone and received a
degree in Mechanical Engineering from the Escola Tècnica Superior d’Enginyeria Industrial
de Barcelona (ETSEIB) at the Universitat Politècnica de Catalunya (UPC).

He is currently working as HPC Engineering Support at the Consorci de Serveis Universitaris
de Catalunya (CSUC) in the HPC and Applications Department.


The Consorci de Serveis Universitaris de Catalunya (CSUC) shares academic, scientific, library, knowledge
transfer and management services with associated entities to improve effectiveness and efficiency
by enhancing synergies and economies of scale. The center provides services to public and private universities,
research centers and institutes, offering a wide range of services such as supercomputing, communications,
advanced communications, library resources, digital repositories, e-administration and shared services.

The HPC & Applications area of CSUC offers its knowledge to academic and industrial users, providing
technical and scientific support so that they can obtain the maximum benefit from the HPC systems.

The poster will present benchmark results of the most used industrial codes, showing performance behaviour in
real cases from:

- Ansys FLUENT 14: Truck_111m: Flow around a truck body (DES 111e6 elements) and Donaldson
LES (LES 20e6 elements).

- Pamcrash 2012: Barrier: Entire car crash model (3e6 elements).

- ABAQUS 6.12 Explicit and Implicit: Cylinder head-block linear elastic analysis (5e6 elements) and
Wave propagation (10e6 elements).

- STAR-CCM+ 7.02: Aeroacoustic model (60e6 elements).

- OpenFOAM 2.0.0: Motorbike fluid dynamics (RANS 70e6 elements).

Solving Large non-Symmetric Eigenvalue problems using GPUs

PDF - 2 Mb

Teemu Rantalaiho received a Master of Science in theoretical physics from the University of
Helsinki in 2010 and started doctoral studies under Kari Rummukainen later the same year,
concentrating on computational methods in quantum field theory. He previously worked for
nearly a decade in industry, doing multiphysics research, applied mathematics, software
engineering and software architecture for mobile graphics processors and applications.

He is currently investigating technicolor theories using GPU clusters and developing code, as
well as investigating inversion methods to study transport coefficients in finite-temperature
quantum field theories.

His recent work also includes a contribution to a project studying eigenvalue distributions of the Wilson operator
in QCD-like theories. Further interests include parallelization techniques and their application in high
performance computing.


We present an implementation of the Implicitly Restarted Arnoldi Method (IRAM) with deflation, optimized for
CUDA-capable graphics processing units. The resulting code has been published online and is free to use,
with two levels of APIs that can be tailored to meet many needs. IRAM is a Krylov subspace
method that can be used to extract a part of the eigenvalue/vector spectrum of a large nonsymmetric
(non-Hermitian) matrix. Our use case was the extraction of the low-lying eigenvalue distribution of the
Wilson-Dirac operator in the context of Lattice QCD. The large amount of computation needed for a single
calculation, combined with our already CUDA-capable QCD code, warranted the use of a custom solution for
IRAM. Our approach followed the strategy of our QCD code, where abstraction of parallel algorithms allows
us to decouple the actual scientific code from the underlying hardware. This way one can run the same code
on both CPUs and GPUs, greatly reducing development time, which is one of the key performance metrics
in production codes.
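
For readers unfamiliar with Krylov subspace methods, the following sketch shows the plain Arnoldi iteration that IRAM builds on. It deliberately omits implicit restarting and deflation, works on small dense real matrices in pure Python, and is not the published GPU code; the names `matvec` and `arnoldi` are assumptions of this example.

```python
import math


def matvec(a, v):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def arnoldi(a, start, m):
    """Build up to m + 1 orthonormal Krylov vectors and the (m + 1) x m Hessenberg matrix."""
    norm = math.sqrt(sum(x * x for x in start))
    basis = [[x / norm for x in start]]
    hess = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = matvec(a, basis[j])
        for i, q in enumerate(basis):  # modified Gram-Schmidt orthogonalisation
            hess[i][j] = sum(qi * wi for qi, wi in zip(q, w))
            w = [wi - hess[i][j] * qi for wi, qi in zip(w, q)]
        hess[j + 1][j] = math.sqrt(sum(x * x for x in w))
        if hess[j + 1][j] < 1e-12:     # invariant subspace found, stop early
            break
        basis.append([x / hess[j + 1][j] for x in w])
    return basis, hess
```

The eigenvalues of the leading m x m block of `hess` (the Ritz values) approximate eigenvalues of `a`. The only operation that touches the large matrix is `matvec`, which is why the method suits sparse operators.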

Benchmarks on a single Tesla K20m GPU (ECC on, 175 GB/s memory bandwidth) show that our algorithm runs about 18.5
times faster than ARPACK++ on a single core of a Xeon X5650 @ 2.67 GHz (32 GB/s) for a system of size 786432
with a sparse (QCD) matrix, with about 6 percent of the time spent in matrix-vector multiplies (on the GPU). In this
use case the GPU code achieved 146 GB/s, which is 83 percent of the theoretical peak memory bandwidth. Our
code supports multiple GPUs through MPI and scales well as long as there is enough work to fill the GPUs.
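
The quoted bandwidth utilisation can be checked directly from the two figures given in the text. The variable names below are chosen for this example only.

```python
ACHIEVED_GB_PER_S = 146.0  # throughput reported for the GPU code
PEAK_GB_PER_S = 175.0      # quoted K20m memory bandwidth with ECC on

# Fraction of the peak memory bandwidth that the code reaches.
UTILISATION = ACHIEVED_GB_PER_S / PEAK_GB_PER_S
UTILISATION_PERCENT = round(100.0 * UTILISATION)
```

A utilisation this close to the peak suggests that the run is limited by memory bandwidth rather than by arithmetic throughput.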

High Performance Computing aspects of acoustic simulations of an air-intake system in OpenFOAM®

PDF - 1.3 Mb

Jan Schmalz is a research assistant and Ph.D. student at the chair of Mechanics and Robotics
at the University of Duisburg-Essen. He received a diploma in Mechanical Engineering from
the University of Applied Sciences Ravensburg-Weingarten in 2006 and after a few years of
work experience in the automotive industries and further studies of Mechanical Engineering he
received a Master of Science in Mechanical Engineering from the University of Duisburg-Essen in 2010. His research interests include computational fluid dynamics applications in high
performance computing frameworks and in particular the implementation of computational
aero acoustics approaches for parallel computational fluid dynamic simulations.


Air-intake systems of combustion engines emit sound that is mainly caused by turbulence. Often, however, the acoustic
parameters and the sound emission are not considered until a prototype exists. Unfortunately, changes
of concept are hardly feasible at that stage of the development process. Numerical methods, like finite volume
methods for computational fluid dynamics, applied to virtual prototypes are helpful tools during the early
stages of product development. To capture the acoustical behavior, commonly used methods of
computational fluid dynamics are extended to compute e.g. the sound pressure level in the far field at a specific
observer point. The resulting data is comparable to the results of common acoustic measurements.
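
As a small illustration of the post-processing step, the sketch below converts a pressure time series recorded at an observer point into a sound pressure level in decibels, using the standard reference pressure of 20 micropascal for air. The function names and the synthetic sine signal are assumptions of this example and are not part of the OpenFOAM® implementation.

```python
import math

P_REF = 20e-6  # reference sound pressure in air, in pascal


def rms_fluctuation(samples):
    """Root-mean-square of the pressure fluctuation about its mean value."""
    mean = sum(samples) / len(samples)
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))


def sound_pressure_level(samples):
    """Sound pressure level in dB re 20 micropascal."""
    return 20.0 * math.log10(rms_fluctuation(samples) / P_REF)


# Synthetic signal: one full period of a 1 Pa amplitude sine on top of ambient pressure.
SIGNAL = [101325.0 + math.sin(2.0 * math.pi * k / 1000) for k in range(1000)]
SPL_DB = sound_pressure_level(SIGNAL)
```

Subtracting the mean first removes the ambient pressure, so only the acoustic fluctuation enters the level.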

In this paper the open source computational fluid dynamics framework OpenFOAM® is used to solve the complex
fluid dynamic task of an air-intake system of a combustion engine. Owing to the numerical approach used,
it is in principle also able to solve aeroacoustic problems. A computational aeroacoustic (CAA) approach
based on acoustic analogies is implemented in OpenFOAM® 2.1.1. This novel approach is mainly based on
Curle’s acoustic analogy, where existing surfaces within the computational domain are rigid and stationary.
The CAA approach is added to the originally distributed transient incompressible and compressible application
solvers, pisoFoam and rhoPimpleFoam respectively, which are both already parallelized and able to run
on several compute cores.

The presented method takes into account the possibility and availability of high performance computing
resources. It provides the advantage of computing the flow fields, acoustic sources and the corresponding sound
propagation in an extended near field on one mesh only, which might be done during the first phases of product
development. The specific behavior of parallel computation of acoustical fields in an HPC environment will
be discussed by means of the mentioned computing case of an air-intake system.

Linear Algebra Library for Heterogeneous Computing in Scientific Discovery

PDF - 281.5 kb

Thomas Soddemann studied physics and mathematics in Paderborn and Freiburg. He received
his Ph.D. in statistical physics from the Johannes-Gutenberg-University Mainz. Later
he worked at Johns Hopkins University, Sandia National Labs, and for the Max-Planck-Society’s Supercomputing Centre RZG before he joined Fraunhofer SCAI as group lead HPC.

His work focuses on numerical mathematical methods and automated code optimization.


Current hardware configurations are evolving more and more towards highly heterogeneous environments combining
traditional CPU-based systems with accelerator boards. Obtaining good performance on such systems is
challenging and implies code adaptation, integration of new components and the use of different libraries.

Applications from various industrial fields, including aerospace, automotive, engineering and oil & gas
exploration, can often be reduced to simulations solving big sparse linear systems of equations, which can
be challenging, e.g. due to numerical stability and scalability.

The Library for Accelerated Math Applications (LAMA) addresses both: support for new and changing hardware
systems through efficient backends for various architectures, and accelerated calculation through a wide set of linear
solvers. LAMA offers full sparse BLAS functionality with a maximum of flexibility in hardware and software
decisions at the same time. The configuration of the whole LAMA environment can be set up by a Domain
Specific Language and can therefore be reconfigured at run time. Concepts of solvers, distributions and matrix
formats are exchangeable, and users can switch between compute locations, e.g. GPU or Intel® MIC. As new
hardware architectures and features are hitting the market in much shorter time intervals than ever before, it
will be necessary to rely on flexible software technologies to adapt to these changes and to be able to update
existing methods in time to benefit from them and stay competitive.
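
The core operation behind sparse BLAS and iterative linear solvers is the sparse matrix-vector product. The sketch below shows it for the compressed sparse row (CSR) format in plain Python. It does not use or reproduce the LAMA API; the function names `to_csr` and `csr_matvec` are assumptions of this example.

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) into CSR arrays."""
    values, columns, row_ptr = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                columns.append(j)
        row_ptr.append(len(values))  # end of this row in the values array
    return values, columns, row_ptr


def csr_matvec(values, columns, row_ptr, x):
    """Compute y = A x for a matrix A stored in CSR format."""
    y = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        y.append(sum(values[k] * x[columns[k]] for k in range(start, end)))
    return y
```

Each row can be processed independently, so the same loop structure can be mapped onto CPU threads or accelerator backends, with the storage format chosen to suit the target hardware.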