• PATCs and PTCs

  • PRACE has operated six PRACE Advanced Training Centres (PATCs) since 2012, and together they have established a state-of-the-art curriculum for training in HPC and scientific computing. The PATCs carry out and coordinate training and education activities that enable both European academic researchers and European industry to utilise the computational infrastructure available through PRACE, and they provide top-class education and training opportunities for computational scientists in Europe.

    The six PRACE Advanced Training Centres (PATCs) are based at:

    • Barcelona Supercomputing Center (Spain)
    • Consorzio Interuniversitario, CINECA (Italy)
    • CSC – IT Center for Science Ltd (Finland)
    • EPCC at the University of Edinburgh (UK)
    • Gauss Centre for Supercomputing (Germany)
    • Maison de la Simulation (France)

    In addition to operating the PATCs, four PRACE Training Centres (PTCs) will be piloted. The PTCs will expand the geographical reach of the PATCs by sourcing PATC courses locally, by collaborating with PATCs to deliver courses locally, or by complementing the PATC programme with local courses.

    The four selected PRACE Training Centres (PTCs) are based at:

    • GRNET – Greek Research and Technology Network (Greece)
    • ICHEC – Irish Centre for High-End Computing (Ireland)
    • IT4I – National Supercomputing Center VSB Technical University of Ostrava (Czech Republic)
    • SURFsara (The Netherlands)

    The following figure depicts the locations of the PATC and PTC centres throughout Europe.

    PATC events this month:

    April 2018
    Description

    The course includes topics on code optimization for x86 platforms and efficient code parallelisation using OpenMP threading. Advanced aspects of threading and optimization, such as new features in OpenMP 4.5, will be covered during the course. Some performance aspects of hybrid MPI+OpenMP programs will also be discussed.
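
    To illustrate the kind of optimization the course targets (this example is not taken from the course material), here is a minimal C sketch, assuming a compiler with OpenMP 4.x support, that combines OpenMP threading with SIMD vectorization in a single construct; the array sizes and values are arbitrary.

        #include <stdio.h>
        #include <omp.h>

        #define N 1000000

        /* Minimal sketch: a SAXPY-style loop distributed over OpenMP threads,
         * with each thread's chunk vectorized via the OpenMP "simd" clause. */
        int main(void)
        {
            static float x[N], y[N];
            const float a = 2.0f;

            for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

            /* Combined construct from OpenMP 4.x: threads + SIMD lanes. */
            #pragma omp parallel for simd
            for (int i = 0; i < N; ++i)
                y[i] = a * x[i] + y[i];

            printf("y[0] = %f, max threads = %d\n", y[0], omp_get_max_threads());
            return 0;
        }

    Compiled, for example, with "gcc -O2 -fopenmp", a loop like this is the kind of kernel whose threading and vectorization behaviour the course examines.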

    Learning outcome

    Awareness of modern features of x86 CPUs
    Ability to vectorize computations
    Ability to use advanced features of OpenMP
    Ability to increase code performance using threading and x86 optimization
    Prerequisites

    Good knowledge of C/C++ or Fortran
    Good knowledge of threading using OpenMP
    Basic knowledge of MPI
    Basic knowledge of modern CPU architectures
    Agenda

    Day 1: Wednesday, April 4


    Course introduction


    Performance analysis methods and tools


    Vectorization using SIMD

    Day 2: Thursday, April 5


    More about SIMD vectorization


    Optimizing memory accesses

    Day 3: Friday, April 6


    Advanced OpenMP features


    OpenMP performance considerations


    Hybrid MPI and OpenMP

    Lecturers:   Sami Ilvonen (CSC), Mikko Byckling (Intel)

    Language:  English

    Price:          Free of charge

    https://events.prace-ri.eu/event/718/
    Apr 4 8:00 to Apr 6 15:00
    Overview

    This course is dedicated to scientists and students who want to learn (sequential) programming of scientific applications in Fortran. The course teaches the newest Fortran standards. Hands-on sessions will allow participants to immediately test and understand the language constructs. This workshop provides scientific training in computational science and, in addition, fosters scientific exchange among the participants.

    Only the last three days of this course are sponsored by the PATC project.

    For further information and registration please visit the HLRS course page.

    https://events.prace-ri.eu/event/688/
    Apr 9 8:30 to Apr 13 15:30
    Registration for this course will open in January 2018. Please bring your own laptop. All PATC courses at BSC are free of charge.

    Course convener: David Vicente

    Objectives: The objective of this course is to present the new configuration of MareNostrum to potential users and to give an introduction on how to use the new system (batch system, compilers, hardware, MPI, etc.). It will also provide an introduction to the RES and PRACE infrastructures and explain how to get access to the available supercomputing resources.
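
    For orientation only (this is not course material), the sketch below shows the kind of minimal MPI program in C that a new MareNostrum user would compile with the system's MPI compiler wrapper and submit through the batch system; how to compile and submit such jobs is exactly what the sessions in the agenda cover.

        #include <stdio.h>
        #include <mpi.h>

        /* Minimal MPI "hello world": each rank reports its id and the
         * total number of ranks in the job. */
        int main(int argc, char **argv)
        {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            printf("Hello from rank %d of %d\n", rank, size);
            MPI_Finalize();
            return 0;
        }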

    Learning Outcomes: Students who finish this course will know the internal architecture of the new MareNostrum, how it works, the ways to get access to this infrastructure, and some optimization techniques for its architecture.

    Level: INTERMEDIATE, for trainees with some theoretical and practical knowledge; for example, those who have finished the beginners course.

    Prerequisites: Any potential user of an HPC infrastructure is welcome.

    Agenda:

    DAY 1 (April, 9) 9am - 5pm

     

    09:00h - 09:30h Introduction to BSC, PRACE PATC and this training (David Vicente)

    09:30h - 10:30h MareNostrum 4 – the view from System administration group (Javier Bartolomé)

    10:30h – 11:00h COFFEE BREAK

    11:00h - 11:30h Deep Learning and Big data tools on MN4 (Carlos Tripiana)

    11:30h - 12:15h How to use MareNostrum 4: BASIC Things, BATCH system, Filesystems, Compilers, Modules, DT, DL, BSC commands (Oscar Hernandez, Rubén Ramos, Félix Ramos)

    12:15h - 13:00h Hands-on I (Oscar Hernandez, Rubén Ramos, Félix Ramos)

    13:00h - 14:30h LUNCH (not hosted)

    14:30h - 15:15h How to use MN4 – HPC architectures and parallel programming (Jorge Rodriguez, Pablo Rodenas)

    15:15h - 16:00h Hands-on II (Pablo Ródenas, Jorge Rodriguez)

    16:00h - 16:15h COFFEE BREAK

    16:15h - 17:00h How to use MN4 – Advanced I: MPI implementations, MPI IO, GREASY (Pablo Ródenas, Jorge Rodríguez)

    17:00h - Adjourn

     

    DAY 2 (April, 10) 9am - 1:30pm

     

    09:00h - 09:30h You choose!: MareNostrum 4 visit (in the chapel) - Doubts + Hands-on + Tuning your app (in the classroom) (David Vicente, Jorge Rodríguez)

    09:30h - 10:00h How can I get resources from you? (RES, Marta Renato)

    10:00h - 10:30h How can I get resources from you? (PRACE, Cristian Morales)

    10:30h - 11:00h COFFEE BREAK

    11:00h - 11:25h Debugging on MareNostrum, from GDB to DDT (Pablo Ródenas, Oscar Hernandez)

    11:25h - 12:00h Tuning applications with the BSC performance tools (Extrae and Paraver)

    12:00h - 13:00h Hands-on III – Performance tools and tuning your application

    13:00h - 13:30h Wrap-up: Can we help you with your porting? How? When? (David Vicente)

    13:30h - Adjourn. End of course.

    https://events.prace-ri.eu/event/646/
    Apr 9 10:00 to Apr 10 13:30
    Registration for this course will open in January.

    All PATC courses at BSC are free of charge.

    PLEASE BRING YOUR OWN LAPTOP.

    Local Web Page:
    This is an expansion of the topic "OpenACC and other approaches to GPU computing" covered in this year's and last year's editions of the Introduction to CUDA Programming.
    This course also provides a very good introduction to the PUMPS Summer School run jointly with NVIDIA, also held at Campus Nord, Barcelona. For further information visit the school website.
    Convener:
    Antonio Pena, BSC
    Acting Director,
    NVIDIA GPU Center of Excellence

    Objectives:

    As an NVIDIA GPU Center of Excellence, BSC and UPC are deeply involved in research and outreach activities around GPU computing. OpenACC is a high-level, directive-based programming model for GPU computing. It is a very convenient way to leverage the power of GPUs with minimal code modifications, and it is the preferred option for non-computer scientists. This course will cover the topics needed to get started with GPU programming in OpenACC, as well as some advanced topics.
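
    As a flavour of what a directive-based port looks like (a minimal sketch, not an official course example, assuming an OpenACC-capable compiler such as the NVIDIA/PGI HPC compilers), the C fragment below offloads a simple vector update to the GPU with a single directive and makes the host-device data movement explicit:

        #include <stdio.h>

        #define N 1000000

        /* Minimal OpenACC sketch: one directive offloads the loop to the GPU.
         * The copyin/copy clauses spell out the host<->device data movement. */
        int main(void)
        {
            static float x[N], y[N];
            for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

            #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
            for (int i = 0; i < N; ++i)
                y[i] = 2.0f * x[i] + y[i];

            printf("y[0] = %f\n", y[0]);
            return 0;
        }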

    The target audiences of the course are students who want to develop exciting applications for these processors, as well as those who want to develop programming tools and future implementations for these processors.

    Level: BEGINNERS, for trainees from different backgrounds or with very little prior knowledge. (All courses are designed for specialists with at least a 1st-cycle degree or similar background experience.)

    Agenda:

    DAY 1
    9:00 - 10:00 Introduction to OpenACC on x86 CPU and GPU
    10:00 - 11:00 Hands-on: Introduction
    11:00 - 11:30 Break
    11:30 - 12:30 Profiling and Parallelizing with the OpenACC Toolkit
    12:30 - 13:30 Hands-on: Profiling and Parallelizing
    13:30 - 15:00 Lunch break
    15:00 - 17:00 Hands-on: Open Labs

    DAY 2
    9:00 - 10:00 Expressing Data Locality and Optimizations with OpenACC
    10:00 - 11:00 Hands-on: Data Locality and Optimizations
    11:00 - 11:30 Break
    11:30 - 12:30 Advanced OpenACC Techniques: Interoperability, MPI, and Pipelining
    12:30 - 13:30 Hands-on: Advanced Techniques
    13:30 - 15:00 Lunch break
    15:00 - 17:00 Hands-on: Open Labs

    End of Course

    https://events.prace-ri.eu/event/651/
    Apr 12 9:00 to Apr 13 18:00
    Registration for this course will open in January.

    All PATC courses at BSC are free of charge.

    PLEASE BRING YOUR OWN LAPTOP.

    Local Web Page:
     

    This course will provide a very good introduction to the PUMPS Summer School run jointly with NVIDIA, also held at Campus Nord, Barcelona. For further information visit the school website, as this school has an attendee selection process.
    You may also be interested in our Introduction to OpenACC course.

    Convener: 
    Antonio Pena, BSC
    Acting Director,
    NVIDIA GPU Center of Excellence

    Objectives: 

    The aim of this course is to provide students with knowledge and hands-on experience in developing applications software for processors with massively parallel computing resources. In general, we refer to a processor as massively parallel if it has the ability to complete more than 64 arithmetic operations per clock cycle. Many commercial offerings from NVIDIA, AMD, and Intel already offer such levels of concurrency. Effectively programming these processors will require in-depth knowledge about parallel programming principles, as well as the parallelism models, communication models, and resource limitations of these processors.
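
    As an illustration of the host-side view of these processors (a minimal sketch, not course material), the C program below uses the CUDA runtime API, which is C-callable, to query a few hardware characteristics of the kind discussed in the first session on GPU hardware:

        #include <stdio.h>
        #include <cuda_runtime.h>

        /* Host-only sketch: list basic properties of each visible NVIDIA GPU.
         * Link against the CUDA runtime (e.g. -lcudart); illustrative only. */
        int main(void)
        {
            int count = 0;
            if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
                printf("No CUDA-capable device found\n");
                return 1;
            }
            for (int d = 0; d < count; ++d) {
                struct cudaDeviceProp prop;
                cudaGetDeviceProperties(&prop, d);
                printf("Device %d: %s, %d multiprocessors, %.1f GB global memory\n",
                       d, prop.name, prop.multiProcessorCount,
                       prop.totalGlobalMem / 1e9);
            }
            return 0;
        }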

    Agenda to be announced shortly.

    The target audiences of the course are students who want to develop exciting applications for these processors, as well as those who want to develop programming tools and future implementations for these processors.

    Level: BEGINNERS, for trainees from different backgrounds or with very little prior knowledge. (All courses are designed for specialists with at least a 1st-cycle degree or similar background experience.)

    Agenda:

     

    Day 1 (April, 16)

    09:00 – 10:45 The GPU hardware: Many-core Nvidia developments

    10:45 – 11:15 Coffee break

    11:15 – 13:00 CUDA Programming: Threads, blocks, kernels, grids

    13:00 – 14:00 Lunch break

    14:00 – 15:45 CUDA Tools: Compiling, debugging, profiling, occupancy calculator

    15:45 – 16:15 Coffee break

    16:15 - 18:00 CUDA Examples (1): VectorAdd, Stencil, ReverseArray

    18:00 Adjourn

     

    Day 2 (April, 17)

    09:00 – 10:45 CUDA Examples (2): Matrices Multiply. Assorted optimizations

    10:45 – 11:15 Coffee break

    11:15 – 13:00 Inside Kepler and Maxwell: Dynamic parallelism, Hyper-Q, unified memory 

    13:00 – 14:00 Lunch break

    14:00 – 15:45 Hands-on Lab

    15:45 – 16:15 Coffee break

    16:15 – 18:00 Hands-on Lab

    18:00 Adjourn

     

    Day 3 (April, 18)

    09:00 – 10:45 Inside Pascal and Volta: Stacked memory, NVLink, tensor cores

    10:45 – 11:15 Coffee break

    11:15 – 13:00 OpenACC and other approaches to GPU computing. Bibliography 

    13:00 – 14:00 Lunch break

    14:00 – 15:45 Hands-on Lab

    15:45 – 16:15 Coffee break

    16:15 – 18:00 Hands-on Lab

    18:00 Adjourn

     

    Day 4 (April, 19)

    09:00 – 10:45 Atomics and Histogramming

    10:45 – 11:15 Coffee break

    11:15 – 13:00 Reduction operators

    13:00 – 14:00 Lunch break

    14:00 – 15:45 Hands-on Lab

    15:45 – 16:15 Coffee break

    16:15 – 18:00 Hands-on Lab

    18:00 Adjourn

     

    Day 5 (April, 20)

    09:00 – 10:45  Hands-on Lab

    10:45 – 11:15 Coffee break

    11:15 – 13:00 Hands-on Lab

    13:00 Adjourn

     

    End of Course

     

    MU: Manuel Ujaldón (Full Professor of Computer Architecture and former Nvidia CUDA Fellow)

     

     

     

    https://events.prace-ri.eu/event/652/
    Apr 16 9:00 to Apr 20 13:00
    DAVIDE (Development of an Added Value Infrastructure Designed in Europe) is an energy-aware Petaflops Class High Performance Cluster based on IBM Power Architecture and coupled with NVIDIA Tesla Pascal GPUs with NVLink. The innovative design of DAVIDE has been developed by E4 Computer Engineering for PRACE, with the aim of providing a leading edge HPC cluster showing higher performance, reduced power consumption and ease of use.  A key technology of the architecture, developed in collaboration with the University of Bologna, is an innovative infrastructure for measuring, monitoring and capping the power consumption of the node and of the whole system.

    With a focus on energy efficiency, this introductory course describes the DAVIDE architecture in more detail and, with the help of use cases in fields such as materials science and deep learning, illustrates the new opportunities available to users thanks to the introduction of this system into the Cineca HPC infrastructure.

    Skills:
    By the end of the course, participants will be expected to:

    have a good understanding of the technologies employed in the DAVIDE HPC supercomputer
    recognize the types of problem and application particularly suited to the cluster
    understand how the energy monitoring system can be used to improve the energy efficiency of applications
    Target Audience:
    Researchers who may wish to use the cluster in the future for their research or simply those interested in the evolving field of energy monitoring and efficiency in HPC.

    Grant
    Lunch on both days will be offered to all participants, and some grants are available. To be eligible, you must not be funded by your institution to attend the course and must work or live in an institute outside the Bologna area. The grant will be 200 euros for students working and living outside Italy and 100 euros for students working and living in Italy. Some documentation will be required, and the grant will be paid only after a certified attendance of at least 80% of the lectures.

    Further information about how to request the grant will be provided when the course is confirmed, about 3 weeks before the starting date.

    https://events.prace-ri.eu/event/716/
    Apr 16 9:00 to Apr 17 17:00
    This course gives an overview of the most relevant GPGPU computing techniques to accelerate computationally demanding tasks on HPC heterogeneous architectures based on GPUs.

    The course will start with an architectural overview of modern GPU-based heterogeneous architectures, focusing on their computing power versus their data movement needs. The course will cover both a high-level (pragma-based) programming approach with OpenACC for a fast porting start, and lower-level approaches based on the NVIDIA CUDA and OpenCL programming languages for finer-grained, computationally intensive tasks. Particular attention will be given to performance tuning and to techniques for overcoming common data movement bottlenecks and patterns.
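
    To make the data-movement point concrete (a minimal sketch, not course material, assuming an OpenACC-capable C compiler), the fragment below uses a structured OpenACC data region so that the arrays stay resident on the GPU across many kernel launches instead of being transferred on every iteration:

        #include <stdio.h>

        #define N 1000000
        #define STEPS 100

        /* Sketch of avoiding repeated host<->device transfers: the enclosing
         * "data" region keeps u and f on the GPU for all iterations, and only
         * the final result is copied back at the end of the region. */
        int main(void)
        {
            static float u[N], f[N];
            for (int i = 0; i < N; ++i) { u[i] = 0.0f; f[i] = 1.0f; }

            #pragma acc data copy(u[0:N]) copyin(f[0:N])
            {
                for (int step = 0; step < STEPS; ++step) {
                    /* Without the data region, each launch would move u and f
                     * back and forth, a classic bottleneck. */
                    #pragma acc parallel loop present(u, f)
                    for (int i = 0; i < N; ++i)
                        u[i] += 0.01f * f[i];
                }
            }

            printf("u[0] = %f\n", u[0]);
            return 0;
        }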

    Topics:

    Overview of architectural trends of GPUs in HPC. GPGPU parallel programming in heterogeneous architectures. Basis of OpenACC, CUDA and OpenCL programming.

    Skills:
    By the end of the course, students will be able to:

    understand the strengths and weaknesses of GPUs as accelerators
    program GPU accelerated applications using both higher and lower level programming approaches
    overcome problems and bottlenecks regarding data movement between host and device memories
    make best use of independent execution queues for concurrent computing/data-movement operations
    Target Audience:
    Researchers and programmers interested in porting scientific applications to modern heterogeneous HPC architectures, or in using efficient post-processing and data-analysis techniques on them.

    Prerequisites:

    A basic knowledge of C or Fortran programming and of Linux or Unix is mandatory. A basic knowledge of any parallel programming technique/paradigm is recommended.

    Grant:
    Lunch on the three days will be offered to all participants, and some grants are available. To be eligible, you must not be funded by your institution to attend the course and must work or live in an institute outside the Bologna area. The grant will be 300 euros for students working and living outside Italy and 150 euros for students working and living in Italy (outside Bologna). Some documentation will be required, and the grant will be paid only after a certified attendance of at least 80% of the lectures.

    Further information about how to request the grant will be provided when the course is confirmed, about 3 weeks before the starting date.

    https://events.prace-ri.eu/event/715/
    Apr 18 9:00 to Apr 20 18:00
    GPU-accelerated computing drives current scientific research. Writing fast numeric algorithms for GPUs offers high application performance by offloading compute-intensive portions of the code to an NVIDIA GPU. The course will cover basic aspects of GPU architectures and programming. Focus is on the usage of the parallel programming language CUDA-C which allows maximum control of NVIDIA GPU hardware. Examples of increasing complexity will be used to demonstrate optimization and tuning of scientific applications.
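
    As a small taste of the GPU library usage listed in the topics below (a hedged sketch, not part of the course material), the host-side C program here calls cuBLAS through its C API to run a DAXPY on the GPU; no kernel code is needed, only CUDA runtime and cuBLAS host calls:

        #include <stdio.h>
        #include <cuda_runtime.h>
        #include <cublas_v2.h>

        #define N 1024

        /* Host-only sketch: y = a*x + y on the GPU via cuBLAS DAXPY.
         * Link with -lcublas -lcudart; illustrative only. */
        int main(void)
        {
            double x[N], y[N], a = 2.0;
            for (int i = 0; i < N; ++i) { x[i] = 1.0; y[i] = 3.0; }

            double *d_x, *d_y;
            cudaMalloc((void **)&d_x, N * sizeof(double));
            cudaMalloc((void **)&d_y, N * sizeof(double));
            cudaMemcpy(d_x, x, N * sizeof(double), cudaMemcpyHostToDevice);
            cudaMemcpy(d_y, y, N * sizeof(double), cudaMemcpyHostToDevice);

            cublasHandle_t handle;
            cublasCreate(&handle);
            cublasDaxpy(handle, N, &a, d_x, 1, d_y, 1);   /* y <- a*x + y */
            cublasDestroy(handle);

            cudaMemcpy(y, d_y, N * sizeof(double), cudaMemcpyDeviceToHost);
            printf("y[0] = %f (expected 5.0)\n", y[0]);

            cudaFree(d_x);
            cudaFree(d_y);
            return 0;
        }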

    Topics covered will include:

    Introduction to GPU/Parallel computing
    Programming model CUDA
    GPU libraries like CuBLAS and CuFFT
    Tools for debugging and profiling
    Performance optimizations
    Prerequisites: Some knowledge of Linux (e.g. make, a command-line editor, the Linux shell) and experience in C/C++

    Application
    Registrations are only considered until 31 March 2018. Due to the available space, the maximum number of participants is limited. Applicants will be notified whether they are accepted for participation.

    Instructors: Dr. Jan Meinke, Jochen Kreutz, Dr. Andreas Herten, JSC; Jiri Kraus, NVIDIA

    Contact
    For any questions concerning the course please send an e-mail to j.meinke@fz-juelich.de

    https://events.prace-ri.eu/event/705/
    Apr 23 9:00 to Apr 25 16:30
                 

    Goals

    This workshop, organized by VI-HPS, LRZ (a new VI-HPS member since April 2018) and IT4Innovations as a PRACE training event, will:

    give an overview of the VI-HPS programming tools suite
    explain the functionality of individual tools, and how to use them effectively
    offer hands-on experience and expert assistance using the tools
    To foster the Czech-German collaboration in high performance computing, a contingent of places has been reserved for participants from the Czech Republic.

    Programme Overview

    The detailed program will be available on the VI-HPS training web site.

    Presentations and hands-on sessions are planned on the following topics

    Setting up, welcome and introduction
    Score-P instrumentation and measurement
    Scalasca automated trace analysis
    TAU performance system
    Vampir interactive trace analysis
    Periscope/PTF automated performance analysis and optimisation
    Extra-P automated performance modeling
    Paraver/Extrae/Dimemas trace analysis and performance prediction
    [k]cachegrind cache utilisation analysis
    MAQAO performance analysis & optimisation
    MUST runtime error detection for MPI
    ARCHER runtime error detection for OpenMP
    JUBE script-based workflow execution environment
    ... and potentially others to be added
    A brief overview of the capabilities of these and associated tools is provided in the VI-HPS Tools Guide.

    The workshop will be held in English and run from 09:00 to not later than 18:00 each day, with breaks for lunch and refreshments. For participants from public research institutions in PRACE countries, the course fee is sponsored through the PRACE PATC program. All participants are responsible for their own travel and accommodation.

    A social event for participant and instructor networking is planned for the evening on Tuesday 24 April, consisting of a guided tour of the Weihenstephan Brewery sponsored by Megware followed by a self-paid dinner at the brewery restaurant.

    Classroom capacity is limited; therefore, priority may be given to applicants with parallel codes already running on the workshop computer system (CoolMUC-3) and to those bringing codes from similar Xeon Phi x86 cluster systems to work on. Participants are encouraged to prepare their own MPI, OpenMP and hybrid MPI+OpenMP parallel application codes for analysis.



     

    Picture: Participants of the 21st VI-HPS Tuning Workshop at LRZ in April 2016.

    Programme in Detail (provisional)

    Day 1:
    Monday 23 April
    08:30
    (registration & set-up of course accounts on workshop computers)
    09:00
    Welcome
    Introduction to VI-HPS & overview of tools
    Introduction to parallel performance engineering
    CoolMUC-3 computer system and software environment [Volker Weinberg, LRZ]
    Building and running NPB/BT-MZ on CoolMUC-3 [Ilya Zhukov, JSC]

    10:30
    (break)
     
    11:00
    mpiP lightweight MPI profiling [Martin Schulz, TUM]
    mpiP hands-on exercises
    MAQAO performance analysis tools [Cédric Valensi, Emmanuel Oseret & Salah Ibn Amar, UVSQ]
    MAQAO hands-on exercises

    12:30
    (lunch)
    13:30
    Hands-on coaching to apply tools to analyze participants' own code(s).
    17:30
    Review of day and schedule for remainder of workshop
    18:00
    (adjourn)
     
    Day 2:
    Tuesday 24 April
    09:00
    Score-P instrumentation & measurement toolset [Ronny Tschüter, TUD]
    Score-P hands-on exercises
    CUBE profile explorer hands-on exercises [Ilya Zhukov, JSC]

    10:30
    (break)
    11:00
    Score-P analysis scoring & measurement filtering  [R. Tschüter, TUD]
    Measuring hardware counters and other metrics 
    Extra-P automated performance modeling [Sergei Shudler, TUDarmstadt]
    Extra-P hands-on exercises

    12:30
    (lunch)
    13:30
    Hands-on coaching to apply tools to analyze participants' own code(s).
    17:30
    Review of day and schedule for remainder of workshop
    18:00
    (adjourn)
    19:30
    Social event: Guided tour of Weihenstephan Brewery and dinner
     
    Day 3:
    Wednesday 25 April
    09:00
    Scalasca automated trace analysis [Ilya Zhukov, JSC]
    Scalasca hands-on exercises
    Vampir interactive trace analysis [Matthias Weber, TUDresden]
    Vampir hands-on exercises

    10:30
    (break)
     
    11:00
    Paraver tracing tools suite [Judit Giménez & Lau Mercadal, BSC]
    Paraver hands-on exercises

    12:30
    (lunch)
    13:30
    Hands-on coaching to apply tools to analyze participants' own code(s).
    17:30
    Review of day and schedule for remainder of workshop
    18:00
    (adjourn)
     
    Day 4:
    Thursday 26 April
    09:00
    JUBE workflow execution environment [Thomas Breuer, JSC]
    JUBE hands-on exercises
    Periscope Tuning Framework [Robert Mijakovic, TUM]
    Periscope hands-on exercises

    10:30
    (break)
     
    11:00
    TAU performance system [Sameer Shende, UOregon]
    TAU hands-on exercises

    12:30
    (lunch)
    13:30
    Hands-on coaching to apply tools to analyze participants' own code(s).
    17:30
    Review of day and schedule for remainder of workshop
    18:00
    (adjourn)
     
    Day 5:
    Friday 27 April
    09:00
    MUST MPI runtime error detection [Joachim Protze, RWTH]
    MUST hands-on exercises
    ARCHER OpenMP runtime error detection [Joachim Protze, RWTH]
    ARCHER hands-on exercises

    10:30
    (break)
    11:00
    Kcachegrind cache analysis [Josef Weidendorfer, LRZ]
    Kcachegrind hands-on exercises
    Review

    12:30
    (lunch)
    13:30
    Hands-on coaching to apply tools to analyze participants' own code(s).
    17:00
    (adjourn)

     

    Hardware and Software Platforms

    CoolMUC-3: KNL-based x86 Linux cluster system:

    148 compute nodes, each with a single Intel Xeon Phi 7210-F 'Knights Landing' MIC processor (1.3 GHz, 64 cores per processor, 4 hardware threads per core), 96 GB RAM and 16 GB HBM
    cluster modes: quad, snc4, a2a
    memory modes: flat, cache, hybrid
    network: Intel OmniPath interconnect
    parallel filesystem: GPFS (SCRATCH & WORK)
    software: SLES12-based GNU/Linux, Intel MPI; Intel, GCC and other compilers; SLURM batchsystem
    The local HPC system CoolMUC-3 is the primary platform for the workshop and will be used for the hands-on exercises. Course accounts will be provided during the workshop to participants without existing accounts. Other systems where up-to-date versions of the tools are installed can also be used when preferred, though support may be limited and participants are expected to already possess user accounts on non-local systems. Regardless of whichever systems they intend to use, participants should be familiar with the relevant procedures for compiling and running their parallel applications (via batch queues where appropriate).

    Registration

    Via https://events.prace-ri.eu/event/648/registration/register

    Location, Transport and Accommodation

    The workshop will be held at the Leibniz Rechenzentrum (LRZ) on the university campus outside Garching bei München, approximately 25 minutes north of the city centre of Munich. The U-Bahn line U6 (station: Garching-Forschungszentrum) provides a direct connection from the campus area to both Munich and Garching (see: Getting to/from LRZ).

    It is recommended to choose a hotel in Garching or in the Munich city centre and use the U-Bahn to reach LRZ (see: Accommodation in Garching, Accommodation in Munich).

    https://events.prace-ri.eu/event/648/
    Apr 23 9:00 to Apr 27 17:00
    The increase in computational power goes hand in hand with an increase in the size of the data to be managed, both on the input and on the output sides. IO can easily become a bottleneck on large-scale architectures. Understanding parallel file system mechanisms and parallel IO concepts enables users to use existing high-level libraries such as NetCDF or HDF5 efficiently.

    Topics:

    • HDF5 High level IO libraries (3h)

    • Parallel HDF5 and focus on MPI-IO hints (3h)

    • Parallel file systems: Lustre (1h30)

    • PDI: Parallel Data Interface (4h30)

     

    Instructors: M. Haefele (Maison de la Simulation, CNRS), Thomas Leibovici (TGCC, CEA), Julien Bigot (Maison de la Simulation)

    Learning outcomes: After this course, participants should understand the tradeoffs implied by using a parallel file-system, and know how to efficiently use parallel IO libraries.

    Prerequisites: Knowledge of the C or Fortran programming language and of parallel programming with MPI
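    To give a flavour of the "Parallel HDF5 and focus on MPI-IO hints" topic, the following minimal C sketch (illustrative only and not part of the course material; the file name, dataset name and block size are arbitrary, and an MPI-enabled HDF5 build is assumed) performs a collective write in which each MPI rank stores its own contiguous block of a shared one-dimensional dataset:

        /* phdf5_write.c - illustrative sketch of a collective parallel HDF5
         * write; every rank owns one contiguous hyperslab of a shared dataset. */
        #include <mpi.h>
        #include <hdf5.h>

        #define N_PER_RANK 1024   /* arbitrary block size per rank */

        int main(int argc, char **argv)
        {
            int rank, size;
            double buf[N_PER_RANK];

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            for (int i = 0; i < N_PER_RANK; ++i)
                buf[i] = rank + i / (double)N_PER_RANK;

            /* Open the file collectively through the MPI-IO driver. */
            hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
            H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
            hid_t file = H5Fcreate("demo.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

            /* One shared dataset; each rank selects its own hyperslab. */
            hsize_t dims = (hsize_t)size * N_PER_RANK;
            hid_t filespace = H5Screate_simple(1, &dims, NULL);
            hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, filespace,
                                    H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
            hsize_t count = N_PER_RANK, offset = (hsize_t)rank * N_PER_RANK;
            H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL, &count, NULL);
            hid_t memspace = H5Screate_simple(1, &count, NULL);

            /* Request a collective transfer, which the MPI-IO layer can optimise. */
            hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
            H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
            H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

            H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
            H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
            MPI_Finalize();
            return 0;
        }

    MPI-IO hints of the kind discussed in the course would be passed through the MPI_Info object supplied to H5Pset_fapl_mpio instead of MPI_INFO_NULL.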

    https://events.prace-ri.eu/event/698/
    Apr 23 9:30 to Apr 24 17:00
    GPU-accelerated computing drives current scientific research. Writing fast numeric algorithms for GPUs offers high application performance by offloading compute-intensive portions of the code to an NVIDIA GPU. The course will cover basic aspects of GPU architectures and programming. The focus is on the use of the parallel programming language CUDA-C, which allows maximum control of NVIDIA GPU hardware. Examples of increasing complexity will be used to demonstrate the optimization and tuning of scientific applications.

    Topics covered will include:

    Introduction to GPU/Parallel computing
    Programming model CUDA
    GPU libraries like cuBLAS and cuFFT (see the sketch below)
    Tools for debugging and profiling
    Performance optimizations
    Prerequisites: Some knowledge of Linux (e.g. make, a command-line editor, the Linux shell) and experience in C/C++
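    As a hedged illustration of the "GPU libraries" topic above (it is not taken from the course material; the vector length is arbitrary and error checking is omitted for brevity), the following C sketch computes y = a*x + y on the GPU entirely through the cuBLAS host API, with no hand-written kernels. Building it requires the CUDA toolkit and linking against cuBLAS and the CUDA runtime.

        /* saxpy_cublas.c - illustrative sketch: offload y = a*x + y to the GPU
         * via the cuBLAS library, using only host-side C code. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <cuda_runtime.h>
        #include <cublas_v2.h>

        int main(void)
        {
            const int n = 1 << 20;
            const float a = 2.0f;
            float *x = malloc(n * sizeof *x), *y = malloc(n * sizeof *y);
            for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

            float *d_x, *d_y;                        /* device buffers */
            cudaMalloc((void **)&d_x, n * sizeof *d_x);
            cudaMalloc((void **)&d_y, n * sizeof *d_y);

            cublasHandle_t handle;
            cublasCreate(&handle);

            /* Copy inputs to the GPU, run SAXPY there, copy the result back. */
            cublasSetVector(n, sizeof *x, x, 1, d_x, 1);
            cublasSetVector(n, sizeof *y, y, 1, d_y, 1);
            cublasSaxpy(handle, n, &a, d_x, 1, d_y, 1);
            cublasGetVector(n, sizeof *y, d_y, 1, y, 1);

            printf("y[0] = %f (expected 4.0)\n", y[0]);

            cublasDestroy(handle);
            cudaFree(d_x); cudaFree(d_y);
            free(x); free(y);
            return 0;
        }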

    Application
    Registrations will only be considered until 31 March 2018. Due to the available space, the maximum number of participants is limited; applicants will be notified whether they have been accepted for participation.

    Instructors: Dr. Jan Meinke, Jochen Kreutz, Dr. Andreas Herten, JSC; Jiri Kraus, NVIDIA

    Contact
    For any questions concerning the course please send an e-mail to j.meinke@fz-juelich.de

    https://events.prace-ri.eu/event/705/
    Apr 23 9:00 to Apr 25 16:30


    PTC events this month:

    April 2018
    (no PTC events listed for 1–22 April)

    Introduction to Biomolecular modelling and Molecular dynamics in HPC

    (Classical and Quantum)

    23 April 2018

    Purpose of the course

    The purpose of this course is to present to existing and potential users of molecular dynamics packages the method, the steps necessary for a successful simulation, common practices and common mistakes. The steps of a complete simulation workflow, i.e. from system setup to the evaluation of final properties, will be presented using popular software packages.

    Outcomes

    After the course, participants should be able to use their preferred MD application (e.g. NAMD, GROMACS, LAMMPS, CP2K) efficiently for molecular modelling and molecular dynamics simulations, create configuration files based on their needs, tune their models, use the resources efficiently according to the details of the simulation, and avoid common mistakes.

    Prerequisites

    Background in Physics/Chemistry/Biology. Programming skills and awareness of parallel environments. Bring your own laptop in order to be able to participate in the hands-on training. Hands-on work will be done in pairs, so if you don’t have a laptop you may work with a colleague. The course language is English.

    Registration

    Registrations will be evaluated on a first-come, first-served basis. GRNET is responsible for the selection of the participants on the basis of the training requirements and the technical skills of the candidates. GRNET will also seek to guarantee the maximum possible geographical coverage with the participation of candidates from many countries.

    Venue

    GRNET headquarters

    Address: 2nd Floor, 7 Kifisias Av., GR 115 23 Athens

    Information on how to reach the GRNET headquarters is available on the GRNET website: https://grnet.gr/en/contact-us/

    Accommodation options near GRNET can be found at: https://grnet.gr/wp-content/uploads/sites/13/2015/11/Hotels-near-GRNET-en.pdf

    ARIS - System Information

    ARIS is the name of the Greek supercomputer, deployed and operated by GRNET (Greek Research and Technology Network) in Athens. ARIS consists of 532 computational nodes separated into four “islands”, as listed here:


    • 426 thin nodes: regular compute nodes without accelerators.
    • 44 GPU nodes: nodes accelerated with 2 x NVIDIA Tesla K40m.
    • 18 Phi nodes: nodes accelerated with 2 x Intel Xeon Phi 7120P.
    • 44 fat nodes: compute nodes with a larger number of cores and more memory per core than a thin node.

    All the nodes are connected via an InfiniBand network and share 2 PB of GPFS storage. The infrastructure also includes an IBM TS3500 tape library with a maximum storage capacity of about 6 PB. Access to the system is provided by two login nodes.

    About Tutors

    Dr. Zoe Cournia (female) is a Researcher – Assistant Professor level at the Biomedical Research Foundation, Academy of Athens, where she works on anticancer drug design, design of drug delivery systems and biomolecular modeling using computational techniques. She graduated from the Chemistry Department, University of Athens in 2001 and completed her PhD at the University of Heidelberg in Germany in 2006. She then worked as a postdoctoral researcher at the Chemistry Department, Yale University, USA, on computer-aided drug design and in 2009 she became a lecturer at Yale College. She has been awarded the American Association for Cancer Research Angiogenesis Fellowship (2008), the "Woman of Innovation 2009" Award from the Connecticut Technology Council, USA, the Marie Curie Fellowship from the European Union (2010), the "Outstanding Junior Faculty Award" from the American Chemical Society (2014) and the first "Ada Lovelace Award" from the "Partnership for Advanced Computing in Europe" (2016). She is currently teaching at the Master’s program “Information Technologies in Technology and Medicine” at the Department of Informatics and Telecommunications, National University of  Athens.

    Dr. Dimitris Tsalikis (male) is a Research Associate at the Department of Chemical Engineering of the University of Patras. His research focuses on the physicochemical characterization and the rheology of polymers, polymer nanocomposites, nanofluidics and formulations via atomistic and mesoscopic simulations, for which he develops novel parallel computational methodologies. He received his Diploma in Chemical Engineering from the University of Patras in 2004 and his Ph.D. (titled “Computational study of structural relaxation and plastic deformation of glassy polymers”) from the National Technical University of Athens in 2009 under the advisement of Prof. Doros N. Theodorou. In 2011 he joined the research team of Prof. Vlasis Mavrantzas in Patras as a Research Associate. Dr. Tsalikis has solid experience with high-performance computing, having been an active user since 2007 of Tier-1 and Tier-0 HPC systems available to the scientific community under the frameworks of the HPC-Europa, PRACE and LinkSCEEM projects. He is currently teaching at the Master’s program “Polymer Science and Technology” at the University of Patras.

    Dr. Dellis (male) holds a B.Sc. in Chemistry (1990) and a PhD in Computational Chemistry (1995) from the National and Kapodistrian University of Athens, Greece. He has extensive HPC and grid computing experience. He used HPC systems in computational chemistry research projects on fz-juelich machines (2003-2005) and received an HPC-Europa grant on BSC (2009). In the EGEE/EGI projects he acted as application support and VO software manager for the SEE VO, grid site administrator (HG-02, GR-06) and NGI_GRNET support staff (2008-2014). In PRACE 1IP/2IP/3IP/4IP/5IP he was involved in benchmarking tasks, either as a group member or as BCO (2010-2017). He currently holds the position of “Senior HPC Applications Support Engineer” at GRNET S.A., where he is responsible for activities related to user consultation and to porting, optimizing and running HPC applications on national and international resources.

    Dr Aristeidis Sotiropoulos received his BSc in Computer Science in 1998 from the University of Crete, Greece and his PhD in Parallel Processing and Cluster Computing in 2004 from the National Technical University of Athens, Greece. His interests mainly focus on the fields of Large Scale Computing & Storage Systems, System Software for Scalable High Speed Interconnects for Computer Clusters and Advanced Microprocessor Architectures. He has published several scientific papers in international journals and conference proceedings. He has received the IEEE IPDPS 2001 best paper award for the paper "Minimizing Completion Time for Loop Tiling with Computation and Communication Overlapping". He has worked in several European and National R&D programs in the field of High Performance Computing, Grid Computing, Cloud Computing and Storage. In 2013, he was appointed as the Head of Operations and Financial Management Services, in charge of 15 people. Currently, he is managing EC projects at GRNET SA, the Greek NREN responsible for the provision of advanced e-infrastructure services to the Greek Academic and Research Community.

    About GRNET

    GRNET provides Internet connectivity, high-quality e-Infrastructures and advanced services to the Greek Educational, Academic and Research community.

    Through its high-speed, high-capacity infrastructure that spans across the entire country, GRNET interconnects more than 150 institutions, including all universities and technological institutions, as well as many research institutes and the public Greek School Network.

    GRNET operates the National High Performance Computing system (a Tier-1 system in the European HPC ecosystem) and offers user and application support services that provide Greek scientists with the computing infrastructure and expertise they need for their research, enabling them to perform large-scale simulations.

    GRNET offers innovative IaaS cloud computing services to the Greek and global research and education communities: “~okeanos” and “okeanos global” allow users to create multi-layer virtual infrastructures and to instantiate virtual computing machines, local networks to interconnect them, and reliable storage space within seconds, with a few simple mouse clicks.

    GRNET aims to contribute towards Greece’s Digital Convergence with the EU by supporting the development and encouraging the use of e-Infrastructures and services. The right and timely planning strategies, together with the long experience and know-how of its people, guarantee the continuation and enhancement of GRNET’s successful course.

    Greek Research and Technology Network – Networking Research and Education:

    www.grnet.gr, hpc.grnet.gr

    https://events.prace-ri.eu/event/724/
    Apr 23 9:00 to 17:30
    Would you like to make 3D visualisations that are visually more attractive than what ParaView or VisIt can provide? Do you need an image for a grant application that needs to look spectacular? Would you like to create a cool animation of flying through your simulation data? Then this course may be for you!

    The goal of this course is to provide you with hands-on knowledge of producing great images and animations from 3D (scientific) data. We will be using the open-source package Blender (www.blender.org), which provides good basic functionality while also supporting advanced usage and general editing of 3D data. It is also a lot of fun to work with (once you get used to its graphical interface).

    Example types of relevant scientific data are 3D cell-based simulations, 3D models from photogrammetry, (isosurfaces of) 3D medical scans, molecular models and earth-sciences data. Note that we do not focus on information visualization of abstract data, such as graphs (although you could convert those into a 3D model first and then use them in Blender).

    We would like to encourage participants to bring along (a sample of) the data they normally work with and would like to apply the course knowledge to.

    https://events.prace-ri.eu/event/721/
    Apr 24 9:00 to 17:30