This course presents the main concepts behind parallel programming models and their implementation. It analyzes the roles of the programmer, the compiler, the runtime system and the operating system in providing productive programming environments and implementing them efficiently. The course also describes the tools required to understand the behavior of parallel applications when executed on current supercomputing architectures (collections of distributed-memory nodes, each built from multicore chips and/or accelerators). The course is very practical, with optimization and parallelization assignments using different tools (Extrae, Paraver and Dimemas) and programming models (OpenMP, OmpSs, MPI or CUDA), and insights into their implementation.
Teachers
Person in charge
Jesus Jose Labarta Mancho
Weekly hours
Theory: 2
Problems: 0
Laboratory: 1
Guided learning: 0
Autonomous learning: 7
Competences
Technical Competences of each Specialization
High performance computing
CEE4.2 - Capability to analyze, evaluate, design and optimize software considering the architecture and to propose new optimization techniques.
CEE4.3 - Capability to analyze, evaluate, design and manage system software in supercomputing environments.
Generic Technical Competences
Generic
CG3 - Capacity for mathematical modeling, calculation and experimental design in technology centers and engineering companies, particularly in research and innovation in all areas of Computer Science.
Transversal Competences
Basic
CB7 - Ability to integrate knowledge and handle the complexity of making judgments based on information which, being incomplete or limited, includes reflections on the social and ethical responsibilities linked to the application of that knowledge and those judgments.
CB9 - Possession of the learning skills that enable students to continue studying in a way that is largely self-directed or autonomous.
Contents
Basic concepts in parallel programming and performance analysis
The background needed to follow an advanced parallel programming course. Issues that arise when programming multicore architectures. A general introduction to the main techniques and basic features of current performance analysis tools.
Advanced shared- and distributed-memory programming: OpenMP and MPI
Summary of basic features in OpenMP and MPI. Advanced features in OpenMP, MPI and hybrid programming.
Advanced dataflow programming and novel paradigms for accelerator-based architectures
Dataflow paradigms (OmpSs). Runtime exploitation of parallelism and architecture hiding. Advanced parallel programming using accelerators: CUDA, OpenCL, OpenACC, ...
Data acquisition and performance analytics
Tracing of sequential and parallel applications. Trace processing and performance analytics.
Models and performance prediction
Trace-based modeling of parallel performance. Architectural parameters: CPU, memory, interconnect.
Analysis and optimization of real applications
Analysis of two large applications (sequential and/or parallel) and optimization using hybrid programming paradigms (dataflow, shared- and distributed-memory and accelerators).
Teaching methodology
For the part devoted to programming models, theory classes present the concepts behind parallel programming models for current supercomputing architectures, followed by a general introduction to the main techniques and basic features of the major tools. Laboratory classes start by introducing advanced features of the most widely used programming models and the use of the tools on simple examples. Students are then faced with a few relatively large codes that must be analyzed with different tools and optimized using hybrid programming models.
Evaluation methodology
The evaluation of the course will be based on a set of practical works. At least two major applications will have to be evaluated by each student, and at least one of them will be in an area to which the student has had no previous exposure. A detailed analysis report of the performance "problems" of each application will be required, including a quantification of their importance and suggestions of potential ways to overcome them.