This course introduces the fundamentals of high-performance and parallel computing. It is targeted at scientists and engineers seeking to develop the skills necessary for working with supercomputers, the leading edge in high-performance computing technology.
In the first part of the course, we will cover the basic building blocks of supercomputers and their system software stack. Then, we will introduce their traditional parallel and distributed programming models, which allow one to exploit parallelism, a central element for scaling applications on this type of high-performance infrastructure.
In the second part of the course, we will motivate the current supercomputing systems developed to support the artificial intelligence algorithms in demand today. This year's syllabus will pay special attention to Deep Learning (DL) algorithms and their scalability on a GPU platform.
This course follows a learn-by-doing approach based on a set of exercises, consisting of programming problems and paper readings, that students must complete throughout the course. The course is graded by continuous assessment, which encourages constant, steady work.
All in all, this course seeks to give students practical skills that help them, as much as possible, to adapt to and anticipate the new technologies that will undoubtedly emerge in the coming years. For the practical part of the exercises, students will use supercomputing facilities from the Barcelona Supercomputing Center (BSC-CNS).
CEE4.1 - Capability to analyze, evaluate and design computers and to propose new techniques for improvement in their architecture.
CEE4.2 - Capability to analyze, evaluate, design and optimize software considering the architecture and to propose new optimization techniques.
CEE4.3 - Capability to analyze, evaluate, design and manage system software in supercomputing environments.
Generic Technical Competences
Generic
CG1 - Capability to apply the scientific method to the study and analysis of phenomena and systems in any area of Computer Science, and to the conception, design and implementation of innovative and original solutions.
Transversal Competences
Teamwork
CTR3 - Capacity to work as a team member, either as a regular member or performing directive activities, in order to help the development of projects in a pragmatic manner and with a sense of responsibility; capability to take into account the available resources.
Basic
CB6 - Ability to apply the acquired knowledge and capacity for solving problems in new or unknown environments within broader (or multidisciplinary) contexts related to their area of study.
CB8 - Capability to communicate their conclusions, and the knowledge and rationale underpinning these, to both specialist and non-specialist audiences in a clear and unambiguous way.
CB9 - Possession of the learning skills that enable the students to continue studying in a way that will be mainly self-directed or autonomous.
Objectives
To train students to follow by themselves the continuous development of supercomputing systems that enable the convergence of advanced analytic algorithms such as artificial intelligence.
Related competences: CB6, CB8, CB9, CTR3, CG1, CEE4.1, CEE4.2, CEE4.3
Contents
00. Welcome: Course content and motivation
01. Supercomputing basics
02. General purpose supercomputers
03. Parallel programming models
04. Parallel performance metrics
05. Parallel performance models
06. Heterogeneous supercomputers
07. Parallel programming languages for heterogeneous platforms
08. Emerging Trends and Challenges in Supercomputing
09. Artificial Intelligence is a Supercomputing problem
10. Deep Learning essential concepts
11. Using Supercomputers for DL training
12. Accelerate the learning with parallel training using a multi-GPU parallel server
13. Accelerate the learning with distributed training using multiple parallel servers
14. How to speed up the training of Transformers-based models
Exercise 01: Read and present a paper about the challenges of exascale computers
Theory: 0h, Problems: 0h, Laboratory: 1h, Guided learning: 0h, Autonomous learning: 2h
02. General purpose supercomputers
Theory: 1h, Problems: 0h, Laboratory: 0h, Guided learning: 0h, Autonomous learning: 4h
Exercise 02: Getting started with Supercomputing
Theory: 0h, Problems: 0h, Laboratory: 1h, Guided learning: 0.2h, Autonomous learning: 2h
03. Parallel programming models
Theory: 2h, Problems: 0h, Laboratory: 0h, Guided learning: 0h, Autonomous learning: 2h
Exercise 03: Getting Started with Parallel Programming Models
Theory: 0h, Problems: 0h, Laboratory: 2h, Guided learning: 0.1h, Autonomous learning: 2h
04. Parallel performance metrics
Theory: 2h, Problems: 0h, Laboratory: 0h, Guided learning: 0h, Autonomous learning: 4h
Exercise 04: Getting Started with Parallel Performance Metrics
Theory: 0h, Problems: 0h, Laboratory: 1h, Guided learning: 0.1h, Autonomous learning: 4h
05. Parallel performance models
Theory: 1h, Problems: 0h, Laboratory: 0h, Guided learning: 0h, Autonomous learning: 4h
Exercise 05: Getting started with parallel performance metrics and models
Theory: 0h, Problems: 0h, Laboratory: 1h, Guided learning: 0.1h, Autonomous learning: 3h
06. Heterogeneous supercomputers
Theory: 6h, Problems: 0h, Laboratory: 0h, Guided learning: 0h, Autonomous learning: 3h
Exercise 06: Comparing supercomputer performance
Theory: 0h, Problems: 0h, Laboratory: 1h, Guided learning: 0.1h, Autonomous learning: 3h
07. Parallel programming languages for heterogeneous platforms
Theory: 1h, Problems: 0h, Laboratory: 0h, Guided learning: 0h, Autonomous learning: 2h
Exercise 07: Getting started with CUDA
Theory: 0h, Problems: 0h, Laboratory: 3h, Guided learning: 0.1h, Autonomous learning: 5h
08. Emerging Trends and Challenges in Supercomputing
Theory: 1h, Problems: 0h, Laboratory: 0h, Guided learning: 0h, Autonomous learning: 1h
Exercise 08: Read and present a paper about emerging trends in supercomputing
Theory: 0h, Problems: 0h, Laboratory: 1h, Guided learning: 0.1h, Autonomous learning: 3h
Midterm
Theory: 2h, Problems: 0h, Laboratory: 0h, Guided learning: 0h, Autonomous learning: 10h
09. Artificial Intelligence is a Supercomputing problem
Theory: 2h, Problems: 0h, Laboratory: 0h, Guided learning: 0h, Autonomous learning: 3h
Exercise 09: First contact with Deep Learning
Theory: 0h, Problems: 0h, Laboratory: 2h, Guided learning: 0.1h, Autonomous learning: 4h
10. Deep Learning essential concepts
Theory: 1h, Problems: 0h, Laboratory: 0h, Guided learning: 0h, Autonomous learning: 1h
Exercise 10: The new edition of the TOP500
Theory: 0h, Problems: 0h, Laboratory: 1h, Guided learning: 0.2h, Autonomous learning: 4h
11. Using Supercomputers for DL training
Theory: 2h, Problems: 0h, Laboratory: 0h, Guided learning: 0h, Autonomous learning: 2h
Exercise 11: Using a supercomputer for Deep Learning training
Theory: 0h, Problems: 0h, Laboratory: 3h, Guided learning: 0.2h, Autonomous learning: 4h
12. Accelerate the learning with parallel training using a multi-GPU parallel server
Theory: 1h, Problems: 0h, Laboratory: 0h, Guided learning: 0h, Autonomous learning: 1h
Exercise 12: Accelerate the learning with parallel training using a multi-GPU parallel server
Theory: 0h, Problems: 0h, Laboratory: 3h, Guided learning: 0.2h, Autonomous learning: 4h
13. Accelerate the learning with distributed training using multiple parallel servers
Theory: 1h, Problems: 0h, Laboratory: 0h, Guided learning: 0h, Autonomous learning: 1h
Exercise 13: Accelerate the learning with distributed training using multiple parallel servers
Theory: 0h, Problems: 0h, Laboratory: 3h, Guided learning: 0.2h, Autonomous learning: 8h
14. How to speed up the training of Transformers-based models
Theory: 1h, Problems: 0h, Laboratory: 0h, Guided learning: 0h, Autonomous learning: 1h
Exercise 14: How to speed up the training of Transformers-based models
Theory: 0h, Problems: 0h, Laboratory: 3h, Guided learning: 0.2h, Autonomous learning: 4h
Final remarks
Theory: 0.5h, Problems: 0h, Laboratory: 0h, Guided learning: 0h, Autonomous learning: 2h
Teaching methodology
Class attendance and participation: Regular attendance is expected, and is required to be able to discuss concepts that will be covered during class.
Lab activities: Some exercises will be conducted as hands-on sessions during the course using supercomputing facilities. The student's own laptop will be required to access these resources during the theory class. Each hands-on session will involve writing a lab report with all the results. There are no separate days for theory classes and laboratory classes: theoretical and practical activities will be interspersed within the same session to facilitate the learning process.
Reading/presentation assignments: Some exercise assignments will consist of reading documentation/papers that expand the concepts introduced during lectures. Some exercises will involve student presentations (randomly chosen).
Assessment: There will be one midterm exam in the middle of the course. Students may use any type of documentation during this exam, including digital documentation on their own laptop.
Evaluation methodology
This course is evaluated by continuous assessment, which takes into account the following:
20% Attendance + participation
15% Midterm exam
65% Exercises (+ exercise presentations) and Lab exercises (+ Lab reports)
Details of the weight of each component of the course in the grade are described in the tentative scheduling section.
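As an illustration of how the three weights above combine, the following is a minimal sketch in Python. The weights come from the list above; the 0-10 marking scale, the function name, and the example marks are assumptions made only for this illustration and are not part of the official grading procedure.

```python
# Weights taken from the evaluation list above (they sum to 100%).
# The component names and the 0-10 scale are assumptions for this sketch.
WEIGHTS = {
    "attendance_participation": 0.20,
    "midterm": 0.15,
    "exercises_labs": 0.65,
}


def continuous_assessment_grade(marks: dict) -> float:
    """Return the weighted sum of the component marks."""
    missing = WEIGHTS.keys() - marks.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)


# Hypothetical marks on an assumed 0-10 scale.
example = {
    "attendance_participation": 10.0,
    "midterm": 6.0,
    "exercises_labs": 8.0,
}
print(round(continuous_assessment_grade(example), 2))  # prints 8.1
```

With these hypothetical marks, the exercises and lab work contribute 5.2 of the 8.1 points, which reflects how heavily the continuous assessment leans on the practical part of the course.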
Course Exam: For those students who have not benefited from the continuous assessment, a course exam will be announced during the course. This exam evaluates knowledge of the entire course (practical part, theoretical part, and self-learning part). During this exam, the student is not allowed to use any documentation (neither on paper nor digital).
Bibliography
Basic:
Class handouts and materials associated with this class - Torres, J., 2019.
Understanding Supercomputing, to speed up machine learning algorithms (Course notes) - Torres, J., 2018.
Knowledge of C programming and Linux basics is expected in the course. In addition, prior exposure to parallel programming constructs, the Python language, linear algebra/matrices, or machine learning will be helpful.