Supercomputers Architecture

Credits: 6
Types: Complementary specialization (High Performance Computing)
Requirements: This subject has no prerequisites, but it does assume some previous capacities (see "Previous capacities" below).
Department: AC
This course introduces the fundamentals of high-performance and parallel computing. It is designed for scientists and engineers who aim to develop skills for working with supercomputers, the forefront of high-performance computing technology.

In the first part of the course, we will explore the basic building blocks of supercomputers and their system software stacks. We will then turn to traditional parallel and distributed programming models, which are essential for exploiting parallelism and scaling applications on conventional high-performance infrastructures.
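
To make the message-passing model concrete, here is a minimal sketch using mpi4py; it is an illustration only (the names and sizes are made up, and the course's own exercises may well use C with MPI instead). Each process computes a partial result and rank 0 collects the total:

    # A minimal sketch of the message-passing model with mpi4py (illustrative only).
    from mpi4py import MPI

    comm = MPI.COMM_WORLD        # communicator spanning all launched processes
    rank = comm.Get_rank()       # this process's identifier (0..size-1)
    size = comm.Get_size()       # total number of processes

    # Each process computes a partial sum; rank 0 gathers the grand total.
    partial = sum(range(rank * 1000, (rank + 1) * 1000))
    total = comm.reduce(partial, op=MPI.SUM, root=0)

    if rank == 0:
        print(f"Sum computed by {size} processes: {total}")

Launched with, for example, mpirun -n 4 python partial_sum.py, the same script runs unchanged on 4 or 400 processes; that independence from the process count is what lets the model scale.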

In the second part of the course, we will review the hardware and software stack that enables the management of distributed GPU applications, which have become ubiquitous in high-performance computing installations worldwide over the past decade. These GPU-based systems deliver most of the performance of the largest pre-exascale supercomputers, such as MareNostrum 5.
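
To give a flavour of what GPU offloading looks like from the application side, here is a small, hypothetical PyTorch sketch that moves a computation to a GPU when one is present (the course material itself may approach this at a lower level, e.g., with CUDA):

    # A minimal sketch of GPU offloading with PyTorch (illustrative only).
    import torch

    # Fall back to the CPU when no GPU is available.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    a = torch.randn(4096, 4096, device=device)  # tensors allocated on the device
    b = torch.randn(4096, 4096, device=device)
    c = a @ b                                   # the matrix multiply runs on the device

    print(f"checksum {c.sum().item():.2f} computed on {device}")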

The third part of the course will focus on understanding how contemporary supercomputing systems have been the true drivers of recent advances in artificial intelligence, with particular emphasis on the scalability of deep learning algorithms on these advanced GPU-based high-performance computing installations.
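
The workhorse behind this scalability is data-parallel training: every GPU holds a replica of the model, processes its own shard of each batch, and gradients are averaged across replicas after each backward pass. Below is a minimal sketch using PyTorch's DistributedDataParallel, assuming one process per GPU launched via torchrun (the model and sizes are hypothetical; the course labs may organize this differently):

    # A minimal data-parallel training sketch (illustrative only).
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    # torchrun starts one process per GPU and sets RANK/WORLD_SIZE for us.
    dist.init_process_group(backend="nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # Wrapping the model makes the gradient all-reduce automatic.
    model = DDP(torch.nn.Linear(1024, 10).to("cuda"), device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for step in range(100):
        x = torch.randn(64, 1024, device="cuda")  # this rank's data shard
        loss = model(x).pow(2).mean()             # dummy loss for the sketch
        optimizer.zero_grad()
        loss.backward()   # gradients are averaged across all GPUs here
        optimizer.step()

    dist.destroy_process_group()

The same script scales from one multi-GPU server (torchrun --nproc_per_node=4 train.py) to many servers by adding --nnodes and a rendezvous endpoint, which is exactly the progression topics 12 and 13 follow.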

Adopting a "learn by doing" approach, the course combines lectures, reading assignments, and hands-on exercises using one of Europe's fastest supercomputers, MareNostrum 5 at the Barcelona Supercomputing Center (BSC-CNS). Assessment will be continuous, ensuring consistent and steady progress, with the aim of equipping students with practical skills to adapt to and anticipate new technologies in the evolving landscape of high-performance computing.

Teachers

Person in charge

  • Jordi Torres Viñals

Weekly hours

Theory: 2
Problems: 0
Laboratory: 2
Guided learning: 0
Autonomous learning: 7.5384

Competences

Technical Competences of each Specialization

High performance computing

  • CEE4.1 - Capability to analyze, evaluate and design computers, and to propose new techniques for improvement in their architecture.
  • CEE4.2 - Capability to analyze, evaluate, design and optimize software taking the architecture into account, and to propose new optimization techniques.
  • CEE4.3 - Capability to analyze, evaluate, design and manage system software in supercomputing environments.

Generic Technical Competences

Generic

  • CG1 - Capability to apply the scientific method to the study and analysis of phenomena and systems in any area of Computer Science, and to the conception, design and implementation of innovative and original solutions.

Transversal Competences

Teamwork

  • CTR3 - Capability to work as a team member, whether as a regular member or performing management activities, in order to contribute to the development of projects in a pragmatic manner and with a sense of responsibility, taking into account the available resources.

Basic

  • CB6 - Ability to apply the acquired knowledge and problem-solving skills in new or unfamiliar environments within broader (or multidisciplinary) contexts related to their area of study.
  • CB8 - Capability to communicate their conclusions, and the knowledge and rationale underpinning them, to both specialist and non-specialist audiences in a clear and unambiguous way.
  • CB9 - Possession of the learning skills that enable students to continue studying in a way that will be largely self-directed or autonomous.

Objectives

  1. To train students to follow, on their own, the continuous development of supercomputing systems that enable the convergence of advanced analytics algorithms such as artificial intelligence.
    Related competences: CB6, CB8, CB9, CTR3, CG1, CEE4.1, CEE4.2, CEE4.3

Contents

  1. 00. Welcome: Course content and motivation
  2. 01. Supercomputing basics
  3. 02. Heterogeneous supercomputers
  4. 03. Supercomputer management and storage systems
  5. 04. Benchmarking supercomputers
  6. 05. Data center infrastructures
  7. 06. Parallel programming models
  8. 07. Parallel performance models (e.g., Amdahl's law; see the example after this list)
  9. 08. Parallel programming languages for heterogeneous platforms
  10. 09. Artificial Intelligence is a Supercomputing problem
  11. 10. Deep Learning essential concepts
  12. 11. Using Supercomputers for DL training
  13. 12. Accelerate the learning with parallel training on multi-GPUs
  14. 13. Accelerate the learning with distributed training on multiple parallel servers
  15. 14. How to speed up the training of Transformers-based models
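
As a pointer to the kind of material topic 07 covers: the classic parallel performance model is Amdahl's law (standard textbook form, not notation taken from the course). If a fraction p of a program's work parallelizes perfectly over n processors, the achievable speedup is bounded by

    S(n) = 1 / ((1 - p) + p / n)

so even with p = 0.95, running on n = 64 processors yields a speedup of only about 15.4, far from the ideal 64.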

Activities

Hours for each activity are listed below as Theory / Laboratory / Autonomous learning; the Problems and Guided learning hours are 0h for every activity.

  • 00. Welcome (0.5h / 0h / 0h; related objectives: 1)
  • 01. Supercomputing basics (1h / 0h / 3.5h)
  • Exercise 01: Supercomputing impact (0h / 1h / 2h)
  • 02. Heterogeneous supercomputers (1h / 0h / 4h)
  • Exercise 02: Getting started with storage and management systems (0h / 1h / 2h)
  • 03. Supercomputer management and storage systems (2h / 0h / 2h)
  • Exercise 03: Exascale computers challenge (0h / 2h / 2h)
  • 04. Benchmarking supercomputers (2h / 0h / 4h)
  • Exercise 04: Getting started with parallel programming models (0h / 1h / 4h)
  • 05. Data center infrastructures (1h / 0h / 4h)
  • Exercise 05: Getting started with parallel performance metrics (0h / 1h / 3h)
  • 06. Parallel programming models (6h / 0h / 3h)
  • Exercise 06: Getting started with parallel performance models (0h / 1h / 3h)
  • 07. Parallel performance models (1h / 0h / 2h)
  • Exercise 07: Emerging trends in supercomputing (0h / 1h / 5h)
  • 08. Parallel programming languages for heterogeneous platforms (1h / 0h / 1h)
  • Exercise 08: Getting started with CUDA (0.5h / 3h / 3h)
  • Midterm (2h / 0h / 10.5h)
  • 09. Artificial Intelligence is a Supercomputing problem (2h / 0h / 3h)
  • Exercise 09: First contact with Deep Learning and Supercomputing (0h / 2h / 4h)
  • 10. Deep Learning essential concepts (1h / 0h / 1h)
  • Exercise 10: The new edition of the TOP500 (0h / 1h / 4h)
  • 11. Using Supercomputers for DL training (1.5h / 0h / 2h)
  • Exercise 11: Using a supercomputer for Deep Learning training (0h / 3h / 4h)
  • 12. Accelerate the learning with parallel training using a multi-GPU parallel server (1h / 0h / 3h)
  • Exercise 12: Accelerate the learning with parallel training using a multi-GPU parallel server (0h / 3h / 4h)
  • 13. Accelerate the learning with distributed training using multiple parallel servers (1h / 0h / 1h)
  • Exercise 13: Accelerate the learning with distributed training using multiple parallel servers (0h / 3h / 8h)
  • 14. How to speed up the training of Transformers-based models (1h / 0h / 0h)
  • Exercise 14: How to speed up the training of Transformers-based models (0h / 3h / 4h)
  • Final remarks (0.5h / 0h / 2h)

Teaching methodology

Class attendance and participation: Regular attendance is expected, and is required in order to discuss the concepts covered in class.

Lab activities: Some exercises will be conducted as hands-on sessions using supercomputing facilities. Students will need their own laptop to access these resources during class. Each hands-on session will involve writing a lab report with all the results. There are no separate days for theory and for laboratory: theoretical and practical activities will be interspersed within the same session to facilitate the learning process.

Reading/presentation assignments: Some exercise assignments will consist of reading documentation/papers that expand on the concepts introduced during lectures. Some exercises will involve student presentations (randomly chosen).

Assessment: There will be one midterm exam in the middle of the course. Students are allowed to use any type of documentation (including digital materials on their own laptops).

Evaluation methodology

This course can be passed through continuous assessment, which takes into account the following:

  • 20% Attendance and participation
  • 10% Midterm exam
  • 70% Exercises (and exercise presentations) and lab exercises (and lab reports)
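
For illustration, assuming marks on a 0-10 scale (a hypothetical scale, not stated above), a student scoring 9 in attendance and participation, 7 in the midterm, and 8 in the exercises would obtain 0.20 × 9 + 0.10 × 7 + 0.70 × 8 = 8.1.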

Students who have not benefited from the continuous assessment have the opportunity to take a final course exam. This exam evaluates knowledge of the entire course (practical, theoretical, and self-learning parts). During this course exam, the student is not allowed to use any documentation (neither paper nor digital).

Previous capacities

Programming in C and Linux basics are expected in the course. In addition, prior exposure to parallel programming constructs, the Python language, experience with linear algebra/matrices, or machine learning knowledge will be helpful.