Credits
6
Types
Specialization complementary (High Performance Computing)
Requirements
This course has no prerequisites, but it assumes prior skills
Department
AC
In the first part of the course, we will explore the basic building blocks of supercomputers and their system software stacks. We will then turn to traditional parallel and distributed programming models, which are essential for exploiting parallelism and scaling applications on conventional high-performance infrastructures.
In the second part of the course, we will review the hardware and software stack used to manage distributed GPU applications, which have become ubiquitous in high-performance computing installations worldwide over the past decade. These GPU-based systems deliver the majority of the performance in the largest pre-exascale supercomputers, such as the MareNostrum 5 supercomputer.
The third part of the course will focus on understanding how contemporary supercomputing systems have been the true drivers of recent advances in artificial intelligence, with particular emphasis on the scalability of deep learning algorithms using these advanced high-performance computing installations based on GPUs.
Adopting a "learn by doing" approach, the course combines lectures, reading assignments, and hands-on exercises using one of Europe's fastest supercomputers, MareNostrum 5 at the Barcelona Supercomputing Center (BSC-CNS). Assessment will be continuous to ensure steady progress, with the aim of equipping students with practical skills to adapt to and anticipate new technologies in the evolving landscape of high-performance computing.
Teaching staff
Coordinator
- Jordi Torres Viñals (torres@ac.upc.edu)
Weekly hours
Theory: 2
Problems: 0
Laboratory: 2
Guided learning: 0
Autonomous learning: 7.5384
Competences
High performance computing
Generic
Teamwork
Basic
Objectives
Contents
- 00. Welcome: Course content and motivation
- 01. Supercomputing basics
- 02. Heterogeneous supercomputers
- 03. Supercomputer management and storage systems
- 04. Benchmarking supercomputers
- 05. Data center infrastructures
- 06. Parallel programming models
- 07. Parallel performance models
- 08. Parallel programming languages for heterogeneous platforms
- 09. Artificial Intelligence is a computing problem
- 10. Deep Learning essential concepts
- 11. Using Supercomputers for DL training
- 12. Accelerate the learning with parallel training on multi-GPUs
- 13. Accelerate the learning with distributed training on multiple parallel servers
- 14. How to speed up the training of Transformers-based models
Activities
Activity / Assessment act
Theory 0.5h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 0h
01. Supercomputing basics
Theory 1h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 3.5h
Exercise 01: Supercomputing impact
Theory 0h, Problems 0h, Laboratory 1h, Guided learning 0h, Autonomous learning 2h
02. Heterogeneous supercomputers
Theory 1h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 4h
Exercise 02: Getting started with storage and management systems
Theory 0h, Problems 0h, Laboratory 1h, Guided learning 0h, Autonomous learning 2h
03. Supercomputer management and storage systems
Theory 2h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 2h
Exercise 03: Exascale computers challenge
Theory 0h, Problems 0h, Laboratory 2h, Guided learning 0h, Autonomous learning 2h
04. Benchmarking supercomputers
Theory 2h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 4h
Exercise 04: Getting started with parallel programming models
Theory 0h, Problems 0h, Laboratory 1h, Guided learning 0h, Autonomous learning 4h
05. Data center infrastructures
Theory 1h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 4h
Exercise 05: Getting started with parallel performance metrics
Theory 0h, Problems 0h, Laboratory 1h, Guided learning 0h, Autonomous learning 3h
06. Parallel programming models
Theory 6h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 3h
Exercise 06: Getting started with parallel performance models
Theory 0h, Problems 0h, Laboratory 1h, Guided learning 0h, Autonomous learning 3h
07. Parallel performance models
Theory 1h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 2h
Exercise 07: Emerging trends in supercomputing
Theory 0h, Problems 0h, Laboratory 1h, Guided learning 0h, Autonomous learning 5h
08. Parallel programming languages for heterogeneous platforms
Theory 1h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 1h
Exercise 08: Getting started with CUDA
Theory 0.5h, Problems 0h, Laboratory 3h, Guided learning 0h, Autonomous learning 3h
Midterm
Theory 2h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 10.5h
09. Artificial Intelligence is a Supercomputing problem
Theory 2h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 3h
Exercise 09: First contact with Deep Learning and Supercomputing
Theory 0h, Problems 0h, Laboratory 2h, Guided learning 0h, Autonomous learning 4h
10. Deep Learning essential concepts
Theory 1h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 1h
Exercise 10: The new edition of the TOP500
Theory 0h, Problems 0h, Laboratory 1h, Guided learning 0h, Autonomous learning 4h
11. Using Supercomputers for DL training
Theory 1.5h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 2h
Exercise 11: Using a supercomputer for Deep Learning training
Theory 0h, Problems 0h, Laboratory 3h, Guided learning 0h, Autonomous learning 4h
12. Accelerate the learning with parallel training using a multi-GPU parallel server
Theory 1h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 3h
Exercise 12: Accelerate the learning with parallel training using a multi-GPU parallel server
Theory 0h, Problems 0h, Laboratory 3h, Guided learning 0h, Autonomous learning 4h
13. Accelerate the learning with distributed training using multiple parallel servers
Theory 1h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 1h
Exercise 13: Accelerate the learning with distributed training using multiple parallel servers
Theory 0h, Problems 0h, Laboratory 3h, Guided learning 0h, Autonomous learning 8h
14. How to speed up the training of Transformers-based models
Theory 1h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 0h
Exercise 14: How to speed up the training of Transformers-based models
Theory 0h, Problems 0h, Laboratory 3h, Guided learning 0h, Autonomous learning 4h
Final remarks
Theory 0.5h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 2h
Teaching methodology
Class attendance and participation: Regular attendance is expected and is required in order to discuss the concepts covered in class.
Lab activities: Some exercises will be conducted as hands-on sessions during the course using supercomputing facilities. Students will need their own laptop to access these resources during the theory class. Each hands-on session involves writing a lab report with all the results. There are no separate days for theory and laboratory classes: theoretical and practical activities are interspersed within the same session to facilitate learning.
Reading/presentation assignments: Some exercise assignments will consist of reading documentation/papers that expand the concepts introduced during lectures. Some exercises will involve student presentations (randomly chosen).
Assessment: There will be one midterm exam in the middle of the course. Students may use any type of documentation during it, including digital documentation on their own laptop.
Assessment method
The grade for this course can be obtained through continuous assessment, which takes into account the following:
20% Attendance and participation
10% Midterm exam
70% Exercises (plus exercise presentations) and lab exercises (plus lab reports)
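As a rough illustration of how the weights above combine, the following Python sketch computes a weighted continuous-assessment grade. The function name `continuous_grade`, the component keys, and the 0 to 10 grading scale are assumptions made for this example; the syllabus states only the percentages.

```python
# Weights taken from the continuous-assessment breakdown in this section.
WEIGHTS = {"attendance": 0.20, "midterm": 0.10, "exercises": 0.70}


def continuous_grade(scores, scale=10.0):
    """Return the weighted average of the component scores.

    `scores` maps each component name to a mark between 0 and `scale`.
    The 0-10 scale is an assumption, not something the syllabus specifies.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= scores[name] <= scale:
            raise ValueError(f"{name} must be between 0 and {scale}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical marks, for illustration only.
    example = {"attendance": 10, "midterm": 5, "exercises": 8}
    print(round(continuous_grade(example), 2))
```

With these hypothetical marks the components contribute 2.0, 0.5, and 5.6 points respectively, which shows how heavily the exercises and lab reports dominate the result compared with the midterm.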
Students who have not followed continuous assessment may take a final course exam. This exam covers the entire course (practical part, theoretical part, and self-learning part). During the final exam, students are not allowed to use any documentation, either on paper or digital.
Bibliography
Basic
- Torres, Jordi. Supercomputing for Artificial Intelligence: Foundations, Architectures, and Scaling Deep Learning. WATCH THIS SPACE Book Series - Barcelona. Amazon KDP, 2025. ISBN: 9798319328359
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005476510706711&context=L&vid=34CSUC_UPC:VU1&lang=ca