High Performance Computing for Artificial Intelligence (HPC4AI) is a master-level, practical-oriented course focused on understanding how modern AI training workloads actually run on real supercomputing infrastructures.
Rather than treating deep learning frameworks and tools as black boxes, the course adopts a system-oriented perspective. It guides students through the complete execution workflow of AI training, from hardware architecture and system software to job scheduling, parallel execution, performance measurement, and scalability analysis. The emphasis is on execution behavior: how computation, memory, communication, and coordination interact, and how these interactions determine performance, efficiency, and cost.
A central premise of the course is that the nature of engineering work in AI is changing. Modern AI tools can generate training scripts, pipelines, and even distributed execution logic with minimal effort. As a result, writing code is no longer the primary challenge. The real difficulty, and the real value, lies in understanding whether that code scales, where bottlenecks appear, when efficiency is lost, and what trade-offs are being made when more resources are used.
For this reason, the course explicitly allows and acknowledges the use of modern AI tools (such as code assistants, agentic systems, or automated code generators). However, the course is not about code authorship or syntax. It is about developing the ability to reason about performance, scalability, efficiency, and cost when training deep learning models on real HPC systems. Students are expected to understand what is being executed, how it behaves at scale, and why performance changes as observed.
Hands-on experimentation is a core component of the course. Through a sequence of laboratory activities, students train deep learning models using single and multiple GPUs, explore parallel and distributed training strategies, and analyze scalability and performance behavior under realistic conditions. All laboratory work and assessments are evaluated based on the quality of experimental setup, the relevance of performance measurements, the interpretation of results, and the soundness of scalability and cost-benefit reasoning.
The course material is self-contained and based on the official course textbook, which serves as the main reference for both theoretical concepts and practical activities. No prior experience with supercomputers is required, and deep learning concepts are introduced progressively as needed.
Ultimately, HPC4AI is not a course about recipes or fixed solutions. It is a course about developing engineering judgment. As code generation becomes cheaper and more accessible, the ability to measure, reason, and decide becomes essential. This course is designed to develop precisely that ability.
Details specific to the 2026 edition of the course can be found on the course web page: https://torres.ai/HPC4AI-MEI
Teachers
Person in charge
Jordi Torres Viñals
Weekly hours
Theory: 2
Problems: 0
Laboratory: 2
Guided learning: 0
Autonomous learning: 7.1
Competences
Technical Competences of each Specialization
Direction and management
CDG1 - Capability to integrate technologies, applications, services and systems of Informatics Engineering, in general and in broader and multidisciplinary contexts.
Specific
CTE6 - Capability to design and evaluate operating systems and servers, and applications and systems based on distributed computing.
CTE9 - Capability to apply mathematical, statistical and artificial intelligence methods to model, design and develop applications, services, intelligent systems and knowledge-based systems.
Generic Technical Competences
Generic
CG1 - Capability to plan, calculate and design products, processes and facilities in all areas of Computer Science.
CG4 - Capacity for mathematical modeling, calculation and simulation in technology centers and engineering companies, particularly in research, development and innovation tasks in all areas related to Informatics Engineering.
CG6 - Capacity for general management, technical management and research projects management, development and innovation in companies and technology centers in the area of Computer Science.
CG7 - Capacity for implementation, direction and management of computer manufacturing processes, with guarantee of safety for people and assets, the final quality of the products and their homologation.
CG8 - Capability to apply the acquired knowledge and to solve problems in new or unfamiliar environments inside broad and multidisciplinary contexts, being able to integrate this knowledge.
Transversal Competences
Appropriate attitude towards work
CTR5 - Capability to be motivated by professional achievement and to face new challenges, to have a broad vision of the possibilities of a career in the field of informatics engineering. Capability to be motivated by quality and continuous improvement, and to act rigorously in professional development. Capability to adapt to technological or organizational changes. Capacity for working in the absence of information and/or with time and/or resource constraints.
Basic
CB6 - Ability to apply the acquired knowledge and capacity for solving problems in new or unknown environments within broader (or multidisciplinary) contexts related to their area of study.
CB8 - Capability to communicate their conclusions, and the knowledge and rationale underpinning these, to both skilled and unskilled public in a clear and unambiguous way.
CB9 - Possession of the learning skills that enable the students to continue studying in a way that will be mainly self-directed or autonomous.
Objectives
OE1: Foundations of HPC platforms for AI: understand the architecture, main components, and software environment of modern high-performance computing platforms designed for artificial intelligence workloads.
Related competences:
CTE6, CG1, CG6, CG7, CG8
OE2: Practical use of a supercomputer for AI workloads: acquire basic autonomy in using a real supercomputer, including access, resource management, and job execution for artificial intelligence applications.
Related competences:
CB6, CTE6, CG1, CG8
OE3: Fundamentals of Deep Learning for HPC users: understand the fundamental principles of Deep Learning required to train models in high-performance computing environments, without requiring advanced prior knowledge.
Related competences:
CTE9, CG4, CG8
OE4: Parallel training of Deep Learning models: understand and apply parallel training techniques for Deep Learning models using multiple GPUs within single or multiple compute nodes.
Related competences:
CB6, CB9, CTE6, CTE9, CG1
OE5: Performance analysis and optimization of AI training: analyze the performance of artificial intelligence model training using metrics such as throughput, speedup, and efficiency, and apply basic optimization techniques.
Related competences:
CTE6, CTE9, CG1, CG4
OE6: Experimental evaluation and communication of results: experimentally evaluate results obtained in a supercomputing environment and communicate technical conclusions in a clear, structured, and well-argued manner.
Related competences:
CB8, CB9, CTR5, CDG1
Contents
C1: HPC platforms and software ecosystem for AI
Architecture of modern supercomputers, hardware components, operating system, and software stack for artificial intelligence workloads.
C2: Accessing and using a supercomputer for AI workloads
Access to a supercomputer, account management, batch systems, SLURM, and job execution for Deep Learning applications.
C3: Deep Learning fundamentals for HPC environments
Basic Deep Learning concepts required to train models in HPC environments, including neural networks, training workflows, and datasets (no prior Deep Learning background is assumed).
C4: Parallel training of Deep Learning models
Parallel training of Deep Learning models using multiple GPUs, including parallelism strategies and programming frameworks.
C5: Performance metrics and optimization of AI training
Performance analysis of AI model training using metrics such as throughput, speedup, and efficiency, and basic optimization techniques.
C6: Experimental evaluation and presentation of results
Experimental evaluation of results obtained in an HPC environment and clear communication of conclusions through technical reports and presentations.
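The metrics named in C5 (throughput, speedup, and efficiency) can be illustrated with a minimal Python sketch. The `Run` record, the function names, and all numbers below are hypothetical and chosen only for illustration; they are not taken from the course labs or textbook. The definitions assume a fixed workload measured against a single baseline run.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One timed training measurement (hypothetical record, illustrative values)."""
    gpus: int        # number of GPUs used in the run
    samples: int     # training samples processed in the timed window
    seconds: float   # wall-clock duration of the timed window


def throughput(run: Run) -> float:
    """Training samples processed per second."""
    return run.samples / run.seconds


def speedup(baseline: Run, run: Run) -> float:
    """Throughput of `run` relative to the baseline run."""
    return throughput(run) / throughput(baseline)


def efficiency(baseline: Run, run: Run) -> float:
    """Speedup divided by the growth in GPU count (1.0 means ideal scaling)."""
    return speedup(baseline, run) / (run.gpus / baseline.gpus)


def gpu_seconds_per_sample(run: Run) -> float:
    """Simple cost proxy: GPU time consumed per training sample."""
    return run.gpus * run.seconds / run.samples


# Made-up measurements: the same workload on 1 GPU and on 4 GPUs.
one_gpu = Run(gpus=1, samples=10_000, seconds=100.0)
four_gpus = Run(gpus=4, samples=10_000, seconds=31.25)

if __name__ == "__main__":
    print(f"throughput: {throughput(four_gpus):.1f} samples/s")
    print(f"speedup:    {speedup(one_gpu, four_gpus):.2f}x")
    print(f"efficiency: {efficiency(one_gpu, four_gpus):.2f}")
    print(f"cost:       {gpu_seconds_per_sample(four_gpus):.4f} GPU-s/sample")
```

In this made-up case the 4-GPU run finishes faster but spends more GPU time per sample than the 1-GPU run, which is the kind of trade-off a cost-benefit argument has to make explicit.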
Teaching methodology
The course follows an active learning and continuous assessment approach, combining theoretical lectures, hands-on laboratory work, autonomous learning, and student presentations.
Theoretical sessions are delivered through participatory lectures, where the instructor introduces the fundamental concepts related to high-performance computing platforms, deep learning fundamentals, parallel training strategies, and performance analysis for artificial intelligence workloads. Students are expected to actively participate in discussions during these sessions.
Hands-on activities constitute a central component of the course and are based on a learn-by-doing methodology. These activities focus on practical experimentation using a real supercomputing environment (MareNostrum 5). Part of the hands-on work is carried out during regular class sessions, while the remaining work is completed outside the classroom as autonomous learning. All hands-on activities require the submission of corresponding reports and, in some cases, technical presentations through the institutional learning platform (Racó).
Autonomous learning is mainly based on the detailed study of the course textbook, which constitutes the main reference material for the subject. Students are also required to prepare presentations and technical material related to their practical work.
Student presentations play an important role in the course. Individual students or groups are randomly selected to present their work and results in class. Peer evaluation is incorporated as part of the learning process, encouraging critical analysis and constructive feedback.
Regular attendance and active participation are expected. Students are responsible for all material covered in class, including announcements, assignments, and project guidelines, regardless of attendance. It is the student's responsibility to obtain any missed material.
Evaluation methodology
The evaluation of this course is based on a continuous assessment system, strongly focused on practical work and active participation.
The final grade is composed of the following elements:
- Attendance and participation: 20%
Regular attendance and active participation in lectures, discussions, and hands-on sessions.
Attendance is mandatory. To qualify for continuous assessment, students must attend at least 80% of the class sessions.
- Hands-on activities (laboratory work): 60%
Evaluation of the practical laboratory activities carried out throughout the course (LAB 0 to LAB 4).
The instructor will assess the submitted work using a rubric that considers correctness, completeness, experimental results, and technical understanding.
Some students or groups will be randomly selected during the course to present and explain their laboratory work (LAB 0 to LAB 2). This mechanism is intended to ensure that all students prepare and understand their work thoroughly.
- Technical presentations and peer evaluation: 20%
During the final session of the course, all students will present either LAB 3 or LAB 4 (assigned randomly).
Presentations will be evaluated by the instructor and through peer evaluation, which will contribute to the final presentation grade.
Attendance on the presentation day is mandatory. Students who do not attend this session will not receive the presentation grade.
Requirements for continuous assessment: To qualify for continuous assessment, students must meet all the following requirements:
- Attendance: at least 80% of the class sessions.
- Hands-on activities: completion of at least 50% of the laboratory work.
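The weighting and eligibility rules above can be expressed as a short Python sketch. The function and key names are hypothetical, and the 0 to 10 scale used in the example is an assumption, since the grading scale is not stated here. How each component score is actually obtained (rubrics, peer evaluation) is left outside the sketch.

```python
# Component weights of the final grade, as listed in the evaluation methodology.
WEIGHTS = {
    "attendance_participation": 0.20,
    "labs": 0.60,
    "presentation": 0.20,
}


def qualifies_for_continuous_assessment(attendance_rate: float,
                                        labs_completed_rate: float) -> bool:
    """Both requirements must hold: >= 80% attendance and >= 50% of lab work."""
    return attendance_rate >= 0.80 and labs_completed_rate >= 0.50


def final_grade(scores: dict) -> float:
    """Weighted sum of the three component scores (all on the same scale)."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    # Illustrative scores on an assumed 0-10 scale.
    example = {"attendance_participation": 8.0, "labs": 7.0, "presentation": 9.0}
    if qualifies_for_continuous_assessment(attendance_rate=0.85,
                                           labs_completed_rate=0.80):
        print(f"final grade: {final_grade(example):.2f}")
    else:
        print("continuous assessment not available; final exam option applies")
```

With these illustrative scores the weighted sum is 0.2 × 8 + 0.6 × 7 + 0.2 × 9 = 7.6.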
Final exam option
- Students who do not meet the requirements for continuous assessment will have the option to take a final exam.
- This exam will evaluate the entire course content, including theoretical concepts, practical knowledge, and autonomous learning material based on the course book and laboratory activities.
- The final exam will be announced during the course. No documentation (printed or digital) will be allowed during the exam.
Bibliography
Basic:
Torres, Jordi. Supercomputing for Artificial Intelligence: Foundations, Architectures, and Scaling Deep Learning. WATCH THIS SPACE Book Series, Barcelona. Amazon KDP, 2025. ISBN: 979-831932835-9
Python is the programming language used in the lab sessions of this course. Students are assumed to have a basic knowledge of Python before classes start. Some familiarity with Linux basics is also necessary.