Supercomputers Architecture

Credits
6
Types
Complementary specialization (High Performance Computing)
Requirements
This subject has no prerequisite courses, but it does assume previous capacities (see below)
Department
AC
Supercomputers represent the leading edge of high-performance computing technology. This course describes all the elements in the system architecture of a supercomputer, from the shared-memory multiprocessor in the compute node to the interconnection network and distributed-memory cluster, including the infrastructures that host them. We will also discuss their building blocks and the system software stack, including parallel programming models, since exploiting parallelism is central to achieving greater computational power. We will introduce the continuous development of supercomputing systems that enables their convergence with the advanced analytics algorithms required in today's world, paying special attention to Deep Learning algorithms and their execution on GPU platforms. The practical component is the most important part of this subject. The course follows a “learn by doing” method, with a set of hands-on exercises based on problems that the students must carry out throughout the course. The course is marked by continuous assessment, which ensures constant and steady work. The method is also based on teamwork and a ‘learn to learn' approach of reading and presenting papers, so that students can adapt to and anticipate the new technologies that will arise in the coming years. For the labs we will use supercomputing facilities of the Barcelona Supercomputing Center (BSC-CNS).

Teachers

Person in charge

  • Jordi Torres Viñals

Weekly hours

Theory
2
Problems
0
Laboratory
2
Guided learning
0.15
Autonomous learning
7.7

Competences

Technical Competences of each Specialization

High performance computing

  • CEE4.1 - Capability to analyze, evaluate and design computers and to propose new techniques for improving their architecture.
  • CEE4.2 - Capability to analyze, evaluate, design and optimize software considering the architecture and to propose new optimization techniques.
  • CEE4.3 - Capability to analyze, evaluate, design and manage system software in supercomputing environments.

Generic Technical Competences

Generic

  • CG1 - Capability to apply the scientific method to the study and analysis of phenomena and systems in any area of Computer Science, and to the conception, design and implementation of innovative and original solutions.

Transversal Competences

Teamwork

  • CTR3 - Capacity to work as a team member, either as a regular member or performing directive activities, in order to help the development of projects in a pragmatic manner and with a sense of responsibility; capability to take into account the available resources.

Basic

  • CB6 - Ability to apply the acquired knowledge and capacity for solving problems in new or unknown environments within broader (or multidisciplinary) contexts related to their area of study.
  • CB8 - Capability to communicate their conclusions, and the knowledge and rationale underpinning these, to both skilled and unskilled public in a clear and unambiguous way.
  • CB9 - Possession of the learning skills that enable the students to continue studying in a way that will be mainly self-directed or autonomous.

Objectives

  1. To train students to follow by themselves the continuous development of supercomputing systems that enables their convergence with advanced analytics algorithms such as artificial intelligence.
    Related competences: CB6, CB8, CB9, CTR3, CG1, CEE4.1, CEE4.2, CEE4.3

Contents

  1. Course content and motivation
  2. Supercomputing Basics
  3. Supercomputers Architecture
  4. Supercomputers Benchmarking
  5. General Purpose Supercomputers
  6. Resource Management in Supercomputers
  7. Parallel Programming Models and Motivation
  8. MPI basics
  9. Taking Time
  10. OpenMP basics
  11. MPI Advanced
  12. Parallel Performance
  13. Heterogeneous Supercomputers
  14. Accelerator Architecture
  15. Getting Started with CUDA Programming Model
  16. Supercomputing, the heart of Deep Learning
  17. Software Stack for Artificial Intelligence
  18. Deep Learning Basics Concepts
  19. Computing Performance
  20. Training on Multiple GPUs
  21. Training on Multiple Servers

Activities



Course content and motivation


Objectives: 1
Contents:
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h

Supercomputing Basics



Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h

HPC Building Blocks (general purpose blocks)



Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h

HPC Software Stack (general purpose blocks)



Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h

Parallel Programming Models: OpenMP



Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h
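The core idea behind OpenMP covered in these lectures, splitting the iterations of a loop across threads in a fork-join pattern and then reducing the partial results, can be sketched as follows. This is a Python illustration only (the function name and chunking scheme are ours); the course labs use C with OpenMP pragmas:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, num_workers=4):
    """Fork-join pattern: split the iteration space into chunks,
    process each chunk in its own worker, then combine the results."""
    chunk = (len(data) + num_workers - 1) // num_workers
    slices = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = pool.map(sum, slices)   # each worker reduces its own chunk
    return sum(partials)                   # final step, like reduction(+:...) in OpenMP

print(parallel_sum(list(range(1000))))
```

In C with OpenMP the same effect is obtained with a `#pragma omp parallel for reduction(+:total)` over the loop; the runtime handles the chunking and the final combine.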

Parallel Programming Models: MPI



Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h
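The message-passing model behind MPI, independent processes that share no memory and communicate only through explicit send/receive operations, can be sketched with Python threads and a queue as the channel. This is an illustration only (the names `rank0`/`rank1` are ours); the labs use C with real MPI calls:

```python
import threading
import queue

def mpi_like_pair():
    """Two 'ranks' with no shared state: rank 0 sends a message,
    rank 1 receives it, mimicking MPI_Send / MPI_Recv semantics."""
    channel = queue.Queue()
    received = []

    def rank0():
        channel.put("hello from rank 0")   # like MPI_Send(..., dest=1, ...)

    def rank1():
        received.append(channel.get())     # like MPI_Recv(..., source=0, ...)

    workers = [threading.Thread(target=rank0), threading.Thread(target=rank1)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return received[0]

print(mpi_like_pair())
```

In real MPI the two ranks would be separate processes on separate nodes, and `channel.get()` would block on the interconnection network rather than on an in-memory queue.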

Parallel Performance Metrics and Measurements



Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h
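A central tool in these lectures on performance metrics is speedup and its limit under Amdahl's law, S(p) = 1 / (f + (1 - f)/p), where f is the serial fraction of the program and p the number of processors. A minimal sketch (function and variable names are ours, not from the course notes):

```python
def speedup_amdahl(serial_fraction, processors):
    """Amdahl's law: S(p) = 1 / (f + (1 - f) / p)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

def efficiency(serial_fraction, processors):
    """Parallel efficiency: speedup divided by processor count."""
    return speedup_amdahl(serial_fraction, processors) / processors

# With 10% serial code, speedup is capped below 10x no matter how many cores:
for p in (2, 16, 1024):
    print(p, round(speedup_amdahl(0.10, p), 2))
```

Note how efficiency degrades as processors are added while the problem size stays fixed; this is the strong-scaling behavior measured in the corresponding lab.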

HPC Building Blocks for AI servers



Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h

Coprocessors and Programming Models



Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h

Powering Artificial Intelligence, Machine Learning and Deep Learning with Supercomputing



Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h

Parallel AI platforms and their software stack



Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h

Distributed AI platforms and their software stack



Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h

Conclusions and remarks: Towards Exascale Computing



Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h

1- Supercomputing Building Blocks



Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0.2h
Autonomous learning
3h

2- Getting Started with Supercomputing



Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0.2h
Autonomous learning
6h

3- Getting Started with Parallel Programming Models



Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0.1h
Autonomous learning
5h

4- Getting Started with Parallel Performance Metrics



Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0.2h
Autonomous learning
5h

5- Getting Started with Parallel Performance Model – I



Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0.2h
Autonomous learning
5h

6- Getting Started with Parallel Performance Model – II



Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0.1h
Autonomous learning
5h

7- Getting Started with GPU-based Supercomputing



Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0.1h
Autonomous learning
5h

8- Getting Started with CUDA programming model



Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0.2h
Autonomous learning
6h

9- Getting Started with Deep Learning Frameworks in a Supercomputer



Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0.1h
Autonomous learning
6h

10- Getting Started with a basic Deep Learning model



Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0.2h
Autonomous learning
6h

11- Getting Started with real Deep Learning problems and their solutions



Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0.1h
Autonomous learning
6h

12- Getting Started with the parallelization of Deep Learning problems



Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0.1h
Autonomous learning
5h

13- Getting Started with distributed Deep Learning problems



Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0.2h
Autonomous learning
8h
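The final labs revolve around data-parallel training: each worker (GPU or server) computes gradients on its own shard of a batch, the gradients are averaged across workers, and every worker applies the same update. A pure-Python sketch of one synchronous step, using a toy one-parameter model y = w·x with squared-error loss (the function names and the toy model are ours):

```python
def local_gradient(w, shard):
    """Gradient of sum((w*x - y)^2) with respect to w on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard)

def allreduce_mean(grads):
    """All-reduce step: average per-worker gradients, as frameworks
    do across GPUs (e.g. via NCCL) or across servers."""
    return sum(grads) / len(grads)

def data_parallel_step(w, shards, lr=0.01):
    """One synchronous SGD step: each shard's gradient is computed
    (in parallel in practice), averaged, and applied identically everywhere."""
    grads = [local_gradient(w, shard) for shard in shards]
    return w - lr * allreduce_mean(grads)
```

Because every worker sees the same averaged gradient, all replicas of the weights stay bit-identical; the communication cost of the all-reduce is what the multi-GPU and multi-server labs measure.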

Teaching methodology

The theoretical part of the course follows the slides designed by the teacher. The practical component is the most important part of this subject. The course follows a “learn by doing” method, with a set of hands-on exercises based on problems that the students must carry out throughout the course. The course is marked by continuous assessment, which ensures constant and steady work. The method is also based on teamwork and a ‘learn to learn' approach of reading and presenting papers, so that students can adapt to and anticipate the new technologies that will arise in the coming years.
Class attendance and participation: Regular and consistent attendance is expected, as is the ability to discuss the concepts covered during class.

Lab activities: Hands-on sessions will be conducted during the course using supercomputing facilities. Each hands-on session involves writing a lab report with all the results.

Reading/Presentation assignments: Six assignments that involve reading documentation/papers that expand on the concepts introduced during lectures.

Assessment: There will be two short midterm exams during the course (and occasional pop quizzes that can replace attendance if the situation requires it).

Student presentations: Randomly chosen students/groups will present the reading assignments (presentations/projects).

Evaluation methodology

The evaluation of this course will take into account different items (tentative):

  • Attendance (minimum 80% required) and participation in class: 20% of the grade.
  • Readings, presentations (and homework): 20% of the grade.
  • Exams: 20% of the grade.
  • Lab sessions (and lab reports): 40% of the grade.

Bibliography

Basic:

  • Class handouts and materials associated with this class - Torres, J, 2019.
  • Understanding Supercomputing, to speed up machine learning algorithms (Course notes) - Torres, J, 2018.
  • Marenostrum4 User's guide - BSC documentation, Operations department, 2019.
  • High performance computing : modern systems and practices - Sterling, T.; Anderson, M.; Brodowicz, M, Morgan Kaufmann, 2018. ISBN: 9780124201583
    http://cataleg.upc.edu/record=b1519884~S1*cat
  • Dive into deep learning - Zhang, A.; Lipton, Z.C.; Li, M.; Smola, A.J, 2020.
  • First contact with Deep learning: practical introduction with Keras - Torres, J, Kindle Direct Publishing, 2018. ISBN: 9781983211553
    http://cataleg.upc.edu/record=b1510639~S1*cat

Web links

Previous capacities

Programming in C and Linux basics are expected in the course. Prior exposure to parallel programming constructs and the Python language is also helpful.

Addendum

Contents

THERE ARE NO CHANGES IN THE CONTENTS REGARDING THE INFORMATION IN THE COURSE GUIDE.

Teaching methodology

THE COURSE IS PLANNED TO BE TAUGHT 100% ON-SITE, SO THERE ARE NO CHANGES IN THE METHODOLOGY REGARDING THE INFORMATION IN THE COURSE GUIDE.

Evaluation methodology

THERE ARE NO CHANGES IN THE EVALUATION METHOD REGARDING THE INFORMATION IN THE COURSE GUIDE.

Contingency plan

IF THE COURSE HAS TO BE OFFERED WITH REDUCED ON-SITE ATTENDANCE OR FULLY ONLINE, THERE WILL BE NO CHANGES IN THE CONTENTS OR THE EVALUATION METHOD, BUT THE METHODOLOGY WILL BE ADAPTED TO ALLOW FOLLOWING THE COURSE REMOTELY, INCLUDING AMONG OTHERS:

  • USE OF THE 'RACÓ' TO DOWNLOAD THE SLIDES, EXERCISES, PRACTICAL ASSIGNMENTS, AND OTHER DOCUMENTATION
  • USE OF VIDEO AND/OR SCREENCAST MATERIAL FOR ASYNCHRONOUS LECTURES AND PRACTICAL CLASSES
  • USE OF VIDEOCONFERENCING FOR SYNCHRONOUS LECTURES AND PRACTICAL CLASSES
  • USE OF THE 'RACÓ' FOR ASSIGNMENT SUBMISSIONS
  • USE OF MAIL AND/OR THE FORUM FOR ASYNCHRONOUS CONSULTATION
  • USE OF CHAT AND/OR VIDEOCONFERENCING FOR SYNCHRONOUS CONSULTATION
  • TENTATIVE USE OF "ATENEA" OR THE "RACÓ" FOR THE TWO MIDTERM EVALUATIONS