Credits
6
Types
Compulsory
Requirements
This subject has no requirements, but it does have previous capacities
Department
DAC;UAB
This course explores parallel systems based on general-purpose cache-based microprocessor architectures, as well as modern accelerators such as GPUs. It covers basic concepts regarding performance measurement, programming paradigms and tools, and some common parallel algorithms.
Among other goals, this subject aims at:
1. Introducing the student to the world of high performance computing systems, in order to understand the evolution of modern computer architectures and their use to solve demanding problems.
2. Helping students analyze a given problem, identify opportunities to optimize code performance, and apply the appropriate parallel programming paradigms efficiently.
3. Teaching the student to select algorithms and hardware for high performance projects, and the skills needed to write, run and assess the performance of parallel programs on different hardware architectures and software environments.
Teachers
Person in charge
- Josep Ramon Herrero Zaragoza (josepr@ac.upc.edu)
Others
- Christian Guzman Ruiz (christian.guzman@uab.cat)
- Miquel Angel Senar Rosell (miquelangel.senar@uab.cat)
- Sandra Adriana Mendez Valerio (sandra.adriana.mendez@upc.edu)
Weekly hours
Theory
2
Problems
2
Laboratory
0
Guided learning
0
Autonomous learning
6
Competences
Knowledge
Skills
Competences
Objectives
- Ability to formulate simple performance models, given a parallelization strategy for an application, that allow an estimation of the influence of major architectural aspects: number of processing elements, data access cost and cost of interaction between processing elements, among others.
  Related competences: K3, S8
- Ability to identify the different types of parallelism that can be exploited in a computer architecture (ILP, TLP and DLP within a processor, multiprocessor and multicomputer) and describe their principles of operation.
  Related competences: K4
- Ability to compile and execute a parallel program, using the essential command-line tools to measure the execution time.
  Related competences: K4
- Ability to choose the most appropriate decomposition strategy to express parallelism in an application (tasks, data).
  Related competences: K4, S7, S8
- Ability to apply the basic techniques to synchronize parallel execution, avoiding race conditions and deadlock, and enabling the overlap of computation and interaction, among others.
  Related competences: K4, C6
- Ability to program in MPI the parallel version of a sequential application.
  Related competences: K4, S7
- Ability to program in OpenMP the parallel version of a sequential application.
  Related competences: K4, S7
- Ability to program in OpenACC the parallel version of a sequential application.
  Related competences: K4, S7
- Ability to measure, using instrumentation, visualization and analysis tools, the performance achieved by the implementation of a parallel application, and to detect the factors that limit this performance: task granularity, load balance and interaction between tasks, among others.
  Related competences: K4, S7, S8, C6
Contents
- Introduction to high performance computing and parallel computing
  Introduction to high performance computing, parallel architectures and parallel computing
- Understanding Parallelism
  Theoretical concepts. Performance metrics.
- Distributed memory programming with MPI
  Distributed memory architectures: programming with the Message Passing Interface (MPI)
- Performance modeling
  Formulation of simple performance models for a given parallelization strategy, estimating the influence of major architectural aspects: number of processing elements, data access cost and cost of interaction between processing elements, among others.
- Shared memory programming with OpenMP
  Shared memory architectures: programming with OpenMP
- Cluster Computing and Batch Queue Systems
  Cluster Computing and Batch Queue Systems: SLURM
- Python for HPC
  Python for High Performance Computing
- GPU computing and OpenACC
  Computation with Graphics Processing Units (GPUs) using OpenACC
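The performance modeling topic above rests on a few standard metrics; as a sketch (these are the usual textbook definitions, not official course notation):

```latex
% Speedup and parallel efficiency on p processing elements,
% where T_1 is the serial time and T_p the time on p elements
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}

% A simple model separating computation and interaction costs
T_p \approx \frac{T_{\mathrm{comp}}}{p} + T_{\mathrm{comm}}(p)

% Amdahl's law: a serial fraction f bounds the achievable speedup
S(p) \le \frac{1}{f + (1 - f)/p} \le \frac{1}{f}
```

For example, a code that is 5% serial ($f = 0.05$) cannot be sped up by more than a factor of 20, no matter how many processing elements are used.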
Activities
Theoretical expository lectures
Students must take ownership of their learning process. This involves attending classes, labs, and tutorials punctually and participating actively in discussions. Students are responsible for managing their time effectively to complete all required assignments, readings, and projects by the specified deadlines. They must also adhere to the university's academic honesty policies, seek clarification when material is unclear, and respect the learning environment by engaging with peers and the teacher in a professional manner.
- Theory: Theory and problem-solving classes
- Autonomous learning: Personal work on theory and problem solving
Contents:
- 1 . Introduction to high performance computing and parallel computing
- 2 . Understanding Parallelism
- 3 . Distributed memory programming with MPI
- 4 . Performance modeling
- 5 . Shared memory programming with OpenMP
- 6 . Cluster Computing and Batch Queue Systems
- 7 . Python for HPC
- 8 . GPU computing and OpenACC
Theory
27h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
45h
Practical Hands-on sessions
The student must develop practical assignments using a parallel cluster. In such an environment, the student's duties include active execution, problem-solving, and resource management. They must:
- Prepare in advance by reviewing the theoretical concepts and any pre-lab readings or initial setup instructions before the lab session.
- Execute the tasks independently, carefully documenting the procedures, inputs, and outputs, including any errors encountered and how they were resolved.
- Develop practical debugging skills and use the available technical resources (documentation, specific software tools) to solve technical challenges before seeking the instructor's help.
- Submit working, well-documented code and reports that follow the specified formatting, version control, and naming conventions.
- Problems: Practical Hands-on sessions
- Autonomous learning: Complete the assignments and prepare the deliverables.
Contents:
Theory
0h
Problems
28h
Laboratory
0h
Guided learning
0h
Autonomous learning
45h
Teaching methodology
Theoretical classes amount to two hours per week. They introduce all the knowledge, techniques and concepts needed to solve problems and lab assignments. Some problems will be solved during these theoretical sessions; the student is expected to work on the rest as personal work, using a collection of problems. Additionally, two hours of laboratory sessions are held weekly; active participation and performance during the laboratory sessions will be valued (working during the session and advancing as far as possible towards the objectives of each session). Students must work autonomously prior to the sessions in order to prepare for the contact sessions and make good use of the time. Afterwards, they have to work autonomously in order to practice and consolidate the concepts and abilities developed throughout the course.
For the practical assignments and the problems in this course, mainly the C programming language and the MPI, OpenMP and OpenACC parallel programming models will be used. The practical laboratory sessions will be done in a parallel machine running the Linux operating system. The access to such cluster will be done using the student's laptop. Thus, the students must make sure that they can connect to a Linux cluster from their laptop.
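Work on such a cluster is typically submitted through the batch queue system rather than run interactively. A hedged sketch of a SLURM job script of the kind covered in the course (the job name, file names, partition defaults and resource values are invented examples, not the actual course settings):

```bash
#!/bin/bash
#SBATCH --job-name=pi_mpi        # job name shown by squeue (example name)
#SBATCH --ntasks=4               # number of MPI processes to allocate
#SBATCH --time=00:05:00          # wall-clock time limit
#SBATCH --output=pi_mpi_%j.out   # stdout file; %j expands to the job id

# Compile the MPI program (pi_mpi.c is a placeholder file name)
mpicc -O2 -o pi_mpi pi_mpi.c

# Launch one MPI process per allocated task and time the run
time srun ./pi_mpi
```

The script would be submitted with `sbatch job.sh` and monitored with `squeue`; the same scheme applies to OpenMP programs by requesting `--cpus-per-task` instead of multiple tasks.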
Evaluation methodology
The course grade will be computed as the weighted average of 6 grades (3 grades for each of the two parts of the course):
CG = 0.05 * LS1 + 0.15 * LE1 + 0.3 * TE1 + 0.05 * LS2 + 0.15 * LE2 + 0.3 * TE2
Where:
LS_i relates to the submission of the deliverables of the practical assignments for the i-th part.
LE_i is the laboratory exam for the i-th part.
TE_i is the theory exam for the i-th part.
IMPORTANT: Completion and submission of all laboratory follow-up reports (LS_i) is a necessary condition to pass the subject. Only reports with a minimum of content are considered prepared and submitted; empty reports, or reports containing only the questions, for example, are not considered completed or submitted.
Students will pass the course if they have completed all exams and obtained a course grade (CG) equal to or greater than 5 over 10.
Students who fail the course with a final average grade above 3 over 10 will be allowed to take a recovery exam for the theoretical part. There will be a retake exam for each of the two parts (TRE1 and/or TRE2) for students who failed that specific part; only the theoretical part of the course can be retaken. The grade TRE_i will then replace the grade obtained during the regular course for the corresponding theory exam TE_i.
Bibliography
Basic
- Introduction to parallel computing. Grama, Ananth. Pearson Addison Wesley, 2003. ISBN: 9780201648652
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003524559706711&context=L&vid=34CSUC_UPC:VU1
- Introduction to high performance computing for scientists and engineers. Hager, Georg; Wellein, Gerhard. CRC Press, 2011. ISBN: 9781439811924
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005256675306711&context=L&vid=34CSUC_UPC:VU1
- Using MPI: portable parallel programming with the Message-Passing Interface. Gropp, William; Lusk, Ewing; Skjellum, Anthony. MIT Press, 2014. ISBN: 9780262326605
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005260863706711&context=L&vid=34CSUC_UPC:VU1
- Using OpenMP: portable shared memory parallel programming. Chapman, Barbara; Jost, Gabriele; Pas, Ruud van der. MIT Press, 2008. ISBN: 9780262533027
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003339779706711&context=L&vid=34CSUC_UPC:VU1
Complementary
- Structured parallel programming: patterns for efficient computation. McCool, Michael; Robison, Arch D.; Reinders, James. Elsevier/Morgan Kaufmann, 2012. ISBN: 9786613689603
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005250023506711&context=L&vid=34CSUC_UPC:VU1
- The Science of Computing (formerly: Introduction to High-Performance Scientific Computing). Eijkhout, Victor. 2022.
  https://github.com/VictorEijkhout/TheArtofHPC_pdfs/blob/main/vol1/EijkhoutIntroToHPC.pdf
- Parallel Programming in MPI and OpenMP (The Art of HPC, volume 2). Eijkhout, Victor. 2022.
  https://github.com/VictorEijkhout/TheArtofHPC_pdfs/blob/main/vol2/EijkhoutParallelProgramming.pdf
Web links
- Message Passing Interface (MPI): Tutorial https://computing.llnl.gov/tutorials/mpi/
- OpenMP Tutorial https://hpc-tutorials.llnl.gov/openmp/
- Introduction to Parallel Computing Tutorial https://hpc.llnl.gov/documentation/tutorials/introduction-parallel-computing-tutorial
- Slurm cluster management and job scheduling system for large and small Linux clusters https://slurm.schedmd.com/
- MPI Documents https://www.mpi-forum.org/docs/
- OpenACC high-level directives-based programming model for high performance computing for CPUs, GPUs and a variety of accelerators https://www.openacc.org/
- OpenMP API specification and Resources https://www.openmp.org/resources/