Credits
6
Types
Compulsory
Requirements
This subject has no requirements, but it does have previous capacities
Department
DAC;UAB
This course explores parallel systems based on general-purpose cache-based microprocessor architectures, as well as modern accelerators such as GPUs. It covers basic concepts regarding performance measurement, programming paradigms and tools, and some common parallel algorithms.
Among other goals, this subject aims at:
1. Introducing the student to the world of high performance computing systems, in order to understand the evolution of modern computer architectures and their use to solve demanding problems.
2. Helping students analyze a given problem, identify opportunities to optimize code performance, and apply the appropriate parallel programming paradigms efficiently.
3. Teaching the student to select algorithms and hardware for high performance projects, and the skills needed to write, run and assess the performance of parallel programs on different hardware architectures and software environments.
Teachers
Person in charge
- Josep Ramon Herrero Zaragoza (josepr@ac.upc.edu)
Others
- Christian Guzman Ruiz (christian.guzman@uab.cat)
- Miquel Angel Senar Rosell (miquelangel.senar@uab.cat)
- Sandra Adriana Mendez Valerio (sandra.adriana.mendez@upc.edu)
Weekly hours
Theory
2
Problems
2
Laboratory
0
Guided learning
0
Autonomous learning
6
Competences
Knowledge
Skills
Competences
Objectives
- Ability to formulate simple performance models, given a parallelization strategy for an application, that allow an estimation of the influence of major architectural aspects: number of processing elements, data access cost and cost of interaction between processing elements, among others.
  Related competences: K3, S8
- Ability to identify the different types of parallelism that can be exploited in a computer architecture (ILP, TLP and DLP within a processor, multiprocessor and multicomputer) and describe their principles of operation.
  Related competences: K4
- Ability to compile and execute a parallel program, using the essential command-line tools to measure the execution time.
  Related competences: K4
- Ability to choose the most appropriate decomposition strategy to express parallelism in an application (tasks, data).
  Related competences: K4, S7, S8
- Ability to apply the basic techniques to synchronize parallel execution, avoiding race conditions and deadlock, and enabling the overlap of computation and interaction, among others.
  Related competences: K4, C6
- Ability to program in MPI the parallel version of a sequential application.
  Related competences: K4, S7
- Ability to program in OpenMP the parallel version of a sequential application.
  Related competences: K4, S7
- Ability to program in OpenACC the parallel version of a sequential application.
  Related competences: K4, S7
- Ability to measure, using instrumentation, visualization and analysis tools, the performance achieved by the implementation of a parallel application, and to detect the factors that limit this performance: task granularity, load balance and interaction between tasks, among others.
  Related competences: K4, S7, S8, C6
Contents
- Introduction to high performance computing and parallel computing
  Introduction to high performance computing, parallel architectures and parallel computing
- Understanding Parallelism
  Theoretical concepts. Performance metrics.
- Distributed memory programming with MPI
  Distributed memory architectures: programming with the Message Passing Interface (MPI)
- Performance modeling
  Formulation of simple performance models for a given parallelization strategy, estimating the influence of major architectural aspects: number of processing elements, data access cost and cost of interaction between processing elements, among others.
- Shared memory programming with OpenMP
  Shared memory architectures: programming with OpenMP
- Cluster Computing and Batch Queue Systems
  Cluster Computing and Batch Queue Systems: SLURM
- Python for HPC
  Python for High Performance Computing
- GPU computing and OpenACC
  Computation with Graphics Processing Units (GPUs) using OpenACC
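The performance modeling topic above rests on a few standard metrics; as a sketch (these are the usual textbook definitions, not official course notation):

```latex
% Speedup and parallel efficiency on p processing elements,
% where T_1 is the serial time and T_p the time on p elements
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}

% A simple model separating computation and interaction costs
T_p \approx \frac{T_{\mathrm{comp}}}{p} + T_{\mathrm{comm}}(p)

% Amdahl's law: a serial fraction f bounds the achievable speedup
S(p) \le \frac{1}{f + (1 - f)/p} \le \frac{1}{f}
```

For example, a code that is 5% serial ($f = 0.05$) cannot be sped up by more than a factor of 20, no matter how many processing elements are used.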
Activities
Theoretical expository lectures
Students must take ownership of their learning process. This involves attending classes, labs, and tutorials punctually and participating actively in discussions. Students are responsible for managing their time effectively to complete all required assignments, readings, and projects by the specified deadlines. They must also adhere to the university's academic honesty policies, seek clarification when material is unclear, and respect the learning environment by engaging with peers and the teacher in a professional manner.
- Theory: Theory and problem-solving classes
- Autonomous learning: Personal work on theory and problem solving
Contents:
- 1 . Introduction to high performance computing and parallel computing
- 2 . Understanding Parallelism
- 3 . Distributed memory programming with MPI
- 4 . Performance modeling
- 5 . Shared memory programming with OpenMP
- 6 . Cluster Computing and Batch Queue Systems
- 7 . Python for HPC
- 8 . GPU computing and OpenACC
Theory
27h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
45h
Practical Hands-on sessions
The student must develop practical assignments using a parallel cluster. In such an environment, the student's duties include active execution, problem-solving, and resource management. They must:
- Prepare in advance by reviewing the theoretical concepts and any pre-lab readings or initial setup instructions before the lab session.
- Execute the tasks independently, carefully documenting the procedures, inputs, and outputs, including any errors encountered and how they were resolved.
- Develop practical debugging skills and use the available technical resources (documentation, specific software tools) to solve technical challenges before seeking the instructor's help.
- Submit working, well-documented code and reports that follow the specified formatting, version control, and naming conventions.
- Problems: Practical Hands-on sessions
- Autonomous learning: Complete the assignments and prepare the deliverables.
Contents:
Theory
0h
Problems
28h
Laboratory
0h
Guided learning
0h
Autonomous learning
45h
Teaching methodology
Theoretical classes amount to two hours per week. They introduce all the knowledge, techniques and concepts needed to solve problems and lab assignments. Some problems will be solved during these theoretical sessions; the student is expected to work on the rest as personal work, using a collection of problems. Additionally, two hours of laboratory sessions are held weekly; active participation and performance during the laboratory sessions will be valued (working during the session and advancing as far as possible towards the objectives of each session). Students must work autonomously prior to the sessions in order to prepare for the contact sessions and make good use of the time. Afterwards, they have to work autonomously in order to practice and consolidate the concepts and abilities developed throughout the course.
For the practical assignments and the problems in this course, mainly the C programming language and the MPI, OpenMP and OpenACC parallel programming models will be used. The practical laboratory sessions will be done in a parallel machine running the Linux operating system. The access to such cluster will be done using the student's laptop. Thus, the students must make sure that they can connect to a Linux cluster from their laptop.
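Work on such a cluster is typically submitted through the batch queue system rather than run interactively. A hedged sketch of a SLURM job script of the kind covered in the course (the job name, file names, partition defaults and resource values are invented examples, not the actual course settings):

```bash
#!/bin/bash
#SBATCH --job-name=pi_mpi        # job name shown by squeue (example name)
#SBATCH --ntasks=4               # number of MPI processes to allocate
#SBATCH --time=00:05:00          # wall-clock time limit
#SBATCH --output=pi_mpi_%j.out   # stdout file; %j expands to the job id

# Compile the MPI program (pi_mpi.c is a placeholder file name)
mpicc -O2 -o pi_mpi pi_mpi.c

# Launch one MPI process per allocated task and time the run
time srun ./pi_mpi
```

The script would be submitted with `sbatch job.sh` and monitored with `squeue`; the same scheme applies to OpenMP programs by requesting `--cpus-per-task` instead of multiple tasks.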
Evaluation methodology
The course grade will be computed as the weighted average of 6 grades (3 grades for each of the two parts of the course):
CG = 0.05 * LS1 + 0.15 * LE1 + 0.3 * TE1 + 0.05 * LS2 + 0.15 * LE2 + 0.3 * TE2
Where:
LS_i relates to the submission of the deliverables of the practical assignments for the i-th part.
LE_i is the laboratory exam for the i-th part.
TE_i is the theory exam for the i-th part.
IMPORTANT: Completion and submission of all laboratory follow-up reports (LS_i) is a necessary condition to pass the subject. Only reports with a minimum of content are considered prepared and submitted; empty reports, or reports containing only the questions, for example, are not considered completed or submitted.
Students will pass the course if they have completed all exams and obtained a course grade (CG) equal to or greater than 5 over 10.
Students who fail the course with a final average grade above 3 over 10 will be allowed to take a recovery exam for the theoretical part. There will be a retake exam for each of the two parts (TRE1 and/or TRE2) for students who failed that specific part; only the theoretical part of the course can be retaken. The grade TRE_i will then replace the grade obtained during the regular course for the corresponding theory exam TE_i.
Bibliography
Basic
- Introduction to parallel computing. Grama, Ananth. Pearson Addison Wesley, 2003. ISBN: 9780201648652
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003524559706711&context=L&vid=34CSUC_UPC:VU1
- Introduction to high performance computing for scientists and engineers. Hager, Georg; Wellein, Gerhard. CRC Press, 2011. ISBN: 9781439811924
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005256675306711&context=L&vid=34CSUC_UPC:VU1
- Using MPI: portable parallel programming with the Message-Passing Interface. Gropp, William; Lusk, Ewing; Skjellum, Anthony. MIT Press, 2014. ISBN: 9780262326605
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005260863706711&context=L&vid=34CSUC_UPC:VU1
- Using OpenMP: portable shared memory parallel programming. Chapman, Barbara; Jost, Gabriele; Pas, Ruud van der. MIT Press, 2008. ISBN: 9780262533027
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003339779706711&context=L&vid=34CSUC_UPC:VU1
Complementary
- Structured parallel programming: patterns for efficient computation. McCool, Michael; Robison, Arch D.; Reinders, James. Elsevier/Morgan Kaufmann, 2012. ISBN: 9786613689603
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005250023506711&context=L&vid=34CSUC_UPC:VU1
- The Science of Computing (formerly: Introduction to High-Performance Scientific Computing). Eijkhout, Victor. 2022.
  https://github.com/VictorEijkhout/TheArtofHPC_pdfs/blob/main/vol1/EijkhoutIntroToHPC.pdf
- Parallel Programming in MPI and OpenMP (The Art of HPC, volume 2). Eijkhout, Victor. 2022.
  https://github.com/VictorEijkhout/TheArtofHPC_pdfs/blob/main/vol2/EijkhoutParallelProgramming.pdf
Web links
- Message Passing Interface (MPI): Tutorial https://computing.llnl.gov/tutorials/mpi/
- OpenMP Tutorial https://hpc-tutorials.llnl.gov/openmp/
- Introduction to Parallel Computing Tutorial https://hpc.llnl.gov/documentation/tutorials/introduction-parallel-computing-tutorial
- Slurm cluster management and job scheduling system for large and small Linux clusters https://slurm.schedmd.com/
- MPI Documents https://www.mpi-forum.org/docs/
- OpenACC high-level directives-based programming model for high performance computing for CPUs, GPUs and a variety of accelerators https://www.openacc.org/
- OpenMP API specification and Resources https://www.openmp.org/resources/