Credits
6
Types
Compulsory
Requirements
This subject has no prerequisite subjects, but it does assume prior capacities.
Department
AC
Web
-
Mail
-
Teachers
Person in charge
- Eduard Ayguadé Parra ( eduard@ac.upc.edu )
Others
- Josep Lluís Berral García ( berral@ac.upc.edu )
Weekly hours
Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6
Competences
Transversals
Specifics
Generic
Objectives
- To acquire knowledge about the basic execution models and performance metrics.
  Related competences: CE05
  Subcompetences: performance characterisation tools
- To acquire knowledge about the architecture of scalar processors and the techniques for exploiting ILP (instruction-level parallelism) and DLP (data-level parallelism).
  Related competences: CG3, CE05
  Subcompetences: scalar optimizations, vectorisation
- To understand shared-memory architectures and the hardware support for memory coherence and synchronization.
  Related competences: CG3, CE05
  Subcompetences: shared-memory parallel programming with OpenMP
- To understand distributed-memory architectures and the hardware support for data exchange.
  Related competences: CG3, CE05, CE07
  Subcompetences: message-passing parallel programming with MPI
- To acquire knowledge about accelerator-based architectures and their access to the memory hierarchy of the scalar processor.
  Related competences: CG3, CE05
- To acquire knowledge of, and apply, the basic techniques of parallel programming for shared- and distributed-memory multiprocessors.
  Related competences: CT6, CE11
- To be able to discuss and compare the resolution of problems and practical exercises, both in group work and autonomously.
  Related competences: CT3, CT6
- To understand the relation between the course and the AI area.
  Related competences: CG2, CG5, CG9
Contents
- Execution models and performance metrics
  Presentation of the serial, multiprogrammed, concurrent and parallel execution models, together with the basic metrics that characterise their performance.
- Scalar processor architecture and code optimization
  This unit introduces the basic architecture of the scalar processor and the techniques for increasing parallelism at the instruction level (ILP: pipelined and superscalar design) and at the data level (DLP: vector units). Memory hierarchy optimization and vectorization.
- Shared-memory multiprocessor architecture and programming
  This unit introduces the UMA (uniform memory access time) and NUMA (non-uniform memory access time) shared-memory multiprocessor architectures, including bus- and directory-based coherence mechanisms and the support for synchronization using atomic instructions. It also presents the architecture of a node within a cluster and the components that make it up (processors with multiple execution cores, memory and buses). Parallelization of applications using the tasking model in OpenMP.
- Distributed-memory multiprocessor architecture and programming
  This unit presents distributed-memory multiprocessor architectures based on message passing through a scalable interconnection network. Parallelization of applications with the MPI programming model.
- Accelerators for artificial intelligence applications
  This unit presents the architectures aimed at accelerating the most characteristic computing kernels in artificial intelligence applications: GPUs (Graphics Processing Units), TPUs (Tensor Processing Units), etc., and their integration in the shared-memory nodes of a cluster. Use case: accelerators for Deep Learning environments.
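As an illustrative aside (not part of the official syllabus text), the performance-metrics unit above is commonly taught around speedup bounds such as Amdahl's law, which caps the speedup of a program whose fraction `s` is inherently serial:

```python
# Amdahl's law: a program with serial fraction s, run on p processors,
# can speed up by at most 1 / (s + (1 - s) / p).
# Hypothetical helper for illustration only.

def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup for the given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

# Even with 64 processors, a 10% serial fraction caps speedup below 10x.
bound = amdahl_speedup(0.10, 64)  # ~ 8.77
```

The takeaway for the unit is that the serial fraction, not the processor count, dominates the achievable speedup at scale.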
Activities (evaluation acts)
- Evaluation act: Theory 0h, Problems 0h, Laboratory 2h, Guided learning 0h, Autonomous learning 2h
- Evaluation act: Theory 0h, Problems 0h, Laboratory 2h, Guided learning 0h, Autonomous learning 2h
- Evaluation act: Theory 0h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 0h
Teaching methodology
The course is based on classroom theory and laboratory sessions. The theory sessions combine lectures with the resolution of exercises, following the programme set out in this syllabus and supported by the course's own material (slides, collections of exercises, ...). During these sessions, dialogue and discussion are encouraged in order to anticipate and consolidate the learning outcomes of the course.

Laboratory sessions cover the aspects related to programming and follow the same topics as the course syllabus. They are practical sessions using a cluster available at the Computer Architecture Department.
Evaluation methodology
There are two exams for the theory part and one for the laboratory part:
- PT: mid-term theory exam (20%)
- FT: final theory exam (35%)
- FL: final laboratory exam (30%)
In addition, students are evaluated continuously:
- SL: laboratory monitoring reports (15%), which are also used to assess the CT3 and CT6 transversal competences.
The final grade (NF) of the course is obtained as:
NF = 0.30 x FL + 0.15 x SL + MAX(0.55 x FT; 0.20 x PT + 0.35 x FT)
If NF < 5.0 but above 3.5, and both parts of the final exam have been taken, there is the option of re-evaluation through an exam covering the entire course (theory and laboratory). The re-evaluation grade replaces NF, up to a maximum of 7.0.
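The grading formula above can be read as: the laboratory components always count, while for theory only the better of the two combinations (final exam alone, or mid-term plus final) is taken. A minimal sketch of that computation (illustrative only, with hypothetical example grades on a 0-10 scale):

```python
# Final grade per the stated weights:
# NF = 0.30*FL + 0.15*SL + max(0.55*FT, 0.20*PT + 0.35*FT)

def final_grade(pt: float, ft: float, fl: float, sl: float) -> float:
    """Compute NF; the max() keeps whichever theory combination is better."""
    return 0.30 * fl + 0.15 * sl + max(0.55 * ft, 0.20 * pt + 0.35 * ft)

# A strong final theory exam can outweigh a weak mid-term, because
# only the better of the two theory combinations counts:
nf = final_grade(pt=3.0, ft=8.0, fl=7.0, sl=9.0)  # ~ 7.85
```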
Bibliography
Basic
- Patterson, David A. and Hennessy, John L. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann, 2017. ISBN: 9780128017333.
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004094079706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Hennessy, John L. and Patterson, David A. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 2019. ISBN: 9780128119051.
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004117509706711&context=L&vid=34CSUC_UPC:VU1&lang=ca