This course provides basic knowledge about: 1) the different levels of parallelism found in current architectures (at the instruction level, ILP; at the data level, DLP; and at the thread level, TLP); 2) how the memory hierarchy is organized to support them; and 3) the mechanisms that allow them to be exploited from the point of view of application programming. This knowledge will allow students to understand the opportunities these architectures offer to address the computational needs of most artificial intelligence applications.
Teachers
Person in charge
Eduard Ayguadé Parra
Others
Josep Lluís Berral García
Weekly hours
Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6
Competences
Transversal Competences
CT3 [Assessable] - Efficient oral and written communication. Communicate orally and in writing with other people about the results of learning, thought elaboration and decision making; participate in debates on topics within the specialty.
CT6 [Assessable] - Autonomous learning. Detect gaps in one's own knowledge and overcome them through critical reflection and by choosing the best course of action to extend this knowledge.
Technical Competences
CE05 - To be able to analyze and evaluate the structure and architecture of computers, as well as the basic components that make them up.
CE07 - To interpret the characteristics, functionalities and structure of Distributed Systems, Computer Networks and the Internet and design and implement applications based on them.
CE11 - To identify and apply the fundamental principles and basic techniques of parallel, concurrent, distributed and real-time programming.
Generic Technical Competences
CG2 - To use the fundamental knowledge and solid work methodologies acquired during the studies to adapt to the new technological scenarios of the future.
CG3 - To define, evaluate and select hardware and software platforms for the development and execution of computer systems, services and applications in the field of artificial intelligence.
CG5 - Work in multidisciplinary teams and projects related to artificial intelligence and robotics, interacting fluently with engineers and professionals from other disciplines.
CG9 - To face new challenges with a broad vision of the possibilities of a professional career in the field of Artificial Intelligence. Develop the activity applying quality criteria and continuous improvement, and act rigorously in professional development. Adapt to organizational or technological changes. Work in situations of lack of information and / or with time and / or resource restrictions.
Objectives
To acquire knowledge about the basic execution models and performance metrics
Related competences: CE05
Subcompetences: Performance characterisation tools
To acquire knowledge about the architecture of scalar processors and the techniques for exploiting ILP (instruction-level parallelism) and DLP (data-level parallelism)
Related competences: CG3, CE05
Subcompetences: Scalar optimizations: vectorisation
To understand shared-memory architectures and the hardware support for memory coherence and synchronization
Related competences: CG3, CE05
Subcompetences: Shared-memory parallel programming: OpenMP
To understand distributed-memory architectures and the hardware support for data exchange
Related competences: CG3, CE05, CE07
Subcompetences: Message-passing parallel programming: MPI
To acquire knowledge about accelerator-based architectures and how they access the memory hierarchy of the scalar processor
Related competences: CG3, CE05
To acquire knowledge of, and apply, the basic techniques of parallel programming for shared- and distributed-memory multiprocessors
Related competences: CT6, CE11
To be able to discuss and compare the resolution of problems and practical exercises, both in group work and autonomously
Related competences: CT3, CT6
To understand the relation between the course content and the field of AI
Related competences: CG2, CG5, CG9
Contents
Execution models and performance metrics
Presentation of the serial, multiprogrammed, concurrent and parallel execution models, together with the basic metrics that characterize their performance.
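As a flavour of the performance metrics covered in this unit, the speedup bound given by Amdahl's law can be sketched in a few lines of Python (the function name and figures are illustrative, not course material):

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on the speedup of a program in which only a
    fraction of the work can be parallelized (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

# A program that is 90% parallel, run on 8 processors:
# speedup ~= 4.71, well below the ideal factor of 8
print(round(amdahl_speedup(0.9, 8), 2))  # 4.71
```

Note how the serial fraction dominates as processors are added: with a parallel fraction of 0.9, the speedup can never exceed 10 no matter how many processors are used.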
Scalar processor architecture and code optimization
This unit introduces the basic architecture of the scalar processor and the techniques for increasing parallelism at the instruction level (ILP: pipelined and superscalar design) and at the data level (DLP: vector units). It also covers memory hierarchy optimization and vectorization.
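The idea behind vectorisation can be previewed with a SAXPY kernel. This NumPy sketch (ours, not course material; the laboratory works at the compiler/hardware level rather than in Python) contrasts the element-by-element scalar loop with the whole-array, SIMD-style form:

```python
import numpy as np

# Scalar form: the loop a compiler sees before vectorisation,
# one element processed per iteration.
def saxpy_scalar(a, x, y):
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out

# Data-level parallel form: a single operation applied to whole
# arrays at once, the programming analogue of a vector/SIMD unit.
def saxpy_vector(a, x, y):
    return a * x + y

x = np.arange(4, dtype=np.float64)
y = np.ones(4, dtype=np.float64)
print(saxpy_vector(2.0, x, y))  # [1. 3. 5. 7.]
```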
Shared-memory multiprocessors architecture and programming
This unit introduces the UMA (uniform memory access time) and NUMA (non-uniform memory access time) shared-memory multiprocessor architectures, including snooping (bus-based) and directory-based cache-coherence mechanisms and the support for synchronization using atomic instructions. It also presents the architecture of a node within a cluster and the components that make it up (processors with multiple execution cores, memory and buses). Parallelization of applications using the tasking model in OpenMP.
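The OpenMP tasking model itself is studied in the laboratory; as a language-neutral sketch of the underlying idea (spawning independent units of work and then waiting for all of them), here is a hypothetical Python analogue using the standard library, not the OpenMP API:

```python
from concurrent.futures import ThreadPoolExecutor

def work(i: int) -> int:
    # Stand-in for an independent unit of work (an OpenMP "task")
    return i * i

# Each submit() plays the role of "#pragma omp task"; collecting
# all the results acts as the "taskwait" synchronization point.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(work, i) for i in range(8)]
    results = [f.result() for f in futures]

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The key property shared with OpenMP tasks is that the eight calls to `work` may execute in any order and on any worker thread, while the results are gathered deterministically.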
Distributed-memory multiprocessor architecture and programming
This unit presents distributed-memory multiprocessor architectures based on message passing through a scalable interconnection network. Parallelization of applications with the MPI programming model.
Accelerators for artificial intelligence applications
This unit presents the architectures aimed at accelerating the most characteristic computing kernels in artificial intelligence applications, such as GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units), and their integration in the shared-memory nodes of a cluster architecture. Use case: accelerators for Deep Learning environments.
Activities
Execution models, performance metrics and analysis tools
The course is based on classroom theory and laboratory sessions. The theory sessions combine lectures and the resolution of exercises, following the program set out in this syllabus and based on the course's own material (slides, collections of exercises, etc.). During the sessions, dialogue and discussion are encouraged to anticipate and consolidate the learning outcomes of the course.
Laboratory sessions cover the aspects related to programming and follow the same topics as the course syllabus. They are hands-on sessions using a cluster available from the Computer Architecture Department.
Evaluation methodology
There are two exams for the theory part and one for the laboratory part:
- PT: mid-term theory exam (20%)
- FT: final theory exam (35%)
- FL: final laboratory exam (30%)
In addition, students are evaluated continuously through:
- SL: laboratory monitoring reports (15%) which will also be used to assess the CT3 and CT6 transversal competencies.
The Final Grade (NF) of the course is obtained from
NF = (0.30 x FL + 0.15 x SL) + MAX(0.55 x FT; (0.20 x PT + 0.35 x FT))
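The MAX term means the mid-term exam only counts when it helps the student. A minimal sketch of the formula in Python (the grades used are made up for illustration):

```python
def final_grade(pt: float, ft: float, fl: float, sl: float) -> float:
    """NF = 0.30*FL + 0.15*SL + MAX(0.55*FT, 0.20*PT + 0.35*FT)."""
    return 0.30 * fl + 0.15 * sl + max(0.55 * ft, 0.20 * pt + 0.35 * ft)

# A student with a weak mid-term (PT=4.0) but a strong final (FT=8.0)
# is graded on the final alone: 0.55*8.0 = 4.4 > 0.20*4.0 + 0.35*8.0 = 3.6
print(round(final_grade(pt=4.0, ft=8.0, fl=7.0, sl=9.0), 2))  # 7.85
```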
If NF < 5.0 but higher than 3.5, there is the option of re-evaluation through an exam covering the entire course (theory and laboratory). The grade of this exam, which may not be greater than 7.0, will replace the NF grade.