Parallelism and Distributed Systems

Credits
6
Types
Compulsory
Requirements
This subject has no formal requirements, but it assumes previous capacities
Department
AC
Web
-
Mail
-
This course provides basic knowledge about: 1) the different levels of parallelism found in current architectures (instruction level or ILP, data level or DLP, and execution-flow level or TLP); 2) how the memory hierarchy is organized to support them; and 3) the mechanisms that allow them to be exploited from the point of view of application programming. This knowledge helps students understand the opportunities these architectures offer to address the computational needs of most artificial intelligence applications.

Teachers

Person in charge

  • Eduard Ayguadé Parra

Others

  • Josep Lluís Berral García

Weekly hours

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6

Competences

Transversal Competences

Transversals

  • CT3 [Assessable] - Efficient oral and written communication. Communicate orally and in writing with other people about the results of learning, thinking and decision making; participate in debates on topics within one's own specialty.
  • CT6 [Assessable] - Autonomous learning. Detect deficiencies in one's own knowledge and overcome them through critical reflection and the choice of the best action to extend this knowledge.

Technical Competences

Specific

  • CE05 - To be able to analyze and evaluate the structure and architecture of computers, as well as the basic components that make them up.
  • CE07 - To interpret the characteristics, functionalities and structure of Distributed Systems, Computer Networks and the Internet and design and implement applications based on them.
  • CE11 - To identify and apply the fundamental principles and basic techniques of parallel, concurrent, distributed and real-time programming.

Generic Technical Competences

Generic

  • CG2 - To use the fundamental knowledge and solid work methodologies acquired during the studies to adapt to the new technological scenarios of the future.
  • CG3 - To define, evaluate and select hardware and software platforms for the development and execution of computer systems, services and applications in the field of artificial intelligence.
  • CG5 - Work in multidisciplinary teams and projects related to artificial intelligence and robotics, interacting fluently with engineers and professionals from other disciplines.
  • CG9 - To face new challenges with a broad vision of the possibilities of a professional career in the field of Artificial Intelligence. Develop the activity applying quality criteria and continuous improvement, and act rigorously in professional development. Adapt to organizational or technological changes. Work in situations of lack of information and / or with time and / or resource restrictions.

Objectives

  1. To acquire knowledge about the basic execution models and performance metrics
    Related competences: CE05
    Subcompetences:
    • Performance characterisation tools
  2. To acquire knowledge about the architecture of scalar processors and the techniques for exploiting ILP (instruction-level parallelism) and DLP (data-level parallelism)
    Related competences: CG3, CE05
    Subcompetences:
    • Scalar optimizations: vectorisation
  3. Understand shared-memory architectures and the hardware support for memory coherence and synchronization
    Related competences: CG3, CE05
    Subcompetences:
    • Shared-memory parallel programming: OpenMP
  4. Understand distributed-memory architectures and the hardware support for data exchange
    Related competences: CG3, CE05, CE07
    Subcompetences:
    • Message-passing parallel programming: MPI
  5. Acquire knowledge about accelerator-based architectures and their access to the memory hierarchy of the scalar processor
    Related competences: CG3, CE05
  6. Acquire knowledge of and apply the basic techniques of parallel programming for shared- and distributed-memory multiprocessors
    Related competences: CT6, CE11
  7. Be able to discuss and compare the resolution of problems and practical exercises, either in group work or autonomously
    Related competences: CT3, CT6
  8. Understand the relation between the course and the AI area
    Related competences: CG2, CG5, CG9

Contents

  1. Execution models and performance metrics
    Presentation of the serial, multiprogrammed, concurrent and parallel execution models, together with the basic metrics that characterize their performance.
  2. Scalar processor architecture and code optimization
    This unit introduces the basic architecture of the scalar processor and the techniques for increasing parallelism at the instruction level (ILP: pipelined and superscalar design) and at the data level (DLP: vector units). It also covers memory hierarchy optimization and vectorization.
  3. Shared-memory multiprocessors architecture and programming
    This unit introduces the UMA (uniform memory access time) and NUMA (non-uniform memory access time) shared-memory multiprocessor architectures, including bus-based and directory-based coherence mechanisms and the support for synchronization using atomic instructions. It also presents the architecture of a node within a cluster and the components that make it up (processors with multiple execution cores, memory and buses). Parallelization of applications is covered using the tasking model in OpenMP.
  4. Distributed-memory multiprocessor architecture and programming
    This unit presents distributed-memory multiprocessor architectures based on message passing through a scalable interconnection network. Parallelization of applications is covered using the MPI programming model.
  5. Accelerators for artificial intelligence applications
    This unit presents the architectures aimed at accelerating the most characteristic computing kernels in artificial intelligence applications: GPU (Graphics Processing Units), TPU (Tensor Processing Units), ... and their integration in the shared-memory nodes of a cluster architecture. Use case: accelerators for Deep Learning environments.

Activities


Execution models, performance metrics and analysis tools

-
Objectives: 1 7
Contents:
Theory
4h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
6h

Scalar processor architecture and application optimization

-
Objectives: 2 7
Contents:
Theory
6h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
10h

Shared-memory multiprocessor architecture and OpenMP programming

-
Objectives: 3 7 8 6
Contents:
Theory
8h
Problems
0h
Laboratory
8h
Guided learning
0h
Autonomous learning
14h

OpenMP tutorial

-
Objectives: 3
Contents:
Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
2h

Mid-term theory exam


Objectives: 1 2
Week: 6
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
10h

Distributed-memory multiprocessor architecture and MPI programming

-
Objectives: 4 7 8 6
Contents:
Theory
4h
Problems
0h
Laboratory
8h
Guided learning
0h
Autonomous learning
12h

Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
2h

Accelerator architecture for artificial intelligence applications

-
Objectives: 7 8 5 6
Contents:
Theory
4h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Final exam


Objectives: 1 2 3 4 5
Week: 15 (Outside class hours)
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
20h

Laboratory mid-term exam


Objectives: 6
Week: 14
Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
8h

Teaching methodology

The course is based on classroom theory sessions and laboratory sessions. The theory sessions combine lectures and the resolution of exercises, following the program set out in this syllabus and using the course's own material (slides, collection of exercises, ...). During the sessions, dialogue and discussion are encouraged in order to anticipate and consolidate the learning outcomes of the course.

Laboratory sessions cover the aspects related to programming and follow the same topics as the course syllabus. They are practical sessions that use a cluster made available by the Computer Architecture Department.

Evaluation methodology

There are two exams for the theory part and one for the laboratory part:
- PT: mid-term theory exam (20%)
- FT: final theory exam (35%)
- FL: final laboratory exam (30%)

Additionally, students will be evaluated continuously:
- SL: laboratory monitoring reports (15%), which will also be used to assess the CT3 and CT6 transversal competences.

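The Final Grade (NF) of the course is obtained from
NF = (0.30 x FL + 0.15 x SL) + MAX(0.55 x FT; (0.20 x PT + 0.35 x FT))
That is, the theory component takes the better of two options: the final theory exam alone weighted at 55%, or the mid-term at 20% plus the final at 35%. The mid-term theory exam therefore counts only when it improves the grade.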

If NF is below 5.0 but higher than 3.5, there is the option of re-evaluation through an exam that covers the entire course (theory and laboratory practice). The grade of that exam replaces NF and may not be greater than 7.
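
As a worked illustration of the grading rule above, the following Python sketch computes NF from the four component grades. The function names, the 0-10 grade scale, and the reading of the re-evaluation rule (strict bounds on the eligibility range, re-evaluation grade capped at 7) are assumptions made for this example and are not part of the official regulations.

```python
def final_grade(pt: float, ft: float, fl: float, sl: float) -> float:
    """Compute NF from the syllabus formula.

    pt: mid-term theory exam, ft: final theory exam,
    fl: final laboratory exam, sl: laboratory monitoring reports.
    All grades are assumed to be on a 0-10 scale.
    """
    lab_part = 0.30 * fl + 0.15 * sl
    # The mid-term only counts when it improves the theory component.
    theory_part = max(0.55 * ft, 0.20 * pt + 0.35 * ft)
    return lab_part + theory_part


def can_reevaluate(nf: float) -> bool:
    """Eligibility for re-evaluation (assumption: both bounds are strict)."""
    return 3.5 < nf < 5.0


def grade_after_reevaluation(exam_grade: float) -> float:
    """Assumed reading: the re-evaluation exam replaces NF, capped at 7."""
    return min(exam_grade, 7.0)


if __name__ == "__main__":
    # Hypothetical student whose mid-term is lower than the final exam,
    # so the final theory exam alone (55%) is used for the theory part.
    nf = final_grade(pt=6.0, ft=8.0, fl=7.0, sl=9.0)
    print(f"NF = {nf:.2f}")
```

With a mid-term grade above the final theory grade, the second branch of the maximum applies and the mid-term contributes its 20% weight.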

Previous capacities

Those acquired in the course Fundamentals of Computers (FC), which conceptually precedes this course.