Credits
6
Types
Compulsory
Requirements
This subject has no prerequisite subjects, but it does assume prior capacities.
Department
AC
Web
-
Mail
-
Teachers
Person in charge
- Eduard Ayguadé Parra ( eduard@ac.upc.edu )
Others
- Josep Lluís Berral García ( berral@ac.upc.edu )
Weekly hours
Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6
Competences
Transversals
Specifics
Generic
Objectives
- To acquire knowledge about the basic execution models and performance metrics.
  Related competences: CE05
  Subcompetences: performance characterisation tools
- To acquire knowledge about the architecture of scalar processors and the techniques for exploiting ILP (instruction-level parallelism) and DLP (data-level parallelism).
  Related competences: CG3, CE05
  Subcompetences: scalar optimizations, vectorisation
- To understand shared-memory architectures and the hardware support for memory coherence and synchronization.
  Related competences: CG3, CE05
  Subcompetences: shared-memory parallel programming with OpenMP
- To understand distributed-memory architectures and the hardware support for data exchange.
  Related competences: CG3, CE05, CE07
  Subcompetences: message-passing parallel programming with MPI
- To acquire knowledge about accelerator-based architectures and their access to the memory hierarchy of the scalar processor.
  Related competences: CG3, CE05
- To acquire knowledge of, and apply, the basic techniques of parallel programming for shared- and distributed-memory multiprocessors.
  Related competences: CT6, CE11
- To be able to discuss and compare the resolution of problems and practical exercises, both in group work and autonomously.
  Related competences: CT3, CT6
- To understand the relation between the course and the AI area.
  Related competences: CG2, CG5, CG9
Contents
- Execution models and performance metrics
  Presentation of the serial, multiprogrammed, concurrent and parallel execution models, together with the basic metrics that characterise their performance.
- Scalar processor architecture and code optimization
  This unit introduces the basic architecture of the scalar processor and the techniques for increasing parallelism at the instruction level (ILP: pipelined and superscalar design) and at the data level (DLP: vector units). Memory hierarchy optimization and vectorization.
- Shared-memory multiprocessor architecture and programming
  This unit introduces the UMA (uniform memory access time) and NUMA (non-uniform memory access time) shared-memory multiprocessor architectures, including bus- and directory-based coherence mechanisms and the support for synchronization using atomic instructions. It also presents the architecture of a node within a cluster and the components that make it up (processors with multiple execution cores, memory and buses). Parallelization of applications using the tasking model in OpenMP.
- Distributed-memory multiprocessor architecture and programming
  This unit presents distributed-memory multiprocessor architectures based on message passing through a scalable interconnection network. Parallelization of applications with the MPI programming model.
- Accelerators for artificial intelligence applications
  This unit presents the architectures aimed at accelerating the most characteristic computing kernels in artificial intelligence applications: GPUs (Graphics Processing Units), TPUs (Tensor Processing Units), etc., and their integration in the shared-memory nodes of a cluster. Use case: accelerators for Deep Learning environments.
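As an illustrative aside (not part of the official syllabus text), the performance-metrics unit above is commonly taught around speedup bounds such as Amdahl's law, which caps the speedup of a program whose fraction `s` is inherently serial:

```python
# Amdahl's law: a program with serial fraction s, run on p processors,
# can speed up by at most 1 / (s + (1 - s) / p).
# Hypothetical helper for illustration only.

def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup for the given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

# Even with 64 processors, a 10% serial fraction caps speedup below 10x.
bound = amdahl_speedup(0.10, 64)  # ~ 8.77
```

The takeaway for the unit is that the serial fraction, not the processor count, dominates the achievable speedup at scale.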
Activities (evaluation acts)
- Evaluation act: Theory 0h, Problems 0h, Laboratory 2h, Guided learning 0h, Autonomous learning 2h
- Evaluation act: Theory 0h, Problems 0h, Laboratory 2h, Guided learning 0h, Autonomous learning 2h
- Evaluation act: Theory 0h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 0h
Teaching methodology
The course is based on classroom theory and laboratory sessions. The theory sessions combine lectures with the resolution of exercises, following the programme set out in this syllabus and supported by the course's own material (slides, collections of exercises, ...). During these sessions, dialogue and discussion are encouraged in order to anticipate and consolidate the learning outcomes of the course.

Laboratory sessions cover the aspects related to programming and follow the same topics as the course syllabus. They are practical sessions using a cluster available at the Computer Architecture Department.
Evaluation methodology
There are two exams for the theory part and one for the laboratory part:
- PT: mid-term theory exam (20%)
- FT: final theory exam (35%)
- FL: final laboratory exam (30%)
In addition, students are evaluated continuously:
- SL: laboratory monitoring reports (15%), which are also used to assess the CT3 and CT6 transversal competences.
The final grade (NF) of the course is obtained as:
NF = 0.30 x FL + 0.15 x SL + MAX(0.55 x FT; 0.20 x PT + 0.35 x FT)
If NF < 5.0 but above 3.5, and both parts of the final exam have been taken, there is the option of re-evaluation through an exam covering the entire course (theory and laboratory). The re-evaluation grade replaces NF, up to a maximum of 7.0.
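The grading formula above can be read as: the laboratory components always count, while for theory only the better of the two combinations (final exam alone, or mid-term plus final) is taken. A minimal sketch of that computation (illustrative only, with hypothetical example grades on a 0-10 scale):

```python
# Final grade per the stated weights:
# NF = 0.30*FL + 0.15*SL + max(0.55*FT, 0.20*PT + 0.35*FT)

def final_grade(pt: float, ft: float, fl: float, sl: float) -> float:
    """Compute NF; the max() keeps whichever theory combination is better."""
    return 0.30 * fl + 0.15 * sl + max(0.55 * ft, 0.20 * pt + 0.35 * ft)

# A strong final theory exam can outweigh a weak mid-term, because
# only the better of the two theory combinations counts:
nf = final_grade(pt=3.0, ft=8.0, fl=7.0, sl=9.0)  # ~ 7.85
```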
Bibliography
Basic
- Patterson, David A. and Hennessy, John L. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann, 2017. ISBN: 9780128017333.
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004094079706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Hennessy, John L. and Patterson, David A. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 2019. ISBN: 9780128119051.
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004117509706711&context=L&vid=34CSUC_UPC:VU1&lang=ca