High Performance Computing

Credits
6
Types
Compulsory
Requirements
This subject has no formal requirements, but it does assume previous capacities.
Department
AC
The aim of this subject is to understand how high-performance computing systems work and how they are applied, in order to deploy artificial intelligence applications that demand large amounts of resources, optimize processes, apply accelerators, and leverage and orchestrate cloud resources. The course covers virtualization and containerization, as well as distributed file systems and distributed computing systems. It also addresses scalability in machine learning and artificial intelligence algorithms, using state-of-the-art technologies for both middleware and accelerators. We will work with the C, Python and Scala languages.

Teachers

Person in charge

  • Jordi Torres Viñals
  • Josep Lluís Berral García

Weekly hours

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6

Competences

Transversal Competences

Transversals

  • CT2 - Sustainability and Social Commitment. To know and understand the complexity of the economic and social phenomena typical of the welfare society; be able to relate well-being to globalization and sustainability; acquire the skills to use technique, technology, economy and sustainability in a balanced and compatible way.
  • CT3 - Efficient oral and written communication. Communicate in an oral and written way with other people about the results of learning, thinking and decision making; Participate in debates on topics of the specialty itself.
  • CT6 [Assessable] - Autonomous Learning. Detect deficiencies in one's own knowledge and overcome them through critical reflection and the choice of the best action to extend this knowledge.

Basic

  • CB2 - That the students know how to apply their knowledge to their work or vocation in a professional way and possess the skills that are usually demonstrated through the elaboration and defense of arguments and problem solving within their area of study.

Technical Competences

Specific

  • CE05 - To be able to analyze and evaluate the structure and architecture of computers, as well as the basic components that make them up.
  • CE06 - To be able to identify the features, functionalities and structure of Operating Systems and to design and implement applications based on their services.
  • CE07 - To interpret the characteristics, functionalities and structure of Distributed Systems, Computer Networks and the Internet and design and implement applications based on them.
  • CE08 - To detect the characteristics, functionalities and components of data managers, which allow the adequate use of them in information flows, and the design, analysis and implementation of applications based on them.
  • CE11 - To identify and apply the fundamental principles and basic techniques of parallel, concurrent, distributed and real-time programming.
  • CE19 - To use current computer systems, including high-performance systems, for the processing of large volumes of data, based on knowledge of their structure, operation and particularities.

Generic Technical Competences

Generic

  • CG1 - To ideate, draft, organize, plan and develop projects in the field of artificial intelligence.
  • CG3 - To define, evaluate and select hardware and software platforms for the development and execution of computer systems, services and applications in the field of artificial intelligence.
  • CG9 - To face new challenges with a broad vision of the possibilities of a professional career in the field of Artificial Intelligence. Develop the activity applying quality criteria and continuous improvement, and act rigorously in professional development. Adapt to organizational or technological changes. Work in situations of lack of information and / or with time and / or resource restrictions.

Objectives

  1. Understand the use of high-performance computing and middleware for artificial intelligence
    Related competences: CG1, CG9, CT3, CT6, CE19
  2. Know the basic components of hardware and middleware in high-performance platforms
    Related competences: CG9, CT2, CE05, CE08, CE19
  3. Learn about the use of accelerators (e.g. GPUs) and the tools for their exploitation
    Related competences: CG3, CT6, CE08, CE19
  4. Learn about virtualization concepts and the usage of virtual machines
    Related competences: CG3, CT2, CB2, CE05, CE06
  5. Become familiar with the basic tools for exploiting distributed systems, with programming models oriented to distribution
    Related competences: CG3, CT6, CE07, CE08, CE11
  6. Know the basic concepts of distributed systems, interconnection and communication among systems
    Related competences: CG3, CT3, CT6, CE07, CE11
  7. Learn about file systems: basic usage of file systems, redundancy on disks, logical volumes and fault tolerance
    Related competences: CG3, CT6, CB2, CE06, CE07, CE08
  8. Discover the challenges of high-performance computing for artificial intelligence
    Related competences: CG1, CG9, CT2, CT3

Contents

  1. Introduction to High-Performance Computing Systems
    Introduction to large-scale computing systems, specialized systems and the Cloud.
  2. Accelerators and high-performance devices
    Incorporation of accelerators (e.g. GPUs) and the tools for their exploitation. Matrix operations accelerated through specialized devices.
  3. Middleware and high-performance platforms for artificial intelligence
    Basic components of hardware and middleware in high-performance platforms. Use of state-of-the-art and commodity tools (e.g. TensorFlow, PyTorch, etc.) combined with specialized devices.
  4. Parallelism applied to artificial intelligence
    Parallelism in high-performance computing through the most common middleware for artificial intelligence, deep learning and transformers, and their associated techniques.
  5. Introduction to distributed programming models for Big Data
    Introduction to Map-Reduce programming models over distributed data systems and the Scala language.
  6. Virtualization concepts and containerization
    Introduction to the use of virtual machines and containerization for isolated execution and customized environments, as well as load migration and resource management in shared systems.
  7. Local and distributed file systems, redundancy and availability
    Basic usage of file systems, distributed file systems, logical volumes, redundancy, fault tolerance and high availability.
  8. Distributed systems for computing
    Basic concepts on distributed systems (e.g. Hadoop and Spark), interconnection and communications, paradigms of distributed systems and protocols, and fault tolerance. Basic tools for exploiting concurrency on distributed systems, and their programming models oriented towards artificial intelligence and Big Data processing.
  9. Challenges for high-performance computing for artificial intelligence
    Present and future challenges of high-performance computing applied to artificial intelligence. Current tools and environments in the industry, the Cloud, academia and society.

Activities


Introduction to High-performance Computing systems

Introduction to High Performance Computing systems, tools and environments. Familiarization with HPC facilities, hands-on use of HPC systems and the C language.
Objectives: 2
Contents:
Theory
2h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
9h

Accelerators, supercomputers and high-performance devices

Accelerators and high performance devices. GPUs and accelerator devices. Matrix multiplication using GPUs. Introduction to Python on a supercomputer.
Objectives: 3
Contents:
Theory
2h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
9h

Middleware and high-performance platforms for artificial intelligence

Middleware and high performance platforms for artificial intelligence. TensorFlow/PyTorch, Deep Learning and HPC.
Objectives: 1
Contents:
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
6h

Parallelism applied to artificial intelligence

Parallelism applied to artificial intelligence. Scalability, advanced deep learning techniques, transformers and the future of Deep Learning.
Objectives: 2 1
Contents:
Theory
4h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
12h

Introduction to distributed programming models for Big Data

Introduction to Map-Reduce programming models on distributed data systems and the Scala language.
Objectives: 6
Contents:
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
6h

Virtualization and containerization concepts

Introduction to the use of virtual machines and containerization for isolated and customized execution environments, as well as load migration and resource management in shared systems.
Objectives: 4
Contents:
Theory
4h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
12h

Local and distributed file systems, redundancy and availability

Basic uses of file systems, as well as distributed data storage systems, logical volumes, redundancy, fault tolerance, and high availability.
Objectives: 7
Contents:
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
6h

Computing in distributed systems

Basic concepts of distributed systems (e.g. Hadoop and Spark), interconnection and communications, distributed systems paradigms and protocols, and fault tolerance. Basic tools for the exploitation of concurrency in distributed systems, and their programming models oriented to artificial intelligence and massive data processing.
Objectives: 6 5
Contents:
Theory
4h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
12h

Current tools and environments in industry, the cloud, academia and society

Current tools and environments in industry, the cloud, academia and society.
Objectives: 8
Contents:
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Present and future challenges of high-performance computing applied to artificial intelligence. Seminars on HPC

Seminars by experts in the field. Presentation of work.
Objectives: 5 1 8
Contents:
Theory
6h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
12h

Teaching methodology

The course is based on face-to-face theory and laboratory sessions. The theory sessions combine lectures and seminars by experts in the field, following the program set out in this study plan and using the teaching staff's own material. During the sessions, dialogue and discussion are encouraged in order to anticipate and consolidate the learning outcomes of the subject.

The laboratory sessions deal with the different technologies presented and follow the same topics as the syllabus. These are hands-on practical sessions that use different computational resources of the Department of Computer Architecture and the Barcelona Supercomputing Center.

Evaluation methodology

The evaluation is based mainly on continuous work completed during the sessions of the course. Attendance and participation are mandatory and are therefore also assessed, by taking attendance and by requiring participation in the interactive sessions. Finally, there is a research project that runs throughout the course, which students must present to their peers.

The distribution of weights for each activity is as follows:
- AS: attendance in class, theory and laboratories (10%), which will be used to evaluate transversal competence CT3.
- PR: class participation (10%)
- EX: laboratory deliverables (65%), computed as the arithmetic average of the laboratory assignments.
- RE: presentation of a research paper (15%), which will be used to evaluate transversal skills CT2, CT3 and CT6.

The Final Grade (NF) of the subject is obtained from
NF = 0.10 x AS + 0.10 x PR + 0.65 x EX + 0.15 x RE
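
As an illustration only, the weighted sum can be sketched in Python using the percentage weights from the list of activities (10%, 10%, 65%, 15%). The names `WEIGHTS` and `final_grade` are hypothetical, and the 0-10 grading scale is an assumption that this guide does not state.

```python
# Weights taken from the list of activities above.
# WEIGHTS and final_grade are illustrative names, not part of the course material.
WEIGHTS = {"AS": 0.10, "PR": 0.10, "EX": 0.65, "RE": 0.15}


def final_grade(attendance, participation, lab_grades, research):
    """Return NF on the same scale as the inputs (assumed here to be 0-10)."""
    # EX is the arithmetic average of the laboratory deliverables.
    ex = sum(lab_grades) / len(lab_grades)
    return (WEIGHTS["AS"] * attendance
            + WEIGHTS["PR"] * participation
            + WEIGHTS["EX"] * ex
            + WEIGHTS["RE"] * research)


if __name__ == "__main__":
    # Hypothetical marks for one student.
    print(round(final_grade(8, 6, [7, 9], 5), 2))
```

Because the weights add up to 1, a student with the maximum mark in every activity obtains the maximum final grade.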

Bibliography

Basic:

Complementary:

  • BSC documentation about MareNostrum 4 and CTE-Power - Barcelona Supercomputing Center.

Previous capacities

Having studied the subjects of Computer Fundamentals, as well as Parallelism and Distributed Systems.