The aim of this course is to understand how high-performance computing systems operate and how they are applied, in order to deploy artificial intelligence applications that require large amounts of resources, process optimization and the use of accelerators, as well as leveraging and orchestrating cloud resources. The course covers virtualization and containerization, as well as distributed file systems and distributed computing systems. Students will also study scalability in machine learning and artificial intelligence algorithms, using state-of-the-art technologies for both middleware and accelerators. We will work with the C, Python and Scala languages.
Teachers
Person in charge
Josep Lluís Berral García
Others
Jordi Torres Viñals
Weekly hours
Theory: 2
Problems: 0
Laboratory: 2
Guided learning: 0
Autonomous learning: 6
Competences
Transversal Competences
Transversal
CT2 - Sustainability and Social Commitment. To know and understand the complexity of the economic and social phenomena typical of the welfare society; be able to relate well-being to globalization and sustainability; acquire the skills to use technique, technology, economics and sustainability in a balanced and compatible way.
CT3 - Efficient oral and written communication. Communicate orally and in writing with other people about the results of learning, thinking and decision making; participate in debates on topics within the specialty.
CT6 [Assessable] - Autonomous Learning. Detect gaps in one's own knowledge and overcome them through critical reflection and by choosing the best course of action to extend this knowledge.
Basic
CB2 - Students know how to apply their knowledge to their work or vocation in a professional manner and possess the skills that are usually demonstrated by developing and defending arguments and by solving problems within their area of study.
Technical Competences
Specific
CE05 - To be able to analyze and evaluate the structure and architecture of computers, as well as the basic components that make them up.
CE06 - To be able to identify the features, functionalities and structure of Operating Systems and to design and implement applications based on their services.
CE07 - To interpret the characteristics, functionalities and structure of Distributed Systems, Computer Networks and the Internet and design and implement applications based on them.
CE08 - To detect the characteristics, functionalities and components of data managers, so that they can be used appropriately in information flows, and to design, analyze and implement applications based on them.
CE11 - To identify and apply the fundamental principles and basic techniques of parallel, concurrent, distributed and real-time programming.
CE19 - To use current computer systems, including high-performance systems, to process large volumes of data, based on knowledge of their structure, operation and particularities.
Generic Technical Competences
Generic
CG1 - To ideate, draft, organize, plan and develop projects in the field of artificial intelligence.
CG3 - To define, evaluate and select hardware and software platforms for the development and execution of computer systems, services and applications in the field of artificial intelligence.
CG9 - To face new challenges with a broad vision of the possibilities of a professional career in the field of Artificial Intelligence. Develop the activity applying quality criteria and continuous improvement, and act rigorously in professional development. Adapt to organizational or technological changes. Work in situations with a lack of information and/or with time and/or resource constraints.
Objectives
Understand the use of high-performance computing and middleware for artificial intelligence
Related competences: CG1, CG9, CT3, CT6, CE19
Know the basic hardware and middleware components of high-performance platforms
Related competences: CG9, CT2, CE05, CE08, CE19
Learn about the use of accelerators (e.g. GPUs) and the tools for their exploitation
Related competences: CG3, CT6, CE08, CE19
Learn about virtualization concepts and the use of virtual machines
Related competences: CG3, CT2, CB2, CE05, CE06
Become familiar with the basic tools for exploiting distributed systems and with distribution-oriented programming models
Related competences: CG3, CT6, CE07, CE08, CE11
Know the basic concepts of distributed systems, and of interconnection and connection among systems.
Related competences: CG3, CT3, CT6, CE07, CE11
Learn about file systems: basic use of file systems, disk redundancy, logical volumes and fault tolerance.
Related competences: CG3, CT6, CB2, CE06, CE07, CE08
Discover the challenges of high-performance computing for artificial intelligence
Related competences: CG1, CG9, CT2, CT3
Contents
Introduction to High-Performance Computing Systems
Introduction to large-scale computing systems: specialized systems and the Cloud.
Accelerators and high-performance devices
Incorporation of accelerators (e.g. GPUs) and the tools for their exploitation. Matrix operations accelerated through specialized devices.
Middleware and high-performance platforms for artificial intelligence
Basic hardware and middleware components of high-performance platforms. Use of state-of-the-art and commodity tools (e.g. TensorFlow, PyTorch) combined with specialized devices.
Parallelism applied to artificial intelligence
Parallelism in high-performance computing through the most common middleware for artificial intelligence, deep learning and transformers, and their associated techniques.
Introduction to distributed programming models for Big Data
Introduction to the MapReduce programming model over distributed data systems, and to the Scala language.
Virtualization concepts and containerization
Introduction to the use of virtual machines and containerization for isolated execution and customized environments, as well as load migration and resource management in shared systems.
Local and distributed file systems, redundancy and availability
Basic use of file systems, distributed file systems, logical volumes, redundancy, fault tolerance and high availability.
Distributed systems for computing
Basic concepts on distributed systems (e.g. Hadoop and Spark), interconnection and communications, paradigms of distributed systems and protocols, and fault tolerance. Basic tools for exploiting concurrency on distributed systems, and their programming models oriented towards artificial intelligence and Big Data processing.
Challenges for high-performance computing for artificial intelligence
Present and future challenges of high-performance computing applied to artificial intelligence. Current tools and environments in industry, the Cloud, academia and society.
Activities
Virtualization and containerization concepts
Introduction to the use of virtual machines and containerization for isolated and customized execution environments, as well as load migration and resource management in shared systems. Objectives: 4
Introduction to client-server models, execution management systems, and launching applications on cluster and Cloud systems.
Theory: 2h
Problems: 0h
Laboratory: 2h
Guided learning: 0h
Autonomous learning: 6h
Supercomputers and High-performance Computing
Supercomputers and high-performance computing systems, tools and environments. Familiarization with HPC facilities, hands-on use of HPC systems and the C language. Objectives: 2
Accelerators, supercomputers and high-performance devices
Accelerators and high-performance devices. GPUs and accelerator devices. Matrix multiplication using GPUs. Introduction to Python on a supercomputer. Objectives: 3
Basic concepts of distributed systems (e.g. Hadoop and Spark), interconnection and communications, distributed systems paradigms and protocols, and fault tolerance. Basic tools for exploiting concurrency in distributed systems, and their programming models oriented to artificial intelligence and massive data processing. Objectives: 5, 6
Local and distributed file systems, redundancy and availability
Basic uses of file systems, as well as distributed data storage systems, logical volumes, redundancy, fault tolerance and high availability. Objectives: 7
Parallelism applied to artificial intelligence. Scalability, advanced deep learning techniques, transformers and the future of deep learning. Objectives: 1, 2
The course is based on theory sessions and face-to-face laboratory sessions. The theory sessions combine lectures and seminars by experts in the field, following the program set out in this study plan and using the teaching staff's own material. Dialogue and discussion are encouraged during the sessions in order to anticipate and consolidate the learning outcomes of the course.
The laboratory sessions deal with the different technologies presented and follow the same topics as the syllabus. They are hands-on practical sessions that use computational resources of the Department of Computer Architecture and the Barcelona Supercomputing Center.
Evaluation methodology
Assessment is based mainly on the continuous work carried out during the course sessions. Attendance and participation are mandatory and are therefore also assessed, through attendance lists and required participation in the interactive sessions. Finally, students carry out a research project throughout the course, which they present to their peers.
The distribution of weights for each activity is as follows:
- AS: attendance at class, theory and laboratory sessions (10%), used to assess transversal competence CT3.
- PR: class participation (15%).
- EX: laboratory deliverables (55%), computed as the arithmetic mean of the laboratory assignments.
- RE: presentation of a research paper (20%), used to assess transversal competences CT2, CT3 and CT6.
The final grade (NF) of the course is computed as:
NF = 0.10 x AS + 0.15 x PR + 0.55 x EX + 0.20 x RE
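As a quick check of the weighting, the following Python sketch computes NF from its four components. It assumes every component is marked on the same 0 to 10 scale (the guide does not state the scale explicitly), and the helper names `lab_average` and `final_grade` are hypothetical, chosen only for illustration.

```python
"""Minimal sketch of NF = 0.10 x AS + 0.15 x PR + 0.55 x EX + 0.20 x RE.

Assumption: every component is marked on a 0-10 scale.
"""
from typing import Sequence

# Weights copied from the grading formula; they add up to 1.0,
# so NF stays on the same scale as its components.
WEIGHTS = {"AS": 0.10, "PR": 0.15, "EX": 0.55, "RE": 0.20}


def lab_average(deliverables: Sequence[float]) -> float:
    """EX: arithmetic mean of the laboratory deliverables (hypothetical helper)."""
    if not deliverables:
        raise ValueError("at least one laboratory deliverable is required")
    return sum(deliverables) / len(deliverables)


def final_grade(as_mark: float, pr_mark: float, ex_mark: float, re_mark: float) -> float:
    """Weighted sum of attendance, participation, labs and research presentation."""
    marks = {"AS": as_mark, "PR": pr_mark, "EX": ex_mark, "RE": re_mark}
    for name, value in marks.items():
        if not 0.0 <= value <= 10.0:  # assumed 0-10 scale
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    return sum(WEIGHTS[name] * value for name, value in marks.items())


if __name__ == "__main__":
    ex = lab_average([7.0, 8.0, 9.0])  # hypothetical lab marks, so EX = 8.0
    nf = final_grade(10.0, 8.0, ex, 7.0)
    print(round(nf, 2))  # prints 8.0
```

With these hypothetical marks the contributions are 1.0 + 1.2 + 4.4 + 1.4 = 8.0. The result is rounded before printing because floating-point weights such as 0.20 are not represented exactly.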
Re-evaluation
a) Re-evaluation is only available to students who submitted all EX and RE work and failed the course (NF). Students who only want to improve a passing grade, and students graded NP (not presented), are excluded.
b) The maximum grade obtainable through re-evaluation is 7.
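Rules a) and b) can be read as a small decision procedure. The Python sketch below is one possible reading, not an official implementation: it assumes a pass mark of 5 on a 0 to 10 scale (the guide does not state the threshold), represents an NP student with `nf=None`, and uses hypothetical function names.

```python
from typing import Optional

PASS_MARK = 5.0   # assumption: usual pass threshold on a 0-10 scale, not stated in the guide
REEVAL_MAX = 7.0  # rule b): maximum grade obtainable through re-evaluation


def eligible_for_reevaluation(submitted_all_ex: bool, submitted_re: bool,
                              nf: Optional[float]) -> bool:
    """Rule a): all EX and RE work submitted, and NF below the pass mark.

    nf=None stands for a student graded NP (hypothetical encoding), who is excluded.
    Students who passed, and would only be upgrading, are excluded as well.
    """
    if nf is None:
        return False
    return submitted_all_ex and submitted_re and nf < PASS_MARK


def reevaluation_grade(reeval_mark: float) -> float:
    """Rule b): cap the re-evaluation result at 7."""
    return min(reeval_mark, REEVAL_MAX)


if __name__ == "__main__":
    print(eligible_for_reevaluation(True, True, 4.2))  # prints True
    print(reevaluation_grade(9.5))                     # prints 7.0
```

Rule b) is modelled as a simple cap on the re-evaluation mark; how that mark is then combined with the original NF is not specified in the guide, so the sketch does not attempt it.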