Credits
6
Types
Compulsory
Requirements
This subject has no prerequisites, but it does assume previous capacities.
Department
AC
Web
https://docencia.ac.upc.edu/gia-cap
Mail
josep.ll.berral@upc.edu, jordi.torres@upc.edu
Teachers
Person in charge
- Josep Lluís Berral García ( berral@ac.upc.edu )
Others
- Jordi Torres Viñals ( torres@ac.upc.edu )
Weekly hours
Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6
Competences
Transversal
Basic
Specific
Generic
Objectives
- Understand the use of high-performance computing and middleware for artificial intelligence. Related competences: CG1, CG9, CT3, CT6, CE19
- Know the basic components of hardware and middleware in high-performance platforms. Related competences: CG9, CT2, CE05, CE08, CE19
- Learn about the use of accelerators (e.g. GPUs) and the tools for their exploitation. Related competences: CG3, CT6, CE08, CE19
- Learn about virtualization concepts and the usage of virtual machines. Related competences: CG3, CT2, CB2, CE05, CE06
- Become familiar with the basic tools for exploiting distributed systems, with programming models oriented to distribution. Related competences: CG3, CT6, CE07, CE08, CE11
- Know the basic concepts of distributed systems, interconnection and connection among systems. Related competences: CG3, CT3, CT6, CE07, CE11
- Learn about file systems: basic usage of file systems, redundancy on disks, logical volumes and fault tolerance. Related competences: CG3, CT6, CB2, CE06, CE07, CE08
- Discover the challenges of high-performance computing in artificial intelligence. Related competences: CG1, CG9, CT2, CT3
Contents
- Introduction to High-Performance Computing Systems: Introduction to large-scale computing systems, specialized systems and the Cloud.
- Accelerators and high-performance devices: Incorporation of accelerators (e.g. GPUs) and the tools for their exploitation. Matrix operations accelerated through specialized devices.
- Middleware and high-performance platforms for artificial intelligence: Basic components of hardware and middleware in high-performance platforms. Use of state-of-the-art and commodity tools (e.g. TensorFlow, PyTorch) combined with specialized devices.
- Parallelism applied to artificial intelligence: Parallelism in high-performance computing through the most common middleware for artificial intelligence, deep learning and transformers, and their associated techniques.
- Introduction to distributed programming models for Big Data: Introduction to Map-Reduce programming models over distributed data systems and the Scala language.
- Virtualization concepts and containerization: Introduction to the use of virtual machines and containerization, for isolated execution and personalized environments, as well as load migration and resource management in shared systems.
- Local and distributed file systems, redundancy and availability: Basic usage of file systems, distributed file systems, logical volumes, redundancy, fault tolerance and high availability.
- Distributed systems for computing: Basic concepts of distributed systems (e.g. Hadoop and Spark), interconnection and communications, paradigms of distributed systems and protocols, and fault tolerance. Basic tools for exploiting concurrency on distributed systems, and their programming models oriented towards artificial intelligence and Big Data processing.
- Challenges of high-performance computing for artificial intelligence: Present and future challenges of high-performance computing applied to artificial intelligence. Current tools and environments in industry, the Cloud, academia and society.
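The Map-Reduce model mentioned in the contents can be sketched in a few lines. The following toy word count (written in Python purely for illustration, rather than the Scala used in the course) shows the map, shuffle and reduce phases that frameworks such as Hadoop and Spark distribute across many machines:

```python
from functools import reduce
from itertools import groupby

def map_phase(docs):
    # Map: emit a (word, 1) pair for every word in every document
    return [(w, 1) for doc in docs for w in doc.split()]

def reduce_phase(pairs):
    # Shuffle: group pairs by key; Reduce: sum the counts per word
    pairs = sorted(pairs)
    return {k: reduce(lambda a, b: a + b, (v for _, v in g))
            for k, g in groupby(pairs, key=lambda kv: kv[0])}

docs = ["big data big compute", "big models"]
counts = reduce_phase(map_phase(docs))
print(counts)  # {'big': 3, 'compute': 1, 'data': 1, 'models': 1}
```

In a real distributed framework, the map and reduce functions run in parallel on partitions of the data, and the shuffle moves intermediate pairs across the network; only the programming model is the same as in this single-process sketch.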
Activities
Virtualization and containerization concepts
Introduction to the use of virtual machines and containerization, for isolated and customized execution environments, as well as load migration and resource management in shared systems. Objectives: 4
Theory
4h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
12h
Service and application architecture
Introduction to Client-Server models, execution management systems, and launch of applications in cluster and Cloud systems.
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
6h
Supercomputers and High-performance Computing
Supercomputers and High-Performance Computing systems, tools and environments. Familiarization with HPC facilities, hands-on use of HPC systems and the C language. Objectives: 2
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
9h
Accelerators, supercomputers and high-performance devices
Accelerators and high-performance devices. GPUs and accelerator devices. Matrix multiplication using GPUs. Introduction to Python on a supercomputer. Objectives: 3
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
9h
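As context for the activity above: matrix multiplication is the kernel that GPUs accelerate. A naive pure-Python baseline (an illustrative sketch, not the lab code) makes the O(n³) work explicit; in the lab, the same operation would be delegated to GPU libraries that parallelize it across thousands of cores:

```python
def matmul(A, B):
    # Naive triple loop: C[i][j] = sum over k of A[i][k] * B[k][j]
    n, m, p = len(A), len(B), len(B[0])
    assert all(len(row) == m for row in A), "inner dimensions must match"
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):       # i-k-j loop order improves memory locality
            aik = A[i][k]
            for j in range(p):
                C[i][j] += aik * B[k][j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(matmul(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

Each of the n×p output cells can be computed independently, which is exactly the data parallelism a GPU exploits.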
Computing in distributed systems
Basic concepts of distributed systems (e.g. Hadoop and Spark), interconnection and communications, distributed systems paradigms and protocols, and fault tolerance. Basic tools for the exploitation of concurrency in distributed systems, and their programming models oriented to artificial intelligence and massive data processing. Objectives: 5, 6
Theory
4h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
12h
Current tools and environments in industry, the cloud, academia and society
Current tools and environments in industry, the cloud, academia and society. Objectives: 8
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
6h
Local and distributed file systems, redundancy and availability
Basic uses of file systems, as well as distributed data storage systems, logical volumes, redundancy, fault tolerance, and high availability. Objectives: 7
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
6h
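The redundancy and fault-tolerance ideas in the activity above can be illustrated with RAID-5-style XOR parity (a toy sketch, not the lab material): one parity block allows any single lost data block to be reconstructed from the survivors.

```python
def parity(blocks):
    # RAID-5-style parity: byte-wise XOR of all blocks (equal length assumed)
    p = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            p[i] ^= byte
    return bytes(p)

def reconstruct(surviving, parity_block):
    # XOR of the parity with all surviving blocks yields the lost block,
    # because each surviving block cancels itself out of the parity
    return parity(surviving + [parity_block])

d0, d1, d2 = b"disk", b"data", b"demo"
p = parity([d0, d1, d2])           # stored alongside the data blocks
assert reconstruct([d0, d2], p) == d1   # recover d1 after "losing" it
```

Real storage systems stripe data and rotate the parity block across disks, but the recovery arithmetic is this same XOR.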
Middleware and high-performance platforms for artificial intelligence
Middleware and high-performance platforms for artificial intelligence. TensorFlow/PyTorch, Deep Learning, LLMs and HPC. Objectives: 1
Theory
2h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
6h
Teaching methodology
The course is based on theory and face-to-face laboratory sessions. The theoretical sessions combine lectures and seminars by experts in the field, following the program set out in this study plan and based on the course's own material. During the sessions, dialogue and discussion are promoted in order to anticipate and consolidate the learning outcomes of the subject.
The laboratory sessions deal with the aspects related to the different technologies presented, and follow the same topics as the syllabus. These are hands-on practical sessions, using different computational resources from the Department of Computer Architecture and the Barcelona Supercomputing Center.
Evaluation methodology
The evaluation is mainly based on continuous work carried out during the different sessions of the course. Attendance and participation are mandatory, and are therefore also assessed, by taking attendance and requiring participation in the interactive sessions. Finally, there is a research project throughout the course, which students must present to their peers.
The distribution of weights for each activity is as follows:
- AS: attendance in class, theory and laboratories (10%), used to evaluate transversal competence CT3.
- PR: class participation (15%).
- EX: laboratory and class deliverables (55%), computed as the arithmetic mean of the different assignments.
- RE: presentation of a research paper (20%), used to evaluate transversal competences CT2, CT3 and CT6.
The Final Grade (NF) of the subject is obtained from
NF = 0.10 x AS + 0.15 x PR + 0.55 x EX + 0.20 x RE
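For reference, the weighting can be checked with a short computation (the marks below are illustrative values, not real ones):

```python
def final_grade(AS, PR, EX, RE):
    # NF = 0.10*AS + 0.15*PR + 0.55*EX + 0.20*RE, all marks on a 0-10 scale
    return 0.10 * AS + 0.15 * PR + 0.55 * EX + 0.20 * RE

# e.g. full attendance and participation, 7 on deliverables, 8 on the paper
print(round(final_grade(AS=10, PR=10, EX=7, RE=8), 2))  # 7.95
```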
Re-evaluation
a) Re-evaluation is only open to students who submitted all the EX and RE exercises and failed the NF. (That is, students who want to improve their marks, or who have an NP, are excluded.)
b) The maximum mark in re-evaluation is 7.
Bibliography
Basic
- Torres, Jordi. First contact with Deep learning: practical introduction with Keras. Kindle Direct Publishing, 2018. ISBN: 9781983211553. https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004153269706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Zhang, Aston; Lipton, Zachary C.; Li, Mu; Smola, Alexander J. Dive into Deep Learning. The authors, 2020.
- Sterling, Thomas; Anderson, Matthew; Brodowicz, Maciej. High performance computing: modern systems and practices. Morgan Kaufmann, 2018. ISBN: 9780124201583. https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004173809706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Chambers, B.; Zaharia, M. Spark: the definitive guide: big data processing made simple. O'Reilly, 2018. ISBN: 9781491912300.
- White, Tom. Hadoop: the definitive guide. O'Reilly, 2015. ISBN: 9781491901632. https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004054859706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Torres Viñals, Jordi. La inteligencia artificial explicada a los humanos. Plataforma Editorial, 2023. ISBN: 9788419655561. https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005151879806711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Torres Viñals, Jordi. Supercomputing for Artificial Intelligence: Foundations, Architectures and Scaling Deep Learning. Watch This Space, 2025. ISBN: 9798319328359. https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005476510706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Complementary
- Barcelona Supercomputing Center. BSC documentation about MareNostrum 5.
Web links
- MareNostrum 5 documentation: https://www.bsc.es/supportkc/docs/MareNostrum5/intro/