
Advanced Reinforcement Learning

Credits
6
Types
Elective
Requirements
This subject has no requirements, but it does have previous capacities
Department
CS
Web
https://sites.google.com/upc.edu/ara
This course deepens the study of reinforcement learning (RL), following the general introduction in the APRNS course. It emphasizes, among other things, techniques that accelerate the learning of policies and techniques that make RL applicable to real problems. It also describes how RL is used in cases ranging from learning superhuman policies in games (such as Go) to coordinating multi-agent systems, and its role in the development of large language models (LLMs).

Teachers

Person in charge

Weekly hours

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6

Competences

Transversals

  • CT6 [Assessable] - Autonomous Learning. Detect deficiencies in one's own knowledge and overcome them through critical reflection and the choice of the best action to extend this knowledge.
Basic

  • CB5 - Students have developed the learning skills necessary to undertake further studies with a high degree of autonomy.
Specifics

  • CE18 - To acquire and develop computational learning techniques and to design and implement applications and systems that use them, including those dedicated to the automatic extraction of information and knowledge from large volumes of data.
  • CE19 - To use current computer systems, including high-performance systems, for the processing of large volumes of data from the knowledge of its structure, operation and particularities.
  • CE22 - To represent, design and analyze dynamic systems. To acquire concepts such as observability, stability and controllability.
Generic

  • CG2 - To use the fundamental knowledge and solid work methodologies acquired during the studies to adapt to the new technological scenarios of the future.
  • CG4 - Reasoning, analyzing reality and designing algorithms and formulations that model it. To identify problems and construct valid algorithmic or mathematical solutions, eventually new, integrating the necessary multidisciplinary knowledge, evaluating different alternatives with a critical spirit, justifying the decisions taken, interpreting and synthesizing the results in the context of the application domain and establishing methodological generalizations based on specific applications.
  • CG9 - To face new challenges with a broad vision of the possibilities of a professional career in the field of Artificial Intelligence. Develop the activity applying quality criteria and continuous improvement, and act rigorously in professional development. Adapt to organizational or technological changes. Work in situations of lack of information and / or with time and / or resource restrictions.
Objectives

    1. Learn the problems that arise when designing reinforcement functions and how to solve them.
      Related competences: CB5, CG4, CE18
    2. Learn techniques to accelerate reinforcement learning so that it is viable in real-world applications.
      Related competences: CB5, CT6, CG2, CG9
    3. Understand the problem of simultaneous learning in multi-agent systems and the techniques that enable it.
      Related competences: CG4, CE22
    4. Learn how to incorporate learning from examples, both to obtain better policies than those that generated the examples and to recover the hidden reinforcement function behind them.
      Related competences: CG2, CG4, CE19

    Contents

    1. Reinforcement Function Design: Inverse Reinforcement Learning (IRL)
      The reinforcement function is crucial in RL but is not always easy to define. This topic shows how to derive a reinforcement function from example behavior.
    2. Learning the Reinforcement Function with a Human in the Loop (RLHF)
      When defining complex reinforcement functions we sometimes have no example behaviors to which IRL can be applied. In this case we will see how to create the reinforcement function from human feedback (RLHF). This mechanism is the basis of the alignment training of large language models such as ChatGPT and others.
    3. Reinforcement learning aided by world model learning.
      Reinforcement learning is slow. To reduce the number of interactions with the environment, one possibility is to learn a predictive model of the environment from those interactions and then generate simulated experiences from which the agent can learn without interacting so much with the real world. In this topic we see this approach and its limitations.
    4. Basic and Advanced Exploration in RL: Implementing Curiosity
      A core element in RL is exploration to find better policies. Basic exploration methods take random actions, which is inefficient and slows learning. There are better ways to explore new options, and this topic describes them, from estimating the uncertainty of the learned knowledge to implementing curiosity methods that improve exploration.
    5. Learning in Multiagent systems using RL
      In RL the environment is assumed to be Markovian and stationary, so that changes in it are due only to the actions of the learning agent. When the agent learns in an environment where other agents are also acting and learning, this assumption no longer holds and RL algorithms must adapt. In this topic we see the most advanced methods of reinforcement learning in multi-agent systems, with special emphasis on cooperative problems.
    6. Competition in multiagent systems using RL: AlphaGo and family
      A special case of interaction in multi-agent systems is competition and, in particular, zero-sum games. In this scenario, reinforcement learning has led to superhuman performance in some cases, most notably the game of Go. In this topic we will see the self-play and Monte Carlo Tree Search techniques that make these skills possible.
    7. RL in sparse reinforcement functions: Conditional policies and hindsight
      Often in RL the reinforcement function is sparse (uninformative). This has the advantage that the obtained policies are not biased, but it slows down learning. In this topic we study goal-conditioned policies and the hindsight technique, which have proven very effective in accelerating learning in these cases.
    8. Off-line reinforcement learning
      In some applications we have examples of behavior generated by humans or by other policies. One way to take advantage of this data is imitation learning, or applying IRL to learn from the examples. However, the resulting policy will be at most as good as the one that generated the examples. Can RL obtain better policies than the examples? Off-line RL exploits the ability of off-policy methods to obtain good policies not from data the agent generates itself but from possibly suboptimal data generated by other policies (the examples).
    9. Curricular and hierarchical learning
      In RL it is often difficult to learn complex tasks from scratch. One approach, aligned with how humans learn, is to define a curriculum or hierarchy of tasks to initially learn before attempting to learn the complex task for which the agent is not ready. In this topic you will see how to do curriculum learning and hierarchical learning in these cases.
    10. Transfer learning, Meta learning, Lifelong learning and AGI
      RL is an attractive approach to autonomous learning by intelligent agents. However, by its nature it is focused on specific tasks, whereas an intelligent agent must solve many different tasks. This topic considers the interaction between the different tasks to be learned: transferring knowledge from one task to another (transfer learning), learning tasks so that subsequent tasks are learned faster (meta-learning) and, finally, maintaining the knowledge learned throughout the life of the agent (lifelong learning). We will see how these techniques could empower the agent and enable true Artificial General Intelligence (AGI).
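To make topic 2 (RLHF) concrete, the core idea of learning a reward function from human preferences can be sketched with a toy Bradley-Terry model: given pairs where a human preferred one trajectory over another, fit a reward so that preferred trajectories score higher. The features, preference data, and learning rate below are invented for illustration; they are not course material.

```python
import math

# Toy RLHF reward learning: a human preference "a > b" is modeled as
# P(a > b) = sigmoid(r(a) - r(b)), with r a linear reward over trajectory
# features.  All features and preferences below are made up.

def reward(w, feats):
    return sum(wi * fi for wi, fi in zip(w, feats))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Each preference pair: (features of preferred trajectory, features of rejected one).
prefs = [
    ([1.0, 0.0], [0.0, 1.0]),   # the human prefers trajectories high in feature 0
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.8, 0.3], [0.1, 0.9]),
]

w = [0.0, 0.0]
lr = 0.5
for _ in range(200):                      # gradient ascent on the log-likelihood
    for fa, fb in prefs:
        p = sigmoid(reward(w, fa) - reward(w, fb))
        for i in range(len(w)):           # d/dw log P(a > b) = (1 - p) * (fa - fb)
            w[i] += lr * (1.0 - p) * (fa[i] - fb[i])

# The learned reward now ranks preferred trajectories higher.
```

The same objective, with the reward model replaced by a neural network over prompt-response pairs, is what underlies the reward-modeling stage of LLM alignment mentioned in the topic description.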
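Topic 3's world-model idea can be sketched in the Dyna style: interleave real experience with planning updates replayed from a learned model. The corridor environment and all constants here are a hypothetical minimal example, not the algorithms studied in class.

```python
import random

# Dyna-style sketch: tabular Q-learning on a tiny 5-state corridor
# (move right to reach the goal), plus planning updates drawn from a
# learned deterministic model of the environment.

N_STATES, ACTIONS = 5, [0, 1]            # 0 = left, 1 = right
GOAL = N_STATES - 1

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0)

random.seed(0)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
model = {}                                # learned model: (s, a) -> (s2, r)
alpha, gamma, eps, planning_steps = 0.5, 0.9, 0.2, 10

for episode in range(50):
    s = 0
    while s != GOAL:
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda a_: Q[(s, a_)])
        s2, r = step(s, a)                # one step of real experience
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        model[(s, a)] = (s2, r)           # remember the observed transition
        for _ in range(planning_steps):   # planning: replay simulated experience
            ps, pa = random.choice(list(model))
            ps2, pr = model[(ps, pa)]
            Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, b)] for b in ACTIONS)
                                    - Q[(ps, pa)])
        s = s2

# The greedy policy now moves right in every non-goal state.
```

The planning loop is what reduces the number of real interactions: each real step is amplified into several value updates drawn from the learned model.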
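For topic 4, one of the simplest improvements over random exploration is a count-based novelty bonus added to the extrinsic reward. The bonus shape beta / sqrt(N(s)) and the constant beta are an illustrative assumption, not the specific methods the course covers.

```python
import math
from collections import defaultdict

# Count-based exploration sketch: augment the extrinsic reward with an
# intrinsic bonus beta / sqrt(N(s)) that shrinks as state s is visited
# more often, pushing the agent toward rarely seen states.

beta = 0.5
visits = defaultdict(int)

def intrinsic_bonus(state):
    visits[state] += 1
    return beta / math.sqrt(visits[state])

# A novel state yields a large bonus; a frequently visited one, almost none.
first = intrinsic_bonus("s0")             # first visit: 0.5 / sqrt(1) = 0.5
for _ in range(98):
    intrinsic_bonus("s0")
hundredth = intrinsic_bonus("s0")         # 100th visit: 0.5 / sqrt(100) = 0.05
```

Curiosity methods studied in the topic generalize this idea to large state spaces, where exact counts are replaced by learned estimates of novelty or prediction error.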
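Topic 6's Monte Carlo Tree Search relies on the UCT rule for selecting which child node to expand: mean value plus an exploration term. The child statistics below are invented numbers chosen only to show the rule in action.

```python
import math

# UCT selection sketch (the tree policy behind Monte Carlo Tree Search):
# pick the child maximizing  Q/n + c * sqrt(ln(N_parent) / n_child).

def uct_score(total_value, n_child, n_parent, c=1.4):
    if n_child == 0:
        return float("inf")               # unvisited children are tried first
    return total_value / n_child + c * math.sqrt(math.log(n_parent) / n_child)

# Three children of one node: (accumulated value, visit count).
children = [(60.0, 80), (10.0, 12), (0.0, 0)]
n_parent = sum(n for _, n in children)
best = max(range(len(children)),
           key=lambda i: uct_score(children[i][0], children[i][1], n_parent))
# best == 2: the unvisited child is selected before exploitation resumes.
```

In self-play systems such as AlphaGo, this selection rule is combined with learned value and policy networks, but the exploration/exploitation trade-off it encodes is the same.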
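The hindsight technique of topic 7 can be sketched as a relabeling operation on stored transitions: a failed episode is reinterpreted as if its goal had been the state actually reached. The integer states, action names, and sparse reward below are a hypothetical minimal setup.

```python
# Hindsight relabeling sketch: transitions are (state, action, goal, reward,
# next_state) tuples, and the reward is sparse (0 only when the goal is hit).

def sparse_reward(state, goal):
    return 0.0 if state == goal else -1.0

def relabel_with_hindsight(episode):
    achieved = episode[-1][4]             # the state the agent actually reached
    return [(s, a, achieved, sparse_reward(s2, achieved), s2)
            for (s, a, _, _, s2) in episode]

# An episode that aimed for goal 9 but ended in state 3: every original
# reward is -1.  After relabeling toward the achieved state 3, the final
# transition earns reward 0, so the trajectory becomes informative.
episode = [(0, "right", 9, -1.0, 1),
           (1, "right", 9, -1.0, 2),
           (2, "right", 9, -1.0, 3)]
relabeled = relabel_with_hindsight(episode)
```

Because the relabeled transitions carry non-trivial rewards, a goal-conditioned policy can learn from episodes that would otherwise contribute no learning signal at all.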

    Activities

    Activity Evaluation act


    Quick review of reinforcement learning fundamentals, theory and algorithms




    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    6h

    Reinforcement Function Design: Inverse Reinforcement Learning (IRL)


    Objectives: 1
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    6h

    Learning the Reinforcement Function with a Human in the Loop


    Objectives: 1
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    6h

    Off-line reinforcement learning


    Objectives: 4
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    6h

    RL in sparse reinforcement functions: Conditional policies and hindsight


    Objectives: 2
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    6h

    Reinforcement learning aided by world model learning


    Objectives: 2
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    6h


    Control first part of the course



    Week: 8 (Outside class hours)
    Theory
    0h
    Problems
    0h
    Laboratory
    0h
    Guided learning
    0h
    Autonomous learning
    0h

    Advanced Exploration in RL: Implementing Curiosity


    Objectives: 2
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    4h
    Guided learning
    0h
    Autonomous learning
    10h

    Learning in Multiagent systems using RL


    Objectives: 3
    Contents:
    Theory
    4h
    Problems
    0h
    Laboratory
    4h
    Guided learning
    0h
    Autonomous learning
    10h

    Competition in multiagent systems using RL: AlphaGo and family


    Objectives: 3
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    4h
    Guided learning
    0h
    Autonomous learning
    10h

    Curricular and hierarchical learning



    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    6h

    Transfer learning, Meta learning, Lifelong learning



    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    6h

    RL and AGI



    Theory
    2h
    Problems
    0h
    Laboratory
    0h
    Guided learning
    0h
    Autonomous learning
    6h

    Final control


    Objectives: 1 2 3 4
    Week: 15 (Outside class hours)
    Theory
    0h
    Problems
    0h
    Laboratory
    0h
    Guided learning
    0h
    Autonomous learning
    0h

    Teaching methodology

    The classes are divided into theory, problem and laboratory sessions.

    In the theory sessions, the subject matter is developed by interspersing the presentation of new theoretical material with examples and interaction with the students to discuss the concepts.

    In the laboratory classes, small practical exercises will be carried out with tools and specific libraries, allowing students to practice and reinforce the knowledge from the theory classes.

    Evaluation methodology

    The subject will include the following assessment acts:

    - Reports of the laboratory activities, which must be delivered by the deadline indicated for each session (roughly two weeks). A laboratory grade L is computed as a weighted average of the grades of these reports.

    - A first partial exam, taken around the middle of the course, covering the material seen up to that point. Let P1 be the grade obtained in this exam.

    - A second partial exam, on the designated day within the exam period, covering the material not included in the first partial. Let P2 be the grade obtained in this exam.

    The three grades L, P1, P2 are between 0 and 10.

    The final grade of the subject will be: 0.4*L + 0.3*P1 + 0.3*P2
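As a quick check of the formula, a minimal sketch with invented example grades:

```python
# The grading formula above as a function; the grades used are made up.

def final_grade(L, P1, P2):
    return 0.4 * L + 0.3 * P1 + 0.3 * P2

grade = final_grade(8.0, 6.0, 7.0)        # 3.2 + 1.8 + 2.1 = 7.1
```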


    Previous capacities

    Basic knowledge of Deep Learning and Reinforcement Learning (having completed APRNS)