Advanced Reinforcement Learning

Credits
6
Type
Elective
Requirements
This subject has no prerequisites, but it assumes the previous capacities listed below
Department
CS
This course deepens the study of reinforcement learning (RL), building on the general introduction given in the APRNS course. It emphasizes, among other topics, techniques that accelerate the learning of policies and techniques that make their application to real problems feasible. It also describes how RL is used in cases ranging from learning superhuman policies in games (such as Go) and learning the coordination of multi-agent systems, to its application in the development of large language models (LLMs).

Teachers

Person in charge

  • Mario Martín Muñoz

Weekly hours

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6

Competences

Transversal Competences

Transversals

  • CT6 [Assessable] - Autonomous Learning. Detect deficiencies in one's own knowledge and overcome them through critical reflection and the choice of the best action to extend this knowledge.

Basic

  • CB5 - That students have developed the learning skills necessary to undertake further studies with a high degree of autonomy.

Technical Competences

Specifics

  • CE18 - To acquire and develop computational learning techniques and to design and implement applications and systems that use them, including those dedicated to the automatic extraction of information and knowledge from large volumes of data.
  • CE19 - To use current computer systems, including high-performance systems, for the processing of large volumes of data based on knowledge of their structure, operation and particularities.
  • CE22 - To represent, design and analyze dynamic systems. To acquire concepts such as observability, stability and controllability.

Generic Technical Competences

Generic

  • CG2 - To use the fundamental knowledge and solid work methodologies acquired during the studies to adapt to the new technological scenarios of the future.
  • CG4 - Reasoning, analyzing reality and designing algorithms and formulations that model it. To identify problems and construct valid algorithmic or mathematical solutions, possibly new ones, integrating the necessary multidisciplinary knowledge, evaluating different alternatives with a critical spirit, justifying the decisions taken, interpreting and synthesizing the results in the context of the application domain, and establishing methodological generalizations based on specific applications.
  • CG9 - To face new challenges with a broad vision of the possibilities of a professional career in the field of Artificial Intelligence. To develop the activity applying quality criteria and continuous improvement, and to act rigorously in professional development. To adapt to organizational or technological changes. To work in situations with a lack of information and/or with time and/or resource restrictions.

Objectives

  1. Learn the problems that arise when designing reinforcement functions and how to solve them.
    Related competences: CG4, CB5, CE18
  2. Learn techniques to accelerate reinforcement learning so that it is viable in real-world applications.
    Related competences: CG2, CG9, CT6, CB5
  3. Understand the problem of simultaneous learning in multi-agent systems and the techniques that make this learning possible.
    Related competences: CG4, CE22,
  4. Learn how to incorporate learning from examples in order to obtain policies better than those that generated the examples, and to recover the hidden reinforcement function behind those examples.
    Related competences: CG2, CG4, CE19

Contents

  1. Reinforcement Function Design: Inverse Reinforcement Learning (IRL)
    The reinforcement function is crucial in RL but is not always easy to define. This topic shows how to derive a reinforcement function from example behavior.
  2. Learning the Reinforcement Function with a Human in the Loop (RLHF)
    When defining complex reinforcement functions we often have no examples of behavior to which IRL can be applied. In this case we will see how to build the reinforcement function from human feedback (RLHF). This mechanism is the basis of the training used to align large language models such as ChatGPT and others (a minimal sketch of the preference loss follows this list).
  3. Reinforcement learning aided by world model learning
    Reinforcement learning is slow. To reduce the number of interactions with the environment, one possibility is to learn a predictive model of the environment from those interactions and then generate simulated experiences from which the agent can learn without interacting so much with the real world. In this topic we examine this approach and its limitations (a minimal Dyna-style sketch follows this list).
  4. Basic and Advanced Exploration in RL: Implementing Curiosity
    A core element of RL is exploration to find better policies. Basic exploration methods take random actions, which is inefficient and slows down learning. This topic describes better ways to explore new options, from estimating the uncertainty of the learned knowledge to implementing curiosity methods (a toy curiosity sketch follows this list).
  5. Learning in Multiagent systems using RL
    RL assumes that the environment is Markovian and stationary, so that changes in the environment are due only to the actions of the learning agent. When the agent learns in an environment where other agents are also acting and learning, this assumption no longer holds and RL algorithms must be adapted. In this topic we see the most advanced methods of reinforcement learning in multi-agent systems, with special emphasis on cooperative problems.
  6. Competition in multiagent systems using RL: AlphaGo and family
    A special case of interaction in multi-agent systems is competition and, in particular, zero-sum games. In this scenario reinforcement learning has led to superhuman abilities in some cases, notably the game of Go. In this topic we will see the self-play and Monte Carlo Tree Search techniques that make it possible to develop these skills (a sketch of the UCT selection rule follows this list).
  7. RL in sparse reinforcement functions: Goal-conditioned policies and hindsight
    Often in RL the reinforcement function is sparse (uninformative). This has the advantage that the obtained policies are not biased, but it slows down learning. In this topic we study goal-conditioned policies and the hindsight technique, which have been shown to be very effective in accelerating learning in these cases (a minimal relabeling sketch follows this list).
  8. Off-line reinforcement learning
    In some applications we have examples of behavior generated by humans or by other policies. One way to take advantage of these data is imitation learning, or applying IRL to learn from the examples. However, the resulting policy will be at most as good as the one that generated the examples. Can RL produce policies better than the ones behind the examples? Off-line RL exploits the ability of off-policy methods to learn not from data the policy itself generates, but from possibly suboptimal data generated by other policies (the examples); a conservative-update sketch follows this list.
  9. Curricular and hierarchical learning
    In RL it is often difficult to learn complex tasks from scratch. One approach, aligned with how humans learn, is to define a curriculum or hierarchy of simpler tasks to learn first, before attempting the complex task for which the agent is not yet ready. In this topic we will see how to do curriculum learning and hierarchical learning in these cases.
  10. Transfer learning, Meta learning, Lifelong learning and AGI
    RL is an interesting approach to autonomous learning by intelligent agents. However, by its nature it focuses on specific tasks, whereas an intelligent agent must solve many different tasks. This topic considers the interaction between the different tasks to be learned: transferring knowledge from one task to another (transfer learning), learning tasks in a way that improves learning in subsequent tasks (meta-learning) and, finally, maintaining the knowledge learned throughout the agent's life (lifelong learning). We will see how all these techniques could empower the agent and bring us closer to true Artificial General Intelligence (AGI).
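
As an illustration of topic 2, here is a minimal Python sketch, with illustrative names and toy numbers, of the Bradley-Terry preference loss commonly used to fit a reward model from human comparisons in RLHF:

    import numpy as np

    def preference_loss(r_preferred, r_rejected):
        # Negative log-likelihood that the annotator prefers the first segment,
        # under the Bradley-Terry model: P(a preferred over b) = sigmoid(r_a - r_b).
        return -np.log(1.0 / (1.0 + np.exp(-(r_preferred - r_rejected))))

    # Toy scores assigned by a reward model to two behavior segments:
    print(preference_loss(2.0, 0.5))  # small loss: the model agrees with the human label
    print(preference_loss(0.5, 2.0))  # large loss: the model disagrees

Training the reward model amounts to minimizing this loss over a dataset of human comparisons; the fitted function is then used as the reinforcement signal for a standard RL algorithm.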
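
For topic 3, a minimal tabular Dyna-Q sketch, assuming dictionary-based tables and illustrative hyperparameters. Each real transition trains both the Q-function and a one-step world model, and the model then generates simulated experience so fewer real interactions are needed:

    import random
    from collections import defaultdict

    Q = defaultdict(float)   # Q[(state, action)]
    model = {}               # model[(state, action)] = (reward, next_state)
    alpha, gamma, n_planning = 0.1, 0.95, 10

    def q_update(s, a, r, s2, actions):
        best_next = max(Q[(s2, a2)] for a2 in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    def dyna_step(s, a, r, s2, actions):
        q_update(s, a, r, s2, actions)      # learn from the real transition
        model[(s, a)] = (r, s2)             # update the learned world model
        for _ in range(n_planning):         # learn from simulated transitions
            (ps, pa), (pr, ps2) = random.choice(list(model.items()))
            q_update(ps, pa, pr, ps2, actions)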
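
For topic 4, a toy sketch of curiosity as the prediction error of a learned forward model: transitions the model predicts poorly (novel ones) earn a larger intrinsic reward. The linear model, dimensions and learning rate are illustrative assumptions; ICM-style methods predict in a learned feature space instead of raw states:

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(4, 5))  # forward model: (state, action) -> next state

    def intrinsic_reward(state, action, next_state, lr=0.01):
        global W
        x = np.concatenate([state, [action]])  # 4 state features + 1 action
        error = next_state - W @ x             # forward-model prediction error
        W += lr * np.outer(error, x)           # fit the model online
        return float(error @ error)            # curiosity bonus = squared error

    s, s2 = rng.normal(size=4), rng.normal(size=4)
    print(intrinsic_reward(s, 1, s2))  # large for novel transitions, shrinks as the model learns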
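
For topic 6, a sketch of the UCT selection rule at the heart of Monte Carlo Tree Search; the AlphaGo family uses the related PUCT rule, which weights the exploration bonus by a learned policy prior. The data layout and exploration constant are illustrative assumptions:

    import math

    def uct_select(children, c=1.4):
        # Pick the child maximizing value estimate + exploration bonus.
        total = sum(ch["visits"] for ch in children)
        def score(ch):
            if ch["visits"] == 0:
                return float("inf")  # always try unvisited moves first
            return ch["value"] / ch["visits"] + c * math.sqrt(math.log(total) / ch["visits"])
        return max(children, key=score)

    children = [{"visits": 10, "value": 6.0},
                {"visits": 2, "value": 1.5},
                {"visits": 0, "value": 0.0}]
    print(uct_select(children))  # the unvisited child is selected first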
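
For topic 7, a minimal sketch of hindsight relabeling (as in Hindsight Experience Replay): a failed episode is reinterpreted as a success for the goal that was actually reached, turning a sparse reward into a useful learning signal. The transition format is an illustrative assumption:

    def her_relabel(episode):
        # episode: list of (state, action, next_state) that never reached the goal.
        achieved = episode[-1][2]  # in hindsight, treat the final state as the goal
        relabeled = []
        for state, action, next_state in episode:
            reward = 1.0 if next_state == achieved else 0.0
            relabeled.append((state, action, reward, next_state, achieved))
        return relabeled

    # Toy gridworld episode that missed its original goal:
    episode = [((0, 0), "right", (0, 1)), ((0, 1), "down", (1, 1))]
    print(her_relabel(episode))  # last transition now carries reward 1.0 for goal (1, 1)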
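
For topic 8, a toy sketch of off-line Q-learning from a fixed batch, with a conservative term in the spirit of CQL that keeps the values of actions absent from the dataset low, so the policy cannot exploit estimation errors on out-of-distribution actions. All names and hyperparameters are illustrative assumptions:

    from collections import defaultdict

    def fit_q_offline(dataset, actions, epochs=50, alpha=0.1, gamma=0.95, penalty=0.05):
        # dataset: fixed list of (s, a, r, s2) collected by other (behavior) policies.
        Q = defaultdict(float)
        for _ in range(epochs):
            for s, a, r, s2 in dataset:
                target = r + gamma * max(Q[(s2, a2)] for a2 in actions)
                Q[(s, a)] += alpha * (target - Q[(s, a)])  # standard off-policy update
                for a2 in actions:
                    if a2 != a:                        # conservative term: push down
                        Q[(s, a2)] -= alpha * penalty  # actions unseen in the data
        return Q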

Activities

Quick review of reinforcement learning fundamentals, theory and algorithms



Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
6h

Reinforcement Function Design: Inverse Reinforcement Learning (IRL)


Objectives: 1
Contents:
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
6h

Learning the Reinforcement Function with a Human in the Loop (RLHF)


Objectives: 1
Contents:
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
6h

Off-line reinforcement learning


Objectives: 4
Contents:
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
6h

RL in sparse reinforcement functions: Goal-conditioned policies and hindsight


Objectives: 2
Contents:
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
6h

Reinforcement learning aided by world model learning


Objectives: 2
Contents:
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
6h

First partial exam, covering the first part of the course



Week: 8 (Outside class hours)
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Advanced Exploration in RL: Implementing Curiosity


Objectives: 2
Contents:
Theory
2h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
10h

Learning in Multiagent systems using RL


Objectives: 3
Contents:
Theory
4h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
10h

Competition in multiagent systems using RL: AlphaGo and family


Objectives: 3
Contents:
Theory
2h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
10h

Curricular and hierarchical learning



Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
6h

Transfer learning, Meta learning, Lifelong learning



Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
6h

RL and AGI



Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Second partial exam


Objectives: 1 2 3 4
Week: 15 (Outside class hours)
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Teaching methodology

The classes are divided into theory, problem and laboratory sessions.

In the theory sessions, the subject matter will be developed by interleaving the presentation of new theoretical material and examples with interaction with the students in order to discuss the concepts.

In the laboratory classes, short practical assignments will be carried out using specific tools and libraries, allowing students to practice and reinforce the knowledge from the theory classes.

Evaluation methodology

The subject will include the following assessment acts:

- Reports of the laboratory activities, which must be delivered within the deadline indicated for each session (roughly two weeks). A laboratory grade L will be calculated as a weighted average of the grades of these reports.

- A first partial exam, taken around the middle of the course, covering the material seen up to that point. Let P1 be the grade obtained in this exam.

- A second partial exam, on the designated day within the exam period, covering the material not included in the first partial. Let P2 be the grade obtained in this exam.

The three grades L, P1 and P2 are between 0 and 10.

The final grade of the subject will be: 0.4·L + 0.3·P1 + 0.3·P2
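
For example, with L = 8, P1 = 6 and P2 = 7, the final grade would be 0.4·8 + 0.3·6 + 0.3·7 = 7.1.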

Previous capacities

Basic knowledge of Deep Learning and Reinforcement Learning (having completed APRNS)