
Advanced Reinforcement Learning

Credits
6
Types
Elective
Requirements
This subject has no requirements, but it does have previous capacities
Department
CS
Web
https://sites.google.com/upc.edu/ara
This course deepens the study of reinforcement learning (RL), following the general introduction in the APRNS course. It emphasizes, among other things, techniques that accelerate the learning of policies and techniques that make RL applicable to real problems. It also describes how RL is used in cases ranging from learning superhuman policies in games (such as Go) to coordinating multi-agent systems, and its role in the development of large language models (LLMs).

Teachers

Person in charge

Weekly hours

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6

Competences

Transversals

  • CT6 [Assessable] - Autonomous Learning. Detect deficiencies in one's own knowledge and overcome them through critical reflection and the choice of the best action to extend this knowledge.
Basic

  • CB5 - Students have developed the learning skills necessary to undertake further studies with a high degree of autonomy.
Specifics

  • CE18 - To acquire and develop computational learning techniques and to design and implement applications and systems that use them, including those dedicated to the automatic extraction of information and knowledge from large volumes of data.
  • CE19 - To use current computer systems, including high-performance systems, for the processing of large volumes of data from the knowledge of its structure, operation and particularities.
  • CE22 - To represent, design and analyze dynamic systems. To acquire concepts such as observability, stability and controllability.
Generic

  • CG2 - To use the fundamental knowledge and solid work methodologies acquired during the studies to adapt to the new technological scenarios of the future.
  • CG4 - Reasoning, analyzing reality and designing algorithms and formulations that model it. To identify problems and construct valid algorithmic or mathematical solutions, eventually new, integrating the necessary multidisciplinary knowledge, evaluating different alternatives with a critical spirit, justifying the decisions taken, interpreting and synthesizing the results in the context of the application domain and establishing methodological generalizations based on specific applications.
  • CG9 - To face new challenges with a broad vision of the possibilities of a professional career in the field of Artificial Intelligence. Develop the activity applying quality criteria and continuous improvement, and act rigorously in professional development. Adapt to organizational or technological changes. Work in situations of lack of information and / or with time and / or resource restrictions.
Objectives

    1. Learn the problems that arise when designing reinforcement functions and how to solve them.
      Related competences: CB5, CG4, CE18
    2. Learn techniques to accelerate reinforcement learning so that it is viable in real-world applications.
      Related competences: CB5, CT6, CG2, CG9
    3. Understand the problem of simultaneous learning in multi-agent systems and the techniques that enable it.
      Related competences: CG4, CE22
    4. Learn how to incorporate learning from examples, both to obtain better policies than those that generated the examples and to recover the hidden reinforcement function behind them.
      Related competences: CG2, CG4, CE19

    Contents

    1. Reinforcement Function Design: Inverse Reinforcement Learning (IRL)
      The reinforcement function is crucial in RL but is not always easy to define. This topic shows how to derive a reinforcement function from example behavior.
    2. Learning the Reinforcement Function with a Human in the Loop (RLHF)
      When defining complex reinforcement functions we sometimes have no example behaviors to which IRL can be applied. In this case we will see how to create the reinforcement function from human feedback (RLHF). This mechanism is the basis of the alignment training of large language models such as ChatGPT and others.
    3. Reinforcement learning aided by world model learning.
      Reinforcement learning is slow. To reduce the number of interactions with the environment, one possibility is to learn a predictive model of the environment from those interactions and then generate simulated experiences from which the agent can learn without interacting so much with the real world. In this topic we see this approach and its limitations.
    4. Basic and Advanced Exploration in RL: Implementing Curiosity
      A core element in RL is exploration to find better policies. Basic exploration methods take random actions, which is inefficient and slows learning. There are better ways to explore new options, and this topic describes them, from estimating the uncertainty of the learned knowledge to implementing curiosity methods that improve exploration.
    5. Learning in Multiagent systems using RL
      In RL the environment is assumed to be Markovian and stationary, so that changes in it are due only to the actions of the learning agent. When the agent learns in an environment where other agents are also acting and learning, this assumption no longer holds and RL algorithms must adapt. In this topic we see the most advanced methods of reinforcement learning in multi-agent systems, with special emphasis on cooperative problems.
    6. Competition in multiagent systems using RL: AlphaGo and family
      A special case of interaction in multi-agent systems is competition and, in particular, zero-sum games. In this scenario, reinforcement learning has led to superhuman performance in some cases, most notably the game of Go. In this topic we will see the self-play and Monte Carlo Tree Search techniques that make these skills possible.
    7. RL in sparse reinforcement functions: Conditional policies and hindsight
      Often in RL the reinforcement function is sparse (uninformative). This has the advantage that the obtained policies are not biased, but it slows down learning. In this topic we study goal-conditioned policies and the hindsight technique, which have proven very effective in accelerating learning in these cases.
    8. Off-line reinforcement learning
      In some applications we have examples of behavior generated by humans or by other policies. One way to take advantage of this data is imitation learning, or applying IRL to learn from the examples. However, the resulting policy will be at most as good as the one that generated the examples. Can RL obtain better policies than the examples? Off-line RL exploits the ability of off-policy methods to obtain good policies not from data the agent generates itself but from possibly suboptimal data generated by other policies (the examples).
    9. Curricular and hierarchical learning
      In RL it is often difficult to learn complex tasks from scratch. One approach, aligned with how humans learn, is to define a curriculum or hierarchy of tasks to initially learn before attempting to learn the complex task for which the agent is not ready. In this topic you will see how to do curriculum learning and hierarchical learning in these cases.
    10. Transfer learning, Meta learning, Lifelong learning and AGI
      RL is an attractive approach to autonomous learning by intelligent agents. However, by its nature it is focused on specific tasks, whereas an intelligent agent must solve many different tasks. This topic considers the interaction between the different tasks to be learned: transferring knowledge from one task to another (transfer learning), learning tasks so that subsequent tasks are learned faster (meta-learning) and, finally, maintaining the knowledge learned throughout the life of the agent (lifelong learning). We will see how these techniques could empower the agent and enable true Artificial General Intelligence (AGI).
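To make topic 2 (RLHF) concrete, the core idea of learning a reward function from human preferences can be sketched with a toy Bradley-Terry model: given pairs where a human preferred one trajectory over another, fit a reward so that preferred trajectories score higher. The features, preference data, and learning rate below are invented for illustration; they are not course material.

```python
import math

# Toy RLHF reward learning: a human preference "a > b" is modeled as
# P(a > b) = sigmoid(r(a) - r(b)), with r a linear reward over trajectory
# features.  All features and preferences below are made up.

def reward(w, feats):
    return sum(wi * fi for wi, fi in zip(w, feats))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Each preference pair: (features of preferred trajectory, features of rejected one).
prefs = [
    ([1.0, 0.0], [0.0, 1.0]),   # the human prefers trajectories high in feature 0
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.8, 0.3], [0.1, 0.9]),
]

w = [0.0, 0.0]
lr = 0.5
for _ in range(200):                      # gradient ascent on the log-likelihood
    for fa, fb in prefs:
        p = sigmoid(reward(w, fa) - reward(w, fb))
        for i in range(len(w)):           # d/dw log P(a > b) = (1 - p) * (fa - fb)
            w[i] += lr * (1.0 - p) * (fa[i] - fb[i])

# The learned reward now ranks preferred trajectories higher.
```

The same objective, with the reward model replaced by a neural network over prompt-response pairs, is what underlies the reward-modeling stage of LLM alignment mentioned in the topic description.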
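Topic 3's world-model idea can be sketched in the Dyna style: interleave real experience with planning updates replayed from a learned model. The corridor environment and all constants here are a hypothetical minimal example, not the algorithms studied in class.

```python
import random

# Dyna-style sketch: tabular Q-learning on a tiny 5-state corridor
# (move right to reach the goal), plus planning updates drawn from a
# learned deterministic model of the environment.

N_STATES, ACTIONS = 5, [0, 1]            # 0 = left, 1 = right
GOAL = N_STATES - 1

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0)

random.seed(0)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
model = {}                                # learned model: (s, a) -> (s2, r)
alpha, gamma, eps, planning_steps = 0.5, 0.9, 0.2, 10

for episode in range(50):
    s = 0
    while s != GOAL:
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda a_: Q[(s, a_)])
        s2, r = step(s, a)                # one step of real experience
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        model[(s, a)] = (s2, r)           # remember the observed transition
        for _ in range(planning_steps):   # planning: replay simulated experience
            ps, pa = random.choice(list(model))
            ps2, pr = model[(ps, pa)]
            Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, b)] for b in ACTIONS)
                                    - Q[(ps, pa)])
        s = s2

# The greedy policy now moves right in every non-goal state.
```

The planning loop is what reduces the number of real interactions: each real step is amplified into several value updates drawn from the learned model.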
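For topic 4, one of the simplest improvements over random exploration is a count-based novelty bonus added to the extrinsic reward. The bonus shape beta / sqrt(N(s)) and the constant beta are an illustrative assumption, not the specific methods the course covers.

```python
import math
from collections import defaultdict

# Count-based exploration sketch: augment the extrinsic reward with an
# intrinsic bonus beta / sqrt(N(s)) that shrinks as state s is visited
# more often, pushing the agent toward rarely seen states.

beta = 0.5
visits = defaultdict(int)

def intrinsic_bonus(state):
    visits[state] += 1
    return beta / math.sqrt(visits[state])

# A novel state yields a large bonus; a frequently visited one, almost none.
first = intrinsic_bonus("s0")             # first visit: 0.5 / sqrt(1) = 0.5
for _ in range(98):
    intrinsic_bonus("s0")
hundredth = intrinsic_bonus("s0")         # 100th visit: 0.5 / sqrt(100) = 0.05
```

Curiosity methods studied in the topic generalize this idea to large state spaces, where exact counts are replaced by learned estimates of novelty or prediction error.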
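Topic 6's Monte Carlo Tree Search relies on the UCT rule for selecting which child node to expand: mean value plus an exploration term. The child statistics below are invented numbers chosen only to show the rule in action.

```python
import math

# UCT selection sketch (the tree policy behind Monte Carlo Tree Search):
# pick the child maximizing  Q/n + c * sqrt(ln(N_parent) / n_child).

def uct_score(total_value, n_child, n_parent, c=1.4):
    if n_child == 0:
        return float("inf")               # unvisited children are tried first
    return total_value / n_child + c * math.sqrt(math.log(n_parent) / n_child)

# Three children of one node: (accumulated value, visit count).
children = [(60.0, 80), (10.0, 12), (0.0, 0)]
n_parent = sum(n for _, n in children)
best = max(range(len(children)),
           key=lambda i: uct_score(children[i][0], children[i][1], n_parent))
# best == 2: the unvisited child is selected before exploitation resumes.
```

In self-play systems such as AlphaGo, this selection rule is combined with learned value and policy networks, but the exploration/exploitation trade-off it encodes is the same.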
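The hindsight technique of topic 7 can be sketched as a relabeling operation on stored transitions: a failed episode is reinterpreted as if its goal had been the state actually reached. The integer states, action names, and sparse reward below are a hypothetical minimal setup.

```python
# Hindsight relabeling sketch: transitions are (state, action, goal, reward,
# next_state) tuples, and the reward is sparse (0 only when the goal is hit).

def sparse_reward(state, goal):
    return 0.0 if state == goal else -1.0

def relabel_with_hindsight(episode):
    achieved = episode[-1][4]             # the state the agent actually reached
    return [(s, a, achieved, sparse_reward(s2, achieved), s2)
            for (s, a, _, _, s2) in episode]

# An episode that aimed for goal 9 but ended in state 3: every original
# reward is -1.  After relabeling toward the achieved state 3, the final
# transition earns reward 0, so the trajectory becomes informative.
episode = [(0, "right", 9, -1.0, 1),
           (1, "right", 9, -1.0, 2),
           (2, "right", 9, -1.0, 3)]
relabeled = relabel_with_hindsight(episode)
```

Because the relabeled transitions carry non-trivial rewards, a goal-conditioned policy can learn from episodes that would otherwise contribute no learning signal at all.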

    Activities

    Activity Evaluation act


    Quick review of reinforcement learning fundamentals, theory and algorithms




    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    6h

    Reinforcement Function Design: Inverse Reinforcement Learning (IRL)


    Objectives: 1
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    6h

    Learning the Reinforcement Function with a Human in the Loop


    Objectives: 1
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    6h

    Off-line reinforcement learning


    Objectives: 4
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    6h

    RL in sparse reinforcement functions: Conditional policies and hindsight


    Objectives: 2
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    6h

    Reinforcement learning aided by world model learning


    Objectives: 2
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    6h


    Control first part of the course



    Week: 8 (Outside class hours)
    Theory
    0h
    Problems
    0h
    Laboratory
    0h
    Guided learning
    0h
    Autonomous learning
    0h

    Advanced Exploration in RL: Implementing Curiosity


    Objectives: 2
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    4h
    Guided learning
    0h
    Autonomous learning
    10h

    Learning in Multiagent systems using RL


    Objectives: 3
    Contents:
    Theory
    4h
    Problems
    0h
    Laboratory
    4h
    Guided learning
    0h
    Autonomous learning
    10h

    Competition in multiagent systems using RL: AlphaGo and family


    Objectives: 3
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    4h
    Guided learning
    0h
    Autonomous learning
    10h

    Curricular and hierarchical learning



    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    6h

    Transfer learning, Meta learning, Lifelong learning



    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    6h

    RL and AGI



    Theory
    2h
    Problems
    0h
    Laboratory
    0h
    Guided learning
    0h
    Autonomous learning
    6h

    Final control


    Objectives: 1 2 3 4
    Week: 15 (Outside class hours)
    Theory
    0h
    Problems
    0h
    Laboratory
    0h
    Guided learning
    0h
    Autonomous learning
    0h

    Teaching methodology

    The classes are divided into theory, problem and laboratory sessions.

    In the theory sessions, the subject matter is developed by interspersing the presentation of new theoretical material with examples and interaction with the students to discuss the concepts.

    In the laboratory classes, small practical exercises will be carried out with tools and specific libraries, allowing students to practice and reinforce the knowledge from the theory classes.

    Evaluation methodology

    The subject will include the following assessment acts:

    - Reports of the laboratory activities, which must be delivered by the deadline indicated for each session (roughly two weeks). A laboratory grade L is computed as a weighted average of the grades of these reports.

    - A first partial exam, taken around the middle of the course, covering the material seen up to that point. Let P1 be the grade obtained in this exam.

    - A second partial exam, on the designated day within the exam period, covering the material not included in the first partial. Let P2 be the grade obtained in this exam.

    The three grades L, P1, P2 are between 0 and 10.

    The final grade of the subject will be: 0.4*L + 0.3*P1 + 0.3*P2
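As a quick check of the formula, a minimal sketch with invented example grades:

```python
# The grading formula above as a function; the grades used are made up.

def final_grade(L, P1, P2):
    return 0.4 * L + 0.3 * P1 + 0.3 * P2

grade = final_grade(8.0, 6.0, 7.0)        # 3.2 + 1.8 + 2.1 = 7.1
```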


    Previous capacities

    Basic knowledge of Deep Learning and Reinforcement Learning (having completed APRNS)