This course goes deeper into reinforcement learning (RL) after the general introduction given in the APRNS course. It emphasizes, among other things, techniques that shorten the time needed to learn policies and techniques that make RL applicable to real problems. It also describes how RL is used in cases ranging from learning superhuman policies in games (such as Go) and learning coordination in multi-agent systems, to its application in the development of large language models (LLMs).
Teachers
Person in charge
Mario Martín Muñoz
Weekly hours
Theory: 2
Problems: 0
Laboratory: 2
Guided learning: 0
Autonomous learning: 6
Competences
Transversal Competences
Transversals
CT6 [Assessable] - Autonomous Learning. Detect deficiencies in one's own knowledge and overcome them through critical reflection and the choice of the best action to extend this knowledge.
Basic
CB5 - That students have developed the learning skills necessary to undertake further studies with a high degree of autonomy.
Technical Competences
Specific
CE18 - To acquire and develop computational learning techniques and to design and implement applications and systems that use them, including those dedicated to the automatic extraction of information and knowledge from large volumes of data.
CE19 - To use current computer systems, including high-performance systems, for the processing of large volumes of data, based on knowledge of their structure, operation and particularities.
CE22 - To represent, design and analyze dynamic systems. To acquire concepts such as observability, stability and controllability.
Generic Technical Competences
Generic
CG2 - To use the fundamental knowledge and solid work methodologies acquired during the studies to adapt to the new technological scenarios of the future.
CG4 - To reason, analyze reality and design algorithms and formulations that model it. To identify problems and construct valid algorithmic or mathematical solutions, possibly new ones, integrating the necessary multidisciplinary knowledge, evaluating different alternatives with a critical spirit, justifying the decisions taken, interpreting and synthesizing the results in the context of the application domain, and establishing methodological generalizations based on specific applications.
CG9 - To face new challenges with a broad vision of the possibilities of a professional career in the field of Artificial Intelligence. To carry out this activity applying quality criteria and continuous improvement, and to act rigorously in professional development. To adapt to organizational or technological changes. To work in situations with a lack of information and/or with time and/or resource constraints.
Objectives
Learn the problems that arise when designing reinforcement functions and how to solve them.
Related competences: CG4, CB5, CE18
Learn techniques to accelerate reinforcement learning so that it is viable in real-world applications.
Related competences: CG2, CG9, CT6, CB5
Understand the problem of simultaneous learning in multi-agent systems and the techniques that make such learning possible.
Related competences: CG4, CE22
Learn how to incorporate learning from examples in order to obtain policies better than the one that generated the examples, and to recover the hidden reinforcement function behind those examples.
Related competences: CG2, CG4, CE19
Contents
Reinforcement Function Design: Inverse Reinforcement Learning (IRL)
The reinforcement function is crucial in RL but is not always easy to define. This topic shows how to derive a reinforcement function from example behavior.
Learning the Reinforcement Function with a Human in the Loop (RLHF)
When defining complex reinforcement functions, we sometimes have no examples of behavior to which IRL can be applied. In this case we will see how to build the reinforcement function from human feedback (RLHF). This mechanism is the basis of the alignment training of large language models such as ChatGPT and others.
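As an illustration only (not part of the course materials), the sketch below shows the core idea behind learning a reward model from human preferences: under a Bradley-Terry model, the probability that a human prefers segment A over segment B is the sigmoid of the difference of their predicted rewards, and the reward model is fitted to maximize the likelihood of the recorded preferences. All names and the synthetic data are assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)
n_features, n_pairs = 4, 200

# Hypothetical synthetic data: phi_a[i] and phi_b[i] are summed features of two
# trajectory segments shown to a human; prefer_a[i] = 1 if the human preferred A.
true_w = np.array([1.0, -0.5, 0.3, 0.0])
phi_a = rng.normal(size=(n_pairs, n_features))
phi_b = rng.normal(size=(n_pairs, n_features))
p_true = 1 / (1 + np.exp(-(phi_a - phi_b) @ true_w))
prefer_a = (rng.random(n_pairs) < p_true).astype(float)

# Fit a linear reward model r(s) = w . phi(s) by gradient ascent on the
# Bradley-Terry log-likelihood: P(A preferred over B) = sigmoid(r(A) - r(B)).
w = np.zeros(n_features)
for _ in range(500):
    p_a = 1 / (1 + np.exp(-(phi_a - phi_b) @ w))
    w += 0.1 * (phi_a - phi_b).T @ (prefer_a - p_a) / n_pairs

print("learned reward weights:", np.round(w, 2))

In RLHF for language models the reward model is a neural network over text rather than a linear model, but the preference-fitting principle is the same.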
Reinforcement learning aided by world model learning
Reinforcement learning is slow. To reduce the number of interactions with the environment, one possibility is to learn a predictive model of the environment from those interactions and use it to generate simulated experiences, so that the agent can learn without interacting so much with the real world. In this topic we see this approach and its limitations.
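As a hedged illustration of this model-based idea (a Dyna-style sketch on an invented toy chain environment, not course code): each real transition updates the value estimates directly and also trains a simple model, and additional "planning" updates replay transitions simulated from that model, reducing the number of real interactions needed.

import random
from collections import defaultdict

GOAL, N_ACTIONS = 9, 2
Q = defaultdict(float)
model = {}                     # learned model of the environment: (s, a) -> (r, s')
alpha, gamma, eps, n_planning = 0.5, 0.95, 0.1, 20

def step(s, a):
    """Hypothetical toy chain: action 1 moves right, action 0 moves left."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def greedy(s):
    best = max(Q[(s, x)] for x in range(N_ACTIONS))
    return random.choice([x for x in range(N_ACTIONS) if Q[(s, x)] == best])

for _ in range(30):
    s, done = 0, False
    while not done:
        a = random.randrange(N_ACTIONS) if random.random() < eps else greedy(s)
        s2, r, done = step(s, a)
        # Direct RL update from the real transition, plus update of the learned model
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in range(N_ACTIONS)) - Q[(s, a)])
        model[(s, a)] = (r, s2)
        # Planning: extra updates using transitions simulated from the learned model
        for _ in range(n_planning):
            (ps, pa), (pr, ps2) = random.choice(list(model.items()))
            Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, x)] for x in range(N_ACTIONS)) - Q[(ps, pa)])
        s = s2

print("greedy action in the start state:", greedy(0))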
Basic and Advanced Exploration in RL: Implementing Curiosity
A core element of RL is exploration to find better policies. Basic exploration methods take random actions, which is inefficient and slows down learning. There are better ways to explore new options, and this topic describes them, from estimating the uncertainty of the learned knowledge to implementing curiosity methods that improve exploration.
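One simple stand-in for curiosity, shown only as an illustration (the bonus scale is an arbitrary assumption), is a count-based exploration bonus: the agent receives an intrinsic reward that shrinks as a state becomes familiar.

from collections import defaultdict
import math

visit_counts = defaultdict(int)
beta = 0.5                     # assumed bonus scale; it would be tuned in practice

def shaped_reward(state, extrinsic_reward):
    """Add an optimistic bonus beta / sqrt(N(s)) that decays with familiarity."""
    visit_counts[state] += 1
    return extrinsic_reward + beta / math.sqrt(visit_counts[state])

# A state seen for the first time yields a larger bonus than a familiar one.
print(shaped_reward("s0", 0.0))    # first visit  -> 0.5
print(shaped_reward("s0", 0.0))    # second visit -> about 0.35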
Learning in multi-agent systems using RL
In RL it is assumed that the environment is Markovian and, therefore, that changes in the environment are caused only by the actions of the learning agent. When the agent learns in an environment where other agents are also acting and learning, this condition no longer holds and RL algorithms must be adapted. In this topic we see the most advanced reinforcement learning methods for multi-agent systems, with special emphasis on cooperative problems.
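As a minimal illustration of this setting (an invented example, not course material), the sketch below runs two independent Q-learners in a 2x2 cooperative matrix game; each agent treats the other as part of its environment, which is exactly the source of non-stationarity discussed above.

import random

# Shared payoff: coordinating on the same action pays 1, otherwise 0.
payoff = {(0, 0): 1.0, (1, 1): 1.0, (0, 1): 0.0, (1, 0): 0.0}

q1, q2 = [0.0, 0.0], [0.0, 0.0]
alpha, eps = 0.1, 0.2

for _ in range(5000):
    a1 = random.randrange(2) if random.random() < eps else q1.index(max(q1))
    a2 = random.randrange(2) if random.random() < eps else q2.index(max(q2))
    r = payoff[(a1, a2)]
    # Each update ignores the other learner, so each agent's target keeps shifting
    # as the other agent changes its behavior.
    q1[a1] += alpha * (r - q1[a1])
    q2[a2] += alpha * (r - q2[a2])

print("agent 1 Q-values:", [round(v, 2) for v in q1])
print("agent 2 Q-values:", [round(v, 2) for v in q2])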
Competition in multi-agent systems using RL: AlphaGo and family
A special case of interaction in multi-agent systems is competition and, in particular, zero-sum games. In this setting, reinforcement learning has led to superhuman abilities in some cases, most notably the game of Go. In this topic we will see the self-play and Monte Carlo Tree Search techniques that make it possible to develop these skills.
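As an illustrative sketch only, the snippet below shows the UCT selection rule commonly used inside Monte Carlo Tree Search: each child move is scored by its mean value plus a visit-count bonus, so rarely tried moves still get explored. The exploration constant and names are assumptions for the example.

import math

def uct_score(child_value_sum, child_visits, parent_visits, c=1.4):
    """Balance exploitation (mean value) against exploration (visit-count bonus)."""
    if child_visits == 0:
        return float("inf")    # always try an unvisited move first
    return child_value_sum / child_visits + c * math.sqrt(math.log(parent_visits) / child_visits)

# A rarely tried move can outrank a well-explored, slightly better one.
print(uct_score(child_value_sum=3.0, child_visits=10, parent_visits=100))  # ~1.25
print(uct_score(child_value_sum=0.5, child_visits=1, parent_visits=100))   # ~3.50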
RL with sparse reinforcement functions: Conditional policies and hindsight
In RL the reinforcement function is often sparse (uninformative). This has the advantage that the obtained policies are not biased, but it slows down learning. In this topic we study goal-conditioned policies and the hindsight technique, which have been shown to be very effective in accelerating learning in these cases.
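A minimal sketch of the hindsight idea (in the spirit of Hindsight Experience Replay; names and the toy episode are invented for illustration): a failed episode is stored again as if the goal had been the state actually reached, turning a sparse failure into useful positive experience for a goal-conditioned policy.

def relabel_with_hindsight(episode, reached_state):
    """episode: list of (state, action, reward, next_state, goal) tuples."""
    relabeled = []
    for state, action, _, next_state, _ in episode:
        # Recompute the sparse reward under the substituted goal.
        reward = 1.0 if next_state == reached_state else 0.0
        relabeled.append((state, action, reward, next_state, reached_state))
    return relabeled

# The agent aimed for goal "G" but ended in "C"; the same trajectory is
# replayed as a success for the goal "C".
episode = [("A", "right", 0.0, "B", "G"), ("B", "right", 0.0, "C", "G")]
print(relabel_with_hindsight(episode, reached_state="C"))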
Off-line reinforcement learning
In some applications we have examples of behavior generated by humans or by other policies. One possibility to take advantage of these data is to do imitation learning, or to apply IRL and learn from the examples. However, the resulting policy will be at most as good as the one that generated the examples. Can we use RL to obtain better policies than the ones that generated the examples? Off-line RL exploits the ability of off-policy methods to obtain good policies not from data generated by the policy itself, but from possibly suboptimal data generated by other policies (the examples).
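As a small illustration (not course code), the sketch below runs tabular Q-learning over a fixed, hypothetical dataset of logged transitions, with no new interaction with the environment; this off-policy ingredient is what off-line RL builds on, although practical off-line algorithms must also handle actions that never appear in the data.

from collections import defaultdict

# Hypothetical logged transitions: (state, action, reward, next_state, done)
dataset = [
    ("s0", "a0", 0.0, "s1", False),
    ("s1", "a1", 1.0, "s2", True),
    ("s0", "a1", 0.0, "s0", False),
]

Q = defaultdict(float)
actions = ["a0", "a1"]
alpha, gamma = 0.1, 0.9

for _ in range(200):                      # sweep the fixed dataset repeatedly
    for s, a, r, s2, done in dataset:
        target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

print({k: round(v, 2) for k, v in Q.items()})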
Curricular and hierarchical learning
In RL it is often difficult to learn complex tasks from scratch. One approach, aligned with how humans learn, is to define a curriculum or hierarchy of simpler tasks to learn first, before attempting the complex task for which the agent is not yet ready. In this topic we will see how to do curriculum learning and hierarchical learning in these cases.
Transfer learning, Meta learning, Lifelong learning and AGI
RL is an interesting approach to autonomous learning by intelligent agents. However, by its nature it focuses on specific tasks, while an intelligent agent must be able to solve many different tasks. This topic considers the interactions between the different tasks to be learned: the transfer of knowledge from one task to another (transfer learning), learning tasks in a way that improves learning on subsequent tasks (meta-learning) and, finally, the maintenance of the knowledge learned over the agent's lifetime (lifelong learning). We will see how all these techniques could empower the agent and enable true Artificial General Intelligence (AGI).
Activities
Quick review of reinforcement learning fundamentals, theory and algorithms
Theory: 2h
Problems: 0h
Laboratory: 2h
Guided learning: 0h
Autonomous learning: 6h
Reinforcement Function Design: Inverse Reinforcement Learning (IRL)
Teaching methodology
The classes are divided into theory, problem and laboratory sessions.
In the theory sessions, the course content will be developed, interspersing the presentation of new theoretical material and examples with interaction with the students to discuss the concepts.
In the laboratory classes, small practical exercises will be carried out using tools and specific libraries, allowing students to practice and reinforce the knowledge from the theory classes.
Evaluation methodology
The subject will include the following assessment acts:
- Reports of the laboratory activities, which must be delivered within the deadline indicated for each session (roughly two weeks). From a weighted average of the grades of these reports, a laboratory grade L will be calculated.
- A first partial exam, taken towards the middle of the course, covering the material seen up to that point. Let P1 be the grade obtained in this exam.
- A second partial exam, on the designated day within the exam period, covering the material not included in the first partial. Let P2 be the grade obtained in this exam.
The three grades L, P1, P2 are between 0 and 10.
The final grade of the subject will be: 0.4*L + 0.3*P1 + 0.3*P2
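For example, with illustrative grades L = 8, P1 = 6 and P2 = 7, the final grade would be 0.4*8 + 0.3*6 + 0.3*7 = 3.2 + 1.8 + 2.1 = 7.1.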