The main objective of the subject is to introduce the most common approaches for dialog and speech processing. During the course, we will cover its main methods, from rule-based systems to deep-learning-based ones that learn from corpora of millions of examples. By the end of the subject, students will understand how phone assistants, virtual assistants (e.g., Alexa or Siri), or chatbots such as ChatGPT work.
Teachers
Person in charge
Carlos Escolano Peinado
Others
Jordi Luque Serrano
Weekly hours
Theory: 2
Problems: 0
Laboratory: 2
Guided learning: 0
Autonomous learning: 6
Competences
Transversal Competences
Transversals
CT1 - Entrepreneurship and innovation. Know and understand the organization of a company and the sciences that govern its activity; Have the ability to understand labor standards and the relationships between planning, industrial and commercial strategies, quality and profit.
CT2 - Sustainability and Social Commitment. To know and understand the complexity of economic and social phenomena typical of the welfare society; Be able to relate well-being to globalization and sustainability; Achieve skills to use in a balanced and compatible way the technique, the technology, the economy and the sustainability.
CT6 [Assessable] - Autonomous Learning. Detect deficiencies in one's own knowledge and overcome them through critical reflection and the choice of the best action to extend this knowledge.
CT8 - Gender perspective. An awareness and understanding of sexual and gender inequalities in society in relation to the field of the degree, and the incorporation of different needs and preferences due to sex and gender when designing solutions and solving problems.
Basic
CB2 - That the students know how to apply their knowledge to their work or vocation in a professional way and possess the skills that are usually demonstrated through the elaboration and defense of arguments and problem solving within their area of study.
CB3 - That students have the ability to gather and interpret relevant data (usually within their area of study) to make judgments that include a reflection on relevant social, scientific or ethical issues.
CB4 - That the students can transmit information, ideas, problems and solutions to a specialized and non-specialized public.
CB5 - That the students have developed those learning skills necessary to undertake later studies with a high degree of autonomy.
Technical Competences
Specific
CE14 - To master the foundations, paradigms and techniques of intelligent systems and to analyze, design and build computer systems, services and applications that use these techniques in any field of application, including robotics.
CE15 - To acquire, formalize and represent human knowledge in a computable form for solving problems through a computer system in any field of application, particularly those related to aspects of computing, perception and performance in intelligent environments.
CE16 - To design and evaluate human-machine interfaces that guarantee the accessibility and usability of computer systems, services and applications.
CE17 - To develop and evaluate interactive systems and presentation of complex information and its application to solving human-computer and human-robot interaction design problems.
CE18 - To acquire and develop computational learning techniques and to design and implement applications and systems that use them, including those dedicated to the automatic extraction of information and knowledge from large volumes of data.
CE27 - To design and apply speech processing techniques, speech recognition and human language comprehension, with application in social artificial intelligence.
Generic Technical Competences
Generic
CG3 - To define, evaluate and select hardware and software platforms for the development and execution of computer systems, services and applications in the field of artificial intelligence.
CG4 - Reasoning, analyzing reality and designing algorithms and formulations that model it. To identify problems and construct valid algorithmic or mathematical solutions, eventually new, integrating the necessary multidisciplinary knowledge, evaluating different alternatives with a critical spirit, justifying the decisions taken, interpreting and synthesizing the results in the context of the application domain and establishing methodological generalizations based on specific applications.
CG5 - Work in multidisciplinary teams and projects related to artificial intelligence and robotics, interacting fluently with engineers and professionals from other disciplines.
CG6 - To identify opportunities for innovative applications of artificial intelligence and robotics in constantly evolving technological environments.
CG7 - To interpret and apply current legislation, as well as specifications, regulations and standards in the field of artificial intelligence.
CG8 - Perform an ethical exercise of the profession in all its facets, applying ethical criteria in the design of systems, algorithms, experiments, use of data, in accordance with the ethical systems recommended by national and international organizations, with special emphasis on security, robustness, privacy, transparency, traceability, prevention of bias (race, gender, religion, territory, etc.) and respect for human rights.
CG9 - To face new challenges with a broad vision of the possibilities of a professional career in the field of Artificial Intelligence. Develop the activity applying quality criteria and continuous improvement, and act rigorously in professional development. Adapt to organizational or technological changes. Work in situations of lack of information and/or with time and/or resource restrictions.
Objectives
Understand the fundamental theories and techniques associated with dialog processing and generation.
Related competences: CG3, CG5, CG6, CT6, CB3, CB4, CE14, CE17
Understand the fundamental theories and techniques associated with voice and speech processing.
Related competences: CE27, CG3, CG5, CT6, CB2, CB3, CB4, CE14, CE17
Get to know the most relevant resources and applications for Dialog Processing and Generation.
Related competences: CE27, CG3, CG4, CG5, CG6, CT6, CT8, CB3, CB4, CB5, CE15
Introduction
Introduction to the subject's content and to speech and dialog processing.
Rule-based systems.
Dialog systems based on human-crafted rules.
Corpus-based dialog systems: Frame-based and retrieval systems
Statistical dialog systems based on an example corpus.
Deep Learning based dialog systems
Introduction to seq2seq, Transformer, and their application to dialog tasks.
Ethical considerations and dialog policy.
Possible risks of dialog systems and techniques to mitigate them.
Speech processing.
Techniques to transform speech and use it in our systems.
Automatic speech recognition
Deep learning methods for automatic speech recognition.
Text-to-Speech systems.
Generative text-to-speech systems based on Deep Learning.
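As an illustration of the rule-based systems listed in the contents above, the following is a minimal sketch of a dialog system driven by human-crafted rules. The rules, the responses, and the names `RULES`, `FALLBACK`, and `respond` are hypothetical and chosen only for this example; they are not taken from the course material.

```python
import re

# Hypothetical hand-written rules: (pattern, response template).
# The first rule whose pattern matches the user utterance is applied.
RULES = [
    (re.compile(r"\b(hello|hi|hey)\b", re.IGNORECASE), "Hello! How can I help you?"),
    (re.compile(r"\bmy name is (\w+)", re.IGNORECASE), "Nice to meet you, {0}."),
    (re.compile(r"\b(bye|goodbye)\b", re.IGNORECASE), "Goodbye!"),
]

# Returned when no rule matches (an assumption of this sketch).
FALLBACK = "Sorry, I did not understand that."


def respond(utterance):
    """Return the response of the first matching rule, or the fallback."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            # Captured groups fill the numbered placeholders in the template;
            # templates without placeholders ignore them.
            return template.format(*match.groups())
    return FALLBACK


if __name__ == "__main__":
    for text in ["hi there", "My name is Ada", "what is the weather like"]:
        print(f"user: {text}")
        print(f"bot:  {respond(text)}")
```

Every behavior of such a system has to be written by hand, which is the limitation that the corpus-based and deep-learning-based approaches in the later contents try to address.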
Activities
Activity / Evaluation act
Introductory Session
Introduction to the concepts of dialog and speech processing. We will also revisit some basic natural language processing concepts that are required to understand the subject (tokenization and embeddings).
Theory: Explain the objectives and evaluation of the subject, and review some basic natural language processing concepts.
Laboratory: Present the practical exercises to do during the subject.
Corpus-based dialog systems: retrieval and frame-based systems.
In this activity, we will explain systems based on a corpus of examples and their main differences from rule-based systems. Among these new systems, we will focus on systems that retrieve examples from an example base (retrieval) and generative systems built from frames (frame-based).
Theory: In this activity we will explain corpus-based systems and their main differences from rule-based systems. Within this new approach, we will focus on systems that retrieve responses from an example corpus and on generative frame-based systems.
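To make the retrieval idea concrete, here is a minimal sketch of a retrieval-based system: it returns the response paired with the stored utterance most similar to the user query. The toy corpus, the bag-of-words cosine similarity, the threshold value, and all names (`CORPUS`, `retrieve`, and so on) are assumptions made for illustration; real systems typically use learned embeddings and much larger corpora.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus of (user utterance, system response) pairs.
CORPUS = [
    ("what time do you open", "We open at 9 am."),
    ("where are you located", "We are on Main Street."),
    ("how can i reset my password", "Use the 'Forgot password' link."),
]

FALLBACK = "Sorry, I do not have an answer for that."


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, corpus=CORPUS, threshold=0.3):
    """Return the response of the most similar stored utterance.

    Falls back to a default answer when the best similarity is below
    the (arbitrarily chosen) threshold.
    """
    query_vec = Counter(tokenize(query))
    best_score, best_response = 0.0, FALLBACK
    for utterance, response in corpus:
        score = cosine(query_vec, Counter(tokenize(utterance)))
        if score > best_score:
            best_score, best_response = score, response
    return best_response if best_score >= threshold else FALLBACK


if __name__ == "__main__":
    print(retrieve("when do you open"))
    print(retrieve("tell me a joke"))
```

Unlike a rule-based system, the behavior here comes from the examples in the corpus: adding a new pair extends the system without writing any new rule.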
The course deepens the concepts of Human Language Processing, extending them to dialogue tasks. In addition, it introduces a new modality of data, speech, and how both tasks can be combined when creating our systems.
Classes are organized into theory and laboratory sessions. In the theory classes, the teacher will present the concepts, combining them with exercises and questions to make the classes more interactive and ensure that students grasp the concepts of the subject. In laboratory classes, students work independently in groups to apply the concepts they have seen in class to real data. These tasks combine laboratory sessions, where students can ask questions and resolve their doubts, with independent work to develop their systems. The students' ability to research and find new solutions to the proposed problems will be assessed. In addition, at the end of the subject, students will have to demonstrate their ability to acquire new knowledge independently by presenting a research article on one of the topics covered in the subject.
Evaluation methodology
20% Partial Exam + 25% Final Exam + 45% Laboratory + 10% Paper Presentation
The theoretical part of the subject will be evaluated based on two exams. The first (partial) exam will focus on the dialogue block (Contents 1-5). The second (final) exam will evaluate the second block, speech processing (Contents 6-8). This exam will include exercises that combine speech and dialogue to evaluate how well students have acquired the knowledge of both blocks.
Regarding the laboratory part, the three activities will have the same weight, 15% of the total of the subject. Students will have around four weeks to complete them. The objective is to evaluate how the students apply the content seen in class in practice as well as their ability to solve problems and work as a team.
Finally, at the end of the subject, the students will have to choose an article on dialogue or speech processing and present it in class. The objective of this task is to evaluate their ability to analyze new information and acquire new knowledge of the subject autonomously.
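The weighting described above can be checked with a short worked example. This is only a sketch: the weights come from the evaluation formula, but the 0-10 marking scale, the sample marks, and the function names are assumptions made for illustration.

```python
# Weights taken from the evaluation formula; they add up to 1.0.
WEIGHTS = {
    "partial_exam": 0.20,
    "final_exam": 0.25,
    "laboratory": 0.45,
    "paper_presentation": 0.10,
}


def laboratory_mark(lab_marks):
    """Average of the three equally weighted laboratory activities.

    Each activity is worth 15% of the subject, so together they make up
    the 45% laboratory component.
    """
    if len(lab_marks) != 3:
        raise ValueError("expected exactly three laboratory marks")
    return sum(lab_marks) / 3


def final_grade(marks):
    """Weighted sum of the component marks (assumed to be on a 0-10 scale)."""
    missing = set(WEIGHTS) - set(marks)
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)


# Hypothetical marks for one student.
example = {
    "partial_exam": 6.0,
    "final_exam": 7.0,
    "laboratory": laboratory_mark([8.0, 9.0, 7.0]),
    "paper_presentation": 9.0,
}

if __name__ == "__main__":
    print(f"Final grade: {final_grade(example):.2f}")
```

With these sample marks the components contribute 1.20 + 1.75 + 3.60 + 0.90, giving a final grade of 7.45.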
Assessment of skills.
The assessment of competence on autonomous use of information will be carried out with the oral presentation of the scientific article (10%). The students must be able to draw their conclusions on a new work related to the topics seen in class.
Bibliography
Basic:
Jurafsky, Dan; Martin, James H. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. The authors, 2019.
Before taking this subject, it is recommended to have taken the following subjects:
XNDL-IA: In this subject, the fundamentals of deep learning are explained, including recurrent networks. Knowing these topics is necessary to understand how models based on Seq2Seq architectures, the state of the art in both speech and dialogue processing, work.
PLH-IA: This subject explains the basics of human language processing. Concepts such as text preprocessing to reduce ambiguities or the continuous representation of text are necessary to be able to develop the systems we will study in the subject.