Voice and Dialog Processing

Credits
6
Types
Compulsory
Requirements
This subject has no prerequisites, but it does have recommended previous capacities.
Department
CS
The main objective of the subject is to introduce the most common approaches for dialog and speech processing. During the course, we will cover its main methods, from rule-based systems to deep-learning-based ones that learn from corpora of millions of examples. By the end of the subject, students will understand how phone assistants, virtual assistants (e.g., Alexa or Siri), or chatbots such as ChatGPT work.

Teachers

Person in charge

  • Carlos Escolano Peinado

Others

  • Anna Arias Duart

Weekly hours

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6

Competences

Transversal Competences

Transversal

  • CT1 - Entrepreneurship and innovation. Know and understand the organization of a company and the sciences that govern its activity; Have the ability to understand labor standards and the relationships between planning, industrial and commercial strategies, quality and profit.
  • CT2 - Sustainability and Social Commitment. To know and understand the complexity of economic and social phenomena typical of the welfare society; Be able to relate well-being to globalization and sustainability; Achieve skills to use in a balanced and compatible way the technique, the technology, the economy and the sustainability.
  • CT6 [Assessable] - Autonomous Learning. Detect deficiencies in one's own knowledge and overcome them through critical reflection and the choice of the best action to extend this knowledge.
  • CT8 - Gender perspective. An awareness and understanding of sexual and gender inequalities in society in relation to the field of the degree, and the incorporation of different needs and preferences due to sex and gender when designing solutions and solving problems.

Basic

  • CB2 - That the students know how to apply their knowledge to their work or vocation in a professional way and possess the skills that are usually demonstrated through the elaboration and defense of arguments and problem solving within their area of study.
  • CB3 - That students have the ability to gather and interpret relevant data (usually within their area of study) to make judgments that include a reflection on relevant social, scientific or ethical issues.
  • CB4 - That the students can transmit information, ideas, problems and solutions to a specialized and non-specialized public.
  • CB5 - That the students have developed those learning skills necessary to undertake later studies with a high degree of autonomy.

Technical Competences

Specific

  • CE14 - To master the foundations, paradigms and techniques of intelligent systems and to analyze, design and build computer systems, services and applications that use these techniques in any field of application, including robotics.
  • CE15 - To acquire, formalize and represent human knowledge in a computable form for solving problems through a computer system in any field of application, particularly those related to aspects of computing, perception and performance in intelligent environments.
  • CE16 - To design and evaluate human-machine interfaces that guarantee the accessibility and usability of computer systems, services and applications.
  • CE17 - To develop and evaluate interactive systems and presentation of complex information and its application to solving human-computer and human-robot interaction design problems.
  • CE18 - To acquire and develop computational learning techniques and to design and implement applications and systems that use them, including those dedicated to the automatic extraction of information and knowledge from large volumes of data.
  • CE27 - To design and apply speech processing techniques, speech recognition and human language comprehension, with application in social artificial intelligence.

Generic Technical Competences

Generic

  • CG3 - To define, evaluate and select hardware and software platforms for the development and execution of computer systems, services and applications in the field of artificial intelligence.
  • CG4 - To reason, analyze reality and design algorithms and formulations that model it. To identify problems and construct valid algorithmic or mathematical solutions, possibly new ones, integrating the necessary multidisciplinary knowledge, evaluating different alternatives with a critical spirit, justifying the decisions taken, interpreting and synthesizing the results in the context of the application domain and establishing methodological generalizations based on specific applications.
  • CG5 - Work in multidisciplinary teams and projects related to artificial intelligence and robotics, interacting fluently with engineers and professionals from other disciplines.
  • CG6 - To identify opportunities for innovative applications of artificial intelligence and robotics in constantly evolving technological environments.
  • CG7 - To interpret and apply current legislation, as well as specifications, regulations and standards in the field of artificial intelligence.
  • CG8 - Practice the profession ethically in all its facets, applying ethical criteria in the design of systems, algorithms, experiments and use of data, in accordance with the ethical systems recommended by national and international organizations, with special emphasis on security, robustness, privacy, transparency, traceability, prevention of bias (race, gender, religion, territory, etc.) and respect for human rights.
  • CG9 - To face new challenges with a broad vision of the possibilities of a professional career in the field of Artificial Intelligence. Develop the activity applying quality criteria and continuous improvement, and act rigorously in professional development. Adapt to organizational or technological changes. Work in situations of lack of information and / or with time and / or resource restrictions.

Objectives

  1. Understand the fundamental theories and techniques associated with dialog processing and generation.
    Related competences: CG3, CG5, CG6, CT6, CB3, CB4, CE14, CE17
  2. Understand the fundamental theories and techniques associated with voice and speech processing.
    Related competences: CE27, CG3, CG5, CT6, CB2, CB3, CB4, CE14, CE17
  3. Get to know the most relevant resources and applications for Dialog Processing and Generation.
    Related competences: CE27, CG3, CG4, CG5, CG6, CT6, CT8, CB3, CB4, CB5, CE15
  4. Develop programs to solve particular tasks in the dialog and speech area.
    Related competences: CE27, CG5, CG7, CG8, CG9, CT1, CT2, CT6, CT8, CB2, CB3, CE14, CE16, CE18

Contents

  1. Introduction
    Introduction to the subject's content and to speech and dialog processing.
  2. Rule-based systems.
    Dialog systems based on human-crafted rules.
  3. Corpus-based dialog systems: Frame-based and retrieval systems
    Statistical dialog systems based on an example corpus.
  4. Deep Learning based dialog systems
    Introduction to seq2seq, Transformer, and their application to dialog tasks.
  5. Ethical considerations and dialog policy.
    Possible risks of dialog systems and techniques to mitigate them.
  6. Speech processing.
    Techniques to transform speech and use it in our systems.
  7. Automatic speech recognition
    Deep learning methods for automatic speech recognition.
  8. Text-to-Speech systems.
    Generative text-to-speech systems based on Deep Learning.

Activities

Activity Evaluation act


Introductory Session

Introduction to the concepts of dialog and speech processing. We will also revisit some basic natural language processing concepts that are required to understand the subject (tokenization and embeddings).
  • Theory: Explain the objectives and evaluation of the subject, and review some basic natural language processing concepts.
  • Laboratory: Present the practical exercises to do during the subject.
Objectives: 1 3
Contents:
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
0h

Rule-based dialog systems.

The historical context of dialog and rule-based systems.
  • Theory: Historical context and rule-based systems. We'll cover hand-crafted rules and their advantages for interpretability.
  • Laboratory:
Objectives: 1 2 4
Contents:
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Corpus-based dialog systems: retrieval and frame-based systems.

This activity explains systems based on a corpus of examples and their main differences from rule-based systems. Among these systems, we will focus on those that retrieve examples from an example base (retrieval) and on generative systems built from frames (frame-based).
  • Theory: In this activity we'll explain corpus-based systems and their main differences from rule-based systems. Within this approach, we will focus on retrieval systems that draw on an example corpus and on generative frame-based systems.
Objectives: 3 4
Contents:
Theory
4h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Deep Learning-based dialog systems.

Introduction to Seq2Seq systems, Transformer, and their application to dialog.
  • Theory: Introduction to Seq2Seq systems, Transformer, and their application to dialog.
Objectives: 1 3 4
Contents:
Theory
6h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Ethical considerations and dialog policy.

Ethical considerations when training dialog systems and methods to mitigate the dangers of this kind of system.
  • Theory: Ethical considerations when training dialog systems and methods to mitigate the dangers of this kind of system.
Objectives: 3 4
Contents:
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Speech processing.

Introduction to speech processing, especially the transformations needed to train deep learning-based systems.
  • Theory: Introduction to speech processing, especially the transformations needed to train deep learning-based systems.
Objectives: 2 3 4
Contents:
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Automatic speech recognition.

Deep Learning-based techniques for speech recognition, CTC loss, and Seq2Seq-based systems.
  • Theory: Deep Learning-based techniques for speech recognition, CTC loss, and Seq2Seq-based systems.
Objectives: 2 3 4
Contents:
Theory
4h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Text-to-speech systems.

Introduction to text-to-speech systems using deep learning.
  • Theory: Introduction to text-to-speech systems using deep learning.
Objectives: 2 3 4
Contents:
Theory
4h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Partial exam

Exam comprising the dialog part of the subject.
Objectives: 1 3 4
Week: 8 (Outside class hours)
Type: theory exam
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
20h

Final exam

Exam about the speech part of the subject.
Objectives: 1 2 3 4
Week: 15 (Outside class hours)
Type: theory exam
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
25h

Practical exercises.

Delivery of the practical exercises done during the subject.
Objectives: 1 2 3 4
Week: 14 (Outside class hours)
Type: assignment
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
45h

P1. Rule-based dialog system.

Crafting a rule-based dialog system given a task.
  • Laboratory: Crafting a rule-based dialog system given a task.
Objectives: 1 3 4
Contents:
Theory
0h
Problems
0h
Laboratory
8h
Guided learning
0h
Autonomous learning
0h

P2. Frame-based dialog system and Deep Learning.

Crafting a frame-based dialog system using deep learning techniques.
  • Laboratory: Crafting a frame-based dialog system using deep learning techniques.
Objectives: 3 4
Contents:
Theory
0h
Problems
0h
Laboratory
8h
Guided learning
0h
Autonomous learning
0h

P3. Automatic speech recognition.

Crafting an automatic speech recognition system using deep learning techniques.
  • Laboratory: Crafting an automatic speech recognition system using deep learning techniques.
Objectives: 2 3 4
Contents:
Theory
0h
Problems
0h
Laboratory
8h
Guided learning
0h
Autonomous learning
0h

Paper presentation about dialog or speech.

Oral presentation of a scientific paper about dialog or speech.
  • Laboratory: Oral presentation of a scientific paper about dialog or speech.
Objectives: 1 2 3 4
Contents:
Theory
0h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
0h

Teaching methodology

The course builds on the concepts of Human Language Processing, extending them to dialogue tasks. In addition, it introduces a new data modality, speech, and shows how both can be combined when creating our systems.

Classes are organized into theory and laboratory sessions. In the theory classes, the teacher presents the concepts, combining them with exercises and questions to make the classes more interactive and to ensure that students grasp the concepts of the subject. In the laboratory classes, students work independently in groups to apply the concepts seen in class to real data. These tasks combine laboratory sessions, where students can ask questions and resolve doubts, with independent work to develop their systems. The students' ability to research and find new solutions to the proposed problems will be assessed. In addition, at the end of the subject, students will have to demonstrate their ability to acquire new knowledge independently by presenting a research article on one of the topics covered in the subject.

Evaluation methodology

20% Partial Exam + 25% Final Exam + 45% Laboratory + 10% Paper Presentation


The theoretical part of the subject will be evaluated through two exams. The first (partial) exam will focus on the dialogue block (Contents 1-5). The second (final) exam will evaluate the second block, speech processing (Contents 6-8). This exam will also include exercises that combine speech and dialogue to evaluate how well students have acquired the knowledge of both blocks.

Regarding the laboratory part, the three activities will have the same weight, 15% of the total of the subject. Students will have around four weeks to complete them. The objective is to evaluate how the students apply the content seen in class in practice as well as their ability to solve problems and work as a team.

Finally, at the end of the subject, students will have to choose an article on dialogue or speech processing and present it in class. The objective of this task is to evaluate their ability to analyze new information and acquire new knowledge of the subject autonomously.
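
As an illustration only, the weighting stated above (20% partial exam, 25% final exam, three laboratory activities at 15% each, 10% paper presentation) can be written as a weighted average. This is a minimal sketch and not an official grade calculator: the function name `final_grade`, the component names, the 0-10 marking scale and the sample marks are all assumptions introduced for the example.

```python
# Weights taken from the evaluation formula in this guide.
# The component names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "partial_exam": 0.20,
    "final_exam": 0.25,
    "lab_p1": 0.15,
    "lab_p2": 0.15,
    "lab_p3": 0.15,
    "paper_presentation": 0.10,
}


def final_grade(marks):
    """Return the weighted average of the component marks.

    All marks are assumed to be on the same scale (for example 0-10).
    """
    missing = set(WEIGHTS) - set(marks)
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)


# Hypothetical marks on an assumed 0-10 scale.
example_marks = {
    "partial_exam": 7.0,
    "final_exam": 6.0,
    "lab_p1": 8.0,
    "lab_p2": 9.0,
    "lab_p3": 7.0,
    "paper_presentation": 8.0,
}

print(round(final_grade(example_marks), 2))  # prints 7.3
```

With these sample marks, the laboratory activities contribute 3.6 points, the two exams 2.9 points and the presentation 0.8 points.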

Assessment of skills.

The assessment of competence on autonomous use of information will be carried out with the oral presentation of the scientific article (10%). The students must be able to draw their conclusions on a new work related to the topics seen in class.

Bibliography

Basic:

Previous capacities

To be able to do this subject, it is recommended to have previously taken the following subjects:
XNDL-IA: This subject explains the fundamentals of deep learning, including recurrent networks. These topics are necessary to understand how Seq2Seq-based models, the state of the art in both voice and dialogue processing, work.
PLH-IA: This subject explains the basics of human language processing. Concepts such as text preprocessing to reduce ambiguities or the continuous representation of text are necessary to be able to develop the systems we will study in the subject.