This course focuses on speech and language technologies, a fundamental part of artificial intelligence that aims to develop systems to analyze, understand, translate, and generate spoken or written human language. Special attention is given to new technologies based on deep learning and their applications. The exercises give students the opportunity to explore selected topics in more depth and are also intended to strengthen their application development and research skills.
Teachers
Person in charge
Jose Adrian Rodriguez Fonollosa
Others
Carlos Escolano Peinado
Weekly hours
Theory: 3
Problems: 0
Laboratory: 1
Guided learning: 0
Autonomous learning: 6
Competences
Technical Competences
Technical competencies
CE5 - Design and apply signal processing techniques, choosing among different technological tools, including those for computer vision, speech recognition and multimedia data processing.
CE6 - Build or use systems for processing and understanding written language, integrating them into other data-driven systems. Design systems for searching textual or hypertextual information and for analyzing social networks.
Transversal Competences
Transversals
CT5 [Assessable] - Effective use of information resources. Manage the acquisition, structuring, analysis and visualization of data and information in the field of specialty and critically evaluate the results of such management.
CT6 - Autonomous Learning. Detect deficiencies in one's own knowledge and overcome them through critical reflection and the choice of the best action to extend this knowledge.
CT7 [Assessable] - Third language. Know a third language, preferably English, with an adequate oral and written level and in line with the needs of graduates.
Basic
CB4 - Students can communicate information, ideas, problems and solutions to both specialized and non-specialized audiences.
CB5 - Students have developed the learning skills necessary to undertake later studies with a high degree of autonomy.
Generic Technical Competences
Generic
CG1 - Design computer systems that integrate data from very diverse sources and in very diverse forms, build mathematical models from them, reason about these models and act accordingly, learning from experience.
CG2 - Choose and apply the most appropriate methods and techniques to a problem defined by data that represents a challenge for its volume, speed, variety or heterogeneity, including computer, mathematical, statistical and signal processing methods.
CG4 - Identify opportunities for innovative data-driven applications in evolving technological environments.
CG5 - Draw on the fundamental knowledge and sound working methodologies acquired during the studies to adapt to the new technological scenarios of the future.
Objectives
Know the most important deep learning technologies of interest in the processing of oral and written language.
Related competences: CE5, CE6, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5
The student must know the most important applications of speech and language technology.
Related competences: CE5, CE6, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5
The student must be able to select the most appropriate speech and language technology for a particular task or application.
Related competences: CE5, CE6, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5
Develop innovative applications that use speech technology appropriately.
Related competences: CE5, CE6, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5
The student must be able to identify the fundamental parameters of the voice in the time and frequency domains.
Related competences: CE5, CT5, CT6, CT7, CG1, CB4, CB5
The student must know the most important mathematical and machine learning tools for voice analysis, such as vector quantization (VQ), Gaussian mixture models (GMM) and hidden Markov models (HMM).
Related competences: CE5, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5
The student must know the techniques for statistical language modeling.
Related competences: CE6, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5
Contents
Introduction to language and speech technologies and applications
Applications of oral and written language processing. Social impact.
Main blocks of a natural language processing system: speech recognition, natural language processing, text to speech conversion.
Language as a sequence of words. Vector representation of words. One-hot encoding versus continuous-space representations.
Word2vec: Continuous bag-of-words (CBOW) versus Continuous skip-gram. GloVe vectors. Structures and analogies in word vector representations.
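As a rough illustration of the contrast between one-hot encoding and continuous-space representations listed above, the following sketch compares cosine similarities under both schemes. The three-word vocabulary and the 2-dimensional dense vectors are made up for illustration only; they are not trained Word2vec or GloVe embeddings.

```python
import math


def one_hot(index, size):
    """Return a one-hot vector with a 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vocabulary (an assumption for this example).
vocab = ["king", "queen", "apple"]
one_hot_vectors = {w: one_hot(i, len(vocab)) for i, w in enumerate(vocab)}

# Hypothetical hand-picked 2-d embeddings, NOT learned from data.
dense_vectors = {
    "king": [0.9, 0.1],
    "queen": [0.8, 0.3],
    "apple": [0.1, 0.9],
}

# Distinct one-hot vectors are always orthogonal, so every pair of
# different words looks equally unrelated.
sim_one_hot = cosine(one_hot_vectors["king"], one_hot_vectors["queen"])

# Dense vectors can place related words closer together.
sim_dense_close = cosine(dense_vectors["king"], dense_vectors["queen"])
sim_dense_far = cosine(dense_vectors["king"], dense_vectors["apple"])

print(sim_one_hot, sim_dense_close, sim_dense_far)
```

With one-hot vectors the similarity between any two different words is zero, whereas a continuous representation can encode graded similarity, which is what makes structures and analogies in the vector space possible.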
Language Modeling
Statistical modeling based on N-grams.
Modeling with neural networks. Recurrent networks. Convolutional networks. Attention mechanisms: the Transformer.
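To make the N-gram topic above concrete, here is a minimal sketch of a bigram language model estimated by maximum likelihood (relative frequency) with no smoothing. The three-sentence corpus, the `<s>` and `</s>` boundary markers, and the function names are assumptions chosen for this example.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts over whitespace-tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        # Every token except the last one acts as a context for the next token.
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return context_counts, bigram_counts


def bigram_prob(context_counts, bigram_counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen contexts."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


# Toy corpus (an assumption for this example).
corpus = ["the cat sat", "the dog sat", "the cat ran"]
context_counts, bigram_counts = train_bigram(corpus)

p_cat_given_the = bigram_prob(context_counts, bigram_counts, "the", "cat")
p_sat_given_cat = bigram_prob(context_counts, bigram_counts, "cat", "sat")
print(p_cat_given_the, p_sat_given_cat)
```

Unseen bigrams receive probability zero under this estimate, which is why practical N-gram models add smoothing or back-off, and one motivation for the neural approaches listed in this block.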
Contextual language representations
General purpose language representations.
Unsupervised training. Unidirectional and bidirectional systems.
Main architectures: ULMFiT, OpenAI GPT, ELMo, BERT, XLM. Applications.
Introduction to automatic speech recognition
Pattern matching. Dynamic time warping.
Hidden Markov models. Isolated word recognition.
Large vocabulary continuous ASR: Acoustic modeling, Language modeling, Search.
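The dynamic time warping topic in this block can be sketched as follows. This is a minimal version for one-dimensional sequences with an absolute-difference local cost and the basic step pattern (insertion, deletion, match); real recognizers compare sequences of feature vectors and often add path constraints, none of which is shown here. The function name is chosen for this example.

```python
def dtw_distance(a, b):
    """Accumulated cost of the best monotonic alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# The second sequence is a time-stretched copy of the first,
# so the warped alignment has zero cost.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
# A constant offset cannot be removed by warping.
shifted = dtw_distance([0, 0], [1, 1])
print(stretched, shifted)
```

Because the alignment may repeat frames, a template and a slower or faster utterance of the same pattern can still match closely, which is the idea behind pattern-matching recognition of isolated words.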
Speech synthesis
Linguistic processing.
Prosody modeling.
Waveform generation.
Concatenation methods.
Activities
Activity / Evaluation act
Topic development: Introduction to speech and language technology and applications
Introduction to speech and language technology and applications.
Word vectors.
Objectives: 3, 2
Lectures presenting new theoretical material and practical examples.
Theoretical and practical assignments grouped in subjects.
Research project, presented in written and oral form by the students.
Evaluation methodology
Course evaluation is based on the following aspects:
- Two exams, a midterm and a final, to assess the knowledge acquired by the student on the topics covered in the theory and practice sessions (60%).
College Calculus, Linear Algebra
Basic Probability and Statistics
Extensive programming experience in Python
Machine Learning.
Introduction to Deep Learning