This course focuses on speech and language technologies, a fundamental part of artificial intelligence that aims to develop systems that analyze, understand, translate, and generate spoken or written human language. Special attention is given to new technologies based on deep learning and their applications. The final project gives students the opportunity to explore a particular topic in depth, and it also aims to strengthen their skills in application development or research.
Person in charge
Marta Ruiz Costa-Jussa
Jose Adrian Rodriguez Fonollosa
CE5 - Design and apply signal processing techniques, choosing among different technological tools, including those for artificial vision, speech recognition and multimedia data processing.
CE6 - Build or use systems for processing and understanding written language, integrating them into other data-driven systems. Design systems for searching textual or hypertextual information and for analyzing social networks.
CT5 - Sound use of information resources. Manage the acquisition, structuring, analysis and visualization of data and information in the field of specialty, and critically evaluate the results of such management.
CT6 - Autonomous Learning. Detect deficiencies in one's own knowledge and overcome them through critical reflection and the choice of the best action to extend this knowledge.
CT7 - Third language. Know a third language, preferably English, with an adequate oral and written level and in line with the needs of graduates.
CB4 - Students can communicate information, ideas, problems and solutions to both specialized and non-specialized audiences.
CB5 - Students have developed the learning skills necessary to undertake further studies with a high degree of autonomy.
Generic Technical Competences
CG1 - Design computer systems that integrate data of very diverse origins and forms, build mathematical models from them, reason on these models and act accordingly, learning from experience.
CG2 - Choose and apply the most appropriate methods and techniques to a problem defined by data that represents a challenge for its volume, speed, variety or heterogeneity, including computer, mathematical, statistical and signal processing methods.
CG4 - Identify opportunities for innovative data-driven applications in evolving technological environments.
CG5 - Draw on the fundamental knowledge and sound work methodologies acquired during the studies to adapt to the new technological scenarios of the future.
Know the most important deep learning technologies of interest in the processing of oral and written language.
The student must be able to identify the fundamental parameters of speech in the time and frequency domains.
The student must know the most important mathematical and machine learning tools for speech analysis, such as vector quantization (VQ), Gaussian mixture models (GMM) and hidden Markov models (HMM).
Introduction to language and speech technologies and applications
Applications of oral and written language processing. Social impact.
Main blocks of a natural language processing system: speech recognition, natural language processing, text-to-speech conversion.
Language as a sequence of words. Vector representation of words. One-hot encoding versus continuous-space representations.
Word2vec: Continuous bag-of-words (CBOW) versus Continuous skip-gram. GloVe vectors. Structures and analogies in word vector representations.
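As an illustration of the contrast between one-hot encoding and continuous-space representations, the following minimal Python sketch compares cosine similarities under both schemes. The three-word vocabulary and the two-dimensional dense vectors are hypothetical, hand-picked values chosen only for illustration; they are not trained Word2vec or GloVe embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vocabulary (assumption, for illustration only).
vocab = ["king", "queen", "apple"]

def one_hot(word):
    """One-hot vector: 1 at the word's index, 0 elsewhere."""
    return [1.0 if w == word else 0.0 for w in vocab]

# Hypothetical hand-picked dense vectors, not learned from data.
dense = {
    "king": [0.9, 0.8],
    "queen": [0.8, 0.9],
    "apple": [-0.7, 0.1],
}

# Distinct one-hot vectors are always orthogonal, so they carry no
# notion of similarity between words.
one_hot_sim = cosine(one_hot("king"), one_hot("queen"))

# Dense vectors can place related words close together.
dense_sim = cosine(dense["king"], dense["queen"])
dense_far = cosine(dense["king"], dense["apple"])

print(one_hot_sim, dense_sim, dense_far)
```

With one-hot vectors, every pair of different words has similarity zero, whereas continuous vectors allow graded similarity, which is what makes structures and analogies possible in learned embeddings.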
Statistical modeling based on N-grams.
Modeling with neural networks. Recurrent networks. Convolutional networks. Attention mechanisms: the Transformer.
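To make statistical modeling based on N-grams concrete, here is a minimal Python sketch of a bigram model estimated by maximum likelihood (relative frequencies, no smoothing). The toy corpus, the `<s>` and `</s>` boundary markers and the function name `bigram_prob` are assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy corpus with sentence boundary markers.
corpus = [
    ["<s>", "the", "cat", "sleeps", "</s>"],
    ["<s>", "the", "dog", "sleeps", "</s>"],
    ["<s>", "the", "cat", "eats", "</s>"],
]

bigram_counts = Counter()   # counts of (previous word, word) pairs
context_counts = Counter()  # counts of the previous word as a context

for sentence in corpus:
    for prev, word in zip(sentence, sentence[1:]):
        bigram_counts[(prev, word)] += 1
        context_counts[prev] += 1

def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]

print(bigram_prob("the", "cat"))
print(bigram_prob("cat", "dog"))
```

Unseen bigrams receive probability zero under this estimate, which is why practical N-gram models add smoothing or back-off.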
Contextual language representations
General purpose language representations.
Unsupervised training. Unidirectional and bidirectional systems.
Main architectures: ULMFiT, OpenAI GPT, ELMo, BERT, XLM. Applications.
Introduction to automatic speech recognition
Pattern matching. Dynamic time warping.
Hidden Markov models. Isolated word recognition.
Large vocabulary continuous ASR: Acoustic modeling, Language modeling, Search.
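Dynamic time warping, listed above under pattern matching, can be sketched in a few lines of Python. This is a minimal version under simplifying assumptions: the sequences are one-dimensional lists of numbers (real systems compare sequences of acoustic feature vectors), the local cost is the absolute difference, and there are no slope or window constraints. The function name `dtw_distance` is chosen for this example.

```python
def dtw_distance(a, b):
    """Accumulated cost of the best monotonic alignment between a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed predecessors: step in a only, step in b only, or both.
            cost[i][j] = local + min(
                cost[i - 1][j],
                cost[i][j - 1],
                cost[i - 1][j - 1],
            )
    return cost[n][m]

# The second sequence repeats one value, as if spoken more slowly.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the alignment may stretch or compress time, a template and a slower rendition of the same pattern can still match at zero cost, which is the property that makes the method useful for isolated word recognition.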
Topic development: Introduction to speech and language technology and applications
Introduction to speech and language technology and applications.
Word vectors Objectives:32 Contents: