Credits
6
Types
Compulsory
Requirements
This subject has no requirements, but it does assume previous capacities.
Department
TSC;CS
Teachers
Person in charge
- Jose Adrian Rodriguez Fonollosa ( jose.fonollosa@upc.edu )
Others
- Carlos Escolano Peinado ( carlos.escolano@upc.edu )
Weekly hours
Theory
3
Problems
0
Laboratory
1
Guided learning
0
Autonomous learning
6
Competences
Technical competencies
Transversals
Basic
Generic
Objectives
Know the most important deep learning technologies of interest in the processing of oral and written language.
Related competences: CE5, CE6, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5
The student must know the most important applications of speech and language technology.
Related competences: CE5, CE6, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5
The student must be able to select the most appropriate speech and language technology for a particular task or application.
Related competences: CE5, CE6, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5
Develop innovative applications that use speech technology appropriately.
Related competences: CE5, CE6, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5
The student must be able to identify the fundamental parameters of the voice in the time and frequency domains.
Related competences: CE5, CT5, CT6, CT7, CG1, CB4, CB5
The student must know the most important mathematical and machine learning tools for voice analysis, such as vector quantization (VQ), Gaussian mixture models (GMM), and hidden Markov models (HMM).
Related competences: CE5, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5, -
The student must know the techniques for statistical language modeling.
Related competences: CE6, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5
Contents
Introduction to language and speech technologies and applications
Applications of oral and written language processing. Social impact.
Main blocks of a natural language processing system: speech recognition, natural language processing, text-to-speech conversion.
Language as a sequence of words. Vector representation of words. One-hot encoding versus continuous-space representations.
Word2vec: continuous bag-of-words (CBOW) versus continuous skip-gram. GloVe vectors. Structures and analogies in word vector representations.
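The one-hot versus continuous-space contrast can be illustrated with a toy sketch (the dense vectors below are hand-picked for illustration, not trained with word2vec or GloVe):

```python
import numpy as np

# One-hot encoding: each word is a sparse, axis-aligned vector.
vocab = ["cat", "dog", "car"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

# One-hot vectors carry no similarity information: every pair is orthogonal.
print(one_hot["cat"] @ one_hot["dog"])  # 0.0

# Continuous (dense) embeddings place related words close together.
embedding = {
    "cat": np.array([0.9, 0.1]),
    "dog": np.array([0.8, 0.2]),
    "car": np.array([0.1, 0.9]),
}

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embedding["cat"], embedding["dog"]))  # high (~0.99)
print(cosine(embedding["cat"], embedding["car"]))  # low (~0.22)
```

Trained embeddings behave the same way, but the coordinates are learned from co-occurrence statistics rather than chosen by hand.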
Language Modeling
Statistical modeling based on N-grams.
Modeling with neural networks: recurrent networks, convolutional networks. Attention mechanisms: the Transformer.
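The N-gram approach can be sketched as a maximum-likelihood bigram model over a toy corpus (corpus and words are illustrative only):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()

# Count unigrams and bigrams over the toy corpus.
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    """Maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("the", "cat"))  # 2 occurrences of "the cat" / 3 of "the"
print(bigram_prob("cat", "sat"))  # 1 of "cat sat" / 2 of "cat" = 0.5
```

Real N-gram models add smoothing (e.g. Kneser-Ney) so that unseen bigrams do not get zero probability.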
Contextual language representations
General purpose language representations.
Unsupervised training. Unidirectional and bidirectional systems.
Main architectures: ULMfit, OpenAI GPT, ELMo, BERT, XLM. Applications.
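Several of these architectures are built on the Transformer's scaled dot-product attention. A minimal NumPy sketch of a single attention layer (dimensions and random inputs chosen arbitrarily for illustration):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, dimension 8
K = rng.normal(size=(6, 8))   # 6 key/value positions
V = rng.normal(size=(6, 8))
out = attention(Q, K, V)
print(out.shape)  # (4, 8): one contextualized vector per query position
```

Each output row is a convex combination of the value vectors, weighted by query-key similarity; full Transformers stack many such layers with multiple heads.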
Neural Machine Translation
Introduction to Machine Translation. Automatic quality evaluation: BLEU.
Neural Machine Translation.
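BLEU can be sketched at the sentence level as clipped n-gram precision combined with a brevity penalty (a simplification: standard BLEU uses up to 4-grams, multiple references, and corpus-level statistics):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Toy sentence-level BLEU: geometric mean of clipped n-gram
    precisions up to max_n, times a brevity penalty (single reference)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(1, sum(cand.values())))
    if min(precisions) == 0:
        return 0.0
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat is on the mat".split()
print(bleu("the cat is on the mat".split(), ref))  # 1.0 for a perfect match
print(bleu("the cat sat on a mat".split(), ref))   # lower score
```

Clipping prevents a candidate from gaming precision by repeating a common word, and the brevity penalty punishes overly short translations.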
Introduction to automatic speech recognition
Pattern matching. Dynamic time warping.
Hidden Markov models. Isolated word recognition.
Large vocabulary continuous ASR: acoustic modeling, language modeling, search.
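Pattern matching with dynamic time warping can be sketched on 1-d feature sequences (toy numbers; real ASR templates would be sequences of spectral feature vectors such as MFCCs):

```python
def dtw(a, b):
    """Minimum cumulative alignment cost between sequences a and b,
    allowing insertions, deletions, and matches (classic DTW)."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])          # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[len(a)][len(b)]

# A time-stretched version of the same contour aligns with zero cost,
# which is why DTW tolerates variations in speaking rate.
print(dtw([1, 2, 3, 2], [1, 1, 2, 2, 3, 2]))  # 0.0
print(dtw([1, 2, 3, 2], [3, 3, 1, 0]))        # larger cost
```

Isolated word recognition by template matching picks the stored template whose DTW cost to the input utterance is smallest.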
Speech synthesis
Linguistic processing.
Prosody modeling.
Waveform generation.
Concatenation methods.
Activities
Activity / Evaluation act
Theory
6h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
10h
Theory
6h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
10h
Theory
9h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
16h
Topic development: Neural Machine Translation
Theory
6h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
10h
Teaching methodology
Lectures presenting new theoretical material and practical examples. Theoretical and practical assignments grouped by subject.
Research project, presented in written and oral form by the students.
Evaluation methodology
Course evaluation is based on the following aspects:
- Two exams, a midterm and a final, to assess the knowledge acquired by the student on the topics covered in theory (60%) and in practice sessions (15%).
- Lab Evaluation (25%): graded via written reports and oral defenses of the assignments.
Bibliography
Basic
Spoken language processing: a guide to theory, algorithm and system development
- Huang, X.; Acero, A.; Hon, H.-W.
Prentice Hall,
2001.
ISBN: 0130226165
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002590969706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Deep learning
- Goodfellow, I.; Bengio, Y.; Courville, A.
The MIT Press,
2016.
ISBN: 9780262035613
https://www.deeplearningbook.org/
Deep learning for NLP and speech recognition
- Kamath, U.; Liu, J.; Whitaker, J.
Springer,
2019.
ISBN: 9783030145958
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004193579706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Complementary
Text-to-speech synthesis
- Taylor, P.
Cambridge University Press,
2009.
ISBN: 9780521899277
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003624489706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Previous capacities
College Calculus and Linear Algebra
Basic Probability and Statistics
Extensive programming experience in Python
Machine Learning
Introduction to Deep Learning