Credits
6
Types
Compulsory
Requirements
This subject has no requirements, but it does assume previous capacities.
Department
TSC;CS
Teachers
Person in charge
- Jose Adrian Rodriguez Fonollosa ( jose.fonollosa@upc.edu )
Others
- Carlos Escolano Peinado ( carlos.escolano@upc.edu )
Weekly hours
Theory
3
Problems
0
Laboratory
1
Guided learning
0
Autonomous learning
6
Competences
Technical competencies
Transversals
Basic
Generic
Objectives
Know the most important deep learning technologies of interest in the processing of oral and written language.
Related competences: CE5, CE6, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5
The student must know the most important applications of speech and language technology.
Related competences: CE5, CE6, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5
The student must be able to select the most appropriate speech and language technology for a particular task or application.
Related competences: CE5, CE6, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5
Develop innovative applications that use speech technology appropriately.
Related competences: CE5, CE6, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5
The student must be able to identify the fundamental parameters of the voice in the time and frequency domains.
Related competences: CE5, CT5, CT6, CT7, CG1, CB4, CB5
The student must know the most important mathematical and machine learning tools for voice analysis, such as vector quantization (VQ), Gaussian mixture models (GMM), and hidden Markov models (HMM).
Related competences: CE5, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5, -
The student must know the techniques for statistical language modeling.
Related competences: CE6, CT5, CT6, CT7, CG1, CG2, CG4, CG5, CB4, CB5
Contents
Introduction to language and speech technologies and applications
Applications of oral and written language processing. Social impact.
Main blocks of a natural language processing system: speech recognition, natural language processing, text-to-speech conversion.
Language as a sequence of words. Vector representation of words. One-hot encoding versus continuous-space representations.
Word2vec: continuous bag-of-words (CBOW) versus continuous skip-gram. GloVe vectors. Structures and analogies in word vector representations.
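The one-hot versus continuous-space contrast can be illustrated with a toy sketch (the dense vectors below are hand-picked for illustration, not trained with word2vec or GloVe):

```python
import numpy as np

# One-hot encoding: each word is a sparse, axis-aligned vector.
vocab = ["cat", "dog", "car"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

# One-hot vectors carry no similarity information: every pair is orthogonal.
print(one_hot["cat"] @ one_hot["dog"])  # 0.0

# Continuous (dense) embeddings place related words close together.
embedding = {
    "cat": np.array([0.9, 0.1]),
    "dog": np.array([0.8, 0.2]),
    "car": np.array([0.1, 0.9]),
}

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embedding["cat"], embedding["dog"]))  # high (~0.99)
print(cosine(embedding["cat"], embedding["car"]))  # low (~0.22)
```

Trained embeddings behave the same way, but the coordinates are learned from co-occurrence statistics rather than chosen by hand.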
Language Modeling
Statistical modeling based on N-grams.
Modeling with neural networks: recurrent networks, convolutional networks. Attention mechanisms: the Transformer.
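The N-gram approach can be sketched as a maximum-likelihood bigram model over a toy corpus (corpus and words are illustrative only):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()

# Count unigrams and bigrams over the toy corpus.
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    """Maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("the", "cat"))  # 2 occurrences of "the cat" / 3 of "the"
print(bigram_prob("cat", "sat"))  # 1 of "cat sat" / 2 of "cat" = 0.5
```

Real N-gram models add smoothing (e.g. Kneser-Ney) so that unseen bigrams do not get zero probability.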
Contextual language representations
General purpose language representations.
Unsupervised training. Unidirectional and bidirectional systems.
Main architectures: ULMfit, OpenAI GPT, ELMo, BERT, XLM. Applications.
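Several of these architectures are built on the Transformer's scaled dot-product attention. A minimal NumPy sketch of a single attention layer (dimensions and random inputs chosen arbitrarily for illustration):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, dimension 8
K = rng.normal(size=(6, 8))   # 6 key/value positions
V = rng.normal(size=(6, 8))
out = attention(Q, K, V)
print(out.shape)  # (4, 8): one contextualized vector per query position
```

Each output row is a convex combination of the value vectors, weighted by query-key similarity; full Transformers stack many such layers with multiple heads.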
Neural Machine Translation
Introduction to Machine Translation. Automatic quality evaluation: BLEU.
Neural Machine Translation.
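BLEU can be sketched at the sentence level as clipped n-gram precision combined with a brevity penalty (a simplification: standard BLEU uses up to 4-grams, multiple references, and corpus-level statistics):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Toy sentence-level BLEU: geometric mean of clipped n-gram
    precisions up to max_n, times a brevity penalty (single reference)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(1, sum(cand.values())))
    if min(precisions) == 0:
        return 0.0
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat is on the mat".split()
print(bleu("the cat is on the mat".split(), ref))  # 1.0 for a perfect match
print(bleu("the cat sat on a mat".split(), ref))   # lower score
```

Clipping prevents a candidate from gaming precision by repeating a common word, and the brevity penalty punishes overly short translations.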
Introduction to automatic speech recognition
Pattern matching. Dynamic time warping.
Hidden Markov models. Isolated word recognition.
Large vocabulary continuous ASR: acoustic modeling, language modeling, search.
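Pattern matching with dynamic time warping can be sketched on 1-d feature sequences (toy numbers; real ASR templates would be sequences of spectral feature vectors such as MFCCs):

```python
def dtw(a, b):
    """Minimum cumulative alignment cost between sequences a and b,
    allowing insertions, deletions, and matches (classic DTW)."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])          # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[len(a)][len(b)]

# A time-stretched version of the same contour aligns with zero cost,
# which is why DTW tolerates variations in speaking rate.
print(dtw([1, 2, 3, 2], [1, 1, 2, 2, 3, 2]))  # 0.0
print(dtw([1, 2, 3, 2], [3, 3, 1, 0]))        # larger cost
```

Isolated word recognition by template matching picks the stored template whose DTW cost to the input utterance is smallest.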
Speech synthesis
Linguistic processing.
Prosody modeling.
Waveform generation.
Concatenation methods.
Activities
Activity / Evaluation act
Theory
6h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
10h
Theory
6h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
10h
Theory
9h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
16h
Topic development: Neural Machine Translation
Theory
6h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
10h
Teaching methodology
Lectures presenting new theoretical material and practical examples. Theoretical and practical assignments grouped by subject.
Research project, presented in written and oral form by the students.
Evaluation methodology
Course evaluation is based on the following aspects:
- Two exams, a midterm and a final, to assess the knowledge acquired by the student on the topics covered in theory (60%) and in practice sessions (15%).
- Lab Evaluation (25%): graded via written reports and oral defenses of the assignments.
Bibliography
Basic
Spoken language processing: a guide to theory, algorithm and system development
- Huang, X.; Acero, A.; Hon, H.-W.
Prentice Hall,
2001.
ISBN: 0130226165
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002590969706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Deep learning
- Goodfellow, I.; Bengio, Y.; Courville, A.
The MIT Press,
2016.
ISBN: 9780262035613
https://www.deeplearningbook.org/
Deep learning for NLP and speech recognition
- Kamath, U.; Liu, J.; Whitaker, J.
Springer,
2019.
ISBN: 9783030145958
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004193579706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Complementary
Text-to-speech synthesis
- Taylor, P.
Cambridge University Press,
2009.
ISBN: 9780521899277
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003624489706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Previous capacities
College Calculus and Linear Algebra
Basic Probability and Statistics
Extensive programming experience in Python
Machine Learning
Introduction to Deep Learning