Can a machine learn to correct the grammaticality of text? Can a machine learn to answer questions we make in plain English? Can a machine learn to translate languages, using Wikipedia as a training set?
This course offers an in depth coverage of methods for Natural Language Processing. We will present fundamental models and tools to approach a variety of Natural Language Processing tasks, ranging from syntactic processing, to semantic processing, to final applications such as information extraction, human-machine dialogue systems, and machine translation. The flow of the course is along two main axis: (1) computational formalisms to describe natural language processes, and (2) statistical and machine learning methods to acquire linguistic models from large data collections.
Person in charge
Lluis Padro Cirera (
Christine Raouf Saad Basta (
Marta Ruiz Costa-Jussa (
Generic Technical Competences
CG3 - Capacity for modeling, calculation, simulation, development and implementation in technology and company engineering centers, particularly in research, development and innovation in all areas related to Artificial Intelligence.
Technical Competences of each Specialization
CEA3 - Capability to understand the basic operation principles of Machine Learning main techniques, and to know how to use on the environment of an intelligent system or service.
CEA5 - Capability to understand the basic operation principles of Natural Language Processing main techniques, and to know how to use in the environment of an intelligent system or service.
CT3 - Ability to work as a member of an interdisciplinary team, as a normal member or performing direction tasks, in order to develop projects with pragmatism and sense of responsibility, making commitments taking into account the available resources.
CT6 - Capability to evaluate and analyze on a reasoned and critical way about situations, projects, proposals, reports and scientific-technical surveys. Capability to argue the reasons that explain or justify such situations, proposals, etc..
Analisis y sintesis
CT7 - Capability to analyze and solve complex technical problems.
CB6 - Ability to apply the acquired knowledge and capacity for solving problems in new or unknown environments within broader (or multidisciplinary) contexts related to their area of study.
CB8 - Capability to communicate their conclusions, and the knowledge and rationale underpinning these, to both skilled and unskilled public in a clear and unambiguous way.
CB9 - Possession of the learning skills that enable the students to continue studying in a way that will be mainly self-directed or autonomous.
Learn to apply statistical methods for NLP in a practical application
Understand statistical and machine learning techniques applied to NLP
Develop the ability to solve technical problems related to statistical and algorithmic problems in NLP
Understand fundamental methods of Natural Language Processing from a computational perspective
Statistical Models for NLP
Introduction to statistical modelling for language. Maximum Likelhood models and smooting. Maximum entropy estimation. Log-Linear models
Distances and Similarities
Distances (and similarities) between linguistic units. Textual, Semantic, and Distributional distances. Semantic spaces (WN, Wikipedia, Freebase, Dbpedia).
Prediction in word sequences: PoS tagging, NERC. Local classifiers, HMM, global predictors, Log-linear models.
Parsing constituent trees: PCFG, CKY vs Inside/outside
Parsing dependency trees: CRFs for parsing. Earley algorithm
Document representation: from BoW to NLU.
Deep Leaning approaches - Introduction
Introduction to ANN for NLP
Lexical semantics. Word Embeddings
Deep Learning approaches - Word Sequences
PoS tagging, NERC
These lectures will present sequence models, an important set of tools that is used for sequential tasks. We will present this in the framework of structured prediction (later in the course we will see that the same framework is used for parsing and translation). We will focus on machine learning aspects, as well as algorithmic aspects. We will give special emphasis to Conditional Random Fields.
Also Deep Learning models will be presented Objectives:42 Contents:
We will present statistical models for syntactic structure, and in general tree structures. The focus will be on probabilistic context-free grammars and dependency grammars, two standard formalisms. We will see relevant algorithms, as well as methods to learn grammars from data based on the structured prediction framework.
Sentence similarity, sentence classification. LSTM. BERT. Sentence embeddings Objectives:42 Contents:
The course will be structured around five main blocks of lectures. In each theory lecture, we will present fundamental algorithmic and statistical techniques for NLP. This will be followed by problem lectures, where we will look in detail to derivations of algorithms and mathematical proofs that are necessary in order to understand statistical methods in NLP.
Furthermore, there will be four problem sets that students need to solve at home. Each problem set will consist of three or four problems that will require the student to understand the elements behind statistical NLP methods. In some cases these problems will involve writing small programs to analyze data and perform some computation.
Finally, students will develop a practical project in teams of two or three students. The goal of the project is to put into practice the methods learned in class, and learn how the experimental methodology that is used in the NLP field. Students have to identify existing components (i.e. data and tools) that can be used to build a system, and perform experiments in order to perform empirical analysis of some statistical NLP method.
Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition -
Jurafsky, D.; Martin, J.H,
Prentice Hall, 2008. ISBN: 9332518416 http://cataleg.upc.edu/record=b1333746~S1*cat