Advanced Natural Language Processing

Credits
6
Types
Elective
Requirements
This subject has no requirements
Department
CS
Can a machine learn to correct the grammar of a text? Can a machine learn to answer questions we ask in plain English? Can a machine learn to translate between languages, using Wikipedia as a training set?

This course offers in-depth coverage of methods for Natural Language Processing. We will present fundamental models and tools for approaching a variety of Natural Language Processing tasks, ranging from syntactic processing, to semantic processing, to final applications such as information extraction, human-machine dialogue systems, and machine translation. The course is organized along two main axes: (1) computational formalisms to describe natural language processes, and (2) statistical and machine learning methods to acquire linguistic models from large data collections.

Weekly hours

Theory
2
Problems
1
Laboratory
0
Guided learning
0.6
Autonomous learning
6.5

Competences

Generic Technical Competences

Generic

  • CG3 - Capacity for mathematical modeling, calculation and experimental design in technology and company engineering centers, particularly in research and innovation in all areas of Computer Science.

Transversal Competences

Teamwork

  • CTR3 - Capacity to work as a team member, either as a regular member or in a leadership role, in order to help develop projects in a pragmatic manner and with a sense of responsibility; capability to take into account the available resources.

Reasoning

  • CTR6 - Capacity for critical, logical and mathematical reasoning. Capability to solve problems in their area of study. Capacity for abstraction: the capability to create and use models that reflect real situations. Capability to design and implement simple experiments, and analyze and interpret their results. Capacity for analysis, synthesis and evaluation.

Basic

  • CB6 - Ability to apply the acquired knowledge and capacity for solving problems in new or unknown environments within broader (or multidisciplinary) contexts related to their area of study.
  • CB8 - Capability to communicate their conclusions, and the knowledge and rationale underpinning these, to both skilled and unskilled public in a clear and unambiguous way.
  • CB9 - Possession of the learning skills that enable the students to continue studying in a way that will be mainly self-directed or autonomous.

Technical Competences of each Specialization

Specific

  • CEC1 - Ability to apply scientific methodologies in the study and analysis of phenomena and systems in any field of Information Technology as well as in the conception, design and implementation of innovative and original computing solutions.
  • CEC2 - Capacity for mathematical modelling, calculation and experimental design in engineering technology centres and business, particularly in research and innovation in all areas of Computer Science.

Objectives

  1. Understand fundamental methods of Natural Language Processing from a computational perspective
    Related competences: CG3, CB6, CB9, CEC1, CEC2, CTR6
  2. Understand statistical and machine learning techniques applied to NLP
    Related competences: CG3, CB6, CB9, CEC1, CEC2, CTR6
  3. Develop the ability to solve statistical and algorithmic problems in NLP
    Related competences: CG3, CB6, CB8, CB9, CEC1, CEC2, CTR6
  4. Learn to apply statistical methods for NLP in a practical application
    Related competences: CG3, CB6, CB8, CB9, CEC1, CEC2, CTR3, CTR6

Contents

  1. Course Introduction
    Fundamental tasks in NLP. Main challenges in NLP. Review of statistical paradigms. Review of language modeling techniques.
  2. Classification in NLP
    Review of supervised machine learning methods. Linear classifiers. Generative and discriminative learning. Feature representations in NLP. The EM algorithm.
  3. Sequence Models
    Hidden Markov Models. Log-linear models and Conditional Random Fields. Applications to part-of-speech tagging and named-entity extraction.
  4. Syntax and Parsing
    Probabilistic Context Free Grammars. Dependency Grammars. Parsing Algorithms. Discriminative Learning for Parsing.
  5. Machine Translation
    Introduction to Statistical Machine Translation. The IBM models. Phrase-based methods. Syntax-based approaches to translation.
  6. Unsupervised and Semisupervised methods in NLP
    Bootstrapping. Cotraining. Distributional methods.
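
As an illustration of the kind of method covered under Sequence Models, the following is a minimal sketch of Viterbi decoding for a Hidden Markov Model part-of-speech tagger. It is not course material: the function name, the three-tag set and all probabilities are made-up assumptions chosen only to keep the example small.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""

    def log(p):
        # Log space avoids underflow; zero probability maps to -inf.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r].get(s, 0.0))) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy model (hypothetical numbers, for illustration only).
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.1, "VERB": 0.8},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 1.0},
    "NOUN": {"dog": 0.6, "barks": 0.4},
    "VERB": {"dog": 0.3, "barks": 0.7},
}

tags = viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p)
print(tags)
```

With these toy parameters the decoder tags "the dog barks" as DET NOUN VERB. The dynamic program runs in time proportional to the sentence length times the square of the number of tags.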

Activities

Activity Evaluation act


Course Introduction


Objectives: 1 2
Contents:
Theory
2h
Problems
1h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Classification in NLP


Objectives: 1 2
Contents:
Theory
5h
Problems
3h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Problem Set 1


Objectives: 1 2 3
Week: 4
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
1.7h
Autonomous learning
10h

Sequence Models in NLP


Objectives: 1 2
Contents:
Theory
6h
Problems
3h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Problem Set 2


Objectives: 1 2 3
Week: 7
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
1.7h
Autonomous learning
10h

Syntax and Parsing


Objectives: 1 2
Contents:
Theory
6h
Problems
3h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Problem Set 3


Objectives: 1 2 3
Week: 10
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
1.7h
Autonomous learning
10h

Statistical Machine Translation

We will present the basic elements of statistical machine translation systems, including representation aspects, algorithmic aspects, and methods for parameter estimation.
Objectives: 1 2
Contents:
Theory
4h
Problems
2h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Unsupervised Methods in NLP

We will review several methods for unsupervised learning in NLP, in the context of lexical models, sequence models, and grammatical models. We will focus on bootstrapping and cotraining methods, the EM algorithm, and distributional methods.

Theory
4h
Problems
3h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Problem Set 4


Objectives: 1 2 3
Week: 13
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
1.7h
Autonomous learning
10h

Final Exam


Objectives: 1 2 3
Week: 15
Theory
3h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
12.5h

Practical Project


Objectives: 1 2 4
Week: 16
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
2.2h
Autonomous learning
45h

Teaching methodology

The course will be structured around five main blocks of lectures. In each theory lecture, we will present fundamental algorithmic and statistical techniques for NLP. This will be followed by problem lectures, where we will look in detail at the derivations of algorithms and the mathematical proofs needed to understand statistical methods in NLP.

Furthermore, there will be four problem sets that students must solve at home. Each problem set will consist of three or four problems that require the student to understand the elements behind statistical NLP methods. In some cases these problems will involve writing small programs to analyze data and perform some computation.

Finally, students will develop a practical project in teams of two or three. The goal of the project is to put into practice the methods learned in class, and to learn the experimental methodology used in the NLP field. Students must identify existing components (i.e. data and tools) that can be used to build a system, and run experiments to analyze some statistical NLP method empirically.

Evaluation methodology

Final grade = 0.6 final exam + 0.4 project

where

final exam is the grade of the final exam

project is the grade of the project
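
As a worked example of the weighting above, here is a minimal sketch that computes the final grade from the two components. The function name and the sample grades are hypothetical; only the 0.6 and 0.4 weights come from the formula.

```python
def final_grade(final_exam, project):
    """Weighted average: 60% final exam, 40% project."""
    return 0.6 * final_exam + 0.4 * project


# Hypothetical grades, for illustration only.
grade = final_grade(final_exam=7.0, project=8.5)
print(f"{grade:.2f}")
```

For these sample values the result is 0.6 × 7.0 + 0.4 × 8.5 = 4.2 + 3.4 = 7.6.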

Bibliography

Basic: