Advanced Natural Language Processing (Processament del Llenguatge Natural Avançat)

Weekly hours
Objectives
Contents
Activities
Teaching methodology
Assessment method
Bibliography
Web links
Prior skills

Credits: 5

Type: Elective

Requirements: This course has no formal prerequisites, but it does assume prior skills.

Department: CS

Web: http://www.lsi.upc.edu/~ageno/anlp

Can a machine learn to correct grammatical errors in text? Can a machine learn to answer questions we ask in plain English? Can a machine learn to translate languages, using Wikipedia as a training set?

This course offers in-depth coverage of methods for Natural Language Processing. We will present fundamental models and tools to approach a variety of Natural Language Processing tasks, ranging from syntactic processing to semantic processing to end applications such as information extraction, human-machine dialogue systems, and machine translation. The course is organized along two main axes: (1) computational formalisms to describe natural language processes, and (2) statistical and machine learning methods to acquire linguistic models from large data collections.

Weekly hours

Theory: 2
Problems: 1
Laboratory: 0
Guided learning: 0
Autonomous learning: 5.3

Objectives

1. Learn to apply statistical methods for NLP in a practical application.
   Related competences: CEA3, CEA5, CT3, CB6, CB8, CB9
2. Understand statistical and machine learning techniques applied to NLP.
   Related competences: CEA3, CG3, CT6, CT7, CB6
3. Develop the ability to solve technical problems related to statistical and algorithmic problems in NLP.
   Related competences: CEA3, CEA5, CG3, CT7, CB6, CB8, CB9
4. Understand fundamental methods of Natural Language Processing from a computational perspective.
   Related competences: CEA5, CT7, CB6

Contents

1. Course Introduction
Fundamental tasks in NLP. Main challenges in NLP. Review of statistical paradigms. Review of language modeling techniques.
2. Classification in NLP
Review of supervised machine learning methods. Linear classifiers. Generative and discriminative learning. Feature representations in NLP. The EM algorithm.
3. Sequence Models
Hidden Markov Models. Log-linear models and Conditional Random Fields. Applications to part-of-speech tagging and named-entity extraction.
4. Syntax and Parsing
Probabilistic Context-Free Grammars. Dependency Grammars. Parsing Algorithms. Discriminative Learning for Parsing.
5. Machine Translation
Introduction to Statistical Machine Translation. The IBM models. Phrase-based methods. Syntax-based approaches to translation.
6. Unsupervised and Semi-supervised Methods in NLP
Bootstrapping. Co-training. Distributional methods.

Activities

Course Introduction

Review of the field of Natural Language Processing and its main challenges. Review of the statistical paradigm. Review of language models. Students should understand the basic questions that the techniques presented during the course address.
Objectives: 4 2
Contents:

1. Course Introduction

Theory: 2h
Problems: 1h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 0h
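The introduction reviews language models. As a concrete reference point, here is a minimal sketch of a bigram model estimated by maximum likelihood, P(w | prev) = c(prev, w) / c(prev). The function names, the `<s>`/`</s>` sentence-boundary symbols and the two-sentence toy corpus are illustrative assumptions. The sketch deliberately uses no smoothing, so any unseen bigram gets probability zero, which is exactly the sparsity problem that smoothing techniques address.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Maximum-likelihood bigram model (no smoothing): P(w | prev) = c(prev, w) / c(prev)."""
    history = Counter()  # c(prev): how often a token occurs as a history
    bigram = Counter()   # c(prev, w)
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigram[(prev, word)] += 1

    def prob(word, prev):
        # Unseen histories and unseen bigrams both get probability 0.0
        if history[prev] == 0:
            return 0.0
        return bigram[(prev, word)] / history[prev]

    return prob

def sentence_prob(prob, sent):
    """Probability of a whole sentence as a product of bigram probabilities."""
    tokens = ["<s>"] + sent + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= prob(word, prev)
    return p

lm = train_bigram_lm([["the", "cat", "sleeps"], ["the", "dog", "sleeps"]])
print(lm("cat", "the"))                             # 0.5
print(sentence_prob(lm, ["the", "cat", "sleeps"]))  # 0.5
print(sentence_prob(lm, ["the", "cat", "barks"]))   # 0.0 (unseen bigram)
```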

Classification in NLP

These lectures present machine learning algorithms used in the field of NLP. Special attention is given to the difference between generative and discriminative methods for parameter estimation. We will also present the types of features typically used by discriminative methods in NLP. We expect students to already have some background in machine learning; the goal of these lectures is to see how machine learning is applied to NLP.
Objectives: 4 2
Contents:

2. Classification in NLP

Theory: 5h
Problems: 3h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 0h
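The discriminative side of this block, and the role of hand-designed features, can be illustrated with a multiclass perceptron, one of the simplest discriminative linear classifiers. This is a hedged sketch rather than the course's reference implementation: the feature templates in the hypothetical `features` function, the three-tag set and the five training words are all invented for illustration. A generative model such as naive Bayes would instead set its parameters by counting; the perceptron only changes its weights when it makes a mistake.

```python
from collections import defaultdict

def features(word):
    """Hypothetical feature templates: lowercased word, 2-letter suffix, capitalisation."""
    return [f"w={word.lower()}", f"suf2={word[-2:]}", f"cap={word[0].isupper()}"]

def predict(weights, feats, labels):
    """Highest-scoring label; the score of y sums the weights of (feature, y) pairs."""
    # max() keeps the first label on ties, so label order acts as a tie-breaker
    return max(labels, key=lambda y: sum(weights[(f, y)] for f in feats))

def train_perceptron(data, labels, epochs=5):
    weights = defaultdict(float)
    for _ in range(epochs):
        for word, gold in data:
            feats = features(word)
            guess = predict(weights, feats, labels)
            if guess != gold:  # mistake-driven update
                for f in feats:
                    weights[(f, gold)] += 1.0
                    weights[(f, guess)] -= 1.0
    return weights

labels = ["NOUN", "VERB", "PROPN"]
data = [("Paris", "PROPN"), ("walked", "VERB"), ("table", "NOUN"),
        ("talked", "VERB"), ("London", "PROPN")]
w = train_perceptron(data, labels)
print(predict(w, features("jumped"), labels))  # VERB  (suffix "ed")
print(predict(w, features("Berlin"), labels))  # PROPN (capitalised)
```

The unseen words are classified correctly only because the suffix and capitalisation features generalise beyond the training words; with the lowercased-word feature alone, the model could not say anything about them.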

Problem Set 1

Objectives: 4 2 3
Week: 4

Theory: 0h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 6h

Sequence Models in NLP

These lectures will present sequence models, an important set of tools for sequential tasks. We will present them in the framework of structured prediction (later in the course we will see that the same framework is used for parsing and translation). We will focus on machine learning aspects as well as algorithmic aspects, with special emphasis on Conditional Random Fields.
Objectives: 4 2
Contents:

3. Sequence Models

Theory: 6h
Problems: 4h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 0h
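Decoding is one of the algorithmic aspects of sequence models: for a first-order HMM, the most probable tag sequence is found exactly by the Viterbi algorithm, sketched here in log space. The tag set, the three-word vocabulary and every probability in the toy model are invented for illustration, and the sketch assumes all probabilities are strictly positive, since `math.log(0)` raises an error. Linear-chain CRFs are decoded with the same dynamic program, using feature-based scores in place of log-probabilities.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for `obs` and its log-probability."""
    # best[t][s]: log-prob of the best path that ends in state s at position t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):  # follow backpointers right to left
        path.append(back[t][path[-1]])
    return path[::-1], best[-1][last]

states = ["DET", "NOUN", "VERB"]
start = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans = {"DET": {"DET": 0.1, "NOUN": 0.8, "VERB": 0.1},
         "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
         "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2}}
emit = {"DET": {"the": 0.8, "dog": 0.1, "barks": 0.1},
        "NOUN": {"the": 0.1, "dog": 0.6, "barks": 0.3},
        "VERB": {"the": 0.1, "dog": 0.2, "barks": 0.7}}
path, logp = viterbi(["the", "dog", "barks"], states, start, trans, emit)
print(path)                      # ['DET', 'NOUN', 'VERB']
print(round(math.exp(logp), 6))  # 0.096768
```

Working in log space turns products into sums and avoids numerical underflow on long sentences; the runtime is O(n·|S|²) for n words and |S| tags.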

Problem Set 2

Objectives: 4 2 3
Week: 7

Theory: 0h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 6h

Syntax and Parsing

We will present statistical models for syntactic structure and, more generally, for tree structures. The focus will be on two standard formalisms: probabilistic context-free grammars and dependency grammars. We will see the relevant algorithms, as well as methods to learn grammars from data based on the structured prediction framework.
Objectives: 4 2
Contents:

4. Syntax and Parsing

Theory: 6h
Problems: 3h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 0h
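For a PCFG in Chomsky normal form, the most probable parse can be found with the probabilistic (Viterbi) CKY algorithm. The sketch assumes only binary rules A → B C and lexical rules A → word (no unary or empty rules) and a non-empty sentence; the function name `cky_best`, the toy grammar and its probabilities are invented to show a prepositional-phrase attachment ambiguity being resolved by the rule probabilities.

```python
def cky_best(words, lexical, binary, start="S"):
    """Viterbi CKY for a CNF PCFG; returns (probability, tree) or (0.0, None)."""
    n = len(words)
    # chart[i][j][A] = (best probability of A over words[i:j], backpointer)
    chart = [[{} for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w and p > chart[i][i + 1].get(a, (0.0, None))[0]:
                chart[i][i + 1][a] = (p, w)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point
                for (a, b, c), p in binary.items():
                    if b in chart[i][k] and c in chart[k][j]:
                        score = p * chart[i][k][b][0] * chart[k][j][c][0]
                        if score > chart[i][j].get(a, (0.0, None))[0]:
                            chart[i][j][a] = (score, (k, b, c))

    def build(i, j, a):
        _, back = chart[i][j][a]
        if isinstance(back, str):  # lexical rule A -> word
            return (a, back)
        k, b, c = back
        return (a, build(i, k, b), build(k, j, c))

    if start not in chart[0][n]:
        return 0.0, None
    return chart[0][n][start][0], build(0, n, start)

lexical = {("NP", "she"): 0.3, ("NP", "fish"): 0.3, ("NP", "chopsticks"): 0.2,
           ("V", "eats"): 1.0, ("P", "with"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 0.7, ("VP", "VP", "PP"): 0.3,
          ("NP", "NP", "PP"): 0.2, ("PP", "P", "NP"): 1.0}
prob, tree = cky_best("she eats fish with chopsticks".split(), lexical, binary)
print(round(prob, 5))  # 0.00378
print(tree)            # the PP attaches to the VP, not to the object NP
```

Here the VP attachment (probability 0.00378) beats the NP attachment (0.00252), so the chart keeps only the former; the cubic loop over spans and split points is what makes CKY O(n³) in sentence length.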

Problem Set 3

Objectives: 4 2 3
Week: 10

Theory: 0h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 6h

Statistical Machine Translation

We will present the basic elements of statistical machine translation systems, including representation aspects, algorithmic aspects, and methods for parameter estimation.
Objectives: 4 2
Contents:

5. Machine Translation

Theory: 4h
Problems: 2h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 0h
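Parameter estimation for the IBM models is a classic use of EM. A minimal sketch of IBM Model 1 training follows: it estimates lexical translation probabilities t(f | e) from sentence pairs, omits the NULL word for brevity, and uses a made-up three-sentence German-English corpus. The function name `ibm_model1` and the data layout are illustrative assumptions, not the interface of any particular toolkit.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """EM for IBM Model 1 (no NULL word): returns t[(f, e)], an estimate of t(f | e)."""
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in pairs:
            for f in fs:
                # E-step: posterior over which English word generated f
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise the expected counts for each English word
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

corpus = [("das Haus".split(), "the house".split()),
          ("das Buch".split(), "the book".split()),
          ("ein Buch".split(), "a book".split())]
t = ibm_model1(corpus)
print(t[("das", "the")] > t[("Haus", "the")])      # True
print(t[("Haus", "house")] > t[("das", "house")])  # True
```

No sentence pair is ever aligned by hand: "das" comes to dominate t(· | the) only because it co-occurs with "the" in two pairs, and EM repeatedly redistributes probability mass toward such consistent co-occurrences.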

Unsupervised Methods in NLP

We will review several methods for unsupervised learning in NLP, in the context of lexical models, sequence models, and grammatical models. We will focus on bootstrapping and cotraining methods, the EM algorithm, and distributional methods.
Objectives: 4 2
Contents:

6. Unsupervised and Semi-supervised Methods in NLP

Theory: 4h
Problems: 2h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 0h
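Distributional methods rest on the idea that words occurring in similar contexts tend to have similar meanings. One simple instance, sketched below under stated assumptions, represents each word by raw co-occurrence counts within a fixed window and compares words by cosine similarity. The window size of 1, the use of raw counts (rather than, say, PMI weighting) and the toy sentences are illustrative choices, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=1):
    """Count how often each word co-occurs with words up to `window` positions away."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][sent[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

corpus = [s.split() for s in ["the cat drinks milk", "the dog drinks water",
                              "a cat eats fish", "a dog eats meat"]]
v = context_vectors(corpus)
print(cosine(v["cat"], v["dog"]))   # 1.0 (identical contexts)
print(cosine(v["cat"], v["milk"]))  # 0.5
```

No labels are used anywhere: "cat" and "dog" come out as maximally similar purely because they appear in the same contexts, which is what makes such representations useful for unsupervised and semi-supervised learning.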

Problem Set 4

Objectives: 4 2 3
Week: 14

Theory: 0h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 6h

Final Exam

Objectives: 4 2 3
Week: 15

Theory: 3h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 10.5h

Project

Objectives: 4 2 1
Week: 16

Theory: 0h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 45h

Teaching methodology

The course will be structured around five main blocks of lectures. In each theory lecture, we will present fundamental algorithmic and statistical techniques for NLP. This will be followed by problem lectures, in which we will look in detail at the derivations of algorithms and the mathematical proofs needed to understand statistical methods in NLP.

Furthermore, there will be four problem sets that students need to solve at home. Each problem set will consist of three or four problems that will require the student to understand the elements behind statistical NLP methods. In some cases these problems will involve writing small programs to analyze data and perform some computation.

Finally, students will develop a practical project in teams of two or three. The goal of the project is to put into practice the methods learned in class and to learn the experimental methodology used in the NLP field. Students have to identify existing components (i.e., data and tools) that can be used to build a system, and run experiments to carry out an empirical analysis of some statistical NLP method.

Assessment method

Final grade = 0.6 × final exam + 0.4 × project

where "final exam" is the grade of the final exam and "project" is the grade of the project.
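The weighting can be checked with a one-line helper. This is only an arithmetic illustration: `final_grade` is a hypothetical name, and it assumes both components are graded on the same 0-10 scale, which the syllabus does not state explicitly.

```python
def final_grade(final_exam: float, project: float) -> float:
    """Weighted course grade: 60% final exam, 40% project (same scale assumed)."""
    return 0.6 * final_exam + 0.4 * project

# Example: exam 7.0, project 9.0 -> 0.6*7.0 + 0.4*9.0 = 4.2 + 3.6
print(round(final_grade(7.0, 9.0), 2))  # 7.8
```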

Bibliography

Basic:

Smith, Noah. Linguistic Structure Prediction. Morgan & Claypool Publishers, 2011. ISBN: 9781608454051.
http://www.morganclaypool.com/doi/abs/10.2200/S00361ED1V01Y201105HLT013
Collins, Michael. Lecture Notes for Coursera Course "Natural Language Processing".
http://www.cs.columbia.edu/~mcollins/notes-spring2013.html

Web links

The course website includes lecture slides and links to relevant bibliography and resources: http://www.lsi.upc.edu/~ageno/anlp

Prior skills

- Introductory concepts and methods of Natural Language Processing.

- Introductory concepts and methods of Machine Learning.

- Programming.
