This course introduces the most relevant tasks, applications, techniques, and resources of empirical Natural Language Processing (NLP), i.e. NLP based on statistical and Machine Learning (ML) methods.
Weekly hours
Theory: 3
Problems: 0
Laboratory: 0
Guided learning: 0.6
Autonomous learning: 6.4
Objectives
1. Justify the appropriateness of specific statistical techniques for tackling specific NLP tasks.
Related competences: CB6, CB7, CTR4, CTR6, CEC1, CEC2, CEC3, CG1, CG3
2. Evaluate the usefulness of statistical components to be included in NLP applications for carrying out NLP tasks.
Related competences: CEC3, CG3
3. Search for and select statistical NLP resources and tools that can be used in NLP tasks and applications.
Related competences: CB7, CB9, CTR4, CEC1, CEC2, CEC3, CG3
4. Design and implement new NLP components, tune existing components, and integrate them into an NLP application.
Related competences: CB6, CB9, CTR4, CEC2, CEC3, CG3
Contents
Introduction & basics
NLP vs Computational Linguistics vs Human Language Technologies (HLT)
Knowledge-based vs Empirical methods
Resources
Lexical resources
Corpora
Grammars
Ontologies
Language Models
Basics
{word, class, phrase}-based models
Information content
entropy
mutual information
joint and conditional entropy
pointwise mutual information
Kullback-Leibler divergence (KL)
Application to NLP tasks
Noisy channel models
Alignment models
Application to NLP tasks
Finite State Models
Finite State Automata (FSA) and Regular grammars
Finite State Transducers (FST)
Finite State Probabilistic models
Application to NLP tasks
Log-linear & Maximum Entropy (ME) Models
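Classification problems: Maximum Likelihood Estimation (MLE) vs Maximum Entropy Models (MEM)
Generative and conditional (discriminative) models
Markov Models (MM) and Hidden Markov Models (HMM)
Conditional Random Fields (CRF)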
Building ME models
Maximum Entropy Markov Models (MEMM)
Applications to NLP
Using statistical techniques for NLP applications
Machine Translation (MT) in detail
Other NLP tasks (Part of Speech (POS) tagging, Named Entity Recognition and Classification (NERC), Mention detection & tracking, Coreference resolution, Text Alignment, Lexical Acquisition, Relation Extraction, Semantic Role Labeling (SRL), Word Sense Disambiguation (WSD)) and applications (Information Extraction (IE), Information Retrieval (IR), Question Answering (Q&A), Automatic Summarization, Sentiment Analysis, and Text Classification), which are only sketched.
Activities
Introduction & basics
Attending the theory classes.
Homework discussion and tutoring
Objectives: 2
Students will solve the 5 homeworks at home, although they will receive advice from the teachers. Each homework is due two weeks after it is assigned. The evaluation will include comments on the students' work.
Objectives: 4
Final exam of the course
The exam will take place in the classroom.
Objectives: 1, 2, 3
Week: 16
Type: final exam
Theory: 0h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 0h
Teaching methodology
The teaching methodology is as follows:
Each of the 9 topics is covered in one or more theory classes (usually one). The material (slides, readings, etc.) is made available in advance.
Additionally, a set of homeworks directly related to the different topics is assigned throughout the course (usually 5 homeworks). Some homeworks can be solved by hand; others by writing a short program.
Evaluation methodology
The evaluation is based on two components:
1) The final exam
2) The grades of the 5 homeworks
The final grade is computed from the grades of these two components.
The two components have equal weight (50% each).
The five homeworks also have equal weight (20% each of the homework component).
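As a minimal sketch of the weighting described above, assuming all grades use the same scale (the scale is not stated here) and that the 20% homework weight is relative to the homework component, the final grade could be computed as follows. The function name `final_grade` is hypothetical.

```python
from typing import Sequence


def final_grade(exam: float, homeworks: Sequence[float]) -> float:
    """Hypothetical helper: combine the exam grade and the 5 homework grades.

    Assumes every grade is on the same scale.
    """
    if len(homeworks) != 5:
        raise ValueError("expected exactly 5 homework grades")
    # Five homeworks at 20% each is the same as taking their mean.
    homework_component = sum(homeworks) / len(homeworks)
    # Final exam and homework component weigh 50% each.
    return 0.5 * exam + 0.5 * homework_component


print(final_grade(8.0, [6.0, 7.0, 8.0, 9.0, 10.0]))  # 8.0
```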
Bibliography
Basic:
Jurafsky, Daniel & Martin, James H. Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition and Computational Linguistics. ISBN: 0131873210. http://www.cs.colorado.edu/~martin/slp.html