Statistical Natural Language Processing

Credits
6
Types
Complementary specialization (Data Science)
Requirements
This subject has no requirements
Department
CS
This course is an introduction to the most relevant tasks, applications, techniques and resources involved in empirical Natural Language Processing (NLP), i.e., NLP carried out with statistical and Machine Learning (ML) methods.

Weekly hours

Theory
3
Problems
0
Laboratory
0
Guided learning
0.6
Autonomous learning
6.4

Objectives

  1. Justify the appropriateness of specific statistical techniques for addressing specific NLP tasks.
    Related competences: CB6, CB7, CTR4, CTR6, CEC1, CEC2, CEC3, CG1, CG3
  2. Evaluate the usefulness of statistical components to be included in NLP applications for carrying out NLP tasks.
    Related competences: CEC3, CG3
  3. Search for and select statistical NLP resources and tools that can be used in NLP tasks and applications.
    Related competences: CB7, CB9, CTR4, CEC1, CEC2, CEC3, CG3
  4. Design and implement new NLP components, tune existing components, and integrate them into an NLP application.
    Related competences: CB6, CB9, CTR4, CEC2, CEC3, CG3

Contents

  1. Introduction & basics
    NLP vs Computational Linguistics vs HLT
    Knowledge-based vs Empirical methods
    Resources
    Lexical resources
    Corpora
    Grammars
    Ontologies
  2. Language Models
    Basics
    {word, class, phrase}-based models (a minimal bigram sketch follows this list)
    Information content (standard definitions are restated after this list)
    Entropy
    Mutual information
    Joint and conditional entropy
    Pointwise mutual information
    Kullback-Leibler divergence (KL)
    Application to NLP tasks
    Noisy channel models
    Alignment models
    Application to NLP tasks
  3. Finite State Models
    Finite State Automata (FSA) and Regular grammars
    Finite State Transducers (FST)
    Finite State Probabilistic models
    Application to NLP tasks
  4. Log-linear & Maximum Entropy Models
    Classification problems – MLE vs MEM
    Generative and conditional (discriminative) models.
    Markov Models (MM) and Hidden Markov Models (HMM) (a Viterbi decoding sketch follows this list)
    Conditional Random Fields (CRF)
    Building ME models
    Maximum Entropy Markov Models (MEMM)
    Applications to NLP
  5. Models for parsing
    Constituent parsing
    Stochastic Context Free Grammars (SCFG)
    Richer probabilistic models
    Applications to NLP.
    Syntactic parsing
    Semantic parsing
    Dependency parsing
  6. Supervised Machine Learning for NLP
    Classification problems.
    Margin-based classifiers: Perceptron, SVM, AdaBoost (a perceptron sketch follows this list)
    Kernel-based methods
  7. Semi-supervised Learning
    Bootstrapping
  8. Unsupervised Learning (Clustering)
    Similarity
    Hierarchical clustering
    Non-hierarchical clustering
    Clustering evaluation.
  9. Using statistical techniques for NLP applications
    Machine Translation (MT) in detail
    Other NLP tasks, only sketched: Part-of-Speech (POS) tagging, Named Entity Recognition and Classification (NERC), mention detection & tracking, coreference resolution, text alignment, lexical acquisition, relation extraction, Semantic Role Labeling (SRL), and Word Sense Disambiguation (WSD)
    Other NLP applications, only sketched: Information Extraction (IE), Information Retrieval (IR), Question Answering (Q&A), automatic summarization, sentiment analysis, and text classification
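
For reference, the information-content measures listed under topic 2 have the following standard textbook definitions (restated here for convenience; they are not course-specific material):

    H(X) = -\sum_x p(x) \log_2 p(x)                                     % entropy
    H(X,Y) = -\sum_{x,y} p(x,y) \log_2 p(x,y)                           % joint entropy
    H(Y \mid X) = H(X,Y) - H(X)                                         % conditional entropy
    I(X;Y) = H(X) + H(Y) - H(X,Y)                                       % mutual information
    \mathrm{PMI}(x,y) = \log_2 \frac{p(x,y)}{p(x)\,p(y)}                % pointwise mutual information
    D_{\mathrm{KL}}(p \,\|\, q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}  % Kullback-Leibler divergence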
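
As a minimal illustration of the word-based language models in topic 2, the sketch below builds a bigram model with add-one (Laplace) smoothing and scores a sentence by perplexity. The toy corpus and all names are invented for the example, not taken from the course:

    # Minimal bigram language model with add-one (Laplace) smoothing.
    from collections import Counter
    import math

    corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]

    unigram_counts = Counter()
    bigram_counts = Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        unigram_counts.update(tokens[:-1])                  # context counts
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    # Prediction vocabulary: any word that can follow a context, including </s>.
    vocab = {w for s in corpus for w in s} | {"</s>"}

    def bigram_prob(prev, word):
        # P(word | prev) with add-one smoothing over the prediction vocabulary.
        return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + len(vocab))

    def perplexity(sent):
        tokens = ["<s>"] + sent + ["</s>"]
        log_prob = sum(math.log2(bigram_prob(p, w))
                       for p, w in zip(tokens[:-1], tokens[1:]))
        return 2 ** (-log_prob / (len(tokens) - 1))

    print(perplexity(["the", "dog", "ran"]))  # smoothing lets unseen bigrams be scored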
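
For the Markov models of topic 4, the sketch below decodes the most likely tag sequence of a toy two-tag HMM with the Viterbi algorithm. The tag set and the hand-set probabilities are hypothetical, chosen only to make the example runnable:

    # Viterbi decoding for a toy two-tag HMM (illustration of topic 4).
    states = ["DET", "NOUN"]
    start_p = {"DET": 0.8, "NOUN": 0.2}
    trans_p = {"DET": {"DET": 0.1, "NOUN": 0.9},
               "NOUN": {"DET": 0.4, "NOUN": 0.6}}
    emit_p = {"DET": {"the": 0.9, "cat": 0.05, "sat": 0.05},
              "NOUN": {"the": 0.05, "cat": 0.6, "sat": 0.35}}

    def viterbi(obs):
        # V[t][s]: probability of the best path that ends in state s at time t.
        V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
        back = [{}]
        for t in range(1, len(obs)):
            V.append({})
            back.append({})
            for s in states:
                best_prev = max(states, key=lambda p: V[t - 1][p] * trans_p[p][s])
                V[t][s] = V[t - 1][best_prev] * trans_p[best_prev][s] * emit_p[s][obs[t]]
                back[t][s] = best_prev
        # Pick the best final state and follow the back-pointers to time 0.
        path = [max(states, key=lambda s: V[-1][s])]
        for t in range(len(obs) - 1, 0, -1):
            path.insert(0, back[t][path[0]])
        return path

    print(viterbi(["the", "cat", "sat"]))  # -> ['DET', 'NOUN', 'NOUN']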
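
Finally, for the margin-based classifiers of topic 6, a bare-bones perceptron trainer with mistake-driven updates; the tiny linearly separable dataset is invented for the illustration:

    # Perceptron training sketch (illustration of topic 6).
    def train_perceptron(data, epochs=10):
        # data: list of (feature_vector, label) pairs with labels in {-1, +1}.
        dim = len(data[0][0])
        w = [0.0] * dim
        b = 0.0
        for _ in range(epochs):
            for x, y in data:
                # Update only when the current weights misclassify the example.
                if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                    w = [wi + y * xi for wi, xi in zip(w, x)]
                    b += y
        return w, b

    data = [([2.0, 1.0], +1), ([1.5, 2.0], +1),
            ([-1.0, -1.5], -1), ([-2.0, -0.5], -1)]
    print(train_perceptron(data))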

Activities


Introduction & basics

Attending the theory class; homework discussion and tutoring.
Objectives: 2
Contents:
Theory
3h
Problems
0h
Laboratory
0h
Guided learning
1h
Autonomous learning
4h

Language Models

Attending the theory class; homework discussion and tutoring.
Objectives: 1 3
Contents:
Theory
6h
Problems
0h
Laboratory
0h
Guided learning
1h
Autonomous learning
10h

Finite State Models

Attending the theory class; homework discussion and tutoring.
Objectives: 1 2 3
Contents:
Theory
3h
Problems
0h
Laboratory
0h
Guided learning
1h
Autonomous learning
4h

Log-linear & Maximum Entropy Models

Attending the theory class; homework discussion and tutoring.
Objectives: 1 2 3 4
Contents:
Theory
9h
Problems
0h
Laboratory
0h
Guided learning
1h
Autonomous learning
12h

Models for parsing

Attending the theory class; homework discussion and tutoring.
Objectives: 1 2 4
Contents:
Theory
6h
Problems
0h
Laboratory
0h
Guided learning
1h
Autonomous learning
10h

Supervised Machine Learning for NLP

Attending the theory class; homework discussion and tutoring.
Objectives: 1 2 4
Contents:
Theory
3h
Problems
0h
Laboratory
0h
Guided learning
1h
Autonomous learning
5h

Semi-supervised Learning

Attending the theory class; homework discussion and tutoring.
Objectives: 1 2 3 4
Contents:
Theory
3h
Problems
0h
Laboratory
0h
Guided learning
1h
Autonomous learning
5h

Unsupervised Learning (Clustering)

Attending the theory class; homework discussion and tutoring.
Objectives: 1 2 3 4
Contents:
Theory
3h
Problems
0h
Laboratory
0h
Guided learning
1h
Autonomous learning
4h

Using statistical techniques for NLP applications

Attending the theory class; homework discussion and tutoring.
Objectives: 1 2 3 4
Contents:
Theory
9h
Problems
0h
Laboratory
0h
Guided learning
1h
Autonomous learning
9h

Homeworks

Students will solve the 5 homework assignments at home, although they will receive advice from the teachers. Each homework is due two weeks after it is assigned. The evaluation will include comments on the students' work.
Objectives: 4
Contents:
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
30h

Final Exam

Final exam of the course. The exam will take place in the classroom.
Objectives: 1 2 3
Week: 16
Type: final exam
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Teaching methodology

The teaching methodology is as follows:

For each of the 9 topics there will be one (the most frequent case) or more theory classes. The material (slides, readings, etc.) is made available in advance.

Additionally, a set of homework assignments directly tied to the different topics will be proposed to the students throughout the course (usually 5 are proposed). These assignments can sometimes be solved by hand and in other cases require writing a short program.

Evaluation methodology

The evaluation is based on two components:

1) The final exam
2) The grades of the 5 homeworks

The final grade is obtained from the grades of these two components.

The two components are weighted equally (50% each).
Within the homework component, the five homeworks are weighted equally (20% each).
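
Equivalently, as a formula (a sketch of the stated weighting; the grading scale itself is not specified here):

    final_grade = 0.5 * exam + 0.5 * (hw_1 + hw_2 + hw_3 + hw_4 + hw_5) / 5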

Bibliography

Basic: