Introduction to Human Language Technology

Credits
5
Types
Compulsory
Requirements
This subject has no requirements

Department
CS
This course is an introduction to the most relevant problems in Natural Language Processing, the most relevant techniques and resources used, and the theories they are based on. The course includes an overview of Natural Language applications.

The course focuses on the two most relevant approaches to Natural Language Processing: knowledge-based and empirical (both statistical and machine learning).

Teachers

Person in charge

  • Jordi Turmo Borrás

Others

  • Marta Gatius Vila

Weekly hours

Theory
2
Problems
0.5
Laboratory
0.5
Guided learning
0.21
Autonomous learning
8.93

Competences

Generic Technical Competences

Generic

  • CG1 - Capability to plan, design and implement products, processes, services and facilities in all areas of Artificial Intelligence.
  • CG3 - Capacity for modeling, calculation, simulation, development and implementation in technology and company engineering centers, particularly in research, development and innovation in all areas related to Artificial Intelligence.

Technical Competences of each Specialization

Academic

  • CEA5 - Capability to understand the basic operating principles of the main Natural Language Processing techniques, and to know how to use them in the environment of an intelligent system or service.

Professional

  • CEP4 - Capability to design, write and report about computer science projects in the specific area of Artificial Intelligence.
  • CEP6 - Capability to assimilate and integrate the changing economic, social and technological environment to the objectives and procedures of informatic work in intelligent systems.
  • CEP7 - Capability to respect the legal rules and deontology in professional practice.

Transversal Competences

Teamwork

  • CT3 - Ability to work as a member of an interdisciplinary team, as a normal member or performing direction tasks, in order to develop projects with pragmatism and sense of responsibility, making commitments taking into account the available resources.

Information literacy

  • CT4 - Capacity for managing the acquisition, the structuring, analysis and visualization of data and information in the field of specialisation, and for critically assessing the results of this management.

Reasoning

  • CT6 - Capability to evaluate and analyze situations, projects, proposals, reports and scientific-technical surveys in a reasoned and critical way. Capability to argue the reasons that explain or justify such situations, proposals, etc.

Objectives

  1. Understand the fundamental concepts of Natural Language Processing, the best-known techniques and theories, and the most relevant existing resources.
    Related competences: CT4, CT6, CEA5, CG1, CG3, CEP6
  2. Understand the most relevant applications of NLP and the theories, techniques and resources they use.
    Related competences: CT4, CT6, CEA5, CG1, CG3, CEP6
  3. Design and develop programs to solve specific problems in the NLP context, selecting the most appropriate techniques and making use of existing resources. One larger program is to be developed in groups of two students.
    Related competences: CT3, CT4, CT6, CEA5, CG1, CG3, CEP4, CEP6, CEP7
  4. Reason (occasionally, in groups) about several problems in the NLP context that require considering different techniques and resources.
    Related competences: CT3, CT4, CT6, CEA5, CG1, CG3, CEP7

Contents

  1. 1. Introduction to Natural Language Processing. 2. Resources. 3. Language models. 4. Morphology. 5. Syntax. 6. Semantics. 7. Coreference. 8. Generation.
    1. Introduction to Natural Language Processing.
    Computational Linguistics and Natural Language Processing (NLP). Motivation and applications. Main challenges in NLP. Basic levels of linguistic description.

    2. Resources.
    Resources used for processing Natural Language.

    3. Language models.
    Statistical language models. Finite-state techniques. Markov models and hidden Markov models.

    4. Morphology
    Basic levels of linguistic description: textual, lexical and morphological processing. Dictionaries and lexicons.

    5. Syntax.
    Grammars. Phrase-structure grammars. Context-free grammars. Logic grammars.
    Basic techniques of syntactic processing.

    6. Semantics.
    Representation of semantic knowledge. Semantic dictionaries. Ontologies. Semantic interpretation. Semantic disambiguation.

    7. Coreference
    Basic concepts of coreference resolution: mention detection and coreference chain resolution. Introduction to basic methods.

    8. Generation.
    Natural Language generation. Symbolic and statistical methods.

Activities

Introduction

Computational Linguistics and Natural Language Processing (NLP). History, motivation, applications. Main challenges in NLP. Levels of linguistic description.
Theory
4
Problems
0
Laboratory
1
Guided learning
0
Autonomous learning
2.9
Objectives: 1

Applications of NLP

Multilingual systems. Dialogue. NL interfaces and multimodal interfaces. Question answering. Information extraction. Summarization. Information retrieval. Translation.
Theory
2
Problems
1
Laboratory
1
Guided learning
0
Autonomous learning
3.9
Objectives: 2

Language models.

Statistical language models. Information theory. Finite-state techniques. Markov models, hidden Markov models and their application to tagging.
Theory
2
Problems
2
Laboratory
0
Guided learning
0
Autonomous learning
12
Objectives: 1 4

Morphology and lexicons

Theory
6
Problems
1
Laboratory
1
Guided learning
0
Autonomous learning
12
Objectives: 1 4

Syntax

Formal languages. Grammars. Phrase-structure grammars. Context-free grammars. Logic grammars. Feature grammars. Basic techniques of syntactic processing. Obtaining syntactic knowledge. Grammar induction.
Theory
8
Problems
2
Laboratory
1
Guided learning
0
Autonomous learning
21
Objectives: 1 4 3

Semantics, pragmatics and discourse

Representation of semantic knowledge. Semantic dictionaries. Ontologies. Semantic interpretation. Semantic disambiguation. Discourse. Dialogue. Dialogue grammars. Pragmatics.
Theory
3
Problems
2
Laboratory
2
Guided learning
0.8
Autonomous learning
14.2
Objectives: 1 4 3

Generation

Natural Language generation. Symbolic and statistical methods.
Theory
1
Problems
0
Laboratory
0
Guided learning
0
Autonomous learning
4
Objectives: 1 2

Teaching methodology

There are three types of sessions: theory, exercises and laboratory.

In the theory sessions we will introduce new concepts, together with the challenges they present and the approaches used to address them.

In the exercise sessions we will work on the concepts, techniques and algorithms introduced in the theory sessions.

In the laboratory sessions students will carry out short practical assignments using appropriate NLP tools, to practice and reinforce the material from the theory sessions.

Evaluation methodology

There will be a single exam at the end of the course and two projects. The exam will cover all the course contents.
The grade of the projects will be computed from the deliverables submitted by the students.
In particular, the final grade of the course will be calculated as follows:
Course grade = final exam grade * 0.6 + average projects grade * 0.4
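
As an illustration of the formula above, the calculation can be sketched in a few lines of Python. The function name `course_grade` and the sample grades are hypothetical, and a 0-10 grading scale is assumed; the guide itself does not state the scale.

```python
from typing import Sequence


def course_grade(exam: float, projects: Sequence[float]) -> float:
    """Weighted course grade: 60% final exam, 40% average of the project grades."""
    if not projects:
        raise ValueError("at least one project grade is required")
    project_average = sum(projects) / len(projects)
    return 0.6 * exam + 0.4 * project_average


# Hypothetical grades (0-10 scale assumed): exam 7.0, projects 8.0 and 9.0.
# 0.6 * 7.0 + 0.4 * 8.5 = 4.2 + 3.4 = 7.6
print(round(course_grade(7.0, [8.0, 9.0]), 2))
```

With these sample values the project average is 8.5, so the exam contributes 4.2 points and the projects 3.4 points to the course grade.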

Bibliography

Basic:

  • Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition and Computational Linguistics - Jurafsky, Daniel & Martin, James H., Prentice-Hall, Inc. , 2009. ISBN:
  • Handbook of natural language processing - Somers, Harold L; Dale, Robert, Marcel Dekker , cop.2000. ISBN: 0824790006
    http://cataleg.upc.edu/record=b1172244~S1*cat
  • Foundations of Statistical Natural Language Processing - Manning, Chris & Schütze, Hinrich, MIT Press , 1999. ISBN:
    http://nlp.stanford.edu/fsnlp/
  • The Oxford handbook of Computational Linguistics - Mitkov, Ruslan, Oxford University Press , 2004. ISBN: 978-0199276349

Previous knowledge

Knowledge acquired in the Artificial Intelligence (AI) course (degree in Computer Engineering).