Introduction to Human Language Technology

Credits

Types

Compulsory

Requirements

This subject has no formal requirements, but it assumes previous capacities.

Department

Web

www.cs.upc.edu/~turmo/ihlt/plan32j-7/IHLT.html

The goal of this course is to provide students with the fundamentals of Natural Language Processing (NLP). Specifically, the course introduces the most relevant problems involved in NLP, the most relevant techniques and resources used to tackle them, and the theories they are based on. In addition, it includes brief descriptions of the most relevant NLP applications. The course focuses on knowledge-based and empirical (both statistical and machine learning) approaches to NLP.

IHLT provides the basic NLP knowledge needed to take AHLT and HLE. While AHLT goes into depth on statistical NLP techniques, HLE reviews the state of the art in real applications that involve NLP technology.

Teachers

Person in charge

Jordi Turmo Borrás

Others

Salvador Medina Herrera

Weekly hours

Autonomous learning: 5.93

Competences

Generic Technical Competences

Generic

CG1 - Capability to plan, design and implement products, processes, services and facilities in all areas of Artificial Intelligence.
CG3 - Capacity for modeling, calculation, simulation, development and implementation in technology and company engineering centers, particularly in research, development and innovation in all areas related to Artificial Intelligence.

Technical Competences of each Specialization

Academic

CEA5 - Capability to understand the basic operating principles of the main Natural Language Processing techniques, and to know how to use them in the context of an intelligent system or service.

Professional

CEP4 - Capability to design, write and report about computer science projects in the specific area of Artificial Intelligence.
CEP6 - Capability to assimilate and integrate the changing economic, social and technological environment into the objectives and procedures of computing work in intelligent systems.
CEP7 - Capability to respect the legal rules and deontology in professional practice.

Transversal Competences

Teamwork

CT3 - Ability to work as a member of an interdisciplinary team, as a normal member or performing direction tasks, in order to develop projects with pragmatism and sense of responsibility, making commitments taking into account the available resources.

Information literacy

CT4 - Capacity for managing the acquisition, the structuring, analysis and visualization of data and information in the field of specialisation, and for critically assessing the results of this management.

Reasoning

CT6 - Capability to evaluate and analyze situations, projects, proposals, reports and scientific-technical surveys in a reasoned and critical way. Capability to argue the reasons that explain or justify such situations, proposals, etc.

Objectives

1. Understand the fundamental concepts of Natural Language Processing, the best-known techniques and theories, and the most relevant existing resources.
Related competences: CT4, CT6, CEA5, CEP6, CG1, CG3
2. Understand the most relevant applications of NLP and the theories, techniques and resources they use.
Related competences: CT4, CT6, CEA5, CEP6, CG1, CG3
3. Design and develop programs that solve specific problems in the NLP context, selecting the most appropriate techniques and making use of existing resources. One larger program will be developed in groups of two students.
Related competences: CT3, CT4, CT6, CEA5, CEP4, CEP6, CEP7, CG1, CG3
4. Reason (occasionally in groups) about several problems in the NLP context that require considering different techniques and resources.
Related competences: CT3, CT4, CT6, CEA5, CEP7, CG1, CG3

Contents

1. Document Structure and Language
Text selection, Tokenization, Sentence splitting, Language Identifiers.
2. Words
Morphology, Finite-state automata, Finite-state transducers.
PoS tagging, Hidden Markov Models.
Lexical semantics, Semantic resources.
Word Sense Disambiguation.
3. Word sequences
Recognition and classification of word sequences with meaning.
BIO discriminative models. Conditional Random Fields (CRF).
Named Entity Recognition and Classification (NERC).
Noun-phrase Chunking.
4. Sentences
Syntactic grammars, typology. Context-free grammars. Probabilistic context-free grammars. Chomsky normal form grammars.
Syntactic parsers, properties and strategies. CKY and probabilistic CKY parsers.
5. Sentence sequences
Coreference resolution. Mention detection. Types of techniques for the generation of coreference chains. Mention-pair model. Entity-mention model. Ranking models.

Activities

Introduction

Objectives: 1 2

Document structure and language

Objectives: 1 3

Morphological analysis

Finite-state automata. Finite-state transducers.
Objectives: 1 2

PoS tagging

Hidden Markov Models
Objectives: 1 4 2

Lexical semantics, Semantic resources.

Objectives: 1 4 2

Word Sense Disambiguation.

Objectives: 1 4 2

Recognition and classification of word sequences with meaning.

BIO discriminative models. Conditional Random Fields (CRF). Named Entity Recognition and Classification (NERC). Noun-phrase Chunking.
Objectives: 4 3 1

Syntactic parsing: Syntactic grammars

Typology. Context free grammars. Probabilistic context free grammars. Chomsky normal form grammars.
Objectives: 1 4 2

Syntactic parsing: parsers

Syntactic parsers, properties and strategies. CKY and probabilistic CKY parsers.
Objectives: 1 4 2

Coreference resolution

Objectives: 1 2

Project tutoring

Objectives: 4 2
Contents:

2. Words
3. Word sequences
4. Sentences
5. Sentence sequences

Project presentation

Autonomous learning: 40h

Final exam

Week: 15 (Outside class hours)

Autonomous learning: 43h

Teaching methodology

There are two types of sessions: theory/exercise and laboratory.

In each theory/exercise session we will introduce new concepts together with the challenges they present and the approaches used to address them. In addition, we will solve some exercises to consolidate the concepts, techniques and algorithms introduced in the session.

In the laboratory sessions, students will carry out short practical assignments using appropriate NLP tools to practice and reinforce the knowledge acquired in the theory classes.

Evaluation methodology

There will be a single exam at the end of the course, one project, and one deliverable for each lab session. The exam will cover all the course contents.
The mark for the project and deliverables will be based on the documents submitted by the students.
The final mark of the course will be calculated as follows:
Course mark = final exam mark * 0.5 + lab mark * 0.5
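
As a worked illustration of the formula above, the following Python sketch computes the course mark from its two components. The function name `course_mark` and the sample marks are hypothetical, and a 0-10 scale is assumed here because the scale is not stated.

```python
def course_mark(final_exam_mark: float, lab_mark: float) -> float:
    """Combine the exam and lab marks with the equal weights given in the formula."""
    return 0.5 * final_exam_mark + 0.5 * lab_mark


# Hypothetical example: 8.0 in the final exam and 6.0 in the lab component.
print(course_mark(8.0, 6.0))  # 7.0
```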

Bibliography

Basic:

Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition - Jurafsky, D.; Martin, J.H, Prentice-Hall, Inc., 2024.
https://web.stanford.edu/~jurafsky/slp3/
The Oxford handbook of computational linguistics - Mitkov, R. (ed.), Oxford University Press, 2003. ISBN: 0198238827
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002689009706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Foundations of statistical natural language processing - Manning, C.D.; Schütze, H, MIT Press, 1999. ISBN: 0262133601
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991001994779706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Handbook of natural language processing - Dale, R.; Moisl, H.; Somers, H, Marcel Dekker, 2000. ISBN: 0824790006
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002071619706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
The handbook of computational linguistics and natural language processing - Clark, A.; Fox, C.; Lappin, S. (eds.), Wiley-Blackwell (Blackwell Handbooks in Linguistics), 2010. ISBN: 9781444324044
https://onlinelibrary-wiley-com.recursos.biblioteca.upc.edu/doi/book/10.1002/9781444324044

Web links

Course timetable (adjusted for holidays): http://www.cs.upc.edu/~turmo/IHLT.html

Previous capacities

Those acquired in the Artificial Intelligence (AI) course of the degree in Computer Engineering.