Natural Language Processing (PLN)

Credits: 7.5 (6.0 ECTS)
Dept.: CS
Type: Elective for DIE
Requirements: IA - Prerequisite for DIE

Instructors

Person in charge:  Horacio Rodríguez Hontoria (horacio@cs.upc.edu)
Others:(-)

General goals

This subject presents the range of problems involved in processing natural (human) language, the techniques used to address them, and the theoretical basis that underlies those techniques. It focuses on the three branches of linguistic engineering:
-  Methods and tools
-  Resources and ways of acquiring them
-  Applications.
The subject is essentially practical in focus. It addresses the two main areas of application: systems based on human-machine dialogue and systems for processing large volumes of textual information. The material covers the two approaches normally used in natural language processing: the knowledge-based approach (essentially linguistic) and the empirical approach (essentially statistical or based on machine learning).

Specific goals

Knowledge

  1. Scope and need for Natural Language processing techniques.
  2. Basic concepts regarding the most common applications of Natural Language Processing: dialogue systems, machine translation systems, question answering systems, automatic summarization systems, information retrieval and extraction systems, and others.
  3. Basic techniques used in Natural Language Processing tasks. Morphological and syntactic analysis, semantic interpretation, semantic disambiguation, and others.
  4. The fundamental principles of these techniques.
  5. Basic knowledge of the languages used in Natural Language Processing applications.
  6. Knowledge of the resources needed in language processing: dictionaries, grammars, lexicons, ontologies, and others.

Abilities

  1. Ability to analyze a problem and identify the scope for, and advantages of, applying Natural Language Processing in seeking a solution.
  2. Choice of the most suitable techniques for each NLP task.
  3. Ability to extract and represent the knowledge required to construct a Natural Language Processing application. Choice and assessment of the resources required, and of those available.
  4. Ability to integrate existing NLP components (both linguistic resources and processing tools) in real applications.
  5. Ability to design and construct software components solving basic NLP problems (analysers, disambiguation tools, translators, etc.).
  6. Ability to design and carry out experiments on the application of NLP empirical methods, and to study their results.

Competences

  1. Ability to solve problems through the application of scientific and engineering methods.
  2. Ability to create and use models of reality.
  3. Ability to design systems, components and processes meeting certain needs, using the most appropriate methods, techniques and tools in each case.
  4. Ability to design and carry out experiments and analyse the results.

Contents

Estimated time (hours):

T = Theory, P = Problems, L = Laboratory, Alt = Other activities, Ext. L = External Laboratory, Stu = Study, A. time = Additional time

1. Introduction to Natural Language Processing
T      P      L      Alt    Ext. L   Stu    A. time   Total
3.0    0      0      0      0        2.0    0         5.0
Introduction. Linguistic Engineering, Computational Linguistics and Natural Language Processing. History, applications, and reasons for NLP.

The problems posed by NLP.

NLP - Description and basic tasks. Levels of linguistic description.

2. Basic levels of linguistic processing
T      P      L      Alt    Ext. L   Stu    A. time   Total
5.0    5.0    8.0    0      10.0     10.0   0         38.0
Text, lexical and morphological processing.

Text processing tasks. Text segmentation. Language identification.

Lexical processing tasks. Identification of lexical units. The concept of word. Lexicons, dictionaries. Lexical and semantic ontologies.

Corpora.

Ways of acquiring lexical information.

Morphological processing tasks. Morphological analyzers. Tools based on finite-state techniques (finite-state automata and transducers).

Machine learning techniques applied to morphology. Morphology induction.

POS tagging and Word Sense Disambiguation (WSD).

3. Syntactic processing
T      P      L      Alt    Ext. L   Stu    A. time   Total
12.0   7.0    8.0    0      15.0     15.0   0         57.0
Syntactic formalisms.

Basic concepts of formal languages. Grammars. Types of grammars.

Phrase structure grammars. Extended context-free grammars.

Logic grammars.

Recent syntactic formalisms: GPSG, HPSG. Feature grammars with and without types (PATR II, ALE, CUF, etc.).

Basic techniques of syntactic analysis.

Parsers for context-free grammars and extended context-free grammars: ATN, chart, CKY, Earley, LR, Tomita.

Parsers for logic grammars. Problems posed by handling unification.

Statistical, shallow, and partial parsers. Chunkers.

Comparison between symbolic and empirical approaches.

Ways of acquiring syntactic information.

Grammar induction.

4. Semantic and pragmatic processing
T      P      L      Alt    Ext. L   Stu    A. time   Total
4.0    2.0    4.0    0      5.0      5.0    0         20.0
Forms of semantic representation. Semantic dictionaries. Semantic ontologies.

Lexical semantics. Word Sense Disambiguation (WSD).

Semantic interpretation.

Collaboration between syntax and semantics.

Discourse semantics. Dialogues. Dialogue grammars. Pragmatics.

5. Generation
T      P      L      Alt    Ext. L   Stu    A. time   Total
2.0    0      0      0      0        2.0    0         4.0
Generation of Natural Language.

Tactical and strategic generation.

Symbolic and statistical methods.

6. Applications
T      P      L      Alt    Ext. L   Stu    A. time   Total
2.0    0      8.0    0      9.0      4.0    0         23.0
Applications based on dialogues.

NL interfaces.

Multi-modal interfaces.

Machine translation.

Information retrieval.

Information extraction.

Automatic summarization.

Question answering.

Multi-lingual systems.


Total per kind
T      P      L      Alt    Ext. L   Stu    A. time   Total
28.0   14.0   28.0   0      39.0     38.0   0         147.0
Additional evaluation hours: 3.0
Total work hours for student: 150.0

Teaching Methodology

The classes are split into theory, problem, and lab sessions. The theory sessions develop students' knowledge. The problem sessions let students explore in greater depth the techniques and algorithms explained in the theory sessions.

The lab classes involve small practical assignments using tools and languages suited to NLP (mainly Python, Prolog and NLTK). This work puts into practice and builds on the knowledge imparted in the theory classes.

The final lab sessions will be spent integrating the software modules produced throughout the course into the final application.

Evaluation Methodology

Assessment is based on a partial exam, a final exam, and a lab grade.

The partial exam will be held during class hours and does not exempt students from any part of the final exam. Students who do not sit or do not pass the partial exam will be assessed on their final exam performance only.

The lab grade will be based on student reports on the practical work carried out in the lab classes.

The final course grade will be calculated as follows:

Final Grade = max(Partial exam grade * 0.15 + Final exam grade * 0.45, Final exam grade * 0.6) + Lab grade * 0.4
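
The grading formula can be sketched in Python as a quick way to check a result. This is an illustrative sketch only: the function name `final_grade`, its parameter names, and the sample marks are assumptions, as is the idea that all three grades are on the same scale (for example 0 to 10) and that a student who did not sit the partial exam is entered with 0.

```python
def final_grade(partial_exam: float, final_exam: float, lab: float) -> float:
    """Combine the three grades using the weights stated in the syllabus.

    Assumption: all grades share one scale, and a missing partial exam
    is passed in as 0.0.
    """
    # The exam part counts for 60%: either 15% partial + 45% final,
    # or the final exam alone at 60%, whichever is higher.
    exam_part = max(partial_exam * 0.15 + final_exam * 0.45, final_exam * 0.6)
    # The lab grade always counts for the remaining 40%.
    return exam_part + lab * 0.4


if __name__ == "__main__":
    # Partial exam better than the final: the partial exam raises the grade.
    print(round(final_grade(partial_exam=8.0, final_exam=6.0, lab=7.0), 2))  # 6.7
    # Partial exam worse than the final: only the final exam is used.
    print(round(final_grade(partial_exam=4.0, final_exam=6.0, lab=7.0), 2))  # 6.4
```

Because of the max, the partial exam grade only changes the result when it is higher than the final exam grade; otherwise the final exam carries the full 60%.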

Basic Bibliography

  • Robert Dale, Hermann Moisl, Harold Somers [editors]. Handbook of natural language processing, Marcel Dekker, 2000.
  • Daniel Jurafsky and James H. Martin. Speech and Language Processing: an introduction to natural language processing, computational linguistics, and speech recognition, Prentice Hall, 2000.
  • Christopher D. Manning, Hinrich Schütze. Foundations of statistical natural language processing, MIT Press, 1999.
  • Ruslan Mitkov [editor]. The Oxford handbook of computational linguistics, Oxford University Press, 2003.

Complementary Bibliography

  • James Allen. Natural language understanding, Benjamin/Cummings, 1995.
  • Horacio Rodríguez Hontoria, M. Antònia Martí Antonin, Irene Castellón Masalles. Formalismes lògics per al tractament del llenguatge natural, Edicions UPC, 1995.
  • María Antonia Martí Antonín. Tecnologías del lenguaje, Editorial UOC, 2003.
  • M. Antònia Martí Antonín, Irene Castellón Masalles. Lingüística computacional, EUB, Edicions Universitat de Barcelona, 2001.
  • Edward Loper and Steven Bird. NLTK: The Natural Language Toolkit, ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, 2002.

Prerequisites

Students must have knowledge of:

Knowledge representation
Problem-solving techniques
Basic concepts regarding Natural Language Processing
Concepts of formal languages (specifically, finite automata, regular and context-free languages)

Accordingly, students must have passed the Artificial Intelligence and Theory of Computation courses before they can take this one.

Students should also have taken the Compilers course.


© Barcelona School of Informatics (FIB)