Person in charge: | (-) |
Others: | (-) |
Credits | Dept. |
---|---|
7.5 (6.0 ECTS) | CS |
This subject presents the problems involved in processing natural (human) language, the techniques used to address them, and the theoretical basis underlying those techniques. The subject focuses on the three branches of linguistic engineering:
- Methods and tools
- Resources and ways of acquiring them
- Applications.
The subject is essentially practical in focus. It addresses the two main areas of application: systems based on human-machine dialogue and systems for processing large volumes of textual information. The material covers the two approaches normally used in natural language processing: the knowledge-based approach (which is essentially linguistic) and the empirical approach (which is essentially statistical or based on machine learning).
Estimated time (hours):
T | P | L | Alt | Ext. L | Stu | A. time |
---|---|---|---|---|---|---|
Theory | Problems | Laboratory | Other activities | External laboratory | Study | Additional time |
T | P | L | Alt | Ext. L | Stu | A. time | Total |
---|---|---|---|---|---|---|---|
5.0 | 5.0 | 8.0 | 0 | 10.0 | 10.0 | 0 | 38.0 |
Text, lexical, and morphological processing.
Text processing tasks. Text segmentation. Language identification. Lexical processing tasks. Identification of lexical units. The concept of a word. Lexicons and dictionaries. Lexical and semantic ontologies. Corpora. Ways of acquiring lexical information. Morphological processing tasks. Morphological analyzers. Tools based on finite-state techniques (finite-state automata and transducers). Machine learning techniques applied to morphology. Morphology induction. POS tagging and Word Sense Disambiguation (WSD).
T | P | L | Alt | Ext. L | Stu | A. time | Total |
---|---|---|---|---|---|---|---|
12.0 | 7.0 | 8.0 | 0 | 15.0 | 15.0 | 0 | 57.0 |
Syntactic formalisms.
Basic concepts of formal languages. Grammars. Types of grammars. Phrase structure grammars. Extended context-free grammars. Logic grammars. Recent syntactic formalisms: GPSG, HPSG. Feature grammars with and without types (PATR II, ALE, CUF, etc.). Basic parsing techniques. Parsers for context-free grammars and their extensions: ATN, chart parsing, CKY, Earley, LR, Tomita. Parsers for logic grammars. Problems posed by unification management. Statistical, shallow, and partial parsers. Chunkers. Comparison between symbolic and empirical approaches. Ways of acquiring syntactic information. Grammar induction.
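As an illustration of one of the parsing techniques listed above, here is a minimal sketch of a CKY recognizer for a grammar in Chomsky normal form. The toy grammar, the function name `cky_recognize`, and the dictionary layout are assumptions made for this example only; they are not taken from the course materials.

```python
from itertools import product


def cky_recognize(words, lexical, binary, start="S"):
    """Return True if `words` can be derived from `start`.

    lexical maps a word to the set of nonterminals A with a rule A -> word.
    binary maps a pair (B, C) to the set of nonterminals A with a rule A -> B C.
    Both layouts are hypothetical choices for this sketch.
    """
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the nonterminals that derive words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # Spans of length 1 come straight from the lexical rules
    for i, word in enumerate(words):
        chart[i][i + 1] = set(lexical.get(word, ()))
    # Longer spans are built by combining two adjacent shorter spans
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary.get((b, c), set())
    return start in chart[0][n]


# Toy grammar (assumed for illustration)
LEXICAL = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "sees": {"V"}}
BINARY = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}

accepted = cky_recognize("the dog sees the cat".split(), LEXICAL, BINARY)
rejected = cky_recognize("dog the sees".split(), LEXICAL, BINARY)
```

The recognizer only answers whether a sentence belongs to the language; storing back-pointers in each chart cell would be one way to recover the parse trees.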
T | P | L | Alt | Ext. L | Stu | A. time | Total |
---|---|---|---|---|---|---|---|
4.0 | 2.0 | 4.0 | 0 | 5.0 | 5.0 | 0 | 20.0 |
Forms of semantic representation. Semantic dictionaries. Semantic ontologies.
Lexical semantics. Word Sense Disambiguation (WSD). Semantic interpretation. Interaction between syntax and semantics. Discourse semantics. Dialogues. Dialogue grammars. Pragmatics.
T | P | L | Alt | Ext. L | Stu | A. time | Total |
---|---|---|---|---|---|---|---|
2.0 | 0 | 0 | 0 | 0 | 2.0 | 0 | 4.0 |
Natural language generation.
Tactical and strategic generation. Symbolic and statistical methods.
T | P | L | Alt | Ext. L | Stu | A. time | Total |
---|---|---|---|---|---|---|---|
2.0 | 0 | 8.0 | 0 | 9.0 | 4.0 | 0 | 23.0 |
Applications based on dialogues.
NL interfaces. Multi-modal interfaces. Machine translation. Information retrieval. Information extraction. Automatic summarization. Question answering. Multilingual systems.
 | T | P | L | Alt | Ext. L | Stu | A. time | Total |
---|---|---|---|---|---|---|---|---|
Total per kind | 28.0 | 14.0 | 28.0 | 0 | 39.0 | 38.0 | 0 | 147.0 |
Additional assessment hours | | | | | | | | 3.0 |
Total student work hours | | | | | | | | 150.0 |
The classes are split into theory, problem, and lab sessions. The theory sessions develop students' knowledge. The problem sessions let students explore the techniques and algorithms explained in the theory sessions in greater depth.
The lab classes involve small practical assignments using tools and languages suited to NLP (mainly Python, Prolog, and NLTK). These assignments apply and build on the knowledge presented in the theory classes.
The final lab sessions will be spent on integrating the software modules produced throughout the course in order to create the final application.
Assessment is based on a midterm exam, a final exam, and a lab grade.
The midterm exam does not exempt students from any part of the final exam and will be held during class hours. Students who do not sit or do not pass the midterm exam will be assessed on their final exam performance alone.
The lab grade will be based on student reports on the practical work carried out in the lab classes.
The final course grade will be calculated as follows:
Final grade = max(Midterm exam grade * 0.15 + Final exam grade * 0.45, Final exam grade * 0.6) + Lab grade * 0.4
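The formula above can be sketched as a small Python function. The function and parameter names are hypothetical, the 0 to 10 marks in the usage lines are an assumed scale, and treating a missing in-term exam as `None` is an assumption of this sketch; `term_exam` stands for the grade of the exam held during class hours.

```python
def final_grade(term_exam, final_exam, lab):
    """Combine the three marks following the formula above.

    term_exam: grade of the exam held during class hours, or None if the
               student did not sit it (assumed convention for this sketch).
    final_exam: grade of the final exam.
    lab: lab grade.
    """
    # Exam component when only the final exam counts (60% of the grade)
    exam_only = final_exam * 0.6
    if term_exam is None:
        exam_component = exam_only
    else:
        # The in-term exam counts only when it improves the exam component
        exam_component = max(term_exam * 0.15 + final_exam * 0.45, exam_only)
    # The lab grade always contributes the remaining 40%
    return exam_component + lab * 0.4


# Usage with assumed marks on a 0 to 10 scale
helped = final_grade(8, 6, 7)      # in-term exam raises the exam component
ignored = final_grade(4, 6, 7)     # in-term exam is lower, so it is ignored
absent = final_grade(None, 6, 7)   # in-term exam not taken
```

Because of the `max`, a weak result in the in-term exam never lowers the final grade relative to being assessed on the final exam alone.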
Students must have knowledge of:
- Knowledge representation
- Problem-solving techniques
- Basic concepts of natural language processing
- Concepts of formal languages (specifically, finite automata, regular languages, and context-free languages)
Accordingly, students must have taken the Artificial Intelligence and Theory of Computation courses before they can take this one.
Students should also have taken the course on compilers.