Saltar al contingut Menu
Map
  • Home
  • Information
  • Contact
  • Map

Information Retrieval (RI)

Credits Dept.
7.5 (6.0 ECTS) CS

Instructors

Person in charge:  (-)
Others:(-)

General goals

Understanding the problems of retrieving information. Knowing the different components of an information retrieval system, the factors and techniques that can optimize the process, and how to use and adapt them. Knowing some applications of these systems, at least in bioinformatics and the Web.

Specific goals

Knowledges

  1. Understand the problems associated with information storage and retrieval, particularly text information.
  2. Understand that effective information search and retrieval is closely linked with the way the information is organised and described.
  3. Understand the most important algorithms and techniques for searching for patterns in textual information.
  4. Describe information retrieval techniques in biocomputing and Web applications.

Abilities

  1. Ability to decide which information recovery techniques are likely to be the most effective in a given information system (particularly in relation to text information).
  2. In particular, ability to decide which information search and retrieval techniques are likely to be the most effective in simple applications for biocomputing and Web fields.
  3. Ability to employ various criteria for assessing the effectiveness and usefulness of an information retrieval system.
  4. Ability to implement basic techniques (algorithms and data structures) for retrieving information.
  5. Ability to use and adapt a tool to manage textual information, like Lucene.

Competences

  1. Ability to apply mathematical knowledge and logic in solving problems.
  2. Ability to create and use models of reality.
  3. Ability to design and carry out experiments and analyse the results.
  4. Ability to design systems, components and processes meeting certain needs, using the most appropriate methods, techniques and tools in each case.
  5. Ability to take take decisions when faced with uncertainty or contradictory requirements
  6. Ability to act independently: Know how to work on one"s own with just the bare minimum of knowledge and guidance.
  7. Ability to learn on one"s own.

Contents

Estimated time (hours):

T P L Alt Ext. L Stu A. time
Theory Problems Laboratory Other activities External Laboratory Study Additional time

1. Introduction
T      P      L      Alt    Ext. L Stu    A. time Total 
2,0 1,0 2,0 0 1,0 2,0 0 8,0
Need for information retrieval techniques. Information retrieval vs. databases. The information retrieval process. Preprocessing and lexical analysis.

2. Information retrieval models
T      P      L      Alt    Ext. L Stu    A. time Total 
4,0 4,0 2,0 0 2,0 10,0 0 22,0
Formal definition and basic concepts: Abstract document models and query languages. Boolean model. Vector model. Latent semantic indexing.

3. Indexing and searching, implementation
T      P      L      Alt    Ext. L Stu    A. time Total 
2,0 2,0 0 0 0 4,0 0 8,0
Inverted files. Index compression. Example: Efficient implementation of the cosine rule with tf-idf. Example: Lucene.

4. Evaluation in information retrieval
T      P      L      Alt    Ext. L Stu    A. time Total 
2,0 2,0 2,0 0 4,0 4,0 0 14,0
Precision and Recall. Other measures of performance. Reference collections. Relevance feedback and query expansion.

5. Applications to the Web
T      P      L      Alt    Ext. L Stu    A. time Total 
8,0 6,0 0 0 0 8,0 0 22,0
Ranking and relevance in Web models. PageRank algorithm. Architecture of web searchers. Web crawling. Link-based analysis of social networks.

6. Sequential search and indexed search
T      P      L      Alt    Ext. L Stu    A. time Total 
8,0 8,0 3,0 0 2,0 16,0 0 37,0
Pattern search. Algorithms for approximate search and exact search. Hidden Markov models. Tries. Inverted files, suffix tree. Construction algorithms, use and analysis.

7. Applications to biocomputing
T      P      L      Alt    Ext. L Stu    A. time Total 
6,0 6,0 3,0 0 2,0 15,0 0 32,0
DNA chain patterns. Sequence similarity. DNA sequencing. DNA databases.


Total per kind T      P      L      Alt    Ext. L Stu    A. time Total 
32,0 29,0 12,0 0 11,0 59,0 0 143,0
Avaluation additional hours 5,0
Total work hours for student 148,0

Docent Methodolgy

The lab classes will implement variations of the algorithms seen in the theory and problem sessions, or will apply them to search for information in lifelike situations.







Students may be required to prepare for some of the lab sessions. Some of them will require the drafting of a short report or submission of code. This work will count towards student assessments.



Nowadays we are using the Lucene package.

Evaluation Methodgy

There will be a first part exam at about half the course, and at the end the students can choose whether to take a second part exam or a final exam of the whole course.



The lab grade will be based on the reports or the programmes submitted after the lab sessions.







The course note is calculated as follows:





Students who choose to take the second part exam:



0.2 * lab note

+ 0.4 * 1st part exam

+ 0.4 * 2nd part exam



Students who choose to take the final exam:



0.2 * lab note

+ max (0.2 * 1st part exam + 0.6 * final exam, 0.8 * final exam)

Basic Bibliography

  • Ricardo Baeza-Yates, Berthier Ribeiro-Neto. Modern information retrieval, Addison-Wesley, 1999.
  • Ian H. Witten, Alistair Moffat, Timothy C. Bell Managing gigabytes : compressing and indexing documents and images, Morgan Kauffman, 1999.
  • Gonzalo Navarro, Matthieu Raffinot Flexible pattern matching in strings : practical on-line search algorithms for texts and biological sequences, Cambridge University Press, 2002.
  • Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. Introduction to Information Retrieval, Cambridge University Press, 2008.
  • Dan Gusfield Algorithms on strings, trees, and sequences: computer science and computational biology , Cambridge University Press, 1997.

Complementary Bibliography

  • Richard K. Belew Finding out about : a cognitive perspective on search engine technology and the WWW, Cambridge University Press, 2000.
  • Otis Gospodnetic, Erik Hatcher Lucene in action, Manning, 2005.
  • Zdravko Markov, Daniel T. Larose Data mining the web, Wiley Interscience, 2007.

Web links

  1. http://lucene.apache.org/java/docs/


  2. http://en.wikipedia.org/wiki/Information_Retrieval


  3. http://www-csli.stanford.edu/~schuetze/information-retrieval-book.html


  4. http://www-pagines.fib.upc.es/~ri/


Previous capacities

Ability to produce medium-sized programmes, preferably of object-oriented nature.

Ability to design and analyze simple data structures.

Know the difference between main memory and secondary memory and its impact on the program's efficiency.


Compartir

 
logo FIB © Barcelona school of informatics - Contact - RSS
This website uses cookies to offer you the best experience and service. If you continue browsing, it is understood that you accept our cookies policy.
Classic version Mobile version