Information Retrieval and Recommender Systems

Credits

6

Types

MIRI: Specialization complementary (Data Science)
MDS: Elective

Requirements

This subject has no formal requirements, but it does assume previous capacities.

Department

CS

The amount of information stored digitally in organizations, or collectively on the web, is now large enough that searching it is generally a complicated task. The field known as "Information Retrieval" develops methods to organize information so that it can later be found simply and efficiently. We will cover basic keyword-based techniques for searching textual information. Then we will examine search on the web, where hyperlinks can be used not only to direct the search but also to assess the interest value of each page, as the well-known PageRank algorithm does. We will see extensions of these techniques to social networks, where interactions among users can provide very useful information. Finally, we will study how the technologies known as Big Data and recommendation complement Information Retrieval techniques in contemporary systems.

Teachers

Person in charge

Ramon Ferrer Cancho

Weekly hours

Theory: 2
Problems: 1
Laboratory: 0.5
Guided learning: 0.44
Autonomous learning: 7.11

Competences

Transversal Competences

Information literacy

CT4 - Capacity for managing the acquisition, the structuring, analysis and visualization of data and information in the field of specialisation, and for critically assessing the results of this management.

Third language

CT5 - Achieving a level of spoken and written proficiency in a foreign language, preferably English, that meets the needs of the profession and the labour market.

Basic

CB6 - Ability to apply the acquired knowledge and capacity for solving problems in new or unknown environments within broader (or multidisciplinary) contexts related to their area of study.
CB7 - Ability to integrate knowledge and handle the complexity of making judgments based on information which, being incomplete or limited, includes considerations on social and ethical responsibilities linked to the application of their knowledge and judgments.
CB10 - Possess and understand knowledge that provides a basis or opportunity to be original in the development and/or application of ideas, often in a research context.

Generic Technical Competences

Generic

CG2 - Identify and apply methods of data analysis, knowledge extraction and visualization for data collected in disparate formats

Technical Competences

Specific

CE1 - Develop efficient algorithms based on the knowledge and understanding of the computational complexity theory and considering the main data structures within the scope of data science
CE11 - Analyze and extract knowledge from unstructured information using natural language processing techniques, text and image mining

Objectives

1. Information search and information processing in heterogeneous environments
Related competences: CB10, CT4, CT5, CE1, CE11, CG2
2. Recommender systems
Related competences: CB6, CB7, CT5, CE11, CG2
3. Advanced algorithms for data mining
Related competences: CB10, CB6, CT5, CE1, CE11

Contents

1. Introduction
The need for techniques to search and analyze massive amounts of information. Search and analysis vs. databases. The information retrieval process. Preprocessing and lexical analysis.
2. Models of information retrieval
Formal definition and basic concepts: abstract models of documents and query languages. Boolean model. Vector model. Latent Semantic Indexing.
3. Implementation: Indexing and searching
Inverted files and signature files. Index compression. Example: Efficient implementation of the cosine measure with tf-idf. Example: Lucene.
4. Evaluation in information retrieval
Recall and precision. Other performance measures. Reference collections. Relevance feedback and query expansion.
5. Web search
Ranking and relevance on the web. The PageRank algorithm. Crawling. Architecture of a simple web search system.
6. Architecture of massive information processing systems
Scalability, high performance, and fault tolerance: the case of large-scale web search engines. Distributed architectures. Example: Hadoop.
7. Network analysis
Descriptive parameters and characteristics of networks: degree, diameter, small-world networks, among others. Algorithms on networks: clustering, community detection, detection of influential nodes, and reputation, among others.
8. Information Systems based on massive information analysis. Combination with other technologies.
Search Engine Optimization. Joint use of IR techniques with Data Mining and Machine Learning. Recommender Systems.

Activities

Theoretical development of topics 1 to 8 of the course

The student will attend the instructor's presentation and actively participate in the initial discussion of the challenge to be solved in that session.
Objectives: 1 2 3
Contents:

1. Introduction
2. Models of information retrieval
3. Implementation: Indexing and searching
4. Evaluation in information retrieval
5. Web search
6. Architecture of massive information processing systems
7. Network analysis
8. Information Systems based on massive information analysis. Combination with other technologies.

Theory: 26h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 26h

Exercises on topics 1 to 8 of the course

In each session, the instructor proposes a number of exercises (say, 4 to 7) on the topic just covered in theory. A few of them (say, 3) are then solved jointly. Students must solve the remaining exercises and deliver them by the start of the next session. Part of each session is devoted to discussing questions that arose while solving the exercises pending from the previous session.
Objectives: 1 2 3
Contents:

1. Introduction
2. Models of information retrieval
3. Implementation: Indexing and searching
4. Evaluation in information retrieval
5. Web search
6. Architecture of massive information processing systems
7. Network analysis
8. Information Systems based on massive information analysis. Combination with other technologies.

Theory: 0h
Problems: 13h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 26h

Laboratory work on topics 1 to 8

The teacher will describe a practical assignment related to the topics most recently covered. This may be a data analysis task, the implementation of an algorithm seen in class, or a proposed solution for an Information Retrieval scenario. The student completes as much of the work as possible in class, although some additional time may occasionally be necessary. In many cases the student will have to produce a report on the work done and the results obtained, to be delivered by a clearly stated deadline (say, 2 weeks).
Objectives: 1 2 3
Contents:

1. Introduction
2. Models of information retrieval
3. Implementation: Indexing and searching
4. Evaluation in information retrieval
5. Web search
6. Architecture of massive information processing systems
7. Network analysis
8. Information Systems based on massive information analysis. Combination with other technologies.

Theory: 0h
Problems: 0h
Laboratory: 13h
Guided learning: 0h
Autonomous learning: 13h

Final exam

Final exam on the contents of the whole course
Objectives: 1 2 3
Week: 18

Theory: 3h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 15h

Study and presentation of a scientific paper

Study and presentation of a scientific paper related to the course topic
Objectives: 1 2 3

Theory: 0h
Problems: 0h
Laboratory: 0h
Guided learning: 3h
Autonomous learning: 10h

Teaching methodology

Theory and problem sessions take 3 hours per week. The first 2 hours are theoretical expositions, and the third is devoted to joint exercise solving. For each session, the student will have to deliver solutions to a few problems proposed but not solved in the previous session.

Laboratory sessions take 1 hour per week. For many of the sessions, the student will have to deliver a report on the work done and the results obtained after about two weeks.

How each type of session works is described in the "Activities" section.

Furthermore, at the end of the course each student must present a scientific paper related to the course topic to the instructors and fellow students, in the format of a conference presentation. Around week 8 of the course, a list of papers will be made public, from which each student can choose one or, alternatively, propose a paper of his/her choice, to be approved by the instructors. The date and time range for the presentations will be announced at least 2 months in advance, and the schedule within the chosen day at least 1 week in advance.

Evaluation methodology

Define:

- NF as the grade of the final exam
- NE as the grade of the exercise assignments
- NL as the grade of the lab reports
- NA as the grade of the presentation of a scientific article

(all in the range 0..10).

Then the final course grade is 0.3*NF + 0.25*NL + 0.25*NE + 0.2*NA.
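As an illustration only, the weighting above can be sketched in Python. The function name final_grade and its range check are assumptions made for this example, and the sample grades are invented; only the weights and the 0..10 range come from the definition above.

```python
def final_grade(nf: float, nl: float, ne: float, na: float) -> float:
    """Weighted course grade from the four components defined above.

    nf: final exam, nl: lab reports, ne: exercise assignments,
    na: scientific article presentation (all expected in 0..10).
    """
    for name, value in (("NF", nf), ("NL", nl), ("NE", ne), ("NA", na)):
        # Hypothetical safeguard: reject grades outside the stated range.
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be in the range 0..10, got {value}")
    return 0.3 * nf + 0.25 * nl + 0.25 * ne + 0.2 * na


# Invented sample grades: NF = 6, NL = 8, NE = 7, NA = 9.
print(round(final_grade(6, 8, 7, 9), 2))  # 7.35
```

Since the weights add up to 1, the resulting grade also lies in the range 0..10.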

Bibliography

Basic:

Modern information retrieval: the concepts and technology behind search - Baeza-Yates, R.; Ribeiro-Neto, B., Addison-Wesley / Pearson, 2011. ISBN: 9780321416919
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003938679706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Introduction to information retrieval - Manning, C.D.; Raghavan, P.; Schütze, H., Cambridge University Press, 2008. ISBN: 9780521865715
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003641259706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Search engines: information retrieval in practice - Croft, W.B.; Metzler, D.; Strohman, T., Pearson, 2010. ISBN: 9780131364899
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003969369706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Mining the social web: data mining Facebook, Twitter, LinkedIn, Instagram, Github, and more - Russell, M.A.; Klassen, M., O'Reilly Media, 2018. ISBN: 9781491973509
Lucene in action - McCandless, M.; Hatcher, E.; Gospodnetic, O., Manning, 2010. ISBN: 9781933988177
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003760299706711&context=L&vid=34CSUC_UPC:VU1&lang=ca

Web links

Course support website: http://www.cs.upc.edu/~IR-MIRI/

Previous capacities

Those assumed for admission to MIRI, plus those provided by the common learning phase.
