Credits
6
Types
Compulsory
Requirements
This subject has no formal requirements, but it assumes some prior skills
Department
CS
Teachers
Person in charge
- Marta Arias Vicente ( marias@cs.upc.edu )
Others
- Jordi Turmo Borrás ( turmo@cs.upc.edu )
- Juan Luis Esteban Ángeles ( esteban@cs.upc.edu )
Weekly hours
Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6
Competences
Technical competencies
Transversals
Basic
Generic
Objectives
- Describe different models for evaluating similarity between texts, and how they apply to textual search. Decide which of the models is best suited to a specific scenario involving text search. Implement the models from scratch (in a very basic system) or on a highly scalable text indexing system.
Related competences: CE1, CE4, CE6, CE7, CT5, CT6, CT7, CG2, CG4, CG5, CB2, CB3
- Describe the advantages, for carrying out effective searches, of using the information given by links in hyperlinked structures such as the web, digital social networks, and the semantic web. Describe the main parameters used to characterize these linked structures. Reproduce the most commonly used algorithms to establish importance in these structures (e.g. PageRank), to discover structure in them (e.g. community discovery) and to improve the results of searches posed by a user. Implement these algorithms from scratch in a very basic system, or on top of massive data processing systems so that they can scale.
Related competences: CE1, CE4, CE6, CE7, CT5, CT6, CT7, CG2, CG4, CG5, CB2, CB3
- Evaluate the effectiveness of search systems in complex systems, describing it in terms of hard measures such as "recall" and "accuracy" but also in terms of soft measures such as user satisfaction, novelty and task completion. Adapt the operation and presentation of information search systems using methodically collected feedback on the user experience.
Related competences: CE1, CT4, CT5, CT6, CT7, CG3, CG4, CG5, CB2, CB3, CB4
- Define the recommendation problem and how it differs from other problems involving previously stored information (search, learning, ...). Describe the main approaches to the problem of item recommendation and the advantages and disadvantages of each one. Describe the main algorithms of each approach. Be able to implement basic versions from scratch, or advanced versions on top of massive data processing systems. Evaluate the effectiveness of recommendation systems, both in terms of hard measures and of soft measures such as user satisfaction. Decide on the most appropriate forms of recommendation for simple real scenarios, including the characterization of potential users.
Related competences: CE1, CE4, CE7, CT5, CT6, CT7, CG2, CG4, CG5, CB2, CB3, CB4
- Use known algorithmic paradigms to deal with data problems characterized by high volume and high speed. These include: streaming algorithms that process data streams with little time per element and little memory; algorithms to answer proximity queries, particularly on geolocated data; algorithms that use sampling to draw reliable conclusions about large volumes of data; integration of the techniques seen in the rest of the course with algorithmic techniques from other subjects, such as machine learning, clustering and pattern mining; techniques for dealing with sensitive data, such as anonymization and privacy-preserving machine learning; and consistent and distributed caching.
Related competences: CE1, CE4, CE7, CT5, CT6, CT7, CG2, CG4, CG5, CB2, CB3
- Integrate the techniques described in the previous objectives into a small but realistic project. Be able to design the architecture of a complex system and choose which of the techniques and technologies seen during the course to apply. The objective is not to finish the implementation of the system, but to reach a level of design detail that would allow its completion to be handed over to a programming team.
Related competences: CE1, CE4, CE6, CE7, CT4, CT5, CT7, CG2, CG3, CG4, CG5, CB2, CB4
- Evaluate, at an elementary level, the implications of the systems built in this subject in terms of privacy, security, ethics and people's rights. "At an elementary level" means being able to detect when these implications are significant enough to seek the opinion of an expert in the matter, particularly in relation to the GDPR (RGPD) and the need to carry out risk and impact analyses.
Related competences: CE7, CT5, CG4, CB2, CB3, CB4
Contents
- Search and analysis of text information
Boolean and vector models. Keyword-based search. Text preprocessing. Indexing. Evaluation of search strategies. Clustering and classification of texts. Generative models (LSI, LDA).
- Search and analysis in linked structures
The web: evaluation algorithms in hyperlinked structures. Crawling and scraping. Social networks: centrality measures. Communities. Influence. Semantic web.
- Recommendation
Recommender systems. Content-based recommendation and community-based recommendation (collaborative filtering). Practical considerations.
- Massive data algorithms
Sketches and data streams (streaming). Sampling. Proximity queries. Geolocated data. Consistent and distributed caching. Handling of sensitive data: anonymization, end-to-end encryption and privacy-preserving machine learning.
Activities
Activity Evaluation act
Activity on the content "Search and analysis of text information"
In theory sessions, the teacher presents the motivation and main concepts, and afterwards teacher and students jointly solve 2-3 consolidation problems. In laboratory sessions, students solve a case related to the content.
Objectives: 1 3 6 7
Contents:
Theory
6h
Problems
0h
Laboratory
6h
Guided learning
0h
Autonomous learning
12h
Activity on the content "Search and analysis in linked structures"
In theory sessions, the teacher presents the motivation and main concepts, and afterwards teacher and students jointly solve 2-3 consolidation problems. In laboratory sessions, students solve a case related to the content.
Theory: lecture format plus group problem solving
Theory
6h
Problems
0h
Laboratory
6h
Guided learning
0h
Autonomous learning
12h
Activity on the topic "Recommendation"
In theory sessions, the teacher presents the motivation and main concepts, and afterwards teacher and students jointly solve 2-3 consolidation problems. In laboratory sessions, students solve a case related to the content.
Objectives: 4 6 7
Contents:
Theory
4h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
8h
Activity on the content "Massive data algorithms"
In theory sessions, the teacher presents the motivation and main concepts, and afterwards teacher and students jointly solve 2-3 consolidation problems. In laboratory sessions, students solve a case related to the content.
Objectives: 5 6 7
Contents:
Theory
8h
Problems
0h
Laboratory
8h
Guided learning
0h
Autonomous learning
18h
Integration. Building real systems. Implications for privacy, security and people's rights.
In theory sessions, the teacher presents the motivation and main concepts, and afterwards teacher and students jointly solve 2-3 consolidation problems. In laboratory sessions, students solve a case related to the content.
Objectives: 6 7
Theory
4h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
8h
Teaching methodology
Expository "theory" classes given by the teacher. A number of exercises will be set, to be solved outside class before the next session.
"Theory" classes devoted to problem solving. The solutions to the exercises set in the preceding session(s) will be discussed together. Students will be expected to have attempted them.
"Laboratory" classes: working from a script handed out at the start of the session, students will carry out a computer-based task to consolidate the concepts seen in the "theory" classes. Typically this will be the implementation of, and experimentation with, an algorithm, or the analysis of a data set.
Evaluation methodology
P = midterm exam mark.
F = final exam mark.
L = lab session reports mark.
The final grade is computed as 25% P + 50% F + 25% L.
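As a quick illustration of the weighting above, here is a minimal sketch in Python. The function name and the sample marks are hypothetical, and a 0-10 marking scale is assumed; the syllabus itself only fixes the 25% / 50% / 25% weights.

```python
def final_grade(midterm: float, final_exam: float, lab: float) -> float:
    """Weighted grade: 25% midterm (P) + 50% final exam (F) + 25% lab reports (L)."""
    return 0.25 * midterm + 0.50 * final_exam + 0.25 * lab


# Hypothetical marks on an assumed 0-10 scale.
grade = final_grade(midterm=6.0, final_exam=7.0, lab=8.0)
print(f"{grade:.2f}")  # 1.5 + 3.5 + 2.0 = 7.00
```

Because the final exam carries half of the weight, one point gained or lost there moves the final grade twice as much as one point in the midterm or in the lab reports.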
The grade for the transversal competence CT6 (autonomous learning) will be computed from exam answers and/or from information in project reports on a topic, proposed by the instructor, that students will have to learn on their own.
Bibliography
Basic
- Modern information retrieval: the concepts and technology behind search
Baeza-Yates, R.; Ribeiro-Neto, B.
Addison-Wesley / Pearson, 2011.
ISBN: 9780321416919
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003938679706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Mining of massive datasets
Leskovec, J.; Rajaraman, A.; Ullman, J.D.
Cambridge University Press, 2020.
ISBN: 9781108476348
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004193679706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Everybody lies: what the internet can tell us about who we really are
Stephens-Davidowitz, S.
Bloomsbury Publishing, 2018.
ISBN: 9781408894736
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004177379706711&context=L&vid=34CSUC_UPC:VU1&lang=ca