Languages follow many statistical regularities called laws. Perhaps the most popular example is Zipf's law for word frequencies, which relates the frequency of a word to its rank, but other laws have been formulated, such as the law of abbreviation, the law of meaning-distribution, and the meaning-frequency law (Zipf 1949). About 15 years ago, a family of optimization models was introduced to shed light on the origins of Zipf's law for word frequencies (Ferrer-i-Cancho & Solé 2003, Ferrer-i-Cancho 2005). In that family, language is modelled as a bipartite graph where words connect to meanings, and a cost function is defined based on the structure of that graph. A simple Monte Carlo algorithm was used to minimize the cost function while the structure of the graph was allowed to vary. Recently, it has been shown that these models also shed light on how children learn words (Ferrer-i-Cancho 2017). The aim of this project is to investigate new versions of these models (e.g., Ferrer-i-Cancho & Vitevitch 2018) in two directions: (1) Providing an efficient implementation of the optimization algorithm. (2) Comparing the statistical properties of the model against the statistical properties of natural communication systems.
In greater detail, the two directions consist of:
(1) Providing an efficient implementation of the optimization algorithm. See Ferrer-i-Cancho and Solé (2003) and Ferrer-i-Cancho (2005) for further details about the algorithm. Evaluating the cost for a given bipartite graph from scratch takes time of the order of nm, where n is the number of words and m is the number of meanings. Deciding when to stop the optimization algorithm requires (nm)^2 evaluations of the cost function (in practice this had to be cut down to about nm due to computational costs). For these reasons, n and m have been kept small in previous studies compared to real values in fully fledged human language (e.g., n = m = 150 in Ferrer-i-Cancho and Solé 2003). This computational challenge could be addressed by applying different techniques, e.g., (a) parallelization, (b) dynamic calculation (when only a few cells of the adjacency matrix change, the cost function should not be recomputed from scratch), and (c) heuristics to speed up the Monte Carlo scheme.
(2) Comparing the statistical properties of the model against the real statistical properties of human language (e.g., linguistic laws) and animal communication, including properties that have not been tested in previous research on these models. See Ferrer-i-Cancho (2018) for an overview of some of the statistical properties of real language that could be tested.
Depending on the personal interests of the student, the project can focus on one of the two directions.
It is possible to publish the results of the project in a research journal.
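The dynamic calculation mentioned in direction (1) can be illustrated with a small sketch. It is not the implementation used in the cited papers. It assumes a cost of the form Ω(λ) = −λ I(S,R) + (1−λ) H(S), with the joint probability of a word and a meaning taken as a_ij / M (M being the number of links), and it uses a simplified greedy acceptance rule with a fixed number of trials instead of the stopping criterion described above. The names `cost_from_scratch`, `IncrementalCost` and `minimize` are hypothetical. Under these assumptions, the cost depends on the graph only through M and two sums over word and meaning degrees, so flipping one cell of the adjacency matrix can be handled in constant time.

```python
import math
import random


def xlogx(k):
    """k * log(k), with the usual convention 0 * log(0) = 0."""
    return k * math.log(k) if k > 0 else 0.0


def cost_from_scratch(a, lam):
    """O(n*m) evaluation, ASSUMING p(s_i, r_j) = a[i][j] / M and
    cost = -lam * I(S,R) + (1 - lam) * H(S)."""
    n, m = len(a), len(a[0])
    M = sum(map(sum, a))
    if M == 0:
        return 0.0  # convention chosen here for the empty graph
    p_s = [sum(a[i]) / M for i in range(n)]
    p_r = [sum(a[i][j] for i in range(n)) / M for j in range(m)]
    H_S = -sum(p * math.log(p) for p in p_s if p > 0)
    I = sum((1 / M) * math.log((1 / M) / (p_s[i] * p_r[j]))
            for i in range(n) for j in range(m) if a[i][j])
    return -lam * I + (1 - lam) * H_S


class IncrementalCost:
    """Keeps M, X = sum_i mu_i log mu_i and Y = sum_j omega_j log omega_j,
    so that H(S) = log M - X/M and I(S,R) = log M - (X + Y)/M."""

    def __init__(self, a, lam):
        self.a = [row[:] for row in a]  # private copy of the matrix
        self.lam = lam
        self.mu = [sum(row) for row in self.a]           # word degrees
        self.omega = [sum(col) for col in zip(*self.a)]  # meaning degrees
        self.M = sum(self.mu)
        self.X = sum(xlogx(k) for k in self.mu)
        self.Y = sum(xlogx(k) for k in self.omega)

    def cost(self):
        if self.M == 0:
            return 0.0
        log_M = math.log(self.M)
        H_S = log_M - self.X / self.M
        I = log_M - (self.X + self.Y) / self.M
        return -self.lam * I + (1 - self.lam) * H_S

    def flip(self, i, j):
        """Toggle a[i][j] and update all running sums in O(1)."""
        d = 1 - 2 * self.a[i][j]  # +1 adds a link, -1 removes it
        self.X += xlogx(self.mu[i] + d) - xlogx(self.mu[i])
        self.Y += xlogx(self.omega[j] + d) - xlogx(self.omega[j])
        self.mu[i] += d
        self.omega[j] += d
        self.M += d
        self.a[i][j] += d


def minimize(a, lam, trials, rng):
    """Greedy Monte Carlo sketch: flip one random cell per trial and
    keep the change only if the cost strictly decreases."""
    state = IncrementalCost(a, lam)
    best = state.cost()
    for _ in range(trials):
        i = rng.randrange(len(state.a))
        j = rng.randrange(len(state.a[0]))
        state.flip(i, j)
        new = state.cost()
        if new < best:
            best = new
        else:
            state.flip(i, j)  # undo the rejected change
    return state.a, best
```

With this bookkeeping, one trial costs O(1) instead of O(nm). The running sums accumulate floating-point error over long runs, so an occasional recomputation from scratch is a reasonable safeguard.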
Ferrer-i-Cancho, R. & Solé, R. V. (2003). Least effort and the origins of scaling in human language. Proceedings of the National Academy of Sciences USA 100, 788-791.
Ferrer-i-Cancho, R. (2005). Zipf's law from a communicative phase transition. European Physical Journal B 47, 449-457.
Ferrer-i-Cancho, R. (2017). The optimality of attaching unlinked labels to unlinked meanings. Glottometrics 36, 1-16.
Ferrer-i-Cancho, R. & Vitevitch, M. S. (2018). The origins of Zipf's meaning-frequency law. Journal of the American Society for Information Science and Technology 69 (11), 1369-1379.
Ferrer-i-Cancho, R. (2018). Optimization models of natural communication. Journal of Quantitative Linguistics 25 (3), 207-237.
Zipf, G.K. (1949). Human behaviour and the principle of least effort. Cambridge (MA), USA: Addison-Wesley.
The master thesis consists of developing a framework for Group Recommender Systems and investigating the methods for generating recommendations to groups.
Most of the research on group recommendation has investigated the core algorithms used for recommendation generation. Two strategies have mostly been used for generating group recommendations: aggregating individual predictions into group predictions, or aggregating individual models into group models. These strategies differ in the timing of the data aggregation step.
In fact, the role of a group recommender system is to make suggestions that reflect the preferences of the group as a whole, while offering reasonable and acceptable options to individual group members. An important issue to be addressed in this kind of recommenders is how to reach consensus among members during and at the end of the recommendation process.
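The two aggregation strategies described above can be sketched in a few lines. This is only an illustration and does not use the LibRec API. The function names and the dictionary-based data layout are assumptions, and "average" and "least misery" are just two commonly used ways of combining individual scores.

```python
from statistics import mean


def aggregate_predictions(member_predictions, strategy="average"):
    """Late aggregation: combine per-member predicted ratings.

    member_predictions maps member -> {item: predicted rating}.
    Only items predicted for every member are scored.
    """
    combine = {"average": mean, "least_misery": min}[strategy]
    items = set.intersection(*(set(p) for p in member_predictions.values()))
    return {
        item: combine([p[item] for p in member_predictions.values()])
        for item in items
    }


def aggregate_profiles(member_ratings):
    """Early aggregation: merge observed ratings into one pseudo-user.

    member_ratings maps member -> {item: observed rating}. The merged
    profile (mean rating per item) would then be passed to any
    single-user recommender as if it belonged to one user.
    """
    merged = {}
    for ratings in member_ratings.values():
        for item, rating in ratings.items():
            merged.setdefault(item, []).append(rating)
    return {item: mean(values) for item, values in merged.items()}
```

In the first function the recommender runs once per member and aggregation happens on its output. In the second, aggregation happens on the input and the recommender runs once for the whole group.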
In this proposal, the focus will be on generating a group recommender framework that will be based on the well-known LibRec library. The main steps will be:
Boltzmann Machines are probabilistic models developed in 1985 by D.H. Ackley, G.E. Hinton and T.J. Sejnowski. In 2006, Restricted Boltzmann Machines (RBMs) were used in the pre-training step of several successful deep learning models, leading to a renaissance of neural networks and artificial intelligence. In spite of their nice mathematical formulation, several quantities are hard to compute:
- The computation of the partition function is NP-hard, involving a sum over an exponential number of terms
- The exact computation of the derivative of the log-likelihood is also NP-hard, since it contains the derivative of the partition function
Therefore, in practice we have to approximate both the computation of the probabilities and several components of the learning process itself. These drawbacks have prevented RBMs from showing their real potential as truly probabilistic models. Currently, we are working on several of the unsolved issues related to RBMs:
- Mechanisms to control the learning process www.lsi.upc.edu/%7Eeromero/Publications/Downloads/2018-tnnls-stopcritRBM.pdf
- Better approximations of the derivative of the log-likelihood http://www.lsi.upc.edu/%7Eeromero/Publications/Downloads/2019-nn-weightedCD.pdf
- Efficient approximation of the partition function (work in progress)
These works have opened new lines of research, some of which can be the topic of a Master's Thesis. The scope and depth of the work can be adapted to the estimated time available to complete the Thesis. For further details, contact Enrique Romero.
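To make the cost of the partition function concrete, the following sketch computes it exactly for a very small binary RBM. It assumes the standard energy E(v, h) = −b·v − c·h − vᵀWh with units in {0, 1}. The function names are hypothetical and the code is not taken from the papers linked above. The first function enumerates all 2^(n_v + n_h) joint states. The second sums the hidden units out analytically and still needs 2^(n_v) terms, which is why exact computation is feasible only for toy sizes.

```python
import itertools
import math


def partition_brute_force(W, b, c):
    """Sum exp(-E(v, h)) over every joint binary state (v, h)."""
    nv, nh = len(b), len(c)
    Z = 0.0
    for v in itertools.product((0, 1), repeat=nv):
        for h in itertools.product((0, 1), repeat=nh):
            energy = -(
                sum(b[i] * v[i] for i in range(nv))
                + sum(c[j] * h[j] for j in range(nh))
                + sum(v[i] * W[i][j] * h[j]
                      for i in range(nv) for j in range(nh))
            )
            Z += math.exp(-energy)
    return Z


def partition_sum_over_visible(W, b, c):
    """Same quantity with hidden units summed out in closed form:
    Z = sum_v exp(b.v) * prod_j (1 + exp(c_j + sum_i v_i W_ij))."""
    nv, nh = len(b), len(c)
    Z = 0.0
    for v in itertools.product((0, 1), repeat=nv):
        term = math.exp(sum(b[i] * v[i] for i in range(nv)))
        for j in range(nh):
            activation = c[j] + sum(v[i] * W[i][j] for i in range(nv))
            term *= 1.0 + math.exp(activation)
        Z += term
    return Z
```

Each additional visible unit doubles the number of terms in either function, so these routines are useful mainly as a ground truth for checking approximations on small models.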
We invite one or more students to work on Deep Learning processing techniques (Convolutional Neural Networks, Generative Adversarial Networks, Semantic Segmentation Networks) to detect and classify marine mammals in photographs and satellite imagery. The computational capacity offered by these new tools will allow the scientific community to better study endangered species and to respond adequately and rapidly to the current biodiversity crisis.
A detailed description of the project can be found in the following link:
The goal of this project is to analyze state-of-the-art metrics and combine them to quantify the equality of learning opportunities among learners, and to mitigate the inequalities generated by recommender systems through a post-processing approach that balances personalization and equality of learning opportunities in recommendations.
Online educational platforms are promising to play a primary role in mediating the success of individuals' careers. Hence, while building overlying content recommendation services, it becomes essential to ensure that learners are provided with equal learning opportunities, according to the platform values, context, and pedagogy. Even though the importance of creating equality of learning opportunities has been well investigated in traditional institutions, how it can be operationalized scalably in online learning ecosystems through recommender systems is still under-explored.
The goal of this project is to analyze state-of-the-art metrics and combine them to quantify the equality of learning opportunities among learners, and to mitigate the inequalities generated by recommender systems through a post-processing approach that balances personalization and equality of learning opportunities in recommendations. To do so, we will consider state-of-the-art recommender systems, covering both model-based and memory-based approaches and both point-wise and pair-wise algorithms.
The goal of this project is to analyze and mitigate bias in collaborative filtering recommender systems.
Recommender systems analyze the behavior of users and their preferences to learn patterns and understand what might be interesting to suggest to them. Natural imbalances in the data (e.g., in the number of observations for a subset of popular items) might be embedded in the patterns. As a consequence, the produced recommendations exacerbate these imbalances, thus strengthening inequalities and generating biases. When these imbalances are associated with sensitive attributes of the users (e.g., gender or race), this might have negative societal consequences, such as unfairness. Unfairness might affect all the stakeholders involved in a recommender system, such as the users (when the minority systematically receives worse recommendations) or the content providers (when the items offered by a group of providers are exposed less than those of their counterparts).
The goal of this project is to analyze and mitigate bias in recommender systems. To do so, we will consider state-of-the-art recommender systems, covering both model-based and memory-based approaches and both point-wise and pair-wise algorithms.
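One simple way to check whether recommendations amplify the popularity imbalance described above is to compare how concentrated item counts are in the training interactions and in the recommended lists. The sketch below uses the Gini coefficient for this purpose. This choice of measure, the function names and the data layout are assumptions made for illustration, not the metrics prescribed by the project.

```python
from collections import Counter


def gini(counts):
    """Gini coefficient of non-negative counts.

    0 means every item has the same count; values near 1 mean a few
    items concentrate almost all interactions or exposure.
    """
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def item_counts(item_lists, catalog):
    """Count how often each catalog item appears across lists of items
    (interaction histories or recommendation lists); unseen items get 0."""
    counts = Counter(item for items in item_lists for item in items)
    return [counts[item] for item in catalog]
```

Computing `gini(item_counts(histories, catalog))` and `gini(item_counts(recommendations, catalog))` for the same catalog gives two comparable numbers. A clearly higher value for the recommendations would suggest that the recommender concentrates exposure more than the data it was trained on.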
Himan Abdollahpouri, Gediminas Adomavicius, Robin Burke, Ido Guy, Dietmar Jannach, Toshihiro Kamishima, Jan Krasnodebski, and Luiz Augusto Pizzato. 2020. Multistakeholder recommendation: Survey and research directions. User Model. User-Adapt. Interact. 30, 1 (2020), 127–158. https://doi.org/10.1007/s11257-019-09256-1
Ludovico Boratto, Gianni Fenu, and Mirko Marras. 2019. The Effect of Algorithmic Bias on Recommender Systems for Massive Open Online Courses. In Advances in Information Retrieval - 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14-18, 2019, Proceedings, Part I (Lecture Notes in Computer Science, Vol. 11437), Leif Azzopardi, Benno Stein, Norbert Fuhr, Philipp Mayr, Claudia Hauff, and Djoerd Hiemstra (Eds.). Springer, 457–472. https://doi.org/10.1007/978-3-030-15712-8_30
Sara Hajian, Francesco Bonchi, and Carlos Castillo. 2016. Algorithmic Bias: From Discrimination Discovery to Fairness-aware Data Mining. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, Balaji Krishnapuram, Mohak Shah, Alexander J. Smola, Charu C. Aggarwal, Dou Shen, and Rajeev Rastogi (Eds.). ACM, 2125–2126. https://doi.org/10.1145/2939672.2945386
Francesco Ricci, Lior Rokach, and Bracha Shapira. 2015. Recommender Systems: Introduction and Challenges. In Recommender Systems Handbook, Francesco Ricci, Lior Rokach, and Bracha Shapira (Eds.). Springer, 1–34. https://doi.org/10.1007/978-1-4899-7637-6_