Boltzmann Machines are probabilistic models developed in 1985 by D.H. Ackley, G.E. Hinton and T.J. Sejnowski. In 2006, Restricted Boltzmann Machines (RBMs) were used in the pre-training step of several successful deep learning models, leading to a renaissance of neural networks and artificial intelligence. In spite of their elegant mathematical formulation, several of the quantities involved are hard to compute:
- The computation of the partition function is NP-hard, since it involves a sum over an exponential number of terms
- The exact computation of the derivative of the log-likelihood is also NP-hard, since it contains the derivative of the partition function
Therefore, in practice we have to approximate both the computation of the probabilities and several components of the learning process itself. These drawbacks have prevented RBMs from showing their real potential as truly probabilistic models. We are currently working on several of the unsolved issues related to RBMs:
- Mechanisms to control the learning process: www.lsi.upc.edu/%7Eeromero/Publications/Downloads/2018-tnnls-stopcritRBM.pdf
- Better approximations of the derivative of the log-likelihood: http://www.lsi.upc.edu/%7Eeromero/Publications/Downloads/2019-nn-weightedCD.pdf
- Efficient approximation of the partition function (work in progress)
This work has opened new lines of research, some of which can be the topic of a Master's Thesis. The scope and depth of the work can be adapted to the time available to complete the Thesis. For further details, contact Enrique Romero.
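The cost of the partition function mentioned for RBMs can be made concrete on a toy model. The following is a minimal sketch, assuming a binary RBM with energy E(v, h) = -a.v - b.h - v^T W h; the function names and the small parameter values are illustrative and are not taken from the cited papers. It computes the partition function both by enumerating every joint state and by summing out the hidden units analytically, which still leaves a number of terms exponential in the number of visible units.

```python
import itertools
import math


def energy(v, h, W, a, b):
    """RBM energy E(v, h) = -a.v - b.h - v^T W h for binary units."""
    visible_term = sum(ai * vi for ai, vi in zip(a, v))
    hidden_term = sum(bj * hj for bj, hj in zip(b, h))
    interaction = sum(
        v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    return -(visible_term + hidden_term + interaction)


def partition_function_brute_force(W, a, b):
    """Sum exp(-E) over all 2**(n_visible + n_hidden) joint states."""
    total = 0.0
    for v in itertools.product((0, 1), repeat=len(a)):
        for h in itertools.product((0, 1), repeat=len(b)):
            total += math.exp(-energy(v, h, W, a, b))
    return total


def partition_function_marginalised(W, a, b):
    """Sum out hidden units in closed form; 2**n_visible terms remain."""
    total = 0.0
    for v in itertools.product((0, 1), repeat=len(a)):
        term = math.exp(sum(ai * vi for ai, vi in zip(a, v)))
        for j in range(len(b)):
            activation = b[j] + sum(v[i] * W[i][j] for i in range(len(a)))
            term *= 1.0 + math.exp(activation)
        total += term
    return total


# Illustrative parameters: 3 visible units, 2 hidden units (assumed values).
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
a = [0.1, -0.1, 0.0]
b = [0.2, -0.3]

z_full = partition_function_brute_force(W, a, b)
z_marginal = partition_function_marginalised(W, a, b)
print(z_full, z_marginal)
```

Both functions return the same value, but each additional visible unit doubles the work, which is why approximations are needed for models of realistic size.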
This project is a continuation of two previous master's theses developed in the second terms of the academic years 2017-2018 and 2018-2019. In the framework of the Tokyo2020 Olympic Games Weather Project, led by the company TriM s.r.l. and funded by the Austrian Sailing Federation, Croatia and Cyprus Laser Olympic classes, a large amount of data has been, and is still being, collected at sea through real-time sensors during training and racing, and stored in a cloud database. The collaboration with UPC is aimed at developing a data analysis methodology able to support sailors' decisions during Olympic Games races. Sailing strategy and performance are strongly related to environmental parameters such as weather, oceanic current and geographical data. A thorough prediction of the conditions expected during a sailing race is valuable information for a sailor, as it completely conditions his/her tactics during the race. With the aim of developing a decision support system valid for the Olympic Classes Sailing Venues, the following components will be developed and integrated into a single web-based platform:
1. Wind component
2. Waves component
3. Oceanic current component
4. Boat Performance component
The present master's thesis proposal concerns the wind component. The goal is to develop a methodology/procedure to analyse the 'recorded wind dataset' and to recognize significant wind patterns, in other words characteristic features of the wind speed and direction related to the other weather parameters of the day (air pressure, air and water temperature, etc.) and to the geographical position of the specific racing area. A similar analysis should be performed on the 'weather prediction model dataset', for instance on the wind parameter of the model, to find correlations between predicted and measured values. This step is fundamental for the validation of the weather model that will be used daily during the Olympic Games.
Traditional classification schemes for wind patterns are based on meteorological experience and manual analysis of synoptic weather charts. However, the work performed during the two previous master's theses has demonstrated that approaches based on clustering analysis of the collected data, which automatically induce wind patterns together with their characteristic features and their evolution through the day, are a very valuable support for human decisions. Moreover, since measurements are taken at different locations around the race areas, the automatic clustering methods could also find different behaviours of the same wind pattern depending on the specific area. All this would allow:
- A detailed analysis to determine the representativeness of the wind fields encountered in the measuring period, their frequency of occurrence, timing, rate of evolution, and transition probabilities.
- Consequently, a more thorough prediction of the conditions expected before a sailing race, which, as mentioned, is highly valuable information for the sailor.
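The frequency of occurrence and the transition probabilities listed above can be estimated directly from a sequence of pattern labels. The following is a minimal sketch, assuming that each day has already been assigned a single wind-pattern label by the clustering step; the label names and the example sequence are hypothetical and do not come from the project data.

```python
from collections import Counter, defaultdict


def pattern_frequencies(labels):
    """Relative frequency of each wind pattern in the observed period."""
    total = len(labels)
    return {label: count / total for label, count in Counter(labels).items()}


def transition_probabilities(labels):
    """Estimate P(next pattern | current pattern) from consecutive days."""
    counts = defaultdict(Counter)
    for current, following in zip(labels, labels[1:]):
        counts[current][following] += 1
    return {
        state: {
            following: n / sum(followers.values())
            for following, n in followers.items()
        }
        for state, followers in counts.items()
    }


# Hypothetical sequence of daily pattern labels (one label per day).
daily_labels = [
    "sea_breeze", "sea_breeze", "gradient",
    "sea_breeze", "gradient", "gradient",
]

frequencies = pattern_frequencies(daily_labels)
transitions = transition_probabilities(daily_labels)
print(frequencies)
print(transitions)
```

With so few days the estimates are very noisy; on real data one would also want to check how many transitions support each estimated probability.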
During the previous master's projects the clustering module was fully developed and applied both to data coming from weather prediction models and to data collected at sea, using traditional clustering techniques. However, it was also observed that these data are recorded daily and behave as time series, and that the wind patterns may depend on additional parameters. The current project therefore aims at:
- Applying more advanced and novel sequential clustering techniques to these data in order to extract more complex and realistic wind patterns
- Comparing/combining the results obtained with both data sets (weather models and data measured on the sea)
- Applying additional machine learning techniques in order to take into account available parameters (such as geolocation)
- Generalising the environment so that the developed algorithm can be used not only in Tokyo, but in any particular location where wind patterns are significant and data are available
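As a point of reference for the traditional clustering baseline that the project aims to go beyond, the following is a minimal sketch of k-means applied to daily wind profiles. It is an assumption that k-means is a representative baseline; the source does not name the algorithms used in the previous theses. Wind direction is taken in the meteorological convention (degrees the wind blows from) and converted to (u, v) components so that directions such as 359 and 1 degrees end up close together. All sample values, function names and initial centroids are illustrative.

```python
import math


def wind_to_uv(speed, direction_deg):
    """Convert speed and meteorological direction (blowing FROM) to (u, v)."""
    theta = math.radians(direction_deg)
    return (-speed * math.sin(theta), -speed * math.cos(theta))


def daily_profile(samples):
    """Flatten one day's (speed, direction) samples into a feature vector."""
    features = []
    for speed, direction in samples:
        features.extend(wind_to_uv(speed, direction))
    return features


def squared_distance(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def kmeans(points, initial_centroids, iterations=20):
    """Plain k-means with caller-supplied initial centroids."""
    centroids = [list(c) for c in initial_centroids]
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Move each centroid to the mean of its members (if it has any).
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical data: four days, three (speed, direction) samples per day.
days = [
    [(5, 350), (6, 355), (7, 5)],      # light northerly
    [(5, 10), (6, 0), (7, 358)],       # light northerly
    [(12, 180), (14, 185), (15, 190)],  # stronger southerly
    [(11, 175), (13, 180), (16, 185)],  # stronger southerly
]
profiles = [daily_profile(day) for day in days]
labels, centroids = kmeans(profiles, [profiles[0], profiles[2]])
print(labels)
```

This treats each day as an unordered feature vector and ignores the order between days, which is the kind of limitation that sequential clustering techniques are meant to address.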
The data collected at sea are in the format provided by Texys Marine, the company that built the on-board wind measurement system. The weather prediction model datasets are in the format used by operational meteorological centres, and suggested by the World Meteorological Organization, for the storage and exchange of gridded weather and climate model fields (grib2 or NetCDF).
Although this is a modality A thesis, interaction/collaboration with the company TriM is highly possible depending on the evolution of the work.
We propose that one or more students work on processing techniques based on Deep Learning (Convolutional Neural Networks, Generative Adversarial Networks, Semantic Segmentation Networks) to detect and classify marine mammals in photographs and satellite imagery. The computational capacity offered by these new tools will allow the scientific community to study endangered species better and to respond adequately and rapidly to the current biodiversity crisis.
A detailed description of the project can be found in the following link:
In recent years the volume of information available electronically has increased exponentially, giving rise to the term Big Data to refer to this phenomenon. The medical domain is an area in which the number of documents generated by primary care centers is constantly increasing. However, a bottleneck arises because processing these documents requires specialized personnel performing the tasks manually. In the framework of the GraphMed research project, we are developing a set of processors for the automatic analysis of medical texts that meet criteria of robustness, high precision and coverage. In particular, this thesis would aim at the extraction of semantic graphs from medical records and the acquisition of patterns of clinical behavior.
The medical records of each patient contain textual information about the clinical evolution of the patient (including drugs, chemicals, diseases, symptoms and body parts). The analysis of this information can be of significant interest for future clinical practice. Therefore, a methodology able to build semantic graphs in which that information is represented in a structured format, and to acquire patterns of clinical behaviour from them, is of great interest to the medical community in primary care. For example, we could automatically infer that a certain drug has a previously unknown side effect, or that patients suffering from a certain disease develop certain symptoms within a certain period of time.
Since we have an annotated corpus of medical records, the project will use supervised and semi-supervised machine learning techniques.
Although this is a modality A thesis, remunerated collaboration with the GraphMed project is highly possible depending on the evolution of the work.