Machine Learning (ML) has taken the world by storm and has become a fundamental pillar of engineering. As a result, the last decade has witnessed explosive growth in the use of deep neural networks (DNNs) to exploit the advantages of ML in virtually every aspect of our lives: computer vision, natural language processing, medicine, and economics are just a few examples. However, NOT all DNNs fit all problems: convolutional NNs are well suited to computer vision, recurrent NNs to temporal analysis, and so on. In this context, the main focus of N3Cat and BNN-UPC is to explore the possibilities of a newer and less explored variant called Graph Neural Networks (GNNs), whose aim is to learn and model graph-structured data. This has huge implications in fields such as quantum chemistry, computer networks, and social networks, among others.

OBJECTIVES
===========
N3Cat and BNN-UPC are looking for students wanting to work in the area of Graph Neural Networks, studying their uses, processing architectures, and algorithms. To this end, the candidate will work on ONE of the following areas:
- Investigating the state of the art in this area, surveying the different works done in terms of applications, processing frameworks, algorithms, benchmarks, and datasets. This can be approached from a hardware or software perspective.
- Helping to build a testbed formed by a cluster of GPUs running PyTorch or TensorFlow. We will instrument the testbed to measure the computation workload and the communication flows between GPUs.
- Analyzing the communication workload of running a GNN, either on the testbed or by means of architectural simulations.
- Developing means of accelerating GNN processing in software (e.g., improving the scheduling of the message passing; see the sketch below) or hardware (e.g., designing a domain-specific architecture).
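To make the notion of message passing concrete, here is a minimal sketch in plain NumPy; the toy graph, the mean aggregation, and the tanh update are illustrative placeholders, not a specific architecture used in the project:

```python
import numpy as np

# Minimal GNN message-passing sketch (illustrative only).
# Each node carries a feature vector; one layer aggregates neighbour
# features (mean) and applies a linear transform plus nonlinearity.

def message_passing_layer(H, A, W):
    """One round of message passing.
    H: (n, d) node features, A: (n, n) adjacency matrix, W: (d, d) weights."""
    deg = A.sum(axis=1, keepdims=True) + 1e-9   # node degrees (avoid /0)
    M = (A @ H) / deg                           # mean of neighbour features
    return np.tanh(M @ W)                       # update step

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)          # toy 3-node graph
H = rng.normal(size=(3, 4))                     # initial node features
W = rng.normal(size=(4, 4))                     # layer weights

for _ in range(2):                              # two message-passing rounds
    H = message_passing_layer(H, A, W)
print(H.shape)                                  # (3, 4): updated embeddings
```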
Companies and scientists working in areas such as finance or genomics are generating enormously large datasets (on the order of petabytes), commonly referred to as Big Data. How to efficiently and effectively process such large amounts of data is an open research problem. Since communication is involved in Big Data processing at many levels, at the NaNoNetworking Center in Catalunya (N3Cat) we are currently investigating the potential role of wireless communications in the Big Data scenario. The main focus of the project is to evaluate the impact of applying wireless communication and networking methods to processors and data centers oriented to the management of Big Data.

OBJECTIVES
===========
N3Cat is looking for students wanting to work in the area of wireless communications for Big Data. To this end, the candidate will work on one of the following areas:
- Traffic analysis of Big Data frameworks and applications, as well as of smaller manycore systems.
- Channel characterization in Big Data environments: indoors, within the racks of a data center, within the package of a CPU, within a chip.
- Design of wireless communication protocols for computing systems, from the processor level to the data center level.
In many state-of-the-art simulation codes, the discretization is so closely tied to the data layout and solver that switching discretizations within the same code is not possible. Not only does this preclude the kind of comparison that is necessary for scientific investigation, but it also makes library development impossible. This project consists of implementing and verifying different topology strategies that treat all the different pieces of a 3D mesh (e.g., cells, faces, edges, and vertices) in exactly the same way. This allows the mesh interface to be very small and simple while remaining flexible and general. It also enables "dimension-independent programming", which means that the same algorithm can be used unchanged for meshes of different shapes and dimensions (a minimal sketch of this idea is given below). The project will use an existing parallel Python prototype and explore alternatives to improve its robustness and extend it without sacrificing flexibility. It will investigate various ways to optimize and parallelize Python programs for large-scale simulations on real-life production clusters. The project will be developed in the context of the PIXIL project (Interreg POCTEFA), which is coordinated by the geosciences applications group of the Barcelona Supercomputing Center.
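To illustrate what treating all mesh entities uniformly can look like, here is a minimal sketch in the spirit of a covering-relation (DMPlex-like) design; the Mesh class and the toy triangle are illustrative, not the project's actual prototype:

```python
# Minimal sketch of a dimension-independent mesh topology (illustrative).
# Every entity (cell, face, edge, vertex) is just an integer "point";
# the only structure is a covering relation: cone(p) = points that bound p.

class Mesh:
    def __init__(self, cone):
        self.cone = cone                 # point -> tuple of bounding points

    def support(self, p):
        """Inverse of cone: points that p bounds."""
        return [q for q, c in self.cone.items() if p in c]

    def closure(self, p):
        """p plus everything transitively bounding it, in any dimension."""
        seen, stack = set(), [p]
        while stack:
            q = stack.pop()
            if q not in seen:
                seen.add(q)
                stack.extend(self.cone.get(q, ()))
        return seen

# Toy 2D mesh: one triangle (point 0) bounded by edges 1-3, each bounded
# by two of the vertices 4-6. The same class and algorithms would describe
# a 3D mesh (cells bounded by faces) with no change at all.
mesh = Mesh({0: (1, 2, 3), 1: (4, 5), 2: (5, 6), 3: (6, 4)})
print(sorted(mesh.closure(0)))   # [0, 1, 2, 3, 4, 5, 6]
print(mesh.support(5))           # edges containing vertex 5: [1, 2]
```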
To carry out the experiments and tests, the MareNostrum supercomputer will be used (https://www.bsc.es/marenostrum).
Robotic Process Automation (RPA) is receiving significant attention due to its promise of improving the performance of an organization's main processes by incorporating robots that partially perform repetitive tasks. In this project, we will consider how Process Mining can help find opportunities to apply Robotic Process Automation in a real case study.
Recently, one of the leaders in Robotic Process Automation acquired one of the main process mining tools (https://www.uipath.com/newsroom/uipath-acquires-process-gold-unparalleled-process-understanding). This confirms the potential link between the field of process mining and the field of robotic process automation.
In this project we will try to find out how strong this link is. Using real data from a company that is trying to automate its processes, the student will dig into the field of process mining to propose a methodology that enables the application of RPA (a sketch of a typical process-discovery step is given below).
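As a hint of what a first process-discovery step could look like, here is a minimal sketch assuming the open-source pm4py library; the event log file name is hypothetical:

```python
# Illustrative process-discovery step, assuming the open-source pm4py
# library and a hypothetical event log "invoices.xes" exported from the
# company's information system (both are assumptions, not project assets).
import pm4py

log = pm4py.read_xes("invoices.xes")            # load the event log

# Discover a process model; frequent, fully deterministic fragments of
# the model are natural candidates for RPA automation.
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)
pm4py.view_petri_net(net, initial_marking, final_marking)
```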
For this project, there is the possibility of a grant that covers the time invested.
Starting from a snow avalanche model developed at UPC, which simulates the dynamics of this phenomenon, we want to carry out a full validation of the model so that avalanche specialists at the ICGC can use it as a tool in their decision-making process.
The Verification, Validation and Accreditation (VV&A) of a model is essential to be able to use it effectively in production for decision making. The project aims to validate both the model and its implementation so that the end result reproduces the natural dynamics of the phenomenon. With this validation, the model will be used as a support tool by the avalanche team of the ICGC. ICGC specialists who are taking part in this validation process will provide help during the development of the project. Advanced design-of-experiments (DOE) techniques will be applied during the project; a minimal sketch of one such technique is given below.
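As a flavour of the DOE techniques involved, here is a minimal sketch of a Latin hypercube experiment plan using scipy.stats.qmc; the input parameters and their ranges are purely illustrative, not the avalanche model's actual inputs:

```python
# Minimal DOE sketch: Latin hypercube sampling of a model's input space
# (parameter names and ranges below are purely illustrative).
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=3, seed=0)       # 3 hypothetical inputs
unit = sampler.random(n=20)                     # 20 runs in [0, 1)^3

# Hypothetical ranges: release depth (m), slope angle (deg), friction coef.
lo, hi = [0.5, 25.0, 0.1], [3.0, 50.0, 0.5]
designs = qmc.scale(unit, lo, hi)               # scaled experiment plan

for depth, slope, mu in designs:
    # run_model(depth, slope, mu) would execute one simulation here, and
    # its output would be compared against observed avalanche data.
    pass
```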
Languages follow many statistical regularities called laws. Perhaps the most popular example is Zipf's law for word frequencies, which relates the frequency of a word to its rank, but other laws have been formulated, such as the law of abbreviation, the law of meaning distribution, the meaning-frequency law, and so on (Zipf 1949). About 15 years ago, a family of optimization models was introduced to shed light on the origins of Zipf's law for word frequencies (Ferrer-i-Cancho & Solé 2003, Ferrer-i-Cancho 2005). In that family, language is modelled as a bipartite graph where words connect to meanings, and a cost function is defined based on the structure of that graph. A simple Monte Carlo algorithm was used to minimize the cost function while the structure of the graph was allowed to vary. Recently, it has been shown how these models shed light on how children learn words (Ferrer-i-Cancho 2017). The aim of this project is to investigate new versions of these models (e.g., Ferrer-i-Cancho & Vitevitch 2018) in two directions: (1) providing an efficient implementation of the optimization algorithm, and (2) comparing the statistical properties of the model against the statistical properties of natural communication systems.
In greater detail, the two directions consist of:
(1) Providing an efficient implementation of the optimization algorithm. See Ferrer-i-Cancho and Solé (2003) and Ferrer-i-Cancho (2005) for further details about the algorithm. Evaluating the cost for a given bipartite graph from scratch has a cost of the order of nm, where n is the number of words and m is the number of meanings. Deciding when to stop the optimization algorithm requires (nm)^2 evaluations of the cost function (in practice this had to be cut down to about nm due to computational cost). For these reasons, n and m have been kept small in previous studies compared to real values in fully fledged human language (e.g., n = m = 150 in Ferrer-i-Cancho and Solé 2003). This computational challenge would be addressed by applying different techniques, e.g., (a) parallelization, (b) dynamic calculation (when changing a few cells of the adjacency matrix, the cost function should not be recomputed from scratch), and (c) heuristics to speed up the Monte Carlo scheme. A sketch of the dynamic-calculation idea is given after point (2) below.
(2) Comparing the statistical properties of the model against the real statistical properties of human language (e.g., linguistic laws) and animal communication, including properties that have not been tested in previous research on these models. See Ferrer-i-Cancho (2018) for an overview of some of the statistical properties of real language that could be tested.
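To illustrate direction (b), dynamic calculation, here is a minimal sketch. For concreteness it assumes an illustrative cost of the form Ω(λ) = λ·H(R|S) + (1−λ)·H(S), with joint probabilities p(s_i, r_j) = a_ij/M over the M links of the bipartite graph, in the spirit of the cited models; the exact cost function of the project may differ. Under this assumption both entropies depend only on the word degrees, so toggling one cell of the adjacency matrix updates the cost in O(1) instead of the O(nm) from-scratch evaluation:

```python
import numpy as np

# Sketch of O(1) "dynamic" cost updates for the Monte Carlo scheme.
# Illustrative cost: Omega = lam*H(R|S) + (1-lam)*H(S), with the joint
# p(s_i, r_j) = a_ij / M over the links, so that H(S,R) = log M and both
# terms depend only on the word degrees mu_i (row sums of A).

def xlogx(x):
    return x * np.log(x) if x > 0 else 0.0

class IncrementalCost:
    def __init__(self, A, lam):
        self.A = A.copy()                       # n x m adjacency matrix
        self.mu = self.A.sum(axis=1)            # word degrees (row sums)
        self.M = int(self.A.sum())              # total number of links
        self.S = sum(xlogx(d) for d in self.mu) # sum_i mu_i log mu_i
        self.lam = lam

    def cost(self):
        HS = np.log(self.M) - self.S / self.M   # H(S)
        HRS = self.S / self.M                   # H(R|S) = log M - H(S)
        return self.lam * HRS + (1 - self.lam) * HS

    def toggle(self, i, j):
        """Flip a_ij and update the cost terms in O(1)."""
        delta = 1 - 2 * self.A[i, j]            # +1 if adding, -1 if removing
        self.S -= xlogx(self.mu[i])
        self.mu[i] += delta
        self.S += xlogx(self.mu[i])
        self.M += delta
        self.A[i, j] += delta

# Check the incremental update against a from-scratch evaluation.
rng = np.random.default_rng(1)
A = (rng.random((6, 6)) < 0.4).astype(int)
A[A.sum(axis=1) == 0, 0] = 1                    # ensure every word is linked
inc = IncrementalCost(A, lam=0.5)
inc.toggle(2, 3)
fresh = IncrementalCost(inc.A, lam=0.5)
assert np.isclose(inc.cost(), fresh.cost())
```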
Depending on the personal interests of the student, the project can focus on one of the two directions.
It is possible to publish the results of the project in a research journal.
Ferrer-i-Cancho, R. & Solé, R. V. (2003). Least effort and the origins of scaling in human language. Proceedings of the National Academy of Sciences USA 100, 788-791.
Ferrer-i-Cancho, R. (2005). Zipf's law from a communicative phase transition. European Physical Journal B 47, 449-457.
Ferrer-i-Cancho, R. (2017). The optimality of attaching unlinked labels to unlinked meanings. Glottometrics 36, 1-16.
Ferrer-i-Cancho, R. & Vitevitch, M. S. (2018). The origins of Zipf's meaning-frequency law. Journal of the American Society for Information Science and Technology 69 (11), 1369-1379.
Ferrer-i-Cancho, R. (2018). Optimization models of natural communication. Journal of Quantitative Linguistics 25 (3), 207-237.
Zipf, G.K. (1949). Human behaviour and the principle of least effort. Cambridge (MA), USA: Addison-Wesley.
FHIR (Fast Healthcare Interoperability Resources) is a set of standards developed by HL7 International to facilitate the interoperability and use of eHealth information. In parallel, different efforts are under way to improve the representation (better compression and security) of genomic information, such as those from the GA4GH (Global Alliance for Genomics and Health) and the MPEG standardization committee. The DMAG (Distributed Multimedia Applications Group) of the Computer Architecture Department of the UPC is involved in the specification of some of these new standards. The objective of this project is to integrate genomic information into EHRs (Electronic Health Records). For this purpose, the different standards for the representation of medical and genomic information will be analysed, and FHIR will be used to facilitate that integration. Finally, a small prototype will be developed, probably making use of existing open-source software (a minimal sketch is given below). The results of this work could be contributed to one of the different standardization organizations for its consideration.
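As a minimal sketch of what the FHIR side of such a prototype might look like: FHIR R4 defines a MolecularSequence resource for genomic data, and the snippet below builds one (showing only a few of its fields) and prepares a POST against a hypothetical FHIR server:

```python
# Minimal sketch: posting genomic data to an EHR as a FHIR R4 resource.
# The server URL and patient reference are hypothetical; only a few of
# MolecularSequence's fields are shown.
import json
import urllib.request

resource = {
    "resourceType": "MolecularSequence",
    "type": "dna",
    "coordinateSystem": 1,                      # 1-based coordinates
    "patient": {"reference": "Patient/example"},
    "observedSeq": "ACGTACGT",                  # toy sequence fragment
}

req = urllib.request.Request(
    "https://fhir.example.org/baseR4/MolecularSequence",  # hypothetical server
    data=json.dumps(resource).encode(),
    headers={"Content-Type": "application/fhir+json"},
    method="POST",
)
print(req.full_url, len(req.data), "bytes")
# with urllib.request.urlopen(req) as resp:    # uncomment against a real server
#     print(resp.status, resp.read())
```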
The weather-dependent routing algorithm will be integrated into a service for the logistic platforms of food distribution companies, enabling self-response capabilities during severe weather events. Using a Multi-Hazard Early Warning System (MH-EWS) as input, the effects of weather will be crossed with a representation model of the road network. The model will be used to provide alternative routes in anticipation of the logistic demand (total freight to be moved between warehouses). The resulting routes should be shown on a map. A minimal sketch of the weather-aware rerouting idea is given below.
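A minimal sketch of the core rerouting idea, assuming the networkx library; the toy road graph, the hazard levels, and the penalty factor are illustrative stand-ins for the MH-EWS input and the road-network model:

```python
# Minimal weather-aware rerouting sketch (toy graph, illustrative penalty).
import networkx as nx

G = nx.Graph()
G.add_edge("warehouseA", "junction", travel_time=30)
G.add_edge("junction", "warehouseB", travel_time=20)
G.add_edge("warehouseA", "coast_road", travel_time=25)
G.add_edge("coast_road", "warehouseB", travel_time=25)

# Hypothetical MH-EWS output: hazard level per road segment in [0, 1].
hazard = {("warehouseA", "junction"): 0.9}      # e.g., snow on this segment

for u, v, data in G.edges(data=True):
    level = hazard.get((u, v), hazard.get((v, u), 0.0))
    # Inflate travel time on hazardous segments (illustrative penalty).
    data["cost"] = data["travel_time"] * (1 + 10 * level)

route = nx.dijkstra_path(G, "warehouseA", "warehouseB", weight="cost")
print(route)   # avoids the hazardous segment: goes via coast_road
```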
The aim of this project is to implement and evaluate agile optimization methods for city logistics that meet real-time and large-scale requirements.
City logistics is benefiting significantly from big data analytics (based on IoT data) to improve performance and sustainability in modern large cities. However, smart city platforms are characterized by their dynamism and large scale, which makes it difficult to make decisions in real time. Agile optimization methods have therefore emerged as a way to cope with such demanding requirements.
The project will seek large-scale distributed implementations using real-life data sets; a minimal sketch of the kind of heuristic involved is given below.
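As a flavour of what "agile optimization" can mean in practice, below is a minimal sketch of a biased-randomized nearest-neighbour heuristic, one common building block in this line of work; the distance matrix and the bias parameter are illustrative. Because each run is very cheap, many randomized routes can be generated and the best one kept, even under real-time constraints:

```python
# Biased-randomized nearest-neighbour routing sketch (illustrative data).
# Instead of always picking the closest next stop, candidates are chosen
# with a geometric bias over their rank, so repeated fast runs yield many
# good, different routes from which the best is kept.
import math
import random

DIST = [[0, 4, 9, 7],      # toy symmetric distance matrix, stop 0 = depot
        [4, 0, 3, 8],
        [9, 3, 0, 2],
        [7, 8, 2, 0]]

def biased_route(rng, beta=0.3):
    route, pending = [0], {1, 2, 3}
    while pending:
        ranked = sorted(pending, key=lambda j: DIST[route[-1]][j])
        # Geometric rank selection: rank 0 chosen with probability beta.
        k = min(int(math.log(1 - rng.random()) / math.log(1 - beta)),
                len(ranked) - 1)
        nxt = ranked[k]
        route.append(nxt)
        pending.remove(nxt)
    return route + [0]

def length(route):
    return sum(DIST[a][b] for a, b in zip(route, route[1:]))

rng = random.Random(42)
best = min((biased_route(rng) for _ in range(1000)), key=length)
print(best, length(best))
```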
Web tracking technologies are extensively used to collect large amounts of personal information (PI), including the things we search for, the sites we visit, the people we contact, and the products we buy. Although it is commonly believed that this data is mainly used for targeted advertising, recent works have revealed that it is exploited for many other purposes, such as price discrimination, assessing financial credibility, insurance coverage, government surveillance, background scanning, and identity theft. The main objective of this project is to apply network traffic monitoring and analysis technologies to uncover the particular methods used to track Internet users and collect PI. This project will be useful for both Internet users and the research community, and will produce open-source tools, real data sets, and publications revealing the most privacy-threatening practices. Some preliminary results of our work in this area were recently published in Proceedings of the IEEE (IF: 9.237) and featured in a Wall Street Journal article.
Much of the open-source code for big data is written for the JVM, and currently a large part of this code consists of data mining algorithms and other techniques that fall within the Artificial Intelligence specialization. Besides Java, Python, another interpreted language, is increasingly used in every kind of programming environment, in particular also in artificial intelligence and for machine learning algorithms. Each of the two languages has its supporters, on grounds such as learning curve and portability. Both are open source, portable, and support standard data mining tasks such as data pre-processing, classification, clustering, visualization, regression, and feature selection. The main objective of this project is the following: starting from a set of algorithms that characterize the work done in data mining (data pre-processing, classification, clustering, visualization, regression, and feature selection), compare the performance of the two languages and assess the advantages and disadvantages of using one or the other depending on the underlying hardware platform, specifically x86 and ARM. A minimal sketch of the kind of measurement involved is given below.
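As a flavour of the Python side of such a comparison, here is a minimal timing sketch assuming scikit-learn; an equivalent Java task (e.g., in Weka) would be timed with the same wall-clock protocol on each platform:

```python
# Minimal benchmarking sketch for the Python side (assumes scikit-learn).
# An equivalent Java task would be timed the same way on x86 and ARM.
import time

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50_000, n_features=20, centers=8, random_state=0)

t0 = time.perf_counter()
KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)   # clustering task
elapsed = time.perf_counter() - t0
print(f"k-means on 50k x 20: {elapsed:.2f} s")
```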
The Barcelona Neural Networking Center (BNN-UPC) is offering two positions to develop a Master Thesis in the field of Graph Neural Networks (GNNs) applied to computer networking. This TFM will be fully funded and will be carried out in the context of a large industrial project with a major multinational technology company.
Graph Neural Networks (GNNs) have recently been proposed to learn, model, and generalize over graph-structured data. Computer networks are fundamentally graphs, and many of their relevant characteristics, such as topology and routing, are represented as graph-structured data.
GNNs are a central tool for applying ML techniques to computer networks. They can learn the relationships between complex network characteristics and build relevant models that are useful for planning and managing a network. In combination with Deep Reinforcement Learning (DRL) techniques, GNNs can help develop autonomous network optimization mechanisms with unprecedented performance, moving towards the ultimate vision of self-driving networks.
The Barcelona Neural Networking Center (https://bnn.upc.edu) is a new research initiative of UPC whose main goals are to carry out fundamental research in the field of Graph Neural Networks applied to computer networks, and to provide education and training to the new generation of computer networking students.
The main goal of this project is to develop a network monitoring system that network operators can use to detect bitcoin miners (or miners of other blockchain technologies) in their networks. The system will rely only on network measurements obtained with standard network measurement tools, and will estimate interesting characteristics of the detected miners, such as their power consumption. A sketch of one possible detection heuristic is given below.

How to apply: Please send an email to with your CV and academic file (pdf can be generated from the Raco).
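As an illustration of the kind of heuristic such a system could build on, here is a minimal sketch over flow records; the field names, ports, and thresholds are assumptions for the sketch, not the project's design. It exploits the fact that pool mining traffic (e.g., the Stratum protocol) tends to be long-lived and low-rate, with telltale JSON methods such as "mining.subscribe" when unencrypted:

```python
# Illustrative miner-detection heuristic over flow records. Field names,
# ports, and thresholds are assumptions for this sketch.
MINING_PORTS = {3333, 4444, 14444}          # ports often used by mining pools

def looks_like_miner(flow):
    """flow: dict with dst_port, duration_s, bytes_total, payload_head."""
    stratum_hint = b"mining.subscribe" in flow["payload_head"]
    long_lived_low_rate = (flow["duration_s"] > 3600 and
                           flow["bytes_total"] / flow["duration_s"] < 100)
    return stratum_hint or (flow["dst_port"] in MINING_PORTS
                            and long_lived_low_rate)

flows = [
    {"dst_port": 3333, "duration_s": 7200, "bytes_total": 200_000,
     "payload_head": b'{"id":1,"method":"mining.subscribe"}'},
    {"dst_port": 443, "duration_s": 30, "bytes_total": 2_000_000,
     "payload_head": b"\x16\x03\x01"},      # ordinary TLS traffic
]
print([looks_like_miner(f) for f in flows])  # [True, False]
```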
PETGEM is an open-source 3D electromagnetic modeler with support for HPC architectures. It solves Maxwell's equations using a high-order vector finite element method and is mostly written in Python. OpenGeode, in turn, is an open-source platform for representing and manipulating geometric models. This project consists of integrating OpenGeode into PETGEM in order to improve the main workflow for electromagnetic modeling in geophysics. The project will use real-life data sets and explore alternatives to extend the code without sacrificing flexibility. Among the topics are software refactoring, Python programming (efficient use of NumPy arrays), and parallel computing with the message-passing approach (mpi4py); a minimal sketch of the latter is given below. This project will be developed in the context of the PIXIL project (Interreg POCTEFA), which is coordinated by the geosciences applications group of the Barcelona Supercomputing Center.
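A minimal sketch of the message-passing pattern involved, assuming mpi4py and NumPy; the array contents and the assembly step are illustrative:

```python
# Minimal mpi4py sketch: buffer-based communication of NumPy arrays, the
# pattern distributed finite element codes rely on. Run with, e.g.:
#   mpiexec -n 4 python demo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Root broadcasts shared model data; uppercase Bcast uses the NumPy
# buffer directly and avoids the slower pickle-based lowercase bcast.
model = np.arange(8, dtype=np.float64) if rank == 0 else np.empty(8)
comm.Bcast(model, root=0)

# Each rank computes a partial contribution, reduced (summed) on root,
# as in a distributed assembly step.
partial = model * rank
total = np.zeros_like(partial)
comm.Reduce(partial, total, op=MPI.SUM, root=0)
if rank == 0:
    print("assembled:", total)
```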
The identification of the applications behind network traffic (i.e., traffic classification) is crucial for ISPs and network operators to better manage and control their networks. However, the increasing use of encryption and web-based applications makes this identification very challenging. The problem is exacerbated by the widespread deployment of content distribution networks (e.g., Akamai) and cloud-based services (e.g., Amazon AWS). The goal of this project is to develop a traffic monitoring tool that accurately identifies web services in HTTPS traffic, including Google, YouTube, Facebook, and Twitter, among others. The tool will combine information from IP addresses and DNS with novel classification methods inspired by the Google PageRank algorithm to identify encrypted traffic, even if it is served from Akamai, AWS, or Google infrastructures (a minimal sketch of this idea is given below). This project will be carried out in collaboration with the tech-based company Talaia Networks (https://www.talaia.io), which develops cloud-based network monitoring solutions.
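A minimal sketch of the PageRank-inspired idea, assuming the networkx library; the graph, seed domains, and IPs are toy examples. Domains and server IPs that co-occur in DNS/TLS data form a graph, and personalized PageRank spreads the labels of known service domains to otherwise anonymous IPs:

```python
# Minimal sketch of a PageRank-inspired classifier (assumes networkx;
# the graph, seed labels, and service are toy examples).
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("youtube.com", "ip_1"), ("youtube.com", "ip_2"),   # from DNS answers
    ("googlevideo.com", "ip_2"), ("cdn.example.net", "ip_3"),
])

seeds = {"youtube.com": 1.0, "googlevideo.com": 1.0}    # known YouTube domains
scores = nx.pagerank(G, alpha=0.85, personalization=seeds)

# ip_2 is reached from two seed domains, so it scores higher than ip_3.
for node, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node:18s} {s:.3f}")
```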
How to apply: Please send an email to firstname.lastname@example.org with your CV and academic file (pdf can be generated from the Raco).