Credits
6
Types
Compulsory
Requirements
This subject has no formal prerequisites, but it assumes some prior knowledge.
Department
ESSI
Web
https://learnsql.fib.upc.edu
The first part of the course focuses on property graphs, which build upon traditional graph data management systems and provide the basis for efficient graph storage, querying, and processing. Within this framework, students study fundamental graph algorithms and graph processing techniques to analyze structure, connectivity, and patterns in large-scale graph data.
The second part of the course introduces knowledge graphs, which extend graph data management with semantic annotations and formal vocabularies, enabling symbolic reasoning, inference, and richer forms of data integration. This perspective highlights how semantics add interpretability and reasoning capabilities beyond purely structural graph analysis.
The final part of the course presents a complementary form of graph exploitation based on graph embeddings. By mapping graph elements into continuous vector spaces, embeddings enable the application of machine learning techniques directly on graph-structured data. This includes an introduction to graph neural networks (GNNs) as a powerful paradigm for representation learning on graphs that explicitly captures structural and relational context.
As this is a rapidly evolving and still maturing research area, there is no single, well-established methodology. Consequently, the course emphasizes rigorous reasoning, technical depth, and innovation, preparing students to effectively incorporate complex, graph-structured data into organizational decision-making processes.
Teachers
Person in charge
- Anna Queralt Calafat ( anna.queralt@upc.edu )
Others
- Albert Martín Garcia ( albert.martin.g@upc.edu )
- Gerard Pons Recasens ( gerard.pons.recasens@upc.edu )
- Oscar Romero Moral ( oscar.romero@upc.edu )
Weekly hours
Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
7.11
Competences
Teamwork
Third language
Entrepreneurship and innovation
Basic
Generic
Specific
Objectives
- Learn, understand and apply the fundamentals of property graphs
  Related competences: CT3, CT5, CG1, CE5, CE9, CB6, CB9, CB10
- Learn, understand and apply the fundamentals of knowledge graphs
  Related competences: CT3, CT5, CG1, CE5, CE9, CB6, CB9, CB10
- Perform graph data processing both in centralized and distributed environments
  Related competences: CT3, CT5, CG1, CE5, CE9, CB6, CB9, CB10
- Integrate, combine and refine semi-structured or non-structured data using graph formalisms
  Related competences: CT3, CT5, CT1, CG1, CG3, CE3, CE5, CE9, CE12, CE13, CB6, CB7, CB8, CB9
- Determine how to apply graph formalisms to solve the Variety challenge (data integration)
  Related competences: CT5, CT1, CG3, CE3, CE5, CE9, CE12, CE13, CB6, CB7, CB9
- Apply property or knowledge graphs to solve realistic problems such as data integration, graph-based data analysis, etc.
  Related competences: CT3, CT5, CT1, CG1, CG3, CE3, CE5, CE9, CE12, CE13, CB6, CB7, CB8, CB9, CB10
Contents
- Introduction and formalization of semantic data management
  Definition of data management tasks from the perspectives of databases and knowledge representation. Syntactic and semantic heterogeneity, and the impact of data heterogeneity on different data management tasks. Concept of data integration and definition of a theoretical framework for managing and integrating heterogeneous data sources. The need for a canonical data model for data integration, including the definition of a data model and the essential characteristics of canonical data models.
- Property graphs
  Data structures. Model integrity constraints. Basic operations based on topology, content, and hybrid approaches. Graph query languages: GraphQL and Cypher. Graph database concepts. Native implementations and implementations based on relational algebra. Impact of these design decisions on core operations. Efficient graph design. Impact of these heterogeneities on the main operations. Distributed graph databases: motivation and challenges. The "think like a vertex" paradigm as the de facto standard for distributed graph processing. Main distributed graph processing algorithms.
- Knowledge graphs
  RDF, RDFS, and OWL. Data structures. Integrity constraints. Relationship with first-order logic. Foundations in Description Logics. Inference. Basic operations and query languages. SPARQL and its algebra. Entailment regimes (inference).
- Comparison of property and knowledge graphs. Use cases
  Recap of both models. Commonalities and differences. Concepts to borrow between both paradigms.
  Main use cases. Metadata management: Data Lake semantification and data governance.
  Main use cases. Exploitation of their topological features: recommenders on graphs and data mining.
  Visualization: by means of a GUI (Gephi) or programmatically (D3.js or GraphLab).
- Embeddings and GNNs
  Concept of embeddings. Properties. Application to graphs and connection with Machine Learning and learning algorithms. GNN architectures. Applications.
Activities
Lectures
The main concepts are discussed during lectures. Lectures combine traditional lecturing with active / cooperative learning activities. Students are expected to take a proactive attitude during active / cooperative learning activities. During traditional lecturing, students are expected to listen, take notes and ask questions.
Objectives: 2 5 3 1
Contents:
Theory
25h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
28h
Teaching methodology
Lectures: The instructor presents the topic. Students follow the lecture, take notes, and prepare additional material outside the classroom. They may also be asked to carry out activities during these sessions.
Laboratory: Laboratory sessions are mainly devoted to practical work (with or without a computer) on the concepts introduced in the lecture sessions. Tools relevant to the introduced concepts are presented and used in projects during these sessions. Laboratory work requires the submission of project-based assignments, to be developed both in class and at home, which are assessed together with an on-site examination.
Evaluation methodology
Final grade = 40% EX + 60% LAB
EX = Final exam grade
LAB = Weighted grade of the laboratory work. Laboratory assessment is based on the submission (E) and an on-site assessment test (C) related to the submission. The final laboratory grade is computed as the geometric mean of E and C.
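As an illustration of the formula above, the following minimal sketch computes the laboratory and final grades. It makes assumptions not stated in the guide: there is a single submission grade E and a single on-site test grade C, and all grades share the same non-negative scale (for example 0 to 10). The function names `lab_grade` and `final_grade` are hypothetical.

```python
import math


def lab_grade(e: float, c: float) -> float:
    """Geometric mean of the submission grade E and the on-site test grade C.

    Assumption: a single (E, C) pair; the guide does not detail how several
    laboratory assignments are weighted against each other.
    """
    if e < 0 or c < 0:
        raise ValueError("grades must be non-negative")
    return math.sqrt(e * c)


def final_grade(ex: float, e: float, c: float) -> float:
    """Final grade = 40% EX + 60% LAB, with LAB computed by lab_grade()."""
    return 0.4 * ex + 0.6 * lab_grade(e, c)


if __name__ == "__main__":
    # Hypothetical student: exam 7, submission 9, on-site test 4.
    print(round(lab_grade(9, 4), 2))       # 6.0
    print(round(final_grade(7, 9, 4), 2))  # 6.4
```

Because LAB is a geometric mean rather than an arithmetic one, a strong submission does not compensate for a weak on-site test: with E = 9 and C = 4 the laboratory grade is 6.0, not 6.5, and a zero in either component yields LAB = 0.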
Bibliography
Basic
- Data Integration: A Theoretical Perspective
  Lenzerini, Maurizio,
  PODS '02: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems,
  2002.
  ISBN: 1-58113-507-6
  https://doi.org/10.1145/543613.543644
- Managing and mining graph data
  Aggarwal, Charu C; Wang, Haixun,
  Springer,
  2010.
  ISBN: 9781441960443
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003843179706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- The description logic handbook: theory, implementation and applications
  Baader, Franz,
  Cambridge University Press,
  2003.
  ISBN: 0521781760
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002562579706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Web data management
  Abiteboul, Serge,
  Cambridge University Press,
  2012.
  ISBN: 9781107012431
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003929239706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing
  Sahu, Siddhartha; Mhedhbi, Amine; Salihoglu, Semih; Lin, Jimmy; Özsu, M. Tamer,
  Cornell University Library,
  2017.
  https://arxiv.org/abs/1709.03188
- Deep Learning
  Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron,
  MIT Press,
  2016.
  https://www.deeplearningbook.org/
- Representation Learning on Graphs
  Hamilton, William L.,
  Morgan & Claypool Publishers,
  2020.
  https://www.cs.mcgill.ca/~wlh/grl_book/
- Neural Network Methods in Natural Language Processing (Synthesis Lectures on Human Language Technologies)
  Goldberg, Yoav; Hirst, Graeme,
  Morgan & Claypool,
  2017.
  ISBN: 9781681732350
  https://mitpressbookstore.mit.edu/book/9781681732350
- A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications
  Cai, HongYun; Zheng, Vincent W.; Chang, Kevin Chen-Chuan,
  IEEE Transactions on Knowledge and Data Engineering,
  9 (2018).
  ISSN: 1558-2191
  https://doi.org/10.1109/TKDE.2018.2807452