Semantic Data Management

Teachers
Weekly hours
Competences
Objectives
Contents
Activities
Teaching methodology
Evaluation methodology
Bibliography
Previous capacities

Credits

6

Types

Compulsory

Requirements

This subject has not requirements, but it has got previous capacities

Department

ESSI

Web

https://learnsql.fib.upc.edu

This course introduces the principles and techniques of semantic data management for representing, integrating, and exploiting complex and heterogeneous data. Students learn how graph-based data models enable the explicit representation of entities and relationships, overcoming the limitations of traditional key-based data models when dealing with highly connected data. The course covers property graphs and knowledge graphs as foundational abstractions for semantic data integration.

The first part of the course focuses on property graphs, which build upon traditional graph data management systems and provide the basis for efficient graph storage, querying, and processing. Within this framework, students study fundamental graph algorithms and graph processing techniques to analyze structure, connectivity, and patterns in large-scale graph data.

The second part of the course introduces knowledge graphs, which extend graph data management with semantic annotations and formal vocabularies, enabling symbolic reasoning, inference, and richer forms of data integration. This perspective highlights how semantics add interpretability and reasoning capabilities beyond purely structural graph analysis.

The final part of the course presents a complementary form of graph exploitation based on graph embeddings. By mapping graph elements into continuous vector spaces, embeddings enable the application of machine learning techniques directly on graph-structured data. This includes an introduction to graph neural networks (GNNs) as a powerful paradigm for representation learning on graphs that explicitly captures structural and relational context.

As this is a rapidly evolving and still maturing research area, there is no single, well-established methodology. Consequently, the course emphasizes rigorous reasoning, technical depth, and innovation, preparing students to effectively incorporate complex, graph-structured data into organizational decision-making processes.

Teachers

Person in charge

Anna Queralt Calafat ( )

Others

Albert Martin Garcia ( )
Gerard Pons Recasens ( )
Oscar Romero Moral ( )

Weekly hours

Theory

2

Problems

0

Laboratory

2

Guided learning

0

Autonomous learning

7.11

Competences

Transversal Competences

Teamwork

CT3 - Ability to work as a member of an interdisciplinary team, as a normal member or performing direction tasks, in order to develop projects with pragmatism and sense of responsibility, making commitments taking into account the available resources.

Third language

CT5 - Achieving a level of spoken and written proficiency in a foreign language, preferably English, that meets the needs of the profession and the labour market.

Entrepreneurship and innovation

CT1 - Know and understand the organization of a company and the sciences that govern its activity; have the ability to understand labor standards and the relationships between planning, industrial and commercial strategies, quality and profit. Being aware of and understanding the mechanisms on which scientific research is based, as well as the mechanisms and instruments for transferring results among socio-economic agents involved in research, development and innovation processes.

Basic

CB6 - Ability to apply the acquired knowledge and capacity for solving problems in new or unknown environments within broader (or multidisciplinary) contexts related to their area of study.
CB7 - Ability to integrate knowledge and handle the complexity of making judgments based on information which, being incomplete or limited, includes considerations on social and ethical responsibilities linked to the application of their knowledge and judgments.
CB8 - Capability to communicate their conclusions, and the knowledge and rationale underpinning these, to both skilled and unskilled public in a clear and unambiguous way.
CB9 - Possession of the learning skills that enable the students to continue studying in a way that will be mainly self-directed or autonomous.
CB10 - Possess and understand knowledge that provides a basis or opportunity to be original in the development and/or application of ideas, often in a research context.

Generic Technical Competences

Generic

CG1 - Identify and apply the most appropriate data management methods and processes to manage the data life cycle, considering both structured and unstructured data
CG3 - Define, design and implement complex systems that cover all phases in data science projects

Technical Competences

Especifics

CE3 - Apply data integration methods to solve data science problems in heterogeneous data environments
CE5 - Model, design, and implement complex data systems, including data visualization
CE9 - Apply appropriate methods for the analysis of non-traditional data formats, such as processes and graphs, within the scope of data science
CE12 - Apply data science in multidisciplinary projects to solve problems in new or poorly explored domains from a data science perspective that are economically viable, socially acceptable, and in accordance with current legislation
CE13 - Identify the main threats related to ethics and data privacy in a data science project (both in terms of data management and analysis) and develop and implement appropriate measures to mitigate these threats

Objectives

Learn, understand and apply the fundamentals of property graphs
Related competences: CT3, CT5, CG1, CE5, CE9, CB6, CB9, CB10,
Learn, understand and apply the fundamentals of knowledge graphs
Related competences: CT3, CT5, CG1, CE5, CE9, CB6, CB9, CB10,
Perform graph data processing both in centralized and distributed environments
Related competences: CT3, CT5, CG1, CE5, CE9, CB6, CB9, CB10,
Integrate, combine and refine semi-structured or non-structured data using graph formalisms
Related competences: CT3, CT5, CT1, CG1, CG3, CE3, CE5, CE9, CE12, CE13, CB6, CB7, CB8, CB9,
Determine how to apply graph formalisms to solve the Variety challenge (data integration)
Related competences: CT5, CT1, CG3, CE3, CE5, CE9, CE12, CE13, CB6, CB7, CB9,
Apply property or knowledge graphs to solve realistic problems such as data integration, graph-based data analysis, etc.
Related competences: CT3, CT5, CT1, CG1, CG3, CE3, CE5, CE9, CE12, CE13, CB6, CB7, CB8, CB9, CB10,

Introduction and formalization of semantic data management
Definition of data management tasks from the perspectives of databases and knowledge representation. Syntactic and semantic heterogeneity, and the impact of data heterogeneity on different data management tasks. Concept of data integration and definition of a theoretical framework for managing and integrating heterogeneous data sources. The need for a canonical data model for data integration, including the definition of a data model and the essential characteristics of canonical data models.
Property graphs
Data structures. Model integrity constraints. Basic operations based on topology, content, and hybrid approaches. Graph query languages: GraphQL and Cypher. Graph database concepts. Native implementations and implementations based on relational algebra. Impact of these design decisions on core operations. Efficient graph design. Impact of these heterogeneities on the main operations. Distributed graph databases: motivation and challenges. The thinking like a vertex paradigm as the de facto standard for distributed graph processing. Main distributed graph processing algorithms.
Knowledge graphs
RDF, RDFS, and OWL. Data structures. Integrity constraints. Relationship with first-order logic. Foundations in Description Logics. Inference. Basic operations and query languages. SPARQL and its algebra. Entailment regimes (inference).
Property and knowledge graphs comparison. Use cases
Recap about both models. Commonalities and differences. Concepts to borrow between both paradigms.

Main use cases. Metadata management: Data Lake semantification and data governance.

Main use cases. Exploitation of their topological features: recommenders on graphs and data mining.

Visualization: by means of a GUI (Gephi) or programmatically (D3.js or GraphLab).
Embeddings and GNNs
Concept of embeddings. Properties. Application to graphs and connection with Machine Learning and learning algorithms. GNN architectures. Applications.

Activities

Activity Evaluation act

Lectures

During lectures the main concepts will be discussed. Lectures will combine master lectures and active / cooperative learning activities. The student is meant to have a pro-active attitude during active / cooperative learning activities. During master lectures, the student is meant to listen, take notes and ask questions.
Objectives: 2 5 3 1
Contents:

1 . Introduction and formalization of semantic data management
2 . Property graphs
3 . Knowledge graphs
4 . Property and knowledge graphs comparison. Use cases
5 . Embeddings and GNNs

Theory

25h

Problems

0h

Laboratory

0h

Guided learning

0h

Autonomous learning

28h

Hands-on Session

The student will be asked to practice the different concepts introduced in the lectures. This includes problem solving either on the computer or on paper.
Objectives: 6 5 4
Contents:

2 . Property graphs
3 . Knowledge graphs
4 . Property and knowledge graphs comparison. Use cases

Theory

0h

Problems

0h

Laboratory

27h

Guided learning

0h

Autonomous learning

60h

Final Exam

Written exam of the theoretical concepts introduced along the course.
Objectives: 2 5 3 4 1
Contents:

1 . Introduction and formalization of semantic data management
2 . Property graphs
3 . Knowledge graphs
4 . Property and knowledge graphs comparison. Use cases
5 . Embeddings and GNNs

Theory

2h

Problems

0h

Laboratory

0h

Guided learning

0h

Autonomous learning

8h

Teaching methodology

Lectures: The instructor presents the topic. Students follow the lecture, take notes, and prepare additional material outside the classroom. They may also be asked to carry out activities during these sessions.

Laboratory: Laboratory sessions are mainly devoted to practical work (with or without a computer) on the concepts introduced in the lecture sessions. Tools relevant to the introduced concepts are presented and used in projects during these sessions. Laboratory work requires the submission of project-based assignments, to be developed both in class and at home, which are assessed together with an on-site examination.

Evaluation methodology

Final grade = 40% EX + 60% LAB

EX = Final exam grade
LAB = Weighted grade of the laboratory work. Laboratory assessment is based on the submission (E) and an on-site assessment test (C) related to the submission. The final laboratory grade is computed as the geometric mean of E and C.

Bibliography

Basic:

Data Integration: A Theoretical Perspective - Lenzerini, Maurizio, PODS '02: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, 2002. ISBN: 1-58113-507-6
https://doi.org/10.1145/543613.543644
Managing and mining graph data - Aggarwal, Charu C; Wang, Haixun, Springer, 2010. ISBN: 9781441960443
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003843179706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
The description logic handbook: theory, implementation and applications - Baader, Franz, Cambridge University Press, 2003. ISBN: 0521781760
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002562579706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Web data management - Abiteboul, Serge, Cambridge University Press, 2012. ISBN: 9781107012431
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003929239706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing - Sahu, Siddhartha; Mhedhbi, Amine; Salihoglu, Semih; Lin, Jimmy; Özsu, M. Tamer, Cornell University Library, 2017.
https://arxiv.org/abs/1709.03188
Deep Learning - Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron, MIT Press, 2016.
https://www.deeplearningbook.org/
Representation Learning on Graphs - Hamilton, William L., Morgan & Claypool Publishers, 2020.
https://www.cs.mcgill.ca/~wlh/grl_book/
Neural Network Methods in Natural Language Processing (Synthesis Lectures on Human Language Technologies) - Goldberg, Yoav; Hirst, Graemer, Morgan & Claypool , 2017. ISBN: 9781681732350
https://mitpressbookstore.mit.edu/book/9781681732350
A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications - Cai, HongYun and Zheng, Vincent W. and Chang, Kevin Chen-Chuan, IEEE Transactions on Knowledge and Data Engineering, 9 (2018). ISBN: 1558-2191
10.1109/TKDE.2018.2807452

Previous capacities

The student must be familiar with basics on databases, data modeling, logics and linera algebra. Advanced programming skills are mandatory.

Semantic Data Management

Teachers

Person in charge

Others

Weekly hours

Competences

Transversal Competences

Teamwork

Third language

Entrepreneurship and innovation

Basic

Generic Technical Competences

Generic

Technical Competences

Especifics

Objectives

Contents

Activities

Lectures

Hands-on Session

Final Exam

Teaching methodology

Evaluation methodology

Bibliography

Basic:

Previous capacities

Where we are

Contact with us

Semantic Data Management

You are here

Teachers

Person in charge

Others

Weekly hours

Competences

Transversal Competences

Teamwork

Third language

Entrepreneurship and innovation

Basic

Generic Technical Competences

Generic

Technical Competences

Especifics

Objectives

Contents

Activities

Lectures

Hands-on Session

Final Exam

Teaching methodology

Evaluation methodology

Bibliography

Basic:

Previous capacities

Where we are

Contact with us