Semantic Data Management

Credits
6
Types
Compulsory
Requirements
This subject has no formal requirements, but it does assume previous capacities.
Department
ESSI
This course introduces the principles and techniques of semantic data management for representing, integrating, and exploiting complex and heterogeneous data. Students learn how graph-based data models enable the explicit representation of entities and relationships, overcoming the limitations of traditional key-based data models when dealing with highly connected data. The course covers property graphs and knowledge graphs as foundational abstractions for semantic data integration.

The first part of the course focuses on property graphs, which build upon traditional graph data management systems and provide the basis for efficient graph storage, querying, and processing. Within this framework, students study fundamental graph algorithms and graph processing techniques to analyze structure, connectivity, and patterns in large-scale graph data.
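
As an illustration of the ideas above, the following minimal sketch models a toy property graph in plain Python and runs a breadth-first search to find the nodes connected to a starting node. The node ids, labels, properties, and the helper names `neighbors` and `reachable` are hypothetical and chosen only for this example; a real graph database would expose such operations through its own query language.

```python
from collections import deque

# Hypothetical toy property graph: nodes and edges carry key-value properties.
nodes = {
    "a": {"label": "Person", "name": "Ada"},
    "b": {"label": "Person", "name": "Bob"},
    "c": {"label": "Course", "title": "SDM"},
    "d": {"label": "Person", "name": "Dan"},
}
# Each edge is (source id, edge type, target id, properties).
edges = [
    ("a", "KNOWS", "b", {"since": 2020}),
    ("b", "ENROLLED_IN", "c", {"year": 2024}),
]


def neighbors(node_id):
    """Return the ids adjacent to node_id, ignoring edge direction."""
    out = []
    for src, _edge_type, dst, _props in edges:
        if src == node_id:
            out.append(dst)
        elif dst == node_id:
            out.append(src)
    return out


def reachable(start):
    """Breadth-first search: the set of node ids connected to start."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(reachable("a")))  # ['a', 'b', 'c']
```

Node "d" has no edges, so it is reachable only from itself.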

The second part of the course introduces knowledge graphs, which extend graph data management with semantic annotations and formal vocabularies, enabling symbolic reasoning, inference, and richer forms of data integration. This perspective highlights how semantics add interpretability and reasoning capabilities beyond purely structural graph analysis.
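
To make the notion of inference concrete, the sketch below stores a few triples in a Python set and applies two RDFS-style rules (transitivity of `subClassOf` and propagation of `type` to superclasses) until nothing new can be derived. The triples, the shortened predicate names, and the function `infer` are assumptions made for illustration; this is not a standards-compliant RDF reasoner.

```python
# Hypothetical triples (subject, predicate, object); names are illustrative only.
triples = {
    ("GraphCourse", "subClassOf", "Course"),
    ("Course", "subClassOf", "Activity"),
    ("sdm", "type", "GraphCourse"),
}


def infer(facts):
    """Apply two RDFS-style rules repeatedly until no new triple is derived."""
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # Only pair a triple with a subClassOf triple that starts at its object.
                if p2 != "subClassOf" or o != s2:
                    continue
                if p == "subClassOf":
                    new.add((s, "subClassOf", o2))  # subClassOf is transitive
                elif p == "type":
                    new.add((s, "type", o2))  # instances belong to superclasses
        if new <= facts:
            return facts
        facts |= new


closed = infer(triples)
print(("sdm", "type", "Activity") in closed)  # True
```

The fact that `sdm` is an `Activity` is never stated explicitly; it follows from the vocabulary, which is the kind of reasoning that purely structural graph analysis does not provide.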

The final part of the course presents a complementary form of graph exploitation based on graph embeddings. By mapping graph elements into continuous vector spaces, embeddings enable the application of machine learning techniques directly on graph-structured data. This includes an introduction to graph neural networks (GNNs) as a powerful paradigm for representation learning on graphs that explicitly captures structural and relational context.
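
The core intuition behind message passing can be sketched in a few lines: each node updates its vector by averaging it with the vectors of its neighbors. The graph, the feature values, and the function `mean_aggregate` below are hypothetical, and the sketch is deliberately simplified; practical GNN layers additionally apply learned weight matrices and non-linear activations.

```python
# Hypothetical undirected graph as adjacency lists, with 2-dimensional node features.
adjacency = {0: [1, 2], 1: [0], 2: [0]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 3.0]}


def mean_aggregate(adjacency, features):
    """One message-passing round: average each node's vector with its neighbors'."""
    updated = {}
    for node, nbrs in adjacency.items():
        # The node's own vector is included so its identity is not lost.
        group = [features[node]] + [features[n] for n in nbrs]
        dim = len(features[node])
        updated[node] = [sum(vec[i] for vec in group) / len(group) for i in range(dim)]
    return updated


embeddings = mean_aggregate(adjacency, features)
print(embeddings[1])  # [0.5, 0.5]
```

After one round, each vector reflects the node's immediate neighborhood; stacking further rounds would let information from more distant nodes flow in.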

As this is a rapidly evolving and still maturing research area, there is no single, well-established methodology. Consequently, the course emphasizes rigorous reasoning, technical depth, and innovation, preparing students to effectively incorporate complex, graph-structured data into organizational decision-making processes.

Teachers

Person in charge

  • Anna Queralt Calafat

Others

  • Gerard Pons Recasens
  • Oscar Romero Moral

Weekly hours

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
7.11

Competences

Transversal Competences

Teamwork

  • CT3 - Ability to work as a member of an interdisciplinary team, either as a regular member or in a leadership role, in order to develop projects pragmatically and responsibly, making commitments that take the available resources into account.

Third language

  • CT5 - Achieving a level of spoken and written proficiency in a foreign language, preferably English, that meets the needs of the profession and the labour market.

Entrepreneurship and innovation

  • CT1 - Know and understand the organization of a company and the sciences that govern its activity; have the ability to understand labor standards and the relationships between planning, industrial and commercial strategies, quality and profit. Being aware of and understanding the mechanisms on which scientific research is based, as well as the mechanisms and instruments for transferring results among socio-economic agents involved in research, development and innovation processes.

Basic

  • CB6 - Ability to apply the acquired knowledge and capacity for solving problems in new or unknown environments within broader (or multidisciplinary) contexts related to their area of study.
  • CB7 - Ability to integrate knowledge and handle the complexity of making judgments based on information which, being incomplete or limited, includes considerations on social and ethical responsibilities linked to the application of their knowledge and judgments.
  • CB8 - Capability to communicate their conclusions, and the knowledge and rationale underpinning them, to both specialist and non-specialist audiences in a clear and unambiguous way.
  • CB9 - Possession of the learning skills that enable the students to continue studying in a way that will be mainly self-directed or autonomous.
  • CB10 - Possess and understand knowledge that provides a basis or opportunity to be original in the development and/or application of ideas, often in a research context.

Generic Technical Competences

Generic

  • CG1 - Identify and apply the most appropriate data management methods and processes to manage the data life cycle, considering both structured and unstructured data
  • CG3 - Define, design and implement complex systems that cover all phases in data science projects

Technical Competences

Specific

  • CE3 - Apply data integration methods to solve data science problems in heterogeneous data environments
  • CE5 - Model, design, and implement complex data systems, including data visualization
  • CE9 - Apply appropriate methods for the analysis of non-traditional data formats, such as processes and graphs, within the scope of data science
  • CE12 - Apply data science in multidisciplinary projects to solve problems in new or poorly explored domains from a data science perspective that are economically viable, socially acceptable, and in accordance with current legislation
  • CE13 - Identify the main threats related to ethics and data privacy in a data science project (both in terms of data management and analysis) and develop and implement appropriate measures to mitigate these threats

Objectives

  1. Learn, understand and apply the fundamentals of property graphs
    Related competences: CT3, CT5, CG1, CE5, CE9, CB6, CB9, CB10
  2. Learn, understand and apply the fundamentals of knowledge graphs
    Related competences: CT3, CT5, CG1, CE5, CE9, CB6, CB9, CB10
  3. Perform graph data processing both in centralized and distributed environments
    Related competences: CT3, CT5, CG1, CE5, CE9, CB6, CB9, CB10
  4. Integrate, combine and refine semi-structured or non-structured data using graph formalisms
    Related competences: CT3, CT5, CT1, CG1, CG3, CE3, CE5, CE9, CE12, CE13, CB6, CB7, CB8, CB9
  5. Determine how to apply graph formalisms to solve the Variety challenge (data integration)
    Related competences: CT5, CT1, CG3, CE3, CE5, CE9, CE12, CE13, CB6, CB7, CB9
  6. Apply property or knowledge graphs to solve realistic problems such as data integration, graph-based data analysis, etc.
    Related competences: CT3, CT5, CT1, CG1, CG3, CE3, CE5, CE9, CE12, CE13, CB6, CB7, CB8, CB9, CB10

Contents

  1. Introduction and formalization of semantic data management
    Definition of data management tasks from the perspectives of databases and knowledge representation. Syntactic and semantic heterogeneity, and the impact of data heterogeneity on different data management tasks. Concept of data integration and definition of a theoretical framework for managing and integrating heterogeneous data sources. The need for a canonical data model for data integration, including the definition of a data model and the essential characteristics of canonical data models.
  2. Property graphs
    Data structures. Model integrity constraints. Basic operations based on topology, content, and hybrid approaches. Graph query languages: GraphQL and Cypher. Graph database concepts. Native implementations and implementations based on relational algebra. Impact of these design decisions on core operations. Efficient graph design. Impact of these heterogeneities on the main operations. Distributed graph databases: motivation and challenges. The think-like-a-vertex paradigm as the de facto standard for distributed graph processing. Main distributed graph processing algorithms.
  3. Knowledge graphs
    RDF, RDFS, and OWL. Data structures. Integrity constraints. Relationship with first-order logic. Foundations in Description Logics. Inference. Basic operations and query languages. SPARQL and its algebra. Entailment regimes (inference).
  4. Property and knowledge graphs comparison. Use cases
    Recap of both models. Commonalities and differences. Concepts to borrow between both paradigms.

    Main use cases. Metadata management: Data Lake semantification and data governance.

    Main use cases. Exploitation of their topological features: recommenders on graphs and data mining.

    Visualization: by means of a GUI (Gephi) or programmatically (D3.js or GraphLab).
  5. Embeddings and GNNs
    Concept of embeddings. Properties. Application to graphs and connection with Machine Learning and learning algorithms. GNN architectures. Applications.

Activities

Lectures

During lectures, the main concepts are discussed. Lectures combine traditional lecturing with active and cooperative learning activities. Students are expected to take a proactive attitude during the active and cooperative learning activities and, during traditional lecturing, to listen, take notes, and ask questions.
Objectives: 2 5 3 1
Contents:
Theory
25h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
28h

Hands-on Session

The student will be asked to practice the different concepts introduced in the lectures. This includes problem solving either on the computer or on paper.
Objectives: 6 5 4
Contents:
Theory
0h
Problems
0h
Laboratory
27h
Guided learning
0h
Autonomous learning
60h

Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
8h

Teaching methodology

Lectures: The instructor presents the topic. Students follow the lecture, take notes, and prepare additional material outside the classroom. They may also be asked to carry out activities during these sessions.

Laboratory: Laboratory sessions are mainly devoted to practical work (with or without a computer) on the concepts introduced in the lecture sessions. Tools relevant to the introduced concepts are presented and used in projects during these sessions. Laboratory work requires the submission of project-based assignments, to be developed both in class and at home, which are assessed together with an on-site examination.

Evaluation methodology

Final grade = 40% EX + 60% LAB

EX = Final exam grade
LAB = Weighted grade of the laboratory work. Laboratory assessment is based on the submission (E) and an on-site assessment test (C) related to the submission. The final laboratory grade is computed as the geometric mean of E and C.
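
The grading formula above can be worked through with a short sketch. The function names and the sample grades are hypothetical, and the sketch assumes a single laboratory submission; if several submissions are weighted together, the weighting would be applied on top of this.

```python
import math


def lab_grade(e, c):
    """Geometric mean of the submission grade E and the on-site test grade C."""
    return math.sqrt(e * c)


def final_grade(ex, e, c):
    """Final grade = 40% EX + 60% LAB, as stated in the evaluation methodology."""
    return 0.4 * ex + 0.6 * lab_grade(e, c)


# Hypothetical grades: EX = 7, E = 9, C = 4, so LAB = sqrt(9 * 4) = 6.
print(round(final_grade(7, 9, 4), 2))  # 6.4
```

Because LAB is a geometric mean rather than an arithmetic one, a low grade in either E or C pulls LAB down sharply, and a zero in either yields a LAB grade of zero.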

Previous capacities

The student must be familiar with the basics of databases, data modeling, logic, and linear algebra. Advanced programming skills are mandatory.