Advanced Databases

Credits
6
Type
Compulsory
Requirements
This subject has no formal requirements, but it does assume previous capacities.
Department
ESSI
This course trains students in the skills needed to design and configure the analytical data management systems required to run Artificial Intelligence (AI) algorithms. It covers advanced concepts in managing large volumes of data with high variety, data quality issues and varying semantics. At the outset, the end-to-end data lifecycle for AI systems is introduced using the concepts of DevOps and DataOps. Next, three specific topics that correspond to the main challenges of data management for AI are explored: 1) big data management and processing, 2) semantic data management, and 3) data integration with data quality management, together with data system architectures for AI.

The student will learn the concepts related to analysis-oriented data storage, as well as large-scale data processing. As a result, the student will be able to evaluate the possible alternatives for data storage, modeling and processing in the context of their organization and choose the most appropriate ones.

Teachers

Person in charge

  • Petar Jovanovic
  • Sergi Nadal Francesch

Weekly hours

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6

Competences

Transversal Competences

Transversals

  • CT4 [Assessable] - Teamwork. Be able to work as a member of an interdisciplinary team, either as a regular member or in a management role, with the aim of contributing to the development of projects with pragmatism and a sense of responsibility, making commitments that take into account the available resources.
  • CT6 [Assessable] - Autonomous Learning. Detect deficiencies in one's own knowledge and overcome them through critical reflection and the choice of the best action to extend this knowledge.
  • CT8 - Gender perspective. An awareness and understanding of sexual and gender inequalities in society in relation to the field of the degree, and the incorporation of different needs and preferences due to sex and gender when designing solutions and solving problems.

Basic

  • CB1 - That students have demonstrated to possess and understand knowledge in an area of study that starts from the base of general secondary education, and is usually found at a level that, although supported by advanced textbooks, also includes some aspects that imply knowledge from the vanguard of their field of study.
  • CB2 - That the students know how to apply their knowledge to their work or vocation in a professional way and possess the skills that are usually demonstrated through the elaboration and defense of arguments and problem solving within their area of study.

Technical Competences

Specific

  • CE04 - To design and use efficiently the most appropriate data types and structures to solve a problem.
  • CE08 - To detect the characteristics, functionalities and components of data managers, which allow the adequate use of them in information flows, and the design, analysis and implementation of applications based on them.
  • CE09 - To ideate, design and integrate intelligent data analysis systems with their application in production and service environments.
  • CE10 - To analyze, design, build and maintain applications in a robust, secure and efficient way, choosing the most appropriate paradigm and programming languages.
  • CE15 - To acquire, formalize and represent human knowledge in a computable form for solving problems through a computer system in any field of application, particularly those related to aspects of computing, perception and performance in intelligent environments.

Generic Technical Competences

Generic

  • CG2 - To use the fundamental knowledge and solid work methodologies acquired during the studies to adapt to the new technological scenarios of the future.
  • CG3 - To define, evaluate and select hardware and software platforms for the development and execution of computer systems, services and applications in the field of artificial intelligence.
  • CG5 - To work in multidisciplinary teams and projects related to artificial intelligence and robotics, interacting fluently with engineers and professionals from other disciplines.
  • CG9 - To face new challenges with a broad vision of the possibilities of a professional career in the field of Artificial Intelligence. Develop the activity applying quality criteria and continuous improvement, and act rigorously in professional development. Adapt to organizational or technological changes. Work in situations of lack of information and / or with time and / or resource restrictions.

Objectives

  1. Be able to explain and use the main mechanisms of parallel processing of queries in distributed environments, and detect bottlenecks.
    Related competences: CG3, CG9, CT4, CT6, CB1, CB2, CE08
  2. Learn, understand and apply the fundamentals of distributed data management systems like distributed databases and distributed file systems.
    Related competences: CG2, CG5, CT4, CT6, CB2, CE04, CE08, CE15
  3. Be able to justify and use functional-style distributed data processing environments.
    Related competences: CG3, CG5, CT4, CT6, CB1, CB2, CE08, CE09, CE10
  4. Learn, understand and apply the fundamentals of knowledge graphs.
    Related competences: CG2, CG5, CT6, CT8, CB1, CE04, CE08
  5. Be able to specify, design, implement and evaluate AI-oriented data management systems, including semantic databases for knowledge representation.
    Related competences: CG2, CG5, CT4, CT6, CB1, CE04, CE08, CE15
  6. Be able to apply knowledge graphs to solve realistic problems such as data integration, graph-based data analysis, etc.
    Related competences: CG2, CG5, CG9, CT4, CT6, CT8, CB2, CE04, CE08, CE15
  7. Be able to evaluate and select data management systems based on a certain quality criterion.
    Related competences: CG2, CG3, CT4, CB2, CE04, CE08, CE10
  8. Be able to solve data discovery and integration problems based on available strategies, standards and technologies.
    Related competences: CG3, CG9, CT4, CT6, CT8, CB1, CB2, CE08, CE09, CE10
  9. Be able to perform graph data query processing both.
    Related competences: CG3, CG9, CT4, CT6, CB1, CB2, CE04, CE09

Contents

  1. Introduction to data systems for Artificial Intelligence.
    The complete AI lifecycle with DevOps and DataOps. Data acquisition, cleaning, and preparation. Model selection and management. Model debugging and serving.
  2. Large-scale data management and processing
    Distributed databases. Overview of distributed data management and processing. Distributed file systems. Distributed data processing frameworks (MapReduce/Spark). Dataflow processing models. Declarative dataflow programs.
  3. Semantic data management
    Foundations of graph data management. Knowledge graph representations with RDF, RDFS and OWL, and their relationship with first-order logic. Pattern matching and the SPARQL query language. Languages for describing and validating knowledge graphs.
  4. Data integration
    Data discovery. Data quality evaluation. Schema and data integration.
  5. Architectures for data-centric AI systems and their governance
    Centralized and distributed functional architectures of data management systems for AI. Data governance.
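
As an informal illustration of the functional-style distributed processing named in content block 2 (MapReduce/Spark), the following is a minimal single-machine sketch in Python. It only mimics the map, shuffle and reduce phases on in-memory lists; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and are not taken from the course material or from any framework API.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit one (word, 1) pair per word, as a mapper would."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, imitating the shuffle step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values of one key with an associative function (here, sum)."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(partitions):
    """Run the three phases over a list of text partitions (assumed in memory)."""
    mapped = [pair for doc in partitions for pair in map_phase(doc)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data big systems", "data systems for AI"]))
```

In a real distributed framework each partition would live on a different node and the shuffle would move data across the network, which is typically where bottlenecks appear.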

Activities



Introduction to data systems for AI

Introduction of the subject, motivation and overview of the data lifecycle for AI.
Objectives: 5
Contents:
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h

Study of large-scale data management and processing


Objectives: 1 3 2 5 7
Contents:
Theory
8h
Problems
0h
Laboratory
10h
Guided learning
0h
Autonomous learning
20h

Study of semantic data management


Objectives: 6 9 4 5
Contents:
Theory
8h
Problems
0h
Laboratory
10h
Guided learning
0h
Autonomous learning
16h

Study of data integration


Objectives: 3 6 5 8
Contents:
Theory
4h
Problems
0h
Laboratory
6h
Guided learning
0h
Autonomous learning
10h

Study of architectures for data-centric AI systems


Objectives: 2 5 7
Contents:
Theory
4h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
10h

Midterm exam


Objectives: 1 3 2 5
Week: 9
Type: theory exam
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
14h

Final exam


Objectives: 6 9 4 7 8
Week: 15 (Outside class hours)
Type: theory exam
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
18h

Teaching methodology

The course has theory lectures and laboratory sessions.

Lectures: The teacher presents the topic. Students follow the lesson, take notes, and prepare additional material outside of class. They may also be asked to carry out assessment activities within these sessions.

Laboratory: The laboratory sessions are mainly dedicated to practicing (with or without a computer) the concepts introduced in the lectures. Tools relevant to these concepts are presented and used in small projects during the sessions. Students also carry out mini-projects in teams. Each mini-project has a deliverable submitted outside class time, and students are additionally assessed individually in the classroom on the knowledge acquired during each project.

The course has an autonomous learning component, as students will have to work with different data management and processing tools. Beyond the support material provided, students are expected to resolve doubts or problems with these tools on their own.

Evaluation methodology

The grade for technical competences is based on:

- NPR: Project grade, computed as a weighted average of the mini-projects of the course.

- NEP: Grade of the midterm (partial) exam.

- NEF: Grade of the final exam.

Final grade = NPR*0.40 + NEP*0.20 + NEF*0.40


For students eligible for re-evaluation, the re-evaluation exam mark replaces both NEF and NEP. In any case, the final mark is the maximum of the ordinary mark and the re-evaluation mark.
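
The grading rule can be sketched in Python as follows. This is only an illustrative reading of the rule, not an official calculator: the function name `final_grade` and the parameter `reeval` are hypothetical, and the sketch assumes that the re-evaluation mark takes the place of both NEP and NEF in the same weighted formula (so it carries a combined weight of 0.60).

```python
def final_grade(npr, nep, nef, reeval=None):
    """Compute the final mark from the component grades.

    npr: project grade, nep: midterm exam grade, nef: final exam grade.
    reeval: re-evaluation exam grade, or None if the student did not take it.
    """
    ordinary = npr * 0.40 + nep * 0.20 + nef * 0.40
    if reeval is None:
        return ordinary
    # Assumption: the re-evaluation mark substitutes both NEP and NEF.
    reevaluated = npr * 0.40 + reeval * 0.20 + reeval * 0.40
    # The final mark is the better of the two results.
    return max(ordinary, reevaluated)


if __name__ == "__main__":
    print(round(final_grade(8, 4, 3), 2))
    print(round(final_grade(8, 4, 3, reeval=6), 2))
```

Under this reading, a poor re-evaluation exam can never lower the ordinary mark, because the maximum of the two results is kept.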

Previous capacities

Fundamental knowledge of relational data modeling.
Be able to create, query and manipulate databases with SQL.
Foundations of knowledge representation and first-order logic.
Advanced programming in Python.