Data Warehousing

Credits
6
Types
Specialization compulsory (Data Science)
Requirements
This subject has not requirements, but it has got previous capacities
Department
ESSI
This course introduces the concepts of database technology used in Business Intelligence. More precisely, this includes multidimensional databases and data warehouses, as well as ETL (Extraction, Transformation and Load) processes and basic concepts of dashboarding. Necessary techniques will be presented for designing, implementing, exploiting, and maintaining data warehouses, paying special attention to spatio-temporal data.

A particular focus will be given on the problems posed by heterogeneous data integration and data quality. The students will learn how to define, measure and maintain data quality in the context of data warehousing. Classical notions of data warehousing and OLAP are developed: ETL, architecture, conceptual and logical design, query processing and optimization. At the end of this course, the student will know how to efficiently design, construct and query a data warehouse to create effective visualizations.

Teachers

Person in charge

  • Alberto Abello Gamazo ( )

Others

  • Petar Jovanovic ( )

Weekly hours

Theory
1.9
Problems
0
Laboratory
1.9
Guided learning
0
Autonomous learning
96

Competences

Technical Competences of each Specialization

Service engineering

  • CEE5.1 - Capability to participate in improvement projects or to create service systems, providing in particular: a) innovation and research proposals based on new uses and developments of information technologies, b) application of the most appropriate software engineering and databases principles when developing information systems, c) definition, installation and management of infrastructure / platform necessary for the efficient running of service systems.
  • CEE5.3 - Capability to work in interdisciplinary engineering services teams and, provided the necessary domain experience, capability to work autonomously in specific service systems.

Specific

  • CEC2 - Capacity for mathematical modelling, calculation and experimental design in engineering technology centres and business, particularly in research and innovation in all areas of Computer Science.

Generic Technical Competences

Generic

  • CG3 - Capacity for mathematical modeling, calculation and experimental designing in technology and companies engineering centers, particularly in research and innovation in all areas of Computer Science.

Transversal Competences

Teamwork

  • CTR3 - Capacity of being able to work as a team member, either as a regular member or performing directive activities, in order to help the development of projects in a pragmatic manner and with sense of responsibility; capability to take into account the available resources.

Basic

  • CB6 - Ability to apply the acquired knowledge and capacity for solving problems in new or unknown environments within broader (or multidisciplinary) contexts related to their area of study.
  • CB7 - Ability to integrate knowledges and handle the complexity of making judgments based on information which, being incomplete or limited, includes considerations on social and ethical responsibilities linked to the application of their knowledge and judgments.
  • CB8 - Capability to communicate their conclusions, and the knowledge and rationale underpinning these, to both skilled and unskilled public in a clear and unambiguous way.
  • CB9 - Possession of the learning skills that enable the students to continue studying in a way that will be mainly self-directed or autonomous.

Objectives

  1. Be able to model multidimensional data warehouses and visually analyze their data
    Related competences: CEE5.1, CEE5.3, CG3, CB6, CB7, CB8, CB9, CEC2, CTR3,
  2. Be able to apply specific physical design techniques for decisional systems
    Related competences: CEE5.1, CEE5.3, CG3, CB7, CB8, CB9, CEC2, CTR3,
  3. Be able to design and implement data migration processes (i.e., ETL)
    Related competences: CEE5.1, CEE5.3, CG3, CB6, CB7, CB8, CB9, CEC2, CTR3,

Contents

  1. Introduction
    Comparison of operational and decisional systems; Metadata
  2. Data warehousing architectures
    Corporate Information Factory; DW 2.0
  3. Multidimensional modeling, OLAP tools
    Structure; Integrity constraints; Operations; Advanced concepts
  4. Database physical desing for analytical queries
    Star-join and join indexes; Bitmaps; Materialized views; Spatio-temporal data
  5. Extraction, Transformation and Load
    Data quality; Schema and Data Integration; ETL management
  6. Visualization and descriptive analytics
    Key Performance Indicators; Dashboarding

Activities

Activity Evaluation act


Theoretical lectures

In these activities, the lecturer will introduce the main theoretical concepts of the subject. Besides lecturing, cooperative learning techniques will be used. These demand the active participation of the students, and consequently will be evaluated.
Objectives: 1 2 3
Contents:
Theory
25h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
25h

Hands-on sessions

The student will be asked to practice the different concepts introduced in the theoretical lectures. This includes problem solving either on the computer or on paper.
Objectives: 1 2 3
Contents:
Theory
0h
Problems
0h
Laboratory
27h
Guided learning
0h
Autonomous learning
54h

Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
17h

Teaching methodology

The course comprises theory, and lab sessions.

Theory: Inverted class techniques will be used, which require that the student work on the provided multimedia materials before the class. Then, theory lectures comprise the teacher's complementary explanations and problem solving.

Laboratory: Some representative tools will be used for the application of theoretical concepts (e.g., Indyco Builder, PotgreSQL, Oracle, Pentaho Data Integration, Tableau). The course includes continuous hands-on through a course project, divided into three logical blocks: data warehouse modelling, data integration and migration (ETL), and descriptive visualisation, in which the students will work in teams. There will be three project deliverables outside the class hours, while in the class the students will be as well individually assessed about the knowledge acquired during each project block.

Evaluation methodology

Final Mark = min(10 ; 60%E + 40%L + 10%P)

L = Weighted average of the marks of the three lab deliverables
E = Final exam
P = Participation in the class

Bibliography

Basic:

Complementary:

Web links

Previous capacities

Basic knowledge on relational databases and SQL.

Specifically, it will be assumed knowledge on:
- UML class diagrams
- Relational algebra
- SQL queries
- Relational views
- B-tree operations (i.e., insertion and splits)
- Basic concepts on physical query optimization