Data Warehousing

Credits
6
Department
ESSI
Types
Specialization compulsory (Data Science)
Requirements
This subject has not requirements
This course introduces the concepts of database technology used in Business Intelligence. More precisely, this includes multidimensional databases and data warehouses, as well as ETL processes. Necessary techniques will be presented for designing, implementing, exploiting, and maintaining data warehouses.

A particular focus will be given on the problems posed by heterogeneous data integration and data quality. The students will learn how to define, measure and maintain data quality in the context of data warehousing. Classical notions of data warehousing and OLAP are developed: ETL, architecture, conceptual and logical design, query processing and optimization. At the end of this course, the student will know how to efficiently design, construct and query a data warehouse.
Web: https://learnsql.fib.upc.es/moodle/

Teachers

Person in charge

  • Alberto Abello Gamazo ( )

Others

  • Petar Jovanovic ( )

Weekly hours

Theory
2
Problems
0
Laboratory
1
Guided learning
0
Autonomous learning
5.33

Competences

Technical Competences of each Specialization

Service engineering

  • CEE5.1 - Capability to participate in improvement projects or to create service systems, providing in particular: a) innovation and research proposals based on new uses and developments of information technologies, b) application of the most appropriate software engineering and databases principles when developing information systems, c) definition, installation and management of infrastructure / platform necessary for the efficient running of service systems.
  • CEE5.3 - Capability to work in interdisciplinary engineering services teams and, provided the necessary domain experience, capability to work autonomously in specific service systems.

Specific

  • CEC2 - Capacity for mathematical modelling, calculation and experimental design in engineering technology centres and business, particularly in research and innovation in all areas of Computer Science.

Generic Technical Competences

Generic

  • CG3 - Capacity for mathematical modeling, calculation and experimental designing in technology and companies engineering centers, particularly in research and innovation in all areas of Computer Science.

Transversal Competences

Teamwork

  • CTR3 - Capacity of being able to work as a team member, either as a regular member or performing directive activities, in order to help the development of projects in a pragmatic manner and with sense of responsibility; capability to take into account the available resources.

Basic

  • CB7 - Ability to integrate knowledges and handle the complexity of making judgments based on information which, being incomplete or limited, includes considerations on social and ethical responsibilities linked to the application of their knowledge and judgments.
  • CB8 - Capability to communicate their conclusions, and the knowledge and rationale underpinning these, to both skilled and unskilled public in a clear and unambiguous way.
  • CB9 - Possession of the learning skills that enable the students to continue studying in a way that will be mainly self-directed or autonomous.

Objectives

  1. Be able to model multidimensional data warehouses
    Related competences: CEE5.1, CEE5.3, CG3, CB7, CB8, CB9, CEC2, CTR3,
  2. Be able to apply specific physical design techniques for decisional systems
    Related competences: CEE5.1, CEE5.3, CG3, CB7, CB8, CB9, CEC2, CTR3,
  3. Be able to design and implement data migration processes (i.e., ETL)
    Related competences: CEE5.1, CEE5.3, CG3, CB7, CB8, CB9, CEC2, CTR3,

Contents

  1. Introduction
    Comparison of operational and decisional systems; Metadata
  2. Data warehousing architectures
    Corporate Information Factory; DW 2.0
  3. Multidimensional modeling and OLAP tools
    Structure; Integrity constraints; Operations; Advanced concepts
  4. Database optimization
    Basic concepts; Phases and goals
  5. Database physical desing for analytical queries
    Star-join and join indexes; Bitmaps; Materialized views; Implementations (relational and NOSQL)
  6. Extraction, Transformation and Load
    Data quality; Integration; ETL management

Activities

Theoretical lectures

In these activities, the lecturer will introduce the main theoretical concepts of the subject. Besides lecturing, cooperative learning techniques will be used. These demand the active participation of the students, and consequently will be evaluated.
Theory
28
Problems
0
Laboratory
0
Guided learning
0
Autonomous learning
28
Objectives: 1 2 3
Contents:

Hands-on sessions

The student will be asked to practice the different concepts introduced in the theoretical lectures. This includes problem solving either on the computer or on paper.
Theory
0
Problems
0
Laboratory
14
Guided learning
0
Autonomous learning
28
Objectives: 1 2 3
Contents:

Seminars

The students will be asked to prepare a demo of some data warehousing tool and do it in front of the classmates.
Theory
6
Problems
0
Laboratory
4
Guided learning
0
Autonomous learning
32
Objectives: 1 2 3
Contents:

Exam

Written exam of the theoretical concepts introduced along the course.
Theory
2
Problems
0
Laboratory
0
Guided learning
0
Autonomous learning
8

Teaching methodology

The course comprises theory, lab sessions and seminars.

Theory: The theory lectures comprise the teacher's explanations and constitute the main part of the course. The students will also have some contents to be read and prepared outside the classroom and will be asked to participate in cooperative learning activities.

Laboratory: Mainly, the lab sessions will be dedicated to the practice (with and without computer) of the concepts introduced in the theory lectures, by means of markable exercises that will be done during the class time. Some tools will be used for the design and practice on a specific DBMS (e.g., Oracle).

Seminar: The students will have to prepare a practical seminar by themselves and present it in front of the others.

Evaluation methodology

Final mark = 40% min(10,P) + 30% S + 10%C + 20% E

P = Weighted average grade of questionnaires, collaborative activities and 7 lab sessions
S = Seminar
C = Peer evaluation
E = Exam

Calculation of P:
1) Multiply the mark of each activity by a weight equal to 1, 2, 4, or 8 (depending on the content of the corresponding activity)
2) Divide the sum of these values by the sum of weights assigned minus 8

Calculation of C: students will have multiple labmates during the semester and they will evaluate them at the end. Based on these evaluations, the teacher will assign a mark.

Seminar: students will prepare a practical seminar and will present it in front of their classmates. Based on that presentation and the delivered materials, the teacher will assign a mark.

Bibliografy

Basic:

Complementary:

Web links

Previous capacities

Basic knowledge on relational databases and SQL.

Specifically, it will be assumed knowledge on:
- UML class diagrams
- Relational algebra
- SQL queries
- Relational views
- B-tree operations (i.e., insertion and splits)