Skip to main content

Data Warehousing

Credits
6
Types
Compulsory
Requirements
This subject has not requirements , but it has got previous capacities
Department
ESSI
Web
https://learnsql.fib.upc.es/moodle
This course introduces the concepts of database technology used in Business Intelligence. More precisely, this includes multidimensional databases and data warehouses, as well as ETL (Extraction, Transformation and Load) processes and basic concepts of dashboarding. Necessary techniques will be presented for designing, implementing, exploiting, and maintaining data warehouses, paying special attention to spatio-temporal data.

A particular focus will be given on the problems posed by heterogeneous data integration and data quality. The students will learn how to define, measure and maintain data quality in the context of data warehousing. Classical notions of data warehousing and OLAP are developed: ETL, architecture, conceptual and logical design, query processing and optimization. At the end of this course, the student will know how to efficiently design, construct and query a data warehouse to create effective visualizations.

Teachers

Person in charge

Others

Weekly hours

Theory
1.9
Problems
0
Laboratory
1.9
Guided learning
0
Autonomous learning
6.85

Competences

Teamwork

  • CT3 - Ability to work as a member of an interdisciplinary team, as a normal member or performing direction tasks, in order to develop projects with pragmatism and sense of responsibility, making commitments taking into account the available resources.
  • Third language

  • CT5 - Achieving a level of spoken and written proficiency in a foreign language, preferably English, that meets the needs of the profession and the labour market.
  • Entrepreneurship and innovation

  • CT1 - Know and understand the organization of a company and the sciences that govern its activity; have the ability to understand labor standards and the relationships between planning, industrial and commercial strategies, quality and profit. Being aware of and understanding the mechanisms on which scientific research is based, as well as the mechanisms and instruments for transferring results among socio-economic agents involved in research, development and innovation processes.
  • Basic

  • CB6 - Ability to apply the acquired knowledge and capacity for solving problems in new or unknown environments within broader (or multidisciplinary) contexts related to their area of study.
  • CB7 - Ability to integrate knowledge and handle the complexity of making judgments based on information which, being incomplete or limited, includes considerations on social and ethical responsibilities linked to the application of their knowledge and judgments.
  • CB8 - Capability to communicate their conclusions, and the knowledge and rationale underpinning these, to both skilled and unskilled public in a clear and unambiguous way.
  • CB9 - Possession of the learning skills that enable the students to continue studying in a way that will be mainly self-directed or autonomous.
  • CB10 - Possess and understand knowledge that provides a basis or opportunity to be original in the development and/or application of ideas, often in a research context.
  • Generic

  • CG1 - Identify and apply the most appropriate data management methods and processes to manage the data life cycle, considering both structured and unstructured data
  • Especifics

  • CE2 - Apply the fundamentals of data management and processing to a data science problem
  • CE3 - Apply data integration methods to solve data science problems in heterogeneous data environments
  • CE5 - Model, design, and implement complex data systems, including data visualization
  • CE7 - Identify the limitations imposed by data quality in a data science problem and apply techniques to smooth their impact
  • Objectives

    1. Be able to model multidimensional data warehouses and visually analyze their data
      Related competences: CB10, CB6, CB7, CB8, CB9, CT1, CT3, CT5, CE3, CE5,
    2. Be able to apply specific physical design techniques for decisional systems
      Related competences: CB6, CB7, CB8, CB9, CT3, CT5, CE2, CE5, CG1,
    3. Be able to design and implement data migration processes (i.e., ETL)
      Related competences: CB10, CB6, CB7, CB8, CB9, CT3, CT5, CE2, CE3, CE5, CE7, CG1,

    Contents

    1. Introduction
      Comparison of operational and decisional systems; Metadata
    2. Data warehousing architectures
      Corporate Information Factory; DW 2.0
    3. Multidimensional modeling, OLAP tools
      Structure; Integrity constraints; Operations; Advanced concepts
    4. Database physical desing for analytical queries
      Star-join and join indexes; Bitmaps; Materialized views; Spatio-temporal data
    5. Extraction, Transformation and Load
      Data quality; Schema and Data Integration; ETL management
    6. Visualization and descriptive analytics
      Key Performance Indicators; Dashboarding

    Activities

    Activity Evaluation act


    Theoretical lectures

    In these activities, the lecturer will introduce the main theoretical concepts of the subject. Besides lecturing, cooperative learning techniques will be used. These demand the active participation of the students, and consequently will be evaluated.
    Objectives: 1 3 2
    Contents:
    Theory
    25h
    Problems
    0h
    Laboratory
    0h
    Guided learning
    0h
    Autonomous learning
    25h

    Hands-on sessions

    The student will be asked to practice the different concepts introduced in the theoretical lectures. This includes problem solving either on the computer or on paper.
    Objectives: 1 3 2
    Contents:
    Theory
    0h
    Problems
    0h
    Laboratory
    27h
    Guided learning
    0h
    Autonomous learning
    54h

    Theory
    2h
    Problems
    0h
    Laboratory
    0h
    Guided learning
    0h
    Autonomous learning
    17h

    Teaching methodology

    The course comprises theory, and lab sessions.

    Lectures: The teacher presents the topic. Students follow the lesson, take notes, and prepare additional material outside of class. They may also be asked to carry out assessment activities within these sessions.

    Laboratory: Some representative tools will be used for the application of theoretical concepts (e.g., PotgreSQL, Oracle, Talend, Tableau). The course includes continuous hands-on through a course project, divided into three logical blocks: data warehouse modelling, data integration and migration (ETL), and descriptive visualisation, in which the students will work in teams. There will be three project deliverables outside the class hours, while in the class the students will be as well individually assessed about the knowledge acquired during each project block.

    Evaluation methodology

    Final grade = max(20%EP+40%EF ; 60% EF) + 40% P

    EP = partial (mid term) exam mark
    EF = final exam mark
    P = Weighted average of the marks of the project deliverables

    Bibliography

    Basic

    Complementary

    Web links

    Previous capacities

    Basic knowledge on relational databases and SQL.

    Specifically, it will be assumed knowledge on:
    - UML class diagrams
    - Relational algebra
    - SQL queries
    - Relational views
    - B-tree operations (i.e., insertion and splits)
    - Basic concepts on physical query optimization