Credits
6
Types
Elective
Requirements
This subject has not requirements
, but it has got previous capacities
Department
ESSI
Web
https://learnsql.fib.upc.es/moodle
A particular focus will be given on the problems posed by heterogeneous data integration and data quality. The students will learn how to define, measure and maintain data quality in the context of data warehousing. Classical notions of data warehousing and OLAP are developed: ETL, architecture, conceptual and logical design, query processing and optimization. At the end of this course, the student will know how to efficiently design, construct and query a data warehouse to create effective visualizations.
Teachers
Others
- Petar Jovanovic ( petar.jovanovic@upc.edu )
Weekly hours
Theory
1.9
Problems
0
Laboratory
1.9
Guided learning
0
Autonomous learning
96
Objectives
-
Be able to model multidimensional data warehouses and visually analyze their data
Related competences: CEE5.1, CEE5.3, CG3, CB6, CB7, CB8, CB9, CEC2, CTR3, -
Be able to apply specific physical design techniques for decisional systems
Related competences: CEE5.1, CEE5.3, CG3, CB7, CB8, CB9, CEC2, CTR3, -
Be able to design and implement data migration processes (i.e., ETL)
Related competences: CEE5.1, CEE5.3, CG3, CB6, CB7, CB8, CB9, CEC2, CTR3,
Contents
-
Introduction
Comparison of operational and decisional systems; Metadata -
Data warehousing architectures
Corporate Information Factory; DW 2.0 -
Multidimensional modeling, OLAP tools
Structure; Integrity constraints; Operations; Advanced concepts -
Database physical desing for analytical queries
Star-join and join indexes; Bitmaps; Materialized views; Spatio-temporal data -
Extraction, Transformation and Load
Data quality; Schema and Data Integration; ETL management -
Visualization and descriptive analytics
Key Performance Indicators; Dashboarding
Activities
Activity Evaluation act
Theoretical lectures
In these activities, the lecturer will introduce the main theoretical concepts of the subject. Besides lecturing, cooperative learning techniques will be used. These demand the active participation of the students, and consequently will be evaluated.Objectives: 1 2 3
Contents:
Theory
25h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
25h
Teaching methodology
The course comprises theory, and lab sessions.Theory: Inverted class techniques will be used, which require that the student work on the provided multimedia materials before the class. Then, theory lectures comprise the teacher's complementary explanations and problem solving.
Laboratory: Some representative tools will be used for the application of theoretical concepts (e.g., Indyco Builder, PotgreSQL, Oracle, Pentaho Data Integration, Tableau). The course includes continuous hands-on through a course project, divided into three logical blocks: data warehouse modelling, data integration and migration (ETL), and descriptive visualisation, in which the students will work in teams. There will be three project deliverables outside the class hours, while in the class the students will be as well individually assessed about the knowledge acquired during each project block.
Evaluation methodology
Final Mark = min(10 ; 60%E + 40%L + 10%P)L = Weighted average of the marks of the three lab deliverables
E = Final exam
P = Participation in the class
Bibliography
Basic
-
Data warehouse design: modern principles and methodologies
- Golfarelli, M.; Rizzi, S,
McGraw Hill,
2009.
ISBN: 9780071610391
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003628169706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Multidimensional databases and data warehousing
- Jensen, C.S.; Pedersen, T.B.; Thomsen, C.W,
Morgan & Claypool,
2010.
ISBN: 9781608455379
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003948319706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Corporate information factory
- Inmon, W.H.; Imhoff, C.; Sousa, R,
John Wiley,
2001.
ISBN: 0471399612
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003133419706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
The data warehouse lifecycle toolkit
- Kimball, R. [et al.],
Wiley publishing,
2008.
ISBN: 9780470149775
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003464519706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Database systems: the complete book
- Garcia-Molina, H.; Ullman, J.D.; Widom, J,
Pearson Education Limited,
2013.
ISBN: 9781292024479
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004168919706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Data warehouse systems: design and implentation
- Vaisman, A.; Zimanyi, E,
Springer,
2014.
ISBN: 9783642546549
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004038639706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Complementary
-
Database modeling and design: logical design
- Teorey, T.J.; Nadeau, T.; Lightstone, S,
Morgan Kaufmann Publishers/Elsevier,
2011.
ISBN: 9780123820204
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004000559706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Physical database design: the database professional's guide to exploiting indexes, views, storage, and more
- Lightstone, S.; Teorey, T.J.; Nadeau, T,
Morgan Kaufmann Publishers,
2007.
ISBN: 9780123693891
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003252949706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Database management systems
- Ramakrishnan, R.; Gehrke, J,
McGraw-Hill,
2003.
ISBN: 0071151109
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002855579706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Cost-based oracle fundamentals
- Lewis, J,
Apress,
2006.
ISBN: 9781590596364
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003403389706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Web links
- Summer school http://cs.ulb.ac.be/conferences/ebiss.html
- Transforming Data With Intelligence (former Data Warehouse Institute) http://tdwi.org
- MSCA-ITN-Erasmus Joint Doctorate on Data Engineering for Data Science https://deds.ulb.ac.be
Previous capacities
Basic knowledge on relational databases and SQL.Specifically, it will be assumed knowledge on:
- UML class diagrams
- Relational algebra
- SQL queries
- Relational views
- B-tree operations (i.e., insertion and splits)
- Basic concepts on physical query optimization