Credits
6
Types
Compulsory
Requirements
This subject has no requirements, but it does have previous capacities.
Department
ESSI
Web
https://learnsql3.fib.upc.edu/moodle
Teachers
Person in charge
- Alberto Abello Gamazo ( alberto.abello@upc.edu )
Others
- Besim Bilalli ( besim.bilalli@upc.edu )
Weekly hours
Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6
Competences
Technical competencies
Transversals
Basic
Generic
Objectives
1. Be able to discuss and justify in detail the architectural principles and bottlenecks of relational database management systems compared with alternative storage and processing systems.
   Related competences: CB2, CB3, CT4, CT6, CE7, CG1, CG2
2. Be able to obtain the logical schema of a data warehouse from a conceptual schema expressed in UML, and to detect and correct defects in it.
   Related competences: CB2, CB3, CT4, CT6, CE7
3. Be able to explain and use the main mechanisms of parallel query processing in distributed environments, and to detect bottlenecks.
   Related competences: CB2, CB3, CT4, CT6, CE7, CG2
4. Be able to justify and use NoSQL storage systems.
   Related competences: CB2, CB3, CT4, CT6, CE7, CG1, CG2
Contents
- Introduction
  Data warehousing and Big Data.
- Data Warehousing
  Data warehousing. ETL data flows. Data integration. OLAP tools.
- Distributed databases
  Taxonomy of distributed databases. Architectures. Distributed database design (fragmentation and replication). Parallelism. Measures of scalability. Distributed file systems.
- Distributed data processing
  Importance of parallel sequential access. Synchronization barriers (Bulk Synchronous Parallel model). Big Data architectures and NoSQL systems.
Activities
Activity Evaluation act
Introduction
Introduction of the subject, motivation, and overview of existing data management tools, with their advantages and disadvantages.
Objectives: 1
Contents:
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h
Theory
10h
Problems
0h
Laboratory
14h
Guided learning
0h
Autonomous learning
38h
Teaching methodology
The course consists of theory and laboratory sessions.
Theory: Flipped classroom techniques will be used, which require students to work on multimedia materials before class. Theory classes consist of complementary explanations by the teacher and problem solving.
Laboratory: Representative tools (for example, PostgreSQL, Talend, HDFS, MongoDB) will be used to apply the theoretical concepts. There will also be two team projects: one on descriptive data analysis in a data warehouse and the other on predictive analysis in a Big Data environment. Consequently, there will be two deliverables prepared outside class hours, but students will also be assessed individually in the classroom on the knowledge gained during each project.
The course has an autonomous learning component, as students will have to work with different data management and processing tools. Beyond the support material provided, students are expected to resolve their own doubts and problems when using these tools.
Evaluation methodology
Final grade = max(20% EP + 40% EF; 60% EF) + 40% P
EP = partial (mid-term) exam mark
EF = final exam mark
P = project mark, computed as a weighted average of the course projects
For students who take the resit session, the reassessment exam mark replaces EF.
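As a worked illustration of the grading formula, the following Python sketch computes the final grade. It assumes all marks (EP, EF, P and the resit mark) are on the same scale, for example 0 to 10; the function and parameter names are hypothetical. Note that 20% EP + 40% EF >= 60% EF holds exactly when EP >= EF, so the mid-term mark only counts when it does not lower the exam part of the grade.

```python
from typing import Optional


def final_grade(ep: float, ef: float, p: float, resit: Optional[float] = None) -> float:
    """Final grade = max(20% EP + 40% EF; 60% EF) + 40% P.

    Assumption: all marks are on the same scale (e.g. 0-10).
    If a resit (reassessment) mark is given, it replaces EF.
    """
    if resit is not None:
        ef = resit  # the reassessment exam mark replaces the final exam mark
    # The mid-term (EP) only raises the result when EP >= EF.
    exam_part = max(0.2 * ep + 0.4 * ef, 0.6 * ef)
    # The project mark always weighs 40%.
    return exam_part + 0.4 * p
```

For example, with EP = 5, EF = 8 and P = 7, the max selects 60% EF = 4.8 over 20% EP + 40% EF = 4.2, giving 4.8 + 2.8 = 7.6 (up to floating-point rounding).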
Bibliography
Basic
- Database systems: the complete book
  Garcia-Molina, Hector; Ullman, Jeffrey D.; Widom, Jennifer. Pearson Education, 2013.
  ISBN: 9781292024479
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004168919706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Data warehouse design: modern principles and methodologies
  Golfarelli, M.; Rizzi, S. McGraw Hill, 2009.
  ISBN: 9780071610391
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003628169706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Data warehouse systems: design and implementation
  Vaisman, A.; Zimányi, E. Springer, 2022.
  ISBN: 9783662651667
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005155876506711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Principles of distributed database systems
  Özsu, M.T.; Valduriez, P. Springer, 2020.
  ISBN: 9783030262525
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004193569706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- NoSQL distilled: a brief guide to the emerging world of polyglot persistence
  Sadalage, P.J.; Fowler, M. Addison-Wesley, 2013.
  ISBN: 9780321826626
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003990429706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- SQL for data science: data cleaning, wrangling and analytics with relational databases
  Badia, Antonio. Springer, 2020.
  ISBN: 9783030575915
  http://cataleg.upc.edu/record=99100491633840671~S1*cat
- Data Warehousing and OLAP
  Abelló, Alberto; Jovanovic, Petar.
- Big Data Management
  Abelló, Alberto; Nadal, Sergi.
- Slides on Advanced Databases course
  Database Technologies and Information Management.
Complementary
- Exercises Big Data Management
- Exercises Data Warehousing
Web links
- Erasmus Mundus Master on Big Data Management and Analytics https://bdma.ulb.ac.be/bdma
- European Big Data Management and Analytics Summer School (eBISS) https://cs.ulb.ac.be/conferences/ebiss.html
Previous capacities
Be able to read and understand materials in English.
Be able to list the stages that make up the software engineering process.
Be able to understand conceptual schemas in UML.
Be able to create, query and manipulate databases with SQL.
Be able to write programs in a functional programming style, for example with Spark.