Credits: 6
Type: Compulsory
Requirements: This subject has no formal requirements, but it assumes some previous capacities.
Department: ESSI
Web: https://learnsql3.fib.upc.edu/moodle/course/view.php?id=170
The student will learn the concepts related to analysis-oriented data storage, as well as large-scale data processing. As a result, the student will be able to evaluate the different alternatives for data storage, modeling and processing in the context of their organization and choose the most appropriate ones.
Teachers
Person in charge
- Petar Jovanovic ( petar.jovanovic@upc.edu )
Others
- Anna Queralt Calafat ( anna.queralt@upc.edu )
- Gerard Pons Recasens ( gerard.pons.recasens@upc.edu )
- Marc Maynou Yelamos ( marc.maynou@upc.edu )
Weekly hours
Theory: 2
Problems: 0
Laboratory: 2
Guided learning: 0
Autonomous learning: 6
Competences
Transversal
Basic
Specific
Generic
Objectives
1. Be able to explain and use the main mechanisms of parallel query processing in distributed environments, and to detect bottlenecks.
   Related competences: CG3, CG9, CT4, CT6, CB1, CB2, CE08
2. Learn, understand and apply the fundamentals of distributed data management systems, such as distributed databases and distributed file systems.
   Related competences: CG2, CG5, CT4, CT6, CB2, CE04, CE08, CE15
3. Be able to justify and use functional-style distributed data processing environments.
   Related competences: CG3, CG5, CT4, CT6, CB1, CB2, CE08, CE09, CE10
4. Learn, understand and apply the fundamentals of knowledge graphs.
   Related competences: CG2, CG5, CT6, CT8, CB1, CE04, CE08
5. Be able to specify, design, implement and evaluate AI-oriented data management systems, including semantic databases for knowledge representation.
   Related competences: CG2, CG5, CT4, CT6, CB1, CE04, CE08, CE15
6. Be able to apply knowledge graphs to solve realistic problems such as data integration and graph-based data analysis.
   Related competences: CG2, CG5, CG9, CT4, CT6, CT8, CB2, CE04, CE08, CE15
7. Be able to evaluate and select data management systems based on given quality criteria.
   Related competences: CG2, CG3, CT4, CB2, CE04, CE08, CE10
8. Be able to solve data discovery and integration problems using available strategies, standards and technologies.
   Related competences: CG3, CG9, CT4, CT6, CT8, CB1, CB2, CE08, CE09, CE10
9. Be able to perform graph data query processing.
   Related competences: CG3, CG9, CT4, CT6, CB1, CB2, CE04, CE09
Contents
- Introduction to data systems for Artificial Intelligence
  The complete AI lifecycle with DevOps and DataOps. Data acquisition, cleaning and preparation. Model selection and management. Model debugging and serving.
- Large-scale data management and processing
  Distributed databases. Overview of distributed data management and processing. Distributed file systems. Distributed data processing frameworks (MapReduce/Spark). Dataflow processing models. Declarative dataflow programs.
- Semantic data management
  Foundations of graph data management. Knowledge graph representations with RDF, RDFS and OWL, and their relationship with first-order logic. Pattern matching and the SPARQL query language. Languages for describing and validating knowledge graphs.
- Data integration
  Data discovery. Data quality evaluation. Schema and data integration.
- Architectures for data-centric AI systems and their governance
  Centralized and distributed functional architectures of data management systems for AI. Data governance.
Activities
Introduction to data systems for AI
Introduction to the subject, motivation and overview of the data lifecycle for AI.
Objectives: 5
Contents:
Theory: 2h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 2h

Theory: 8h
Problems: 0h
Laboratory: 10h
Guided learning: 0h
Autonomous learning: 16h

Theory: 4h
Problems: 0h
Laboratory: 6h
Guided learning: 0h
Autonomous learning: 10h
Teaching methodology
The course has theory lectures and laboratory sessions.
Lectures: The teacher presents the topic. Students follow the lesson, take notes, and prepare additional material outside of class. They may also be asked to carry out assessment activities during these sessions.
Laboratory: The laboratory sessions are mainly devoted to practicing, with or without a computer, the concepts introduced in the lectures. Tools relevant to these concepts are presented and used in small projects during these sessions. Students also carry out mini-projects in teams. Each mini-project has a deliverable submitted outside class time, and students are also assessed individually in the classroom on the knowledge acquired during each project.
The course has an autonomous learning component, as students will have to work with different data management and processing tools. Beyond the support material provided, students are expected to resolve doubts and problems with these tools on their own.
Evaluation methodology
Technical skills are graded on the basis of:
- NPR: Project grade, the weighted average of the course's mini-projects.
- NEP: Grade of the partial exam.
- NEF: Grade of the final exam.
Final grade = NPR*0.40+NEP*0.25+NEF*0.35
Re-evaluation: Only students who have taken the final exam and failed it may take the re-evaluation exam (students graded NP, "not presented", may not). The re-evaluation exam mark replaces both NEF and NEP, so the exam covers the content of the entire course. In any case, the final mark will be the higher of the ordinary mark and the re-evaluation mark. The maximum grade of any re-evaluation exam is 7.
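As a worked illustration of these rules, the Python sketch below computes the final grade from NPR, NEP and NEF with the stated weights. It assumes that the re-evaluation exam mark, capped at 7, replaces both NEP and NEF (so it carries their combined weight of 0.60) and that the higher of the ordinary and re-evaluated grades is kept; eligibility is not checked. The function names are hypothetical and not part of any official grading tool.

```python
from typing import Optional

# Weights from the formula: Final grade = NPR*0.40 + NEP*0.25 + NEF*0.35
W_NPR, W_NEP, W_NEF = 0.40, 0.25, 0.35
REEVAL_CAP = 7.0  # maximum mark of a re-evaluation exam


def ordinary_grade(npr: float, nep: float, nef: float) -> float:
    """Weighted final grade on a 0-10 scale (hypothetical helper)."""
    return W_NPR * npr + W_NEP * nep + W_NEF * nef


def final_grade(npr: float, nep: float, nef: float,
                reeval: Optional[float] = None) -> float:
    """Final grade, optionally taking a re-evaluation exam mark into account.

    Assumption: the capped re-evaluation mark replaces both NEP and NEF.
    Eligibility (having taken and failed the final exam) is not checked here.
    """
    ordinary = ordinary_grade(npr, nep, nef)
    if reeval is None:
        return ordinary
    capped = min(reeval, REEVAL_CAP)            # re-evaluation mark is capped at 7
    with_reeval = ordinary_grade(npr, capped, capped)
    return max(ordinary, with_reeval)           # keep the better of the two marks


print(round(final_grade(8, 6, 5), 2))            # → 6.45
print(round(final_grade(8, 6, 3, reeval=9), 2))  # → 7.4
```

In the first call the ordinary grade is 8 × 0.40 + 6 × 0.25 + 5 × 0.35 = 3.20 + 1.50 + 1.75 = 6.45. In the second, the re-evaluation mark of 9 is capped at 7, giving 8 × 0.40 + 7 × 0.60 = 7.40, which is higher than the ordinary 5.75 and is therefore kept.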
Bibliography
Basic
- Badia, Antonio. SQL for data science: data cleaning, wrangling and analytics with relational databases. Springer, 2020. ISBN: 9783030575915.
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004916338406711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Özsu, M. Tamer; Valduriez, Patrick. Principles of distributed database systems. Springer, 2020. ISBN: 9783030262525.
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004193569706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Hogan, Aidan. The Web of Data. Springer, 2020. ISBN: 9783030515829.
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005316955606711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Sadalage, Pramod J; Fowler, Martin. NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Addison-Wesley, 2013. ISBN: 9780321826626.
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003990429706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Groppe, Sven. Data management and query processing in semantic web databases. Springer, 2011. ISBN: 9783642193569.
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003898129706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Abiteboul, S. Web data management. Cambridge University Press, 2012. ISBN: 9781107012431.
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003929239706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Complementary
- Özsu, M. Tamer; Liu, Ling. Encyclopedia of database systems [electronic resource]. Springer, 2009. ISBN: 9780387399409.
  https://link-springer-com.recursos.biblioteca.upc.edu/referencework/10.1007/978-0-387-39940-9
- Aggarwal, Charu C; Wang, Haixun. Managing and mining graph data. Springer, 2010. ISBN: 9781441960443.
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003843179706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Lenzerini, Maurizio. PODS '02: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database Systems. 2002.
  https://dl-acm-org.recursos.biblioteca.upc.edu/doi/10.1145/543613.543644
- Özsu, M. Tamer. Frontiers of Computer Science. 2016.
  https://link-springer-com.recursos.biblioteca.upc.edu/article/10.1007/s11704-016-5554-y
Previous capacities
- Fundamental knowledge of relational data modeling.
- Be able to create, query and manipulate databases with SQL.
- Foundations of knowledge representation and first-order logic.
- Advanced programming in Python.