Credits
6
Types
Elective
Requirements
This subject has not requirements
, but it has got previous capacities
Department
CS;ESSI
Web
https://learnsql.fib.upc.es/moodle/
During the first weeks, the course is different for both tracks. For the former, it introduces students to CS-specific advanced topics for Data Science, while for the latter it requires the students to cover basic material on algorithms, data structures and databases in a guided, yet autonomous learning manner.
The last part of the course (corresponding to the last 5 weeks) is common and it is structured as project-based learning. A statement about an end-to-end Data Science project will be delivered and students will need to create the required software infrastructure, using adequate tools.
The objective is to help students get a rigorous and strong knowledge required by the following master courses (for students with a minor in CS) or investigate further CS-specific topics for Data Science (for students with a major in CS). Last, but not least, the course promotes adopting good habits when creating software projects for Data Science in the final project.
Weekly hours
Theory
1
Problems
0
Laboratory
2.9
Guided learning
0
Autonomous learning
6.85
Objectives
-
To analyse the cost of iterative and recursive algorithms
Related competences: CG1, CB9, CTR4, CTR6,
Subcompetences- Compare the efficiency of different algorithms for solving the same problem and select the most appropriate one.
- Understand the definitions of the Big-O, Omega and Theta asymptotic notations and their usefulness in characterising the efficiency of algorithms in time and space.
- Compute the cost of an algorithm in the worst, best and average cases.
-
To review some simple data structures: stacks, queues, lists, and trees
Related competences: CG1, CB9, CTR4, CTR6, -
To know, explain, design, analyse, compare and implement the main data structures and algorithms that can be used to implement priority queues
Related competences: CG1, CB9, CTR4, CTR6, -
To know, explain, design, analyse, compare and implement the main data structures and algorithms that can be used to implement dictionaries
Related competences: CG1, CB9, CTR4, CTR6, -
To know, explain, design, analyse, compare and implement the main data structures and algorithms that can be used to represent graphs and solve classic graph problems such as traversals, topological ordering and shortest paths
Related competences: CG1, CB9, CTR4, CTR6, -
To know, understand, explain, analyse and compare some algorithm design techniques: greedy, divide and conquer, and dynamic programming
Related competences: CG1, CB9, CTR4, CTR6, -
To be aware of the limits of computation: to understand the definitions of the P and NP classes, the concept of Polynomial-Time reduction, the notion of NP-Completeness, and to know some classic NP-complete problems
Related competences: CG1, CB9, CTR4, CTR6, -
Describe what is a database and a database management system
Related competences: CG1, CB9, CTR4, CTR6,
Subcompetences- Enumerate the objectives of a database management system
- Distinguish the objectives of a database management system from a file system
- Enumerate the objectives of a database management system
- Distinguish the objectives of a database management system from a file system
-
Effectively use the standard Structured Query Language (SQL) to query relational databases
Related competences: CG1, CB9, CTR4, CTR6, -
Explain the relational data model, including its data structures, the relational algebra and integrity constraints
Related competences: CG1, CB9, CTR4, CTR6,
Subcompetences- Express any SQL query in terms of a relational algebra expression
- Express any SQL query in terms of a relational algebra expression
-
Given a set of informational requirements, model the logic schema of a relational database
Related competences: CG1, CB9, CTR4, CTR6, -
Identify the main objectives of a database management system query optimizer
Related competences: CG1, CB9, CTR4, CTR6,
Subcompetences- Explain what is a cost model and how to automatically generate physical access plans
- Evaluate the main pros and cons of the most popular data structures used by database management systems given a workload characterized in terms of database cardinality, size and the operations expected and their selectivity factor
- Evaluate the main pros and cons of the most popular data structures used by database management systems given a workload characterized in terms of database cardinality, size and the operations expected and their selectivity factor
- Explain what is a cost model and how to automatically generate physical access plans
-
Develop a quality realistic end-to-end system architecture for a data science project
Related competences: CG1, CG2, CB9, CTR3, CTR4, CTR6,
Contents
-
Basics of Analysis of Algorithms
Worst case, best case and average case cost analysis. Asymptotic order of growth notations: Big-O, Omega and Theta. Analysis of the cost of iterative and recursive algorithms. -
Simple Data Structures: Review
Stacks, queues, lists and trees. -
Priority Queues
Operations of priority queues. Implementations with heaps. Heapsort. -
Dictionaries
Operations of dictionaries. Basic implementations: tables and lists. Advanced implementations: hash tables, binary search trees, and AVL trees. -
Graphs
Representations: adjacency matrices, adjacency lists and implicit representations. Depth-first search (DFS). Breadth-first search (BFS). Topological sort. Algorithms for shortest paths. Algorithm for minimum spanning trees. -
Algorithm design techniques
Greedy, divide and conquer, and dynamic programming. -
Introduction to NP and Computational Intractability
Basic introduction to P and NP classes, Polynomial-Time reduction, and NP-completeness. Examples of classic NP-complete problems. -
Introduction to databases and database management systems
Main concepts on databases and database management systems. Relational database management systems. -
SQL: Data-definition language and data-manipulation language
Introduction to the SQL language -
The relational model
Data structures and integrity constraints. Views. -
The relational algebra
The relational algebra operators and how to build data pipes with them. Notion of semantic and syntactic optimization. -
Logical design of relational databases
Normalization theory. Translating conceptual schemas into relational schemas. -
Notions of physical design and physical database optimization
Notions of query optimizer, access plan and cost model -
Data Science-related advanced topics
Students with a major in Computer Science will investigate on advanced topics specific for data science projects. For example, data quality, entity resolution, data integration, etc.
Activities
Activity Evaluation act
Basics of Analysis of Algorithms
To compare the efficiency of different algorithms for solving the same problem and select the most appropriate one. To compute the cost of an algorithm in the worst, best and average cases. To understand the definitions of the asymptotic order of growth notations Big-O, Omega and Theta, and their usefulness in characterising algorithm efficiency in time and space.Objectives: 1
Contents:
Theory
1h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
0h
Theory
1h
Problems
0h
Laboratory
3h
Guided learning
0h
Autonomous learning
0h
Priority Queues
Operations of priority queues. Implementations with heaps. Heapsort.Objectives: 3
Contents:
Theory
1h
Problems
0h
Laboratory
3h
Guided learning
0h
Autonomous learning
0h
Dictionaries
Operations of dictionaries and ordered dictionaries. Basic implementations: tables and lists. Advanced implementations: hash tables, binary search trees, AVL treesObjectives: 4
Contents:
Theory
1h
Problems
0h
Laboratory
3h
Guided learning
0h
Autonomous learning
0h
Graphs
Representations: adjacency matrices, adjacency lists and implicit representation. Depth-first search (DFS). Breadth-first search (BFS). Topological sort. Dijkstra's algorithm for shortest paths. Prim's algorithm for minimum spanning trees.Objectives: 5
Contents:
Theory
1h
Problems
0h
Laboratory
3h
Guided learning
0h
Autonomous learning
0h
Algorithmic schemes
Divide and conquer, Greedy algorithms, Dynamic Programming, Exhaustive search, Backtracking.Objectives: 6
Contents:
Theory
1h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
0h
Notions of Intractability
Basic introduction to P and NP classes. NP-completeness.Objectives: 7
Contents:
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h
Introduction to databases and database management systems
The student attends the lecture, takes notes and participates in the session exercisesObjectives: 8
Contents:
Theory
1h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h
SQL
The student attends the lecture, takes notes and participates in the session exercisesObjectives: 9
Contents:
Theory
0h
Problems
0h
Laboratory
3h
Guided learning
0h
Autonomous learning
0h
Physical Optimization
The student attends the lecture, takes notes and participates in the session exercisesObjectives: 12
Contents:
Theory
3h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
0h
Data Science Advanced Topics project (DS-AT project)
Students with a major in Computer Science will investigate on advanced topics specific for data science projects. For example, data quality, entity resolution, data integration, etc. Students with a minor in Computer Science will investigate and further study the fundamental concepts introduced in the lecturing hours.
Theory
0h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
40h
Data Science End-to-End Project (DS-EE)
All students will undertake a project spanning all main phases of a data science. As result, they are asked to develop a quality realistic end-to-end system architecture for a data science project.
Theory
0h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
45h
Teaching methodology
During the first 10 weeks, the students are divided in two tracks: one for students with a minor in computer science (track 1) and another one for students with a major in computer science (track 2).Students in track 1 will study fundamental concepts in algorithms, data structures and databases. First, additional material to read, study and understand is provided. Students will have a large bank of exercises to practice their understanding on their own. Then, in the face-to-face lectures, the lecturer will solve doubts and go through representative exercises to guarantee a solid understanding. Also, for the most complex concepts, some regular teaching sessions will be scheduled.
Students in track2 will undertake a research project related to advanced topics of algorithms, data structures and databases within the context of data science. Relevant and critical problems such as data integration, entity resolution and data quality will be proposed. The student will have access to a description of the problem, some seminal research papers and reference tools in this matter.
The last 5 weeks of the semester is common for all students regardless of the track they followed during the first 10 weeks. During these weeks the course project takes place. Students must create a end-to-end system architecture to ingest, store, process, learn models and deploy such system for a realistic project with realistic data.
This course has a strong self-learning component. Complementing it, the lecturers will provide additional material, advice hours and complementary lectures and labs to guarantee a solid comprehension.
Evaluation methodology
Let E1 be the score of the partial exam,E2 the score of the final exam,
RPM the score of the DS-AT project and
CPM the score of the DS-EE project.
Then,
NE = MAX(E2, E1)
If the student followed track 1 (see methodology) then NT = NE
else if student followed track 2 (see methodology) then NT = RPM
The final mark will be 0.6*NT + 0.4*CPM
Transversal competences will weight a 0% in the final score.
Bibliography
Basic
-
Introduction to algorithms
- Cormen, T.H. [et al.],
MIT Press,
2022.
ISBN: 9780262046305
https://search-ebscohost-com.recursos.biblioteca.upc.edu/login.aspx?direct=true&AuthType=ip,uid&db=nlebk&AN=2932690&site=ehost-live&ebv=EK&ppid=Page-__-1 -
Data structures and algorithm in Python
- Goodrich, M.T.; Tamassia, R.; Goldwasser, M.H,
Wiley,
2013.
ISBN: 9781118290279
-
Data Structures and Algorithms with Python
- Lee, K.D.; Hubbard, S,
Springer,
2014.
ISBN: 9783319130712
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004152009706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Computers and intractability: a guide to the theory of NP-Completeness
- Garey, M.R.; Johnson, D.S,
W.H. Freeman,
1979.
ISBN: 0716710447
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991000087999706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Bases de dades
- Sistac, J.; Camps, R,
UOC,
2005.
ISBN: 8497883349
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003181479706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Tècniques avançades de bases de dades
- Sistac, J. (coord.),
EDIUOC,
2000.
ISBN: 8484291065
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002052069706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
SQL: 1999: understanding relational, language components
- Melton, J.; Simon, A.R,
Morgan Kaufmann,
2002.
ISBN: 1558604561
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002281729706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Database management systems
- Ramakrishnan, R.; Gehrke, J,
McGraw-Hill,
2003.
ISBN: 0071151109
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002855579706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Database modeling and design: logical design
- Teorey, T.J. [et al.],
Morgan Kaufmann Publishers/Elsevier,
2011.
ISBN: 9780123820204
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004000559706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Physical database design: the database professional's guide to exploiting indexes, views, storage, and more
- Lightstone, S.; Teorey, T.J.; Nadeau, T,
Morgan Kaufmann Publishers,
2007.
ISBN: 9780123693891
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003252949706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Database systems: the complete book
- Garcia-Molina, H.; Ullman, J.D.; Widom, J,
Pearson Education Limited,
2013.
ISBN: 9781292024479
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004168919706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Complementary
-
Introduction to algorithms: a creative approach
- Manber, U,
Addison-Wesley,
1989.
ISBN: 0201120372
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991000704999706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
SQL-99 complete, really
- Gulutzan, P.; Pelzer, T,
R & D books,
1999.
ISBN: 0879305681
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002180409706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Understanding SQL and Java together: a guide to SQLJ, JDBC, and related technologies
- Melton, J.; Eisenberg, A,
Morgan Kaufmann Publishers,
2000.
ISBN: 1558605622
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002084329706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Encyclopedia of database systems
- Liu, L.; Özsu, M.T,
Springer,
2009.
ISBN: 9780387399409
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991000621799706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Cost-based oracle fundamentals
- Lewis, J,
Apress,
2006.
ISBN: 9781590596364
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003403389706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Web links
- LEARN-SQL http://learnsql.fib.upc.edu
- Jutge https://jutge.org