Advanced Topics in Data Engineering II

Credits
6
Types
Compulsory
Requirements
This subject has no formal requirements, but it does assume previous capacities
Department
ETSETB;FIB;FME;ESSI;ENTEL
The course is structured in two parts: the study of data-driven software engineering and the study of aspects related to data privacy and security.

1. Software Engineering. The availability of large volumes of data from both the development of software systems (obtained from tools such as code repositories, quality managers, etc.) and their use (user opinions, quality of service, logs, etc.) makes it possible to use them in various stages and activities of software engineering and, even going further, defines a new approach that considers data as the cornerstone of the software life cycle. The first part of the course presents this new vision of software engineering and delves into three specific activities that revolve around data: software quality management, planning of new versions and autonomous adaptation of software systems aware of the context at runtime.

2. Data privacy and security. Data analysis techniques can help to obtain information that anticipates various problems, reveals their source and supports the implementation of solutions, in contexts as varied as business competitiveness, marketing, social relations, transport, health, education and politics. However, while data analysis is extremely valuable, it also has a crucial drawback: it increasingly invades the privacy of the people about whom data is collected. The second part of the course presents basic concepts of information privacy and delves into the main privacy technologies and metrics, as well as the anonymization algorithms used to prevent any disclosure of sensitive information about individuals.

Teachers

Person in charge

  • Jordi Forne Muñoz
  • Silverio Juan Martínez Fernández
  • Xavier Franch Gutiérrez

Others

  • Esteve Pallares Segarra

Weekly hours

Theory
2
Problems
0
Laboratory
2
Guided learning
0.2
Autonomous learning
5.8

Competences

Technical Competences

Technical competencies

  • CE1 - Skillfully use mathematical concepts and methods that underlie the problems of science and data engineering.
  • CE2 - To be able to program solutions to engineering problems: Design efficient algorithmic solutions to a given computational problem, implement them in the form of a robust, structured and maintainable program, and check the validity of the solution.
  • CE3 - Analyze complex phenomena through probability and statistics, and propose models of these types in specific situations. Formulate and solve mathematical optimization problems.
  • CE7 - Demonstrate knowledge and ability to apply the necessary tools for the storage, processing and access to data.
  • CE8 - Ability to choose and employ techniques of statistical modeling and data analysis, evaluating the quality of the models, validating and interpreting them.

Transversal Competences

Transversals

  • CT4 [Assessable] - Teamwork. Be able to work in an interdisciplinary team, either as a member or in a management role, with the aim of contributing to the development of projects with pragmatism and a sense of responsibility, making commitments that take the available resources into account.

Basic

  • CB2 - That the students know how to apply their knowledge to their work or vocation in a professional way and possess the skills that are usually demonstrated through the elaboration and defense of arguments and problem solving within their area of study.
  • CB3 - That students have the ability to gather and interpret relevant data (usually within their area of study) to make judgments that include a reflection on relevant social, scientific or ethical issues.
  • CB5 - That the students have developed those learning skills necessary to undertake later studies with a high degree of autonomy.

Generic Technical Competences

Generic

  • CG1 - To design computer systems that integrate data of very diverse provenance and form, build mathematical models from them, reason about these models and act accordingly, learning from experience.
  • CG2 - Choose and apply the most appropriate methods and techniques to a problem defined by data that represents a challenge for its volume, speed, variety or heterogeneity, including computer, mathematical, statistical and signal processing methods.
  • CG4 - Identify opportunities for innovative data-driven applications in evolving technological environments.

Objectives

  1. Interpret the basic concepts of Software Engineering, especially in relation to the use and exploitation of data.
    Related competences: CG1, CB2
  2. Apply and analyze concepts and methods concerning the use of data from the development process in the quality management of the software system.
    Related competences: CE1, CE2, CE3, CE7, CT4, CG1, CG4, CB2, CB5
  3. Apply and analyze good software engineering practices related to data science and machine learning projects.
    Related competences: CE1, CE2, CT4, CG1, CG4, CB2, CB5
  4. Describe concepts and methods related to the use of data obtained during the use of the system, in order to plan new evolutionary versions or to self-adapt systems at runtime in response to changes.
    Related competences: CE1, CE2, CE7, CE8, CG1, CG2, CB2, CB5
  5. Understand the privacy risks associated with browsing and publishing data. Achieve a deeper understanding of the different privacy metrics and their application in different scenarios.
    Related competences: CE1, CE3, CE8, CT4, CG2, CB3, CB5
  6. Understand the main anonymization algorithms for statistical databases.
    Related competences: CE1, CE2, CE3, CE8, CT4, CG1, CG2, CG4, CB2, CB3, CB5
  7. Evaluate the trade-off between privacy and data usability.
    Related competences: CE1, CE3, CE8, CT4, CG1, CG4, CB2, CB3, CB5
  8. Understand the privacy risks in communications and the anonymous communication systems.
    Related competences: CE1, CE3, CE8, CG1, CG4, CB2, CB5

Contents

  1. Introduction to Software Engineering
    First, the traditional concept of software engineering is presented. Phases. Methodologies: waterfall, agile, hybrid. Development environment: tools.

    Then, the impact of data availability on this traditional concept is analyzed. The resulting software life cycle when considering data is shown.
  2. Quality management of the software and its development process
    A classic problem in software development is to ensure basic levels of quality, both in reference to the system itself (maintainability, reliability,...) and in the production process (team productivity, resource management,...). The analysis of data from the software repositories used in the production process (e.g., code repositories, problem management tools) allows a faster and more reliable discovery of these problems and the implementation of mitigation strategies.
  3. Good software engineering practices for data science and machine learning projects
    The complexity and diversity of data science projects and machine learning systems call for engineering techniques to ensure they are built in a robust and future-proof manner. This chapter addresses software engineering best practices for data science projects and for software that includes ML components.
  4. Software version planning and self-adaptive systems
    A key problem in software development is the evolution of the system in response to new needs. The analysis of the data obtained during the use of the system by its users, including their explicit comments, makes it possible to discover their real needs, of which they are sometimes not even fully aware. This topic describes the problem and reviews some basic techniques.

    Increasingly, software systems need to be aware of their context in order to provide a correct service. This requirement forces them to monitor context data continuously, detect significant changes and react at runtime (possibly almost in real time). This topic describes the problem and reviews some basic techniques.
  5. Introduction to data privacy and security
    Motivation. Definition of basic concepts. Attackers and trusted parties. Privacy metrics.
  6. Algorithms for data anonymization
    Statistical disclosure control. Measurement of disclosure risk. Microaggregation algorithms. Measurement of the privacy-utility trade-off. Case studies.
  7. Privacy in personalised information systems
    User profiles: a measure of privacy risk. Privacy-enhancing technologies.
  8. Security and privacy in communications
    Cryptographic algorithms. Authentication and key management. Anonymous communication systems.
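
As an illustration of the microaggregation idea named in topic 6, the following is a minimal sketch, not necessarily the algorithm studied in the course. It assumes a single numeric attribute and fixed-size groups; the function names `microaggregate` and `sse` are hypothetical. Values are sorted, partitioned into groups of at least k records, and each value is replaced by its group mean, so every published value is shared by at least k individuals. The sum of squared errors gives a simple utility-loss measure to weigh against the privacy gain.

```python
from statistics import mean


def microaggregate(values, k):
    """Univariate fixed-size microaggregation (illustrative sketch).

    Sorts the records, forms groups of k consecutive records (the last
    group absorbs any remainder, so every group has at least k records),
    and replaces each value with the mean of its group.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k records")
    # Indices of the records in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    masked = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        # The final group takes all remaining records.
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        group = order[start:end]
        centroid = mean([values[i] for i in group])
        for i in group:
            masked[i] = centroid
    return masked


def sse(original, masked):
    """Information loss as the sum of squared errors."""
    return sum((o - m) ** 2 for o, m in zip(original, masked))


ages = [30, 10, 20, 50, 40]
masked_ages = microaggregate(ages, k=2)
loss = sse(ages, masked_ages)
```

Larger values of k make each record harder to single out but increase the loss, which is the privacy-utility trade-off mentioned in the topic description.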

Activities

Activity Evaluation act


Study of introductory concepts of data-driven software engineering


Objectives: 1
Contents:
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
2h

Study of data-driven methods for the quality management of software and its development process


Objectives: 2
Contents:
Theory
4h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Study of good software engineering practices for data science and machine learning projects


Objectives: 3
Contents:
Theory
4h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Study of data-driven methods for software evolution and the self-adaptation of systems at runtime


Objectives: 4
Contents:
Theory
4h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Practical development of a case study of data-based methods in the context of Software Engineering

Students will progressively develop a practical assignment that allows them to exercise the basic concepts introduced in the theory part. It will be developed in teams of 3 (exceptionally, a team of 4 students if the group size does not divide evenly). The resulting software, duly documented, will be uploaded to a code repository. Each team will submit a report, written in English, summarizing the main aspects of the assignment, for example the data mining process used and an evaluation of the accuracy of the models and algorithms used.
Objectives: 3 2
Contents:
Theory
0h
Problems
0h
Laboratory
13h
Guided learning
0h
Autonomous learning
18h

First partial exam: Software Engineering part (PARC1)

Evaluation of the first part of the course
Objectives: 1 3 2 4
Week: 7
Type: theory exam
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
5h

Final Exam (EXF)

This exam evaluates the two parts of the subject. Students who have failed either of the two partial exams are required to take it. The remaining students may also take it if they want to improve their grades.
Objectives: 1 5 3 2 4 6 7 8
Week: 15 (Outside class hours)
Type: final exam
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
3h
Autonomous learning
0h

Second partial exam: Data Privacy and Security part (PARC2)

Evaluation of the second part of the subject
Objectives: 5 6 7 8
Week: 14
Type: theory exam
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
5h

Study of introductory concepts on data privacy and security


Objectives: 5 6 7 8
Theory
4h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
5h

Practical development of data anonymization algorithms


Objectives: 6 7
Theory
0h
Problems
0h
Laboratory
15h
Guided learning
0h
Autonomous learning
21h

Study of risks and privacy technologies for personalised information systems


Objectives: 5 7
Theory
4h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Study of mechanisms and technologies for communications security and privacy



Theory
4h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Teaching methodology

The theoretical contents of the course are taught in the theory classes. These classes are complemented with practical examples and problems that students must solve in the Autonomous Learning hours.

In the laboratory sessions, the knowledge acquired in the theory classes is consolidated by solving problems and completing practical assignments related to the theoretical contents. During the laboratory classes, the teacher will introduce new techniques and will leave a substantial part of the class for the students to work on the proposed exercises.

Evaluation methodology

The final grade is based on the following assessment items:
- Exam at the end of the first block of the course (PARC1)
- Exam at the end of the second block of the course (PARC2)
- Final exam, composed of two parts, one for each block of the subject (EXF1, EXF2)
- Submission of the practical assignment at the end of the first block of the course (LABO1)
- Submission of the practical assignments of the second block of the course (LABO2)

The final grade of the course, END-NOTE, is calculated as:

END-NOTE = 50% TEO + 25% LABO1 + 25% LABO2

The theory grade, TEO, is calculated as:

1) Evaluation by partial exams: a minimum of 4.0 is required in each partial exam, and the average must be a pass (at least 5.0). The grade is then the arithmetic mean of the two partial grades.

If (PARC1 >= 4.0 and PARC2 >= 4.0) and ((PARC1 + PARC2) / 2) >= 5.0 then TEO = (PARC1 + PARC2) / 2

2) Otherwise, evaluation by final exam: a minimum of 4.0 is required in each block, and the average must be a pass (at least 5.0); a passed partial exam exempts the student from that block of the final exam.

if (NOTE-BLOCK1 >= 4.0 and NOTE-BLOCK2 >= 4.0) and ((NOTE-BLOCK1 + NOTE-BLOCK2) / 2) >= 5.0
then TEO = (NOTE-BLOCK1 + NOTE-BLOCK2) / 2
otherwise TEO = min((NOTE-BLOCK1 + NOTE-BLOCK2) / 2, 4.5), where, for each block y:

if PARC{y} >= 5.0 then NOTE-BLOCK{y} = max(PARC{y}, EXF{y})
otherwise NOTE-BLOCK{y} = EXF{y}
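
The grading rules above can be sketched in a few lines of Python. This is an illustrative reading of the rules, not an official calculator: the function names are hypothetical, and it assumes that a final exam part a student did not sit counts as 0.0. The rules are followed literally, so a student who already passes by partial exams keeps the mean of the partials.

```python
def block_grade(parc, exf=0.0):
    """Block grade: a passed partial (>= 5.0) is kept unless the
    corresponding final exam part is higher; otherwise the final exam
    part counts alone. Assumption: an exam part not sat counts as 0.0."""
    return max(parc, exf) if parc >= 5.0 else exf


def theory_grade(parc1, parc2, exf1=0.0, exf2=0.0):
    """TEO, following rule 1 (partial exams) and then rule 2 (final exam)."""
    partial_mean = (parc1 + parc2) / 2
    # Rule 1: both partials at least 4.0 and a passing average.
    if parc1 >= 4.0 and parc2 >= 4.0 and partial_mean >= 5.0:
        return partial_mean
    # Rule 2: combine partials and final exam parts block by block.
    b1 = block_grade(parc1, exf1)
    b2 = block_grade(parc2, exf2)
    block_mean = (b1 + b2) / 2
    if b1 >= 4.0 and b2 >= 4.0 and block_mean >= 5.0:
        return block_mean
    # Failing the minimums caps the theory grade at 4.5.
    return min(block_mean, 4.5)


def final_grade(teo, labo1, labo2):
    """END-NOTE = 50% TEO + 25% LABO1 + 25% LABO2."""
    return 0.5 * teo + 0.25 * labo1 + 0.25 * labo2


# A student who failed PARC1, passed PARC2 and recovers block 1 in the final exam.
teo = theory_grade(3.0, 8.0, exf1=6.0)
grade = final_grade(teo, labo1=8.0, labo2=7.0)
```

In this example the first block grade comes from the final exam (6.0) and the second from the passed partial exam (8.0), giving a theory grade of 7.0.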

Bibliography

Basic:

Complementary:

Previous capacities

Those provided by the subjects of the previous terms of the degree.

Addendum

Contents

No changes.

Teaching methodology

Theory classes are taught remotely. Each week, the first hour of class uses pre-recorded material available online, and the second hour is held synchronously by videoconference, with the possibility of recording.

Evaluation methodology

No changes.

Contingency plan

The theory classes are already prepared for remote teaching. The laboratory classes are designed around communication via Racó / ATENEA and a Google Drive space for the subject, which would ease the transition to fully remote teaching. In the event of a health emergency, the laboratory sessions would be held via Google Meet. Exams would also be held online via ATENEA.