The course is structured in two parts: data-driven software engineering, and data privacy and security.
1. Software Engineering. Large volumes of data are now available from both the development of software systems and their use. This data can be exploited in many stages and activities of software engineering and, going further, defines a new approach that treats data as the cornerstone of the software life cycle. The first part of the course presents this new vision of software engineering and delves into the emerging software engineering practices and tools that automate the construction of ML-enabled components and the end-to-end ML component life cycle, from model building to production deployment.
2. Data privacy and security. Data analysis techniques can help to anticipate problems, identify their causes and implement solutions, in contexts as varied as business competitiveness, marketing, social relations, transport, health, education and politics. However, while data analysis is extremely valuable, it also has a crucial drawback: it increasingly invades the privacy of the people about whom data is collected. The second part of the course presents basic concepts of information privacy and delves into the main privacy technologies and metrics, as well as the anonymization algorithms used to prevent the disclosure of sensitive information about individuals.
Teachers
Person in charge
Silverio Juan Martínez Fernández
Others
Esteve Pallares Segarra
Javier Parra Arnau
Jordi Forne Muñoz
Santiago Del Rey Juarez
Weekly hours
Theory: 2
Problems: 0
Laboratory: 2
Guided learning: 0
Autonomous learning: 6
Competences
Technical Competences
Technical competencies
CE1 - Skillfully use mathematical concepts and methods that underlie the problems of science and data engineering.
CE2 - To be able to program solutions to engineering problems: Design efficient algorithmic solutions to a given computational problem, implement them in the form of a robust, structured and maintainable program, and check the validity of the solution.
CE3 - Analyze complex phenomena through probability and statistics, and propose models of these types in specific situations. Formulate and solve mathematical optimization problems.
CE7 - Demonstrate knowledge and ability to apply the necessary tools for the storage, processing and access to data.
CE8 - Ability to choose and employ techniques of statistical modeling and data analysis, evaluating the quality of the models, validating and interpreting them.
Transversal Competences
Transversals
CT4 [Assessable] - Teamwork. Be able to work in an interdisciplinary team, either as a member or in a management role, contributing to the development of projects with pragmatism and a sense of responsibility, and making commitments that take the available resources into account.
Basic
CB2 - That the students know how to apply their knowledge to their work or vocation in a professional way and possess the skills that are usually demonstrated through the elaboration and defense of arguments and problem solving within their area of study.
CB3 - That students have the ability to gather and interpret relevant data (usually within their area of study) to make judgments that include a reflection on relevant social, scientific or ethical issues.
CB5 - That the students have developed those learning skills necessary to undertake later studies with a high degree of autonomy.
Generic Technical Competences
Generic
CG1 - Design computer systems that integrate data of very diverse provenances and forms, build mathematical models from them, reason about these models and act accordingly, learning from experience.
CG2 - Choose and apply the most appropriate methods and techniques to a problem defined by data that represents a challenge for its volume, speed, variety or heterogeneity, including computer, mathematical, statistical and signal processing methods.
CG4 - Identify opportunities for innovative data-driven applications in evolving technological environments.
Objectives
Interpret the basic concepts of Software Engineering for ML systems, especially in relation to the use and exploitation of MLOps practices.
Related competences: CG1, CB2
Apply and analyze good software engineering practices related to data science and machine learning projects.
Related competences: CE1, CE2, CT4, CG1, CG4, CB2, CB5
Apply and analyze MLOps practices to build ML models, fostering reproducibility and quality assurance.
Related competences: CE1, CE2, CE3, CE7, CT4, CG1, CG4, CB2, CB5
Apply and analyze MLOps practices to deploy ML models, fostering API development.
Related competences: CE1, CE2, CE7, CE8, CG1, CG2, CB2, CB5
Understand the privacy risks associated with browsing and publishing data. Achieve a deeper understanding of the different privacy metrics and their application in different scenarios.
Related competences: CE1, CE3, CE8, CT4, CG2, CB3, CB5
Understand the main anonymization algorithms for statistical databases.
Related competences: CE1, CE2, CE3, CE8, CT4, CG1, CG2, CG4, CB2, CB3, CB5
Evaluate the trade-off between privacy and data usability.
Related competences: CE1, CE3, CE8, CT4, CG1, CG4, CB2, CB3, CB5
Understand the privacy risks in communications and how anonymous communication systems address them.
Related competences: CE1, CE3, CE8, CG1, CG4, CB2, CB5
Contents
Introduction to Software Engineering
First, the traditional concept of software engineering is presented. Then, the impact of data availability on this traditional concept is analyzed, and the software life cycle that results when data is taken into account is shown. Motivation for the need for software engineering in ML systems. Introduction to MLOps and key concepts. Requirements engineering for ML.
Good software engineering practices for data science and machine learning projects
The complexity and diversity of data science projects and machine learning systems call for engineering techniques that ensure they are built in a robust and future-proof manner. This chapter addresses software engineering best practices for data science projects and for software that includes ML components.
MLOps practices to build ML models and manage the quality of the software and its development process
The complexity and diversity of data science projects and ML systems call for engineering techniques that ensure they are built in a robust and future-proof manner. This chapter addresses MLOps practices for building ML models: version control systems; ML pipeline reproducibility and tracking; software measurement for ML; quality assurance for ML.
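As an illustration of what "pipeline reproducibility and tracking" can mean in practice, the following minimal sketch records a fingerprint of the configuration and the data together with the resulting metric, so that a run can be repeated and compared. It uses only the Python standard library; the names `fingerprint` and `run_experiment`, the configuration keys and the toy "metric" are all hypothetical and are not taken from the course material, where dedicated tools would normally play this role.

```python
import hashlib
import json
import random


def fingerprint(obj) -> str:
    """Return a short, deterministic identifier for a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_experiment(config: dict, data: list) -> dict:
    """Run a toy 'training' step and return a record that describes the run.

    Assumption: the metric here is just the mean of a seeded random sample,
    standing in for a real model evaluation.
    """
    rng = random.Random(config["seed"])  # local RNG, so the seed fully fixes the run
    sample = rng.sample(data, k=config["sample_size"])
    metric = sum(sample) / len(sample)
    return {
        "config_id": fingerprint(config),
        "data_id": fingerprint(data),
        "metric": metric,
    }


if __name__ == "__main__":
    data = [float(x) for x in range(100)]
    config = {"seed": 42, "sample_size": 10}
    first = run_experiment(config, data)
    second = run_experiment(config, data)
    # Same configuration and data give an identical record.
    print(first == second)
    print(json.dumps(first, indent=2))
```

Keeping the random generator local to the run, rather than seeding global state, is a design choice that makes it harder for unrelated code to change the result.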
MLOps practices to deploy ML models
The complexity and diversity of ML systems call for engineering techniques that ensure they are deployed in a robust and production-ready manner. This chapter addresses software engineering best practices for deploying ML components: software architecture for ML; deploying ML models; APIs for ML.
Introduction to data privacy and security
Motivation. Definition of basic concepts. Attackers and trusted parties. Privacy metrics.
Algorithms for data anonymization
Statistical disclosure control. Measure the risk of disclosure. Microaggregation algorithms. Measurement of privacy-utility trade-off. Case studies.
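To make the idea of microaggregation and of the privacy-utility trade-off concrete, here is a minimal sketch for a single numeric attribute. It sorts the records, forms groups of at least k consecutive values and replaces each value by its group mean, then measures the information loss as the sum of squared errors. This is a simple fixed-size heuristic chosen for illustration, not necessarily one of the algorithms covered in the course; the function names `microaggregate` and `sse` and the sample data are hypothetical.

```python
def microaggregate(values: list, k: int) -> list:
    """Replace each value by the mean of a group of at least k sorted neighbours."""
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k records")
    # Indices of the records, ordered by attribute value.
    order = sorted(range(n), key=lambda i: values[i])
    masked = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        # If fewer than k records would remain, fold them into this group.
        if n - end < k:
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = mean
        start = end
    return masked


def sse(original: list, masked: list) -> float:
    """Information loss: sum of squared differences between original and masked data."""
    return sum((o - m) ** 2 for o, m in zip(original, masked))


if __name__ == "__main__":
    ages = [30.0, 31.0, 45.0, 47.0, 52.0, 90.0, 29.0]
    for k in (2, 3):
        masked = microaggregate(ages, k)
        print(k, masked, sse(ages, masked))
```

Every masked value is shared by at least k records, so a larger k hides individuals in larger groups at the cost of a larger error, which is the trade-off the chapter refers to.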
Privacy in personalised information systems
User profiles: a measure of privacy risk. Privacy-enhancing technologies.
Security and privacy in communications
Cryptographic algorithms. Authentication and key management. Anonymous communication systems.
Activities
Activity / Evaluation act
Study of basic concepts of Software Engineering for ML systems (MLOps)
Practical development of a case study of MLOps practices in the context of ML-based systems
Students will progressively develop a practical assignment that lets them exercise the basic concepts introduced in the theory part. It will be developed in teams of 4-5 students. The resulting software, duly documented, will be uploaded to a code repository. The team will present a report, written in English, summarizing the main aspects of the assignment, for example, the process of building an ML component of an ML-based system, and an evaluation of the accuracy of the models and algorithms used.
Objectives: 2, 3, 4
First partial exam: Software Engineering part (PARC1)
Evaluation of the first part of the course.
Objectives: 1, 2, 3, 4
Week: 7
Theory: 1.5h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 5.5h
Final Exam (EXF)
This exam evaluates the two parts of the course. Students who have failed either of the two partial exams are required to take it. The rest of the students can also take it if they want to improve their grades.
Objectives: 1, 2, 3, 4, 5, 6, 7, 8
Week: 15 (Outside class hours)
Theory: 2h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 0h
Second partial exam: Data Privacy and Security part (PARC2)
Evaluation of the second part of the course.
Objectives: 5, 6, 7, 8
Week: 14
Theory: 1.5h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 5.5h
Study of introductory concepts on data privacy and security
Study of mechanisms and technologies for communications security and privacy
Theory: 4h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 6h
Teaching methodology
The theoretical contents of the course are taught in the theory classes. These classes are complemented with practical examples and problems that students must solve in the Autonomous Learning hours.
In the laboratory sessions, the knowledge acquired in the theory classes is consolidated by solving problems and carrying out practical assignments related to the theoretical contents. During the laboratory classes, the teacher will introduce new techniques and will leave a substantial part of the class for the students to work on the proposed exercises.
Evaluation methodology
The evaluation is structured according to the two parts of the course: software engineering (PART1) and data privacy (PART2).
For the first part, the grade is a weighted combination of a theoretical exam (weight 40%) and the laboratory project of this part of the course (weight 60%):
PART1 = 40% PARC1 + 60% LABO1 * IndivFactorLABO1
- PARC1: Examination at the end of the first part of the course.
- LABO1: Delivery of the laboratory project of the first part of the course.
- IndivFactorLABO1: A multiplicative individual factor between 0.8 and 1.2 (it cannot raise the adjusted laboratory grade above 10). It is obtained from the teacher's assessment of the student's participation in the project and from the teammates' assessment of that same participation. In truly exceptional cases, the factor can be lower than 0.8 for students whose participation in the project has been very low throughout the course.
For the second part, the grade is a weighted combination of a theoretical exam (weight 50%) and the practical work of this part of the course (weight 50%):
PART2 = 50% PARC2 + 50% LABO2
- PARC2: Examination at the end of the second part of the course.
- LABO2: Submission of the practical assignments of the second part of the course.
The final grade of the course, NOTA-FIN, is calculated as the arithmetic mean of the two parts of the course:
NOTA-FIN = 50% PART1 + 50% PART2
Students who do not pass the course through the partial exams can take a final exam, in which they are exempt from any part whose partial exam they have already passed.
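The grading formulas above can be checked with a short worked example. The sketch below is only an illustration: the function names are hypothetical, and it assumes that the cap of 10 applies to the product LABO1 * IndivFactorLABO1 and that factors above 1.2 are rejected, which is one reading of the rules stated in this section.

```python
def part1_grade(parc1: float, labo1: float, indiv_factor: float = 1.0) -> float:
    """PART1 = 40% PARC1 + 60% LABO1 * IndivFactorLABO1."""
    # Assumption: factors above 1.2 are invalid; values below 0.8 are
    # allowed because the rules permit them in exceptional cases.
    if not 0.0 <= indiv_factor <= 1.2:
        raise ValueError("individual factor must be between 0 and 1.2")
    # Assumption: the factor cannot push the adjusted lab grade above 10.
    adjusted_labo1 = min(10.0, labo1 * indiv_factor)
    return 0.4 * parc1 + 0.6 * adjusted_labo1


def part2_grade(parc2: float, labo2: float) -> float:
    """PART2 = 50% PARC2 + 50% LABO2."""
    return 0.5 * parc2 + 0.5 * labo2


def final_grade(part1: float, part2: float) -> float:
    """NOTA-FIN = 50% PART1 + 50% PART2."""
    return 0.5 * part1 + 0.5 * part2


if __name__ == "__main__":
    # Hypothetical student: exam 6.0, lab project 8.0, individual factor 1.1.
    p1 = part1_grade(parc1=6.0, labo1=8.0, indiv_factor=1.1)
    p2 = part2_grade(parc2=7.0, labo2=9.0)
    print(round(p1, 2), round(p2, 2), round(final_grade(p1, p2), 2))
```

With these hypothetical marks, the adjusted laboratory grade is 8.0 * 1.1 = 8.8, so PART1 = 0.4 * 6.0 + 0.6 * 8.8 = 7.68, PART2 = 8.0 and NOTA-FIN = 7.84.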
Bibliography
Basic:
Hulten, G. Building Intelligent Systems: A Guide to Machine Learning Engineering. Apress, 2018. ISBN: 9781484234327