Credits
6
Types
Compulsory
Requirements
This subject has no formal requirements, but it does assume prior skills
Department
ETSETB;FIB;FME;ESSI;ENTEL
1. Software Engineering. The availability of large volumes of data, from both the development of software systems and their use, makes it possible to exploit that data in various stages and activities of software engineering. Going further, it defines a new approach that treats data as the cornerstone of the software life cycle. The first part of the course presents this new vision of software engineering and delves into the emerging software engineering practices and tools for automating the construction of ML-enabled components and the end-to-end ML component life cycle, from model building to production deployment.
2. Data privacy and security. Data analysis techniques can help to obtain information that anticipates various problems, identifies their sources and supports the implementation of solutions, in contexts as varied as business competitiveness, marketing, social relations, transport, health, education and politics. However, while data analysis is extremely valuable, it also has a crucial drawback: it increasingly invades the privacy of the people about whom data is collected. The second part of the course presents basic concepts of information privacy and delves into the main privacy technologies and metrics, as well as the anonymization algorithms used to prevent the disclosure of sensitive information about individuals.
Teachers
Person in charge
- Javier Parra Arnau ( javier.parra@upc.edu )
- Silverio Juan Martínez Fernández ( silverio.martinez@upc.edu )
Others
- Esteve Pallares Segarra ( esteve@entel.upc.edu )
- Jordi Forne Muñoz ( jforne@entel.upc.edu )
- Santiago Del Rey Juarez ( santiago.del.rey@upc.edu )
Weekly hours
Theory: 2; Problems: 0; Laboratory: 2; Guided learning: 0; Autonomous learning: 6
Competences
Technical competencies
Transversals
Basic
Generic
Objectives
-
Interpret the basic concepts of Software Engineering for ML systems, especially in relation to the use and exploitation of MLOps practices.
Related competences: CG1, CB2, -
Apply and analyze good software engineering practices related to data science and machine learning projects
Related competences: CE1, CE2, CT4, CG1, CG4, CB2, CB5, -
Apply and analyze MLOps practices to build ML models, fostering reproducibility and quality assurance.
Related competences: CE1, CE2, CE3, CE7, CT4, CG1, CG4, CB2, CB5, -
Apply and analyze MLOps practices to deploy ML models, fostering API development.
Related competences: CE1, CE2, CE7, CE8, CG1, CG2, CB2, CB5, -
Understand the privacy risks associated with browsing and publishing data, and gain a deeper understanding of the different privacy metrics and their application in different scenarios.
Related competences: CE1, CE3, CE8, CT4, CG2, CB3, CB5, -
Understand the main anonymization algorithms for statistical databases.
Related competences: CE1, CE2, CE3, CE8, CT4, CG1, CG2, CG4, CB2, CB3, CB5, -
Evaluate the trade-off between privacy and data usability.
Related competences: CE1, CE3, CE8, CT4, CG1, CG4, CB2, CB3, CB5, -
Understand the privacy risks in communications and the anonymous communication systems.
Related competences: CE1, CE3, CE8, CG1, CG4, CB2, CB5
Contents
-
Introduction to Software Engineering
First, the traditional concept of software engineering is presented. Then, the impact of data availability on this traditional concept is analyzed, and the software life cycle that results from considering data is shown. Motivation for software engineering for ML systems. Introduction to MLOps and key concepts. Requirements engineering for ML. -
Good software engineering practices for data science and machine learning projects
The complexity and diversity of data science projects and machine learning systems call for engineering techniques to ensure they are built in a robust and future-proof manner. In this chapter we address software engineering best practices for data science projects and for software that includes ML components. -
MLOps practices to build ML models and manage the quality of the software and its development process
The complexity and diversity of data science projects and ML systems call for engineering techniques to ensure they are built in a robust and future-proof manner. In this chapter we address software engineering best practices for data science projects and for software that includes ML components: version control systems; ML pipeline reproducibility and tracking; software measurement for ML; quality assurance and testing for ML, including environmental sustainability. -
MLOps practices to deploy ML models
The complexity and diversity of ML systems call for engineering techniques to ensure they are deployed in a robust and production-ready manner. In this chapter we address software engineering best practices for ML components: software architecture for ML; deploying ML models; APIs for ML; packaging of ML components. -
Introduction to data privacy and security
Motivation. Definition of basic concepts. Attackers and trusted parties. Privacy metrics. -
Algorithms for data anonymization
Statistical disclosure control. Measure the risk of disclosure. Microaggregation algorithms. Measurement of privacy-utility trade-off. Case studies. -
Privacy in personalised information systems
User profiles: a measure of privacy risk. Privacy-enhancing technologies. -
Security and privacy in communications
Cryptographic algorithms. Authentication and key management. Anonymous communication systems.
Activities
Activity Evaluation act
Theory: 2h; Problems: 0h; Laboratory: 2h; Guided learning: 0h; Autonomous learning: 2h
Study of good software engineering practices for data science and machine learning projects
Objectives: 2
Contents:
Theory: 4h; Problems: 0h; Laboratory: 0h; Guided learning: 0h; Autonomous learning: 2h
Study of MLOps practices to build ML models and manage the quality of the software and its development process
Objectives: 3
Contents:
Theory: 4h; Problems: 0h; Laboratory: 0h; Guided learning: 0h; Autonomous learning: 2h
Theory: 3h; Problems: 0h; Laboratory: 0h; Guided learning: 0h; Autonomous learning: 2h
Practical development of a case study of MLOps practices in the context of ML-based systems
Students will progressively develop a practical assignment that allows them to exercise the basic concepts introduced in the theory part. It will be developed in teams of 4-5 students. The resulting software, duly documented, will be uploaded to a code repository. Each team will submit a report, written in English, summarizing the main aspects of the assignment, for example the process of building an ML component of an ML-based system and an evaluation of the accuracy of the models and algorithms used.
Objectives: 2 3 4
Contents:
Theory: 0h; Problems: 0h; Laboratory: 13h; Guided learning: 0h; Autonomous learning: 31.5h
Final Exam (EXF)
This exam evaluates both parts of the subject. It is mandatory for students who have failed either of the two partial exams. Other students may also take it if they want to improve their grades.
Objectives: 1 5 2 3 4 6 7 8
Week: 15 (Outside class hours)
Theory: 0h; Problems: 0h; Laboratory: 0h; Guided learning: 0h; Autonomous learning: 0h
Study of mechanisms and technologies for communications security and privacy
Theory: 4h; Problems: 0h; Laboratory: 0h; Guided learning: 0h; Autonomous learning: 6h
Teaching methodology
The theoretical contents of the course are taught in the theory classes. These classes are complemented with practical examples and problems that students must solve in the Autonomous Learning hours.
In the laboratory sessions, the knowledge acquired in the theory classes is consolidated by solving problems and developing practical assignments related to the theoretical contents. During the laboratory classes, the teacher will introduce new techniques and will leave a substantial part of the class for the students to work on the proposed exercises.
Evaluation methodology
The evaluation is structured according to the two parts of the course: software engineering (PART1) and data privacy (PART2).
For the first part, the grade is calculated by weighting the grade of a theoretical exam (weight 40%) with the grade of the laboratory of this part of the subject (weight 60%):
PART1 = 40% PARC1 + 60% LABO1 * IndivFactorLABO1
- PARC1: Examination at the end of the first part of the course.
- LABO1: Delivery of the laboratory project of the first part of the course.
- IndivFactorLABO1: A multiplicative individual factor between 0.8 and 1.2 (it cannot raise LABO1 above 10). It is derived from the teacher's evaluation of the student's participation in the project development and from the teammates' evaluation of that same participation. In truly exceptional situations, IndivFactorLABO1 can be lower than 0.8 for students with very low participation in the project throughout the course.
For the second part, the grade is calculated by weighting the grade of a theoretical exam (weight 50%) with the grade of the practical work of this part of the subject (weight 50%):
PART2 = 50% PARC2 + 50% LABO2
- PARC2: Examination at the end of the second part of the course.
- LABO2: Delivery of practices of the second part of the course.
The final grade of the course, NOTA-FIN, is calculated as the arithmetic mean of the two parts of the course:
NOTA-FIN = 50% PART1 + 50% PART2
Students who do not pass the course through the mid-term exams are evaluated by a final exam, in which they are exempt from the parts whose mid-term exams they have already passed.
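As an illustration only, the grading formulas above can be sketched in Python. The function names are hypothetical, grades are assumed to be on a 0-10 scale, and the cap is assumed to apply to the lab grade after it is multiplied by the individual factor. The handling of the final exam is not modelled.

```python
def part1_grade(parc1: float, labo1: float, indiv_factor: float = 1.0) -> float:
    """PART1 = 40% PARC1 + 60% LABO1 * IndivFactorLABO1."""
    # Assumption: the individual factor scales only the lab grade,
    # and the scaled lab grade cannot exceed 10.
    adjusted_lab = min(10.0, labo1 * indiv_factor)
    return 0.4 * parc1 + 0.6 * adjusted_lab


def part2_grade(parc2: float, labo2: float) -> float:
    """PART2 = 50% PARC2 + 50% LABO2."""
    return 0.5 * parc2 + 0.5 * labo2


def final_grade(part1: float, part2: float) -> float:
    """NOTA-FIN = 50% PART1 + 50% PART2."""
    return 0.5 * part1 + 0.5 * part2


if __name__ == "__main__":
    # Made-up example grades, not taken from the course guide.
    p1 = part1_grade(parc1=6.0, labo1=8.0, indiv_factor=1.1)
    p2 = part2_grade(parc2=7.0, labo2=9.0)
    print(f"PART1={p1:.2f} PART2={p2:.2f} NOTA-FIN={final_grade(p1, p2):.2f}")
```

With these example inputs the lab grade is scaled from 8.0 to 8.8 before weighting. A factor of 1.2 applied to a lab grade of 10 would still count as 10.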
Bibliography
Basic
-
Machine Learning in Production: From Models to Products
- Kästner, Christian,
MIT Press,
2025.
ISBN: 9780262049726
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005330527706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Statistical disclosure control for microdata: methods and applications in R
- Templ, M,
Springer International Publishing AG,
2017.
ISBN: 9783319502724
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991001685219706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Complementary
-
SEET@ICSE
- Lanubile, Filippo; Martínez-Fernández, Silverio; Quaranta, Luigi,
SEET@ICSE,
2023.
https://doi.org/10.1109/ICSE-SEET58685.2023.00015 -
IEEE software
- Lanubile, Filippo; Martínez-Fernández, Silverio; Quaranta, Luigi,
IEEE software,
2024.
ISSN: 0740-7459
https://doi.org/10.1109/MS.2023.3310768 -
Reliable Machine Learning
- Chen, Cathy,
O'Reilly Media, Inc.,
2022.
ISBN: 1098106172
https://ebookcentral-proquest-com.recursos.biblioteca.upc.edu/lib/upcatalunya-ebooks/detail.action?pq-origsite=primo&docID=30130756 -
Data privacy: foundations, new developments and the big data challenge
- Torra i Reventós, V,
Springer International Publishing,
2017.
ISBN: 9783319573564
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004122599706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Advanced research in data privacy
- Navarro-Arribas, G.; Torra i Reventós, V. (eds.),
Springer International Publishing,
2015.
ISBN: 9783319098852
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004048289706711&context=L&vid=34CSUC_UPC:VU1&lang=ca