In this course, we introduce a project-based learning approach to teaching MLOps, focused on demonstrating and gaining hands-on experience with emerging software engineering practices and tools that automate the construction of ML-enabled components. The course includes laboratory sessions that cover the end-to-end ML component life cycle, from model building to production deployment.
Teachers
Person in charge
Silverio Juan Martínez Fernández
Others
Matías Sebastián Martínez Martínez
Santiago Del Rey Juarez
Weekly hours
Theory: 1.8
Problems: 0
Laboratory: 1.8
Guided learning: 0
Autonomous learning: 6.4
Competences
Transversal Competences
Sustainability and social commitment
CT2 - Capability to know and understand the complexity of economic and social typical phenomena of the welfare society; capability to relate welfare with globalization and sustainability; capability to use technique, technology, economics and sustainability in a balanced and compatible way.
Teamwork
CT3 - Ability to work as a member of an interdisciplinary team, as a normal member or performing direction tasks, in order to develop projects with pragmatism and sense of responsibility, making commitments taking into account the available resources.
Third language
CT5 - Achieving a level of spoken and written proficiency in a foreign language, preferably English, that meets the needs of the profession and the labour market.
Basic
CB8 - Capability to communicate their conclusions, and the knowledge and rationale underpinning these, to both skilled and unskilled public in a clear and unambiguous way.
CB9 - Possession of the learning skills that enable the students to continue studying in a way that will be mainly self-directed or autonomous.
Generic Technical Competences
Generic
CG3 - Define, design and implement complex systems that cover all phases in data science projects
CG4 - Design and implement data science projects in specific domains and in an innovative way
Technical Competences
Specific
CE5 - Model, design, and implement complex data systems, including data visualization
CE7 - Identify the limitations imposed by data quality in a data science problem and apply techniques to smooth their impact
CE10 - Identify machine learning and statistical modeling methods to use and apply them rigorously in order to solve a specific data science problem
Objectives
Interpret the basic concepts of Software Engineering for ML systems, especially in relation to the use and exploitation of MLOps practices.
Related competences: CT5, CG3, CE5
Apply and analyze MLOps practices to build ML models, fostering reproducibility and quality assurance.
Related competences: CT2, CT3, CE7, CE10, CB8, CB9
Apply and analyze MLOps practices to deploy ML models, fostering API development and component delivery.
Related competences: CT3, CG3, CG4, CE5, CB8, CB9
Describe concepts and methods related to monitoring data obtained during the use of ML systems, in order to enable feedback loops in response to changes.
Related competences: CT3, CG3, CG4, CE5, CB8, CB9
Contents
Basic concepts of Software Engineering for ML systems (MLOps)
Motivation for software engineering in ML systems. MLOps introduction and key concepts. Requirements engineering for ML. Collaborative development platforms.
MLOps practices to build ML models
The complexity and diversity of data science projects and ML systems call for engineering techniques to ensure they are built in a robust and future-proof manner. In this chapter we address software engineering best practices for data science projects and software that includes ML components: version control systems; ML pipeline reproducibility and tracking; software measurement for ML; quality assurance for ML.
MLOps practices to deploy ML models
The complexity and diversity of ML systems call for engineering techniques to ensure they are deployed in a robust and production-ready manner. In this chapter we address software engineering best practices for ML components: software architecture for ML; deploying ML models; APIs for ML; packaging of ML components; automation of ML pipelines.
Monitoring data obtained during the use of ML systems
A key problem in software development is the evolution of the ML system in response to new needs. Analyzing the data obtained while users operate the ML system, including their explicit comments, makes it possible to discover their real needs, which they are sometimes not fully aware of themselves. Increasingly, software systems need to be aware of their context in order to provide a correct service. This requirement obliges them to monitor context data continuously, detect significant changes, and react at runtime (possibly in near real time). This topic describes the problem and reviews some basic techniques: monitoring and telemetry; MLOps cycles and feedback loops.
Activities
Activity / Evaluation act
Study of basic concepts of Software Engineering for ML systems (MLOps)
Practical development of an end-to-end project of MLOps practices in the context of ML-based systems
Students will progressively develop a practical project that lets them exercise the basic concepts introduced in the theory part. It will be developed in teams of 4-5 students. The resulting software, duly documented, will be uploaded to a code repository. The team will present a report, written in English, summarizing the main aspects of the project: that is, the process of building and deploying an ML component of an ML-based system, and an evaluation of the accuracy of the models and algorithms used.
Objectives: 1, 2, 3, 4
Presentation of the summary of an existing article about MLOps
Each student will present the summary of a scientific article. All students need to present (at least) once. To foster discussion, presenters must ask at least one question during the other presentations. Lecturers prepare a list of articles.
Objectives: 1
Week: 14
Theory: 1.8h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 6.4h
Presentation of the practical development of an end-to-end project of MLOps practices in the context of ML-based systems
The theoretical contents of the course are taught in the theory classes. These classes are complemented with practical examples and problems that students must solve during the autonomous learning hours.
In the laboratory sessions, the knowledge acquired in the theory classes is consolidated by solving problems and developing practical work related to the theoretical contents. During the laboratory classes, the teacher will introduce new techniques and reserve a substantial part of the class for the students to work on the proposed exercises.
Evaluation methodology
The final grade is a weighted combination of the project grade (weight 90%) and the grade of the article presentation given in the theory sessions (weight 10%). Both activities are mandatory.
In the project grade, the completion of the project and the individual work are graded. As a result, each student's final project grade is determined from the following formula:
ProjectGrade = TeamGrade * IndivFact
The project's overall grade, TeamGrade, takes into account the application of software engineering practices.
The individual factor, IndivFact, is a multiplicative factor between 0 and 1.2 that cannot raise ProjectGrade above 10. It is obtained from the teacher's evaluation of the student's participation in the project development and from the teammates' evaluation of that same participation.
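The grading rule above can be sketched as a short calculation. This is only an illustrative sketch, not an official grading tool: the function names are hypothetical, grades are assumed to be on a 0-10 scale, and the cap at 10 is assumed to be applied by simply truncating the product.

```python
def project_grade(team_grade: float, indiv_fact: float) -> float:
    """ProjectGrade = TeamGrade * IndivFact, never above 10."""
    # IndivFact is described as a factor between 0 and 1.2.
    if not 0.0 <= indiv_fact <= 1.2:
        raise ValueError("IndivFact must lie between 0 and 1.2")
    # Assumption: the cap at 10 is applied by truncating the product.
    return min(10.0, team_grade * indiv_fact)


def final_grade(team_grade: float, indiv_fact: float,
                presentation_grade: float) -> float:
    """Weighted final grade: 90% project, 10% article presentation."""
    return 0.9 * project_grade(team_grade, indiv_fact) + 0.1 * presentation_grade


# Example with made-up values: team grade 8.0, individual factor 1.1,
# presentation grade 7.0.
print(round(final_grade(8.0, 1.1, 7.0), 2))
```

With these made-up values the project grade is 8.0 * 1.1 = 8.8, so the final grade is 0.9 * 8.8 + 0.1 * 7.0 = 8.62. A team grade of 9.0 with an individual factor of 1.2 would give 10.8, which the cap reduces to 10.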
2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET) - Lanubile, Filippo; Martínez-Fernández, Silverio; Quaranta, Luigi, 2023. https://arxiv.org/pdf/2302.01048.pdf