Credits
6
Types
Compulsory
Requirements
This subject has no formal prerequisites, but it does assume certain previous capacities (see below).
Department
TSC
Teachers
Person in charge
- Josep Vidal Manzano ( josep.vidal@upc.edu )
Others
- Maria Ysern García ( maria.ysern@upc.edu )
- Sigrid Vila Bagaria ( sigrid.vila@upc.edu )
Weekly hours
Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6
Competences
Technical competencies
Transversals
Generic
Objectives
- Formulate the problem of learning automatically from data, and become familiar with the types of tasks that can arise.
  Related competences: CE1, CE9, CG1, CG2
- Organize the resolution workflow of a machine learning problem, analysing the possible options and choosing the most suitable one for the problem.
  Related competences: CE1, CE9, CT4, CT7, CG1, CG2
- Decide on, defend and critique a solution to a machine learning problem, arguing the strong and weak points of the approach.
  Related competences: CE9, CT3, CT4, CG2
- Know and know how to apply linear techniques to solve supervised learning problems.
  Related competences: CE3, CE8, CG2
- Know and know how to apply single-layer and multilayer neural network techniques to solve supervised learning problems.
  Related competences: CE8, CE9, CG2
- Know and know how to apply support vector machines to solve supervised learning problems.
  Related competences: CE8, CE9, CG2
- Know and know how to apply the basic techniques for solving unsupervised learning problems, with emphasis on data clustering tools.
  Related competences: CE8, CE9, CG2
- Know and know how to apply the basic techniques for solving reinforcement learning problems.
  Related competences: CE8, CE9, CG2
- Know and know how to apply ensemble techniques to solve supervised learning problems.
  Related competences: CE8, CE9, CG2
Contents
- Introduction to Machine Learning
  General information and basic concepts. Description and formulation of the problems addressed by machine learning. Supervised learning (regression and classification), unsupervised learning (clustering) and semi-supervised learning (reinforcement and transductive). Modern application examples.
- Unsupervised machine learning: clustering
  Definition and formulation of unsupervised machine learning. Introduction to clustering. Probabilistic algorithms: k-means and Expectation-Maximization (EM).
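As an illustration of the kind of technique this topic covers, the following is a minimal NumPy sketch of k-means; the function name and synthetic data are illustrative, not course material:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points
        # (keep the old centroid if a cluster becomes empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated Gaussian blobs around (0, 0) and (5, 5).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
centroids, labels = kmeans(X, k=2)
```

The EM algorithm covered in this topic generalizes this scheme with soft (probabilistic) assignments in place of the hard `argmin`.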
- Supervised machine learning (I): linear regression methods
  Maximum likelihood for regression. Error functions for regression. Least squares: analytical (pseudo-inverse and SVD) and iterative (gradient descent) methods. The notion of regularization. L1- and L2-regularized regression: ridge regression, LASSO and Elastic Net.
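The analytical and iterative least-squares solutions named above can be sketched in a few lines of NumPy; the synthetic data and variable names are illustrative:

```python
import numpy as np

# Synthetic regression data: y = 2*x1 - 3*x2 + small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0]) + 0.01 * rng.normal(size=200)

# Ordinary least squares via the pseudo-inverse: w = X^+ y.
w_ols = np.linalg.pinv(X) @ y

# Ridge (L2-regularized) regression, closed form: w = (X^T X + lam*I)^{-1} X^T y.
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# The same least-squares solution found iteratively by gradient descent
# on the mean squared error.
w_gd = np.zeros(2)
lr = 0.01
for _ in range(500):
    grad = 2 * X.T @ (X @ w_gd - y) / len(y)
    w_gd -= lr * grad
```

LASSO has no closed form (the L1 penalty is not differentiable at zero), which is why the course treats it algorithmically.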
- Supervised machine learning (II): linear methods for classification
  Maximum likelihood for classification. Error functions for classification. Bayesian generative classifiers: LDA/QDA/RDA, Naïve Bayes and k-nearest neighbours.
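Of the classifiers listed, k-nearest neighbours is the simplest to state in code; a minimal sketch with illustrative data follows:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """k-nearest-neighbour classification: Euclidean distance, majority vote."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
        nearest = y_train[np.argsort(d)[:k]]      # labels of the k closest points
        preds.append(np.bincount(nearest).argmax())
    return np.array(preds)

# Two Gaussian classes centred at (0, 0) and (3, 3).
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(3, 0.5, (30, 2))])
y_train = np.array([0] * 30 + [1] * 30)
preds = knn_predict(X_train, y_train, np.array([[0.1, 0.2], [2.9, 3.1]]))
```

Unlike LDA/QDA or Naïve Bayes, k-NN fits no parametric model at all; the training data itself is the model.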
- Hierarchical methods: decision trees
  General construction of decision trees. Split criteria: entropy gain and Gini index. Regularization in decision trees. CART trees for regression and classification.
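The Gini split criterion mentioned above reduces to a short computation; this sketch (with illustrative data) selects the best threshold on one feature, the core step of CART:

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array: 1 - sum_c p_c^2."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def best_split(x, y):
    """Best threshold on a single feature by Gini gain, as in CART."""
    best_t, best_gain = None, 0.0
    parent = gini(y)
    for t in np.unique(x)[:-1]:                    # candidate thresholds
        left, right = y[x <= t], y[x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        gain = parent - child                      # impurity reduction
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
t, g = best_split(x, y)   # a perfect split separates the two classes
```

A full tree applies this search recursively over all features, stopping according to the regularization criteria the topic discusses (depth, minimum leaf size, pruning).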
- Ensemble methods
  Introduction to ensemble methods. Bagging and random forests. Boosting: AdaBoost and variants.
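The bagging idea can be sketched with the simplest possible base learner, a one-threshold "decision stump"; everything here (data, stump, names) is illustrative:

```python
import numpy as np

def fit_stump(x, y):
    """Fit a one-threshold decision stump on a 1-D feature by training accuracy."""
    best_t, best_pol, best_acc = x.min() - 1.0, 1, 0.0
    for t in np.unique(x):
        for pol in (1, -1):                       # which side predicts class 1
            pred = np.where(pol * (x - t) > 0, 1, 0)
            acc = (pred == y).mean()
            if acc > best_acc:
                best_t, best_pol, best_acc = t, pol, acc
    return best_t, best_pol

def bagging_predict(x_train, y_train, x_test, n_models=25, seed=0):
    """Bagging: fit each stump on a bootstrap resample, then majority-vote."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(x_test))
    for _ in range(n_models):
        idx = rng.integers(0, len(x_train), size=len(x_train))  # bootstrap sample
        t, pol = fit_stump(x_train[idx], y_train[idx])
        votes += np.where(pol * (x_test - t) > 0, 1, 0)
    return (votes > n_models / 2).astype(int)

x_train = np.array([0.0, 0.5, 1.0, 5.0, 5.5, 6.0])
y_train = np.array([0, 0, 0, 1, 1, 1])
preds = bagging_predict(x_train, y_train, np.array([0.2, 5.8]))
```

Random forests add random feature subsampling to this recipe; boosting (AdaBoost) instead reweights the training points sequentially rather than resampling them independently.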
- Kernel-based learning methods
  Introduction to learning with kernel functions. Regularized kernelized linear regression. Basic kernel functions. Complexity and generalization: the Vapnik-Chervonenkis dimension. Support vector machines.
Activities
Activity: Evaluation act
Hours: Theory 2h, Problems 0h, Laboratory 0h, Guided learning 0h, Autonomous learning 3.3h

Activity: Evaluation act
Hours: Theory 3h, Problems 0h, Laboratory 2h, Guided learning 0h, Autonomous learning 6.6h
Activity: Monitoring and tutoring of the practical project
Objectives: 2 3 4 6 7 9
Contents:
- 1. Introduction to Machine Learning
- 2. Unsupervised machine learning: clustering
- 3. Supervised machine learning (I): linear regression methods
- 4. Supervised machine learning (II): linear methods for classification
- 5. Hierarchical methods: decision trees
- 6. Ensemble methods
- 7. Kernel-based learning methods
Hours: Theory 0h, Problems 0h, Laboratory 6h, Guided learning 0h, Autonomous learning 20h
Teaching methodology
The theory classes introduce the knowledge, techniques, concepts and results needed to reach a well-founded and insightful level of maturity. These concepts are put into practice in the laboratory classes, where Python code is provided for solving certain aspects of a data analysis problem with the techniques of the current topic of study. The laboratory also serves as a guide for the corresponding part of the term project, which students develop throughout the course. Some laboratory hours may be used to solve problems (without a computer) in the theory classroom.

There is a graded practical project on a real problem chosen by the student, which collects and integrates the knowledge and skills of the entire course. The generic competence of effective written communication is also evaluated through this practical work.
Evaluation methodology
The subject is evaluated through a mid-term exam, a final exam and a practical project that addresses a real problem and is documented in a written report. The final grade is calculated as:

Grade = 0.4 * Project + 0.4 * Final + 0.2 * Mid-term

For students who are eligible and wish to attend the re-evaluation, the re-evaluation exam grade replaces both the mid-term and final exam grades.
Bibliography
Basic
- Pattern Recognition and Machine Learning. Bishop, C.M. Springer, 2006. ISBN: 0387310738.
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003157379706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Machine Learning: A Probabilistic Perspective. Murphy, K.P. MIT Press, 2012. ISBN: 9780262018029.
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003972109706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- Learning from Data: Concepts, Theory, and Methods. Cherkassky, V.S.; Mulier, F. John Wiley, 2007. ISBN: 0471681822.
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003624509706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Hastie, T.; Tibshirani, R.; Friedman, J. Springer, 2009. ISBN: 0387848576.
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003549679706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Previous capacities
Intermediate notions of probability and statistics. Intermediate notions of linear algebra, matrix calculus and real analysis.
Good programming skills in high-level languages.