Credits
6
Types
Compulsory
Requirements
This subject has no prerequisites, but it does assume certain previous capacities.
Department
CS
Web
https://sites.google.com/upc.edu/gia-iaa/
Teachers
Person in charge
- Sergio Álvarez Napagao ( salvarez@cs.upc.edu )
Others
- Jordi Luque Serrano ( jordi.luque.serrano@upc.edu )
Weekly hours
Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6
Competences
Transversal
Basic
Specific
Generic
Objectives
- Learn the main methods of machine learning, and how to use them appropriately.
  Related competences: CG1, CG2, CG3, CG4, CG8, CT2, CT5, CB3, CE03, CE04, CE09, CE15, CE20
- Interact in a critical and prudent manner with data and machine learning models.
  Related competences: CG1, CG4, CG7, CG8, CG9, CT6, CT8, CE04
  Subcompetences:
  - Keep a critical and skeptical view of model behavior
  - Identify biases in data
- Easily recognize the characteristics of a problem from the perspective of machine learning.
  Related competences: CG1, CG2, CG4, CG6, CG9, CT5, CE03, CE04, CE09, CE15
  Subcompetences:
  - Identify the relevance analyses to be conducted on a data set
  - Propose the most appropriate learning types for a problem
Contents
- Intro to machine learning
  Basic types of learning: what they can be used for, their purposes, and their main limitations. Includes a set of warnings and sanity checks to keep in mind while working with machine learning.
- Experimental design in machine learning
  Using data for learning: how to design, execute, and evaluate experiments conducted with machine learning techniques.
- Data preprocessing
  Distributions, normalization, and standardization of data: how and why to prepare data to be processed by machine learning algorithms.
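As a minimal sketch of the standardization step (toy data, not course material):

```python
import numpy as np

# Toy feature matrix: rows are samples, columns are features (illustrative values).
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Standardization: subtract each column's mean and divide by its standard
# deviation, so every feature ends up with mean 0 and unit variance.
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_std = (X - mu) / sigma
```

Without this step, the second feature (values in the hundreds) would dominate any distance-based algorithm.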
- Applied regression
  Practical cases of regression.
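A minimal ordinary-least-squares sketch in NumPy; the data and the use of `np.linalg.lstsq` are illustrative assumptions, not course material:

```python
import numpy as np

# Toy data generated from an exact linear relation y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones for the intercept term.
A = np.vstack([x, np.ones_like(x)]).T
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coef
```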
- Dimensionality reduction
  Review of the main methods to reduce the dimensionality of data: PCA, UMAP, t-SNE, ...
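Of the methods listed, PCA has the most compact formulation; a hypothetical NumPy sketch via SVD of the centred data (random toy data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy data: 100 samples, 5 features

Xc = X - X.mean(axis=0)                # centre each feature
# SVD of the centred data: rows of Vt are the principal directions,
# ordered by decreasing singular value (i.e. decreasing variance).
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                                  # keep the first k principal components
X_reduced = Xc @ Vt[:k].T              # project onto the top-k directions
```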
- Classification: basic concepts and review of basic methods
  Distance measures are studied and related to the concept of similarity, which then allows us to build and compare a large number of methods. Review of k-Nearest Neighbors as a simple framework, and extension to other methods.
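A minimal k-Nearest Neighbors sketch using Euclidean distance and majority vote (toy data; an illustration of the idea, not the course's implementation):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from x to every training point; the k nearest
    # neighbours then vote by majority on the predicted class.
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]
    vals, counts = np.unique(y_train[nearest], return_counts=True)
    return vals[np.argmax(counts)]

# Two well-separated toy classes.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
pred = knn_predict(X_train, y_train, np.array([4.8, 5.1]), k=3)
```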
- Classification methods based on other criteria
  Support Vector Machines, Neural Networks (classic architectures), and Decision Trees.
- Multiclassification
  The main methods of combining "weak" learners are studied in order to obtain more robust models: Boosting, Bagging, GAMs, EBMs, ensembles.
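The combination step shared by bagging-style ensembles can be sketched as a per-sample majority vote over the weak learners' predictions (the three "weak learners" below are hypothetical):

```python
import numpy as np

# Predictions of three hypothetical weak classifiers on four samples.
preds = np.array([[0, 1, 1, 0],   # weak learner 1
                  [0, 1, 0, 0],   # weak learner 2
                  [1, 1, 1, 0]])  # weak learner 3

def majority_vote(preds):
    # For each sample (column), return the most frequent predicted label.
    out = []
    for col in preds.T:
        vals, counts = np.unique(col, return_counts=True)
        out.append(vals[np.argmax(counts)])
    return np.array(out)

ensemble = majority_vote(preds)
```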
- Explainability
  Relevance, use, and methods of explainability. Several methods are studied in order to interpret and explain the operation and results of machine learning algorithms, a basic need for the deployment and acceptance of these methods. The foundations of Explainable AI (XAI) are laid.
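One simple model-agnostic technique in this family is permutation importance: shuffle one feature and measure how much accuracy drops. In this illustrative sketch the "model" is a stand-in that thresholds feature 0, not a trained classifier:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)          # only feature 0 carries the label

def model(X):
    # Stand-in for a trained classifier: thresholds feature 0, ignores feature 1.
    return (X[:, 0] > 0).astype(int)

base_acc = (model(X) == y).mean()      # 1.0 by construction

drops = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])   # destroy feature j's information
    drops.append(base_acc - (model(Xp) == y).mean())
```

Shuffling feature 0 hurts accuracy badly, while shuffling feature 1 changes nothing, which matches how the stand-in model actually uses the features.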
- Clustering
  The foundations of the classical methods for obtaining meaningful groupings of data in the absence of class information and/or prior structure are reviewed: k-means, Hierarchical Clustering, Spectral Clustering, DBSCAN.
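A minimal k-means sketch on toy data (the deterministic initialization from evenly spaced points is an illustrative simplification; real implementations use k-means++ or random restarts):

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    # Toy initialization: k evenly spaced points from the data set.
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre.
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        # Update step: each centre moves to the mean of its assigned points.
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),    # tight cluster near (0, 0)
               rng.normal(5.0, 0.1, (20, 2))])   # tight cluster near (5, 5)
labels, centers = kmeans(X, 2)
```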
- Genetic algorithms
  Introduction to genetic algorithms, as a first look at bio-inspired learning methods. The conceptual and mathematical bases of the main mutation and crossover operators, and of their representational variants, are reviewed.
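The two classic operators can be sketched on binary strings (function names, the cut point, and the mutation rate are illustrative choices):

```python
import random

def one_point_crossover(a, b, point):
    # Swap the tails of two parent chromosomes at a fixed cut point.
    return a[:point] + b[point:], b[:point] + a[point:]

def bit_flip_mutation(bits, rate, rng):
    # Flip each bit independently with probability `rate`.
    return [1 - x if rng.random() < rate else x for x in bits]

rng = random.Random(0)
c1, c2 = one_point_crossover([0, 0, 0, 0], [1, 1, 1, 1], point=2)
mutant = bit_flip_mutation([0, 0, 0, 0], rate=1.0, rng=rng)   # rate 1.0 flips all
```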
- Machine learning in graphs
  The graph structure is widespread across many domains and has given rise to a whole discipline, Network Science, in which the structural properties of graphs are studied to derive properties and conclusions about the phenomenon or field under study. This type of learning is especially important in Internet applications, search and retrieval applications, and knowledge discovery: community detection, link prediction, etc.
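One of the simplest link-prediction scores is common-neighbour counting, sketched here on a hypothetical four-node undirected graph:

```python
# Hypothetical tiny undirected graph stored as adjacency sets.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

def common_neighbors(u, v):
    # Classic link-prediction heuristic: the more neighbours two
    # unconnected nodes share, the more likely a future link between them.
    return len(adj[u] & adj[v])

score = common_neighbors("a", "d")     # a and d share only neighbour c
```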
Activities
Activity Evaluation act
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h
Teaching methodology
Interactive classes of theoretical content. Relatively autonomous laboratory sessions of practical content.
Evaluation methodology
The course consists of one partial exam (P) and a final exam (F). The laboratory will be evaluated continuously (LC) and through a final delivery (LF).
Final score = 0.2*P + 0.4*F + 0.1*LC + 0.3*LF
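For illustration, the weighting can be checked with hypothetical grades:

```python
def final_score(P, F, LC, LF):
    # Weighted combination from the evaluation methodology above.
    return 0.2 * P + 0.4 * F + 0.1 * LC + 0.3 * LF

# Hypothetical grades, purely for illustration.
grade = final_score(P=6.0, F=7.0, LC=8.0, LF=9.0)   # 1.2 + 2.8 + 0.8 + 2.7
```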
Reassessment: Only those who have failed the final exam may take the reassessment. The maximum grade that can be obtained in the reassessment is 7.
Bibliography
Basic
- Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006. ISBN: 0387310738.
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003157379706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Previous capacities
Understand the computing flow within a software system.
Understand the basic concepts behind inference, deduction, and evidence-based reasoning.
Be familiar with data distributions, basic data preprocessing, and how numerical variables can represent information.