Statistical Learning

Credits: 6

Type: Compulsory

Requirements: This course has no formal requirements, but it assumes the previous capacities listed under Previous capacities.

Department: UB; UAB

This course offers an in-depth exploration of statistical learning theory and advanced data analysis techniques. Students will develop both theoretical understanding and practical expertise in handling complex biological and health-related datasets.
The curriculum begins with the foundations of statistical learning, covering core problems such as classification, regression, and clustering, as well as essential concepts like loss functions, model complexity, regularization, and figures of merit from signal detection theory. Building on this foundation, students will master preprocessing methods necessary for analyzing real-world data from sources such as chromatography-mass spectrometry and microarrays.
A strong emphasis is placed on dimensionality reduction, including both feature selection and extraction, to address the challenges of high-dimensional biological data. Students will engage with a comprehensive suite of machine learning algorithms, from basic classifiers and clustering techniques to advanced methods such as support vector machines, decision trees, random forests, and neural network architectures.
The course integrates robust validation strategies to ensure reliable model assessment and interpretation.

Teachers

Person in charge

Santiago Marco Colás

Others

Agustín Gutiérrez Gálvez
Elitza Nikolaeva Maneva

Weekly hours

Theory: 2h
Problems: 0h
Laboratory: 2h
Guided learning: 0h
Autonomous learning: 6h

Learning Outcomes

Knowledge

K2 - Identify mathematical models and statistical and computational methods that allow for solving problems in the fields of molecular biology, genomics, medical research, and population genetics.
K3 - Identify the mathematical foundations, computational theories, algorithmic schemes and information organization principles applicable to the modeling of biological systems and to the efficient solution of bioinformatics problems through the design of computational tools.
K4 - Integrate the concepts offered by the most widely used programming languages in the field of Life Sciences to model and optimize data structures and build efficient algorithms, relating them to each other and to their application cases.
K5 - Identify the nature of the biological variables that need to be analyzed, as well as the mathematical models, algorithms, and statistical tests appropriate to develop and evaluate statistical analyses and computational tools.

Skills

S2 - Computationally analyze DNA, RNA and protein sequences, including comparative genome analyses, using computation, mathematics and statistics as basic tools of bioinformatics.
S3 - Solve problems in the fields of molecular biology, genomics, medical research and population genetics by applying statistical and computational methods and mathematical models.
S4 - Develop specific tools that enable solving problems on the interpretation of biological and biomedical data, including complex visualizations.
S8 - Make decisions, and defend them with arguments, in the resolution of problems in the areas of biology, as well as, within the appropriate fields, health sciences, computer sciences and experimental sciences.

Competences

C3 - Communicate orally and in writing with others in the English language about learning, thinking and decision-making outcomes.
C6 - Detect deficiencies in one's own knowledge and overcome them through critical reflection and the choice of the best action to expand this knowledge.

Objectives

1. Implement correct data partition schemes for training, optimization and performance assessment in predictive modeling.
Related competences: K2, K3, K4, S3, S4
2. Select appropriate data preprocessing techniques before model building.
Related competences: K2, K3, K4
3. Perform dimensionality reduction using both feature selection and feature extraction methods.
Related competences: K2, K3, K4, K5, S3, S4, C3, C6
4. Critically assess model performance using appropriate validation techniques.
Related competences: K2, K3, K4, S3, S8, C3, C6
5. Apply advanced machine learning and signal processing methods to real-world bioinformatics and health data challenges.
Related competences: K2, K3, K4, K5, S2, S3, S4, S8, C3, C6
6. Write a lab report in formal language, with a clear structure and good-quality visuals.
Related competences: C3
7. Orally defend a team project on the machine learning analysis of a dataset. Produce good-quality slides and structure the presentation to give the audience a clear message. Answer technical questions proficiently.
Related competences: S8, C3
8. Understand technical literature in the area of statistical learning for health. Identify key concepts and the ideas that require deeper analysis.
Related competences: K2, K3, C6

Contents

Introduction to Statistical Learning: Basic Concepts and Examples
Motivation and basic concepts. Application examples. Tools.
Introduction to Statistical Learning (II)
Figures of merit. Basic classifiers. Overfitting and complexity control. Dimensionality reduction. Regularization.
Data preprocessing: from raw data to features.
Examples in spectrometry. Noise reduction, baseline correction, peak detection and integration, alignment, non-linear transformations, scaling and normalization techniques.
Dimensionality reduction: Feature extraction
The curse of dimensionality. Principal Component Analysis. Linear Discriminant Analysis.
Dimensionality Reduction: Feature Selection
The importance of data partition. Univariate approaches. Multivariate approaches: filters, wrappers, sequential searches, genetic algorithms. Feature rankings and Recursive Feature Elimination.
Clustering
K-means, Hierarchical clustering, Gaussian Mixture Models, Parzen Windows
Basic classifiers
Bayes Theorem. Linear and Quadratic Discriminant Classifiers. Naive Bayes. Partial Least Squares Discriminant Analysis.
Model validation and cross-validation
Validation levels and purpose. Stratification. Internal/external validation. Hold-out, leave-one-out, k-fold, random subsampling, bootstrap.
Advanced classifiers
Support Vector Machines, Decision Trees, Random Forests, XGBoost.
Multilinear Regression
Overview of univariable linear regression. Multilinear Regression. The condition number. Ridge Regression. LASSO. Subset selection.
Advanced Regression
Neural networks: the perceptron, the multilayer perceptron, gradient descent techniques, deep learning. Support Vector Regression.

Activities


Theory Lectures

Contents:
1. Introduction to Statistical Learning: Basic Concepts and Examples
2. Introduction to Statistical Learning (II)
3. Data preprocessing: from raw data to features
4. Dimensionality reduction: Feature extraction
5. Dimensionality Reduction: Feature Selection
6. Clustering
7. Basic classifiers
8. Model validation and cross-validation
9. Advanced classifiers
10. Multilinear Regression
11. Advanced Regression

Theory: 28h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 30h

Computational lab

Objectives: 1, 2, 3, 4, 5, 6

Theory: 0h
Problems: 0h
Laboratory: 30h
Guided learning: 0h
Autonomous learning: 22.5h

Small Project

Objectives: 1, 2, 3, 4, 5, 7

Theory: 0h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 30h

Reading

Objectives: 8
Contents:
3. Data preprocessing: from raw data to features
4. Dimensionality reduction: Feature extraction
5. Dimensionality Reduction: Feature Selection
7. Basic classifiers
8. Model validation and cross-validation

Theory: 0h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 7.5h

Midterm exam

Theory: 2h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 0h

Teaching methodology

The teaching methodology combines lectures with computational laboratories. In addition, students work in groups to analyze a dataset and present their analysis orally.

Evaluation methodology

The course grade combines the partial exam (P), the final exam (F), the lab reports (LR), the lab questionnaires (LQ), the reading questionnaire (RQ), the computational homework (H) and the Small Project (SP) according to the following formula:
Grade = 0.2*P + 0.2*F + 0.2*SP + 0.05*RQ + 0.1*H + 0.15*LR + 0.1*LQ

For repeating students, activities carried out in previous years are not taken into account.

Students who fail the course may take the reassessment exam. In that case, the reassessment exam grade (E) replaces both P and F, and the final grade is:
Grade = 0.4*E + 0.2*SP + 0.05*RQ + 0.1*H + 0.15*LR + 0.1*LQ
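The two weighting formulas above can be checked with a short Python sketch. The helper name `weighted_grade` and the sample marks are hypothetical. The sketch assumes that all components are marked on the same scale (for example 0 to 10), which this guide does not specify, and it does not apply any pass threshold.

```python
# Hypothetical sketch of the weighting scheme in the evaluation methodology.
# Assumption: every component is marked on the same scale (e.g. 0-10).

WEIGHTS = {"P": 0.20, "F": 0.20, "SP": 0.20, "RQ": 0.05,
           "H": 0.10, "LR": 0.15, "LQ": 0.10}

# In the reassessment, E replaces both P and F, so it carries their combined weight.
REASSESSMENT_WEIGHTS = {"E": 0.40, "SP": 0.20, "RQ": 0.05,
                        "H": 0.10, "LR": 0.15, "LQ": 0.10}


def weighted_grade(marks: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted sum of the marks named in `weights`.

    Components in `marks` that are not in `weights` (e.g. P and F when
    using the reassessment weights) are ignored.
    """
    missing = set(weights) - set(marks)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(w * marks[k] for k, w in weights.items())


# Both weight sets add up to 1, so the final grade stays on the marks' scale.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
assert abs(sum(REASSESSMENT_WEIGHTS.values()) - 1.0) < 1e-9

if __name__ == "__main__":
    # Made-up marks, for illustration only.
    marks = {"P": 4.0, "F": 5.0, "SP": 7.0, "RQ": 8.0,
             "H": 6.0, "LR": 7.0, "LQ": 6.5}
    print(round(weighted_grade(marks, WEIGHTS), 2))
    # Same coursework marks, with a hypothetical reassessment grade E.
    resit = dict(marks, E=6.0)
    print(round(weighted_grade(resit, REASSESSMENT_WEIGHTS), 2))
```

With these made-up marks, the regular formula gives 5.9 and the reassessment formula gives 6.5. Because E carries weight 0.4, the reassessment fully overrides the earlier P and F marks rather than averaging with them.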

Bibliography

Basic:

The elements of statistical learning: data mining, inference, and prediction - Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome, Springer, ©2009. ISBN: 0387952845
https://link-springer-com.recursos.biblioteca.upc.edu/book/10.1007/978-0-387-84858-7
Pattern recognition and machine learning - Bishop, Christopher M, Springer, ©2006. ISBN: 9780387310732
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003157379706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Bioinformatics: the machine learning approach - Baldi, Pierre; Brunak, Soren, The MIT Press, ©2001. ISBN: 9780262025065
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003149339706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Multidimensional scaling - Cox, Trevor F; Cox, Michael A. A, Chapman & Hall, ©2001. ISBN: 1584880945
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991001195129706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Correspondence analysis in practice - Greenacre, Michael J, CRC Press/Taylor, 2017. ISBN: 9781498731782
https://www-taylorfrancis-com.recursos.biblioteca.upc.edu/books/mono/10.1201/9781315369983/correspondence-analysis-practice-michael-greenacre
Chemometrics with R: multivariate data analysis in the natural sciences and life sciences - Wehrens, Ron, Springer Science, 2011. ISBN: 9786613086648
https://link-springer-com.recursos.biblioteca.upc.edu/book/10.1007/978-3-642-17841-2
Introduction to multivariate statistical analysis in chemometrics - Varmuza, Kurt; Filzmoser, Peter, CRC Press, 2016.
Data Science and Predictive Analytics: Biomedical and Health Applications using R - Dinov, Ivo D, Springer, 2023. ISBN: 9783031174827
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005498239106711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Modern statistics for modern biology - Holmes, Susan; Huber, Wolfgang, Cambridge University Press, 2025.
An introduction to statistical learning: with applications in R - James, Gareth, Springer, ©2021. ISBN: 1071614177
https://ebookcentral-proquest-com.recursos.biblioteca.upc.edu/lib/upcatalunya-ebooks/detail.action?pq-origsite=primo&docID=6686746
An introduction to statistical learning: with applications in Python - James, Gareth, Springer, 2023. ISBN: 9783031391897
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005494541006711&context=L&vid=34CSUC_UPC:VU1&lang=ca

Previous capacities

Programming in R. Biostatistics. Algebra.
