Statistical Learning

Credits
6
Types
Compulsory
Requirements
This subject has no formal requirements, but it assumes the previous capacities listed below.
Department
UB;UAB
This course offers an in-depth exploration of statistical learning theory and advanced data analysis techniques. Students will develop both theoretical understanding and practical expertise in handling complex biological and health-related datasets.
The curriculum begins with the foundations of statistical learning, covering core problems such as classification, regression, and clustering, as well as essential concepts like loss functions, model complexity, regularization, and figures of merit from signal detection theory. Building on this foundation, students will master preprocessing methods necessary for analyzing real-world data from sources such as chromatography-mass spectrometry and microarrays.
A strong emphasis is placed on dimensionality reduction, including both feature selection and extraction, to address the challenges of high-dimensional biological data. Students will engage with a comprehensive suite of machine learning algorithms, from basic classifiers and clustering techniques to advanced methods such as support vector machines, decision trees, random forests, and neural network architectures.
The course integrates robust validation strategies to ensure reliable model assessment and interpretation.

Teachers

Person in charge

  • Santiago Marco Colás

Others

  • Agustín Gutiérrez Gálvez
  • Elitza Nikolaeva Maneva

Weekly hours

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6

Objectives

  1. Implement correct data partition schemes for training, optimization, and performance assessment in predictive modeling.
    Related competences: K2, K3, K4, S3, S4
  2. Select proper data preprocessing techniques before model building.
    Related competences: K2, K3, K4
  3. Perform dimensionality reduction using both feature selection and feature extraction methods.
    Related competences: C3, C6, K2, K3, K4, K5, S3, S4
  4. Critically assess model performance using appropriate validation techniques.
    Related competences: C3, C6, K2, K3, K4, S3, S8
  5. Apply advanced machine learning and signal processing methods to real-world bioinformatics and health data challenges.
    Related competences: C3, C6, K2, K3, K4, K5, S2, S3, S4, S8
  6. Write a lab report in formal language, with good structure and good-quality visuals.
    Related competences: C3
  7. Orally defend a team project on a machine learning analysis of a dataset. Produce good-quality slides and structure the presentation to give the audience a clear message. Answer technical questions with proficiency.
    Related competences: C3, S8
  8. Understand technical literature in the area of statistical learning for health. Identify key concepts and ideas that require deeper analysis.
    Related competences: C6, K2, K3

Contents

  1. Introduction to Statistical Learning: Basic Concepts and Examples
    Motivation and basic concepts. Application examples. Tools.
  2. Introduction to Statistical Learning (II)
    Figures of merit. Basic classifiers. Overfitting and complexity control. Dimensionality reduction. Regularization.
  3. Data preprocessing: from raw data to features.
    Examples in spectrometry. Noise reduction, baseline correction, peak detection and integration, alignment, non-linear transformations, scaling and normalization techniques.
  4. Dimensionality reduction: Feature extraction
    The curse of dimensionality. Principal Component Analysis. Linear Discriminant Analysis.
  5. Dimensionality Reduction: Feature Selection
    The importance of data partition. Univariate approaches. Multivariate approaches: Filters, Wrappers, Sequential Searches, Genetic Algorithms. Feature Rankings and Recursive Feature Elimination.
  6. Clustering
    K-means, Hierarchical clustering, Gaussian Mixture Models, Parzen Windows.
  7. Basic classifiers
    Bayes Theorem. Linear and Quadratic Discriminant Classifiers. Naive Bayes. Partial Least Squares Discriminant Analysis.
  8. Model validation and cross-validation
    Validation levels and purpose. Stratification. Internal/External validation. Hold-out, Leave-one-out, k-fold, random subsampling, Bootstrap.
  9. Advanced classifiers
    Support Vector Machines, Decision Trees, Random Forest, XGBoost.
  10. Multilinear Regression
    Overview of univariable linear regression. Multilinear Regression. The condition number. Ridge Regression. LASSO. Subset selection.
  11. Advanced Regression
    Neural Networks. The perceptron. The multilayer perceptron. Gradient descent techniques. Deep Learning. Support Vector Regression.

Activities

Computational lab


Objectives: 1 2 3 4 5 6
Theory
0h
Problems
0h
Laboratory
30h
Guided learning
0h
Autonomous learning
22.5h

Small Project


Objectives: 1 2 3 4 5 7
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
30h


Midterm exam



Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Teaching methodology

The teaching methodology combines lectures with computational laboratories. Additionally, students work in groups to analyze a dataset and present their analysis orally.

Evaluation methodology

The course grade takes into account the midterm exam (P), the final exam (F), the lab reports (LR), the lab questionnaires (LQ), the reading questionnaire (RQ), the computational homework (H), and the Small Project (SP). They are combined according to the formula:
Grade = 0.2*P + 0.2*F + 0.2*SP + 0.05*RQ + 0.1*H + 0.15*LR + 0.1*LQ

For students repeating the course, activities completed in previous years will not be taken into account under any circumstances.

Students who fail the subject may take the reassessment exam. In this case, the grade of this exam, E, replaces the grades P and F, so that the final grade is:
Grade = 0.4*E + 0.2*SP + 0.05*RQ + 0.1*H + 0.15*LR + 0.1*LQ
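
As an illustration only, the two grading formulas can be written as a short Python sketch. The function name final_grade, the dictionary layout, and the 0 to 10 marking scale are assumptions made for this example and are not specified by the course; only the weights come from the formulas above.

```python
# Weights copied from the grading formula in this syllabus (they sum to 1).
WEIGHTS = {
    "P": 0.20,   # midterm (partial) exam
    "F": 0.20,   # final exam
    "SP": 0.20,  # Small Project
    "RQ": 0.05,  # reading questionnaire
    "H": 0.10,   # computational homework
    "LR": 0.15,  # lab reports
    "LQ": 0.10,  # lab questionnaires
}


def final_grade(marks, reassessment=None):
    """Combine component marks into the course grade.

    marks: dict mapping component codes to marks (a 0-10 scale is assumed).
    reassessment: optional mark E of the reassessment exam; when given,
        it replaces both P and F, so it carries a total weight of 0.4.
    """
    marks = dict(marks)  # work on a copy so the caller's data is untouched
    if reassessment is not None:
        marks["P"] = reassessment
        marks["F"] = reassessment
    return sum(weight * marks[code] for code, weight in WEIGHTS.items())


# Hypothetical marks, invented purely to show the call.
example = {"P": 4, "F": 3, "SP": 7, "RQ": 8, "H": 6, "LR": 7, "LQ": 9}
regular = final_grade(example)
after_reassessment = final_grade(example, reassessment=6)
```

Because E is substituted for both P and F, the reassessment case reproduces the 0.4*E term without needing a second set of weights.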

Previous capacities

Programming in R. Biostatistics. Algebra.