The objective of MVA is to give students a working knowledge of the statistical concepts of multivariate data analysis and of its basic methods, which form a core foundation of data mining.
Person in charge
Karina Gibert Oliveras
Tomas Aluja Banet
Belchin Adriyanov Kostov
Lidia Montero Mercadé
Generic Technical Competences
CG1 - Capability to apply the scientific method to the study and analysis of phenomena and systems in any area of Computer Science, and to the conception, design and implementation of innovative and original solutions.
CG3 - Capacity for mathematical modeling, calculation and experimental design in technology and company engineering centers, particularly in research and innovation in all areas of Computer Science.
CTR4 - Capability to manage the acquisition, structuring, analysis and visualization of data and information in the area of informatics engineering, and critically assess the results of this effort.
CTR6 - Capacity for critical, logical and mathematical reasoning. Capability to solve problems in their area of study. Capacity for abstraction: the capability to create and use models that reflect real situations. Capability to design and implement simple experiments, and analyze and interpret their results. Capacity for analysis, synthesis and evaluation.
Technical Competences of each Specialization
CEC1 - Ability to apply scientific methodologies in the study and analysis of phenomena and systems in any field of Information Technology as well as in the conception, design and implementation of innovative and original computing solutions.
CEC2 - Capacity for mathematical modelling, calculation and experimental design in engineering technology centres and business, particularly in research and innovation in all areas of Computer Science.
Visual representation of the data
Visualisation of link matrices
Analysis of frequency data
Multiple Correspondence Analysis
Analysis of categorical data
Synthesis of the represented information. Consolidation of the partition
Multivariate normal distribution
Definition and properties
Sampling distributions of the multivariate normal distribution
Inference on the covariance matrix. Inference on the centroid of the distribution. Wishart distribution. Hotelling's T2, Wilks' lambda.
Discriminant analysis under the assumption of a multivariate normal distribution. Linear discriminant analysis. Quadratic discriminant analysis.
Simplifying the linear discriminant analysis
Discriminant analysis without probabilistic assumptions
K nearest neighbor classifier
Classification and regression trees
The aim of the course is to give the statistical foundations for data mining. Learning combines theoretical explanation with its application to a real case. The lectures develop the necessary scientific knowledge, while the lab classes apply it to data mining problems. These problems make up the practical assignments of the subject, which are carried out in part during the lab classes. Working on the practical assignments fosters generic skills related to teamwork and the presentation of results, and serves to integrate the different topics of the subject. The software used will be primarily R.
The course grade is based on the marks obtained in the practical exercises done during the course, the exam grade and the grade obtained in the final practical assignment.
Each practical assignment requires a written report and may be done jointly, in groups of at most two students.
The exercises conducted throughout the course aim to consolidate the learning of multivariate techniques.
In the final practical assignment, students demonstrate their ability to solve a real problem using multivariate visualization techniques, clustering, interpretation and prediction. Students choose among different alternatives for solving the problem. The assignment is presented and defended publicly, and the student must answer any questions about the theoretical models and methods used in the solution. Assignments are carried out using R.
The written test is held on the last day of class and assesses the assimilation of the basic concepts of the subject, while the presentation of the second practical assignment takes place during the examination period.
The exercises are weighted 30%, the exam 40% and the final practical assignment 30%.
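As an illustration only, the weighting above can be written as a weighted sum. The symbols E (exercises), X (exam) and P (final practical assignment), the 0 to 10 marking scale and the sample marks are assumptions made for this sketch and are not stated in the course description.

```latex
% Final grade as a weighted sum of the three components (weights from the text)
\[
  G = 0.30\,E + 0.40\,X + 0.30\,P
\]
% Hypothetical example on an assumed 0-10 scale: E = 7, X = 6, P = 8
\[
  G = 0.30 \cdot 7 + 0.40 \cdot 6 + 0.30 \cdot 8 = 2.1 + 2.4 + 2.4 = 6.9
\]
```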
Aprender de los Datos: El Análisis de Componentes Principales
Aluja Banet, Tomas; Morineau, Alain
EUB, 1999. ISBN: 84-8312-022-4
The Elements of Statistical Learning: data mining
Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome
Springer, 2001. ISBN: 0-387-95284-5
Applied Multivariate Statistical Analysis
Johnson, Richard A.; Wichern, Dean W.
Prentice Hall, 1998. ISBN: 0-13-834194-X
The course assumes that students have previously taken a basic course in statistics, programming and mathematics, and in particular that they are familiar with the following concepts:
- Mean, covariance and correlation matrix.
- Hypothesis testing.
- Matrix algebra, eigenvalues and eigenvectors.
- Programming of algorithms.
- Multiple linear regression.