Multivariate Analysis


Credits
6
Types
Specialization compulsory (Data Science)
Requirements
This subject has no requirements

Department
EIO
The objective of MVA is to give students knowledge of the statistical concepts of multivariate data analysis and of its basic methodologies, which form the core of Data Mining.

Teachers

Person in charge

  • Tomas Aluja Banet

Weekly hours

Theory
2
Problems
0
Laboratory
2
Guided learning
0.15
Autonomous learning
7.39

Competences

Generic Technical Competences

Generic

  • CG1 - Capability to apply the scientific method to the study and analysis of phenomena and systems in any area of Computer Science, and in the conception, design and implementation of innovative and original solutions.
  • CG3 - Capacity for mathematical modelling, calculation and experimental design in technology centres and engineering companies, particularly in research and innovation in all areas of Computer Science.

Transversal Competences

Information literacy

  • CTR4 - Capability to manage the acquisition, structuring, analysis and visualization of data and information in the area of informatics engineering, and critically assess the results of this effort.

Reasoning

  • CTR6 - Capacity for critical, logical and mathematical reasoning. Capability to solve problems in their area of study. Capacity for abstraction: the capability to create and use models that reflect real situations. Capability to design and implement simple experiments, and analyze and interpret their results. Capacity for analysis, synthesis and evaluation.

Technical Competences of each Specialization

Specific

  • CEC1 - Ability to apply scientific methodologies in the study and analysis of phenomena and systems in any field of Information Technology as well as in the conception, design and implementation of innovative and original computing solutions.
  • CEC2 - Capacity for mathematical modelling, calculation and experimental design in engineering technology centres and business, particularly in research and innovation in all areas of Computer Science.

Objectives

  1. Visual representation of the data
    Related competences: CG3, CTR4
  2. Multivariate description of data
    Related competences: CG1, CG3, CEC1, CEC2, CTR4, CTR6
  3. Multivariate inference
    Related competences: CG3, CEC1, CEC2, CTR6
  4. Classification of new individuals
    Related competences: CG1, CG3, CEC1, CEC2, CTR6

Contents

  1. Multivariate Data Analysis
    Advantages of the multivariate treatment. Examples of multivariate data. Probabilistic and distribution-free methods. Exploratory versus modelling approach.
  2. Principal Component Analysis
    Analysis of individuals. Analysis of variables. Visual representation of the information. Dimensionality reduction. Supplementary information
  3. Singular Value Decomposition. Biplots
    Simultaneous representation of the rows and columns of a data table.
  4. Factor Analysis
    Latent constructs. Measurement model.
  5. Multidimensional Scaling
    Visualisation of link matrices
  6. Correspondence Analysis
    Analysis of frequency data
  7. Multiple Correspondence Analysis
    Analysis of categorical data
  8. Hierarchical clustering
    Synthesis of the represented information. Consolidation of the partition
  9. Multivariate normal distribution
    Definition and properties
  10. Sampling distributions of the multivariate normal distribution
    Inference about the covariance matrix. Inference about the centroid of the distribution. Wishart distribution. Hotelling's T2, Wilks' lambda.
  11. Discriminant Analysis
    With the assumption of multivariate normal distribution. Linear discriminant analysis. Quadratic discriminant analysis.
  12. Naive Bayes
    Simplifying the linear discriminant analysis
  13. Discriminant analysis without probabilistic assumptions
    K nearest neighbor classifier
  14. Decision trees
    Classification and regression trees
  15. Association rules
    Apriori algorithm

Activities

Multivariate Data Analysis

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
2
Objectives: 1 2
Contents:

Principal Component Analysis

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
7
Objectives: 1 2
Contents:

Singular Value Decomposition. Biplots

Theory
1
Problems
0
Laboratory
1
Guided learning
0
Autonomous learning
5
Objectives: 1 2
Contents:

Factor Analysis

Theory
1
Problems
0
Laboratory
1
Guided learning
0
Autonomous learning
5
Objectives: 1 2
Contents:

Multidimensional Scaling

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
5
Objectives: 1 2
Contents:

Correspondence Analysis

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
5
Objectives: 1 2
Contents:

Multiple Correspondence Analysis

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
5
Objectives: 1 2
Contents:

Hierarchical Clustering

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
5
Objectives: 1 2
Contents:

Multivariate normal distribution

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
5
Objectives: 3
Contents:

Sampling distributions

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
5
Objectives: 3
Contents:

Multivariate Statistical Tests

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
5
Objectives: 4
Contents:

Hotelling's T2

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
5
Objectives: 4
Contents:

Decision trees

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
5
Objectives: 4
Contents:

Association rules

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
5
Objectives: 4
Contents:

Teaching methodology

The aim of the course is to give the statistical foundations for data mining. Learning combines theoretical explanation with its application to a real case. The lectures develop the necessary scientific knowledge, while the lab classes apply it to solving data mining problems. These problems constitute the practices of the subject, which will be developed in part during laboratory classes. Carrying out the practices fosters generic skills related to teamwork and presentation of results, and serves to integrate the different areas of knowledge of the subject. The software used will be primarily R.

Evaluation methodology

The course evaluation is based on the marks obtained in the practical exercises carried out during the course, an examination grade and the grade obtained in the final practice.
Each practice requires a written report and may be done jointly, in groups of at most two students.
The exercises carried out throughout the course aim to consolidate the learning of multivariate techniques.
In the final practice, students show their maturity in solving a real problem using multivariate visualization techniques, clustering, interpretation and prediction. Students will choose between different alternatives to solve the problem. This practice will be presented and publicly defended; the student must answer any questions about the theoretical models and methods used in the solution. Practices are conducted using the software R.
The written test will be held on the last day of class and will evaluate the assimilation of the basic concepts of the subject. The presentation of the second practice will take place during the examination period.

The exercises are weighted 30%, the examination 40% and the final practice 30%.
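
As an illustration of how these weights combine, the final mark can be written as a weighted sum. The symbols E (exercises), X (examination) and P (final practice) are introduced here only for illustration and do not appear in the course guide, and the marks in the worked example are hypothetical values on a 0 to 10 scale.

```latex
\[
  \text{Final mark} = 0.30\,E + 0.40\,X + 0.30\,P
\]
% Hypothetical example: E = 7, X = 6, P = 8
\[
  0.30 \cdot 7 + 0.40 \cdot 6 + 0.30 \cdot 8 = 2.1 + 2.4 + 2.4 = 6.9
\]
```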

Bibliography

Basic:

  • Aprender de los Datos: El Análisis de Componentes Principales - Aluja Banet, Tomas; Morineau, Alain, EUB, 1999. ISBN: 84-8312-022-4
  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction - Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome, Springer, 2001. ISBN: 0-387-95284-5
  • Applied Multivariate Statistical Analysis - Johnson, Richard A.; Wichern, Dean W., Prentice Hall, 1998. ISBN: 0-13-834194-X
  • Multivariate Descriptive Statistical Analysis - Lebart, Ludovic; Morineau, Alain; Warwick, Kenneth, Wiley, 1984. ISBN: 0471867438
  • Construction and Assessment of Classification Rules - Hand, David J., Wiley, 1997. ISBN: 0471965839
  • Exploratory Multivariate Analysis by Example Using R - Husson, François; Lê, Sébastien; Pagès, Jérôme, CRC Press, 2011. ISBN:

Complementary:

  • Análisis de datos multivariantes - Peña, Daniel, McGraw Hill, 2002. ISBN: 84-481-3610-1
  • An R and S-PLUS Companion to Multivariate Analysis - Everitt, Brian, Springer, 2004. ISBN: 1852339292
  • Statistique exploratoire multidimensionnelle - Lebart, Ludovic; Morineau, Alain; Piron, Marie, Dunod, 1997. ISBN: 2100040014

Web links

Previous capacities

The course assumes that students have previously taken basic courses in statistics, programming and mathematics; in particular, that they have acquired the following concepts:
- Mean, covariance and correlation matrix.
- Hypothesis testing.
- Matrix algebra, eigenvalues and eigenvectors.
- Programming of algorithms.
- Multiple linear regression.