Multivariate Analysis


Credits

Types

Specialization compulsory (Data Science)

Requirements

This subject has no formal requirements, but it does assume some previous capacities.

Department

EIO

The objective of MVA is to give students a working knowledge of the statistical concepts of multivariate data analysis and of its most basic methodologies and techniques, which form a core part of Data Mining.

Weekly hours

Theory

Problems

Laboratory

Guided learning: 0.15

Autonomous learning: 7.39

Objectives

1. Multivariate description of data
Related competences: CG1, CG3, CEC1, CEC2, CTR4, CTR6
2. Data visualisation
Related competences: CG3, CTR4
3. Multivariate inference
Related competences: CG3, CEC1, CEC2, CTR6
4. Classification of new individuals
Related competences: CG1, CG3, CEC1, CEC2, CTR6

Contents

1. Introduction to Multivariate Data Analysis
Advantages of the multivariate treatment. Examples of multivariate data. Probabilistic and distribution-free methods. Exploratory versus modelling approach.
2. Principal Component Analysis
Analysis of individuals. Analysis of variables. Visual representation of the information. Dimensionality reduction. Supplementary information.
3. Correspondence Analysis
Correspondence analysis, also called reciprocal averaging, is a visualization technique for finding and displaying the relationships between categories. It represents the rows and columns of a contingency table as points in a low-dimensional map, so that associations between categories can be read visually.
4. Factor Analysis
A dimension reduction method that explains the correlations among observed variables through a smaller number of latent factors.
5. Multidimensional Scaling
This method works with distances or similarities between elements. It reveals the common structure of all the elements and the specificity of each one, showing what makes them close or distant.
6. Hierarchical and Partitioning Clustering
Two approaches to clustering, used to classify the observations in a data set into multiple groups based on their similarity.
7. Model-based Clustering
Model-based clustering assumes that the data were generated by a model and tries to recover that model from the data. The recovered model then defines the clusters and the assignment of observations to clusters. A commonly used criterion for estimating the model parameters is maximum likelihood.
8. Multivariate normal distribution
Particularities of the normal distribution in the general multivariate case, where the points are distributed in several dimensions. This topic is not covered in a dedicated block but across all the contents of the course.
9. Discriminant Analysis and beyond
Discriminant Analysis (DA) is a classification method. DA classifies observations into non-overlapping groups, based on scores on one or more quantitative predictor variables. We will look at different techniques based on different discrimination algorithms.
10. Classification and Regression Trees
This method can be used for prediction or classification. It explains how the values of a response variable can be predicted or classified from the values of other variables. It has a very useful graphical structure.
11. Association rules
Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
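
To make the last topic concrete, here is a minimal sketch of the two basic measures behind association rules, support and confidence. It is written in plain Python purely for illustration (the course itself works in R); the toy transactions and the function names `support` and `confidence` are assumptions, not course material.

```python
# Hypothetical toy transactions (illustrative only, not course data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A).

    Assumes the antecedent appears in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Rule {bread} -> {milk}
print(support({"bread", "milk"}, transactions))        # 2 of 4 transactions
print(confidence({"bread"}, {"milk"}, transactions))   # 2 of the 3 bread baskets
```

A rule is usually kept only if both measures exceed thresholds chosen by the analyst; in R, packages such as arules implement the full search over itemsets.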

Activities


Introduction to the course + Multivariate Data Analysis

Objectives: 2 1
Contents:

1. Introduction to Multivariate Data Analysis

Principal Component Analysis

Objectives: 2 1
Contents:

2. Principal Component Analysis

Correspondence Analysis

Objectives: 2 1
Contents:

3. Correspondence Analysis

Model-based Clustering

Objectives: 2 1
Contents:

7. Model-based Clustering

Factor Analysis

Objectives: 2 1
Contents:

4. Factor Analysis

Factor Analysis

Objectives: 2 4
Contents:

11. Association rules

Multidimensional Scaling

Objectives: 2 1
Contents:

5. Multidimensional Scaling

Discriminant Analysis

Objectives: 3 4
Contents:

9. Discriminant Analysis and beyond

Classification and Regression Trees

Objectives: 2 3 4
Contents:

10. Classification and Regression Trees

Hierarchical and Partitioning Clustering

Objectives: 2 4
Contents:

6. Hierarchical and Partitioning Clustering

Multivariate normal distribution

Objectives: 2 4
Contents:

8. Multivariate normal distribution

Association rules

Objectives: 4
Contents:

11. Association rules

Final Practical Work

Week: 18

Guided learning: 1.9h

Autonomous learning: 13h

Quiz

Week: 14

Autonomous learning: 13.1h

Summary and Practice. 1st part

Objectives: 2 1 3 4
Contents:

3. Correspondence Analysis
4. Factor Analysis
5. Multidimensional Scaling
6. Hierarchical and Partitioning Clustering
7. Model-based Clustering

Summary and Practice. 2nd part

Objectives: 2 1 3 4
Contents:

8. Multivariate normal distribution
9. Discriminant Analysis and beyond
10. Classification and Regression Trees
11. Association rules

Practice doubts

Objectives: 2 1 3 4
Contents:

1. Introduction to Multivariate Data Analysis
2. Principal Component Analysis
3. Correspondence Analysis
4. Factor Analysis
5. Multidimensional Scaling
6. Hierarchical and Partitioning Clustering
7. Model-based Clustering
8. Multivariate normal distribution
9. Discriminant Analysis and beyond
10. Classification and Regression Trees
11. Association rules

Teaching methodology

The course aims to provide the statistical foundations of data mining. Learning combines theoretical explanation with its application to a real case. The lectures develop the necessary scientific knowledge, while the lab classes apply it to solving data mining problems. The practical work fosters generic skills related to teamwork and the presentation of results, and serves to integrate the different topics of the subject. The software used will primarily be R and RStudio.

Evaluation methodology

The course evaluation will be based on the marks obtained in the practical exercises carried out during the course, a theory grade, and the grade obtained in the final practice.
Each practice requires a written report and may be done in groups of up to four students.
The exercises carried out throughout the course aim to consolidate the learning of multivariate techniques.
In the final practice, students demonstrate their maturity in solving a real problem using multivariate visualisation techniques, clustering interpretation, and prediction. Students will choose among different alternatives to solve the problem. This practice will be presented and publicly defended, and the students must answer any questions about the theoretical models and methods used in the solution. Practices are carried out using the software R.
The written tests will evaluate the assimilation of the basic concepts of the subject. There will be three tests during the course, held in theory class, while the presentation of the practice will take place during the examination period.

The exercises performed during the course have a weight of 30%, the theory 30%, and the final practice 40%.
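
As a worked example of the weighting, the sketch below combines three component marks into a final mark. The 0 to 10 scale, the example marks, and the names `WEIGHTS` and `final_grade` are assumptions for illustration; only the 30/30/40 weights come from the text above.

```python
# Weights taken from the evaluation methodology; they must sum to 1.
WEIGHTS = {"exercises": 0.30, "theory": 0.30, "final_practice": 0.40}


def final_grade(marks):
    """Weighted average of the three component marks (same scale as the inputs)."""
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)


# Hypothetical marks on an assumed 0-10 scale.
example = {"exercises": 7.0, "theory": 6.0, "final_practice": 8.0}

# 0.30 * 7.0 + 0.30 * 6.0 + 0.40 * 8.0 = 2.1 + 1.8 + 3.2
print(round(final_grade(example), 2))
```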

Bibliography

Basic:

The Elements of statistical learning : data mining, inference, and prediction - Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome, Springer, cop. 2009. ISBN: 9780387848570
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991003549679706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Applied multivariate statistical analysis - Johnson, Richard A.; Wichern, Dean W, Pearson Education Limited, [2014]. ISBN: 9781292024943
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004175889706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Exploratory multivariate analysis by example using R - Husson, François; Lê, Sébastien; Pagès, Jérôme, CRC Press, Taylor & Francis Group, 2017. ISBN: 9781315301860
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991001358859706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Discovering knowledge in data : an introduction to data mining - Larose, D.T.; Larose, C.D, John Wiley & Sons, 2014. ISBN: 9781118874059
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991001810009706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Multivariate statistical methods : a primer - Manly, Bryan F. J, CRC Press, Taylor & Francis Group, [2017]. ISBN: 9781498728966
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004178359706711&context=L&vid=34CSUC_UPC:VU1&lang=ca

Complementary:

Análisis de datos multivariantes - Peña, Daniel, McGraw-Hill/Interamericana de España, S.L., [2010]. ISBN: 9788448136109
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002497609706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
An R and S-PLUS companion to multivariate analysis - Everitt, Brian, Springer, 2005. ISBN: 1852338822
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002936809706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Aprender de los datos : el análisis de componentes principales : una aproximación desde el Data Mining - Aluja Banet, Tomàs; Morineau, Alain, EUB, 1999. ISBN: 8483120224
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991001877509706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Construction and assessment of classification rules - Hand, D. J, Wiley, cop. 1997. ISBN: 0471965839
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991001900839706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Multivariate descriptive statistical analysis : correspondence analysis and related techniques for large matrices - Lebart, Ludovic; Morineau, Alain; Warwick, Kenneth M, John Wiley and Sons, cop. 1984. ISBN: 0471867438
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991000022249706711&context=L&vid=34CSUC_UPC:VU1&lang=ca

Web links

R homepage: https://cran.r-project.org/
RStudio homepage: https://rstudio.com/

Previous capacities

The course assumes that students have previously taken basic courses in statistics, programming and mathematics, and in particular that they have acquired the following concepts:
- Mean, covariance and correlation matrix.
- Hypothesis testing.
- Matrix algebra, eigenvalues and eigenvectors.
- Programming of algorithms.
- Multiple linear regression.
