Multivariate Analysis

Credits

Type

Compulsory

Requirements

This subject has no formal requirements, but it does assume some prior knowledge (see Previous capacities).

Department

EIO

Web

https://www.fib.upc.edu/en/studies/masters/master-data-science/curriculum/syllabus/MVA-MDS

The aim of the course is to introduce students to the fundamentals of multivariate data analysis methods and to provide them with the tools to deal with pre-processing, visualization, dimension reduction, classification and modelling of multivariate data.

Teachers

Person in charge

Nihan Acar Denizli

Others

Belchin Adriyanov Kostov

Weekly hours

Autonomous learning: 7.11

Competences

Transversal Competences

Information literacy

CT4 - Capacity for managing the acquisition, structuring, analysis and visualization of data and information in the field of specialisation, and for critically assessing the results of this management.

Third language

CT5 - Achieving a level of spoken and written proficiency in a foreign language, preferably English, that meets the needs of the profession and the labour market.

Entrepreneurship and innovation

CT1 - Know and understand the organization of a company and the sciences that govern its activity; have the ability to understand labor standards and the relationships between planning, industrial and commercial strategies, quality and profit. Being aware of and understanding the mechanisms on which scientific research is based, as well as the mechanisms and instruments for transferring results among socio-economic agents involved in research, development and innovation processes.

Basic

CB6 - Ability to apply the acquired knowledge and capacity for solving problems in new or unknown environments within broader (or multidisciplinary) contexts related to their area of study.
CB7 - Ability to integrate knowledge and handle the complexity of making judgments based on information which, being incomplete or limited, includes considerations on social and ethical responsibilities linked to the application of their knowledge and judgments.
CB8 - Capability to communicate their conclusions, and the knowledge and rationale underpinning these, to both specialist and non-specialist audiences in a clear and unambiguous way.
CB9 - Possession of the learning skills that enable the students to continue studying in a way that will be mainly self-directed or autonomous.
CB10 - Possess and understand knowledge that provides a basis or opportunity to be original in the development and/or application of ideas, often in a research context.

Generic Technical Competences

Generic

CG2 - Identify and apply methods of data analysis, knowledge extraction and visualization for data collected in disparate formats
CG3 - Define, design and implement complex systems that cover all phases in data science projects

Technical Competences

Specific

CE5 - Model, design, and implement complex data systems, including data visualization
CE6 - Design the Data Science process and apply scientific methodologies to obtain conclusions about populations and make decisions accordingly, from both structured and unstructured data, potentially stored in heterogeneous formats.
CE7 - Identify the limitations imposed by data quality in a data science problem and apply techniques to smooth their impact
CE8 - Extract information from structured and unstructured data by considering their multivariate nature.
CE9 - Apply appropriate methods for the analysis of non-traditional data formats, such as processes and graphs, within the scope of data science
CE10 - Identify machine learning and statistical modeling methods to use and apply them rigorously in order to solve a specific data science problem
CE11 - Analyze and extract knowledge from unstructured information using natural language processing techniques, text and image mining
CE12 - Apply data science in multidisciplinary projects to solve problems in new or poorly explored domains from a data science perspective that are economically viable, socially acceptable, and in accordance with current legislation
CE13 - Identify the main threats related to ethics and data privacy in a data science project (both in terms of data management and analysis) and develop and implement appropriate measures to mitigate these threats

Objectives

1. Data visualisation
   Related competences: CT4, CT5, CT1, CG2, CE5, CB8
2. Multivariate description of data
   Related competences: CT4, CE7, CE8, CE12, CE13, CB7, CB9, CB10
3. Dimension reduction methods
   Related competences: CT4, CT5, CG2, CE5, CE6, CE11, CE8, CE10, CB6, CB8, CB9, CB10
4. Multivariate inference
   Related competences: CT1, CG2, CG3, CE6, CE11, CE8, CE9, CE10, CB6, CB7, CB9
5. Classification of new individuals
   Related competences: CT1, CG3, CE6, CE10, CB6, CB7

Contents

1. Introduction to Multivariate Data Analysis
   Pre-processing and visualization of multivariate data.
2. Principal Component Analysis
   Analysis of individuals. Analysis of variables. Visual representation of the information. Dimensionality reduction. Supplementary information. Singular value decomposition and spectral (eigenvalue) decomposition.
3. Multidimensional Scaling
   Dimension reduction based on similarity or distance matrices, with applications.
4. Correspondence Analysis
   Dimension reduction for two categorical variables and visualization of the relationships between their categories.
5. Multiple Correspondence Analysis
   Analysis and visualization of the relationships among the categories of more than two categorical variables by means of dimension reduction.
6. Cluster Analysis
   Hierarchical and non-hierarchical clustering methods for classifying observations into groups based on multivariate data.
7. Profiling methods
   Methods that help to understand the common characteristics of clusters.
8. Multivariate normal distribution
   The probability density function of the multivariate normal distribution and hypothesis tests for the mean of multivariate data.
9. Discriminant Analysis
   Classification of observations into given groups using linear discriminant analysis, quadratic discriminant analysis and Naive Bayes methods.
10. Association rules
   Finding common patterns, associations, correlations, or causal structures between sets of items or objects in transaction databases, relational databases, and other information repositories.

Activities

Introduction to the course + Multivariate Data Analysis

Objectives: 1, 2
Contents: 1. Introduction to Multivariate Data Analysis
Autonomous learning: 5.5h

Principal Component Analysis

Objectives: 1, 2, 3
Autonomous learning: 5.5h

Multidimensional Scaling

Objectives: 1, 2, 3
Contents: 3. Multidimensional Scaling
Autonomous learning: 5.5h

Correspondence Analysis and Multiple Correspondence Analysis

Objectives: 1, 2, 3
Autonomous learning: 10h

Cluster Analysis and Profiling

Objectives: 2, 3, 5
Autonomous learning: 10h

Multivariate Normal Distribution and Hypothesis Tests of Mean for Multivariate Data

Objectives: 2, 4
Autonomous learning: 5.5h

Discriminant Analysis

Objectives: 1, 4, 5
Autonomous learning: 5.5h

Association Rules

Objectives: 2, 4
Autonomous learning: 5.5h

Question and answer session

Practical sessions

Objectives: 1, 2, 3

Task 1

Students apply the dimension reduction methods covered in the first part of the course to a case study and interpret the results.
Objectives: 1, 2, 3
Week: 8 (Outside class hours)
Autonomous learning: 7.5h

Task 2

In this task, students apply classification methods to a case study and interpret the results. The task is done in groups of three students.
Objectives: 3, 4, 5
Week: 13 (Outside class hours)
Autonomous learning: 7.5h

Final Exam

The final exam covers all the methods seen throughout the semester. It includes both theoretical questions and interpretation questions based on R outputs.
Objectives: 1, 2, 3, 4, 5
Week: 15 (Outside class hours)
Autonomous learning: 10h

Final Project

The final project consists of applying multivariate data analysis methods to a real data set, which students may choose according to their interests, and interpreting the results. It should be done in groups of three students.
Objectives: 1, 2, 3, 4, 5
Week: 14 (Outside class hours)
Autonomous learning: 18h

Teaching methodology

This course combines theoretical explanation of different methods for multivariate data analysis with their application to real data sets. In theory classes, the fundamentals and theoretical structure of the methods are explained, while in lab sessions the methods are applied to different data sets in R. The projects and homework assignments are done in groups, which allows students to collaborate and practise teamwork.

Evaluation methodology

During the course, students must submit two homework assignments (tasks) and a final project, which should be done in groups of three students. The first homework focuses on the application of dimension reduction methods, while the second focuses on classification methods. In the final project, students work on a real data set that they have downloaded or web-scraped and apply the methods seen during the course to it. The results should be presented in a report in PDF format.

The overall grade is a weighted sum: 15% for the first task, 15% for the second task, 40% for the final project and 30% for the final exam.
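
As a worked illustration of this weighting, the overall grade can be sketched as follows. The symbols T_1, T_2, P and E (marks for the first task, the second task, the final project and the final exam) are introduced here for illustration only and do not come from the course guide.

```latex
\[
  \text{Grade} = 0.15\,T_1 + 0.15\,T_2 + 0.40\,P + 0.30\,E
\]
```

For example, assuming a 0 to 10 marking scale and hypothetical marks T_1 = 7, T_2 = 8, P = 9 and E = 6, the overall grade would be 1.05 + 1.20 + 3.60 + 1.80 = 7.65.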

Bibliography

Basic:

Applied multivariate statistical analysis - Johnson, Richard A.; Wichern, Dean W., Pearson Education Limited, 2014. ISBN: 9781292024943
https://ebookcentral-proquest-com.recursos.biblioteca.upc.edu/lib/upcatalunya-ebooks/detail.action?pq-origsite=primo&docID=5174865
Exploratory multivariate analysis by example using R - Husson, F.; Lê, S.; Pagès, J., CRC Press, Taylor & Francis Group, 2017. ISBN: 9781315301860
https://ebookcentral-proquest-com.recursos.biblioteca.upc.edu/lib/upcatalunya-ebooks/detail.action?pq-origsite=primo&docID=4856173
Multivariate statistical methods: a primer - Manly, Bryan F. J., CRC Press, Taylor & Francis Group, 2017. ISBN: 9781498728966
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004178359706711&context=L&vid=34CSUC_UPC:VU1&lang=ca

Complementary:

Análisis de datos multivariantes - Peña, Daniel, McGraw-Hill/Interamericana de España, S.L., 2010. ISBN: 9788448136109
https://www-ingebook-com.recursos.biblioteca.upc.edu/ib/NPcd/IB_BooksVis?cod_primaria=1000187&codigo_libro=4203

Web links

CRAN, the Comprehensive R Archive Network: https://cran.r-project.org/
R for Data Science (2e): https://r4ds.hadley.nz/
R Cookbook: https://rc2e.com/
RStudio homepage: https://rstudio.com/

Previous capacities

The course assumes that students have previously taken basic courses in statistics, programming and mathematics and, in particular, are familiar with the following:
- Descriptive statistical analysis.
- Hypothesis tests.
- Matrix algebra, eigenvalues and eigenvectors.
- Programming of algorithms.
- Multiple linear regression.
