Credits
6
Types
Compulsory
Requirements
This subject has no formal requirements, but it assumes some previous capacities.
Department
EIO
Web
https://www.fib.upc.edu/en/studies/masters/master-data-science/curriculum/syllabus/MVA-MDS
Teachers
Person in charge
- Nihan Acar Denizli ( nihan.acar.denizli@upc.edu )
Others
- Belchin Adriyanov Kostov ( belchin.adriyanov.kostov@upc.edu )
Weekly hours
Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
7.11
Competences
Information literacy
Third language
Entrepreneurship and innovation
Basic
Generic
Specific
Objectives
- Data visualisation
  Related competences: CT4, CT5, CT1, CG2, CE5, CB8
- Multivariate description of data
  Related competences: CT4, CE7, CE8, CE12, CE13, CB7, CB9, CB10
- Dimension Reduction Methods
  Related competences: CT4, CT5, CG2, CE5, CE6, CE11, CE8, CE10, CB6, CB8, CB9, CB10
- Multivariate inference
  Related competences: CT1, CG2, CG3, CE6, CE11, CE8, CE9, CE10, CB6, CB7, CB9
- Classification of new individuals
  Related competences: CT1, CG3, CE6, CE10, CB6, CB7
Contents
- Introduction to Multivariate Data Analysis
  Pre-processing and visualization of multivariate data.
- Principal Component Analysis
  Analysis of individuals. Analysis of variables. Visual representation of the information. Dimensionality reduction. Supplementary information. Singular value decomposition and spectral decomposition.
- Multidimensional Scaling
  Dimension reduction based on similarity or distance matrices, with applications.
- Correspondence Analysis
  Dimension reduction for two categorical variables and visualization of the relationships between their categories.
- Multiple Correspondence Analysis
  Analysis and visualization of the relationships among the categories of more than two categorical variables by means of dimension reduction.
- Cluster Analysis
  Use of hierarchical and non-hierarchical clustering methods to classify observations into groups based on multivariate data.
- Profiling methods
  Methods that help to understand the common characteristics of clusters.
- Multivariate normal distribution
  The probability density function of the multivariate normal distribution and hypothesis tests on the mean of multivariate data.
- Discriminant Analysis
  Classification of observations into given groups using linear discriminant analysis, quadratic discriminant analysis and Naive Bayes methods.
- Association rules
  Finding common patterns, associations, correlations, or causal structures between sets of items or objects in transaction databases, relational databases, and other information repositories.
Activities
Activity Evaluation act
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
5.5h
Question-and-answer session
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h
Final Exam
In the final exam, students are responsible for all the methods covered during the semester. The exam includes both theoretical questions and interpretation questions based on R outputs.
Objectives: 1, 2, 3, 4, 5
Week: 15 (Outside class hours)
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h
Final Project
The final project consists of applying the multivariate data analysis methods to a real data set, chosen according to the students' interests, and interpreting the results. It should be done in groups of three students.
Objectives: 1, 2, 3, 4, 5
Week: 14 (Outside class hours)
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h
Teaching methodology
This course aims to give a theoretical explanation of different methods for multivariate data analysis and their application to real data sets. In the theory classes, the fundamentals and theoretical structure of the methods are explained, while in the lab sessions the methods are applied to different data sets in R. The projects and homework assignments are done in groups, which allows students to collaborate and practise teamwork.
Evaluation methodology
During the course, students must submit two homework assignments (tasks) and a final project, which should be done in groups of three students. The first homework focuses on the application of dimension reduction methods, while the second focuses on classification methods.
In the final project, students work on a real data set that they have downloaded or web-scraped and apply the methods seen during the course to it. The results should be presented in a report in PDF format.
The overall grade is weighted as follows: 15% for the first task, 15% for the second task, 40% for the final project and 30% for the final exam.
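The weighting amounts to a simple weighted average of the four components. The following is only a minimal sketch in Python (the labs themselves use R), assuming each component is marked on a 0-10 scale, which is not stated in this guide; the names `WEIGHTS` and `overall_grade` and the example marks are hypothetical.

```python
# Weights taken from the evaluation methodology; they sum to 1.
WEIGHTS = {
    "task1": 0.15,
    "task2": 0.15,
    "final_project": 0.40,
    "final_exam": 0.30,
}


def overall_grade(marks):
    """Return the weighted average of the four component marks."""
    missing = set(WEIGHTS) - set(marks)
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)


# Hypothetical marks on an assumed 0-10 scale (not real data).
example = {"task1": 8.0, "task2": 7.0, "final_project": 9.0, "final_exam": 6.0}
print(round(overall_grade(example), 2))
```

Because the final project and the final exam together carry 70% of the weight, they dominate the result in this sketch.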
Bibliography
Basic
- Applied multivariate statistical analysis. Johnson, Richard A.; Wichern, Dean W. Pearson Education Limited, [2014]. ISBN: 9781292024943
  https://ebookcentral-proquest-com.recursos.biblioteca.upc.edu/lib/upcatalunya-ebooks/detail.action?pq-origsite=primo&docID=5174865
- Exploratory multivariate analysis by example using R. Husson, F.; Lê, S.; Pagès, J. CRC Press, Taylor & Francis Group, 2017. ISBN: 9781315301860
  https://ebookcentral-proquest-com.recursos.biblioteca.upc.edu/lib/upcatalunya-ebooks/detail.action?pq-origsite=primo&docID=4856173
- Multivariate statistical methods: a primer. Manly, Bryan F. J. CRC Press, Taylor & Francis Group, [2017]. ISBN: 9781498728966
  https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004178359706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Complementary
- Análisis de datos multivariantes. Peña, Daniel. McGraw-Hill/Interamericana de España, S.L., [2010]. ISBN: 9788448136109
  https://www-ingebook-com.recursos.biblioteca.upc.edu/ib/NPcd/IB_BooksVis?cod_primaria=1000187&codigo_libro=4203
Web links
- Homepage of R https://cran.r-project.org/
- R for Data Science (2e) https://r4ds.hadley.nz/
- R Cookbook https://rc2e.com/
- RStudio homepage https://rstudio.com/
Previous capacities
The course assumes that students have previously taken basic courses in statistics, programming and mathematics, and in particular that they have acquired the following concepts:
- Descriptive statistical analysis
- Hypothesis tests
- Matrix algebra, eigenvalues and eigenvectors
- Programming algorithms
- Multiple linear regression