
Data Analysis

Credits
6
Types
Compulsory
Requirements
This subject has no formal prerequisites, but some prior knowledge is expected (see Previous capacities).
Department
EIO
The aim of the course on Data Analysis is to provide the philosophy and the main methods for extracting the information contained in data. It covers data preparation, exploratory analysis, visualization of information, modeling of patterns, and the implementation of these steps in computer systems.

Teachers

Person in charge

Others

Weekly hours

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6

Competences

Technical competencies

  • CE1 - Skillfully use mathematical concepts and methods that underlie the problems of science and data engineering.
  • CE2 - To be able to program solutions to engineering problems: Design efficient algorithmic solutions to a given computational problem, implement them in the form of a robust, structured and maintainable program, and check the validity of the solution.
  • CE3 - Analyze complex phenomena through probability and statistics, and propose models of these types in specific situations. Formulate and solve mathematical optimization problems.
  • CE4 - Use current computer systems, including high performance systems, to process large volumes of data, based on knowledge of their structure, operation and particularities.
  • CE8 - Ability to choose and employ techniques of statistical modeling and data analysis, evaluating the quality of the models, validating and interpreting them.

Transversal competencies

  • CT3 - Efficient oral and written communication. Communicate orally and in writing with other people about the results of learning, thinking and decision making; participate in debates on topics within the specialty.
  • CT4 - Teamwork. Be able to work as a member of an interdisciplinary team, either as a regular member or in a management role, with the aim of contributing to the development of projects with pragmatism and a sense of responsibility, making commitments that take available resources into account.
  • CT5 [Assessable] - Solvent use of information resources. Manage the acquisition, structuring, analysis and visualization of data and information in the field of specialty and critically evaluate the results of such management.
  • CT6 - Autonomous Learning. Detect deficiencies in one's own knowledge and overcome them through critical reflection and the choice of the best action to extend this knowledge.
  • CT7 [Assessable] - Third language. Know a third language, preferably English, at an adequate oral and written level and in line with the needs of graduates.

Basic competencies

  • CB2 - That the students know how to apply their knowledge to their work or vocation in a professional way and possess the skills that are usually demonstrated through the elaboration and defense of arguments and problem solving within their area of study.
  • CB4 - That the students can transmit information, ideas, problems and solutions to a specialized and non-specialized public.

Generic competencies

  • CG1 - Design computer systems that integrate data of very diverse origins and forms, create mathematical models from them, reason on these models and act accordingly, learning from experience.
  • CG2 - Choose and apply the most appropriate methods and techniques to a problem defined by data that represents a challenge for its volume, speed, variety or heterogeneity, including computer, mathematical, statistical and signal processing methods.
  • CG3 - Work in multidisciplinary teams and projects related to the processing and exploitation of complex data, interacting fluently with engineers and professionals from other disciplines.
  • CG4 - Identify opportunities for innovative data-driven applications in evolving technological environments.

Objectives

    1. Exploratory Data Analysis
      Related competences: CB2, CB4, CT3, CT5, CT6, CT7, CE1, CE2, CE3, CE4, CE8, CG1, CG3, CG4
      Subcompetences
      • Clustering. Profiling.
      • Pre-processing. Outliers, missing values. Transformations.
      • PCA, SVD, Factor Analysis. Multidimensional Scaling.
      • Correspondence Analysis. Multiple Correspondence Analysis.
    2. Discriminant Analysis with probabilistic hypothesis
      Related competences: CT3, CT4, CT5, CT6, CT7, CE1, CE3, CE8, CG2, CG3
      Subcompetences
      • Linear Discriminant Analysis, Fisher's discriminant. Quadratic Discriminant Analysis.
      • Normal multivariate distribution. Sampling distributions.
    3. Multivariate modeling
      Related competences: CT4, CT6, CT7, CE1, CE3, CE8, CG1, CG2, CB2, CG4
      Subcompetences
      • Multivariate Regression
      • Canonical Correlation Analysis
      • Principal Component Regression, Partial Least Squares Regression
    4. Time series
      Related competences: CT6, CE1, CE3, CE8
      Subcompetences
      • Outliers, Calendar Effects and Intervention Analysis
      • Univariate models of time series
      • Applications of the Kalman Filter

    Contents

    1. Data preprocessing
      Outliers, missing data and transformations
    2. Principal component analysis
      Multivariate description of a table of continuous variables. Regression with principal components.
    3. Factor analysis
      The singular value decomposition, biplots, factor analysis
    4. Multidimensional scaling (MDS)
      Distance measures. Metric multidimensional scaling. Algorithms.
    5. Cluster analysis
      Hierarchical clustering techniques. Agglomeration methods. Ward's criterion. Dendrogram.
    6. Correspondence analysis
      Contingency tables. Row and column profiles. Independence and chi-square statistics. Simple correspondence analysis. Biplot.
    7. Discriminant analysis
      Multivariate normal distribution. Fisher's linear discriminant analysis.
    8. Univariate time series models
      Exponential smoothing, ARIMA models
    9. Intervention analysis
      Outliers, seasonal effects, intervention analysis.

    Activities

    Activity Evaluation act


    Data preprocessing

    Practical on data preprocessing
    Objectives: 1
    Contents:
    Theory
    4h
    Problems
    0h
    Laboratory
    4h
    Guided learning
    0h
    Autonomous learning
    4h

    Principal component analysis

    Application of principal component analysis in practical data analysis
    Objectives: 1
    Contents:
    Theory
    4h
    Problems
    0h
    Laboratory
    4h
    Guided learning
    0h
    Autonomous learning
    6h

    Factor analysis

    Practical data analysis using the method
    Objectives: 1
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    3h
    Guided learning
    0h
    Autonomous learning
    4h

    Multidimensional scaling

    Analysis of distance matrices with this method
    Objectives: 1
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    4h

    Clustering

    Application of the method to quantitative data matrices.

    Theory
    4h
    Problems
    0h
    Laboratory
    4h
    Guided learning
    0h
    Autonomous learning
    4h

    Correspondence Analysis

    Application of the method with cross tables.
    Objectives: 2
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    2h
    Guided learning
    0h
    Autonomous learning
    4h

    Discriminant Analysis

    Application of the method to empirical data sets
    Objectives: 2
    Contents:
    Theory
    4h
    Problems
    0h
    Laboratory
    4h
    Guided learning
    0h
    Autonomous learning
    4h

    Univariate time series models

    Fitting time series models to data sets on the computer
    Objectives: 4
    Contents:
    Theory
    4h
    Problems
    0h
    Laboratory
    4h
    Guided learning
    0h
    Autonomous learning
    6h

    Intervention analysis

    Application of intervention analysis to real data sets
    Objectives: 4
    Contents:
    Theory
    2h
    Problems
    0h
    Laboratory
    3h
    Guided learning
    0h
    Autonomous learning
    4h

    Practical on exploratory data analysis

    Students carry out an exploratory analysis of a data set and hand in a questionnaire about it.
    Objectives: 1 2 3 4
    Week: 8 (Outside class hours)
    Theory
    0h
    Problems
    0h
    Laboratory
    0h
    Guided learning
    0h
    Autonomous learning
    0h

    Project

    Students carry out, in pairs, a complete multivariate study of a given dataset using the techniques studied during the course, and hand in a written report about it.
    Objectives: 1 2 3 4
    Week: 15 (Outside class hours)
    Theory
    0h
    Problems
    0h
    Laboratory
    0h
    Guided learning
    0h
    Autonomous learning
    0h

    Exams concerning basic concepts

    There are two exams related to the theoretical concepts of the course.
    Objectives: 1 2 3 4
    Week: 14
    Theory
    0h
    Problems
    0h
    Laboratory
    0h
    Guided learning
    0h
    Autonomous learning
    0h

    Teaching methodology

    The learning process is a combination of theoretical explanation and practical application. The theory classes are used to explain the basic scientific contents of the course, whereas the laboratory sessions work on their application to solve real-life problems.

    The practicals and the project form the basis for developing the students' transversal competences related to teamwork and public presentation of results. They also serve to integrate the different pieces of knowledge covered in the course.

    For hands-on computer training we use the R statistical environment.

    Evaluation methodology

    The student's final grade for the course is based on the grades obtained for weekly homework assignments (25%), a partial exam halfway through the course (25%), a final exam covering the second half of the course (25%) and a project (25%).
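
    As an illustration of the weighting described above, the following is a minimal sketch in Python (the course itself uses R). The function name, the component names and the 0-10 grading scale are assumptions made for this example; the resit rules are not modeled.

```python
# Weights taken from the evaluation description: each component counts 25%.
WEIGHTS = {
    "homework": 0.25,
    "partial_exam": 0.25,
    "final_exam": 0.25,
    "project": 0.25,
}


def final_grade(grades):
    """Weighted average of the four components.

    Hypothetical helper: `grades` maps each component name in WEIGHTS
    to a grade, assumed here to be on a 0-10 scale.
    """
    missing = set(WEIGHTS) - set(grades)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * grades[name] for name in WEIGHTS)


# Made-up grades, for illustration only.
example = {"homework": 8.0, "partial_exam": 6.0, "final_exam": 7.0, "project": 9.0}
print(final_grade(example))  # 7.5
```

    Since all four weights are equal, the final grade in this sketch is simply the mean of the four component grades.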

    Each weekly assignment consists of completing a questionnaire. These assignments aim at consolidating knowledge of the techniques presented in the theoretical sessions. The assignments require the analysis of datasets in the statistical environment R.

    The project is carried out by a group of two students, who have to show that they can solve problems with the techniques they have learned during the course. Each group hands in a written report about its project at the end of the course.

    The two exams will be scheduled according to the calendar of the faculty, and evaluate whether students have assimilated the basic concepts of the course material.

    For the resit exam, the student can choose to retake only the first partial exam (25%), only the second partial exam (25%), or both (50%). The re-evaluation thus represents at most 50% of the final course grade.

    Previous capacities

    Knowledge of basic statistical concepts, descriptive statistics and hypothesis testing. Familiarity with the statistical software R.