Statistical Modeling

Credits
6
Types
Compulsory
Requirements
Department
EIO
Statistical Modeling is the second in a sequence of four undergraduate subjects devoted to statistics and data. As a continuation of the introductory subject on probability and statistics, it provides training in the main statistical models used to extract knowledge from data. Statistical modeling techniques are one of the fundamental pillars of decision support and intelligent data analysis. The course covers the main multivariate predictive models (general linear model) and descriptive models (multivariate analysis and clustering), as well as notions of design of experiments that are useful for setting up training and validation data sets, both in this subject and in the machine learning subjects of the degree. The program also includes dynamic modeling tools for temporal data. The tools seen in this subject complement those of machine learning and are essential input for the subjects in the areas of intelligent data analysis, intelligent systems, and decision support agents.

Teachers

Person in charge

  • Jordi Cortés Martínez

Others

  • Dante Conti
  • Karina Gibert Oliveras

Weekly hours

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6

Competences

Transversal Competences

Transversals

  • CT3 [Assessable] - Efficient oral and written communication. Communicate orally and in writing with other people about the results of learning, thinking and decision making; participate in debates on topics of the specialty itself.
  • CT4 [Assessable] - Teamwork. Be able to work in an interdisciplinary team, either as a member or carrying out management tasks, with the aim of contributing to the development of projects with pragmatism and a sense of responsibility, making commitments that take the available resources into account.
  • CT8 [Assessable] - Gender perspective. An awareness and understanding of sexual and gender inequalities in society in relation to the field of the degree, and the incorporation of different needs and preferences due to sex and gender when designing solutions and solving problems.

Basic

  • CB3 - That students have the ability to gather and interpret relevant data (usually within their area of study) to make judgments that include a reflection on relevant social, scientific or ethical issues.
  • CB4 - That the students can transmit information, ideas, problems and solutions to a specialized and non-specialized public.

Technical Competences

Specific

  • CE01 - To be able to solve the mathematical problems that may arise in the field of artificial intelligence. Apply knowledge from: algebra, differential and integral calculus and numerical methods; statistics and optimization.
  • CE09 - To ideate, design and integrate intelligent data analysis systems with their application in production and service environments.
  • CE20 - To select and put to use techniques of statistical modeling and data analysis, assessing the quality of the models, validating and interpreting.

Generic Technical Competences

Generic

  • CG2 - To use the fundamental knowledge and solid work methodologies acquired during the studies to adapt to the new technological scenarios of the future.
  • CG4 - Reasoning, analyzing reality and designing algorithms and formulations that model it. To identify problems and construct valid algorithmic or mathematical solutions, eventually new, integrating the necessary multidisciplinary knowledge, evaluating different alternatives with a critical spirit, justifying the decisions taken, interpreting and synthesizing the results in the context of the application domain and establishing methodological generalizations based on specific applications.
  • CG8 - Practice the profession ethically in all its facets, applying ethical criteria in the design of systems, algorithms and experiments and in the use of data, in accordance with the ethical systems recommended by national and international organizations, with special emphasis on security, robustness, privacy, transparency, traceability, prevention of bias (race, gender, religion, territory, etc.) and respect for human rights.

Objectives

  1. Design sound, goal-oriented test and training sets
    Related competences: CG8, CT8, CB3, CE09
  2. Identify which predictive model is appropriate for a specific problem and specific data
    Related competences: CG4, CE01, CE09, CE20
  3. Construct and interpret valid models for the temporal evolution of a numerical variable
    Related competences: CG4, CT3, CT4, CE01, CE09, CE20
  4. Identify classes in a data set and know how to validate and interpret them conceptually
    Related competences: CG2, CG4, CT3, CT4, CE01, CE09, CE20
  5. Characterize multivariate relationships in a data set with factor analysis techniques
    Related competences: CG4, CT3, CT4, CE01, CE09, CE20
  6. Be able to carry out a basic unsupervised analysis of a textual database using basic topic modeling techniques and multivariate analysis for textual data
    Related competences: CG4, CT3, CT4, CE01, CE09, CE20
  7. Know how to build and validate the right model for a new real situation
    Related competences: CG2, CG4, CT3, CT4, CE01, CE09, CE20
  8. Know how to integrate the contents of the different topics of this course and the previous ones into a global solution for a complex problem
    Related competences: CG2, CE01, CE09, CE20
  9. Know how to plan the long-term modeling of a complex real problem and solve it as a team throughout the course
    Related competences: CT3, CT4, CB4

Contents

  1. Generalized linear models
    Introduction to the concepts of generalized linear models. Logistic models
  2. Time series
    Introduction to stochastic processes. Timeline vs. time series. Box-Jenkins methodology. Main time series models: MA, AR, ARIMA, SARIMA (concept and case study)
  3. Factor analysis
    Dimensionality reduction methods
  4. Clustering
    Introduction. Main classification models. Distances.
  5. Profiling
    Description of the classes obtained, based on the study of the significance of variables
  6. Experimental design
    Complete and fractional 2^k designs. Sensitivity and explainability analysis of the models. Identification of main effects and interactions. Design of training sets for machine learning. Design of test sets for validation of data models
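
As an illustration of the complete and fractional two-level designs listed under topic 6, the following is a minimal sketch and not course material. It is written in Python for compactness, whereas the laboratory sessions use R. The function names `full_factorial` and `half_fraction` are hypothetical, and the half fraction shown assumes the defining relation I = AB...K (the product of all factor levels equals +1).

```python
from itertools import product
from math import prod


def full_factorial(k):
    """Return all 2**k runs of a two-level design, coded as -1 (low) and +1 (high)."""
    return list(product((-1, 1), repeat=k))


def half_fraction(k):
    """Return the 2**(k-1) runs whose factor levels multiply to +1.

    This assumes the defining relation I = AB...K, one common choice of half fraction.
    """
    return [run for run in full_factorial(k) if prod(run) == 1]


full = full_factorial(3)  # complete 2^3 design: 8 runs
half = half_fraction(3)   # half fraction of the 2^3 design: 4 runs

for run in half:
    print(run)
```

With three factors, the complete design has 8 runs and the half fraction keeps 4 of them, at the cost of confounding some effects with each other.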

Activities

Activity Evaluation act


Teamwork

Students are organized into groups and look for real data that meet certain requirements set by the teacher. They use them to apply the techniques and methodologies that are seen throughout the course. At the end they present a report with the results and make an oral presentation with the most relevant results of the study
Objectives: 1 2 3 4 5 6 7 8 9
Contents:
Theory
0h
Problems
0h
Laboratory
11h
Guided learning
0h
Autonomous learning
27.5h

Theory classes of the subject syllabus

Objectives: 2 3 4 5 6 7
Contents:
Theory
30h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
30h

Practical application of the subject syllabus

Run R code on the concepts seen in theory.

Theory
0h
Problems
0h
Laboratory
12.5h
Guided learning
0h
Autonomous learning
0h

Quiz 1

Short-answer tests will be held during the course to consolidate what has been learned. They will take place at the end of certain lab classes
Objectives: 2
Week: 4
Theory
0h
Problems
0h
Laboratory
0.5h
Guided learning
0h
Autonomous learning
0.5h

Quiz 2

Short-answer tests will be held during the course to consolidate what has been learned. They will take place at the end of certain lab classes
Objectives: 2 3
Week: 7
Theory
0h
Problems
0h
Laboratory
0.5h
Guided learning
0h
Autonomous learning
0.5h

Initial presentation of the practice

Objectives: 1 2 3 4 5 6 7 8 9
Contents:
Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
10h

Quiz 3

Short-answer tests will be held during the course to consolidate what has been learned. They will take place at the end of certain lab classes
Objectives: 2 3
Week: 8
Theory
0h
Problems
0h
Laboratory
0.5h
Guided learning
0h
Autonomous learning
0.5h

Quiz 4

Short-answer tests will be held during the course to consolidate what has been learned. They will take place at the end of certain lab classes
Objectives: 4
Week: 11
Theory
0h
Problems
0h
Laboratory
0.5h
Guided learning
0h
Autonomous learning
0.5h

Quiz 5

Short-answer tests will be held during the course to consolidate what has been learned. They will take place at the end of certain lab classes
Objectives: 1
Week: 13
Theory
0h
Problems
0h
Laboratory
0.5h
Guided learning
0h
Autonomous learning
0.5h

Practical final presentation

Objectives: 2 3 4 5 6 7 8 9
Week: 14
Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
10h

Final Exam

Objectives: 1 2 3 4 5 6 7 8 9
Week: 15 (Outside class hours)
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
10h

Teaching methodology

The subject consists of two theory hours and two laboratory hours per week.

The subject's website will contain the calendar and the materials needed to prepare each class. Theory classes will be devoted mainly to explaining concepts, presenting cases, and carrying out interactive activities with students, such as discussing cases and working through problems.

In groups of 4, the students will carry out practical work with data that they look for themselves and that meet certain characteristics set by the teachers. With these data, each team will work in the practice sessions, applying each week the techniques of the topic covered in the theory session. The teacher will monitor all the teams weekly in the laboratory sessions.

In the middle and at the end of the course, the teams will present their results in a sharing session where all the projects will be discussed together.

Evaluation methodology

Ordinary Evaluation:
---------------------
(Q) Quizzes. 20%
(P) Project. 40%
(EF) Final Exam. 40%
Ordinary Final Grade = 0.2 * Q + 0.4 * P + 0.4 * EF

Q. Consists of 5 individual, in-person quizzes, each with the same weight in the final Q grade.
Q = (Q1 + Q2 + Q3 + Q4 + Q5)/5

P. Group project where the following competences will be assessed: (P1) Data collection, analysis and interpretation of results (37.5%); (P2) Transmission of results (25%); (P3) Oral and written communication (12.5%); (P4) Teamwork (12.5%); (P5) Gender perspective (12.5%).
P = 0.375 * P1 + 0.25 * P2 + 0.125 * P3 + 0.125 * P4 + 0.125 * P5

To pass the course, you must obtain a minimum grade of 3.5 in the individual, in-person tests, i.e., 1/3 * Q + 2/3 * EF > 3.5.
In addition, completing the project is mandatory in order to pass in the ordinary evaluation.
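
The ordinary evaluation formulas can be checked with a short calculation. The following is a minimal sketch in Python that transcribes them directly. The function names are hypothetical, the marks are invented for illustration, and a 0-10 grading scale is assumed, which the text above does not state explicitly.

```python
def quiz_grade(quizzes):
    """Q: plain average of the five quiz marks."""
    if len(quizzes) != 5:
        raise ValueError("expected exactly five quiz marks")
    return sum(quizzes) / 5


def project_grade(p1, p2, p3, p4, p5):
    """P: weighted average of the five assessed project competences."""
    return 0.375 * p1 + 0.25 * p2 + 0.125 * p3 + 0.125 * p4 + 0.125 * p5


def ordinary_grade(q, p, ef):
    """Ordinary Final Grade = 0.2 * Q + 0.4 * P + 0.4 * EF."""
    return 0.2 * q + 0.4 * p + 0.4 * ef


def meets_individual_minimum(q, ef):
    """Threshold on the individual tests: 1/3 * Q + 2/3 * EF > 3.5."""
    return q / 3 + 2 * ef / 3 > 3.5


# Hypothetical marks on an assumed 0-10 scale, for illustration only.
q = quiz_grade([6, 7, 5, 8, 4])        # 6.0
p = project_grade(8, 7, 6, 9, 7)       # 7.5
ef = 5.0
final = ordinary_grade(q, p, ef)       # approximately 6.2

print(round(final, 2), meets_individual_minimum(q, ef))
```

In this invented case the threshold is met, since 1/3 * 6.0 + 2/3 * 5.0 is about 5.33, and the weighted grade is about 6.2.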

Extraordinary evaluation:
---------------------------------
(EE) Extraordinary Final Exam

Extraordinary Grade = Min{5, Max{EE, 0.2 * Q + 0.4 * P + 0.4 * EE}}

In this exam, there will be no minimum passing grade. The maximum grade for this exam is a 5.
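
The extraordinary grade formula combines a maximum with a cap at 5. The following minimal Python sketch transcribes it. The function name `extraordinary_grade` is hypothetical and the marks are invented for illustration, again assuming a 0-10 scale.

```python
def extraordinary_grade(q, p, ee):
    """Extraordinary Grade = Min{5, Max{EE, 0.2 * Q + 0.4 * P + 0.4 * EE}}."""
    weighted = 0.2 * q + 0.4 * p + 0.4 * ee
    return min(5.0, max(ee, weighted))


# Hypothetical marks, for illustration only.
capped = extraordinary_grade(q=6.0, p=7.5, ee=4.0)      # weighted grade exceeds 5, so the cap applies
exam_only = extraordinary_grade(q=2.0, p=3.0, ee=4.5)   # exam mark beats the weighted grade

print(capped, exam_only)
```

The first case shows the cap: the weighted grade is above 5, so the result is 5. The second shows the maximum: weak quiz and project marks do not pull the grade below the exam mark itself.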

Prior knowledge

Introduction to Statistics
Probability theory
Statistical inference
Simple statistical models
Data visualization
Basic programming
Basic R skills
Algebra