Statistical Modeling

Credits
6
Types
Compulsory
Requirements
Department
EIO
Statistical Modeling is the second in a sequence of four undergraduate subjects devoted to statistics and data. Continuing from the previous introductory subject on probability and statistics, it provides training in the main statistical models used to extract knowledge from data. Statistical modeling techniques are one of the fundamental pillars of decision support and intelligent data analysis. The course covers the main multivariate models, both predictive (generalized linear models) and descriptive (multivariate analysis and clustering). It also covers notions of experimental design that are useful for configuring the training and validation data sets of the models, not only in this subject but also in the machine learning subjects of the degree. The program also includes dynamic modeling tools for temporal data. The tools seen in this subject complement the machine learning perspective and are an essential input for the subjects on intelligent data analysis and on intelligent decision support systems and agents.

Teachers

Person in charge

  • Karina Gibert Oliveras
  • Sergi Ramirez Mitjans

Weekly hours

Theory
2
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6

Competences

Transversal Competences

Transversals

  • CT3 [Assessable] - Efficient oral and written communication. Communicate orally and in writing with other people about the results of learning, thinking and decision making; participate in debates on topics of the specialty itself.
  • CT4 [Assessable] - Teamwork. Be able to work in an interdisciplinary team, either as a member or performing management tasks, contributing to the development of projects with pragmatism and a sense of responsibility, and making commitments that take the available resources into account.
  • CT8 [Assessable] - Gender perspective. An awareness and understanding of sexual and gender inequalities in society in relation to the field of the degree, and the incorporation of different needs and preferences due to sex and gender when designing solutions and solving problems.

Basic

  • CB3 - That students have the ability to gather and interpret relevant data (usually within their area of study) to make judgments that include a reflection on relevant social, scientific or ethical issues.
  • CB4 - That students can convey information, ideas, problems and solutions to both specialized and non-specialized audiences.

Technical Competences

Specific

  • CE01 - To be able to solve the mathematical problems that may arise in the field of artificial intelligence. Apply knowledge from: algebra, differential and integral calculus and numerical methods; statistics and optimization.
  • CE09 - To conceive, design and integrate intelligent data analysis systems and apply them in production and service environments.
  • CE20 - To select and use statistical modeling and data analysis techniques, assessing the quality of the models and validating and interpreting them.

Generic Technical Competences

Generic

  • CG2 - To use the fundamental knowledge and solid work methodologies acquired during the studies to adapt to the new technological scenarios of the future.
  • CG4 - Reasoning, analyzing reality and designing algorithms and formulations that model it. To identify problems and construct valid algorithmic or mathematical solutions, possibly new ones, integrating the necessary multidisciplinary knowledge, evaluating different alternatives with a critical spirit, justifying the decisions taken, interpreting and synthesizing the results in the context of the application domain and establishing methodological generalizations based on specific applications.
  • CG8 - Practice the profession ethically in all its facets, applying ethical criteria in the design of systems, algorithms and experiments and in the use of data, in accordance with the ethical frameworks recommended by national and international organizations, with special emphasis on security, robustness, privacy, transparency, traceability, prevention of bias (race, gender, religion, territory, etc.) and respect for human rights.

Objectives

  1. Design sound, goal-oriented training and test data sets
    Related competences: CG8, CT8, CB3, CE09
  2. Identify which predictive model is appropriate for a specific problem and specific data
    Related competences: CG4, CE01, CE09, CE20
  3. Construct and interpret valid models for the temporal evolution of a numerical variable
    Related competences: CG4, CT3, CT4, CE01, CE09, CE20
  4. Identify classes in a data set and know how to validate and interpret them conceptually
    Related competences: CG2, CG4, CT3, CT4, CE01, CE09, CE20
  5. Characterize multivariate relationships in a data set with factor analysis techniques
    Related competences: CG4, CT3, CT4, CE01, CE09, CE20
  6. Be able to carry out a basic unsupervised analysis of a textual database using basic topic modeling and multivariate analysis techniques for textual data
    Related competences: CG4, CT3, CT4, CE01, CE09, CE20
  7. Know how to build and validate an appropriate model for a new real-world situation
    Related competences: CG2, CG4, CT3, CT4, CE01, CE09, CE20
  8. Know how to integrate the contents of the different topics of this course and of previous ones into a global solution for a complex problem
    Related competences: CG2, CE01, CE09, CE20
  9. Know how to plan the long-term modeling of a complex real-world problem and solve it as a team throughout the course
    Related competences: CT3, CT4, CB4

Contents

  1. Generalized linear models
    Introduction to the concepts of generalized linear models. Logistic models.
  2. Experimental design
    Full and fractional 2^k designs. Sensitivity and explainability analysis of the models. Identification of main effects and interactions. Design of training sets for machine learning. Design of test sets for validating data models.
  3. Time series
    Introduction to stochastic processes. Timeline vs. time series. Box-Jenkins methodology. Main time series models: MA, AR, ARIMA, SARIMA (concept and case study).
  4. Clustering
    Introduction. Main classification models. Distances.
  5. Profiling
    Description of the resulting classes based on a study of the significance of the variables.
  6. Factorial analysis
    Dimensionality reduction methods
  7. Textual analysis
    Corpus preprocessing and stopword removal. Term-document matrix. PCA on the term-document matrix (document classification).
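To make the full and fractional 2^k designs listed under Experimental design more concrete, here is a minimal Python sketch. The course itself works in R; Python is used here only for illustration. It codes factor levels as -1 (low) and +1 (high), builds a 2^(k-1) half fraction by generating the last factor as the product of the others (for k = 4, D = ABC, so I = ABCD), and estimates each main effect as the mean response at the high level minus the mean at the low level. The function names and the noise-free response are hypothetical.

```python
from itertools import product
from math import prod


def full_factorial(k: int) -> list[list[int]]:
    """All 2**k runs of a two-level design in coded units (-1 = low, +1 = high)."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def half_fraction(k: int) -> list[list[int]]:
    """2**(k-1) runs: a full design in the first k-1 factors, with the last
    factor generated as their product (for k = 4: D = ABC, so I = ABCD)."""
    return [run + [prod(run)] for run in full_factorial(k - 1)]


def main_effect(design, response, factor: int) -> float:
    """Mean response at the high level minus mean response at the low level."""
    high = [y for run, y in zip(design, response) if run[factor] == 1]
    low = [y for run, y in zip(design, response) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


# Hypothetical noise-free response that depends only on factors A and B.
design = half_fraction(4)  # 8 runs instead of the 16 of the full 2^4 design
response = [10 + 3 * a - 2 * b for a, b, _c, _d in design]
for j, name in enumerate("ABCD"):
    print(name, main_effect(design, response, j))
# prints A 6.0, B -4.0, C 0.0, D 0.0 (one per line)
```

Because the defining relation is I = ABCD, each main effect is aliased only with a three-factor interaction (a resolution IV design), so with half the runs the sketch still recovers the A and B effects of this interaction-free response.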

Activities

Teamwork

Students are organized into groups and look for real data that meet certain requirements set by the teacher. They use these data to apply the techniques and methodologies seen throughout the course. At the end, they submit a report with the results and give an oral presentation of the most relevant findings of the study.
Objectives: 1 2 3 4 5 6 7 8 9
Contents:
Theory
0h
Problems
0h
Laboratory
27h
Guided learning
0h
Autonomous learning
41h

Quiz 1

Short-answer tests are held during the course to consolidate key learning points. They take place at the end of certain lab classes.
Objectives: 2
Week: 3
Type: theory exam
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Quiz 2

Short-answer tests are held during the course to consolidate key learning points. They take place at the end of certain lab classes.
Objectives: 2 3
Week: 8
Type: theory exam
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Quiz 3

Short-answer tests are held during the course to consolidate key learning points. They take place at the end of certain lab classes.
Objectives: 2 3
Week: 14
Type: theory exam
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h

Initial presentation of the practical work

Objectives: 1 2 3 4 5 6 7 8 9
Contents:
Theory
0h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
10h

Final presentation of the practical work

Week: 15 (outside class hours)
Type: theory exam
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
10h

Theory classes covering the course syllabus

Objectives: 2 3 4 5 6 7
Contents:
Theory
30h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
30h

Teaching methodology

The course consists of two hours of theory and two hours of laboratory per week. Theory classes follow a flipped classroom approach whenever possible: students must prepare the assigned materials before each class. Lectures are used only occasionally, when the teacher needs to clarify complex concepts that have not become clear from the materials distributed beforehand. Theory classes are mainly dedicated to presenting cases and to interactive activities with the students, such as case discussions, problem solving or short questionnaires. Students carry out practical work in large groups, using data they find themselves that must meet certain characteristics set by the teaching staff. Each team works with these data in the laboratory sessions, applying each week the techniques of the topic covered in the theory session. The teacher monitors all teams weekly during the laboratory sessions. At the end of the course, the teams present their results in a joint session where all the projects are discussed together.

Evaluation methodology

(T) Teamwork done throughout the course: 20%
(O) Oral knowledge test (discussion with the teacher during the team's oral presentation of the work): 10%
(WT) Quality and performance of the work team: 10%
(C) Oral and written communication: 10%
(E) Ethics of the work team and of the work itself: 10%
(G) Gender perspective of the team and of the work: 10%
(A) Attendance and participation in classes and laboratories: 10%
(Q) Three quizzes throughout the course: 20%

N = 0.2*T + 0.1*O + 0.1*WT + 0.1*C + 0.1*E + 0.1*G + 0.1*A + 0.2*Q

Q = (Q1 + Q2 + Q3)/3
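The weighting of the final grade N can be checked with a short Python sketch. It assumes that every component, including each quiz, is scored on the same scale (for example 0 to 10; the scale is not stated in this guide). The dictionary keys mirror the letters T, O, WT, C, E, G, A and Q; the function names and the student scores are hypothetical.

```python
# Weight of each assessment component, taken from the formula for N.
WEIGHTS = {
    "T": 0.2,   # teamwork done throughout the course
    "O": 0.1,   # oral knowledge test
    "WT": 0.1,  # quality and performance of the work team
    "C": 0.1,   # oral and written communication
    "E": 0.1,   # ethics of the team and of the work
    "G": 0.1,   # gender perspective
    "A": 0.1,   # attendance and participation
    "Q": 0.2,   # average of the three quizzes
}


def quiz_average(q1: float, q2: float, q3: float) -> float:
    """Q = (Q1 + Q2 + Q3) / 3."""
    return (q1 + q2 + q3) / 3


def final_grade(scores: dict[str, float]) -> float:
    """Weighted sum N; assumes all scores share one scale (e.g. 0-10)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(weight * scores[key] for key, weight in WEIGHTS.items())


# Hypothetical scores for one student on an assumed 0-10 scale.
scores = {"T": 8, "O": 7, "WT": 9, "C": 8, "E": 10, "G": 9, "A": 10,
          "Q": quiz_average(6, 7, 8)}
print(round(final_grade(scores), 2))  # prints 8.3
```

Since the weights add up to 1, N stays on the same scale as the individual components.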

Bibliography

Basic:

Prior knowledge and skills

Introduction to statistics
Probability theory
Statistical inference
Simple statistical models
Data visualization
Basic programming
Basic R skills
Algebra