Statistical Modeling is the second in a sequence of four undergraduate subjects devoted to statistics and data. Continuing from the previous introductory subject on probability and statistics, it provides training in the main statistical models used to extract knowledge from data. Statistical modeling techniques are one of the fundamental pillars of decision support and intelligent data analysis. The course covers the main multivariate predictive models (general linear model) and descriptive models (multivariate analysis and clustering), as well as notions of design of experiments that are useful for configuring the training and validation data sets of the models, not only in this subject but also in the machine learning subjects of the degree. The program also includes dynamic modeling tools for temporal data. The tools seen in this subject complement those of machine learning and are essential input for the subjects on intelligent data analysis, intelligent systems and decision support agents.
Teachers
Person in charge
Jordi Cortés Martínez
Others
Dante Conti
Karina Gibert Oliveras
Weekly hours
Theory: 2
Problems: 0
Laboratory: 2
Guided learning: 0
Autonomous learning: 6
Competences
Transversal Competences
Transversals
CT3 [Assessable] - Efficient oral and written communication. Communicate orally and in writing with other people about the results of learning, thinking and decision making; participate in debates on topics of the specialty itself.
CT4 [Assessable] - Teamwork. Be able to work as a member of an interdisciplinary team, either as a regular member or performing management tasks, with the aim of contributing to the development of projects with pragmatism and a sense of responsibility, making commitments that take the available resources into account.
CT8 [Assessable] - Gender perspective. An awareness and understanding of sexual and gender inequalities in society in relation to the field of the degree, and the incorporation of different needs and preferences due to sex and gender when designing solutions and solving problems.
Basic
CB3 - That students have the ability to gather and interpret relevant data (usually within their area of study) to make judgments that include a reflection on relevant social, scientific or ethical issues.
CB4 - That the students can transmit information, ideas, problems and solutions to a specialized and non-specialized public.
Technical Competences
Specific
CE01 - To be able to solve the mathematical problems that may arise in the field of artificial intelligence. Apply knowledge from: algebra, differential and integral calculus and numerical methods; statistics and optimization.
CE09 - To ideate, design and integrate intelligent data analysis systems with their application in production and service environments.
CE20 - To select and put to use techniques of statistical modeling and data analysis, assessing the quality of the models, validating and interpreting.
Generic Technical Competences
Generic
CG2 - To use the fundamental knowledge and solid work methodologies acquired during the studies to adapt to the new technological scenarios of the future.
CG4 - Reasoning, analyzing reality and designing algorithms and formulations that model it. To identify problems and construct valid algorithmic or mathematical solutions, eventually new, integrating the necessary multidisciplinary knowledge, evaluating different alternatives with a critical spirit, justifying the decisions taken, interpreting and synthesizing the results in the context of the application domain and establishing methodological generalizations based on specific applications.
CG8 - Practice the profession ethically in all its facets, applying ethical criteria in the design of systems, algorithms and experiments and in the use of data, in accordance with the ethical systems recommended by national and international organizations, with special emphasis on security, robustness, privacy, transparency, traceability, prevention of bias (race, gender, religion, territory, etc.) and respect for human rights.
Objectives
Design sound, goal-oriented training and test sets
Related competences:
CG8,
CT8,
CB3,
CE09,
Identify which predictive model is appropriate for a specific problem and specific data
Related competences:
CG4,
CE01,
CE09,
CE20,
Construct and interpret valid models for the temporal evolution of a numerical variable
Related competences:
CG4,
CT3,
CT4,
CE01,
CE09,
CE20,
Identify classes in a data set and know how to validate and interpret them conceptually
Related competences:
CG2,
CG4,
CT3,
CT4,
CE01,
CE09,
CE20,
Characterize multivariate relationships in a data set with factor analysis techniques
Related competences:
CG4,
CT3,
CT4,
CE01,
CE09,
CE20,
Be able to carry out a basic unsupervised analysis of a textual database using basic techniques of topic modeling and multivariate analysis for textual data
Related competences:
CG4,
CT3,
CT4,
CE01,
CE09,
CE20,
Know how to build and validate the right model for a new real situation
Related competences:
CG2,
CG4,
CT3,
CT4,
CE01,
CE09,
CE20,
Know how to integrate the contents of the different topics of this course and the previous ones in a global solution for a complex problem
Related competences:
CG2,
CE01,
CE09,
CE20,
Know how to plan in the long term the modeling of a real complex problem and solve it throughout the course as a team
Related competences:
CT3,
CT4,
CB4,
Contents
Generalized linear models
Introduction to the concepts of generalized linear models. Logistic models
Time series
Introduction to stochastic processes. Timeline vs. time series. Box-Jenkins methodology. Main time series models: MA, AR, ARIMA, SARIMA (concept and case study)
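As an informal illustration of the AR family listed above, the following minimal sketch simulates a first-order autoregressive process and estimates its sample autocorrelation. It is not course material: the function names `simulate_ar1` and `sample_acf` are hypothetical, Gaussian noise is an assumption, and Python is used only for illustration. For a stationary AR(1) process the theoretical autocorrelation at lag h is phi to the power h, so the estimates should decay roughly geometrically.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t, with e_t ~ N(0, sigma^2) (assumed noise model)."""
    rng = random.Random(seed)
    x = [0.0]  # the series starts at zero; early values are a short warm-up
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def sample_acf(x, lag):
    """Sample autocorrelation of the series x at the given non-negative lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var


if __name__ == "__main__":
    series = simulate_ar1(phi=0.7, n=5000)
    for h in range(4):
        # Compare the estimate with the theoretical value phi ** h.
        print(h, round(sample_acf(series, h), 3), round(0.7 ** h, 3))
```

Identifying the model order from such autocorrelation patterns is one step of the Box-Jenkins methodology; the sketch covers only the simulation and the estimate.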
Clustering
Introduction. Main classification models. Distances.
Profiling
Description of the classifications from the study of significance of variables
Experimental design
Complete and fractional 2^k designs. Sensitivity and explainability analysis of the models. Identification of main effects and interactions. Design of training sets for machine learning. Design of test sets for validation of data models
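The two-level factorial designs listed above can be sketched in a few lines. This is an illustrative example under stated assumptions, not the course's own code: the helper names `full_factorial`, `half_fraction` and `main_effect` are hypothetical, levels are coded as -1 and +1, and the half fraction uses the defining relation in which the last factor equals the product of all the others.

```python
from itertools import product


def full_factorial(k):
    """All 2**k runs of a two-level design, with levels coded as -1 / +1."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def half_fraction(k):
    """Half fraction with 2**(k-1) runs: the last factor is the product of the others."""
    runs = []
    for base in product((-1, 1), repeat=k - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(list(base) + [last])
    return runs


def main_effect(design, response, factor):
    """Mean response at the +1 level minus mean response at the -1 level."""
    high = [y for run, y in zip(design, response) if run[factor] == 1]
    low = [y for run, y in zip(design, response) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


if __name__ == "__main__":
    design = full_factorial(3)
    # Made-up response, for illustration only: y = 10 + 2A - 3B + 0.5AC
    y = [10 + 2 * a - 3 * b + 0.5 * a * c for a, b, c in design]
    for i, name in enumerate("ABC"):
        print(name, main_effect(design, y, i))
    print(half_fraction(3))
```

In a full design the interaction term averages out of each main effect, which is why the estimated effect of a factor is twice its coefficient in the coded model. In the half fraction, the main effect of the last factor is aliased with the interaction of the others, which is the price paid for halving the number of runs.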
Activities
Teamwork
Students are organized into groups and look for real data that meet certain requirements set by the teacher. They use these data to apply the techniques and methodologies seen throughout the course. At the end they submit a report with the results and give an oral presentation of the most relevant results of the study.
Objectives: 1, 2, 3, 4, 5, 6, 7, 8, 9
During the course there will be short-answer tests to consolidate what has been learned. They will take place at the end of certain lab classes.
Objectives: 2, 3
Week: 8
Theory: 0h
Problems: 0h
Laboratory: 0.5h
Guided learning: 0h
Autonomous learning: 0.5h
Quiz 4
During the course there will be short-answer tests to consolidate what has been learned. They will take place at the end of certain lab classes.
Objectives: 4
Week: 11
Theory: 0h
Problems: 0h
Laboratory: 0.5h
Guided learning: 0h
Autonomous learning: 0.5h
Quiz 5
During the course there will be short-answer tests to consolidate what has been learned. They will take place at the end of certain lab classes.
Objectives: 1
Week: 13
Theory: 0h
Problems: 0h
Laboratory: 0.5h
Guided learning: 0h
Autonomous learning: 0.5h
Practical final presentation
Objectives: 2, 3, 4, 5, 6, 7, 8, 9
Week: 14
Theory: 0h
Problems: 0h
Laboratory: 2h
Guided learning: 0h
Autonomous learning: 10h
Final Exam
Objectives: 1, 2, 3, 4, 5, 6, 7, 8, 9
Week: 15 (Outside class hours)
Theory: 0h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 10h
Teaching methodology
The subject consists of two theory hours and two laboratory hours per week.
The subject's website will contain the subject's calendar and the materials needed to prepare each class. The theory class will be mainly dedicated to explaining concepts, presenting cases and carrying out interactive activities with students, such as discussing cases and working through problems.
In groups of 4, the students will carry out practical work with data that they will look for themselves and that will meet certain characteristics set by the teachers. With this data, each team will carry out practice sessions, each week applying the techniques of the topic worked on in the theory session. The teacher will monitor all the work teams weekly in the laboratory sessions.
In the middle and at the end of the course, the teams will present their results in a sharing session where all the projects will be discussed together.
Evaluation methodology
Ordinary Evaluation:
---------------------
(Q) Quizzes. 20%
(P) Project. 40%
(EF) Final Exam. 40%
Ordinary Final Grade = 0.2 * Q + 0.4 * P + 0.4 * EF
Q. It consists of 5 individual, face-to-face quizzes, each with the same weight in the final Q grade.
Q = (Q1 + Q2 + Q3 + Q4 + Q5)/5
P. Group project where the following competences will be assessed: (P1) Data collection, analysis and interpretation of results (37.5%); (P2) Transmission of results (25%); (P3) Oral and written communication (12.5%); (P4) Teamwork (12.5%); (P5) Gender perspective (12.5%).
P = 0.375 * P1 + 0.25 * P2 + 0.125 * P3 + 0.125 * P4 + 0.125 * P5
You must obtain a minimum grade of 3.5 in the individual and face-to-face tests, i.e., 1/3 * Q + 2/3 * EF > 3.5, to pass the course. In addition, completing the project is mandatory in order to pass in the ordinary evaluation.
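The ordinary evaluation formulas above can be checked with a short calculation. The sketch below is only an aid for students: the function names are hypothetical, marks are assumed to be on a 0 to 10 scale, and the mandatory-project rule is not modeled.

```python
def quiz_grade(quizzes):
    """Q: plain average of the five quiz marks."""
    if len(quizzes) != 5:
        raise ValueError("expected exactly 5 quiz marks")
    return sum(quizzes) / 5


def project_grade(p1, p2, p3, p4, p5):
    """P: weighted sum of the five project components (weights add up to 1)."""
    return 0.375 * p1 + 0.25 * p2 + 0.125 * p3 + 0.125 * p4 + 0.125 * p5


def ordinary_final_grade(q, p, ef):
    """Ordinary Final Grade = 0.2 * Q + 0.4 * P + 0.4 * EF."""
    return 0.2 * q + 0.4 * p + 0.4 * ef


def meets_individual_minimum(q, ef, threshold=3.5):
    """Check the condition 1/3 * Q + 2/3 * EF > 3.5 on the individual tests."""
    return (q + 2 * ef) / 3 > threshold


if __name__ == "__main__":
    # Made-up marks, for illustration only.
    q = quiz_grade([6, 7, 5, 8, 4])
    p = project_grade(8, 7, 6, 9, 8)
    ef = 5.0
    print("Q =", q, "P =", p)
    print("Final grade:", round(ordinary_final_grade(q, p, ef), 2))
    print("Individual minimum met:", meets_individual_minimum(q, ef))
```

The 1/3 and 2/3 weights in the minimum-grade condition are the quiz and exam weights (0.2 and 0.4) rescaled so that they add up to 1.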
Extraordinary evaluation:
---------------------------------
(EF) Extraordinary Final Exam
Bibliography
Nielsen, Aileen. Practical Time Series Analysis: Prediction with Statistics and Machine Learning. O'Reilly Media, Inc., 2019. ISBN: 9781492041658
Previous capacities
Introduction to statistics
Probability theory
Statistical inference
Simple statistical models
Data visualization
Basic programming
Basic R skills
Algebra