Statistics is the oldest science devoted to extracting information from data. For decades it has been used to build models that support a better understanding of reality, to make predictions, and to provide descriptions of reality that are useful in decision-making, especially in highly complex situations.
This subject establishes broadly useful methodological foundations for observing reality and making informed decisions, and it prepares students to tackle the more complex models that appear in later subjects.
Statistical methods play a central role in many of the artificial intelligence and intelligent data analysis methods covered in later courses. For this reason, this subject provides the formal foundations needed to engage properly with those later subjects. It also provides the skills to solve real problems that require basic statistics. The basic statistical principles are therefore introduced from the perspective of the support they can provide to the analysis of artificial intelligence problems.
This subject provides, among others: basic tools for data processing; criteria for selecting samples and for designing experiments that verify specific hypotheses through data in real applications; hypothesis tests associated with statistical inference and needed in statistical learning; and basic statistical models, which will be completed in the subsequent course on statistical modeling and used in later courses of the degree such as intelligent data analysis and machine learning.
The subject is eminently applied and focuses on solving real problems in the field of AI using basic statistical methods.
Teachers
Person in charge
Mireia Besalú Mayol
Others
Dante Conti
Karina Gibert Oliveras
Miquel Umbert Bosch
Sonia Garcia Esteban
Xavier Angerri Torredeflot
Weekly hours
Theory: 2
Problems: 0
Laboratory: 2
Guided learning: 0
Autonomous learning: 6
Competences
Transversal Competences
Transversal
CT3 [Assessable] - Efficient oral and written communication. Communicate orally and in writing with others about the results of learning, thinking and decision-making; participate in debates on topics within the specialty.
CT4 [Assessable] - Teamwork. Be able to work as a member of an interdisciplinary team, either as a regular member or in a management role, with the aim of contributing to the development of projects pragmatically and with a sense of responsibility, making commitments that take the available resources into account.
CT8 [Assessable] - Gender perspective. An awareness and understanding of sexual and gender inequalities in society in relation to the field of the degree, and the incorporation of different needs and preferences due to sex and gender when designing solutions and solving problems.
Technical Competences
Specific
CE01 - To be able to solve the mathematical problems that may arise in the field of artificial intelligence. Apply knowledge from: algebra, differential and integral calculus and numerical methods; statistics and optimization.
CE02 - To master the basic concepts of discrete mathematics, logic, algorithmics and computational complexity, and their application to the automatic processing of information by computer systems. To be able to apply all of these to solve problems.
Generic Technical Competences
Generic
CG4 - Reasoning, analyzing reality and designing algorithms and formulations that model it. To identify problems and construct valid algorithmic or mathematical solutions, possibly novel ones, integrating the necessary multidisciplinary knowledge, evaluating different alternatives with a critical spirit, justifying the decisions taken, interpreting and synthesizing the results in the context of the application domain and establishing methodological generalizations based on specific applications.
CG8 - Practice the profession ethically in all its facets, applying ethical criteria in the design of systems, algorithms and experiments and in the use of data, in accordance with the ethical frameworks recommended by national and international organizations, with special emphasis on security, robustness, privacy, transparency, traceability, prevention of bias (race, gender, religion, territory, etc.) and respect for human rights.
Objectives
To become familiar with the tools of basic statistics in order to handle data correctly, and to internalize statistical methodology as a basic scheme for extracting relevant information about complex phenomena
Related competences: CG4, CE01, CE02
Subcompetences:
Internalize the statistical methodology
Process data correctly
Apply statistical methodology to extract knowledge of complex phenomena
To select the relevant data to support a specific question.
Related competences: CG8, CT8, CE01
Designing the eligibility criteria of a sample correctly to answer a real problem
Related competences: CT8, CE02
Designing basic experiments to study real problems
Related competences: CT8, CE01
To perform basic data preprocessing
Related competences: CG4, CE02
To select the statistical modeling methods most appropriate to the problem, in relation to the structure of the available data, the objectives of the study and the subsequent uses of the model results
Related competences: CG4, CE01, CE02
To build statistical models correctly from the data, making use of the necessary software and of the context of the reference problem, and to present them publicly
Related competences: CG4, CE01, CE02
To apply, in an integrated way, the statistical knowledge acquired in class to the analysis of a real data set (taking advantage of open data sources) that addresses a reference problem from any real area relevant to artificial intelligence, such as health, environment, sustainability or Industry 4.0
Related competences: CG4, CE01, CE02
Subcompetences:
Take advantage of relevant open data for a problem
Analyze real data-sets (health, environment, ...)
Integrate statistical knowledge of different topics of the subject to solve a complex problem
To carry out and develop practical work and projects with a gender perspective
Related competences: CT8
Integrating teamwork mechanisms when carrying out practical work
Related competences: CT4
To handle competently the software tools needed to solve the proposed real problems with the basic statistical techniques seen during the course.
Related competences: CE02
To interpret and contextualize the statistical models built from data
Related competences: CG4, CT3, CT8
Incorporating the ethical recommendations of the EC regarding AI to practical work
Related competences: CG8
To validate the models obtained and make a critical interpretation of the results from a technical point of view and contextualize the results within the framework of the problem.
Related competences: CG4, CG8, CE02
To produce an automated report containing the descriptive analysis of a database, the validated models, and an integrated, critical analysis of the results in relation to the context of the reference problem.
Related competences: CG4, CG8, CT3, CT4, CT8
To present publicly a statistical report that includes descriptive analysis, models and conclusions, communicated appropriately to technical and/or non-technical audiences
Related competences: CG8, CT3, CT4, CT8
Contents
Descriptive Analysis of Data
We will work on how to use numerical and graphical statistical tools to describe a data set, and on the tools needed to generate reports of this description automatically.
Introduction to probability theory
The basic notions of probability will be introduced in order to understand the concept of uncertainty and the main probabilistic formalisms used to model it, including conditional probability and Bayes' theorem, which are relevant in later subjects.
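As an illustration of Bayes' theorem, the following minimal sketch computes the probability of a hypothesis given a positive test result. The function name and the numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are hypothetical and chosen only for the example; they are not course material.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(H | positive) using Bayes' theorem.

    prior               -- P(H), probability of the hypothesis before testing
    sensitivity         -- P(positive | H)
    false_positive_rate -- P(positive | not H)
    """
    # Law of total probability: P(+) = P(+|H) P(H) + P(+|not H) P(not H)
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    # Bayes' theorem: P(H|+) = P(+|H) P(H) / P(+)
    return sensitivity * prior / evidence


# Hypothetical values, for illustration only.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {p:.3f}")
```

Even with an accurate test, the posterior stays well below the sensitivity when the prior is small, which is the kind of reasoning under uncertainty this topic introduces.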
Introduction to sampling theory
Concept of a simple random sample, sampling theory, types of sampling.
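A simple random sample gives every subset of a given size the same chance of being selected. The sketch below draws one without replacement using Python's standard library; the population of 100 numbered units and the helper name are assumptions made for the example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units from population, each subset equally likely."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, n)  # sampling without replacement


# Hypothetical population: units identified by the numbers 1..100.
population = list(range(1, 101))
sample = simple_random_sample(population, n=10, seed=42)
print(sample)
```

Fixing the seed is convenient for reproducible lab reports; omitting it gives a different sample on each run.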
Statistical Inference
Hypothesis testing, p-value concept, confidence intervals. Limitations in real applications of classical inference. Nonparametric inference, Fisher permutation test. Hypothesis tests in statistical learning.
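To make the idea of a permutation test concrete, here is a minimal Monte Carlo sketch for the difference in means between two groups. It approximates the exact permutation distribution by random shuffling; the function name, the number of permutations and the data are hypothetical.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Returns an estimated p-value: the share of random relabellings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel the units at random
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction: the observed labelling counts as one permutation,
    # so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements for two groups.
group_a = [12.1, 11.8, 13.4, 12.9, 12.5, 13.0]
group_b = [11.2, 11.9, 11.5, 12.0, 11.1, 11.7]
print(f"estimated p-value: {permutation_test(group_a, group_b):.4f}")
```

The test makes no distributional assumption beyond exchangeability of the observations under the null hypothesis, which is why it is grouped with nonparametric inference.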
Introduction to experimental design
Differences between sample and experimental studies. The design of experiments in software validation. Biases and scalability.
Regression
Basic model (simple linear regression, least squares). Goodness-of-fit measures, validation. Multiple linear regression. General linear model (ANOVA, ANCOVA).
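The basic model can be sketched in a few lines: the least-squares estimates for simple linear regression have a closed form, and the coefficient of determination is one common goodness-of-fit measure. The function names and the data below are hypothetical; in practice a statistical package would normally be used.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Least-squares estimates (intercept, slope) for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_linear_regression(x, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}, "
      f"R^2 = {r_squared(x, y, b0, b1):.3f}")
```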
Activities
Activity / Evaluation act
Teamworking
Students are organized into groups and look for real data sets that meet requirements defined by the professor. The data are used to apply the techniques and methodologies seen throughout the course. At the end of the course, each group submits a report with its results and gives an oral presentation of the most relevant results of the final project. Objectives:810 Contents:
Quiz 1
Short-answer tests will be held during the course to consolidate what has been learned. They will take place at the end of certain lab classes. Objectives:111926534
Week: 3
Theory: 0h
Problems: 0h
Laboratory: 0.5h
Guided learning: 0h
Autonomous learning: 0h
Quiz 2
Short-answer tests will be held during the course to consolidate what has been learned. They will take place at the end of certain lab classes. Objectives:810126714
Week: 12
Theory: 0h
Problems: 0h
Laboratory: 0.5h
Guided learning: 0h
Autonomous learning: 0h
Initial practical work presentation
Initial presentation of the practical work Objectives:91315 Contents:
The course consists of two hours of theory and two hours of lab per week.
In theory classes (lectures), the flipped classroom approach will be used whenever possible.
The course calendar and the materials to be prepared before each class will be posted on the subject's website. Traditional lectures are used only occasionally, when the lecturer needs to clarify complex concepts that the materials distributed before class did not resolve. Theory classes are devoted primarily to presenting cases and to interactive activities with students, such as case discussions, problem solving and short, focused questionnaires.
Students will be organized into groups (teams) and will work on a practical project with data that they find themselves and that must meet certain characteristics set by the professors. With these data, each team will carry out the practice sessions using the techniques of the topic covered in that week's theory session. The lecturer will monitor all the teams weekly during the lab sessions.
At the end of the course, the teams will present their results in a public session where all the projects will be discussed together.
Evaluation methodology
(T) Teamwork carried out during the course 20%
(O) Oral knowledge test 10% (discussion with the teachers during the oral presentation of the team projects)
(CT4) Quality and performance of the work team (TG). 10%
(CT3) Oral and written communication 10%
(CT8) Team and work gender perspective 10%
(E) Ethics of the work team and the work itself 10%
(A) Attendance and participation in classes and laboratories (AP). 10%
(Q) Two quizzes throughout the course 20%
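The weights above add up to 100%. As a minimal sketch, assuming the final mark is the weighted sum of these components and that each component is marked on a 0-10 scale (neither is stated explicitly here), the calculation could look like this. The example marks are hypothetical.

```python
# Weights taken from the evaluation components listed above.
WEIGHTS = {
    "T": 0.20,    # teamwork carried out during the course
    "O": 0.10,    # oral knowledge test
    "CT4": 0.10,  # quality and performance of the work team
    "CT3": 0.10,  # oral and written communication
    "CT8": 0.10,  # gender perspective
    "E": 0.10,    # ethics
    "A": 0.10,    # attendance and participation
    "Q": 0.20,    # quizzes
}


def final_mark(marks):
    """Weighted sum of component marks (assumed to be on a 0-10 scale)."""
    missing = set(WEIGHTS) - set(marks)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[key] * marks[key] for key in WEIGHTS)


# Hypothetical marks, for illustration only.
example = {"T": 8.0, "O": 7.0, "CT4": 9.0, "CT3": 8.5,
           "CT8": 9.0, "E": 10.0, "A": 10.0, "Q": 6.5}
print(f"final mark: {final_mark(example):.2f}")
```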
Practical statistics for data scientists: 50+ essential concepts using R and Python. Bruce, Peter; Bruce, Andrew; Gedeck, Peter. O'Reilly Media, Inc, 2020. ISBN: 9781492072942
A modern introduction to probability and statistics: understanding why and how. Dekking, F.M.; Kraaikamp, C.; Lopuhaä, H.P. Springer, 2005. ISBN: 9781846281686