Introduction to Statistics

Credits: 6
Type: Compulsory
Requirements: This subject has no formal requirements, but it assumes some prior knowledge.
Department: EIO
Statistics is the oldest science devoted to extracting information from data. For decades it has been used to build models that improve our understanding of reality, to make predictions, and to provide descriptions of reality that support decision-making, especially in highly complex situations.

This subject establishes widely useful methodological foundations for observing reality and making informed decisions, and it prepares students to approach the more complex models that appear in later subjects.



Statistical methods play a central role in many of the artificial intelligence and intelligent data analysis methods that will be seen in later courses. For this reason, this subject provides the formal foundations needed to engage properly with those later subjects. It also provides the skills to solve real problems that require basic statistics. The basic statistical principles are therefore introduced from the perspective of the support they can give to the analysis of artificial intelligence problems.

This subject provides, among other things, basic tools for data processing; criteria for selecting samples or designing experiments to verify specific hypotheses with data in real applications; the hypothesis tests associated with statistical inference, which are also needed in statistical learning; and basic statistical models, which will be completed in the subsequent course on statistical modeling and used in later courses of the degree such as intelligent data analysis and machine learning.


The subject will be eminently applied and will focus on solving real problems in the field of AI using basic statistical methods.

Teachers

Person in charge

  • Karina Gibert Oliveras

Others

  • Dante Conti
  • Sonia Garcia Esteban
  • Xavier Angerri Torredeflot

Weekly hours

Theory: 2
Problems: 0
Laboratory: 2
Guided learning: 0
Autonomous learning: 6

Competences

Transversal Competences

Transversal

  • CT3 [Assessable] - Efficient oral and written communication. Communicate orally and in writing with others about the results of learning, thinking and decision-making; participate in debates on topics within the specialty.
  • CT4 [Assessable] - Teamwork. Be able to work in an interdisciplinary team, either as a regular member or in a management role, with the aim of contributing to projects pragmatically and with a sense of responsibility, and making commitments that take the available resources into account.
  • CT8 [Assessable] - Gender perspective. An awareness and understanding of sexual and gender inequalities in society in relation to the field of the degree, and the incorporation of different needs and preferences due to sex and gender when designing solutions and solving problems.

Technical Competences

Specific

  • CE01 - To be able to solve the mathematical problems that may arise in the field of artificial intelligence. Apply knowledge from: algebra, differential and integral calculus and numerical methods; statistics and optimization.
  • CE02 - To master the basic concepts of discrete mathematics, logic, algorithmics and computational complexity, and their application to the automatic processing of information through computer systems. To be able to apply all of these to solve problems.

Generic Technical Competences

Generic

  • CG4 - Reasoning, analyzing reality and designing algorithms and formulations that model it. To identify problems and construct valid algorithmic or mathematical solutions, possibly new ones, integrating the necessary multidisciplinary knowledge, evaluating different alternatives with a critical spirit, justifying the decisions taken, interpreting and synthesizing the results in the context of the application domain and establishing methodological generalizations based on specific applications.
  • CG8 - Practice the profession ethically in all its facets, applying ethical criteria in the design of systems, algorithms and experiments and in the use of data, in accordance with the ethical frameworks recommended by national and international organizations, with special emphasis on security, robustness, privacy, transparency, traceability, prevention of bias (race, gender, religion, territory, etc.) and respect for human rights.

Objectives

  1. Becoming familiar with the tools of basic statistics in order to treat data correctly, and internalizing statistical methodology as a basic scheme for extracting relevant information about complex phenomena
    Related competences: CG4, CE01, CE02
    Subcompetences:
    • Internalize the statistical methodology
    • Process data correctly
    • Apply statistical methodology to extract knowledge about complex phenomena
  2. To select the relevant data to support a specific question
    Related competences: CG8, CT8, CE01
  3. Designing the eligibility criteria of a sample correctly to answer a real problem
    Related competences: CT8, CE02
  4. Designing basic experiments to study real problems
    Related competences: CT8, CE01
  5. To perform basic data preprocessing
    Related competences: CG4, CE02
  6. To select the statistical modeling methods most appropriate to the problem, in relation to the structure of the available data, the objectives of the study and the subsequent uses of the model results
    Related competences: CG4, CE01, CE02
  7. To build statistical models correctly from the data, making use of the necessary software and the context of the reference problem, and to present them publicly
    Related competences: CG4, CE01, CE02
  8. To apply in an integrated way the statistical knowledge obtained in class to the analysis of a real data set (taking advantage of open data sources), responding to a reference problem from any real area relevant to artificial intelligence, such as health, environment, sustainability or Industry 4.0
    Related competences: CG4, CE01, CE02
    Subcompetences:
    • Take advantage of open data relevant to a problem
    • Analyze real data sets (health, environment, ...)
    • Integrate statistical knowledge from different topics of the subject to solve a complex problem
  9. To carry out and develop practical work and projects with a gender perspective
    Related competences: CT8
  10. Integrating teamwork mechanisms when carrying out practical work
    Related competences: CT4
  11. To handle skillfully the computer tools needed to solve the proposed real problems with the basic statistical techniques seen during the course
    Related competences: CE02
  12. To interpret and contextualize the statistical models built from data
    Related competences: CG4, CT3, CT8
  13. Incorporating the ethical recommendations of the EC regarding AI into practical work
    Related competences: CG8
  14. To validate the models obtained, interpret the results critically from a technical point of view, and contextualize them within the framework of the problem
    Related competences: CG4, CG8, CE02
  15. To produce an automated report with the descriptive analysis of a database, the validated models, and the integrated and critical analysis of the results in relation to the context of the reference problem
    Related competences: CG4, CG8, CT3, CT4, CT8
  16. To present publicly a statistical report that includes descriptive analysis, models and conclusions, communicated appropriately to technical and/or non-technical audiences
    Related competences: CG8, CT3, CT4, CT8

Contents

  1. Descriptive Analysis of Data
    We will work on how to use numerical and graphical statistical tools to describe a data set, as well as the reporting tools needed to generate this description automatically.
  2. Introduction to probability theory
    The basic notions of probability will be provided to understand the concept of uncertainty and the main probabilistic formalisms used to model it, including conditional probability and Bayes' theorem, which are relevant in later subjects.
  3. Introduction to sampling theory
    The concept of a simple random sample, sampling theory, types of sampling.
  4. Statistical Inference
    Hypothesis testing, the concept of p-value, confidence intervals. Limitations of classical inference in real applications. Nonparametric inference, Fisher permutation test. Hypothesis tests in statistical learning.
  5. Introduction to experimental design
    Differences between sample-based and experimental studies. The design of experiments in software validation. Biases and scalability.
  6. Regression
    Basic model (simple linear regression, least squares). Goodness-of-fit measures, validation. Multiple linear regression. General linear model (ANOVA, ANCOVA).
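
As an illustration of the basic model named in item 6, the following is a minimal Python sketch of simple linear regression fitted by least squares, with the coefficient of determination R² as one goodness-of-fit measure. The function name `fit_simple_linear_regression` and the data are hypothetical and chosen only for illustration; the software and data sets actually used in the course are not specified here.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, r_squared). Hypothetical helper, for
    illustration only.
    """
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squares of x and cross-products of x and y around their means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Made-up data, roughly following y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r_squared = fit_simple_linear_regression(x, y)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

The closed-form estimates are used here because they make the link to the least-squares criterion explicit; in practice a statistics package would normally be used, and the fit would still need to be validated.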

Activities

Teamwork

Students are organized into groups and look for real data sets that meet certain requirements defined by the professor. These data are used to apply the techniques and methodologies covered throughout the course. At the end of the course, each group will submit a report with its results and give an oral presentation of the most relevant results of the final project.
Objectives: 8 10
Contents:
Theory: 0h
Problems: 0h
Laboratory: 27h
Guided learning: 0h
Autonomous learning: 50h

Quiz 1

During the course there will be short-answer tests to consolidate what has been learned. They will be held at the end of certain lab classes.
Objectives: 1 11 9 2 6 5 3 4
Week: 3
Type: lab exam
Theory: 0h
Problems: 0h
Laboratory: 0.5h
Guided learning: 0h
Autonomous learning: 0h

Quiz 2

During the course there will be short-answer tests to consolidate what has been learned. They will be held at the end of certain lab classes.
Objectives: 8 10 12 6 7 14
Week: 12
Type: lab exam
Theory: 0h
Problems: 0h
Laboratory: 0.5h
Guided learning: 0h
Autonomous learning: 0h

Initial practical work presentation

Initial presentation of the practical work
Objectives: 9 13 15
Contents:
Theory: 0h
Problems: 0h
Laboratory: 2h
Guided learning: 0h
Autonomous learning: 10h

Final practical work presentation

Final practical work presentation
Objectives: 8 10 11 9 13 15 16 12 7 14
Week: 15 (Outside class hours)
Type: assignment
Theory: 0h
Problems: 0h
Laboratory: 0h
Guided learning: 3h
Autonomous learning: 0h

Theory: 30h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 30h

Teaching methodology

The course consists of two hours of theory and two hours of lab per week.
In theory classes (lectures), the flipped classroom approach will be used whenever possible.
The course calendar and the materials to be prepared before each class will be posted on the subject's website. Traditional lecturing is used occasionally, when the lecturer needs to clarify complex concepts that the materials distributed before the class have not resolved. Theory classes will be devoted primarily to presenting cases and to interactive activities with students, such as discussing cases, working through problems or completing short questionnaires.
Students will be organized into groups (teams) and will work on a practical project with data that they find themselves and that meet certain characteristics set by the professors. With these data, each team will carry out the lab sessions using the techniques of the topic studied in that week's theory session. The lecturer will monitor all the teams weekly during the lab sessions.
At the end of the course, the teams will present their results in a public session where all the projects will be discussed together.

Evaluation methodology

(T) Teamwork carried out during the course 20%
(O) Oral knowledge test 10% (discussion with the teachers during the oral presentation of the team project)
(CT4) Quality and performance of the work team (TG) 10%
(CT3) Oral and written communication 10%
(CT8) Gender perspective of the team and of the work 10%
(E) Ethics of the work team and of the work itself 10%
(A) Attendance and participation in classes and laboratories (AP) 10%
(Q) Two quizzes throughout the course 20%

N = 0.2*T + 0.1*O + 0.1*CT4 + 0.1*CT3 + 0.1*E + 0.1*CT8 + 0.1*A + 0.2*Q

Q = 0.5*Q1 + 0.5*Q2
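
As a worked example of the weighting above, the following Python sketch computes the final grade N from the component marks. The helper name `final_grade`, the sample marks and the 0 to 10 marking scale are assumptions made for illustration; only the weights come from the formula.

```python
# Weights taken from the grading formula; they sum to 1 together with
# the 0.2 weight of the quiz component Q.
WEIGHTS = {
    "T": 0.2,    # teamwork
    "O": 0.1,    # oral knowledge test
    "CT4": 0.1,  # quality and performance of the team
    "CT3": 0.1,  # oral and written communication
    "E": 0.1,    # ethics
    "CT8": 0.1,  # gender perspective
    "A": 0.1,    # attendance and participation
}
QUIZ_WEIGHT = 0.2


def final_grade(marks, q1, q2):
    """Return N given the component marks and the two quiz marks.

    Hypothetical helper: `marks` maps each key of WEIGHTS to a mark.
    """
    q = 0.5 * q1 + 0.5 * q2  # Q is the plain average of the two quizzes
    weighted = sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)
    return weighted + QUIZ_WEIGHT * q


# Made-up marks on an assumed 0-10 scale.
marks = {"T": 8, "O": 7, "CT4": 9, "CT3": 8, "E": 10, "CT8": 9, "A": 10}
print(round(final_grade(marks, q1=6, q2=8), 2))  # prints 8.3
```

With these sample marks, Q = 7 and the weighted components add up to 1.6 + 0.7 + 0.9 + 0.8 + 1.0 + 0.9 + 1.0 + 1.4 = 8.3.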

Prior knowledge

High-school-level mathematics (batxillerat).