Algorithmics for Data Mining

Credits
6
Types
  • MDS: Elective
  • MIRI: Specialization complementary (Advanced Computing)
Requirements
This course has no formal requirements, but some previous capacities are expected.
Department
CS
Data mining is the process of extracting and discovering patterns in (usually) large data sets, using methods at the intersection of machine learning, multivariate statistics and database systems. Nowadays these methods are applied in a principled way to form an end-to-end process, from a raw data set to high-level information expressed in a structure that the end user can understand.

The goal of this course is to present and study some of the most widespread, useful and elegant algorithms so that students become capable of identifying and applying the suitable tools for a given application. The lectures will cover the theory, algorithms and practical usage of the techniques.

Teachers

Person in charge

  • Jose Luis Balcázar Navarro
  • Luis Antonio Belanche Muñoz

Others

  • Marta Arias Vicente

Weekly hours

Theory
1
Problems
0
Laboratory
2
Guided learning
0
Autonomous learning
6.375

Competences

Technical Competences of each Specialization

Advanced computing

  • CEE3.1 - Capability to identify computational barriers and to analyze the complexity of computational problems in different areas of science and technology as well as to represent high complexity problems in mathematical structures which can be treated effectively with algorithmic schemes.
  • CEE3.2 - Capability to use a wide and varied spectrum of algorithmic resources to solve high difficulty algorithmic problems.
  • CEE3.3 - Capability to understand the computational requirements of problems from non-informatics disciplines and to make significant contributions in multidisciplinary teams that use computing.

Generic Technical Competences

Generic

  • CG1 - Capability to apply the scientific method to the study and analysis of phenomena and systems in any area of Computer Science, and in the conception, design and implementation of innovative and original solutions.
  • CG3 - Capacity for mathematical modeling, calculation and experimental design in technology centers and company engineering centers, particularly in research and innovation in all areas of Computer Science.
  • CG5 - Capability to apply innovative solutions and make progress in the knowledge to exploit the new paradigms of computing, particularly in distributed environments.

Transversal Competences

Teamwork

  • CTR3 - Capacity to work as a team member, either as a regular member or in a leadership role, in order to help develop projects in a pragmatic manner and with a sense of responsibility; capability to take into account the available resources.

Information literacy

  • CTR4 - Capability to manage the acquisition, structuring, analysis and visualization of data and information in the area of informatics engineering, and critically assess the results of this effort.

Appropriate attitude towards work

  • CTR5 - Capability to be motivated by professional achievement and to face new challenges, to have a broad vision of the possibilities of a career in the field of informatics engineering. Capability to be motivated by quality and continuous improvement, and to act strictly on professional development. Capability to adapt to technological or organizational changes. Capacity for working in absence of information and/or with time and/or resources constraints.

Reasoning

  • CTR6 - Capacity for critical, logical and mathematical reasoning. Capability to solve problems in their area of study. Capacity for abstraction: the capability to create and use models that reflect real situations. Capability to design and implement simple experiments, and analyze and interpret their results. Capacity for analysis, synthesis and evaluation.

Basic

  • CB6 - Ability to apply the acquired knowledge and capacity for solving problems in new or unknown environments within broader (or multidisciplinary) contexts related to their area of study.
  • CB8 - Capability to communicate their conclusions, and the knowledge and rationale underpinning these, to both skilled and unskilled public in a clear and unambiguous way.
  • CB9 - Possession of the learning skills that enable the students to continue studying in a way that will be mainly self-directed or autonomous.

Objectives

  1. To be aware of the theoretical and practical set of problems that constitute Data Mining, and to understand the main models and algorithms to tackle them, both at the conceptual level and at the level of their application through commercial tools, preferably open-source.
    Related competences: CG1, CG3, CEE3.1, CEE3.2, CEE3.3, CB6, CTR4, CTR5, CTR6, CG5
  2. To acquire and demonstrate the ability to apply the knowledge obtained to the autonomous, team-based deployment of a practical data mining case, including a public presentation of the work developed.
    Related competences: CG3, CEE3.2, CB6, CB8, CB9, CTR3, CTR4, CTR5, CTR6

Contents

  1. Selected techniques and algorithms for Data Mining
    The selected algorithms and techniques are representative of what a data practitioner needs to know, including:

    backpropagation
    expectation-maximization
    association rules
    pagerank
    GLMs

    Each topic is studied from three perspectives:

    theoretical
    algorithmic
    practical

Activities

Theoretical and conceptual study of the main data mining algorithms.
Objectives: 1
Contents:
Theory
18h
Problems
6h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Deployment of a practical case study
Objectives: 1 2
Contents:
Theory
0h
Problems
0h
Laboratory
36h
Guided learning
0h
Autonomous learning
18h

Teaching methodology

Theoretical classes, exercises and problems with or without a programming component and development of case studies.

Evaluation methodology

Evaluation is fully offline and there will be no exams. Each person must contribute to three (3) pieces of work (case studies, solved exercises or studied problems, each consisting of a written report and possibly code) on topics related to the course. Each piece of work must be carried out by three (3) people, as follows:

- 2 people do the exercise per se
- 1 person evaluates the work done

The evaluator role will be taken by each group member exactly once. The order is left to the group members to decide. The lecturer(s) evaluate both the work done and the evaluation itself. Rubrics describing precisely how all evaluations are carried out will be publicly available at all times. Additional information, such as delivery dates and document format, will be given in due time.

The final grade will be computed as follows. Let

Ri = evaluation of work 'i' by the lecturer
SEi = evaluation of work 'i' by the student
LEi = evaluation of evaluation 'i' by the lecturer

FGi = final grade of work 'i' = 1/2*(Ri + 10 - |SEi - LEi|)

FS = final grade = [3*FG1 + 3*FG2 + 3*FG3 + SS]/10

where SS is the soft skills grade (see the teaching guide for the subject for more information).
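
As a worked illustration of the formula above, the following Python sketch computes FGi and FS for one student. The function names, the example values and the assumption that every grade lies on a 0 to 10 scale are illustrative only and are not taken from the teaching guide.

```python
def work_grade(r, se, le):
    """FGi = 1/2 * (Ri + 10 - |SEi - LEi|).

    r  : Ri,  evaluation of work i by the lecturer
    se : SEi, evaluation of work i by the student
    le : LEi, evaluation of evaluation i by the lecturer
    Assumption: all three values are on a 0-10 scale.
    """
    return 0.5 * (r + 10 - abs(se - le))


def final_grade(works, ss):
    """FS = [3*FG1 + 3*FG2 + 3*FG3 + SS] / 10.

    works : list of exactly three (Ri, SEi, LEi) tuples
    ss    : SS, the soft skills grade
    """
    if len(works) != 3:
        raise ValueError("exactly three works are required")
    return (3 * sum(work_grade(r, se, le) for r, se, le in works) + ss) / 10


# Hypothetical grades for the three works, as (Ri, SEi, LEi) tuples.
example_works = [(8, 7, 9), (6, 6, 6), (10, 9, 8)]
example_fs = final_grade(example_works, ss=7)
print(example_fs)  # 8.35
```

With these hypothetical values the three work grades are 8.0, 8.0 and 9.5, so each work contributes 30% of the final grade and the soft skills grade the remaining 10%.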

The topic of each work is to be agreed with the lecturer(s) by each group of students. Many suggestions will be provided during the lectures. That said, individual initiative and open-minded approaches are particularly encouraged. The topics of the three works may be different or, alternatively, successive works can build on one another, going deeper into the same or closely related topics.

Previous capacities

Adequate understanding of computing in general, especially algorithms; a good command of several programming languages (such as R, Python, Julia) or willingness to acquire it; basic to intermediate ability to formalize concepts in computing, statistics, etc. mathematically.