Optimization Techniques for Data Mining

You are here

Credits
6
Types
  • MIRI: Specialization complementary (Data Science)
  • MDS: Elective
Requirements
This subject has not requirements, but it has got previous capacities
Department
EIO
The course introduces the basic concepts of optimization and the different types of optimization problems, the iterative algorithms to solve these problems, and their properties. The practice of optimization using modeling languages to describe a problem and commercial and publicly available solvers is also emphasized.

Teachers

Person in charge

  • Jordi Castro Pérez ( )

Others

  • F. Javier Heredia Cervera ( )
  • Jessica Rodríguez Pereira ( )

Weekly hours

Theory
4
Problems
0
Laboratory
0
Guided learning
0
Autonomous learning
7.53

Competences

Transversal Competences

Information literacy

  • CT4 - Capacity for managing the acquisition, the structuring, analysis and visualization of data and information in the field of specialisation, and for critically assessing the results of this management.

Third language

  • CT5 - Achieving a level of spoken and written proficiency in a foreign language, preferably English, that meets the needs of the profession and the labour market.

Basic

  • CB6 - Ability to apply the acquired knowledge and capacity for solving problems in new or unknown environments within broader (or multidisciplinary) contexts related to their area of study.
  • CB7 - Ability to integrate knowledge and handle the complexity of making judgments based on information which, being incomplete or limited, includes considerations on social and ethical responsibilities linked to the application of their knowledge and judgments.
  • CB10 - Possess and understand knowledge that provides a basis or opportunity to be original in the development and/or application of ideas, often in a research context.

Generic Technical Competences

Generic

  • CG2 - Identify and apply methods of data analysis, knowledge extraction and visualization for data collected in disparate formats

Technical Competences

Especifics

  • CE1 - Develop efficient algorithms based on the knowledge and understanding of the computational complexity theory and considering the main data structures within the scope of data science

Objectives

  1. Discerning what is an optimization problem and its type and having a basic knowledge of optimization algorithms
    Related competences: CB10, CT5, CE1,
  2. Optimization of neural networks and support vector machines.
    Related competences: CB10, CT5, CE1,
  3. Modelling languages. Formulating optimization problems and representing them through a modeling language. Choosing and adequate solver for a given problem.
    Related competences: CB6, CB7, CT4, CT5, CG2,

Contents

  1. Unconstrained Optimization
    Optimality conditions. Convexity. Descent directions.
    Line search. Acceptability of step sizes.
    General minimization algorithm.
    Gradient method. Rate of convergence.
    Newton's method. Factorizations to ensure convergence.
    Quasi-Newton methods.
    Optimization of neural networks.
  2. Constrained Optimization and Support Vector Machines.
    - Introduction to the modelling langauge AMPL.
    - Introduction to Support Vector Macines (SVM)
    - KKT Optimality conditions of constrained optimization. Optimality conditions of SVM.
    - Duality in Optimization. The dual of the SVM.
  3. Integer Programming
    - Modelling problems with binary variables.
    - The branch and bound algorithm for integer programming
    - Gomory's cutting planes algorithm.
    - Minimal spanning tree problem and algorithms of Kruskal and Prim. Application to clustering.

Activities

Activity Evaluation act


Unconstrained Optimization

Optimality conditions. Convexity. Descent directions. Line search. Acceptability of step sizes. General minimization algorithm. Gradient method. Rate of convergence. Newton's method. Factorizations to ensure convergence. Weighted least squares. Introduction to AMPL. The Neos solver site.
Objectives: 3 1 2
Theory
17.3h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
33h

Constrained Optimization and Support Vector Machines

- Introduction to Support Vector Macines (SVM) - KKT Optimality conditions of constrained optimization. Optimality conditions of SVM. - Duality in Optimization. The dual of the SVM.
Objectives: 3 1 2
Theory
17.4h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
33h

Integer Programming

- Modelling problems with binary variables. - The branch and bound algorithm for integer programming - Gomory's cutting planes algorithm. - Minimal spanning tree problem and algorithms of Kruskal and Prim
Objectives: 3 1
Theory
17.3h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
33h

Teaching methodology

The students will have available all the course material.

About two thirds of lecture time will be devoted to optimization algorithms and their properties, and the rest will be for presenting and solving exercises and problems

Lab sessions will be devoted to the solution of some data science applications (neural networks, support vector machines, clustering) using optimization methods.

Evaluation methodology

- Theory (40%). There will be two midterm exams (there is no final exam). The first midterm exam will consist on some practical exercises about the first two parts of the course, and it will be done before January. The second midterm exam will be about the third part of the course, and it will be done in January.

- Practical assigments (60%). There will be 3 lab assignments, one for each part of the course, all of them with the same weight on the final mark.

Additional exercises provided during the lectures may be taken into consideration to decide or to boost the final mark.

Bibliography

Basic:

Web links

Previous capacities

Basic background in linear algebra, calculus, and programming languages is needed for the course.