Intelligent Data Analysis and Data Mining

Credits
4.5
Type
Elective
Requirements
This course has no formal requirements, but it assumes some prior skills
Department
CS
This course approaches Intelligent Data Analysis (IDA) from the viewpoint of Data Mining.
Most areas in science, engineering and business are becoming increasingly data dependent. Clear examples include bioinformatics, medicine, and electronic commerce.
Data analysis techniques are needed to deal with these data and to generate usable knowledge from them. IDA techniques are among the most promising of these approaches, and they are at the core of this course.

Instructors

Coordinator

  • Alfredo Vellido Alcacena

Others

  • Mario Martín Muñoz

Weekly hours

Theory
2.9
Problems
0
Laboratory
0
Guided learning
0.1
Autonomous learning
4.6

Competences

Generic Technical Competences

Generic

  • CG3 - Capability for modeling, calculation, simulation, development and implementation in technology centers and company engineering centers, particularly in research, development and innovation tasks in all areas related to Artificial Intelligence.

Technical Competences of each Specialization

Academic

  • CEA4 - Capability to understand the basic operating principles of the main Computational Intelligence techniques, and to use them in the context of an intelligent system or service.
  • CEA7 - Capability to understand the problems, and the solutions to those problems, that arise in professional practice when applying Artificial Intelligence in business and industrial settings.
  • CEA11 - Capability to understand advanced Computational Intelligence techniques, and to design, implement and apply them in the development of intelligent applications, services or systems.

Professional

  • CEP1 - Capability to solve the information analysis needs of different organizations, identifying the sources of uncertainty and variability.
  • CEP5 - Capability to design new software tools and new Artificial Intelligence techniques in professional practice.

Transversal Competences

Appropriate use of information resources

  • CT4 - Manage the acquisition, structuring, analysis and visualization of data and information in the field of specialization, and critically assess the results of this management.

Reasoning

  • CT6 - Capability to evaluate and analyze, in a reasoned and critical way, situations, projects, proposals, reports and studies of a scientific-technical nature. Capability to argue the reasons that explain or justify such situations, proposals, etc.

Analysis and synthesis

  • CT7 - Capability for analysis and resolution of complex technical problems.

Objectives

  1. Presenting DM as a process that should follow a methodology if it is to be applied at its best.
    Related competences: CEA7, CEP5, CT4, CT6
  2. Introducing the students to the new concept of DM for processes, called Process Mining.
    Related competences: CEA7, CG3, CEP1, CEP5, CT4, CT6
  3. Delving into some detail in one of the stages of DM: data exploration.
    Related competences: CEA4, CG3, CEP1, CT4
  4. Dealing in detail with the problem of data visualization for exploration as a key issue in DM.
    Related competences: CEA11, CG3, CEP1, CEP5, CT4, CT6
  5. Introducing the students to the basics of probability theory as applied in Intelligent Data Analysis (IDA).
    Related competences: CEP1, CT4, CT6, CT7
  6. Introducing the students to the probabilistic variant of IDA in the form of Statistical Machine Learning, both for supervised and unsupervised learning models.
    Related competences: CEA11, CG3, CT4, CT6, CT7
  7. Dealing in detail with different unsupervised models for data visualization, including case studies.
    Related competences: CEA11, CG3, CEP1, CEP5, CT4, CT6, CT7
  8. Approaching the multi-faceted concept of data mining (DM) from different perspectives.
    Related competences: CEA7, CG3, CEP5, CT4, CT6, CT7

Contents

  1. Introduction to the concept of data mining (DM).
    DM is a multi-faceted concept that requires discussion and clarification. We will do this at the beginning of the course.
  2. DM as a methodology.
    We argue that DM should not be focused on the concept of data analysis/modeling, but, instead, should be treated as a methodology with diverse inter-related stages.
  3. DM for processes: Process Mining.
    A recent development in DM is a methodology specifically suited to processes. It is called Process Mining and will be described and discussed in this course.
  4. Data exploration in DM.
    One of the main stages of well-structured DM methodologies is data exploration. It will be discussed as a preamble to data visualization.
  5. Data visualization for exploration.
    One of the aspects of the problem of data exploration is data visualization. It has a research 'life' of its own as it involves not only computer-based mathematical models, but also natural perception and processing.
  6. Basics of probability theory in Intelligent Data Analysis (IDA)
    For much of the last half-century, multivariate statistics and artificial intelligence (mostly in the field of machine learning) developed in parallel without fully meeting. Statistical machine learning has bridged that gap over the last two decades. We introduce it by first providing some basic principles of probability theory (Bayesian inference).
  7. Statistical Machine Learning for IDA: supervised models.
    Once the basics of Bayesian inference are set, we will delve into the field of Statistical Machine Learning for IDA, starting with supervised learning models, with an emphasis on feed-forward artificial neural networks.
  8. Statistical Machine Learning for IDA: unsupervised models.
    Once the basics of Bayesian inference and of Statistical Machine Learning for IDA in supervised models are set, we will continue with unsupervised models, focusing on self-organizing maps and related models.
  9. Unsupervised models for data visualization, with case studies.
    In the final item of the course contents, we will bring statistical machine learning and data visualization together by discussing some probabilistic unsupervised learning models for data visualization, with some case studies as examples.

Activities

Activity / Assessment event


Essay on IDA for DM

Students will have to write a research essay on the topic of IDA for DM, choosing one of three options: 1. State of the art on a specific IDA-DM topic. 2. Evaluation of an IDA-DM software tool with original experiments. 3. Pure research essay with original experimental content.
Objectives: 8 1 2 3 4 5 6 7
Week: 15
Type: submission
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
3h
Autonomous learning
0h

Introduction to Data Mining and its Methodologies

Introduction to Data Mining as a general concept and to its methodologies for practical implementation.
  • Theory: in-person seminars covering the theory of this topic.
  • Guided learning: students' guided learning related to the topic.
  • Autonomous learning: students' autonomous learning related to the topic.
Objectives: 8 1
Contents:
Theory
6h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
9h

Process Mining

Introduction to the novel concept of Process Mining and its application within the DM framework.
  • Guided learning: students' guided learning related to the topic.
  • Autonomous learning: students' autonomous learning related to the topic.
Objectives: 2
Contents:
Theory
3h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Data Visualization

As part of the DM stage of Data Exploration, we focus on the problem of Data Visualization.
  • Theory: in-person seminars covering the theory of this topic.
  • Guided learning: students' guided learning related to the topic.
  • Autonomous learning: students' autonomous learning related to the topic.
Objectives: 3 4
Contents:
Theory
6h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
9h

Basics of probability theory for intelligent data analysis

Introduction to probability theory for intelligent data analysis, with a focus on Bayesian statistics.
  • Theory: in-person seminars covering the theory of this topic.
  • Guided learning: students' guided learning related to the topic.
  • Autonomous learning: students' autonomous learning related to the topic.
Objectives: 5
Contents:
Theory
6h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
9h

Statistical Machine Learning methods

The meeting of statistics and machine learning: Statistical Machine Learning methods, from the point of view of both supervised and unsupervised learning.
  • Theory: in-person seminars covering the theory of this topic.
  • Guided learning: students' guided learning related to the topic.
  • Autonomous learning: students' autonomous learning related to the topic.

Theory
12h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
16h

SML in data visualization, with case studies

We merge the topics of SML and data visualization, illustrating their use with some real case studies.
  • Theory: in-person seminars covering the theory of this topic.
  • Guided learning: students' guided learning related to the topic.
  • Autonomous learning: students' autonomous learning related to the topic.
Objectives: 7
Contents:
Theory
6h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
9h

Teaching methodology

This course will draw on several teaching methodology (TM) components:
TM1: Expository seminars
TM2: Expository-participative seminars
TM3: Guidance for individual assignments (essays)
TM4: Individual tutoring

Assessment method

The course will be assessed through a final essay that will take one of these three forms:
1. State of the art on a specific IDA-DM topic
2. Evaluation of an IDA-DM software tool with original experiments
3. Pure research essay, with original experimental content

Bibliography

Basic:

Complementary:

  • Statistics: A Very Short Introduction - Hand, D. J., Oxford University Press, 2008. ISBN: 978-0199233564
    http://cataleg.upc.edu/record=b1389307~S1*cat
  • Information Visualization: Design for Interaction - Spence, Robert, Prentice Hall, 2006. ISBN: 978-0132065504
  • Visualize This: The FlowingData Guide to Design, Visualization, and Statistics - Yau, Nathan, John Wiley & Sons, 2011. ISBN: 978-0470944882

Prior skills

Students are expected to have at least some basic background in artificial intelligence and, more specifically, in the areas of Machine Learning and Computational Intelligence.
Some basic knowledge of probability theory and statistics would be beneficial.
Other than this, the course is open to students and researchers of all backgrounds.