This exciting course broaches the hot topic of Data Analysis and Knowledge Discovery (DAKD) from the viewpoint of Data Mining.
Most areas in science, engineering and business are becoming increasingly data dependent. Bioinformatics, medicine and electronic commerce are clear examples, to name a few.
Data analysis techniques are needed to deal with these data and generate usable knowledge from them. Among these, DAKD techniques are some of the most promising approaches. This theme is at the core of this course.
Person in charge
Alfredo Vellido Alcacena
Luis Antonio Belanche Muñoz
Generic Technical Competences
CG3 - Capacity for mathematical modeling, calculation and experimental design in technology and company engineering centers, particularly in research and innovation in all areas of Computer Science.
CTR4 - Capability to manage the acquisition, structuring, analysis and visualization of data and information in the area of informatics engineering, and critically assess the results of this effort.
CTR6 - Capacity for critical, logical and mathematical reasoning. Capability to solve problems in their area of study. Capacity for abstraction: the capability to create and use models that reflect real situations. Capability to design and implement simple experiments, and analyze and interpret their results. Capacity for analysis, synthesis and evaluation.
Technical Competences of each Specialization
CEC1 - Ability to apply scientific methodologies in the study and analysis of phenomena and systems in any field of Information Technology as well as in the conception, design and implementation of innovative and original computing solutions.
CEC3 - Ability to apply innovative solutions and make progress in the knowledge that exploits the new paradigms of Informatics, particularly in distributed environments.
Presenting DM as a process that should follow a methodology if it is to be applied at its best.
Introducing the students to the new concept of DM for processes, called Process Mining.
Delving into some detail in one of the stages of DM: data exploration.
Dealing in detail with the problem of data visualization for exploration as a key issue in DM.
Introducing the students to the basics of probability theory as applied in Data Analysis and Knowledge Discovery (DAKD).
Introducing the students to the probabilistic variant of DAKD in the form of Statistical Machine Learning, both for supervised and unsupervised learning models.
Dealing in detail with different unsupervised models for data visualization, including case studies.
Approaching the multi-faceted concept of data mining (DM) from different perspectives.
Introduction to the concept of data mining (DM).
DM is a multi-faceted concept that requires discussion and clarification. We will do this at the beginning of the course.
DM as a methodology.
We argue that DM should not be focused on data analysis/modeling but should instead be treated as a methodology with diverse interrelated stages.
DM for processes: Process Mining.
A recent development in DM methodologies is one specifically suited to processes. It is called Process Mining and will be described and discussed in this course.
Data exploration in DM.
One of the main stages of well-structured DM methodologies is data exploration. It will be discussed as a preamble to data visualization.
Data visualization for exploration.
One of the aspects of the problem of data exploration is data visualization. It has a research 'life' of its own as it involves not only computer-based mathematical models, but also natural perception and processing.
Basics of probability theory in Data Analysis and Knowledge Discovery (DAKD).
For much of the last half-century, multivariate statistics and artificial intelligence (mostly in the field of machine learning) developed in parallel without fully meeting. Statistical machine learning has bridged that gap over the last two decades. We introduce it by first providing some basic principles of probability theory (Bayesian inference).
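For orientation, the identity at the heart of Bayesian inference is Bayes' theorem. The notation below is an assumption made for this illustration and is not taken from the course materials: \theta stands for the model parameters and D for the observed data.

```latex
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)},
\qquad
p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta
```

Here p(\theta) is the prior, p(D \mid \theta) the likelihood, p(\theta \mid D) the posterior, and p(D) the evidence that normalizes the posterior.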
Statistical Machine Learning for DAKD: supervised models.
Once the basics of Bayesian inference are set, we will delve into the field of Statistical Machine Learning for DAKD, starting with supervised learning models, with an emphasis on feed-forward artificial neural networks.
Statistical Machine Learning for DAKD: unsupervised models.
Once the basics of Bayesian inference and of Statistical Machine Learning for DAKD in supervised models are set, we will continue with unsupervised models, focusing on self-organizing maps and related models.
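As a flavour of the kind of model this item refers to, the following is a minimal sketch of a one-dimensional self-organizing map in plain Python. It is not the course's implementation: the function names (train_som, best_matching_unit), the parameter defaults, the linear decay schedules and the toy data are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose prototype is closest to x (squared Euclidean)."""
    dists = [sum((w_d - x_d) ** 2 for w_d, x_d in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def train_som(data, n_units=5, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a 1-D self-organizing map on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each unit's prototype as a copy of a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        # Learning rate and neighbourhood width both shrink as training proceeds.
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.3)
        for x in rng.sample(data, len(data)):  # visit the data in random order
            bmu = best_matching_unit(weights, x)
            for j in range(n_units):
                # Units near the winner on the map grid are pulled more strongly.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    weights[j][d] += lr * h * (x[d] - weights[j][d])
    return weights


# Toy data: two small 2-D clusters, invented purely for this example.
data = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
        [1.0, 0.9], [0.9, 1.0], [0.95, 0.95]]
som = train_som(data)
# Each data point is "visualized" by the position of its winning unit on the map.
projection = [best_matching_unit(som, x) for x in data]
```

The list projection assigns every data point to a position on the map grid, which is the basic mechanism that makes self-organizing maps usable for data visualization.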
Unsupervised models for data visualization, with case studies.
In the final item of the course contents, we will bring statistical machine learning and data visualization together by discussing some probabilistic unsupervised learning models for data visualization, including some case studies as examples.
Essay on DAKD for DM
Students will have to write a research essay on the topic of DAKD for DM, with different options:
1. State of the art on a specific DAKD-DM topic
2. Evaluation of a DAKD-DM software tool with original experiments
3. Pure research essay, with original experimental content
Introduction to Data Mining and its Methodologies
Introduction to Data Mining as a general concept and to its methodologies for practical implementation.
This course will build on different teaching methodology (TM) aspects, including:
TM1: Expositive seminars
TM2: Expositive-participative seminars
TM3: Orientation for individual assignments (essays)
TM4: Individual tutoring
The course will be evaluated through a final essay that will take one of these three modalities:
1. State of the art on a specific DAKD-DM topic
2. Evaluation of a DAKD-DM software tool with original experiments
3. Pure research essay, with original experimental content
Information Visualization: Design for Interaction - Spence, Robert, Prentice Hall.
Visualize This: The FlowingData Guide to Design, Visualization, and Statistics - Yau, Nathan, John Wiley & Sons, 8 July 2011.
Students are expected to have at least some basic background in the area of artificial intelligence and, more specifically, in the areas of Machine Learning and Computational Intelligence.
Some basic knowledge of probability theory and statistics would be beneficial.
Other than this, the course is open to students and researchers from all backgrounds.