Credits
3
Types
Elective
Requirements
This subject has no requirements, but it does expect previous capacities
Department
EIO
Mail
karina.gibert@upc.edu
Teachers
Person in charge
- Karina Gibert Oliveras ( karina.gibert@upc.edu )
Others
- Xavier Angerri Torredeflot ( xavier.angerri@upc.edu )
Weekly hours
Theory
1.5
Problems
0
Laboratory
1.5
Guided learning
0
Autonomous learning
0
Competences
Direction and management
Specific
Generic
Sustainability and social commitment
Teamwork
Information literacy
Appropriate attitude towards work
Reasoning
Basic
Objectives
- Be able to carry out the basic automatic descriptive analysis of a complex database
Related competences: CTE9, CG8
- Be able to translate a given real problem into a data mining problem
Related competences: CTR2, CTR6, CB6, CG8
- Be able to choose the appropriate data mining technique for a given real problem
Related competences: CTR6, CB6, CG8
- Be able to design an integrated knowledge discovery project, with all its phases, from the formulation of objectives to the explicit production of knowledge, integrating the appropriate techniques at each point of the process under a multidisciplinary approach
Related competences: CTE9, CDG1, CTR2, CTR4, CTR6, CB6, CB7, CB8, CG8
- Be able to choose and use the appropriate tools to implement and deploy a Knowledge Discovery project, using the most effective combination of freely distributed programming environments or specialized professional packages
Related competences: CTE9, CDG1, CTR4, CG8
- Be able to interpret the results of a Knowledge Discovery project correctly, validate them critically, report them clearly, and communicate them in writing (both in detail and in summary) or orally to technical or non-specialist audiences
Related competences: CTR2, CTR4, CTR6, CB7, CB8
- Be able to turn to complementary literature to find solutions to new problems, incorporating more advanced knowledge into the design of Knowledge Discovery projects. Be able to incorporate new software or a new technique into a project.
Related competences: CDG1, CTR5, CTR6, CB9, CG8
- Be able to carry out medium-term planning (about three months) for the development of a Knowledge Discovery project of a certain size
Related competences: CDG1, CTR3, CTR5
- Be able to integrate into a working team (possibly multidisciplinary) for the development of a Knowledge Discovery project
Related competences: CDG1, CTR3, CTR4, CTR5, CB8
- Be able to design an appropriate preprocessing of the data to be analysed, in accordance with the objectives of the study and the original state of the data themselves
Related competences: CTR2, CTR4, CB6, CG8
Contents
- Introduction. Data Mining origins, steps, Statistics and Artificial Intelligence
Data Mining is placed in its historical context.
The overall process of Knowledge Discovery from Databases is presented, together with its steps, including Data Mining itself.
The disciplinary pillars of Data Mining are introduced: Statistics and Artificial Intelligence, Information Systems and Data Visualization.
- Scope and tools
The different natures of real problems and their different levels of complexity are discussed according to the classification proposed by Simpson. Ill-structured domains are introduced, together with the management of a priori and implicit knowledge, its causes and consequences.
Some software tools for developing data mining tasks are introduced.
- Method Selection. Typology of problems (DMMCM)
The course follows a problem-oriented KDD approach, where the nature of the problem mainly determines the analysis process. The factors that determine a correct choice of data mining method in real cases are presented. The DMMCM typology of methods is presented as a conceptual basis for selection.
- Data, Metadata
Main data structures analyzed by Data Mining techniques.
Importance of metadata, formats and contents.
- Preprocessing
Brief introduction to the relevant aspects of the data preparation step: missing data, outlier detection and treatment, derived variables, transformed variables, filtering, sampling, feature weighting, dimensionality reduction. Good practice guidelines will be provided.
- Data Mining Descriptive methods
Statistical clustering: partitional methods, hierarchical methods, density-based, model-based, scalability; conceptual clustering (AI); hybrid AI&Stats methods: clustering based on rules. WHO case: mental health systems.
- Associative Data Mining methods
Association rule induction. Factorial methods. Bayesian networks.
- Predictive Data Mining methods
Regression, statistical modelling in general. Temporal methods, Artificial Neural Networks, Swarm Intelligence.
- Data Mining Discriminant methods
Decision trees, rule induction, support vector machines, random forests, discriminant analysis, hybrid methods. Case: elderly people functioning and profiles assessment grid.
- Spatio-temporality
Introduction to some tools for managing data that simultaneously include spatial information and change over time. Case: Quality of Life Guttmann.
- Post-processing and validation
Post-processing tools and validation tools for both models and results, adapted to the different Data Mining methods. Case: wastewater treatment.
- Conclusion
All the elements seen during the course will be placed on the general scheme of the Knowledge Discovery process presented in section 1, as a global synthesis of the course.
Activities
Activity Evaluation act
Paper reading
A paper from an impact journal about a real data mining application will be selected. The paper can be proposed by either the student or the lecturer. The student must read and understand the Knowledge Discovery process used in the application, with all its components. A form with this information must be filled in.
Objectives: 6 7
Contents:
Theory
0h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
4h
Checkpoint presentation of team projects
Each group will present the approach of its project in public: project description, objectives, structure, content and origin of the data, design of the Data Mining process to be applied, and work plan.
- Laboratory: Two lab sessions dedicated to group presentations and discussion
Theory
0h
Problems
0h
Laboratory
4h
Guided learning
0h
Autonomous learning
7h
Final presentation of the team project
Each group will hand in the project report and present to their classmates the results of the data mining application developed. There will be a debate and discussion with the lecturer about the decisions made throughout the project.
Objectives: 1 2 3 4 5 6 8 9
Week: 18
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h
Introduction
Theory
2h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h
Scope, tools, Data, Metadata, Preprocessing
Theory
6h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
0h
DMMCM map, Data Mining methods
Theory
12h
Problems
0h
Laboratory
8h
Guided learning
0h
Autonomous learning
0h
Spatio-temporality
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
0h
Post-processing
Theory
2h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
0h
Teaching methodology
The course uses a mixed methodology of case-based learning and project-based learning. In the first week, the fundamentals of the subject will be presented and the activities the student must carry out to guarantee the learning process will be assigned. There are basically two activities: a paper-reading activity on a Data Mining application, and the development of a Data Mining project in a working team.
In the following weeks, the structure will be as described below:
Every week, two hours will be devoted to a case presentation covering all the steps of development (preprocessing, analysis, post-processing and validation). In part of the third hour, students will give short presentations of complementary cases, which they document individually. The rest of the third hour and the fourth hour will be devoted to lab activities related to the project developed by each working team.
Together with the acquisition of technical skills directly related to Data Mining, an important goal of the course is to provide students with transversal skills considered relevant for professional development, such as teamwork, long-term planning, oral, visual and written communication, synthesis, justification of the decisions made during the project, incident management, and knowledge integration for building solutions to highly complex problems. The activities scheduled during the course have been specially designed for this purpose.
In the last week of the course, every project will be presented and followed by a discussion, which serves as an oral examination. The lecturer will use the last hour of the course to highlight commonalities and particularities of the presented projects in relation to the basic schemes of a Data Mining project. A common discussion will follow on what students have understood about the usefulness of Data Mining in Computer Engineering, thus completing the general message of the course.
Evaluation methodology
Two scores corresponding to the two activities developed during the course:
20% for the paper activity. It will evaluate the capacities of comprehension (0.5), synthesis (0.5), oral and visual communication (0.5), and argumentation (0.5), which will be demonstrated through discussion.
80% for a project developed in teams. There will be a single evaluation of the quality of the Data Mining project, considering methodological rigour (0.5), the correctness of the Knowledge Discovery process designed (0.5), the selected preprocessing methods (0.25), the selected data mining methods (0.25), the selected tools (0.5), correct application and interpretation of results (1), the integration of several techniques in the project (0.5), the quality of the written report (1), and the final public presentation (1). The final score will also take into account the level of planning and coordination of the team and how incidents during the course were solved (1). Additionally, an individual evaluation of each student's communication skills (0.5) and of their level of integration into the working team (1) will be taken into account.
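As a rough check of the arithmetic in the scheme above, the following Python sketch tallies the listed criteria. It assumes (the guide does not state this) that the figures in parentheses are maximum points on a 0-10 final scale and that each criterion is scored as a fraction of its maximum. The dictionary keys and the function name `final_grade` are illustrative names, not part of the course guide.

```python
# Maximum points per criterion, copied from the evaluation methodology.
# Assumption: the numbers are points on a 0-10 final scale.
PAPER = {
    "comprehension": 0.5,
    "synthesis": 0.5,
    "oral_visual_communication": 0.5,
    "argumentation": 0.5,
}

PROJECT = {
    "methodological_rigour": 0.5,
    "kdd_process_correctness": 0.5,
    "preprocessing_methods": 0.25,
    "data_mining_methods": 0.25,
    "tools": 0.5,
    "application_and_interpretation": 1.0,
    "integration_of_techniques": 0.5,
    "written_report": 1.0,
    "public_presentation": 1.0,
    "planning_and_incident_handling": 1.0,
    "individual_communication": 0.5,
    "team_integration": 1.0,
}


def final_grade(achieved):
    """Return a 0-10 grade.

    `achieved` maps a criterion name to the fraction of its maximum
    that was obtained (0.0 to 1.0). Missing criteria count as 0, and
    fractions outside [0, 1] are clamped.
    """
    weights = {**PAPER, **PROJECT}
    total = 0.0
    for name, max_points in weights.items():
        fraction = min(max(achieved.get(name, 0.0), 0.0), 1.0)
        total += max_points * fraction
    return total


if __name__ == "__main__":
    print("Paper maximum:", sum(PAPER.values()))      # 20% of the grade
    print("Project maximum:", sum(PROJECT.values()))  # 80% of the grade
    everything = {name: 1.0 for name in {**PAPER, **PROJECT}}
    print("Full marks:", final_grade(everything))
```

Under this reading, the paper criteria add up to 2 points and the project criteria to 8 points, which matches the stated 20% and 80% split.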