DATA MINING II (MD2)

Credits: 7.5 (6.0 ECTS)
Dept.: CS

Instructors

Person in charge: (-)
Others: (-)

General goals

In this course, students should gain an understanding of the concepts behind data mining, its goals, techniques and applications, with a special focus on massive data sets. Such applications are growing rapidly in astronomy, marketing and genomics, among other disciplines, and demand supercomputing architectures and scalable algorithms. The course has a practical side, with assignments that apply techniques such as advanced visualization, statistical machine learning and model optimization to different data sets.

Specific goals

Knowledge

  1. Advanced issues in data mining, such as exploratory visualization, statistical machine learning and kernel methods, and mining large datasets through supercomputing.
  2. Insight into real applications of data mining in industry, including the mining of large datasets through supercomputing.

Abilities

  1. Ability to identify a problem suitable for data mining.
  2. Identification of the most appropriate technique or techniques for a given problem.
  3. Practical application to large datasets through supercomputing.

Competences

  1. Be able to design a data mining application for a specific problem.
  2. Be able to work in a group to discuss the use of different data mining models and techniques for a given application.

Contents

Estimated time (hours):

T: Theory; P: Problems; L: Laboratory; Alt: Other activities; Ext. L: External laboratory; Stu: Study; A. time: Additional time

Topic                                                          T     P     L     Alt  Ext. L  Stu   A. time  Total
1. Introduction to data mining and CRISP-DM 2.0                2.0   0     0     0    0       2.0   0        4.0
2. Dimensionality reduction, feature selection and extraction  6.0   2.0   0     0    5.0     6.0   0        19.0
3. Visualization in data mining                                4.0   4.0   4.0   0    10.0    4.0   0        26.0
4. Statistical Machine Learning                                6.0   4.0   8.0   0    10.0    4.0   0        32.0
5. Case studies in data mining and supercomputing              6.0   4.0   10.0  0    15.0    10.0  0        45.0
Total per kind                                                 24.0  14.0  22.0  0    40.0    26.0  0        126.0
Additional evaluation hours: 0
Total student work hours: 126.0

Teaching Methodology

Theory classes build up theoretical and methodological concepts in a structured fashion. Problem-oriented classes focus on a set of problem assignments. Laboratory classes focus on co-operative work and practical applications in order to consolidate concepts, skills and competencies.

Evaluation Methodology

The course will be evaluated through a final project and its corresponding written report and oral presentation.

Basic Bibliography

  • Hand, D., Mannila, H., Smyth, P. Principles of Data Mining, The MIT Press, 2001.
  • Fayyad, U. et al. (Eds.) Information Visualization in Data Mining and Knowledge Discovery, Morgan Kaufmann, 2001.
  • Guo, Y., Grossman, R. High Performance Data Mining: Scaling Algorithms, Applications and Systems, Kluwer, 2000.

Complementary Bibliography

  • Bishop, C.M. Pattern Recognition and Machine Learning, Springer Verlag, 2006.
  • MacKay, D. Information Theory, Inference & Learning Algorithms, Cambridge Univ. Press, 2002.

Web links

  1. http://www.kdnuggets.com
  2. http://www.kernel-machines.org/

Prerequisites

Basic understanding of multivariate data analysis and machine learning techniques.

© FIB, Barcelona School of Informatics