Checking date: 30/05/2022

Course: 2022/2023

Statistical methods in data mining
(13722)
Study: Bachelor in Statistics and Business (203)

Coordinating teacher: MUÑOZ GARCIA, ALBERTO

Department assigned to the subject: Department of Statistics

Type: Compulsory
ECTS Credits: 6.0 ECTS

Course:
Semester:

Requirements (Subjects that are assumed to be known)
Regression Methods and Multivariate Analysis (third-year subjects). Knowledge of the R statistical software.
Objectives
1. To know and use advanced statistical techniques with state-of-the-art software support.
2. To extract and analyze information from large data sets.

1. Ability to analyze and synthesize information.
2. Modelling and solving practical problems in data mining.
3. Oral and written communication skills.
Skills and learning outcomes
Description of contents: programme
1. Introduction to the Tidyverse.
   1.1 Data wrangling.
   1.2 Data visualization: ggplot2.
   1.3 Grouping and summarizing.
2. Text Mining.
   2.1 Main concepts.
   2.2 Word clouds.
   2.3 Term by document matrix.
   2.4 R implementations and applications.
3. Data visualization. Metric Multidimensional Scaling, Correspondence Analysis, Biplots.
   3.1 Metric Multidimensional Scaling.
   3.2 Biplots.
   3.3 Perceptual Mappings.
4. Cluster Analysis. Hierarchical Methods, k-means and mixture models.
   4.1 Bottom-up hierarchical clustering algorithms.
   4.2 k-means and related algorithms.
5. Information Theory and classification trees.
   5.1 Information theory.
   5.2 Classification tree algorithms.
   5.3 Real case: credit scoring.
   5.4 Case studies.
6. Association Rules.
   6.1 Main concepts and algorithms.
   6.2 Complete example with R code.
   6.3 Case studies.
7. Deep Learning.
   7.1 Support Vector Machines.
   7.2 Neural Networks for classification.
   7.3 Neural Networks for regression.
8. Case Studies.
   8.1 Comprehensive real cases involving all the studied techniques.
Learning activities and methodology
Theory (4 ECTS). Theory classes with lessons available on the web. Practice (2 ECTS). Problem solving and case studies. Computational practice sessions in computer rooms. Oral presentations and debates.
Assessment System
• End-of-term examination: 50%
• Continuous assessment (assignments, laboratory, practicals...): 50%
Calendar of Continuous assessment
Basic Bibliography
• A.J. Izenman. Modern Multivariate Statistical Techniques. Springer. 2008
• E. Alpaydin. Introduction to Machine Learning, 2nd Edition. MIT Press. 2010
• X. Wu. The Top Ten Algorithms in Data Mining. Chapman & Hall/CRC. 2009