Course: 2022/2023

Introduction to Data Mining for Business Intelligence

(14448)

Requirements (Subjects that are assumed to be known)

This course assumes that the student knows the contents of
a) Statistics I (http://www3.uc3m.es/reina/Fichas/Idioma_2/204.13154.html),
b) Statistics II (http://www3.uc3m.es/reina/Fichas/Idioma_2/204.13160.html),
and the lesson on Properties of Matrices in
c) Mathematics for Economics II (http://www3.uc3m.es/reina/Fichas/Idioma_2/204.13156.html)
in the Business Administration degree.
Some notions of Multivariate Statistics are also assumed.

Skills and learning outcomes

1. Knowing and using advanced statistical techniques with the support of state-of-the-art software.
2. Extracting and analyzing information from large data sets.
3. Learning the basic statistical skills for analyzing multivariate socio-economic data, such as data from market research.
4. Being able to describe and analyze real data sets using the techniques mentioned above.
5. Being able to write reports on the results of analyzing real case studies.

1. Capacity for analysis and synthesis of information in data mining problems.
2. Solving real problems.
3. Learning and training in the use of statistical software to solve real case studies.
4. Critical and selective reasoning to solve real-life problems.
5. Presentation skills.

Description of contents: programme

1. Learning the R Statistical Language.
1.1 Basic commands.
1.2 Graphics in R.
1.3 Statistical functions in R and basic programming.
2. Visualization Techniques for complex business data.
2.1 Principal component analysis theory.
2.2 Basic examples with R code.
2.3 Case studies.
3. Multidimensional Scaling.
3.1 Metric scaling theory.
3.2 Examples with R code.
3.3 Perceptual mappings in R.
4. Cluster Analysis.
4.1 Hierarchical methods.
4.2 Centroid methods: k-means.
4.3 Case studies.
5. Classification Trees.
5.1 Information theory.
5.2 Classification tree algorithms.
5.3 Real case: credit scoring.
6. Real Case Studies.
6.1 Comprehensive real cases involving all the studied techniques.
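
As an informal illustration of topic 5.1 (Information theory), the sketch below shows how entropy and information gain might be computed when choosing a split in a classification tree. It is written in Python rather than R, the function names `entropy` and `information_gain` are hypothetical, and the credit-scoring labels are made-up toy values; it is not taken from the course materials.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(labels, groups):
    """Reduction in entropy obtained by splitting `labels` into `groups`."""
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted


# Toy, made-up credit-scoring labels (assumption, for illustration only).
labels = ["good", "good", "bad", "bad"]
perfect_split = [["good", "good"], ["bad", "bad"]]  # each group is pure
useless_split = [["good", "bad"], ["good", "bad"]]  # each group is still mixed

gain_perfect = information_gain(labels, perfect_split)
gain_useless = information_gain(labels, useless_split)
```

A tree-growing algorithm of this kind would prefer the split with the larger gain, here the one that separates the two classes completely.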

Learning activities and methodology

1. Theoretical lectures (4 ECTS)
2. Computer labs (2 ECTS)
3. Final project.

Assessment System

- End-of-term examination: 60%
- Continuous assessment (assignments, laboratory, practicals...): 40%

Basic Bibliography

- Avril Coghlan. A little book of R for multivariate analysis. Internet. 2014
- Johannes Ledolter. Data Mining and Business Analytics with R. Wiley. 2013

Additional Bibliography

- Y Zhao. R and Data Mining. Examples and Case Studies. Elsevier. 2012

The course syllabus may change due to academic events or other reasons.