Checking date: 16/05/2022


Course: 2022/2023

Predictive modeling
(17233)
Master in Big Data Analytics (Plan: 352 - Study: 322)
EPI


Coordinating teacher: GARCIA PORTUGUES, EDUARDO

Department assigned to the subject: Statistics Department

Type: Compulsory
ECTS Credits: 3.0 ECTS

Course:
Semester:




Requirements (Subjects that are assumed to be known)
Mathematics for data analysis, Statistics for data analysis
Objectives
* Basic competences
  - CB6: Possess and understand knowledge that provides a basis or opportunity to be original in the development and/or application of ideas, often in a research context.
  - CB9: That students know how to communicate their conclusions, and the knowledge and ultimate reasons that sustain them, to specialised and non-specialised audiences in a clear and unambiguous way.
  - CB10: That students have the learning skills that allow them to continue studying in a way that will be largely self-directed or autonomous.
* General competences
  - CG1: Ability to apply techniques of analysis and representation of information in order to adapt it to real problems.
  - CG4: Synthesise the conclusions obtained from data analyses and present them clearly and convincingly in a bilingual environment (Spanish and English), both in writing and orally.
  - CG5: Be able to generate new ideas (creativity) and anticipate new situations in the contexts of data analysis and decision making.
  - CG6: Use social skills for teamwork and to relate to others autonomously.
  - CG7: Apply advanced techniques of analysis and representation of information in order to adapt it to real problems.
* Specific competences
  - CE1: Apply advanced knowledge of statistical inference in the development of methods for analysing real problems.
  - CE2: Use free software such as R and Python to implement statistical analyses.
  - CE5: Apply advanced statistical foundations to the development and analysis of real problems that involve predicting a response variable.
  - CE6: Apply nonparametric models to the interpretation and prediction of random phenomena.
  - CE10: Apply statistical modeling to relevant problems in the scientific field.
* Learning outcomes
  Acquisition of knowledge on:
  1) statistical-mathematical foundations of the linear regression model;
  2) comparison and selection of regression models;
  3) extensions of the linear regression model (penalization, nonlinear models, models with dimensionality reduction, generalized linear models, etc.);
  4) big data adaptations for generalized linear models;
  5) automated machine learning.
Description of contents: programme
This course is designed to give a panoramic view of several tools available for predictive modeling, at an intermediate-advanced level. It covers in depth the main concepts of linear models and generalized linear models (including their shrinkage versions), and covers the automated machine learning approach more superficially. The focus is on providing the main insights into the statistical/mathematical foundations of the models and on showing how to implement the methods effectively with statistical software. This is achieved through a mixture of theory and reproducible code.

1. Introduction
   1.1. Course overview
   1.2. General notation and background
   1.3. What is predictive modeling?
2. Linear models I: multiple linear model
   2.1. Model formulation and least squares
   2.2. Assumptions of the model
   2.3. Inference for model parameters
   2.4. Prediction
   2.5. ANOVA
   2.6. Model fit
3. Linear models II: model selection, extensions, and diagnostics
   3.1. Model selection
   3.2. Use of qualitative predictors
   3.3. Nonlinear relationships
   3.4. Model diagnostics
   3.5. Dimension reduction techniques
4. Linear models III: shrinkage and big data
   4.1. Shrinkage
   4.2. Big data considerations
5. Generalized linear models
   5.1. Model formulation and estimation
   5.2. Inference for model parameters
   5.3. Prediction
   5.4. Deviance
   5.5. Model selection
   5.6. Model diagnostics
   5.7. Shrinkage
   5.8. Big data considerations
6. Automated machine learning
   6.1. Introduction
   6.2. Explainability
   6.3. Examples in regression
   6.4. Examples in binary classification
   6.5. Examples in multiclass classification

The programme is subject to modification depending on the development of the course and/or the academic calendar.
Learning activities and methodology
The lessons consist of a mixture of theory (description of the methods) and practice (implementation and practical usage of the methods). The methods are implemented in the statistical language R. Students are expected to bring their own laptops to run the code during parts of the lessons.
* Training activities
  - AF1: Theoretical lesson.
  - AF2: Practical lesson.
  - AF5: Tutorials.
  - AF6: Group work.
  - AF7: Individual work.
  - AF8: On-site evaluation tests.
* Teaching methodologies
  - MD1: Class lectures by the professor, supported by computer and audiovisual media, in which the main concepts of the subject are developed and the bibliography is provided to complement the students' learning.
  - MD3: Resolution of practical cases, problems, etc., posed by the teacher, individually or in groups.
  - MD4: Presentation and discussion in class, moderated by the professor, of topics related to the content of the subject, as well as case studies.
  - MD5: Preparation of papers and reports, individually or in groups.
Assessment System
  • % end-of-term examination: 0
  • % continuous assessment (assignments, laboratory, practicals...): 100
Basic Bibliography
  • James, G., Witten, D., Hastie, T. and Tibshirani, R. An Introduction to Statistical Learning with Applications in R. Springer-Verlag. 2013
Additional Bibliography
  • Kuhn, M. and Johnson, K. Applied Predictive Modeling. Springer. 2013
  • Li, Q. and Racine, J. S. Nonparametric Econometrics. Princeton University Press. 2007
  • Peña, D. Regresión y Diseño de Experimentos. Alianza Editorial. 2002
  • Wasserman, L. All of Statistics. Springer-Verlag. 2004
  • Wood, S. N. Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC. 2006

The course syllabus may change due to academic events or other reasons.